So finally, we're ready to start testing our research questions statistically. While it took us a while to get here, our earlier steps should never be skipped. That is, no matter how sophisticated you may become as a quantitative researcher, you'll always need to examine your code book, manage your data, and examine descriptive statistics for the variables of interest.
>> In the description of hypothesis testing, when we looked at the association between depression and smoking, we were working with a categorical explanatory variable, the presence or absence of depression, and a quantitative response variable, the number of cigarettes smoked per month. When you're testing a hypothesis with a categorical explanatory variable and a quantitative response variable, the tool that you should use is analysis of variance, also called ANOVA.
>> Now that you understand in what situations you would use ANOVA, we're ready to learn how it works, or more specifically, what the idea is behind comparing means. The test that you'll be using is called the ANOVA F-test. So let's use another categorical-to-quantitative research question.
>> Is academic frustration related to major? In this example, a college dean believes that students with different majors may experience different levels of academic frustration. Random samples of 35 students from each of four majors, Business, English, Mathematics, and Psychology, were asked to rate their level of academic frustration on a scale of 1, the lowest, to 20, the highest.
>> This figure highlights that we'll be examining the relationship between major, our explanatory or X variable, and frustration level, our response or Y variable, by comparing the mean frustration levels among the four majors defined by X. The null hypothesis claims that there's no relationship between the explanatory and response variables, X and Y.
Since the relationship is examined by comparing the means of Y in the populations defined by the values of X, no relationship would signify that all the means are equal. Therefore, the null hypothesis of the F-test is: population mean 1 equals population mean 2 equals population mean 3 equals population mean 4. Here we have just one alternative hypothesis, which claims that there is a relationship between X and Y, the explanatory and response variables. In terms of the means, it simply says the opposite, that not all of the means are equal, and we simply write H subscript a: not all of the population means are equal. There are many ways for the population means not to be equal. We'll talk about that later.
For now, let's think about how we would go about testing whether the population means are equal. We could calculate the mean frustration level for each major and see how far apart those sample means are, or in other words, measure the variation between the sample means. If we find that the four sample means are not all close together, we'll say that we have evidence against the null hypothesis. Otherwise, if they are close together, we'll say that we do not have evidence against the null hypothesis. This seems quite simple, but is this enough? Let's see.
It turns out that the sample mean frustration score of the 35 Business majors is 7.3. The sample mean frustration score for the 35 English majors is 11.8. The sample mean frustration score for the 35 Math majors is 13.2. And the sample mean frustration score for the 35 Psychology majors is 14.0. Here's a graphical representation of two hypothetical data sets taken from two different populations, for instance, students in Country One and students in Country Two. In our hypothetical samples, the means are the same, but the two data sets look very different in this boxplot.
A boxplot is a convenient way of graphically depicting groups of numerical data, including such descriptive information as the smallest observation of the group, the mean and median, the largest observation, and the spread or variability of the values. The top of the line that sticks out of the top of the box and the bottom of the line that sticks out of the bottom of the box are the highest and lowest values. The red dot is the mean. The middle horizontal line is the median.
You can see that the two data sets, students in Country One and students in Country Two, have the same set of means and thus the same differences among them. Both show data for four groups with sample means of 7.3, 11.8, 13.2, and 14.0, indicated with red marks. The important difference between the two data sets is that the first represents data with a large amount of variation within each of the four groups. The second represents data with a small amount of variation within each of the four groups.
Boxplots for Country One show plenty of overlap among the four groups because of the large amount of variation in frustration scores within the groups. One could imagine the data arising from four random samples taken from four populations, all having the same mean of about 11 or 12. The first group of values may have been a bit on the low side and the other three a bit on the high side, but such differences could conceivably have come about by chance. This would be the case if the null hypothesis claiming equal population means were true.
Boxplots for Country Two show very little overlap because of the small amount of variation in frustration scores within the groups. It would be very hard to believe that we're sampling from four groups that have equal population means. This case is an example of when the null hypothesis claiming equal population means would be false.
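As a rough sketch of the two scenarios just described, the following Python snippet builds two made-up data sets that share the four sample means from the lecture but differ in within-group spread, then prints the values a boxplot would display. The course itself uses SAS; Python is used here only for illustration. The within-group standard deviations (3.5 and 1.0), the evenly spaced scores, and the helper names `make_group` and `boxplot_summary` are assumptions, not the data behind the lecture's figure.

```python
import statistics

# Sample means quoted in the lecture; each major has 35 students.
MEANS = {"Business": 7.3, "English": 11.8, "Mathematics": 13.2, "Psychology": 14.0}
N = 35


def make_group(mean, sd, n=N):
    """Return n evenly spaced, made-up scores with this sample mean and SD."""
    offsets = [i - (n - 1) / 2 for i in range(n)]  # symmetric around 0
    scale = sd / statistics.stdev(offsets)         # rescale to the target SD
    return [mean + scale * o for o in offsets]


def boxplot_summary(scores):
    """Lowest value, median, highest value, and mean of one group."""
    return {
        "min": min(scores),
        "median": statistics.median(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
    }


# Hypothetical spreads: large for Country One, small for Country Two.
country_one = {major: make_group(m, sd=3.5) for major, m in MEANS.items()}
country_two = {major: make_group(m, sd=1.0) for major, m in MEANS.items()}

for label, data in [("Country One", country_one), ("Country Two", country_two)]:
    print(label)
    for major, scores in data.items():
        s = boxplot_summary(scores)
        print(f"  {major:12s} min={s['min']:.1f} median={s['median']:.1f} "
              f"max={s['max']:.1f} mean={s['mean']:.1f}")
```

With these assumed spreads, the group means are identical across the two countries, yet the Country One ranges overlap heavily while the Country Two Business scores sit entirely below the other three majors.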
The question we need to answer with the ANOVA F-test is: are the differences among the sample means due to true differences among the population means, or merely due to sampling variability? In order to answer this question using our data, we obviously need to look at the variation among the sample means. But that's not enough. We also need to look at the variation among the sample means relative to the variation within the groups. So F is the variation among sample means divided by the variation within groups. This ratio measures to what extent the differences among the sample group means dominate over the usual variation within sample groups, which reflects differences among individuals that are typical in random samples. When the variation within the groups is large, like in Country One, the variation among the sample means becomes negligible by comparison, and the data provide very little evidence against the null hypothesis. When the variation within groups is small, like in Country Two, the variation among the sample means dominates, and the data provide stronger evidence against the null hypothesis. Looking at this ratio of variations is the idea behind comparing means, thus the name analysis of variance.
>> Here are the results of the analysis of variance for Country Two, testing the relationship between major and frustration score. The F statistic, circled in red, is 46.60. Since we know this is the variability among sample means divided by the variability within groups, this large number suggests that the variability among sample means is much greater than that within sample groups. The P value of the ANOVA F-test is the probability of getting an F statistic as large as we got, or even larger, had the null hypothesis been true, that is, had the population means been equal.
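To make the ratio and the P value concrete, here is a minimal Python sketch. It computes F by hand for two tiny toy data sets (invented numbers, not the frustration data), then estimates a P value by simulation: it repeatedly draws four groups of 35 from one common population, so the null hypothesis is true by construction, and counts how often the simulated F is at least as large as the observed one. Statistical software normally gets the P value from the F distribution instead. The normal population, the number of trials, and the names `f_statistic` and `simulated_p_value` are assumptions for illustration.

```python
import random
import statistics


def f_statistic(groups):
    """Variation among sample means divided by variation within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.fmean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    # Between-group mean square: how far the group means sit from the grand mean.
    ms_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, means)) / (k - 1)
    # Within-group mean square: how far individuals sit from their own group mean.
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g) / (n_total - k)
    return ms_between / ms_within


# Toy data: both sets have group means 5, 8, and 11, but different spreads.
tight = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
spread = [[0, 5, 10], [3, 8, 13], [6, 11, 16]]
print(f_statistic(tight))   # 27.0
print(f_statistic(spread))  # 1.08


def simulated_p_value(observed_f, k=4, n=35, trials=2000, seed=0):
    """Share of null-hypothesis simulations with F at least as large as observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # All k groups come from the same population, so the null is true.
        groups = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        if f_statistic(groups) >= observed_f:
            hits += 1
    return hits / trials


p_large = simulated_p_value(46.60)  # F reported for Country Two
p_small = simulated_p_value(1.0)    # an unremarkable F, for contrast
print(p_large, p_small)
```

In the toy data, the same differences among means give a large F when the groups are tight and an F near 1 when they are spread out. In the simulation, an F of 46.60 essentially never arises when the population means are equal, while an F of 1.0 is quite common.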
In other words, it tells us how surprising it is to find data like those observed, assuming that there is no difference among the population means. This P value is practically 0, telling us that it would be next to impossible to get data like those observed had the mean frustration levels of the four majors been the same, as the null hypothesis claims. A P value of 0.0001 means that, if the population means really were equal, we would expect an F statistic this large or larger only about one time in ten thousand. It is not the probability that the null hypothesis is true, nor the chance that our conclusion is wrong. So we can confidently conclude that the frustration level means of the four majors are not all the same, or in other words, there's a significant association between frustration level and major. So we reject the null hypothesis in favor of the alternative hypothesis. Now that you have a feel for analysis of variance, we'll run the test using SAS. We'll use an example first described in hypothesis testing.