Next up, we use hypothesis testing to compare two proportions, as opposed to using confidence intervals to estimate the difference between the two proportions. As usual, we're going to introduce this new concept using a concrete, real data example. A SurveyUSA poll asked respondents whether any of their children have ever been the victim of bullying. Also recorded on the survey was the gender of the respondent, the parent. Below is the distribution of responses by gender of the respondent. Before we proceed with the calculations, I'd like to make note of one thing. Suppose we take the somewhat narrow-minded view that only a male and female couple has children. In that case, the proportion of kids that are bullied should be the same for males and females. Remember, here we're asking individuals if their kid has been bullied, not families or households. If we see differences between the percentages of males and of females who report bullied kids, these may be due to a variety of reasons. It could be single parents, but we said we're going to take the narrow-minded view that we usually have one mother and one father in the household. That's probably true for the majority of the population. It could be that one gender is more likely than the other to even know that their kid has been bullied. And it could also be that one gender is more likely than the other to actually report this on a survey. So of the 90 males that were surveyed, 34 said that their kid had been bullied. And of the 122 females that were surveyed, 61 said the same thing. So, to calculate our sample proportions: for the males, that would be 34 over 90, roughly 38%. And for the females, 61 over 122, 50%. If we want to compare these two proportions, our null hypothesis should be that the proportion of males and the proportion of females whose children have ever been the victim of bullying are equal to each other, or in other words, that the difference between the two population proportions is 0.
If we're simply looking for a difference between the two, the alternative hypothesis will set this difference between the two population proportions to be not equal to 0. Before we can proceed, we need to check our conditions, and we need to calculate a test statistic based on which we can then calculate a p value. We'll get to the calculations in a little bit, but let's flash back for a moment to working with one proportion. We talked about this idea of using a p hat versus a p. We said that when we're working with one proportion and we want to check the success failure condition within the context of a confidence interval, we use our observed proportion to do that, so these are the observed successes and observed failures. When we're doing a hypothesis test, on the other hand, we use the value that the population proportion is set equal to in the null hypothesis. In other words, the null value of p is what we use to calculate the expected successes and expected failures. In terms of the standard error, once again, we use p hat in the calculation of the standard error for confidence intervals, versus p, the null value of the population proportion, for the hypothesis test. So the moral of the story here is that when you're dealing with a confidence interval, use observed counts and observed proportions. When you're dealing with a hypothesis test, use expected counts and expected proportions. So how will this translate to working with two proportions? For confidence intervals, we want to look at the total number of observed successes and failures for each one of the groups. We've already done this in the previous video. We simply take the sample sizes for each of the groups and multiply them by the observed sample proportions to calculate the observed number of successes and the observed number of failures. For the calculation of the standard error, we use the observed proportions from the two groups as well.
However, calculating the expected successes and failures, or the expected proportion, for the hypothesis test for the difference between two proportions is not as simple. Take a look at this null hypothesis. We simply say in the null hypothesis that the two population proportions should be equal to each other, or that their difference should be equal to 0. But at no point do we define what these should be equal to. So we don't have a readily available null value of the population proportion that we can use for the two groups to calculate expected successes and expected failures. What do we do? We make one up. This is called the pooled proportion. The idea here is that even though the null hypothesis does not equate the two population proportions to a specific value, we can still come up with a best guess for what they could be equal to under the assumption of the null hypothesis. And for that, we use the idea of the pooled proportion. This pooled proportion is simply the total number of successes divided by the overall sample size for the two groups. We're pooling data from the two groups together, so it can be calculated as the number of successes in group one plus the number of successes in group two, divided by the sum of the sample sizes for the two groups. So let's right away put that to use, and calculate the pooled proportion of males and females who said that at least one of their children has been a victim of bullying. Our p hat pool is going to be the total number of successes from the males, so that's 34, plus the total number of successes from the females, so that's 61, divided by the sample size for the males, which is 90, plus the sample size for the females, which is 122. This gives us roughly 45% as the pooled proportion of males and females who said that at least one of their children has been a victim of bullying.
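As a quick check on the arithmetic just described, here is a minimal Python sketch of the two sample proportions and the pooled proportion. The counts come from the survey summary above; the variable names are just illustrative choices, not anything from the course materials.

```python
# Counts read from the survey summary discussed in the lecture.
yes_male, n_male = 34, 90        # males who said yes, males surveyed
yes_female, n_female = 61, 122   # females who said yes, females surveyed

# Observed sample proportions for each group.
p_hat_male = yes_male / n_male
p_hat_female = yes_female / n_female

# Pooled proportion: all successes divided by the combined sample size.
p_hat_pool = (yes_male + yes_female) / (n_male + n_female)

print(f"p_hat_male   = {p_hat_male:.2f}")
print(f"p_hat_female = {p_hat_female:.2f}")
print(f"p_hat_pool   = {p_hat_pool:.2f}")
```

The rounded values should line up with the 38%, 50%, and 45% quoted in the lecture.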
So now that we have a good estimate for a common proportion for the two groups, we can revisit the chart we were working with earlier and see how we can calculate our success failure condition and our standard error for a hypothesis test comparing two population proportions. For the success failure condition for both of the groups, we use this p hat pooled value to calculate the expected number of successes and the expected number of failures. The reason why we're making this distinction, once again, is that in a hypothesis test we must assume that the null hypothesis is true. And when we're doing a hypothesis test for comparing two proportions, our null hypothesis states that the two proportions are equal to each other, so we're going to use the pooled proportion as the value they're both equal to, and treat that as the truth in going through the hypothesis test. For the calculation of the standard error we use the same idea: everywhere we see a p hat 1 or a p hat 2, we simply plug in this common proportion that we calculated for the two groups. You might be asking, what about means? When we talked about doing inference for means, we did not provide different formulas for the standard error when we were calculating a confidence interval versus a hypothesis test. But we seem to be making a pretty big deal about it now that we're talking about proportions. Well, with means our parameter of interest is mu, and in our null hypothesis we set mu equal to some null value, but the standard error is simply calculated as s over the square root of n. So mu, our population mean, does not appear in the calculation of the standard error. It really doesn't matter what that number is set equal to; it's not going to influence the calculation of the standard error.
On the other hand, when you're doing a hypothesis test for a proportion, we set p equal to some null value, and that same p does actually appear in the calculation of the standard error. Because it appears in the calculation, and because we must assume that the null hypothesis is true when going through our calculations, we need to make a distinction between when we do have a null hypothesis that we must assume is true, that is, within the context of hypothesis testing, and when we don't have a null hypothesis that we must assume to be true, that is, within the context of a confidence interval. Let's take a look real quick, then. Are the conditions for inference met for conducting a hypothesis test to compare the two proportions here? In terms of the condition of independence, for within group independence we have a random sample and the 10% condition is met: 90 and 122 are obviously less than 10% of all males and females. So sampled males can be assumed to be independent of each other, and sampled females can be assumed to be independent of each other as well. When it comes to between group independence, we want to think about how these data were collected in the first place. This was an overall random sample, and some of the people in this sample happen to be male, and some of the people in the sample happen to be female. Therefore, we really have no reason to expect that the sampled males and the sampled females are dependent on each other. These are not necessarily paired people in any way, and even if we had any worries about that, with the different sample sizes from the two groups we know that they're definitely not paired one to one. So since there is no reason to expect dependence between the two groups, we can assume that this between group independence condition is met as well. When it comes to sample size and skew, we want to remember to consider the success failure condition.
However, we're doing a hypothesis test for the difference between the two proportions, so to check our success failure condition, we use the pooled proportion that we had calculated, which was the 45% shown above in our data summary table. For the males, then, 90 males times 0.45 gives 40.5, and 90 males times 0.55, that's the proportion of failures, gives 49.5. Both numbers are greater than 10, so the success failure condition is met for males. Similarly, for the females, 122 females times 0.45 gives 54.9, and 122 females times 0.55 gives 67.1. So the success failure condition is met for females as well. Therefore, we can assume that the sampling distribution of the difference between the two sample proportions is nearly normal. Since our conditions are met, we can finally conduct a hypothesis test, and we'll do so at a 5% significance level, evaluating if males and females are equally likely to answer yes to the question about whether any of their children have ever been the victim of bullying. The null hypothesis here was that the two proportions are equal to each other, and the alternative is that the two are different from each other. We had already set these hypotheses early on, in the first slide of this video. Our ultimate goal with a hypothesis test is to calculate a p value, but before we get there, we need a test statistic, and for that, we need to figure out our sampling distribution. The sampling distribution of the difference between the two sample proportions is going to be nearly normal with mean 0. That 0 comes from our null value. And we know how to calculate the standard error using the pooled proportion. So that's 0.45 times 0.55 divided by 90, the number of males, plus 0.45 times 0.55 divided by 122, the number of females. And then we take the square root of the whole thing. That gives roughly 0.0691.
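The success failure check and the standard error calculation above can be sketched in a few lines of Python. This is only an illustration: it uses the rounded pooled proportion of 0.45, as in the lecture, and `expected_counts` is a hypothetical helper name introduced here.

```python
import math

n_male, n_female = 90, 122
p_pool = 0.45  # rounded pooled proportion, as used in the lecture


def expected_counts(n, p):
    """Expected successes and failures for a group of size n under proportion p."""
    return n * p, n * (1 - p)


# Success failure condition: each expected count should be at least 10.
for label, n in [("males", n_male), ("females", n_female)]:
    successes, failures = expected_counts(n, p_pool)
    condition_met = successes >= 10 and failures >= 10
    print(f"{label}: {successes:.1f} successes, {failures:.1f} failures, met: {condition_met}")

# Standard error under the null: the pooled proportion replaces both p hats.
standard_error = math.sqrt(
    p_pool * (1 - p_pool) / n_male + p_pool * (1 - p_pool) / n_female
)
print(f"standard error = {standard_error:.4f}")
```

Using the unrounded pooled proportion, 95 over 212, instead of 0.45 changes the standard error only in later decimal places.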
Our point estimate in this case is the difference between the two sample proportions, so that's the proportion of males minus the proportion of females in the sample. In other words, 0.38 minus 0.50, so our point estimate is negative 0.12. So we finally have everything we need to calculate our p value. Let's draw our sampling distribution real quick and show what the p value actually corresponds to. We're doing a two-tailed hypothesis test, so we want to shade beyond our point estimate, both on the lower tail and on the upper tail as well. To calculate that area we can use a z-score, which we calculate as our point estimate minus the null value, divided by the standard error that we had calculated. The z-score comes out to be roughly negative 1.74. Then our p value can be described as the probability that the absolute value of the z-score is beyond 1.74. That corresponds to a z-score of negative 1.74 or lower, or a z-score of positive 1.74 or higher. You can use a table, R, or an applet to calculate this, and I recommend that you get a little bit of practice doing so. But you will see that the p value comes out to be roughly 8%. The final step would be to compare this to our significance level, and finally make a decision on the research question we were working with.
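The lecture suggests a table, R, or an applet for the tail area. As one more option, here is a rough Python sketch that uses only the standard library. It plugs in the rounded values quoted above (0.38, 0.50, and 0.0691), and `two_sided_p_value` is a hypothetical helper name. It relies on the fact that, for a standard normal Z, the probability that |Z| exceeds z equals erfc(z / sqrt(2)).

```python
import math


def two_sided_p_value(z):
    """Two-tailed area under the standard normal curve beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded values quoted in the lecture.
point_estimate = 0.38 - 0.50   # p hat males minus p hat females
null_value = 0.0               # difference in proportions under the null
standard_error = 0.0691        # computed with the pooled proportion

z = (point_estimate - null_value) / standard_error
p_value = two_sided_p_value(z)

alpha = 0.05  # significance level chosen in the lecture
print(f"z = {z:.2f}")
print(f"p value = {p_value:.3f}")
print(f"p value below alpha: {p_value < alpha}")
```

With these rounded inputs the p value lands near 8%, which is above the 5% significance level.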