Of thumb. Okay, so now let's actually get to comparing two proportions rather than simply looking at one proportion. We want to test whether the proportion of side effects is the same in the two groups or different. Imagine that A is some new formulation and B is the standard, and you want to test whether or not the new formulation has more side effects than the standard.

In general, for two-by-two tables I'm going to use the following notation. The first row is x and n1 minus x, with row total n1, which is n1 plus. The second row is y and n2 minus y, with row total n2, which is n2 plus. If I need to, I'll refer to the four cells by their matrix coordinates: n11, n12, n21, n22. I'll call n1 the top right margin and n2 the bottom right margin, but when I'm referring to both sets of margins I'll say n1 plus, n2 plus, n plus 1 and n plus 2 for the respective margins. The plus in the notation means summing over that index.

Okay. So now let's do a score-type test of the hypothesis that p1 equals p2. Our null hypothesis is H0: p1 equals p2, versus not equal to, greater than, or less than. For the score test of this null hypothesis, the numerator is p1 hat minus p2 hat, the sample proportion in group 1 minus the sample proportion in group 2. If we were assuming that this difference was a constant other than 0, we would subtract that null difference in the numerator, but typically the null hypothesis is that they're equal, so the hypothesized value of the difference is 0 and we can just omit it. In the denominator we need the standard error. Under the hypothesis that p1 equals p2, the variance of p1 hat minus p2 hat is p times 1 minus p, times the quantity 1 over n1 plus 1 over n2, where p is the common value of p1 and p2.
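Written out, the table notation and the null variance described above look roughly as follows. This is a transcription of the spoken formulas; the column labels "side effects" and "none" are an assumption added for readability, and the first equality uses the independence of the two groups.

```latex
\[
\begin{array}{c|cc|c}
 & \text{side effects} & \text{none} & \text{total} \\ \hline
\text{Group 1} & x = n_{11} & n_1 - x = n_{12} & n_1 = n_{1+} \\
\text{Group 2} & y = n_{21} & n_2 - y = n_{22} & n_2 = n_{2+} \\ \hline
\text{total} & n_{+1} & n_{+2} &
\end{array}
\]
\[
\operatorname{Var}(\hat p_1 - \hat p_2)
  = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}
  \overset{H_0}{=} p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right),
  \qquad p = p_1 = p_2 .
\]
```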
So, under the null hypothesis, we need an estimated version of that if we're going to get a number that we can compare to a normal quantile. We need a value of p to plug in there, so we plug in p hat. Under the null hypothesis the true proportions are identical, so group A is a bunch of IID Bernoulli draws from group 1 and group B is a bunch of IID Bernoulli draws from group 2, but they have the same proportion. So we really just have n1 plus n2 Bernoulli draws, and our estimate of the proportion is simply the total number of events over the total sample size: p hat is x plus y over n1 plus n2. That is exactly the maximum likelihood estimate for p, the common proportion under the null hypothesis that the two proportions are equal. We plug that into the denominator, p hat times 1 minus p hat, and we get our test statistic, which is just estimate minus hypothesized value divided by the standard error. This statistic is approximately standard normal under the null hypothesis for large n1 and n2.

If we want to invert this to create a confidence interval, we don't have a closed form like we do for the score test for a single proportion. The Wald statistic has the same numerator, p1 hat minus p2 hat, but it doesn't use the fact that, under the null hypothesis, the proportions are equal. So in the denominator you just have separate terms: p1 hat times 1 minus p1 hat over n1, plus p2 hat times 1 minus p2 hat over n2, and you square root the whole thing. You can of course invert that to get a confidence interval: p1 hat minus p2 hat, plus or minus z sub 1 minus alpha over 2 times the standard error.

By the way, do you see why you can't easily invert the score test? The reason is that its denominator was explicitly calculated under the specific null hypothesis that p1 equals p2.
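The two statistics just described can be sketched in a few lines. This is a minimal illustration in Python rather than the lecture's R; the function names and the example counts (30 of 100 versus 20 of 100) are hypothetical and not from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def score_statistic(x, n1, y, n2):
    """Score (pooled) z statistic for H0: p1 == p2."""
    p1_hat, p2_hat = x / n1, y / n2
    p_pooled = (x + y) / (n1 + n2)  # estimate of the common p under H0
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se_null


def wald_interval(x, n1, y, n2, alpha=0.05):
    """Wald interval for p1 - p2, using the unpooled standard error."""
    p1_hat, p2_hat = x / n1, y / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal quantile z_{1 - alpha/2}
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Hypothetical counts, purely for illustration.
z_stat = score_statistic(30, 100, 20, 100)
lower, upper = wald_interval(30, 100, 20, 100)
print(z_stat, lower, upper)
```

Note that only the Wald function can take a nonzero null difference without changing its standard error, which is the point made above about inverting the two tests.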
In the Wald statistic, if we were to have a different null, that p1 minus p2 wasn't equal to 0 but was equal to some other value, we would subtract that value in the numerator and the denominator wouldn't change. In the score test there is no immediate way to adapt the denominator, and that's why you have to use some programming to get the confidence interval from that one. But the Wald test we can invert very easily, and we get an interval that should be fairly familiar to us: p1 hat minus p2 hat, plus or minus the normal quantile times the standard error. That's the so-called Wald interval. It's very easy to calculate, and it's taught in nearly every statistics textbook.

The Wald test and the Wald interval perform poorly relative to the score test and the score interval. But the decrease in performance is smaller than in the one-sample case. In the one-sample case there is a huge decrease in performance, but subtracting two things tends to make them more normally distributed, so it helps a little bit, and the decrease in performance of the Wald interval isn't anywhere near as bad as it is in the single-proportion case.

So for testing, I would just say always use the score test; that's easy. For intervals, inverting the score test is hard and it's not in standard software, so the simple fix that we proposed in a paper in The American Statistician is to add one success and one failure in each group. So calculate p1 tilde, which is x plus 1 over n1 plus 2; n1 tilde, which is n1 plus 2; p2 tilde, which is y plus 1 over n2 plus 2; and n2 tilde, which is n2 plus 2. This is exactly taking the two-by-two table that has the successes and failures for each group and adding one to every cell. Then just treat that as if it's the data and construct a Wald interval. Unlike the Agresti-Coull interval in the single-proportion case, this interval doesn't approximate the score interval, but it does perform better than the Wald interval, and I'll have a slide in a second to show you this.

Okay, so let's just perform the score test of whether or not the proportion of side effects is the same for the two drugs. pA hat is 11 over 20, which is 0.55; pB hat is 5 over 20, which is 0.25. p hat, the common proportion, is 16 over 40, that is 11 plus 5 over 20 plus 20, which is 0.4. So our test statistic is 0.55 minus 0.25 over the square root of 0.4 times 0.6 times 2 over 20. You can plug into the formula; you get about 1.94. We fail to reject H0 at the 5% level; in other words, you compare it with 1.96 for a two-sided test. For the two-sided p-value, calculate the probability that the absolute value of a standard normal is bigger than 1.94, which is the probability that a standard normal is bigger than 1.94 plus the probability that it is below negative 1.94. That's about 0.026 in either tail, so the p-value is about 0.053 and we fail to reject. Hopefully everyone can do this calculation very easily at this point in the class.

Okay. So here is the same kind of picture as before. In the previous picture I showed the coverage rate of the interval by the true value of the proportion, for the single proportion. Now there are two proportions, p1 and p2, so here is the coverage probability by the true values of p1 and p2. On the left I have the Wald interval. On the right I have this Agresti-Caffo interval, where you add one success and one failure to each group, that is, one to every cell in the two-by-two table. And you can see that we get these big dips down toward 0 on the Wald interval.
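Going back to the side-effects example and the add-one interval, here is a minimal Python sketch (the lecture itself uses R). It assumes the counts are 11 of 20 in group A and 5 of 20 in group B, as implied by the proportions 0.55 and 0.25; the function name `agresti_caffo_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Counts assumed from the lecture's example: 11/20 in group A, 5/20 in group B.
x, n1 = 11, 20
y, n2 = 5, 20

# Score test of H0: pA == pB, using the pooled proportion in the standard error.
p_a, p_b = x / n1, y / n2
p_pooled = (x + y) / (n1 + n2)
z_score = (p_a - p_b) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_score)))


def agresti_caffo_interval(x, n1, y, n2, alpha=0.05):
    """Add one success and one failure per group, then form a Wald interval."""
    n1_t, n2_t = n1 + 2, n2 + 2
    p1_t, p2_t = (x + 1) / n1_t, (y + 1) / n2_t
    se = sqrt(p1_t * (1 - p1_t) / n1_t + p2_t * (1 - p2_t) / n2_t)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1_t - p2_t
    return diff - z * se, diff + z * se


ac_lower, ac_upper = agresti_caffo_interval(x, n1, y, n2)
print(z_score, p_two_sided, ac_lower, ac_upper)
```

The interval is for pA minus pB on the adjusted counts, so its center is pulled slightly toward 0 compared with the raw difference of 0.30.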
If either of the proportions is very low or very high, you get very bad performance, well below 0.95, and this shrinkage towards 0.5 for each of the proportions improves things dramatically, and it's a very easy thing to do. And then here's the exact same plot, just some cross-sections through it of different sorts. In the top ones, p1 minus p2 equals particular values, and in the bottom ones the ratios of p1 and p2 are fixed. In other words, these are slices, or rather curves, through that two-dimensional picture, and again they show in a nice, easy 2D plot how the Agresti-Caffo interval performs relative to the Wald interval.

Okay, let's briefly go over likelihood plots and Bayesian analysis of two binomial proportions. Likelihood analysis requires the use of profile likelihoods or some other technique to reduce the dimension if you want to do a 1D likelihood plot. We can show you later on a way to use the so-called non-central hypergeometric distribution to get an exact likelihood plot for the odds ratio. But for the difference in the proportions it's a little harder; probably a profile likelihood would be the way to go. So let's leave that discussion for elsewhere.

Instead, let's talk about being a Bayesian. We talked about, for a single binomial proportion, putting a beta prior on the probability to get a posterior. So imagine putting independent priors on p1 and p2: a beta alpha 1, beta 1 prior on p1 and a beta alpha 2, beta 2 prior on p2. Remember how the calculation goes: the posterior is proportional to likelihood times prior. Here the likelihood is p1 to the x, 1 minus p1 to the n1 minus x, p2 to the y, 1 minus p2 to the n2 minus y, and the prior is p1 to the alpha 1 minus 1, 1 minus p1 to the beta 1 minus 1, p2 to the alpha 2 minus 1, 1 minus p2 to the beta 2 minus 1. If we multiply all those together we get this formula right here, which shows exactly that if we have two independent binomials and we multiply them by two independent betas, we wind up with a pair of independent beta posteriors: one beta posterior for p1 and one beta posterior for p2. The alpha parameter for p1 is no longer alpha 1 but x plus alpha 1, and the beta parameter for p1 is n1 minus x plus beta 1. The alpha parameter for p2 is y plus alpha 2, and the beta parameter for p2 is n2 minus y plus beta 2. So alpha 1 and beta 1 are the beta parameters for p1 a priori; after you factor in the data, you just add the successes to alpha and the failures to beta, and the same for p2, and then you get the beta posteriors.

The easiest way to explore this posterior is with Monte Carlo simulation, and I'll show that right here. It's very simple. Here I define my x, my n1, my alpha 1, my beta 1, my y, my n2, my alpha 2 and my beta 2. I just used a uniform prior: a beta with parameters 1 and 1 is just uniform, so I put a uniform on both p1 and p2. Then I'm going to sample from the posterior. I simulate a thousand pairs of beta draws, where for p1 my parameters are x plus alpha 1 and n1 minus x plus beta 1, and for p2 my parameters are y plus alpha 2 and n2 minus y plus beta 2. So imagine I want to look at the risk difference, here the difference in the risk of side effects; p2 minus p1 is the parameter I want. Here p1 is a bunch of posterior p1 simulations, and p2 is a bunch of posterior p2 simulations.
If I subtract them, R does it component by component, so I get a collection of a thousand risk differences. I could plot the density of the risk differences, which is the next line. I could calculate the 2.5th and the 97.5th quantiles of these simulations to get a Bayesian credible interval. I could calculate the posterior mean, and I could calculate the posterior median. And on the next slide you see exactly this. I have some R code called twoBinomPost, which is in the GitHub repository, and I'll also put it on the course website. It puts out the mean, the median and the mode for those three, and equi-tail credible intervals. What I mean by equi-tail is that there is 2.5% in the lower tail and 2.5% in the upper tail. I think for the one-sample binomial case we discussed that maybe it's better not to do equi-tail credible intervals, but in this case it's easy enough to do it that way, so why don't we just do it that way. You can go through the twoBinomPost code; it's very simple to do this.

Here what I'm showing is the posterior for the risk difference. And this is what's nice about Bayesian intervals. We're simulating p1 and p2 a posteriori, so we're getting draws from the joint posterior distribution of p1 and p2. Any function of p1 and p2 that you then want to investigate becomes very easy to do. And so here I took the risk difference and plotted the density.
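The simulation steps above can be sketched as follows. This is a minimal Python illustration, not the lecture's R code and not the twoBinomPost function; the counts (11 of 20 for group 1, 5 of 20 for group 2) are an assumption borrowed from the side-effects example, and the seed is arbitrary.

```python
import random
from statistics import mean, median, quantiles

random.seed(0)  # arbitrary seed so the sketch is reproducible

# Assumed data (borrowed from the side-effects example) and uniform Beta(1, 1) priors.
x, n1 = 11, 20
y, n2 = 5, 20
alpha1 = beta1 = 1.0
alpha2 = beta2 = 1.0
n_sims = 1000

# Conjugate update: add successes to alpha and failures to beta, then sample.
p1 = [random.betavariate(x + alpha1, n1 - x + beta1) for _ in range(n_sims)]
p2 = [random.betavariate(y + alpha2, n2 - y + beta2) for _ in range(n_sims)]

# Risk difference p2 - p1, computed draw by draw.
risk_diff = [b - a for a, b in zip(p1, p2)]

# Equi-tail 95% credible interval: cut points at 2.5%, 5%, ..., 97.5%.
cuts = quantiles(risk_diff, n=40)
lower, upper = cuts[0], cuts[-1]

post_mean = mean(risk_diff)
post_median = median(risk_diff)
print(post_mean, post_median, lower, upper)
```

Because the draws come from the joint posterior, any other function of p1 and p2, such as a ratio, could be summarized by replacing the one line that builds `risk_diff`.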
On the plot, I put some blue lines where the credible interval occurs, and the red line is exactly at 0. You can see that 0 does lie within the credible interval. The posterior density also shows which values of the risk difference are better supported by the data. And even though 0 is in our credible interval, it's not a terribly well supported value, a posteriori.

Well, that's the end of the lecture. That was a whirlwind tour of risk-difference-style intervals for two binomial proportions. I'm hoping at this point that a lot of these topics in the class will start to come very easily to you, because we're just using the same techniques over and over again. And I look forward to seeing you for the next lecture.