In the previous lesson, we talked about model-based inference in the context of a completely randomized experiment and a block randomized experiment. Now, we're going to reformulate model-based inference in the context of the linear regression model. So, I'm going to start with causal regression models for the potential outcomes. For each subject, and for treatment or no treatment, I've got a model for the potential outcome where there's an intercept, alpha. Then, if the treatment is one, you get a tau, and then there's an error term. So, in the causal model, Y i zero is alpha plus epsilon i zero, and Y i one is alpha plus tau plus epsilon i one, and this holds for everybody. For identification, I'm going to have the expectation of epsilon i zero equal to zero, and the expectation of epsilon i one, in the model for Y i one, also equal to zero. Now, it's clear that we can only see either a subject's Y zero or their Y one, never both, so the two models are only hypothetical. But nevertheless, this device is quite useful. If you look at the equations, it's very clear that if you take expectations, alpha is going to be the expected value when there's no treatment and tau is going to be the average treatment effect.

Now, a researcher can't deal directly with that model, as we said. The researcher can only estimate the regression model for the observed outcomes Yi, which I've written now as Yi, Zi to emphasize the dependence on the random variable Zi, that is, on what we see. So, the analogous regression model for the observed data is Yi equals alpha star plus tau star Zi plus Vi. Now, to identify alpha star and tau star, we use as a rule that the expected value of Vi given Zi is zero. Once we have that rule, we see that alpha star is the expected value of Yi given Zi equals zero, and tau star is the expected value of Yi given Zi equals one, minus the expected value of Yi given Zi equals zero. That's not quite what we want: we need the conditioning on Z to go away for alpha star to be equal to alpha and tau star to be equal to tau. So, let's look at this.
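For reference, the two models described above can be written out as follows. This is a reconstruction from the spoken description, not a copy of the slide, so the exact notation (for example writing the observed outcome as $Y_i(Z_i)$) is an assumption.

```latex
\begin{align}
  Y_i(0)   &= \alpha + \varepsilon_i(0),
           & E[\varepsilon_i(0)] &= 0, \\
  Y_i(1)   &= \alpha + \tau + \varepsilon_i(1),
           & E[\varepsilon_i(1)] &= 0, \\
  Y_i(Z_i) &= \alpha^{*} + \tau^{*} Z_i + V_i,
           & E[V_i \mid Z_i] &= 0 .
\end{align}
% Taking expectations in the causal model:
%   \alpha = E[Y_i(0)], \qquad \tau = E[Y_i(1) - Y_i(0)].
% Taking conditional expectations in the observed data model:
%   \alpha^{*} = E[Y_i \mid Z_i = 0], \qquad
%   \tau^{*}   = E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0].
```

The first two lines are the hypothetical causal model, and the third is the model the researcher can actually estimate.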
What is the relationship between the parameters of the causal regression and those of the observed data model, which I've reproduced here? Now, recall we can write Yi as the combination Zi times Yi one, plus one minus Zi times Yi zero, and we can substitute the models for Yi one and Yi zero into the equation for Yi, Zi to get an alternative representation: alpha plus tau Zi, plus epsilon i, Zi, where epsilon i, Zi is Zi times epsilon i one, plus one minus Zi times epsilon i zero. So, now if I take expectations when Zi is zero, I end up with alpha plus the expectation of epsilon i zero given Zi is zero, and that's going to equal alpha star. Similarly, doing the same thing for tau star, which you can work out for yourself, tau star is equal to tau plus the expectation of epsilon i one given Zi is one, minus the expectation of epsilon i zero given Zi is zero.

Okay. That means that alpha star is not equal to alpha and tau star is not equal to tau, generally speaking. But if the potential outcomes are independent of the treatment assignment Zi, or equivalently the potential errors are independent of Zi, then we get equality. That's to say, the parameters alpha and tau of the causal regression, which are what we're interested in, are identified from the parameters alpha star and tau star of the observed data model, which are what we can actually get our hands on, if treatment assignment is independent of the potential outcomes or, equivalently, the potential errors, as would be the case in a completely randomized experiment. Another way to say this is that equality holds if Zi is uncorrelated with the error epsilon i, Zi. But don't confuse the error epsilon i, Zi with the error Vi. In regression equation two, the observed data model, the error Vi is uncorrelated with Zi by definition. In a completely randomized experiment, treatment assignment is independent of the potential outcomes or potential errors.
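To see the identification argument numerically, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the values `ALPHA = 5` and `TAU = 2`, the standard normal errors, and in particular the self-selection rule, under which subjects with a below-average untreated outcome take the treatment. It compares the sample analogues of alpha star and tau star under a coin-flip assignment with those under self-selection.

```python
import random

ALPHA, TAU = 5.0, 2.0  # hypothetical values of the causal parameters


def mean(values):
    return sum(values) / len(values)


def observed_parameters(n, randomized, seed=0):
    """Return sample analogues of (alpha_star, tau_star) for n simulated subjects."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        # Potential outcomes from the causal model, with mean-zero errors.
        y0 = ALPHA + rng.gauss(0.0, 1.0)
        y1 = ALPHA + TAU + rng.gauss(0.0, 1.0)
        if randomized:
            z = rng.random() < 0.5  # coin flip, independent of (y0, y1)
        else:
            z = y0 < ALPHA          # hypothetical self-selection on Y(0)
        # Only one of the two potential outcomes is ever observed.
        if z:
            treated.append(y1)
        else:
            untreated.append(y0)
    alpha_star = mean(untreated)               # estimates E[Y | Z = 0]
    tau_star = mean(treated) - alpha_star      # estimates E[Y | Z = 1] - E[Y | Z = 0]
    return alpha_star, tau_star


for label, randomized in [("randomized", True), ("self-selected", False)]:
    a_star, t_star = observed_parameters(200_000, randomized)
    print(f"{label}: alpha* = {a_star:.2f}, tau* = {t_star:.2f}")
```

Under these assumptions the randomized run should land close to alpha and tau, while the self-selected run should give an alpha star above alpha and a tau star below tau, because the untreated error then has a positive mean among the untreated.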
But in an observational study, that needn't be the case, and that accounts for the difference between alpha and tau on the one hand and alpha star and tau star on the other. How might this work out? Let's just consider a simple example. A depressed person chooses whether or not to take a medication to improve their mood. Now, suppose subjects who take the drug do so because they believe the drug is going to help them, and they're correct, and subjects who don't take the drug realize that the drug's not going to help them. So, let's suppose that half of the subjects take the drug and half don't. Now, the subjects who take the treatment improve from a score of five to a score of 10, an effect of five. Since the other half of the subjects would not improve, we have an average treatment effect of 2.5. But if we look at the difference between sample means, with the untreated subjects at a score of eight, that's 10 minus eight, which is two. So, the investigator would have underestimated the treatment effect. But now let's pretend something different. Suppose that, instead of improving from five to 10, the subjects who take the drug improve from five to 15. In that case, the average treatment effect is five. But if you look at 15 minus eight, the difference in sample means, that's seven, so this time the investigator would have overestimated the treatment effect.

Now, a slightly different way to see all this is to start with the ordinary least squares estimator in the observed data model. The estimate of alpha star is then Y naught bar, the sample mean of the untreated, and the estimate of tau star is Y1 bar minus Y naught bar. That's fairly easy to see from the fact that, as you know, the residuals sum to zero. Now, we know that this estimator is unbiased for tau star, that is, for the expectation of Y(1) given Z is one, minus the expectation of Y(0) given Z equals zero. But in general, we know that's not what we want; we want the expectation of Y(1) minus Y(0). But when treatment assignment Z is independent of the potential outcomes, the expectation of Y(1) given Z equals one is the expectation of Y(1), and the expectation of Y(0) given Z equals zero is the expectation of Y(0). So, we're in business. All right.
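The arithmetic in the drug example can be checked with a few lines of code. This is only a sketch: the function name and its parameters are made up for illustration, and the untreated score of eight is not stated outright in the lecture but is inferred from the "15 minus eight" calculation.

```python
def drug_example(treated_after, treated_before=5.0, untreated_score=8.0,
                 share_treated=0.5):
    """Return (average treatment effect, difference in sample means)."""
    # Takers: Y(0) is the score before, Y(1) is the score after.
    effect_takers = treated_after - treated_before
    # Non-takers would not improve, so Y(1) = Y(0) and their effect is zero.
    effect_non_takers = 0.0
    ate = share_treated * effect_takers + (1 - share_treated) * effect_non_takers
    # What the investigator sees: takers under treatment, non-takers without it.
    diff_in_means = treated_after - untreated_score
    return ate, diff_in_means


print(drug_example(10.0))  # (2.5, 2.0): the difference in means understates the effect
print(drug_example(15.0))  # (5.0, 7.0): the difference in means overstates the effect
```

The bias can go in either direction because it depends on how the takers' outcomes under treatment and the non-takers' outcomes without it compare with the population averages of Y(1) and Y(0).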
So, as before, we can extend this to the case where the investigator is interested in the average treatment effects within strata defined by covariates. So, let's let S stand for the stratum, and there are going to be a bunch of different strata. If you condition on S, you get tau S as the stratum average treatment effect, and we can write a regression model for this case that's analogous to our observed data model. So, I've just written it out there. To identify this model, now we're going to say that the expectation of V, given not only Z but also S, is zero. Now, we write down the analogous causal model, and if we have a completely randomized experiment or a block randomized experiment, you can see that alpha S will be equal to alpha S star, and tau S will be equal to tau S star.

Now, the key thing is, this is also true for an observational study if the strata are formed from covariates and treatment assignment is unconfounded within strata. That's what happens in the analogous block randomized study. So, if the observational study, conditioning on covariates, successfully mimics the analogous block randomized study, we're in business. Clearly, if you average the stratum effects over the distribution of the covariates, you get back the average treatment effect.
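As a rough illustration of the stratified case, the sketch below computes the difference in means within each stratum and then averages those differences using the stratum shares as weights. The data are invented for the example: two hypothetical strata, "A" and "B", with constant outcomes inside each stratum (so assignment is trivially unconfounded within strata) but very different treatment rates across strata.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in sample means, ignoring strata."""
    treated = [y for _, z, y in rows if z == 1]
    untreated = [y for _, z, y in rows if z == 0]
    return mean(treated) - mean(untreated)


def stratified_ate(rows):
    """Return ({stratum: tau_s}, weighted average of the tau_s)."""
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for s, z, y in rows:
        by_stratum[s][z].append(y)
    effects, ate = {}, 0.0
    for s, groups in by_stratum.items():
        tau_s = mean(groups[1]) - mean(groups[0])            # stratum effect
        weight = (len(groups[0]) + len(groups[1])) / len(rows)  # stratum share
        effects[s] = tau_s
        ate += weight * tau_s
    return effects, ate


# Hypothetical rows of (stratum, z, y). Stratum A: Y(0)=1, Y(1)=3, mostly treated.
# Stratum B: Y(0)=5, Y(1)=9, mostly untreated.
rows = ([("A", 1, 3)] * 3 + [("A", 0, 1)]
        + [("B", 1, 9)] + [("B", 0, 5)] * 3)

print(naive_difference(rows))  # 0.5
print(stratified_ate(rows))    # ({'A': 2.0, 'B': 4.0}, 3.0)
```

In this made-up data the true average treatment effect is three, which the stratum-weighted average recovers, while the unstratified difference in means is far off because stratum membership drives both treatment and outcome.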