So, in the previous lesson, we talked about model-based inference and the linear regression problem. In this lesson, we're going to add covariates, which are often collected in both completely randomized experiments and observational studies, and we're going to contrast what happens in these two settings. Now, in the completely randomized experiment, we've seen that the difference between the sample means is unbiased for the average treatment effect. Nevertheless, investigators often consider regressing the outcomes on both the treatment assignment and the covariates. Recall that the ordinary least squares estimator makes the residuals and the weighted residuals sum to zero. It follows that the sample average in the control group has the expression shown, and the sample average in the treatment group additionally involves tau star hat, the estimated treatment effect, with the covariates evaluated at their average. So, taking the difference between the average in the treatment group and the average in the control group, we get tau star hat plus beta star hat transpose times the difference between the covariate means. Now, in a completely randomized experiment, the potential outcomes and the covariates are independent of treatment assignment. It follows from that independence that the expectation of the covariates is the same in the treatment group and the control group. So tau star hat, which in general is not equal to the difference in sample means, as you can see above, is, because of this equality of expectations, an alternative unbiased estimator of the average treatment effect, alternative to Y_1 bar minus Y_0 bar. All right. So, that's what happens with covariates. Now, apparently, by assuming this model, we are imposing the restriction that the ATE is constant across all levels of the covariates. Actually, we're not imposing that.
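The OLS identity just described can be checked numerically. Here is a small sketch in Python; the data-generating process, with one covariate and a constant effect of 2, is invented purely for illustration. With an intercept, the treatment indicator, and the covariate in the regression, the difference in sample means equals tau star hat plus beta star hat times the difference in covariate means, exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# hypothetical data-generating process: one covariate, constant effect tau = 2
X = rng.normal(size=n)
Z = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)])  # complete randomization
Y = 1.0 + 2.0 * Z + 1.5 * X + rng.normal(size=n)

# OLS of Y on an intercept, the treatment indicator, and the covariate
D = np.column_stack([np.ones(n), Z, X])
alpha_hat, tau_hat, beta_hat = np.linalg.lstsq(D, Y, rcond=None)[0]

# identity from the lesson: Ybar_1 - Ybar_0 = tau_hat + beta_hat * (Xbar_1 - Xbar_0)
diff_means = Y[Z == 1].mean() - Y[Z == 0].mean()
adjusted = tau_hat + beta_hat * (X[Z == 1].mean() - X[Z == 0].mean())
assert abs(diff_means - adjusted) < 1e-8
print(diff_means, tau_hat)  # the two estimates differ unless covariate means balance exactly
```

In any single randomized draw the identity holds exactly; unbiasedness of tau star hat is a statement about the average over repeated randomizations.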
We're only assuming that the errors v_i are uncorrelated with the regressors, not the stronger condition. In other words, we're not assuming that alpha star plus tau star Z plus beta star transpose X is the conditional expectation; we're not assuming that. Now, we have these two estimators of the average treatment effect, and it can be shown that the asymptotic variance of the second estimator is smaller than that of the difference between the sample averages; I refer you to chapter seven of Imbens and Rubin for the proof. So, we would prefer tau star hat as the estimator. All right. Now, in observational studies, investigators often use linear regression with covariates as well, and they treat tau star hat as an estimate of the ATE. We want to look at this practice and differentiate it from what happens in the randomized experiment. Now, in observational studies, tau star hat is a consistent estimator of the expression that follows. But it is not necessarily the case, as it is in the completely randomized experiment, that the expectation of Y given Z equals z is the expectation of Y_z, nor is it the case that the expectations of the covariates in the treatment and control groups are the same; that's what we want to look at. So now, let's assume that treatment assignment is unconfounded given the covariates X. That's the happy situation, and it's our starting point; if that weren't true, we'd be in trouble. With this starting point, we can write the conditional expectation for the observed data as follows. So, if we can specify the regression function, we're certainly in business. We're going to assume, however, that the true model for Y is some function g of the treatment and the covariates plus an error e; g doesn't need to be a linear regression. In this case, the average treatment effect at level X of the covariates will be g(1, X) minus g(0, X), and if we average across the distribution of the X's, we'll get the average treatment effect.
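The variance claim above can also be illustrated by simulation. In this sketch (again with an invented data-generating process), we repeat a completely randomized experiment many times and compare the spread of the two estimators; a formal proof is in chapter seven of Imbens and Rubin.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000
diff_means, tau_hats = [], []

for _ in range(reps):
    # hypothetical completely randomized experiment with a prognostic covariate
    X = rng.normal(size=n)
    Z = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)])
    Y = 1.0 + 2.0 * Z + 1.5 * X + rng.normal(size=n)

    diff_means.append(Y[Z == 1].mean() - Y[Z == 0].mean())
    D = np.column_stack([np.ones(n), Z, X])
    tau_hats.append(np.linalg.lstsq(D, Y, rcond=None)[0][1])

# both estimators center on the true ATE of 2,
# but the regression-adjusted one is noticeably less variable
print(np.mean(diff_means), np.mean(tau_hats))
print(np.var(diff_means), np.var(tau_hats))
```

The stronger the covariate's relationship to the outcome, the larger the variance reduction from adjusting.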
So, in this case, because of the unconfoundedness, the expression from the previous page reduces to the difference between the expectation of g(1, X) minus beta star transpose X given Z equals 1 and the corresponding expectation given Z equals 0. We see that, in general, this expression depends both on the ATE at X, which is g(1, X) minus g(0, X), and on the covariates, because their distributions may differ between the treatment and control groups. Let's pursue this further. Suppose that g(1, X) is just g(0, X) plus tau, the average treatment effect at X; that is, the average treatment effect is the same for all X. It's a very happy case, perhaps not always believable, but we want to show, even in this simple case, how things can go awry if we treat the observational study exactly as we would the randomized experiment. So, returning to our expression: under this additional assumption, it reduces to tau plus a bias term. Where does the bias come from? We see that it comes from the differences between the treatment group and the control group, because we have the same expression evaluated once for the treatment group and once for the control group. Let's examine it a little further. If the distribution of the covariates is the same in the treatment and control groups, or if g is actually beta transpose X, i.e., the linear regression really is true, then the bias vanishes; either of these conditions is enough. That said, even in this very happy additive case, where the average treatment effect is constant across levels of the covariates, the bias incurred by using linear regression can be substantial if the linear specification is very far off the mark, i.e.,
if condition two above isn't true, or if the distribution of the covariates is very different in the treatment and control groups. Now, in observational studies, the covariate distributions are often quite different in the treatment and control groups, and the investigator really doesn't have enough knowledge to specify the regression function properly, so linear regression can give really bad estimates of the treatment effects. It's precisely this concern that initially motivated the development of other methods for the estimation of treatment effects in observational studies. We'll be taking those up.