As we discussed before, the goal of matching is to create samples of treated and untreated observations that have similar distributions on the confounders. When we can match each treated unit i exactly to M_i greater than or equal to one untreated units, the difference between the treated unit and the matched controls in matched set i is an unbiased estimate of the average treatment effect at that particular covariate value. When the covariate values differ, however, and we saw in the previous lesson that we're often forced to accept matches where that's the case, that's no longer true. So, we want to analyze that situation. Let's consider the case where we do one-to-one matching. The following equation now takes into consideration the fact that the matched units have different values of the confounders. The difference between the two outcomes equals the average treatment effect at the value X_i1, plus a term reflecting the fact that the control unit isn't perfectly matched: the difference between the control regression function evaluated at X_i1, the value the control doesn't have, and at X_i2, the value it does have. So, now let's look at the simple matched estimator, which would be just fine if we had exact matching. That simple matched estimator has the following bias; we're just plugging the expression above into the formula for tau hat, and that gives the bias of tau hat. When it's not possible to have close matches, it is desirable to try to decrease this bias. One way you might try to do that is to combine regression adjustment with matching. For example, you can model the regression function in the control group and adjust for the difference in the covariate values. Remember, we're trying to estimate the effect of treatment on the treated.
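As a rough sketch of this regression-adjusted (bias-corrected) matching idea, the following simulation is entirely hypothetical: the linear outcome model, the effect size of 2, and the systematic mismatch of 0.5 in the covariate are illustrative choices, not from the lecture. The matched controls are deliberately off in x, so the simple pair-difference estimator is biased, while subtracting the fitted control-regression gap removes most of that bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-to-one matched sample: pair i holds a treated unit with
# covariate x1[i] and outcome y1[i], and its matched control with x2[i], y2[i].
# The controls are systematically off by 0.5 in x, so matches are not exact.
n = 500
x1 = rng.normal(1.0, 1.0, n)
x2 = x1 + 0.5 + rng.normal(0.0, 0.2, n)
tau = 2.0                                   # true effect on the treated
y1 = 1.0 + 0.8 * x1 + tau + rng.normal(0.0, 1.0, n)
y2 = 1.0 + 0.8 * x2 + rng.normal(0.0, 1.0, n)

# Simple matched estimator: biased because x1 != x2 (here by roughly -0.8 * 0.5).
tau_hat = np.mean(y1 - y2)

# Bias correction: fit the control regression function mu0 by least squares on
# the matched controls, then subtract mu0(x1) - mu0(x2) from each pair difference.
slope, intercept = np.polyfit(x2, y2, 1)
mu0 = lambda x: intercept + slope * x
tau_tilde = np.mean((y1 - y2) - (mu0(x1) - mu0(x2)))
```

The correction term mu0(x1) - mu0(x2) is exactly the "difference in the regression function at the value the control doesn't have versus the value it does have" described above.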
So, using linear regression for the control group observations gives us predicted values Y hat, and that yields an estimated bias equal to the average of the individual-level estimated biases; we're just plugging into the formula for the bias to estimate it. However, if the covariates are well balanced across the treatment and control groups, the estimated bias using adjustment with linear regression, as you can see from the formula above, will be small. In that case, unless the treatment effect is also expected to be small, adjusting for the bias as above may not be critical. Now, let's return to the case where the covariates may not be as well balanced as you would like. In this case, you might expect to do better by including higher-order terms in the regression or by estimating the regression non-parametrically. But especially in the latter case, it is not so clear that one does better by matching in the first place. Let's think about why. The effect of treatment on the treated is defined as the difference between the expected value of Y1 given treatment and the expected value of Y0 given treatment. The first term can be estimated directly using the sample average among the treated. The untreated observations can then be used to estimate the expected value of Y given Z equals zero and X, which by unconfoundedness equals the expected value of Y0 given X. You can then use that to define imputed values for the treated units, and averaging those imputed values gives an estimate of the expected value of Y0 given Z equals one. So, that's an obvious alternative to matching. Now, let's look at standard errors. That's a tricky, complicated subject, and we're not going to take it up in any great detail. I'm just going to introduce you to some computer programs where that's done and to some of the ideas involved, and I've given some references for people who want to pursue this.
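The imputation alternative just described can be sketched as follows on simulated data; the logistic treatment assignment, the linear model for Y0, and the effect size of 2 are all hypothetical choices for illustration. We fit E[Y | Z = 0, X] on the controls, impute Y0 for each treated unit, and subtract the average imputation from the treated sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observational data with a single confounder x.
n = 2000
x = rng.normal(0.0, 1.0, n)
p = 1.0 / (1.0 + np.exp(-x))        # treatment more likely at larger x
z = rng.random(n) < p
y0 = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, n)
y = np.where(z, y0 + 2.0, y0)       # true effect on the treated is 2

# Naive difference in means: confounded by x.
naive = y[z].mean() - y[~z].mean()

# Fit E[Y | Z=0, X=x] on the controls; by unconfoundedness this is E[Y0 | X=x].
slope, intercept = np.polyfit(x[~z], y[~z], 1)
imputed_y0 = intercept + slope * x[z]   # imputed Y0 for each treated unit

# Effect of treatment on the treated: treated mean minus average imputed Y0.
att_hat = y[z].mean() - imputed_y0.mean()
```

Here no matching is performed at all; the control regression does all the adjustment, which is exactly why it's not clear one does better by matching first.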
Now, in the general case, where a treated unit may be matched with one or more untreated units and the untreated units may be used more than once, i.e., sampling with replacement, it might be thought that a resampling estimator of the variance, such as the bootstrap, would be the best way to estimate the variance of the effect of treatment on the treated. However, the bootstrap doesn't work well in this setup, for both theoretical and practical reasons, so we're not going to talk further about that. In the simplest case, where each treated unit is matched to one untreated unit, the simple matching estimator we already looked at is used, and the matching is done without replacement so there's no dependence across pairs, the variance can be estimated as below; it's an obvious estimate. For the bias-corrected version, maybe it's not quite right to proceed in this way, but you would impute a value that takes the bias into consideration and form the bias-corrected estimator, which I'm denoting tau tilde. Proceeding as before, we would just use the imputed value in place of the actual value above, like the Y_i2, and use tau tilde instead of tau hat. Now, that said, we can extend these results pretty readily to the case where more than one untreated unit is used in each match, and also to the case where the matching is done with replacement, in which case units i and i prime may share matches. The sharing introduces dependence among the units, but we can take that into account when computing the variance estimator. In general, however, the following estimators, which I'm about to describe, which are used by the nnmatch program in Stata and described in the Stata Journal by Abadie, Drukker, Herr, and Imbens, are to be preferred. For further justification, one should look at Chapter 19 of Imbens and Rubin.
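For the simplest case, one-to-one matching without replacement, the "obvious estimate" of the variance is the sample variance of the pair differences divided by the number of pairs, since the pairs are independent. A minimal sketch, with a hypothetical effect of 2 and pair-difference noise chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical matched pairs (matching without replacement, so the pairs
# are independent): d[i] is the treated-minus-control difference in pair i.
n1 = 400
d = 2.0 + rng.normal(0.0, 1.5, n1)

tau_hat = d.mean()
# Obvious variance estimate: sample variance of the pair differences over n1.
var_hat = d.var(ddof=1) / n1
se_hat = np.sqrt(var_hat)
```

For the bias-corrected version, as described above, one would compute the same quantity with the imputed (adjusted) control value in place of the actual Y_i2 and tau tilde in place of tau hat.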
Now, let's take up the case where M_i is greater than or equal to one, that is to say, each treated unit is matched to one or more controls, and the matching is done with replacement. For this case, Abadie et al. rewrite the unadjusted matching estimator, tau hat (it's not the same tau hat as before), as a weighted average of the Y_i's: the Z_i Y_i terms are just the Y_i's for the treated observations, and the (1 minus Z_i) K_i Y_i terms are the control observations weighted by K_i. What is this K_i? The idea is that for each untreated observation i used in the matching, there is a set of treated units l such that i is matched to l, and K_i is the weighted number of times unit i is used in the matching, with each use counted with weight one over M_l. So, that's the formula for K_i. Abadie et al. also discuss estimating the variance of the estimated effect of treatment on the treated in the population. For this they suggest the estimator below. I've defined all the terms in it, and I'm not going to go through what each and every little thing means. But you'll notice that in both these cases I've said nothing about the estimate of sigma squared, the sigma squared hat. How are we going to get sigma squared hat? To estimate it, Abadie et al. suggest an estimator based on matching the units in the treatment group to similar units in the treatment group; you can see intuitively why this makes a lot of sense, and they obtain a matching estimator of the variance on that basis. Finally, the variance of the estimated population effect of treatment on the treated is larger than that of the effect of treatment on the treated for the sample, and of course that makes sense.
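The K_i weights can be made concrete with a tiny worked example. Everything here is hypothetical: three treated units, four controls, nearest-neighbour matching on a scalar covariate with replacement and one match per treated unit (so each use adds weight 1; with M_l matches per treated unit, each use would add 1/M_l instead).

```python
import numpy as np

# Hypothetical tiny sample: 3 treated units, 4 controls, matching with
# replacement on a scalar covariate, M = 1 match per treated unit.
x_t = np.array([0.1, 0.2, 0.9])
y_t = np.array([3.0, 3.5, 5.0])
x_c = np.array([0.0, 0.5, 1.0, 2.0])
y_c = np.array([1.0, 2.0, 3.0, 4.0])

n1 = len(x_t)
K = np.zeros(len(x_c))
for xt in x_t:
    j = np.argmin(np.abs(x_c - xt))   # nearest control, reuse allowed
    K[j] += 1.0                       # with M matches, add 1.0 / M per use

# Weighted form of the unadjusted matching estimator: treated outcomes minus
# K-weighted control outcomes, averaged over the n1 treated units.
tau_hat = (y_t.sum() - (K * y_c).sum()) / n1
```

Here the first control is the nearest neighbour of two treated units, so it enters with K = 2, illustrating the sharing of matches that introduces dependence across units.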
When the population effect of treatment on the treated is the parameter of interest, the effect of treatment on the treated in the sample is just an estimate of it. Even so, comparing the two formulas above, it is possible for the estimated variance for the sample effect of treatment on the treated to be less than that for the population effect of treatment on the treated. I've focused on the effect of treatment on the treated because that's what is typically done when people use matching, but it is important to note that a similar approach can be used to estimate the variance of the average treatment effect in the sample or the population average treatment effect. Now, the nnmatch program that I've talked about matches using the Euclidean or Mahalanobis distance as defaults. There's also a program in Stata called teffects psmatch that matches using the propensity score.