Recall that in Lesson 1 we introduced the topic of causation. We did not attempt to say what causation is; we merely identified two features that we will carry forward as we set up our approach to causal inference. First, we permit causation at the singular level. For those, myself included, who believe causation is deterministic, it is important to understand that allowing causation at the singular level is essentially the same as focusing on one or several causes to the exclusion of others, and that doing so does not make our approach incompatible with the deterministic view. It does, of course, have implications that can be undesirable: a cause in one population is not necessarily a cause in another. As a silly example, one might end up saying speeding causes cars to skid off the road in Mexico but not in Canada. Nevertheless, as a practical matter, scientists using statistics often use causal language despite their inability to give complete accounts of the causes of an effect. Either they should not be doing so, or, if we think it is reasonable to speak about causation in the absence of complete knowledge, then it is desirable to have methods for causal inference that are appropriate for this situation. Second, in keeping with much philosophical discussion, we require causal relationships to sustain counterfactual conditionals. Now we put these criteria to work to develop a notation that captures these ideas. To fix ideas, let's think about a single unit, say John, and ask what it means for him to get better or not when he takes or does not take a medicine. First, in order to form a counterfactual, we need to imagine John's reaction to the medicine under both conditions, that is, when he takes the medicine and when he does not. We can then imagine four different scenarios. First, John takes the medicine and gets better, and he also gets better if he does not take the medicine.
Second, John takes the medicine and gets better, but he does not get better if he does not take it. Third, John does not get better whether or not he takes the medicine. Or last, John takes the medicine and does not get better, but he gets better if he does not take it. Clearly, in the second situation we would say taking the medicine causes John to get better. And in the last situation we might say taking the medicine causes John to get worse; equivalently, not taking the medicine causes John to get better. In the other two situations there is no change in John's condition, in which case we would say the medicine has no effect on John. Notice that causation here is a within-subject comparison: it's John to John, apples to apples. To capture this, we need to represent John's outcome when he takes the medicine and John's outcome when he does not take the medicine. That is, John has two outcomes. Of course, we only get to see one of these outcomes in real life, and this is the so-called fundamental problem of causal inference. If we got to see both, causal inference would turn out to be much easier. So let's denote John by i. In general, we may see data from n subjects, i equals 1 through n. We'll get to the n subjects later on, but for now let's focus on John. Let's begin by denoting the two conditions, which we need in order to make causal statements: little z equals 0 for the case where John does not take the medicine, and little z equals 1 for the case where he does. Now we can denote John's outcome under condition z as yiz, and let us say that yiz equals 1 if John gets better and yiz equals 0 if John does not get better. Letting z equal 0 gives John's outcome yi0 when he does not take the medicine, and letting z equal 1 gives John's outcome yi1 when he takes it. Now, of course, we only see one of these outcomes.
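To make the four scenarios concrete, here is a small sketch in Python. The labels and the encoding (1 for gets better, 0 for does not) follow the lecture; the dictionary itself is just an illustrative device, not something from the course materials.

```python
# Each scenario is a pair (y_i1, y_i0): John's outcome if he takes the
# medicine, and his outcome if he does not. 1 = gets better, 0 = does not.
scenarios = {
    "1: better either way (no effect)":   (1, 1),
    "2: medicine causes improvement":     (1, 0),
    "3: not better either way (no effect)": (0, 0),
    "4: medicine causes harm":            (0, 1),
}

# The unit effect y_i1 - y_i0 is +1, 0, 0, -1 across the four scenarios.
for label, (y1, y0) in scenarios.items():
    print(f"{label}: unit effect = {y1 - y0}")
```

Note that the unit effect is +1 only in scenario 2 and -1 only in scenario 4, matching the causal statements above.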
All right, now we want to use these outcomes to define the unit effect, which we can define quite generally as a function of yi0 and yi1, comparing the two potential outcomes to represent John's change in status under the two conditions. If we define yi1 minus yi0 as the effect of the medicine on getting better for John, we get the four possibilities in the paragraph above. Make sure you see this for yourself. Of course, we only see one of yi0 and yi1, and we also want to represent that. To that end, we let capital Zi denote the random variable indexing whether or not unit i takes the medicine: capital Zi equals 0 if i does not take the medicine, and capital Zi equals 1 if i takes the medicine. So let's let yi, equivalently yiZi, denote the actual outcome we observe, and note that yi equals Zi times yi1, plus 1 minus Zi, times yi0. We'll use that later on. Thus, if we were to see the triple yi0, yi1, and Zi, which of course we cannot, then we would know the unit effect and also whether or not the unit was treated. Now, in a randomized experiment, a subject's treatment assignment Zi is a coin flip. For now we are assuming that treatment assignment is the same as treatment received; later, in the sequel, we shall look at what happens when this is not the case. In contrast, in an observational study, a subject chooses his or her own treatment assignment Zi. Continuing, we cannot observe the unit effects, though of course it would be great if we could. But statistics is about probabilities, and it is certainly conceivable that, despite the fact that the unit effects cannot be observed, we can get at, by which I mean identify, an average of the unit effects. This would also be a very good thing to know, especially in contexts where decisions must be made. For example, a doctor needs to know whether or not to recommend a surgery.
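The identity yi equals Zi times yi1 plus 1 minus Zi times yi0 can be checked mechanically. Here is a minimal sketch with hypothetical potential outcomes (in practice we would never see both) and a coin-flip assignment, as in a randomized experiment:

```python
import random

random.seed(0)  # for reproducibility of this illustration

n = 10
# Hypothetical fixed potential outcomes for n units (never both observable).
y0 = [random.randint(0, 1) for _ in range(n)]  # outcome without the medicine
y1 = [random.randint(0, 1) for _ in range(n)]  # outcome with the medicine

# Randomized assignment: Z_i is a fair coin flip.
Z = [random.randint(0, 1) for _ in range(n)]

# Observed outcome: y_i = Z_i * y_i1 + (1 - Z_i) * y_i0.
y_obs = [z * a + (1 - z) * b for z, a, b in zip(Z, y1, y0)]

# The formula simply picks out the potential outcome for the realized arm.
for z, a, b, y in zip(Z, y1, y0, y_obs):
    assert y == (a if z == 1 else b)
```

The point of the formula is exactly what the final loop checks: the observed outcome is the treated potential outcome when Zi is 1 and the control potential outcome when Zi is 0.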
Both the doctor and the patient would prefer to know the effect of the surgery for this particular patient, but that cannot be known. However, knowledge of the average effectiveness of the surgery facilitates better decision-making than would occur in the absence of information. Similarly, knowledge of the average effectiveness of a training program on some outcome of interest is an important piece of information a policymaker will use in deciding whether or not to implement the program more broadly. So with the above in mind, interest now centers on the average of the unit effects. There are several different averages that might be of interest. First, averaging the n unit effects gives the sample average treatment effect, or SATE. That's just the average value of the yi1s minus the yi0s amongst the n subjects. Now, often the n units are a sample from a larger aggregate, but in some cases the n units constitute the population of interest. In that case, when the units are of interest in and of themselves, we should be interested in the sample average treatment effect. But when the n units are regarded as a random sample from a finite population of size capital N, the estimand of interest is often the finite population average treatment effect, which I've denoted FATE. That would be the average of the yi1s minus the yi0s amongst the capital N subjects; the little n subjects are the sample from the finite population of size capital N. Now, for both the sample average treatment effect and the finite population average treatment effect, we shall regard the potential outcomes as fixed constants as opposed to random variables. To emphasize this, in later lessons we shall write the potential outcomes using lowercase ys in these cases. Finally, the n units may be treated as a random sample from an infinite population, in which case the pairs yi0 and yi1 are independently and identically distributed as y0 and y1.
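As a quick sketch of the SATE and FATE definitions, here is an illustration with made-up potential outcomes for a small finite population (again, in reality only one outcome per unit is ever observed, so these quantities could not actually be computed this way):

```python
# Hypothetical potential outcomes for a finite population of N = 6 units.
y0 = [0, 0, 1, 0, 1, 0]  # outcomes without treatment
y1 = [1, 0, 1, 1, 1, 0]  # outcomes with treatment

N = len(y0)

# Finite population average treatment effect (FATE): average of the
# unit effects y_i1 - y_i0 over all N units in the finite population.
fate = sum(a - b for a, b in zip(y1, y0)) / N

# Sample average treatment effect (SATE): the same average, but over a
# sample of n units; here the first n = 3 units, purely for illustration.
n = 3
sate = sum(a - b for a, b in zip(y1[:n], y0[:n])) / n

print(fate, sate)
```

The two estimands share one formula; they differ only in whether the average runs over the sample of n units or over the full finite population of N units.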
The potential outcomes are then treated as random variables, and the estimand is the average treatment effect, or ATE, which is the expectation in the super population of the y1s minus the y0s. Now, because of the way we define the unit effects, the average of the unit effects is the difference between the averages in the presence and absence of treatment. This turns out to be convenient, and also substantively reasonable in many cases. If we can estimate the marginal distributions of the outcome in the presence and absence of treatment, we can estimate these estimands. But there are a number of other estimands of interest that also depend only on the marginal distributions. For example, the average treatment effect conditional on covariates; covariates are pre-treatment variables, usually of interest in their own right. Another estimand, which has received much attention in the literature and which we shall take up, is the average treatment effect for the treated, sometimes just called the effect of treatment on the treated. You'll notice that, as I've written it, it is the super population analog of the average treatment effect, but taken only amongst the subjects that are treated. Now, in an observational study, where people take up their own assignments, the rationale for considering the average effect of treatment on the treated is that, if an intervention or policy under consideration for adoption is not to be compulsory, it is more important to know the effect for those who actually take up the intervention than to know the average treatment effect, which is a mixture of the effect of treatment on the treated and the effect of treatment on the untreated. There are certainly other estimands that depend only on the marginal distributions that we could be interested in, but the focus for this course will be primarily on those above.
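The mixture relationship just mentioned, that the ATE is a weighted mixture of the effect on the treated and the effect on the untreated, can be seen in a toy calculation. The data below are hypothetical, and we pretend, for illustration only, that both potential outcomes are known:

```python
# Hypothetical units: (Z, y0, y1) = (assignment, outcome untreated, outcome treated).
# Both potential outcomes are shown only for illustration; in practice we
# observe just one per unit.
data = [
    (1, 0, 1),
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 0),
    (0, 1, 1),
    (0, 0, 1),
]

effects = [y1 - y0 for _, y0, y1 in data]
ate = sum(effects) / len(effects)  # average over everyone

treated = [y1 - y0 for z, y0, y1 in data if z == 1]
att = sum(treated) / len(treated)  # average effect on the treated

untreated = [y1 - y0 for z, y0, y1 in data if z == 0]
atu = sum(untreated) / len(untreated)  # average effect on the untreated

# The ATE is a mixture of ATT and ATU, weighted by the share treated.
p = len(treated) / len(data)
assert abs(ate - (p * att + (1 - p) * atu)) < 1e-12
```

With a non-compulsory policy, the ATT is the relevant summary because only units like the treated ones will take up the intervention; the ATE dilutes it with the effect on units who would never take it up.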
Okay, I should also note that there may be circumstances where you're interested in an estimand that does not depend only on the marginal distributions. For example, if our estimand is the median of the unit effects, this is not the difference in the medians of y1 and y0. As another example, suppose we want to know the percentage of persons who would improve by taking the medicine. Then we should define the unit effect as 1 if y1 is greater than y0, and 0 otherwise. In this case, the average effect would be the probability that y1 is greater than y0. But in general, we cannot write this as a function of the marginal distributions, at least without making some additional assumptions, such as that the medicine is never harmful. As you might expect, this is more challenging, because estimating the probability that y1 is greater than y0 requires knowledge not only of the marginal distributions of the potential outcomes but also of their joint distribution, and the data provide limited information on this. For now we shall focus on the estimands above: the sample average treatment effect, the finite population average treatment effect, and the average treatment effect. We want to make inferences about the values of these. But can we do so in a good way, given that we only see one outcome per unit? That's the question; for the sample average treatment effect, for example, half the data is missing. However, the remarkable thing is that under certain so-called ignorability conditions, also called unconfoundedness conditions, which are reasonable in randomized experiments and sometimes in observational studies, we can use the outcomes that we do see to unbiasedly and/or consistently estimate these parameters. We're going to take this up in the next lesson. However, before we continue to that lesson, two important points are in order.
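To see why the probability that y1 exceeds y0 is not determined by the marginal distributions, here is a small constructed example: two hypothetical joint distributions of the pair of potential outcomes that have identical marginals but different probabilities of improvement.

```python
# Two hypothetical joint distributions of (y0, y1), each a list of equally
# likely (y0, y1) pairs. They have IDENTICAL marginal distributions of y0
# and of y1, but different values of P(y1 > y0).
joint_a = [(0, 1), (1, 0)]  # half improve, half are harmed
joint_b = [(0, 0), (1, 1)]  # nobody changes

def marginals(joint):
    """Return the (sorted) marginal distributions of y0 and y1."""
    return sorted(y0 for y0, _ in joint), sorted(y1 for _, y1 in joint)

def prob_improve(joint):
    """P(y1 > y0) under the joint distribution."""
    return sum(1 for y0, y1 in joint if y1 > y0) / len(joint)

# Same marginals ...
assert marginals(joint_a) == marginals(joint_b)

# ... but different probabilities of improvement.
print(prob_improve(joint_a))  # 0.5
print(prob_improve(joint_b))  # 0.0
```

Since the data only ever reveal one potential outcome per unit, they inform the marginals but say little about the joint distribution, which is exactly why this estimand is harder.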
The first is obvious: potential outcomes have been defined only for the simple case where unit i is assigned to a treatment group or a control group. The notation extends straightforwardly to many, even infinitely many, treatments, and all the theory we will develop carries over. In practice, of course, some difficulties may occur when we wish to consider additional treatments; for example, in the case of a continuous treatment, one will generally have to introduce additional assumptions to model the dependence of the potential outcomes on treatment. That is a different matter, and for this reason we shall focus mainly on the simple case above. Second, in defining the potential outcomes, there is an important assumption that we have not made explicit. You may have missed this due to the simplicity and reasonableness of the example we used to motivate the notation. Rubin has called the assumption we are making the stable unit treatment value assumption, and this assumption makes the potential outcomes well defined. The assumption is two-fold. First, there are no alternative representations of the treatment that matter. For example, if a medicine is administered in both pill and capsule form, and the outcome depends on this, then these should be regarded as different treatments; otherwise, the potential outcomes would be ambiguous. Second, although it may be far-fetched for this particular example, perhaps John's outcome depends not only on his own treatment status but on that of others as well. As a more realistic example where this might occur, suppose one is studying the effectiveness of flu shots. Whether or not John gets the flu might depend not only on whether he is vaccinated, but also on the vaccination status of others, for example his wife. This is commonly referred to as interference, which we mentioned previously. Fortunately, the stable unit treatment value assumption is reasonable in many instances.
When it is not, it is easy to develop an extended notation, but new estimands must be defined, and the kinds of conditions we shall put forth in the next lesson to identify and estimate these estimands would require modification. We shall briefly touch on such matters in the sequel. Unless explicitly stated otherwise, hereafter the stable unit treatment value assumption, or SUTVA, will be assumed.