In this video we're going to discuss identification and estimation of causal effects in an instrumental variable analysis. We'll aim to understand how the complier average causal effect can be estimated from the observed data, and also how that effect relates to intention to treat effects. Before we begin, let's remind ourselves what assumptions we're making. First of all, we assume that Z is a valid instrument: we assume that Z affects treatment, and we also make the exclusion restriction assumption, which is that Z does not directly affect the outcome. Finally, we're also going to make the monotonicity assumption, which in this case is the assumption that there are no defiers. So we're imagining we're in a randomized trial with noncompliance, and we're assuming that there are no defiers, no people who do the opposite of what they're encouraged to do, okay? Also recall that our goal is to estimate the average causal effect of treatment received among compliers, that is, among the subpopulation of people who will take whatever treatment they're assigned to take. We're going to walk through the steps by which we can identify and estimate this causal effect of interest. But to do that, let's first begin by looking at something that we know we can identify, which is the intention to treat effect. The intention to treat effect contrasts potential outcomes based on treatment assignment. So we're looking at the expected value of Y if everybody had been assigned to treatment versus the expected value of Y if no one had been assigned to treatment, that is, if everybody had been assigned to the control condition. And that is equal to a contrast of two quantities based on observed data.
So, that's the expected value of Y among people who were assigned to treatment minus the expected value of Y among people who were assigned to the control condition. This equality holds because of randomization. Basically, you could think of Z as a coin flip, and so we have this one-to-one correspondence between these conditional distributions and the distributions of the potential outcomes. But we can also write this kind of conditional expectation in terms of the compliance classes. For now, let's just look at the expected value of Y given Z=1. This is the expected value of the outcome in the subpopulation of people who were assigned to treatment. Well, we could write that expectation using the standard rules about expectations and probability, where you take the expectation conditional on something, multiply by the probability of it, and then sum over all possibilities; essentially, you condition and then average over. I'll walk you through it a little more carefully than that. For the expected value of Y given Z=1, we could additionally condition on the pair A0 and A1, and then average over it. That's what we're doing here. So we could say that one piece of this overall expected value of Y among people with Z=1 is the expected value of Y given Z=1 among always takers. Part of the overall average is based on the average in this smaller subpopulation, people who have Z=1 and are always takers. But then we want to weight that by what proportion of the Z=1 subpopulation are always takers. So that's what this is. And then we do the same thing for never takers: we take the mean of Y among people with Z=1 who are also never takers, weighted by the probability of being a never taker. And we do the same thing with compliers.
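Written out in symbols, the decomposition just described might look like the following. The shorthand AT, NT and C for always takers, never takers and compliers is notation assumed for this note, not taken from the slides. There is no term for defiers because monotonicity rules them out.

```latex
\begin{align*}
E(Y \mid Z=1) ={}& E(Y \mid Z=1, \text{AT})\,P(\text{AT} \mid Z=1) \\
&+ E(Y \mid Z=1, \text{NT})\,P(\text{NT} \mid Z=1) \\
&+ E(Y \mid Z=1, \text{C})\,P(\text{C} \mid Z=1)
\end{align*}
```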
So all we're saying here is, okay, this mean of Y is a combination of the means of Y in these three subpopulations, and we're going to weight each one by how big that subpopulation is. That's how to say it in words. You could think of it this way: if I wanted the mean of some whole population, and here by "whole population" I mean the people who have Z=1, I could break it apart into subpopulations, take the mean in each of those, and then weight them by how big those subpopulations are. That's all I'm doing here. But then there's something else we can note. Among always takers and never takers, Z doesn't do anything. Remember, these are people who are always going to take treatment or never going to take treatment, and whether I encourage them or not has no impact, right? So in this case I can actually drop the conditioning on Z=1. If I'm looking at the subpopulation of always takers, for example, also conditioning on Z=1 doesn't do anything, because Z has no impact on treatment in this population and, by the exclusion restriction, no direct effect on the outcome. And remember, Z was randomized. So I can just drop that conditioning. The same thing holds for never takers. I start by conditioning on both Z=1 and never takers, but among never takers Z has no impact. I'm randomizing Z, but Z doesn't actually influence anything in this population, so I can drop the conditioning on Z=1. I can't do that for the compliers, because among compliers Z is having an impact: Z is affecting what treatment they receive. So I can't drop it there. Okay, so I'm just simplifying this expression by dropping the Z=1 in those two cases. The other thing to note is that the probability of being an always taker given Z is the same as the probability of being an always taker, and so on. So, up here, I didn't condition on Z=1 in the probabilities, because Z has no impact on the size of these subpopulations.
How many people are always takers has nothing to do with treatment assignment, because compliance classes are things that exist in the population to begin with. My population of compliers is defined completely independently of the result of some coin flip. Okay, so this is why the expression I wrote on top is valid. I also mentioned ways in which you could simplify it, which I'll show you in a little more detail. But now we can do the same kind of thing for the Z=0 group. For the Z=0 group, I can write the same kind of expression, and I can get rid of the conditioning on Z=1 and Z=0 among always takers and never takers, for the reasons I just described. All right, so you'll see that these expectations are simplifying a little bit. I can write that a little more nicely, just like this. From the previous slide to this one, all I've done is drop the conditioning on Z=0 and Z=1 from those groups. That's the only difference on the slide now. And remember, we started out a few slides ago by talking about the intention to treat effect, which is the difference between this and this. Well, what we see is that these two expressions have some things in common. There are some terms that are actually equal, so when we take a difference they will go away. I have this black arrow here, and that's showing that these two expectations times probabilities are the same. In both cases, the expected value of Y among always takers times the probability of always takers is the same in each of these groups. And then we have the blue arrow, which is the same kind of thing: the expected value of Y among never takers times the probability of never takers. That appears in both of these expressions as well.
And so, when I go back to my intention to treat effect, those terms are going to difference out; they go away because I end up taking a difference of these two things. That's all I'm doing here. On the left we have the intention to treat effect, just by its definition. And on the right we have the simplified expression. All I've done is take the difference of these two big equations and drop the things that cancelled out. So we end up with this: the difference in expected values over here can be written as a difference of these expressions here. The first one is the average value of Y among compliers who were assigned to treatment. So there's a subpopulation of compliers who were assigned to Z=1, and we ask what the expected value of Y is for them, and then multiply by the probability of being a complier. The probability of compliers means the proportion of the whole population who are compliers. And then we have the same kind of thing for the Z=0 group, times the probability of compliers. This expression holds for the reasons I showed on the previous slide. You might be wondering why I am doing all this. Well, remember, what we're actually interested in is a causal effect among compliers, and now you'll see that we have compliers appearing as something we're conditioning on, on the right-hand side. So we're starting to get closer to the thing that we ultimately want. This implies that we can divide this expression over here, this difference in expected values, by the probability of compliers, and that's what this part is. We've divided the left-hand side by the probability of being a complier. Well, then that's just equal to this contrast: the expected value of Y given Z=1 among compliers minus the expected value of Y given Z=0 among compliers.
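The cancellation and the division step can be sketched in symbols as follows. AT, NT and C are shorthand assumed here for always takers, never takers and compliers. The first two lines are the simplified expressions for the two assignment groups, the third is their difference after the shared terms cancel, and the fourth divides by the probability of compliers.

```latex
\begin{align*}
E(Y \mid Z=1) &= E(Y \mid \text{AT})P(\text{AT}) + E(Y \mid \text{NT})P(\text{NT}) + E(Y \mid Z=1, \text{C})P(\text{C}) \\
E(Y \mid Z=0) &= E(Y \mid \text{AT})P(\text{AT}) + E(Y \mid \text{NT})P(\text{NT}) + E(Y \mid Z=0, \text{C})P(\text{C}) \\
E(Y \mid Z=1) - E(Y \mid Z=0) &= \{E(Y \mid Z=1, \text{C}) - E(Y \mid Z=0, \text{C})\}\,P(\text{C}) \\
\frac{E(Y \mid Z=1) - E(Y \mid Z=0)}{P(\text{C})} &= E(Y \mid Z=1, \text{C}) - E(Y \mid Z=0, \text{C})
\end{align*}
```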
So I did one step of algebra there: I just divided both sides by the probability of compliers. And now we can rewrite this expectation, so we can go from this line to this line. We can now write it in terms of potential outcomes because, first of all, Z was randomized, and second of all, we're looking at the subpopulation of compliers. Because we're randomizing Z, and these are compliant people, among these people we're effectively randomizing A. We're directly randomizing treatment itself. So we can write the expression in terms of potential outcomes of treatment received. And remember, this is the thing we actually wanted: this is the complier average causal effect. So the complier average causal effect can be written in this way right here. What we have on top is the intention to treat effect, and it's divided by the probability of compliers. Let's look at that a little more carefully. The complier average causal effect, we just showed, can be written as what we have here. On the top we have a difference in expected values of Y given Z=1 versus Z=0, and in the denominator we have the probability of compliers. I also mention here that the probability of compliers is just this thing: the expected value of A given Z=1 minus the expected value of A given Z=0 [E(A|Z=1) - E(A|Z=0)]. Then I ask why. This is another good thing to think about: pause the video and see if you can figure out for yourself why it's true. Assuming you've done that and we're back, we can look at it. The expected value of A given Z=1 is the proportion of people who are either always takers or compliers, right? Because A is binary, just treatment yes or no, and the expected value of a binary variable is just the probability that it equals 1. So the expected value of A given Z=1 is the probability that A=1 given Z=1.
In other words, it's the proportion of people in the population who would take the treatment if they were assigned the treatment. Well, that's the population of either always takers or compliers. It's the union of those sets. So what proportion of the whole population are either always takers or compliers? That's the expected value of A given Z=1. The expected value of A given Z=0 you could also think of as the probability of taking treatment if you were assigned Z=0. Well, that's the population of always takers, because we've assumed there are no defiers, right? So those are the always takers. And now what happens if you take the difference of those two? If we take the difference, this minus this, the always takers cancel out and you're just left with the proportion of compliers. Okay, hopefully that's clear. So the probability of compliers is the same thing as that difference, and I can rewrite the complier average causal effect as this ratio. On the top here is the intention to treat effect. That's the causal effect of treatment assignment on the outcome. And remember, this is something we can identify from data, because these are means involving observable variables. We observe Y and we observe Z, so we can identify and estimate the intention to treat effect. Our denominator is the causal effect of treatment assignment on treatment received. This is also something we can identify, because we've randomized Z and both A and Z are observed. So that means we can identify the complier average causal effect. Everything in this expression is something that we can identify, that we could estimate from data, so we can actually estimate the complier average causal effect.
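To make the ratio concrete, here is a minimal simulation sketch, not anything from the lecture itself. It assumes a made-up population with 20% always takers, 30% never takers and 50% compliers, a treatment effect of 2.0, and normally distributed noise; the function names simulate_trial and wald_estimate are hypothetical. The estimator only ever uses Z, A and Y, never the compliance labels.

```python
import numpy as np


def simulate_trial(n=200_000, seed=0):
    """Simulate a randomized trial with noncompliance (all numbers are assumptions)."""
    rng = np.random.default_rng(seed)
    # Latent compliance class; no defiers, so monotonicity holds by construction.
    types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
    # Z is a fair coin flip, independent of compliance class.
    z = rng.integers(0, 2, size=n)
    # Treatment received: always takers take it, never takers don't, compliers follow Z.
    a = np.where(types == "always", 1, np.where(types == "never", 0, z))
    # Baseline outcome differs by class; Z enters Y only through A (exclusion restriction).
    baseline = np.where(types == "always", 1.0, np.where(types == "never", -1.0, 0.0))
    y = baseline + 2.0 * a + rng.normal(0.0, 1.0, size=n)
    return z, a, y


def wald_estimate(z, a, y):
    """Return (ITT estimate, estimated proportion of compliers, CACE estimate)."""
    itt = y[z == 1].mean() - y[z == 0].mean()          # E(Y|Z=1) - E(Y|Z=0)
    p_compliers = a[z == 1].mean() - a[z == 0].mean()  # E(A|Z=1) - E(A|Z=0)
    return itt, p_compliers, itt / p_compliers


if __name__ == "__main__":
    z, a, y = simulate_trial()
    itt, p_compliers, cace = wald_estimate(z, a, y)
    print(f"ITT estimate:             {itt:.3f}")
    print(f"Proportion of compliers:  {p_compliers:.3f}")
    print(f"CACE estimate:            {cace:.3f}")
```

Under these assumed settings the true ITT effect is 0.5 × 2.0 = 1.0 and the true complier effect is 2.0, so the three printed estimates should land near 1.0, 0.5 and 2.0, up to sampling noise.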
So even though we don't know for sure who the compliers are, we can still estimate a causal effect among compliers, and it's just this ratio: the intention to treat effect divided by the causal effect of treatment assignment on treatment received. If you had perfect compliance, so if there was no noncompliance, then that denominator would just be equal to 1 and the complier average causal effect would be the same thing as the intention to treat effect; they would coincide. Also note that the denominator is going to be between 0 and 1, right, because it's the proportion of compliers. So we're going to be dividing by something that's between 0 and 1, and that's essentially going to blow up the ITT effect. You take the ITT effect, which is in the numerator, and dividing makes it bigger in magnitude. So the ITT effect is an underestimate, in magnitude, of the complier average causal effect. Hopefully that makes sense, because in the intention to treat effect you have some people who are assigned treatment who don't take it. In that case it diminishes the true causal effect of treatment, because you have some people who you're counting as if they were treated but who really weren't. So then, by dividing by the proportion of compliers, you get back to the thing that you might want, which is the complier average causal effect. As a big-picture summary, instrumental variables require two key assumptions, the strongest of which is the exclusion restriction, but just having an instrumental variable alone is not enough to be able to identify causal effects. We need to make some kind of additional assumption or assumptions, and one that works is the monotonicity assumption.
So, if we make the monotonicity assumption, we can identify the complier average causal effect, which is also known as the local average treatment effect, and the estimator happens to just be the standard intention to treat effect estimate divided by the estimated proportion of compliers. And, in general, the ITT effect will be an underestimate, in magnitude, of this local average treatment effect, or complier average causal effect.