So now, for the case where x is discrete,

the sample means within covariate classes can be used.

So, you're just partitioning into covariate classes and doing the same thing.
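A minimal sketch of that partitioning, assuming each record is a tuple (x, z, m, y) with a discrete covariate x, binary assignment z, binary treatment received m, and outcome y. The function name `stratified_cace` and the record layout are illustrative assumptions, not something from the lecture. Within each covariate class the estimate is a ratio of sample-mean differences; the overall estimate weights the class-level numerators and denominators by class size.

```python
from collections import defaultdict


def stratified_cace(records):
    """Ratio-of-mean-differences estimates within covariate classes.

    records: iterable of (x, z, m, y); x discrete, z and m in {0, 1}.
    Assumes every class has units in both arms and a nonzero
    difference in treatment uptake (otherwise a division by zero occurs).
    """
    # For each class x and arm z keep [count, sum of m, sum of y].
    sums = defaultdict(lambda: {0: [0, 0.0, 0.0], 1: [0, 0.0, 0.0]})
    for x, z, m, y in records:
        cell = sums[x][z]
        cell[0] += 1
        cell[1] += m
        cell[2] += y

    total = sum(arms[0][0] + arms[1][0] for arms in sums.values())
    per_class = {}
    itt_y = 0.0  # class-size-weighted effect of assignment on the outcome
    itt_m = 0.0  # class-size-weighted effect of assignment on uptake
    for x, arms in sums.items():
        n0, m0, y0 = arms[0]
        n1, m1, y1 = arms[1]
        dy = y1 / n1 - y0 / n0  # difference in sample means of y
        dm = m1 / n1 - m0 / n0  # difference in sample means of m
        per_class[x] = dy / dm
        weight = (n0 + n1) / total
        itt_y += weight * dy
        itt_m += weight * dm
    return per_class, itt_y / itt_m
```

The overall figure is a ratio of weighted sums rather than a weighted average of the class-level ratios, because classes with more compliers should count for more.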

But of course, if x is continuous or there are many levels of x,

we're going to need to introduce some modeling assumptions.

Standing back and looking at this complier average causal effect,

well, I'm just going to start calling it the CACE.

It's been argued that for a given intervention Z,

the policy-relevant question is

the overall effect of the intervention, which is sometimes called the intent-to-treat or ITT effect

(I think I've used that term before),

not the effect for some subset of units

who will comply with whatever assignment they're given.

So, for example, if Z is a medical treatment that has to be

taken in an unsupervised environment say at home,

the practical effectiveness of the treatment is

the effect in the context in which the treatment is delivered.

Furthermore, the CACE is not even the average effect

for all those who would take up treatment if it were offered,

as the experiment is not informative about units

who take up treatment regardless of their assignment.

In addition, even if you wanted to target

the treatment towards the units who might benefit from being offered treatment,

the compliers are a subpopulation that cannot be observed because we

cannot observe both potential outcomes M(0) and M(1).

These are some arguments against the utility of

the CACE as an estimand in a real practical context.

Now, on the other hand,

people have argued that the CACE is sometimes more broadly representative of the population.

So, let's look at that.

So, when there are no defiers,

the average treatment effect is a mixture of

the average treatment effects among three subgroups,

the compliers, the always takers,

and the never takers.

The never takers are the units with M(0) = M(1) = 0,

and the always takers are the units with M(0) = M(1) = 1.

So, if the average treatment effect is the same for compliers and always takers,

this gives the effect of treatment on the treated,

and if this extends also to the never takers,

the complier average causal effect is also the average treatment effect.
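Written out, the mixture argument looks roughly like this. The symbols are assumptions introduced here for illustration: \pi_c, \pi_a, \pi_n are the population shares of compliers, always takers, and never takers, \Delta_g is the average effect of receiving treatment in subgroup g, and p = P(Z = 1) with Z randomized.

```latex
\begin{align*}
\mathrm{ATE} &= \pi_c \Delta_c + \pi_a \Delta_a + \pi_n \Delta_n,
  \qquad \pi_c + \pi_a + \pi_n = 1, \\
\mathrm{ATT} &= \frac{p\,\pi_c \Delta_c + \pi_a \Delta_a}{p\,\pi_c + \pi_a}, \\
\Delta_a = \Delta_c &\;\Longrightarrow\; \mathrm{ATT} = \Delta_c, \\
\Delta_a = \Delta_n = \Delta_c &\;\Longrightarrow\; \mathrm{ATE} = \Delta_c .
\end{align*}
```

The ATT line uses the fact that, with no defiers, the treated units are the always takers in either arm plus the compliers assigned to treatment.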

But of course, you'd have to make arguments for why that would be the case.

These would be substantive arguments,

and they'd really be tantamount to making assumptions,

because you can't really know if it's true or not.

That said, it's unfortunately pretty common to see empirical work where

the ATT or the ATE seems to be the parameter of substantive interest,

but where the CACE is estimated and extrapolated without

further argument or consideration to the parameter of interest.

That is, you're estimating the CACE, and some folks doing empirical work seem to

forget that that's not the ATT or the ATE, which is what they're really interested in.

So, if the compliers are a large majority of the population,

that problem may not be too bad.

But let's look at the Angrist, Imbens,

and Rubin paper, a great paper.

The question of interest they start out with is the effect of

military service on excess civilian mortality,

and then we find out that the compliers are about 16 percent of the population.

So, they give you the complier average causal effect.

Now, in this case,

it might be reasonable to believe that never takers are

unhealthier than compliers or always takers

and that the mortality rate would have been higher had it been

possible to take into account the mortality of this subpopulation.

So, it applies clearly to that 16 percent,

but how representative is that 16 percent of all the other folks?
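As a rough illustration of where a complier share like that comes from: with randomized assignment and no defiers, the share of always takers can be estimated by the uptake rate under control, the share of never takers by the non-uptake rate under treatment, and the complier share is what is left. The function name `compliance_shares` and the argument layout are assumptions made for this sketch.

```python
def compliance_shares(z, m):
    """Estimate subgroup shares from binary assignment z and uptake m.

    Assumes randomized assignment and no defiers (monotonicity).
    """
    n1 = sum(1 for zi in z if zi == 1)
    n0 = len(z) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("need units in both assignment arms")
    # Uptake rates within each assignment arm.
    take_z1 = sum(mi for zi, mi in zip(z, m) if zi == 1) / n1
    take_z0 = sum(mi for zi, mi in zip(z, m) if zi == 0) / n0
    return {
        "always": take_z0,              # take treatment even under control
        "never": 1.0 - take_z1,         # refuse treatment even when assigned
        "complier": take_z1 - take_z0,  # remainder under monotonicity
    }
```

A small complier share is the warning sign here: the estimate describes that share only, and extending it to the rest needs further argument.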

Now, to take confounders into consideration,

unless these are discrete and with few levels,

you can't proceed non-parametrically as we've been doing above.

So, now let's suppose we have a random sample,

and let's suppose that the unconfoundedness assumption holds conditional on x.

So, let f_{M(z)=m}(y | X)

denote the observable conditional distribution of the outcome

Y among subjects assigned Z = z who have response M = m. Remember M is binary now, as is Z,

so there are four such distributions, and you can write them out.

Now, if you use the monotonicity condition,

which is M(1) greater than or equal to M(0)

(remember, that's no defiers),

that implies that subjects with M(0) = 1 are always takers,

and thus if we see that Z is zero and M is one,

we know we're seeing an always taker.

So, that's why I've relabeled that as f_{a0}(y | X): a for always taker,

and zero for Z equals zero.

So similarly, subjects with M(1) = 0 are never takers by

monotonicity, and f_{M(1)=0} I've relabeled f_{n1}.

That's the distribution of the never takers under assignment to treatment.

The remaining distributions are mixtures.

If I see M(0) equal to zero,

that is, you're assigned to the control group and I

see that you actually don't take up the treatment,

well, you could be a complier.

You could also be a never taker. So, it's a mixture.

Similarly, if I see that when you're assigned

to take up treatment you do take up treatment,

I don't really know whether you're a complier or an always taker.

That distribution is a mixture of those guys.
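One way to write out the four observable distributions, as a sketch under monotonicity and assignment that is as good as random given X. The notation is an assumption made here: \pi_c(x), \pi_a(x), \pi_n(x) are the conditional shares of compliers, always takers, and never takers, and f_{gz} is the outcome distribution for subgroup g under assignment z.

```latex
\begin{align*}
f_{M(0)=1}(y \mid x) &= f_{a0}(y \mid x), \\
f_{M(1)=0}(y \mid x) &= f_{n1}(y \mid x), \\
f_{M(0)=0}(y \mid x) &=
  \frac{\pi_c(x)\, f_{c0}(y \mid x) + \pi_n(x)\, f_{n0}(y \mid x)}
       {\pi_c(x) + \pi_n(x)}, \\
f_{M(1)=1}(y \mid x) &=
  \frac{\pi_c(x)\, f_{c1}(y \mid x) + \pi_a(x)\, f_{a1}(y \mid x)}
       {\pi_c(x) + \pi_a(x)} .
\end{align*}
```

The first two lines are single-subgroup distributions; the last two are the two-component mixtures, and separating their components is what takes extra assumptions.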

The always-taker distribution under

the control group (that's when you're assigned not to have

treatment), the distribution of never takers

when assigned to have treatment, and the mixture probabilities

and distributions: you can identify those guys in

some instances under additional assumptions.

So, as an example,

suppose the outcome Y is continuous and

the mixture distributions are assumed to be mixtures of normals with these means.

So, mu_{c0}(X) would be, conditional on X,

the mean for the compliers under no treatment,

mu_{c1}(X) the mean for the compliers under treatment, et cetera,

and we might assume a common variance sigma squared.

A thing that's been done in the literature is to assume that

the mixing probabilities follow a multinomial logit model,

and you can see there are only two of them there because the other one, by

default, is the probability of being a never taker given covariates X.
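A minimal sketch of that parametric setup, assuming linear predictors for the logit and subgroup means already evaluated at a given X. The function names, the coefficient vectors `beta_c` and `beta_a`, and the dictionary keys such as "c0" are hypothetical choices for this illustration. Never takers serve as the reference class, which is why only two sets of coefficients appear.

```python
import math


def mixing_probs(x, beta_c, beta_a):
    """Multinomial logit shares with never takers as the reference class."""
    eta_c = sum(b * v for b, v in zip(beta_c, x))
    eta_a = sum(b * v for b, v in zip(beta_a, x))
    denom = 1.0 + math.exp(eta_c) + math.exp(eta_a)
    return {
        "c": math.exp(eta_c) / denom,
        "a": math.exp(eta_a) / denom,
        "n": 1.0 / denom,  # the share left over "by default"
    }


def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def observed_density(y, z, m, pi, mu, sigma):
    """Density of y given assignment z and uptake m, with common variance.

    pi: shares from mixing_probs; mu: subgroup means keyed "c0", "c1",
    "a0", "a1", "n0", "n1", already evaluated at the covariates.
    """
    if (z, m) == (0, 1):  # only always takers take treatment under control
        return normal_pdf(y, mu["a0"], sigma)
    if (z, m) == (1, 0):  # only never takers refuse when assigned
        return normal_pdf(y, mu["n1"], sigma)
    if (z, m) == (0, 0):  # compliers mixed with never takers
        w = pi["c"] / (pi["c"] + pi["n"])
        return w * normal_pdf(y, mu["c0"], sigma) + (1.0 - w) * normal_pdf(y, mu["n0"], sigma)
    if (z, m) == (1, 1):  # compliers mixed with always takers
        w = pi["c"] / (pi["c"] + pi["a"])
        return w * normal_pdf(y, mu["c1"], sigma) + (1.0 - w) * normal_pdf(y, mu["a1"], sigma)
    raise ValueError("z and m must each be 0 or 1")
```

A likelihood for fitting would multiply these densities by the probability of the observed uptake in each arm; that step is left out to keep the sketch short.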

The assumption of a common variance you can relax,

but it can create some difficulties.

It's a general phenomenon.

So, an assumption that's commonly made in the analyses that I just

described is that the distributions for always takers and

never takers are the same for Z equals zero and Z equals one,

which is basically a stochastic version of

the exclusion restriction that we made in

conjunction with the discussion of instrumental variables.

But it is important to note that this assumption doesn't require

the potential outcomes Y(z, m), as when we studied mediation,

to be well defined.

So, one of the earliest papers in this genre,

a very nice paper, was written by Little and Yau in

1998 and appeared in the Journal of the American Statistical Association.

Here, there are no always takers because you

really can't access treatment outside of the treatment group,

and so the analysis reduces a little bit and

that's a fairly common thing in actual experiments.
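To see how the analysis simplifies without always takers, here is a small sketch of the standard ratio estimate in that setting: the effect of assignment on the outcome divided by the uptake rate among those assigned to treatment. It assumes randomized assignment and that assignment has no effect on never takers' outcomes; the function name `cace_one_sided` is made up for this illustration and is not the model-based approach of the paper.

```python
def cace_one_sided(z, m, y):
    """Ratio estimate of the CACE when nobody in control can get treatment.

    z: binary assignment, m: binary uptake, y: outcome.
    Assumes randomized z and no effect of assignment on never takers.
    """
    # With no always takers, uptake under control must be zero.
    if any(mi == 1 for zi, mi in zip(z, m) if zi == 0):
        raise ValueError("treatment taken under control: always takers present")
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    m1 = [mi for zi, mi in zip(z, m) if zi == 1]
    itt = sum(y1) / len(y1) - sum(y0) / len(y0)  # effect of assignment
    complier_share = sum(m1) / len(m1)           # uptake when assigned
    return itt / complier_share
```

Here everyone who takes treatment is a complier, so the complier share is read directly off the treatment arm.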