In this video, we'll talk about the introductory ideas associated with nonresponse adjustments. The first thing we need to think about are the missing data mechanisms. The three I'm going to describe in more detail in a bit are missing completely at random (MCAR), missing at random (MAR), and non-ignorable non-response (NINR). In this framework, whether a unit responds or not is treated as a random event, and when response is random, its mechanism falls into one of these categories. Now you could, in contrast, set up the whole thing as being deterministic: a unit either responds with probability 1 or does not respond at all. The world would be divided into two fixed kinds of units, respondents and non-respondents, and there would be no randomness in whether a unit responds to a given survey request. But the methods we use in practice are the ones where we model response as random, or stochastic, because we can work out the math in a useful way in that setting. Now certainly, you can probably think of people or businesses or schools whose chance of responding to a survey is essentially 0. Hardcore non-respondents, they're just never going to cooperate. We can still do this stochastic kind of thinking if we're willing to say that a hardcore non-respondent is exchangeable in some sense with somebody who is willing to respond. So if you've got a 45 to 50 year old African American male who would never respond to a survey, and that person is replaceable, or exchangeable in some statistical sense, with another person in that age and race/ethnicity category, then you're okay, and you can use this sort of thinking. That's the reasoning behind what we're going to learn. So let's think more about what these terms mean. MCAR says every unit has the same probability of responding. That means you can think of responding as just another stage of sampling.
It's a kind of Bernoulli sampling: you draw a unit into the sample, and then it responds with some random probability. Under MCAR, we could have just drawn the smaller sample right off the bat and treated it as a random sample. So that's the easy case. You don't need any weight adjustment for means, because if you make the same weight adjustment for everybody, it just cancels out in the numerator and denominator of a weighted mean. You do need to make one overall adjustment for totals in order to get the scaling of the weights correct, but that's easy to do. Now MCAR, we basically figure, never happens. It's just too simple; the world is a more complicated place than that, and we need to account for covariates. That's where the next two categories come in. One is called missing at random (MAR). What does that mean? It means the probability of responding depends on some covariates. So if the probability of responding differs by age group, race/ethnicity, or educational level, but we can account for that in our estimation, then we're okay. As long as we have these covariates for both the respondents and the non-respondents, we can concoct an adjustment that makes statistical sense. Now the bad one is non-ignorable non-response. The idea there is that response depends not only on covariates but also on the very things you're trying to measure. For example, suppose nonresponse depends on age, race, and sex, you're doing a poll to see who a person will vote for in the next presidential election, and whether they respond also depends on who they're going to vote for. Then response depends on the analytic variable, because the analytic variable is just the outcome: are you going to vote for candidate A or candidate B? This case is tough to deal with, because we typically don't have enough information to account for that dependence on the y's.
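The MCAR argument above can be checked numerically. This is a minimal sketch with simulated data (the weights, sample size, and common response probability phi are all made up): a constant nonresponse adjustment 1/phi cancels in a weighted mean, but a weighted total without that adjustment is scaled down by roughly phi.

```python
import random

# Simulated MCAR setup: every sampled unit responds with the same
# (hypothetical) probability phi, so the adjustment 1/phi is a constant.
random.seed(1)
phi = 0.6                                          # common response probability
y = [random.gauss(50, 10) for _ in range(1000)]    # analytic variable
w = [100.0] * len(y)                               # base weights (1/selection prob.)

# Simulate response: each unit responds independently with probability phi
resp = [random.random() < phi for _ in y]
y_r = [yi for yi, r in zip(y, resp) if r]
w_r = [wi for wi, r in zip(w, resp) if r]

# Weighted mean: the constant adjustment cancels top and bottom
mean_unadj = sum(wi * yi for wi, yi in zip(w_r, y_r)) / sum(w_r)
w_adj = [wi / phi for wi in w_r]                   # one overall adjustment
mean_adj = sum(wi * yi for wi, yi in zip(w_adj, y_r)) / sum(w_adj)

# Weighted total: without the adjustment, the estimate is too small by factor phi
total_unadj = sum(wi * yi for wi, yi in zip(w_r, y_r))
total_adj = sum(wi * yi for wi, yi in zip(w_adj, y_r))
print(abs(mean_adj - mean_unadj) < 1e-9)           # means identical
print(total_unadj / total_adj)                     # exactly phi = 0.6
```

The point is that for a mean the adjustment is cosmetic under MCAR, while for a total it is exactly the rescaling the transcript describes.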
So we basically do the best we can with the known covariates and hope that takes care of everything. I should mention that these missing data categories are due to Little and Rubin, who have both written quite a bit on missing data problems. Now that we've defined those mechanisms, what do we do? Suppose the probability that unit i responds is called phi_i. If we can estimate phi_i, then what we can do is take 1 over the product of the unit's selection probability and its estimated response probability, and use that as a weight. That weight has what we call a quasi-randomization justification: your estimators will be unbiased, or approximately so, when we average over both the random sampling mechanism and this random response mechanism. Now how do we estimate phi? That's the next analytic problem. It's a prediction problem: we're predicting a 0/1 variable, and this opens up several possibilities for estimation that you may know about already. We'll give you the details of how to estimate phi for each unit in upcoming videos.
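As a preview of the kind of calculation involved, here is a small sketch of quasi-randomization weighting. It uses one simple way of estimating phi_i, the response rate within covariate cells (a weighting-class adjustment); the cell labels, selection probabilities, and response indicators below are all invented for illustration, and the upcoming videos cover the estimation choices properly.

```python
# Hypothetical sketch: estimate phi_i as the observed response rate within
# covariate cells, then weight each respondent by 1 / (pi_i * phi_hat_i).
# All data below (units, pi_i, cells, response flags) are made up.

sample = [
    # (unit id, selection probability pi_i, covariate cell, responded?)
    (1, 0.01, "age<45", True),
    (2, 0.01, "age<45", False),
    (3, 0.01, "age<45", True),
    (4, 0.02, "age>=45", True),
    (5, 0.02, "age>=45", True),
    (6, 0.02, "age>=45", False),
    (7, 0.02, "age>=45", False),
    (8, 0.02, "age>=45", True),
]

# Estimate phi within each cell: responders / sampled units in the cell
cells = {}
for _, _, cell, resp in sample:
    n, r = cells.get(cell, (0, 0))
    cells[cell] = (n + 1, r + resp)
phi_hat = {cell: r / n for cell, (n, r) in cells.items()}

# Quasi-randomization weight for each respondent: 1 / (pi_i * phi_hat_i)
weights = {uid: 1.0 / (pi * phi_hat[cell])
           for uid, pi, cell, resp in sample if resp}

print(phi_hat["age<45"])   # 2 of 3 responded -> 0.666...
print(weights[1])          # 1 / (0.01 * 2/3) = 150.0
```

Note the requirement baked into this sketch: the covariate cell must be known for respondents and non-respondents alike, which is exactly the MAR condition described above.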