Before we talk about how to do a cluster survey, I want to talk about how to decide where to do your survey, and whether and how to stratify. Where should you do your household survey? This sounds like a pretty straightforward question, and to a large extent it is. Your survey should cover the area that your program is targeting. What do I mean by that? If your program's stated aim is to improve health indicators or health status in a particular area, let's say a district, your survey needs to cover that entire area. Going back to my example of Simiyu Region, if your program says it is going to reduce neonatal mortality in Simiyu Region, your survey needs to cover all of Simiyu Region, even if you are only doing implementation activities in a subset of the region. You need to measure at the same level at which you say you're going to have an impact on coverage, mortality, or whatever your outcome is. If you're doing an evaluation, and that evaluation includes a comparison area, the survey should of course cover both the program and the comparison areas, so that you are able to make that comparison. Finally, if you're doing a baseline and an endline survey, those surveys need to cover the same area in order to be comparable. That, again, sounds like common sense, and it is, but it can get complicated if boundaries change between baseline and endline. Sometimes there's redistricting: district or regional boundaries change, and suddenly the same district name covers a different area. So you may need to adjust. Where at baseline you covered an entire district, at endline you may cover only part of that district, or a district and a half, in order to maintain comparability. But you'll need to talk to your funder about this, because it also depends on where you're implementing and which areas you're trying to draw inferences about.
Now I want to talk a little bit about stratification. We use stratification a lot in household surveys, for various purposes that we'll talk about. But first of all, what is it? When you select your sample for a household survey, the simplest way to do it is to take a random sample of clusters across your entire study area. What do I mean by a cluster? A cluster is just a group of households, always grouped by geographic area. We'll talk about clusters in a lot more detail in the next lecture, but for now what you need to know is that a cluster is a group of households. So the simplest way to sample is to take a random sample of groups of households across your study area. Sometimes doing this results in a sample size that is too small within a particular area of interest. Let's say you're doing your study in a region and you want estimates for the rural part of the region and for the urban part. If you just take a random sample of clusters across that region, you may end up with only a few clusters in the urban area, and not enough sample size there to draw precise inferences. One solution to this problem is stratification. What is stratification? Stratification is just the division of a survey population into a set of sub-populations, which we call strata. For household surveys in low- and middle-income countries, strata are usually geographic areas, like districts, regions, or urban and rural areas. The important thing is that strata need to be mutually exclusive: a single cluster or household can belong to only one stratum, never to more than one. The statistical idea behind stratification is that populations should be similar within strata and different between strata. If that's the case, then stratification will actually improve the precision of your estimates; it will reduce the variance.
This is usually but not always true with geographic strata. Usually, households within an urban area are going to look more like each other than like households in a rural area. You can stratify by multiple factors, as we've mentioned: district or geographic area, and area of residence, like urban and rural. You will want to have at least two sampling units (clusters or households) in each stratum. Just an example to make this more concrete: let's say that you are evaluating a community case management program implemented in three districts. There are some implementation differences between the districts. You anticipate that the impact of the program will vary by district, and therefore you want district-level estimates of your indicators. You also expect that the program will have different impacts in urban and rural areas, so you also want estimates for urban and rural areas. You decide to stratify your survey by district and by urban and rural area of residence. This produces six strata, two in each district. Your first stratum is district A, rural. The second is district A, urban. The third is district B, rural; the fourth is district B, urban; and so forth. You can see here that essentially we have subdivided the survey area into six sub-areas that are mutually exclusive. Then we are able to produce estimates for each of those sub-areas as well as for the overall survey area. Why do we do this? I've alluded to some of the advantages before, but one of the main advantages of stratification for household surveys in low- and middle-income countries is that it allows you to vary the sample size between strata to ensure precise estimates for each stratum. You can figure out the minimum sample size you need in order to estimate your indicators with your desired level of precision. Then, using stratification, you can make sure that you hit that minimum sample size in each of your strata.
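To make the six-strata example concrete, here is a minimal Python sketch of how the strata could be built and sampled. Everything in it is an illustrative assumption rather than part of the lecture: the district labels, the number of clusters in each stratum of the sampling frame, the figure of 10 clusters per stratum, and the function names `build_frame` and `select_clusters`. It also draws clusters with equal probability, which is a simplification.

```python
import random
from itertools import product

# Hypothetical stratification factors from the example: 3 districts x 2 residence types.
districts = ["A", "B", "C"]
residences = ["rural", "urban"]

# Crossing the two factors gives 3 x 2 = 6 mutually exclusive strata.
strata = list(product(districts, residences))


def build_frame(clusters_per_stratum):
    """Build a made-up sampling frame: one record per cluster, each in exactly one stratum."""
    frame = []
    for (district, residence), n_clusters in clusters_per_stratum.items():
        for i in range(n_clusters):
            frame.append({
                "cluster_id": f"{district}-{residence}-{i:03d}",
                "stratum": (district, residence),
            })
    return frame


def select_clusters(frame, clusters_needed, seed=0):
    """Draw the same number of clusters, at random, separately within each stratum."""
    rng = random.Random(seed)
    selected = {}
    for stratum in sorted({c["stratum"] for c in frame}):
        candidates = [c for c in frame if c["stratum"] == stratum]
        if clusters_needed > len(candidates):
            raise ValueError(f"Stratum {stratum} has too few clusters in the frame")
        selected[stratum] = rng.sample(candidates, clusters_needed)
    return selected


# Assumed frame: many rural clusters and few urban ones in every district.
frame_sizes = {s: (120 if s[1] == "rural" else 15) for s in strata}
frame = build_frame(frame_sizes)

# Assumed minimum of 10 clusters per stratum, taken from a separate sample size calculation.
sample = select_clusters(frame, clusters_needed=10)

for stratum, clusters in sample.items():
    print(stratum, len(clusters))
```

Because the selection is done stratum by stratum, the small urban strata get the same 10 clusters as the large rural ones, which a single random draw across the whole frame would not guarantee.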
As I mentioned in the previous slide, stratification in theory also reduces sampling error relative to simple random sampling; relative to not stratifying, essentially. I don't want to get into the reasons for that in too much detail, but just conceptually, the total variance in a population is the sum of the variance within strata and the variance between strata. In stratified sampling, the variance between strata no longer contributes to your sampling error, because you are sampling separately within each stratum. You're reducing the variance that matters to the variance within the strata. In theory, the variance should be lower. The extent to which that is true depends on the survey and the survey population. What are the disadvantages of stratification? There aren't too many, but it does make the analysis a little bit more complex. If you are stratifying, generally that means that you will need to weight your final estimates for the overall survey area, because the probability of selection will be different in different strata. Then, when you are calculating the standard errors and the confidence intervals for your estimates, you will need to account for the effect of stratification. But there are fairly straightforward ways to do that using the major statistical packages.
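The weighting step can be illustrated with a small Python sketch. The numbers are invented, and the names `design_weight`, `weighted_estimate`, and `unweighted_estimate` are hypothetical. It assumes two strata sampled with equal probability within each stratum, and its standard error formula treats each stratum as a simple random sample, so it ignores clustering and any finite population correction. A real cluster survey analysis would use the survey procedures of a statistical package instead.

```python
import math

# Invented example: the urban stratum is oversampled relative to its population.
strata = {
    "rural": {"population": 9000, "sampled": 300, "covered": 120},
    "urban": {"population": 1000, "sampled": 300, "covered": 210},
}


def design_weight(stratum):
    """Inverse of the selection probability: households represented by each sampled household."""
    return stratum["population"] / stratum["sampled"]


def weighted_estimate(strata):
    """Overall proportion and standard error, weighting each stratum by its population share.

    Assumes simple random sampling within strata (no clustering, no finite population correction).
    """
    total_population = sum(s["population"] for s in strata.values())
    estimate = 0.0
    variance = 0.0
    for s in strata.values():
        share = s["population"] / total_population
        p = s["covered"] / s["sampled"]
        estimate += share * p
        variance += share ** 2 * p * (1 - p) / s["sampled"]
    return estimate, math.sqrt(variance)


def unweighted_estimate(strata):
    """Naive pooled proportion that ignores the unequal selection probabilities."""
    covered = sum(s["covered"] for s in strata.values())
    sampled = sum(s["sampled"] for s in strata.values())
    return covered / sampled


estimate, standard_error = weighted_estimate(strata)
print("design weights:", {name: round(design_weight(s), 2) for name, s in strata.items()})
print("weighted estimate:", round(estimate, 3), "SE:", round(standard_error, 4))
print("unweighted estimate:", round(unweighted_estimate(strata), 3))
```

In this made-up case the pooled, unweighted figure is pulled toward the oversampled urban stratum, which is why the overall estimate needs weights when selection probabilities differ between strata.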