0:06

In fact, we're going to talk about probability sampling for three modules. In this module, we'll talk about simple random sampling. Then we'll move on to cluster sampling in the next module, and then to stratified sampling. They're all related, and we'll be clarifying the differences between them in the coming modules.

0:26

What is probability sampling? Since the early 20th century, probability-based, or random, sampling has been advocated as an alternative to purposive sampling, quota sampling, and some of the other techniques we talked about that were used in the late 19th and early 20th centuries.

1:17

The population: we'll be talking about that a lot. By population, we mean the entity about which we are going to generalize from our sample. A population could consist of people, such as registered voters or the residents of a particular country, or it could be a population of firms or other organizations. It's basically the larger set of units from which we intend to draw our sample, and about which we want to make a statement.

2:19

A sampling frame is a complete list of the members of the population from which the sample will be drawn. Whereas the population is an abstraction (registered voters, the citizens of a particular country, the residents of a particular country), a sampling frame is an actual list that will be the basis of our sample. It's a concrete, real thing that we work with.

3:03

For households, a sampling frame might be a list of residential addresses obtained from the post office or some other government agency, or perhaps from a utility company that provides service to all of the households in a particular area. For telephone users, a sampling frame might be a list of active phone numbers obtained from a phone company.

3:42

For firms in an industry, we might use as our sampling frame a list of members of a trade association, or a list of companies that have registered with the government in connection with doing business in a particular area.

Now, I want to talk about simple random sampling. This refers to the case where every unit in the population is equally likely to be selected for the sample; strictly speaking, every possible sample of a given size is equally likely to be drawn.
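Equal chances for every unit can be checked with a short counting argument. The notation here is not the lecture's own and is assumed for illustration: N is the population size, n the sample size, and S the set of units drawn without replacement. The number of samples that contain a given unit i, divided by the number of all possible samples, gives:

```latex
P(i \in S) = \frac{\binom{N-1}{n-1}}{\binom{N}{n}}
           = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!}
           = \frac{n}{N}
```

For the case of 1,000 addresses drawn from a list of 100,000, that works out to a 1-in-100 chance for every address.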

4:10

Measurements in the sample provide direct estimates of population parameters. So if we have a sample of registered voters drawn from the population of registered voters, using a good sampling frame that consists of a list of registered voters, the proportion that we measure in the sample is an unbiased estimate of the same proportion in the larger population.
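To make this concrete, here is a minimal sketch that draws a simple random sample of 1,000 from a simulated frame of 100,000 registered voters and estimates a proportion with a rough 95% interval. The population, its 40% share, and the helper `estimate_proportion` are invented for illustration, and the standard error ignores the finite population correction.

```python
import math
import random

def estimate_proportion(sample):
    """Return the sample proportion and its approximate standard error (SRS, large N)."""
    n = len(sample)
    p_hat = sum(sample) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

# Hypothetical frame: 100,000 registered voters, 40% of whom hold some view (coded 1).
random.seed(1)
population = [1] * 40_000 + [0] * 60_000
sample = random.sample(population, 1_000)  # simple random sample without replacement
p_hat, se = estimate_proportion(sample)
print(f"estimate {p_hat:.3f}, 95% CI {p_hat - 1.96 * se:.3f} to {p_hat + 1.96 * se:.3f}")
```

The estimate should land within a few percentage points of the true 40%, and the interval width shrinks roughly with the square root of the sample size.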

4:48

One issue is that if we are going to carry out a survey with a sample based on simple random sampling, contacting respondents normally needs to be straightforward. That's, of course, easy with a mail survey or a telephone survey. It can get more difficult if we're thinking about a household survey that includes in-person visits.

5:13

One easy example of something we can do with simple random sampling would be a household survey via mail, where a survey is mailed to a sample of residential addresses. We get a sampling frame consisting of a list of all valid residential addresses in a particular city, we pick a certain number of them at random, and we mail out a household survey. That's straightforward.

What's the procedure? Well, first we have to obtain a sampling frame. That can actually be one of the most difficult parts of conducting a survey. It's fairly straightforward for certain things, like household surveys, where we can get lists of valid residential addresses, or surveys of voters, where we have lists of registered voters. But it can be much more difficult for more specialized populations: professors, or people working in a particular profession, or people who are actually trying to hide themselves, perhaps because they engage in a behavior they haven't made public, and for whom there may not be a comprehensive list. We'll talk about some of those issues in a later lecture.

Once we have our sampling frame, we randomly select units from the frame to make up our sample. This may be done with software: we can program a computer to generate random numbers and use them to pick the units within the sampling frame that will be part of our sample.
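As a minimal sketch of the software route, Python's standard-library `random.sample` draws distinct units without replacement, with every subset of the requested size equally likely. The frame of numbered addresses and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every subset of size n is equally likely."""
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced and audited
    return rng.sample(frame, n)

# Hypothetical frame of 100,000 residential addresses.
frame = [f"Address {i}" for i in range(1, 100_001)]
sample = simple_random_sample(frame, 1_000, seed=42)
print(len(sample), sample[:3])
```

Recording the seed is a design choice worth keeping: it lets someone else regenerate exactly the same sample from the same frame.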

It may also be done by going down a list, if we can come up with a comprehensive list of every element within our sampling frame, for example, a complete list of all residential addresses. We can then select units at intervals defined by the ratio of the population size to the intended sample size. For example, if a list has 100,000 addresses and our intended sample size is 1,000, that is, 1 out of 100, we could go through our list of addresses and simply select every 100th address. This is usually called systematic sampling, and the starting point within the first interval should itself be chosen at random.
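The interval method (usually called systematic sampling) can be sketched as follows, assuming a Python list as the frame and a random start drawn from the first interval. `systematic_sample` is a hypothetical helper, not a standard library function.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, where k = len(frame) // n."""
    if n <= 0 or n > len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n                        # sampling interval, e.g. 100,000 // 1,000 = 100
    start = random.Random(seed).randrange(k)   # random start within the first interval
    return frame[start::k][:n]                 # truncate if len(frame) is not a multiple of n

frame = [f"Address {i}" for i in range(1, 100_001)]  # hypothetical frame
sample = systematic_sample(frame, 1_000, seed=7)
print(sample[:3])
```

When the frame size is an exact multiple of the sample size, as here, every address has exactly a 1-in-k chance. Truncating to n when it isn't is a simplification of this sketch.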

7:16

Let's work through a simple example. Imagine that we have a complete list of addresses for a city. Here we have an extract consisting of 21 addresses on Main Street, including some apartment buildings, so apartment 1, 2, 3, 4. Now, if we wanted to construct a sample consisting of one out of every four households in the city,

7:47

we could number the addresses 1, 2, 3, 4, 1, 2, 3, 4, and so forth, as a first step to drawing the sample.

7:57

So here we've done that numbering: 1, 2, 3, 4, 1, 2, 3, 4, and so on. Once we have that in place, we can simply select every fourth address. We started with an offset of two, and then picked every fourth address after that. That produces a sample consisting of one-fourth, or one out of four, of the addresses in the city, and as long as the starting offset is chosen at random, every address has the same one-in-four chance of being selected.
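The Main Street example can be replayed in a few lines. The actual 21 addresses aren't listed in the transcript, so the labels below are hypothetical placeholders; the helper `every_kth` (also hypothetical) takes a 1-based starting position to match the offset of two.

```python
def every_kth(frame, k, offset):
    """Return every k-th unit, starting from the unit at 1-based position `offset`."""
    return frame[offset - 1::k]

# Hypothetical stand-in for the 21-address Main Street extract.
main_street = [f"Main St #{i}" for i in range(1, 22)]
picked = every_kth(main_street, k=4, offset=2)
print(picked)  # ['Main St #2', 'Main St #6', 'Main St #10', 'Main St #14', 'Main St #18']
```

Because 21 isn't a multiple of 4, the sample size depends on the offset: starting at position 1 picks six addresses, while an offset of 2, 3, or 4 picks five. Choosing the offset at random is what gives each address its one-in-four chance.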

8:59

Greater statistical power reduces the chance of failing to observe a relationship or a difference that actually does exist in the population. We refer to that sort of mistake as a Type II error. Power depends mainly on the strength of the relationship that actually exists in the population, the sample size, and the criterion for statistical significance that we set. So if we have a stringent criterion for statistical significance, we're going to need a larger sample to get the statistical power we need.
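The lecture names the three drivers of power without a formula. As a rough sketch, assume a two-sided z-test comparing a proportion between two independent groups of equal size; the test, the 50% versus 40% gap, and the function name `approx_power` are illustrative assumptions. The normal approximation then shows each driver at work.

```python
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation and ignores the (negligible) chance of
    rejecting in the wrong direction.
    """
    z = NormalDist()  # standard normal distribution
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the significance criterion
    return z.cdf(abs(p1 - p2) / se - z_crit)

for alpha in (0.05, 0.01):
    for n in (200, 500, 1000):
        print(f"alpha={alpha}, n={n}: power={approx_power(0.50, 0.40, n, alpha):.2f}")
```

Raising n raises power, a bigger true gap raises it too, and tightening alpha from 0.05 to 0.01 lowers it at every n, which is why a stricter criterion calls for a larger sample.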

9:40

Sample size as a percentage of the population is relatively unimportant. That's why, even for very large countries like China, typical surveys may have samples of just 5,000 or 10,000 people. Surveys there are not much larger than they would be for the United States, or even for a much smaller country, because you don't get much bang for the buck by aiming for a particular percentage of the population in your sample. Statistical power is driven by the absolute size of the sample: the number of cases.
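One way to see why the sampled share matters so little is the 95% margin of error for a proportion under simple random sampling. The population size enters only through the finite population correction, sqrt((N - n) / (N - 1)). This sketch assumes the worst case p = 0.5 and z = 1.96; the three population sizes are round illustrative figures, and `margin_of_error` is a hypothetical helper.

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for a proportion under SRS, with finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

# Same absolute sample size, very different population sizes.
for N in (50_000, 5_000_000, 1_400_000_000):
    print(f"N={N:>13,}: n=5,000 gives +/-{margin_of_error(5_000, N):.4f}")
```

With n = 5,000 the margin stays around 1.3 to 1.4 percentage points in all three cases; the correction only bites when the sample is a sizable fraction of the population.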

10:15

So I'm going to review and talk a little bit about the advantages of probability sampling. It's representative on all characteristics: when we have a genuinely random sample from a population, whatever we measure in our sample will generalize to the larger population, within the limits of sampling error.

10:38

There's no discretion on the part of the interviewer in terms of picking whom they want to interview, at least not if they've been trained properly. Now, there are issues with response rates and so forth, which we'll talk about in a later module. But if everything goes as planned, the interviewer doesn't get to pick and choose the way they would with a quota or purposive sampling approach.

11:07

Now, probability sampling also poses some challenges. Sometimes it's hard to find sampling frames, especially if we're looking for a more specialized population: a subset consisting of people in a particular profession, people with particular interests, or people engaged in a particular hobby. Non-response can be an issue; we'll come back to this in a later module. And it can be logistically difficult and expensive to carry out simple random sampling over a large geographic area. We'll talk about that in the next module, when we cover multi-stage cluster sampling, which is one remedy.

11:45

Now, I want to come back to the issue of sample size versus representativeness to highlight it. Large sample sizes do not compensate for problems with representativeness: a small representative sample is always preferred over a large, unrepresentative one. That's why, when you look at typical surveys done for research, they rarely have more than a few thousand respondents. As long as the sampling is done properly, a few thousand respondents will give you good insight into the population you're trying to study. By contrast, online or mail surveys done in an ad hoc fashion, which may have hundreds of thousands of responses, are rarely used in serious research, because it's not clear that they are actually representative of the larger population.
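A small simulation illustrates the point. Everything in it is assumed for illustration: a hypothetical population of one million in which 30% hold some view, and an opt-in mechanism under which holders of that view are twice as likely to answer an ad hoc survey. The random sample of 1,000 should land near the true share, while the self-selected sample of roughly 130,000 should not.

```python
import random

rng = random.Random(2024)

# Hypothetical population of 1,000,000 people; 30% hold some view (coded 1).
population = [1] * 300_000 + [0] * 700_000
true_share = sum(population) / len(population)

# Small probability sample: 1,000 people drawn by simple random sampling.
srs = rng.sample(population, 1_000)
srs_estimate = sum(srs) / len(srs)

# Large ad hoc sample: people holding the view are twice as likely to opt in
# (an assumed self-selection mechanism, purely for illustration).
opt_in = {1: 0.20, 0: 0.10}
volunteers = [x for x in population if rng.random() < opt_in[x]]
volunteer_estimate = sum(volunteers) / len(volunteers)

print(f"true {true_share:.3f} | SRS n={len(srs):,}: {srs_estimate:.3f} | "
      f"volunteers n={len(volunteers):,}: {volunteer_estimate:.3f}")
```

Adding more volunteers would not fix this: the bias comes from who opts in, not from how many respond.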
