So, what if we don't have any theories about the connection between variables? Are we better at getting the relationship right? In this next demonstration, I'd like you to look at the names of people as they appear on the screen. Try to remember as many names as you can.

Were there more names beginning with A or names beginning with Z? Was there an association between the position of the first letter of the name in the alphabet and the font size for the name? In other words, were there bigger fonts for names that came early in the alphabet or smaller fonts for names that came early in the alphabet? Think about that for a minute. What do you recall?

There was actually a correlation of 0.65 between a name's position in the alphabet and its font size. That's a very, very large correlation. The later in the alphabet, the bigger the font. But you don't have a theory about that association, so you're not likely to see it. And that's true even when the events are very clear, as they were here. It couldn't be clearer: there is the name, there is the font. It's true even when the timing is right between the events on a given occasion; in this case, you see the name simultaneously with the font. And it's true even when the timing is right across occasions, so you don't forget the associations you've seen before. So if you have no expectation at all, you're very unlikely to see a correlation even though it's staring you in the face.

Now, instead of you indicating what the association is, I'm going to tell you about a few associations. They're all real associations; I have no more tricks. Your job is to tell me what accounts for the associations.

In the 1950s, there was found to be an association between the number of popsicles sold in a given week and the number of new cases of polio. Why do you suppose that was the case? Let's think about that for a bit. Would it have made sense to ban the sale of popsicles?

Couples who spend more time on wedding preparations are less likely to be divorced. Why do you suppose that is?
If you have friends who are going to be married, should you encourage them to spend a lot of time on wedding preparations?

Pipe smokers live longer than most other people. Why do you suppose that is? Should you take up pipe smoking?

There are hugely more fatal accidents per vehicle mile for Ford F-150 pickups than for Volvo station wagons. Why do you suppose that is? Should you advise your friends to stay away from Ford pickups? So, let's have a brief quiz. See if you can match the driver with the auto. Does this suggest an idea about why there might be a difference in auto accident rates?

Well, for each of the correlations we've just been talking about, there's a big problem. The variables in those examples are confounded with other variables. A confounding variable is one that is associated with both variables of interest, X and Y, and that could explain the association between them. The variable of type of automobile is confounded with the variable of type of owner. It could be the type of owner that so heavily influences the fatality rate of different kinds of autos. And in fact, there are some very strong associations between auto type and owner type. Young males have more accidents than other demographic groups, and they're more likely to be driving sports cars, muscle cars, hot rods and pickups. Middle-aged women who wear sensible shoes are more likely to be driving Buick sedans, station wagons and Priuses. But those young guys are dangerous no matter what they drive, and the middle-aged women are relatively safe.

So with the other findings I was just talking about, let's play spot the confound. We're looking for a C variable, which could be causing both A and B. In the 1950s, there was found to be an association between the number of popsicles sold in a given week and the number of new cases of polio. Why do you suppose that was? Can you think of a C variable here? Well, heat would be a pretty good candidate.
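The idea that a C variable can produce an association between A and B, with no causal link between A and B themselves, can be checked with a small simulation. The sketch below is purely illustrative: the data are randomly generated, the effect sizes are made up, and the names `simulate_weeks`, `pearson` and `partial_corr` are hypothetical helpers, not anything from the lecture. It assumes weekly heat pushes up both popsicle sales and polio cases, then compares the raw correlation with the correlation after holding heat constant (a partial correlation).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(a, b, c):
    """Correlation between a and b after statistically controlling for c."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, c)
    r_bc = pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate_weeks(n_weeks=2000, seed=0):
    """Made-up weekly data: heat drives both outcomes; they never affect each other."""
    rng = random.Random(seed)
    heat = [rng.gauss(0, 1) for _ in range(n_weeks)]
    popsicles = [h + rng.gauss(0, 1) for h in heat]  # assumed: heat -> popsicle sales
    polio = [h + rng.gauss(0, 1) for h in heat]      # assumed: heat -> polio cases
    return heat, popsicles, polio


if __name__ == "__main__":
    heat, popsicles, polio = simulate_weeks()
    print("raw correlation, popsicles vs. polio:", round(pearson(popsicles, polio), 3))
    print("controlling for heat:", round(partial_corr(popsicles, polio, heat), 3))
```

In this setup the raw correlation comes out clearly positive, while the correlation controlling for heat is close to zero, which is the pattern you would expect if the C variable were doing all the work.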
People eat popsicles in the summer, and they go to swimming pools in the summer. Swimming pools are where kids were catching polio in the early 1950s.

Couples who spend more time on wedding preparations are less likely to be divorced. Why do you suppose that might be? If you have friends who are about to get married, should you encourage them to spend lots of time on wedding preparations? What could the C variable be? Well, in this case, people who spend lots of time on wedding preparations are, in general, different sorts of people from those who have more hurried arrangements. They tend to be better off financially. They tend to be older when they get married, and they tend to have known each other longer. All of those C variables are associated with greater longevity of the marriage.

Pipe smokers live longer than most other people. Why do you suppose that is? Should you take up a pipe? Well, the C variable here is that the people who smoke pipes are very different kinds of people from those who smoke other things or don't smoke at all. People who smoke pipes tend to be in high-status occupations. They lead relatively relaxed lives, and they have better access to healthcare.

So, we've started to talk about causality. Disentangling correlation and causality is going to be the topic of the next lesson. How do you prove that some relationship is causal? But first, I want to talk about a statistical concept that applies to both correlations and experiments. If you found a correlation between two variables of 0.35, should you believe there's a relationship or not? Well, that depends on whether the correlation is statistically significant. Statistical significance is the probability that a result at least as extreme as the one obtained could have occurred given that there is, in fact, no relationship. That's expressed as p less than some quantity.
For example, p less than 0.05 means that the probability that the result obtained, or an even stronger one, could have been obtained even if there's no relationship is less than 5 in 100. So if I say I've found a correlation of 0.35 between the amount of fish people eat and the number of television shows they watch, is that something you should pay attention to if you're interested in that particular topic of what people eat and what they watch? Now, if the correlation was based on three subjects, you wouldn't be impressed. If you say, I have a friend Joe and he eats lots of fish and watches lots of TV, and Bill doesn't eat fish really at all and hardly ever watches TV, you're not going to be impressed. In fact, you would need about 30 cases to reach the 0.05 level of significance.

In the next section, we'll be talking about experiments, research of a kind that's usually essential if you want to do what correlations normally can't: namely, show whether one of your variables actually exerts a causal influence on the other.
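To see how the significance of a 0.35 correlation depends on the number of cases, here is a rough sketch. It is an assumption-laden approximation, not the lecture's own calculation: it uses the Fisher z transformation with a normal approximation and a two-tailed test, and the function names `approx_p_value` and `smallest_significant_n` are hypothetical. An exact t-test or a one-tailed test would shift the numbers somewhat.

```python
import math


def approx_p_value(r, n):
    """Approximate two-tailed p-value for a Pearson correlation r based on n cases.

    Uses the Fisher z transformation with a normal approximation, so it is
    only a rough guide, especially for small n.
    """
    if n <= 3:
        raise ValueError("this approximation needs more than 3 cases")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)      # Fisher z divided by its standard error
    return math.erfc(abs(z) / math.sqrt(2))   # two-tailed normal tail probability


def smallest_significant_n(r, alpha=0.05, max_n=10_000):
    """Smallest n at which the approximate p-value drops below alpha, or None."""
    for n in range(4, max_n + 1):
        if approx_p_value(r, n) < alpha:
            return n
    return None


if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        print(f"n = {n:3d}: approximate p = {approx_p_value(0.35, n):.4f}")
    print("smallest n with p < 0.05:", smallest_significant_n(0.35))
```

Under these assumptions the crossover lands in the low 30s, in line with the figure of roughly 30 cases mentioned above. With only a handful of cases, the same correlation of 0.35 is nowhere near significant.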