Here's your data. When you're not really asking for output, you're just saying, "I need to make some changes to my data," and that goes before the proc sort. >> Thus far, you've taken the first steps in a larger picture of statistical inquiry. You've identified a dataset and then used exploratory data analysis to organize and summarize the raw data in a meaningful and informative way. The tools of exploratory data analysis, including examination of frequency distributions, graphical representations of your variables of interest, and calculations of center and spread, help us to discover important features and patterns in the data and any striking deviations from those patterns. >> This all falls under the rubric of descriptive statistics. Put simply, descriptive statistics aims to quantitatively describe or summarize a sample of data. >> Now you'll be introduced to inferential statistics, which is our ultimate goal. >> Inferential statistics allows us to directly test our hypotheses by evaluating, based on a sample, a research question with the goal of generalizing the results to the larger population from which the sample was drawn. Hypothesis testing is one of the most important inferential tools in the application of statistics to real-life problems. It's used when we need to make decisions concerning populations on the basis of only a sample. >> Inferential statistics allows us to directly test our hypotheses by evaluating our research question based on a sample, with the goal of generalizing the results to the larger population from which the sample was drawn. Statistical hypothesis testing is defined as assessing evidence provided by the data in favor of or against each hypothesis about the population. >> In order to really understand how inference works, though, we first need to talk about probability, because it's the underlying foundation of all statistical methods. >> Here's the basic idea.
As you know, statistics uses a sample to learn about the larger population from which the sample has been drawn. Ideally, the sample should be random so that it might better represent the entire population. It's very important to acknowledge, though, that this does not mean all random samples are ideal. Random samples are still random, and therefore no random sample will be exactly the same as any other. >> One random sample may be a fairly accurate representation of the larger population, while another random sample might not be accurate, purely due to chance. Unfortunately, when looking at a particular random sample, which is what happens in statistics, we will never know how much that particular random sample differs from the population. This uncertainty is where probability comes into the picture. We use probability to quantify how much we expect random samples to vary. This gives us a way to draw conclusions about the population in the face of the uncertainty that is generated by the use of a random sample. As an example, let's suppose we're interested in estimating the percentage of US adults who favor the death penalty. In order to do this, we choose a random sample of 1,200 US adults and ask whether they are in favor of or against the death penalty. >> An eye for an eye makes the whole world blind. >> I just don't think violence, like, doesn't beget violence. >> Why don't you have them put to death in the way they murdered the person? >> Mom. >> We find that 744 out of the 1,200, or 62%, are in favor. Here's a picture that illustrates what we've done and found in our example. Our goal here is to infer, that is, to derive conclusions about the opinions of the entire population of US adults regarding the death penalty, based on the opinions of only 1,200 of them. Can we absolutely conclude that 62% of the population favors the death penalty? Another random sample could give a different result, so we're uncertain.
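To make the idea that random samples vary concrete, here is a minimal Python sketch. It is not part of the lecture. It assumes, purely for illustration, that the true share of the population in favor is 0.62 (in practice this value is unknown), and the names `simulate_sample_proportions` and `ASSUMED_TRUE_P` are hypothetical. It draws many random samples of 1,200 and looks at how much the sample percentages differ from one another.

```python
import random

def simulate_sample_proportions(true_p, sample_size, num_samples, seed=0):
    """Draw repeated random samples and return the share 'in favor' in each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each respondent is in favor with probability true_p.
        in_favor = sum(1 for _ in range(sample_size) if rng.random() < true_p)
        proportions.append(in_favor / sample_size)
    return proportions

# The sample result reported in the example: 744 of 1,200 in favor.
observed = 744 / 1200

# ASSUMPTION for illustration only: the real population value is never known.
ASSUMED_TRUE_P = 0.62

props = simulate_sample_proportions(ASSUMED_TRUE_P, sample_size=1200, num_samples=500)

# Share of simulated samples landing within 3 percentage points of the assumed truth.
share_within = sum(abs(p - ASSUMED_TRUE_P) <= 0.03 for p in props) / len(props)

print(f"observed sample proportion: {observed:.2f}")
print(f"smallest and largest simulated proportions: {min(props):.3f}, {max(props):.3f}")
print(f"share of samples within 3 points of the assumed truth: {share_within:.3f}")
```

No two runs with different seeds give exactly the same set of sample percentages, which is the chance variation that probability is used to quantify.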
Since our sample is random, we know that our uncertainty is due to chance. It's not due to problems with how the sample was collected. Therefore, we can use probability to describe the likelihood that our sample is within a desired level of accuracy. For example, probability can answer the question: how likely is it that our sample estimate is within 3 percentage points of the actual percentage of all US adults who are in favor of the death penalty? The answer to this question, which we find using probability, is obviously going to have an important impact on the confidence we can attach to the inference step. In particular, if we find it quite unlikely that the sample percentage will be very different from the population percentage, then we have good confidence that we can draw conclusions about the population based on the sample. So let's define probability a bit more carefully. [MUSIC] The gambling industry makes enormous amounts of money from probability. The casinos know all about probability, while the customers, the gamblers, are trusting blind luck. >> You've got the edge. >> But you've got less of an edge if I've got a good memory and I know some rules of probability? >> Absolutely. >> Okay. So let's try one or two ideas about probability. If you cut the cards, what are the chances that it's going to be red? 50-50. >> Okay, so it's 26 out of 52, 50-50. >> And it was. >> All right, it was that time. What are the chances of cutting a spade? Now we've got 13 out of 52, a one in four chance. >> Correct. >> So the odds were always against me. >> Right. >> So what are the chances of you cutting a king when you've got the whole pack to cut from? Four of them are kings. >> Correct. >> 52 cards. One in 13 chance, yeah? So the odds were never high. >> Yeah. >> How about this one? What are the chances of cutting a king? And then if we do cut a king, we take that out, now we cut another king.
What are the chances of cutting a king each time if after the first time you remove the card? >> I haven't a clue. >> [LAUGH] >> Let's just cut them. >> Look at that, now that's. >> Sleight of hand. >> It's just the way I do it, I [LAUGH] >> Now let's just try it again, just for. >> Okay. >> Close. >> Close, but not there, yeah. What are the chances of cutting a king or a club? [LAUGH] >> I couldn't figure that out with a calculator. >> [LAUGH] Now that's quite tricky because we can't just say four chances of it being a king, 13 chances of being a club. Because actually it might be the king of clubs. >> Correct. >> Okay. And then one last thing to have a go at. Could we just take the four aces out? You've done this before, haven't you? >> I have. >> [LAUGH] Now I've got four aces there. Suppose we were interested in getting a particular order: hearts, clubs, diamonds, spades. What chance have we got that that's the order of those four, hearts, clubs, diamonds, spades? >> Slim and none. >> One in four chance that that's the heart. >> Correct. If it was the heart, there's now a one in three chance that that's the club. If it was, there's a one in two chance that that one's a diamond. And if it is, it's a dead certainty that that one's a spade. [MUSIC] A fourth times a third times a half times one. One in 24 chance. >> Probability is defined as the likelihood of something occurring, of an event occurring, the chance of something happening. Probability is a mathematical description of randomness and uncertainty. It's a way to measure or quantify uncertainty. Another way to think about probability is that it's the official name for chance. So what values can the probability of an event take? And what does the value tell us about the likelihood of the event occurring? The probability of an event ranges from 0 to 1. Let's start with those extremes, 0 and 1. A probability of 0 means that the event has zero chance of happening. It will never occur.
An event has a probability of 1 if it will occur for certain. In the middle, a probability of one-half indicates the event has a 50% chance of happening. In other words, the event is as likely to occur as not to occur. Any probability that is greater than one-half indicates that the event is more likely to occur than not to occur. And a probability that is below one-half indicates that the event is more likely not to occur than to occur. Many people prefer to express probability in percentages. Since every probability is a number between 0 and 1, each can be converted to an equivalent percentage. So we're actually saying the chance that an event will occur is between 0% and 100%.
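The card-cutting questions from the casino conversation can be checked by counting outcomes. The Python sketch below is an illustration and not part of the lecture. It assumes a standard 52-card deck in which every card is equally likely to be cut, and the helper names `probability` and `as_percent` are hypothetical. It also converts a probability to its equivalent percentage.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "clubs", "diamonds", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 cards

def probability(event, outcomes):
    """Favorable outcomes divided by all outcomes, assuming each is equally likely."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

def as_percent(p):
    """Convert a probability between 0 and 1 to a percentage between 0 and 100."""
    return float(p * 100)

# Single cuts from the full deck.
p_red = probability(lambda card: card[1] in ("hearts", "diamonds"), DECK)
p_spade = probability(lambda card: card[1] == "spades", DECK)
p_king = probability(lambda card: card[0] == "K", DECK)

# King OR club: the king of clubs must be counted only once.
p_king_or_club = probability(lambda card: card[0] == "K" or card[1] == "clubs", DECK)

# King, remove it, then king again: all ordered pairs of two different cards.
two_cuts = list(permutations(DECK, 2))
p_two_kings = probability(lambda pair: pair[0][0] == "K" and pair[1][0] == "K", two_cuts)

# Four aces in the exact order hearts, clubs, diamonds, spades.
ace_orders = list(permutations(SUITS))
target = ("hearts", "clubs", "diamonds", "spades")
p_ace_order = probability(lambda order: order == target, ace_orders)

for name, p in [("red", p_red), ("spade", p_spade), ("king", p_king),
                ("king or club", p_king_or_club), ("two kings", p_two_kings),
                ("ace order", p_ace_order)]:
    print(f"{name}: {p} = {as_percent(p):.2f}%")
```

Counting for "king or club" gives 4 + 13 - 1 = 16 favorable cards, since the king of clubs belongs to both groups, and the ace-order count matches the one in 24 worked out in the conversation.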