0:01

So far in this unit, we worked with large samples, where the success/failure condition was met.

But what if it's not met?

Then what comes to our rescue is inference via simulation.

We did a little bit of this earlier in the class, when we worked through the gender discrimination example.

So in this video, we're going to review how we set up a simulation assuming that the null hypothesis is true.

Because, remember that if we're doing any sort of hypothesis

test where the ultimate goal is to get a p-value, the definition

of the p-value stays regardless of what type of method you're using.

It's always the probability of an observed or more extreme outcome, given that the null hypothesis is true.

So we want to make sure that throughout our hypothesis test, we act as if the null hypothesis is true, and what that means is that we set up a simulation scheme that assumes the null hypothesis is true.

0:53

Let's give a quick example.

Remember this guy, Paul the Octopus, who became famous for correctly predicting the outcomes of soccer games during the 2010 World Cup.

The setup was that he was given two boxes, each with a little bit of food in it.

The boxes bore the flags of the countries playing against each other in the World Cup that day.

His prediction was taken to be whichever box he chose to get the food out of.

He became famous because he predicted all eight of his World Cup games correctly.

We want to see, does this provide

convincing evidence that Paul actually has psychic powers?

In other words, that he does better than just randomly guessing.

Because in his setup he had only two countries to choose from, if he were randomly guessing, he would be expected to guess right 50% of the time.

So the null hypothesis, which claims that no, he does not have psychic powers, he's simply randomly guessing, sets the true proportion of success to 0.5.

If he's doing better than randomly guessing, then

the alternative hypothesis should say that p is greater

than 0.5.

We know that the sample size, or the number of trials, is eight, and Paul the Octopus guessed all of them correctly, so p-hat is one, or 100%.

Let's check to see if the conditions for inference are met here.

In terms of independence, it seems reasonable to assume that his guesses are independent of each other from one game to the next.

In terms of sample size and skew, we need to check our success/failure condition.

We have eight trials times 0.5, our null value, which gives us four expected successes, fewer than the ten we need.

So it appears that our success/failure condition is not met.

Meaning that the distribution of sample proportions

cannot be assumed to be nearly normal.

Which means that we cannot use methods that rely on the central limit

theorem and the normality of the sampling distribution to find our p-value.
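As a quick sketch of this arithmetic in Python (the threshold of at least 10 expected successes and 10 expected failures is the usual rule of thumb assumed here):

```python
# Success/failure condition check for Paul's data (illustrative sketch).
n = 8      # number of trials
p0 = 0.5   # null value for the proportion of correct guesses

expected_successes = n * p0        # 8 * 0.5 = 4
expected_failures = n * (1 - p0)   # 8 * 0.5 = 4

# Rule of thumb: both counts should be at least 10 for the normal approximation.
condition_met = expected_successes >= 10 and expected_failures >= 10
print(condition_met)  # False: we cannot rely on the nearly normal distribution
```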

And this is when, once again, simulation-based inference comes to our rescue.

3:11

So how do we do simulation based inference?

Let's try to remind ourselves.

Remember the ultimate goal of the hypothesis test is a p-value.

And the p-value is the probability of an observed or more extreme outcome, given that the null hypothesis is true.

So what we want to do is to devise a

simulation scheme that assumes that the null hypothesis is true.

And we want to repeat the simulation

many times and record the relevant sample statistic.

Finally, we calculate the p-value as the proportion of simulations that yield a result at least as favorable to the alternative hypothesis as the observed result.

For those of you that remember the examples we

did earlier in terms of inference via simulation, these steps should make sense.

For those of you who do not remember them, then please revisit these

steps once again after we go through the calculations for this particular example.
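These steps can be sketched as a small helper function. This is a Python illustration only, not the course's R inference function; the function name, defaults, and seed are made up for this sketch:

```python
import random

def simulation_p_value(simulate_once, observed, n_sims=10_000, seed=1):
    """Generic simulation-based p-value for a one-sided 'greater' alternative.

    simulate_once(rng) should generate one sample statistic assuming the
    null hypothesis is true.
    """
    rng = random.Random(seed)
    # Repeat the simulation many times and record the relevant statistic.
    stats = [simulate_once(rng) for _ in range(n_sims)]
    # Proportion of simulated statistics at least as extreme as the observed one.
    return sum(s >= observed for s in stats) / n_sims
```

For Paul's example, simulate_once would flip eight fair coins and return the proportion of heads.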

So given that our null value is 0.5, how do we set up a simulation scheme?

We can use a fair coin and label heads as successes; these are our correct guesses.

We could also use tails, but in this case we're choosing heads.

And one simulation consists of flipping the coin eight times and recording the proportion of heads, the correct guesses.

Remember we're trying to simulate whatever Paul did as many times as

possible, and we need to think of his eight trials as one batch.

And at each simulation we want to recreate that batch of eight trials and calculate his rate of success.

Remember his rate of success was one.

And we're going to see, if we leave things up to chance, what the rate of success comes out to be.
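One batch of this scheme might look like the following sketch in Python (the seed is arbitrary, chosen only so the illustration is reproducible):

```python
import random

rng = random.Random(42)  # arbitrary seed for reproducibility

# One simulation: flip a fair coin eight times; heads counts as a correct guess.
flips = [rng.random() < 0.5 for _ in range(8)]
p_hat = sum(flips) / 8  # the simulated rate of success for this batch
print(p_hat)
```

Repeating this many times and collecting each p-hat builds up the simulated distribution of sample proportions.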

5:12

So let's take a look to see how we can actually do this.

In our first simulation, we said that we're going to flip a coin eight times,

so let's flip the coin once, and it seems like we get a head first.

So that would be a success.

We can flip the coin one more time, another head.

Another flip of the coin, another head.

Another flip of

the coin, a tail. Another flip of the coin, another head.

We have three more to go in order to get to our eight tosses.

5:41

One more, another head.

Another head, and lastly yet another head. So in this case, our

sample proportion, or the proportion of success, is seven over eight or 0.875.

We record this number, and we're going to collect these on the dot plot at the bottom of the screen.

For the second simulation, we once again have eight slots, and

we toss the coin eight times and we record the outcomes.

And in this case we have three out of eight heads.

So our proportion of success here would be 0.375, and

we record that number on our dot plot as well.

Another simulation,

yet another set of eight flips, and then we

will want to count how many of those were heads.

So that seems like five out of eight, and we want to record that number as well.

We could keep doing this for a long time, and we would want to do it as many times as possible, but for illustrative purposes, we're only going to do ten simulations.

So let's say that at each iteration we're collecting these data, the simulated p-hats, and finally, when we get to the last simulation, we flip the coin eight more times.

Seems like we have six out of eight heads, for 0.75.

So this is what our simulated distribution looks like for p hat.

Obviously, if we had actually done a lot of simulations, as we should, the simulated distribution would look somewhat different.

We would definitely have more observations, and the shape would probably be similar to this, but ten simulations is definitely not sufficient to make a call.

However, let's work from this, and from the definition of the p-value as the probability of an observed or more extreme outcome.

In this case, our observed outcome was 100% success.

So the p-value can be defined as the probability of 100% or more successes, which doesn't even exactly make sense, given that the true rate of success was only 50%.

We don't have any data, any simulated sample proportions that

actually fit the bill, so based on this simulation, our p-value is zero.

It's usually a good idea to say that it's almost zero instead.

And chances are if we had actually done

this properly with about 10,000 or so simulations

we would get a number that's small, which

would probably also yield a rejection of the

null hypothesis, but it may not be exactly zero.

8:17

Of course, when we're thinking 10,000 simulations, we would never think of doing that by hand; we would use R for it.

So, the first thing we want to do is load our inference function, which you should have been using in the labs, and which we're going to use to run the simulation test.

We can then define what the data from Paul the Octopus look like: eight yeses and zero nos.

And finally, we can write our inference function call: we're estimating a proportion, we're doing a hypothesis test using a simulation method, and we're calling the outcome yes a success.

Our null value is 0.5, and our alternative hypothesis is that the parameter is greater than 0.5.

In this case, the p-value with 10,000 simulations, which is the default for this

function, comes out to be 0.0037, meaning

that again we would reject the null hypothesis.
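The inference function here comes from the course's R labs; as a rough cross-check, the same computation can be sketched in Python. The seed is an arbitrary assumption of this sketch, and the resulting p-value will vary slightly from run to run around the theoretical value of 0.5 to the eighth power, about 0.0039:

```python
import random

rng = random.Random(2015)  # arbitrary seed so the sketch is reproducible
n_sims = 10_000

# Each simulation: eight fair-coin flips, recording the proportion of heads.
sim_p_hats = [sum(rng.random() < 0.5 for _ in range(8)) / 8
              for _ in range(n_sims)]

# p-value: proportion of simulated p-hats at least as large as the observed
# p-hat of 1 (Paul's eight out of eight correct guesses).
p_value = sum(p >= 1.0 for p in sim_p_hats) / n_sims
print(p_value)
```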

Let's think here though, what does

rejection of the null hypothesis here mean?

Does it mean that we found evidence that Paul is psychic?

Probably not, and chances are we've made some sort of

an error where the null hypothesis should not have been rejected.

We had a pretty small sample size that appeared to show a trend in a certain direction, and those particular data yielded a small p-value, based on which, yes, we would definitely reject the null hypothesis.

But we might be making a Type 1 error: rejecting a null hypothesis that says this octopus simply picks randomly, when we shouldn't have.

One possibility would be to collect a little more data from Paul the Octopus, but unfortunately he passed away shortly after he became a sensation.