Previously we talked about statistical significance.

But in general, in genomic studies,

you're often considering more than one data set at a time.

In other words, you might be analyzing the expression of every one of the genes

in your body, or you might be looking at hundreds of thousands or millions of

variants in the DNA, or many other multiple testing type scenarios.

So in these scenarios what you're doing is you're calculating a measure of

association between, say, some phenotype that you care about, say cancer

versus control, and every single data set that you collected.

Say, a data set for each possible gene.

So in this case what's happened is people are still applying

the hypothesis testing framework.

They're using P-values and things like that.

But the issue is that, that framework wasn't built for doing many,

many hypothesis tests at once.

So if you remember when we talked about what a P-value was,

it's the probability of observing a statistic as or more extreme

than the one you calculated in the original sample.

And so one property of P-values that's very important, and

that we should pay attention to, is that if there's nothing happening,

suppose that there's absolutely no difference between the two groups that

you're comparing, then the P-values are what's called uniformly distributed.

So this is a histogram of some uniformly distributed data.

On the x-axis you see the P-value, and

on the y-axis is the frequency of P-values that fall into each bin.

And so, this is what the uniform distribution looks like.

And so, what a uniform distribution means is that 5% of the P-values will be

less than 0.05.

20% of the P-values will be less than 0.20, and so forth.

In other words, when there is no signal, the P-value distribution is flat.
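This property is easy to check by simulation. The sketch below is illustrative (not from the lecture); it assumes a simple z-test with known variance, simulates many two-group comparisons where the null is true, and checks that roughly 5% of the P-values fall below 0.05:

```python
import random
from statistics import NormalDist, mean

random.seed(42)
n, n_tests = 50, 2000  # samples per group, number of null tests
norm = NormalDist()

pvals = []
for _ in range(n_tests):
    # Both groups come from the SAME distribution: the null is true.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # z-test with known variance 1; under the null, p ~ Uniform(0, 1).
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5
    pvals.append(2 * (1 - norm.cdf(abs(z))))

frac_below_05 = sum(p < 0.05 for p in pvals) / n_tests
print(round(frac_below_05, 3))  # should be close to 0.05
```

Plotting a histogram of `pvals` would reproduce the flat shape described above.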

So what does that mean?

How does that sort of play a role in a multiple testing problem?

And so here's an example with a cartoon.

Imagine that you're trying to investigate whether

jelly beans are associated with acne.

So what you could do is, you could perform a study where you compare

people who eat a lot of jelly beans and

people who don't eat a lot of jelly beans and look to see if they have acne or not.

And so if you do that, you probably won't find anything.

And so, at the first test, people go ahead and collect the data on the whole sample,

they calculate the statistic, the P-value's greater than 0.05, and they conclude

there's no statistically significant association between jelly beans and acne.

But then you might consider, oh, well, it might be just one kind of jelly bean.

So you could go back and test brown jelly beans and yellow jelly beans and so

forth, and in each case, most of the time, the P-value would be greater than 0.05.

And so it would not be statistically significant, and you wouldn't report it.

But then, since P-values are uniformly distributed,

about one out of every 20 tests that you do,

even if there's absolutely no association between jelly beans and

acne, will still show up with a P-value less than 0.05.

And so a danger is that you do these many, many, many tests and

then you find the one with a P-value less than 0.05 and you just report that one.
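You can put a number on that danger. Under independence, the chance that at least one of m true-null tests comes up with P < 0.05 is 1 − 0.95^m; a small illustrative sketch (not from the lecture):

```python
# Chance of at least one false positive among m independent tests,
# each done at level alpha, when every null hypothesis is true.
def prob_at_least_one_false_positive(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

# With 20 jelly bean colors, a "significant" color is more likely than not:
print(round(prob_at_least_one_false_positive(20), 3))  # 0.642
```

So even with no real association, testing 20 colors gives better-than-even odds of at least one "discovery."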

So here's an example where there's a news article saying that green jelly beans have

been linked to acne.

So that's, again, reporting a statistical

significance measure that was designed for performing one hypothesis test,

when in reality many were performed.

So how do we deal with this?

How do we adapt the hypothesis testing framework

to the situation where you're doing many hypothesis tests?

So the way that we do that is with different error rates.

So the two most common error rates that you'll probably hear about when doing

a genomic data analysis are the family-wise error rate and

the false discovery rate.

So the family-wise error rate says that if we're going to do many,

many hypothesis tests, we want to control

the probability that there will be even one false positive.

This is a very strict criterion.

If you find many things that are significant at

a family-wise error rate that's very low,

you're saying that the probability of even one false positive is very small.
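One standard way to control the family-wise error rate, not named in the lecture but widely used, is the Bonferroni correction: with m tests, call a result significant only when its P-value is below alpha / m. A minimal sketch:

```python
def bonferroni_significant(pvalues, alpha=0.05):
    """Return a parallel list of booleans: True where the test is
    significant after the Bonferroni correction (p < alpha / m)."""
    m = len(pvalues)
    return [p < alpha / m for p in pvalues]

# Four tests: only the very small P-value survives the correction,
# since the per-test threshold becomes 0.05 / 4 = 0.0125.
pvals = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_significant(pvals))  # [True, False, False, False]
```

Note how 0.04, which would be "significant" in a single test, is no longer significant once the threshold accounts for all four tests.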