Just like biology has a central dogma, statistics, and

in particular statistical inference, has a central dogma as well.

The central dogma is the central idea that

explains what you're trying to do in the field.

And so, the central dogma of statistics has to do with this specific problem.

Suppose you have a huge population, like you see in the top left-hand corner, and

you might want to know something about that population.

In this case, it's an idealized example, so we might want to know how many pink and

how many gray symbols there are.

So in general, the problem might be that measuring the whole population, or

taking measurements on the whole population, might be really expensive, or

it might be very hard to do for a number of different reasons.

And so, what we want to do is take advantage of, basically, probability to be

able to say something about the population without measuring the whole population.

So what we do is

use probability to take a small sample from that population.

You may have heard of a random sample.

There are a number of different ways you can use probability to get this sample, but

the idea is that you would like it to somehow represent the larger population.
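
As a rough illustration of using probability to pick the sample, here is a minimal Python sketch of a simple random sample, where every member of the population is equally likely to be chosen. The population size, the 600/400 pink-to-gray split, and the function name `draw_simple_random_sample` are all assumptions made up for this example; the lecture does not give them.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Draw a sample without replacement; every member is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population: the 600/400 split is an assumption for illustration.
population = ["pink"] * 600 + ["gray"] * 400

# Take a small sample of three symbols, as in the lecture's picture.
sample = draw_simple_random_sample(population, 3, seed=1)
print(sample)
```

Simple random sampling is only one of the ways to use probability here; it is chosen because it is the easiest to write down.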

So once we've taken that sample, we can make measurements on the smaller

number of objects that we've collected here.

So we have these symbols in the lower right hand corner,

there are only three of them, so it might be relatively cheap,

or relatively easy, to take measurements on them.

So we see that there are two pink symbols and one gray symbol, and

so then what we use is statistical inference

to make a guess about what the population looks like.

So we might say, on average there are going to be more pink symbols

than there are gray symbols in the whole population,

because that's what happened in our sample.

And if we did the probability sampling right,

then that best guess might be pretty good.
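
The guess described here can be sketched in a few lines of Python: count the colors in the sample and use the sample fractions as the estimate for the population. The sample of two pink and one gray is the one from the lecture; the function name `estimate_proportions` is a hypothetical helper, not something from the course.

```python
from collections import Counter


def estimate_proportions(sample):
    """Estimate each color's population share by its share in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return {color: count / n for color, count in counts.items()}


# The sample from the lecture: two pink symbols and one gray symbol.
sample = ["pink", "pink", "gray"]

estimates = estimate_proportions(sample)
# The best guess is that the more common color in the sample
# is also the more common color in the population.
best_guess = max(estimates, key=estimates.get)

print(estimates)
print(best_guess)
```

Using the sample fraction as the estimate is the simplest choice; it only makes sense if the sample was drawn with probability, as described above.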

Another important component of the central dogma of statistics is that

this best guess isn't quite enough.

So, we took a sample, we didn't actually measure everything in the population,

we only measured a subset.

So, it turns out that our best guess is actually, potentially,

quite variable.

And so, it could be that the best guess is off in one direction, and

we might actually have more gray symbols in the population.

Or, it could be off in the other direction, and

there might be even more pink symbols in the population.
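
One way to see this variability is to repeat the sampling many times and look at how much the guess moves around. The Python sketch below does that under assumed conditions: a hypothetical population of 600 pink and 400 gray symbols, samples of size three, and 1000 repeats. None of these numbers come from the lecture, and `pink_fraction_across_samples` is a made-up helper name.

```python
import random
from collections import Counter


def pink_fraction_across_samples(population, sample_size, n_repeats, seed=0):
    """Repeat the sampling and record the pink fraction seen each time."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_repeats):
        sample = rng.sample(population, sample_size)  # without replacement
        fractions.append(sample.count("pink") / sample_size)
    return fractions


# Hypothetical population: the 600/400 split is an assumption for illustration.
population = ["pink"] * 600 + ["gray"] * 400

fractions = pink_fraction_across_samples(population, 3, 1000)

# Tally how often each estimate occurred across the repeated samples.
tally = Counter(fractions)
for value in sorted(tally):
    print(f"pink fraction {value:.2f}: {tally[value]} of 1000 samples")
```

With only three symbols per sample, the estimate can only be 0, 1/3, 2/3, or 1, so some samples will suggest mostly gray and others mostly pink even though the population itself never changes.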