Hi, my name is Brian Caffo and this is Mathematical Biostatistics Boot Camp,

Lecture twelve, on Bootstrapping. Today, we're going to talk about the tool

of bootstrapping, which is an incredibly useful, handy technique in statistics that

you can use in a variety of settings. Its development sort of coincided with the

personal computer revolution. And so, it gives us a way to avoid an

awful lot of mathematics in biostatistics. Before we talk about the bootstrap, we're

going to talk about the jackknife, which is a precursor to the bootstrap.

So, let's talk about the jackknife here

briefly, before we go into the bootstrap. The jackknife is, exactly as its name

suggests, a handy little tool. The bootstrap, on the other hand, is like an

entire workshop of tools. The key idea in both the jackknife and the

bootstrap is, to use the data, so called resampling of the data, to get at

quantities that are difficult to get at otherwise.

For example, variances, and biases, and that sort of thing.

Now, we don't need either the bootstrap or the jackknife for something

like the sample mean, where we know all its theoretical properties.

But, for other less obvious statistics, we need something that does it for us, and,

you know, it'd be preferable if that something didn't require a year of

mathematics just to get us to the starting point.

And that's the appeal of the bootstrap: you dream up a statistic or something like

that, and if you want to estimate a standard error

for it, you can start bootstrapping it immediately.

So, let's talk a little bit about what the

jackknife does before we begin with the bootstrap because, sort of historically,

the jackknife came first. The first use of the jackknife was by the

statistician, and I may

butcher his name, but I think it's pronounced

Quenouille. And he used the jackknife to estimate

bias, I believe. Then, the jackknife was really popularized

and further refined by the extremely well-known statistician, John Tukey, who

we talked about a little bit in the lecture on plotting.

Tukey had numerous inventions, including the fast discrete Fourier transform.

He coined the term bit for binary digit. He was the first person to do that,

and he did lots of things. He invented the box plot.

I think when you see it, you'll conclude along with me that the jackknife's a handy and

incredibly clever thing for someone to think of.

So, the idea behind the jackknife, similar to the idea behind the bootstrap,

is this: you have something you don't know, like the bias of a statistic, or the

standard error of a statistic, and the idea is to use the data to get a sense of

it. Well, what the jackknife does is it says,

okay, well, one way to get at these quantities is to take one of the

observations out, and then formulate the statistic on the remainder and see how

well the statistic does, you know, at estimating that one pulled out

observation. And this is very related to an idea that,

you know, you frequently hear of in machine learning and statistical

prediction, so-called cross validation. The jackknife tends to have a different

goal, in that the goal of the jackknife tends

to be bias estimation or variance estimation.

But the principle is very similar in that you're deleting observations.

Leave one out cross validation is typically used as an estimate of

prediction error. So, anyway, let's just focus on the

jackknife. And if you take classes in machine

learning or something like that, you'll talk about cross validation.

The jackknife deletes one observation and calculates whatever estimate you're

thinking of based on the remaining n - 1 of them.

And then, it repeats this, leaving out each observation one at a time, so that you get n estimates,

each based on n - 1 observations.

It uses these n estimates to do something like estimate biases and standard errors.

And again, no, we don't need this for the sample mean. We know that the sample mean

is unbiased under certain assumptions, and we know exactly what the standard error of

the sample mean is under the standard setting. So, the jackknife isn't necessary

for those settings, but it's maybe necessary for other ones.

So, let's just consider the jackknife for univariate data.

And let's let x1 to xn be a collection of univariate data points where we want to

estimate a parameter theta. And so, let's let theta hat be the estimate

based on the full data set. And then, let's let theta hat sub i

be the estimate of theta that you obtain when you use the n - 1 observations

obtained by deleting observation i. And then, let's let theta bar be the

average of the leave-one-out estimates. So, with that notation in mind, the

jackknife estimate of the bias of our statistic theta hat is just n - 1 times the difference, theta

bar minus theta hat. So, let's kind of consider the principle

of this before we get to why in the world that n - 1 is there.

So, theta hat is our estimate. Looking at how close it is to the average

of the estimates where we deleted an observation each time

is going to give us a sense of, kind of, population level bias.

And then, you might wonder, where in the world does this n - 1 come from?

It's a factor that's calibrated on statistics that we actually know. For example, for the sample variance, where you

can work out the bias of the sample variance directly, it would give you the correct

answer.

So again, this estimate is really related

to how far the average delete-one estimate is from the actual estimate.

And then, this n - 1 is just a factor that turns out to be a good choice for

the multiplier needed to make this difference an estimate of the true

bias. And then, the jackknife estimate of the

standard error is the square root of n - 1 over n times the sum of the squared deviations of the

delete-one estimates around the average of the delete-one estimates.
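
Written out in symbols, the two jackknife estimates just described look roughly like this. The notation follows the spoken definitions (theta hat for the full-data estimate, theta hat sub i for the estimate with observation i deleted, theta bar for their average); the display itself is a transcription aid and not a reproduction of the slide.

```latex
\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i,
\qquad
\widehat{\mathrm{bias}}(\hat\theta) = (n-1)\,\bigl(\bar\theta - \hat\theta\bigr),
\qquad
\widehat{\mathrm{se}}(\hat\theta) = \left[\frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat\theta_i - \bar\theta\bigr)^{2}\right]^{1/2}
```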

So, it's sort of like the square root of n - 1 times the variance of the delete-one

estimates. So, again, what's the rationale for this

factor out front, the extra n - 1? Why not

just take the standard deviation of the delete-one estimates as an estimate of the standard

error of the statistic? Well, it turns out that the delete-one

estimates, because they have the majority of the data,

n - 1 of the data points

included, tend to be quite close to one another,

excessively close to one another. So, their spread, by itself, is not a good

estimate of the standard error of the statistic.

So, we need a factor, and it turns out that n - 1 is a reasonable factor to do

that. And the same thing is true with the bias:

the delete-one statistics tend to be a little too close to one another,

so unless you multiply the estimate up by a little bit, you don't get a

reasonable estimate. So, let's go through an example.

So, we had 630 measurements of gray matter volume from workers from a lead

manufacturing plant. The median gray matter volume wound up being

about 589 cubic centimeters. And we want to estimate the bias and the

standard error of the median. And then, I'll come back to this

discussion of jackknifing the median, because that's where we're going to move forward

to the bootstrap. So, here's the gist of the code to

do this. Now, you don't actually have to execute

this code; I'll show you in a page an easier way to do it.

But, you can do it in any language, not just R.

You just have to figure out how to delete observations one at a time.

So, let's let n just be the number of observations we have.

Theta hat is the median of these gray matter volumes.

And then, the jackknife estimates are the medians that I obtain each time I

delete the i-th observation. This sapply function does exactly that.

Then, theta bar, following the notation from the previous couple of

slides, is just the mean of these delete-one jackknife estimates.

Then, my bias estimate is going to be n - 1 times the difference between theta bar

and theta hat. And the standard error is going to be the

square root of n - 1 times the average squared deviation of the jackknife

estimates around the average of the jackknife estimates.
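
The steps just described can be sketched in a few lines. The lecture's own code is in R and is not reproduced in this transcript, so what follows is a minimal Python sketch under stated assumptions: the function name `jackknife` and the `volumes` list are made up for illustration, and the numbers are stand-ins, not the 630 gray matter volumes from the study.

```python
import math
import statistics


def jackknife(data, statistic):
    """Return (bias, standard_error) from the delete-one jackknife."""
    n = len(data)
    theta_hat = statistic(data)
    # Recompute the statistic n times, leaving out observation i each time.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = statistics.fmean(leave_one_out)
    # Bias estimate: (n - 1) times the difference between theta bar and theta hat.
    bias = (n - 1) * (theta_bar - theta_hat)
    # Standard error: square root of (n - 1) times the average squared
    # deviation of the delete-one estimates around their average.
    avg_sq_dev = statistics.fmean([(t - theta_bar) ** 2 for t in leave_one_out])
    se = math.sqrt((n - 1) * avg_sq_dev)
    return bias, se


# Hypothetical stand-in data (cubic centimeters); NOT the lecture's data set.
volumes = [560.0, 575.5, 581.2, 589.0, 590.3, 602.7, 611.4, 598.1, 570.9]
bias, se = jackknife(volumes, statistics.median)
print(f"jackknife bias = {bias:.3f}, jackknife se = {se:.3f}")
```

As a sanity check on the formulas, passing the sample mean as the statistic should give a bias of zero and a standard error equal to the usual standard deviation over root n.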

And then, on the next page, it's a lot easier to do this. [laugh] If you want to

just use the software in the bootstrap library, you can call the

jackknife function. Its arguments are the list of my gray matter volumes and the function I want to

calculate the jackknife estimate of, which is the median.

And then, I assign that to a variable, out, and then I pick out the standard error and the

bias calculation. Both methods yield an estimated bias of

zero and a standard error of 9.94.

And there's an odd little fact here. The jackknife tends to work well for sort

of smooth functions of the data, and empirical quantiles often don't satisfy that

requirement. The median is an example.

The jackknife estimate of the bias for the median is

always zero when the number of observations is even.
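
To see why that happens, here is a small sketch with made-up numbers (the helper name `jackknife_bias` and the data are assumptions for illustration). With an even number of observations, deleting any point from the lower half leaves the upper middle value as the median, and deleting any point from the upper half leaves the lower middle value, so the delete-one medians average back to the full-sample median.

```python
import statistics


def jackknife_bias(data, statistic):
    """Delete-one jackknife bias estimate: (n - 1) * (theta_bar - theta_hat)."""
    n = len(data)
    theta_hat = statistic(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = statistics.fmean(leave_one_out)
    return (n - 1) * (theta_bar - theta_hat)


# Made-up data with an even number of observations (n = 6).
even_sample = [3, 8, 1, 10, 6, 15]
# Sorted: 1, 3, 6, 8, 10, 15, so the full-sample median is (6 + 8) / 2 = 7.
# Deleting any of the three smallest values leaves a median of 8;
# deleting any of the three largest leaves a median of 6.
# Those six delete-one medians average to 7, matching the full-sample median.
print(jackknife_bias(even_sample, statistics.median))
```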

So, the median's an example where the jackknife isn't that good of a thing to

do. In general, if

the estimate that you're getting is a nice smooth function of the data, then the

jackknife will work fine. But, if it's not, then it tends to work pretty poorly.

In fact, there was a very well known paper by Efron, the inventor of the bootstrap,

that illustrated this quite starkly. And the jackknife has been shown to be a

linear approximation of the bootstrap. So, if you're in some setting where it's

going to be difficult to program up the bootstrap, then doing a jackknife, which

is a pretty simple thing to do, is a handy little tool to use.

And then, just to remind you, you know, don't use the jackknife for sample

quantiles. It's a handy procedure and it works in a

lot of settings, but maybe not for sample quantiles, like the median, as it's been

shown to have some poor properties. And what could you possibly use then?

Well, why not try to use the bootstrap? So, let's move on to the bootstrap, which

is maybe a little bit more of a complete toolbox, but it's certainly a little less

compact of a tool than the jackknife, in exactly the way the analogy to the tools

sounds like. By the way, the term bootstrap comes from

this idea of pulling oneself up by one's own bootstraps, right?

And, you know, of course, this has been discussed a lot.

It's kind of an unfortunate title for a statistical procedure, because it makes it

sound like the information's coming from nowhere,

right? Because you can't pull yourself up by

your own bootstraps. It's physically impossible.

But, you know, there's been plenty of theoretical work that shows where the

information is coming from in the bootstrap and, sort of, when it is

applicable. Another thing I would note is that this idea of

pulling oneself up by one's own bootstraps is from the fable of Baron

Munchausen. And so, there's a great movie called The Adventures of Baron Munchausen.

And it was done by some of the people who made the Monty Python series.

If you get a chance, you should, you know, in honor of this lecture, watch the Baron

Munchausen movie. But, at any rate, that fable is where

the term, pulling oneself up by one's own bootstraps, comes from.

And then, that's where they got the idea for the name of this procedure.

At any rate, back to the jackknife.

So, another way to think about the jackknife is this idea of so-called

pseudo observations. So, if you take n times theta hat minus the quantity

n - 1 times theta hat sub i, you can kind of think of these as whatever

observation i contributes to the estimate of theta.

And then, notice that if theta hat is the sample mean, then these pseudo

observations are exactly the data themselves.

So, it's sort of this idea of taking what worked in a very neat and tidy sense for

the sample mean and trying to extend the idea to other statistics.

And then, the sample standard error of these pseudo observations is the jackknife

standard error. And the mean of these pseudo observations is a

sort of bias corrected estimate of the parameter that you're interested in.

So, it takes your ordinary estimate and attempts to correct the bias.
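
The pseudo observation view can be sketched the same way. This is an illustrative Python sketch with made-up data; the function name `pseudo_observations` is an assumption, not something from the lecture. It checks the two claims above: for the sample mean the pseudo observations reproduce the data, and for another statistic their mean acts as a bias corrected estimate. The second statistic used here is the variance that divides by n, for which the corrected value comes out as the usual variance that divides by n - 1.

```python
import math
import statistics


def pseudo_observations(data, statistic):
    """Pseudo observation i: n * theta_hat - (n - 1) * theta_hat_i."""
    n = len(data)
    theta_hat = statistic(data)
    return [
        n * theta_hat - (n - 1) * statistic(data[:i] + data[i + 1:])
        for i in range(n)
    ]


sample = [2.0, 4.0, 4.0, 5.0, 9.0]  # made-up data for illustration

# For the sample mean, the pseudo observations are the data themselves
# (up to floating-point rounding).
print(pseudo_observations(sample, statistics.fmean))

# For the divide-by-n variance, the mean of the pseudo observations is a
# bias corrected estimate, and their sample standard error is the jackknife
# standard error.
pseudo = pseudo_observations(sample, statistics.pvariance)
bias_corrected = statistics.fmean(pseudo)
jackknife_se = statistics.stdev(pseudo) / math.sqrt(len(pseudo))
print(bias_corrected, statistics.variance(sample), jackknife_se)
```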

I have to admit, for my thinking about the jackknife, I kind of prefer to think about

it this way in terms of the pseudo observations than in the, sort of,

classical development of it.