0:03

Hi, my name is Stephanie and

I'm a PhD student at Duke in the Department of Statistical Science.

>> I'm Willem van den Boom,

also PhD student in the Department of Statistical Science.

And here we have Jim Berger, who is a professor in our department as well.

To start off with, how did you get interested in Bayesian statistics?

0:23

>> That was a long time ago.

My original research was in shrinkage estimation, which arose from a famous result of Charles Stein about 50 years ago, where Stein discovered that the very commonly used least squares estimator of, say, regression parameters was not optimal from a frequentist perspective if there were three or more regression parameters.

0:44

This caused a big flurry in the community, and there was a lot of research on this.

The curious part about this shrinkage estimator is that you had to shrink, say, the least squares estimator towards some point.

And as soon as I started doing research on it, the question arose: which point do I shrink to? Because I could pick any point. And quickly I started saying, well, you have to use Bayesian prior knowledge to decide where to shrink, and that was my first interest in Bayesian analysis.
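The Stein phenomenon described above is easy to see numerically. The following is a minimal sketch (not part of the interview) of the classic James-Stein estimator, shrinking toward the point zero and assuming unit-variance normal errors; in simulation its total squared error beats plain least squares whenever there are three or more parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 10           # number of parameters (Stein's result needs p >= 3)
n_reps = 5000    # simulation replications
theta = np.zeros(p)  # true parameter vector; zero is also the shrinkage target

mse_ls = 0.0
mse_js = 0.0
for _ in range(n_reps):
    # "Least squares" estimate here: one noisy observation of theta
    x = theta + rng.standard_normal(p)
    # James-Stein: shrink x toward 0 by a data-dependent factor
    shrink = 1.0 - (p - 2) / np.sum(x ** 2)
    js = shrink * x
    mse_ls += np.sum((x - theta) ** 2)
    mse_js += np.sum((js - theta) ** 2)

mse_ls /= n_reps  # close to p, the frequentist risk of least squares
mse_js /= n_reps  # strictly smaller; near 2 when theta equals the target
```

The interesting point, as in the interview, is that the target of the shrinkage is a choice, and prior knowledge about where the true parameters lie is what makes that choice sensible.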

1:10

>> I see. So in your decision theory book, you mentioned that via writing the book you became a rabid Bayesian. Why is that?

>> That was more for, I guess I would call it philosophical reasons,

where I was looking at foundations of statistics and

one of the things I encountered was something called the Likelihood Principle

which is my favorite principle in statistics.

Basically, what it implies is that these common frequentist measures of statistics are not appropriate.

Now, that by itself is not a big deal, except for the fact that the Likelihood Principle was shown to follow from two other principles which everybody believed in.

One was the sufficiency principle, and the other is the conditionality principle. The conditionality principle basically says: suppose you're given one of two different measuring instruments at random and you measure something with it, and one of them has variance 1 and the other has variance 3. If you get the measuring instrument with variance 3, do you report variance 3 as your error, or do you say, I could have gotten the other instrument, and report (1 + 3)/2? Now, that second option seems nuts to everybody.

Obviously, you should report the error of the measuring instrument you were actually given. But if you simply believe that, then together with sufficiency it implies the Likelihood Principle, which in turn more or less throws out all of frequentist statistics.
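The arithmetic in the two-instruments example can be checked with a quick simulation (my sketch, not part of the interview): a coin flip hands you the variance-1 or variance-3 instrument, and conditioning on which instrument you actually got gives a different error than averaging over both.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Randomly hand out one of two instruments: variance 1 or variance 3
use_second = rng.random(n) < 0.5
sd = np.where(use_second, np.sqrt(3.0), 1.0)
measurements = sd * rng.standard_normal(n)

# Unconditional error, averaging over which instrument was given:
var_all = measurements.var()                       # near (1 + 3) / 2 = 2

# Conditional error, given the variance-3 instrument was used:
var_given_second = measurements[use_second].var()  # near 3
```

The conditionality principle says the conditional number, 3, is the one to report; the unconditional average of 2 describes a repeated experiment that never happened.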

And so, at that point I was astonished, and sort of began to more heavily

embrace the Bayesian world and started using it in applications.

>> So what applications have you found where Bayesian statistics proved useful?

3:02

>> One, Bayesian analysis is often associated with the use of prior information,

like I just mentioned earlier.

And one application, as an example, that I did 20 years ago was something where you had to use prior information to solve the problem.

It was a problem we worked on with, actually, all of the automotive companies in Michigan. It was a problem of trying to assess how much fuel economy gain was possible; this had to do with government regulations.

We built hierarchical models. We built a huge, complicated hierarchical model with all sorts of different vehicles, all sorts of different manufacturers, all sorts of different car parts that might affect fuel efficiency.

And then, after we had done this, we found that there was a part of the problem that we had no data about.

And so, we had to go back to the engineers and do about two months of elicitation of their expert knowledge in order to complete the problem.

So here was a problem that would have been impossible to do without Bayesian analysis, both because the subjective elicitation was necessary and because the super complex hierarchical model could not have been handled except in the Bayesian way.

A second class of problems, I would say, are problems that are, as I just mentioned, simply easier to do in a Bayesian way.

4:26

Variance component problems, for instance, are very prominent in, say, the chemical industry. And with variance component models it's very straightforward to do a Bayesian analysis, but it's very difficult to do other kinds.

For instance, maximum likelihood is a very common alternative approach

4:44

that's used by non-Bayesians.

But the likelihoods in variance component problems are very nasty. You often have modes at zero and things like that, which make them really difficult to analyze.
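The boundary-mode problem can be illustrated with a tiny sketch (mine, not from the interview) of a one-way random effects model, assuming for simplicity a known unit error variance and some handcrafted group means: when the groups look more alike than the noise alone would predict, the likelihood of the between-group variance is maximized exactly at zero.

```python
import numpy as np
from math import pi

# One-way random effects model: y_ij = mu + b_i + e_ij,
# with b_i ~ N(0, v), e_ij ~ N(0, 1), and n observations per group.
# With the error variance known, the likelihood for the between-group
# variance v depends on the data only through the group means, which
# are distributed N(mu, v + 1/n).
n = 10
group_means = np.array([0.1, -0.1, 0.05, -0.05, 0.0])  # illustrative data
mu = group_means.mean()

def log_lik(v):
    """Log-likelihood of the between-group variance v (error variance = 1)."""
    s2 = v + 1.0 / n
    return float(np.sum(-0.5 * np.log(2 * pi * s2)
                        - 0.5 * (group_means - mu) ** 2 / s2))

# The log-likelihood falls as v moves away from 0, so the maximum
# sits exactly on the boundary v = 0 - the awkward situation that
# makes maximum likelihood unpleasant for variance components.
grid = [0.0, 0.05, 0.1, 0.5]
vals = [log_lik(v) for v in grid]
```

A Bayesian analysis sidesteps this by integrating over v with a prior rather than maximizing, so a mode sitting on the boundary causes no special difficulty.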

So there are a lot of problems that fall in that category, which are so much simpler to do in a Bayesian way.

A third class I guess I would call understanding statistics. When you get to problems like testing, non-statisticians don't understand what common statistical tests mean, for instance the use of P values. They don't understand what that means.

The Bayesian answers, like the posterior probability that the hypothesis is true, are so much simpler and easier to understand.

And I think using Bayesian analysis on these problems just makes all

the non-statisticians more capable of doing good statistics,

because they understand it.

Then, there's entire fields of research that are almost entirely Bayesian.

The one I've been working in for the last ten years is called uncertainty quantification of simulation models. And that's become essentially completely Bayesian, for the reason that it's essentially the only way to do the problem.

>> So what is uncertainty quantification of simulations?

6:23

>> I started working, again, with the car companies on this, with General Motors.

Their goal was to essentially create, in the computer, models of vehicles and the different parts of vehicles, so they could experiment on the computer rather than having to build $200,000 vehicle prototypes to test.

6:41

So a simulation model in this vein is a big, usually applied-math, computer program that's very intensive to run.

And the question is,

is this computer model a good representation of reality or not?

And to answer that question you have to involve all sorts of data,

all sorts of statistics, and you just can't do it in a non-Bayesian way.

7:06

>> How can misuse of P values contribute to the lack of

reproducibility of research?

>> I mentioned that in testing, P values are not well understood.

And if you ask a non-statistician what a P value of 0.05 means,

they will typically tell you that it means some kind of error.

It's the probability that you're making a mistake in rejecting the hypothesis or

they might say, it's the probability the hypothesis is true.

They always think of it as an error.

It's not.

It's something completely different.

And if you look at true errors, which are either Bayesian or what are called conditional frequentist, a P value of 0.05 corresponds to a true error of about 0.3.
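The "0.05 corresponds to about 0.3" correspondence can be reproduced with a published calibration due to Sellke, Bayarri, and Berger, which bounds the Bayes factor for the null by -e * p * log(p). The sketch below (assuming equal prior odds on the two hypotheses) converts that bound into a posterior probability that the rejected null is actually true.

```python
from math import e, log

def error_prob(p):
    """Lower bound on the chance the rejected null is actually true,
    assuming equal prior odds and p < 1/e (the -e*p*log(p) calibration)."""
    b = -e * p * log(p)   # lower bound on the Bayes factor for the null
    return b / (1.0 + b)  # corresponding posterior probability of the null

print(round(error_prob(0.05), 2))  # about 0.29, i.e. roughly 0.3, not 0.05
print(round(error_prob(0.01), 2))  # about 0.11, still far above 0.01
```

So even under assumptions favorable to rejection, a P value of 0.05 is compatible with close to a 30% chance that the null hypothesis is true.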

So now you see the problem.

If a scientist publishes a paper that rejects the hypothesis, supporting the theory they're trying to establish, and the P value is 0.05, they think they have only a 5% chance of being wrong.

Actually, it's a 30% chance of being wrong.

And so,

a lot of the papers published in science are simply wrong because of this reason.

In science, that often will get corrected,

because if it's an interesting result, somebody will go back and look at it and

redo the experiment and discover that that really wasn't a true theory.

8:34

In industry, on the other hand, experiments are often not reproduced for reasons of cost.

And so, you really have to get it right the first time.

And so, in industry if P values are misinterpreted,

it can be really dangerous and financially costly.

So I think that there, there are especially strong reasons to use Bayesian methods.

8:54

>> Great, well, thank you for letting us interview you today.

That was all very interesting and useful, thank you.

>> Well, my pleasure, and I hope the rest of the course goes very well.