0:03

So, this problem is particularly useful for motivating so-called Bayesian analysis. We've spent a lot of time in this class talking about frequentist analysis in the form of confidence intervals, and we've spent a fair amount of time talking about the likelihood, probably more time devoted to the likelihood than most introductory statistics courses do. So, we need to give at least some time to Bayesian statistics. Here's how Bayesian statistics works.

Bayesians have to posit a prior on the parameter of interest. The prior is a density or mass function, a probability distribution on the parameter, where the probabilities, at least in the classical Bayesian sense, represent our beliefs about that parameter. The likelihood is then the component of the Bayesian machinery that depends on the data, the objective part. And the posterior is obtained as the likelihood times the prior. This is exactly like what we saw when we were thinking about diagnostic tests. We had, say, some prior belief that a person had the condition the test was trying to diagnose; we had the data, which was the result of the test; and the posterior odds of the person having the disease wound up being the likelihood ratio times the prior odds. It's the same exact sort of relationship here: posterior equals likelihood times prior.

Now, I have to put a proportional-to sign here, because it's not exactly equal; we're off by a constant of proportionality. But it's easiest to think of it this way: we take our likelihood, we multiply it by our prior, and we get our posterior. Bayesian statistics is a very neat and conceptually clean way to think about statistics. The rub is really in specifying the prior. That's where we get into trouble in Bayesian statistics, and we'll talk a little bit about that. But mostly, in this class, we're just going to talk about the mechanics of how you go about performing a Bayesian inference. You can take later classes to delve into the specifics of all the different ways in which Bayesians can think about doing analysis.

So, let's talk about how we can specify a prior for our binomial proportion. Remember, our binomial data is discrete; it can only take values between zero and n. But the proportion that we're trying to estimate is a number that we're going to treat as if it's continuous. So, if we're going to specify a probability distribution on that parameter, it's going to have to be a continuous distribution that's bounded below by zero and bounded above by one. And ideally, it would be a nice distribution that's easy to work with. Well, there is one such distribution, called the beta distribution, and it winds up being kind of a default prior for binomial proportions. The beta density depends on two parameters, alpha and beta. Don't confuse the alpha here with the alpha earlier in the lecture that was related to the coverage rate of the confidence interval. So, the beta density depends on two parameters, alpha and beta, and it looks like this.

It involves the so-called gamma function: gamma of alpha plus beta, divided by gamma of alpha times gamma of beta, and then p raised to the alpha minus one, times one minus p raised to the beta minus one, where p is allowed to range between zero and one. The constant term out front, gamma of alpha plus beta over gamma of alpha times gamma of beta, is simply the constant of proportionality you need for the integral of p to the alpha minus one, times one minus p to the beta minus one, to come out to one. You had some problems very early on in the class where, if you had the kernel of a density, in this case p to the alpha minus one times one minus p to the beta minus one, and that kernel had a finite integral, what you had to do was divide that function by its integral over the whole range of values to get a proper density. That's exactly what people did to get the beta density. So, here is this density, and it does integrate to one.
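As a quick numerical sanity check (my own sketch, not from the lecture; the alpha and beta values here are arbitrary), you can verify that the gamma-function constant is exactly the normalizer of the beta kernel:

```python
# Check that Gamma(a+b) / (Gamma(a) Gamma(b)) normalizes p^(a-1) (1-p)^(b-1).
from math import gamma
from scipy.integrate import quad

alpha, beta = 3.0, 2.0  # illustrative parameter values

kernel = lambda p: p**(alpha - 1) * (1 - p)**(beta - 1)
kernel_area, _ = quad(kernel, 0, 1)  # finite integral of the kernel

# The constant of proportionality from the lecture:
const = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))

# Multiplying the kernel by that constant yields a proper density:
density_area, _ = quad(lambda p: const * kernel(p), 0, 1)
print(kernel_area, density_area)
```

Dividing the kernel by its own integral, as described above, is the same thing as multiplying by the gamma-function constant.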

Maybe it's a little beyond the scope of this class to verify that it integrates to one. So, let's talk about some of the properties of the beta density. The mean of the beta density is alpha over alpha plus beta. And remember, alpha and beta are positive, so alpha over alpha plus beta has to be a number between zero and one; we're good, in that the mean of the density lies in the range of values for which the density is greater than zero. The variance of the density works out to be alpha times beta, divided by the quantity alpha plus beta squared, times alpha plus beta plus one.
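The mean and variance formulas above can be checked against a library implementation (a sketch of mine; the parameter values are illustrative):

```python
# Verify the stated beta mean and variance against scipy's beta distribution.
from scipy.stats import beta as beta_dist

a, b = 4.0, 6.0  # arbitrary positive parameters

mean_formula = a / (a + b)
var_formula = (a * b) / ((a + b)**2 * (a + b + 1))

assert abs(mean_formula - beta_dist.mean(a, b)) < 1e-12
assert abs(var_formula - beta_dist.var(a, b)) < 1e-12

# The alpha = beta = 1 special case discussed next is the uniform:
# the density is the constant 1 everywhere on (0, 1).
print(beta_dist.pdf(0.3, 1, 1))
```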

And we've seen special cases of the beta density before. Take the special case where alpha equals beta equals one. Then p to the alpha minus one, times one minus p to the beta minus one, all just goes away, and the density is a constant between zero and one. We may not know what gamma of alpha plus beta over gamma of alpha times gamma of beta works out to be, but we don't need to, because we know the density is a constant density between zero and one: it has to be exactly the uniform density. So, the uniform density is a special case of the beta density.

Here, on the next slide, I plug in a bunch of different values of alpha and beta and show you the shape of the beta density. If I plug in alpha equal to beta equal to 0.5, I get something that looks like a U shape: the density heads off to infinity as p approaches both zero and one. If alpha equals 0.5 and beta equals one, it looks like this shape right here; and as beta gets larger and larger, the rate at which the density drops to zero as p approaches one gets faster. And then, of course, it just reverses itself if beta is 0.5 and alpha is one, or alpha is two. Again, here's the uniform distribution when alpha and beta are both one. If you plug in an alpha of one and a beta of two, you just get a line pointing downward; if you plug in an alpha of two and a beta of one, you get a line pointing upward. Probably the most typical-looking cases of the beta are when alpha and beta are both greater than one, where you get a hump-shaped density. If they're equal, it's centered right at 0.5, and as alpha and beta get bigger and bigger, it gets more peaked around 0.5. But by allowing alpha to be bigger than beta, or beta to be bigger than alpha, you can get a distribution that's skewed towards one or towards zero. So, you can get quite a few shapes from the beta density by playing around with alpha and beta.
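Since the plots themselves aren't reproduced here, a few density evaluations (my own sketch, using the same parameter values as the slides) confirm the shapes just described:

```python
# Evaluate the beta density at a few points to confirm the described shapes.
from scipy.stats import beta

# alpha = beta = 0.5: U-shaped, blowing up near zero and one.
assert beta.pdf(0.01, 0.5, 0.5) > beta.pdf(0.5, 0.5, 0.5)
assert beta.pdf(0.99, 0.5, 0.5) > beta.pdf(0.5, 0.5, 0.5)

# alpha = 1, beta = 2: the density 2(1 - p), a line pointing downward.
assert beta.pdf(0.2, 1, 2) > beta.pdf(0.8, 1, 2)

# alpha = beta = 2: a hump centered at 0.5, more peaked as both parameters grow.
assert beta.pdf(0.5, 2, 2) > beta.pdf(0.1, 2, 2)
assert beta.pdf(0.5, 30, 30) > beta.pdf(0.5, 2, 2)
print("shape checks pass")
```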

So, if you're Bayesian, what you need to do is pick values of alpha and beta so that the shape of the density represents your beliefs about the parameter p. And once you do that, you can start doing Bayesian analysis. So, here on the next slide: we choose values of alpha and beta so that the beta prior is indicative of our degree of belief regarding p in the absence of data. Then we use the rule that the posterior is the likelihood times the prior. And again, because we're only working up to constants of proportionality, we'll throw out anything that doesn't depend on p.

So, in this case, the posterior is proportional to the likelihood, which is p to the x, times one minus p to the n minus x; and here, when I say proportional to, I mean proportional in the parameter p. So, p to the x, times one minus p to the n minus x, that's the likelihood, where we've thrown out the binomial coefficient, n choose x, because it doesn't depend on p. Then we have the prior, p to the alpha minus one, times one minus p to the beta minus one, where we've thrown out the ratio of gamma functions because that doesn't depend on p either. Now, we multiply those together and get p to the x plus alpha minus one, times one minus p to the n minus x plus beta minus one, and the posterior is a density proportional to this form.

But look: this density is exactly a beta density. Right? It's p raised to some power minus one, times one minus p raised to another power minus one. In fact, the new alpha is just x plus the prior alpha, and the new beta is just the number of failures, n minus x, plus the prior beta.
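The update just described can be sketched in a few lines (a Python sketch of mine, rather than the course's R; the helper name is made up):

```python
# Conjugate update: Beta(alpha, beta) prior + x successes in n Bernoulli trials
# gives a Beta(x + alpha, n - x + beta) posterior.
from scipy.stats import beta as beta_dist

def posterior_params(x, n, alpha, beta):
    return x + alpha, (n - x) + beta

a_tilde, b_tilde = posterior_params(x=13, n=20, alpha=2, beta=2)
print(a_tilde, b_tilde)

# Spot-check: the posterior density is proportional to likelihood * prior,
# so their ratio is the same constant at every value of p.
ratio = lambda p: beta_dist.pdf(p, a_tilde, b_tilde) / (
    p**13 * (1 - p)**7 * beta_dist.pdf(p, 2, 2))
assert abs(ratio(0.3) / ratio(0.7) - 1) < 1e-9
```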

So, we could even tell you the ratio of gamma functions you would need to normalize this posterior into a proper density. But we don't need to do any calculations or integrals to do that. We can do it just by looking at it and saying: well, if I take a binomial likelihood, multiply it by a beta prior, and think of the result as a posterior density, then it has exactly the form of the core part of a beta density, so I know it is a beta density. So, if the posterior is a beta density with parameters alpha tilde equal to x plus alpha, and beta tilde equal to n minus x plus beta, we know lots of its properties. As an example, we know the posterior mean. So, what do I mean by posterior mean?

The posterior is the distribution of the parameter given the data. Right? The likelihood is the probability of the data given the parameter; the prior is the probability distribution of the parameter disregarding the data; so the posterior winds up being the distribution of the parameter given the data. So we can calculate, as an example, the expected value of the parameter p given the data. And because the posterior for p is a beta density, this works out to be just the expected value of a beta distribution, which we learned earlier is the alpha parameter divided by alpha plus beta. So, in this case, it's alpha tilde divided by alpha tilde plus beta tilde. Well, let's just plug in alpha tilde equal to x plus alpha, and beta tilde equal to n minus x plus beta. Here, I do some manipulations and show that it works out to be x over n, times n over n plus alpha plus beta, plus alpha over alpha plus beta, times alpha plus beta over n plus alpha plus beta. Which is a mouthful, but let me go through each term. X over n is the sample proportion; it's the MLE, it's p hat.

Let's take the second term, n over n plus alpha plus beta. That has to be a number between zero and one, because n is positive and alpha and beta are positive: we have n divided by something that's bigger than n. So we have this number between zero and one; let's call it pi. Okay? Then, alpha over alpha plus beta is the prior mean. Okay? And this last factor, alpha plus beta over n plus alpha plus beta, you can check yourself, is one minus pi, where we defined pi just a second ago. So, this equation works out to be an average of the MLE and the prior mean. Now, it's not an average in the sense of weighting both things by 0.5; it's a weighted average. Pi can be anywhere between zero and one, and one minus pi is its complement; but that's exactly a weighted average. So, let me state that in English: the posterior mean is a weighted average of the MLE and the prior mean.

Now, this is a very specific kind of average that weights the MLE differently than the prior mean, so let's look at those weights. Suppose n is really big. Then what happens to n over n plus alpha plus beta? The weight pi gets much closer to one, and hence one minus pi gets much closer to zero. So, when n is very big, this mixture weights the MLE a lot more than it weights the prior mean. In other words, as you collect more data, your prior means less and the data means more.
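The mixture identity above is easy to verify numerically (a sketch of mine in Python; the data values are illustrative):

```python
# Posterior mean = pi * MLE + (1 - pi) * prior mean, with
# pi = n / (n + alpha + beta).
def posterior_mean(x, n, alpha, beta):
    return (x + alpha) / (n + alpha + beta)

x, n, alpha, beta = 13, 20, 2, 2
pi = n / (n + alpha + beta)
mle = x / n
prior_mean = alpha / (alpha + beta)

mixture = pi * mle + (1 - pi) * prior_mean
assert abs(mixture - posterior_mean(x, n, alpha, beta)) < 1e-12

# As n grows, the weight pi on the MLE approaches one.
for bigger_n in (20, 200, 2000):
    print(bigger_n / (bigger_n + alpha + beta))
```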

What happens, on the other hand, as alpha and beta get very big and n remains constant? As alpha and beta get really big, the factor alpha plus beta over n plus alpha plus beta, which is one minus pi, gets close to one, and pi gets very small. So, what does it mean, in terms of our prior, for alpha and beta to get big? Well, if you remember back a couple of slides, as alpha and beta got bigger and bigger, the shape of the beta density got more concentrated around the mean. What that entails is that our prior belief is a lot more confident in a specific value of p. And so, what that implies is: if we are incredibly certain in our prior, that swamps the data. Right? If we're incredibly certain in our prior, that swamps the data: our MLE has very little weight and our prior has a lot of weight.

And this actually explains a lot of politics for you, for example. Right? Your opinion is a mixture of the data and your prior beliefs. If you're immovable off your prior beliefs, then it doesn't matter how much data you collect. Right? On the other hand, if your alpha and beta are quite low, then the MLE dominates the posterior mean. So, let me just rehash this, because it's an important point. The posterior mean is a mixture of the MLE, p hat, and the prior mean, and pi goes to one as n gets large. For large n, the data swamps the prior and the MLE dominates; for small n, the prior mean dominates. So, when you have very little information, you rely on your prior knowledge. The idea behind Bayesian statistics is that it should sort of generalize how science ideally works: as data becomes increasingly available, prior beliefs should matter less and less. And then again, with a prior that is degenerate at a value, that is, as alpha and beta go to infinity, so that the prior puts 100% on a specific value of p, no amount of data can overcome that prior.

So, let's also look at the posterior variance, which takes a nifty form as well. We want the variance of p given the data. In the absence of the data, p was beta with parameters alpha and beta; given the data, via the Bayesian calculation, p is again beta, with parameters alpha tilde and beta tilde. So we can calculate the variance directly: it's the variance of a beta distribution, with alpha tilde and beta tilde plugged in for alpha and beta. And here, you see I plug in x plus alpha for alpha tilde, and n minus x plus beta for beta tilde, and you get this form. Now, let me let p tilde equal x plus alpha over n plus alpha plus beta, and n tilde equal n plus alpha plus beta. Then the variance of p given x works out to be p tilde, times one minus p tilde, divided by n tilde plus one.

Which is interesting, because it's very similar, though not quite identical, to the binomial variance, p times one minus p over n; the sample binomial variance would be p hat, times one minus p hat, over n. So it's an awful lot like that, and it takes a very convenient form. And in fact, let's go back to an earlier point. If alpha and beta are both two, then the posterior mean works out to be p tilde equal to x plus two over n plus four, and the posterior variance works out to be p tilde, times one minus p tilde, over n tilde plus one. That center is exactly the sample proportion we used in the Agresti-Coull interval, and the variance is almost the same, with the exception of this plus one. And what's a plus one among friends? So, we'll just say it's roughly the same variance as the Agresti-Coull interval. So, this is one way to motivate the Agresti-Coull interval: it is centered at the posterior mean, and its standard error is, not exactly, but almost, the square root of the posterior variance.
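Putting numbers on that comparison (a sketch of mine, using the running example x = 13, n = 20 and the alpha = beta = 2 prior):

```python
# Compare the Beta(2, 2) posterior center and spread with the
# Agresti-Coull quantities for x = 13 successes in n = 20 trials.
x, n = 13, 20
alpha = beta = 2

n_tilde = n + alpha + beta          # n + 4 = 24
p_tilde = (x + alpha) / n_tilde     # (x + 2) / (n + 4): the Agresti-Coull center

posterior_var = p_tilde * (1 - p_tilde) / (n_tilde + 1)
ac_var = p_tilde * (1 - p_tilde) / n_tilde  # Agresti-Coull: no "+ 1"

print(p_tilde, posterior_var, ac_var)
```

The two variances differ only by the plus one in the denominator, which is the point being made above.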

So, you could view the Agresti-Coull interval as a normal approximation to a posterior interval. That's one way to motivate it: set alpha and beta equal to two in a Bayesian analysis, and you get something very, very similar. So, let's go back to our previous example and do some of the Bayesian calculations. Let's say x = 13 and n = 20. Now, let's consider a uniform prior, alpha equal to beta equal to one. In that case, the prior is just one, a constant between zero and one. What's interesting, in this case, about the uniform prior is that the posterior is equal to the likelihood. Right? You have posterior equals likelihood times prior, and here the prior is just the constant one, so the posterior equals the likelihood. Now, you can't always get away with this. It's particular to the fact that the parameter we're interested in is bounded between zero and one. For example, if your parameter were anything between minus infinity and plus infinity, you couldn't put a prior of one on it and have a finite integral.
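The uniform-prior case above can be checked directly (a Python sketch of mine; the lecture's numbers x = 13, n = 20):

```python
# With the uniform Beta(1, 1) prior, the posterior for x = 13, n = 20
# is Beta(14, 8): the normalized likelihood itself.
from scipy.stats import beta as beta_dist

x, n = 13, 20
posterior = beta_dist(x + 1, n - x + 1)  # Beta(14, 8)

# The posterior density divided by the likelihood kernel p^x (1-p)^(n-x)
# is the same constant everywhere: the posterior IS the scaled likelihood.
c = lambda p: posterior.pdf(p) / (p**x * (1 - p)**(n - x))
assert abs(c(0.4) / c(0.8) - 1) < 1e-9
print(posterior.mean())
```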

Now, people have actually looked into whether you can use such improper priors, and maybe you can, but that's for later classes. For this class, it's nice to note that in this case, if we set alpha equal to beta equal to one, we get a proper prior, exactly the uniform density, and our posterior is exactly equal to the likelihood, which is interesting. If instead we were to set alpha equal to beta equal to two (remember, that prior looks like a hump right at 0.5), then the posterior works out to be proportional to p to the x plus one, times one minus p to the n minus x plus one. And so, the very classical way to do Bayesian analysis is to say that the prior is governed by expert knowledge, and the likelihood is, of course, the objective part that's governed by the data. And, of course, to say that it's the objective part is a little bit misleading, because someone had to subjectively elect to model the data as binomial; so there is, of course, a subjective part to the likelihood itself. But let's put that aside.

We have the supposedly objective part in the likelihood, we have the subjective part in the prior, and the posterior is then how you update your subjective prior beliefs with the objective evidence in the data. That's the classical kind of Bayesian inference. But in many, many cases, people don't want statistics that depend on expert opinion to start with; this idea of a subjective prior is just not palatable to the idea of science. So Bayesians went back and thought hard about it, and said: well, maybe we can come up with go-to priors, things we can just use without having to think about how to specify the prior, so-called objective priors. And because of that, the collection of Bayesian techniques ballooned into a variety of different ways of thinking about how to be a Bayesian. The only thing they have in common is that they utilize the Bayesian machinery: the posterior is proportional to the likelihood times the prior. Beyond that, they have lots of different ways of thinking about it. One way of thinking about it is the so-called Jeffreys prior, where people said: well, maybe we can pick a prior that has certain specific mathematical properties.

And for this particular problem, the Jeffreys prior sets alpha equal to beta equal to 0.5. The uniform prior is another nice one that's somewhat objective, because we could say: well, why don't we put a constant prior, so that the likelihood is the posterior? That seems pretty objective. But there are problems with doing that. The point is that uniformity on one scale is not uniformity on another scale. The fact that the prior is uniform for p means that it's not uniform for, say, p squared; if you calculate the distribution of p squared, it's no longer uniform. So, a uniform distribution doesn't adequately represent absence of belief.

The deeper problem is that there's no probability density that measures absence of belief about a parameter. If you've written down a density, you've specified belief; you've completely characterized the parameter's probabilistic behavior. So anyway, these are very technical problems with Bayesian analysis, and they all kind of revolve around how in the world we set this prior.

Â reasonable, the Jeffrey's prior seems pretty reasonable.

Â And putting a prior that's humped at 0.5 because, you know, shrinking everything

Â towards 0.5 also seems pretty reasonable. All those things don't seem so bad.

Â And the benefit is, no matter what you choose, someone else could pick a

Â different prior as long as you gave them the likelihood, someone else could pick a

Â different prior than you. So, the, the idea that you could just pass

Â around the likelihood, and everyone could pick their own prior is also quite

Â palatable way to do Bayesian inference. So, I'm going to go through some pictures

Â just to show you and I fudged a little bit.

Â I'll tell you how I fudged a little bit on the pictures.

Â So here, I normalized everything so that it's one.

Â But then, here, in this first one the problem is that the prior heads off to

Â infinity near zero and near one. So, if I were to normalize it, I would

Â just get I can't divide by infinity so I, I fudged a little bit.

Â So, this U-shaped curve looks different than the U-shaped curve that I'm plotting

Â here. So, in order to get it on the same plot, I

Â fudged a little bit. So, if you try and do this, you'll see how

Â I fudged. But, okay. So, the U-shaped curve isn't to

Â the right scale but I put it on the same scale as the posterior in the likelihood

Â which both of those I normalize so its peak was at one.

Â So, the blue is the prior. In this case, the Jeffrey's prior to alpha

Â equal to beta equal to 0.5. The green is the likelihood and the red is

Â the posterior. So, you see what happens when you multiply

Â the green times the blue and then re-normalize,

Â You get a red curve that looks an awful lot like the likelihood.

Â So, in this case, the Jeffrey's prior doesn't move us off our likelihood very

Â much. And the posterior inference, which is

Â entirely based on this red curve, is pretty much exactly identical to the

Â likelihood. Then, of course, on the next slide, if the

Â prior is completely flat, the posterior and the likelihood are identical.

Â So, there is no green curve in this case, it's exactly underneath the red curve.

Â Now, let's look at alpha equal beta equal two.

Â Then, my prior is this hump shape at 0.5. You can see that my likelihood is the

Â green shape, and my posterior is the red shape.

Â And you can see it's ever so much shifted towards 0.5.

Â So, the red shape is the mathematical compromise between the knowledge codified

Â by my blue prior, And the objective part, codified by the

Â likelihood. Again, I should put objective in quotes.

Â Now, let's make it more extreme to kind of show you what's happening.

Â Let's put alpha2 = two and beta10 = ten. And then, the blue curve gets shifted a

Â lot towards zero. As beta gets much bigger than alpha, the prior becomes more pushed

Â up towards zero. As alpha becomes much bigger than beta, it

Â becomes pushed up towards one. And then, as, if alpha and beta are equal

Â and they get larger and larger, gets more peaked around 0.5.

Â So anyway, now we're all pushed up towards zero.

Â And you can see, here we have the blue curve is the prior, pushed up toward zero

Â because beta is much larger than alpha. And it has a finite maximum because both

Â of them are bigger than one. And then, we have the green likelihood

Â which has been constant through every, one of these pictures. And then, we have the

Â red posterior which is the compromise between the evidence represented by our

Â data and the assumed likelihood, and our blue prior which represents our knowledge,

Â our prior knowledge. And so, the red curve is the appropriate

Â mathematical compromise between these two opposing positions.

And in this case, let's say you had a prior belief that the prevalence of hypertension was very low; you thought it was on the order of 0.1. Your data says: no, no, no, it's very high, on the order of 0.65. And so your posterior is the compromise: it says, well, your data has moved me very far away from my prior, towards the MLE of 0.65. That's how the mathematics works out. And as n goes to infinity, the green likelihood curve will get more and more peaked around whatever the true value is, and it will just grab the red curve and pull it increasingly towards itself. So, what happens in politics, for example?

Well, people's blue curves are very spiked, right? They're dead set in their opinions, and no amount of data is going to move them off of it. So here is an example where I have alpha = 100 and beta = 100. What happens then? Alpha and beta are equal, so the beta distribution is centered at exactly 0.5. But as alpha and beta grow towards infinity, the variance of the beta distribution gets really small, so according to our prior, we're quite sure that p is almost exactly 0.5. Then we collect our data, and it says: ehh, I don't think so; p is not 0.5, it's more likely somewhere above 0.6. And what happens to our posterior? Our posterior says: well, you were very sure, and I'm going to mostly ignore the data because of how sure you were. So this is, of course, the problem with extremely informative priors: no amount of data is going to knock you off them. Here, the red curve almost overlaps with the blue curve.

So, the red curve in the previous examples is the posterior, the distribution of the parameter given the data. In Bayesian statistics, that's everything: if you give someone the posterior, you've given them the complete summary of the evidence, as far as the Bayesian is concerned. But it's a curve, and it's hard to work with; you can only look at it in graphs, and if you have multiple dimensions, it gets even worse. So, we want to summarize it. One way to summarize the curve is by its mean, the posterior mean; another way is by its variance, the posterior variance. But we might also want something analogous to a confidence interval. A confidence interval, though, is a frequentist construct: it talks about supposed fictitious repetitions of experiments, and that's not really within the Bayesian ideology. So we need an analog. For the likelihood, we had something analogous to a confidence interval, and we called it a likelihood interval. The Bayesians created something similar, and they called it a credible interval. The Bayesian credible interval is just the analog of a confidence interval.

So, a 95% credible interval, a to b, just satisfies that the probability that the parameter lies in that interval, given the data, is 95%. Really simple. If you believe in Bayesian inference, higher values of the posterior represent better-supported values of the parameter. So, just like with the likelihood, you're better off chopping off the posterior with a horizontal line and figuring out exactly which values of a and b that entails, to force the probability to be 95%. That's called the highest posterior density (HPD) interval, and I have a picture here where I do just that. Imagine a horizontal line over the posterior; the red area under the curve varies as we move the line up and down. As we move it down, the red area gets bigger and bigger; as we move it up, the red area gets smaller and smaller. So you keep moving that horizontal line until the red area, the area under the curve, is exactly 0.95. Once you hit that point, you see where the line intersects the curve, drop down to the horizontal axis, and those two points are your a and b. The probability that p lies between a and b is, of course, just the integral between those points, which is exactly the red area. So, you wind up with a credible interval; in this case, it works out to be 0.44 to 0.84, which should be no surprise. And in R, you can do this with the binom package: binom.bayes with thirteen successes and twenty trials, and you have to set type equal to "highest". That gives you the 95% credible interval, and it uses a Jeffreys prior.
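The same calculation can be sketched outside R (my own Python sketch, with a crude grid search standing in for binom.bayes's HPD machinery):

```python
# Jeffreys prior Beta(0.5, 0.5), x = 13 successes in n = 20 trials.
import numpy as np
from scipy.stats import beta as beta_dist

x, n = 13, 20
post = beta_dist(x + 0.5, n - x + 0.5)  # posterior Beta(13.5, 7.5)

# Easy version: the equal-tail (percentile) interval.
lo, hi = post.ppf(0.025), post.ppf(0.975)

# Crude HPD: among all intervals [ppf(t), ppf(t + 0.95)] holding 95%
# posterior probability, take the shortest one.
t = np.linspace(0, 0.05, 2001)
widths = post.ppf(t + 0.95) - post.ppf(t)
k = np.argmin(widths)
hpd = (post.ppf(t[k]), post.ppf(t[k] + 0.95))

print(lo, hi)
print(hpd)
```

Both intervals land close to the 0.44 to 0.84 quoted above, with the HPD interval at most as wide as the equal-tail one.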

As I said earlier, even though Bayesian credible intervals are constructed using Bayesian thinking, if you turn around and evaluate them on frequentist performance, they tend to perform very well. Just like our Agresti-Coull interval, which wasn't exactly a Bayesian credible interval but was close enough among friends, and which has much better performance than the Wald interval constructed directly from the CLT.

The other thing I want to mention, before I go through the final bit of this lecture, is that another way to create a credible interval is to pick a to be the 2.5th percentile of the posterior distribution, and b to be the 97.5th percentile of the posterior distribution. That would give you exactly a 95% interval, for example. But the posterior height at the lower point and the posterior height at the upper point would be different, and that is potentially a problem. On the other hand, if you do the HPD interval, you have to vary that horizontal line and solve a root equation to obtain the endpoints, which is a little bit annoying; while finding this so-called percentile interval, the 2.5th to the 97.5th percentile for a 95% credible interval, is very easy. So, another way to construct a Bayesian credible interval is just to take the lower and upper percentiles and run with that. I think you're better off doing the HPD interval if you can.

So, I want to end with one nice aspect of the Bayesian credible interval, if you're hardcore about these things; let me say for a minute what I mean by hardcore. Probably many of you have taken an introductory statistics class, and probably many of you have seen the baffling interpretation associated with frequentist confidence intervals presented as a test question. That interpretation is just hard-ball frequentist, and it's accurate; I don't want to criticize it. So here's an example.

We have a Wald interval, and it works out to be 0.44 to 0.86. Let's assume the 95% coverage of the Wald interval is good enough, that the CLT has kicked in, and that we're fine; we're not worried about the mathematical performance of the confidence interval, just the strict interpretation, assuming the coverage is correct. The fuzzy interpretation is that we're 95% confident that p lies between 0.44 and 0.86. But that's not the actual interpretation. The actual interpretation is: the interval 0.44 to 0.86 was constructed such that, in repeated independent experiments, 95% of the intervals so obtained would contain p. That's the actual confidence interval interpretation.

The idea is that frequentist refers to frequency; the definition of probability is entirely entwined with fictitious repetitions of experiments, or, you know, lifetime batting averages for success probabilities, that sort of thing. That's the frequency interpretation. Almost no one actually interprets a frequentist confidence interval this way, because it's such a mouthful. Everyone kind of thinks: well, my interval 0.44 to 0.86 is an interval that accounts for uncertainty at a control rate of about 95%, where that control rate has a contextual meaning with respect to frequentist statistics. And I understand that, but I don't spit it out every time I interpret a confidence interval. Every now and then, a confidence interval makes its way into the news, and news people never interpret it right, because it's hard to interpret. So, let's go on to the likelihood interval, which was 0.42 to 0.84, the 1/8th likelihood interval. The fuzzy interpretation is that the interval 0.42 to 0.84 represents plausible values of p, with plausibility defined by the eightfold likelihood ratio of the endpoints relative to the MLE. So, yeah, that's okay.

And so, the fuzzy interpretation is okay; it's no worse than the frequentist fuzzy interpretation. But the actual interpretation, let's go through it: the interval 0.42 to 0.84 represents plausible values for p, in the sense that, for each point in the interval, there is no other point that is more than eight times better supported given the data. Again, yikes. This is a mouthful, and anyone who constructs a likelihood interval is not going to interpret it that way. They're going to say: it's an interval, it accounts for uncertainty, it's based on the likelihood, the calibration is based on eightfold likelihood ratios, and I understand what it means, but I don't spit it out every time I use the interval.

The nice thing about the Bayesian interval is that you can spit out the actual interpretation every single time you use it, because the interpretation is very easy. The Jeffreys 95% credible interval was 0.44 to 0.84, and the actual interpretation is: the probability that p lies between 0.44 and 0.84 is 95%, full stop. That's super easy. Now, there's a lot loaded into the word probability here, because it's the Bayesian version of the word probability, which maybe not everyone would agree with; some people would want something more objective. But nonetheless, if you're willing to buy into the Bayesian way of thinking, the simple interpretation of credible intervals is quite nice. And this interpretation is how people want to interpret intervals: if you see a confidence interval in the news, or present a confidence interval to people who have just a little bit of statistics, this is how they want to interpret it. And you can't say that statement for a frequentist interval.

Â