Suppose we're going to flip a coin ten times and count the number of heads we see. We're thinking about this in advance of actually doing it, so we're interested in the predictive distribution: how many heads do we predict we're going to see? This, of course, will depend on the coin itself. What's the probability that it comes up heads? So we'll need to choose a prior. Writing $X$ for the number of heads, we have
\[
X = \sum_{i=1}^{10} Y_i,
\]
where $Y_1, \dots, Y_{10}$ are the individual coin flips. If we think that all possible coins, or all possible probabilities, are equally likely, then we can put a prior on $\theta$ that's flat over the interval from 0 to 1, that is, $f(\theta) = 1$ for $0 \le \theta \le 1$.

So now we can ask: what's our predictive distribution for the number of heads we'll see? $X$ can take the possible values $0, 1, 2, \dots, 10$. The predictive distribution is the integral of the likelihood times the prior,
\[
f(x) = \int f(x \mid \theta)\, f(\theta)\, d\theta.
\]
Writing this out with a binomial likelihood and our flat prior, we integrate $\theta$ from 0 to 1:
\[
f(x) = \int_0^1 \frac{10!}{x!\,(10-x)!}\, \theta^x (1-\theta)^{10-x} \cdot 1 \, d\theta.
\]

Note that because we're interested in $x$ at the end, it's important that we distinguish between a binomial density and a Bernoulli density. Here we care about the total count rather than the exact ordering of the flips, which is what the Bernoullis would describe. For most of the analyses we're doing, where we're interested in $\theta$ rather than $x$, the binomial and the Bernoulli are interchangeable, because the part that depends on $\theta$ is the same. But here we care about $x$ in the predictive distribution, so we do need to specify that we're using a binomial, because we're counting heads.

All right, how do we simplify this? This is an integral where it might be difficult to see immediately how to proceed, so let's recall some useful pieces of math. First, recall that for an integer $n$ we can write
\[
n! = \Gamma(n+1),
\]
where the gamma function is a generalization of the factorial function and is also defined for non-integers. Second, if $Z$ follows a beta distribution with parameters $\alpha$ and $\beta$, then its density is
\[
f(z) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, z^{\alpha-1} (1-z)^{\beta-1}.
\]

The beta distribution is relevant here because we can make the integrand look like a beta density. So let's do a little simplification. We rewrite $10!$ as $\Gamma(11)$, the denominator $x!\,(10-x)!$ as $\Gamma(x+1)\,\Gamma(11-x)$, and adjust the exponents to match the beta form:
\[
f(x) = \int_0^1 \frac{\Gamma(11)}{\Gamma(x+1)\,\Gamma(11-x)}\, \theta^{(x+1)-1} (1-\theta)^{(11-x)-1} \, d\theta.
\]
This may not look that much simpler, but it is closer to a beta density. Now, rearranging the gamma terms a little, we can pull the $\Gamma(11)$ out of the integral and replace it with a $\Gamma(12)$ inside:
\[
f(x) = \frac{\Gamma(11)}{\Gamma(12)} \int_0^1 \frac{\Gamma(12)}{\Gamma(x+1)\,\Gamma(11-x)}\, \theta^{(x+1)-1} (1-\theta)^{(11-x)-1} \, d\theta.
\]
What's inside the integral now is exactly a beta density with parameters $\alpha = x + 1$ and $\beta = 11 - x$. Because it's a density, and all densities integrate to 1, the whole integral is just 1. So we can write
\[
f(x) = \frac{\Gamma(11)}{\Gamma(12)} \cdot 1 = \frac{10!}{11!} = \frac{1}{11}
\]
for each possible value $x = 0, 1, 2, \dots, 10$. Recognizing a known density like this is a much easier way to do integrals than grinding them out directly.
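As a quick sanity check on the closed form, here's a minimal Python sketch (not part of the lecture; it assumes SciPy is available). It uses the fact that $\int_0^1 \theta^x (1-\theta)^{10-x}\, d\theta$ is the Beta function $B(x+1,\, 11-x)$, so the predictive pmf is the binomial coefficient times that value:

```python
# Sanity check: evaluate the predictive pmf f(x) in closed form.
# Assumes SciPy is installed; comb is the binomial coefficient and
# beta_fn is the Beta function B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b).
from scipy.special import comb, beta as beta_fn

n = 10
for x in range(n + 1):
    # f(x) = C(10, x) * B(x + 1, 11 - x); algebraically this is 10!/11! = 1/11.
    fx = comb(n, x) * beta_fn(x + 1, n - x + 1)
    print(x, fx)  # every value prints 1/11 ~= 0.0909
```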
Thus we see that if we start with a uniform prior on $\theta$, we end up with a discrete uniform predictive distribution for $X$. If all possible coins, or all possible probabilities, are equally likely, then all eleven possible outcomes for $X$ are equally likely.
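We can also see this conclusion by simulation. Here's a short Monte Carlo sketch (an illustration I'm adding, with an assumed NumPy setup and an arbitrary seed): draw a coin from the flat prior, flip it ten times, and tabulate the head counts.

```python
# Monte Carlo check of the predictive distribution.
import numpy as np

rng = np.random.default_rng(0)
n_sims = 1_000_000
theta = rng.uniform(0.0, 1.0, size=n_sims)  # theta ~ Uniform(0, 1) prior
x = rng.binomial(10, theta)                 # X | theta ~ Binomial(10, theta)

# Empirical predictive pmf: each count 0..10 should be close to 1/11 ~= 0.0909.
print(np.bincount(x, minlength=11) / n_sims)
```

The empirical frequencies hover around 0.0909 for every count from 0 to 10, matching the $1/11$ we derived analytically.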