In this video, we will see that there are many problems with Bayesian inference.

Let's see what happens. Here's the Bayes formula.

We have the likelihood times the prior over the evidence.

However, what is the evidence?

Imagine that we are working with images.

For example, with paintings by Van Gogh.

You will have, for example,

The Starry Night and The Starry Night Over the Rhône.

And if you model the probability of these images,

you would also be able to draw new paintings that Van Gogh could have drawn.

And so modelling this distribution is usually really hard.
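To make the evidence term concrete, here is a minimal sketch with a toy 1-D Gaussian model (all numbers illustrative, not the image setting from the example): the evidence is the integral of likelihood times prior over the parameter, which is easy on a 1-D grid but hopeless when the parameter is a whole image.

```python
import numpy as np

# Toy 1-D model (illustrative): p(x | theta) = N(theta, 1), p(theta) = N(0, 1).
def normal_pdf(t, mu, sigma):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x_observed = 1.0
thetas = np.linspace(-10.0, 10.0, 10_001)         # grid over the parameter
dtheta = thetas[1] - thetas[0]
prior = normal_pdf(thetas, 0.0, 1.0)              # p(theta)
likelihood = normal_pdf(x_observed, thetas, 1.0)  # p(x | theta)

# evidence p(x) = integral of p(x | theta) p(theta) dtheta, approximated as a sum
evidence = np.sum(likelihood * prior) * dtheta
posterior = likelihood * prior / evidence         # Bayes formula, now normalized
```

A grid like this needs exponentially many points as the dimension grows, which is why the evidence is intractable for realistic models.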

And in this module,

we'll try to come up with ideas how we can avoid computing the evidence.

It is called the Maximum a posteriori principle.

We try to find the value of the parameters that maximizes the posterior probability.

If we apply the Bayes formula,

we will see that the posterior probability equals

the product of the likelihood and the prior over the evidence.

However, note that evidence does not depend on theta.

And so we can remove it.

And so, by computing the Maximum a posteriori,

we avoid computing the evidence.

Also, it is an optimization problem,

and it can be done efficiently using numerical methods.
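As a sketch of that idea, here is the same toy 1-D Gaussian model (illustrative, not from the slide): we maximize log-likelihood plus log-prior with plain gradient descent, and the evidence never appears because it is a constant in theta.

```python
import numpy as np

# Toy model: p(x | theta) = N(theta, 1), p(theta) = N(0, 1), one observation.
x = 1.0

def neg_log_posterior_grad(theta):
    # gradient of -[log N(x; theta, 1) + log N(theta; 0, 1)], constants dropped;
    # the evidence contributes nothing since it does not depend on theta
    return -(x - theta) + theta

theta = 0.0
for _ in range(1000):                 # plain gradient descent, step size 0.1
    theta -= 0.1 * neg_log_posterior_grad(theta)

theta_map = theta                     # analytic answer for this model: x / 2 = 0.5
```

Any off-the-shelf optimizer would do the same job; the point is only that the objective contains no evidence term.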

However, Maximum a posteriori estimation has a lot of problems.

And the major one is that it is not invariant to reparametrization.

If, for example, we had a Gaussian distribution

and then applied the sigmoid function to the random variable,

we would get the blue distribution.

Note that the position of the maximum changed,

and this is a really big problem.
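This non-invariance is easy to check numerically. In the sketch below (a Gaussian with mean 1, chosen only for illustration), we push the density through a sigmoid with the change-of-variables formula and see that the mode of the transformed density is not the sigmoid of the original mode.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-8.0, 8.0, 200_001)
p_x = np.exp(-0.5 * (xs - 1.0) ** 2) / np.sqrt(2 * np.pi)  # N(1, 1), mode at 1

ys = sigmoid(xs)
# change of variables: p_y(y) = p_x(x) / |dy/dx|, with dy/dx = y * (1 - y)
p_y = p_x / (ys * (1.0 - ys))

mode_x = xs[np.argmax(p_x)]          # mode in x-space: 1.0
mode_y = ys[np.argmax(p_y)]          # mode of the transformed density
# sigmoid(mode_x) != mode_y, so the position of the maximum has moved
```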

The next problem is that we can't use it as a prior.

If we try to use the Maximum a posteriori estimate as a prior for the next step,

we will again get a delta function centered at the same estimate.

And so we'll not get any new information.

Another problem is that the Maximum a posteriori estimate is actually an atypical point,

since there may be not enough probability mass around it.

It also equals the result of minimizing the expected indicator loss,

the indicator that you do not end up at the true parameter value.

We could use other loss functions instead.

For example, the squared error or the absolute error.

Those would lead to the mean of the posterior distribution or the median.
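To illustrate that correspondence, here is a small numerical sketch (a gamma distribution stands in for a skewed posterior, purely for illustration): the candidate estimate minimizing the expected squared error lands at the posterior mean, and the one minimizing the absolute error lands at the median.

```python
import numpy as np

# Stand-in "posterior": samples from Gamma(2, 1), mean = 2, median ≈ 1.68
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=100_000)

candidates = np.linspace(0.0, 6.0, 601)
squared_risk = [np.mean((samples - c) ** 2) for c in candidates]
absolute_risk = [np.mean(np.abs(samples - c)) for c in candidates]

best_squared = candidates[np.argmin(squared_risk)]    # close to the mean
best_absolute = candidates[np.argmin(absolute_risk)]  # close to the median
```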

However, if we choose those loss functions,

We will have to estimate the evidence,

and this is exactly what we try to avoid.

Finally, we can't compute credible regions.

If we estimate the maximum a posteriori theta to be equal to 12.53,

we cannot say how confident we should be about this estimate.

It could be 12.53 plus or minus some small value,

as well as plus or minus 1000.
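For contrast, here is what having the full posterior buys us: a sketch (toy Gaussian posterior centered at 12.53, purely illustrative) that reads off a central 95% credible interval, which a single point estimate cannot provide.

```python
import numpy as np

# Toy posterior on a grid: N(12.53, 1), illustrative only
thetas = np.linspace(0.0, 25.0, 100_001)
posterior = np.exp(-0.5 * (thetas - 12.53) ** 2) / np.sqrt(2 * np.pi)

cdf = np.cumsum(posterior)
cdf /= cdf[-1]                                  # normalize the discrete CDF
lower = thetas[np.searchsorted(cdf, 0.025)]     # 2.5% quantile
upper = thetas[np.searchsorted(cdf, 0.975)]     # 97.5% quantile
# central 95% credible interval ≈ 12.53 ± 1.96 for this toy posterior
```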

So, to summarize: it is really easy to compute the Maximum a posteriori estimate.

However, it has a lot of problems.

It is not invariant to reparametrization.

We can't use it as a prior.

It is also an atypical point.

And finally, we can't compute credible regions.

And in the next video,

we will see another approach to avoiding computation of the evidence.

It is called conjugate distributions.