So now we're going to look more closely at the belief propagation algorithm and understand some of the important properties that it has. Here's our example cluster graph, with four clusters arranged in a loop. Let's remind ourselves that the cluster beliefs are defined to be the product of the factors assigned to the cluster, psi_1 up to psi_4, times the product of the messages incoming into that cluster. So, for example, the belief over cluster 1 is beta_1(A, B) = psi_1(A, B) times the message coming in from cluster 4, whose scope is A, and the message coming in from cluster 2, whose scope is B. Now, a cluster graph is said to be calibrated if the clusters agree with each other in terms of their beliefs beta_i. That is, if we were to ask cluster 1 what it thinks about, say, the variable B, and we were to ask cluster 2 what it thinks about the variable B, they would agree with each other. Formally, this says that if we marginalize the beliefs of cluster i down to the sepset S_{i,j}, and marginalize the beliefs of cluster j down to the same sepset, the two marginal beliefs agree: sum over (C_i minus S_{i,j}) of beta_i equals sum over (C_j minus S_{i,j}) of beta_j. Now, an important property of cluster graph belief propagation is that convergence of the belief propagation algorithm implies calibration. To understand why that's the case, let's go through a simple derivation. The algorithm has converged when the message at the next time step equals the message at the previous time step. So let's see what the implications of that are. The message delta_{i->j} at the next time step is computed as the product of psi_i, the factor assigned to cluster i, times all of the incoming messages except the one from cluster j, all multiplied together and then marginalized down to the sepset: delta_{i->j} = sum over (C_i minus S_{i,j}) of psi_i times the product over k != j of delta_{k->i}.
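As a minimal sketch of this message-update rule (not the lecture's own example, just an illustrative construction), here is the four-cluster loop with hypothetical binary variables A, B, C, D, cluster factors psi_1(A,B), ..., psi_4(D,A), and sepsets B, C, D, A. Each outgoing message multiplies the cluster's factor by the incoming message from the other side of the loop and sums out the non-sepset variable, and we iterate until the messages stop changing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the lecture's slides): a loop of four
# clusters over binary variables A, B, C, D, with cluster factors
# psi1(A,B), psi2(B,C), psi3(C,D), psi4(D,A) and sepsets B, C, D, A.
psi1 = rng.uniform(0.5, 2.0, (2, 2))  # psi1[a, b]
psi2 = rng.uniform(0.5, 2.0, (2, 2))  # psi2[b, c]
psi3 = rng.uniform(0.5, 2.0, (2, 2))  # psi3[c, d]
psi4 = rng.uniform(0.5, 2.0, (2, 2))  # psi4[d, a]

# One message per direction per edge, keyed "ij" = delta_{i->j};
# all initialized to the uninformative message of all ones.
msgs = {k: np.ones(2) for k in ("12", "23", "34", "41", "21", "32", "43", "14")}

def bp_round(m):
    """One synchronous round: each outgoing message multiplies the cluster
    factor by every incoming message except the one from the target
    cluster, then sums out the variable not in the sepset."""
    new = {
        "12": m["41"] @ psi1,  # sum_a psi1[a,b] * delta_{4->1}(a)
        "23": m["12"] @ psi2,
        "34": m["23"] @ psi3,
        "41": m["34"] @ psi4,
        "21": psi2 @ m["32"],  # sum_c psi2[b,c] * delta_{3->2}(c)
        "32": psi3 @ m["43"],
        "43": psi4 @ m["14"],
        "14": psi1 @ m["21"],
    }
    # Normalize each message; the constant cancels in the beliefs.
    return {k: v / v.sum() for k, v in new.items()}

for _ in range(200):
    prev, msgs = msgs, bp_round(msgs)
converged = max(abs(msgs[k] - prev[k]).max() for k in msgs) < 1e-10
print("converged:", converged)
```

On a single loop with positive factors, the normalized messages settle down to a fixed point, which is the convergence condition the derivation below assumes.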
Now let's remind ourselves what the beliefs would be at this point in the process if we were to compute them, and that's derived from this expression over here: beta_i is the same psi_i that we had before, times the product of all incoming messages. So here, in the message update, we have the product of psi_i and all messages except delta_{j->i}, and in the belief we have psi_i and all messages. We can equally well rewrite the message update in this form: multiply in all of the messages, and then divide out the one that wasn't included, delta_{j->i}. That gives delta_{i->j} = sum over (C_i minus S_{i,j}) of beta_i / delta_{j->i}, and since delta_{j->i} has scope S_{i,j}, it can be pulled out of the summation. Now, because of convergence, this expression is equal to delta_{i->j}. And so, if we rewrite that, multiplying delta_{j->i} onto the other side, we have that delta_{i->j} times delta_{j->i} is equal to the summation: the marginal of beta_i over the sepset. So we've shown that delta_{j->i} times delta_{i->j} is effectively the marginal over the beliefs of cluster i at that point in the process. But, using the identical argument, we can equally well show that delta_{j->i} times delta_{i->j} is also the marginal over the beliefs of cluster j. So this is the marginal of the beliefs for cluster i, and this is the marginal of the beliefs for cluster j, and we've shown that in both cases the marginals are equal to the same expression, which is the product of the messages going in the two directions along the edge. And because both expressions are equal to the same thing, they must be equal to each other. This is exactly the calibration property that we were trying to prove, so we've shown that convergence of the belief propagation algorithm implies calibration. This expression, which corresponds to the sepset beliefs, because it's the marginal over the cluster beliefs, is called mu_{i,j}, and it's an expression that we'll use a little bit later. In particular, one of the important properties that we get by putting these pieces together is another property called reparameterization. Now, that's a bit of a mouthful.
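We can check this numerically. Continuing the same hypothetical four-cluster loop over binary A, B, C, D (an illustrative construction, not the lecture's slides), at convergence the belief of cluster 1 marginalized over A, the belief of cluster 2 marginalized over C, and the sepset belief mu_{1,2} = delta_{1->2} * delta_{2->1} should all agree about B up to normalization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a loop of four clusters over binary A, B, C, D with
# factors psi1(A,B), psi2(B,C), psi3(C,D), psi4(D,A) and sepsets B, C, D, A.
psi1 = rng.uniform(0.5, 2.0, (2, 2))  # psi1[a, b]
psi2 = rng.uniform(0.5, 2.0, (2, 2))  # psi2[b, c]
psi3 = rng.uniform(0.5, 2.0, (2, 2))  # psi3[c, d]
psi4 = rng.uniform(0.5, 2.0, (2, 2))  # psi4[d, a]

# Run synchronous BP to (numerical) convergence; "ij" = delta_{i->j}.
msgs = {k: np.ones(2) for k in ("12", "23", "34", "41", "21", "32", "43", "14")}
for _ in range(200):
    msgs = {
        "12": msgs["41"] @ psi1, "23": msgs["12"] @ psi2,
        "34": msgs["23"] @ psi3, "41": msgs["34"] @ psi4,
        "21": psi2 @ msgs["32"], "32": psi3 @ msgs["43"],
        "43": psi4 @ msgs["14"], "14": psi1 @ msgs["21"],
    }
    msgs = {k: v / v.sum() for k, v in msgs.items()}  # normalize

# Cluster beliefs: factor times all incoming messages.
beta1 = psi1 * msgs["41"][:, None] * msgs["21"][None, :]  # beta_1(a, b)
beta2 = psi2 * msgs["12"][:, None] * msgs["32"][None, :]  # beta_2(b, c)

# Sepset belief over B: product of the two messages across edge 1-2.
mu12 = msgs["12"] * msgs["21"]

norm = lambda v: v / v.sum()
belief_B_from_1 = norm(beta1.sum(axis=0))  # marginalize out A
belief_B_from_2 = norm(beta2.sum(axis=1))  # marginalize out C

print(belief_B_from_1, belief_B_from_2, norm(mu12))
```

All three vectors coincide at convergence, which is exactly the calibration property the derivation establishes.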
So let's first do the derivation, and then understand what the word means. Remember that as we run belief propagation, at the end of it we have these beliefs over the clusters, defined by the expression we saw earlier. Specifically, for cluster 1 we would have beta_1 = psi_1 times delta_{4->1} times delta_{2->1}, and similarly for the other clusters. We've also just now defined the sepset beliefs, which we've just shown are the product of the two messages going in each direction across the edge. Specifically, here we would have, for example, that mu_{1,2} is equal to delta_{1->2} times delta_{2->1}. Now, if we look at this graph, with all these little factors attached to the clusters and the sepsets, we see that there's a lot of repetition. Delta_{4->1}, for example, appears here, but it also appears there. In fact, if we look at it, we see that each of these message expressions appears exactly twice: delta_{4->1} appears here and here, and delta_{1->2}, for example, appears here and here. So if we write all of these together in a different form, multiplying all of the cluster beliefs and dividing by all of the sepset beliefs, we would end up multiplying in delta_{4->1}, for example, on the belief side, but then canceling it out when we divide by mu_{1,4}. Similarly, delta_{1->2} would be multiplied in, in beta_2, but then cancel out in the denominator, in mu_{1,2}. To phrase this in a slightly broader setting: if we take the product of the beliefs and divide by the product of all of the sepset beliefs, we can see that the numerator is equal to the product over i of psi_i times the product of all of the incoming messages into cluster i, while the denominator is simply equal to the product of all of the messages, in each direction.
And we can see that here we're just counting the same messages in two different ways: each message appears once in the numerator and once in the denominator, which means that they all cancel with each other. So what we end up with is the product of all of the initial potentials psi_i, and that product is simply the unnormalized measure. The implication of this is that this ratio, the product of the cluster beliefs divided by the product of the sepset beliefs, is simply a different set of parameters that captures the original unnormalized measure that defined our distribution. And so we haven't lost information: there is no information loss as a result of the belief propagation algorithm. The representation of the unnormalized measure is still there, just using a different set of parameters, specifically the cluster beliefs and the sepset beliefs. So, to summarize: we've seen that at convergence of belief propagation, the cluster graph beliefs all agree with each other on the variables that are shared between them. And, as a consequence of that, the cluster graph beliefs are simply an alternative parameterization of the original unnormalized density, but one that has this nice calibration property, which allows us to read off information about a variable from any cluster in which it appears. And so we have reparameterized the original distribution into a more convenient and easily usable form.
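The cancellation argument can be verified directly. Using the same hypothetical four-cluster loop over binary A, B, C, D as before (an illustrative construction, not the lecture's own example), we can check pointwise, for every joint assignment, that the product of the cluster beliefs divided by the product of the sepset beliefs reproduces the product of the original factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the same 4-cluster loop over binary A, B, C, D with
# factors psi1(A,B), psi2(B,C), psi3(C,D), psi4(D,A) and sepsets B, C, D, A.
psi1 = rng.uniform(0.5, 2.0, (2, 2))  # psi1[a, b]
psi2 = rng.uniform(0.5, 2.0, (2, 2))  # psi2[b, c]
psi3 = rng.uniform(0.5, 2.0, (2, 2))  # psi3[c, d]
psi4 = rng.uniform(0.5, 2.0, (2, 2))  # psi4[d, a]

# Run synchronous BP to convergence; "ij" = delta_{i->j}.
msgs = {k: np.ones(2) for k in ("12", "23", "34", "41", "21", "32", "43", "14")}
for _ in range(200):
    msgs = {
        "12": msgs["41"] @ psi1, "23": msgs["12"] @ psi2,
        "34": msgs["23"] @ psi3, "41": msgs["34"] @ psi4,
        "21": psi2 @ msgs["32"], "32": psi3 @ msgs["43"],
        "43": psi4 @ msgs["14"], "14": psi1 @ msgs["21"],
    }
    msgs = {k: v / v.sum() for k, v in msgs.items()}

# Cluster beliefs and sepset beliefs at convergence.
beta1 = psi1 * msgs["41"][:, None] * msgs["21"][None, :]  # beta_1(a, b)
beta2 = psi2 * msgs["12"][:, None] * msgs["32"][None, :]  # beta_2(b, c)
beta3 = psi3 * msgs["23"][:, None] * msgs["43"][None, :]  # beta_3(c, d)
beta4 = psi4 * msgs["34"][:, None] * msgs["14"][None, :]  # beta_4(d, a)
mu12 = msgs["12"] * msgs["21"]
mu23 = msgs["23"] * msgs["32"]
mu34 = msgs["34"] * msgs["43"]
mu41 = msgs["41"] * msgs["14"]

# prod(cluster beliefs) / prod(sepset beliefs) should equal prod(psi_i)
# pointwise: every message appears once upstairs and once downstairs.
reparam_ok = True
for a in range(2):
    for b in range(2):
        for c in range(2):
            for d in range(2):
                ratio = (beta1[a, b] * beta2[b, c] * beta3[c, d] * beta4[d, a]
                         / (mu12[b] * mu23[c] * mu34[d] * mu41[a]))
                measure = psi1[a, b] * psi2[b, c] * psi3[c, d] * psi4[d, a]
                reparam_ok &= bool(np.isclose(ratio, measure))
print("reparameterization holds:", reparam_ok)
```

Note that the cancellation is algebraic, so the identity holds whenever the sepset beliefs are defined as the product of the two directed messages; convergence is what additionally makes those sepset beliefs equal to the cluster marginals.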