We described the belief propagation algorithm as passing message over cluster graph but we left unspecified how that cluster graph would be constructed and what properties it needs to satisfy in order to support reasonable message passing. So let's remind ourselves what cluster graphs are. A cluster graph is an undirected graph whose nodes are clusters that involve subsets of variables and edges involve a subset SIJ, which is a subset of the two clusters on the twin points. What are some properties that the cluster graph has to satisfy? The first property is the one called family preservation an obvious value. where remember that we need, given a set of factors phi, we need to be able to assign each phi K to some cluster, C alpha K, such that the fact, such that the cluster can accommodate. The scope. So can accommodate 5K. oops accommodate.. Well, in order for the cluster graph to allow this to be done, we need to have an appropriate cluster in the cluster graph. So, this imposes a constraint on the cluster graph that says that for every factor phi K in my set of factors phi, there exists some cluster CI that, such that CI accommodates phi K, which means that you can put phi k inside this cluster. That's the family preservation property. The second property is a little bit trickier to understand. It's called the Running Intersection Property. The Running Intersection Property, let's first read the definition and then understand what it says. It says that let's assume that we have a pair of clusters CI and CJ and a variable X that sub, that belongs to both of them. So for example we might have the variable X sitting here in C7, and we might have the variable X sitting here, in C5. This property says that there exists a, exists a unique path between Ci and Cj for which all clusters and subsets along the path contain x. What does that mean? It means that for example should I choose this to be my unique path. It means that X needs to sit here, here and here. So, there is a connecting path that involves X and that path is along the entire route and that path is unique. So let's try and understand the intuition between both sides of this definition, the existence and the uniqueness. Imagine that I, let's do the existence first. And imagine that for whatever reason I decide that there is not going to be a path over here. So there is no way for C7 to communicate to C3 about the variable X. Well, that means that we now have these two separate, isolated communities, each of which has some information about X and they can never talk to each other about X. So they're never going to get to agree about X, there's never going to be any information transfer, of these two pieces of information. Well, so that's not very good, which is why we need, that path to exist. Okay, so that's the exists part. What about the unique part? The unique part is a little bit trickier to understand but let's but let's try and provide some intuition for it. Let's imagine that I have two paths that involved X. Let's so now let's think about this message passing algorithm. So C3 can send the message. C5 with information about X. For example, I think X is taking the value one, I have a, I have a factor that suggests that X takes the value 1. Well, C5, you know, integrates that with it's own information and sends it on to C2. Which sends it back to C3. And now C3 hears from C2 that ooh, X needs to take the value one. Huh? That reinforces its beliefs that X needs to take the value one and so the probability goes up. It now goes back and sends that information on to C5, which sends it on to C2, which sends it back to C3, and then once again the probability in X taking the value 1 goes up. And so we have this sort of self-reinforcing loop that's going to give rise to very extreme and very skewed probabilities in many examples. And so a way of avoiding that is to or at least reducing that risk is to is to prevent these kinds of feedback loops. Now, it's important to realize that by preventing these loops that only reduces opposed to eliminates the problem. And specifically this is kind of a little bit of a digression but it's important to know, is that, imagine for example that we have X, and Y, that are very strongly correlated. So here, we have for example X and Y and here we have a path. That involves Y. Okay? So now what's going to happen is that C3 sends information to C5 about X. C5, translates that to information about Y, Y should take the value one. That information about Y goes back to C3 and increases the probability in X taking the value one. Now this is a little bit of a forward looking hint about some of the issues that we'll see with belief propagation later on, which is that belief propagation does very poorly. When we have strong correlations. And that's because of these feedback loops. The more skewed the probabilities in your graphical model, the harder time belief propagation has in terms of the results that it gets. So, with that digression, aside, let's go back to the running interception property, and let's provide an alternative definition of the running interception property, just to give ourselves some additional intuition. The running interception property is equivalent to saying that for any X. The set of clusters and subsets that contain X form a tree, so for example if we have X here, here, here, here, here and here. We can see that the set of a cluster is in sub sets that contain x form a tree. It has to be connected, because of the existence of the path. And it can't be, a non tree. Because, because that would give us two different paths. And so that's a different view of this. And so you can think of each variable inducing its own little tree. across which information about that variable flows in the graph Now, let's go back with that definition and consider some cluster graphs that we might adopt for for this example that we've seen before. So, here we have our five clusters so let's check whether it satisfies the running interception property. We've already done family preservation for a particular set of factors. So, let's consider for example the variable, B, and we can see that we have B here, here, here, here, here, here and here, and we can see that, that is a tree. Here's my tree. Note, very carefully, that B isn't here. If B were here that wouldn't satisfy the running intersection property. That would be an illegal cluster graph, that's actually on the next, on a subsequent slide. Here is an illegal cluster graph. It violates the running intersection property Y. Because here is B, and here is a bunch of other B's and there's no way to connect cluster two to any of the others. So this one violates the existance. This one, which we just talked about, violates the uniqueness because here we have the, all of the clusters and subsets that involve b, and there's this nice little loop over here that has two paths between say, cluster one and cluster four. If we wanted to take the cluster graph and still allow communication between B and C, so we want to have, for example we want cluster one and cluster two to be able to transmit to each other information about how B and C are correlated with each other, which they can't do in this graph over here. So, one way to do that is to we have, we have eliminated in this case, this edge that we have her that involves B, and now once again we have B. Being a tree in the graph now focused on be here but its not difficult to convince yourself that other that other nodes also satisfied that we satisfy the running intersection property also with respect to other variables so just as an example here is a set of clusters and subsets involved in D and too is a tree and here is the one's involving E and that too is a tree and so on. So and running intersection property needs to hold for every for every one of the variables. How do we construct a cluster graph that has, that has a desired properties. One very simple and in some ways degenerate but still because of its simplicity very often used is a cluster graph called the beta cluster graph. And that's a term from statistical physics where people use this kind of approximation to energy functions in in some, in some calculations, in, in Statistical Physics. So here, we have in the beta cluster graph, we have two types of clusters. We have big clusters and little clusters. The big clusters, these are the big clusters correspond, to factors in phi. So for each phi k, we have a cluster, a factor cluster whose scope is exactly the scope of phi k, the big clusters. The little clusters correspond into individual variables. So for each Xi, we have a cluster whose scope is just the single variable Xi itself. Now we're going to connect Ck to Xi exactly when Xi is a subset of Ck, is a member of Ck. So if we consider the set of factors that we had before, we can now produce a different cluster graph then one that we had previously. This is a beta cluster graph. Notice that we have these big clusters. These are these four, I'm sorry, these five. And we have these, little clusters corresponding to A, B, C, D, E, and f. And we can see that we have an edge for example, between the ABC cluster and the A cluster, the B cluster, and the C cluster. And that's how messages are passed. And you can see how this graph is degenerate in the sense that it only allows information about Singleton's to be passed. It loses, in every message passing step, any information about the correlation between variables. But, nevertheless, it's simple to construct and it's guaranteed to satisfy the running interception property. Why is that? So let's consider for example, all of the factors that involve the variable, umm, D. So here is, what are the clusters that involve the variable D? Well, there's this one. And then there's this, that, that. Those are the set-, where we have the subsets, which I didn't mark. And then these ones. And notice that it's a tree by definition. Because d doesn't appear on any other subset, except these ones. And so the trees is a star. It's the variable cluster plus the big factors that contain the variable. So d, and, in this case, the three clusters that contain it. Look to summarize. We have, we know, we looked at the properties of the coaster graph, we see there is two of them that it needs to satisfy, the first is family preservation which allows the set of factors to be encoded and the second is the running intersection property which serves two purposes, the first is to connect all information of any single variables that it can all be transmitted through the graph but without creating type feedback loops that are going to create the self reinforcement and highly inaccurate answers. And the beta cluster graph is often the first default that people use. Because it's easy to define. And it's guaranteed to be correct. But richer cluster graph structures of the type that we talked about can offer very different and sometimes significantly better trade-offs. With respect to, on the one hand, the computational cost. And on the other hand. which of course, as you start increasing the amount of, of the sizes of the messages that are passed. that can grow more expensive. But at the same time, allow us to improve. dip the preservation of dependencies as messages are passed in the graph so that more information is actually kept and not lost in this message passing process.