So previously we defined the clique tree algorithm, which is just a belief propagation algorithm run over a graph that happens to be a tree. As it turns out, we can exploit the properties of the tree to greatly improve the computational behavior of this algorithm, not just its correctness, which is what we showed before. So let's go back to message passing in this very simple tree, which has just these four clusters {A,B}, {B,C}, {C,D}, and {D,E}, and let's see what the messages do. So here is the message delta 1,2, which is the message passed on this edge, and here's the message delta 2,3, and so on. And these are the six messages that we have. And now we can notice one very important property. Delta 1,2, once computed, never changes. You get instant convergence of delta 1,2: once you compute it for the very first time, it has converged. Now, what about delta 2,3? Well, that depends on when delta 2,3 is passed. If delta 2,3 is passed first, before cluster 2 gets a message from cluster 1, then once it gets that message things might change. But if cluster 2 is clever, it's going to wait; and once the message from cluster 1 is passed, delta 2,3 doesn't change either, ever, if cluster 2 is smart enough to wait. What about delta 3,4? Delta 3,4 needs to wait a little bit longer, because it needs to wait for delta 2,3. But if it waits both for delta 1,2 to be passed to cluster 2 and for delta 2,3 to be passed to cluster 3, then once it sends delta 3,4, delta 3,4 doesn't change either. So delta 2,3 has to wait for delta 1,2, and delta 3,4 has to wait for delta 2,3; but assuming each waits, convergence is achieved in a single message passing step. Now, there is nothing magical about going left to right. If you go right to left, you get the exact same behavior: this one converges instantly.
This one needs to wait for delta 4,3, and this one needs to wait for delta 3,2. But again, assuming you wait, the messages converge in a single step, and so we can compute all of the messages in this entire tree in one pass in each direction: one pass from left to right, and one pass from right to left. In the context of chain structures like this one, this actually has its own name: for chains, you might see this under the name forward-backward algorithm, and it is very commonly used in things like the hidden Markov model and other similar chain-structured representations. And so the point is, the total effort needed to compute all of these messages is just one message passing sweep in each direction: one from left to right, and one from right to left. And the result of that: remember that once all messages have been passed, we have beliefs that are correct, beliefs that represent the marginals of P tilde Phi. And so we've computed all the marginals of all the clusters in this graph from two sweeps over this clique tree. More generally, as we said, once C_i receives a final message from all of its neighbors except for C_j, that is, if it waits to send the message to C_j until it receives a message from everybody else, then that message delta i,j is also final. So if these are final, this is also final. Can you always find a message passing order that will achieve that goal? I mean, maybe everybody ends up waiting for everybody else, and they all get stuck and never send anybody any messages. The point is that you can always start, because the message from a leaf is always immediately final. And once the message from a leaf is finalized, then you can send the message from the parent of the leaf, and so on and so forth. And because this is a tree, it's guaranteed that you will be able to find a legal message passing order.
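To make the two-sweep idea concrete, here is a minimal sketch of sum-product message passing on a chain clique tree like the {A,B}, {B,C}, {C,D}, {D,E} example above. The use of binary variables and the specific potential values are made-up assumptions, purely for illustration:

```python
# Sketch: forward-backward (two-sweep) sum-product on a chain clique tree.
# Cluster i holds a potential psi[i] over an adjacent pair of variables;
# all variables are binary. The potential values are toy numbers.

psi = [
    [[1.0, 2.0], [3.0, 1.0]],   # psi_1 over (A, B)
    [[2.0, 1.0], [1.0, 4.0]],   # psi_2 over (B, C)
    [[1.0, 3.0], [2.0, 1.0]],   # psi_3 over (C, D)
    [[2.0, 2.0], [1.0, 3.0]],   # psi_4 over (D, E)
]
K = len(psi)

# Forward sweep: fwd[i] is the message into cluster i over its left variable.
fwd = [[1.0, 1.0]]              # no left neighbor for the first cluster
for i in range(K - 1):
    fwd.append([sum(fwd[i][x] * psi[i][x][y] for x in range(2))
                for y in range(2)])

# Backward sweep: bwd[i] is the message into cluster i over its right variable.
bwd = [None] * K
bwd[K - 1] = [1.0, 1.0]         # no right neighbor for the last cluster
for i in range(K - 2, -1, -1):
    bwd[i] = [sum(psi[i + 1][y][z] * bwd[i + 1][z] for z in range(2))
              for y in range(2)]

# After one sweep in each direction, every cluster is calibrated:
# belief_i(x, y) = fwd_i(x) * psi_i(x, y) * bwd_i(y), and every belief
# sums to the same partition function Z of the unnormalized measure.
beliefs = [[[fwd[i][x] * psi[i][x][y] * bwd[i][y] for y in range(2)]
            for x in range(2)] for i in range(K)]
Zs = [sum(sum(row) for row in b) for b in beliefs]
```

Note that each message is computed exactly once, and every belief sums to the same constant, which is one way to check that the tree is calibrated.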
And if we pass the messages in the right order, then we only need to pass 2(K - 1) messages, where K is the total number of clusters. And why is that? Because if you have a tree over K nodes, in this case K clusters, then there are K - 1 edges. And since each edge carries a message in each direction, we have 2(K - 1) messages, one in each direction per edge. Let's look at a couple of message passing orders to see which ones are going to work and which ones aren't. So we can start with any leaf; any leaf works, so say we start with, oh, I don't know, C2. Once C2 has passed its message, what other clique can pass a message? Well, C3 can't pass messages yet, because it's still waiting for messages from all sorts of other cliques, so we have to go to another leaf, for example C1; C1 can pass a message now. Neither C3 nor C4 is a candidate for message passing yet, because each of them is still waiting for another message. But C6 can pass a message, and now we have the option of activating C4: C4 can now send a message to C3, because it has received everything except the message from C3. C7 can also pass a message to C3, and you'll notice there are a lot of arbitrary decisions in how I ordered this; I could have used C5 next. And now C3 has the option of passing a message to C5. And notice that at this point, everyone has received all of the messages, or all of the messages but one. And so now we can start passing in the other direction. So C5 can pass a message to C3, C3 can pass a message to any of C2, C7, or C4, and at this point C4 can pass a message to both C6 and C1. What is an example of an illegal message passing order?
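This scheduling argument can be sketched as a small greedy scheduler: repeatedly fire any message whose sender has already heard from all of its other neighbors. The adjacency below is my reading of the seven-clique example just described (C1 and C6 hanging off C4; C2, C5, and C7 hanging off C3; C4 connected to C3), so treat the layout as illustrative:

```python
# Sketch: scheduling all 2*(K-1) messages in a clique tree so that every
# message is final when sent. A clique may send to neighbor j only after
# it has received from all of its other neighbors.

neighbors = {
    1: [4], 6: [4], 4: [1, 6, 3],
    2: [3], 7: [3], 5: [3], 3: [2, 7, 4, 5],
}
K = len(neighbors)

received = {c: set() for c in neighbors}   # senders each clique has heard from
sent = set()                               # directed messages already fired
order = []

while len(order) < 2 * (K - 1):
    progress = False
    for i in neighbors:
        for j in neighbors[i]:
            if (i, j) in sent:
                continue
            # Legal to send i -> j once i has heard from everyone except j.
            if set(neighbors[i]) - {j} <= received[i]:
                order.append((i, j))
                sent.add((i, j))
                received[j].add(i)
                progress = True
    # In a tree some clique can always fire, starting from the leaves.
    assert progress, "no legal message available; the graph is not a tree"

print(order)   # 12 directed messages; the leaf messages fire first
```

Because the graph is a tree, the leaves can always fire immediately, so the loop never gets stuck, exactly as argued above.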
So, for example, if C1 passes its message to C4, and C4 jumps the gun and says, "I'm going to send the message to C3 next," that is an illegal message passing order, because C4 does not yet have all the information that it needs to pass the message to C3; it's still waiting for a message from C6. So that's not a good ordering. Clique trees have other very elegant computational properties that make them a useful data structure. So let's think about the kinds of queries that one can answer using clique trees; some of these are fairly obvious, but others are a little bit more subtle. So imagine that we wanted to answer a posterior distribution query on either a single variable or a set of variables that appear together in a clique. Well, in this case you can take any clique that contains all of those variables, and sum out the variables that you don't care about from the clique, to get a posterior over just those variables. Now, remember that this posterior is P tilde Phi; it's an unnormalized measure, so in order to get the normalized measure you actually need to renormalize. Now, you could also do something a little bit more clever. You could say you wanted to introduce new evidence regarding some variable and query another variable. So this is a form of incremental inference, where you've already calibrated the clique tree and you say, oh wait a minute, I made another observation; let's see what happens now. So it turns out that clique trees are really good for that, and we're going to divide this into two cases: the easy case and the slightly less easy case. So imagine that I want to query a variable X that appears in the same clique as my new evidence Z. So here is my clique, and here is Z, and here is X, and I want to figure out what the posterior of X would be if I further observe Z. Well, this is actually really easy, because what I have here is P tilde Phi.
Let's assume for the moment the clique contains just Z and X; if there are other variables there, it doesn't change anything, because we can just sum them out. We can now condition, or reduce, the clique (this is a clique reduction) by restricting attention to the entries that are consistent with my evidence Z = z. And then I end up with P tilde Phi of little z and X, and in order to get a posterior, I can sum out the irrelevant variables and then normalize. That's incremental inference. Well, that's the easy case, where the two variables are together in a clique. What about the somewhat more elaborate case, where I introduce the new evidence and I query a variable that doesn't appear in the same clique as Z? So, using an example, let's imagine that what I did is observe evidence on A, and I want to query, say, E. Actually, let's do a different example: let's imagine that I want to query B. Well, in this case I can do the following. I can multiply this indicator function, or what we call this reduction of the factor, into a cluster that contains the evidence. In this case, we're going to multiply in the indicator function of A equals little a, which corresponds to multiplying it into psi 1, removing all entries in the beliefs of cluster 1 that are not consistent with A = a. So now, what happens when we change psi 1? Imagine that we're doing this entire computation from scratch; that is, forget everything that we did before, and think about what would happen to this model in the hypothetical case that we were passing messages with the new psi 1 instead of the old psi 1. Well, some messages would change. Which messages would change? Delta 1,2 would change, because psi 1 changed. Delta 2,3 would change, because delta 1,2 changed. Delta 3,4 would change, but we don't care about delta 3,4 if we're interested in this belief over here. What about other messages?
Delta 4,3 doesn't change; neither does delta 3,2, and neither does delta 2,1. And so only a subset of the messages in the clique tree changes as a result of this change, which means that we can reuse at least half the messages, if not more, in computing our beliefs about, in this case, cluster 3. And the only messages that we need to recompute are the messages along the path from the cluster where we introduced the evidence to a clique that contains the variable X that we want to query, which in this case is cluster 3. So to summarize: if we have a clique tree with K cliques, and we pass messages in a careful order, which means that we start at the leaves and propagate inward, then we can construct an ordering in which 2(K - 1) messages suffice to compute all of the beliefs in the clique tree, which means we get a posterior over every single variable in the model. Contrast that with the computational cost of running variable elimination n times to compute posteriors over n different variables; here you get a considerable computational savings, in fact a savings of the order of the number of variables. And so what we can do is compute all the marginals at only twice the cost of variable elimination, as opposed to n times the cost of variable elimination. And we've also shown that by caching, or storing, the messages, we can reuse inference, for example in the context of incremental queries, which provides us with a useful data structure that we can keep updating as we obtain additional evidence.
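The message-reuse argument can be sketched on a small chain: after multiplying the evidence indicator into psi 1, only the messages flowing away from cluster 1 need recomputing, while the messages flowing toward it can be reused, and the resulting posterior matches a from-scratch computation. The three-cluster chain, the binary variables, and the potential values are all made-up assumptions for illustration:

```python
# Sketch: incremental inference on a chain with clusters psi_1(A,B),
# psi_2(B,C), psi_3(C,D); all variables binary, toy potential values.

def forward(psi):
    """Left-to-right messages; fwd[i] is the message into cluster i."""
    msgs = [[1.0, 1.0]]
    for p in psi[:-1]:
        m = msgs[-1]
        msgs.append([sum(m[x] * p[x][y] for x in range(2)) for y in range(2)])
    return msgs

def backward(psi):
    """Right-to-left messages; bwd[i] is the message into cluster i."""
    msgs = [[1.0, 1.0]]
    for p in reversed(psi[1:]):
        m = msgs[0]
        msgs.insert(0, [sum(p[x][y] * m[y] for y in range(2)) for x in range(2)])
    return msgs

psi = [
    [[1.0, 2.0], [3.0, 1.0]],   # psi_1(A, B)
    [[2.0, 1.0], [1.0, 4.0]],   # psi_2(B, C)
    [[1.0, 3.0], [2.0, 1.0]],   # psi_3(C, D)
]

bwd = backward(psi)             # computed during the original calibration

# New evidence A = 0: multiply the indicator into psi_1, i.e. zero out
# the entries of psi_1 that are inconsistent with the observation.
psi_ev = [[row[:] for row in p] for p in psi]
psi_ev[0][1] = [0.0, 0.0]       # kill the A = 1 entries

# Only the forward messages (away from cluster 1) need recomputing; the
# backward messages depend on psi_2 and psi_3 alone and can be reused.
fwd_new = forward(psi_ev)

# Posterior over B from the updated belief at cluster 2, reusing bwd[1].
belief2 = [[fwd_new[1][b] * psi[1][b][c] * bwd[1][c] for c in range(2)]
           for b in range(2)]
Z = sum(sum(row) for row in belief2)
posterior_B = [sum(belief2[b]) / Z for b in range(2)]
```

On this chain, only delta 1,2 and delta 2,3 are recomputed; the reused right-to-left messages are byte-for-byte identical to the originals, which is the savings described above.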