So, it took us on entire section of this talk to formulate a problem for Distance-Based Phylogeny Construction. But what algorithm should we use to solve this problem? Here's an idea. If we look at the tree that we know fits our toy distance matrix, we can see that the minimum element of this matrix corresponds to two leaves that are next to each other on the tree. Right? Here, the smallest element corresponds to seal and whale, and seal and whale are next to each other in the tree. We call such leaves "neighbors". So, formally this means that they share the same "parent node", where a parent is the unique node that a leaf is connected to. Of course, more than two leaves could share the same parent. But the good news is that we can always find a pair of neighboring leaves in the tree. Now, let's pretend that we don't know the tree that fits the distance matrix, and see if we can use the fact, that seal and whale are neighbors in order to reconstruct the tree. The first question that we have is how to compute the distances that are represented by the question marks? Because the distance between the seal and whale is 2, of course, we know that the sum of the question marks is equal to 2. But that's all we know right now, without using more information from the distance matrix. So, why don't we take a step back, and think about a tree with neighboring leaves i and j, that share a parent m. We're essentially trying to reconstruct the blue and purple distances. But if k is some other leaf in the tree, we're going to see that the red distance will help us out. So, I want you to notice, that red plus purple is the distance from i to k. Red plus blue is the distance from j to k. And blue plus purple is the distance from i to j. I also want you to notice that the following equation holds. The expression on the right is just a fancy way of writing d_k,m. Why is this? Because the blue terms will cancel out with the minus sign, and the same thing for the purple terms. And then there are two red terms remaining with the division by two. So, I hope you're wondering, why would I write d_k,m in this weird way? Well, remember that red plus purple was the distance from i to k. Red plus blue was the distance from j to k. And blue plus purple was the distance from i to j. And this is important because when we're working with a distance matrix, we don't know, a priori, what the weights of the internal edges are. We just know the distances between leaves. But the expression on the right side of this equation is now written in terms of distances between leaves, so we can substitute a lower case d here, for an upper case D. We therefore now have a formula for the distance from i to its parent node m, which is just the distance from i to k, minus the distance from k to m, which we just derived a formula for. Combining terms gives us that the distance from i to m is equal to the distance from i to k, plus the distance from i to j, minus the distance from j to k, all divided by 2. An analogous formula holds for the distance from j to m. Now, please remember here that our choice of k was completely arbitrary. It was just any leaf other than the neighboring leaves i and j. So, if you know that i and j are neighbors, you can compute the distance from them to their parent, just from the distance matrix alone. So, let's take this formula for the distance from a leaf to its parent, and return to where we were in our attempt to reconstruct the tree that fits our distance matrix. We know, that the sum of the two question marks is equal to 2. But now we can use our formula. Let m denote, the parent of neighbors seal and whale. Let's first replace i with seal in our equation. j is the neighbor of seal, right? So let's replace j, with Whale. Now, what about k? If you remember that I said k could be any other leaf in the tree that we choose, so we could choose chimp, or we could choose human. Let's choose chimp, OK, we don't know where chimp is in the tree yet, which is why I'm showing the dashed line here connecting it to m. Now we can just plug in the values, from the distance matrix, so, the distance from seal to chimp is 6. The distance from seal to whale is 2, and the distance from whale to chimp is 4. So the distance from seal to its parent, is 4 divided by 2, or 2. So, now that we know that, we can now fill in that, because the distance, from seal to whale is 2, and the distance from seal to it's parent is 2, we know that the distance from whale to m is now automatically equal to zero. The question, though is "what do we do now"? Well, because we know the distance from seal to chimp, which was 6. We know that, from the distance matrix. And we now know that the distance from seal to m is 2, we can simply infer that the distance from chimp to m must be equal to 4. Similar reasoning tells us that the distance from m to human, wherever human might be in the tree, is equal to 5. This allows us to add a new row and column to the distance matrix, corresponding to m. And because we have essentially already added seal and whale to the tree, we can now ignore these columns. In fact, let's just get rid of them completely. Getting rid of seal and whale entirely yields a smaller 3 x 3 matrix. So, we have added two neighbors to the tree and reduced the size of the distance matrix by one row and column, by simply recomputing the distance from the parent of these neighbors, which we said was m, to the other nodes in the tree. Now we just apply the same rules recursively. So, note that the minimum value of the matrix is 3. So, chimp and human must be neighbors. So, let's call their parent a. Using the formula that we had before for the distance from a node to its parent, we're going to have that the distance, from chimp to a, is the distance from chimp to m, plus the distance from chimp to human, minus the distance from human to m, all divided by 2. This gives us, when we plug in the values, 2 divided by 2, or 1. Since the distance from chimp to human is 3, we can now conclude that the distance, from human to a must be 2, automatically. And now we can conclude that the distance from a to m, must be equal to 3, as well. Because we knew, that the distance from chimp to m was equal to 4. And we just computed that the distance from chimp to a was 1. So, the remaining edge must be three. And that's the simple tree that I showed you before that fits our original matrix. So, the one thing I would like to point out though, is that it's possible when we're reconstructing the tree, an internal edge, to have weight zero, right? We didn't know that this edge had to have weight 3. We could have had some other distance matrix, where it had weight zero. In this case, rather than carrying along a weight zero internal edge. and by "internal edge", I mean an edge that connects two internal nodes, we would simply compress a and m into a single node. Now, we don't do this for whale, connected to m, because we can't make whale an internal node of our tree, because it's a present day species. So, before we move on, I'd just like to have you try this method for yourself. So, here's a sample 4 x 4 distance matrix. What's the tree that fits this distance matrix? If you somehow get stuck along your way of trying to reconstruct this, I'll see you in the next section.