[MUSIC] We are now ready to learn how to search for longest paths, not only in simple Manhattan line grids but in arbitrary alignment graphs. Here's an example of a graph. And, in this case, there is only one way to move to these three nodes from the source. And there is also one way to move to this node from the source. But when we move to this node, shown by the question mark, there are actually four different ways to move to this node. Which one should we choose? In this case, we actually have four choices. And as before, we choose the variant which provides the longest path. In this case, this path is the optimal one. And we know that the optimal path from zero to the shown node has length 10. We continue doing this and in the end, we will construct this graph. with every note marked by the length of the longest pass and we will learn that the length of the longest pass from the source to the send is 29. And as before, we of course preserved the backtracking pointers and that's why we will know how to move back in this graph and find the optimal path itself. As a result, we now know how to construct recurrency for the dynamic programming in alignment graph. If we define S(i,j) as the length of a longest path from (0, 0) to (i, j). Then in the alignment graph where we have diagonal edges, corresponding to the alignment gate, we have three choices. We can either compute S(i,j) as S(i-1) plus the weight of the vertical edge into node (i,j) or S(i-1,j) plus plus weight of the horizontal edge into (i,j) or S(i-1,j-1) plus the weight of the diagonal edge into (i,j). And as a result, our recurrency for the longest common subsequence problem becomes particularly simple, because the weight of all edges in the longest subsequence problem is zero. Except for the right edges, that correspend to matching letters. And therefore the recurrency is simply S(i-1, j) + 0 or s(i,j-1) + 0. We can drop 0, of course, from this equation, or s(i-1, j-1) + 1, in the case, i-th symbol in the first sequence equals to the j-th symbol in the second sequence. and as before we, of course, preserve the backtracking pointers for the longest common subsequence. And in this case, the red edge has weight 1, all the other edges have weight 0, and here are the backtracking pointers for our particular problem, finding longest common subsequence for these two strings. And we can move from the final destination backwards using backtracking pointers, and slowly but surely, we will arrive to the original node, initial node, and all passes that are traced by the blue edges, represent the solutions to the longest common subsequence problem. To compute backtracking pointers, we have three choices: We can either record a backtracking pointer as a vertical, horizontal, or diagonal edge. And as soon as the backtracking pointers are computed we can write a simple program that allows us to output the longest common subsequence by, once again, exploring three choices: green choice, blue choice and red choice. We have now learned that to compute a score at a given node, we need to consider all edges entering into this node. And the initial vertex of each such edge is called a predecessor of a given node. And therefore, the currency for computing the longest paths in an arbitrary graph will be the following one. Let's denote S(a) as the length of the longest path to node a, and to compute S(a), we need to compute the maximum among all predecessors of node A, and we take maximum amounts of value scorer of each predecessor plus weight of the edge from this predecessor to a given node. So let's try to apply this approach to a simple graph shown here. We start from the initial node, and of course, the length of the longest path to this node is 0, and we can compute two other scores for two other nodes, 4 and 6, and now, let's try to compute the score for the node shown by the question mark. There are two edges entering into this node. Score of one of them, of one predecessor is known as zero, but score of another predecessor is unknown, therefore before we compute the score of the node shown by the question mark, we need to compute one more score, which is this one. But to compute this score we need to compute score in another vertex, shown here. But to compute score of the vertex, we need to compute score for this one. And to compute score of for this one, we need to compute the score that we had been trying to compute some time ago. So we came to a vicious circle. It looks like a catch-22. We need to compute something that has not been computed yet, but to compute it, we need to compute something else again and again and again. Which tells us that the order in which we explore nodes of the graph is extremely important. By the time the note is analyzed, the score of all its predecessors should already have been computed. And if the graph has a directed cycle, like the graph you just looked at, there is simply no way to solve the problem. Because a traveler may travel along this directed cycle as long as he wants to. And therefore, they limit attention to so-called directed acyclic graphs, or DAGs, which are graphs without directed cycles. In this slide, Batman is looking at a directed acyclic graph. And is trying to solve a difficult dressing problem in the morning. And topological ordering of a graph is an ordering of nodes of the DAG on a line such that all edges in the graph go from left to right. There are different ways for Batman to get dressed in the morning, two of them are shown here, and each of them represents a topological ordering because by the time Batman needs to put something on, all necessary things already have been put on in this particular case. And, there is a simple theorem, that we are not proving but you can try to prove it to yourself, that every DAG has the topological ordering. And in fact, for our Manhattan grids, there are many different ordering, topological orderings. And two of them are shown on this slide. And after we learned the topological ordering is important, we can design the pseudocode for the longest path problem. And it works like this. We first topologically order the graph. And then we update nodes in the order given by this topological ordering as described on this slide. What is the runtime of this algorithm that we will use heavily in the remainder of this lecture? Well, when we run dynamic programming, we look at every edge in the graph just once to update the node that this edge goes into. And therefore, at the runtime of the dynamic programming algorithm for finding longest paths is proportional to the number of edges in the graph. And we are now ready to look at different varieties of sequence alignment problems in biology.