We are now ready to classify proteins into families using hidden Markov models. Let's assume that we are given a multiple alignment of already known members of a protein family and that we have constructed a profile HMM based on this family. Now a new protein comes in, and we want to decide whether it belongs to the given protein family. The easiest thing to do is simply to apply the Viterbi algorithm to find an optimal hidden path, and then decide whether the protein belongs to the family depending on whether the resulting probability exceeds a threshold, and that's what's shown on this slide. If the probability threshold is satisfied, then the protein is classified as belonging to this family, and it extends the alignment of proteins in this family, as shown on this slide.

To do this classification, we need to construct the Viterbi graph for the given HMM diagram. Please note that we have not yet discussed how to construct the Viterbi graph for an HMM diagram, for example, how many rows this Viterbi graph will have. Well, as before, the number of rows in the Viterbi graph should be equal to the number of states in the HMM diagram, so a large number of rows. Here I show the Viterbi graph, and I took the liberty of assigning a column to every state the HMM passes through. Does it make sense? The number of columns in the Viterbi graph shown here equals the number of visited states.

There is one problem with this Viterbi graph: we don't know in advance what the number of visited states will be when we look at a newly sequenced protein, and therefore it's not even clear how to construct this graph. For example, in this case the path has two deletion states, but how would we know this before constructing the Viterbi graph? Therefore, we should instead try to construct the Viterbi graph so that the number of columns is based on the number of emitted symbols.
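The thresholding step described above can be sketched in a few lines. This is only an illustration: `viterbi_log_prob` and the cutoff are hypothetical stand-ins for the family's actual decoder and threshold.

```python
# Hedged sketch: classify a protein into the family when the Viterbi
# log-probability of its optimal hidden path clears a chosen threshold.
# `viterbi_log_prob` is a hypothetical decoder for the family's profile HMM,
# passed in as a function; `log_threshold` is the (log-space) cutoff.
def belongs_to_family(viterbi_log_prob, protein, log_threshold):
    return viterbi_log_prob(protein) >= log_threshold
```

Working in log space avoids underflow when multiplying many small transition and emission probabilities along a long hidden path.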
The number of emitted symbols is something that we know, unlike the number of visited states, which we don't know in the presence of silent states such as deletion states. Therefore, by definition, the number of columns in the Viterbi graph should be equal to the number of emitted symbols, which means it should be smaller than the number of columns in the erroneously constructed Viterbi graph. Here is a correctly constructed Viterbi graph: every time we move to a deletion state and do not emit a symbol, we transform the edge into a vertical edge within the same column. In this case, the number of columns in the Viterbi graph equals the number of emitted symbols, and the Viterbi graph is well defined. The only thing we need to add, as a technical detail, is a zeroth column that contains only silent states.

So we have just reduced the problem of aligning a sequence against a profile to the decoding problem. We align a new sequence to a family of aligned sequences using the profile HMM. The input is a multiple alignment Alignment, a string Text, a threshold θ describing the maximum fraction of insertions per column, and a pseudocount σ. The output is an optimal hidden path emitting Text in the profile HMM denoted HMM(Alignment, θ, σ).

Let's take a look at the hidden path constructed in the HMM diagram. Of course, this hidden path corresponds to a path in the usual alignment graph that we considered before. The question then arises: we have just described a rather complex decoding algorithm for constructing hidden paths in the HMM, but wouldn't it be easier to simply construct the same alignment path using the traditional alignment technique? Or, in other words, have I wasted your time? Indeed, the recurrence for the Viterbi algorithm is very similar to the recurrence for traditional sequence alignment. For the Viterbi algorithm, the score of a node equals the maximum of the three scores of its preceding states.
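To make the vertical edges concrete, here is a minimal sketch of Viterbi decoding for a toy two-column profile HMM. All the states and probabilities below are made up for illustration; the point is that match states consume a symbol (moving to the next column), while silent deletion states are processed as vertical edges within the same column, so the number of columns equals the number of emitted symbols.

```python
import math

# Toy profile HMM (hypothetical probabilities, for illustration only).
# M-states emit one symbol each; D-states are silent, so they become
# vertical edges within a single column of the Viterbi graph.
EMITTING = {"M1", "M2"}
SILENT = ["D1", "D2"]  # listed in topological order
transition = {
    ("S", "M1"): 0.7, ("S", "D1"): 0.3,
    ("M1", "M2"): 0.6, ("M1", "D2"): 0.4,
    ("D1", "M2"): 0.8, ("D1", "D2"): 0.2,
    ("M2", "E"): 1.0, ("D2", "E"): 1.0,
}
emission = {
    ("M1", "A"): 0.9, ("M1", "C"): 0.1,
    ("M2", "A"): 0.2, ("M2", "C"): 0.8,
}

def viterbi_log_prob(text):
    """Log-probability of an optimal hidden path emitting `text`."""
    NEG = float("-inf")
    states = ["S", "M1", "D1", "M2", "D2", "E"]
    score = {s: NEG for s in states}
    score["S"] = 0.0
    # Zeroth column: propagate through silent states before any emission.
    for s in SILENT:
        for (a, b), p in transition.items():
            if b == s and score[a] > NEG:
                score[s] = max(score[s], score[a] + math.log(p))
    for ch in text:
        new = {s: NEG for s in states}
        # Emitting states: incoming edges come from the previous column.
        for s in EMITTING:
            if (s, ch) not in emission:
                continue
            for (a, b), p in transition.items():
                if b == s and score[a] > NEG:
                    cand = score[a] + math.log(p) + math.log(emission[(s, ch)])
                    new[s] = max(new[s], cand)
        # Silent states: vertical edges within the same column.
        for s in SILENT:
            for (a, b), p in transition.items():
                if b == s and new[a] > NEG:
                    new[s] = max(new[s], new[a] + math.log(p))
        score = new
    # The end state collects scores from the last column.
    best = NEG
    for (a, b), p in transition.items():
        if b == "E" and score[a] > NEG:
            best = max(best, score[a] + math.log(p))
    return best
```

For the one-symbol string "A", the optimal path in this toy model is S → M1 → D2 → E, with probability 0.7 · 0.9 · 0.4 · 1.0 = 0.252: the deletion state D2 contributes a vertical edge, so the graph still has only one emitting column.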
For the alignment, again, the score of a node equals the maximum of three scores over its predecessors. So it indeed looks like I wasted your time. However, notice that the choice of alignment path is now based on transition and emission probabilities that vary across different columns. These individual scoring parameters allow us to detect subtle similarities using HMMs, similarities that traditional sequence alignment often misses.
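The contrast between the two recurrences can be sketched as two node updates. The function names and scores below are hypothetical: the alignment update uses one fixed set of scoring parameters for every node, while the Viterbi update plugs in the transition and emission probabilities belonging to that particular node's column.

```python
import math

# Classic alignment: the same match/indel scores at every node of the graph.
def alignment_update(s_diag, s_up, s_left, match_score, indel_score):
    return max(s_diag + match_score,
               s_up + indel_score,
               s_left + indel_score)

# Viterbi for a profile HMM: the same three-way maximum, but `trans`
# (transition probabilities from the match/delete/insert predecessors)
# and `emit` (this state's emission probability for the current symbol)
# are specific to this column, so the scoring varies across the graph.
def viterbi_update(s_match, s_delete, s_insert, trans, emit):
    return max(s_match + math.log(trans["M"]),
               s_delete + math.log(trans["D"]),
               s_insert + math.log(trans["I"])) + math.log(emit)
```

It is exactly this per-column variation in `trans` and `emit` that lets the profile HMM reward conservation where the family is conserved and tolerate variation where it is variable, which a single global scoring matrix cannot do.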