[MUSIC] We are now ready to learn how biologists compare sequences. And we'll start from something that, again, seemingly has nothing to do with biology. It's called the alignment game. We are given two sequences, and we can do the following things. We can either remove the first symbols from both sequences. And in this case, if these two symbols are the same, we get a score of one. If they are different, we get nothing. Alternatively, we can remove a single symbol from one of the sequences. Let's start. So, this is how a sequence is and we got lucky. They both start from A. We remove A from both sequences, and we get one point. Now, the sequences start from T, once again, we remove both T, and we get another point. Next, the sequences start from G and C, so we can either remove both, but we'll get no point, or we can remove one of them. That's what we do. And after we removed C from the second sequence, we actually have two sequences starting from G. That's good. We can remove both and get one more point. We continue, one more point, continue, continue, continue, and finally, we finish doing the game, and got 4 points. But maybe, if we played a different strategy, we could get 5 points. But before we decide how to play this game. Let's define the fundamental bioinformatics notion of sequence alignment. We didn't notice it, but while we were playing the game, we actually constructed, the alignment of two sequences. Alignment of two sequences is a two-row matrix. In the first row, we have symbols from the first sequence in order. Possibly interspaced by the space symbol. In the second row, we have symbols from the second sequence. Once again in order, possibly interspaced by the space symbol. And that's what's shown on this slide. They further classify different columns of the alignment matrix as matches. These are the columns for which we get score one with identical letters. Or mismatches, this is a column where we have two letters but they are different. And we also have insertions and deletions, sometimes collectively called indels. So, what we known is that the maximum score in the alignment game is actually the maximum number of matches, is alignment matrix. It doesn't help us, to solve either the alignment problem or to play optimally the alignment game. But we will see later how to do this. We will also define the notion of common subsequence. Common subsequence between two sequences is simply the matches in the alignment of these two sequences. And a classical problem in Computer Science is called the longest common subsequence problem. Given two strings, we want to find a longest common subsequence of two strings. This problem will be the key for our understanding of how biologists compare by sequences.