Let's fill in the blanks with the sequences of human haemoglobin subunit alpha and subunit
beta.

OK, this is how it looks after we filled in the blanks.

Please note that the greater-than sign in the first line is followed by the name of the sequence.

For example, the name for the alpha subunit is HBA_HUMAN.

The name for the beta subunit is HBB_HUMAN.

Starting from the second line we have the protein sequence, written with the one-letter codes of the 20 amino acids.
Simple enough.
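If you would like to see how such input can be read by a program, here is a minimal Python sketch. The function name `parse_fasta` is made up for this illustration, and the two sequences are only short prefixes of the real haemoglobin sequences, cut down to keep the example small.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The name follows the greater-than sign on the header line.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect them all.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Shortened, illustrative input (not the full-length proteins).
example = """>HBA_HUMAN
MVLSPADKTNVKAAWGKVGA
>HBB_HUMAN
MVHLTPEEKSAVTALWGKV
"""

sequences = parse_fasta(example)
print(sequences["HBA_HUMAN"])
```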

Let's just ignore Step 2 for now,

because it says that the default settings will fulfill the needs of most users.

Now let's press the Submit button!

Now the result is here! Let's take a look.

The results are a little complex, so we will check them one by one.

Let's look at the bottom half first.

You can see that in addition to the two input sequences, there is an extra line below them

that describes the alignment itself, and is called a "markup line" or "alignment string".

Let's take a closer look at this extra line.

This "markup line" consists mainly of

vertical bars, colons, dots, and spaces.

It is easy to see that the vertical bars represent alignments of the same residues,

such as the M-to-M and V-to-V alignments.

What about the colons and dots?

You can tell immediately that they cannot denote alignments of the same residues,

because they are not vertical bars. The residues are different, too.

Let's take a look at the first colon, between the S and the T.

It denotes a substitution [from S to T].

The colons and dots are used to denote the level of similarity between two aligned residues that
are not identical.

The colons denote aligned residues that are similar, whereas the dots denote those that are
not that similar.

Specifically, the similarity between a pair of amino acids is evaluated using the substitution
matrix,

which [here in this case] is the BLOSUM62 matrix at the upper-right corner.

For example, the score of substituting S with T in the substitution matrix BLOSUM62 is 1, so
you see a colon here.

The score of substituting A with E is -1, which is less than 0, so a dot is used.

That's how the colons and dots are used.
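The rule can be sketched in a few lines of Python. Please treat it as an illustration only: the dictionary holds just two hand-copied BLOSUM62 entries (the ones mentioned above), the function names are made up, and showing a dot for a score of exactly 0 is an assumption, since we have only discussed positive and negative scores.

```python
# Tiny subset of BLOSUM62, for illustration only (not the full matrix).
BLOSUM62_SUBSET = {
    ("S", "T"): 1,
    ("A", "E"): -1,
}


def markup_char(a, b, scores):
    """Return the markup character for one aligned pair of residues."""
    if a == "-" or b == "-":
        return " "  # a gap is shown as a space
    if a == b:
        return "|"  # identical residues
    # Look the pair up in either order.
    score = scores[(a, b)] if (a, b) in scores else scores[(b, a)]
    # Assumption: anything that is not positive gets a dot.
    return ":" if score > 0 else "."


def markup_line(row1, row2, scores):
    """Build the markup line for two aligned rows of equal length."""
    return "".join(markup_char(a, b, scores) for a, b in zip(row1, row2))


# A made-up toy alignment, not the haemoglobin result.
print(repr(markup_line("MVS-A", "MVTKE", BLOSUM62_SUBSET)))
```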

Not difficult, right?

Let's look at the result more carefully.

You can see that all the substitutions of S with T

and all the substitutions of T with S are denoted by colons.

In fact, the substitution matrix is symmetric

with respect to the diagonal.

In other words, the substitution of S with T and the substitution of T with S will have the same
score.

The direction of substitution doesn't matter.

It is a symmetric [matrix].
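One way to picture this symmetry in code is to store each pair of residues only once, without any order. The sketch below does that with Python's `frozenset`; the table contains only a few BLOSUM62 values for illustration, and the names `SCORES` and `score` are hypothetical.

```python
# Each unordered pair is stored once; a few BLOSUM62 values for illustration.
SCORES = {
    frozenset(("S", "T")): 1,
    frozenset(("A", "E")): -1,
    frozenset(("S",)): 4,  # S aligned with S (a diagonal entry)
}


def score(a, b):
    """Look up a substitution score; the order of a and b does not matter."""
    return SCORES[frozenset((a, b))]


print(score("S", "T"), score("T", "S"))
```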

You can also see that all the substitutions between S and T have the same score,

which means that the substitution scores are related, and only related, to the two residues
involved.

The substitution matrix is context-free.

You can see that the first substitution of T with S is preceded by K, while the second is
preceded by L.

Their scores, however, are the same,

and both have a colon displayed [in the markup line].

In fact, the substitution score of a pair of aligned residues is independent of other pairs of
residues.
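Here is a small sketch of what "context-free" means for scoring. It assumes an ungapped stretch of the alignment and uses only three hand-copied BLOSUM62 values; the function names are made up for this illustration.

```python
# A few BLOSUM62 values, for illustration only.
PAIR_SCORES = {("T", "S"): 1, ("K", "K"): 5, ("L", "L"): 4}


def column_score(a, b):
    """Score of one aligned pair; it depends on these two residues only."""
    return PAIR_SCORES[(a, b)]


def ungapped_score(row1, row2):
    """Score of an ungapped stretch: the sum of independent column scores."""
    return sum(column_score(a, b) for a, b in zip(row1, row2))


# T aligned with S contributes the same amount whether it follows K or L.
after_k = ungapped_score("KT", "KS") - column_score("K", "K")
after_l = ungapped_score("LT", "LS") - column_score("L", "L")
print(after_k, after_l)
```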

These seemingly trivial features

are in fact very important, as we will see later.

Finally let's look at the gaps.

From the view of evolution, gaps denote insertions and/or deletions of genomic fragments

during the course of evolution, often called "indels".

An insertion in one sequence can be regarded as a deletion in the other sequence.

Indels often have some effects on the function of sequences. So gaps in an alignment usually
receive negative scores, called the "gap penalty".

Since an insertion or deletion event often involves multiple residues, a gap often has a
length of more than one residue.

This is different from substitutions.

The gap penalty is often implemented as a linear combination

of a gap opening penalty and a gap extension penalty, which are usually given different
penalty scores.

Let's take the penalty score for the second gap as an example.

As suggested by the formula at the lower-right corner, opening a new gap will receive a penalty
score of 10.

Extending it by one more residue will receive a penalty score of 0.5.

So the total penalty score is 10.5.

As for the last gap, its length is five.

So the penalty score is 10+0.5*4 (or 10+0.5*(5-1)) = 12.
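The same arithmetic can be written as a small Python function. This is only a sketch of the formula used in this example: the name `gap_penalty` is made up, and the defaults of 10 and 0.5 are the values we have just seen.

```python
def gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one gap: open it once, then extend for each further residue."""
    if length < 1:
        raise ValueError("a gap must contain at least one residue")
    return gap_open + gap_extend * (length - 1)


print(gap_penalty(2))  # a gap of length two
print(gap_penalty(5))  # a gap of length five
```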

Finally, subtracting the sum of gap penalties from the sum of substitution scores

will give you the final score, 292.5, as shown in the result marked with a red line.

Some students might wonder why the final score ends in 0.5.

The reason is that extending a gap, rather than opening a new one, costs 0.5.
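To put the pieces together, here is a minimal sketch that scores a whole alignment: it adds up the substitution scores and subtracts the gap penalties. The alignment is a made-up toy example, not the haemoglobin result, the score table holds only a few BLOSUM62 values, and the function name is hypothetical. It also assumes that no column has a gap in both rows.

```python
# A few BLOSUM62 values, for illustration only.
SCORES = {("M", "M"): 5, ("V", "V"): 4, ("S", "T"): 1, ("A", "E"): -1}


def alignment_score(row1, row2, scores, gap_open=10.0, gap_extend=0.5):
    """Sum of substitution scores minus gap penalties for two aligned rows."""
    total = 0.0
    prev_gap_row = None  # which row held a gap in the previous column
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gap_row = 1 if a == "-" else 2
            # Continuing a gap in the same row is an extension;
            # otherwise a new gap is opened.
            total -= gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            # The matrix is symmetric, so try the pair in either order.
            total += scores[(a, b)] if (a, b) in scores else scores[(b, a)]
            prev_gap_row = None
    return total


# 5 + 4 + 1 - 1 from substitutions, minus 10.5 for one gap of length two.
print(alignment_score("MVS--A", "MVTKKE", SCORES))
```

Notice that the 0.5 at the end of the result comes from the single gap extension.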

We have used this example to illustrate some basic ideas involved in the simplest pairwise
alignment.

Here are several summary questions.

They are not assignments, but you are encouraged to think about them

and discuss your answers and ideas in the online forum.

That's all for Unit 1.

In Unit 2 we will illustrate how to use algorithms to perform such sequence alignments.

Thank you! See you at the next unit!