In this module, we'll continue on the path of studying algorithms and
data structures to help us solve the read alignment problem.
By the end of this module, you will have seen and worked with a range of ideas
that underly fast pattern matching algorithms, both offline and
online algorithms, and both exact and approximate matching algorithms.
So far we've seen the naive exact matching algorithm, which isn't particularly fast,
and which isn't able to find approximate matches, but a useful solution to
the read alignment problem will have to be both fast and approximate.
So in this module, we'll push in both directions.
First, we'll discuss a very practical and a very fast and fairly simple alternative
to the naive exact matching algorithm, which is called Boyer-Moore.
Then we'll discuss indexing which we'll adapt to work with genomes and
the read alignment problem.
Indexing is a very powerful idea.
Indexing is the reason why when you type a Corey String into an Internet
search engine, you can get an answer back immediately.
And then finally we'll introduce an idea based on the pigeon hole principle,
that will allow us to look, not just for exact occurrences, but
also for approximate occurrences of a pattern in a text.
And, fortunately for us, this pigeon hole principle is going to allow us to reuse,
basically, all the algorithms that we learned for exact matching and
apply them to approximate matching problems.
So let's get started.