The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

En provenance du cours de Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

665 notes

Vous trouverez chez Coursera les meilleures vidéos de cours du monde. Voici quelques-unes de nos recommandations personnalisées pour vous

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

À partir de la leçon

Week 4

Advanced dynamic programming: the knapsack problem, sequence alignment, and optimal binary search trees.

- Tim RoughgardenProfessor

Computer Science

So now that we have motivated and formally defined the optimal binary

search tree problem, lets think about how to solve it.

After settling on dynamic programming as the paradigm we are going to try to use

we're going to proceed in the usual way, turning to the optimal solution for

clues, asking in what way is it composed of optimal solutions to smaller

sub-problems. So let me just remind you of the formal

problem statement. There's n objects we got to store in a

search tree, and let's just name them in the order of their keys, and let's name

them one, two, three, all the way up to n, for simplicity, and we're also given.

frequencies or weights reflecting how often the different objects are searched

for. So that's p1 up to pn positive numbers.

Canonically we think of these summing to one, being probabilities, but actually

won't use that fact so in general they're just arbitrary positive numbers.

The goal is to output a search tree. It should satisfy the search tree

property, it should contain all of these objects one through n, and amongst all

such search trees it should minimize the weighted search time.

So the sum over all of the items I of the probability of I times the search time

for I, namely its depth in the tree plus one.

Now in case you're feeling cocky about the fact that the greedy algorithm works

to solve the seemingly similar optimal prefix-free binary code problem in the

form of Huffman's algorithm, I want to spend a little time pointing out that

greedy algorithms are not sufficient, are not correct, to solve the optimal search

tree problem. So if we were to design a greedy

algorithm what kind of intuition would motivate a particular greedy rule.

Well, staying at the objective function it's very clear that we want the objects

that high frequency of access to be at or near the root and we want items of low

frequency access to be in the bottom levels of the tree, like the leaves.

So what are some ways we can compile this intuition into a Greedy algorithm?

Well one, perhaps motivated by the success of Huffman's algorithm, is we

could think about a bottom up approach. Now I'm not going to define what I mean

here precisely, but informally we want to start with the bottom most levels, with

the leaves and the nodes you want to put there are the objects that are accessed

the least frequently. Any reasonable way of implementing this

kind of body of greedy rule is not going to work.

Let me show a simple counter example. So, let's just assume we have four

objects, one, two, three, four. What I'm showing here on the right in

pink is two possible search trees valid for those four keys.

And let's assume that, the frequencies are as followed.

Object one is searched for two% of the time.

Object two for 23% of the time. Object three, the bulk of the time 73%.

An object for only one% of the time. Any greedy algorithm which insists on

putting the lowest frequency objects in the very bottommost level of the tree is

not going to produce this tree on the right, which has the two% object below,

at a lower level than the !% object. Instead, such a greedy algorithm would

presumably output a searchtree like the one on the left, which has the two as the

root, and then the object four, at the lowest level, at depth two.

But you should be able to easily convince yourself that, for these probabilities,

it's the tree on the right which is the one you want, that's optimal, because the

object three is the one that's searched for the bulk of the time, that's the one

you want at the root as opposed to the object two.

So I realize I'm being a little informal here but I hope you get the idea that a

naive bottom-up implementation of a greedy algorithm, which if you think

about it is really what we did in Huffman's algorithm, is not going to work

here. The same can be said about a top-down

approach. Perhaps the simplest top-down approach

would be just to take the most frequently searched for object and put that at the

root and then recursively develop an appropriate left and right sub-trees

under that most frequently accessed element.

So let me show you again and, formally, just the same kind of counter-example.

We're going to use the exact same four objects, the exact same two trees.

I, I will however, change the numbers. Now, let's imagine that object one is

searched for almost never, just one percent of the time and each of the other

three objects are searched for roughly a third of the time each.

But, let me sort of break ties, so that the object number two is the most

frequently one. Searched for 34%.

So that, in that case the Greedy algorithm will put the 34% node up in the

roots when really what should happen is you want a perfectly balanced sub-tree

for the objects two, three and four because each accounts for roughly a third

of the searches. So let's give object three 33% of the

searches and object four 32% of the searches.

And again I'll leave it for you to convince yourself that this is indeed a

counter example the tree's spit out by the greedy algorithm on the left, we have

an average search time of roughly two, where as the search tree on the right we

have an average search time of roughly five thirds.

So we'd like to produce the tree on the right but the greedy algorithm proposed

here will spit out the tree on the left. This of course doesn't exhaust the list

of potential greedy algorithms. You could try others but it's not going

to work. There's no known greedy algorithm that

successfully solves the optimal binary search tree problem.

So in particular if we focus on the top down approach.

The choice of the route. The choice of what to do at the uppermost

level. Has very hard to predict repercussions,

for what the two different sub-problems look like.

So this is what stymie is, not only the top down gritty approach, but also a

naive divide and conquer approach. So for example if we just wanted to split

the keys into the first half, and the second half, recursively compute and

optimal B is I need on each of those two half's, and then put them back together.

The search tree property would say that we have to, unite those two sub-solutions

under a root which is the median, in between the two sub-problems.

And who is to say that the median is a good choice for the root.

Again, because the ramifications further down the tree, maybe that's a stupid

root. But, boy, is it tempting to try to solve

this problem recursively, right?

We're trying to output this binary tree. It has recursive structure.

If only we knew which root we should pick.

We would love to recurse twice. Once to construct an optimal left

subtree. Once to construct an optimal right

subtree. Okay,

so if only we knew the right route. this is starting to sound familiar,

actually. How did it work in all our dynamic

programming solutions? We always said, oh.

If only a little birdie told us this one little piece of the solution.

Well, then, you know? Then we could, sort of, look up or

recursively compute the rest of the solution.

And extend it back to one for the original problem, easily.

So maybe the same thing's true here. Maybe, if only a little birdie told us

what the root was. Then we could look up or recursively

compute optimal solutions to smaller subproblems.

And paste everything back together, and be done.

That would be great. So as usual we want to make this precise

with an optimal substructure lemma. We want to understand the way in which an

optimal solution to an optimal BST problem must necessarily be constructed

from optimal solutions to smaller sub-problems.

So in the following quiz I'm going to ask you to guess what the appropriate optimal

substructure lemma is and then after that quiz once we've identified the right

statement at that point I will show you the proof.

Okay, so the answer I'm looking for is the fourth one, is D.

Which is the strongest statement of all. So the first point is that each of the

trees T1 and T2, now as subtrees of a binary search tree these of course are

themselves binary search trees, valid for the trees that they contain.

And not only can they be viewed as search trees on the keys they contain, but the

claim which we'll prove on the next couple slides is that they are indeed

optimal. They minimize the weighted search time

over all possible search trees for the objects contained in those two trees.

So that gets us down, it rules out A, it rules out B.

We can say something stronger than that, but we can even say something stronger

than C, that each of the two trees is optimal for the items that they contain.

We actually know exactly which items are in T1 and which items are in T2.

And this is by the search tree property. Search tree property said that every node

and in particular here at the roots everything to the left of the root is

less than that of everything to the right of the node is bigger than it.

So the root being R by assumption we know the objects one through R minus one, but

they got to be somewhere. The only way they can be is in the left

sub-tree, t1. So that's exactly the contents of T1.

Similarly, the contents of t2 are precisely the objects r+1 through n.

So the two sub-trees are optimal. And we know exactly which keys they are,

it's everything less than r on the left and everything bigger than r on the

right. Okay,

so here's where things stand at the end of this quiz.

We've identified a statement that we're really hoping is true.

We're really hoping that an optimal BST, binary search tree, must necessarily be

composed in this way of optimal binary search trees for the key sets to the left

of the root and the right of the root. If that's true, hopefully with the

experience we now have, we can sort of envision what a dynamic programming

algorithm might look like. I'll just fill in the details in the next

video. If this weren't true, if we didn't have

this optimal substructure, honestly, I have no idea how we'd get started.

It's really not clear what an algorithm would look like if this weren't true.

So the next couple slides, I'm going to prove this to you.

The format, you know, will not be radically different than the ones we've

already seen. I don't think there'll be any big

surprises, but it's so important, this really is the whole reason why the

algorithm is going to work, I'm still going to give you a fool proof of this

optimal substructure lemma.

Coursera propose un accès universel à la meilleure formation au monde,
en partenariat avec des universités et des organisations du plus haut niveau, pour proposer des cours en ligne.