The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).


From the Stanford University course

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

643 ratings



From the lesson

Week 3

Huffman codes; introduction to dynamic programming.

- Tim Roughgarden, Professor

Computer Science

So, having iterated through all of our algorithm design paradigms and noticed that none of them seems to work very well for computing the maximum-weight independent set of a path graph, we're going to develop some ideas for a new paradigm. And the key approach in this new paradigm is to first reason about the structure of an optimal solution. What I mean by this is that we're going to seek out statements of the following form: the optimal solution, whatever it may be, has to possess a certain form. It has to have been built up from an optimal solution to a sub-problem in a prescribed way.

In fact, in much of our discussion of both divide-and-conquer and greedy algorithms, this kind of reasoning was implicit. With dynamic programming, we're going to make it systematic.

For example, implicit in the correctness of many divide-and-conquer algorithms is the fact that an optimal solution to the whole problem has to be expressible, has to be constructible, in a prescribed way from solutions to smaller sub-problems.

So, what's the motivation for doing this thought experiment, trying to understand what the optimal solution could possibly look like? Well, the plan is to narrow the possible candidates for the optimal solution down to a relatively small set. With a small set, we can get away with brute-force search to pick the best one. So, one lesson you learn once you get good at dynamic programming is that it's not at all circular to reason about the very object that you're trying to compute.

Remember, the goal here is to devise an algorithm to compute an optimal solution. And now, I'm telling you to do a thought experiment as if you had already computed it, as if I had handed it to you on a silver platter. But that kind of daydreaming can be very productive. Thinking: hey, what if I did have an optimal solution? What could I say about it? What would it look like? Observations of that form can actually light up a trail to the computation of that exact object, and we'll see that in the next couple of videos.

Alright, that's enough lofty philosophy for now; let's get concrete.

Okay, so we've got this path graph, and the vertices have weights. We want the max-weight independent set. Let's again do this thought experiment: what if someone handed it to us? What could we say about its structure? We'll use the following notation when we reason about this maximum-weight independent set: S denotes the vertices in that optimal solution, that max-weight independent set, and V sub n denotes the right-most, final vertex of the input graph. So, here's a content-free statement about this last vertex of the path, V sub n: either it's in S or it isn't. That gives us two cases to consider when we reason about the optimal solution.
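To make the setup concrete, here is a minimal Python sketch of the objects the lecture talks about. The representation (a list of weights indexed by vertex position along the path) and the helper names are my own illustration, not anything from the lecture itself; the example weights are hypothetical.

```python
# A weighted path graph on vertices 0..n-1: vertex i has weight w[i], and
# edges join consecutive vertices. An independent set is a set of vertex
# indices containing no two consecutive (i.e. adjacent) vertices.

def is_independent(indices, n):
    """Check that a set of vertex indices is an independent set of the
    n-vertex path: all indices valid, no two consecutive."""
    chosen = sorted(indices)
    if any(i < 0 or i >= n for i in chosen):
        return False
    return all(b - a >= 2 for a, b in zip(chosen, chosen[1:]))

def weight(indices, w):
    """Total weight of a candidate vertex set."""
    return sum(w[i] for i in indices)

w = [1, 4, 5, 4]                     # hypothetical 4-vertex example
print(is_independent({0, 2}, 4), weight({0, 2}, w))   # True 6
print(is_independent({1, 3}, 4), weight({1, 3}, w))   # True 8
print(is_independent({1, 2}, 4))                      # False: adjacent pair
```

In this notation, S is whichever independent set maximizes `weight`, and "V sub n" is the last index, n - 1.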

Let's start with the case where V sub n is excluded from the optimal solution, capital S. Let G prime denote the path graph you get by plucking off V sub n, the right-most vertex, from the original graph G. Let's make a couple of observations. First of all, this set capital S is an independent set in G that doesn't include the last vertex, so we can equally well regard it as an independent set of the smaller graph G prime: if it contained no consecutive vertices in G, it contains none in G prime. But actually, we can say more. Not only is S any old independent set in G prime, it has to be an optimal, that is, maximum-weight, independent set in G prime. Why? Well, if there were something better than S in G prime, call it S star, we could take that exact same independent set, regard it as an independent set in G, and, of course, it would still be better than S in G. But that contradicts our assumption that S was a max-weight independent set in G.

So, summarizing: if the max-weight independent set S of the original path graph G does not include the right-most vertex, it can be very simply described in terms of an optimal solution to a smaller sub-problem. It literally is a max-weight independent set of G prime, the path graph with one fewer vertex. Alright, case one was a warm-up for case two, which is similar but slightly more complicated.

Now, let's assume that the maximum-weight independent set S does, in fact, use the right-most vertex, V sub n. By the definition of an independent set, no two adjacent vertices can be chosen. So, by virtue of choosing V sub n, the right-most vertex, S in this case absolutely cannot include the penultimate vertex, V sub n minus 1. So, let's denote by G double prime the path you get from G by plucking off both of the right-most two vertices.

Now, let's do our best to mimic the argument in case one. In case one, we said S has to also be an independent set of G prime. Here that doesn't make sense, right? Here, S contains the last vertex, so we can't even talk about it being a subset of any smaller graph. However, if we look at S with the last vertex V sub n removed, that is an independent set of G double prime, because, remember, S can't contain the second-to-last vertex.

And once again, just like in case one, we can say something stronger: S with V sub n removed is not any old independent set of G double prime, it must actually be an optimal one. It must have the maximum possible weight. The reasoning is similar. Suppose S with V sub n removed were not the best possible independent set in G double prime. Then there's something else, call it S star, which is even better, with even bigger weight. How do we get a contradiction? Well, if we just add V sub n to this even bigger independent set S star that lies in G double prime, we get a bona fide independent set of the entire graph G with overall weight even bigger than that of S. But that contradicts the optimality of S.

So, for example, you could imagine that this purported optimal solution capital S had total weight 1,100 in two parts: 1,000 coming from vertices in G double prime, plus V sub n, which had weight 100 on its own. Now, in the contradiction, suppose there were an independent set with more than 1,000 weight just in G double prime, say 1,050. Well then, we just add this last vertex V sub n to it, and we get an independent set of weight 1,150 in the original graph G.

But that contradicts the fact that S was supposed to be optimal with weight merely 1,100. So, notice that the reason we're using the graph G double prime in this argument is to make sure that no matter what S star is, no matter what this independent set of G double prime looks like, we can add V sub n to it blithely and not worry about feasibility. Right? The right-most vertex S star could possibly possess is the third-to-last vertex, V sub n minus 2, so there's no worry about feasibility when we extend it by adding in the right-most vertex V sub n.

To make sure you don't lose the forest for the trees, let me remind you what our high-level plan was, and let me point out that we actually just executed that plan quite successfully on this problem.

The plan was to narrow down the candidates for what the optimal solution could be: to reason about the form of the optimal solution and argue that it has to look a particular way.

What did we just prove on the previous slide?

We said that the optimal solution can only be one of two things. Either it excludes the final vertex, in which case it is nothing more than the max-weight independent set of G prime; or it includes the right-most vertex, in which case it must be a maximum-weight independent set of G double prime augmented with this last vertex V sub n. There are only two possibilities for what the optimal solution could look like, in terms of optimal solutions of smaller sub-problems.

So, a corollary of this is that if a little birdie told us which case we were in, whether or not V sub n was in the optimal solution, we could just complete the solution by recursing on the appropriate sub-problem. If the little birdie tells us the optimal solution doesn't have V sub n, we just recurse on G prime. If the little birdie tells us V sub n is in the optimal solution, we recurse on G double prime and then add V sub n to the result.
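The structural claim can be sanity-checked by exhaustive search: the optimal value for the whole path always equals the better of the two candidates the case analysis allows. This is a verification sketch of my own, not the lecture's algorithm, and the example weights are hypothetical.

```python
from itertools import combinations

def mwis_value(w):
    """Max weight over all independent sets of the path with weights w,
    found by exhaustive search over all vertex subsets (illustration only)."""
    n = len(w)
    best = 0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if all(b - a >= 2 for a, b in zip(S, S[1:])):  # no adjacent pair
                best = max(best, sum(w[i] for i in S))
    return best

w = [1, 4, 5, 4]                      # hypothetical example weights
whole = mwis_value(w)
case1 = mwis_value(w[:-1])            # G': last vertex excluded
case2 = mwis_value(w[:-2]) + w[-1]    # G'': last two dropped, then add V_n
print(whole, max(case1, case2))       # the two values agree
```

Here case 1 yields 6 and case 2 yields 8, matching the true optimum of 8, exactly as the two-case argument predicts.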

Now, of course, there is no little birdie, and we have no idea whether this right-most vertex is in the optimal solution or not. But hey, there are only two possibilities, right? So here's an idea. Maybe it seems crazy, but why not try both possibilities and just return whichever one is better?

Why do I suggest that maybe this is crazy? Well, if you stare at this and think about the ramifications of trying both possibilities as you recurse down the graph, it may start feeling a little bit like brute-force search. And, in fact, it is. This is just a recursive organization of a brute-force search.
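The "try both and keep the better" idea can be written directly as recursion. This is a sketch under my own naming, assuming nonnegative weights; without caching it re-solves the same prefixes over and over, which is why it behaves like brute-force search.

```python
def mwis_recursive(w, i=None):
    """Max weight of an independent set among the first i vertices of the
    path with weights w. Exponential time without memoization."""
    if i is None:
        i = len(w)
    if i == 0:
        return 0          # empty graph: empty set, weight 0
    if i == 1:
        return w[0]       # single vertex: take it (weights assumed nonnegative)
    # Case 1: last vertex excluded -> optimal solution of G'
    # Case 2: last vertex included -> optimal solution of G'' plus its weight
    return max(mwis_recursive(w, i - 1),
               mwis_recursive(w, i - 2) + w[i - 1])

print(mwis_recursive([1, 4, 5, 4]))   # 8
```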

Nevertheless, as we'll see in the next video, if we're smart about eliminating redundancy, we can actually implement this idea in linear time.
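One way to eliminate that redundancy, previewing the idea rather than reproducing the next video's presentation, is to solve the prefixes bottom-up and cache each answer, so every sub-problem is solved exactly once, giving O(n) time. The array name and code are my own sketch.

```python
def mwis_linear(w):
    """Max weight of an independent set of the path with weights w,
    computed bottom-up over prefixes in linear time."""
    n = len(w)
    A = [0] * (n + 1)      # A[i] = optimal value for the first i vertices
    if n >= 1:
        A[1] = w[0]
    for i in range(2, n + 1):
        A[i] = max(A[i - 1],             # case 1: exclude vertex i
                   A[i - 2] + w[i - 1])  # case 2: include vertex i
    return A[n]

print(mwis_linear([1, 4, 5, 4]))   # 8
```

Each entry just applies the two-case analysis above to a prefix, so correctness rides directly on the structural argument from this lecture.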
