The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

248 ratings

Stanford University

248 ratings

Course 3 of 4 in the Specialization Algorithms

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim RoughgardenProfessor

Computer Science

So this algorithm will prove the correctness of Kruskal's minimum cost spanning tree algorithm. So to prove this correctness theorem, let's fix an arbitrary connected input graph G. And let's let T star denote the output of Kruskal's algorithm when we invoke it on this input graph. So, just like with our high level proof plan for Prim's algorithm, we're going to proceed in three steps. We're first going to just establish the more modest goal, the Kruskal's algorithm outputs a spanning tree. We're not going to make any initial claims about optimality. So those are the first two steps, one to argue there's no cycles, one to argue that the output's connected. And then, in the third step, relying on the cut property. We'll say, it's not just a spanning tree, it's actually the minimum cost spanning tree. For this proof, I am going to assume you're familiar with the machinery we developed to, to prove Prim's algorithm is correct. So the first step of the proof is going to be very easy. So one important property of a Spanish [INAUDIBLE] there's no cycles and it's quite obvious looking at the pseudocode of Kruskal's algorithm that its not going to produce any cycles. Every edge that creates a cycle is explicitly excluded from the output. What's a little less obvious is that as long as the, input graph is connected, then Kruskal's algorithm will output a connected sub-graph. And therefore a spanning tree. So to argue the output's connected, we're going to have two sub-parts of the proof. The first thing I want you to do is recall what we call the empty cut lemma. So this was a way of talking about when a graph is disconnected or equivalently when a graph is connected. And you'll recall a graph is connected if and only if for every cut there's at least one crossing edge. So to prove this second step of this proof that T star is connected we just have to prove that for every single cut it had at least one edge crossing that cut.

So, that's what we're going to show. To get started let's fix an arbitrary cut A, B. Using the assumption the input graph is connected, it certainly contains at least one edge that crosses this cut. The question is whether or not T star contains a crossing edge but certainly the original graph G contains a crossing edge. So, here's the key point of the argument. Kruskal's algorithm, by definition, it makes a single scan through all of the edges. That is, it considers every edge of the original input graph exactly once. So, what I want you to do is, I want you to think about this cut A, B which has at least one edge of G crossing. And I want you to fast forward Kruskals algorithm until the moment that it first considers an edge crossing this cut AB. The key claim is that this first edge seen by Kruskal's algorithm, is definitely be, going to be included in the final output, T star. So why is this true? Well, let me remind you of the lonely cut corollary. The lonely cut corollary says that if an edge is the sole edge crossing a cut, if it's lonely in a cut, then it cannot participate in any cycle, right? If it was in a cycle, that cycle would loop back around and cross the cut a second time. Now, what does that have to do with the picture here? Well, if this is the first edge that Kruskal sees crossing this cut, then certainly the tree so far T star contains nothing crossing this cut, right? It hasn't even seen anything crossing this cut yet. So at the moment it encounters the first edge, there's no way including that first edge will create a cycle because that first edge is going to be lonely in this cut AB. So again summarising, the first edge crossing a cut is guaranteed to be chosen by a cross because the algorithmic cannot create a cycle. That's why there's at least one edge of Kruskal's output crossing this particular cut AB. Since AB was arbitrary, all edges, all cuts have some edge of T star crossing them, that's why T star is connected. So this completes the part of the proof where we argue that Kruskal's algorithm outputs a spanning tree. Now we have to move on and argue that it's actually a minimum cost spanning tree. We're going to argue this in the same way as we did for Prim's algorithm. We're going to argue that every edge ever selected by Kruskal's algorithm is justified by the cut property. Satisfies the hypotheses of the cut property. Recall from our correctness proof for Prim's Algorithm, this is enough to argue correctness. That the output is a minimum cost spanning tree, right? An edge justified by the cut properties, not a mistake. It's got to belong to the minimum cost spanning tree. If you have an algorithm that outputs a spanning tree, and it never made a mistake, that's got to be the minimum cost spanning tree. And that's going to be the case for Kruskal. Now this statement was easy to prove for Prim's algorithm and that's because the definition of Prim's algorithm, it selects edges based on being the cheapest in some cut. So it's tailor made for the application of the cut property. Not so for Kruskal's algorithm. If you look at the pseudocode, nowhere does the pseudocode discuss taking cheap edges across cuts. So we have to show that Kruskal's algorithm in effect is inadvertently at every edge picking the cheapest edge crossing some cut. And we actually have to exhibit what that cut is in our proof of correctness. So, that's what we have to do here. So to argue that, let's just sort of freeze Kruskal's algorithm at some arbitrary iteration, where it adds a new edge. We need to show that this edge is justified by the cut property. So, maybe this edge has end points U and V, and let's let capital T denote the current value of the edges selected by the algorithm so far. So let's think about what this state of the world looks like, at a, sort of, intermediate iteration of Kruskal's algorithm. So Kruskal's algorithm maintains the invariant there's no cycles but remember it doesn't maintain any invariant of the current edges forming a connected set so in general in an intermediate iteration of Kruskal's algorithm, you've got a bunch of pieces, a bunch of little mini trees floating around the graph. Connect the components if you like, with respect to the current edge shed, capital T. And there can be some, you know, isolated vertices floating around as well. What more can we say? Well, if the current edge has endpoints U and V and Kruskal is deciding to add this edge UV to the current set capital T, we certainly know that capital T has no path between U and V, right? If it did have a path, then this new edge would create a cycle, then Kruskal would skip the edge. Since it isn't skipping the edge, U and V are currently in different pieces. They're in different components with respect to the current edge set. Now remember, ultimately, if we're going to apply the cut property, we have to somehow exhibit some cut from somewhere, justifying this particular edge. So here's where we get the cut from, and it's going to be a very similar argument to when we proved the empty cut limit characterizing disconnectedness in terms of empty cuts. We're going to say look, with respect to the tree edges we have so far, with respect to the green edges, there's no path from U to V. That means we can find a cut such that U is on one side, V is on the other side, and there's no edges crossing the cut. So, here's the cool part of the proof. And this is also the part where we actually use the fact that Kruskal was a greedy algorithm. That it considers the edges from cheapest to most expensive. Notice that we haven't actually used that fact up to this point and we better use that fact. So, the claim is that not only is this edge we're adding UV, not only does it cross this cut AB, but actually it's the cheapest edge crossing the cut. Nothing from the original input graph G that crosses it can be cheaper. So why is that true? Well, let's remember how we wrapped up the previous slide when we were arguing that the output of Kruskal's algorithm is connected. What did we argue? We said, well, fix any cut you like. Any cut of the graph. And freeze Kruskal's algorithm the very first time it considers some edge crossing that cut. We noticed that Kruskal's algorithm is guaranteed to include that first edge crossing the cut in its final solution. There's no way that, that first edge considered can create a cycle with respect to the edges already chosen. So, it's not going to trigger the exclusion criterion in Kruskal's algorithm. And that edge is going to get picked. So what's the significance about being the first edge crossing a cut well because of the gradient criterian because Kruskal considers edges from cheapest and most expensive. The first edge it encounters crossing a cut is also necessarily the cheapest edge that's crossing the cut. So here's how we weave these different strands together to complete the correctness proof. Alright, so let's remember where we are. We're focusing on a single iteration of Kruskal's algorithm. It's about to add this edge U, V to the edges, capital T, that it's chosen in the past. We've exhibited a cut. A, B with a property that no previously chosen edges, no edges of capital T cross this cut, and UV is going to be the first chosen edge crossing this cut. We just argued, that Kruskal's algorithm is guaranteed to pick the first edge crossing a cut. So by virtue of their not yet being any chosen edges, crossing the cut AB, it must be the case that this current edge UV is the first one that's ever been seen by Kruskal's algorithm that crosses this cut AB. Now, in being the first edge it's ever seen crossing this cut. It must also be the cheapest edge of the input graph that crosses this cut. That is the hypothesis of the cut property. That is why this edge UV and this current arbitrary iteration is justified by the cut property.

Coursera provides universal access to the world’s best education, partnering with top universities and organizations to offer courses online.