The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

248 ratings

Stanford University

248 ratings

Course 3 of 4 in the Specialization Algorithms

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 1

Two motivating applications; selected review; introduction to greedy algorithms; a scheduling application; Prim's MST algorithm.

- Tim RoughgardenProfessor

Computer Science

The designer analysis of algorithms is the interplay between on the one hand general principles and on the other hand its stantiations of those principles to solve specific problems. While there's no silver bullet in algorithm design, no one technique which solves every computational problem that's ever going to come up. There are general design principles which have proven useful over and over again over the decades for solving problems that arise in different application domains. Those, of course, are the principles that we focus on in this class. For example, in part one we studied the divide and conquer algorithm design paradigm, principles of graph search amongst others. On the other hand, we study specific substantiations of these techniques. So in part one, we studied divide and conquer and how it applies to say, Strassen matrix multiplication, merge short and quicksort. In graph search, we culminated with the rightfully famous Dijkstra's algorithm for computing shortest paths. This, of course, is useful not just, because as any card-carrying computer scientist or programmer, you want to know about what these algorithms are and what they do, but it also gives us a toolbox, a suite of four free primitives, which we can apply to our own computational problems as a building block in some larger program. Part two of the course will continue this narrative. We'll learn very general algorithm paradigms. Like greedy algorithms, dynamic programming algorithms and many applications, including a number of algorithms for the greatest hits compilation. And in this video and the next, I want to whet your appetite for what's to come, by plucking out two of the applications that we'll study in detail later in the course. Specifically, in the dynamic programming section of the course. First of all, for both of these problems, I think their importance is self evident. I don't think I'll have to really discuss why these are interesting problems. Why, in some sense, we really need to solve these two problems. Secondly, these are quite tricky computational problems. And I would expect that most of you do not currently know good algorithms for these problems and it would be challenging to design one. Third, by the end of this class you will know efficient algorithms for both of these problems. In fact, you'll know something much better. You'll know general algorithm design techniques which solve as a special case these two problems and have the potential to solve problems coming up in your own projects as well. And one comment before we get started on these two videos. They're both at a higher level than most of the class, by which I mean there won't be any equations or math. There won't be any concrete pseudo-code, and I'll be glossing over lots of details. The point is just to convey the spirit of what we're going to be studying, and to illustrate the range of applications of the techniques that we're going to learn. So what I want to talk about first is distributed shortest path routing and why it is fundamental to how the internet works. So let me begin with a kind of non very mathematical claim. I claim that we can usefully think of the internet as a graph, as a collection of verticies and a collection of edges. So this is clearly an, clearly an ambiguous statement. There's many things I might mean as we'll discuss. But here's the primary interpretation I want you to have for this particular video. So to specify this, the vertices I intend to be the end-hosts and the routers of the internet. So machines that generate traffic, machines that consume traffic, and machines that help traffic get from one place to another. So the edges are going to be directed and they are meant to represent physical or wireless connections indicating that one machine can talk directly to another one by either a physical link between the two or a direct wireless connection. So it's common that you'll have edges in both directions, so that if machine A can talk to machine B directly, then also machine B can talk directly to machine A, but you definitely want to allow the possibility of asymmetric communication. So, for example, imagine I send an email from my Stanford account to one of my old mentors at Cornell, where I did my graduate studies. So this piece of data, this email, has to somehow migrate from my machine local at Stanford to my mentor's machine over at Cornell. So how does that actually happen? Well, initially there's a phase of, sort of local transportation, so, this piece of data has to get from my local machine to a place within the Stanford network that can talk to the rest of the world. Just like if I was trying to travel to Cornell, I would have to first use local transportation to get to San Francisco airport and only from there could I take an airplane. So this machine from which data can escape from the Stanford network to the outside world is called the gateway router. The Stanford gateway router passes it on to a networks, whose job is to cross the country. So last I checked, the commercial internet service provider of Stanford was Cogent so they, of course, have their own gateway router which can talk to the Stanford one and vice versa. And of course, these two nodes and the edges between them are just this tiny, tiny, tiny piece embedded in this massive graph, comprising all the end hosts and routers of the internet. So that's the main version of a graph that we're going to talk about in this video, but let me just pause to mention a couple of other graphs that are related to the internet, and quite interesting in their own right. So one graph that is generated an enormous amount of an interest in study is the graph induces by the web. So here, the vertices are going to represent web pages and the edges which is certainly directed represent hyperlinks. Not one web page points to another one. So for example, my homepage is one node in this massive, massive graph. And as you might expect, there is a link from my home page to the course page for this class. It is of course essential to use directed edges to faithfully model the web. There is for example, no directed edge from this courses homepage to my own homepage at Stanford. So the web really exploded around, you know, mid 90's, late 90's, So for the past 15 plus years, there's been lots of research, about the web graph. I'm sure you won't be surprised to hear that, you know, around the middle of the last decade, people got extremely excited about properties of social networks. Those, of course, can also be fruitfully thought of as graphs. Here, the vertices are going to be people, and the lengths are going to denote relationships. So, for example, friend relationships and Facebook or the following relationship on Twitter.

So notice the different social networks may correspond to undirected or directed graphs. Facebook for example corresponding to an undirected graph, Twitter corresponding to a directed graph. So let's now return to the first interpretation I wanted to focus on, that where the vertices are in-hosts and routers and it does represent direct physical or wireless connections indicating that two machines can talk directly to each other. So going back to that graph, let's go back to the story where I'm sending an email to somebody at Cornell. And this data has to travel from my local machine to some local machine at Cornell. So, in particular, this piece of data has to get from the Stanford gateway router, in effect to the airport for Stanford's network to the Cornell gateway router. So there will be landing airport over on Cornell's side. Now it's not easy to figure out exactly out what the structure of the routes between Stanford and Cornell look like. But one thing I can promise you is that there is not a direct physical link between the Stanford gateway router and the Cornell gateway router. Any route between the two is going to comprise multiple hops. It will have intermediate stops. And there's not going to be a unique such route. So if you have the choice between taking one route which stops in Houston and then Atlanta and then in Washington D.C., how would you compare that to one which stops in Salt Lake City and Chicago. Well hopefully your first instinct and a perfectly good idea is all else being equal, prefer the path that is in some sense the shortest. Now in this context, shortest could mean many things, and it's interesting to think about different definitions. But for simplicity let's just focus on the fewest number of hops, equivalently the fewest number of intermediate stops. Well, if we want to actually execute this idea, we clearly need an algorithm that given a source and destination computes the shortest path between the two. So hopefully you feel well equipped to discuss this problem, because one of the highlights of part one of this class, was the discussion of Dijkstra's shortest pathalum rhythm in a blazing fast of limitation using heaps that run's in almost linear time. We did mention one caveat when we discussed Dijkstra's algorithm mainly that it requires all edge lengths to be non negative but in the context of internet routing almost any edge metric you'd imagine using will satisfy this non negativity assumption. There is, however, a serious issue with trying to apply Dijkstra's shortest path algorithm off the shelf to solve this distributed internet routing problem, and the issue was caused by the just massive, distributed scale of modern day internet. Probably back in the 1960's, when you had the 12-note ARPANET, you could get away with running Dijkstra's shortest path algorithm, but not in the twenty-first century. It's not feasible for the Stanford gateway router to maintain locally a reasonably accurate model of the entire internet graph. So how can we elude this issue? Is it fundamental that because the internet is so massive, it's impossible to run any shortest path algorithm? Well, the ray of hope would be if we could have a shortest path algorithm that admitted a distributed implementation. Whereby, a node could just interact, perhaps iteratively, with its neighbors with the machines to which its directly connected. And yet, somehow converge to having accurate shortest paths to all of the destinations. So perhaps, the first thing you'd try would be to seek out an implementation of Dijkstra's algorithm, where each vertex uses only local computation. That seems hard to do. If you look at the pseudo-code of Dijkstra, it just doesn't seem like a localizable algorithm. So instead, what we're going to do is we're going to learn a different shortest path algorithm. It's also a classic. Definitely on the greatest hits compilation. It's called the Bellman-Ford Algorithm. So the Bellman-Ford algorithm, as you'll see, can be thought of as a dynamic programming algorithm. And indeed, it correctly computes shortest path using only local computation. Each vertex only communicates in rounds with the other vertices to which it's directly connected. As a bonus, we'll see this algorithm also handles negative edge lengths. Which of course, Dijkstra's algorithm was not. But don't think Dijkstra's algorithm is obsolete. It still has faster running time in situations where you can get away with centralized computation. Now, it was really kind of amazing here is that the Bellman-Ford algorithm, it dates back to the 1950's. So, that's not just pre-internet, that pre-ARPANET. So that's before the internet was even a glimmer in anybody's eye. And yet, it really is the foundation for modern internet routing protocol's. Needless to say, there's a lot of really hard engineering work and further ideas required to translate the concept from Bellman-Ford to actually doing routing in the very complex modern day internet. But yet, those protocol's at their foundation, goes all the way back to the Bellman-Ford algorithm.

Coursera provides universal access to the world’s best education, partnering with top universities and organizations to offer courses online.