So at this point we understand, Prim's algorithm and we also know why it's

correct. That is why it always computes the

minimum cost spanning tree of a graph. So in this video, we're going to turn to

implementation issues and running time analysis.

We'll begin by just analyzing the straightforward implementation of Prim's

algorithm. That's already reasonable.

It's polynomial running time, but not especially close to linear.

Then we'll see how a suitable deployment of heaps very much in the way that we did

for Dijkstra's algorithm leads to a blazingly fast, near linear running time.

So let's briefly review the pseudocode for Prim's algorithm.

Recall that Prim grows a tree one edge at a time spanning one new vertex at each

iteration. So it maintains two sets,

capital X, a set of vertices that have spanned so far,

and capital T, these are the edges we've committed to thus far.

They start out by just being some arbitrary vertex, little s, and the empty

set, and in each iteration of this main while-loop, we add one new edge to the

tree. And whatever new vertex that edge spans,

we add that to capital X. The algorithm terminates when we're

spanning all of the vertices and as we've seen, it halts not just with a spanning

tree but with a minimum cost spanning tree.

So suppose we just literally implemented this algorithm as is, what would be the

running time? Well, the initialization stick, step

takes only constant time, so let's ignore that.

So let me just have this one loop. So let's just ask how many iterations

does the loop take and how much time is needed to execute each iteration?

Well, the number of loop iterations is going to be exactly n - 1, so, where n is

the number of vertices. X starts out with one vertex and

terminates when it has all n vertices. How much work do we need to implement

each iteration? Well, essentially, what each iteration is

doing is a brute-force search through the edges.

It's looking for the edges that have one endpoint inside X and one endpoint

outside, and amongst those, it just remembers the

cheapest. And it's easy to see that you could

implement each iteration in O of m time, where M is the number of edges.

For example, you can just, with each vertex associate a Boolean variable that

keeps track of whether or not it's in this capital X, that way when you see an

edge, you know whether it's crossing the frontier or not in constant time.

So putting it together, O of m iterations with O of m works for each gives us a

running time of O of m times n. So this running time is already nothing

to sneeze at. As we discussed, graphs can have an

exponential number of spanning trees. So, this algorithm is doing far less work

than examining each of these spanning trees.

It's homing in in polynomial time, to the minimum cost point.

So that's pretty cool. But remember the mantra of any algorithm

designer worth their salts, confronted with a solution, you should

always ask but can we do better? And can we do better than running time O

of m times n? We can as we'll see in the rest of this

video. The big idea for speeding up Prim's

Algorithm is exactly the big idea we used in part 1 to speed up Dijkstra's

algorithm, namely we're going to deploy a suitable

data structure. So, what data structure seems like it

might be a good idea for making Prim run faster?

Well, what's happening in the main workhorse while-loop of Prim's algorithm?

Over and over again, we keep meaning to do a minimum computation amongst all

edges crossing the frontier, we need to find the cheapest one.

So, the question we should ask ourselves is what kind of data structure would

facilitate, would speed-up repeated minimum computations.

And if you recall from part 1, we have a data structure where that's exactly what

it's raison d'etre is, the heap, the meaning of life for a heap is to

speed-up repeated minimum computations, just like in Prim's algorithm.

So let me just remind you briefly, what are the operations exported by heap data

structure and what is the running time? So first recall that a heap contains a

bunch of objects, and each of those objects should have some key value from

some totally ordered set, like a number, like for example, an edge cost.

So what can you do with a heap? Well, the salient operations for our

purposes today are, first of all, you can insert new stuff into the heap with

their, whatever their key value is. You can extract the object with the

minimum key value. And you can also delete stuff from the

middle of the heap. And all of these can be done in

logarithmic time, logarithmic in a number of objects stored by the heap.

So it's not going to be important for us today to know how heaps are implemented

and what they look like under the hood. We're just going to be clients of them.

We're just going to make use of these operations and the fact that they run in

logarithmic time. But you know, just for those of you who

are curious, and/or want to have your memory jogged. Under the hood, heaps are

implemented logically as complete binary tree.

They're actually laid out in an array, but you sort of think of them

conceptually as being in a complete binary tree. And they, they, they satisfy

what's called the heap property. And the heap property is to make sure that you

know where the object with the minimum key value is.

So the actual definition is, every parent should have a key value which is less

than that of its children. So as you go up the tree, the key value

can only drop and that means you know where the minimum is got to be.

It's got to be at the root of this tree orr the front of the array.

So that's great. That's how you locate the minimum so

quickly in a heap. Now, what do you do when you want to

extract the minimum? So you rip off the root of this tree,

and now, you have to rearrange the tree to restore the heap property.

So you swap the last leaf up to where the root was, you bubble-down as needed to

restore the heap property. how do you insert?

You put the new object as the new last leaf and you bubble it up as needed to

restore the heap property. To delete from the middle of a heap, you

just sort of rip it out and then bubble things up or down as necessary to restore

the heap property. Again, that's not supposed to, if you're

hearing this for the first time, I know this is too fast,

this is just to jog your memory for those of you who already learned this in part 1

of the course. For more details, you can go review the

appropriate videos there. So now that I've reminded you about the

salient properties of heaps. Let's return to the question of how do we deploy them

cleverly to speed-up Prim's algorithm. So our intuition is that because we're

doing repeated minimum computations in Prim's algorithm, each time that it's

while-looped, compute the cheapest edge cross in your frontier, that's sort of in

the wheelhouse of heaps. So how should we use heaps? Well, the

first idea, which is a pretty good idea, is to use the heap to store edges, right?

Because our minimum computation should result in us choosing an edge, so when we

EXTRACT-MIN from a heap, we want it to hand us an edge on a silver platter. So

it would seem this would be your first thought,

that the heap should store edges and that the key value that you use should just be

the edge cost, because you want to find the cheapest edge.

So this already a quite good idea using heaps in this manner.

We'll already definitely speed-up Prim's algorithm relative to the naive

implementation. And in fact.

and I'll leave this as an exercise for you to work out.

using heaps in this way results in an implementation that has, that runs in

time big O of m log n. What I'm going to show you instead is not

that implementation, but an even cleverer implementation of Prim using heaps.

We're not going to see a benefit in the asymptotic running time.

This more sophisticated version will also give us m log n running time, but it

would give you better constants and it is the version you would want to implement

in practice. [SOUND] So, the one slightly tricky point

in this exercise is remembering Prim's algorithm, you don't just want the

cheapest edge overall [INAUDIBLE] You want the cheapest edge which crosses the

current cut that has one endpoint in each of x and v - x.

And, when you use heaps in this way, it might hand you in a silver platter and

edge which is cheap, but isn't necessarily crossing the frontier.

So, you need some extra checks to ensure that you're always finding the minimum

edge and that that edge crosses the frontier between x and v - x.

So I'll leave it to you to work out the details of this implementation in the

privacy of your own home. What I want to spend our time together on

instead is this somewhat more sophisticated, more practical way to use

heaps. And for those of you who remember our

fast implementation of Dijkstra, this will be very familiar to you.

It will be the same kinds of ideas that we used for Dijkstra,

and the keypoint is, instead of using the heap to store edges, were going to use it

to store vertices. So, in a bit more detail, our plan is

going to be to maintain two invariants. The first invariant is going to describe

what the heap contains. The second invariant is going to be what

the key values of those heap object are. So as I said, we're now going to be

starting at vertices in the heap, not edges.

Which vertices? Exactly the vertices that we don't yet

span. The vertices of v - x.