Consider the following graph with five vertices; I'll label the edges with their costs in blue. Let's step through the values of i. Since there are five vertices, i is going to take on the values 0, 1, 2, 3, 4, and let's look at the results of each round of subproblem computations. In the base case, when i equals 0, the distance from s to itself is 0, as always, and for every other vertex the subproblem value is defined as plus infinity. Let me write down the recurrence again, just in case you've forgotten it. Now we enter the main for loop, starting with i equal to 1. We go through the vertices in arbitrary order and evaluate the recurrence. Node s just inherits its solution from the previous round; it's still happy with the empty path of total length 0. Node v is certainly hoping not to inherit its plus-infinity solution from the round where i equals 0, and indeed, when i equals 1, the subproblem solution for the vertex v is going to be 2. That's because we can choose the last hop to be the edge (s, v), which has length 2, and s's subproblem value from the previous iteration, when i equals 0, is 0. By the same reasoning, x's new subproblem value is going to be 4, because we can choose the last hop to be (s, x) and add the cost of that edge, 4, to s's subproblem value from the previous iteration. Now, the nodes w and t would love to throw out their plus-infinity solutions and get something finite, and you might be thinking that, because v and x now have finite distances, those will propagate to the nodes w and t. That will indeed happen, but we're going to have to wait until the next iteration, when i equals 2. The reason is that, if you look at the code, if you look at the recurrence, when we compute the subproblem values in a given iteration i, we only make use of the subproblem solutions from the previous iteration, i minus 1.
We do not use any of the updates that have already happened in the current iteration i. So because a of (0, v) and a of (0, x) were both plus infinity when i equals 0, we're stuck with a of (1, w) and a of (1, t) both being plus infinity as well. Now let's move on to the next iteration of the outer for loop, when i equals 2. The subproblem solution at s is not going to change; you're not going to do better than 0, so it stays that way. Similarly, at the vertex v you're not going to do better than 2, so that stays the same in this iteration. Something interesting happens at the vertex x, however. In the recurrence you certainly have the option of inheriting the previous solution, so one option would be to set a of (2, x) equal to 4, but there's actually a better choice: the recurrence comes out even smaller if we choose the last hop to be the unit-cost arc (v, x). We add that unit cost to v's subproblem value from the previous iteration, when i equals 1, which was 2; 2 plus 1 equals 3. So that's the new subproblem value at x in this iteration, where i equals 2. And as advertised, the updates to the vertices v and x from the iteration where i equals 1 now, when i equals 2, get propagated to the vertices w and t. So w and t shed their plus-infinity values and instead pick up the values 4 and 8, respectively. Notice that I've labeled the vertex t with an 8, not with a 7: I've computed a of (2, t) to be 8. The reason is, again, that updates from this same iteration, in particular the fact that x has dropped from 4 to 3, are not reflected at other nodes within the same iteration. We have to wait until the next iteration of the outer for loop before that happens. So we're using the stale information at x, namely that when i equals 1 its solution value is 4; that's the information we use to update t's solution value, so it's 4 plus 4, or 8. So in the penultimate iteration, when i equals 3, most things stay the same.
At s and v, and at x and w, we've already computed the shortest paths, so they all just inherit their solutions from the previous iteration. But the vertex t now makes use of the improved solution value at the vertex x from the iteration where i equals 2: its 8 gets updated to a 7, reflecting the improvement at x in the previous iteration. At this point we're actually done, we've computed shortest paths to all destinations, but the algorithm doesn't know that we're done yet. So it still executes the final iteration of the outer for loop, where i equals 4, but everybody just inherits their solution from the previous round, and at that point the algorithm terminates. For most of the dynamic programming algorithms we've discussed, the running time analysis has been trivial. The Bellman-Ford algorithm is a little more interesting from a running-time-analysis perspective; please think about it in this quiz. The correct answer is B; that is, of all these running time bounds, it's the smallest bound that is actually correct. Let me explain why it's the number of edges times the number of vertices, and also comment on some of the other options. Answer A, quadratic, n squared: this is the number of subproblems. Subproblems are indexed by a choice of i, which is somewhere between 0 and n minus 1, and a choice of a destination v; there are n choices for each of those, so exactly n squared subproblems. If we only ever spent constant time evaluating a subproblem, then indeed the running time of Bellman-Ford would be big O of n squared. And in most dynamic programming algorithms we've discussed in this class, you do indeed spend only constant time solving each subproblem, the one exception being the optimal binary search trees problem, where in general we spent linear time. Here again, like optimal binary search trees, we might spend more than constant time solving a subproblem.
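To make the walkthrough above concrete, here is a short Python sketch that computes the full table of subproblem values a of (i, v) on the example graph. The edge costs here are my reconstruction from the numbers in the walkthrough (s→v cost 2, s→x cost 4, v→x cost 1, v→w cost 2, x→t cost 4), so treat the graph itself as an assumption; the recurrence is the one from the lecture, and note how each round reads only the previous round's values.

```python
from math import inf

# Reconstruction of the five-vertex example (edge costs are assumptions
# inferred from the values computed in the walkthrough):
# s->v (2), s->x (4), v->x (1), v->w (2), x->t (4).
in_edges = {  # in_edges[v] = list of (u, cost) pairs for arcs u -> v
    's': [],
    'v': [('s', 2)],
    'x': [('s', 4), ('v', 1)],
    'w': [('v', 2)],
    't': [('x', 4)],
}

def bellman_ford_table(in_edges, source):
    """Return the DP table A, where A[i][v] is the length of a shortest
    source->v path that uses at most i edges."""
    n = len(in_edges)
    # Base case i = 0: 0 at the source, +infinity everywhere else.
    A = [{v: (0 if v == source else inf) for v in in_edges}]
    for i in range(1, n):  # i = 1, ..., n-1
        prev = A[-1]
        # Each round uses ONLY the (i-1)-round values, never the fresh
        # updates made in the current round -- exactly as in the lecture.
        A.append({v: min([prev[v]] + [prev[u] + c for u, c in in_edges[v]])
                  for v in in_edges})
    return A

A = bellman_ford_table(in_edges, 's')
for i, row in enumerate(A):
    print(i, row)
# At i = 2, t is 8 (it uses x's stale value 4); at i = 3, t drops to 7,
# and the i = 4 row is identical to the i = 3 row.
```

Running this reproduces the table from the lecture, including t's temporary value of 8 at i equals 2 and its final value of 7 at i equals 3.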
The reason is that we have to do a brute-force search through a list of candidates whose size might be super-constant. Each arc that comes into the destination v provides one candidate for what the correct solution to this subproblem might be. So remember, the number of candidates, we had a quiz on this, is proportional to the in-degree of the vertex v, and that can be as large as n minus 1, linear in the number of vertices. That's why the running time of the Bellman-Ford algorithm can, in general, be worse than n squared. Another interesting answer is C, that the running time is cubic in n. Big O of n cubed is indeed a valid upper bound on the running time of the Bellman-Ford algorithm, but it's not the sharpest possible upper bound. Why is it a valid upper bound? As discussed just now, there's a quadratic, n squared, number of subproblems; how much work do you do per subproblem? It's proportional to the in-degree of a vertex, and the biggest possible in-degree is n minus 1, big O of n. So, linear work for each of the quadratic number of subproblems, resulting in cubic running time. There is, however, a tighter, better analysis of the Bellman-Ford algorithm. Now, why is big O of mn better than n cubed? Well, what is m? In a sparse graph, m is going to be theta of n, and in a dense graph, m is going to be theta of n squared. So if it's a dense graph, it's true that big O of mn is no smaller than n cubed; but if it's not a dense graph, then this really is an improved upper bound. So why does the bound hold? Think about the total amount of work done over all of the subproblems in the following way: take the amount of work done in a given iteration of the outer for loop and multiply it by n, the number of iterations of the outer for loop. How much work do we do in a given iteration, for a given choice of i? Well, it's just the sum of the in-degrees.
When we consider the vertex v, we do work proportional to its in-degree, and we consider each vertex v once in a given iteration of the outer for loop. But we know a much simpler expression for the sum of the in-degrees: this sum is simply equal to m, the number of edges. In any directed graph, the number of edges is exactly equal to the sum of the in-degrees. One easy way to see that: take your favorite directed graph and imagine you plug the edges into the graph one at a time, starting from the empty set of edges. Each time you plug in a new edge, obviously the number of edges in the graph goes up by one; and also, the in-degree of exactly one vertex goes up by one, namely whichever vertex happens to be the head of the edge you just stuck in. So therefore, the sum of the in-degrees and the number of edges are always the same, no matter what the directed graph is. That's why the total work is better than n cubed; it's actually big O of m times n. A number of optimizations of the basic Bellman-Ford algorithm are possible; let me wrap up this video with a quick one about stopping early. See also a separate video about a less trivial space optimization of the algorithm. In the basic version of the algorithm, the outer for loop runs for n minus 1 iterations, but generally you don't need all of them. We already saw that in our simple example, where the final iteration did no useful work; it just inherited the solutions from the previous iteration. So in general, suppose that in some iteration earlier than the last one, say with index value j, it just so happens that nothing changes: for every destination v, you just reuse the optimal solution that you computed in the previous iteration of the outer for loop. Well then, if you think about it, what's going to happen in the next iteration? You're going to do exactly the same set of computations, with exactly the same set of inputs, so you're going to get exactly the same set of results.
That is, it will again be true in the next iteration that you just inherit the optimal solutions from the previous one, and that will keep happening over and over again until the end of time. So in particular, by the time you get to iteration n minus 1 of the outer for loop, you will have exactly the same set of solution values as you do right now. We've already proven that the results at the end of iteration n minus 1 are correct; those are the real shortest-path distances. So if you already have them in your grubby little hands now, well, you may as well abort the algorithm and return those as the final, correct shortest-path distances.
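The stopping-early optimization can be sketched in a few lines of Python. This is a hypothetical rendering, not the lecture's exact pseudocode: after each full iteration we compare the new values against the old ones, and if nothing changed we abort. The example graph's edge costs are again my reconstruction from the walkthrough.

```python
from math import inf

def bellman_ford_early_stop(in_edges, source):
    """Single-source shortest paths via Bellman-Ford, aborting as soon as
    a full iteration of the outer loop changes nothing.
    in_edges[v] lists (u, cost) pairs for arcs u -> v."""
    n = len(in_edges)
    dist = {v: (0 if v == source else inf) for v in in_edges}
    for _ in range(n - 1):  # at most n-1 iterations of the outer loop
        new = {v: min([dist[v]] + [dist[u] + c for u, c in in_edges[v]])
               for v in in_edges}
        if new == dist:  # nothing changed, so every future round repeats too
            break        # safe to stop early and return what we have
        dist = new
    return dist

# The five-vertex example (edge costs assumed from the walkthrough):
graph = {'s': [], 'v': [('s', 2)], 'x': [('s', 4), ('v', 1)],
         'w': [('v', 2)], 't': [('x', 4)]}
print(bellman_ford_early_stop(graph, 's'))
# {'s': 0, 'v': 2, 'x': 3, 'w': 4, 't': 7}
```

On this example the loop does useful work for three iterations, detects no change in the fourth, and stops, matching the observation that the final round of the basic algorithm was wasted.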