0:00

Having completed our first analysis of an algorithm, namely an upper bound on the

running time of the Merge Short algorithm. What I wanna do next is take a step back,

and be explicit about three assumptions, three biases that we made when we did this

analysis of Merge Short, and interpreted the results. These three assumptions we

will adopt as guiding principles for how to reason about algorithms, and how to

define a so called fast algorithm for the rest of the course. So, the first guiding

principle is that we used what's often called worst case analysis. By worst case.

Analysis, I simply mean that our upper bound of six N log N plus six N. Applies

to the number of lines of executed for every single input array of length end. We

made absolutely no assumptions about the input, where it comes from, what it looks

like beyond what the input length N was. Put differently, if, hypothetically, we

had some adversary whose sole purpose in life was to concoct some malevolent input

designed to make our algorithm run as slow as possible. The worst this adversary

could do, is. Upper bounded by this same number, 6N log N + 6N. Now, this, so,

sort of worst case guarantee popped out so naturally from our analysis of Merge

Short, you might well be wondering, what else could you do? Well, two other methods

of analysis, which do have their place, although we won't really dicuss them in

this course, are quote unquote, average case analysis. And also the use of a set

of prespecified benchmarks. By average case analysis, I mean, you analyze the

average running time of an algorithm under some assumption about the relative

frequencies of different inputs. So, for example, in the sorting problem, one thing

you could do, although it's not what we do here. You could assume that every possible

input array is equally unlikely, and then analyze the average running time of

an algorithm. By benchmarks, I just mean that one agrees up front about some set,

say ten or twenty, Benchmark inputs, which are thought to represent practical or

typical inputs for the algorithm. Now, both average-case analysis and benchmarks

are useful in certain settings, but for them to make sense, you really have to

have domain knowledge about your problem. You need to have some understanding of

what inputs are more common than others, what inputs better represent typical

inputs than others. By contrast, in worst-case analysis, by definition you're

making absolutely no assumptions about where the input comes from. So, as a

result, worst-case analysis is particularly appropriate for

general-purpose sub-routines. Sub-routines that you design. Find without having any

knowledge of how they will be used or what kind of inputs they will be used on. And

happily, another bonus of doing worst case analysis, as we will in this course, it's

usually mathematically much more attractable than trying to analyze the

average performance of an algorithm under some distribution over inputs. Or to

understand the detailed behavior of an algorithm on a particular set of benchmark

inputs. This mathemetical tractabilty was reflected in our Merge Sort analysis,

where we had no a priori goal of analyzing the worst case, per se. But it's

naturally what popped out of our reasoning about the algorithm's running time. The

second and third guiding principles are closely related. The second one is that,

in this course, when we analyze algorithms, we won't worry unduly about

small constant factors or lower order terms. We saw this philosophy at work very

early on in our analysis of merge sort. When we discussed the number of lines of

code that the merge subroutine requires. We first upper-bounded it by 4m plus two,

for an array of length m, and then we said, eh, let's just think about it as 6m

instead. Let's have a simpler, sloppy upper-bound and work with that. So, that

was already an example of not worrying about small changes in the constant

factor. Now, the question you should be wondering about is, why do we do this, and

can we really get away with it? So let me tell you about the justifications for this

guiding principle. So the first motivation is clear, and we used it already in our

merge short analysis. Which is simply way easier mathematically, if we don't have

to, precisely pin down what the [inaudible] constant factors and lower-order terms are.

The second justification is a little less obvious, but is extremely important. So, I

claim that, given the level at which we're describing and analyzing algorithms in

this course, it would be totally inappropriate to obsess unduly about

exactly what the constant factors are. Recall our discussion of the merge

subroutine. So, we wrote that subroutine down in pseudocode, and we gave it

analysis of 4m plus two on the number of lines of code executed, given an input of

length m. We also noted that, it was somewhat ambiguous exactly how many lines

of code we should count it as, depending on how you count loop increments and so on. So

even there it's small constant factors could creep in given the under

specification of the pseudo code. Depending on how that pseudo code gets

translated into an actual program language like C or Java. You'll see the number of

lines of code deviate even further, not but a lot but again by small constant

factors. When such a program is then compiled down into machine code, you'll

see even greater variance depending on the exact processor, the compiler, the

compiler optimizations, the programming implementation, and so on. So to

summarize, because we're going to describe algorithms at a level. That transcends any

particular programming language. It would be inappropriate to specify precise

constants. The precise constants were ultimately determined by more machine

dependent aspects like who the programmer is, what the compiler is, what the

processor is, and so on. And now the third justification is frankly, we're just going

to be able to get away with it. [sound] That is, one might be concerned that

ignoring things like small constant factors leads us astray. That we wind up deriving

results which suggest that an algorithm is fast when it's really slow in practice, or

vice versa. And for the problems we discuss in this course we'll get extremely

accurate predictive power. Even though we won't be keeping track of lower terms and

constant factors. When the mathematical analysis we do suggests that an algorithm

is fast, indeed it will be. When it suggests that it's not fast, indeed that

will be the case. So we lose a little bit of granularity of information. But we

don't lose in what we really care about, which is accurate guidance about what

algorithms are gonna be faster than others. So the first two justifications, I

think, are pretty self evident. This third justification is more of an assertion, but

it's one we'll be baking up over and over again as we proceed through this course.

Now, don't get me wrong. I'm not saying constant factors aren't important in

practice. Obviously, for crucial programs the constant factors are hugely important.

If you're running the sorta crucial loop, you know, your start up's survival depends

on, by all means optimize the constant like crazy. The point is just that

understanding tiny constant factors in the analysis is an inappropriate level of

granularity for the kind of algorithm analysis we're going to be doing in this

course. Okay, lets move on the, the third and final guiding principle. So the third

principle is that we're going to use what's called asymptotic analysis, by

which I mean we will focus on the case of a large input sizes. The performance of an

algorithm as the size N of the input grows large, that is, tends to infinity. Now

this focus on large input size is it was already evident when we interpreted our

bound on Merge Sort. So, how did we describe the bound on Merge Sort? We said,

oh, well, it needs a number of operations proportional, a constant fact or times in

login. And we very cavalierly declared that this was better than any algorithm

which has quadratic dependence of it's running time on the number of operations.

So for example, we argued that merge sort is a better, faster algorithm than

something like insertion sort, without actually discussing the constant factors

at all. So mathematically. We were saying the running time of merge short, which we

know, which we can represent as the function. Six N log base two of N + 6N

is better than any function which has a quadratic dependence on n. Even one with a

small constant like lets say 1/2 N squared which might roughly be the running

time of insertion sort. And this is a mathematical statement that is true if and

only if N is sufficiently large once N grows large it is certainly true that

the expression on the left is smaller than the expression on the right but for small

N the expression on the right is actually going to be smaller because of the smaller

leading term so in saying that merge sort is superior to insertion sort the bias

is that we're focusing on problems with a large N so the question you should have is

that reasonable is that a justified assumption to focus on large input sizes

and the answer is certainly yes. So the reason we focus on large input sizes is

because, frankly, those are the only problems which are even, which are at all

interesting. If all you need to do is sort 100 numbers, use whatever method you want,

and it's gonna happen instantaneously on modern computers. You don't need to know

say, the divide and conquer paradigm, if all you need to do is sort 100 numbers. So

one thing you might be wondering is if, with computers getting faster all the time

according to Moore's Law, if really, it doesn't even matter to think about

algorithmic analysis, if eventually all problem sizes will just be trivially

solvable on super fast computers. But, in fact, the opposite is true. Moore's Law,

with computers getting faster, actually says that our computational ambitions will

naturally grow. We naturally focus on ever larger problem sizes. And the gulf between

an N squared algorithm and an m log n algorithm will become ever wider. A

different way to think about it is in terms of how much bigger a problem size

you can solve. As computers get faster. If you are using an algorithm with a running

time which is proportional to the input size then the computers get faster by a

factor of four then you can solve problems that are factor of four or larger. Whereas

if you are using an algorithm whose running time is proportional to the square

of the input size then a computer gets faster by a factor of four, you can only

solve double the problem size and we'll see even starker examples of this gulf

between different algorithm approaches as the time goes on. So to drive this point

home. Let me show you a couple of graphs. So what we're looking at here, is we're

looking at a graph, of two functions. So the solid function. Is the upper bound

that we proved on merge sort. So this is gonna be 6nlog(base2)n plus 6n. And the

dotted line is an estimate. A rather generous estimate about the running time

of, [inaudible] sort. Namely one-half times N. Squared. And we see here in the

graph exactly the behavior that we discussed earlier, which is that the small

N. Down here. In fact because one-half N. Squared has a smaller leading constant

it's actually a smaller function. And this is true up to this crossing point of maybe

90 or so. Again, beyond n=90. The quadratic growth in the N squared term.

Overwhelms the fact that it had a smaller constant and it starts being bigger than

this other function six of N + six N so in the regime below 90 it's predicting that

the insertion store will be better and in the regime above 90 it's predicting that

merge sort will be faster. Now here's what's interesting let's scale the X axis

let's look well beyond this crossing point of 90 let's just increase it in order of

magnitude up to a raise in size 1500. And I want to emphasize these are still very

small problem sizes. If all you need to do is sort arrays of size 1500 you really don't

need to know Divide-and-conquer or anything else we'll talk about -- that's a

pretty trivial problem on modern computers. [sound]. So what we're seeing

is, that even for very modest problem sizes here, array of, of, size, say 1500.

The quadratic dependence in the insertion sort bound is more than dwarfing the

fact, that it had a lower constant factor. So in this large regime, the gulf between

the two algorithms is growing. And of course, if I increased it another 10X or

100x or 1000x to get to genuinely interesting problem sizes, the gap between

these two algorithms would be even bigger, it would be huge. That said, I'm not

saying you should be completely ignorant of constant factors when you implement

algorithms. It's still good to have a general sense of what these constant

factors are so for example in highly tuned versions of Merge Sort which you'll find

in mny programming libraries. In fact, because of the difference in

constant factors, the algorithm will actually switch from Merge Sort over to

insertion sort, once the problem size drops below some particular threshold, say

seven elements, or something like that. So for small problem sizes, you use the

algorithm with smaller constant factors, and the insertion sort for larger problem

sizes, you use the algorithm and better rate of growth, mainly merge short. So, to

review our first guiding principal is that we're going to pursue worse case analysis.

We're going to look to bounds on the performance, on the running time of an

algorithm which make no domain assumptions, which make no assumptions

about which input of a given length the algorithm is provided. The second guiding

principal is we're not going to focus on constant factors or lower returns, that

would be inappropriate, given the level of granularity at which we're describing

algorithms and third is where going to focus on the rate of growth of algorithms

for large problem sizes. Putting these three principles together, we get a

mathematical definition of a fast algorithm. Namely, we're gonna pursue

algorithms whose worst case running time grows slowly as a function of the input

size. So let me tell you how you should interpret what I just wrote down in this

box. So on the left hand side is clearly what we want. Okay, we want algorithms

which run quickly if we implement them. And on the right hand side is a proposed

mathematical surrogate of a fast algorithm. Right, the left hand side is

not a mathematical definition. The right hand side is, as well become clear in the

next set of lectures. So we're identifying fast algorithms, which, those that have

good asymptotic running time which grows slowly with the input size. Now, what

would we want from a mathematical definition? We'd want a sweet spot. On one

hand we want something we can actually reason about. This is why we zoom out and

squint and ignore things like constant factors and lower terms. We can't keep

track of everything. Otherwise we'd never be able to analyze stuff. On the other hand

we don't want to throw out the baby with the bath water, we want to retain

predictive power and this turns out this definition turns out for the problems

we're going to talk about in this course, to be the sweet spot for reasoning about

algorithms okay worst case analysis using the asymptotic running time. We'll be able to

prove lots of theorems. We'll be able to establish a lot of performance guarantees

for fundamental algorithms but at the same time we'll have good predictive power what

the theory advocates will in fact be algorithms that are known to be best in

practice. So, the final explanation I owe you, is, what do I mean by, the running

time grows slowly, with respect to the input size? Well, the answer depends a

little bit on the context, but, for almost all of the problems we're going to

discuss, the holy grail will be to have what's called a linear time algorithm, an

algorithm whose number of instructions grows proportional to the input size. So,

we won't always be able to achieve linear time, but that's, in some sense, the best

case scenario. Notice, linear time is even better than what we achieved with our

merge short algorithm for sorting. Merge short runs a little bit superlinear, it's

n times log n, running as the input size. If possible, we. To be linear time. It's

not always gonna be possible, but that is what we will aspire toward. For most of

the problems we'll discuss in this course. Looking ahead, the next series of videos

is going to have two goals. First of all, on the analysis side, I'll describe

formally what I mean by asymptotic running time. I'll introduce "Big Oh"

notation and its variants, explain its mathematical definitions, and give a

number of examples. On the design side, we'll get more experience applying the

divide and conquer paradigm to further problems. See you then.