0:00

In this video I like to tell you about the idea of Vectorization.

So, whether you using Octave or a similar language like MATLAB or

whether you're using Python [INAUDIBLE], R, Java, C++,

all of these languages have either built into them or have regularly and

easily accessible difference in numerical linear algebra libraries.

They're usually very well written, highly optimized,

often sort of developed by people that have PhDs in numerical computing or

they're really specialized in numerical computing.

And when you're implementing machine learning algorithms, if you're

able to take advantage of these linear algebra libraries or these numerical

linear algebra libraries, and make some routine calls to them rather than sort of

write code yourself to do things that these libraries could be doing.

If you do that, then often you get code that, first, is more efficient, so

you just run more quickly and

take better advantage of any parallel hardware your computer may have and so on.

And second, it also means that you end up with less code that you need to write,

so it's a simpler implementation that is therefore maybe also more likely to be

by free.

And as a concrete example, rather than writing code yourself to

multiply matrices, if you let Octave do it by typing a times b,

that would use a very efficient routine to multiply the two matrices.

And there's a bunch of examples like these, where if you use appropriate

vectorization implementations you get much simpler code and much more efficient code.

Let's look at some examples.

1:33

Here's our usual hypothesis for linear regression, and

if you want to compute h(x), notice that there's a sum on the right.

And so one thing you could do is, compute the sum from j = 0 to j = n yourself.

Another way to think of this is to think of h(x) as theta transpose x,

and what you can do is, think of this as you are computing this inner product

between two vectors where theta is your vector, say, theta 0, theta 1, theta 2.

If you have two features, if n equals two, and

if you think x as this vector, x0, x1, x2, and

these two views can give you two different implementations.

Here's what I mean.

Here's an unvectorized implementation for how to compute and by unvectorize,

I mean without vectorization.

We might first initialize prediction just to be 0.0.

The prediction's going to eventually be h(x), and then I'm going to have a for

loop for j=1 through n+1, prediction gets incremented by theta(j) * x(j).

So it's kind of this expression over here.

By the way, I should mention,

in these vectors that I wrote over here, I had these vectors being 0 index.

So I had theta 0, theta 1, theta 2.

But because MATLAB is one index, theta 0 in that MATLAB, we would end up

representing as theta 1 and the second element ends up as theta 2 and

this third element may end up as theta 3, just because our vectors in

MATLAB are indexed starting from 1, even though I wrote theta and

x here, starting indexing from 0, which is why here I have a for loop.

j goes from 1 through n+1 rather than j goes through 0 up to n, right?

But so this is an unvectorized implementation in that we have for

loop that is summing up the n elements of the sum.

In contrast, here's how you would write a vectorized implementation,

which is that you would think of a x and theta as vectors.

You just said prediction = theta' * x.

You're just computing like so.

So instead of writing all these lines of code with a for loop,

you instead just have one line of code.

And what this line of code on the right will do is,

it will use Octaves highly optimized numerical linear algebra routines to

compute this inner product between the two vectors, theta and X, and not only is

the vectorized implementation simpler, it will also run much more efficiently.

4:54

In contrast, using a good numerical linear algebra library in C++,

you can instead write code that might look like this.

So depending on the details of your numerical linear algebra library,

you might be able to have an object, this is a C++ object, which is vector theta,

and a C++ object which is vector x, and you just take theta.transpose * x,

where this times becomes a C++ sort of overload operator so

you can just multiply these two vectors in C++.

And depending on the details of your numerical linear algebra library,

you might end up using a slightly different syntax, but

by relying on the library to do this inner product,

you can get a much simpler piece of code and a much more efficient one.

5:40

Let's now look at a more sophisticated example.

Just to remind you, here's our update rule for

a gradient descent of a linear regression.

And so we update theta j using this rule for all values of j = 0, 1, 2, and so on.

And if I just write out these equations for theta 0, theta 1,

theta 2, assuming we have two features, so n = 2.

Then these are the updates we perform for theta 0, theta 1, theta 2, where you might

remember my saying in an earlier video, that these should be simultaneous updates.

6:20

Here are my same three equations written in a slightly smaller font, and

you can imagine that one way to implement these three lines of code is to have a for

loop that says for j = 0, 1 through 2 to update theta j, or something like that.

But instead, let's come up with a vectorized implementation and see if

we can have a simpler way to basically compress these three lines of code or

a for loop that effectively does these three steps one set at a time.

Let's see if we can take these three steps and

compress them into one line of vectorized code.

Here's the idea.

What I'm going to do is,

I'm going to think of theta as a vector,

and I'm gonna update theta as theta-

alpha times some other vector delta,

where delta's is going to be equal to 1 over m,

sum from i = 1 through m.

And then this term over on the right, okay?

So, let me explain what's going on here.

7:31

Here, I'm going to treat theta as a vector, so

this is n plus one dimensional vector, and

I'm saying that theta gets here updated as that's a vector, Rn + 1.

Alpha is a real number, and delta, here is a vector.

So, this subtraction operation, that's a vector subtraction, okay?

Cuz alpha times delta is a vector, and so

I'm saying theta gets this vector, alpha times delta subtracted from it.

So, what is a vector delta?

Well this vector delta, looks like this, and

what it's meant to be is really meant to be this thing over here.

Concretely, delta will be a n plus one dimensional vector, and

the very first element of the vector delta is going to be equal to that.

So, if we have the delta, if we index it from 0,

if it's delta 0, delta 1, delta 2, what I want is

that delta 0 is equal to this first box in green up above.

And indeed, you might be able to convince yourself

that delta 0 is this 1 of the m sum of ho(x),

x(i) minus y(i) times x(i) 0.

So, let's just make sure we're on this same page

about how delta really is computed.

Delta is 1 over m times this sum over here, and what is this sum?

Well, this term over here, that's a real number,

and the second term over here, x i,

this term over there is a vector, right,

because x(i) may be a vector that would be,

say, x(i)0, x(i)1, x(i)2,

right, and what is the summation?

Well, what the summation is saying is that,

this term, that is this term over here,

this is equal to, (h of(x(1))-

y(1)) * x(1) + (h of(x(2))-

y(2) x x(2) +, and so on, okay?

Because this is summation of i, so as i ranges from i = 1 through m,

you get these different terms, and you're summing up these terms here.

And the meaning of these terms, this is a lot like if you remember actually

from the earlier quiz in this, right, you saw this equation.

We said that in order to vectorize this code we will instead said u = 2v + 5w.

So we're saying that the vector u is equal to two times the vector v

plus five times the vector w.

So this is an example of how to add different vectors and

this summation's the same thing.

This is saying that the summation over here is just some real number, right?

That's kinda like the number two or some other number times the vector, x1.

So it's kinda like 2v or say some other number times x1, and

then plus instead of 5w we instead have some other real number,

plus some other vector, and then you add on other vectors, plus dot,

dot, dot, plus the other vectors, which is why, over all,

this thing over here, that whole quantity, that delta is just some vector.

11:47

So, I know that there was a lot that happened on this slide, but again,

feel free to pause the video and if you aren't sure

what just happened I'd encourage you to step through this slide to make

sure you understand why is it that this update here with this definition of delta,

right, why is it that that's equal to this update on top?

And if it's still not clear, one insight is that, this thing over here,

that's exactly the vector x, and so

we're just taking all three of these computations, and compressing them into

one step with this vector delta, which is why we can come up

with a vectorized implementation of this step of the new refresh in this way.

12:37

So, I hope this step makes sense and do look at the video and

see if you can understand it.

In case you don't understand quite the equivalence of this map,

if you implement this, this turns out to be the right answer anyway.

So, even if you didn't quite understand equivalence,

if you just implement it this way, you'll be able to get linear regression to work.

But if you're able to figure out why these two steps are equivalent,

then hopefully that will give you a better understanding of vectorization as well.

And finally, if you are implementing linear regression using more than one or

two features, so sometimes we use linear regression with 10's or 100's or

1,000's of features.

But if you use the vectorized implementation of linear regression,

you'll see that will run much faster than if you had, say, your old for

loop that was updating theta zero, then theta one, then theta two yourself.

So, using a vectorized implementation, you should be

able to get a much more efficient implementation of linear regression.

And when you vectorize later algorithms that we'll see in this class,

there's good trick, whether in Octave or some other language like C++, Java,

for getting your code to run more efficiently.