0:00

So, let's work through some examples using matrix decompositions.

So I'm going to use the swiss data set, and define my y as Fertility, and

I'm going to define my x as everything in this matrix except Fertility.

So let's just look at head of x, and you'll see it has Agriculture,

Examination, Education, Catholic, and Infant Mortality.

Let me just, for convenience, define n as the number of rows of x.
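
The setup just described can be sketched in R as follows. This assumes the swiss data frame that ships with base R's datasets package; selecting columns by name rather than by position is a choice made here for safety, not something stated in the lecture.

```r
# Load the swiss data set (part of base R's datasets package)
data(swiss)

# Response: Fertility
y <- swiss$Fertility

# Predictors: every column except Fertility, as a numeric matrix
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])

# Peek at the first few rows of the predictor matrix
head(x)

# Number of observations
n <- nrow(x)
```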

So probably the easiest way to

get at these things is to do a so-called principal components analysis.

There's a princomp function in R, so here's the princomp function.

0:48

I'm putting cor = TRUE here, and the reason is that

I'm not so sure about the units of these measurements, and so

I want them to be comparable.

So, if we do the decomposition on the correlation matrix,

that's a little bit more comparable across variables than a raw decomposition.

So for example, if I were to plot the cumulative sum

of the eigenvalues divided by their overall sum,

that's the cumulative percentage of variation explained by the components.

So, 1 principal component explains just under 60% of the variation in the data.

2 principal components, so that's including the first and

the second principal component, explain over 70% of the variation.

3 principal components explain almost 90% of the variation.

4 explain close to 100%.

And then 5 explains 100% because 5 at that

point is just a reorganization of the matrix x.
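
A minimal sketch of that princomp call and the cumulative variance calculation might look like this. The names eigenvalues and cum_prop are made up for illustration, and the plot is only drawn in an interactive session.

```r
data(swiss)
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])

# Principal components on the correlation matrix (cor = TRUE),
# so that the units of the variables do not matter
decomp <- princomp(x, cor = TRUE)

# princomp reports standard deviations; squaring them gives the eigenvalues
eigenvalues <- decomp$sdev^2

# Cumulative proportion of variation explained by the first k components
cum_prop <- cumsum(eigenvalues) / sum(eigenvalues)
print(round(cum_prop, 3))

# Draw the curve when running interactively
if (interactive()) {
  plot(cum_prop, type = "b",
       xlab = "Number of components",
       ylab = "Cumulative proportion of variation")
}
```

With five standardized variables the eigenvalues sum to 5, so the last entry of cum_prop is 1 by construction.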

1:57

So, let me just show you.

So remember that principal components is the eigenvalue decomposition of

either the correlation matrix or the covariance matrix.

So in this case, again,

we did the correlation matrix just to get rid of the units of the variables.

So we could do it directly

by taking eigen of the correlation matrix; I'll call that decomp2.

So cor of x here, just calculates the correlation matrix of the x, so

a 5 by 5 matrix in this case.

And eigen just calculates the eigenvalue decomposition of that.

So if I look at, for example,

names(decomp2), it gives me my values and my vectors.

2:58

So for example, if I were to take the diagonal of t(decomp2$vectors) times decomp2$vectors,

that is, the diagonal of v transpose v,

that just gives me a vector of 1s.

So, that's just saying that the diagonal of v transpose v is all 1s,

and we know that all the off-diagonals are 0s.

That's just reminding us that

v is an orthonormal basis, for R^5 in this case.
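
Here is a short sketch of that eigenvalue decomposition and the orthonormality check. The name v is introduced here just for readability.

```r
data(swiss)
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])

# Eigenvalue decomposition of the 5 by 5 correlation matrix
decomp2 <- eigen(cor(x))
names(decomp2)  # "values" "vectors"

# Matrix of eigenvectors, one per column
v <- decomp2$vectors

# Diagonal of v transpose v: the squared length of each eigenvector
print(diag(t(v) %*% v))

# The full product, rounded, to see the off-diagonals as well
print(round(t(v) %*% v, 10))
```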

3:30

So, another way to do exactly the same thing is through the singular value

decomposition.

First, we need to normalize our columns.

So here I just go over the columns and replace them with their normalized values,

subtracting off the mean and dividing by the standard deviation.

And then I directly take the singular value decomposition.

Now, decomp was the principal component decomposition,

decomp2 was the eigenvalue decomposition of the correlation matrix.

And decomp3 is the singular value decomposition of the normalized x matrix.

So decomp3, let me show you the names of the variables it returns.

4:08

So, decomp3 gives me three variables.

D, the singular values.

U, the left singular vectors.

V, the right singular vectors.
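
The normalization and the singular value decomposition could be sketched like this. The name xnorm for the normalized matrix is an assumption for this example.

```r
data(swiss)
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])

# Normalize each column: subtract its mean, divide by its standard deviation
xnorm <- apply(x, 2, function(z) (z - mean(z)) / sd(z))

# Singular value decomposition of the normalized matrix
decomp3 <- svd(xnorm)
names(decomp3)  # "d" "u" "v"

# d: singular values, u: left singular vectors, v: right singular vectors
print(length(decomp3$d))
print(dim(decomp3$u))
print(dim(decomp3$v))
```

The built-in scale(x) would do the same centering and scaling; apply is used here only because it spells out the two steps.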

So just to show that these are all getting at the same thing,

let me combine the eigenvectors from the eigenvalue decomposition

of the correlation matrix

with what are called the loadings from the princomp function.

And then

I'll also grab the v matrix from the singular value decomposition.

And here are those three quantities.

Notice, take for example this first column, Comp.1, Comp.2,

Comp.3, Comp.4, 0.524, 0.258, and so on.

Then if I go down to the output from princomp, 0.524, 0.258, -0.003.

There's no reason, by the way, that the signs should have to agree in this case.

I think we're just fortunate that they did, because they're using

the same exact numerical libraries underneath.

But there's a sign invariance to all these decompositions, so the signs don't have to match.

And then of course, the singular value decomposition gives us the same things,

and then if we go through the second row,

it's the same thing, again.
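
The comparison of the three sets of vectors can be sketched as follows, with xnorm again a made-up name for the normalized matrix. Because of the sign invariance mentioned above, the check at the end compares absolute values rather than relying on the signs matching.

```r
data(swiss)
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])
xnorm <- apply(x, 2, function(z) (z - mean(z)) / sd(z))

decomp  <- princomp(x, cor = TRUE)  # principal components
decomp2 <- eigen(cor(x))            # eigen decomposition of the correlation matrix
decomp3 <- svd(xnorm)               # SVD of the normalized matrix

# Stack the three 5 by 5 matrices of vectors on top of each other
# (unclass drops the "loadings" class so this is an ordinary matrix)
print(round(rbind(decomp2$vectors, unclass(decomp$loadings), decomp3$v), 3))

# Largest discrepancy between the three, ignoring sign
max_diff_loadings <- max(abs(abs(decomp2$vectors) - abs(unclass(decomp$loadings))))
max_diff_svd      <- max(abs(abs(decomp2$vectors) - abs(decomp3$v)))
print(c(max_diff_loadings, max_diff_svd))
```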

So, what we see is that the three different approaches are giving us

the same v matrix. And then let's look at my eigenvalues.

The first row is the eigenvalues from

the eigenvalue decomposition of the correlation matrix.

The second row is the eigenvalues from the princomp function, which reports

standard deviations, so we have to square them to get the eigenvalues.

6:02

And then the third row is the squared singular values, divided by n-1.

I had to divide by n-1 because, remember,

when I wrote the formulas up on the board, I was omitting that 1 over n-1.
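
A sketch of the eigenvalue comparison, under the same assumptions as before (xnorm is a made-up name for the normalized matrix):

```r
data(swiss)
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])
n <- nrow(x)
xnorm <- apply(x, 2, function(z) (z - mean(z)) / sd(z))

decomp  <- princomp(x, cor = TRUE)
decomp2 <- eigen(cor(x))
decomp3 <- svd(xnorm)

# Row 1: eigenvalues of the correlation matrix
# Row 2: squared standard deviations from princomp
# Row 3: squared singular values divided by n - 1
print(round(rbind(decomp2$values,
                  decomp$sdev^2,
                  decomp3$d^2 / (n - 1)), 3))
```

The n - 1 appears because t(xnorm) %*% xnorm equals (n - 1) times cor(x) when the columns are normalized with the usual sample standard deviation.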

Now, let's look at the scores.

So, I'm going to take these singular value decomposition scores and

plot them versus the scores from the princomp function, and

if you look, of course, they line up perfectly.

Now, notice the scale is off here.

And again, that's that n-1 factor: when we take the correlation,

it's divided by n-1, whereas when we just take x transposed x,

we have not divided by n-1.

So that's what's representing this change in scale.

The other thing is we could manually calculate the u matrix from the eigenvalue

decomposition of the correlation matrix.

So we need to take the normalized x matrix, multiply it

times the eigenvectors from that decomposition, and then multiply it

times 1 over the square root of the eigenvalues.
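
Both steps can be sketched as below. Two things here are additions rather than part of the lecture: the names score_cor and u_manual, and the extra division by sqrt(n - 1), which is what brings the manually built matrix onto the same scale as the u returned by svd (up to sign).

```r
data(swiss)
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])
n <- nrow(x)
xnorm <- apply(x, 2, function(z) (z - mean(z)) / sd(z))

decomp  <- princomp(x, cor = TRUE)
decomp2 <- eigen(cor(x))
decomp3 <- svd(xnorm)

# First left singular vector versus first princomp score:
# a perfect line, but on different scales
score_cor <- cor(decomp3$u[, 1], decomp$scores[, 1])
print(score_cor)
if (interactive()) plot(decomp3$u[, 1], decomp$scores[, 1])

# Rebuild u from the eigen decomposition: normalized x, times the
# eigenvectors, times 1 over the square root of the eigenvalues.
# The final division by sqrt(n - 1) matches the scale of svd()$u.
u_manual <- xnorm %*% decomp2$vectors %*%
  diag(1 / sqrt(decomp2$values)) / sqrt(n - 1)

# Largest discrepancy from the SVD's u, ignoring sign
print(max(abs(abs(u_manual) - abs(decomp3$u))))
```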

7:12

Now, let me show you how you might use these things.

Let's just take the singular vectors from the singular value decomposition, and

I'm going to just take, let's say the first four of them.

So, I've omitted one variable.

And I'm going to try doing a linear model with my y and

with my u as my set of predictors, and look at the r squared.

It's about 70%.

Then I'm going to compare that to a linear model using all the xs,

which in this case, gives me about 70% as well.

7:52

So by virtue of removing that extra variable,

we have basically found the linear combinations of our regressors

that retain the most variability in our xs.

And so we can remove a regressor and

still explain just about the same amount of variation in our y.
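
The comparison of the two fits might be sketched like this. The names r2_pc and r2_all are introduced for this example only.

```r
data(swiss)
y <- swiss$Fertility
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])
xnorm <- apply(x, 2, function(z) (z - mean(z)) / sd(z))
decomp3 <- svd(xnorm)

# Keep the first four left singular vectors as predictors
u <- decomp3$u[, 1:4]

# R squared using four components versus all five original regressors
r2_pc  <- summary(lm(y ~ u))$r.squared
r2_all <- summary(lm(y ~ x))$r.squared
print(round(c(four_components = r2_pc, all_regressors = r2_all), 3))
```

The four-component fit can never beat the full fit, since the four columns of u span a subspace of the column space of the centered x.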

And then the last thing I want to show you is,

so if I were to do summary(lm(y ~ u)),

we see these estimates, 70, 57, 26, -16, 24.

Because u is orthonormal, I can just do t(u) %*% y,

and that should give me the same things:

57, 26, -16, 24. In this case, the intercept is

taken care of because u is orthogonal

to the intercept, since we centered the x before we did anything.
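
A sketch of that last check, using the same hypothetical xnorm as in the earlier examples:

```r
data(swiss)
y <- swiss$Fertility
x <- as.matrix(swiss[, colnames(swiss) != "Fertility"])
xnorm <- apply(x, 2, function(z) (z - mean(z)) / sd(z))
u <- svd(xnorm)$u[, 1:4]

# Coefficients from the usual least squares fit
fit <- lm(y ~ u)
print(coef(fit))

# Same slopes from a plain matrix product, because the columns of u are orthonormal
slopes <- drop(t(u) %*% y)
print(slopes)

# The columns of u are centered, so the intercept is just the mean of y
print(mean(y))
```

In general the slopes are solve(t(u) %*% u) %*% t(u) %*% y; with orthonormal columns, t(u) %*% u is the identity and the inverse drops out.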

8:55

So at any rate that's how we start using principal component decompositions.

And I know this is probably a little bit far along relative

to where the class should be at this point.

More than anything, what I'd like for

people to have gotten out of this series of lectures

is how conveniently things work out when we have an orthonormal basis in

linear regression and how nice the calculations are.

And then second, specifically, I hope a little bit of the language

of singular value decompositions and

principal component decompositions will have rubbed off on you.

And also to show you via the computing that none of this stuff is magical.

The princomp function is just a collection of singular value and

eigenvalue decompositions that we could do manually, and

the princomp function just makes it a little bit easier.

So I'm looking forward to seeing you in the next class.