And when we look at the absolute values of each of these different terms,

we can see how much weight each feature carries.

So it looks like flavanoids and

non-flavanoid phenols represent the most variance in our data.
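
As a rough sketch of that ranking step: the eigenvector values below are made up for illustration and are not the numbers from the real data set, and the names `feature_names`, `eigenvector`, and `ranked` are assumptions.

```python
import numpy as np

# Hypothetical feature names and one hypothetical eigenvector (illustrative values only)
feature_names = ["alcohol", "flavanoids", "nonflavanoid_phenols", "hue"]
eigenvector = np.array([0.1, -0.7, 0.6, 0.2])

# The sign only gives direction, so compare features by absolute value
weights = np.abs(eigenvector)

# Indices of the features, largest weight first
order = np.argsort(weights)[::-1]
ranked = [feature_names[i] for i in order]

for i in order:
    print(f"{feature_names[i]}: {weights[i]:.2f}")
```

With these made-up numbers, flavanoids and nonflavanoid phenols come out on top, which mirrors what we see in the real output.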

Now the question is how many eigenvectors should we keep?

Which eigenvectors should we keep to retain the most information from our

original data set?

Because that is the whole point of PCA, right?

We want to reduce the dimensionality of our data.

But we have to figure out how far we want to reduce it.

So to do this, we're going to look at our eigenvalues.

And we're going to quantify how much variance each eigenvector represents.

So we're going to go ahead and sum all the eigenvalues up, and

then calculate the percentage of the total for each value.

And then we'll use the cumulative sum function to progressively add up each of

these percentages.

And they should obviously add up to 100% at the end.
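
Here is a minimal sketch of that calculation with NumPy. The eigenvalues are placeholder numbers, assumed to be already sorted from largest to smallest, and the variable names are assumptions rather than the ones used in the session.

```python
import numpy as np

# Placeholder eigenvalues, assumed sorted in descending order
eigenvalues = np.array([4.0, 2.0, 1.0, 0.5, 0.5])

# Percentage of the total variance that each eigenvector accounts for
explained_variance = eigenvalues / eigenvalues.sum() * 100

# Running total of those percentages; the last entry should be 100
cumulative_variance = np.cumsum(explained_variance)

print(explained_variance)
print(cumulative_variance)
```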

And then real quick, we're going to go ahead and

create a variable that tells us how many dimensions our data has.

This will be useful for a bunch of plotting functions.

And now we can just very simply go ahead and plot our cumulative sum array.
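
A plot along those lines might look like the sketch below. It assumes Matplotlib is available, reuses placeholder cumulative percentages, and the name `n_dims`, the axis labels, and the output file name are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder cumulative percentages (not the real wine data)
cumulative_variance = np.array([50.0, 75.0, 87.5, 93.75, 100.0])

# Number of dimensions, reused for the x-axis
n_dims = len(cumulative_variance)
components = np.arange(1, n_dims + 1)

fig, ax = plt.subplots()
ax.plot(components, cumulative_variance, marker="o")
ax.set_xticks(components)
ax.set_xlabel("Number of principal components")
ax.set_ylabel("Cumulative explained variance (%)")
ax.set_ylim(0, 105)

# Save to a file; in an interactive session plt.show() would work as well
fig.savefig("cumulative_variance.png")
```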

And now we can use this graph to tell us how many principal components

we should keep.
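
One common way to read that graph is to pick a target percentage and keep the smallest number of components that reaches it. The 90% threshold here is only an assumed example, not a rule, and the data is the same placeholder array as before.

```python
import numpy as np

# Placeholder cumulative percentages (not the real wine data)
cumulative_variance = np.array([50.0, 75.0, 87.5, 93.75, 100.0])

# Assumed target: keep enough components to explain at least 90% of the variance
threshold = 90.0

# argmax returns the index of the first True, so add 1 to get a count
n_keep = int(np.argmax(cumulative_variance >= threshold)) + 1

print(n_keep)
```

With these placeholder numbers, four components are needed to cross 90%.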