We now come to a particularly important concept that you will encounter time and time again in remote sensing image processing. We will see it has many uses, from classification and the general representation of images on a display system through to highlighting changes that have occurred over time. It is called the principal components transformation (PCT), or principal components analysis (PCA). All remote sensing image processing software packages include a module for computing this transform. Depending on your background, you may find some of the mathematics here a bit complicated. However, once we have been through all of that, we will summarize the essential steps for computing the transform. They are straightforward and easy to apply in practice. When we apply the principal components transform, we generate a new set of bands with which to describe the image. The pixel brightness values in these new bands turn out to be weighted linear combinations of the original pixel brightness values. There are a number of image transformations of this nature that will be encountered in image processing, including the so-called Fourier transform, the wavelet transform, and band arithmetic. We will only treat the principal components transform in these lectures and make some reference to band arithmetic, in which the elements of pixel vectors from two different images are added, subtracted, or divided. To start us thinking about principal components, consider again the two data sets from the previous lecture. One has no correlation between its bands or axes; the other has high correlation. Note the last comment on the slide. For the uncorrelated data set, the correlation matrix is a diagonal matrix of the same dimensionality as the number of bands recorded by the sensor, with unity for each diagonal element. Although not important here, that type of matrix is called an identity matrix. 
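The contrast between the two data sets can be checked numerically. Here is a minimal sketch, not part of the lecture, using hypothetical synthetic two-band data: an uncorrelated pair of bands gives a correlation matrix close to the identity, while a correlated pair gives large off-diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical uncorrelated case: two synthetic "bands" drawn independently
uncorr = rng.normal(size=(1000, 2))

# Hypothetical correlated case: band 2 is largely band 1 plus a little noise
b1 = rng.normal(size=1000)
corr = np.column_stack([b1, b1 + 0.3 * rng.normal(size=1000)])

R_uncorr = np.corrcoef(uncorr, rowvar=False)  # close to the 2x2 identity matrix
R_corr = np.corrcoef(corr, rowvar=False)      # large off-diagonal entries
```

The diagonal elements of a correlation matrix are always unity, so only the off-diagonal entries distinguish the two cases.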
We commence our development of the principal components transform by asking whether we can take a correlated data set and somehow transform it into a new set of coordinates in which the data exhibits no correlation. Consider data that is scattered in the form of the shaded ellipse shown in this slide. This is a bit like the correlated set from our two examples. In the original coordinates, that is, as recorded by the sensor, the data is highly correlated. Bright pixels in one band tend to be bright in the other, and so on. But if we rotated our coordinates anticlockwise (counterclockwise), we would find that there is a rotation angle, shown here as the y coordinates, that corresponds to a coordinate system in which the pixels in the data set are uncorrelated. In the language of vector and matrix analysis, the new coordinates can be generated from the old by the equation shown on the slide. The values of the elements g11, g12, and so on define the actual extent of the rotation. What we have to do is find those values so that the correct degree of rotation is given. We can write the matrix equation in the symbolic shorthand notation shown on the slide. Whereas vectors are written as bold lowercase, matrices are written as bold uppercase. If you don't know how to multiply vectors and matrices, please consult some standard treatments, or be prepared to follow the remainder of this lecture as best you can, even though it is addressed to the specific case of rotating axes. Here is the really important point: we define the new axes as those in which the data shows no correlation. Or, in other words, in which the new covariance and correlation matrices are diagonal, that is, they have zeros everywhere except down the diagonal. Since we are interested in the covariance matrix, let's examine how it appears in the new rotated y coordinate system. 
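The axis rotation y = Gx can be sketched concretely. Below is a minimal, illustrative example (not from the slide) in which G is taken to be a 2x2 rotation matrix for a hypothetical angle of 30 degrees; the matrix elements play the role of the g values mentioned above.

```python
import numpy as np

# Rotating the axes by an angle theta corresponds to y = G x, where the
# elements of G (the g values on the slide) are trigonometric functions
# of theta. The angle here is a hypothetical illustration.
theta = np.deg2rad(30.0)
G = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

x = np.array([50.0, 40.0])  # a pixel vector in the original band coordinates
y = G @ x                   # the same pixel expressed in the rotated coordinates
```

Because G is a pure rotation, it leaves the length of every pixel vector unchanged; only the description of the pixel relative to the axes changes.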
This slide shows the standard definition for the covariance matrix that we met in the last lecture, but we have added the y subscripts to remind us we are dealing in that coordinate space. The mean vector in the y coordinates is easily related to its value in the x coordinates via the transformation matrix G, as shown. We can substitute that into the formula for the covariance matrix to get the very important expression in blue on the bottom right-hand side of the slide. To get to that point, we had to use some properties of matrices and vectors. If you do not have that background, just accept the last expression, noting that the transpose of a matrix flips it about its diagonal. The equation we last derived expresses the covariance matrix in the new y coordinate system in terms of the covariance matrix in the original x coordinate system. The two are linked by the matrix G and its transpose. Remember, when we started this analysis, we had the objective that the data would be uncorrelated in the y coordinates. Or, in other words, the covariance matrix in the y coordinates would be diagonal. That constraint allows us to recognize this equation as the so-called diagonal form of the original covariance matrix. That tells us immediately what the elements of the matrix G are and also gives us the diagonal entries of the y-space covariance matrix, remembering that the off-diagonal entries are all zero. Each matrix has a set of properties called its eigenvalues and eigenvectors. We will see how to get them in the next lecture. Usually there are as many of each as the dimensionality of the matrix. The eigenvalues are just scalar quantities, which are called lambda in the slide, that can be rank ordered from largest to smallest, as seen at the bottom of the slide. They are the diagonal entries of C subscript y, rank ordered from largest to smallest. The eigenvectors are a corresponding set of vectors. 
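The diagonalization described here can be verified numerically. The sketch below, assuming a small hypothetical 2x2 covariance matrix, builds G from the eigenvectors of Cx and shows that Cy = G Cx G-transpose comes out diagonal, with the rank-ordered eigenvalues down the diagonal.

```python
import numpy as np

Cx = np.array([[5.0, 3.0],
               [3.0, 4.0]])            # hypothetical covariance matrix in the x bands

# eigh is NumPy's eigendecomposition for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(Cx)
order = np.argsort(eigvals)[::-1]      # rank order eigenvalues largest to smallest
lam = eigvals[order]
G = eigvecs[:, order].T                # rows of G are the eigenvectors of Cx

Cy = G @ Cx @ G.T                      # diagonal: lam down the diagonal, zeros elsewhere
```

The off-diagonal entries of Cy are zero (to floating-point precision), which is exactly the "no correlation in the y coordinates" condition the lecture imposes.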
The matrix G transpose has the eigenvectors as its columns, so that G itself is the transposed matrix of eigenvectors. It turns out that the eigenvalues of the covariance matrix in the original set of bands are the variances of the brightnesses along the new axes. Remember, those axes carry the brightness values of the image pixels in the new set of bands. Those new bands are called the principal components of the original image. We generate the actual principal component pixel brightness values using the elements of the eigenvectors of the original covariance matrix. In the next two lectures, we will demonstrate how all of that is done, using first a model example and then a real set of images. Those examples will also answer the question as to why it is good to have low correlations among the bands of data. This summary just reminds us that the steps in principal components analysis are simple, provided we have image processing software available. The questions for this lecture require an understanding of how to multiply vectors and matrices. They provide useful guidance for the material to come.
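The full sequence of steps — form the covariance matrix, find its eigenvalues and eigenvectors, then produce the new bands — can be sketched end to end. The example below uses hypothetical synthetic two-band image data (not the lecture's examples) and checks that the variances of the principal component bands equal the eigenvalues of the original covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-band image, flattened to one pixel vector per row;
# band 2 is deliberately correlated with band 1
n_pixels = 10000
b1 = rng.normal(120.0, 20.0, size=n_pixels)
pixels = np.column_stack([b1, 0.8 * b1 + rng.normal(0.0, 5.0, size=n_pixels)])

# Step 1: covariance matrix of the original bands
Cx = np.cov(pixels, rowvar=False)

# Step 2: eigenvalues and eigenvectors, rank ordered largest to smallest
eigvals, eigvecs = np.linalg.eigh(Cx)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
G = eigvecs[:, order].T

# Step 3: each principal component brightness value is a weighted linear
# combination (the eigenvector elements) of the original band brightnesses
pcs = (pixels - pixels.mean(axis=0)) @ G.T

Cy = np.cov(pcs, rowvar=False)  # diagonal: the new bands are uncorrelated
```

The first principal component carries the largest variance (the largest eigenvalue), the second the next largest, and so on, which is why the components can be rank ordered by information content.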