We now start our mathematical journey by looking at some properties of groups of pixels in the spectral domain. As noted in the last lecture, we now assume you have a background in vectors and matrices. If not, we will summarize the key points geometrically and descriptively, so that the key results should be clear. We are going to look at three important spectral space concepts in this lecture: the mean vector, the covariance matrix, and the concept of correlation.

Because we'd like to talk about where pixels are located in the spectral domain, it is useful to have measures of where they are most likely to be found and how they are spread about that location. We start with the concept of the mean vector, which is just the average or mean value of a group of pixel vectors, as illustrated on this slide. We denote the mean vector by the symbol m. It is calculated by taking the mathematical average of the k pixel vectors. Another term for average is expectation. Note we have introduced the expectation operator, which is just the calculation of the average as shown. Here we show a very simple calculation of the mean of two pixel vectors. Note that the calculation is performed separately for each element of the pixel vector. Coming back to the concept of expectation, the position m is where we most expect to find a pixel from that group.

The mean vector tells us the average position of the group of pixels in the spectral domain. We would now like to describe how they spread about that mean position. That is a concept we're familiar with from statistics, where we use variance or standard deviation for that purpose. Because we are dealing with data that has more than one dimension, the simple concept of variance does not apply directly, but we can develop a multidimensional equivalent. This is called the covariance matrix, and it is again defined in terms of an expectation using the formula shown on the slide.
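The mean vector calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_vector` and the two pixel vectors are assumptions made for the example, not the values from the slide.

```python
def mean_vector(pixels):
    """Element-wise average of a list of equal-length pixel vectors."""
    k = len(pixels)            # number of pixel vectors in the group
    n_bands = len(pixels[0])   # number of elements (bands) in each vector
    # Average each band separately, as described in the lecture.
    return [sum(p[b] for p in pixels) / k for b in range(n_bands)]

# Two hypothetical two-band pixel vectors (illustrative values only).
pixels = [[2.0, 4.0], [4.0, 8.0]]
m = mean_vector(pixels)
print(m)  # [3.0, 6.0]
```

Each element of m is computed from the corresponding elements of the pixel vectors alone, so the result has as many elements as there are bands.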
If you look carefully, you can see that the covariance formula is very similar in structure to the formula for variance in the one-dimensional case. Let's explore this a bit further. First though, note that the denominator term in the average calculation is k minus 1, and not just k. That gives a better (unbiased) estimate, because the mean has itself been estimated from the same pixels.

To compute the covariance, we first subtract the mean vector from each pixel vector as shown in the center of the slide, just as we take the difference from the mean in a one-dimensional variance calculation. We then need to square that difference, which is the role of the right-hand term, but it is turned into a row vector by the transpose operation shown. In the next slide, we will see what that does to the product. We then take the average or expected value of all the squared differences from the mean. The end result is a new construction called a matrix, represented by the uppercase letter C. We have added the subscript x to the symbol for the covariance matrix, since it relates to calculations where the pixels are described by x. Later on, we will encounter other coordinate systems, and will therefore use other subscripts.

This slide shows the result of multiplying a column vector by a row vector, where the row vector appears on the right-hand side of the column vector. This gives a square array of numbers called a matrix, in which the elements are the result of the rules for multiplying vectors. The number of rows and the number of columns of the matrix are both equal to the number of bands in the image data. A different result will be obtained if the order of the vectors is changed. We will see an example of that later in the course.

To see how all of this works, we will now consider a simple example. Here we have a two-dimensional data set of points that are distributed around the space in a squashed circle arrangement. They could be the pixel brightness values of an image with just six pixels.
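The steps just described (subtract the mean, multiply the difference as a column vector by itself as a row vector, then average with k minus 1 in the denominator) can be sketched as follows. The helper names and the six pixel values are assumptions chosen for illustration; the points form a squashed circle, but they are not claimed to be the values on the slide.

```python
def mean_vector(pixels):
    """Element-wise average of a list of equal-length pixel vectors."""
    k = len(pixels)
    return [sum(p[b] for p in pixels) / k for b in range(len(pixels[0]))]

def outer(col, row):
    """Column vector times row vector: a matrix with entries col[i] * row[j]."""
    return [[ci * rj for rj in row] for ci in col]

def covariance_matrix(pixels):
    """Covariance matrix of the pixel vectors, using k - 1 in the denominator."""
    k = len(pixels)
    n = len(pixels[0])
    m = mean_vector(pixels)
    total = [[0.0] * n for _ in range(n)]
    for p in pixels:
        d = [p[b] - m[b] for b in range(n)]   # difference from the mean
        prod = outer(d, d)                    # (x - m)(x - m)^T
        for i in range(n):
            for j in range(n):
                total[i][j] += prod[i][j]
    return [[total[i][j] / (k - 1) for j in range(n)] for i in range(n)]

# Six hypothetical two-band pixels arranged in a squashed circle.
pixels = [[1.0, 2.0], [2.0, 1.0], [4.0, 1.0], [5.0, 2.0], [4.0, 3.0], [2.0, 3.0]]
C_x = covariance_matrix(pixels)
print(C_x)  # diagonal entries 2.4 and 0.8, off-diagonal entries 0
```

With two bands the result is a 2 by 2 matrix. For these assumed points the off-diagonal entries are zero, because the positive and negative cross products cancel.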
A set of hand calculations is shown on the slide, including the mean vector, the steps to calculate the covariance matrix, and the covariance matrix that results. Note that the covariance matrix has zeros in the upper right and lower left positions, and only has non-zero values in the diagonal entries of the matrix. Those diagonal values are, respectively, the individual variances of the data points in the horizontal (abscissa) and vertical (ordinate) directions.

We have just talked about the diagonal of a matrix. By definition, that is the set of entries that run from the upper left-hand corner to the lower right-hand corner of a matrix. This particular covariance matrix is called diagonal because it has zero entries everywhere except down the diagonal. This slide shows that the diagonal elements of the covariance matrix, and of the correlation matrix to follow, relate to individual bands, whereas the off-diagonal elements describe the relationship between one band and another. Because of the zeros in this case, there is no relationship between the bands.

Now consider another two-dimensional data set. Here the data points seem to be spread in an elliptical pattern at an angle to the axes. The hand calculations for this data set show that the covariance matrix is non-diagonal. That is, it has non-zero entries everywhere.

At this stage, it is important to define the correlation matrix, whose elements are computed from those of the covariance matrix as shown. Each is the corresponding covariance matrix element divided by the square root of the product of the two covariance elements on the diagonal, one on the same row and one on the same column as the entry under consideration. Let's see what the correlation matrices look like for our two data sets. Here are the two data sets compared, including their covariance and correlation matrices. Note that the correlation matrices always have ones for all of their diagonal entries.
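As a rough sketch of that definition, the code below turns a covariance matrix into a correlation matrix. The function name and both covariance matrices are assumptions made for illustration: the first is diagonal, and the second was chosen so that its off-diagonal correlation comes out near the figure of about 76 percent quoted in this lecture. Neither is claimed to be the exact matrix on the slide.

```python
import math

def correlation_matrix(cov):
    """Divide each covariance entry by the square root of the product of the
    diagonal entries on its row and on its column (variances must be non-zero)."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical covariance matrices for two two-band data sets.
C_diagonal = [[2.4, 0.0], [0.0, 0.8]]   # no covariance between the bands
C_full = [[1.9, 1.1], [1.1, 1.1]]       # bands vary together

R_diagonal = correlation_matrix(C_diagonal)
R_full = correlation_matrix(C_full)

# Rounded for display: ones on the diagonal in both cases; the off-diagonal
# terms are 0 for the first matrix and about 0.76 for the second.
print([[round(r, 2) for r in row] for row in R_diagonal])
print([[round(r, 2) for r in row] for row in R_full])
```

A diagonal entry is always a variance divided by the square root of that variance squared, which is why it equals one whatever the data.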
You can see why that is the case from the definition on the previous slide. It tells us that the pixel brightness values in a given band are fully correlated with themselves. Secondly, for the left-hand data set, there are zeros in the off-diagonal entries, implying that there is no correlation between the corresponding two bands of data. What does that mean? Put simply, it means that knowing the brightness of a data point or pixel in one band, we cannot, with any degree of certainty, predict what its brightness is likely to be in the other band.

If we examine the right-hand data set though, the off-diagonal terms of its correlation matrix are non-zero and imply there is about 76 percent correlation between the bands. In other words, loosely speaking, if a pixel is bright in one band, it is, with about 76 percent certainty, likely to be bright in the other band. That is because the data is scattered largely about a line at an angle to the data axes. As a reminder, for the first data set, the points are distributed only parallel to the axes. There is no correlation between the bands, and one cannot say that a pixel will be bright in one band if it is bright in the other.

This is an essential but simple dot point summary of the important elements from this lecture. Note the last point particularly, because it is essential to what we are going to do next. Note that the last question relates the dimensionality of the two matrices and the mean vector to the number of bands recorded by a particular sensor.