We now turn to the topic of unsupervised classification, in which we are still interested in thematic mapping, but without the benefit of having labeled training data available beforehand. We will develop the topic in this series of lectures based on the procedure called clustering; indeed, most of what we will talk about concerns clustering algorithms. We will present some unsupervised classification examples later in the lectures, and we will meet unsupervised clustering, or unsupervised classification, again in Module 3 of the course.

We start by being confronted with the situation in which no obvious training data is available, yet we still want to do thematic mapping of remote sensing image data. Cluster analysis forms the backbone of what we are going to do. Clustering looks for groups of similar pixels, assessed on the basis of their spectral properties. The groups we are searching for are, in fact, clusters in spectral space. We can often identify the pixel labels produced by clustering through the use of spatial clues in the image and by using the cluster means as surrogates for spectral reflectance information; we will see that in the examples to follow. Reference data like maps and air photos also give us hints as to what the classes in a cluster map might represent.

Here we look at unsupervised classification as a two-stage process in which clustering takes us from the spectral domain to a map of symbolic labels. The challenge for the analyst is to turn those symbols into meaningful ground cover labels. We get the cluster map in the same way we get a thematic map in supervised classification. Here the clustering algorithm places pixels into clusters based on spectral similarity. We then assign symbols to the clusters and use those symbols in place of the pixels themselves in the image, producing a cluster map. By examining those pixels and their spatial layout, the analyst turns the symbols into ground cover class labels.

Clustering algorithms place pixels into clusters based on their similarity, as we said before. As we indicated in the previous slide, the most common measure of similarity is based on the spectral measurements of the pixels. Two pixels with very similar measurement vectors are likely to belong to the same class and thus the same cluster. The simplest way of assessing similarity is to use a metric which measures the spectral distance between pixels, the most common of which is the Euclidean distance between the pixels in spectral space. For N-band data, the Euclidean distance metric is

$$d(\mathbf{x}_1, \mathbf{x}_2) = \|\mathbf{x}_1 - \mathbf{x}_2\| = \left\{ \sum_{n=1}^{N} (x_{1n} - x_{2n})^2 \right\}^{1/2}$$

which is the norm of the difference between the two vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ and which, in the last expression, is the familiar square root of the sum of squared differences. There are other distance metrics in use, including the more efficient but perhaps less accurate city-block distance, defined as

$$d(\mathbf{x}_1, \mathbf{x}_2) = \sum_{n=1}^{N} |x_{1n} - x_{2n}|$$

which is just the sum of the absolute differences between the corresponding elements of the pixel vectors. It is similar to walking between two locations in a city where the streets are laid out on a rectangular grid; it is sometimes called the Manhattan distance for that reason. In these lectures, though, we will concentrate on Euclidean distance.

The set of clusters for a given image is not unique, even though we have a metric for spectral similarity. Here we see two plausible clusterings of eight pixel vectors in a two-dimensional spectral space. Which one is correct?
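To make the two distance metrics concrete, here is a minimal Python sketch, assuming NumPy is available; the two four-band pixel vectors are invented purely for illustration.

```python
import numpy as np

def euclidean_distance(x1, x2):
    """Euclidean distance between two pixel vectors in spectral space."""
    return np.sqrt(np.sum((x1 - x2) ** 2))

def city_block_distance(x1, x2):
    """City-block (Manhattan) distance: sum of absolute band differences."""
    return np.sum(np.abs(x1 - x2))

# Two hypothetical 4-band pixel vectors (e.g. digital numbers in each band).
x1 = np.array([52.0, 41.0, 30.0, 78.0])
x2 = np.array([48.0, 39.0, 35.0, 70.0])

print(euclidean_distance(x1, x2))   # sqrt(16 + 4 + 25 + 64) ~ 10.44
print(city_block_distance(x1, x2))  # 4 + 2 + 5 + 8 = 19.0
```

Note that the city-block distance avoids the squaring and square root, which is why it is the cheaper of the two to compute.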
To help assess which set of clusters best represents the pixels in an image, we need a quality-of-clustering criterion. A common one is the sum of squared error, or SSE, measure:

$$\text{SSE} = \sum_{c} \sum_{\mathbf{x} \in C_c} \|\mathbf{x} - \mathbf{m}_c\|^2$$

where $\mathbf{m}_c$ is the mean of cluster $C_c$. The SSE checks the distances of all the pixels in a given cluster from the cluster mean and sums their squares within that cluster. It does so for all clusters and then sums the results. In other words, it is a cumulative measure of the distances of the pixel vectors from their cluster means. The smaller it is the better, since then the clusters are compact. Other quality-of-clustering measures are possible; some look at the average compactness of clusters compared with the average distances among them.

In principle, we should be able to develop a clustering algorithm by minimizing the SSE for a given data set. That turns out to be impractical, since it would require examining an enormous number of candidate clusterings of the available data to find the one with the smallest SSE. In practice, some heuristic methods have been developed that work well, two of which we will look at here.

The first is called the k-means, or migrating means, algorithm. It is perhaps the most commonly used, particularly for remote sensing problems. It asks the user, first, to specify beforehand how many clusters to search for and, secondly, to specify a set of initial cluster mean vectors. Clusters are located in spectral space by their mean vectors. The algorithm starts with the user guessing the initial set of cluster centers. The image pixels are then assigned to the cluster of the closest mean, after which the set of means is re-computed. The pixels are then assigned to the nearest of the new set of means, and so on, until the means and the assignments do not change.

This slide shows, algorithmically, the steps in the k-means approach. In the first two steps, clustering is initiated by specifying a starting set of cluster mean vectors, both in number and position. In Step 3, all the available pixel vectors are assigned to the cluster of the nearest mean. The mean vectors are then re-computed in Step 4. A new assignment of pixel vectors is then carried out based on the re-computed means, and the means are re-computed again. During this process, pixels will often move between clusters, iteration by iteration, because of the change in positions of the means. Ultimately, we expect that a stage will be reached where the pixels do not migrate any further between the clusters, so that the situation is stable, and we conclude that the correct set of clusters has been identified. That is checked by seeing whether the full set of mean vectors no longer changes between iterations. In practice, we may not be able to wait until full convergence has been achieved; instead, we stop clustering when a given high percentage of the pixel vectors no longer shifts between cluster centers with further iteration.

Before we implement the k-means technique in the next lecture, we here take stock of what we are trying to achieve. Our ultimate goal in many remote sensing situations is unsupervised classification, which we pursue through the application of clustering techniques. Even though we haven't yet seen an example of the application of the k-means approach, we can still think about some practical matters concerning its operation, especially how we choose the initial cluster centers. That choice will affect the speed of convergence of the algorithm and the actual cluster set found.
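Although we have not yet worked through an example, the steps above can be summarised in a short sketch. The following is a minimal k-means implementation in Python, assuming NumPy; the two-band pixel vectors, the choice of two clusters, and the starting means are all invented for illustration. It assigns pixels to the nearest mean, re-computes the means, and repeats until the assignments no longer change, printing the SSE at each iteration so its decrease can be observed.

```python
import numpy as np

def kmeans(pixels, initial_means, max_iter=100):
    """Simple k-means: pixels is (num_pixels, num_bands),
    initial_means is (k, num_bands)."""
    means = initial_means.astype(float).copy()
    labels = np.full(len(pixels), -1)
    for iteration in range(max_iter):
        # Step 3: assign each pixel to the cluster of the nearest mean
        # (Euclidean distance in spectral space).
        dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        new_labels = np.argmin(dists, axis=1)
        # SSE: summed squared distances of pixels from their cluster means.
        sse = np.sum((pixels - means[new_labels]) ** 2)
        print(f"iteration {iteration}: SSE = {sse:.2f}")
        # Convergence check: stop when no pixel migrates between clusters.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: re-compute each cluster mean from its assigned pixels.
        for c in range(len(means)):
            if np.any(labels == c):
                means[c] = pixels[labels == c].mean(axis=0)
    return means, labels

# Invented two-band pixel vectors and two guessed starting means.
pixels = np.array([[10, 12], [11, 11], [12, 13],
                   [40, 42], [41, 40], [43, 41]], dtype=float)
initial_means = np.array([[0.0, 0.0], [50.0, 50.0]])
means, labels = kmeans(pixels, initial_means)
print("cluster means:\n", means)
print("cluster labels:", labels)
```

Even in this toy example, the effect of the starting guesses can be explored: moving both initial means close together, for instance, changes how quickly the assignments settle, which is exactly the practical matter about choosing initial cluster centers raised above.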