[MUSIC] So one way to compactly represent the results of hierarchical clustering is through something called a dendrogram. And we're going to explain the dendrogram in the context of agglomerative clustering, even though this type of representation can be used for other hierarchical clustering approaches as well. So the dendrogram representation is as follows. Along the x-axis, we place each one of our data points, carefully ordered so that we get a very nice visual out of this, and along the y-axis, we indicate the distance between different clusters. More specifically, when we're thinking about single linkage, where we're looking at the minimum distance between the sets of points in any pair of clusters, the height of a specific merge point represents the distance between those two clusters. So let's describe this a little more: if we look at a given merge point, below it we have two separate trees, and each of those trees represents a different cluster. So in this example, we have this blue cluster, so all the points colored in blue are data points assigned to the blue cluster at some iteration of the algorithm, and likewise we have this green cluster. And the height of the merge between the blue branch and the green branch indicates the minimum distance between any point in the blue cluster and any point in the green cluster. That minimum distance is where we place the merge of the blue branch with the green branch, so it specifies the height along the y-axis. And so, throughout this tree, we see the different clusters that are present and the distances between these clusters as the merges were made throughout the algorithm, going from every data point being in its own cluster all the way to all the data points being in one cluster.
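As a small sketch of how those merge heights arise, here's a single-linkage example using SciPy. SciPy, the toy dataset, and the variable names are my own illustrative choices, not something from the lecture:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy 2D data: two tight pairs of points, far apart from each other.
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])

# Single linkage: the distance between two clusters is the minimum
# distance between any point in one and any point in the other.
Z = linkage(X, method="single")

# Each row of Z records one merge: the indices of the two clusters
# joined, the distance at which they merged (the merge height in the
# dendrogram), and the size of the new cluster.
for left, right, height, size in Z:
    print(f"merged {int(left)} and {int(right)} at height {height:.2f} (size {int(size)})")
```

The two within-pair merges happen at height 0.1, and the final merge of the two pairs happens at a much larger height, which is exactly the picture the dendrogram draws.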
And likewise, if we look at any path down this dendrogram, it indicates the cluster membership of a given data point throughout all the different merge steps: which cluster that data point belonged to, and the sequence of merges made for those clusters. So in summary, we see that the dendrogram captures all the critical elements of the hierarchical clustering result: the different cluster memberships, which cluster merged with which, and the distances between those clusters at the point at which they merged. Well, one important thing we can do with the dendrogram is extract a partition. Because remember, if we run agglomerative clustering from beginning all the way to the end, we start with all data points in separate individual clusters and end with all data points in one cluster. And what we really want is to produce some clustering of our data points that's somewhere in between: not at the granularity of every point being its own cluster, and not at the very coarse granularity of everything being one cluster. So this leads to the question of how we extract a partition, how we define a set of clusters to produce from this hierarchical clustering procedure. And one really simple approach is to perform a cut along the y-axis of the dendrogram. Then every branch that crosses the line we chose defines a separate cluster. So in this example, we see we have fuchsia, blue, green, orange, and gray clusters. But remember, in this visualization each position along the x-axis just represents a different data index in our dataset. What we can do is go back to our original feature space and visualize what the resulting clusters look like.
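The cut-along-the-y-axis idea can be sketched with SciPy's `fcluster`, which forms a flat partition by cutting the dendrogram at a chosen height. The library choice, toy data, and threshold values are assumptions for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Same toy data as before: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
Z = linkage(X, method="single")

# Cutting the dendrogram at height t keeps every merge made at or
# below t and splits everything above it; each surviving branch that
# crosses the cut line becomes one cluster.
labels_fine = fcluster(Z, t=0.05, criterion="distance")    # below every merge
labels_mid = fcluster(Z, t=1.0, criterion="distance")      # between the two merge heights
labels_coarse = fcluster(Z, t=10.0, criterion="distance")  # above every merge

print(len(set(labels_fine)), len(set(labels_mid)), len(set(labels_coarse)))  # → 4 2 1
```

Sliding the cut from low to high walks the partition from every point in its own cluster down to a single all-encompassing cluster, exactly the range of granularities described above.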
So if our data were just in 2D, like all the examples we've shown on our nice 2D slides, this might be the clustering associated with the cut shown on the previous slide. And as we vary where this cut point is, we get different possible clusterings of our dataset at different granularities, going from really, really fine granularity all the way up to very coarse granularities. So let's spend a little time thinking about what it means to cut the dendrogram at some level D*. Well, what we're saying is that, for the resulting clusters produced by this cut, there is no pair of clusters at a distance less than D* that has not already been merged. That means every pair of clusters in the resulting partition is at distance at least D*, so D* is a lower bound on the distances between clusters at this level of the clustering. [MUSIC]
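That D* property can be checked directly on a small example: after cutting at D*, the single-linkage distance (minimum pointwise distance) between any two resulting clusters should be at least D*, since anything closer would already have been merged. SciPy and the toy data here are my own illustrative assumptions:

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

# Two tight pairs plus one stray point a moderate distance away.
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1], [0.0, 2.0]])
Z = linkage(X, method="single")

d_star = 1.0
labels = fcluster(Z, t=d_star, criterion="distance")

# For every pair of resulting clusters, the minimum pointwise
# distance between them is at least d_star.
for a, b in combinations(set(labels), 2):
    gap = cdist(X[labels == a], X[labels == b]).min()
    assert gap >= d_star
print("all inter-cluster gaps >= d_star")
```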