Now, let's see in more detail how the Kruskal algorithm works. Assume that we have a connected graph with N nodes and connection weights D_ij. This graph has N(N-1)/2 links. We want to find the N-1 shortest links in this graph, such that we can visit all nodes of the graph by following these N-1 links without forming loops. A subgraph connecting all N nodes with these N-1 shortest links is called the minimum spanning tree of the graph, and what the Kruskal algorithm does is find this minimum spanning tree. Its working can best be explained as the problem of coloring the edges of a graph. First, we find the edge of minimal weight in the graph and color it in any color, for example, red. Second, we find the minimum-weight uncolored edge that does not close a loop with already-colored edges. If it touches an already-colored subtree, it gets the same color; otherwise, it is marked with a new color. Then we repeat the second step until all vertices of the graph are connected. The edges resulting from this procedure form the minimum spanning tree.

Let's illustrate the working of the Kruskal algorithm using an example from Wikipedia. Here, we have a weighted graph with five nodes; all edge weights are shown in the graph and also in the table. The smallest weight is for the edge AE, so we paint this edge red. The next smallest edge is CD; since it does not touch the red subtree, we paint it blue. The next smallest edge is AB. Because the connected edge AE is already painted red, we paint edge AB red as well. The next smallest edge would be BE, but we cannot add BE, as doing so would create a loop on the graph. So we skip it and select the next smallest edge, which is BC. Connecting these two points merges the two previous clusters, ABE and CD, which together cover all points of the graph. Therefore, all these points are now connected and colored red, and the edges between them form the minimum spanning tree of this graph.

Now, please note the resulting compression of data when building the minimal spanning tree, or MST for short. While the original graph contains N(N-1)/2 weights, the MST compresses these to only N-1 weights. The MST just formed can be used to de-noise the empirical distance matrix D, and hence the empirical correlation matrix C. This is done by simply replacing each pairwise distance D_ij by the maximum distance along single steps in the path from node i to node j on the minimum spanning tree. Let's see how this prescription works in our simple example. We have already found the MST for this example, which is shown in red on the graph. The above prescription replaces three weights, BE, EC, and ED, by new weights computed from the MST. The new distance D^<_ij computed in this way is called the subdominant ultrametric distance. Now, what is it? Ultrametric distances are distances that satisfy all requirements of a distance, except that the usual triangle inequality is replaced by a stronger condition: D_ij <= max(D_ik, D_kj). And the subdominant ultrametric is the largest ultrametric among those that are less than or equal to D_ij. This means that the new corrected distance D^<_ij is always equal to or smaller than the original distance, and the example above shows exactly this.

Techniques of clustering-based correlation matrix filtering were applied to the problem of portfolio optimization, among others, by Pantaleo and co-authors in a paper from 2010. They looked at an empirical correlation matrix for US stocks and applied different filtering techniques to this matrix.
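Before looking at those results, here is a minimal Python sketch of the construction just described: Kruskal's algorithm for the MST and the subdominant ultrametric distance obtained from it. The node names match the example above, but the edge weights are hypothetical placeholders chosen only to reproduce the ordering discussed (AE < CD < AB < BE < BC); they are not the numbers from the lecture's figure.

```python
# Sketch of Kruskal's algorithm and the subdominant ultrametric distance.
# Edge weights are illustrative placeholders, not the lecture's exact numbers.
from itertools import combinations

nodes = ["A", "B", "C", "D", "E"]
weights = {                     # hypothetical pairwise distances D_ij
    ("A", "E"): 1.0, ("C", "D"): 2.0, ("A", "B"): 3.0,
    ("B", "E"): 4.0, ("B", "C"): 5.0, ("E", "C"): 6.0,
    ("E", "D"): 7.0, ("A", "C"): 8.0, ("A", "D"): 9.0, ("B", "D"): 10.0,
}

def kruskal_mst(nodes, weights):
    """Return the list of MST edges using Kruskal's algorithm (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):                           # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    mst = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                       # edge does not create a loop
            parent[ri] = rj                # merge the two components
            mst.append((i, j, w))
        if len(mst) == len(nodes) - 1:     # N - 1 edges span all N nodes
            break
    return mst

def subdominant_ultrametric(nodes, mst):
    """D^<_ij = maximum single-step weight on the MST path from i to j."""
    adj = {v: [] for v in nodes}           # adjacency list of the tree
    for i, j, w in mst:
        adj[i].append((j, w))
        adj[j].append((i, w))

    def max_edge_on_path(src, dst):
        # depth-first search over the tree, tracking the largest edge weight seen
        stack = [(src, None, 0.0)]
        while stack:
            v, prev, best = stack.pop()
            if v == dst:
                return best
            for nxt, w in adj[v]:
                if nxt != prev:
                    stack.append((nxt, v, max(best, w)))
        raise ValueError("nodes are not connected")

    return {(i, j): max_edge_on_path(i, j) for i, j in combinations(nodes, 2)}

mst = kruskal_mst(nodes, weights)
print("MST edges:", mst)                   # N - 1 = 4 edges
print("D^< :", subdominant_ultrametric(nodes, mst))
```

With these placeholder weights, the sketch recovers the four MST edges AE, CD, AB, and BC from the example, and the replaced distances for BE, EC, and ED come out no larger than the original ones, as expected for the subdominant ultrametric.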
Returning to the experiments of Pantaleo and co-authors: they then compared the realized risk of Markowitz-optimal portfolios formed with these filtered matrices. The alternatives used included the simple sample correlation matrix itself, different versions of random matrix theory-based filtering, as well as the minimum spanning tree and other graph-based methods. What they found is that the results strongly depend on the ratio of the observation length T to the number of stocks N, and on whether short sales are allowed. If the ratio T/N is larger than one, both the RMT-based and MST-based methods perform well and lead to substantial improvements in comparison to the baseline case, which is based on the sample correlation matrix. On the other hand, if short sales are not allowed, or when the ratio T/N is less than one, no significant improvement was found. If you want to know more details of these experiments, please consult the original paper. Next, we want to briefly discuss probabilistic clustering methods. As always, let's do it after we answer a control question for this video.
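For readers who want to experiment with this kind of comparison themselves, here is a minimal sketch of the basic pipeline behind such experiments, assuming synthetic data: estimate a covariance matrix on an in-sample window, form the global minimum-variance Markowitz portfolio, and measure its realized risk out of sample. This is not the exact protocol of Pantaleo and co-authors; the data, dimensions, and function names are illustrative, and any filtering method (RMT- or MST-based) would simply replace the sample covariance estimate below.

```python
# Minimal sketch: minimum-variance portfolio and its realized (out-of-sample) risk.
# Data are synthetic; a filtered covariance estimate would replace `cov_in`.
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance portfolio: w = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

def realized_risk(weights, returns_out):
    """Portfolio standard deviation on out-of-sample returns, annualized (252 days)."""
    port = returns_out @ weights
    return port.std(ddof=1) * np.sqrt(252)

# toy data: T in-sample and T_out out-of-sample daily returns for N stocks
rng = np.random.default_rng(0)
N, T, T_out = 10, 500, 250
returns_in = rng.normal(0.0, 0.01, size=(T, N))
returns_out = rng.normal(0.0, 0.01, size=(T_out, N))

cov_in = np.cov(returns_in, rowvar=False)   # sample covariance (baseline case)
w = min_variance_weights(cov_in)
print("realized risk of the baseline portfolio:", realized_risk(w, returns_out))
```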