Now that we have introduced various clustering techniques, we are ready to apply them to some interesting problems in finance. Here, I suggest we take a look at one of the most fundamental problems for many areas of finance: the estimation of asset correlation matrices and, more specifically, equity correlation matrices. The problem of estimating equity correlations from data is very important in such areas of finance as quantitative trading, asset management, and systematic risk monitoring.
Now, to explain the problem, let me set the notation first. Assume that we have a set of normalized log-returns for some set of stocks indexed by i, with i between 1 and N. Using these standardized returns, we can compute the empirical correlation matrix, C_ij. The number of free parameters in such a matrix is N(N-1)/2, that is, of the order of half of N squared. On the other hand, the total number of observations for N stocks observed over T time steps is N times T. This implies that for a reliable estimation of the true correlation matrix from an empirical correlation matrix, we need T to be much larger than N.
However, this condition does not always hold in reality. For example, if your investment universe counts 3,000 stocks and you have 15 years of daily data, which is roughly 3,800 trading days, then T and N are of the same order, and in such a case the empirical and the true correlation matrices may turn out to be quite different from one another. Therefore, if we want to use equity correlation matrices for out-of-sample forecasts or portfolio construction, we had better somehow clean the empirical correlation matrix of observational noise. Such procedures are referred to as de-noising of equity correlation matrices. Now, different approaches are available for such de-noising of correlation matrices.
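Before going through those approaches, the counting argument above can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the function names are made up for this example, the returns are simulated as independent Gaussian noise (so the true correlation matrix is the identity), and the universe size of 50 stocks is arbitrary. Any off-diagonal entry in the empirical matrix is then pure estimation noise.

```python
import numpy as np


def empirical_correlation(returns):
    """Empirical correlation matrix C_ij from a (T, N) array of log-returns."""
    # Standardize each stock's returns to zero mean and unit variance.
    x = (returns - returns.mean(axis=0)) / returns.std(axis=0)
    return x.T @ x / returns.shape[0]


def count_parameters_and_observations(n_stocks, n_steps):
    """Free parameters N(N-1)/2 of a correlation matrix vs. N*T data points."""
    return n_stocks * (n_stocks - 1) // 2, n_stocks * n_steps


def max_noise(n_stocks, n_steps, seed=0):
    """Largest off-diagonal entry when the true correlations are all zero."""
    rng = np.random.default_rng(seed)
    returns = rng.standard_normal((n_steps, n_stocks))  # assumed i.i.d. noise
    c = empirical_correlation(returns)
    return np.abs(c - np.eye(n_stocks)).max()


if __name__ == "__main__":
    # The lecture's example: 3,000 stocks, about 15 * 252 daily observations.
    print(count_parameters_and_observations(3000, 15 * 252))
    # Short history (T close to N) versus long history (T much larger than N).
    for n_steps in (60, 5000):
        print(n_steps, max_noise(50, n_steps))
```

With T close to N, the spurious correlations are large even though the true ones are exactly zero; they shrink only as T grows well beyond N.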
One class of methods is based on random matrix theory, where the spectrum of the empirical correlation matrix is compared with the spectrum of a random correlation matrix in order to identify and subtract the noise component in the matrix. This method was suggested in 1999 by a group of physicists led by Eugene Stanley. There are also some other methods that try to do the same. For example, there are Bayesian methods and, in particular, the so-called shrinkage methods, where the empirical correlation matrix C is adjusted towards some pre-defined prior matrix C-hat that has some simple structure, for example, a matrix with a constant correlation within each industry group. And finally, there are clustering-based filtering methods. With this class of methods, the equity correlation matrix is viewed as a sort of graph, which we de-noise by aggregating links between sub-clusters.
Let's now formalize this problem a bit. First, we formalize the clustering problem as follows. Given a set of N items with their pairwise distances, d_ij, we want to divide it into K subsets in such a way that the minimum distance between points in different clusters is maximized. Here's how such a maximum minimum distance algorithm can be implemented. We maintain clusters as the set of connected components of a graph. We then iteratively find the closest pair of items that lie in different clusters and merge those two clusters by adding an edge between the two items. And we stop this procedure when we are left with exactly K clusters. What I described is known as the Kruskal algorithm. The Kruskal algorithm is an example of the single-linkage agglomerative clustering that we discussed in the last video.
Now, the Kruskal algorithm operates on distances, while we are only given the equity correlations, rho_ij. There exists a convenient way to define the pairwise distance d_ij from the pairwise correlation rho_ij: d_ij is the square root of 2(1 - rho_ij). This choice is legitimate as long as it satisfies the three axioms of a Euclidean metric.
First, it's non-negative and equals zero only when i equals j. Second, it's symmetric with respect to the replacement of i by j and vice versa. Third, it satisfies the triangle inequality for distances between three arbitrary points i, j, and k. In the next video, we will look into how the Kruskal algorithm works and how it can be used to filter equity correlation matrices.
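As a preview, the correlation-to-distance mapping and the merging procedure described above can be sketched in a few lines. This is a minimal illustration and not necessarily the implementation used in the course: the function names, the union-find bookkeeping for connected components, and the 4-by-4 toy correlation matrix are all assumptions made for this example.

```python
import numpy as np


def correlation_to_distance(rho):
    """d_ij = sqrt(2 * (1 - rho_ij)); clipping guards against rounding below 0."""
    return np.sqrt(np.clip(2.0 * (1.0 - np.asarray(rho, dtype=float)), 0.0, None))


def satisfies_metric_axioms(dist, tol=1e-12):
    """Check zero diagonal, non-negativity, symmetry, and triangle inequality."""
    zero_diag = np.allclose(np.diag(dist), 0.0)
    non_negative = bool((dist >= 0.0).all())
    symmetric = np.allclose(dist, dist.T)
    # d[i, j] <= d[i, k] + d[k, j] for all i, k, j (array axes are i, k, j).
    triangle = bool(
        (dist[:, None, :] <= dist[:, :, None] + dist[None, :, :] + tol).all()
    )
    return zero_diag and non_negative and symmetric and triangle


def kruskal_clusters(dist, k):
    """Merge the closest pairs lying in different clusters until k remain."""
    n = dist.shape[0]
    parent = list(range(n))  # union-find forest over the n items

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dist[rows, cols], kind="stable")  # shortest edges first
    n_clusters = n
    for idx in order:
        if n_clusters <= k:
            break
        a, b = find(int(rows[idx])), find(int(cols[idx]))
        if a != b:  # the edge joins two different components: merge them
            parent[a] = b
            n_clusters -= 1
    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


# Toy correlation matrix (hypothetical numbers): stocks 0-1 and 2-3 co-move.
rho = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
dist = correlation_to_distance(rho)
labels = kruskal_clusters(dist, k=2)
```

For this toy matrix the two shortest edges are (0, 1) and (2, 3), so stopping at K = 2 leaves the two strongly correlated pairs as separate clusters.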