0:06

In this session we're going to introduce the most popular clustering method: the K-Means clustering method.

So you may wonder, who first proposed this K-Means clustering method? Some people credit MacQueen, because he published a paper on the K-Means clustering algorithm in 1967. But other people say Lloyd actually proposed it earlier, in an internal company report in 1957. Since that publication was internal, most people may not even have known about it, so he republished it in a journal in 1982. That's the reason some people say Lloyd first developed this algorithm.

0:52

But no matter what, the common idea of this algorithm is that every cluster is represented by its center point, or, as we can say, its centroid.

The K-Means algorithm can essentially be outlined as follows. Given K, the number of clusters, we first select K points as the initial centroids. Of course, you can select them randomly; that's what the original K-Means does. Or you may use some smarter way to select them, which we'll introduce later.

Then we repeat the following steps. Once we have selected the K centroids, we form K clusters by assigning each point to its closest centroid. After this assignment, we may need to recompute the centroids, because the initial randomly selected centroids may not be very good ones. So we recompute each centroid as the mean point of its cluster. Once you recompute the centroids, some object initially assigned to, say, centroid A may now, because all the centroids have changed, be even closer to centroid B. That means we need to go back and form the K clusters again, by reassigning each point to its now-closest centroid. Then we recompute the new centroids, and so on.
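The loop just described can be sketched in plain Python. This is an illustrative implementation, not the lecture's own code; the names `assign`, `recompute`, and `kmeans` are hypothetical, and squared Euclidean distance is assumed for the closest-centroid test.

```python
# A minimal sketch of the K-Means loop: select initial centroids, then
# alternate (re)assignment and centroid recomputation until stable.
import random

def assign(points, centroids):
    """Form K clusters by assigning each point to its closest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        # Squared Euclidean distance from p to each centroid.
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        clusters[d.index(min(d))].append(p)
    return clusters

def recompute(clusters, old):
    """Recompute each centroid as the mean point of its cluster."""
    new = []
    for cluster, c in zip(clusters, old):
        if cluster:  # keep the old centroid if a cluster goes empty
            n = len(cluster)
            new.append(tuple(sum(coord) / n for coord in zip(*cluster)))
        else:
            new.append(c)
    return new

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)         # select K points as initial centroids
    clusters = assign(points, centroids)
    for _ in range(max_iter):
        new = recompute(clusters, centroids)  # recompute the mean of each cluster
        if new == centroids:                  # stable: centroids no longer move
            break
        centroids = new
        clusters = assign(points, centroids)  # reassign points to closest centroid
    return centroids, clusters
```

With two well-separated blobs of points and K = 2, the loop settles on one centroid per blob after a few iterations regardless of the random start.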

2:45

We can use the Manhattan distance (L1 norm) or the Euclidean distance (L2 norm), but the original algorithm typically uses the Euclidean distance.
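As a small illustration with hypothetical points, here is how the two distance measures differ on the same pair:

```python
# Manhattan (L1) vs. Euclidean (L2) distance between two 2-D points.
import math

p, q = (1, 2), (4, 6)

l1 = sum(abs(a - b) for a, b in zip(p, q))               # |1-4| + |2-6| = 7
l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))  # sqrt(9 + 16) = 5.0

print(l1, l2)  # 7 5.0
```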

3:00

Now let's look at an example of how we can execute this K-Means clustering algorithm. Suppose we have data in a 2-D space; initially, the black points are the data points. Then, following the first line of the algorithm, we select K points as the initial centroids. Here K is two, so we select two initial centroids.

3:29

Once we select these initial centroids, we form K clusters by assigning each point to its closest centroid. You can probably see that each black point is assigned either to the red centroid or to the blue centroid; that is the current assignment.

3:50

Once you assign the points to form two clusters, we need to recompute the centroids. You can see that when you recompute it, the red centroid moves down here.

4:06

Then this blue one moves as well. Once we recompute the centroids, we may need to go back and reassign each point to its closest centroid. In this case you can probably see that these blue points get assigned to the red cluster, and this red one gets assigned to the blue cluster. Then we calculate the centroids again. We repeat this loop until the assignment becomes stable, or until the difference between successive assignments becomes really, really small.

Now, the first important thing we should know is that this is actually a pretty efficient algorithm for clustering. Suppose we have n objects, we want to find K clusters, and the algorithm becomes stable after t iterations. You can probably see the computational complexity: in each iteration, every object must be compared against the K centroids to see which cluster it belongs to, so each iteration costs n times K. With t iterations in total, the computational complexity is O(t × K × n). Usually K and t, the number of clusters and the number of iterations, are far smaller than n, the number of objects. That's why, even when n is quite big, this complexity is essentially linear in the number of objects, which makes it an efficient algorithm.
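To make the O(t × K × n) count concrete, here is a back-of-the-envelope sketch with hypothetical numbers:

```python
# Each iteration compares every one of the n objects against all K centroids,
# so the total number of distance computations is t * K * n.
n = 1_000_000   # number of objects (hypothetical)
K = 10          # number of clusters
t = 20          # iterations until stable

distance_computations = t * K * n
print(distance_computations)  # 200000000 -- linear in n for fixed K and t
```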

Â However, K-means clustering often may terminate at a local optimal.

6:04

So initialization becomes important if you want to find high-quality clusters. We need a smart initialization, or we need to randomly initialize many times and try to find a good clustering.
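One common way to realize the "initialize many times" idea is to restart K-Means from several random initializations and keep the run with the smallest sum of squared errors (SSE). The sketch below is illustrative, not the lecture's code; `kmeans_once` and `kmeans_restarts` are hypothetical names, and squared Euclidean distance is assumed.

```python
# Run K-Means from several random starts and keep the lowest-SSE result,
# reducing the chance of ending at a bad local optimum.
import random

def kmeans_once(points, k, rng, iters=50):
    """One K-Means run from a random initialization; returns (SSE, centroids)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    # SSE: total squared distance of each point to its closest centroid.
    sse = sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )
    return sse, centroids

def kmeans_restarts(points, k, restarts=10, seed=0):
    rng = random.Random(seed)
    # Keep the run with the smallest SSE across the random initializations.
    return min(kmeans_once(points, k, rng) for _ in range(restarts))
```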

6:33

There are many ways to automatically determine the best K; there are research papers and lots of effort contributed to this. In practice, you can also run K over a range of values and then select whichever works best.

Also, the K-Means method is quite sensitive to noisy data and outliers. So there are variations, like the K-Medians or K-Medoids algorithms, that try to overcome this outlier and noisy-data problem.

7:07

Another thing is that K-Means actually works only on objects in a continuous, that is, numerical, multi-dimensional space. How can we handle clustering on categorical data? For that, people invented something called the K-Modes algorithm.

7:27

Another important thing we should know is that, because we use the sum of squared distances, it is not a good idea to try to find clusters with non-convex or non-spherical shapes. What you find is something closer to a ball.

7:46

So we can use density-based clustering or the kernel K-Means algorithm if we run into non-convex shapes. We're going to introduce these algorithms later. Because K-Means came quite early and is very popularly used, there have been lots of studies working out different variations of the K-Means method. For example, how to choose the initial centroids.

8:28

Another aspect is how to choose different representative prototypes for the clusters. That means, instead of the mean point, we may use K-Medoids, K-Medians, or K-Modes. We're going to introduce these in the subsequent discussion.

8:48

Another thing is how to apply feature transformation techniques, so that we may be able to cluster things even if they are not in convex shapes. For example, people have developed weighted K-Means and kernel K-Means. We are going to introduce the kernel K-Means method.
