And then once you have these centroids, you

assign each of the observations to its closest centroid.

And that's how we do the group assignment.

And so the basic approach, or the

algorithm for K-means, is to pick some starting centroids.

Assign all the points to their closest centroids.

Then maybe recalculate the centroids and reassign all the points.

So you kind of iterate back and forth until you reach a solution.
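
The back-and-forth described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `assign_points`, `update_centroids`, and `kmeans` are made up for this sketch, it assumes Euclidean distance, and it treats "reaching a solution" as the point where the assignments stop changing.

```python
import math

def assign_points(points, centroids):
    """Return, for each point, the index of its closest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: a cluster with no points keeps its previous centroid.
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate between assigning points and recalculating centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break  # assignments are stable, so stop iterating
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels
```

Calling `kmeans(points, initial_centroids)` returns the final centroid estimates together with the index of the centroid each observation ended up assigned to.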

And so the things that you need, you need a distance metric.

You need a number of clusters, so a

fixed number of clusters that you specify beforehand.

And you need an initial guess as to where the centroids are.

And often you might just pick some random points, just

to start the algorithm in terms of where the centroids are.
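
One way to get that initial guess, sketched here as an assumption rather than the only option, is to use k randomly chosen observations as the starting centroids. The helper name `random_initial_centroids` and its `seed` parameter are hypothetical and exist only for this illustration.

```python
import random

def random_initial_centroids(points, k, seed=None):
    """Pick k distinct observations to serve as the starting centroids."""
    rng = random.Random(seed)  # a seed makes the random choice repeatable
    return rng.sample(list(points), k)
```

Because the starting points are random, different seeds can lead the algorithm to different final solutions, so it is often worth trying more than one start.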

But the K-means clustering algorithm will produce

a final estimate of where the cluster centroids are.

And it will tell you which centroid each observation is assigned to.

Here's a quick example of how you might use the K-means clustering algorithm.

I've generated just some random data here that

are in two dimensions so it's easy to visualize.

And so the x coordinates and the y coordinates

all come from a normal distribution with different means.

So I specifically

created three different clusters for these twelve observations.

So each cluster has four observations in it.

So when I plot the data, it's very obvious that there are three clusters.

And I put labels on each of the points.
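
A dataset along these lines could be simulated as in the sketch below. The cluster means, the standard deviation, and the seed are assumptions chosen for illustration; the transcript only says that there are twelve observations in three clusters of four, with normally distributed x and y coordinates.

```python
import random

rng = random.Random(1234)  # arbitrary seed, used only for reproducibility

# Assumed cluster centres and spread; these values are not from the lecture.
cluster_means = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
sd = 0.2
points_per_cluster = 4

points = []
for mean_x, mean_y in cluster_means:
    for _ in range(points_per_cluster):
        # Each coordinate is drawn from a normal distribution around the cluster mean.
        points.append((rng.gauss(mean_x, sd), rng.gauss(mean_y, sd)))

# Label the observations 1 to 12, as one might when annotating a scatter plot.
for label, (x, y) in enumerate(points, start=1):
    print(f"{label:2d}: x={x:.2f}, y={y:.2f}")
```

With a small spread relative to the distance between the means, the three groups are clearly separated when the points are plotted.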