Notes - Machine Learning MT23, k-means clustering
Flashcards
Suppose we are performing $k$-means clustering with means $\pmb \mu _ 1, \cdots, \pmb \mu _ k$ and clusters $C _ 1, \cdots, C _ k$. What is the $k$-means objective, and what do we optimise over?
Optimise over all possible partions and means.
Give a quick description of the $k$-means algorithm.
- Initialise random means Repeatedly:
- Make clusters of points closest to each mean
- Create new means from the mean of each cluster
In the most basic case, how are the initial means selected in $k$-means clustering?
They are chosen at random.
Quickly sketch what the graph of $k$ against loss for $k$-means clustering would look like, for a set of points that clearly form $5$ clusters.
Todo, should be a curve roughly shaped like $e^{-k}$ with an “elbow” at $5$.
What’s a big issue with $k$-means clustering in terms of the shape of the clusters, and what are two approaches for getting around this?
The clusters are always convex. You could:
- Perform non-linear dimensionality reduction, then normal clustering
- Use spectral clustering
Suppose we have the following points:
(two distinct clusters of points, one with very few points on the left and one with lots of points on the right)
We want to perform $2$-means clustering on this data. Where would the optimal means lie, and what general condition does this imply on a dataset in order for it to have a good $k$-means clustering?
Since the right cloud contains many more points than the right cloud, both means would be in the right cloud (rather than what you’d expect where there was a mean in either).
Thus for a dataset to have a good $k$-means clustering, you need the clusters in the data to have roughly the same amount of points as one another.