Notes - Machine Learning MT23, k-means clustering


Flashcards

Suppose we are performing $k$-means clustering with means $\pmb \mu _ 1, \cdots, \pmb \mu _ k$ and clusters $C _ 1, \cdots, C _ k$. What is the $k$-means objective, and what do we optimise over?


\[\sum^k_{j = 1} \sum_{i \in C_j} ||\pmb x_i - \pmb \mu_j||^2_2\]

Optimise over all possible partions and means.

Give a quick description of the $k$-means algorithm.


  • Initialise random means Repeatedly:
  • Make clusters of points closest to each mean
  • Create new means from the mean of each cluster

In the most basic case, how are the initial means selected in $k$-means clustering?


They are chosen at random.

Quickly sketch what the graph of $k$ against loss for $k$-means clustering would look like, for a set of points that clearly form $5$ clusters.


Todo, should be a curve roughly shaped like $e^{-k}$ with an “elbow” at $5$.

What’s a big issue with $k$-means clustering in terms of the shape of the clusters, and what are two approaches for getting around this?


The clusters are always convex. You could:

  • Perform non-linear dimensionality reduction, then normal clustering
  • Use spectral clustering

Suppose we have the following points:

(two distinct clusters of points, one with very few points on the left and one with lots of points on the right)

We want to perform $2$-means clustering on this data. Where would the optimal means lie, and what general condition does this imply on a dataset in order for it to have a good $k$-means clustering?


Since the right cloud contains many more points than the right cloud, both means would be in the right cloud (rather than what you’d expect where there was a mean in either).

Thus for a dataset to have a good $k$-means clustering, you need the clusters in the data to have roughly the same amount of points as one another.

Proofs




Related posts