Notes - Machine Learning MT23, Clustering


Flashcards

What is the difference between flat and hierarchical clustering?


  • Flat: outputs a list of clusters (e.g. $k$-means clustering, spectral clustering)
  • Hierarchical: outputs a tree of clusters (e.g. linkage algorithms)

What is the output of a hierarchical clustering algorithm called?


A dendrogram.

What is the difference between divisive and agglomerative hierarchical clustering algorithms?


Divisive algorithms recursively splits data top-down, whereas agglomerative algorithms work from the bottom-up.

What’s the basic principle behind linkage algorithms for agglomerative hierarchical clustering of $N$ points?


Start with $N$ singleton clusters, at each step join the most similar clusters, until we have just 1 cluster.

In linkage algorithms for agglomerative hierarchical clustering, we require a notion of dissimilarities between two clusters $C$ and $C’$. What are three common metrics $d$ to do this?


Single linkage:

\[d(C, C') = \min_{\pmb x \in C, \pmb x' \in C'} d(\pmb x, \pmb x')\]

Average linkage:

\[d(C, C') = \frac{1}{|C| \cdot |C'|} \sum_{\pmb x \in C, \pmb x' \in C'} d(\pmb x, \pmb x')\]

Complete linkage:

\[d(C, C') = \max_{\pmb x \in C, \pmb x' \in C'} d(\pmb x, \pmb x')\]



Related posts