Notes - Machine Learning MT23, Clustering
Flashcards
What is the difference between flat and hierarchical clustering?
- Flat: outputs a list of clusters (e.g. $k$-means clustering, spectral clustering)
- Hierarchical: outputs a tree of clusters (e.g. linkage algorithms)
What is the output of a hierarchical clustering algorithm called?
A dendrogram.
What is the difference between divisive and agglomerative hierarchical clustering algorithms?
Divisive algorithms recursively splits data top-down, whereas agglomerative algorithms work from the bottom-up.
What’s the basic principle behind linkage algorithms for agglomerative hierarchical clustering of $N$ points?
Start with $N$ singleton clusters, at each step join the most similar clusters, until we have just 1 cluster.
In linkage algorithms for agglomerative hierarchical clustering, we require a notion of dissimilarities between two clusters $C$ and $C’$. What are three common metrics $d$ to do this?
Single linkage:
\[d(C, C') = \min_{\pmb x \in C, \pmb x' \in C'} d(\pmb x, \pmb x')\]Average linkage:
\[d(C, C') = \frac{1}{|C| \cdot |C'|} \sum_{\pmb x \in C, \pmb x' \in C'} d(\pmb x, \pmb x')\]Complete linkage:
\[d(C, C') = \max_{\pmb x \in C, \pmb x' \in C'} d(\pmb x, \pmb x')\]