Notes - Machine Learning MT23, Spectral clustering


Flashcards

What is the idea behind the spectral clustering algorithm to find clusters for a set of points $\pmb x _ 1, \cdots, \pmb x _ n$?


Convert the set of points into a weighted graph where edge weight represents similarity, and then use a graph clustering algorithm to find clusters.

Suppose we have a set of points $\pmb x _ 1, \cdots, \pmb x _ n$. Quickly describe the spectral clustering algorithm.


  • Construct a $k$-nearest neighbour graph where:
    • The nodes correspond to each point in the data
    • The weights correspond to a similarity measure between two points
    • (as a reminder: the $k$-nearest neighbour graph links together $x _ i$ and $x _ j$ if either $x _ i$ is among the $k$ nearest neighbours of $x _ j$, or vice versa, and the most common similarity measure is $s _ {i, j} = \exp(-\gamma \vert \vert x _ i - x _ j \vert \vert^2)$).
  • Compute:
    • $W$, the weighted symmetric adjacency matrix
    • $D$, the diagonal degree matrix, where each $D _ {ii}$ is the sum of the weights of all edges connected to node $i$
    • $L := D - W$, the (always) positive semidefinite Laplacian matrix
  • Find $\{v _ 1, \cdots, v _ k\}$, the eigenvectors corresponding to the $k$ smallest eigenvalues of $L$
    • (why is this useful? There are results in algebraic graph theory that relate these eigenvectors to important properties of the graph. E.g. the eigenvector corresponding to the second smallest eigenvalue gives an approximate solution to the NP-hard “sparsest cut” problem, and spectral clustering can be viewed as a relaxation of this kind of cut problem)
  • Form the $n \times (k-1)$ embedding matrix given by $V _ k = [v _ 2, \cdots, v _ k]$
    • (why ignore $v _ 1$? There is a result that says that any Laplacian matrix $L$ always has $0$ as an eigenvalue, and that the corresponding eigenvector is just $\pmb 1$, so including it would not be useful)
    • (here, the rows give the embeddings of the original points $x _ i$)
  • Apply a standard clustering algorithm (e.g. $k$-means) on $V _ k$
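
Below is a minimal sketch of these steps in Python (assuming NumPy and scikit-learn are available; the function name, the dense-matrix computation, and the elementwise-max symmetrisation are illustrative choices rather than part of the notes):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph


def spectral_clustering(X, n_clusters, n_neighbors=10, gamma=1.0):
    """Cluster the rows of X using the unnormalised spectral clustering recipe above."""
    # 1. k-nearest-neighbour graph; store Euclidean distances, then turn them into
    #    RBF similarities s_ij = exp(-gamma * ||x_i - x_j||^2)
    dists = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
    W = np.where(dists > 0, np.exp(-gamma * dists ** 2), 0.0)
    # link i and j if either is among the other's nearest neighbours (symmetrise)
    W = np.maximum(W, W.T)

    # 2. degree matrix D and unnormalised Laplacian L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W

    # 3. eigenvectors of the smallest eigenvalues (eigh returns eigenvalues in
    #    ascending order); drop v_1, the constant eigenvector for eigenvalue 0
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, 1:n_clusters]  # n x (k-1) embedding matrix

    # 4. run k-means on the rows of the embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(V)
```

For example, `labels = spectral_clustering(X, n_clusters=2)` would assign one of two cluster labels to each row of `X`.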

Suppose we have a set of points $\pmb x _ 1, \cdots, \pmb x _ n$. The spectral clustering algorithm works as follows:

  • Construct a $k$-nearest neighbour graph where:
    • The nodes correspond to each point in the data
    • The weights correspond to a similarity measure between two points
    • (as a reminder: the $k$-nearest neighbour graph links together $x _ i$ and $x _ j$ if either $x _ i$ is among the $k$ nearest neighbours of $x _ j$, or vice versa, and the most common similarity measure is $s _ {i, j} = \exp(-\gamma \vert \vert x _ i - x _ j \vert \vert^2)$).
  • Compute:
    • $W$, the weighted symmetric adjacency matrix
    • $D$, the diagonal degree matrix, where each $D _ {ii}$ is the sum of the weights of all edges connected to node $i$
    • $L := D - W$, the (always) positive semidefinite Laplacian matrix
  • Find $\{v _ 1, \cdots, v _ k\}$, the eigenvectors corresponding to the $k$ smallest eigenvalues of $L$
    • (why is this useful? There are results in algebraic graph theory that relate these eigenvectors to important properties of the graph. E.g. the eigenvector corresponding to the second smallest eigenvalue gives an approximate solution to the NP-hard “sparsest cut” problem, and spectral clustering can be viewed as a relaxation of this kind of cut problem)
  • Form the $n \times (k-1)$ embedding matrix given by $V _ k = [v _ 2, \cdots, v _ k]$
    • Why ignore $v _ 1$?
    • What do the rows represent?
  • Apply a standard clustering algorithm (e.g. $k$-means) on $V _ k$

Answer the two questions above.


  • Why ignore $v _ 1$?: There is a result that says that any Laplacian matrix $L$ always has $0$ as an eigenvalue, and that the corresponding eigenvector is just $\pmb 1$, so including it would not be useful.
  • What do the rows represent?: The rows give the embeddings of the original points $x _ i$.
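
To spell out the result used above: each diagonal entry of $D$ is the corresponding row sum of $W$, so

\[L \pmb 1 = (D - W)\pmb 1 = D\pmb 1 - W\pmb 1 = \pmb 0\]

since the $i$-th entry is $D _ {ii} - \sum _ j W _ {ij} = 0$. Hence $\pmb 1$ is always an eigenvector of $L$ with eigenvalue $0$; it assigns the same value to every point and so carries no information about the clusters.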

Suppose we have a set of points $\pmb x _ 1, \cdots, \pmb x _ n$. In the spectral clustering algorithm, we construct a $k$-nearest neighbour graph where:

  • The nodes correspond to each point in the data
  • The weights correspond to a similarity measure $s(x _ i, x _ j)$ between two points

What is a common similarity measure used to do this?


The radial basis function:

\[s(x_i, x_j) = \exp(-\gamma ||x_i - x_j||^2)\]
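
As a quick sanity check, all pairwise similarities can be computed at once with broadcasting (a sketch assuming NumPy; the function name is illustrative):

```python
import numpy as np

def rbf_similarity(X, gamma=1.0):
    # pairwise squared Euclidean distances ||x_i - x_j||^2, shape (n, n)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # s_ij = exp(-gamma * ||x_i - x_j||^2); the diagonal is 1 by construction
    return np.exp(-gamma * sq_dists)
```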

Suppose we have a set of points $\pmb x _ 1, \cdots, \pmb x _ n$ which we want to split into two clusters. One approach to this problem is to form a graph from the points and then use graph algorithms to find good clusters.

A naïve approach would be to form the complete graph on the points, with edge weights given by a similarity measure between each pair of points, and then find the min-cut.

Why does this not work in practice?


Because the minimum cut typically ends up isolating a single point, producing a partition like $(\{a\}, V\setminus\{a\})$, which is not a useful clustering.
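
To make this concrete, the quantity being minimised is just the total weight of the edges crossing the cut,

\[\text{cut}(A, B) = \sum_{i \in A,\ j \in B} w_{ij}\]

and a cut that isolates a single outlying point only pays the few small weights attached to that point, so it almost always has a lower value than any balanced split of the data.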

Proofs




Related posts