Notes - Machine Learning MT23, Multidimensional scaling
Flashcards
Sometimes we have data in Euclidean space that we wish to cluster using a method that is based on pairwise dissimilarities. Suppose $\pmb x = (x _ 1, \cdots, x _ n)^\top$. What’s a typical way of assigning a dissimilarity to points like this?
\[d(\pmb x, \pmb x') = f\left( \sum _ {j=1}^n d _ j(x _ j, x' _ j) \right)\]
where $f$ is an increasing function and $d _ j$ is a distance function for each feature.
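A minimal numeric sketch of this construction (my own illustration, not from the notes), assuming squared per-feature differences for each $d _ j$ and the identity for $f$, both hypothetical choices:

```python
import numpy as np

def dissimilarity(x, y, d_j=lambda a, b: (a - b) ** 2, f=lambda s: s):
    """d(x, y) = f(sum_j d_j(x_j, y_j)).

    With d_j a squared difference and f the identity (assumed choices),
    this reduces to squared Euclidean distance.
    """
    return f(sum(d_j(a, b) for a, b in zip(x, y)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])
print(dissimilarity(x, y))  # 5.0
```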
What is the problem of multidimensional scaling?
Finding a Euclidean representation of a set of points given only their pairwise dissimilarities.
Quickly describe how, given a matrix
\[M _ {ij} = \pmb x _ i^\top \pmb x _ j\]
you can find vectors $\hat{\pmb x} _ 1, \cdots, \hat{\pmb x} _ N$ such that
\[M _ {ij} = \hat{\pmb x} _ i^\top \hat{\pmb x} _ j\]
(i.e. you can find points with the same pairwise similarities).
Consider the full SVD of $M$:
\[M = U \Sigma V^\top\]
Since $M$ is a Gram matrix it is symmetric and positive semidefinite, so we can take $U = V$, with $U, \Sigma$ both $N \times N$. Hence:
\[M = U\Sigma U^\top\]
Now define
\[\widetilde{X} = U \Sigma^{1/2}\]
and take $\hat{\pmb x} _ i$ to be row $i$ of $\widetilde{X}$. Then let's check the pairwise similarities:
\[\widetilde{X} \widetilde{X}^\top = U \Sigma^{1/2} \Sigma^{1/2} U^\top = U \Sigma U^\top = M\]
Hence we have found points with the required pairwise similarities.
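A minimal sketch of this reconstruction (my own illustration; the test data and seed are arbitrary): build $M$ from known points, recover $\widetilde{X} = U\Sigma^{1/2}$ from the SVD, and check that the pairwise inner products match.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # N = 5 points in R^3 (arbitrary test data)
M = X @ X.T                        # Gram matrix M_ij = x_i^T x_j

# Full SVD; M is symmetric PSD, so U = V and Sigma holds its eigenvalues.
U, sigma, _ = np.linalg.svd(M)
X_tilde = U * np.sqrt(sigma)       # rows are the recovered points x_hat_i

# Pairwise inner products are reproduced exactly (up to float error).
assert np.allclose(X_tilde @ X_tilde.T, M)
```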
Suppose we have a dissimilarity matrix $D$. Can you define the stress function, and what its value represents?
One standard choice (least-squares, or Kruskal–Shepard, scaling) is
\[S(\hat{\pmb x} _ 1, \cdots, \hat{\pmb x} _ N) = \sum _ {i \ne j} \left( D _ {ij} - \|\hat{\pmb x} _ i - \hat{\pmb x} _ j\| \right)^2\]
Its minimum value represents the degree to which it is not possible to find a Euclidean embedding matching the given dissimilarity matrix.
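A minimal sketch of evaluating this stress, assuming the least-squares form above; `Z` and `D` are toy data:

```python
import numpy as np

def stress(D, Z):
    """Least-squares stress: sum over i != j of (D_ij - ||z_i - z_j||)^2."""
    diffs = Z[:, None, :] - Z[None, :, :]   # pairwise difference vectors
    dists = np.linalg.norm(diffs, axis=-1)  # pairwise Euclidean distances
    return np.sum((D - dists) ** 2)         # diagonal terms are zero

# Toy check: if D is exactly the distance matrix of Z, the stress is 0.
Z = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
print(stress(D, Z))  # 0.0
```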
In the problem of multidimensional scaling, we have a matrix
\[M _ {ij} = \pmb x _ i^\top \pmb x _ j\]
and want to find vectors $\hat{\pmb x} _ 1, \cdots, \hat{\pmb x} _ N$ such that
\[M _ {ij} = \hat{\pmb x} _ i^\top \hat{\pmb x} _ j\]
(and we don’t know the original $\pmb x _ i$). The approach works as follows:
Consider the full SVD of $M$
\[M = U \Sigma V^\top\]
Since $M$ is a Gram matrix it is symmetric and positive semidefinite, so we can take $U = V$, with $U, \Sigma$ both $N \times N$. Hence:
\[M = U\Sigma U^\top\]
Now define
\[\widetilde{X} = U \Sigma^{1/2}\]
and take $\hat{\pmb x} _ i$ to be row $i$ of $\widetilde{X}$. Then the pairwise similarities do indeed match:
\[\widetilde{X} \widetilde{X}^\top = U \Sigma^{1/2} \Sigma^{1/2} U^\top = M\]
How can this be kernelised, and what do you require about the kernel in order to make sure this construction works?
- Replace the inner products with kernel evaluations: take $M _ {ij} = k(\pmb x _ i, \pmb x _ j)$, so that $M$ is the Gram matrix of the kernel, and apply the same construction.
- We need $M$ to be symmetric and positive semidefinite (so that the square root $\Sigma^{1/2}$ exists with real entries). This holds whenever $k$ is a Mercer kernel. See the sketch below.
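A sketch of the kernelised construction, using a Gaussian RBF kernel as an assumed example of a Mercer kernel (the data, seed, and `gamma` are arbitrary illustrative choices): form the Gram matrix $M _ {ij} = k(\pmb x _ i, \pmb x _ j)$ and apply the same SVD recipe.

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    """Gaussian RBF kernel, a standard Mercer kernel (gamma is an assumed choice)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))

# Gram matrix of the kernel: symmetric and PSD since the RBF is a Mercer kernel.
M = np.array([[rbf(xi, xj) for xj in X] for xi in X])

U, sigma, _ = np.linalg.svd(M)
X_tilde = U * np.sqrt(np.clip(sigma, 0.0, None))  # clip guards against round-off

# The embedded points reproduce the kernel similarities.
assert np.allclose(X_tilde @ X_tilde.T, M)
```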