Machine Learning MT23, Kernels
Flashcards
What is a kernel $\kappa$?
Some function
\[\kappa : \mathcal X \times \mathcal X \to \mathbb R\]Can you give the Gram matrix $\pmb K$ for a kernel $\kappa : \mathcal X \times \mathcal X \to \mathbb R$ and inputs $\langle \pmb x _ i \rangle^N _ {i=1}$?
What does it mean for a kernel to be a Mercer kernel?
The Gram matrix is always positive semi-definite.
Why are Mercer kernels important in the context of the kernel trick for SVMs?
They keep the SVM dual optimisation problem convex.
Can you define the degree-$d$ polynomial kernel, and describe its computational advantages?
This implicitly represents the feature expansion into all terms in a degree $d$ polynomial in each entry of $\pmb x$, of which there are roughly $O(D^d)$. This allows computing the associated inner product in just $O(D\log d)$.
What is the spherical Gaussian kernel?
Radial basis function kernels are not just kernels of the form
\[\kappa(\pmb x, \pmb x') = \exp\left( -\frac{||\pmb x - \pmb x'||^2_2}{2\sigma^2} \right)\]
What are they in general?
Any kernel that depend only on $ \vert \vert \pmb x - \pmb x’ \vert \vert $.
Suppose we have two strings $\mathbf x$ and $\mathbf x’$ over an alphabet $\Sigma$. Can you describe a Mercer kernel for these inputs?
where $w _ s$ are weights and $\phi _ s(\mathbf x)$ is the number of occurences of the substring $s$ in $\mathbf x$.