# Notes - Machine Learning MT23, Kernels

### Flashcards

What is a kernel $\kappa$?

Some function

\[\kappa : \mathcal X \times \mathcal X \to \mathbb R\]Can you give the Gram matrix $\pmb K$ for a kernel $\kappa : \mathcal X \times \mathcal X \to \mathbb R$ and inputs $\langle \pmb x _ i \rangle^N _ {i=1}$?

What does it mean for a kernel to be a Mercer kernel?

The Gram matrix is always positive semi-definite.

Why are Mercer kernels important in the context of the kernel trick for SVMs?

They keep the SVM dual optimisation problem convex.

Can you define the degree-$d$ polynomial kernel, and describe its computational advantages?

This implicitly represents the feature expansion into all terms in a degree $d$ polynomial in each entry of $\pmb x$, of which there are roughly $O(D^d)$. This allows computing the associated inner product in just $O(D\log d)$.

What is the spherical Gaussian kernel?

Radial basis function kernels are not just kernels of the form

\[\kappa(\pmb x, \pmb x') = \exp\left( -\frac{||\pmb x - \pmb x'||^2_2}{2\sigma^2} \right)\]
What are they in general?

Any kernel that depend only on $ \vert \vert \pmb x - \pmb x’ \vert \vert $.

Suppose we have two strings $\mathbf x$ and $\mathbf x’$ over an alphabet $\Sigma$. Can you describe a Mercer kernel for these inputs?

where $w _ s$ are weights and $\phi _ s(\mathbf x)$ is the number of occurences of the substring $s$ in $\mathbf x$.