Notes - Machine Learning MT23, Basis expansion


Flashcards

Suppose we are training a model where the input is two-dimensional, i.e. $(x _ 1, x _ 2)$. We want to model quadratic data but we can only use linear regression. How can we achieve this?


Use a feature expansion $\phi(\pmb x) = [1, x _ 1, x _ 2, x _ 1x _ 2, x _ 1^2, x _ 2^2]^T$, and use $\phi(\pmb x)$ as the input.
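As a minimal sketch of this idea (assuming NumPy; the helper name `quadratic_features` and the synthetic coefficients are made up for illustration), ordinary least squares on the expanded inputs can recover a quadratic relationship even though the model stays linear in its weights:

```python
import numpy as np

def quadratic_features(X):
    """Map rows (x1, x2) to [1, x1, x2, x1*x2, x1^2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])

# Synthetic, noise-free quadratic data (coefficients chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.7, -1.2])
y = quadratic_features(X) @ true_w

# Plain linear regression, but on phi(x) instead of x.
Phi = quadratic_features(X)
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

Because the data here is noise-free and exactly quadratic, `w_hat` should agree with `true_w` up to floating-point error.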

Suppose we want to use feature expansion to fit a degree $d$ polynomial to $D$-dimensional data. Roughly how many features do we then have to give to the model?


\[D^d\]
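For reference, the exact number of monomials of total degree at most $d$ in $D$ variables is $\binom{D+d}{d}$, which behaves like $D^d / d!$ for fixed $d$ and large $D$. A small sketch (the function name `num_poly_features` is hypothetical) makes the growth concrete:

```python
from math import comb

def num_poly_features(D, d):
    """Number of monomials of total degree at most d in D variables."""
    return comb(D + d, d)

print(num_poly_features(2, 2))    # 6, matching the quadratic expansion of (x1, x2)
print(num_poly_features(10, 3))   # 286
print(num_poly_features(100, 3))  # 176851
```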

Can you define the radial basis function kernel $\kappa(\pmb x', \pmb x)$ with width parameter $\gamma$?


\[\kappa(\pmb x', \pmb x) = \exp\left(-\gamma ||\pmb x - \pmb x'||^2\right)\]
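A direct transcription of this formula, as a sketch assuming NumPy (the function name `rbf_kernel` and the test points are illustrative). Since $\gamma$ multiplies the squared distance, a larger $\gamma$ makes the kernel decay faster as the two points move apart:

```python
import numpy as np

def rbf_kernel(x_prime, x, gamma):
    """kappa(x', x) = exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([0.0, 0.0])
x_prime = np.array([1.0, 0.0])  # at unit distance from x

# At unit distance the kernel value is exp(-gamma).
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x_prime, x, gamma))
```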

Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x)$ to do basis expansion. If our input is originally $\pmb x$, what is our feature expansion $\phi$?


For some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$:

\[\phi(\pmb x) = [1, \kappa(\pmb \mu_1, \pmb x), \ldots, \kappa(\pmb \mu_M, \pmb x)]\]
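The expansion can be computed for a whole dataset at once. The following is a sketch assuming NumPy and the kernel $\exp(-\gamma \|\pmb x - \pmb \mu\|^2)$; the name `rbf_features`, the example inputs and the two centres are hypothetical:

```python
import numpy as np

def rbf_features(X, centres, gamma):
    """Return the N x (M + 1) matrix whose n-th row is phi(x_n)."""
    # Squared distances between every input and every centre, shape (N, M).
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq_dists)
    # Prepend the constant feature 1 for the bias term.
    return np.hstack([np.ones((X.shape[0], 1)), K])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
centres = np.array([[0.0, 0.0], [1.0, 1.0]])  # M = 2 hypothetical centres
Phi = rbf_features(X, centres, gamma=0.5)
print(Phi.shape)  # (3, 3)
```

Each row of `Phi` can then be passed to a linear model in place of the original input.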

Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x)$ to do basis expansion, where $\phi(\pmb x) = [1, \kappa(\pmb \mu _ 1, \pmb x), \ldots, \kappa(\pmb \mu _ M, \pmb x)]$ for some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$. What is the most common approach to picking such centres?


Take the centres as the data points themselves.
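With this choice $M = N$, so the number of features grows with the size of the training set. A short sketch (assuming NumPy; the data and the value of `gamma` are arbitrary) shows the resulting design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 2))    # N = 8 training inputs
gamma = 1.0                                 # arbitrary value for illustration

# Centres are the training inputs themselves, so M = N.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)               # N x N, symmetric, ones on the diagonal
Phi = np.hstack([np.ones((len(X), 1)), K])  # N x (N + 1) design matrix
print(Phi.shape)  # (8, 9)
```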

Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x)$ to do basis expansion, where $\phi(\pmb x) = [1, \kappa(\pmb \mu _ 1, \pmb x), \ldots, \kappa(\pmb \mu _ M, \pmb x)]$ for some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$. What happens if the width parameter $\gamma$ in the RBF kernel is too small?


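With $\kappa(\pmb x', \pmb x) = \exp(-\gamma \|\pmb x - \pmb x'\|^2)$, $\gamma$ multiplies the squared distance, so a small $\gamma$ makes each basis function decay slowly. The kernel will be too wide, so the model will likely underfit.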

Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x)$ to do basis expansion, where $\phi(\pmb x) = [1, \kappa(\pmb \mu _ 1, \pmb x), \ldots, \kappa(\pmb \mu _ M, \pmb x)]$ for some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$. What happens if the width parameter $\gamma$ in the RBF kernel is too large?


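With $\kappa(\pmb x', \pmb x) = \exp(-\gamma \|\pmb x - \pmb x'\|^2)$, $\gamma$ multiplies the squared distance, so a large $\gamma$ makes each basis function decay quickly. The kernel will be too narrow, so the model will likely overfit.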
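The effect of kernel width can be checked numerically. The sketch below (assuming NumPy, one-dimensional synthetic data, centres at the training points, and $\kappa = \exp(-\gamma \|\pmb x - \pmb x'\|^2)$, so that a larger $\gamma$ means a narrower kernel) compares training and held-out error for a very narrow and a very wide kernel. The helper names, the two $\gamma$ values and the `rcond` cutoff are illustrative choices, not tuned settings.

```python
import numpy as np

def rbf_design(X, centres, gamma):
    """Design matrix [1, kappa(mu_1, x), ..., kappa(mu_M, x)] for 1-D inputs."""
    sq = (X[:, None] - centres[None, :]) ** 2
    return np.hstack([np.ones((len(X), 1)), np.exp(-gamma * sq)])

def fit_and_score(gamma, X_train, y_train, X_test, y_test):
    Phi = rbf_design(X_train, X_train, gamma)
    # rcond truncates tiny singular values to keep the solve numerically stable.
    w, *_ = np.linalg.lstsq(Phi, y_train, rcond=1e-10)
    train_mse = np.mean((Phi @ w - y_train) ** 2)
    test_pred = rbf_design(X_test, X_train, gamma) @ w
    test_mse = np.mean((test_pred - y_test) ** 2)
    return train_mse, test_mse

rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(4 * np.pi * X_train) + 0.1 * rng.normal(size=30)
X_test = (X_train[:-1] + X_train[1:]) / 2   # midpoints between training inputs
y_test = np.sin(4 * np.pi * X_test)

narrow = fit_and_score(1e5, X_train, y_train, X_test, y_test)   # very narrow kernel
wide = fit_and_score(1e-3, X_train, y_train, X_test, y_test)    # very wide kernel
print("narrow (train, test):", narrow)
print("wide   (train, test):", wide)
```

In this setup the narrow kernel should reproduce the training targets almost exactly while doing poorly at the midpoints, and the wide kernel should leave a noticeably larger training error.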

Proofs



