Notes - Machine Learning MT23, Basis expansion
Flashcards
Suppose we are training a model where the input is two-dimensional, i.e. $(x _ 1, x _ 2)$. We want to model quadratic data but we can only use linear regression. How can we achieve this?
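Use a feature expansion $\phi(\pmb x) = [1, x _ 1, x _ 2, x _ 1x _ 2, x _ 1^2, x _ 2^2]^T$ and fit linear regression on $\phi(\pmb x)$ instead of $\pmb x$. The model $\pmb w^T\phi(\pmb x)$ is quadratic in $\pmb x$ but still linear in the weights $\pmb w$, so ordinary least squares applies unchanged.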
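Here is a minimal sketch of this idea in plain Python. It avoids NumPy so it stays self-contained. Each 2-D input is expanded with $\phi$, and ordinary least squares is then solved on the expanded inputs through the normal equations. The helpers `phi`, `solve` and `least_squares` and the noise-free quadratic target are hypothetical and serve only as an illustration.

```python
def phi(x1, x2):
    """Quadratic feature expansion [1, x1, x2, x1*x2, x1^2, x2^2]."""
    return [1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical noise-free quadratic target on a 5x5 grid of 2-D inputs.
true_w = [1.0, 2.0, -1.0, 0.5, 3.0, -1.0]
grid = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
X = [phi(a, b) for a, b in grid]  # expanded inputs, one row per data point
y = [sum(wi * f for wi, f in zip(true_w, row)) for row in X]

w = least_squares(X, y)  # plain linear regression, but on phi(x)
print([round(v, 6) for v in w])
```

The target is an exact quadratic, and on the 25 grid points the six features are linearly independent. As a result, the fitted `w` agrees with `true_w` up to floating-point error.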
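Suppose we want to use feature expansion to fit a degree-$d$ polynomial to $D$-dimensional data. Roughly how many inputs do we then have to give to the model?
Roughly $D^d$. More precisely, there are $\binom{D + d}{d}$ monomials of degree at most $d$ in $D$ variables (including the constant), which is $O(D^d)$ for fixed $d$. For example, $D = 2$, $d = 2$ gives the 6 features above.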
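The count can be checked with a short sketch that enumerates every monomial of degree at most $d$ and compares the total with $\binom{D+d}{d}$ and with $D^d$. The function name `num_poly_features` and the chosen $(D, d)$ pairs are illustrative assumptions.

```python
from itertools import combinations_with_replacement
from math import comb


def num_poly_features(D, d):
    """Count monomials of total degree <= d in D variables, including the constant 1."""
    # A degree-k monomial is a multiset of k variable indices, e.g. (0, 0, 1) -> x1^2 x2.
    return sum(
        1
        for k in range(d + 1)
        for _ in combinations_with_replacement(range(D), k)
    )


for D, d in [(2, 2), (10, 3), (100, 3)]:
    n = num_poly_features(D, d)
    assert n == comb(D + d, d)  # closed form C(D + d, d)
    print(f"D={D}, d={d}: {n} features vs D**d = {D ** d}")
```

The three cases give 6, 286 and 176851 features. For fixed $d$, the count behaves like $D^d / d!$, so it grows polynomially in $D$ with degree $d$. That is the sense in which it is "roughly $D^d$".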
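Can you define the radial basis function kernel $\kappa(\pmb x', \pmb x)$ with width parameter $\gamma$?
$\kappa(\pmb x', \pmb x) = \exp(-\gamma \|\pmb x - \pmb x'\|^2)$. Note that $\gamma$ acts as an inverse width: the larger $\gamma$, the faster $\kappa$ decays with distance, i.e. the narrower the kernel.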
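Below is a minimal sketch of this definition in plain Python. It assumes the inputs are equal-length sequences of floats, and the function name `rbf_kernel` is illustrative.

```python
import math


def rbf_kernel(x_prime, x, gamma):
    """RBF kernel kappa(x', x) = exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


x, x_prime = [0.0, 0.0], [1.0, 1.0]  # squared distance 2
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x_prime, x, gamma))
```

At squared distance 2, the kernel value is about 0.82 for $\gamma = 0.1$, about 0.14 for $\gamma = 1$ and about $2 \times 10^{-9}$ for $\gamma = 10$. So under this parameterisation, a larger $\gamma$ means a narrower kernel.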
Suppose we are using a radial basis function (RBF) kernel $\kappa(\pmb x', \pmb x)$ to do basis expansion. If our input is originally $\pmb x$, what is our feature expansion $\phi$?
For some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$,
\[\phi(\pmb x) = [1, \kappa(\pmb \mu_1, \pmb x), \ldots, \kappa(\pmb \mu_M, \pmb x)]^T\]
Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x)$ to do basis expansion, with $\phi(\pmb x) = [1, \kappa(\pmb \mu _ 1, \pmb x), \ldots, \kappa(\pmb \mu _ M, \pmb x)]^T$ for some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$. What is the most common approach to picking such centres?
Take the centres as the data points themselves.
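The following small sketch builds this feature map with the centres taken to be the training inputs themselves. It assumes 1-D inputs and an arbitrary $\gamma = 2$. The names `rbf_kernel`, `rbf_features` and `train_x`, as well as the data, are hypothetical.

```python
import math


def rbf_kernel(x_prime, x, gamma):
    """RBF kernel exp(-gamma * ||x - x'||^2) on plain Python sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, x_prime)))


def rbf_features(x, centres, gamma):
    """phi(x) = [1, kappa(mu_1, x), ..., kappa(mu_M, x)]."""
    return [1.0] + [rbf_kernel(mu, x, gamma) for mu in centres]


# Hypothetical 1-D training inputs; the centres are simply the training points.
train_x = [[0.0], [0.5], [1.0], [1.5], [2.0]]
centres = train_x
Phi = [rbf_features(x, centres, gamma=2.0) for x in train_x]

# Row i is [1, K[i][0], ..., K[i][N-1]], where K is the kernel (Gram) matrix,
# so the diagonal entries K[i][i] = kappa(x_i, x_i) are exactly 1.
for row in Phi:
    print([round(v, 3) for v in row])
```

With $N$ training points, every input gets $N + 1$ features. The design matrix is then a column of ones next to the symmetric $N \times N$ kernel matrix, which is why the width $\gamma$ ends up controlling how flexible the fit is.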
Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x) = \exp(-\gamma \|\pmb x - \pmb x'\|^2)$ to do basis expansion, with $\phi(\pmb x) = [1, \kappa(\pmb \mu _ 1, \pmb x), \ldots, \kappa(\pmb \mu _ M, \pmb x)]^T$ for some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$. What happens if the width parameter $\gamma$ in the RBF kernel is too small?
A small $\gamma$ makes the kernel too wide: each basis function is almost constant over the data, so the model will likely underfit.
Suppose we are using a radial basis function kernel $\kappa(\pmb x', \pmb x) = \exp(-\gamma \|\pmb x - \pmb x'\|^2)$ to do basis expansion, with $\phi(\pmb x) = [1, \kappa(\pmb \mu _ 1, \pmb x), \ldots, \kappa(\pmb \mu _ M, \pmb x)]^T$ for some centres $\pmb \mu _ 1, \ldots, \pmb \mu _ M$. What happens if the width parameter $\gamma$ in the RBF kernel is too large?
A large $\gamma$ makes the kernel too narrow: each basis function only responds very close to its centre, so the model will likely overfit.
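To see how $\gamma$ changes the basis functions, this sketch evaluates them at a point halfway between two centres for a small, a moderate and a large $\gamma$. It assumes hypothetical centres at 0, 1, 2, 3, 4 on a 1-D input and the kernel $\exp(-\gamma (x - \mu)^2)$. The name `rbf_features` is illustrative.

```python
import math


def rbf_features(x, centres, gamma):
    """phi(x) = [1, exp(-gamma (x - mu_1)^2), ..., exp(-gamma (x - mu_M)^2)] for scalar x."""
    return [1.0] + [math.exp(-gamma * (x - mu) ** 2) for mu in centres]


centres = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical training inputs used as centres
x_new = 2.5                          # a point halfway between two centres

for gamma in (0.01, 1.0, 100.0):
    feats = rbf_features(x_new, centres, gamma)[1:]  # drop the constant feature
    print(f"gamma={gamma:>6}: " + ", ".join(f"{v:.3g}" for v in feats))
```

With $\gamma = 0.01$ every kernel feature lies between about 0.94 and 1. The basis functions are therefore nearly constant, their columns in the design matrix are almost identical, and the model can only express slowly varying functions (underfitting). With $\gamma = 100$ every kernel feature at $x = 2.5$ falls below $10^{-10}$. Between training points the prediction then collapses to the bias $w _ 0$, while each training point can still be fitted by its own narrow spike (overfitting).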