Notes - Machine Learning MT23, Maximum likelihood principle



[[Course - Machine Learning MT23]] - [[Notes - Machine Learning MT23, Linear regression]]

Flashcards

What is the maximum likelihood principle?

Can you summarise the maximum likelihood principle?


The best-fit model for a given dataset is the one under which the observed data has the highest probability.
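
Stated a little more formally (the notation here, $\mathcal D$ for the dataset and $\pmb \theta$ for the parameters, is my own shorthand, not fixed by the card above):

\[\hat{\pmb \theta}_{\text{MLE}} = \arg\max_{\pmb \theta} \, p(\mathcal D \mid \pmb \theta) = \arg\min_{\pmb \theta} \, \big(-\log p(\mathcal D \mid \pmb \theta)\big)\]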

In the context of linear regression, when using the maximum likelihood principle we wish to learn a mapping $f _ {\pmb w} : \mathbb R^D \to \mathbb R$, which we assume is linear but with a normally distributed error term. How can we describe this mathematically? Assume the output is $y$.


\[\mathbb E[y \mid \pmb x, \pmb w] = f_{\pmb w}(\pmb x) = \pmb w^T \pmb x\]

then

\[y = \pmb w^T \pmb x + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2), \quad \text{i.e.} \quad y \mid \pmb x, \pmb w \sim \mathcal N(\pmb w^T \pmb x, \sigma^2)\]
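
A minimal sketch of this assumption (the weights, input and noise level below are arbitrary illustrative values, and evaluating the density with scipy.stats.norm is just one convenient choice):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Arbitrary illustrative values for the weights, an input and the noise level.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, 1.2])
sigma = 0.5

# Sampling y as y = w^T x + eps with eps ~ N(0, sigma^2) ...
y = w @ x + rng.normal(scale=sigma)

# ... is equivalent to saying y | x, w ~ N(w^T x, sigma^2),
# so the conditional density of the sampled y is:
print(norm.pdf(y, loc=w @ x, scale=sigma))
```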

Consider linear regression under the maximum likelihood framework. We assume that for each $y _ i$, $y _ i = \pmb w \cdot \pmb x _ i + \epsilon _ i$, where $\langle \pmb x _ i, y _ i\rangle^N _ {i = 1}$ is the data we observe and $\epsilon _ i \sim \mathcal N(0, \sigma^2)$. Find an expression for the negative log-likelihood, the function we wish to minimise.


\[\begin{aligned} p(y_1, \ldots, y_N \mid \pmb x_1, \ldots, \pmb x_N, \pmb w, \sigma) &= \prod^N_{i = 1} p(y_i \mid \pmb x_i, \pmb w, \sigma) \\\\ &= \prod^N_{i = 1} \frac{1}{\sqrt{2\pi\sigma^2}\,} \exp\left(-\frac{(y_i - \pmb w^T \pmb x_i)^2}{2\sigma^2}\right) \\\\ &= \left(\frac{1}{\sqrt{2\pi\sigma^2}\,}\right)^N \exp\left(-\frac{1}{2\sigma^2}\sum^N_{i=1}(y_i - \pmb w^T \pmb x_i)^2\right) \end{aligned}\]

then, taking the negative logarithm and using matrix notation:

\[\text{NLL}(\pmb y \mid \pmb X, \pmb w, \sigma) = \frac{1}{2\sigma^2}(\pmb X \pmb w - \pmb y)^T(\pmb X \pmb w - \pmb y) + \frac{N}{2}\log(2\pi \sigma^2)\]
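
A small numerical sanity check, as a sketch: the data below is synthetic and the sizes and noise level are arbitrary, but it illustrates that only the first term of the NLL depends on $\pmb w$, so minimising the NLL recovers the ordinary least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data drawn from the assumed model (sizes and values are arbitrary).
N, D, sigma = 100, 3, 0.5
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

def nll(w):
    """Negative log-likelihood, following the formula above."""
    r = X @ w - y
    return r @ r / (2 * sigma**2) + N / 2 * np.log(2 * np.pi * sigma**2)

# Only the first term depends on w, so the MLE coincides with the
# ordinary least-squares solution.
w_mle = np.linalg.lstsq(X, y, rcond=None)[0]

# Perturbing the least-squares solution can only increase the NLL.
print(nll(w_mle), nll(w_mle + 0.01))
```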

Under the MLE framework, what is the effect of assuming normally distributed errors versus Laplace distributed errors in a linear regression?


  • Normally distributed errors: equivalent to least-squares (i.e. $l _ 2$ regression)
  • Laplace distributed errors: equivalent to $l _ 1$ regression (a comparison sketch follows after this list)
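
A sketch contrasting the two cases on invented data with a single outlier, minimising the (rescaled) negative log-likelihoods with scipy.optimize.minimize; the specific data and optimiser choices here are mine, not from the course:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# 1-D regression data with one large outlier (all values are invented).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + rng.normal(scale=0.1, size=50)
y[-1] += 5.0  # outlier
X = x[:, None]

# Up to additive and multiplicative constants, the Gaussian NLL is the sum of
# squared residuals and the Laplace NLL is the sum of absolute residuals.
gaussian_nll = lambda w: np.sum((X @ w - y) ** 2)
laplace_nll = lambda w: np.sum(np.abs(X @ w - y))

w_l2 = minimize(gaussian_nll, x0=np.zeros(1)).x                       # least squares
w_l1 = minimize(laplace_nll, x0=np.zeros(1), method="Nelder-Mead").x  # l1 regression

print(w_l2, w_l1)  # the l1 (Laplace) fit is pulled far less by the outlier
```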

Proofs




Related posts