Machine Learning MT23, Logistic regression


Flashcards

What type of machine learning problem does logistic regression solve?


Binary classification.

Is logistic regression a generative or discriminative method?


Discriminative.

Can you define $\sigma(x)$, the sigmoid function?


\[\sigma(x) = \frac{1}{1 + \exp(-x)}\]

How does logistic regression model $p(y = 1 \mid \pmb x, \pmb w)$?


\[p(y = 1 \mid \pmb x, \pmb w) = \sigma(\pmb w^T \pmb x) = \frac{1}{1 + \exp(-\pmb w^T \pmb x)}\]

Logistic regression models $p(y = 1 \mid \pmb x, \pmb w)$ as $\sigma(\pmb w^T \pmb x) = \frac{1}{1 + \exp(-\pmb w^T \pmb x)}$. How do we then use this to make predictions, i.e. decide the category of $\pmb x _ \text{new}$?


Check whether $\sigma(\pmb w^T \pmb x _ \text{new}) > 1/2$ (or some other threshold value); if so, classify $\pmb x _ \text{new}$ as class $1$.
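A minimal sketch of this prediction rule in NumPy (the weight and input values below are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def predict(w, x_new, threshold=0.5):
    # Classify x_new as class 1 if the modelled probability
    # p(y = 1 | x, w) = sigma(w^T x) exceeds the threshold.
    return int(sigmoid(w @ x_new) > threshold)

w = np.array([2.0, -1.0])
x_new = np.array([1.0, 0.5])
predict(w, x_new)  # sigmoid(1.5) ≈ 0.82 > 0.5, so class 1
```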

Give the negative log-likelihood $\text{NLL}(\pmb y \mid \pmb X, \pmb w)$ for a logistic regression model.


First,

\[p(\pmb y \mid \pmb X, \pmb w) = \prod^N _ {i=1} \sigma(\pmb w^T \pmb x _ i)^{y _ i} (1-\sigma(\pmb w^T \pmb x _ i))^{1-y _ i}\]

Then,

\[\text{NLL}(\pmb y \mid \pmb X, \pmb w) = -\sum^N _ {i=1}(y _ i \log(\sigma(\pmb w^T \pmb x _ i)) + (1-y _ i)\log(1-\sigma(\pmb w^T \pmb x _ i)))\]

What is the “iteratively reweighted least squares” method?


A technique for finding the maximum likelihood parameters of logistic regression via Newton's method, in which each Newton step amounts to solving a weighted least squares problem.

Define the softmax function on a vector $a \in \mathbb R^C$ and describe why it is useful.


\[\text{softmax}([a _ 1, \ldots, a _ C]^T) = \left[\frac{e^{a _ 1}}{Z}, \ldots, \frac{e^{a _ C}}{Z}\right]^T\]

where

\[Z = \sum^C _ {i=1} e^{a _ i}\]

Useful because it converts an unbounded vector of $C$ real numbers into a probability distribution over $C$ categories.
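A sketch of softmax in NumPy. Subtracting the maximum before exponentiating is the standard numerical-stability trick; it leaves the result unchanged because it rescales numerator and denominator by the same factor:

```python
import numpy as np

def softmax(a):
    # Shift by max(a) for stability: exp(a_i - m) / sum_j exp(a_j - m)
    # equals exp(a_i) / Z for any constant m.
    e = np.exp(a - np.max(a))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
# probs is non-negative, sums to 1, and preserves the ordering of the inputs
```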

In multiclass logistic regression, how is $p(y \mid \pmb x, \pmb W)$ where $\pmb x \in \mathbb R^D$ and $\pmb W \in \mathbb R^{D \times C}$ defined (let $\pmb w _ c$ denote the $c$-th column of $\pmb W$)?


\[p(y \mid \pmb x, \pmb W) = \text{softmax}([\pmb w _ 1^T \pmb x, \ldots, \pmb w _ C^T \pmb x])\]
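Since the columns of $\pmb W$ are the per-class weight vectors, the vector of logits $[\pmb w _ 1^T \pmb x, \ldots, \pmb w _ C^T \pmb x]$ is just $\pmb W^T \pmb x$. A sketch (the example $\pmb W$ and $\pmb x$ are made up):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def class_probs(W, x):
    # Columns of W are the per-class weight vectors w_c, so the logits
    # [w_1^T x, ..., w_C^T x] are given by W^T x.
    return softmax(W.T @ x)

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # D = 2 features, C = 2 classes
x = np.array([1.0, 0.0])
class_probs(W, x)  # logits [1, 0] -> probabilities [e/(1+e), 1/(1+e)]
```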

The NLL of logistic regression is given by

\[-\sum^N _ {i=1} (y _ i \log \mu _ i + (1 - y _ i)\log(1 - \mu _ i))\]

where

\[\mu _ i = \sigma(\pmb w^\top \pmb x _ i)\]

Quickly derive $\partial _ {\pmb w} \text{NLL}$ and the Hessian, and use Newton’s method to define an update rule that can be used to calculate the weights.


\[\begin{aligned} \partial _ {\pmb w} \text{NLL}(\pmb y \mid X, \pmb w) &= \sum^N _ {i = 1} \pmb x _ i (\mu _ i - y _ i) \end{aligned}\] \[\begin{aligned} H _ {\pmb w} (\text{NLL}) &= J(\sum^N _ {i = 1} \pmb x _ i (\mu _ i - y _ i))^\top \\\\ &= \sum^N _ {i = 1} \pmb x _ i (\partial _ {\pmb w} (\mu _ i - y _ i))^\top &&\text{since we are taking deriv wrt. } \pmb w\\\\ &= \sum^N _ {i = 1} \pmb x _ i (\pmb x _ i \mu _ i (1 - \mu _ i))^\top \\\\ &= \sum^N _ {i = 1} \pmb x _ i (\mu _ i (1 - \mu _ i)) \pmb x _ i^\top \\\\ &= X^\top S X \end{aligned}\]

where

\[S := \text{diag}(\mu _ i (1 - \mu _ i))\]

Each diagonal entry $\mu _ i (1 - \mu _ i)$ is strictly positive, so $S$ is positive definite, and hence $X^\top S X$ is positive semidefinite.

Then

\[\begin{aligned} \pmb g _ t &= X^\top (\pmb \mu _ t - \pmb y) = -X^\top (\pmb y - \pmb \mu _ t) \\\\ \pmb H _ t &= X^\top S _ t X \end{aligned}\]

So Newton’s update rule gives

\[\begin{aligned} \pmb w _ {t + 1} &= \pmb w _ t - \pmb H _ t^{-1} \pmb g _ t \\\\ &= \pmb w _ t + (X^\top S _ t X)^{-1} X^\top (\pmb y - \pmb \mu _ t) \\\\ &= (X^\top S _ t X)^{-1} X^\top S _ t (X \pmb w _ t + S _ t^{-1} (\pmb y - \pmb \mu _ t)) \\\\ &= (X^\top S _ t X)^{-1}(X^\top S _ t \pmb z _ t) \end{aligned}\]

where

\[\pmb z _ t = X\pmb w _ t + S^{-1} _ t (\pmb y - \pmb \mu _ t)\]
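The full IRLS loop is short. A sketch, assuming well-behaved, non-separable data so that $S _ t$ stays invertible:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def irls(X, y, n_iters=10):
    # Newton's method on the logistic regression NLL: each step solves
    # the weighted least-squares system (X^T S X) w = X^T S z.
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        mu = sigmoid(X @ w)
        s = mu * (1 - mu)            # diagonal of S_t
        z = X @ w + (y - mu) / s     # working response z_t
        S = np.diag(s)
        w = np.linalg.solve(X.T @ S @ X, X.T @ S @ z)
    return w
```

On linearly separable data the MLE does not exist and the weights diverge, so in practice a regulariser or an iteration cap is used.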

Deriving the update used in Newton’s method for the MLE of logistic regression gives

\[\pmb w _ {t + 1} =(X^\top S _ t X)^{-1}(X^\top S _ t \pmb z _ t)\]

where

\[\pmb z _ t = X\pmb w _ t + S^{-1} _ t (\pmb y - \pmb \mu _ t)\]

and

\[S := \text{diag}(\mu _ i (1 - \mu _ i))\]

How can you recognise that this is the solution to a weighted least squares problem?


It’s equivalent to the solution of

\[\min _ {\pmb w} \quad \sum^N _ {i = 1}S _ {t,ii} (z _ {t,i} - \pmb w^\top \pmb x _ i)^2\]
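The equivalence can be checked numerically: rescaling each row of $X$ and each entry of $\pmb z _ t$ by $\sqrt{S _ {t,ii}}$ turns the weighted problem into an ordinary least squares problem with the same minimiser. A sketch with random stand-in values for $X$, $\pmb z _ t$, and the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
z = rng.normal(size=10)           # stand-in for z_t
s = rng.uniform(0.1, 0.9, 10)     # stand-in for the diagonal of S_t
S = np.diag(s)

# Newton/IRLS update: w = (X^T S X)^{-1} X^T S z
w_newton = np.linalg.solve(X.T @ S @ X, X.T @ S @ z)

# Weighted least squares: multiply rows by sqrt(S_ii), solve unweighted.
w_wls, *_ = np.linalg.lstsq(X * np.sqrt(s)[:, None], z * np.sqrt(s),
                            rcond=None)

np.allclose(w_newton, w_wls)  # both minimise sum_i S_ii (z_i - w^T x_i)^2
```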


