Machine Learning MT23, Logistic regression
Flashcards
What type of machine learning problem does logistic regression solve?
Binary classification.
Is logistic regression a generative or discriminative method?
Discriminative.
Can you define $\sigma(x)$, the sigmoid function?
\[\sigma(x) = \frac{1}{1 + e^{-x}}\]
How does logistic regression model $p(y = 1 \mid \pmb x, \pmb w)$?
Logistic regression models $p(y = 1 \mid \pmb x, \pmb w)$ as $\sigma(\pmb w^T \pmb x) = \frac{1}{1 + \exp(-\pmb w^T \pmb x)}$. How do we then use this to make predictions, i.e. decide the category of $\pmb x _ \text{new}$?
Predict class $1$ if $\sigma(\pmb w^T \pmb x _ \text{new}) > 1/2$ (or some other threshold value), and class $0$ otherwise.
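A minimal numpy sketch of this prediction rule (the helper names `sigmoid` and `predict` and the default threshold are illustrative, not from the course):

```python
import numpy as np

def sigmoid(x):
    """The sigmoid (logistic) function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def predict(w, x_new, threshold=0.5):
    """Predict class 1 when p(y = 1 | x_new, w) = sigma(w^T x_new)
    exceeds the threshold, and class 0 otherwise."""
    return int(sigmoid(w @ x_new) > threshold)
```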
Give the negative log-likelihood $\text{NLL}(\pmb y \mid \pmb X, \pmb w)$ for a logistic regression model.
First,
\[p(\pmb y \mid \pmb X, \pmb w) = \prod^N _ {i=1} \sigma(\pmb w^T \pmb x _ i)^{y _ i} (1-\sigma(\pmb w^T \pmb x _ i))^{1-y _ i}\]
Then,
\[\text{NLL}(\pmb y \mid \pmb X, \pmb w) = -\sum^N _ {i=1}(y _ i \log(\sigma(\pmb w^T \pmb x _ i)) + (1-y _ i)\log(1-\sigma(\pmb w^T \pmb x _ i)))\]
What is the “iteratively reweighted least squares” method?
A technique for finding parameters for logistic regression, based on iteratively solving a weighted least squares problem.
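A minimal numpy sketch of IRLS, assuming the update derived later in these cards, $\pmb w _ {t+1} = (X^\top S _ t X)^{-1} X^\top S _ t \pmb z _ t$; the function name, zero initialisation, and fixed iteration count are illustrative choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def irls(X, y, n_iters=15):
    """Fit logistic regression weights by iteratively reweighted least
    squares: each iteration is one Newton step, written as a weighted
    least squares solve."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        mu = sigmoid(X @ w)               # predicted probabilities mu_t
        s = mu * (1 - mu)                 # diagonal of S_t
        z = X @ w + (y - mu) / s          # working response z_t
        # solve (X^T S_t X) w = X^T S_t z_t
        w = np.linalg.solve(X.T @ (s[:, None] * X), X.T @ (s * z))
    return w
```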
Define the softmax function on a vector $\pmb a \in \mathbb R^C$, and describe why it is useful.
\[\text{softmax}(\pmb a) _ i = \frac{e^{a _ i}}{Z}\]
where
\[Z = \sum^C _ {i=1} e^{a _ i}\]
Useful because it converts an unbounded vector of $C$ real numbers into a probability distribution over $C$ different categories.
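A numpy sketch of the softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick (the result is unchanged because softmax is invariant to adding a constant to every $a_i$):

```python
import numpy as np

def softmax(a):
    """softmax(a)_i = exp(a_i) / Z with Z = sum_j exp(a_j).
    The max is subtracted first so exp never overflows."""
    e = np.exp(a - np.max(a))
    return e / e.sum()
```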
In multiclass logistic regression, how is $p(y \mid \pmb x, \pmb W)$ defined, where $\pmb x \in \mathbb R^D$ and $\pmb W \in \mathbb R^{D \times C}$ (let $\pmb w _ c$ denote the $c$-th column of $\pmb W$)?
\[p(y = c \mid \pmb x, \pmb W) = \text{softmax}(\pmb W^\top \pmb x) _ c = \frac{\exp(\pmb w _ c^\top \pmb x)}{\sum^C _ {c' = 1} \exp(\pmb w _ {c'}^\top \pmb x)}\]
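Assuming the standard softmax parameterisation $p(y = c \mid \pmb x, \pmb W) = \text{softmax}(\pmb W^\top \pmb x) _ c$, a minimal numpy sketch (the helper name `multiclass_probs` is illustrative):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def multiclass_probs(W, x):
    """Class probabilities p(y = c | x, W) = softmax(W^T x)_c,
    where column c of W holds the weight vector for class c."""
    return softmax(W.T @ x)
```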
The NLL of logistic regression is given by
\[-\sum^N _ {i=1} (y _ i \log \mu _ i + (1 - y _ i)\log(1 - \mu _ i))\]
where
\[\mu _ i = \sigma(\pmb w^\top \pmb x _ i)\]
Quickly derive $\partial _ {\pmb w} \text{NLL}$ and the Hessian, and use Newton’s method to define an update rule that can be used to calculate the weights.
Write $\mu _ i = \sigma(\pmb w^\top \pmb x _ i)$. Using $\sigma'(a) = \sigma(a)(1 - \sigma(a))$, differentiating the NLL gives
\[\partial _ {\pmb w} \text{NLL} = \sum^N _ {i=1} (\mu _ i - y _ i) \pmb x _ i = X^\top (\pmb \mu - \pmb y)\]
and differentiating again gives the Hessian
\[\partial^2 _ {\pmb w} \text{NLL} = \sum^N _ {i=1} \mu _ i (1 - \mu _ i) \pmb x _ i \pmb x _ i^\top = X^\top S X\]
where
\[S := \text{diag}(\mu _ i (1 - \mu _ i))\]
It can be shown $S$ is positive definite, so $X^\top S X$ is positive semidefinite.
Then
\[\begin{aligned} \pmb g _ t &= X^\top (\pmb \mu _ t - \pmb y) = -X^\top (\pmb y - \pmb \mu _ t) \\\\ \pmb H _ t &= X^\top S _ t X \end{aligned}\]
So Newton’s update rule gives
\[\begin{aligned} \pmb w _ {t + 1} &= \pmb w _ t - \pmb H _ t^{-1} \pmb g _ t \\\\ &= \pmb w _ t + (X^\top S _ t X)^{-1} X^\top (\pmb y - \pmb \mu _ t) \\\\ &= (X^\top S _ t X)^{-1} X^\top S _ t (X \pmb w _ t + S _ t^{-1} (\pmb y - \pmb \mu _ t)) \\\\ &= (X^\top S _ t X)^{-1}(X^\top S _ t \pmb z _ t) \end{aligned}\]
where
\[\pmb z _ t = X\pmb w _ t + S^{-1} _ t (\pmb y - \pmb \mu _ t)\]
Deriving the update used in Newton’s method for the MLE of logistic regression gives
\[\pmb w _ {t + 1} = (X^\top S _ t X)^{-1}(X^\top S _ t \pmb z _ t)\]
where
\[\pmb z _ t = X\pmb w _ t + S^{-1} _ t (\pmb y - \pmb \mu _ t)\]
and
\[S := \text{diag}(\mu _ i (1 - \mu _ i))\]
How can you recognise this is a solution to a least squares problem?
It’s equivalent to the solution of
\[\min _ {\pmb w} \quad \sum^N _ {i = 1}S _ {t,ii} (z _ {t,i} - \pmb w^\top \pmb x _ i)^2\]
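One way to see the equivalence numerically: on some illustrative data, a single Newton step computed in closed form matches the minimiser of the weighted least squares problem, obtained here by rescaling each row of $X$ and entry of $\pmb z _ t$ by $\sqrt{S _ {t,ii}}$ (the data and variable names below are made up for the demonstration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative data: design matrix with an intercept column, binary labels.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
w = np.zeros(2)                        # current iterate w_t

mu = sigmoid(X @ w)
s = mu * (1 - mu)                      # diagonal of S_t
z = X @ w + (y - mu) / s               # z_t = X w_t + S_t^{-1} (y - mu_t)

# Newton step in closed form: (X^T S_t X)^{-1} (X^T S_t z_t)
w_newton = np.linalg.solve(X.T @ (s[:, None] * X), X.T @ (s * z))

# The same step as a weighted least squares problem:
# min_w sum_i S_{t,ii} (z_{t,i} - w^T x_i)^2, via sqrt-weighted rows.
r = np.sqrt(s)
w_wls, *_ = np.linalg.lstsq(r[:, None] * X, r * z, rcond=None)
```

The two solutions agree, which is exactly the sense in which each Newton iteration is a (re)weighted least squares solve.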