Notes - Machine Learning MT23, Logistic regression


Flashcards

What type of machine learning problem does logistic regression solve?


Binary classification.

Is logistic regression a generative or discriminative method?


Discriminative.

Can you define $\sigma(x)$, the sigmoid function?


\[\sigma(x) = \frac{1}{1 + \exp(-x)}\]
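As an illustration (not part of the original notes), a minimal NumPy sketch of a numerically stable sigmoid:

```python
import numpy as np

def sigmoid(x):
    """Numerically stable sigmoid: 1 / (1 + exp(-x))."""
    # For x >= 0 use 1 / (1 + exp(-x)); for x < 0 use the equivalent
    # exp(x) / (1 + exp(x)), so exp never sees a large positive argument.
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
```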

How does logistic regression model $p(y = 1 \mid \pmb x, \pmb w)$?


\[p(y = 1 \mid \pmb x, \pmb w) = \sigma(\pmb w^T \pmb x) = \frac{1}{1 + \exp(-\pmb w^T \pmb x)}\]

Logistic regression models $p(y = 1 \mid \pmb x, \pmb w)$ as $\sigma(\pmb w^T \pmb x) = \frac{1}{1 + \exp(-\pmb w^T \pmb x)}$. How do we then use this to make predictions, i.e. decide the category of $\pmb x _ \text{new}$?


Check whether $\sigma(\pmb w^T \pmb x_\text{new}) > 1/2$ (or some other threshold value), predicting $y = 1$ if so and $y = 0$ otherwise.
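A minimal prediction sketch, assuming any bias term is already folded into the feature vector:

```python
import numpy as np

def predict(w, X_new, threshold=0.5):
    """Predict labels in {0, 1} for the rows of X_new.

    w: weights of shape (D,); X_new: features of shape (N, D).
    """
    probs = 1.0 / (1.0 + np.exp(-X_new @ w))  # p(y = 1 | x, w) per row
    return (probs > threshold).astype(int)
```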

Give the negative log-likelihood $\text{NLL}(\pmb y \mid \pmb X, \pmb w)$ for a logistic regression model.


First,

\[p(\pmb y \mid \pmb X, \pmb w) = \prod^N_{i=1} \sigma(\pmb w^T \pmb x_i)^{y_i} (1-\sigma(\pmb w^T \pmb x_i))^{1-y_i}\]

Then,

\[\text{NLL}(\pmb y \mid \pmb X, \pmb w) = -\sum^N_{i=1}(y_i \log(\sigma(\pmb w^T \pmb x_i)) + (1-y_i)\log(1-\sigma(\pmb w^T \pmb x_i)))\]
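A sketch of the NLL in NumPy; the `eps` clipping is an added implementation detail to keep the logs finite:

```python
import numpy as np

def nll(w, X, y, eps=1e-12):
    """Negative log-likelihood for logistic regression.

    X: (N, D) design matrix; y: (N,) labels in {0, 1}.
    """
    mu = 1.0 / (1.0 + np.exp(-X @ w))  # mu_i = sigma(w^T x_i)
    mu = np.clip(mu, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
```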

What is the “iteratively reweighted least squares” method?


A technique for finding the maximum likelihood parameters of logistic regression by iteratively solving a weighted least squares problem; it is equivalent to applying Newton’s method to the NLL (derived below).

Define the softmax function on a vector $\pmb a \in \mathbb R^C$, and describe why it is useful.


\[\text{softmax}([a_1, \ldots, a_C]^T) = \left[\frac{e^{a_1}}{Z}, \ldots, \frac{e^{a_C}}{Z} \right]^T\]

where

\[Z = \sum^C_{i=1} e^{a_i}\]

Useful because it converts an unbounded vector of $C$ real numbers into a vector of nonnegative numbers summing to one, which can be interpreted as probabilities of belonging to $C$ different categories.
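A standard numerically stable sketch (subtracting the max rescales numerator and denominator by the same factor, so the result is unchanged):

```python
import numpy as np

def softmax(a):
    """Softmax of a vector a in R^C."""
    e = np.exp(a - np.max(a))  # shift for numerical stability
    return e / e.sum()
```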

In multiclass logistic regression, how is $p(y \mid \pmb x, \pmb W)$ defined, where $\pmb x \in \mathbb R^D$ and $\pmb W \in \mathbb R^{D \times C}$ (let $\pmb w _ c$ denote the $c$-th column of $\pmb W$)?


\[p(y \mid \pmb x, \pmb W) = \text{softmax}([\pmb w_1^T \pmb x, \ldots, \pmb w_C^T \pmb x]^T)\]
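A minimal sketch, assuming $\pmb W$ is stored as a `(D, C)` NumPy array:

```python
import numpy as np

def multiclass_probs(W, x):
    """p(y | x, W): length-C vector of class probabilities.

    W: (D, C) weight matrix with columns w_c; x: (D,) feature vector.
    """
    logits = W.T @ x                     # [w_1^T x, ..., w_C^T x]
    e = np.exp(logits - np.max(logits))  # stable softmax
    return e / e.sum()
```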

The NLL of logistic regression is given by

\[-\sum^N_{i=1} (y_i \log \mu_i + (1 - y_i)\log(1 - \mu_i))\]

where

\[\mu_i = \sigma(\pmb w^\top \pmb x_i)\]

Quickly derive $\partial _ {\pmb w} \text{NLL}$ and the Hessian, and use Newton’s method to define an update rule that can be used to calculate the weights.


Using $\sigma'(a) = \sigma(a)(1 - \sigma(a))$, we have $\partial_{\pmb w} \mu_i = \mu_i (1 - \mu_i) \pmb x_i$, so

\[\begin{aligned} \partial_{\pmb w} \text{NLL}(\pmb y \mid X, \pmb w) &= -\sum^N_{i=1} \left( \frac{y_i}{\mu_i} - \frac{1 - y_i}{1 - \mu_i} \right) \mu_i (1 - \mu_i) \pmb x_i \\\\ &= \sum^N_{i = 1} \pmb x_i (\mu_i - y_i) \end{aligned}\]

For the Hessian,

\[\begin{aligned} H_{\pmb w} (\text{NLL}) &= J\left(\sum^N_{i = 1} \pmb x_i (\mu_i - y_i)\right)^\top \\\\ &= \sum^N_{i = 1} \pmb x_i (\partial_{\pmb w} (\mu_i - y_i))^\top \\\\ &= \sum^N_{i = 1} \pmb x_i (\pmb x_i \mu_i (1 - \mu_i))^\top &&\text{since } y_i \text{ does not depend on } \pmb w \\\\ &= \sum^N_{i = 1} \pmb x_i (\mu_i (1 - \mu_i)) \pmb x_i^\top \\\\ &= X^\top S X \end{aligned}\]

where

\[S := \text{diag}(\mu_i (1 - \mu_i))\]

Since $\mu_i \in (0, 1)$, every diagonal entry $\mu_i (1 - \mu_i)$ is positive, so $S$ is positive definite and hence $X^\top S X$ is positive semidefinite; the NLL is therefore convex.

Then

\[\begin{aligned} \pmb g_t &= X^\top (\pmb \mu_t - \pmb y) = -X^\top (\pmb y - \pmb \mu_t) \\\\ \pmb H_t &= X^\top S_t X \end{aligned}\]

So Newton’s update rule gives

\[\begin{aligned} \pmb w_{t + 1} &= \pmb w_t - \pmb H_t^{-1} \pmb g_t \\\\ &= \pmb w_t + (X^\top S_t X)^{-1} X^\top (\pmb y - \pmb \mu_t) \\\\ &= (X^\top S_t X)^{-1} X^\top S_t (X \pmb w_t + S_t^{-1} (\pmb y - \pmb \mu_t)) \\\\ &= (X^\top S_t X)^{-1}(X^\top S_t \pmb z_t) \end{aligned}\]

where

\[\pmb z_t = X\pmb w_t + S^{-1}_t (\pmb y - \pmb \mu_t)\]
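A minimal NumPy sketch of the resulting IRLS / Newton iteration; the fixed iteration count, the clipping of $S$, and the small ridge term (to keep $X^\top S_t X$ invertible) are added safeguards, not part of the derivation:

```python
import numpy as np

def irls(X, y, n_iters=20, ridge=1e-8):
    """Fit logistic regression weights by IRLS (Newton's method).

    X: (N, D) design matrix; y: (N,) labels in {0, 1}.
    """
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        mu = 1.0 / (1.0 + np.exp(-X @ w))          # mu_t
        s = np.clip(mu * (1.0 - mu), 1e-10, None)  # diagonal of S_t
        z = X @ w + (y - mu) / s                   # working response z_t
        # Solve (X^T S_t X) w = X^T S_t z_t
        A = X.T @ (s[:, None] * X) + ridge * np.eye(D)
        w = np.linalg.solve(A, X.T @ (s * z))
    return w
```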

Deriving the update used in Newton’s method for the MLE of logistic regression gives

\[\pmb w_{t + 1} = (X^\top S_t X)^{-1}(X^\top S_t \pmb z_t)\]

where

\[\pmb z_t = X\pmb w_t + S^{-1}_t (\pmb y - \pmb \mu_t)\]

and

\[S_t := \text{diag}(\mu_{t,i} (1 - \mu_{t,i}))\]

How can you recognise this is a solution to a least squares problem?


It’s equivalent to the solution of the weighted least squares problem

\[\min_{\pmb w} \quad \sum^N_{i = 1}S_{t,ii} (z_{t,i} - \pmb w^\top \pmb x_i)^2\]

whose normal equations are exactly $(X^\top S_t X) \pmb w = X^\top S_t \pmb z_t$.
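A quick numerical check on synthetic data (an added illustration, not from the notes) that the Newton update coincides with the weighted least squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_t = rng.normal(size=3)
mu = 1.0 / (1.0 + np.exp(-X @ w_t))
y = (rng.random(100) < mu).astype(float)

s = mu * (1.0 - mu)            # diagonal of S_t
z = X @ w_t + (y - mu) / s     # working response z_t

# Newton / IRLS update in closed form
w_newton = np.linalg.solve(X.T @ (s[:, None] * X), X.T @ (s * z))
# Same update as ordinary least squares on sqrt(S)-rescaled data
w_wls, *_ = np.linalg.lstsq(np.sqrt(s)[:, None] * X, np.sqrt(s) * z, rcond=None)

assert np.allclose(w_newton, w_wls)
```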


