Notes - Machine Learning MT23, Cross-entropy loss
Flashcards
Suppose:
- We are training a classifier with cross-entropy loss
- There are $M$ classes
- Given a datapoint $\pmb x$, our predicted probability that it belongs to class $c$ is denoted $\hat p(\pmb x = c)$.
- $\hat p$ is assumed to depend on model parameters $\theta$
In this context,
- What is the expression for the cross-entropy loss given a single datapoint $\pmb x$ and its true class $y$, written $\ell(\theta \mid \pmb x, y)$?
- How does this simplify if we are performing binary classification (i.e. there are just two classes, and $y = 0$ or $y = 1$)?
?
For multiple classes:
\[\ell(\theta \mid \pmb x, y) = -\sum_{c=1}^{M} \mathbb 1(y = c) \cdot \log(\hat p(\pmb x = c))\]
For just two classes, say $c = 0$ or $c = 1$:
\[\ell(\theta \mid \pmb x, y) = -\Big[y\log(\hat p(\pmb x = 1)) + (1 - y)\log(1 - \hat p(\pmb x = 1)) \Big]\]
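As a sanity check, here is a minimal numpy sketch of both forms (the function and variable names are illustrative, not from any particular library); for $M = 2$ the two expressions agree:

```python
import numpy as np

def cross_entropy(p_hat, y):
    # Literal transcription of the multiclass sum: the indicator
    # 1(y = c) zeroes every term except the true class, so the sum
    # reduces to -log(p_hat[y]). Assumes every entry of p_hat is
    # strictly positive (otherwise log(0) terms appear).
    return -sum((y == c) * np.log(p_hat[c]) for c in range(len(p_hat)))

def binary_cross_entropy(p_hat_1, y):
    # Binary case: p_hat_1 is the predicted probability of class 1,
    # and y is 0 or 1.
    return -(y * np.log(p_hat_1) + (1 - y) * np.log(1 - p_hat_1))

p_hat = np.array([0.3, 0.7])  # predicted probabilities for classes 0 and 1
y = 1
assert np.isclose(cross_entropy(p_hat, y), binary_cross_entropy(p_hat[1], y))
```

In practice the loss is usually computed from logits via a numerically stable log-softmax rather than by taking `log` of probabilities directly, but the direct form above matches the definitions on this card.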