Notes - Machine Learning MT23, Cross-entropy loss
Flashcards
Suppose:
- We are training a classifier with cross-entropy loss
- There are $M$ classes
- Given a datapoint $\pmb x$, our predicted probability that it belongs to class $c$ is denoted $\hat p(\pmb x = c)$.
- $\hat p$ is assumed to depend on model parameters $\theta$
In this context,
- What is the expression for the cross-entropy loss given a single datapoint $\pmb x$ and its true class $y$, written $\ell(\theta \mid \pmb x, y)$?
- How does this simplify if we are performing binary classification (i.e. there are just two classes, and $y = 0$ or $y = 1$)?
?
For multiple classes:
\[\ell(\theta \mid \pmb x, y) = -\sum_{c=1}^{M} \mathbb 1(y = c) \cdot \log(\hat p(\pmb x = c))\]
For just two classes, say $c = 0$ or $c = 1$:
\[\ell(\theta \mid \pmb x, y) = -\Big[y\log(\hat p(\pmb x = 1)) + (1 - y)\log(1 - \hat p(\pmb x = 1)) \Big]\]
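As a sanity check, here is a minimal numpy sketch of both forms (the function and variable names are illustrative, not from any particular library); for $M = 2$ the two expressions agree:

```python
import numpy as np

def cross_entropy(p_hat, y):
    # Literal transcription of the multiclass sum: the indicator
    # 1(y = c) zeroes every term except the true class, so the sum
    # reduces to -log(p_hat[y]). Assumes every entry of p_hat is
    # strictly positive (otherwise log(0) terms appear).
    return -sum((y == c) * np.log(p_hat[c]) for c in range(len(p_hat)))

def binary_cross_entropy(p_hat_1, y):
    # Binary case: p_hat_1 is the predicted probability of class 1,
    # and y is 0 or 1.
    return -(y * np.log(p_hat_1) + (1 - y) * np.log(1 - p_hat_1))

p_hat = np.array([0.3, 0.7])  # predicted probabilities for classes 0 and 1
y = 1
assert np.isclose(cross_entropy(p_hat, y), binary_cross_entropy(p_hat[1], y))
```

In practice the loss is usually computed from logits via a numerically stable log-softmax rather than by taking `log` of probabilities directly, but the direct form above matches the definitions on this card.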