Notes - Machine Learning MT23, Cross-entropy loss


Flashcards

Suppose:

  • We are training a classifier with cross-entropy loss
  • There are $M$ classes
  • Given a datapoint $\pmb x$, our predicted probability that it belongs to class $c$ is denoted $\hat p(\pmb x = c)$
  • $\hat p$ is assumed to depend on model parameters $\theta$

In this context,

  • What is the expression for the cross-entropy loss given a single datapoint $\pmb x$ and its true class $y$, written $\ell(\theta \mid \pmb x, y)$?
  • How does this simplify if we are performing binary classification (i.e. there are just two classes, and $y = 0$ or $y = 1$)?

?


For multiple classes:

\[\ell(\theta \mid \pmb x, y) = -\sum_{c=1}^{M} \mathbb 1(y = c) \cdot \log(\hat p(\pmb x = c))\]

For just two classes, say $c = 0$ or $c = 1$:

\[\ell(\theta \mid \pmb x, y) = -\Big[y\log(\hat p(\pmb x = 1)) + (1 - y)\log(1 - \hat p(\pmb x = 1)) \Big]\]
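
Because the indicator picks out exactly one term of the sum, the loss is simply $-\log$ of the probability the model assigns to the true class. A minimal numeric sketch of this (assuming numpy; the helper name `cross_entropy` is mine, not from a library):

```python
import numpy as np

def cross_entropy(p_hat, y):
    """Cross-entropy loss for a single datapoint.

    p_hat : length-M array of predicted class probabilities (sums to 1)
    y     : index of the true class
    """
    # Only the true class's term survives the indicator sum
    return -np.log(p_hat[y])

# Example: M = 3 classes, true class y = 2
p_hat = np.array([0.1, 0.2, 0.7])
print(cross_entropy(p_hat, 2))  # -log(0.7) ≈ 0.357
```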

Proofs
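
A sketch of the binary simplification (filling in the step between the two formulas above): with $M = 2$ and the classes relabelled $c \in \{0, 1\}$, the sum has exactly two terms:

\[\ell(\theta \mid \pmb x, y) = -\Big[\mathbb 1(y = 0) \cdot \log(\hat p(\pmb x = 0)) + \mathbb 1(y = 1) \cdot \log(\hat p(\pmb x = 1))\Big]\]

Since $y \in \{0, 1\}$, the indicators simplify to $\mathbb 1(y = 1) = y$ and $\mathbb 1(y = 0) = 1 - y$, and because the predicted probabilities sum to one, $\hat p(\pmb x = 0) = 1 - \hat p(\pmb x = 1)$. Substituting both gives the binary form above.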



