Paper - Explaining and harnessing adversarial examples (2015)


  • Full title: Explaining and harnessing adversarial examples
  • Author(s): Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy
  • Year: 2015
  • Link: https://arxiv.org/abs/1412.6572
  • Relevant for:

Summary

  • Gives lots of intuition and explanation for why the fast gradient sign method (FGSM) works
  • Focuses only on untargeted adversarial examples, i.e. ones that cause the model to predict some incorrect class, rather than a particular target class.
  • The main idea is that you want to change the activation as much as possible: for $w^\top \tilde x = w^\top (x + \eta) = w^\top x + w^\top \eta$, subject to the max-norm constraint $\|\eta\|_\infty \le \epsilon$, the choice $\eta = \epsilon \, \text{sign}(w)$ maximises $w^\top \eta$. If $w$ has $n$ dimensions with average element magnitude $m$, the activation grows by $\epsilon m n$, so in high dimensions many tiny per-coordinate changes add up to a large change in output. Applying the same argument to a linearisation of the cost function gives the FGSM perturbation $\eta = \epsilon \, \text{sign}(\nabla_x \mathcal{L}(\theta, x, y))$ (see the sketch after this list).
  • Can use this to regularise the network: train on a weighted mix of the clean loss and the loss on FGSM-perturbed inputs (adversarial training; second sketch below).
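A minimal PyTorch sketch of the FGSM perturbation from the bullet above. The function name `fgsm_perturb` and the use of cross-entropy as $\mathcal{L}$ are my assumptions for illustration, not the paper's code (the paper predates PyTorch):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """One FGSM step: x + epsilon * sign(grad_x L(theta, x, y)).

    `model`, `x`, `y` are hypothetical placeholders for a classifier,
    an input batch, and integer labels.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Differentiate w.r.t. the input only, leaving parameter grads untouched.
    grad, = torch.autograd.grad(loss, x)
    # eta = epsilon * sign(grad): the largest loss increase achievable
    # under the max-norm constraint ||eta||_inf <= epsilon.
    eta = epsilon * grad.sign()
    return (x + eta).detach()
```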
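And a sketch of the adversarial training objective, $\tilde{\mathcal{L}}(\theta, x, y) = \alpha \mathcal{L}(\theta, x, y) + (1 - \alpha) \mathcal{L}(\theta, x + \epsilon \, \text{sign}(\nabla_x \mathcal{L}), y)$; the paper reports using $\alpha = 0.5$, but the function name and framework here are assumptions:

```python
def adversarial_loss(model, x, y, epsilon, alpha=0.5):
    """Weighted mix of the clean loss and the loss on FGSM-perturbed inputs."""
    clean_loss = F.cross_entropy(model(x), y)
    x_adv = fgsm_perturb(model, x, y, epsilon)  # sketch from the previous block
    adv_loss = F.cross_entropy(model(x_adv), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss

# Usage inside a training step (optimizer and data loop assumed):
#   loss = adversarial_loss(model, x, y, epsilon=0.25)
#   loss.backward()
#   optimizer.step()
```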

Flashcards




Related posts