Paper - Explaining and harnessing adversarial examples (2015)


  • Full title: Explaining and harnessing adversarial examples
  • Author(s): Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy
  • Year: 2015
  • Link: https://arxiv.org/abs/1412.6572
  • Relevant for:

Summary

  • Gives lots of intuition and explanation for why the fast gradient sign method (FGSM) works
  • Focuses only on untargeted adversarial examples, i.e. ones that cause the model to predict some incorrect class, rather than a particular target class.
  • The main idea is that you want to change the activation as much as possible: for $w^\top \tilde x = w^\top (x + \eta) = w^\top x + w^\top \eta$, subject to the max-norm constraint $\|\eta\|_\infty \le \epsilon$, the choice $\eta = \epsilon \, \text{sign}(w)$ maximises $w^\top \eta$. If $w$ has $n$ dimensions with average element magnitude $m$, the activation grows by $\epsilon m n$, so in high dimensions many tiny per-coordinate changes add up to a large change in output. Applying the same argument to a linearisation of the cost function gives the FGSM perturbation $\eta = \epsilon \, \text{sign}(\nabla_x \mathcal{L}(\theta, x, y))$ (see the sketch after this list).
  • Can use this to regularise the network: train on a weighted mix of the clean loss and the loss on FGSM-perturbed inputs (adversarial training; second sketch below).
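A minimal PyTorch sketch of the FGSM perturbation from the bullet above. The function name `fgsm_perturb` and the use of cross-entropy as $\mathcal{L}$ are my assumptions for illustration, not the paper's code (the paper predates PyTorch):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """One FGSM step: x + epsilon * sign(grad_x L(theta, x, y)).

    `model`, `x`, `y` are hypothetical placeholders for a classifier,
    an input batch, and integer labels.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Differentiate w.r.t. the input only, leaving parameter grads untouched.
    grad, = torch.autograd.grad(loss, x)
    # eta = epsilon * sign(grad): the largest loss increase achievable
    # under the max-norm constraint ||eta||_inf <= epsilon.
    eta = epsilon * grad.sign()
    return (x + eta).detach()
```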
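And a sketch of the adversarial training objective, $\tilde{\mathcal{L}}(\theta, x, y) = \alpha \mathcal{L}(\theta, x, y) + (1 - \alpha) \mathcal{L}(\theta, x + \epsilon \, \text{sign}(\nabla_x \mathcal{L}), y)$; the paper reports using $\alpha = 0.5$, but the function name and framework here are assumptions:

```python
def adversarial_loss(model, x, y, epsilon, alpha=0.5):
    """Weighted mix of the clean loss and the loss on FGSM-perturbed inputs."""
    clean_loss = F.cross_entropy(model(x), y)
    x_adv = fgsm_perturb(model, x, y, epsilon)  # sketch from the previous block
    adv_loss = F.cross_entropy(model(x_adv), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss

# Usage inside a training step (optimizer and data loop assumed):
#   loss = adversarial_loss(model, x, y, epsilon=0.25)
#   loss.backward()
#   optimizer.step()
```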

Flashcards




Related posts