# Paper - Explaining and harnessing adversarial examples (2015)

> Source: https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-explaining-and-harnessing-adversarial-examples-2015/ · Updated: 2025-11-18 · Tags: uni, notes

- **Full title**: *Explaining and harnessing adversarial examples*
- **Author(s)**: Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy
- **Year**: 2015
- **Link**: https://arxiv.org/abs/1412.6572
- **Relevant for**:
	- [Course - Theories of Deep Learning MT25](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/)

### Summary
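- Argues that adversarial examples arise mainly from models behaving too linearly in high-dimensional input spaces, and uses this to motivate the fast gradient sign method (FGSM), with lots of intuition for why it works.
- Focuses on untargeted adversarial examples: those that cause the model to predict any incorrect class, rather than a particular target class.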
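- The main idea is that you want to change the activation as much as possible under a max-norm budget $\|\eta\|_\infty \le \epsilon$. For a linear unit, $w^\top \tilde x = w^\top (x + \eta) = w^\top x + w^\top \eta$, and $w^\top \eta$ is maximised by $\eta = \epsilon \, \text{sign}(w)$, which shifts the activation by $\epsilon \|w\|_1$. If $w$ has $n$ dimensions with average magnitude $m$, this is $\epsilon m n$, so the shift grows linearly with the input dimension even though no single coordinate moves by more than $\epsilon$. Applying the same argument to the cost function linearised around $x$ gives $\eta = \epsilon \, \text{sign}(\nabla_x \mathcal L(\theta, x, y))$.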
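- Can use this to regularise the network (adversarial training) by training on a mix of the clean loss and the loss at the FGSM example: $\tilde{\mathcal L}(\theta, x, y) = \alpha \mathcal L(\theta, x, y) + (1 - \alpha) \mathcal L(\theta, x + \epsilon \, \text{sign}(\nabla_x \mathcal L(\theta, x, y)), y)$, with $\alpha = 0.5$ in the paper's experiments.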
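
A minimal sketch of the fast gradient sign step on a linear logistic model, where the input gradient can be written by hand. Everything here is an illustrative assumption rather than something taken from the paper's experiments: the function names (`fgsm`, `loss_grad_x`, `adversarial_objective`), the labels in $\{-1, +1\}$, and the toy weights and input. It uses only the Python standard library.

```python
import math

def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def logit(w, b, x):
    """Linear activation w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def logistic_loss(w, b, x, y):
    """Loss log(1 + exp(-y * (w.x + b))) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * logit(w, b, x)))

def loss_grad_x(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    s = 1.0 / (1.0 + math.exp(y * logit(w, b, x)))  # sigmoid(-y * z)
    return [-y * s * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """One fast gradient sign step: x + eps * sign(grad_x loss)."""
    g = loss_grad_x(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def adversarial_objective(w, b, x, y, eps, alpha=0.5):
    """Mix of the clean loss and the loss at the FGSM example."""
    x_adv = fgsm(w, b, x, y, eps)
    clean = logistic_loss(w, b, x, y)
    adv = logistic_loss(w, b, x_adv, y)
    return alpha * clean + (1 - alpha) * adv

# Toy values, chosen only for illustration.
w, b = [0.5, -1.0, 2.0], 0.0
x, y, eps = [1.0, 0.2, 0.1], 1, 0.25

x_adv = fgsm(w, b, x, y, eps)
print("clean logit:", logit(w, b, x))
print("adversarial logit:", logit(w, b, x_adv))
print("clean loss:", logistic_loss(w, b, x, y))
print("adversarial loss:", logistic_loss(w, b, x_adv, y))
```

For this linear model the sign of the input gradient is $-y \, \text{sign}(w)$, so the step moves the logit by exactly $\epsilon \|w\|_1$ against the true label while each coordinate of $x$ changes by at most $\epsilon$. With a deep network the gradient would come from automatic differentiation instead of the closed form used here.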

### Flashcards

---
Olly Britton — https://ollybritton.com. Machine-readable index: https://ollybritton.com/llms.txt
