Slides - Dark knowledge (2014)
- Full title: Dark knowledge
- Author(s): Geoffrey Hinton, Oriol Vinyals, Jeff Dean
- Year: 2014
- Link: https://www.ttic.edu/dl/dark14.pdf
- Relevant for:
Summary
- Train the student on a combination of the teacher's soft targets (probabilities softened with a temperature) and the true hard targets
- Distillation as a way of combining an ensemble of models into one model
- Dropout as a form of model averaging over the $2^H$ possible sub-networks (one per dropout mask over $H$ hidden units). At test time, using all the hidden units with their outgoing weights halved is equivalent to taking the (normalised) geometric mean of the predictions of all these models.
- Training a student on a teacher's soft targets for only the 7s and 8s in MNIST still achieves 87% test accuracy on all the other classes
- Soft targets are a very good regulariser
- Self-distillation is a good regulariser
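The combined soft/hard objective from the first bullet can be sketched as below. This is a minimal NumPy sketch, not code from the slides; the function names, the $T^2$ scaling of the soft term, and the `alpha` weighting are assumptions based on the standard distillation setup.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T gives softer probabilities.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=3.0, alpha=0.5):
    # Soft term: cross-entropy between teacher and student, both at temperature T.
    # Scaled by T^2 so its gradients stay comparable to the hard term's.
    p_teacher = softmax(teacher_logits, T)
    p_student_T = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student_T))
    # Hard term: ordinary cross-entropy against the one-hot label at T = 1.
    p_student = softmax(student_logits, 1.0)
    hard = -np.log(p_student[hard_label])
    return alpha * T**2 * soft + (1 - alpha) * hard
```

Raising `T` flattens both distributions, so the student is pushed to match the teacher's relative probabilities on the wrong classes (the "dark knowledge"), not just the argmax.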
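The dropout bullet's claim can be checked numerically for the simplest case: a single logistic output unit with no bias, where the weight-halving trick exactly equals the normalised geometric mean over all $2^H$ masks. This is an illustrative sketch; the variable names are mine.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = 4
rng = np.random.default_rng(0)
w = rng.normal(size=H)  # weights from hidden units to the logistic output
h = rng.normal(size=H)  # hidden unit activations

# Enumerate all 2^H dropout masks and collect each sub-model's output.
ps = np.array([sigmoid(np.dot(w * np.array(m), h))
               for m in itertools.product([0, 1], repeat=H)])

# Normalised geometric mean of the 2^H Bernoulli predictive distributions.
geo_p = np.exp(np.mean(np.log(ps)))
geo_q = np.exp(np.mean(np.log(1.0 - ps)))
gm = geo_p / (geo_p + geo_q)

# Test-time trick: keep every unit but halve the outgoing weights.
halved = sigmoid(np.dot(0.5 * w, h))
```

The identity holds because each mask's logit is $z_m = \sum_i m_i w_i h_i$, the geometric mean works out to $\sigma(\bar z)$, and the mean logit over all masks is exactly $\tfrac12 \sum_i w_i h_i$.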