Slides - Dark knowledge (2014)


  • Full title: Dark knowledge
  • Author(s): Geoffrey Hinton, Oriol Vinyals, Jeff Dean
  • Year: 2014
  • Link: https://www.ttic.edu/dl/dark14.pdf
  • Relevant for:

Summary

  • Train the small model on a weighted combination of the teacher's soft targets and the true hard targets
  • Distillation as a way of combining an ensemble of models into one model
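  • Dropout as a form of model averaging over the $2^H$ possible architectures (every subset of the $H$ hidden units). At test time, using all the hidden units with their outgoing weights halved is exactly equivalent to the geometric mean of the predictions of all $2^H$ models (for a single hidden layer with a softmax output).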
  • A student trained only on the teacher's outputs for 7s and 8s in MNIST still reaches 87% accuracy over all the other classes
  • Soft targets are a very good regulariser
  • Self-distillation is a good regulariser
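
The combination of soft and hard targets in the first bullet can be sketched as a single loss. This is a minimal illustration rather than the authors' code: the temperature `T`, the mixing weight `alpha`, the function names, and the example logits are all assumptions chosen for readability. The `T**2` factor follows the common convention of rescaling the soft term so that its gradients stay comparable as `T` changes.

```python
import math

def softmax(logits, T=1.0):
    """Softmax at temperature T; a higher T gives a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) between two distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Weighted sum of a soft-target and a hard-target loss for one example.

    T and alpha are illustrative defaults, not values from the slides.
    """
    # Soft term: match the teacher's distribution, both softened by T.
    soft = cross_entropy(softmax(teacher_logits, T), softmax(student_logits, T))
    # Hard term: ordinary cross-entropy with the true label at T = 1.
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    # T**2 keeps the soft-term gradients on a comparable scale as T changes.
    return alpha * T**2 * soft + (1 - alpha) * hard

# Hypothetical logits for a 3-class problem.
teacher = [5.0, 2.0, 0.5]
student = [2.0, 1.0, 0.1]
print(softmax(teacher, T=1.0))  # sharp: almost all mass on class 0
print(softmax(teacher, T=4.0))  # softer: relative similarities become visible
print(distillation_loss(student, teacher, label=0))
```

With `alpha=0` the loss reduces to ordinary cross-entropy on the hard label. Raising `T` spreads the teacher's probability mass over the wrong classes, which is the information the soft targets carry beyond the label.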

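The dropout claim above can be checked numerically on a tiny network. The sketch below assumes one hidden layer, a softmax output, and a dropout probability of 0.5; the sizes `H` and `K`, the random weights, and the helper names are made up for illustration. It enumerates all $2^H$ masks, takes the renormalised geometric mean of their predictions, and compares it with a single pass that uses every hidden unit with halved outgoing weights.

```python
import itertools
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output(hidden, weights, bias, scale=1.0):
    """Softmax output; weights[k][j] connects hidden unit j to class k.

    `scale` multiplies the outgoing weights of the hidden units only,
    not the biases.
    """
    logits = [b + scale * sum(w * h for w, h in zip(row, hidden))
              for row, b in zip(weights, bias)]
    return softmax(logits)

random.seed(0)
H, K = 4, 3  # hypothetical sizes: 4 hidden units, 3 classes
hidden = [random.random() for _ in range(H)]  # fixed hidden activations
weights = [[random.gauss(0, 1) for _ in range(H)] for _ in range(K)]
bias = [random.gauss(0, 1) for _ in range(K)]

# Sum log-probabilities over all 2^H dropout masks.
masks = list(itertools.product([0, 1], repeat=H))
log_sum = [0.0] * K
for mask in masks:
    masked = [m * h for m, h in zip(mask, hidden)]
    probs = output(masked, weights, bias)
    for k in range(K):
        log_sum[k] += math.log(probs[k])

# Geometric mean per class, then renormalise so it sums to 1.
geo = [math.exp(s / len(masks)) for s in log_sum]
geo_mean = [g / sum(geo) for g in geo]

# Single pass with all hidden units and halved outgoing weights.
halved = output(hidden, weights, bias, scale=0.5)

print(geo_mean)
print(halved)
```

The two printed distributions agree up to floating-point error because averaging the logits over all masks halves each hidden unit's contribution, and the per-mask softmax normalisers cancel when the geometric mean is renormalised. With more than one hidden layer the halved-weights pass is only an approximation.
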