Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs
This lecture and the previous ([[Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions]]) are effectively a mini-speedrun of [[Course - Optimisation for Data Science HT25]]. This lecture in particular covers momentum in the context of mini-batch stochastic gradient descent (standard update rules sketched after the list):
- [[Notes - Optimisation for Data Science HT25, Heavy ball method]]
- [[Notes - Optimisation for Data Science HT25, Nesterov’s accelerated gradient method]]
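For quick reference, the standard forms of the two momentum updates (notation mine, not necessarily the lecture's): heavy ball adds a momentum term to the usual gradient step, while Nesterov evaluates the gradient at a look-ahead point.
$$
\begin{aligned}
\text{Heavy ball:}\quad & x^{k+1} = x^k - \alpha \nabla f(x^k) + \beta\,(x^k - x^{k-1}),\\
\text{Nesterov:}\quad & y^k = x^k + \beta\,(x^k - x^{k-1}), \qquad x^{k+1} = y^k - \alpha \nabla f(y^k),
\end{aligned}
$$
with stepsize $\alpha > 0$ and momentum parameter $\beta \in [0,1)$; in the mini-batch setting $\nabla f$ is replaced by a stochastic gradient estimate.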
It also covers techniques not mentioned in [[Course - Optimisation for Data Science HT25]], including (update rules sketched after the list):
- Adaptive subgradients (AdaGrad)
- RMSProp
- AdaDelta
- Adam
- AdaGrad with an adaptive stepsize rule
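As a reference sketch (notation mine; $g^k$ denotes the mini-batch gradient at $x^k$ and all operations are elementwise), the standard per-coordinate update rules for the first three methods are:
$$
\begin{aligned}
\text{AdaGrad:}\quad & v^k = v^{k-1} + (g^k)^2, && x^{k+1} = x^k - \frac{\alpha}{\sqrt{v^k} + \epsilon}\, g^k,\\
\text{RMSProp:}\quad & v^k = \rho\, v^{k-1} + (1-\rho)(g^k)^2, && x^{k+1} = x^k - \frac{\alpha}{\sqrt{v^k} + \epsilon}\, g^k,\\
\text{Adam:}\quad & m^k = \beta_1 m^{k-1} + (1-\beta_1) g^k, && v^k = \beta_2 v^{k-1} + (1-\beta_2)(g^k)^2,\\
& \hat m^k = \frac{m^k}{1-\beta_1^k}, \quad \hat v^k = \frac{v^k}{1-\beta_2^k}, && x^{k+1} = x^k - \frac{\alpha}{\sqrt{\hat v^k} + \epsilon}\, \hat m^k.
\end{aligned}
$$
AdaDelta modifies RMSProp by replacing the global stepsize $\alpha$ with a running RMS of recent parameter updates, making the method effectively stepsize-free.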