Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs
This lecture and the previous one (Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions) are effectively a mini-speedrun of Course - Optimisation for Data Science HT25. This lecture in particular covers momentum in the context of mini-batch stochastic gradient descent:
- Notes - Optimisation for Data Science HT25, Accelerated methods
- Notes - Optimisation for Data Science HT25, Nesterov’s accelerated gradient method
It also covers techniques not mentioned in Course - Optimisation for Data Science HT25, including:
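A minimal NumPy sketch of mini-batch SGD with momentum, assuming a noise-free least-squares objective and the velocity form v ← βv − α·g, w ← w + v. The Nesterov variant evaluates the mini-batch gradient at the look-ahead point w + βv (one common formulation; the course notes may parametrise it differently, e.g. as an extrapolation step between successive iterates). The helper names, synthetic data and hyperparameters (`lr`, `beta`, `batch_size`, `epochs`) are illustrative assumptions, not taken from the lecture.

```python
import numpy as np


def minibatch_grad(w, X, y, idx):
    """Gradient of 0.5 * ||X_B w - y_B||^2 / |B| on the mini-batch indexed by idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)


def sgd_momentum(X, y, lr=0.05, beta=0.9, batch_size=16, epochs=100,
                 nesterov=False, seed=0):
    """Mini-batch SGD with heavy-ball (default) or Nesterov momentum.

    Hyperparameter defaults are illustrative, not tuned.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)  # velocity (momentum buffer)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            if nesterov:
                # Nesterov: gradient at the look-ahead point w + beta * v
                g = minibatch_grad(w + beta * v, X, y, idx)
            else:
                # Heavy-ball (Polyak): gradient at the current iterate
                g = minibatch_grad(w, X, y, idx)
            v = beta * v - lr * g
            w = w + v
    return w


# Synthetic, noise-free regression problem (hypothetical example data).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true  # every mini-batch gradient vanishes at w_true

w_hb = sgd_momentum(X, y)
w_nag = sgd_momentum(X, y, nesterov=True)
```

Because the targets are noise-free, every mini-batch gradient is zero at `w_true`, so both variants should recover it to high accuracy; with noisy targets a constant step size would instead leave the iterates fluctuating around the minimiser.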
- Adaptive subgradient methods (AdaGrad)
- RMSProp
- AdaDelta
- Adam
- AdaGrad with an adaptive stepsize rule
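The first four methods in the list can be sketched as per-coordinate update rules. This is a minimal NumPy sketch assuming the standard published form of each update; the helper names, the dictionary used as optimiser state, the test quadratic f(w) = 0.5 · Σ cᵢwᵢ² and all hyperparameter values are illustrative assumptions rather than anything from the lecture. The last item (AdaGrad with an adaptive stepsize rule) is not sketched.

```python
import numpy as np

# Each *_step function takes the iterate w, gradient g and a mutable state dict s,
# and returns the next iterate. Hyperparameter defaults are illustrative only.


def adagrad_step(w, g, s, lr=0.5, eps=1e-8):
    # Running SUM of squared gradients, per coordinate (never decays).
    s["G"] = s.get("G", 0.0) + g**2
    return w - lr * g / (np.sqrt(s["G"]) + eps)


def rmsprop_step(w, g, s, lr=0.01, rho=0.9, eps=1e-8):
    # Exponential moving AVERAGE of squared gradients instead of a running sum.
    s["E"] = rho * s.get("E", 0.0) + (1 - rho) * g**2
    return w - lr * g / (np.sqrt(s["E"]) + eps)


def adadelta_step(w, g, s, rho=0.95, eps=1e-6):
    # No global learning rate: scale by RMS of past updates / RMS of gradients.
    s["Eg"] = rho * s.get("Eg", 0.0) + (1 - rho) * g**2
    dx = -np.sqrt(s.get("Edx", 0.0) + eps) / np.sqrt(s["Eg"] + eps) * g
    s["Edx"] = rho * s.get("Edx", 0.0) + (1 - rho) * dx**2  # update after use
    return w + dx


def adam_step(w, g, s, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum on the gradient (m) plus RMSProp-style scaling (v), bias-corrected.
    t = s["t"] = s.get("t", 0) + 1
    s["m"] = beta1 * s.get("m", 0.0) + (1 - beta1) * g
    s["v"] = beta2 * s.get("v", 0.0) + (1 - beta2) * g**2
    m_hat = s["m"] / (1 - beta1**t)
    v_hat = s["v"] / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)


def run(step_fn, c, w0, steps=500):
    """Minimise f(w) = 0.5 * sum(c * w**2), whose gradient is c * w; return final f."""
    w, state = w0.copy(), {}
    for _ in range(steps):
        w = step_fn(w, c * w, state)
    return 0.5 * np.sum(c * w**2)


c = np.array([1.0, 10.0, 100.0])  # ill-conditioned diagonal quadratic (hypothetical)
w0 = np.ones(3)
f0 = 0.5 * np.sum(c * w0**2)
results = {fn.__name__: run(fn, c, w0)
           for fn in (adagrad_step, rmsprop_step, adadelta_step, adam_step)}
```

All four divide the gradient by a running root-mean-square of past gradients, so on this diagonal quadratic each coordinate moves at a similar rate despite the 100:1 spread in curvature. They differ in how that statistic is accumulated (a running sum for AdaGrad, an exponential average for RMSProp and Adam) and in whether a global learning rate appears at all (AdaDelta replaces it with the RMS of previous updates, which makes its early steps small).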