Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions
This lecture and the next ([[Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs]]) are effectively a mini-speedrun of [[Course - Optimisation for Data Science HT25]]. In particular, this lecture covered results on the convergence of stochastic gradient descent and how to decrease the noise floor: