Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions
This lecture and the next (Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs) are effectively a mini-speedrun of Course - Optimisation for Data Science HT25. In particular, this lecture covered results on the convergence of stochastic gradient descent and how to decrease the noise floor: