Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions

Created: November 14, 2025 | Updated: November 15, 2025

Course - Theories of Deep Learning MT25

This lecture and the next (Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs) are effectively a mini-speedrun of Course - Optimisation for Data Science HT25. In particular, this lecture covered results on the convergence of stochastic gradient descent and how to decrease the noise floor.
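To make "noise floor" concrete, here is a minimal sketch on a toy problem that is an assumption of this note rather than something taken from the lecture: minimising $f(x) = \tfrac{1}{2}\mathbb{E}[(x - z)^2]$ with $z \sim \mathcal{N}(0, \sigma^2)$, whose minimiser is $x^\ast = 0$. With a constant step size $\alpha$ and mini-batch size $b$, the iterates do not converge to $x^\ast$ but settle into a stationary distribution, and for this particular model the expected suboptimality works out to $\frac{\alpha \sigma^2}{2 b (2 - \alpha)}$. The function name `sgd_noise_floor` and all of its parameters are hypothetical.

```python
import numpy as np


def sgd_noise_floor(step, batch=1, sigma=1.0, iters=50_000, tail=25_000, seed=0):
    """Run constant-step SGD on f(x) = 0.5 * E[(x - z)^2], z ~ N(0, sigma^2).

    Returns the suboptimality f(x_k) - f(x*) = 0.5 * x_k^2 averaged over the
    last `tail` iterates, as an empirical estimate of the noise floor.
    """
    rng = np.random.default_rng(seed)
    x = 5.0  # start well away from the minimiser x* = 0
    gaps = []
    for k in range(iters):
        z = rng.normal(0.0, sigma, size=batch)
        grad = x - z.mean()  # unbiased mini-batch estimate of f'(x) = x
        x -= step * grad
        if k >= iters - tail:
            gaps.append(0.5 * x * x)
    return float(np.mean(gaps))


if __name__ == "__main__":
    # Toy-model prediction: alpha * sigma^2 / (2 * b * (2 - alpha)).
    for step, batch in [(0.1, 1), (0.01, 1), (0.1, 10)]:
        predicted = step / (2 * batch * (2 - step))
        measured = sgd_noise_floor(step, batch)
        print(f"step={step}, batch={batch}: measured {measured:.5f}, predicted {predicted:.5f}")
```

In this toy setting, shrinking the step size and enlarging the mini-batch both lower the floor by roughly the same factor. The first slows the initial convergence and the second costs more gradient samples per step.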
