# Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs

> Source: https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/dnns/ · Updated: 2025-11-15 · Tags: uni, lecture

- [Course - Theories of Deep Learning MT25](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/)

This lecture and the previous one ([Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/extensions/)) are effectively a mini-speedrun of [Course - Optimisation for Data Science HT25](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/). This lecture in particular covers momentum in the context of mini-batch stochastic gradient descent, which overlaps with these notes:

- [Notes - Optimisation for Data Science HT25, Accelerated methods](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/notes/accelerated-methods/)
- [Notes - Optimisation for Data Science HT25, Nesterov's accelerated gradient method](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/notes/nesterovs-accelerated-gradient-method/)
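
As a reminder of what momentum looks like on top of mini-batch SGD, here is a minimal sketch. It is not taken from the lecture: the toy 1-D least-squares problem, the function names and the hyperparameter values are all assumptions chosen for illustration, and the velocity form used here is only one of several equivalent ways of writing the update.

```python
import random


def minibatch_grad(w, data, batch_size, rng):
    """Gradient of the mean of 0.5 * (w * x - y)**2 over a random mini-batch."""
    batch = rng.sample(data, batch_size)
    return sum((w * x - y) * x for x, y in batch) / batch_size


def sgd_momentum(data, lr=0.05, beta=0.9, batch_size=8, steps=1000,
                 nesterov=False, seed=0):
    """Mini-batch SGD with heavy-ball or Nesterov momentum on a scalar weight."""
    rng = random.Random(seed)
    w, v = 0.0, 0.0
    for _ in range(steps):
        # Nesterov evaluates the gradient at the look-ahead point w + beta * v;
        # heavy-ball momentum evaluates it at the current iterate w.
        point = w + beta * v if nesterov else w
        g = minibatch_grad(point, data, batch_size, rng)
        v = beta * v - lr * g  # velocity: decayed history plus the new step
        w = w + v
    return w


# Hypothetical noiseless data from y = 3x, so the minimiser is w = 3.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(-20, 21)]

print(sgd_momentum(data))                 # heavy-ball momentum
print(sgd_momentum(data, nesterov=True))  # Nesterov momentum
```

Because the data is noiseless, every mini-batch gradient vanishes at the minimiser, so both variants settle at it. With noisy data the iterates would instead hover around the minimiser unless the step size is decayed.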

It also covers techniques not mentioned in [Course - Optimisation for Data Science HT25](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/), including:

- Adaptive subgradients (AdaGrad)
- RMSProp
- AdaDelta
- Adam
- AdaGrad with an adaptive stepsize rule
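
For quick reference, the update rules for these methods can be sketched as below. This is a hedged sketch in the standard form given in the papers listed under "Papers mentioned", not necessarily the notation or variants used in the lecture. The function names, the `state` dictionary, the hyperparameter defaults and the toy quadratic are all assumptions made for illustration. The last function is the scalar "AdaGrad-Norm" step size from the AdaGrad stepsizes paper.

```python
import math


def adagrad_step(w, g, state, lr=0.1, eps=1e-8):
    # Per-coordinate running sum of squared gradients.
    G = state.setdefault("G", [0.0] * len(w))
    for i in range(len(w)):
        G[i] += g[i] ** 2
        w[i] -= lr * g[i] / (math.sqrt(G[i]) + eps)


def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    # Exponential moving average of squared gradients replaces the sum.
    E = state.setdefault("E", [0.0] * len(w))
    for i in range(len(w)):
        E[i] = rho * E[i] + (1 - rho) * g[i] ** 2
        w[i] -= lr * g[i] / (math.sqrt(E[i]) + eps)


def adadelta_step(w, g, state, rho=0.95, eps=1e-6):
    # No learning rate: the step is scaled by the RMS of previous updates.
    Eg = state.setdefault("Eg", [0.0] * len(w))
    Ed = state.setdefault("Ed", [0.0] * len(w))
    for i in range(len(w)):
        Eg[i] = rho * Eg[i] + (1 - rho) * g[i] ** 2
        dx = -math.sqrt(Ed[i] + eps) / math.sqrt(Eg[i] + eps) * g[i]
        Ed[i] = rho * Ed[i] + (1 - rho) * dx ** 2
        w[i] += dx


def adam_step(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Moving averages of the gradient and its square, with bias correction.
    m = state.setdefault("m", [0.0] * len(w))
    v = state.setdefault("v", [0.0] * len(w))
    t = state["t"] = state.get("t", 0) + 1
    for i in range(len(w)):
        m[i] = b1 * m[i] + (1 - b1) * g[i]
        v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
        m_hat = m[i] / (1 - b1 ** t)
        v_hat = v[i] / (1 - b2 ** t)
        w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def adagrad_norm_step(w, g, state, lr=0.1, b0=1e-2):
    # One scalar step size shared by all coordinates, driven by ||g||^2.
    state["b2"] = state.get("b2", b0 ** 2) + sum(gi ** 2 for gi in g)
    for i in range(len(w)):
        w[i] -= lr * g[i] / math.sqrt(state["b2"])


# Hypothetical badly scaled quadratic f(w) = 0.5 * (w0^2 + 100 * w1^2).
CURV = (1.0, 100.0)


def loss(w):
    return 0.5 * sum(c * wi ** 2 for c, wi in zip(CURV, w))


def grad(w):
    return [c * wi for c, wi in zip(CURV, w)]


def run(step, steps=200, **kwargs):
    w, state = [1.0, 1.0], {}
    for _ in range(steps):
        step(w, grad(w), state, **kwargs)
    return w


for step in (adagrad_step, rmsprop_step, adadelta_step, adam_step,
             adagrad_norm_step):
    print(step.__name__, loss(run(step)))
```

The per-coordinate methods normalise each gradient entry by its own history, so on the first step AdaGrad and Adam move both coordinates by roughly `lr` even though the two gradients differ by a factor of 100. The scalar AdaGrad-Norm rule instead shrinks one shared step size, so it keeps the gradient direction.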

### Papers mentioned
- [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](https://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
- [Paper - ADADELTA, An Adaptive Learning Rate Method](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-adadelta-an-adaptive-learning-rate-method/)
- [RMSProp (lecture slides)](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980.pdf)
- [AdaGrad stepsizes: Sharp convergence over nonconvex landscapes](https://arxiv.org/pdf/1806.01811)

---
Olly Britton — https://ollybritton.com. Machine-readable index: https://ollybritton.com/llms.txt
