Course - Theories of Deep Learning MT25

Created: October 09, 2025 | Updated: November 20, 2025

My notes for this course are a little different from my other University Notes^U, since (at least for now) it is assessed by a mini-project at the end of term; this means I’m trying to optimise more for understanding[^1] than for exam grades. For this reason, some of the things I take notes on here might not be covered explicitly in the course (e.g. Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension^U).

Notes

Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension^U

Lectures

Reading List

Each lecture above is annotated with the articles and papers that were mentioned. Once a week, we also receive amount of

Week 1
- Paper - Gradient-based learning applied to document recognition, LeCun^U
- Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)^U
- Any of the papers describing an application of deep learning in Lecture - Theories of Deep Learning MT25, II, Why deep learning^U
Week 2
- Paper - Error bounds for approximations with deep ReLU networks, Yarotsky (2016)^U
Week 3
Week 4
Week 5
Week 6
Week 7
Week 8
Class 1
- Paper - Attention Is All You Need (2017)^U
Class 2
Class 3
- The Mathematics of
- Better understanding of why SGD with momentum actually outperforms ADADELTA and understanding the explanation they give in the paper
- Why the encoder vs decoder distinction
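
As a starting point for the optimiser question above, here is a minimal sketch that puts the two update rules side by side on a toy quadratic. It is not a reproduction of any experiment from the ADADELTA paper: the objective, the step counts, the hyperparameter values, and the function names `sgd_momentum` and `adadelta` are all assumptions chosen for illustration, and full gradients are used instead of stochastic mini-batch gradients.

```python
import numpy as np

# Toy objective f(x) = 0.5 * x^T A x with an ill-conditioned diagonal A.
# (Assumption: chosen only to illustrate the update rules.)
A = np.diag([1.0, 10.0])

def loss(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

def sgd_momentum(x0, lr=0.05, momentum=0.9, steps=200):
    """Heavy-ball update: v <- momentum * v - lr * g, then x <- x + v."""
    x, velocity = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
    return x

def adadelta(x0, rho=0.95, eps=1e-6, steps=200):
    """ADADELTA update: per-coordinate step RMS[dx] / RMS[g] * g, no learning rate."""
    x = x0.copy()
    avg_sq_grad = np.zeros_like(x0)  # running average of squared gradients
    avg_sq_step = np.zeros_like(x0)  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g**2
        step = -np.sqrt(avg_sq_step + eps) / np.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step**2
        x = x + step
    return x

x_start = np.array([1.0, 1.0])
initial_loss = loss(x_start)
momentum_loss = loss(sgd_momentum(x_start))
adadelta_loss = loss(adadelta(x_start))
print(initial_loss, momentum_loss, adadelta_loss)
```

One thing the sketch makes visible is that the first ADADELTA steps are of size roughly `sqrt(eps)`, so the choice of `eps` controls how quickly it gets going, whereas the momentum update depends directly on the hand-picked learning rate.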

See:

Problem Sheets

Questions / To-Do List

Implement proof that “each MNIST digit class is contained on a locally less than 15 dimensional space”
It is not known whether the optimal $\epsilon^{-d/n}$ width can be achieved using just one activation function, although it is possible with two.
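
For the MNIST item above, one possible way to probe a "locally less than 15 dimensional" claim numerically is local PCA: take the nearest neighbours of a point and count how many principal components are needed to explain most of the variance. The sketch below is only an assumption about how such a check might look, not the method from the lectures. It runs on synthetic data (a flat 2-dimensional patch embedded in 10 dimensions) rather than MNIST, and the function name `local_pca_dimension`, the neighbourhood size, and the 95% variance threshold are all hypothetical choices.

```python
import numpy as np

def local_pca_dimension(points, query_index, n_neighbours=50, variance_kept=0.95):
    """Estimate the local dimension around one point via PCA on its nearest neighbours."""
    query = points[query_index]
    distances = np.linalg.norm(points - query, axis=1)
    neighbour_ids = np.argsort(distances)[:n_neighbours]
    neighbourhood = points[neighbour_ids]
    centred = neighbourhood - neighbourhood.mean(axis=0)
    # Squared singular values are proportional to the variance along each principal direction.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variance = singular_values**2
    cumulative = np.cumsum(variance) / variance.sum()
    # Smallest number of components whose cumulative variance reaches the threshold.
    return int(np.searchsorted(cumulative, variance_kept) + 1)

# Synthetic stand-in for a digit class (assumption): a 2-D patch in 10-D with small noise.
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(2000, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # 10 x 2 matrix with orthonormal columns
points = latent @ basis.T + 1e-3 * rng.normal(size=(2000, 10))

estimates = [local_pca_dimension(points, i) for i in range(5)]
print(estimates)
```

On real MNIST the same function could be applied to the flattened images of a single digit class, although the estimate will be sensitive to both the neighbourhood size and the variance threshold.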

Incoming links

- Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension^U
- Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation^U
- Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions^U
- Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs^U
- Lecture - Theories of Deep Learning MT25, XVI, Ingredients for a successful mini-project report^U
- Lecture - Theories of Deep Learning MT25, VI, Controlling the variance of the Jacobian's spectrum^U
- Lecture - Theories of Deep Learning MT25, I, Three ingredients of deep learning^U
- Lecture - Theories of Deep Learning MT25, II, Why deep learning^U
- Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary^U
- Lecture - Theories of Deep Learning MT25, XIII, Autoencoders^U
- Lecture - Theories of Deep Learning MT25, XI, Visualising the filters and response in a CNN^U
- Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth^U
- Paper - Optimal nonlinear approximation, DeVore (1989)^U
- Paper - Attention Is All You Need (2017)^U
- Paper - ADADELTA, An Adaptive Learning Rate Method^U
- Paper - Error bounds for approximations with deep ReLU networks, Yarotsky (2016)^U
- Paper - Gradient-based learning applied to document recognition, LeCun^U
- Paper - Exponential expressivity in deep neural networks through transient chaos (2016)^U
- Paper - Explaining and harnessing adversarial examples (2015)^U
- Paper - Why and when can deep networks avoid the curse of dimensionality, Poggio (2016)^U
- Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)^U
- Article - Deep, deep trouble, Elad^U
- Part C^U
- University Notes^U
- Courses MT25^U
- Course - Geometric Deep Learning HT26^U
- Paper - Dynamics of Transient Structure in In-Context Linear Regression Transformers^N