# Course - Theories of Deep Learning MT25

> Source: https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/ · Updated: 2025-11-20 · Tags: uni, course

- [Course webpage](https://courses.maths.ox.ac.uk/course/view.php?id=6108)
- Lecture notes
	- [1, Three ingredients of deep learning](https://courses.maths.ox.ac.uk/pluginfile.php/118353/mod_resource/content/1/slides1_tdl.pdf)
	- [2, Why deep learning](https://courses.maths.ox.ac.uk/pluginfile.php/118354/mod_resource/content/1/slides2_tdl.pdf)
	- [3, Exponential expressivity with depth](https://courses.maths.ox.ac.uk/pluginfile.php/118355/mod_resource/content/3/Lecture%203%20Slides.pdf)
	- [4, Data classes for which DNNs can overcome the curse of dimensionality](https://courses.maths.ox.ac.uk/pluginfile.php/118356/mod_resource/content/1/Lecture%204%20slides.pdf)
	- [5, Controlling the exponential growth of variance and correlation](https://courses.maths.ox.ac.uk/pluginfile.php/118357/mod_resource/content/2/slides5_tdl.pdf)
	- [6, Controlling the variance of the Jacobian's spectrum](https://courses.maths.ox.ac.uk/pluginfile.php/118358/mod_resource/content/1/slides6_tdl.pdf)
	- [7, Stochastic gradient descent and its extensions](https://courses.maths.ox.ac.uk/pluginfile.php/118359/mod_resource/content/3/slides7_tdl.pdf)
	- [8, Optimization algorithms for training DNNs](https://courses.maths.ox.ac.uk/pluginfile.php/118360/mod_resource/content/1/slides8_tdl.pdf)
	- [9, Topology of the loss landscape](https://courses.maths.ox.ac.uk/pluginfile.php/118361/mod_resource/content/1/slides9_tdl.pdf)
	- [10, Observations of the loss landscape](https://courses.maths.ox.ac.uk/pluginfile.php/118362/mod_resource/content/1/slides10_tdl.pdf)
	- [11, Visualising the filters and response in a CNN](https://courses.maths.ox.ac.uk/pluginfile.php/118363/mod_resource/content/2/Slides%20lecture%2011.pdf)
	- [12, The scattering transform and into auto-encoders](https://courses.maths.ox.ac.uk/pluginfile.php/118364/mod_resource/content/2/Lecture%2012%20slides.pdf)
	- [13, Autoencoders](https://courses.maths.ox.ac.uk/pluginfile.php/118365/mod_resource/content/1/slides13_tdl.pdf)
	- [14, Generative adversarial networks](https://courses.maths.ox.ac.uk/pluginfile.php/118366/mod_resource/content/1/slides14_tdl.pdf)
	- [15, A few things we missed and a summary](https://courses.maths.ox.ac.uk/pluginfile.php/118367/mod_resource/content/1/slides15_tdl.pdf)
	- [16, Ingredients for a successful mini-project report](https://courses.maths.ox.ac.uk/pluginfile.php/118368/mod_resource/content/1/slides16_tdl.pdf)
	- [Guest talk on PINNs](https://courses.maths.ox.ac.uk/pluginfile.php/118372/mod_resource/content/1/pinns_lecture_tdl_v2.pdf)
- Lecture recordings
	- [2024-2025](https://ox.cloud.panopto.eu/Panopto/Pages/Sessions/List.aspx?embedded=1#folderID=%22e51140ef-cbe8-4beb-b980-b1b900988c89%22)
	- [2025-2026](https://ox.cloud.panopto.eu/Panopto/Pages/Sessions/List.aspx?embedded=1#folderID=%22104986ec-7f82-4ec9-813f-b31a007918ba%22)
- Other courses this term: [Courses MT25](https://ollybritton.com/notes/uni/part-a/mt25/)

My notes for this course are a little different from my other [University Notes](https://ollybritton.com/notes/uni/), since (at least for now) the course is assessed by a mini-project at the end of term. This means I'm optimising more for understanding than for exam grades, so some of the things I take notes on here might not be covered explicitly in the course (e.g. [Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/notes/)).

### Notes
- [Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/notes/)

### Lectures
- [Lecture - Theories of Deep Learning MT25, I, Three ingredients of deep learning](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/learning/)
- [Lecture - Theories of Deep Learning MT25, II, Why deep learning](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/learning/)
- [Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/depth/)
- [Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/dimensionality/)
- [Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/correlation/)
- [Lecture - Theories of Deep Learning MT25, VI, Controlling the variance of the Jacobian's spectrum](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/spectrum/)
- [Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/extensions/)
- [Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/dnns/)
- [redacted](https://ollybritton.com/404)
- [redacted](https://ollybritton.com/404)
- [Lecture - Theories of Deep Learning MT25, XI, Visualising the filters and response in a CNN](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/cnn/)
- [Lecture - Theories of Deep Learning MT25, XII, The scattering transform and into auto-encoders](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/auto-encoders/)
- [Lecture - Theories of Deep Learning MT25, XIII, Autoencoders](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/autoencoders/)
- [Lecture - Theories of Deep Learning MT25, XIV, Generative adversarial networks](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/networks/)
- [Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/summary/)
- [Lecture - Theories of Deep Learning MT25, XVI, Ingredients for a successful mini-project report](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/report/)

### Reading List
Each lecture above is annotated with the articles and papers mentioned in it. Once a week, we also receive some additional reading, listed below.

- Week 1
	- [Paper - Gradient-based learning applied to document recognition, LeCun](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-gradient-based-learning-applied-to-document-recognition-lecun/)
	- [Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-representation-benefits-of-deep-feedforward-networks-telgarsky-2015/)
	- Any of the papers describing an application of deep learning in [Lecture - Theories of Deep Learning MT25, II, Why deep learning](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/learning/)
- Week 2
	- [Paper - Error bounds for approximations with deep ReLU networks, Yarotsky (2016)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-error-bounds-for-approximations-with-deep-relu-networks-yarotsky-2016/)
- Week 3
	- [Activation function design for deep networks: linearity and effective initialisation, Murray](https://arxiv.org/abs/2105.07741)
	- [Exponential expressivity in deep neural networks through transient chaos, Poole](https://arxiv.org/pdf/1606.05340.pdf)
	- [The emergence of spectral universality in deep networks, Pennington](https://arxiv.org/pdf/1802.09979.pdf)
	- [Rapid training of deep neural networks without skip connections or normalisation layers using Deep Kernel Shaping, Martens](https://arxiv.org/pdf/2110.01765)
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Class 1
	- [Paper - Attention Is All You Need (2017)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-attention-is-all-you-need-2017/)
- Class 2
- Class 3
	- The Mathematics of 
	- Better understand why SGD with momentum actually outperforms ADADELTA, and the explanation given in the paper
	- Why the encoder vs decoder distinction

### Related Notes
See:

- [Course - Machine Learning MT23](https://ollybritton.com/notes/uni/part-a/mt23/machine-learning/)
	- [Notes - Machine Learning MT23, Matrix calculus](https://ollybritton.com/notes/uni/part-a/mt23/machine-learning/notes/matrix-calculus/)
- [Course - Uncertainty in Deep Learning MT25](https://ollybritton.com/notes/uni/part-c/mt25/uncertainty-in-deep-learning/)
- [Course - Geometric Deep Learning HT26](https://ollybritton.com/notes/uni/part-c/ht26/geometric-deep-learning/)
- [Course - Continuous Mathematics HT23](https://ollybritton.com/notes/uni/prelims/ht23/continuous-mathematics/)
	- [Notes - Continuous Mathematics HT23, Derivatives](https://ollybritton.com/notes/uni/prelims/ht23/continuous-mathematics/notes/derivatives/)
- [Course - Optimisation for Data Science HT25](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/)
	- [Notes - Optimisation for Data Science HT25, Stochastic gradient descent](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/notes/stochastic-gradient-descent/)
	- [Notes - Optimisation for Data Science HT25, Misc](https://ollybritton.com/notes/uni/part-b/ht25/optimisation-for-data-science/notes/misc/#Multivariate%20chain%20rules)

### Problem Sheets
- [Sheet 1](https://courses.maths.ox.ac.uk/pluginfile.php/118347/mod_assign/introattachment/0/assignment1.pdf), [solutions to A&C](https://courses.maths.ox.ac.uk/pluginfile.php/118347/mod_assign/introattachment/0/assignment1_solutionsAC.pdf), [redacted](https://ollybritton.com/404)
- [Sheet 2](https://courses.maths.ox.ac.uk/pluginfile.php/118348/mod_assign/introattachment/0/assignment2.pdf), [solutions to A,B,C](https://courses.maths.ox.ac.uk/pluginfile.php/118348/mod_assign/introattachment/0/assignment2_solutions_ABC.pdf), [redacted](https://ollybritton.com/404)
- [Sheet 3](https://courses.maths.ox.ac.uk/pluginfile.php/118351/mod_assign/introattachment/0/assignment3.pdf), [solutions to A&C](https://courses.maths.ox.ac.uk/pluginfile.php/118351/mod_assign/introattachment/0/assignment3_solutionsAC.pdf), [redacted](https://ollybritton.com/404)
- [Sheet 4](https://courses.maths.ox.ac.uk/pluginfile.php/118352/mod_assign/introattachment/0/assignment4.pdf), [solutions to A,B,C](https://courses.maths.ox.ac.uk/pluginfile.php/118352/mod_assign/introattachment/0/assignment4_solutionsABC.pdf), [redacted](https://ollybritton.com/404)

### Questions / To-Do List
- [ ] Implement proof that "each MNIST digit class is contained on a locally less than 15 dimensional space"
- [ ] It is not known whether the optimal $\epsilon^{-d/n}$ width can be achieved using just one activation function, although it is possible with two
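
The first to-do item is about local dimension. Below is a minimal sketch of one common way to estimate it, local PCA. This is my choice of estimator and not necessarily the method behind the quoted MNIST claim. The function name `local_pca_dimension`, the neighbourhood size `k=50` and the 95% explained-variance cutoff are all hypothetical choices. A synthetic curved 2D surface embedded in $\mathbb{R}^{10}$ stands in for a digit class, so the code runs without downloading MNIST.

```python
import numpy as np


def local_pca_dimension(X, center_idx, k=50, var_threshold=0.95):
    """Estimate the local dimension of the point cloud X near X[center_idx].

    Hypothetical helper: takes the k nearest (Euclidean) neighbours of the
    chosen point, centres them, and returns the number of principal
    components needed to explain at least `var_threshold` of their variance.
    """
    dists = np.linalg.norm(X - X[center_idx], axis=1)
    neighbours = X[np.argsort(dists)[:k]]  # includes the point itself
    centred = neighbours - neighbours.mean(axis=0)
    # Squared singular values are proportional to the variance along each principal axis
    variances = np.linalg.svd(centred, compute_uv=False) ** 2
    explained = np.cumsum(variances) / variances.sum()
    # First index where the cumulative explained variance reaches the threshold
    return int(np.searchsorted(explained, var_threshold)) + 1


# Synthetic stand-in for one digit class: a curved 2D surface (a cylinder)
# isometrically embedded in R^10, plus a little Gaussian noise.
rng = np.random.default_rng(0)
n_points = 2000
u = rng.uniform(0, 2 * np.pi, n_points)
v = rng.uniform(0, 1, n_points)
surface = np.stack([np.cos(u), np.sin(u), v], axis=1)
Q, _ = np.linalg.qr(rng.standard_normal((10, 3)))  # 10x3 matrix with orthonormal columns
X = surface @ Q.T + 1e-3 * rng.standard_normal((n_points, 10))

print(local_pca_dimension(X, center_idx=0))
```

The estimate should come out as 2, the dimension of the surface, rather than 10, the ambient dimension. To apply this to MNIST, you would replace `X` with the flattened 784-dimensional images of a single digit class. The result depends on `k` and on the cutoff, so it is worth sweeping both.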

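The $\epsilon^{-d/n}$ rate in the to-do list above grows very quickly with the input dimension $d$. I am assuming it refers to the usual setting of approximating functions of $d$ variables with $n$ bounded derivatives, as in the Yarotsky paper from the Week 2 reading. The arithmetic below ignores constants and logarithmic factors. It works with base-10 exponents because $\epsilon^{-d/n}$ overflows a float for large $d$. The helper `log10_size` and the choices $\epsilon = 10^{-2}$ and $n = 2$ are hypothetical and purely illustrative. With these choices, $d = 784$ (raw MNIST pixels) gives about $10^{784}$, while $d = 15$ (the local dimension quoted in the first to-do item) gives about $10^{15}$.

```python
import math


def log10_size(eps: float, d: int, n: int) -> float:
    """Order of magnitude (log10) of eps**(-d/n), ignoring constants and log factors."""
    return (d / n) * math.log10(1 / eps)


eps, n = 1e-2, 2  # illustrative: 1% accuracy, two bounded derivatives
for d in (1, 2, 15, 784):  # 784 = 28 * 28 MNIST pixels; 15 = local dimension from the to-do item
    print(f"d = {d:>3}: about 10^{log10_size(eps, d, n):.0f}")
```
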
---
Olly Britton — https://ollybritton.com. Machine-readable index: https://ollybritton.com/llms.txt