Course - Theories of Deep Learning MT25
- Course webpage
- Lecture notes
- 1, Three ingredients of deep learning
- 2, Why deep learning
- 3, Exponential expressivity with depth
- 4, Data classes for which DNNs can overcome the curse of dimensionality
- 5, Controlling the exponential growth of variance and correlation
- 6, Controlling the variance of the Jacobian’s spectrum
- 7, Stochastic gradient descent and its extensions
- 8, Optimization algorithms for training DNNs
- 9, Topology of the loss landscape
- 10, Observations of the loss landscape
- 11, Visualising the filters and response in a CNN
- 12, The scattering transform and into auto-encoders
- 13, Autoencoders
- 14, Generative adversarial networks
- 15, A few things we missed and a summary
- 16, Ingredients for a successful mini-project report
- Guest talk on PINNs
- Lecture recordings
- Other courses this term: Courses MT25
My notes for this course are a little different from my other University Notes: since (at least for now) it is assessed by a mini-project at the end of the term, I'm trying to optimise for understanding[^1] rather than exam grades. For this reason, some of the things I take notes on here might not be covered explicitly in the course (e.g. Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension).
Notes
Lectures
- Lecture - Theories of Deep Learning MT25, I, Three ingredients of deep learning
- Lecture - Theories of Deep Learning MT25, II, Why deep learning
- Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth
- Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality
- Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation
- Lecture - Theories of Deep Learning MT25, VI, Controlling the variance of the Jacobian’s spectrum
- Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions
- Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs
- redacted?
- redacted?
- Lecture - Theories of Deep Learning MT25, XI, Visualising the filters and response in a CNN
- Lecture - Theories of Deep Learning MT25, XII, The scattering transform and into auto-encoders
- Lecture - Theories of Deep Learning MT25, XIII, Autoencoders
- Lecture - Theories of Deep Learning MT25, XIV, Generative adversarial networks
- Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary
- Lecture - Theories of Deep Learning MT25, XVI, Ingredients for a successful mini-project report
Reading List
Each lecture above is annotated with the articles and papers that were mentioned. Once a week, we also receive amount of
- Week 1
- Paper - Gradient-based learning applied to document recognition, LeCun
- Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)
- Any of the papers describing an application of deep learning in Lecture - Theories of Deep Learning MT25, II, Why deep learning
- Week 2
- Week 3
- Activation function design for deep networks: linearity and effective initialisation, Murray
- Exponential expressivity in deep neural networks through transient chaos, Poole
- The emergence of spectral universality in deep networks, Pennington
- Rapid training of deep neural networks without skip connections or normalisation layers using Deep Kernel Shaping, Martens
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Class 1
- Class 2
- Class 3
- The Mathematics of
- Better understand why SGD with momentum outperforms ADADELTA, and follow the explanation given in the paper
- Why the encoder vs decoder distinction?
Related Notes
See:
- Course - Machine Learning MT23
- Course - Uncertainty in Deep Learning MT25
- Course - Geometric Deep Learning HT26
- Course - Continuous Mathematics HT23
- Course - Optimisation for Data Science HT25
Problem Sheets
- Sheet 1, solutions to A&C, redacted?
- Sheet 2, solutions to A,B,C, redacted?
- Sheet 3, solutions to A&C, redacted?
- Sheet 4, solutions to A,B,C, redacted?
Questions / To-Do List
- Implement proof that “each MNIST digit class is contained on a locally less than 15 dimensional space”
- It is not known whether the optimal $\epsilon^{-d/n}$ width can be achieved using just one activation function, although it is possible with two