Lecture - Theories of Deep Learning MT25, VI, Controlling the variance of the Jacobian's spectrum
[[Course - Theories of Deep Learning MT25]]
- This lecture turns to the spectrum of the network's input-output Jacobian at initialisation; the motivation is empirical evidence that this spectrum strongly affects how easy the network is to train.
- Results from random matrix theory can be used to calculate the distribution of this spectrum.
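- To make the object of study concrete, below is a minimal NumPy sketch (the function name, network size, and defaults are my own illustrative choices, not from the lecture) that forms the input-output Jacobian of a random tanh network at initialisation via the chain rule, J = D_L W_L ⋯ D_1 W_1 with D_l = diag(φ'(h_l)), and returns its singular values. Comparing Gaussian with orthogonal weight initialisation gives a feel for how the initialisation changes the spread of the spectrum.
```python
import numpy as np

def jacobian_singular_values(depth=20, width=500, sigma_w=1.0,
                             init="gaussian", seed=0):
    """Singular values of the input-output Jacobian of a random tanh net
    at initialisation: J = D_L W_L ... D_1 W_1, D_l = diag(tanh'(h_l))."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)            # random input vector
    J = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            # random orthogonal weights (QR of a Gaussian matrix), scaled by sigma_w
            W, _ = np.linalg.qr(rng.standard_normal((width, width)))
            W = sigma_w * W
        else:
            # i.i.d. Gaussian weights with variance sigma_w^2 / width
            W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        h = W @ x                              # pre-activations of this layer
        D = np.diag(1.0 - np.tanh(h) ** 2)     # tanh'(h)
        J = D @ W @ J                          # chain rule, layer by layer
        x = np.tanh(h)                         # continue the forward pass
    return np.linalg.svd(J, compute_uv=False)

# Compare the spread of the Jacobian spectrum under the two initialisations.
for init in ("gaussian", "orthogonal"):
    s = jacobian_singular_values(init=init)
    print(init, float(s.max()), float(s.min()))
```
  With orthogonal weights and σ_w tuned near criticality, the singular values stay much closer to 1 as depth grows (dynamical isometry), which is the behaviour the spectral-universality results below characterise.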
Papers mentioned
- [[Paper - Exponential expressivity in deep neural networks through transient chaos (2016)]]
- [[Paper - The Emergence of Spectral Universality in Deep Networks (2018)]]
- [[Paper - Activation function design for deep networks: linearity and effective initialisation]]
Further associated reading
- Identifying natural depth scales of information propagation: https://arxiv.org/pdf/1611.01232.pdf
- Further details on the role of activation functions: https://arxiv.org/pdf/1902.06853.pdf
- Principles for selecting activation functions: https://arxiv.org/pdf/2105.07741.pdf
- Early results on correlation of inputs (Chapter 2 in particular): https://www.cs.toronto.edu/~radford/ftp/thesis.pdf
- Rigorous treatment of the Gaussian process perspective, infinite width: https://arxiv.org/pdf/1711.00165.pdf
- Rigorous treatment of the Gaussian process perspective, finite width: https://arxiv.org/pdf/1804.11271.pdf
- Higher-order (finite-width) corrections and the depth-to-width scaling regime: https://arxiv.org/pdf/2106.10165.pdf
- Specifics for random ReLU nets:
- https://arxiv.org/pdf/1801.03744.pdf
- https://arxiv.org/pdf/1803.01719.pdf