Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation
- How does the behaviour of randomly initialised neural networks depend on the choice of random numbers and the choice of activation function?
- Slide 2
- The layer-4 activation values remain very close to zero for a long time
- Slide 3
- Computing a histogram of the activations shows they are approximately Gaussian
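A minimal sketch of this observation (the width, number of samples, and weight scale below are assumed for illustration, not taken from the lecture): each pre-activation is a sum of many weakly dependent terms, so by the central limit theorem its histogram looks approximately Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                       # layer width (assumed)
sigma_w = 1.0 / np.sqrt(n)    # weight standard deviation (assumed)

# Random inputs pushed through one randomly initialised linear layer.
x = rng.standard_normal((n, 2000))
z = rng.normal(0.0, sigma_w, size=(n, n)) @ x

# Each entry of z is a sum of n independent products, so its empirical
# distribution is close to N(0, 1) for this choice of sigma_w.
counts, edges = np.histogram(z.ravel(), bins=50)
print(z.mean(), z.std())
```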
- A better choice of the variance $\sigma _ w ^2$ of the random weight initialisation keeps the variance of the activations approximately constant across layers
- Previously, the variance tended towards zero as you went deeper through the layers
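The effect above can be sketched numerically (width, depth, and the two weight scales are assumed values, not the lecture's exact experiment): with a weight standard deviation that is too small the activations collapse towards zero with depth, while scaling it as $1/\sqrt{n}$ keeps the activation spread roughly stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 500, 10  # layer width and network depth (assumed)

def layer_stds(sigma_w):
    """Forward random inputs through a deep tanh network and record
    the empirical standard deviation of the activations at each layer."""
    h = rng.standard_normal((n, 200))  # 200 random input samples
    stds = []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w, size=(n, n))
        h = np.tanh(W @ h)
        stds.append(h.std())
    return stds

stds_bad = layer_stds(0.3 / np.sqrt(n))   # too small: activations shrink layer by layer
stds_good = layer_stds(1.0 / np.sqrt(n))  # variance stays roughly constant with depth
print([round(s, 4) for s in stds_bad])
print([round(s, 4) for s in stds_good])
```

With the under-scaled initialisation, each layer multiplies the activation variance by a factor well below one, so it decays geometrically with depth; the $1/\sqrt{n}$ scaling makes that factor approximately one.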
- Slide 4
- Do the same thing with the
Papers mentioned
- http://proceedings.mlr.press/v9/glorot10a.html