Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation
- How does the behaviour of randomly initialised neural networks depend on the choice of random numbers and the choice of activation function?
- 2
    - The activations at layer 4 stay very close to zero for a long time
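The collapse of deep-layer activations can be reproduced with a small sketch (my assumptions, not from the lecture: a 10-layer fully connected tanh network of width 500, inputs drawn from N(0, 1), weights drawn from N(0, 0.01²)):

```python
import numpy as np

# Minimal sketch (assumed setup): 10-layer width-500 tanh network,
# inputs ~ N(0, 1), weights ~ N(0, 0.01^2). With such a small weight
# scale, the per-layer pre-activation variance is multiplied by roughly
# n * sigma^2 = 500 * 1e-4 = 0.05 each layer, so the activations in
# deeper layers collapse towards zero.
rng = np.random.default_rng(0)
width, depth, n_samples = 500, 10, 200

acts = rng.standard_normal((width, n_samples))
stds = []
for _ in range(depth):
    W = rng.normal(0.0, 0.01, size=(width, width))
    acts = np.tanh(W @ acts)
    stds.append(acts.std())

print(f"layer 1 std: {stds[0]:.3f}, layer {depth} std: {stds[-1]:.2e}")
```

The printed standard deviations shrink by roughly a constant factor per layer, matching the "values very close to zero" observation above.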
 
- 3
    - Computing a histogram of the activations shows they are approximately Gaussian
- A better choice of the variance $\sigma _ w ^2$ used to initialise the weights at each layer keeps the variance of the activations approximately constant across layers
- Previously the variance tended towards zero as you went through the layers
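A sketch of this comparison (assumptions on my part: the same 10-layer width-500 tanh network as above, and the "better choice" taken to be the Glorot/Xavier-style scale $\sigma _ w = 1/\sqrt{n_{\text{in}}}$, compared against a fixed $\sigma _ w = 0.01$):

```python
import numpy as np

# Assumed setup: 10-layer width-500 tanh network, inputs ~ N(0, 1).
# We compare a fixed small weight std against sigma_w = 1/sqrt(n_in),
# which makes the pre-activation variance n * sigma_w^2 = 1, so the
# activation variance stays roughly constant through the layers.
rng = np.random.default_rng(0)
width, depth = 500, 10

def layer_stds(sigma_w):
    acts = rng.standard_normal((width, 200))
    stds = []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w, size=(width, width))
        acts = np.tanh(W @ acts)
        stds.append(acts.std())
    return stds

fixed = layer_stds(0.01)                   # variance decays towards zero
scaled = layer_stds(1.0 / np.sqrt(width))  # variance roughly constant
print([f"{s:.2f}" for s in scaled])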
 
- 4
    - Do the same thing with the
 
Papers mentioned
- http://proceedings.mlr.press/v9/glorot10a.html