Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation


  • [[Course - Theories of Deep Learning MT25]]

  • How does the behaviour of randomly initialised neural networks depend on the choice of random numbers and the choice of activation function?
  • 2
    • The layer-4 activation values stay very close to zero for a long time during training
  • 3
    • Compute a histogram of the activations at each layer; it is approximately Gaussian
    • A better choice of the variance of the weight initialisation, $\sigma_w^2$, keeps the variance of the activations approximately constant across layers (see the sketch after this list)
    • With the original choice, the variance tends towards zero as you go through the layers
  • 4
    • Do the same thing with the
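
A minimal sketch of the experiment behind these slides (my own illustration, not the lecture's code, assuming a tanh nonlinearity): push Gaussian inputs through a deep network and track the empirical variance of the activations layer by layer. Heuristically, with i.i.d. zero-mean weights of variance $\sigma_w^2 / n$ feeding $n$ inputs into each unit, the pre-activation variance is roughly $\sigma_w^2$ times the previous activation variance, so a well-chosen $\sigma_w$ stops the signal from shrinking (or blowing up) with depth. The width, depth, and the two weight scales below (0.5 and 2.0) are illustrative choices.

```python
import numpy as np

def layer_variances(sigma_w, depth=50, width=500, n_samples=200, phi=np.tanh):
    """Push random inputs through a deep MLP with i.i.d. N(0, sigma_w^2 / width)
    weights and record the empirical variance of the activations at each layer."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((n_samples, width))
    variances = []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        x = phi(x @ W)                 # activations of the next layer
        variances.append(x.var())      # empirical variance over samples and units
    return variances

# Illustrative scales: a small sigma_w drives the activation variance towards zero
# with depth, while a larger one settles near a roughly constant nonzero level.
for sigma_w in (0.5, 2.0):
    v = layer_variances(sigma_w)
    print(f"sigma_w = {sigma_w}: variance at layers 1/10/50 = "
          f"{v[0]:.4f} / {v[9]:.4f} / {v[-1]:.4f}")
```

One could also histogram the entries of `x` at a fixed layer to see the roughly Gaussian shape noted above.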

Papers mentioned

  • Glorot, X. and Bengio, Y. (2010). "Understanding the difficulty of training deep feedforward neural networks." AISTATS 2010. http://proceedings.mlr.press/v9/glorot10a.html



Related posts