Paper - Exponential expressivity in deep neural networks through transient chaos (2016)
- Full title: Exponential expressivity in deep neural networks through transient chaos
- Author(s): Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli
- Year: 2016
- Link: https://arxiv.org/pdf/1606.05340
- Relevant for:
- Links to:
Summary
- We want to understand why deep learning is so effective; the paper aims to make two pieces of intuition precise:
- Deep networks can express highly complex functions that shallow networks with the same number of neurons cannot
- Deep neural networks can disentangle highly curved manifolds in the input space into flattened manifolds in hidden space
- Previous work (e.g. [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]]) has shown that there exist functions that require an exponential number of neurons in shallow networks but only a polynomial number of neurons in deep networks
- However, this prior theory is limited, applying only to specific nonlinearities
- Studies random deep feedforward networks, with weights drawn i.i.d. from zero-mean Gaussians with variance $\sigma_w^2 / N_{l-1}$ and biases with variance $\sigma_b^2$.
- First considers how the squared length of an input changes as it propagates through the layers of the network; in the wide-network (mean field) limit this is summarised by a “variance map” $\mathcal V$: $q^l = \mathcal V(q^{l-1}) = \sigma_w^2 \int \mathcal{D}z \, \phi\left(\sqrt{q^{l-1}}\, z\right)^2 + \sigma_b^2$
- Then considers how the similarity (correlation) between two inputs changes as they propagate through the layers; summarises this in terms of a “covariance map” $\mathcal C$, and you recover the variance map by looking at the diagonal (both maps are iterated in the first sketch after this list)
- You can interpret the local behaviour near the fixed point through the Jacobian of the layer-to-layer map: its mean squared singular value $\chi_1 = \sigma_w^2 \int \mathcal{D}z \, \phi'\left(\sqrt{q^*}\, z\right)^2$ determines whether nearby inputs converge ($\chi_1 < 1$, ordered phase) or diverge ($\chi_1 > 1$, chaotic phase) (the second sketch after this list computes $\chi_1$)
- Then considers how one-dimensional manifolds in the input space are mapped through the layers of the network (the third sketch after this list runs this experiment numerically)
- Can consider the extrinsic curvature of this curve
- Can also look at the length of its image in the ambient Euclidean space
- Can look at the Grassmannian length, which measures the length of the curve traced out by its unit tangent direction (i.e. how much the curve turns), so it captures curvature rather than just Euclidean extent
- Then they prove that shallow networks cannot achieve exponential expressivity, by deriving a general upper bound on how much a single hidden layer can increase the Euclidean length of a curve; matching the exponential-in-depth growth of a deep network therefore requires an exponentially large width
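A minimal sketch of how the variance and covariance (correlation) maps above can be iterated numerically for a tanh network, estimating the Gaussian integrals by Monte Carlo. The function names, the choice $\sigma_w = 4.0$, $\sigma_b = 0.3$ (a chaotic-regime setting), and the sample count are my own illustrative choices, not the paper's.

```python
# Sketch of the mean-field variance map V and correlation map C for a random
# tanh network. The Gaussian integrals are estimated by Monte Carlo; parameter
# values (sigma_w, sigma_b, sample count) are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 200_000))  # shared samples for the Gaussian integrals

def variance_map(q, sigma_w=4.0, sigma_b=0.3, phi=np.tanh):
    """q^l = sigma_w^2 * E_z[ phi(sqrt(q^{l-1}) z)^2 ] + sigma_b^2."""
    return sigma_w**2 * np.mean(phi(np.sqrt(q) * Z[0]) ** 2) + sigma_b**2

def correlation_map(c, q_star, sigma_w=4.0, sigma_b=0.3, phi=np.tanh):
    """One layer-to-layer update of the correlation between two inputs,
    assuming both already sit at the fixed-point length q_star."""
    u1 = np.sqrt(q_star) * Z[0]
    u2 = np.sqrt(q_star) * (c * Z[0] + np.sqrt(1.0 - c**2) * Z[1])
    q12 = sigma_w**2 * np.mean(phi(u1) * phi(u2)) + sigma_b**2
    return q12 / variance_map(q_star, sigma_w, sigma_b, phi)

# Iterate the variance map to its fixed point q*, then watch an initial
# correlation of 0.9 evolve layer by layer.
q = 1.0
for _ in range(50):
    q = variance_map(q)
c, cs = 0.9, [0.9]
for _ in range(10):
    c = correlation_map(c, q)
    cs.append(c)
print("q* ≈ %.3f" % q)
print("correlation per layer:", ["%.3f" % x for x in cs])
```

With these settings the correlation settles to a fixed point well below 1, i.e. initially similar inputs decorrelate with depth, which is the chaotic-regime behaviour described above.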
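A quick check of the Jacobian-based quantity $\chi_1$ from the bullet above: it is the slope of the correlation map at $c = 1$ and, equivalently, the mean squared singular value of the layer-to-layer Jacobian at the fixed point. Again a sketch with Monte Carlo integrals and illustrative parameter values of my own choosing.

```python
# Sketch: compute chi_1 = sigma_w^2 * E_z[ phi'(sqrt(q*) z)^2 ] at the length
# fixed point q*. chi_1 > 1 marks the chaotic phase, chi_1 < 1 the ordered
# phase. Parameter values are illustrative only.
import numpy as np

z = np.random.default_rng(0).standard_normal(200_000)

def fixed_point_and_chi1(sigma_w, sigma_b, n_iter=50):
    q = 1.0
    for _ in range(n_iter):  # iterate the variance map to its fixed point q*
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    dphi = 1.0 - np.tanh(np.sqrt(q) * z) ** 2  # tanh'(h) = 1 - tanh(h)^2
    chi1 = sigma_w**2 * np.mean(dphi**2)
    return q, chi1

for sigma_w in (0.5, 1.0, 4.0):
    q_star, chi1 = fixed_point_and_chi1(sigma_w, sigma_b=0.3)
    phase = "chaotic" if chi1 > 1 else "ordered"
    print(f"sigma_w={sigma_w:>3}:  q* ≈ {q_star:.2f},  chi_1 ≈ {chi1:.2f}  ({phase})")
```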
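A sketch of the curve-propagation experiment described above: a great circle in input space is pushed through a deep random tanh network and the Euclidean length of its image is measured at each layer. I track post-activation curves and use illustrative widths and parameters; the paper's exact conventions may differ.

```python
# Sketch of the curve-propagation experiment: push a great circle in input
# space through a deep random tanh network and track the Euclidean length of
# its image layer by layer. With large sigma_w (chaotic regime) the length
# grows roughly exponentially with depth. Widths and parameters illustrative.
import numpy as np

rng = np.random.default_rng(1)
N, depth = 1000, 10
sigma_w, sigma_b = 4.0, 0.3

# A circle of radius sqrt(N*q0) spanned by two random orthonormal directions.
# Increase the number of theta samples for deeper / more chaotic networks.
thetas = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
u0, u1 = np.linalg.qr(rng.standard_normal((N, 2)))[0].T
q0 = 1.0
x = np.sqrt(N * q0) * (np.outer(np.cos(thetas), u0) + np.outer(np.sin(thetas), u1))

def euclidean_length(points):
    """Sum of segment lengths along the (closed) discretised curve."""
    diffs = np.diff(np.vstack([points, points[:1]]), axis=0)
    return np.linalg.norm(diffs, axis=1).sum()

lengths = [euclidean_length(x)]
for layer in range(depth):
    W = rng.standard_normal((N, N)) * sigma_w / np.sqrt(N)
    b = rng.standard_normal(N) * sigma_b
    x = np.tanh(x @ W.T + b)  # affine map, then pointwise nonlinearity
    lengths.append(euclidean_length(x))

print(["%.0f" % L for L in lengths])
```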
Flashcards
Questions
- Could you extend this to a Bayesian Neural Network (BNN) setting?
- One (the?) advantage of BNNs is that they give you an estimate of the predictive distribution rather than just a single prediction. One way of looking at this paper is that they are considering quantitative measures of how a one-dimensional manifold in the input space gets sufficiently (i.e. exponentially) jumbled up when it’s passed through a random network.
- For BNNs, you would instead be looking at how complex the distribution over the values of each hidden layer becomes, probably (?) recovering the same results if you were to take the MLE of the distribution at each layer.