Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth
Consider the feedforward network with one hidden layer:
- Input: $h _ 1 = x \in \mathbb R^n$
- Hidden layer: $h _ 2 = \phi(W^{(1)} h _ 1 + b^{(1)}) \in \mathbb R^m$
- Output: $H(x, \theta) = \alpha^\top h _ 2 = \sum^m _ {i = 1} \alpha _ i \phi(w _ i^\top x + b _ i)$, where $w _ i^\top$ is the $i$-th row of $W^{(1)}$ and $b _ i$ the $i$-th entry of $b^{(1)}$
with $\phi(t) \in [0,1]$. @State a theorem of Cybenko (1989) which characterises the expressivity of these networks.
Let $\phi$ be a continuous monotone function with $\lim _ {t \to -\infty} \phi(t) = 0$ and $\lim _ {t \to \infty} \phi(t) = 1$ (a sigmoidal function). Then the set of functions of the form $H(x; \theta) = \sum^m _ {i = 1} \alpha _ i \phi(w _ i^\top x + b _ i)$ is dense in $C _ n([0, 1])$, the space of continuous functions on the unit cube $[0, 1]^n$.
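The density claim can be made plausible with a concrete construction: a very steep sigmoid behaves like a step function, and a staircase of $m$ steps tracks any continuous target on $[0,1]$. Below is a minimal numerical sketch of this idea in one dimension (not from the lecture; it assumes NumPy, and the target $f$, the width $m$, and the slope are illustrative choices).

```python
import numpy as np

def sigmoid(t):
    # logistic function written via tanh to avoid overflow for large |t|
    return 0.5 * (1.0 + np.tanh(0.5 * t))

# Target: any continuous function on [0, 1]; f(x) = sin(2*pi*x) is just an example.
def f(x):
    return np.sin(2 * np.pi * x)

# Staircase construction of H(x; theta) = sum_i alpha_i * phi(w_i x + b_i):
# each hidden unit is a very steep sigmoid that switches on near a grid point c_i,
# and alpha_i is the increment of f between consecutive grid points, so the sum
# telescopes to roughly f evaluated at the nearest grid point below x.
m = 200                        # number of hidden units (width)
steep = 2000.0                 # common slope w_i; large => sigmoid ~ step function
c = np.linspace(0.0, 1.0, m)   # switch locations; b_i = -steep * c_i
alpha = np.diff(np.concatenate(([0.0], f(c))))  # alpha_i = f(c_i) - f(c_{i-1})

def H(x):
    hidden = sigmoid(steep * (x[:, None] - c[None, :]))  # shape (len(x), m)
    return hidden @ alpha

xs = np.linspace(0.0, 1.0, 1000)
print("max |H - f| on [0, 1]:", np.max(np.abs(H(xs) - f(xs))))  # shrinks as m grows
```

The error of this construction is governed by the modulus of continuity of $f$ over one grid spacing, so it goes to zero as $m \to \infty$, consistent with the theorem.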
The theorem says that a fully connected net with a single hidden layer suffices to approximate any continuous function on the cube to arbitrary accuracy, provided the width $m$ is large enough (although [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]] shows that some functions computed easily by deep networks require $m$ to be exponentially large in the depth; sketched below).
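A flavour of Telgarsky's depth-separation argument, as a rough sketch rather than the paper's full statement (assumes NumPy; the helpers `tent` and `deep_sawtooth` and the piece-counting check are illustrative, not from the paper): composing the tent map $k$ times is computable by a ReLU network of depth about $k$ and width 2, yet it has $2^k$ linear pieces, while a one-hidden-layer ReLU network with $m$ units is piecewise linear with at most $m + 1$ pieces, so matching it forces $m$ to be exponential in $k$.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def tent(x):
    # Tent map on [0, 1] as a tiny ReLU layer: 2*relu(x) - 4*relu(x - 0.5)
    # equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, k):
    # k-fold composition: a ReLU net of depth ~k and width 2 whose output
    # oscillates 2^(k-1) times, i.e. has 2^k linear pieces on [0, 1].
    for _ in range(k):
        x = tent(x)
    return x

# Dyadic grid: all kinks of the k-fold composition (multiples of 2^-k, k <= 12)
# land exactly on grid points, so counting sign changes of the discrete slope
# recovers the exact number of linear pieces.
xs = np.linspace(0.0, 1.0, 2**12 + 1)
for k in [1, 3, 5, 8]:
    slopes = np.sign(np.diff(deep_sawtooth(xs, k)))
    pieces = 1 + int(np.sum(slopes[1:] != slopes[:-1]))
    print(f"depth {k}: {pieces} linear pieces (2^k = {2**k})")
# A one-hidden-layer ReLU net with m units has at most m + 1 linear pieces,
# so reproducing the depth-k sawtooth forces m to grow like 2^k.
```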
Papers mentioned
- Approximation by superpositions of a sigmoidal function, Cybenko (1989)
- Approximation capabilities of multilayer feedforward networks, Hornik (1991)
- [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]]U ⭐️
- [[Paper - Error bounds for approximations with deep ReLU networks, Yarotsky]]U ⭐️
- [[Paper - Optimal nonlinear approximation, DeVore (1989)]]U
- Rational neural networks
- Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks
- Nonlinear Approximation and (Deep) ReLU Networks
- Optimal Approximation with Sparsely Connected Deep Neural Networks