Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth
Consider the feedforward network with one hidden layer:
- Input: $h _ 1 = x \in \mathbb R^n$
- Hidden layer: $h _ 2 = \phi(W^{(1)} h _ 1 + b^{(1)}) \in \mathbb R^m$
- Output: $H(x, \theta) = \alpha^\top h _ 2 = \sum^m _ {i = 1} \alpha _ i \phi(w _ i^\top x + b _ i)$, where $w _ i^\top$ is the $i$-th row of $W^{(1)}$ and $b _ i$ the $i$-th entry of $b^{(1)}$
with $\phi(t) \in [0,1]$. @State a theorem of Cybenko (1989) which characterises the expressivity of these networks.
Let $\phi$ be a continuous monotone function with $\lim _ {t \to -\infty} \phi(t) = 0$ and $\lim _ {t \to \infty} \phi(t) = 1$ (a sigmoidal function). Then the set of functions of the form $H(x; \theta) = \sum^m _ {i = 1} \alpha _ i \phi(w _ i^\top x + b _ i)$ is dense in $C _ n([0, 1])$, the space of continuous functions on the unit cube $[0, 1]^n$.
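The density claim can be made plausible with a concrete construction: a very steep sigmoid behaves like a step function, and a staircase of $m$ steps tracks any continuous target on $[0,1]$. Below is a minimal numerical sketch of this idea in one dimension (not from the lecture; it assumes NumPy, and the target $f$, the width $m$, and the slope are illustrative choices).

```python
import numpy as np

def sigmoid(t):
    # logistic function written via tanh to avoid overflow for large |t|
    return 0.5 * (1.0 + np.tanh(0.5 * t))

# Target: any continuous function on [0, 1]; f(x) = sin(2*pi*x) is just an example.
def f(x):
    return np.sin(2 * np.pi * x)

# Staircase construction of H(x; theta) = sum_i alpha_i * phi(w_i x + b_i):
# each hidden unit is a very steep sigmoid that switches on near a grid point c_i,
# and alpha_i is the increment of f between consecutive grid points, so the sum
# telescopes to roughly f evaluated at the nearest grid point below x.
m = 200                        # number of hidden units (width)
steep = 2000.0                 # common slope w_i; large => sigmoid ~ step function
c = np.linspace(0.0, 1.0, m)   # switch locations; b_i = -steep * c_i
alpha = np.diff(np.concatenate(([0.0], f(c))))  # alpha_i = f(c_i) - f(c_{i-1})

def H(x):
    hidden = sigmoid(steep * (x[:, None] - c[None, :]))  # shape (len(x), m)
    return hidden @ alpha

xs = np.linspace(0.0, 1.0, 1000)
print("max |H - f| on [0, 1]:", np.max(np.abs(H(xs) - f(xs))))  # shrinks as m grows
```

The error of this construction is governed by the modulus of continuity of $f$ over one grid spacing, so it goes to zero as $m \to \infty$, consistent with the theorem.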
The theorem says that a fully connected net with a single hidden layer suffices to approximate any continuous function on the cube to arbitrary accuracy, provided the width $m$ is large enough (although [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]] shows that some functions computed easily by deep networks require $m$ to be exponentially large in the depth; sketched below).
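A flavour of Telgarsky's depth-separation argument, as a rough sketch rather than the paper's full statement (assumes NumPy; the helpers `tent` and `deep_sawtooth` and the piece-counting check are illustrative, not from the paper): composing the tent map $k$ times is computable by a ReLU network of depth about $k$ and width 2, yet it has $2^k$ linear pieces, while a one-hidden-layer ReLU network with $m$ units is piecewise linear with at most $m + 1$ pieces, so matching it forces $m$ to be exponential in $k$.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def tent(x):
    # Tent map on [0, 1] as a tiny ReLU layer: 2*relu(x) - 4*relu(x - 0.5)
    # equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, k):
    # k-fold composition: a ReLU net of depth ~k and width 2 whose output
    # oscillates 2^(k-1) times, i.e. has 2^k linear pieces on [0, 1].
    for _ in range(k):
        x = tent(x)
    return x

# Dyadic grid: all kinks of the k-fold composition (multiples of 2^-k, k <= 12)
# land exactly on grid points, so counting sign changes of the discrete slope
# recovers the exact number of linear pieces.
xs = np.linspace(0.0, 1.0, 2**12 + 1)
for k in [1, 3, 5, 8]:
    slopes = np.sign(np.diff(deep_sawtooth(xs, k)))
    pieces = 1 + int(np.sum(slopes[1:] != slopes[:-1]))
    print(f"depth {k}: {pieces} linear pieces (2^k = {2**k})")
# A one-hidden-layer ReLU net with m units has at most m + 1 linear pieces,
# so reproducing the depth-k sawtooth forces m to grow like 2^k.
```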
Papers mentioned
- Approximation by superpositions of a sigmoidal function, Cybenko (1989)
- Approximation capabilities of multilayer feedforward networks, Hornik (1991)
- [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]]U ⭐️
- [[Paper - Error bounds for approximations with deep ReLU networks, Yarotsky]]U ⭐️
- [[Paper - Optimal nonlinear approximation, DeVore (1989)]]U
- Rational neural networks
- Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks
- Nonlinear Approximation and (Deep) ReLU Networks
- Optimal Approximation with Sparsely Connected Deep Neural Networks