Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth


Consider the feedforward network with one hidden layer:

  • Input: $h _ 1 = x \in \mathbb R^n$
  • Hidden layer: $h _ 2 = \phi(W^{(1)} h _ 1 + b^{(1)}) \in \mathbb R^m$
  • Output: $H(x; \theta) = \alpha^\top h _ 2 = \sum^m _ {i = 1} \alpha _ i \phi(w _ i^\top x + b _ i)$, where $w _ i^\top$ is the $i$-th row of $W^{(1)}$ and $b _ i$ the $i$-th entry of $b^{(1)}$

with $\phi(t) \in [0,1]$. @State a theorem of Cybenko (1989) which characterises the expressivity of such networks.
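For concreteness, here is a minimal NumPy sketch of this forward map, using the logistic sigmoid as a stand-in for $\phi$ (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(t):
    # A continuous monotone phi with phi(t) -> 0 as t -> -inf and phi(t) -> 1 as t -> +inf.
    return 1.0 / (1.0 + np.exp(-t))

def H(x, W1, b1, alpha):
    # x:     input h_1, shape (n,)
    # W1:    hidden-layer weights W^(1), shape (m, n); its rows are the w_i
    # b1:    hidden-layer biases b^(1), shape (m,)
    # alpha: output weights, shape (m,)
    h2 = sigmoid(W1 @ x + b1)   # hidden layer h_2 = phi(W^(1) h_1 + b^(1))
    return alpha @ h2           # output H(x; theta) = alpha^T h_2

# Example with n = 3 inputs and width m = 5 (random parameters).
rng = np.random.default_rng(0)
n, m = 3, 5
print(H(rng.normal(size=n), rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=m)))
```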


Let $\phi$ be a continuous monotone function with $\lim _ {t \to -\infty} \phi(t) = 0$ and $\lim _ {t \to \infty} \phi(t) = 1$. Then the set of functions of the form $H(x; \theta) = \sum^m _ {i = 1} \alpha _ i \phi(w _ i^\top x + b _ i)$, taken over all $m \in \mathbb N$, $\alpha _ i, b _ i \in \mathbb R$ and $w _ i \in \mathbb R^n$, is dense in $C _ n([0, 1])$, the space of continuous functions on $[0, 1]^n$, with respect to the supremum norm.

In other words, a fully connected network with a single hidden layer is sufficient to approximate any continuous function on $[0, 1]^n$ to arbitrary accuracy, provided the width $m$ is large enough (although [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]] shows that some functions computed by deep networks require $m$ to be exponentially large when expressed in this shallow form).
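The density claim can be seen numerically in a simple 1-D experiment (an illustration only, not Cybenko's argument): draw random hidden weights, fit just the output weights $\alpha$ by least squares to a continuous target on $[0, 1]$, and check the uniform error on a grid as the width $m$ grows. The target, weight scale and grid below are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def f(x):
    # A continuous target on [0, 1].
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 1000)              # dense grid for fitting and evaluation

for m in [4, 16, 64, 256]:
    w = rng.normal(scale=20.0, size=m)       # random hidden weights w_i
    b = -w * rng.uniform(size=m)             # biases placing each sigmoid's transition inside [0, 1]
    Phi = sigmoid(np.outer(x, w) + b)        # hidden activations phi(w_i x + b_i), shape (1000, m)
    alpha, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)   # fit output weights only
    err = np.max(np.abs(Phi @ alpha - f(x)))
    print(f"m = {m:4d}   sup-norm error on grid = {err:.4f}")
```

The hidden weights are never trained here; least squares on $\alpha$ alone already drives the grid error down for this smooth target, which is a much weaker (and purely empirical) observation than the theorem itself.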

Papers mentioned

  • Cybenko, G. (1989), Approximation by Superpositions of a Sigmoidal Function
  • [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]]




Related posts