# Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth

> Source: https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/depth/ · Updated: 2025-10-19 · Tags: uni, lecture


Consider the feedforward network with one hidden layer:

- Input: $h_1 = x \in \mathbb R^n$
- Hidden layer: $h_2 = \phi(W^{(1)} h_1 + b^{(1)}) \in \mathbb R^m$
- Output: $H(x; \theta) = \alpha^\top h_2 = \sum^m_{i = 1} \alpha_i \phi(w_i^\top x + b_i)$, where $w_i^\top$ is the $i$-th row of $W^{(1)}$ and $b_i$ is the $i$-th entry of $b^{(1)}$
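
The definition above can be evaluated directly. The following is a minimal sketch in plain Python, assuming the logistic sigmoid as $\phi$ (one admissible choice, not the only one); the names `sigmoid` and `shallow_net` and the example parameter values are illustrative and not from the lecture.

```python
import math

def sigmoid(t):
    """Logistic sigmoid: continuous, monotone, with values in (0, 1)."""
    # Branch on the sign of t so that math.exp never overflows.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def shallow_net(x, W, b, alpha, phi=sigmoid):
    """Evaluate H(x; theta) = sum_i alpha_i * phi(w_i^T x + b_i).

    W is a list of m rows (each of length n), b and alpha have length m.
    """
    total = 0.0
    for w_i, b_i, a_i in zip(W, b, alpha):
        pre_activation = sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i
        total += a_i * phi(pre_activation)
    return total

# Arbitrary example with n = 2 inputs and m = 3 hidden units.
W = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
b = [0.0, -0.5, 1.0]
alpha = [1.0, -2.0, 0.5]
value = shallow_net([0.0, 0.0], W, b, alpha)
```

At $x = 0$ the pre-activations reduce to the biases, so `value` equals $\phi(0) - 2\phi(-0.5) + 0.5\,\phi(1)$.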

with $\phi(t) \in [0,1]$. @State a theorem of Cybenko (1989) which describes the expressivity of these types of networks.::

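Let $\phi$ be a continuous monotone function with $\lim_{t \to -\infty} \phi(t) = 0$ and $\lim_{t \to \infty} \phi(t) = 1$. Then the set of functions of the form $H(x; \theta) = \sum^m_{i = 1} \alpha_i \phi(w_i^\top x + b_i)$, taken over all $m \in \mathbb N$ and all parameters $\theta = (\alpha_i, w_i, b_i)_{i=1}^m$, is dense in $C([0, 1]^n)$, the space of continuous functions on $[0,1]^n$ with the supremum norm.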

In other words, a fully connected net with a single hidden layer is sufficient to approximate any continuous function on $[0,1]^n$ to arbitrary accuracy, provided $m$ is large enough (although [Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-representation-benefits-of-deep-feedforward-networks-telgarsky-2015/) shows that some functions require $m$ to be exponentially large).
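
To make "provided $m$ is large enough" concrete, here is a hedged one-dimensional illustration: a staircase of steep sigmoids that tracks a continuous target on $[0,1]$. This is not Cybenko's proof, only a simple construction for $n = 1$. The function names, the target $\sin(2\pi x)$, the steepness factor, and the large bias used to imitate a constant term are all assumptions chosen for the sketch.

```python
import math

def sigmoid(t):
    # Numerically stable logistic sigmoid.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def staircase_params(f, m, steepness=20.0):
    """Return (alpha, w, b) for m + 1 sigmoid units approximating f on [0, 1]."""
    # Unit 0 has zero weight and a large bias, so it acts as the constant f(0).
    alpha, w, b = [f(0.0)], [0.0], [40.0]
    k = steepness * m  # steeper sigmoids on finer grids
    for j in range(1, m + 1):
        left, right = (j - 1) / m, j / m
        alpha.append(f(right) - f(left))   # jump of f across the j-th cell
        w.append(k)
        b.append(-k * (left + right) / 2)  # sigmoid centred at the cell midpoint
    return alpha, w, b

def H(x, params):
    """H(x; theta) = sum_i alpha_i * sigmoid(w_i * x + b_i) for scalar x."""
    alpha, w, b = params
    return sum(a * sigmoid(wi * x + bi) for a, wi, bi in zip(alpha, w, b))

def sup_error(f, m, grid=1000):
    """Maximum of |f - H| over an evenly spaced grid on [0, 1]."""
    params = staircase_params(f, m)
    return max(abs(f(t / grid) - H(t / grid, params)) for t in range(grid + 1))

def target(x):
    return math.sin(2 * math.pi * x)

errors = {m: sup_error(target, m) for m in (10, 50, 200)}
```

For a Lipschitz target the error of this construction is controlled by how much $f$ varies across one cell of width $1/m$, so `errors` should shrink as $m$ grows. The construction needs $m$ proportional to the inverse of the desired accuracy even in one dimension, which gives a feel for why width alone can be an expensive route to expressivity.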

### Papers mentioned
- [Approximation by superpositions of a sigmoidal function, Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274)
- [Approximation capabilities of multilayer feedforward networks, Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T)
- [Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-representation-benefits-of-deep-feedforward-networks-telgarsky-2015/) ⭐️
- [Paper - Error bounds for approximations with deep ReLU networks, Yarotsky (2016)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-error-bounds-for-approximations-with-deep-relu-networks-yarotsky-2016/) ⭐️
- [Paper - Optimal nonlinear approximation, DeVore (1989)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-optimal-nonlinear-approximation-devore-1989/)
- [Rational neural networks](https://arxiv.org/abs/2004.01902)
- [Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks](https://arxiv.org/abs/2301.13091)
- [Nonlinear Approximation and (Deep) ReLU Networks](https://arxiv.org/pdf/1905.02199)
- [Optimal Approximation with Sparsely Connected Deep Neural Networks](https://www.mins.ee.ethz.ch/pubs/files/deep-approx-18.pdf)

---
Olly Britton, https://ollybritton.com