# Paper - Optimal nonlinear approximation, DeVore (1989)

> Source: https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-optimal-nonlinear-approximation-devore-1989/ · Updated: 2025-10-22 · Tags: uni, notes

- **Full title**: *Optimal nonlinear approximation*
- **Author(s)**: Ronald A. DeVore, Ralph Howard, Charlies Micchelli
- **Year**: 1989
- **Link**: https://link.springer.com/article/10.1007/BF01171759
- **Relevant for**:
	- [Course - Theories of Deep Learning MT25](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/)
	- [Paper - Error bounds for approximations with deep ReLU networks, Yarotsky (2016)](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-error-bounds-for-approximations-with-deep-relu-networks-yarotsky-2016/)

### Summary
> We introduce a definition of nonlinear n-widths and then determine the n-widths of the unit ball of the Sobolev space W~" in Lq. We prove that in the sense of these widths the manifold of splines of fixed degree with n free knots is optimal for approximating functions in these Sobolev spaces.

Not sure about the whole paper but mainly useful for the following theorem.

### Flashcards
@State the theorem from DeVore et al. (1989) that gives a lower bound on the number of parameters for function approximation where there is a continuous dependence between the function being approximated and the parameters.::

Fix $d, n$. Let $W$ be a positive integer and $\eta : \mathbb R^W \to C([0, 1]^d)$ be any mapping between the space $\mathbb R^W$ and the space $C([0, 1]^d)$. Suppose that there is a continuous map $\mathcal M : F_{d, n} \to \mathbb R^W$ such that $||f - \eta (\mathcal M(f)||_\infty \le \epsilon$ for all $f \in F_{d, n}$. Then $W \ge c\epsilon^{-d/n}$, with some constant $c$ depending only on $n$.

---

Here $F_{d, n}$ is a particular function class, specifically:

- $\|f\|_{\mathcal W^{n,\infty}([0,1]^d)}= \max_{\mathbf n:\,|\mathbf n|\le n}\operatorname*{ess\,sup}_{\mathbf x\in[0,1]^d}\bigl|D^{\mathbf n} f(\mathbf x)\bigr|$ where $\mathbf n = (n_1, \ldots, n_d) \in \{0, 1, \ldots\}^d, |\mathbf n| = n_1 + \cdots + n_d$, i.e. the maximum size of the weak derivative over the domain, ignoring sets of measure $0$.
- $F_{n, d} = \{f \in \mathcal W^{n, \infty}([0, 1]^d) : ||f||_{\mathcal W^{n, \infty}}([0, 1]^d) \le 1\}$

---
Olly Britton — https://ollybritton.com. Machine-readable index: https://ollybritton.com/llms.txt
