Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality and Attention modules


Visualise the figure in Hein (2020) describing the “hidden manifold model”.


@Define the hidden manifold model as in Hein (2020) for generating datasets.


\[X = f(CF / \sqrt d) \in \mathbb R^{p,n}\]

where

  • $F \in \mathbb R^{d, n}$ are the $d$ features used to represent the data
  • $C \in \mathbb R^{p, d}$ combines the $d$ features, lifting them into the ambient dimension $p$, where $d < n < p$
  • $f$ is a nonlinear function applied entrywise
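
A minimal sketch of sampling from this model, assuming (illustratively) that $F$ and $C$ have i.i.d. standard Gaussian entries and that $f$ is an entrywise nonlinearity such as $\tanh$; only the structure $X = f(CF / \sqrt d)$ comes from the definition above.

```python
import numpy as np

def hidden_manifold_sample(p, d, n, f=np.tanh, rng=None):
    """Sample X = f(C F / sqrt(d)) with Gaussian F and C (an illustrative choice)."""
    rng = np.random.default_rng() if rng is None else rng
    F = rng.standard_normal((d, n))   # d latent features for each of the n samples
    C = rng.standard_normal((p, d))   # mixes the d features into the ambient dimension p
    X = f(C @ F / np.sqrt(d))         # entrywise nonlinearity, X has shape (p, n)
    return X, F, C

# Example: n = 1000 points in R^500 generated from a d = 10 dimensional latent space
X, F, C = hidden_manifold_sample(p=500, d=10, n=1000)
print(X.shape)  # (500, 1000)
```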

How does Pascanu (2014) make an argument for the expressivity of ReLU DNNs based on hyperplane arrangements?


Since ReLU is piecewise linear, the output of a ReLU DNN is a piecewise linear function. They show that the number of linear regions of the input space, i.e. regions on which the network computes a different affine function, is at least

\[\prod^L _ {\ell = 0} n _ \ell^{\min\{n _ 0, n _ \ell / 2\}}\]

where the input is $\mathbb R^{n _ 0}$ and the hidden layers are of width $n _ 1, \ldots, n _ L$.
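
A small numerical illustration of the piecewise-linear picture (my own sketch, not an experiment from the paper): each distinct ReLU on/off pattern corresponds to one linear region, so counting the patterns a random network realises over a grid of 2-D inputs gives a lower estimate of its number of regions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random ReLU network R^2 -> R^8 -> R^8 (a final linear read-out would add no regions)
n0, n1, n2 = 2, 8, 8
W1, b1 = rng.standard_normal((n1, n0)), rng.standard_normal(n1)
W2, b2 = rng.standard_normal((n2, n1)), rng.standard_normal(n2)

def activation_pattern(x):
    """The joint on/off pattern of all ReLUs indexes the linear region containing x."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

# Count distinct patterns over a grid in [-3, 3]^2: a lower estimate of the
# number of linear regions intersecting that square (small regions may be missed)
grid = np.linspace(-3, 3, 200)
patterns = {activation_pattern(np.array([x, y])) for x in grid for y in grid}
print(len(patterns))
```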

How does Raghu (2016) make an argument for the expressivity of DNNs based on trajectory lengths?


They show that a simple curve such as a circle, passed through a random DNN, is mapped to an increasingly complicated curve as the number of layers grows, formalised as an arc length that grows exponentially with depth in expectation.
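
A quick sketch of this kind of experiment (an illustrative setup of mine, not the paper's exact protocol): push points on a circle through a random $\tanh$ network whose weight variance is chosen large enough for growth, and track the arc length of the image after each layer.

```python
import numpy as np

rng = np.random.default_rng(1)

# A circle embedded in the first two coordinates of the input space R^width
width, depth, n_points = 50, 10, 2000
theta = np.linspace(0, 2 * np.pi, n_points)
h = np.zeros((n_points, width))
h[:, 0], h[:, 1] = np.cos(theta), np.sin(theta)

def arc_length(points):
    """Sum of consecutive distances approximates the arc length of the curve."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1).sum()

print(f"layer  0: arc length {arc_length(h):7.1f}")
for layer in range(1, depth + 1):
    # Weights ~ N(0, sigma_w^2 / width) with sigma_w = 4, biases ~ N(0, 0.5^2);
    # in this large-variance regime the trajectory tends to lengthen layer by layer
    W = rng.standard_normal((width, width)) * (4.0 / np.sqrt(width))
    b = rng.standard_normal(width) * 0.5
    h = np.tanh(h @ W.T + b)
    print(f"layer {layer:2d}: arc length {arc_length(h):7.1f}")
```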

Other notes

  • Networks are often wider than expressivity arguments require, because the extra width makes them feasible to train
