# Lecture - Theories of Deep Learning MT25, I, Three ingredients of deep learning

> Source: https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/lectures/learning/ · Updated: 2025-10-19 · Tags: uni, lecture

- [Course - Theories of Deep Learning MT25](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/)

@State the basic setup for a fully connected DNN.::

An affine transformation followed by a nonlinear activation, repeated layer by layer:
$$
h^{(i+1)} = \phi_i\left( W^{(i)} h^{(i)} + b^{(i)} \right)
$$
for $i = 1, \ldots, N-1$, where $h^{(i)} \in \mathbb R^{n_i}$, $W^{(i)} \in \mathbb R^{n_{i+1} \times n_i}$, $b^{(i)} \in \mathbb R^{n_{i+1}}$, and each $\phi_i(\cdot)$ is a nonlinear activation.
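
As a rough sketch of the recursion above, the forward pass is just a loop over layers. The layer widths, the random weights and the choice of `tanh` for every $\phi_i$ are assumptions made for illustration, not something fixed by the lecture.

```python
import numpy as np

def forward(h, weights, biases, phi=np.tanh):
    """Apply h <- phi(W h + b) once per layer and return the final activation."""
    for W, b in zip(weights, biases):
        h = phi(W @ h + b)
    return h

rng = np.random.default_rng(0)
widths = [4, 8, 3]  # hypothetical layer widths n_1, n_2, n_3

# W^(i) has shape (n_{i+1}, n_i) and b^(i) has shape (n_{i+1},)
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(widths[:-1], widths[1:])]
biases = [rng.standard_normal(n_out) for n_out in widths[1:]]

output = forward(rng.standard_normal(widths[0]), weights, biases)
```

Here `output` lives in $\mathbb R^{n_3}$, and the shapes of `weights` mirror $W^{(i)} \in \mathbb R^{n_{i+1} \times n_i}$.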

In a CNN, the weights at a given layer correspond to the filter for a convolution. Naïvely "expanding" the convolution into a big matrix multiplication means your forward pass looks something like this:

![Screenshot 2025-10-18 at 14.05.52.png](https://ollybritton.com/assets/attachments/img/Screenshot 2025-10-18 at 14.05.52.png)

Why is this problematic, and what's one way around it?::

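The expanded matrix is highly structured: each filter weight is repeated across many entries, and most of the remaining entries are fixed at zero. You can't just do gradient descent on the matrix entries, since treating them as independent parameters would destroy that structure. One way around this is to instead reorder the data, so that the filter weights form an ordinary dense matrix: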

![Screenshot 2025-10-18 at 14.07.11.png](https://ollybritton.com/assets/attachments/img/Screenshot 2025-10-18 at 14.07.11.png)
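
A minimal 1D sketch of the two viewpoints, assuming a single filter, stride 1 and no padding (the screenshots may show a more general 2D layout). The "expanded" form builds a banded matrix from the filter; the "reordered" form stacks sliding windows of the input, which is the idea usually called im2col. All names here are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
n, k = 8, 3                      # hypothetical signal and filter lengths
x = rng.standard_normal(n)
w = rng.standard_normal(k)

# Expanded form: a structured (n-k+1) x n matrix in which the same k
# weights repeat along a band and every other entry is zero.
T = np.zeros((n - k + 1, n))
for i in range(n - k + 1):
    T[i, i:i + k] = w
y_expanded = T @ x

# Reordered form: row i of `patches` is the window x[i:i+k], so the filter
# is an ordinary dense parameter vector multiplying reordered data.
patches = sliding_window_view(x, k)   # shape (n-k+1, k)
y_reordered = patches @ w
```

Both give the same output, but in the second form the only parameters are the `k` entries of `w`, so a gradient step updates them directly without any constraint to maintain.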

MNIST digits are $28 \times 28$ greyscale images, which means they live in $\mathbb R^{784}$. What does it mean to say that high correlation in the data causes each MNIST digit class to lie locally in a space of fewer than 15 dimensions?::

Take the data matrix formed by the 50 nearest $L_2$ neighbours of a given digit and consider its PCA: the matrix is well-approximated by a rank-15 matrix.
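
The local-PCA idea can be sketched as follows. To keep the example self-contained it uses synthetic data in $\mathbb R^{784}$ generated from a 5-dimensional latent space rather than MNIST itself, and the 99% variance threshold used to decide "well-approximated" is an assumption, not a figure from the lecture.

```python
import numpy as np

def local_rank(data, index, n_neighbours=50, energy=0.99):
    """Number of principal components needed to capture `energy` of the
    variance among the nearest L2 neighbours of data[index]."""
    distances = np.linalg.norm(data - data[index], axis=1)
    nearest = np.argsort(distances)[:n_neighbours]  # includes the point itself
    patch = data[nearest] - data[nearest].mean(axis=0)
    singular_values = np.linalg.svd(patch, compute_uv=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(explained, energy) + 1)

rng = np.random.default_rng(0)
latent = rng.standard_normal((2000, 5))      # hypothetical 5-dim latent codes
embedding = rng.standard_normal((5, 784))    # linear map into "pixel" space
data = latent @ embedding + 1e-3 * rng.standard_normal((2000, 784))

rank = local_rank(data, index=0)
```

On this synthetic data the estimated local rank is at most 5 even though the ambient dimension is 784; the claim in the notes is that the same experiment on MNIST neighbourhoods gives a small number too.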

Can you explain the phrase

> "classifiers can be viewed largely as nullspace maps"

?::

For classification tasks, much of the variation in the inputs consists of invariances which shouldn't affect the output. This means that there are many directions in the input space that the network should ignore.
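
A toy illustration, assuming a purely linear map from $\mathbb R^{784}$ to 10 class scores (a real network is nonlinear, so this only conveys the dimension count): such a map has a nullspace of dimension at least $784 - 10 = 774$, and moving the input along any of those directions leaves the scores unchanged. The matrix `W` is random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 784))   # hypothetical linear "classifier"

# Full SVD: the rows of Vt beyond the first 10 span the nullspace of W
# (W has rank 10 here, which holds almost surely for a Gaussian matrix).
_, _, Vt = np.linalg.svd(W)
null_basis = Vt[10:]                 # shape (774, 784)

x = rng.standard_normal(784)
v = null_basis.T @ rng.standard_normal(null_basis.shape[0])  # large nullspace move

logits = W @ x
logits_perturbed = W @ (x + v)
```

Even though `v` is a large perturbation of the input, the two score vectors agree up to floating-point error.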

#### Papers mentioned
- [Paper - Gradient-based learning applied to document recognition, LeCun](https://ollybritton.com/notes/uni/part-c/mt25/theories-of-deep-learning/reading/paper-gradient-based-learning-applied-to-document-recognition-lecun/)

---
Olly Britton — https://ollybritton.com. Machine-readable index: https://ollybritton.com/llms.txt