Notes - Machine Learning MT23, Convolutional neural networks
Flashcards
Suppose we have a $100 \times 100$ image tensor and a convolutional filter of size $5 \times 5$ with a stride of $2$ in both directions. What is the size of the output tensor?
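The card does not mention padding, so assuming none, the usual output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$ gives $\lfloor 95 / 2 \rfloor + 1 = 48$ along each axis, i.e. a $48 \times 48$ output. A minimal sketch of that arithmetic (the name `conv_output_size` is hypothetical, not from the notes):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Number of filter positions along one axis.

    n: input length, k: filter length, padding: zeros added on each side.
    """
    if n + 2 * padding < k:
        raise ValueError("filter does not fit inside the padded input")
    return (n + 2 * padding - k) // stride + 1


# The card's setting: 100 x 100 image, 5 x 5 filter, stride 2, no padding (assumed).
side = conv_output_size(100, 5, stride=2)
print(side, "x", side)
```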
What does it mean to take the “dot product” of a $W \times H \times C$ filter tensor with a $W \times H \times C$ patch of an image (recall that images are three-dimensional once colour channels are included)?
Multiply elementwise, then add up.
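As a rough illustration of "multiply elementwise, then add up", here is a sketch using plain nested lists. The function name `tensor_dot` and the example values are made up for illustration, and both tensors are assumed to have exactly the same shape.

```python
def tensor_dot(filt, patch):
    """Elementwise product of two equally shaped 3-D nested lists, summed to a scalar."""
    total = 0.0
    for f_plane, p_plane in zip(filt, patch):
        for f_row, p_row in zip(f_plane, p_plane):
            for f, p in zip(f_row, p_row):
                total += f * p
    return total


# A 2 x 2 x 2 example with arbitrary values.
filt = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
patch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
result = tensor_dot(filt, patch)  # (1*1 + 1*4) + (2*5 + 2*8)
```

The result is a single number, which is why one filter applied at one position contributes exactly one entry of the output.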
What is the padding parameter of a convolutional filter?
The number of rows or columns of zeroes added around an image so that the filter can apply to more of the image without running off the edge.
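A small sketch of zero-padding a single channel, assuming the same padding `p` on all four sides (the helper name `zero_pad` is hypothetical):

```python
def zero_pad(image, p):
    """Return a copy of a 2-D nested list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


padded = zero_pad([[1, 2], [3, 4]], 1)
```

Here a $2 \times 2$ channel becomes $4 \times 4$, so a $3 \times 3$ filter now has four valid positions instead of none.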
What is the stride parameter of a convolutional filter?
The size of the step, in each direction, that the filter moves between successive applications.
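To make the stride concrete, this sketch lists the starting offsets a filter visits along one axis, assuming zero-based indexing and no padding (the name `filter_positions` is hypothetical):

```python
def filter_positions(n, k, stride):
    """Zero-based start offsets of a length-k filter along an axis of length n."""
    return list(range(0, n - k + 1, stride))


positions = filter_positions(7, 3, 2)  # the filter starts at offsets 0, 2 and 4
```

The number of offsets is the output size along that axis, so a larger stride gives a smaller output.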
How would you represent an $m \times n$ colour image as a tensor, and what would corresponding convolutional filters look like?
As an $m \times n \times 3$ tensor, with one layer for each colour channel. The corresponding filters are then of the form
\[W \times H \times 3\]
What is the max-pool operation in a convolutional neural network?
Take the largest value in a small patch of a tensor.
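A minimal sketch of max-pooling over one channel. The $2 \times 2$ window and stride of $2$ are common defaults but are assumptions here, as is the name `max_pool_2d`.

```python
def max_pool_2d(channel, size=2, stride=2):
    """Max over each size x size window of a 2-D nested list, moving by stride."""
    rows, cols = len(channel), len(channel[0])
    out = []
    for y in range(0, rows - size + 1, stride):
        out_row = []
        for x in range(0, cols - size + 1, stride):
            window = [channel[y + dy][x + dx]
                      for dy in range(size) for dx in range(size)]
            out_row.append(max(window))
        out.append(out_row)
    return out


pooled = max_pool_2d([[1, 3, 2, 0],
                      [4, 2, 1, 5],
                      [0, 1, 9, 8],
                      [2, 6, 7, 3]])
```

In a multi-channel tensor the same operation would be applied to each channel separately, so the number of channels is unchanged.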
Describe the inputs and outputs of a layer in a convolutional neural network with an arbitrary stride size $s$ and no zero-padding.
Inputs and output:
- $\mathbf I$: input tensor, $C _ I \times H _ I \times W _ I$ (think of this as the activation from the previous layer)
- $\mathbf F$: filter tensor, $C _ O \times C _ I \times H _ F \times W _ F$ (for each output channel, one weight per input channel and per $y$, $x$ offset within the filter window)
- $\mathbf B$: bias tensor, $C _ O$ (one bias per output channel)
- $\mathbf O$: output tensor, $C _ O \times H _ O \times W _ O$ (think of this as the pre-activation)
Where:
- $C _ I$ is the number of channels in the input (e.g. RGB)
- $C _ O$ is the number of channels in the output
- $H _ I \times W _ I$ are the dimensions of each channel in the input
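- $H _ F \times W _ F$ are the dimensions of each filter
- $H _ O \times W _ O$ are the dimensions of each channel in the output, with $H _ O = \lfloor (H _ I - H _ F) / s \rfloor + 1$ and similarly for $W _ O$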
Then
\[\mathbf O[u][y][x] = \mathbf B[u] + \sum^{C_ I}_ {i = 1} \sum^{H_ F}_ {j = 1} \sum^{W_F}_ {k = 1} \mathbf I[i][sy + j][sx + k] \times \mathbf F[u][i][j][k]\]
In general, what is a convolutional filter $f$ in a convolutional neural network?
A tensor of dimension $W _ f \times H _ f \times C _ l$ where $C _ l$ is the number of channels in the previous layer.
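The strided layer described earlier (input $\mathbf I$, filters $\mathbf F$, biases $\mathbf B$, output $\mathbf O$) can be sketched as six nested loops. This is a naive illustration rather than an efficient implementation. It assumes zero-based indices, with $y$ indexing rows and $x$ indexing columns, and one bias per output channel. The function name `conv_layer` is hypothetical.

```python
def conv_layer(I, F, B, s=1):
    """Naive strided convolution with no zero-padding.

    I: input,   nested lists of shape C_I x H_I x W_I
    F: filters, nested lists of shape C_O x C_I x H_F x W_F
    B: biases,  list of length C_O (assumed: one bias per output channel)
    """
    C_I, H_I, W_I = len(I), len(I[0]), len(I[0][0])
    C_O, H_F, W_F = len(F), len(F[0][0]), len(F[0][0][0])
    H_O = (H_I - H_F) // s + 1
    W_O = (W_I - W_F) // s + 1
    # Start every output entry at the bias of its channel.
    O = [[[B[u] for _ in range(W_O)] for _ in range(H_O)] for u in range(C_O)]
    for u in range(C_O):
        for y in range(H_O):
            for x in range(W_O):
                for i in range(C_I):
                    for j in range(H_F):
                        for k in range(W_F):
                            O[u][y][x] += I[i][s * y + j][s * x + k] * F[u][i][j][k]
    return O


# One 3 x 3 input channel, one 2 x 2 filter of ones, bias 1 (arbitrary values).
I = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
F = [[[[1, 1], [1, 1]]]]
out = conv_layer(I, F, [1], s=1)
```

Each output entry is the bias plus the dot product of the filter with the patch whose top-left corner is at row $sy$, column $sx$.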
Suppose the input to a convolutional layer is $m \times n \times c$ and we apply $f$ filters of dimension $w \times h \times c$. What’s the dimension of the output?
(i.e. noting that the third dimension comes from the number of filters, rather than how the filters are applied)
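The first two output dimensions depend on the stride and padding, which the card does not state. Assuming stride $1$ and no padding, the output would be $(m - w + 1) \times (n - h + 1) \times f$. A sketch under those assumptions (the name `layer_output_shape` is hypothetical):

```python
def layer_output_shape(input_shape, filter_shape, num_filters):
    """Output shape for an m x n x c input and filters of size w x h x c.

    Assumes stride 1 and no padding; these are not stated on the card.
    """
    m, n, c = input_shape
    w, h, filter_c = filter_shape
    if filter_c != c:
        raise ValueError("filter depth must equal the number of input channels")
    if w > m or h > n:
        raise ValueError("filter does not fit inside the input")
    # Each filter yields one output channel, whatever its spatial size.
    return (m - w + 1, n - h + 1, num_filters)


shape = layer_output_shape((32, 32, 3), (5, 5, 3), num_filters=16)
```

Changing the filter size alters only the first two entries of the shape, while the third always equals the number of filters.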