Machine Learning MT23, Convolutional neural networks
Flashcards
Suppose we have a $100 \times 100$ image tensor and a convolutional filter of size $5 \times 5$ with a stride of $2$ in both directions. What is the size of the output tensor?
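As a quick check of the arithmetic, here is a minimal sketch assuming no zero-padding and that the filter is only applied where it fits entirely inside the image. The helper name `conv_output_size` is hypothetical, not taken from any library.

```python
def conv_output_size(input_size: int, filter_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Number of filter positions along one dimension."""
    # The first position starts at index 0; each further position moves by `stride`.
    return (input_size + 2 * padding - filter_size) // stride + 1


side = conv_output_size(100, 5, stride=2)
print(side, side)  # 48 48
```

Under these assumptions the output is $48 \times 48$, since $\lfloor (100 - 5) / 2 \rfloor + 1 = 48$.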
@Define what it means to take the “dot product” of a $W \times H \times C$ filter tensor with a $W \times H \times C$ patch of an image (recall that images are three dimensional when considering colour channels)?
Multiply elementwise, then add up.
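A minimal NumPy sketch of "multiply elementwise, then add up"; the function name `patch_dot` and the example shapes are assumptions chosen for illustration.

```python
import numpy as np


def patch_dot(filter_tensor: np.ndarray, patch: np.ndarray) -> float:
    """Dot product of a filter with an image patch of the same shape."""
    assert filter_tensor.shape == patch.shape
    # Elementwise product over width, height and channels, then one big sum.
    return float(np.sum(filter_tensor * patch))


filt = np.ones((3, 3, 3))                   # a 3 x 3 filter over 3 colour channels
patch = np.arange(27.0).reshape(3, 3, 3)    # a patch holding the values 0..26
print(patch_dot(filt, patch))               # 351.0
```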
@Define the padding parameter of a convolutional filter.
The number of rows or columns of zeroes added around an image so that the filter can apply to more of the image without running off the edge.
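For illustration, zero-padding a single-channel image can be sketched with `np.pad`; the $3 \times 3$ example image is an assumption.

```python
import numpy as np

image = np.arange(1, 10).reshape(3, 3)   # a 3 x 3 single-channel image

# padding = 1: one row/column of zeroes added on every side
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (5, 5)
```

With this padding, a $3 \times 3$ filter with stride $1$ can be centred on every pixel of the original image, so the output keeps the original $3 \times 3$ size.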
@Define the stride parameter of a convolutional filter.
The size of the step the filter takes between successive applications, in each direction.
How would you represent an $m \times n$ colour image as a tensor, and what would corresponding convolutional filters look like?
As an $m \times n \times 3$ tensor, with one layer for each colour channel. The filters are then of the form
\[W \times H \times 3\]
@Define the max-pool operation in a convolutional neural network.
Take the largest value in a small patch of a tensor.
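A minimal sketch of max-pooling on a single channel, assuming $2 \times 2$ patches and a stride of $2$ (both common choices, but assumptions here); `max_pool_2d` is a hypothetical helper name.

```python
import numpy as np


def max_pool_2d(x: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Max-pool a 2D array using size x size patches."""
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    out = np.empty((h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            # Largest value in the patch whose top-left corner is (i*stride, j*stride).
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out


x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]])
print(max_pool_2d(x))  # [[ 4  8]
                       #  [12 16]]
```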
@Define the inputs and outputs of a layer in a convolutional neural network with an arbitrary stride size $s$ and no zero-padding.
Inputs:
- $\mathbf I$: input tensor, $C _ I \times H _ I \times W _ I$ (think of this as the activation from the previous layer)
- $\mathbf F$: filter tensor, $C _ O \times C _ I \times H _ F \times W _ F$ (for each output channel, one weight per input channel and per $y$, $x$ offset within the filter)
- $\mathbf B$: bias tensor, $C _ O$ (one bias per output channel)
Outputs:
- $\mathbf O$: output tensor, $C _ O \times H _ O \times W _ O$ (think of this as the pre-activation)
Where:
- $C _ I$ is the number of channels in the input (e.g. RGB)
- $C _ O$ is the number of channels in the output
- $H _ I \times W _ I$ are the dimensions of each channel in the input
- $H _ O \times W _ O$ are the dimensions of each channel in the output
- $H _ F \times W _ F$ are the dimensions of each filter
Then, with all indices starting at $0$,
\[\mathbf O[u][y][x] = \mathbf B[u] + \sum^{C _ I - 1} _ {i = 0} \sum^{H _ F - 1} _ {j = 0} \sum^{W _ F - 1} _ {k = 0} \mathbf I[i][sy + j][sx + k] \times \mathbf F[u][i][j][k]\]
In general, @define a convolutional filter $f$ in a convolutional neural network?
A tensor of dimension $W _ f \times H _ f \times C _ l$ where $C _ l$ is the number of channels in the previous layer.
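The strided layer defined in the earlier card (tensors $\mathbf I$, $\mathbf F$, $\mathbf B$, $\mathbf O$) can be sketched directly with loops. This is an unoptimised illustration rather than how a library would implement it; it assumes zero-based indices, that $y$ indexes rows and $x$ indexes columns, and one bias per output channel. The name `conv_layer` is hypothetical.

```python
import numpy as np


def conv_layer(I: np.ndarray, F: np.ndarray, B: np.ndarray, s: int = 1) -> np.ndarray:
    """Naive convolutional layer with stride s and no zero-padding.

    I: (C_I, H_I, W_I) input, F: (C_O, C_I, H_F, W_F) filters, B: (C_O,) biases.
    """
    C_I, H_I, W_I = I.shape
    C_O, C_I_filter, H_F, W_F = F.shape
    assert C_I == C_I_filter, "filters must span every input channel"
    H_O = (H_I - H_F) // s + 1
    W_O = (W_I - W_F) // s + 1
    O = np.zeros((C_O, H_O, W_O))
    for u in range(C_O):
        for y in range(H_O):
            for x in range(W_O):
                # Patch of the input covered by the filter at output position (y, x).
                patch = I[:, s * y:s * y + H_F, s * x:s * x + W_F]
                O[u, y, x] = B[u] + np.sum(patch * F[u])
    return O


I = np.ones((3, 100, 100))   # e.g. an RGB image
F = np.ones((4, 3, 5, 5))    # four 5 x 5 filters spanning 3 channels
B = np.zeros(4)
print(conv_layer(I, F, B, s=2).shape)  # (4, 48, 48)
```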
Suppose the input to a convolutional layer is $m \times n \times c$ and we apply $f$ filters of dimension $w \times h \times c$. What’s the dimension of the output?
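Assuming a stride of $1$ and no zero-padding, the output is $(m - w + 1) \times (n - h + 1) \times f$ (i.e. noting that the third dimension comes from the number of filters, rather than how the filters are applied).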
Suppose:
- We are implementing a convolutional layer between layer $l$ and layer $l+1$
- The output of the $l$-th layer is of the shape $m _ l \times n _ l \times F _ l$, indexed by $a^l _ {i, j, f}$
- We apply $F _ {l+1}$ filters of shape $W _ f \times H _ f \times F _ l$
- We have no zero-padding and a stride of $1$ in each direction
How can you then write the preactivation of layer $l+1$?
Suppose:
- We are implementing a convolutional layer between layer $l$ and layer $l+1$
- The output of the $l$-th layer is of the shape $m _ l \times n _ l \times F _ l$, indexed by $a^l _ {i, j, f}$
- We apply $F _ {l+1}$ filters of shape $W _ f \times H _ f \times F _ l$
- We have no zero-padding and a stride of $1$ in each direction
The preactivation of layer $l+1$ is then given by:
\[z^{l+1} _ {i', j', f'} = b^{l+1, f'} + \sum^{W _ {f'}} _ {i=1} \sum^{H _ {f'}} _ {j = 1} \sum^{F _ l} _ {f = 1} a^l _ {i' + i - 1, j' + j - 1, f} w^{l+1, f'} _ {i, j, f}\]
Derive
\[\frac{\partial \ell}{\partial w^{l+1, f'} _ {i, j, f}}\]
@justify~
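One way to sketch the derivation, assuming the loss $\ell$ depends on $w^{l+1, f'} _ {i, j, f}$ only through the preactivations $z^{l+1} _ {i', j', f'}$, and writing $m _ {l+1} = m _ l - W _ {f'} + 1$ and $n _ {l+1} = n _ l - H _ {f'} + 1$ for the spatial size of layer $l+1$ (this notation is an assumption that follows from no padding and a stride of $1$). Each weight is shared across every output position, so the chain rule sums over all of them:

\[\frac{\partial \ell}{\partial w^{l+1, f'} _ {i, j, f}} = \sum^{m _ {l+1}} _ {i' = 1} \sum^{n _ {l+1}} _ {j' = 1} \frac{\partial \ell}{\partial z^{l+1} _ {i', j', f'}} \cdot \frac{\partial z^{l+1} _ {i', j', f'}}{\partial w^{l+1, f'} _ {i, j, f}} = \sum^{m _ {l+1}} _ {i' = 1} \sum^{n _ {l+1}} _ {j' = 1} \frac{\partial \ell}{\partial z^{l+1} _ {i', j', f'}} \cdot a^l _ {i' + i - 1, j' + j - 1, f}\]

The second equality holds because $z^{l+1} _ {i', j', f'}$ is linear in the weights, and the coefficient of $w^{l+1, f'} _ {i, j, f}$ in that sum is exactly $a^l _ {i' + i - 1, j' + j - 1, f}$.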
Suppose:
- We are implementing a convolutional layer between layer $l$ and layer $l+1$
- The layer is a max-pool operation
@Define $z^{l + 1}$ and derive
\[\frac{\partial z^{l+1} _ {i',j'}}{\partial a^l _ {i, j}}\]
@justify~
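A possible answer, sketched under the assumption that $\Omega(i', j')$ denotes the set of input positions covered by the pooling patch for output position $(i', j')$ (this symbol is introduced here for illustration) and that the maximum within each patch is unique:

\[z^{l+1} _ {i', j'} = \max _ {(i, j) \in \Omega(i', j')} a^l _ {i, j}\]

\[\frac{\partial z^{l+1} _ {i',j'}}{\partial a^l _ {i, j}} = \begin{cases} 1 & \text{if } (i, j) = \operatorname{argmax} _ {(p, q) \in \Omega(i', j')} a^l _ {p, q} \\ 0 & \text{otherwise} \end{cases}\]

Only the largest entry in the patch affects the output, and it does so with slope $1$; every other entry (inside or outside the patch) has no effect under a small perturbation. If the maximum is tied, the max is not differentiable at that point and an implementation has to pick a convention.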