Machine Learning MT23, Convolutional neural networks


Flashcards

Suppose we have a $100 \times 100$ image tensor and a convolutional filter of size $5 \times 5$ with a stride of $2$ in both directions. What is the size of the output tensor?


\[48 \times 48\]
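As a sanity check, the general formula $\lfloor (n + 2p - k)/s \rfloor + 1$ can be sketched in a few lines of Python (the function name is illustrative):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input length n, filter size k:
    floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

# 100x100 image, 5x5 filter, stride 2, no padding:
print(conv_output_size(100, 5, stride=2))  # 48
```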

@Define what it means to take the “dot product” of a $W \times H \times C$ filter tensor with a $W \times H \times C$ patch of an image (recall that images are three dimensional when considering colour channels)?


Multiply elementwise, then add up.
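A minimal NumPy sketch of this operation on a random patch and filter (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5, 3))  # W x H x C patch of the image
filt = rng.standard_normal((5, 5, 3))   # matching W x H x C filter

# "Dot product": multiply elementwise, then add everything up.
response = np.sum(patch * filt)

# Equivalent to flattening both tensors and taking an ordinary dot product.
assert np.isclose(response, patch.ravel() @ filt.ravel())
```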

@Define the padding parameter of a convolutional filter.


The number of rows or columns of zeroes added around an image so that the filter can apply to more of the image without running off the edge.
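For instance, padding a small image with one ring of zeros (a sketch using NumPy's `np.pad`):

```python
import numpy as np

img = np.arange(9.0).reshape(3, 3)
padded = np.pad(img, pad_width=1)  # one ring of zeros on every side

print(padded.shape)  # (5, 5): a 3x3 filter can now be centred on every pixel
```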

@Define the stride parameter of a convolutional filter.


The number of rows or columns the filter moves between successive applications, in each direction.

How would you represent an $m \times n$ colour image as a tensor, and what would corresponding convolutional filters look like?


\[m \times n \times 3\]

one layer for each channel, and then filters are of the form

\[W \times H \times 3\]

@Define the max-pool operation in a convolutional neural network.


Take the largest value in a small patch of a tensor.
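A minimal sketch of a $2 \times 2$ max-pool with stride $2$ on a single-channel input, using a reshape trick (assumes the dimensions divide evenly):

```python
import numpy as np

x = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [9., 2., 1., 0.],
              [3., 4., 5., 6.]])

# Split into 2x2 blocks, then take the largest value in each block.
h, w = x.shape
pooled = x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 8.] [9. 6.]]
```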

@Define the inputs and outputs of a layer in a convolutional neural network with an arbitrary stride size $s$ and no zero-padding.


Inputs:

  • $\mathbf I$: input tensor, $C _ I \times H _ I \times W _ I$ (think of this as the activation from the previous layer)
  • $\mathbf F$: filter tensor, $C _ O \times C _ I \times H _ F \times W _ F$ (for each of the output layers, combine the input in this channel at this $y$ and this $x$)
  • $\mathbf B$: bias tensor, $C _ O$ (one bias per output channel)
  • $\mathbf O$: output tensor, $C _ O \times H _ O \times W _ O$ (think of this as the pre-activation)

Where:

  • $C _ I$ is the number of channels in the input (e.g. RGB)
  • $C _ O$ is the number of channels in the output
  • $H _ I \times W _ I$ are the dimensions of each channel in the input
  • $H _ O \times W _ O$ are the dimensions of each channel in the output

Then

\[\mathbf O[u][y][x] = \mathbf B[u] + \sum^{C _ I} _ {i = 1} \sum^{H _ F} _ {j = 1} \sum^{W _ F} _ {k = 1} \mathbf I[i][s(y - 1) + j][s(x - 1) + k] \times \mathbf F[u][i][j][k]\]
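This layer equation can be sketched directly as nested loops in NumPy (0-indexed, no padding; `conv_layer` is an illustrative name, not a library function):

```python
import numpy as np

def conv_layer(I, F, B, s=1):
    """Pre-activation of a convolutional layer with stride s, no padding.

    I: input,   shape (C_I, H_I, W_I)
    F: filters, shape (C_O, C_I, H_F, W_F)
    B: biases,  shape (C_O,)
    Returns O,  shape (C_O, H_O, W_O)."""
    C_I, H_I, W_I = I.shape
    C_O, _, H_F, W_F = F.shape
    H_O = (H_I - H_F) // s + 1
    W_O = (W_I - W_F) // s + 1
    O = np.empty((C_O, H_O, W_O))
    for u in range(C_O):
        for y in range(H_O):
            for x in range(W_O):
                # "Dot product" of filter u with the patch at (s*y, s*x).
                patch = I[:, s * y:s * y + H_F, s * x:s * x + W_F]
                O[u, y, x] = B[u] + np.sum(patch * F[u])
    return O

rng = np.random.default_rng(0)
I = rng.standard_normal((3, 8, 8))
F = rng.standard_normal((4, 3, 3, 3))
B = rng.standard_normal(4)
print(conv_layer(I, F, B, s=2).shape)  # (4, 3, 3)
```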

In general, @define a convolutional filter $f$ in a convolutional neural network?


A tensor of dimension $W _ f \times H _ f \times C _ l$ where $C _ l$ is the number of channels in the previous layer.

Suppose the input to a convolutional layer is $m \times n \times c$ and we apply $f$ filters of dimension $w \times h \times c$. What’s the dimension of the output?


\[m' \times n' \times f\]

(i.e. the third dimension comes from the number of filters, not from the filter depth; with a stride of $1$ and no padding, $m' = m - w + 1$ and $n' = n - h + 1$)

Suppose:

  • We are implementing a convolutional layer between layer $l$ and layer $l+1$
  • The output of the $l$-th layer is of the shape $m _ l \times n _ l \times F _ l$, indexed by $a^l _ {i, j, f}$
  • We apply $F _ {l+1}$ filters of shape $W _ f \times H _ f \times F _ l$
  • We have no zero-padding and a stride of $1$ in each direction

How can you then write the preactivation of layer $l+1$?


\[z^{l+1} _ {i', j', f'} = b^{l+1, f'} + \sum^{W _ f} _ {i=1} \sum^{H _ f} _ {j = 1} \sum^{F _ l} _ {f = 1} a^l _ {i' + i - 1, j' + j - 1, f} w^{l+1, f'} _ {i, j, f}\]

Suppose:

  • We are implementing a convolutional layer between layer $l$ and layer $l+1$
  • The output of the $l$-th layer is of the shape $m _ l \times n _ l \times F _ l$, indexed by $a^l _ {i, j, f}$
  • We apply $F _ {l+1}$ filters of shape $W _ f \times H _ f \times F _ l$
  • We have no zero-padding and a stride of $1$ in each direction

The preactivation of layer $l+1$ is then given by:

\[z^{l+1} _ {i', j', f'} = b^{l+1, f'} + \sum^{W _ f} _ {i=1} \sum^{H _ f} _ {j = 1} \sum^{F _ l} _ {f = 1} a^l _ {i' + i - 1, j' + j - 1, f} w^{l+1, f'} _ {i, j, f}\]

Derive

\[\frac{\partial \ell}{\partial w^{l+1, f'} _ {i, j, f} }\]

?


\[\begin{aligned} \frac{\partial \ell}{\partial w^{l+1, f'} _ {i, j, f} } &= \sum _ {i', j'} \frac{\partial \ell}{\partial z _ {i', j', f'}^{l+1} } \frac{\partial z^{l+1} _ {i', j', f'} }{\partial w^{l+1, f'} _ {i, j, f} } \\\\ &= \sum _ {i', j'} \frac{\partial \ell}{\partial z^{l+1} _ {i', j', f'} } \cdot a^l _ {i' + i - 1, j' + j - 1, f} \end{aligned}\]

@justify~
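One way to justify (and check) this gradient is a finite-difference test: because the loss is linear in each weight here, the numerical derivative should match the formula almost exactly. A minimal NumPy sketch (0-indexed, stride $1$, no padding; `conv_forward` and `grad_w` are illustrative names):

```python
import numpy as np

def conv_forward(a, w, b):
    """a: (m, n, F_l), w: (W_f, H_f, F_l, F_out), b: (F_out,) -> z."""
    m, n, _ = a.shape
    Wf, Hf, _, Fo = w.shape
    z = np.empty((m - Wf + 1, n - Hf + 1, Fo))
    for i2 in range(z.shape[0]):
        for j2 in range(z.shape[1]):
            patch = a[i2:i2 + Wf, j2:j2 + Hf, :]
            for f2 in range(Fo):
                z[i2, j2, f2] = b[f2] + np.sum(patch * w[:, :, :, f2])
    return z

def grad_w(a, dz, Wf, Hf):
    """dl/dw[i, j, f, f'] = sum over (i', j') of dz[i', j', f'] * a[i'+i, j'+j, f]."""
    m2, n2, Fo = dz.shape
    Fl = a.shape[2]
    g = np.zeros((Wf, Hf, Fl, Fo))
    for i in range(Wf):
        for j in range(Hf):
            patch = a[i:i + m2, j:j + n2, :]          # (m', n', F_l)
            g[i, j] = np.einsum('xyf,xyo->fo', patch, dz)
    return g

rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6, 2))
w = rng.standard_normal((3, 3, 2, 4))
b = rng.standard_normal(4)
G = rng.standard_normal((4, 4, 4))  # fix loss l = sum(z * G), so dl/dz = G

g = grad_w(a, G, 3, 3)

# Finite-difference check on one weight entry:
eps = 1e-6
w2 = w.copy(); w2[1, 2, 0, 3] += eps
num = (np.sum(conv_forward(a, w2, b) * G) - np.sum(conv_forward(a, w, b) * G)) / eps
assert np.isclose(g[1, 2, 0, 3], num, atol=1e-4)
```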

Suppose:

  • We are implementing a convolutional layer between layer $l$ and layer $l+1$
  • The convolution is a max-pool operation

@Define $z^{l + 1}$ and derive

\[\frac{\partial z^{l+1} _ {i',j'} }{\partial a^l _ {i, j} }\]

?


\[z^{l+1} _ {i', j'} = \max _ {i, j \in \Omega(i', j')} a^l _ {i,j}\] \[\frac{\partial z^{l+1} _ {i',j'} }{\partial a^l _ {i, j} } = \mathbb 1\left((i, j) = \text{argmax} _ {\tilde i, \tilde j \in \Omega(i', j')} a^l _ {\tilde i, \tilde j}\right)\]

@justify~
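The indicator gradient can be seen concretely on a single pooling window $\Omega(i', j')$ (a NumPy sketch; breaking ties at the argmax is assumed away here):

```python
import numpy as np

a = np.array([[1., 5.],
              [3., 2.]])  # one pooling window Omega(i', j')

z = a.max()  # forward: z = 5, from position (0, 1)

# Backward: the derivative is 1 at the argmax and 0 everywhere else,
# since only the maximal entry affects z.
grad = (a == a.max()).astype(float)
print(grad)  # [[0. 1.] [0. 0.]]
```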



