Computer Vision MT25, Image classification
Flashcards
General classification
Suppose:
- $f$ is a classifier
- $\mathcal D = ((x _ i, y _ i))^N _ {i=1}$ is some dataset
@Define the accuracy and expected accuracy.
The accuracy is the fraction of examples classified correctly:
\[\text{Acc}(f) = \frac{1}{N} \sum _ {(x, y) \in \mathcal D} \mathbb 1[f(x) = y]\]If the classifier predicts probabilities, then the expected accuracy is given by
\[\text{EAcc}(f) = \frac{1}{N} \sum _ {(x, y) \in \mathcal D} f _ y(x)\]where $f _ y(x)$ is the predicted probability of $x$ belonging to class $y$.
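A minimal NumPy sketch of both quantities; the array shapes and toy values are assumptions for illustration:

```python
import numpy as np

def accuracy(predictions, labels):
    """Fraction of examples whose predicted class matches the true label."""
    return np.mean(np.asarray(predictions) == np.asarray(labels))

def expected_accuracy(probs, labels):
    """Mean predicted probability assigned to the true class.

    probs:  (N, K) array, row i is the predicted distribution for example i
    labels: (N,) array of integer class indices
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    return np.mean(probs[np.arange(len(labels)), labels])

# toy example: 3 examples, 2 classes
probs = np.array([[0.9, 0.1],
                  [0.4, 0.6],
                  [0.2, 0.8]])
labels = np.array([0, 1, 0])
print(accuracy(probs.argmax(axis=1), labels))   # 2/3
print(expected_accuracy(probs, labels))         # (0.9 + 0.6 + 0.2) / 3
```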
Image embeddings
@Define a feature extractor.
A map from images to image embeddings, e.g.
\[\phi : \mathbb R^{H \times W \times 3} \to \mathbb R^d\]
Name some @example image embeddings, and some example classifiers that could be used on these embeddings.
- Image embeddings
    - Fourier transform
    - Bag of visual words (SIFT)
    - Histogram of gradients
- Classifiers
    - SVMs
    - Kernel SVMs
    - Random forests
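A rough sketch of one such pipeline (HoG embeddings fed into a kernel SVM) using scikit-image and scikit-learn; the synthetic images, labels and hyperparameters are placeholders, not values from the course:

```python
import numpy as np
from skimage.feature import hog   # assumes scikit-image >= 0.19 (channel_axis argument)
from sklearn.svm import SVC

def extract_hog(images):
    """Map each H x W x 3 image to a fixed-length HoG embedding."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), channel_axis=-1)
        for img in images
    ])

# Tiny synthetic stand-in for a real dataset: 10 random 64x64 RGB images, 2 classes.
rng = np.random.default_rng(0)
images = rng.random((10, 64, 64, 3))
labels = rng.integers(0, 2, size=10)

X = extract_hog(images)        # image embeddings
clf = SVC(kernel="rbf")        # kernel SVM on top of the embeddings
clf.fit(X, labels)
print(clf.predict(X[:3]))
```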
Suppose $\phi : \mathbb R^{H \times W \times 3} \to \mathbb R^d$ is a feature extractor that maps images to embeddings. Briefly explain the @algorithm for classifying an image using nearest neighbour classification.
Given an image
- Find its embedding with $\phi$
- Look up nearest neighbours in the embedding space
- The predicted class is the majority vote of the neighbourhood
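A minimal sketch of this, assuming `phi` is some feature extractor, `train_embeddings` is an $(N, d)$ array and `train_labels` holds the corresponding classes; the toy extractor and random data below are stand-ins:

```python
import numpy as np
from collections import Counter

def knn_predict(query_image, phi, train_embeddings, train_labels, k=5):
    """Classify an image by majority vote over its k nearest training embeddings."""
    z = phi(query_image)                                   # embed the query image
    dists = np.linalg.norm(train_embeddings - z, axis=1)   # distances in embedding space
    nearest = np.argsort(dists)[:k]                        # indices of the k nearest neighbours
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]                      # majority-vote class

# Toy usage with a stand-in feature extractor (mean colour per channel).
phi = lambda img: img.mean(axis=(0, 1))
rng = np.random.default_rng(0)
train_images = rng.random((20, 32, 32, 3))
train_labels = rng.integers(0, 2, size=20)
train_embeddings = np.array([phi(img) for img in train_images])
print(knn_predict(rng.random((32, 32, 3)), phi, train_embeddings, train_labels, k=5))
```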
Suppose that a $k$-NN classifier is used to classify images of boats and deer. @Visualise what the accuracy on a validation set might look like as you increase $k$, and why it's not a good idea to make $k$ as high as possible.

There is a healthy balance to be struck: if $k$ equals the number of training images, the neighbourhood is the whole training set, so every query is assigned the majority class, i.e. classification just reflects the proportion of each class in the training set.
Multi-class classification
Suppose you are classifying a data set into $K$ classes, and have trained $K$ 1-vs-all classifiers $f _ 1, \ldots, f _ K$. How can you create an overall classifier $f : X \to \mathbb R^K$ which outputs a probability distribution over the classes?
Apply softmax to the vector of 1-vs-all scores: $f(x) = \text{softmax}((f _ 1(x), \ldots, f _ K(x)))$.
@Define the softmax-with-temperature function $\text{softmax}(\hat Y, \tau)$, @state three results that intuitively relate it to just taking the argmax of some set of predictions, and @visualise how varying $\tau$ affects the resulting distribution.

\[\text{softmax} _ i(\hat Y, \tau) = \frac{\exp(\hat Y _ i / \tau)}{\sum _ {j=1}^{K} \exp(\hat Y _ j / \tau)}\]where $\hat Y$ is the vector of predictions for each class. We have the results that:
- It maintains the relative ordering for any $\tau _ 1, \tau _ 2 > 0$: $\text{softmax} _ i(\hat Y, \tau _ 1) < \text{softmax} _ j(\hat Y, \tau _ 1) \implies \text{softmax} _ i(\hat Y, \tau _ 2) < \text{softmax} _ j(\hat Y, \tau _ 2)$
- As $\tau \to \infty$, softmax becomes a uniform distribution.
- As $\tau \to 0$, softmax becomes argmax (as one-hot).
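A small NumPy sketch of softmax with temperature that illustrates the two limits; the example scores are made up:

```python
import numpy as np

def softmax(y_hat, tau=1.0):
    """Softmax with temperature tau applied to a vector of class scores."""
    z = np.asarray(y_hat, dtype=float) / tau
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])     # assumed example scores
print(softmax(scores, tau=1.0))        # ordinary softmax
print(softmax(scores, tau=100.0))      # large tau -> close to uniform
print(softmax(scores, tau=0.01))       # small tau -> close to one-hot argmax
```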
