Computer Vision MT25, Image classification
Flashcards
General classification
Suppose:
- $f$ is a classifier
- $\mathcal D = ((x _ i, y _ i))^N _ {i=1}$ is some dataset
@Define the accuracy and expected accuracy.
The accuracy is the fraction of correct predictions:
\[\text{Acc}(f) = \frac{1}{N} \sum _ {(x, y) \in \mathcal D} \mathbb 1[f(x) = y]\]If the classifier predicts probabilities, then the expected accuracy is given by
\[\text{EAcc}(f) = \frac{1}{N} \sum _ {(x, y) \in \mathcal D} f _ y(x)\]where $f _ y(x)$ is the predicted probability of $x$ belonging to class $y$.
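The two definitions above can be sketched in NumPy (the function names and the toy `probs`/`labels` arrays are illustrative, not from the course):

```python
import numpy as np

def accuracy(preds, labels):
    # Fraction of examples where the predicted class matches the label.
    return float(np.mean(preds == labels))

def expected_accuracy(probs, labels):
    # probs: (N, C) array of predicted class probabilities.
    # Average predicted probability assigned to the true class of each example.
    return float(np.mean(probs[np.arange(len(labels)), labels]))

probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([0, 1, 1])
preds = probs.argmax(axis=1)
print(accuracy(preds, labels))           # 2/3: the last example is misclassified
print(expected_accuracy(probs, labels))  # (0.9 + 0.7 + 0.4) / 3 = 2/3
```

Note that accuracy only looks at the arg-max class, while expected accuracy is sensitive to how confident the classifier is.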
Image embeddings
@Define a feature extractor.
A map from images to image embeddings, e.g.
\[\phi : \mathbb R^{H \times W \times 3} \to \mathbb R^d\]
Name some @example image embeddings, and some example classifiers that could be used on these embeddings.
- Image embeddings:
    - Fourier transform
    - Bag of visual words (SIFT)
    - Histogram of oriented gradients (HOG)
- Classifiers:
    - SVMs
    - Kernel SVMs
    - Random forests
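As a minimal sketch of one such hand-crafted embedding, here is a per-channel colour histogram (a simpler cousin of the embeddings listed above; the function name and bin count are my own choices):

```python
import numpy as np

def colour_histogram(image, bins=8):
    # image: (H, W, 3) array with values in [0, 1].
    # Build one histogram per colour channel and concatenate them,
    # giving a 3 * bins dimensional embedding.
    feats = [np.histogram(image[..., c], bins=bins, range=(0.0, 1.0))[0]
             for c in range(3)]
    feats = np.concatenate(feats).astype(float)
    # Normalise so the embedding is invariant to image size.
    return feats / feats.sum()

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
emb = colour_histogram(img)
print(emb.shape)  # (24,)
```

Any of the listed classifiers (SVMs, random forests, ...) could then be trained on these fixed-length vectors instead of the raw pixels.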
Suppose $\phi : \mathbb R^{H \times W \times 3} \to \mathbb R^d$ is a feature extractor that maps images to embeddings. Briefly explain the @algorithm for classifying an image using nearest neighbour classification.
Given an image
- Find its embedding with $\phi$
- Look up nearest neighbours in the embedding space
- The predicted class is the majority vote of the neighbourhood
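The steps above can be sketched directly over precomputed embeddings (the helper name and the 2-D toy embeddings are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(query_emb, train_embs, train_labels, k=3):
    # Euclidean distance from the query embedding to every training embedding.
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    # Indices of the k nearest neighbours in embedding space.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbours' labels.
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

# Two tight clusters of embeddings, one per class.
train_embs = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
train_labels = np.array([0, 0, 1, 1])
print(knn_predict(np.array([0.05, 0.05]), train_embs, train_labels, k=3))  # 0
```

In practice the training embeddings would come from applying $\phi$ to every training image once, up front.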
Suppose that a $k$-NN classifier is used to classify images of boats and deer. @Visualise what the accuracy on a validation set might look like as you increase $k$, and why it’s not a good idea to make $k$ as high as possible.

There is a balance to be struck: if $k$ equals the size of the training set, every query receives the same majority vote, so the classifier always predicts whichever class is most common in the training set.
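This failure mode can be demonstrated on synthetic data (the cluster locations, class sizes, and seed are all invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(q, X, y, k):
    # Majority vote over the k nearest training points.
    nearest = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(1)
# Imbalanced training set: 30 "boat" points near (0, 0), 10 "deer" near (3, 3).
X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(3.0, 0.5, (10, 2))])
y = np.array([0] * 30 + [1] * 10)
# Balanced validation set drawn from the same two clusters.
val_X = np.vstack([rng.normal(0.0, 0.5, (5, 2)), rng.normal(3.0, 0.5, (5, 2))])
val_y = np.array([0] * 5 + [1] * 5)

accs = {k: float(np.mean([knn_predict(q, X, y, k) == t
                          for q, t in zip(val_X, val_y)]))
        for k in (1, 5, len(y))}
print(accs)
# With k = 40 (all training points) the vote is always 30 "boat" vs 10 "deer",
# so every query is labelled "boat" and accuracy collapses to 0.5.
```

Small $k$ separates the two well-spaced clusters almost perfectly; $k$ equal to the training-set size ignores the query entirely.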