Computer Vision MT25, Image classification


Flashcards

General classification

Suppose:

  • $f$ is a classifier
  • $\mathcal D = ((x _ i, y _ i))^N _ {i=1}$ is some dataset

@Define the accuracy and expected accuracy.


\[\text{Acc}(f) = \frac{1}{N} \sum _ {x, y \in \mathcal D} [y = f(x)]\]

If the classifier predicts probabilities, then the expected accuracy is given by

\[\text{EAcc}(f) = \frac{1}{N} \sum _ {x, y \in \mathcal D} f _ y(x)\]

where $f _ y(x)$ is the predicted probability of $x$ belonging to class $y$.
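
A minimal NumPy sketch of both definitions, assuming hard predictions for accuracy and an $N \times K$ matrix of predicted class probabilities for expected accuracy (the variable names and toy data are hypothetical):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of examples where the hard prediction matches the label
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def expected_accuracy(y_true, probs):
    # Mean probability assigned to the true class;
    # probs[i, k] is the predicted probability that example i has class k
    probs = np.asarray(probs)
    return np.mean(probs[np.arange(len(y_true)), np.asarray(y_true)])

# Hypothetical toy data with N = 3 examples and K = 2 classes
y_true = np.array([0, 1, 1])
probs = np.array([[0.7, 0.3],
                  [0.2, 0.8],
                  [0.6, 0.4]])
print(accuracy(y_true, probs.argmax(axis=1)))  # 2/3
print(expected_accuracy(y_true, probs))        # (0.7 + 0.8 + 0.4) / 3
```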

Image embeddings

@Define a feature extractor.


A map from images to image embeddings, e.g.

\[\phi : \mathbb R^{H \times W \times 3} \to \mathbb R^d\]

Name some @example image embeddings, and some example classifiers that could be used on these embeddings.


  • Image embeddings
    • Fourier transform
    • Bag of visual words (SIFT)
    • Histogram of oriented gradients (HOG)
  • Classifiers
    • SVMs
    • Kernel SVMs
    • Random forests
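
As one concrete instance of the embedding-plus-classifier recipe, here is a hedged sketch pairing HOG features with a kernel SVM; the data variables `train_images`, `train_labels` and `test_image` are assumptions, not defined in these notes:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog   # histogram-of-oriented-gradients descriptor
from sklearn.svm import SVC       # kernel SVM

def phi(image):
    # Feature extractor: H x W x 3 image -> d-dimensional HOG embedding
    return hog(rgb2gray(image), orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Assumed data: train_images is a list of equally sized RGB arrays, train_labels their class ids
# X = np.stack([phi(im) for im in train_images])
# clf = SVC(kernel="rbf").fit(X, train_labels)
# prediction = clf.predict(phi(test_image)[None, :])
```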

Suppose $\phi : \mathbb R^{H \times W \times 3} \to \mathbb R^d$ is a feature extractor that maps images to embeddings. Briefly explain the @algorithm for classifying an image using nearest neighbour classification.


Given an image

  1. Find its embedding with $\phi$
  2. Look up its $k$ nearest neighbours among the training embeddings
  3. The predicted class is the majority vote over those neighbours' labels
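
A minimal sketch of this procedure, assuming Euclidean distance in the embedding space and precomputed arrays `train_embeddings` and `train_labels` (both names are hypothetical):

```python
import numpy as np
from collections import Counter

def knn_classify(image, phi, train_embeddings, train_labels, k=5):
    z = phi(image)                                        # 1. embed the query image
    dists = np.linalg.norm(train_embeddings - z, axis=1)  # 2. distances to all training embeddings
    neighbours = np.argsort(dists)[:k]                    #    indices of the k nearest
    votes = Counter(train_labels[i] for i in neighbours)  # 3. majority vote over neighbour labels
    return votes.most_common(1)[0][0]
```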

Suppose that a $k$-NN classifier is used to classify images of boats and deer. @Visualise what the accuracy on a validation set might look like as you increase $k$, and explain why it's not a good idea to make $k$ as high as possible.


There is a balance to strike: increasing $k$ smooths out noise from individual neighbours, but if $k$ equals the size of the training set, the classifier ignores the query entirely and simply predicts according to the class proportions of the training set.
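
A sketch of how such a curve could be produced empirically, assuming a `classify(x, k=...)` function like the $k$-NN sketch above and a labelled validation set (all names here are hypothetical):

```python
import numpy as np

def accuracy_vs_k(val_images, val_labels, classify, ks=range(1, 101, 2)):
    # Validation accuracy for each k; typically this peaks at a moderate k and
    # decays towards the majority-class rate as k approaches the training-set size
    accs = []
    for k in ks:
        preds = [classify(x, k=k) for x in val_images]
        accs.append(np.mean(np.asarray(preds) == np.asarray(val_labels)))
    return list(ks), accs
```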

Multi-class classification

Suppose you are classifying a data set into $K$ classes, and have trained $K$ 1-vs-all classifiers $f _ 1, \ldots, f _ K$. How can you create an overall classifier $f : X \to \mathbb R^K$ which outputs a probability distribution over the classes?


\[f(x) = \text{softmax}(f _ 1(x), \ldots, f _ K(x))\]
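
A sketch of this combination step, assuming each $f _ k$ returns a real-valued score for "class $k$ vs the rest" (the function names are hypothetical):

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over a vector of scores
    z = scores - scores.max()
    e = np.exp(z)
    return e / e.sum()

def one_vs_all_to_distribution(classifiers, x):
    # classifiers is a list [f_1, ..., f_K] of one-vs-all score functions
    scores = np.array([f(x) for f in classifiers])
    return softmax(scores)
```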

@Define the softmax function with temperature, $\text{softmax}(\hat Y, \tau)$, @state three results that intuitively relate it to simply taking the argmax of a set of predictions, and @visualise how varying $\tau$ affects the resulting distribution for a fixed vector of scores.



\[(\text{softmax}(\hat Y, \tau)) _ k := \frac{\exp \frac{\hat Y _ k}{\tau}}{\sum _ j \exp \frac{\hat Y _ j}{\tau}}\]

where $\hat Y$ is the vector of predictions for each class. We have the results that:

  • It preserves the relative ordering of the components for any temperatures $\tau _ 1, \tau _ 2 > 0$: $\text{softmax} _ i(\hat Y, \tau _ 1) < \text{softmax} _ j(\hat Y, \tau _ 1) \implies \text{softmax} _ i(\hat Y, \tau _ 2) < \text{softmax} _ j(\hat Y, \tau _ 2)$
  • As $\tau \to \infty$, softmax tends to the uniform distribution over classes.
  • As $\tau \to 0^+$, softmax tends to argmax, i.e. a one-hot distribution on the largest prediction.
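
A small sketch illustrating these limiting behaviours; the score vector is made up for illustration:

```python
import numpy as np

def softmax_with_temperature(y_hat, tau):
    # softmax(Y_hat / tau), computed in a numerically stable way
    z = np.asarray(y_hat, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

y_hat = np.array([2.0, 1.0, 0.1])
print(softmax_with_temperature(y_hat, 100.0))  # high temperature: close to uniform
print(softmax_with_temperature(y_hat, 1.0))    # ordinary softmax
print(softmax_with_temperature(y_hat, 0.01))   # low temperature: close to one-hot at the argmax
```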



