Computer Vision MT25, Correspondences


Flashcards

@Define the problem of correspondences in computer vision.

Given multiple images of the same scene, we wish to find corresponding points.

RANSAC (RANdom SAmple Consensus) fits a model to outlier-contaminated data — e.g. geometric verification of feature matches (∆bite-geometric-verification-ransac). @Explain how its use of randomness and consensus makes it robust to outliers, where a least-squares fit over all the points would fail.

Loop: (1) randomly pick a minimal sample — the fewest points that define the model (2 for a line, 4 for a homography, 8 for the fundamental matrix); (2) fit the model to that sample alone; (3) score it by its consensus set — the points whose residual under the model is below a threshold (the inliers). Repeat many times; keep the model with the largest consensus set, then optionally refit on its inliers.

Randomness → robustness: a minimal sample is small enough that, fairly often, it contains only inliers, and an all-inlier fit is uncorrupted because no outlier ever entered it. Least-squares over all points has no such escape — a handful of outliers drag the estimate arbitrarily. Enough random draws make an outlier-free sample almost certain.

Consensus → robustness: models are ranked by inlier count, not by residual over all points. A model fit to a contaminated sample explains few points (small consensus) and is discarded; outliers are mutually inconsistent, so they can never build a large consensus for a wrong model. Only the true model explains all the inliers, so it wins — the scoring metric is itself outlier-robust.

Source: Lecture 5, Feature Matching slide (RANSAC geometric verification); Lecture 1 history (Torr & Murray 1997, robust fundamental-matrix estimation); examined in a past CV exam, Q1(f). The step-by-step randomness/consensus rationale is reasonable lecture-grounded reasoning, not stated verbatim on the slides.

@exam~

Bite-sized

A famous quote attributed to Takeo Kanade: “What are the three most important problems in computer vision?” Answer: correspondence, correspondence, correspondence.

Source: Lecture 5, What Are the Three Most Important Problems in Computer Vision? slide.

@bite~

@Justify why simply subtracting two images of the same scene (taken from different viewpoints) does not give a meaningful correspondence map.

Subtraction $I _ 1 - I _ 2$ only matches pixels at the same image coordinates. But under viewpoint change, the same scene point projects to different image coordinates in the two views (and the camera may have rotated, the lighting may have changed, etc.), so the pixel-aligned difference is dominated by misalignment rather than by genuine scene difference.

To find correspondences we need a method that searches for matching content at different image positions, robust to small geometric and photometric transformations — this is what SIFT-style local descriptors are designed for.

Source: Lecture 5, Finding Correspondences slide (Christ Church example).

@bite~

Brute-force feature matching pairs each descriptor in image 1 with the descriptor in image 2 of lowest Euclidean distance (or another metric). This naive strategy has cost $O(\#kp _ 1 \cdot \#kp _ 2 \cdot d)$ for $d$-dimensional descriptors, and is followed by sorting and taking the top $N$ matches.

Source: Lecture 5, Feature Matching slide.

@bite~

@Describe how geometric verification (e.g. RANSAC-style) refines a candidate set of feature matches.

After brute-force matching produces many candidate pairs (with inevitable false matches), geometric verification keeps only those pairs consistent with a single global geometric transformation.

The recipe is RANSAC-style:

  • Sample a minimal subset of correspondences (e.g. 4 for a homography, 8 for the fundamental matrix).
  • Fit a candidate transformation $T$ to that subset.
  • Score $T$ by counting inliers — other matches whose error under $T$ falls below a threshold.
  • Repeat many times and keep the $T$ with the most inliers; the final correspondence set is the inliers of the best $T$.

This both removes false matches and recovers the underlying geometry between the two views.

Source: Lecture 5, Feature Matching slide; concept developed further in Notes - Computer Vision MT25, Multiple view geometryU.

@bite~ @exam~

Correspondences are the underlying problem for many CV tasks. The lecture lists them: optical flow, stereo / disparity, tracking, image retrieval, multi-view 3D reconstruction — all are “correspondence problems at heart”, and algorithms developed for one (e.g. Lucas-Kanade) transfer naturally to the others.

Source: Lecture 13, A Note on Correspondences slide.

@bite~

Visualising matching and RANSAC

Matching pairs each keypoint with its ∆bite-brute-force-feature-matching nearest-neighbour descriptor, filters the ambiguous pairs with Lowe’s ratio test, and then uses ∆ransac-randomness-and-consensus to keep only the geometrically consistent matches. Hover a keypoint to inspect its nearest and second-nearest match and the ratio; lower the threshold to prune ambiguous matches; then run RANSAC to watch its inlier consensus separate the true matches (green) from the survivors that are geometrically wrong (red).

To match two images we describe each keypoint by a vector and pair it with its nearest-neighbour descriptor in the other image. Ambiguous matches are filtered by Lowe's ratio test (accept only if \(d_1 / d_2 < \tau\), the nearest over second-nearest distance) and by a mutual check. The surviving matches still contain outliers, so RANSAC repeatedly fits a model to a minimal random sample and keeps the one with the largest set of agreeing matches (its consensus) — that consensus is what makes it robust. Hover a keypoint on the left to inspect its match.
ratio \(\tau\)mutual NN

Green = match kept; faint red = rejected by the ratio test. After RANSAC: thick green = inlier (consensus), red = outlier.