Computer Vision MT25, Correspondences
Flashcards
@Define the problem of correspondences in computer vision.
Given multiple images of the same scene, we wish to find corresponding points.
RANSAC (RANdom SAmple Consensus) fits a model to outlier-contaminated data — e.g. geometric verification of feature matches (∆bite-geometric-verification-ransac). @Explain how its use of randomness and consensus makes it robust to outliers, where a least-squares fit over all the points would fail.
Loop: (1) randomly pick a minimal sample — the fewest points that define the model (2 for a line, 4 for a homography, 8 for the fundamental matrix); (2) fit the model to that sample alone; (3) score it by its consensus set — the points whose residual under the model is below a threshold (the inliers). Repeat many times; keep the model with the largest consensus set, then optionally refit on its inliers.
Randomness → robustness: a minimal sample is small enough that, fairly often, it contains only inliers, and an all-inlier fit is uncorrupted because no outlier ever entered it. Least-squares over all points has no such escape — a handful of outliers drag the estimate arbitrarily. Enough random draws make an outlier-free sample almost certain.
Consensus → robustness: models are ranked by inlier count, not by residual over all points. A model fit to a contaminated sample explains few points (small consensus) and is discarded; outliers are mutually inconsistent, so they can never build a large consensus for a wrong model. Only the true model explains all the inliers, so it wins — the scoring metric is itself outlier-robust.
Bite-sized
A famous quote attributed to Takeo Kanade: “What are the three most important problems in computer vision?” Answer: correspondence, correspondence, correspondence.
@Justify why simply subtracting two images of the same scene (taken from different viewpoints) does not give a meaningful correspondence map.
Subtraction $I _ 1 - I _ 2$ only matches pixels at the same image coordinates. But under viewpoint change, the same scene point projects to different image coordinates in the two views (and the camera may have rotated, the lighting may have changed, etc.), so the pixel-aligned difference is dominated by misalignment rather than by genuine scene difference.
To find correspondences we need a method that searches for matching content at different image positions, robust to small geometric and photometric transformations — this is what SIFT-style local descriptors are designed for.
Brute-force feature matching pairs each descriptor in image 1 with the descriptor in image 2 of lowest Euclidean distance (or another metric). This naive strategy has cost $O(\#kp _ 1 \cdot \#kp _ 2 \cdot d)$ for $d$-dimensional descriptors, and is followed by sorting and taking the top $N$ matches.
@Describe how geometric verification (e.g. RANSAC-style) refines a candidate set of feature matches.
After brute-force matching produces many candidate pairs (with inevitable false matches), geometric verification keeps only those pairs consistent with a single global geometric transformation.
The recipe is RANSAC-style:
- Sample a minimal subset of correspondences (e.g. 4 for a homography, 8 for the fundamental matrix).
- Fit a candidate transformation $T$ to that subset.
- Score $T$ by counting inliers — other matches whose error under $T$ falls below a threshold.
- Repeat many times and keep the $T$ with the most inliers; the final correspondence set is the inliers of the best $T$.
This both removes false matches and recovers the underlying geometry between the two views.
Correspondences are the underlying problem for many CV tasks. The lecture lists them: optical flow, stereo / disparity, tracking, image retrieval, multi-view 3D reconstruction — all are “correspondence problems at heart”, and algorithms developed for one (e.g. Lucas-Kanade) transfer naturally to the others.
Visualising matching and RANSAC
Matching pairs each keypoint with its ∆bite-brute-force-feature-matching nearest-neighbour descriptor, filters the ambiguous pairs with Lowe’s ratio test, and then uses ∆ransac-randomness-and-consensus to keep only the geometrically consistent matches. Hover a keypoint to inspect its nearest and second-nearest match and the ratio; lower the threshold to prune ambiguous matches; then run RANSAC to watch its inlier consensus separate the true matches (green) from the survivors that are geometrically wrong (red).