Computer Vision MT25, Homogenous coordinates and homographies


Flashcards

@Define how the point $(x, y)$ would be represented in homogeneous coordinates.

\[\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}\]

(technically this is an element of the equivalence class ${(\lambda x, \lambda y, \lambda) \mid \lambda \in \mathbb R \setminus {0}}$)

@State the affine transformation matrix corresponding to a translation by $\delta _ x$, $\delta _ y$ in homogeneous coordinates.

\[\begin{pmatrix} 1 & 0 & \delta _ x \\ 0 & 1 & \delta _ y \\ 0 & 0 & 1 \end{pmatrix}\]

@State the affine transformation matrix corresponding to a uniform scaling by a factor of $s$ in homogeneous coordinates.

\[\begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \\ \end{pmatrix}\]

@State the affine transformation matrix corresponding to an anti-clockwise rotation by angle $\theta$ in homogeneous coordinates.

\[\begin{pmatrix} \cos \theta & -\sin\theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

@State the affine transformation matrix corresponding to horizontal shearing by a factor of $m$ in homogenous coordinates.

\[\begin{pmatrix} 1 & m & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

@State the affine transformation matrix corresponding to vertical shearing by a factor of $m$ in homogenous coordinates.

\[\begin{pmatrix} 1 & 0 & 0 \\ m & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

@State and @define three properties that all affine transformations preserve.

  • Collinearity: points remain collinear if they lie on a straight line,
  • Parallelism: lines remain parallel,
  • Convexity: convex sets of points remain convex.

Suppose you have an affine transformation matrix

\[\begin{pmatrix} a _ {11} & a _ {12} & a _ {13} \\ a _ {21} & a _ {22} & a _ {23} \\ 0 & 0 & 1 \end{pmatrix}\]

How could you compute the source pixel corresponding to an output pixel $(x, y)$ when using a backward warp?

\[\begin{pmatrix} a _ {11} & a _ {12} & a _ {13} \\ a _ {21} & a _ {22} & a _ {23} \\ 0 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}\]

Homographies

@Define a homography $H$.

A transformation of 2D homogeneous coordinates $p$ of the form

\[H\begin{pmatrix}x \\ y \\ z \end{pmatrix} = \begin{pmatrix} h _ {11} & h _ {12} & h _ {13} \\ h _ {21} & h _ {22} & h _ {23} \\ h _ {31} & h _ {32} & h _ {33} \end{pmatrix}\begin{pmatrix}x \\ y \\ z \end{pmatrix}\]

(since these coordinates are actually elements of an equivalence class, the matrix is only defined up to scale, so a homography has $8$ degrees of freedom rather than $9$. Some references rescale to fix $h _ {33} = 1$, but this fails when the true $h _ {33} = 0$, so it is cleaner to keep $h _ {33}$ symbolic and impose $|H| _ F = 1$ instead).

How many point-to-point correspondences are required to determine a homography?

$4$, as long as no $3$ points lie on the same line.

Suppose you have $4$ point-to-point correspondences $A _ 1, A _ 2, A _ 3, A _ 4$ and $B _ 1, B _ 2, B _ 3, B _ 4$.

Supposing that the entries of the homography matrix are

\[\begin{pmatrix} h _ {11} & h _ {12} & h _ {13} \\ h _ {21} & h _ {22} & h _ {23} \\ h _ {31} & h _ {32} & h _ {33} \end{pmatrix}\]

how could you solve for $H$?

Writing the correspondences down and then collecting them into a linear system, we have

\[\begin{pmatrix} - A _ {x,1} & - A _ {y,1} & -1 & 0 & 0 & 0 & B _ {x,1}A _ {x,1} & B _ {x,1}A _ {y,1} & B _ {x,1} \\ 0 & 0 & 0 & - A _ {x,1} & - A _ {y,1} & -1 & B _ {y,1}A _ {x,1} & B _ {y,1}A _ {y,1} & B _ {y,1} \\ - A _ {x,2} & - A _ {y,2} & -1 & 0 & 0 & 0 & B _ {x,2}A _ {x,2} & B _ {x,2}A _ {y,2} & B _ {x,2} \\ 0 & 0 & 0 & - A _ {x,2} & - A _ {y,2} & -1 & B _ {y,2}A _ {x,2} & B _ {y,2}A _ {y,2} & B _ {y,2} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & - A _ {x,4} & - A _ {y,4} & -1 & B _ {y,4}A _ {x,4} & B _ {y,4}A _ {y,4} & B _ {y,4} \end{pmatrix} \begin{pmatrix} h _ {11}\\ h _ {12}\\ h _ {13}\\ h _ {21}\\ h _ {22}\\ h _ {23}\\ h _ {31}\\ h _ {32}\\ h _ {33} \end{pmatrix} = 0\]

Hence $H$ lies in the null space of this matrix, which can be found via the SVD.

@Visualise how reversing the homography for the image on the right and overlaying the image on the left onto the image on the right would look.

How may a line $l$ be represented in homogenous coordinates?

Lines are defined via $ax + by + c = 0$, so taking $l= (a, b, c)^\top$ and $x = (x, y, 1)^\top$, we may write this as

\[l^\top x = 0\]

@Define the cross product $a \times b$. How may it be written as a matrix multiplication?

\[a \times b = \begin{pmatrix} a _ 2 b _ 3 - a _ 3 b _ 2 \\ a _ 3 b _ 1 - a _ 1 b _ 3 \\ a _ 1 b _ 2 - a _ 2 b _ 1 \end{pmatrix} = \begin{bmatrix} 0 & -a _ 3 & a _ 2 \\ a _ 3 & 0 & -a _ 1 \\ -a _ 2 & a _ 1 & 0 \end{bmatrix} \begin{pmatrix} b _ 1 \\ b _ 2 \\ b _ 3 \end{pmatrix} := [a _ \times] b\]

Geometric meaning: $a \times b$ is perpendicular to both $a$ and $b$, with magnitude $ \vert a \vert \, \vert b \vert \,\sin\theta$, equal to the area of the parallelogram spanned by $a$ and $b$.

In homogeneous coordinates, @define $u \cong v$ for two vectors $u, v$.

$u, v$ are equal up to a non-zero scalar multiple.

Bite-sized

In homogeneous coordinates, a point with last coordinate $z = 0$ — for example $(x, y, 0)^\top$ — represents a point at infinity in the direction $(x, y)$. These are the “ideal points” that compactify the affine plane into the projective plane.

Source: Lecture 5, Homogeneous Coordinates II slide.

@bite~

A homography has 8 degrees of freedom (a $3 \times 3$ matrix has 9 entries but is only defined up to scale). An affine transformation has 6 degrees of freedom (top two rows of a $3 \times 3$ matrix with last row $(0, 0, 1)$).

Source: Lecture 5, Homography - Degrees of Freedom and Homography as a Matrix slides.

@bite~

@Justify why affine transformations preserve parallelism but a general homography does not.

An affine map has the form $A \pmb x + \pmb t$ for some linear $A$ and translation $\pmb t$. Linear maps preserve linear combinations, and a translation acts uniformly on all points, so two parallel lines $\ell _ 1, \ell _ 2$ are sent to two lines that remain parallel.

A homography includes a perspective component (the bottom row of $H$ is not $(0, 0, 1)$). This causes points at infinity to map to finite points: the vanishing point. Concretely, two lines parallel in the source plane meet at the same point at infinity, but the homography sends that point to a finite vanishing point in the target — so the lines now intersect there. Hence parallelism is not preserved.

What is preserved by a general homography: collinearity, incidence, and the cross-ratio of four collinear points.

Source: Lecture 2, Affine Transformation Matrix slide; Lecture 5, Homography discussion.

@bite~

@Justify why a backward warp is generally preferred over a forward warp when applying a geometric transformation to an image.

  • Forward warp $T(x, y) = (x’, y’)$: iterate over each source pixel and write it into the target at $(x’, y’)$. Two problems:
    • Holes: many target pixels are never written (since $T$ may not map any integer source to them).
    • Pile-ups: several source pixels can map to the same target — need to handle collisions.
  • Backward warp $T^{-1}(x’, y’) = (x, y)$: iterate over each target pixel and look up its colour from the source. Every target pixel is written exactly once, no holes, no collisions. If the source coordinates are non-integer, interpolate (bilinear, bicubic, etc.).

The backward formulation is just an inverse-and-iterate, and it removes both of the forward-warp pathologies in a single move.

Source: Lecture 2, Forward Warps and Backward Warps slides.

@bite~

A homogeneous linear system $A \pmb p = \pmb 0$ subject to $\ \vert \pmb p\ \vert = 1$ is solved by taking $\pmb p$ to be the right singular vector of $A$ corresponding to its smallest singular value — equivalently the eigenvector of $A^\top A$ with the smallest eigenvalue. This is the SVD trick used to solve for the homography matrix, the fundamental matrix, and the camera projection matrix.

Source: Lecture 5, Homogeneous Linear System slide.

@bite~

@Describe how composition of affine transformations works in homogeneous coordinates and why ordering matters.

In homogeneous coordinates, each affine transformation is a $3 \times 3$ matrix with bottom row $(0, 0, 1)$. The set of such matrices is closed under matrix multiplication — so composing two affine maps gives another affine map, and we just multiply their matrices.

Ordering: matrix multiplication is non-commutative. For a point $\pmb x$ expressed as a column vector, transformations are applied right-to-left: $R T \pmb x$ means “first translate by $T$, then rotate by $R$”. The example from the lecture:

\[R T = \begin{pmatrix} \cos\theta & -\sin\theta & \delta _ x\cos\theta - \delta _ y\sin\theta \\ \sin\theta & \cos\theta & \delta _ x\sin\theta + \delta _ y\cos\theta \\ 0 & 0 & 1 \end{pmatrix} \ne T R = \begin{pmatrix} \cos\theta & -\sin\theta & \delta _ x \\ \sin\theta & \cos\theta & \delta _ y \\ 0 & 0 & 1 \end{pmatrix}.\]

So “translate then rotate” produces a different image to “rotate then translate”.

Source: Lecture 2, Combining Transformations slide.

@bite~

To convert from homogeneous coordinates $(x, y, w)^\top$ back to Euclidean 2D coordinates, divide by the last entry: $(x/w, y/w)$. This is the projection step in $\mathbb R^3 \to \mathbb R^2$ that gives perspective projection its $1/z$ depth dependence.

Source: Lecture 14, Homogeneous Coordinates III slide.

@bite~

A line $\ell^\top \pmb x = 0$ in 2D homogeneous coordinates passes through two points $\pmb p _ 1, \pmb p _ 2$ iff $\ell$ is perpendicular to both — i.e. $\ell = \pmb p _ 1 \times \pmb p _ 2$. Dually, the intersection of two lines $\ell _ 1, \ell _ 2$ is the point $\pmb p = \ell _ 1 \times \ell _ 2$. This point-line duality via cross-product is the workhorse identity for projective geometry computations.

Source: Lecture 5, Homography discussion + cross-product treatment.

@bite~




Related posts