Computer Vision MT25, Lucas-Kanade tracking


Flashcards

@State the general problem that Lucas-Kanade tracking tries to solve, and give examples of what each might be in the problem of trying to identify Pac-Man on a game screen.

We have:

  • An image $I$ (the game screen)
  • A template $T$, which we wish to locate within the image (a small patch of Pac-Man)
  • A guess of the current location of the template within the image, given by $W(\pmb x, \pmb p)$, where:
    • $\pmb x$ is an arbitrary point in the template
    • $W(\cdot, \pmb p)$ is a warp from the template coordinates to the image coordinates, parameterised by $\pmb p$
    • (e.g. $W$ is affine, mapping $x$ to $x + (t _ x, t _ y)$, which we hope to be the location of Pac-Man in image coordinates)

We want:

  • Some update $\Delta \pmb p$ such that $ \vert \vert I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x) \vert \vert _ 2$ is minimised over all $\pmb x$ (i.e. an improvement to our guess of where Pac-Man is inside the image)

In Lucas-Kanade tracking, we have:

  • An image $I$ (e.g. a game screen)
  • A template $T$, which we wish to locate within the image (e.g. a small patch of Pac-Man)
  • A guess of the current location of the template within the image, given by $W(\pmb x, \pmb p)$, where:
    • $\pmb x$ is an arbitrary point in the template
    • $W(\cdot, \pmb p)$ is a warp from the template coordinates to the image coordinates, parameterised by $\pmb p$
    • (e.g. $W$ is affine, mapping $x$ to $x + (t _ x, t _ y)$, which we hope to be the location of Pac-Man in image coordinates)

and we want:

  • Some update $\Delta \pmb p$ such that $ \vert \vert I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x) \vert \vert _ 2$ is minimised over all $\pmb x$ (e.g. an improvement to our guess of where Pac-Man is inside the image)

What does the loss/energy function $E$ look like in terms of $\Delta \pmb p$, and how does Lucas-Kanade tracking approximate this?

In general, we have:

\[E(\Delta \pmb p) = \sum _ {x \in T} (I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x))^2\]

Lucas-Kanade tracking approximates this via a first-order Taylor expansion as:

\[E(\Delta \pmb p) \approx \sum _ {x \in T} \left(I(W(\pmb x, \pmb p)) + \nabla I^\top \frac{\partial W}{\partial \pmb p} \Delta \pmb p - T(x)\right)^2\]

In Lucas-Kanade tracking, we have:

  • An image $I$ (e.g. a game screen)
  • A template $T$, which we wish to locate within the image (e.g. a small patch of Pac-Man)
  • A guess of the current location of the template within the image, given by $W(\pmb x, \pmb p)$, where:
    • $\pmb x$ is an arbitrary point in the template
    • $W(\cdot, \pmb p)$ is a warp from the template coordinates to the image coordinates, parameterised by $\pmb p$
    • (e.g. $W$ is affine, mapping $x$ to $x + (t _ x, t _ y)$, which we hope to be the location of Pac-Man in image coordinates)

and we want:

  • Some update $\Delta \pmb p$ such that $ \vert \vert I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x) \vert \vert _ 2$ is minimised over all $\pmb x$ (e.g. an improvement to our guess of where Pac-Man is inside the image)

In general, we aim to minimise the energy function

\[E(\Delta \pmb p) = \sum _ {x \in T} (I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x))^2\]

Lucas-Kanade tracking approximates this via a first-order Taylor expansion as

\[E(\Delta \pmb p) \approx \sum _ {x \in T} \left(I(W(\pmb x, \pmb p)) + \nabla I^\top \frac{\partial W}{\partial \pmb p} \Delta \pmb p - T(x)\right)^2\]

@State the update that this leads to.

The optimality condition $\partial E / \partial \Delta \pmb p = \pmb 0$ gives the normal equations $\pmb M \Delta \pmb p = \pmb b$, hence

\[\Delta \pmb p = \pmb M^{-1} \pmb b\]

For each template pixel $\pmb x \in T$, define the steepest-descent vector ($n = \dim \pmb p$) and the residual:

\[\pmb s(\pmb x)^\top = \nabla I\big(W(\pmb x, \pmb p)\big)^\top \left.\frac{\partial W}{\partial \pmb p}\right \vert _ {(\pmb x, \, \pmb p)}, \qquad r(\pmb x) = T(\pmb x) - I\big(W(\pmb x, \pmb p)\big)\]

Then

\[\pmb M = \sum _ {\pmb x \in T} \pmb s(\pmb x) \, \pmb s(\pmb x)^\top, \qquad \pmb b = \sum _ {\pmb x \in T} \pmb s(\pmb x) \, r(\pmb x)\]

Evaluation points:

  • $\pmb x$ ranges over template-domain coordinates.
  • $\frac{\partial W}{\partial \pmb p}$ is the $2 \times n$ warp Jacobian at $(\pmb x, \pmb p)$, depending on $\pmb x$ directly through the warp formula.
  • $\nabla I$ is the image gradient sampled at the warped point $W(\pmb x, \pmb p)$, not at $\pmb x$.
  • $T(\pmb x)$ uses the template at $\pmb x$, while $I(W(\pmb x, \pmb p))$ uses the image at the warped point.

Both $\pmb M$ and $\pmb b$ also depend on the current $\pmb p$, so they are recomputed each iteration.

@exam~

In Lucas-Kanade tracking, we have:

  • An image $I$ (e.g. a game screen)
  • A template $T$, which we wish to locate within the image (e.g. a small patch of Pac-Man)
  • A guess of the current location of the template within the image, given by $W(\pmb x, \pmb p)$, where:
    • $\pmb x$ is an arbitrary point in the template
    • $W(\cdot, \pmb p)$ is a warp from the template coordinates to the image coordinates, parameterised by $\pmb p$
    • (e.g. $W$ is affine, mapping $x$ to $x + (t _ x, t _ y)$, which we hope to be the location of Pac-Man in image coordinates)

and we want:

  • Some update $\Delta \pmb p$ such that $ \vert \vert I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x) \vert \vert _ 2$ is minimised over all $\pmb x$ (e.g. an improvement to our guess of where Pac-Man is inside the image)

In general, we aim to minimise the energy function

\[E(\Delta \pmb p) = \sum _ {x \in T} (I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x))^2\]

Lucas-Kanade tracking approximates this via a first-order Taylor expansion as

\[E(\Delta \pmb p) \approx \sum _ {x \in T} \left(I(W(\pmb x, \pmb p)) + \nabla I^\top \frac{\partial W}{\partial \pmb p} \Delta \pmb p - T(x)\right)^2\]

@Prove that this leads to the update

\[\Delta \pmb p = \pmb M^{-1} \pmb b\]

where

\[\begin{aligned} \pmb M &= \sum _ {\pmb x \in T} \left( \nabla I^\top \frac{\partial W}{\partial \pmb p} \right)^\top \left( \nabla I^\top \frac{\partial W}{\partial \pmb p} \right) \\ \pmb b &= \sum _ {\pmb x \in T} \left( \nabla I^\top \frac{\partial W}{\partial \pmb p} \right)^\top (T(\pmb x) - I(W(\pmb x, \pmb p))) \end{aligned}\]

Write the per-pixel steepest-descent row vector and linearised residual

\[J(\pmb x) := \nabla I\big(W(\pmb x, \pmb p)\big)^\top \frac{\partial W}{\partial \pmb p} \in \mathbb R^{1 \times n}, \qquad r(\pmb x) := I\big(W(\pmb x, \pmb p)\big) + J(\pmb x)\,\Delta \pmb p - T(\pmb x),\]

so the Taylor-approximated energy is just a sum of squares, $E(\Delta \pmb p) = \sum _ {\pmb x \in T} r(\pmb x)^2$.

Differentiate: each summand is a square, so the chain rule produces an extra factor of the residual $r(\pmb x)$ (the summand itself) alongside the derivative of the inside:

\[\frac{\partial E}{\partial \Delta \pmb p} = \sum _ {\pmb x \in T} \frac{\partial}{\partial \Delta \pmb p}\, r(\pmb x)^2 = \sum _ {\pmb x \in T} 2\, r(\pmb x)\, \frac{\partial r(\pmb x)}{\partial \Delta \pmb p}.\]

Only the term $J(\pmb x)\,\Delta \pmb p$ inside $r(\pmb x)$ depends on $\Delta \pmb p$, and $r(\pmb x)$ is a scalar while $\Delta \pmb p$ is a vector, so $\dfrac{\partial r(\pmb x)}{\partial \Delta \pmb p} = J(\pmb x)^\top$ (an $n \times 1$ column — this is where the transpose comes from). Substituting, and keeping the residual fully inside the bracket (all three terms, including $-T(\pmb x)$):

\[\frac{\partial E}{\partial \Delta \pmb p} = 2 \sum _ {\pmb x \in T} J(\pmb x)^\top \Big(I\big(W(\pmb x, \pmb p)\big) + J(\pmb x)\,\Delta \pmb p - T(\pmb x)\Big).\]

Set to zero and rearrange: split the residual into its $\Delta \pmb p$-dependent part $J(\pmb x)\,\Delta \pmb p$ and its constant part $I(W(\pmb x, \pmb p)) - T(\pmb x)$, then move the constant to the right-hand side:

\[\underbrace{\Big(\sum _ {\pmb x \in T} J(\pmb x)^\top J(\pmb x)\Big)} _ {\pmb M}\,\Delta \pmb p = \underbrace{\sum _ {\pmb x \in T} J(\pmb x)^\top \big(T(\pmb x) - I(W(\pmb x, \pmb p))\big)} _ {\pmb b}.\]

Since $J(\pmb x) = \nabla I^\top \frac{\partial W}{\partial \pmb p}$, the left factor $J(\pmb x)^\top$ is exactly $\big(\nabla I^\top \frac{\partial W}{\partial \pmb p}\big)^\top$, so this matches the stated $\pmb M$ and $\pmb b$. The Gauss-Newton matrix $\pmb M$ is symmetric positive semi-definite, and is invertible whenever the template has gradient content in $n$ independent directions, giving

\[\Delta \pmb p = \pmb M^{-1} \pmb b.\]

Bite-sized

Lucas-Kanade tracking was introduced in 1981 by Lucas & Kanade. The original motivation was to make brightness-constancy-based correspondence estimation more efficient than naive template matching by formulating it as a continuous optimisation problem.

Source: Lecture 1, 1981: Lucas-Kanade Template Tracking slide.

@bite~

@Justify the brightness constancy assumption that underlies Lucas-Kanade tracking and optical flow.

The brightness constancy assumption says: when a scene point moves between frames, its observed image intensity doesn’t change. Formally, for a pixel at $(x, y, t)$ that moves by $(\Delta x, \Delta y)$ in time $\Delta t$:

\[I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t).\]

This is the only equation tying the unknown motion to observable data. It’s strong enough to convert an under-determined correspondence problem into a tractable optimisation, but breaks under:

  • Lighting changes: a moving point’s intensity changes if illumination shifts.
  • Specular highlights: glints move with the camera, not the scene.
  • Occlusion: a previously-visible point disappears.
  • Large motion: the linearisation (1st-order Taylor) only works for small $\Delta x, \Delta y$.

These are the standard failure modes for both LK tracking and optical flow.

Source: Lecture 13, Optical Flow – The Beginnings slide.

@bite~

@Describe the drift failure mode of Lucas-Kanade tracking and two standard mitigations.

Drift: if the template is updated each frame to compensate for genuine appearance changes (e.g. rotation, scale, lighting), small errors accumulate over time. The template gradually picks up background pixels, then more background, until the tracker “sticks” to the background and loses the actual object.

Mitigations:

  1. Keep the original template fixed (do not update). Works well when appearance is stable but fails on long sequences with real appearance change.
  2. Mask the template using a segmentation mask (foreground only). The mask suppresses background pixels so they cannot leak into the template even if updates are applied.
  3. Periodically re-detect with an independent detector to correct the tracker (used in TLD-style tracking-by-detection).

Source: Lecture 13, LK Tracker Insights slide.

@bite~

The generalised Lucas-Kanade framework works for any differentiable warp $W(\pmb x, \pmb p)$ — the lecture gives explicit Jacobians for translation, translation+rotation, translation+scaling, and full affine warps. The same derivation also lets you derive an LK-style tracker for homographies, perspective warps, etc.

Source: Lecture 13, Generalised LK Tracking slide.

@bite~ @exam~

@State the Jacobian $\partial W / \partial \pmb p$ for a translation+scaling warp $W(\pmb x, \pmb p) = (s x + t _ x, s y + t _ y)$ with $\pmb p = (t _ x, t _ y, s)$.

\[\frac{\partial W}{\partial \pmb p} = \begin{pmatrix} \partial W _ x / \partial t _ x & \partial W _ x / \partial t _ y & \partial W _ x / \partial s \\ \partial W _ y / \partial t _ x & \partial W _ y / \partial t _ y & \partial W _ y / \partial s \end{pmatrix} = \begin{pmatrix} 1 & 0 & x \\ 0 & 1 & y \end{pmatrix}.\]

This is a $2 \times 3$ matrix evaluated at each pixel $\pmb x = (x, y)$. The Jacobian enters the LK update via the per-pixel “steepest descent image” $\nabla I^\top (\partial W / \partial \pmb p)$.

Source: Lecture 13, Jacobian slide.

@bite~

The generalised Lucas-Kanade tracker (∆bite-lk-any-differentiable-warp) works for any differentiable warp $W(\pmb x, \pmb p)$, using the $2 \times n$ warp Jacobian $\frac{\partial W}{\partial \pmb p}$ evaluated per pixel $\pmb x = (x, y)$.

@Compute $\frac{\partial W}{\partial \pmb p}$ for the translation, translation+rotation, translation+scaling, and affine warps.

Translation ($\pmb p = (t _ x, t _ y)$, $W = (x + t _ x,\ y + t _ y)$):

\[\frac{\partial W}{\partial \pmb p} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\]

Translation + rotation ($\pmb p = (t _ x, t _ y, \theta)$, $W = (x\cos\theta - y\sin\theta + t _ x,\ x\sin\theta + y\cos\theta + t _ y)$):

\[\frac{\partial W}{\partial \pmb p} = \begin{pmatrix} 1 & 0 & -x\sin\theta - y\cos\theta \\ 0 & 1 & x\cos\theta - y\sin\theta \end{pmatrix}\]

Translation + scaling ($\pmb p = (t _ x, t _ y, s)$, $W = (sx + t _ x,\ sy + t _ y)$):

\[\frac{\partial W}{\partial \pmb p} = \begin{pmatrix} 1 & 0 & x \\ 0 & 1 & y \end{pmatrix}\]

Affine ($\pmb p = (p _ {11}, \ldots, p _ {23})$, $W = (p _ {11}x + p _ {12}y + p _ {13},\ p _ {21}x + p _ {22}y + p _ {23})$):

\[\frac{\partial W}{\partial \pmb p} = \begin{pmatrix} x & y & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x & y & 1 \end{pmatrix}\]

Each Jacobian is $2 \times n$, with rows $\partial W _ x / \partial \pmb p$ and $\partial W _ y / \partial \pmb p$, evaluated at the pixel $\pmb x = (x, y)$. It enters the LK update through the steepest-descent image $\nabla I^\top \frac{\partial W}{\partial \pmb p}$ (∆lk-update-rule-statement).

Source: Lecture 13, Generalised LK Tracking and Jacobian slides.

@bite~ @example~

A practical speed-up for Lucas-Kanade is the inverse compositional algorithm of Baker & Matthews: rather than recomputing the Hessian-like matrix $\pmb M$ each iteration, you can precompute the template gradients and warp “the other way” so $\pmb M$ becomes constant across iterations and only $\pmb b$ varies. This dramatically cuts per-iteration cost.

Source: Lecture 13, Summary of Generalised LK Tracking slide; Baker & Matthews, Lucas-Kanade 20 Years On.

@bite~ @exam~

For pure translation tracking, the LK update reduces to a $2 \times 2$ linear system $\pmb M \binom{\delta _ x}{\delta _ y} = \pmb b$, where $\pmb M = \sum \nabla I \nabla I^\top$ is the image-gradient outer-product sum (the “structure tensor”) and $\pmb b$ is the temporal-difference term. The solution is given in closed form by $\pmb M^{-1} \pmb b$, and $\pmb M$ must be invertible — meaning the template region must have gradient content in two non-parallel directions (the aperture problem).

Source: Lecture 13, Derivation and Matrix Form slides.

@bite~

@State the strategy for deriving the generalised LK update rule (∆lk-update-rule-derivation), in 4 steps without algebra.

  • Linearise: replace the non-linear residual $I(W(\pmb x, \pmb p + \Delta \pmb p)) - T(\pmb x)$ with its first-order Taylor expansion around the current $\pmb p$, producing a quadratic-in-$\Delta \pmb p$ objective $E(\Delta \pmb p)$.
  • Recognise as least squares: stacking the per-pixel residuals into a vector, $E(\Delta \pmb p) = \|A \Delta \pmb p - r\|^2$ where each row of $A$ is the “steepest descent image” $\nabla I^\top \partial W / \partial \pmb p$ at one pixel and $r$ is the temporal-difference residual $T(\pmb x) - I(W(\pmb x, \pmb p))$.
  • Set the gradient to zero: $\partial E / \partial \Delta \pmb p = 0$ gives the normal equations $A^\top A \, \Delta \pmb p = A^\top r$, i.e. $\pmb M \Delta \pmb p = \pmb b$.
  • Invert: $\Delta \pmb p = \pmb M^{-1} \pmb b$, with $\pmb M$ the Gauss-Newton approximation to the Hessian and $\pmb b$ proportional to the (negative) gradient at $\Delta \pmb p = 0$. Iterate by updating $\pmb p \leftarrow \pmb p + \Delta \pmb p$.

This is just one step of Gauss-Newton on the original non-linear least-squares problem; the LK iteration is re-linearise + solve repeated until $\Delta \pmb p$ is small.

Source: Lecture 13, Derivation, General Energy Formulation, and Solving the General Case slides.

@bite~ @proofsupport~