Computer Vision MT25, Camera models


Flashcards

@Define the single-view ambiguity.


The observation that a single image is not enough to determine the true 3D location of objects in the scene (even given perfect knowledge of the camera itself), because every point along a projection ray maps to the same image point.
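A minimal sketch of the ambiguity, assuming the pinhole projection $(x, y, z) \mapsto (f x / z, f y / z)$ in camera coordinates. The function name `project` and the sample point are illustrative choices, not part of the notes.

```python
import numpy as np

def project(point, f=1.0):
    """Pinhole projection of a 3D point given in camera coordinates."""
    x, y, z = point
    return np.array([f * x / z, f * y / z])

P = np.array([1.0, 2.0, 4.0])

# Every point s * P with s > 0 lies on the same ray through the optical centre,
# so all of them land on the same image point.
images = [project(s * P) for s in (1.0, 2.0, 10.0)]
same_image_point = all(np.allclose(images[0], im) for im in images)
```

Depth along the ray cannot be recovered from the image point alone, which is why a second view or extra scene knowledge is needed.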

@Define the camera coordinate system.


The coordinate system where:

  • The optical centre is at the origin (i.e. the location of the pinhole in the pinhole camera model, or the effective centre of projection more generally)
  • The $z$ axis is the optical axis, perpendicular to the image plane
  • The $xy$ plane is parallel to the image plane, $x$ is horizontal and $y$ is vertical

@Visualise (or draw) the relationship between a 3D point $P = (x, y, z)$ and its projection $P'$ onto the image plane given a focal length $f$ in the pinhole camera model and in the camera coordinate system, and @state the projection and location of the 2D point on the image plane.


Then we have

\[\begin{aligned} (x, y, z) &\mapsto \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \\ &=\left( f \frac x z, f \frac y z \right) \end{aligned}\]
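The matrix form above can be checked numerically. This is a small sketch with a made-up focal length and point; the key step is the division by the third homogeneous coordinate.

```python
import numpy as np

f = 0.05  # hypothetical focal length, in the same units as x, y, z

# Perspective projection matrix in homogeneous camera coordinates.
M = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

X = np.array([0.2, -0.1, 2.0, 1.0])      # homogeneous 3D point (x, y, z, 1)
u, v, w = M @ X                          # homogeneous image point (f x, f y, z)
image_point = np.array([u / w, v / w])   # divide by the third coordinate
```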

In what way is orthographic projection a special case of perspective projection?


It is the limiting case where the distance from the centre of projection to the image plane is infinite, so that all projection rays are parallel.

What matrix corresponds to orthographic projection in (homogeneous) camera coordinates?


\[\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\]
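As a quick illustration (with made-up points), the orthographic matrix simply drops the depth coordinate, so two points that differ only in $z$ map to the same image point:

```python
import numpy as np

# Orthographic projection matrix in homogeneous camera coordinates.
O = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

near = np.array([0.2, -0.1, 2.0, 1.0])    # (x, y, z, 1)
far = np.array([0.2, -0.1, 200.0, 1.0])   # same x, y but 100 times further away

ortho_near = O @ near   # (x, y, 1): depth is discarded
ortho_far = O @ far
```

Unlike perspective projection, there is no division by $z$, so apparent size does not change with distance.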

@Define the normalised coordinate system.


  • The camera centre is at the origin
  • The principal axis is the $z$-axis
  • The $x$ and $y$ axes of the image plane are parallel to the $x$ and $y$ axes of the world

@Define the camera calibration problem and give its typical factorisation into intrinsic and extrinsic camera parameters.


Camera calibration is the problem of determining the transformation from the world coordinate system to the image coordinate system.

\[\begin{pmatrix} \text{2D}\\ \text{point}\\ \boldsymbol{x}\\ (3\times 1) \end{pmatrix} \;\asymp\; \begin{pmatrix} \text{Camera to}\\ \text{pixel coord.}\\ \text{trans.\ matrix}\\ \boldsymbol{K}\ (3\times 3) \end{pmatrix} \begin{pmatrix} \textit{Canonical}\\ \text{projection matrix}\\ [I\mid 0]\ (3\times 4) \end{pmatrix} \begin{pmatrix} \text{World to}\\ \text{camera coord.}\\ \text{trans.\ matrix}\\ \begin{pmatrix} R & t\\[3pt] 0^{\mathsf T} & 1 \end{pmatrix}\ (4\times 4) \end{pmatrix} \begin{pmatrix} \text{3D}\\ \text{point}\\ \boldsymbol{X}\\ (4\times 1) \end{pmatrix}\]

where:

  • The first matrix represents the intrinsic camera parameters: principal point and scaling factors
  • The last matrix represents the extrinsic camera parameters: $\mathbf R$otation, $\mathbf t$ranslation of the camera (the middle matrix is the fixed canonical projection)

All together we have the general camera projection matrix $\mathbf P$ so that

\[\begin{aligned} \pmb x &= \mathbf K[\mathbf R \mid \pmb t] \pmb X \\ &= \mathbf P \mathbf X \end{aligned}\]

@Define the principal point $\pmb p$. Where is it in the normalised coordinate system and the image coordinate system?


  • $\pmb p$: The point where the principal axis (i.e. the axis through the camera centre perpendicular to the image plane) intersects the image plane
  • In the normalised coordinate system: at the origin, since the image origin is placed at the principal point
  • In the image coordinate system: at $(p _ x, p _ y)$, since the origin is at a corner of the image

In the camera calibration problem, we wish to determine the transformation from the world coordinate system to the image coordinate system.

\[\begin{pmatrix} \text{2D}\\ \text{point}\\ \boldsymbol{x}\\ (3\times 1) \end{pmatrix} \;\asymp\; \begin{pmatrix} \text{Camera to}\\ \text{pixel coord.}\\ \text{trans.\ matrix}\\ \boldsymbol{K}\ (3\times 3) \end{pmatrix} \begin{pmatrix} \textit{Canonical}\\ \text{projection matrix}\\ [I\mid 0]\ (3\times 4) \end{pmatrix} \begin{pmatrix} \text{World to}\\ \text{camera coord.}\\ \text{trans.\ matrix}\\ \begin{pmatrix} \mathbf R & \pmb t\\[3pt] \pmb 0^{\top} & 1 \end{pmatrix}\ (4\times 4) \end{pmatrix} \begin{pmatrix} \text{3D}\\ \text{point}\\ \boldsymbol{X}\\ (4\times 1) \end{pmatrix}\]

where:

  • The first matrix represents the intrinsic camera parameters: principal point and scaling factors
  • The last matrix represents the extrinsic camera parameters: $\mathbf R$otation, $\mathbf t$ranslation of the camera (the middle matrix is the fixed canonical projection)

All together we have the general camera projection matrix $\mathbf P$ so that

\[\begin{aligned} \pmb x &= \mathbf K[\mathbf R \mid \pmb t] \pmb X \\ &= \mathbf P \mathbf X \end{aligned}\]

Can you write out all the entries of $\mathbf P$, and explain what each does?


\[\begin{aligned} \mathbf P &= \mathbf K[\mathbf R \mid \pmb t] \\ &= \left( \begin{bmatrix} m _ x & 0 & 0 \\ 0 & m _ y & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} f & 0 & p _ x \\ 0 & f & p _ y \\ 0 & 0 & 1 \\ \end{bmatrix} \right) \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \, \begin{bmatrix} \mathbf R & \pmb t\\[3pt] \pmb 0^{\top} & 1 \end{bmatrix} \end{aligned}\]

where:

  • $\mathbf R$ and $\pmb t$ are the rotation and translation that map from the world coordinates to the normalised camera coordinates
  • $f$ is the focal length
  • $p _ x, p _ y$ is the location of the principal point in image coordinates
  • $m _ x, m _ y$ are the number of pixels per metre (more generally, per unit length) in the horizontal and vertical directions
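The factorisation can be assembled numerically as a sanity check. The following is a sketch with made-up values for every parameter (`f`, `m_x`, `m_y`, `p_x`, `p_y`, `R`, `t` are all illustrative). Note that with the factorisation as written, $p _ x, p _ y$ are multiplied by $m _ x, m _ y$, so they are given here in metric units.

```python
import numpy as np

# Hypothetical intrinsics.
f = 0.05                   # focal length in metres
m_x = m_y = 20000.0        # pixels per metre
p_x, p_y = 0.016, 0.012    # principal point in metres (320 px, 240 px after scaling)

K = np.diag([m_x, m_y, 1.0]) @ np.array([[f, 0.0, p_x],
                                         [0.0, f, p_y],
                                         [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: 90 degree rotation about z, then a shift along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 2.0])

canonical = np.hstack([np.eye(3), np.zeros((3, 1))])   # [I | 0]
Rt = np.eye(4)
Rt[:3, :3] = R
Rt[:3, 3] = t

P = K @ canonical @ Rt                   # 3x4 camera projection matrix

X_world = np.array([0.1, 0.2, 0.0, 1.0])  # homogeneous world point
x_h = P @ X_world
pixel = x_h[:2] / x_h[2]                  # pixel coordinates after division
```

For this example the world point maps to camera coordinates $(-0.2, 0.1, 2)$ and then to the pixel $(220, 290)$.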

In the camera calibration problem, we wish to determine the transformation from the world coordinate system to the image coordinate system.

\[\begin{pmatrix} \text{2D}\\ \text{point}\\ \boldsymbol{x}\\ (3\times 1) \end{pmatrix} \;\asymp\; \begin{pmatrix} \text{Camera to}\\ \text{pixel coord.}\\ \text{trans.\ matrix}\\ \boldsymbol{K}\ (3\times 3) \end{pmatrix} \begin{pmatrix} \textit{Canonical}\\ \text{projection matrix}\\ [I\mid 0]\ (3\times 4) \end{pmatrix} \begin{pmatrix} \text{World to}\\ \text{camera coord.}\\ \text{trans.\ matrix}\\ \begin{pmatrix} \mathbf R & \pmb t\\[3pt] \pmb 0^{\top} & 1 \end{pmatrix}\ (4\times 4) \end{pmatrix} \begin{pmatrix} \text{3D}\\ \text{point}\\ \boldsymbol{X}\\ (4\times 1) \end{pmatrix}\]

where:

  • The first matrix represents the intrinsic camera parameters: principal point and scaling factors
  • The last matrix represents the extrinsic camera parameters: $\mathbf R$otation, $\mathbf t$ranslation of the camera (the middle matrix is the fixed canonical projection)

All together we have the general camera projection matrix $\mathbf P$ so that

\[\begin{aligned} \pmb x &= \mathbf K[\mathbf R \mid \pmb t] \pmb X \\ &= \mathbf P \mathbf X \end{aligned}\]

Derive the “linear method” for determining $\mathbf P$.


Forget all the parameters in $\mathbf P$ and instead consider it as

\[\mathbf P = \begin{bmatrix} p _ {11} & p _ {12} & p _ {13} & p _ {14} \\ p _ {21} & p _ {22} & p _ {23} & p _ {24} \\ p _ {31} & p _ {32} & p _ {33} & p _ {34} \end{bmatrix}\]

Given $n$ points with known 3D coordinates $\mathbf X _ i$ and image projections $\pmb x _ i$, we have

\[\pmb x _ i \cong \mathbf P \pmb X _ i\]

Writing the $j$-th row of $\mathbf P$ as a column vector

\[\mathbf P _ j = \begin{bmatrix} p _ {j1} \\ p _ {j2} \\ p _ {j3} \\ p _ {j4} \end{bmatrix}\]

therefore

\[\begin{bmatrix} x _ i \\ y _ i \\ 1 \end{bmatrix} \cong \begin{bmatrix} \pmb X _ i^\top \mathbf P _ 1 \\ \pmb X _ i^\top \mathbf P _ 2 \\ \pmb X _ i^\top \mathbf P _ 3 \end{bmatrix}\]

or equivalently

\[\begin{aligned} \mathbf X _ i^\top \mathbf P _ 1 - x _ i \mathbf X _ i^\top \mathbf P _ 3 &= 0 \\ \mathbf X _ i^\top \mathbf P _ 2 - y _ i \mathbf X _ i^\top \mathbf P _ 3 &= 0 \end{aligned}\]

Collecting this into a matrix equation, we obtain

\[\begin{bmatrix} \mathbf X _ i^\top & 0 & -x _ i \mathbf X _ i^\top \\ 0 & \mathbf X _ i^\top & -y _ i \mathbf X _ i^\top \end{bmatrix} \begin{bmatrix} \mathbf P _ 1 \\ \mathbf P _ 2 \\ \mathbf P _ 3 \end{bmatrix} = 0\]

Repeating this for the $n$ points, we obtain

\[\mathbf A \pmb p = 0\]

where

\[\begin{aligned} \mathbf A &= \begin{bmatrix} \mathbf X _ 1^\top & 0 & -x _ 1 \mathbf X _ 1^\top \\ 0 & \mathbf X _ 1^\top & -y _ 1 \mathbf X _ 1^\top \\ & \vdots & \\ \mathbf X _ n^\top & 0 & -x _ n \mathbf X _ n^\top \\ 0 & \mathbf X _ n^\top & -y _ n \mathbf X _ n^\top \\ \end{bmatrix} \\ \\ \pmb p &= \begin{bmatrix} \mathbf P _ 1 \\ \mathbf P _ 2 \\ \mathbf P _ 3 \end{bmatrix} \end{aligned}\]

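which can be solved via homogeneous least squares: minimise $\lVert \mathbf A \pmb p \rVert$ subject to $\lVert \pmb p \rVert = 1$, whose solution is the right singular vector of $\mathbf A$ with the smallest singular value. Since $\mathbf P$ has 11 degrees of freedom (12 entries up to scale) and each point gives two equations, at least 6 points are needed.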
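The construction of $\mathbf A$ and the solve can be sketched as follows. This is an illustrative implementation, not a reference one: the function name `calibrate_linear`, the synthetic camera and the random points are all assumptions, and the homogeneous system is solved by taking the right singular vector with the smallest singular value.

```python
import numpy as np

def calibrate_linear(X_world, x_image):
    """Estimate the 3x4 projection matrix P (up to scale) from n >= 6 points.

    X_world: (n, 3) array of 3D points; x_image: (n, 2) array of projections.
    """
    n = X_world.shape[0]
    X_h = np.hstack([X_world, np.ones((n, 1))])   # homogeneous 3D points, (n, 4)
    zeros = np.zeros(4)
    rows = []
    for X_i, (x_i, y_i) in zip(X_h, x_image):
        rows.append(np.concatenate([X_i, zeros, -x_i * X_i]))
        rows.append(np.concatenate([zeros, X_i, -y_i * X_i]))
    A = np.array(rows)                            # (2n, 12)
    # The unit vector minimising ||A p|| is the last right singular vector.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)                   # rows are P_1, P_2, P_3

# Synthetic check with a made-up camera (all numbers are illustrative).
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.1], [-0.2], [5.0]])])
P_true = K @ Rt
X_world = rng.uniform(-1.0, 1.0, size=(10, 3))
x_h = (P_true @ np.hstack([X_world, np.ones((10, 1))]).T).T
x_image = x_h[:, :2] / x_h[:, 2:3]

P_est = calibrate_linear(X_world, x_image)
# P is only defined up to scale, so normalise before comparing.
P_est_normalised = P_est / P_est[2, 3]
P_true_normalised = P_true / P_true[2, 3]
```

With noise-free correspondences the normalised estimate should agree with the normalised true matrix up to numerical precision. With noisy data the result minimises an algebraic rather than a geometric error.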

@prove~

What is the downside of linear calibration?


You don’t obtain the explicit camera parameters.

How does nonlinear calibration differ from linear calibration?


You instead formulate the problem of determining the parameters via a loss function and solve via nonlinear optimisation methods.
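One common choice of loss is the sum of squared reprojection errors in pixels. The sketch below only defines and evaluates such a loss. The parameterisation (focal lengths and principal point in pixels, an axis-angle rotation, a translation) and all names are assumptions made for illustration. In practice the loss would be handed to a nonlinear least-squares optimiser.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix for the axis-angle vector r (Rodrigues' formula)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K_x = np.array([[0.0, -k[2], k[1]],
                    [k[2], 0.0, -k[0]],
                    [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K_x + (1.0 - np.cos(theta)) * (K_x @ K_x)

def project(params, X_world):
    """Project (n, 3) world points to (n, 2) pixels.

    params = (a_x, a_y, p_x, p_y, r_1, r_2, r_3, t_1, t_2, t_3): focal lengths
    and principal point in pixels, axis-angle rotation, translation.
    """
    a_x, a_y, p_x, p_y = params[:4]
    R = rodrigues(params[4:7])
    t = params[7:10]
    X_cam = X_world @ R.T + t                     # world -> camera coordinates
    u = a_x * X_cam[:, 0] / X_cam[:, 2] + p_x     # perspective division + intrinsics
    v = a_y * X_cam[:, 1] / X_cam[:, 2] + p_y
    return np.column_stack([u, v])

def reprojection_loss(params, X_world, x_image):
    """Sum of squared pixel errors between predicted and observed projections."""
    residuals = project(params, X_world) - x_image
    return float(np.sum(residuals ** 2))

# Synthetic data from a made-up camera.
rng = np.random.default_rng(1)
X_world = rng.uniform(-1.0, 1.0, size=(8, 3))
true_params = np.array([800.0, 800.0, 320.0, 240.0, 0.0, 0.1, 0.0, 0.1, -0.2, 5.0])
x_image = project(true_params, X_world)

perturbed = true_params.copy()
perturbed[0] += 20.0                              # wrong horizontal focal length

loss_at_truth = reprojection_loss(true_params, X_world, x_image)
loss_perturbed = reprojection_loss(perturbed, X_world, x_image)
```

The loss is zero at the true parameters and positive away from them. Because the parameters appear explicitly, the optimiser returns them directly, and the linear estimate is a natural source for an initial guess.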



