NLA MT25, QR factorisation


Flashcards

Via Gram-Schmidt

See ∆qr-with-gram-schmidt.

Why isn’t the QR factorisation via the Gram-Schmidt process a good idea?

It is not numerically stable.

Via Householder reflectors

See Notes - Numerical Analysis HT24, Householder reflectorsU and ∆qr-with-householder.

@State a result that characterises the backward stability of finding the QR decomposition of a matrix with Householder reflectors.

The computed version of the factorisation satisfies

\[\hat Q \hat R = A + \Delta A, \quad \vert \vert \hat Q^\top \hat Q - I \vert \vert _ 2 = O(\epsilon)\]

Hence this method is unconditionally backward stable.

@exam~

What’s the difference between “orthogonal triangularisation” and “triangular orthogonalisation” for computing the QR factorisation of a matrix?

  • Orthogonal triangularisation: apply orthogonal matrices to $A$ to create the triangular factor — e.g. Householder reflectors, or Givens rotations ($G _ p \cdots G _ 1 A = R$, then $A = (G _ 1^\top \cdots G _ p^\top) R = QR$).
  • Triangular orthogonalisation: apply triangular operations to $A$ to create the orthonormal factor — e.g. Gram-Schmidt, or CholeskyQR ($A^\top A = R^\top R$ then $Q = A R^{-1}$).

Note: Givens rotations are orthogonal triangularisation (the Householder category), a common point of confusion — they are orthogonal matrices used to zero entries, not a triangular route to $Q$.

Source: Lecture 6, §6.3 of the lecture notes (Householder QR “orthogonal triangularisation”; the §6.3 footnote lists Gram-Schmidt and CholeskyQR as triangular orthogonalisation) and §6.4 (Givens rotations).

@State a result about the backward stability of the decomposition $A = \hat Q \hat R$ using Householder reflectors.

With Householder QR, the computed $\hat Q, \hat R$ satisfy

\[ \vert \vert \hat Q^\top \hat Q - I \vert \vert = O(\epsilon)\]

and

\[ \vert \vert A - \hat Q \hat R \vert \vert = O(\epsilon \vert \vert A \vert \vert )\]

@Prove that for Householder QR, the computed $\hat Q, \hat R$ satisfy

\[ \vert \vert \hat Q^\top \hat Q - I \vert \vert = O(\epsilon)\]

and

\[ \vert \vert A - \hat Q \hat R \vert \vert = O(\epsilon \vert \vert A \vert \vert )\]

Key idea: Applying an orthogonal matrix in floating-point is backward stable, and Householder QR is just a sequence of orthogonal multiplications. The errors don’t compound multiplicatively because orthogonal matrices have $\|H _ i\| = 1$.

Householder QR computes $R$ by applying a sequence of reflectors $H _ 1, H _ 2, \ldots, H _ n$ (each chosen to zero out the sub-diagonal of the active column) that triangularise $A$:

\[R = H _ n \cdots H _ 1 A, \qquad Q^\top = H _ n \cdots H _ 1.\]

Each $H _ i$ is a Householder reflector, so symmetric and orthogonal: $H _ i^\top = H _ i$, $H _ i^2 = I$, and $\|H _ i\| = 1$. (See Notes - Numerical Analysis HT24, Householder reflectorsU for the construction.)


Claim: Each Householder reflector $H _ i$ applied to a matrix $X$ in floating-point satisfies

\[\text{fl}(H _ i X) = H _ i X + E _ i, \qquad \|E _ i\| \le c \epsilon \|X\|\]

for some constant $c$ (depending slightly on $n$). To see this without a componentwise rounding analysis: each $H _ i$ is orthogonal with $\|H _ i\| _ 2 = 1$, so the claim is exactly ∆orthogonal-multiplication-backward-stable-proof applied with $Q = H _ i$ — equivalently the forward-error bound ∆forward-error-matrix-multiplication-proof with $\|H _ i\| = 1$, giving $\|E _ i\| \le \epsilon \|X\|$. The sharper constant $c = c(n)$ from the explicit formula $H _ i X = X - \frac{2}{w _ i^\top w _ i} w _ i (w _ i^\top X)$ (stable inner products + rank-1 update) is Higham’s componentwise route, not carried out in the course.


Forward error in $\hat R$: Apply this claim $n$ times to $A$. After the $k$th reflector,

\[\hat A^{(k)} = H _ k \hat A^{(k-1)} + E _ k.\]

Iterating from $\hat A^{(0)} = A$ gives

\[\hat R = \hat A^{(n)} = H _ n \cdots H _ 1 A + \sum^n _ {k=1} H _ n \cdots H _ {k+1} E _ k.\]

Each term in the sum has norm $\le c\epsilon \| \hat A^{(k-1)} \|$, and since $\|H _ i\| = 1$, $\|\hat A^{(k-1)}\| = \|A\| + O(\epsilon \|A\|)$. So

\[\hat R = H _ n \cdots H _ 1 A + F, \qquad \|F\| = O(\epsilon \|A\|). \quad (\ast)\]

Forward error in $\hat Q$: $\hat Q^\top$ is computed by applying the same reflectors in the same order to $I$, so by exactly the same argument (with $A$ replaced by $I$),

\[\hat Q^\top = H _ n \cdots H _ 1 + G, \qquad \|G\| = O(\epsilon). \quad (\dagger)\]

Orthogonality: Using $(\dagger)$ and the fact that $(H _ n \cdots H _ 1)(H _ 1 \cdots H _ n) = I$ (since each $H _ i$ is its own inverse):

\[\begin{aligned} \hat Q^\top \hat Q &= (H _ n \cdots H _ 1 + G) (H _ 1 \cdots H _ n + G^\top) \\ &= I + (H _ n \cdots H _ 1)G^\top + G (H _ 1 \cdots H _ n) + G G^\top \end{aligned}\]

Each cross-term has norm $\le \|G\| = O(\epsilon)$ (since $\|H _ n \cdots H _ 1\| = 1$), and $\|G G^\top\| = O(\epsilon^2)$, so $\|\hat Q^\top \hat Q - I\| = O(\epsilon)$.

Backward error. Using $(\ast)$ and $(\dagger)$,

\[\begin{aligned} \hat Q \hat R &= (H _ 1 \cdots H _ n + G^\top)(H _ n \cdots H _ 1 A + F) \\ &= A + G^\top H _ n \cdots H _ 1 A + (H _ 1 \cdots H _ n) F + G^\top F, \end{aligned}\]

using $H _ 1 \cdots H _ n \cdot H _ n \cdots H _ 1 = I$ in the first term. The remaining terms have norm $O(\epsilon \|A\|)$, $O(\epsilon \|A\|)$ and $O(\epsilon^2 \|A\|)$ respectively. Therefore

\[\|\hat Q \hat R - A\| = O(\epsilon \|A\|).\]

@exam~

Via Givens rotations

See ∆givens-rotation-full-matrix-definition for the Givens rotation matrix.

@Describe how to compute the $R$ factor of the QR factorisation of $A \in \mathbb R^{m \times n}$ using Givens rotations, and give the leading-order flop cost.

Repeatedly left-multiply $A$ by Givens rotations $G(i, j, \theta _ {ij})$ (∆givens-rotation-full-matrix-definition), each angle $\theta _ {ij}$ chosen so that the multiplication zeros one subdiagonal entry. Sweeping over all subdiagonal positions drives $A$ to upper-triangular form, leaving the $R$ factor:

\[R = \left( \prod _ {i=1}^{m-1} \prod _ {j=i+1}^{m} G(i, j, \theta _ {ij}) \right) A.\]

Each Givens rotation is orthogonal, so the accumulated product is orthogonal and $Q = \left( \prod _ {i=1}^{m-1} \prod _ {j=i+1}^{m} G(i, j, \theta _ {ij}) \right)^\ast$ (giving $A = QR$).

Cost: a single multiplication by $G(i, j, \theta _ {ij})$ changes only rows $i$ and $j$ across the $n$ columns, costing $\approx 6n$ flops. There are $\approx m^2/2$ such rotations, so computing $R$ costs $\approx 3 n m^2$ flops to leading order. (Contrast Householder QR’s $2mn^2 - \tfrac{2}{3}n^3$: the per-rotation locality makes Givens the better choice for sparse or Hessenberg $A$, not dense $A$ — see ∆orthogonal-vs-triangular-orthogonalisation.)

Source: Lecture 6, §6.4 of the lecture notes (Givens rotations for QR), with the leading-order flop count.

@exam~

Worked example

Compute the Householder QR factorisation $A = QR$ of $A = \begin{bmatrix} 3 & 1 \\ 4 & 1 \end{bmatrix}$ by hand.

Reflector for column 1: $a _ 1 = (3, 4)^\top$, $\|a _ 1\| _ 2 = 5$. Take $v = a _ 1 - \|a _ 1\| e _ 1 = (-2, 4)^\top$, $v^\top v = 20$:

\[H = I - \tfrac{2}{v^\top v} v v^\top = I - \tfrac{1}{10}\begin{bmatrix} 4 & -8 \\ -8 & 16 \end{bmatrix} = \begin{bmatrix} 0.6 & 0.8 \\ 0.8 & -0.6 \end{bmatrix}.\]

Apply: $H a _ 1 = (5, 0)^\top$ as designed, and $H A = \begin{bmatrix} 5 & 1.4 \\ 0 & 0.2 \end{bmatrix} =: R$.

A Householder reflector is orthogonal and symmetric ($H^{-1} = H$), so $Q = H = \begin{bmatrix} 0.6 & 0.8 \\ 0.8 & -0.6 \end{bmatrix}$. For $n = 2$ one reflector suffices; in general $n - 1$ reflectors triangularise $A$.

Check: $QR = \begin{bmatrix} 0.6 & 0.8 \\ 0.8 & -0.6 \end{bmatrix}\begin{bmatrix} 5 & 1.4 \\ 0 & 0.2 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 4 & 1 \end{bmatrix} = A$ $\checkmark$.

Remark: for stability one actually takes the sign of the target as $-\mathrm{sign}(a _ {11})\|a _ 1\|$ to avoid cancellation in $v _ 1$ (∆bite-na-householder-sign-choice); here the lecture’s $+\|a _ 1\|$ convention is used for a clean hand computation.

Source: Lecture 6, §6.3 of the lecture notes (Householder QR).

@example~ @exam~

Bite-sized

Cost of Householder QR: for square $A \in \mathbb R^{n \times n}$, $\tfrac{4}{3} n^3$ flops — about $2 \times$ slower than LU. For tall-skinny $A \in \mathbb R^{m \times n}$ with $m \ge n$, $\approx $ $2 m n^2 - \tfrac{2}{3} n^3$ flops.

Source: Lecture 6, Householder QR factorisation slide and §6.3 of the lecture notes (cost discussion).

@bite~ @exam~

Applying a Householder reflector $H = I - 2 v v^\top$ to a vector $a$ takes $O(m)$ flops, not $O(m^2)$ — why?

Don’t form $H$ explicitly. Instead use the formula $Ha = a - 2v (v^\top a)$: compute the scalar $v^\top a$ at cost $O(m)$, then the rank-1 update $a - 2v \cdot \text{scalar}$ at cost $O(m)$. Total $O(m)$ per reflector application. This is what makes Householder QR feasible at $O(mn^2)$ rather than $O(m^2 n)$.

Source: Lecture 6, §6.3 of the lecture notes (post-Householder-QR cost remark).

@bite~

Full vs thin QR for $A \in \mathbb R^{m \times n}$ with $m \ge n$: the full QR has $Q _ F \in \mathbb R^{m \times m}$ (square orthogonal) with $A = Q _ F \begin{bmatrix} R \\ 0 \end{bmatrix}$; the thin QR has $Q \in \mathbb R^{m \times n}$ (orthonormal columns) with $A = QR$ and $R \in \mathbb R^{n \times n}$. For most algorithms (least-squares, projection) we use the thin QR.

Source: Lecture 6, Householder QR factorisation slide and §6.3 of the lecture notes (full vs thin distinction).

@bite~

Householder QR’s stability comes from being a composition of orthogonal multiplications, each of which is backward stable ($\mathrm{fl}(H _ i A) = H _ i (A + \Delta A)$ with $\ \vert \Delta A\ \vert = O(\epsilon \ \vert A\ \vert )$). The product of these errors stays bounded because $\ \vert H _ i\ \vert _ 2 = 1$ — orthogonal multiplications don’t amplify error.

Source: Lecture 7, Stability of Householder QR slide and §7.7 of the lecture notes.

@bite~

Strategy for proving Householder QR backward stability (∆householder-qr-backward-stable), $\|\hat Q\hat R - A\| _ 2 = O(\epsilon \|A\|)$ and $\|\hat Q^\top \hat Q - I\| _ 2 = O(\epsilon)$.

  • Per-reflector claim: applying a single Householder reflector in floating-point gives $\mathrm{fl}(H _ i X) = H _ i X + E _ i$ with $\|E _ i\| \le c \epsilon \|X\|$.
  • Iterate to bound $\hat R$: telescope the per-reflector errors. Each $\|H _ i\| = 1$, so $\|\hat A^{(k)}\| \approx \|A\|$ throughout, and the total error is $\|F\| = O(\epsilon \|A\|)$. So $\hat R = H _ n \cdots H _ 1 A + F$.
  • Iterate to bound $\hat Q$: same argument with $X = I$ — gives $\hat Q^\top = H _ n \cdots H _ 1 + G$ with $\|G\| = O(\epsilon)$.
  • Orthogonality from $G$ small: compute $\hat Q^\top \hat Q = I + \text{cross-terms}$, each cross-term $O(\epsilon)$.
  • Backward error from $F, G$ small: compute $\hat Q \hat R = A + \text{cross-terms}$, each $O(\epsilon \|A\|)$.

The whole argument rides on $\|H _ i\| _ 2 = 1$ — orthogonal multiplications don’t amplify errors.

Source: Lecture 7, §7.7 of the lecture notes (proof outline for Householder QR backward stability).

@bite~ @proofsupport~