NLA MT25, QR factorisation
Flashcards
Via Gram-Schmidt
Why isn’t the QR factorisation via the Gram-Schmidt process a good idea?
It is not numerically stable.
Via Householder reflectors
See Notes - Numerical Analysis HT24, Householder reflectorsU and ∆qr-with-householder.
@State a result that characterises the backward stability of finding the QR decomposition of a matrix with Householder reflectors.
The computed version of the factorisation satisfies
\[\hat Q \hat R = A + \Delta A, \quad \vert \vert \hat Q^\top \hat Q - I \vert \vert _ 2 = O(\epsilon)\]Hence this method is unconditionally backward stable.
What’s the difference between “orthogonal triangularisation” and “triangular orthogonalisation” for computing the QR factorisation of a matrix?
- Orthogonal triangularisation: apply orthogonal matrices to $A$ to create the triangular factor — e.g. Householder reflectors, or Givens rotations ($G _ p \cdots G _ 1 A = R$, then $A = (G _ 1^\top \cdots G _ p^\top) R = QR$).
- Triangular orthogonalisation: apply triangular operations to $A$ to create the orthonormal factor — e.g. Gram-Schmidt, or CholeskyQR ($A^\top A = R^\top R$ then $Q = A R^{-1}$).
Note: Givens rotations are orthogonal triangularisation (the Householder category), a common point of confusion — they are orthogonal matrices used to zero entries, not a triangular route to $Q$.
@State a result about the backward stability of the decomposition $A = \hat Q \hat R$ using Householder reflectors.
With Householder QR, the computed $\hat Q, \hat R$ satisfy
\[ \vert \vert \hat Q^\top \hat Q - I \vert \vert = O(\epsilon)\]and
\[ \vert \vert A - \hat Q \hat R \vert \vert = O(\epsilon \vert \vert A \vert \vert )\]@Prove that for Householder QR, the computed $\hat Q, \hat R$ satisfy
\[ \vert \vert \hat Q^\top \hat Q - I \vert \vert = O(\epsilon)\]
and
\[ \vert \vert A - \hat Q \hat R \vert \vert = O(\epsilon \vert \vert A \vert \vert )\]
Key idea: Applying an orthogonal matrix in floating-point is backward stable, and Householder QR is just a sequence of orthogonal multiplications. The errors don’t compound multiplicatively because orthogonal matrices have $\|H _ i\| = 1$.
Householder QR computes $R$ by applying a sequence of reflectors $H _ 1, H _ 2, \ldots, H _ n$ (each chosen to zero out the sub-diagonal of the active column) that triangularise $A$:
\[R = H _ n \cdots H _ 1 A, \qquad Q^\top = H _ n \cdots H _ 1.\]Each $H _ i$ is a Householder reflector, so symmetric and orthogonal: $H _ i^\top = H _ i$, $H _ i^2 = I$, and $\|H _ i\| = 1$. (See Notes - Numerical Analysis HT24, Householder reflectorsU for the construction.)
Claim: Each Householder reflector $H _ i$ applied to a matrix $X$ in floating-point satisfies
\[\text{fl}(H _ i X) = H _ i X + E _ i, \qquad \|E _ i\| \le c \epsilon \|X\|\]for some constant $c$ (depending slightly on $n$). To see this without a componentwise rounding analysis: each $H _ i$ is orthogonal with $\|H _ i\| _ 2 = 1$, so the claim is exactly ∆orthogonal-multiplication-backward-stable-proof applied with $Q = H _ i$ — equivalently the forward-error bound ∆forward-error-matrix-multiplication-proof with $\|H _ i\| = 1$, giving $\|E _ i\| \le \epsilon \|X\|$. The sharper constant $c = c(n)$ from the explicit formula $H _ i X = X - \frac{2}{w _ i^\top w _ i} w _ i (w _ i^\top X)$ (stable inner products + rank-1 update) is Higham’s componentwise route, not carried out in the course.
Forward error in $\hat R$: Apply this claim $n$ times to $A$. After the $k$th reflector,
\[\hat A^{(k)} = H _ k \hat A^{(k-1)} + E _ k.\]Iterating from $\hat A^{(0)} = A$ gives
\[\hat R = \hat A^{(n)} = H _ n \cdots H _ 1 A + \sum^n _ {k=1} H _ n \cdots H _ {k+1} E _ k.\]Each term in the sum has norm $\le c\epsilon \| \hat A^{(k-1)} \|$, and since $\|H _ i\| = 1$, $\|\hat A^{(k-1)}\| = \|A\| + O(\epsilon \|A\|)$. So
\[\hat R = H _ n \cdots H _ 1 A + F, \qquad \|F\| = O(\epsilon \|A\|). \quad (\ast)\]Forward error in $\hat Q$: $\hat Q^\top$ is computed by applying the same reflectors in the same order to $I$, so by exactly the same argument (with $A$ replaced by $I$),
\[\hat Q^\top = H _ n \cdots H _ 1 + G, \qquad \|G\| = O(\epsilon). \quad (\dagger)\]Orthogonality: Using $(\dagger)$ and the fact that $(H _ n \cdots H _ 1)(H _ 1 \cdots H _ n) = I$ (since each $H _ i$ is its own inverse):
\[\begin{aligned} \hat Q^\top \hat Q &= (H _ n \cdots H _ 1 + G) (H _ 1 \cdots H _ n + G^\top) \\ &= I + (H _ n \cdots H _ 1)G^\top + G (H _ 1 \cdots H _ n) + G G^\top \end{aligned}\]Each cross-term has norm $\le \|G\| = O(\epsilon)$ (since $\|H _ n \cdots H _ 1\| = 1$), and $\|G G^\top\| = O(\epsilon^2)$, so $\|\hat Q^\top \hat Q - I\| = O(\epsilon)$.
Backward error. Using $(\ast)$ and $(\dagger)$,
\[\begin{aligned} \hat Q \hat R &= (H _ 1 \cdots H _ n + G^\top)(H _ n \cdots H _ 1 A + F) \\ &= A + G^\top H _ n \cdots H _ 1 A + (H _ 1 \cdots H _ n) F + G^\top F, \end{aligned}\]using $H _ 1 \cdots H _ n \cdot H _ n \cdots H _ 1 = I$ in the first term. The remaining terms have norm $O(\epsilon \|A\|)$, $O(\epsilon \|A\|)$ and $O(\epsilon^2 \|A\|)$ respectively. Therefore
\[\|\hat Q \hat R - A\| = O(\epsilon \|A\|).\]Via Givens rotations
See ∆givens-rotation-full-matrix-definition for the Givens rotation matrix.
@Describe how to compute the $R$ factor of the QR factorisation of $A \in \mathbb R^{m \times n}$ using Givens rotations, and give the leading-order flop cost.
Repeatedly left-multiply $A$ by Givens rotations $G(i, j, \theta _ {ij})$ (∆givens-rotation-full-matrix-definition), each angle $\theta _ {ij}$ chosen so that the multiplication zeros one subdiagonal entry. Sweeping over all subdiagonal positions drives $A$ to upper-triangular form, leaving the $R$ factor:
\[R = \left( \prod _ {i=1}^{m-1} \prod _ {j=i+1}^{m} G(i, j, \theta _ {ij}) \right) A.\]Each Givens rotation is orthogonal, so the accumulated product is orthogonal and $Q = \left( \prod _ {i=1}^{m-1} \prod _ {j=i+1}^{m} G(i, j, \theta _ {ij}) \right)^\ast$ (giving $A = QR$).
Cost: a single multiplication by $G(i, j, \theta _ {ij})$ changes only rows $i$ and $j$ across the $n$ columns, costing $\approx 6n$ flops. There are $\approx m^2/2$ such rotations, so computing $R$ costs $\approx 3 n m^2$ flops to leading order. (Contrast Householder QR’s $2mn^2 - \tfrac{2}{3}n^3$: the per-rotation locality makes Givens the better choice for sparse or Hessenberg $A$, not dense $A$ — see ∆orthogonal-vs-triangular-orthogonalisation.)
Worked example
Compute the Householder QR factorisation $A = QR$ of $A = \begin{bmatrix} 3 & 1 \\ 4 & 1 \end{bmatrix}$ by hand.
Reflector for column 1: $a _ 1 = (3, 4)^\top$, $\|a _ 1\| _ 2 = 5$. Take $v = a _ 1 - \|a _ 1\| e _ 1 = (-2, 4)^\top$, $v^\top v = 20$:
\[H = I - \tfrac{2}{v^\top v} v v^\top = I - \tfrac{1}{10}\begin{bmatrix} 4 & -8 \\ -8 & 16 \end{bmatrix} = \begin{bmatrix} 0.6 & 0.8 \\ 0.8 & -0.6 \end{bmatrix}.\]Apply: $H a _ 1 = (5, 0)^\top$ as designed, and $H A = \begin{bmatrix} 5 & 1.4 \\ 0 & 0.2 \end{bmatrix} =: R$.
A Householder reflector is orthogonal and symmetric ($H^{-1} = H$), so $Q = H = \begin{bmatrix} 0.6 & 0.8 \\ 0.8 & -0.6 \end{bmatrix}$. For $n = 2$ one reflector suffices; in general $n - 1$ reflectors triangularise $A$.
Check: $QR = \begin{bmatrix} 0.6 & 0.8 \\ 0.8 & -0.6 \end{bmatrix}\begin{bmatrix} 5 & 1.4 \\ 0 & 0.2 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 4 & 1 \end{bmatrix} = A$ $\checkmark$.
Remark: for stability one actually takes the sign of the target as $-\mathrm{sign}(a _ {11})\|a _ 1\|$ to avoid cancellation in $v _ 1$ (∆bite-na-householder-sign-choice); here the lecture’s $+\|a _ 1\|$ convention is used for a clean hand computation.
Bite-sized
Cost of Householder QR: for square $A \in \mathbb R^{n \times n}$, $\tfrac{4}{3} n^3$ flops — about $2 \times$ slower than LU. For tall-skinny $A \in \mathbb R^{m \times n}$ with $m \ge n$, $\approx $ $2 m n^2 - \tfrac{2}{3} n^3$ flops.
Applying a Householder reflector $H = I - 2 v v^\top$ to a vector $a$ takes $O(m)$ flops, not $O(m^2)$ — why?
Don’t form $H$ explicitly. Instead use the formula $Ha = a - 2v (v^\top a)$: compute the scalar $v^\top a$ at cost $O(m)$, then the rank-1 update $a - 2v \cdot \text{scalar}$ at cost $O(m)$. Total $O(m)$ per reflector application. This is what makes Householder QR feasible at $O(mn^2)$ rather than $O(m^2 n)$.
Full vs thin QR for $A \in \mathbb R^{m \times n}$ with $m \ge n$: the full QR has $Q _ F \in \mathbb R^{m \times m}$ (square orthogonal) with $A = Q _ F \begin{bmatrix} R \\ 0 \end{bmatrix}$; the thin QR has $Q \in \mathbb R^{m \times n}$ (orthonormal columns) with $A = QR$ and $R \in \mathbb R^{n \times n}$. For most algorithms (least-squares, projection) we use the thin QR.
Householder QR’s stability comes from being a composition of orthogonal multiplications, each of which is backward stable ($\mathrm{fl}(H _ i A) = H _ i (A + \Delta A)$ with $\ \vert \Delta A\ \vert = O(\epsilon \ \vert A\ \vert )$). The product of these errors stays bounded because $\ \vert H _ i\ \vert _ 2 = 1$ — orthogonal multiplications don’t amplify error.
Strategy for proving Householder QR backward stability (∆householder-qr-backward-stable), $\|\hat Q\hat R - A\| _ 2 = O(\epsilon \|A\|)$ and $\|\hat Q^\top \hat Q - I\| _ 2 = O(\epsilon)$.
- Per-reflector claim: applying a single Householder reflector in floating-point gives $\mathrm{fl}(H _ i X) = H _ i X + E _ i$ with $\|E _ i\| \le c \epsilon \|X\|$.
- Iterate to bound $\hat R$: telescope the per-reflector errors. Each $\|H _ i\| = 1$, so $\|\hat A^{(k)}\| \approx \|A\|$ throughout, and the total error is $\|F\| = O(\epsilon \|A\|)$. So $\hat R = H _ n \cdots H _ 1 A + F$.
- Iterate to bound $\hat Q$: same argument with $X = I$ — gives $\hat Q^\top = H _ n \cdots H _ 1 + G$ with $\|G\| = O(\epsilon)$.
- Orthogonality from $G$ small: compute $\hat Q^\top \hat Q = I + \text{cross-terms}$, each cross-term $O(\epsilon)$.
- Backward error from $F, G$ small: compute $\hat Q \hat R = A + \text{cross-terms}$, each $O(\epsilon \|A\|)$.
The whole argument rides on $\|H _ i\| _ 2 = 1$ — orthogonal multiplications don’t amplify errors.