This is a map of every main result and method in the course. This course studies two problems only: the linear system $Ax = b$ and the eigenproblem $Ax = \lambda x$, and the whole course is the story of decomposing $A$ so that these become trivial, analysing when the decompositions are computable in a backward-stable way, and replacing them by iterative or randomised surrogates when $n$ is too large for the $O(n^3)$ direct methods.
Foundations: norms, subspaces, structure
Vector and matrix norms
Norms are the measuring stick for “how big is $A - \hat A$”. The induced $p$-norm $\ \vert A\ \vert _ p = \max _ {x\ne 0}\ \vert Ax\ \vert _ p/\ \vert x\ \vert _ p$ has the closed forms $\ \vert A\ \vert _ 2 = \sigma _ {\max}(A)$, $\ \vert A\ \vert _ 1$ = max absolute column sum, $\ \vert A\ \vert _ \infty$ = max absolute row sum; the Frobenius and trace (nuclear) norms are not induced but are, like $\ \vert \cdot\ \vert _ 2$, unitarily invariant. Submultiplicativity $\ \vert AB\ \vert _ p \le \ \vert A\ \vert _ p\ \vert B\ \vert _ p$ is the workhorse inequality behind every perturbation bound in the course.
- Matrix $p$-norm definition∆matrix-p-norm-definition — $\max _ {x\ne0}\ \vert Ax\ \vert _ p/\ \vert x\ \vert _ p$.
- Alternative characterisations∆matrix-norm-alternative-characterisations — $\ \vert A\ \vert _ 2=\sigma _ 1$, $\ \vert A\ \vert _ 1$ max column sum, $\ \vert A\ \vert _ \infty$ max row sum.
- Frobenius / trace norms∆frobenius-norm-definition, ∆trace-norm-definition — $\sqrt{\sum A _ {ij}^2}=\sqrt{\sum\sigma _ i^2}$; $\sum\sigma _ i$.
- Unitary invariance∆unitary-invariance-definition, ∆unitarily-invariant-norms-list — $\ \vert UAV\ \vert =\ \vert A\ \vert $; the 2-, Frobenius, trace norms qualify.
- Submultiplicative norm∆submultiplicative-norm-definition, ∆matrix-p-norm-submultiplicative — $\ \vert AB\ \vert \le\ \vert A\ \vert \ \vert B\ \vert $; $p$-norms satisfy it.
- Submultiplicativity proof∆matrix-p-norm-submultiplicative-proof, ∆bite-matrix-p-norm-submultiplicative-strategy — multiply-and-divide by $\ \vert Bx\ \vert _ p$; $B=0$ separate.
- Vector $p$-norm equivalences (+ proof)∆vector-inf-norm-vs-2-norm, ∆vector-2-norm-vs-1-norm, ∆vector-inf-norm-vs-1-norm, ∆vector-p-q-norm-monotonicity, ∆vector-p-q-norm-monotonicity-proof — the $\sqrt n$ / $n$ constants; $p>q\implies\ \vert x\ \vert _ p\le\ \vert x\ \vert _ q$.
- Matrix $p$-norm equivalences∆matrix-2-norm-vs-inf-norm-bound, ∆matrix-2-norm-vs-1-norm-bound, ∆frobenius-vs-spectral-norm-bound — $\ \vert A\ \vert _ 2\le\ \vert A\ \vert _ F\le\sqrt{\min(m,n)}\ \vert A\ \vert _ 2$.
- Headline facts (bite)∆bite-cauchy-schwarz, ∆bite-2-norm-via-inner-product, ∆bite-frobenius-svd-and-trace, ∆bite-2-norm-unitarily-invariant, ∆bite-spectral-and-trace-norm-via-svd — CS; $\ \vert x\ \vert _ 2=\sqrt{x^\top x}$; $\ \vert A\ \vert _ F^2=\mathrm{Trace}(A^\top A)$.
Source: Lecture 1, §1.5–1.6 (vector and matrix norms); Problem Sheets 1 Q2–Q4.
Subspaces and the dimension-count lemma
A $d$-dimensional subspace is represented by a (preferably orthonormal) $n\times d$ matrix via its column span. Lemma 1.1 — if $\dim\mathcal S _ 1+\dim\mathcal S _ 2>n$ then $\mathcal S _ 1\cap\mathcal S _ 2\ne\{0\}$ — is the single combinatorial fact that powers the proofs of Courant–Fischer, Eckart–Young and Weyl.
- Subspace as a span∆matrix-represents-subspace, ∆bite-subspace-as-matrix-span — $\mathcal S=\{Vc\}$.
- Dimension-count lemma (+ proof)∆subspace-intersection-by-dimension, ∆bite-subspace-intersection-lemma — wide $[V _ 1,V _ 2]$ has a right null vector; split it.
- Why orthonormal representation∆bite-subspace-why-orthonormal-rep — projector is $QQ^\top$ vs $V(V^\top V)^{-1}V^\top$.
Source: Lecture 1, §1.7 (subspaces and orthonormal matrices), Lemma 1.1.
Structured matrices and normality
Symmetric, orthogonal, skew-symmetric and (the umbrella) normal matrices are exactly the ones that are unitarily diagonalisable. The two structural lemmas — an upper-triangular normal matrix is diagonal, and a real symmetric matrix is orthogonally diagonalisable with real eigenvalues — feed directly into the Schur-form theory.
- Normal $\iff$ unitarily diagonalisable (+ proof)∆normal-iff-unitarily-diagonalisable, ∆normal-iff-unitarily-diagonalisable-proof — via Schur and ∆diagonal-schur-iff-normal.
- Symmetric $\iff$ real-eig orthogonally diagonalisable (+ proof, strategy)∆symmetric-iff-orthogonally-diagonalisable, ∆symmetric-iff-orthogonally-diagonalisable-proof, ∆bite-symmetric-iff-real-eig-orth-diag-strategy — easy direction transposes; hard direction Schur + real eigenspaces.
- Upper-triangular normal $\implies$ diagonal (+ proof, strategy)∆upper-triangular-normal-implies-diagonal, ∆upper-triangular-normal-implies-diagonal-proof, ∆bite-upper-triangular-normal-strategy — induction on the $(1,1)$ entry-comparison in $T^\ast T=TT^\ast$.
- Hessenberg / tridiagonal / invariances (bite)∆upper-hessenberg-definition, ∆bite-tridiagonal-definition, ∆bite-triangular-symmetric-invariance, ∆bite-normal-special-cases, ∆bite-orthogonal-vs-orthonormal-naming — structure definitions and what survives addition/multiplication/inversion.
Source: Lecture 1, §1.2 (structured matrices); §1.4 (spectral theorem for real symmetric matrices).
The singular value decomposition
SVD: existence and structure
Every $A\in\mathbb R^{m\times n}$ ($m\ge n$) factors as $A=U\Sigma V^\top$ with $U$ orthonormal columns, $V$ orthogonal and $\Sigma=\operatorname{diag}(\sigma _ 1\ge\cdots\ge\sigma _ n\ge 0)$, equivalently $A=\sum _ i\sigma _ i u _ i v _ i^\top$. Geometrically it is rotate–scale–rotate: the image of the unit sphere is an ellipsoid with semi-axes $\sigma _ i$. The singular values are unique (square roots of the eigenvalues of $A^\top A$); the singular vectors are unique only up to coupled sign flips (and rotation within repeated-$\sigma$ subspaces). It is the central object of the course.
- Statement (thin + full)∆svd-theorem — $A=U\Sigma V^\top$, ordered non-negative $\sigma _ i$; thin vs full $U$.
- Geometry∆svd-geometric-interpretation — rotation/reflection then scaling then rotation/reflection.
- Existence proof (+ shortest-proof bite)∆svd-theorem-proof, ∆bite-svd-existence-shortest-proof — eigendecompose $A^\top A=V\Lambda V^\top$, set $\Sigma=\Lambda^{1/2}$, $U=AV\Sigma^{-1}$ (carded in Notes - Numerical Analysis HT24, Singular value decompositionU).
- Eigen-connection (+ proof)∆svd-eigenvector-connection-proof, ∆bite-ml-singular-values-via-AtA — $V$ eigvecs of $A^\top A$, $U$ of $AA^\top$, $\sigma _ i=\sqrt{\lambda _ i(A^\top A)}$.
- Uniqueness∆bite-svd-uniqueness, ∆number-of-svds-distinct-singular-values — $\sigma _ i$ unique; $2^r$ sign choices; rotation freedom on ties (uniqueness count carded in Notes - Numerical Analysis HT24, Singular value decompositionU).
- Summation form∆bite-svd-summation-form — $A=\sum\sigma _ i u _ i v _ i^\top$.
- Worked example∆svd-example-computation — explicit $4\times 2$ SVD by hand.
- Cost (Golub–Kahan, non-examinable)∆bite-svd-cost-via-golub-kahan — $\approx\tfrac83 mn^2$ for $\Sigma$, $\approx 20mn^2$ with vectors.
Source: Lecture 2, §2 (Theorem 2.1, SVD existence); §2.3 (uniqueness).
SVD properties: rank and the four fundamental subspaces
From $A=U\Sigma V^\top$ everything is read off: $\mathrm{rank}(A)$ = number of positive $\sigma _ i$; column space = first $r$ columns of $U$; row space = first $r$ of $V$; $\ker A$ = last $n-r$ of $V$; $\ker A^\top$ = last $n-r$ of $U$. The proof is the change of variables $y=V^\top x$ then read off $\Sigma$.
- Properties (+ proof, strategy)∆svd-properties-proof, ∆bite-svd-properties-strategy — rank and the four subspaces via $y=V^\top x$.
- Image-compression illustration∆bite-svd-image-compression — rank-50 of a $500\times500$ image $\approx 5\times$ compression.
Source: Lecture 2, §2.1 (rank, column/row/null space); Lecture 3, §3.1 (image compression).
SVD and eigenvalues; the Jordan–Wielandt matrix
For normal $A$ the singular values are the eigenvalue moduli, $\sigma _ i= \vert \lambda _ i \vert $ (special cases: symmetric, unitary, skew-symmetric). The Jordan–Wielandt (Hermitian-dilation) matrix $\big[\begin{smallmatrix}0&A\\A^\top&0\end{smallmatrix}\big]$ has eigenvalues $\pm\sigma _ i$ (plus $m-n$ zeros) and is the standard tool for porting a symmetric-eigenvalue result to an SVD analogue.
- $\sigma _ i= \vert \lambda _ i \vert $ for normal (+ proof, strategy)∆singular-values-abs-eigenvalues-for-normal, ∆singular-values-abs-eigenvalues-for-normal-proof, ∆bite-svd-equals-abs-eig-normal-strategy — Schur + diagonal-iff-normal.
- Special cases∆singular-values-equal-abs-eigenvalues-symmetric, ∆singular-values-equal-abs-eigenvalues-unitary, ∆singular-values-equal-abs-eigenvalues-skew-symmetric, ∆singular-values-equal-abs-eigenvalues-normal-restated — symmetric / unitary / skew-symmetric instances.
- Jordan–Wielandt matrix∆jordan-wielandt-definition, ∆jordan-wielandt-eigenvalues — eigenvalues $\pm\sigma _ i$ and the explicit eigenvector matrix.
Source: Lecture 2, §2.2 (SVD and symmetric eigenvalue decomposition); Problem Sheet 1 Q6.
Symmetric eigenvalue decomposition
Every real symmetric $A$ has $A=V\Lambda V^\top$ with $V$ real orthogonal and $\Lambda$ real diagonal: the eigenvalues are real and the eigenvectors of distinct eigenvalues are orthogonal. It is the SVD’s twin and the engine of the SVD existence proof; the general normal case gives $A=U\Lambda U^\ast$ with $U$ unitary.
- Statement∆symmetric-eigenvalue-decomposition, ∆bite-symmetric-evd-two-claims — $A=V\Lambda V^\top$; real eigenvalues + orthogonal eigenvectors.
- Proof (+ strategy)∆symmetric-eigenvalue-decomposition-proof, ∆bite-symmetric-evd-strategy — Schur over $\mathbb C$ + normal-implies-diagonal + real eigenspaces.
- Worked example∆evd-worked-example — explicit $2\times2$ symmetric EVD.
- Normal generalisation / repeated eigenvalues (bite)∆bite-normal-matrices-unitary-evd, ∆bite-symmetric-evd-repeated-eigenvalues — normal $\Rightarrow$ $U\Lambda U^\ast$; multiplicity = eigenspace dimension.
Source: Lecture 2, §2 (symmetric eigenvalue decomposition); Lecture 1 §1.4.
Low-rank approximation: the truncated SVD (Eckart–Young)
The rank-$r$ truncation $A _ r=\sum _ {i\le r}\sigma _ i u _ i v _ i^\top$ is the best rank-$r$ approximation in any unitarily invariant norm, with error $\ \vert A-A _ r\ \vert _ 2=\sigma _ {r+1}$ and $\ \vert A-A _ r\ \vert _ F=\sqrt{\sum _ {i>r}\sigma _ i^2}$. This is the single most consequential SVD application: PCA, image compression, matrix completion and the entire randomised-SVD chapter are corollaries. NLA MT25 leaves this uncarded itself; the statement and proof live in the Part A entry.
- Spectral norm $=\sigma _ 1$ (+ proof)∆2-norm-equal-to-largest-singular-value, ∆cf-sigma-max-min-as-special-cases — $\ \vert A\ \vert _ 2=\sigma _ 1$, $\sigma _ n=\min\ \vert Ax\ \vert /\ \vert x\ \vert $ (the $\sigma _ 1$ card is in Notes - Numerical Analysis HT24, Singular value decompositionU; the CF special case is the NLA-side card).
- Eckart–Young (+ proof)∆eckart-young-theorem, ∆eckart-young-theorem-proof-detailed — best rank-$k$ is the truncated SVD; error $\sigma _ {k+1}$ (carded in Notes - Numerical Analysis HT24, Singular value decompositionU / Notes - Machine Learning MT23, Singular value decompositionU).
- Frobenius via singular values∆bite-na-svd-best-rank-k-error, ∆bite-na-frobenius-via-singular-values — $\ \vert A-A _ k\ \vert _ 2=\sigma _ {k+1}$; $\ \vert A\ \vert _ F=\sqrt{\sum\sigma _ i^2}$.
Source: Lecture 3, §3 (Theorem 3.1, low-rank approximation via the truncated SVD).
Pseudoinverse
For rank-$r$ $A=U _ r\Sigma _ r V _ r^\top$ the pseudoinverse is $A^\dagger=V _ r\Sigma _ r^{-1}U _ r^\top$; it is the unique matrix satisfying the four Moore–Penrose conditions, equals $A^{-1}$ when nonsingular, gives the minimum-norm least-squares solution $x=A^\dagger b$, and has $\ \vert A^\dagger\ \vert _ 2=1/\sigma _ {\min}^{\mathrm{nz}}(A)$. It is the bridge between the SVD and least-squares, and the central object of the HMT and CUR analyses.
- Definition / Penrose conditions∆pseudoinverse-definition, ∆pseudoinverse-symmetry, ∆bite-pseudoinverse-penrose-conditions — $V _ r\Sigma _ r^{-1}U _ r^\top$; the four characterising identities.
- Specialisations∆pseudoinverse-coincides-with-inverse, ∆pseudoinverse-identity, ∆bite-pseudoinverse-full-rank-formulas — $A^\dagger=A^{-1}$; one-sided identities; $(A^\top A)^{-1}A^\top$ / $A^\top(AA^\top)^{-1}$.
- Least-squares / underdetermined / norm∆pseudoinverse-least-squares, ∆pseudoinverse-underdetermined-system, ∆pseudoinverse-spectral-norm, ∆bite-pseudoinverse-singular-values — $x=A^\dagger b$; minimum-norm solution; $\ \vert A^\dagger\ \vert _ 2=1/\sigma _ {\min}^{\mathrm{nz}}$.
Source: Lecture 15, §15.1 (pseudoinverse, introduced for the HMT analysis).
Variational and perturbation theory
Courant–Fischer minmax theorem
For symmetric $A$, $\lambda _ i(A)=\max _ {\dim\mathcal S=i}\min _ {x\in\mathcal S}\frac{x^\top Ax}{x^\top x}=\min _ {\dim\mathcal S=n-i+1}\max _ {x\in\mathcal S}\frac{x^\top Ax}{x^\top x}$, with the analogous statement for singular values. The two subspace dimensions sum to $n+1$, so the dimension-count lemma forces an intersection — that is the entire proof idea. It is the source of Weyl’s inequality, Cauchy interlacing and the Rayleigh–Ritz bound.
- Statement (+ ordering bite)∆courant-fischer-minmax-theorem, ∆bite-cf-minmax-ordering, ∆bite-cf-lambda-extremes — eigenvalue and singular-value forms; $i=1,n$ collapse to Rayleigh-quotient extremes.
- Proof (+ strategy, key step)∆courant-fischer-minmax-proof, ∆bite-courant-fischer-strategy, ∆bite-cf-dimension-count-step — constructive upper bound + dimension-count lower bound; $i+(n-i+1)=n+1$.
- $\sigma _ 1,\sigma _ n$ as special cases∆cf-sigma-max-min-as-special-cases — plug $i=1$ and $i=n$.
- Stacked/concatenated bound (+ proof)∆singular-values-of-stacked-and-concatenated-matrices, ∆singular-values-of-stacked-and-concatenated-proof — $\sigma _ i([A _ 1;A _ 2])\ge\max(\sigma _ i(A _ 1),\sigma _ i(A _ 2))$.
- Variational characterisations∆variational-characterisations — the unifying “quantity = optimum of a functional” viewpoint (see redacted?).
- Gives Rayleigh–Ritz bound∆bite-cf-gives-rayleigh-ritz-bound — restrict the max to $\mathrm{range}(Q)$: $\hat\lambda _ i\le\lambda _ i(A)$.
Source: Lecture 4, §4 (Theorem 4.1, Courant–Fischer minmax theorem).
Weyl's inequality
Perturbing $A$ by $E$ moves every singular value, and every eigenvalue of a symmetric matrix, by at most $\ \vert E\ \vert _ 2$ — additively, with no condition-number amplification: $ \vert \sigma _ i(A+E)-\sigma _ i(A) \vert \le\ \vert E\ \vert _ 2$. So these quantities are perfectly conditioned and a backward-stable algorithm computes them to working precision; contrast the $\epsilon^{1/n}$ catastrophe for non-symmetric eigenvalues.
- Statement∆weyls-inequality — singular-value, $\ \vert A+E\ \vert _ 2$, and symmetric-eigenvalue forms.
- Proof (+ strategy)∆weyls-inequality-proof, ∆bite-weyl-proof-strategy — Courant–Fischer + triangle inequality inside the min-over-subspace.
- Conditioning consequence∆bite-weyl-implies-well-conditioning, ∆bite-weyl-vs-nonsymmetric-sensitivity — well-conditioned $\sigma _ i$/symmetric $\lambda _ i$; non-symmetric $\lambda$ can move by $\epsilon^{1/n}$.
Source: Lecture 4, §4.1 (Theorem 4.2, Weyl’s inequality); §4.1.1 (non-symmetric sensitivity).
Direct factorisations and linear systems
LU factorisation with pivoting
Gaussian elimination is the rank-1 peeling $A=\sum _ i L _ iU _ i$; partial pivoting gives $PA=LU$ with $ \vert L _ {ij} \vert \le 1$, exists for every nonsingular $A$, and costs $\tfrac23 n^3$ flops with $O(n^2)$ per subsequent right-hand side. It is backward stable up to the growth factor $\rho _ n=\ \vert U\ \vert /\ \vert A\ \vert $ — empirically $O(\sqrt n)$ but Wilkinson-$2^{n-1}$ in the worst case, the resolution of which is a famous open problem. It is the de-facto linear solver despite QR’s unconditional stability, because it is $2\times$ faster.
- Rank-1 view / algorithm∆rank-1-view-of-lu, ∆pivoted-lu-decomposition-algorithm — $A=\sum L _ iU _ i$; pivoted Gaussian elimination.
- Existence (+ strategy)∆pivoted-lu-decomposition-existence, ∆bite-pivoted-lu-existence-strategy — rank-counting contradiction; $PA=LU$ for any nonsingular $A$.
- Backward stability∆lu-backward-stability, ∆lu-backward-stable-solve-condition, ∆bite-lu-growth-factor — $\ \vert \hat L\hat U-A\ \vert \le c _ n u\ \vert L\ \vert \ \vert U\ \vert $; growth factor $\rho _ n$; stable iff $\ \vert L\ \vert \ \vert U\ \vert =O(\ \vert A\ \vert )$.
- Cost / why default / multiple RHS∆bite-lu-cost, ∆bite-lu-vs-qr-de-facto, ∆bite-lu-multiple-rhs — $\tfrac23 n^3$; $2\times$ faster than QR; $O(kn^2)$ for $k$ RHS.
- No-pivot failure example∆bite-lu-no-factorisation-without-pivot-example — $\big[\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\big]$.
- Worked example∆pivoted-lu-worked-example — explicit $3\times3$ pivoted LU.
Source: Lecture 5, §5.1–5.2 (LU with partial pivoting); Lecture 7 §7.5.2 (instability/growth factor); Problem Sheets 2 Q1, Q3.
Cholesky factorisation
For $A\succ 0$ the LU peeling becomes symmetric, $A=R^\top R$ with $R$ upper triangular and a positive diagonal, at half the cost ($\tfrac13 n^3$) and with no pivoting ever needed (the active trailing block stays SPD). It is unconditionally backward stable because there is no pivot growth and $ \vert R _ {ij} \vert \le\sqrt{\ \vert A\ \vert }$. The indefinite analogue is $LDL^\top$ with $1\times1$ and $2\times2$ blocks.
- Existence / definition∆cholesky-existence-conditions, ∆cholesky-factorisation, ∆lu-decomposition-gives-cholesky-when-symmetric-positive-definite — exists iff $A\succeq 0$; symmetry collapse of LU.
- No pivots needed (+ proof, strategy)∆cholesky-no-pivots-needed-proof, ∆bite-cholesky-no-pivot-reason, ∆bite-cholesky-no-pivots-strategy — Schur complement of SPD is SPD (complete-the-square).
- Backward stability∆cholesky-backward-stability, ∆bite-cholesky-backward-stable, ∆bite-cholesky-r-bounded — no growth, $ \vert R _ {ij} \vert \le\sqrt{\ \vert A\ \vert }$, unconditionally stable.
- Cost / diagonal / uniqueness (bite)∆bite-cholesky-cost, ∆bite-cholesky-r-positive-diagonal, ∆bite-cholesky-uniqueness — $\tfrac13 n^3$; positive diagonal; unique with positive diagonal (PS2 Q4b).
- $LDL^\top$ (indefinite)∆ldlt-factorisation, ∆bite-ldlt-real-indefinite-blocks, ∆ldlt-worked-example — square-root-free; $2\times2$ blocks in the indefinite case.
- Worked example∆cholesky-worked-example — explicit $2\times2$ Cholesky.
Source: Lecture 5, §5.3 (Cholesky for $A\succ 0$); Lecture 7 §7.5.3 (backward stability); Problem Sheet 2 Q4.
QR factorisation
$A=QR$ with orthonormal $Q$ and upper-triangular $R$, computed stably by orthogonal triangularisation (Householder reflectors, or Givens rotations for sparse/Hessenberg matrices) rather than the unstable triangular orthogonalisation of Gram–Schmidt. Householder QR is unconditionally backward stable ($\ \vert \hat Q\hat R-A\ \vert =O(\epsilon\ \vert A\ \vert )$, $\ \vert \hat Q^\top\hat Q-I\ \vert =O(\epsilon)$) at $\tfrac43 n^3$ flops because it is a product of norm-1 orthogonal operations.
- Gram–Schmidt / Householder construction∆qr-with-gram-schmidt, ∆qr-with-householder, ∆why-not-gram-schmidt-for-qr — the two routes (construction cards in Notes - Numerical Analysis HT24, QR decompositionU); GS is unstable.
- Orthogonal vs triangular orthogonalisation∆orthogonal-vs-triangular-orthogonalisation — Householder/Givens vs Gram–Schmidt/CholeskyQR.
- Backward stability (+ proof, strategy)∆householder-qr-backward-stability-statement, ∆householder-qr-backward-stability-explicit, ∆householder-qr-backward-stable, ∆bite-householder-qr-backward-stability-strategy, ∆bite-householder-qr-stability-mechanism — per-reflector error telescoped; rides on $\ \vert H _ i\ \vert _ 2=1$.
- Givens rotations∆givens-rotation-c-s-formula, ∆bite-na-givens-vs-householder-scope, ∆bite-na-givens-hessenberg-qr-cost — local two-row zeroing; Hessenberg QR in $O(n^2)$ (carded in Notes - Numerical Analysis HT24, Givens rotationsU).
- Householder reflector facts∆householder-reflector-definition, ∆householder-maps-to-axis-proof, ∆bite-na-householder-properties, ∆bite-householder-reflector-application-cost — $I-2vv^\top$; maps $x$ to $\ \vert x\ \vert e _ 1$; $O(m)$ application (carded in Notes - Numerical Analysis HT24, Householder reflectorsU).
- Cost / full vs thin∆bite-householder-qr-cost, ∆bite-full-vs-thin-qr, ∆bite-na-full-vs-thin-qr-cloze — $\tfrac43 n^3$ / $2mn^2-\tfrac23 n^3$; thin used in practice.
- Worked example∆qr-householder-worked-example — explicit $2\times2$ Householder QR.
Source: Lecture 6, §6.1–6.4 (Gram–Schmidt, Householder, Givens); Lecture 7 §7.7 (stability); Problem Sheet 2 Q2, Q4c.
Solving linear systems
$Ax=b$ is solved by LU + two backward-stable triangular solves (default), Cholesky for $A\succ 0$, QR for guaranteed stability at $2\times$ the cost, or the SVD ($\approx 10\times$ QR) when $A$ is near-singular. The forward error of any backward-stable solve is $\ \vert \hat x-x\ \vert /\ \vert x\ \vert \lesssim\epsilon\,\kappa _ 2(A)$ — the algorithm is exonerated, ill-conditioning is the problem’s fault.
- Setup / default solvers∆bite-linear-system-setup-and-default-solvers — LU+PP general, Cholesky for $A\succ 0$.
- Triangular solves are stable∆triangular-systems-backward-stable, ∆solving-linear-systems-via-triangular-stability, ∆bite-triangular-systems-stable — $(R+\Delta R)\hat x=b$, $\ \vert \Delta R\ \vert =O(\epsilon\ \vert R\ \vert )$.
- QR vs LU trade-off∆qr-vs-lu-for-linear-systems — unconditional stability vs $2\times$ speed.
- Relative-error bound∆bite-linear-system-relative-error-bound — $\le\epsilon\kappa _ 2(A)+O(\epsilon^2)$.
- When to use the SVD / three-method flops∆bite-linear-system-when-svd, ∆bite-linear-system-three-method-flops — near-rank-deficient $A$; $\tfrac23 n^3$ / $\tfrac43 n^3$ / $\tfrac{20}{3}n^3$.
- Forward sensitivity∆bite-linear-system-relative-error-bound — $\ \vert \Delta x\ \vert /\ \vert x\ \vert $ amplified by $\kappa _ 2(A)$ (PS2 Q7 one-sided form).
Source: Lecture 5–6, §5–6 (LU/QR/SVD solvers); Lecture 7 Theorem 7.1.
Least-squares problems
For overdetermined full-rank $\min _ x\ \vert Ax-b\ \vert _ 2$ the QR route ($\hat x=R^{-1}Q^\top b$) is backward stable; the normal equations $A^\top Ax=A^\top b$ + Cholesky are fast but have forward error $O(u\kappa _ 2(A)^2)$ because forming $A^\top A$ squares the condition number (the Cholesky solve itself is still backward stable). Geometrically $\hat x$ makes the residual orthogonal to $\mathrm{range}(A)$ — Galerkin orthogonality / the normal equations.
- Setup / QR solution∆bite-least-squares-setup, ∆bite-least-squares-qr-solution, ∆qr-least-squares-backward-stability — $\hat x=R^{-1}Q^\top b$; backward stable.
- Normal equations (+ derivation)∆normal-equations-derivation, ∆normal-equations-vs-qr — set $\nabla\ \vert Ax-b\ \vert ^2=0$; $\kappa^2$ forward error.
- Geometry / SVD / pseudoinverse forms∆bite-least-squares-geometric-interpretation, ∆bite-least-squares-svd-solution, ∆bite-least-squares-via-pseudoinverse — residual $\perp\mathrm{range}(A)$; $V\hat\Sigma^{-1}\hat s$; $A^\dagger b$.
- Best approximation in inner-product spaces∆best-approximation-condition-general — $\langle v-\hat w,u\rangle=0$ Galerkin form (carded in Notes - Numerical Analysis HT24, Best approximation in inner product spacesU).
Source: Lecture 6, §6.5–6.7 (least-squares via QR / normal equations); Problem Sheet 2 Q6.
Numerical stability and conditioning
Conditioning vs stability; the matrix condition number
Conditioning is a property of the problem ($\kappa$ = sensitivity of $f(x)$ to perturbations of $x$); backward stability is a property of the algorithm (computed $\hat y=f(x+\Delta x)$ with $\ \vert \Delta x\ \vert =O(\epsilon\ \vert x\ \vert )$). The folk theorem: backward-stable algorithm + well-conditioned problem $\Rightarrow$ accurate answer. The matrix condition number $\kappa _ 2(A)=\sigma _ {\max}/\sigma _ {\min}$ is exactly the amplification factor for $Ax=b$.
- Definitions∆conditioning-absolute-and-relative, ∆backward-stability-definition, ∆forward-stability-definition, ∆matrix-condition-number-definition — $\kappa$, $\kappa _ r$; backward/forward stability; $\kappa _ 2=\sigma _ {\max}/\sigma _ {\min}$.
- fl notation / scalar bounds∆floating-point-notation, ∆fl-scalar-arithmetic-bounds, ∆numerical-stability-delta-notation, ∆bite-unit-roundoff-and-epsilon-convention — $\mathrm{fl}(\cdot)$; $u\approx 10^{-16}$; $\epsilon=O(u)$ convention.
- Folk theorem∆backward-stable-implies-forward-stable-justification, ∆bite-backward-stable-plus-well-conditioned — backward + well-conditioned = accurate.
Source: Lecture 7, §7.1–7.4 (floating point, conditioning, stability).
Backward-stable linear-system error bound
If $(A+\Delta A)\hat x=b$ with $\ \vert \Delta A\ \vert \le\epsilon\ \vert A\ \vert $ and $\kappa _ 2(A)\ll\epsilon^{-1}$ then $\ \vert \hat x-x\ \vert /\ \vert x\ \vert \le\epsilon\kappa _ 2(A)+O(\epsilon^2)$. The proof is a Neumann-series expansion of $(A+\Delta A)^{-1}$; the hypothesis $\kappa _ 2(A)\ll\epsilon^{-1}$ is precisely what makes that series converge. This single theorem is why $\kappa _ 2$ is defined the way it is.
- Statement∆backward-stable-linear-system-error-bound — $\ \vert \hat x-x\ \vert /\ \vert x\ \vert \le\epsilon\kappa _ 2(A)+O(\epsilon^2)$.
- Proof (+ strategy, hypothesis)∆backward-stable-linear-system-error-bound-proof, ∆bite-bs-linear-system-error-strategy, ∆bite-bs-linear-system-error-hypothesis-kappa-bound — Neumann series; where $\kappa _ 2\ll\epsilon^{-1}$ enters.
Source: Lecture 7, §7.4 (Theorem 7.1).
Stability of the building blocks
Vector–vector and matrix–vector products are backward stable; matrix–matrix multiplication is not (only a forward-error bound $\ \vert \mathrm{fl}(AB)-AB\ \vert \le\epsilon\ \vert A\ \vert \ \vert B\ \vert $, hence relative error $\le\epsilon\min(\kappa _ 2(A),\kappa _ 2(B))$). The redeeming exception is multiplication by an orthogonal matrix, which is backward stable since $\ \vert Q\ \vert _ 2=1$ — the reason every stable algorithm in the course is built from orthogonal transformations.
- mat-vec / mat-mat backward error∆backward-error-of-matrix-vector-multiplication, ∆orthogonal-vs-general-matrix-multiplication-stability — mat-vec stable; mat-mat not.
- Forward-error bound (+ proof, strategy)∆forward-error-matrix-multiplication, ∆forward-error-matrix-multiplication-proof, ∆bite-matmul-forward-error-strategy — column-by-column + $\sigma _ {\min}$ division.
- Orthogonal multiplication is backward stable (+ proof)∆orthogonal-multiplication-backward-stable-proof, ∆orthogonal-product-forward-error-proof — fold the error into $\Delta B=Q^\top E$; squareness essential.
- Triangular / inverse / summary (bite)∆bite-triangular-systems-stable, ∆bite-no-backward-stable-inverse, ∆bite-stable-vs-unstable-algorithms-summary — triangular solves stable; no backward-stable inverse; the stable/unstable roster.
Source: Lecture 7, §7.5–7.7 (stability of triangular systems, LU, Cholesky, matrix multiplication, Householder QR); Problem Sheet 3 Q3.
Eigenvalue problems
The eigenvalue problem and the Schur decomposition
$Ax=\lambda x$ has no closed form — eigenvalues are roots of $\det(\lambda I-A)$ and by Abel–Galois no finite-step algorithm exists for $n\ge 5$ (companion-matrix argument), so all eigensolvers are iterative. The practical goal is the Schur form $A=UTU^\ast$ ($U$ unitary, $T$ upper triangular, $\mathrm{eig}(A)=\mathrm{diag}(T)$) because it is computable backward-stably via orthogonal similarities; $T$ is diagonal iff $A$ is normal, in which case Schur = the eigenvalue decomposition.
- No finite-step algorithm∆bite-standard-eigenvalue-problem, ∆bite-no-finite-step-eigenvalue-algorithm, ∆companion-matrix-eigenvalue-proof — companion matrix + Abel–Galois (proof carded in Notes - Numerical Analysis HT24, EigenvaluesU).
- Schur statement (+ proof, strategy)∆schur-decomposition, ∆schur-decomposition-proof, ∆bite-schur-decomposition-strategy — deflation by one eigenpair, recurse.
- Real Schur / eigenvalues / why the goal∆real-schur-decomposition, ∆schur-gives-eigenvalues, ∆schur-backward-stable-intuition, ∆bite-why-schur-as-goal, ∆bite-schur-as-eigenvalue-goal — $2\times2$ blocks; backward-stable since orthogonal.
- Diagonal $\iff$ normal (+ proof, strategy)∆diagonal-schur-iff-normal, ∆diagonal-schur-iff-normal-proof, ∆bite-diagonal-schur-iff-normal-strategy — outsources to the upper-triangular-normal lemma.
- Invariant subspaces / Sylvester reduction∆bite-schur-U-columns-not-eigenvectors, ∆bite-schur-invariant-subspaces, ∆schur-to-eigenvalue-decomposition-via-sylvester — only $u _ 1$ is an eigenvector; nested $A$-invariant flag; non-normal diagonalisation not backward stable.
- Worked example∆schur-worked-example — explicit $2\times2$ non-normal Schur.
Source: Lecture 8, §8.1 (Theorem 8.1, Schur decomposition; Theorem 8.2, diagonal iff normal).
Power method, inverse iteration, Rayleigh quotient iteration
The power method $x\leftarrow Ax/\ \vert Ax\ \vert $ converges to the dominant eigenpair linearly at rate $ \vert \lambda _ 2/\lambda _ 1 \vert $. Shift-and-invert $(A-sI)^{-1}$ targets the eigenvalue closest to $s$ (one $LU$ up front at $O(n^3)$, then $O(n^2)$ per step); Rayleigh quotient iteration adapts $s _ k=x _ k^\top Ax _ k$ for quadratic (cubic if symmetric) convergence. This is the conceptual seed of the QR algorithm and Krylov methods.
- Basic power method (+ analysis)∆bite-basic-power-method, ∆power-method-analysis, ∆bite-power-method-convergence-rate — $x\leftarrow Ax/\ \vert Ax\ \vert $; rate $ \vert \lambda _ 2/\lambda _ 1 \vert $ (analysis carded in Notes - Numerical Analysis HT24, Power methodU).
- Shifted inverse iteration∆shifted-inverse-power-method-algorithm, ∆shifted-inverse-power-method-cost, ∆bite-shifted-inverse-eigenvalue-identity, ∆bite-na-shifted-inverse-cost — eigenvalues of $(A-sI)^{-1}$ are $(\lambda _ i-s)^{-1}$; one $LU$ then $O(n^2)$/step.
- Rayleigh quotient iteration∆rayleigh-quotient-iteration, ∆bite-rayleigh-quotient-iteration-convergence-rate — adaptive shift; quadratic / cubic (symmetric).
- When useful / failure∆bite-basic-power-method-applications, ∆bite-na-power-method-failure-equal-modulus — PageRank; fails on equal-modulus dominant eigenvalues (failure card in Notes - Numerical Analysis HT24, Power methodU).
Source: Lecture 8, §8.2 (power method); §8.2.2 (shifted inverse / Rayleigh quotient iteration).
The QR algorithm (non-examinable in MT25)
Factor–swap $A _ k=Q _ kR _ k$, $A _ {k+1}=R _ kQ _ k$ keeps $A _ k$ similar to $A$ and drives it to triangular (Schur) form; it is simultaneously the power method on the first column and inverse iteration on the last, so shifts ($s _ k=A _ {nn}$ or Wilkinson) plus a one-off Hessenberg/tridiagonal reduction and deflation give $O(n^3)$ ($\approx 25 n^3$) and $2$–$4$ iterations per eigenvalue. Non-examinable in 2025-26 (lecture notes §9–10) but the underlying mechanism is examinable conceptually.
- Iteration / similarity (+ proof)∆qr-algorithm-basic-description, ∆qr-algorithm-similarity-proof, ∆bite-qr-algorithm-iteration — $A _ {k+1}=Q _ k^\top A _ kQ _ k$ (carded in Notes - Numerical Analysis HT24, QR algorithmU).
- Why it triangularises∆bite-qr-algorithm-why-triangularises — power + inverse-power method under the hood.
- Shifts / Hessenberg / deflation∆qr-algorithm-three-practical-modifications, ∆bite-shifted-qr-cost, ∆bite-qr-needs-hessenberg-preprocessing, ∆bite-qr-symmetric-tridiagonal — three practical modifications; $\approx 25 n^3$ (modifications card in Notes - Numerical Analysis HT24, QR algorithmU).
- Convergence proof (symmetric)∆qr-algorithm-convergence-proof-symmetric — full convergence argument (carded in Notes - Numerical Analysis HT24, QR algorithmU).
- Golub–Kahan SVD— no flashcard (non-examinable; bidiagonalise then symmetric QR on $B^\top B$, §10.2; only the cost bite ∆bite-svd-cost-via-golub-kahan exists).
Source: Lectures 9–10, §9–10 (QR algorithm, Hessenberg reduction, Golub–Kahan; non-examinable 2025-26).
Generalised eigenvalue problems and the QZ algorithm
$Ax=\lambda Bx$ generalises the standard problem; when $B\succ 0$ and $A$ symmetric a Cholesky $B=R^\top R$ reduces it to the symmetric standard problem $R^{-\top}AR^{-1}y=\lambda y$ (preserving symmetry, unlike $B^{-1}A$). The general case is solved by the QZ algorithm.
- Setup∆generalised-eigenvalue-problem-setup — find $\lambda,x\ne 0$ with $Ax=\lambda Bx$.
- Symmetric-PD reduction∆bite-generalised-eig-symmetric-pd-reduction — Cholesky congruence keeps symmetry (PS3 Q4).
- QZ algorithm∆bite-qz-algorithm-for-generalised-eig — unitary $Q,Z$ with $QAZ,QBZ$ triangular; $\approx 50 n^3$.
- Non-symmetric ill-conditioning∆bite-nonsymmetric-eigenvalues-ill-conditioned — Jordan-block perturbation moves eigenvalues by $\epsilon^{1/n}$.
Source: Lecture 10, §10.3 (QZ algorithm; non-examinable); Problem Sheet 3 Q4 (examinable reduction).
Krylov subspace methods
Krylov subspaces and the polynomial viewpoint
Look for $\hat x=p _ {k-1}(A)v\in\mathcal K _ k(A,v)=\mathrm{span}(v,Av,\dots,A^{k-1}v)$, choosing the polynomial optimally for the problem (the power method is the special case $p(z)=z^{k-1}$). Convergence analysis reduces entirely to polynomial approximation on the spectrum; the naive basis $[v,Av,\dots]$ is hopelessly ill-conditioned, which is why Arnoldi/Lanczos build an orthonormal basis on the fly.
- General idea / definition∆krylov-methods-general-idea, ∆krylov-subspace-definition, ∆bite-krylov-polynomial-form — $\mathcal K _ k$; every element is $p _ {k-1}(A)v$.
- Power method as Krylov / fast cases∆bite-power-method-as-krylov, ∆bite-krylov-fast-convergence-conditions, ∆bite-krylov-shift-invariance — clustered or few-distinct eigenvalues $\Rightarrow$ $O(1)$ iterations.
- Why orthonormalise∆why-orthonormal-krylov-basis, ∆naive-krylov-basis-conditioning — naive Krylov matrix is power-method ill-conditioned (PS3 Q6).
Source: Lecture 11, §11.1–11.2 (Krylov subspaces and the polynomial idea).
Arnoldi iteration and decomposition
Arnoldi builds an orthonormal Krylov basis by multiplying $A$ once and modified-Gram-Schmidt-orthogonalising, producing $AQ _ k=Q _ {k+1}\tilde H _ k$ with $H _ k=Q _ k^\top AQ _ k$ upper Hessenberg (the Rayleigh–Ritz compression of $A$). Cost $O(nk^2)$ and memory $O(nk)$ — it must keep all previous $q _ j$. A “happy breakdown” ($h _ {k+1,k}=0$) means the Krylov subspace is $A$-invariant: good news, the exact solution/eigenpairs are already inside.
- Decomposition / iteration∆arnoldi-decomposition, ∆arnoldi-iteration, ∆arnoldi-iteration-purpose, ∆arnoldi-iteration-outputs — $AQ _ k=Q _ {k+1}\tilde H _ k$; the modified-GS algorithm.
- Spans the Krylov subspace (+ proof, strategy)∆arnoldi-spans-krylov-proof, ∆bite-arnoldi-spans-krylov-strategy — polynomial-in-$A$ induction.
- Happy breakdown∆happy-breakdown — $h _ {k+1,k}=0\Rightarrow$ $A$-invariant subspace; exact solution found.
- Cost / role of $H _ k$ / vs Lanczos∆arnoldi-iteration-cost, ∆bite-arnoldi-Hk-role, ∆bite-arnoldi-per-step-cost, ∆bite-arnoldi-vs-naive-krylov-qr, ∆bite-arnoldi-vs-lanczos-storage — $O(nk^2)$; Ritz values; storage vs Lanczos.
Source: Lecture 11, §11.3 (Algorithm 11.1, Arnoldi iteration; Theorem 11.1).
GMRES
GMRES picks $x _ k=\arg\min _ {x\in\mathcal K _ k(A,b)}\ \vert Ax-b\ \vert _ 2$; via the Arnoldi decomposition this becomes the small Hessenberg least-squares problem $\min _ y\ \vert \tilde H _ ky-\ \vert b\ \vert _ 2 e _ 1\ \vert _ 2$, solved incrementally by Givens rotations with the residual norm read off for free. For diagonalisable $A=X\Lambda X^{-1}$, $\ \vert Ax _ k-b\ \vert _ 2\le\kappa _ 2(X)\min _ {p\in\mathcal P _ k,p(0)=1}\max _ {z\in\lambda(A)} \vert p(z) \vert \,\ \vert b\ \vert _ 2$: fast when eigenvalues are clustered away from $0$ or few-distinct, slow otherwise (hence preconditioning and restarting).
- Main idea / algorithm (+ justification)∆gmres-main-idea, ∆gmres-algorithm, ∆gmres-algorithm-justification, ∆bite-gmres-rhs-is-norm-times-e1, ∆bite-gmres-residual-from-rotated-rhs — residual minimisation reduced to Hessenberg LS; $Q _ {k+1}^\top b=\ \vert b\ \vert _ 2 e _ 1$.
- Convergence (+ proof, strategy)∆gmres-convergence, ∆gmres-convergence-proof, ∆bite-gmres-convergence-strategy — polynomial bound via diagonalisation + submultiplicativity.
- Fast-convergence cases / normal $A$∆gmres-fast-convergence-examples, ∆gmres-convergence-visualisation, ∆bite-gmres-bound-for-normal-matrices, ∆bite-gmres-finite-termination, ∆bite-gmres-non-normal-bound-issue — clustered / few-distinct; $\kappa _ 2(X)=1$ for normal; terminates by step $n$.
- Preconditioning∆gmres-preconditioner-desiderata, ∆ilu-preconditioner, ∆saddle-point-preconditioner, ∆preconditioning-as-inverse-approximation — solve $MAx=Mb$; ILU; the 3-step saddle-point preconditioner.
- Restarting / initial guess / cost∆gmres-random-restarts, ∆bite-gmres-with-initial-guess, ∆gmres-cost — restart to cap $O(nk^2)$/$O(nk)$ growth; affine $x _ 0+\mathcal K _ k(A,r _ 0)$.
- MINRES— no flashcard (non-examinable §13.4; symmetric-indefinite GMRES via Lanczos, $O(n)$ memory).
Source: Lecture 12, §12 (Theorem 12.1, GMRES convergence; preconditioning, restarting); Problem Sheets 3 Q7, 4 Q1.
Lanczos iteration and decomposition
For symmetric $A$ the Arnoldi Hessenberg collapses to a symmetric tridiagonal $T _ k$ (since $H _ k=Q _ k^\top AQ _ k$ is symmetric), giving the three-term recurrence $t _ {k+1,k}q _ {k+1}=(A-t _ {k,k}I)q _ k-t _ {k-1,k}q _ {k-1}$: cost $O(nk)$ and memory $O(n)$ instead of Arnoldi’s $O(nk^2)$/$O(nk)$. In floating point it suffers loss of orthogonality, fixed by reorthogonalisation.
- Lanczos = symmetric Arnoldi (+ proof)∆lanczos-as-symmetric-arnoldi, ∆arnoldi-tridiagonal-when-a-symmetric, ∆bite-symmetric-arnoldi-is-tridiagonal-why — $H _ k$ symmetric + Hessenberg $\Rightarrow$ tridiagonal.
- Decomposition / iteration / recurrence∆lanczos-decomposition, ∆lanczos-iteration, ∆lanczos-three-term-recurrence — $AQ _ k=Q _ {k+1}\tilde T _ k$; the 3-term recurrence.
- Cost / loss of orthogonality∆lanczos-iteration-cost, ∆lanczos-loss-of-orthogonality, ∆bite-lanczos-reorthogonalisation, ∆bite-lanczos-initial-vector — $O(nk)$ vs $O(nk^2)$; reorthogonalisation fix.
Source: Lecture 13, §13.1 (Lanczos iteration and decomposition).
Conjugate gradient method
For $A\succ 0$, CG is the Krylov iterate $x _ k=\arg\min _ {x\in\mathcal K _ k(A,b)}\ \vert x-x _ \ast\ \vert _ A$, equivalently Galerkin orthogonality $Q _ k^\top(Ax _ k-b)=0$, equivalently the tridiagonal projection $T _ ky=Q _ k^\top b$. The practical recurrence keeps the search directions mutually $A$-conjugate (and the residuals orthogonal scaled Lanczos vectors) at $O(n)$ memory; the error contracts at the Chebyshev rate $2\big(\tfrac{\sqrt\kappa-1}{\sqrt\kappa+1}\big)^k$, slow only when $\kappa _ 2(A)$ is large (hence preconditioning).
- Three equivalent characterisations∆conjugate-gradient-main-idea, ∆galerkin-condition, ∆conjugate-gradient-motivation, ∆bite-cg-three-equivalent-characterisations — $A$-norm min = Galerkin = tridiagonal projection.
- $A$-conjugacy / algorithm∆a-conjugacy, ∆conjugate-gradient-algorithm, ∆cg-conjugacy-importance, ∆bite-cg-per-step-cost, ∆bite-cg-needs-spd — $u^\top Av=0$; the $\alpha _ k,\beta _ k$ recurrence; one matvec/step.
- Conjugacy maintained (+ proof, strategy, key step)∆conjugate-gradient-maintains-conjugacy, ∆bite-cg-conjugacy-strategy, ∆bite-cg-conjugacy-key-inequality-beta-derivation — $\beta _ k$ is the forced Gram–Schmidt coefficient.
- Correctness (+ proof, strategy, hypothesis)∆conjugate-gradient-correctness-proof, ∆bite-cg-correctness-strategy, ∆bite-cg-correctness-hypothesis-spd, ∆bite-cg-residual-orthogonality, ∆bite-cg-finite-termination — $A$-norm minimiser; terminates in $\le n$; why $A\succ 0$.
- Convergence (+ proof, strategy)∆conjugate-gradient-convergence, ∆conjugate-gradient-convergence-proof, ∆cg-chebyshev-bound-on-eigenvalue-interval, ∆bite-cg-convergence-strategy, ∆bite-cg-where-sqrt-kappa-comes-from — Chebyshev $\sqrt\kappa$ rate.
- Slow when ill-conditioned / preconditioning / memory∆cg-slow-when-large-condition-number, ∆preconditioning-for-conjugate-gradient, ∆bite-cg-memory-vs-gmres — $M^\top AM\approx I$; $O(n)$ vs GMRES $O(nk)$.
Source: Lecture 13, §13.2–13.3 (Theorem 13.1 correctness, Theorem 13.2 convergence); Problem Sheets 4 Q2, Q3.
Chebyshev polynomials and the CG/GMRES rate
$T _ k(x)=\cos(k\arccos x)$ satisfies $ \vert T _ k \vert \le 1$ on $[-1,1]$ and grows fastest of all $\mathcal P _ k$ outside it; the shifted-scaled $p(x)=c _ kT _ k(\tfrac{2x-b-a}{b-a})$ with $p(0)=1$ minimises $\max _ {[a,b]} \vert p \vert $ and yields the $\sqrt\kappa$ in the CG bound (the $\sqrt{}$ coming from the change of variables $\tfrac12(z+z^{-1})=\tfrac{b+a}{b-a}$).
- Definition / recurrence∆chebyshev-polynomials-definition, ∆bite-na-three-term-recurrence-general — $T _ {k+1}=2xT _ k-T _ {k-1}$ (general orthogonal-polynomial recurrence in Notes - Numerical Analysis HT24, Orthogonal polynomialsU).
- Growth / symmetry∆chebyshev-polynomials-growth, ∆chebyshev-polynomials-symmetry — $ \vert T _ k \vert \le 1$ inside, $\gg 1$ outside; even-modulus.
- Shifted-scaled CG bound∆bite-chebyshev-shifted-scaled-form, ∆bite-chebyshev-cg-style-bound, ∆bite-chebyshev-why-optimal-for-cg, ∆bite-chebyshev-sqrt-kappa-origin — the $2(\tfrac{\sqrt\kappa-1}{\sqrt\kappa+1})^k$ minimax bound.
Source: Lecture 13, §13.3.1–13.3.2 (Chebyshev polynomials in the CG analysis).
Rayleigh–Ritz and the Lanczos eigenvalue algorithm
Rayleigh–Ritz extracts approximate eigenpairs from a subspace: eigendecompose $Q^\top AQ=V\hat\Lambda V^\top$, take Ritz values $\mathrm{diag}\hat\Lambda$ and Ritz vectors $QV$; Courant–Fischer gives the Cauchy-interlacing sandwich $\lambda _ {n-k+i}\le\hat\lambda _ i\le\lambda _ i$, exact iff $\mathrm{range}(Q)$ is $A$-invariant. The Lanczos algorithm = Lanczos iteration + Rayleigh–Ritz on the tridiagonal $T _ k$, converging fast to extremal eigenpairs (the Courant–Fischer sandwich beats the power method).
- Rayleigh–Ritz algorithm / variational∆rayleigh-ritz-algorithm, ∆rayleigh-ritz-variational-characterisation — project, solve small eigenproblem, lift; best Rayleigh quotient in $\mathrm{range}(Q)$.
- Exactness / interlacing / stopping∆bite-rayleigh-ritz-exact-when-invariant, ∆bite-cauchy-interlacing-for-ritz, ∆bite-ritz-residual-as-stopping-criterion — exact iff $A$-invariant; $\hat\lambda _ i\in[\lambda _ {n-k+i},\lambda _ i]$.
- Lanczos algorithm (+ extremal-convergence justification)∆lanczos-algorithm-use, ∆lanczos-algorithm, ∆rayleigh-ritz-becomes-tridiagonal-in-lanczos, ∆lanczos-extremal-convergence-justification — iteration + Rayleigh–Ritz; extremal eigenpairs first.
- Why extremal first∆bite-lanczos-equals-iteration-plus-rayleigh-ritz, ∆bite-lanczos-projected-tridiagonal, ∆bite-lanczos-ritz-values-and-vectors, ∆bite-lanczos-extremal-first, ∆bite-lanczos-beats-power-method — beats $k-1$ power-method steps.
Source: Lecture 13, §13.6 (Algorithm 13.1 Rayleigh–Ritz, Algorithm 13.2 Lanczos algorithm).
Randomised numerical linear algebra
Gaussian random matrices
Two facts power all of randomised NLA: orthogonal invariance (if $G$ is Gaussian and $Q$ orthogonal independent of $G$, then $QG,GQ$ are Gaussian) and Marchenko–Pastur (rectangular random matrices are well-conditioned). Square Gaussians, by contrast, can be arbitrarily ill-conditioned ($\mathbb E[\sigma _ {\min}^{-2}]=\infty$) — the reason oversampling is needed.
- Definition / two key facts∆gaussian-random-matrix-definition, ∆bite-gaussian-two-key-facts — iid $N(0,1)$; orthogonal invariance + Marchenko–Pastur.
- Orthogonal invariance (+ proofs)∆orthogonal-invariance-of-gaussian-matrices, ∆bite-gaussian-invariance-two-proofs, ∆bite-gaussian-invariance-utility — mean+covariance or pdf change-of-variables.
- Square-Gaussian ill-conditioning∆square-gaussian-ill-conditioning, ∆bite-when-gaussian-sketch-suboptimal — $\mathbb E[\sigma _ {\min}^{-2}]=\infty$; structured fast sketches.
Source: Lecture 14, §14.1.1 (orthogonal invariance of Gaussian matrices).
Marchenko–Pastur
The singular values of an $m\times n$ ($m\ge n$) iid mean-0 variance-1 matrix asymptotically fill $[\sqrt m-\sqrt n,\sqrt m+\sqrt n]$ with the MP density, so $\kappa _ 2\approx\frac{1+\sqrt{n/m}}{1-\sqrt{n/m}}=O(1)$ — “rectangular random matrices are well-conditioned”, with an $\exp(-ct^2)$ failure tail. Distribution-free (no Gaussianity needed for MP itself). This is the workhorse behind sketch-and-solve, Blendenpik and HMT.
- Statement / well-conditioning∆marchenko-pastur, ∆random-rectangular-matrices-are-well-conditioned, ∆bite-marchenko-pastur-headline — support $[\sqrt m-\sqrt n,\sqrt m+\sqrt n]$; $\kappa _ 2=O(1)$.
- Support / distribution-free / tail∆bite-marchenko-pastur-support, ∆bite-marchenko-pastur-distribution-free, ∆bite-marchenko-pastur-failure-probability — any iid distribution; $\exp(-ct^2)$ failure.
- Visualisation∆marchenko-pastur-visualisation — singular-value histograms vs aspect ratio.
Source: Lecture 14, §14.1.2 (Theorem 14.1, Marchenko–Pastur).
Sketch-and-solve and Blendenpik for least-squares
For tall $\min _ x\ \vert Ax-b\ \vert _ 2$, randomly mix the rows with a Gaussian sketch $G\in\mathbb R^{s\times m}$ ($s=cn$) and solve $\min _ x\ \vert G(Ax-b)\ \vert _ 2$: with high probability $\ \vert A\hat x-b\ \vert _ 2\le\frac{\sqrt s+\sqrt{n+1}}{\sqrt s-\sqrt{n+1}}\ \vert Ax _ \ast-b\ \vert _ 2$ ($\le 3\times$ at $s=4(n+1)$), via orthogonal invariance + MP. Blendenpik instead uses the sketch as a preconditioner — $GA=\hat Q\hat R$, then CG on $A\hat R^{-1}$, which MP makes $O(1)$-conditioned — for full-accuracy solutions in $O(mn\log m+n^3)$.
- Problem / row subset selection∆randomised-least-squares-problem-setup, ∆row-subset-selection-intuition, ∆row-subset-selection-failure-example — why deterministic subselection fails.
- Sketch-and-solve (+ proof, strategy, key, hypothesis)∆least-squares-sketch-and-solve, ∆sketch-and-solve-error-bound, ∆sketch-and-solve-error-bound-proof, ∆bite-sketch-and-solve-strategy, ∆bite-sketch-and-solve-key-inequality-mp-sandwich, ∆bite-sketch-and-solve-hypothesis-gaussian — the MP-sandwich residual bound.
- Prefactor / cost / vs subselection∆bite-sketch-and-solve-prefactor, ∆bite-sketch-and-solve-cost, ∆bite-why-sketching-beats-row-selection, ∆sketch-and-solve-connection-to-subset-selection — $\tau=3$ at $s=4(n+1)$; need fast sketches.
- Blendenpik∆bite-blendenpik-one-sentence, ∆bite-blendenpik-well-conditioned-preconditioner, ∆bite-blendenpik-inner-solver-cost — sketch-to-precondition; $\kappa _ 2(A\hat R^{-1})=O(1)$ via MP; CG inner solve (PS4 Q5).
Source: Lecture 14, §14.2–14.5 (Theorem 14.2 sketch-and-solve, Theorem 14.3 Blendenpik); Problem Sheet 4 Q5.
Randomised SVD (HMT)
Halko–Martinsson–Tropp: draw Gaussian $G\in\mathbb R^{n\times r}$, $AG=QR$, return $\hat A=QQ^\top A$ — a rank-$r$ approximation in $O(mnr)$. It is near-optimal: $\mathbb E\ \vert A-\hat A\ \vert _ F\le\sqrt{1+\tfrac{\hat r}{r-\hat r-1}}\,\ \vert A-A _ {\hat r}\ \vert _ F$ for $\hat r<r-1$, the $\hat r<r$ “oversampling” being the artefact of square-Gaussian ill-conditioning. The proof writes the error as $(I-QQ^\top)(A-A _ {\hat r})(I-GM^\top)$ with $M=(V _ 1^\top G)^\dagger V _ 1^\top$ and bounds each factor by MP.
- Purpose / algorithm / cost∆hmt-purpose, ∆hmt-algorithm, ∆bite-hmt-cost, ∆hmt-srft-alternative — $\hat A=QQ^\top A$ in $O(mnr)$; SRFT alternative.
- Error bound / oversampling∆hmt-algorithm-error, ∆hmt-oversampling-parameter, ∆bite-hmt-prefactor-evaluation — near-optimal Frobenius bound; $r=\hat r+p$.
- Rough error (+ proof, strategy, key step)∆hmt-algorithm-rough-error, ∆hmt-algorithm-rough-error-proof, ∆bite-hmt-rough-error-strategy, ∆bite-hmt-rough-error-key-inequality-m-choice — three-factor decomposition + MP on each piece; the engineered $M$.
- Intuition / to SVD∆bite-hmt-why-range-AG-captures-dominant-subspace, ∆bite-hmt-to-svd-factorisation — $\mathrm{range}(AG)$ captures the top subspace; small SVD of $Q^\top A$.
- Generalised Nyström— no flashcard (non-examinable §15.4; $\hat A=AX(Y^\top AX)^\dagger Y^\top A$, $O(mn\log n+r^3)$; PS4 Q7 PSD form examinable, see ∆bite-cur-as-generalised-nystrom).
Source: Lecture 15, §15.1–15.2 (Algorithm 15.1 HMT, Theorem 15.1); §15.4 (generalised Nyström, non-examinable); Problem Sheet 4 Q4, Q6.
CUR approximation
$A\approx CUR$ with $C,R$ actual column/row submatrices and (the powerful choice) $U=A[I,J]^{-1}$ — computable without ever reading all of $A$, structure- and interpretability-preserving, and a special case of generalised Nyström with subsampling sketches. With $U=A[I,J]^{-1}$ the residual has the LU Schur-complement block structure. There exists a choice with $\ \vert A-CUR\ \vert _ F\le(r+1)\ \vert A-A _ r\ \vert _ F$ — dimension-free, no $\sqrt m,\sqrt n$; the course proves the weaker $\sqrt{(m-r)r+1}(\sqrt{(n-r)r+1}+1)$ bound (§16.1–16.2 non-examinable).
- Definition / error estimate∆cur, ∆cur-estimating-approximation-error, ∆bite-cur-no-need-to-read-whole-A — $A\approx CUR$; sketched $\ \vert A-CUR\ \vert _ F$ estimate.
- LU connection / as generalised Nyström∆cur-error-decomposition, ∆cur-lu-connection, ∆bite-cur-as-generalised-nystrom — Schur-complement residual; $X,Y$ subsampling.
- Error bounds (+ proof, strategy, key)∆cur-error-bound, ∆cur-error-bound-proof, ∆bite-cur-three-error-bounds, ∆bite-cur-bounds-dimension-free, ∆bite-cur-error-bound-strategy, ∆bite-cur-error-bound-key-inequality-max-vol — $(r+1)$-style bounds; column-only / standard / fast CUR.
- Sketched LS multi-RHS lemma (+ proof, strategy)∆sketched-least-squares-multiple-rhs, ∆sketched-least-squares-multiple-rhs-proof, ∆bite-sketched-ls-multiple-rhs-strategy — $\ \vert A\hat X-B\ \vert _ F/\ \vert AX-B\ \vert _ F\le\ \vert S\ \vert _ 2/\sigma _ {\min}(SQ)$.
- Max-volume submatrix lemma (+ proof, strategy)∆max-vol-orthonormal-submatrix, ∆max-vol-orthonormal-submatrix-proof, ∆bite-max-vol-orthonormal-strategy — $1/\sigma _ {\min}(SQ)\le\sqrt{(n-r)r+1}$ via determinant maximisation.
- Finding good $I,J$∆bite-cur-finding-good-subsets — many choices work; Osinsky $O(mn^2)$ deterministic.
Source: Lecture 16, §16 (CUR definition, LU connection); §16.1–16.2 (existence theorems and proofs, non-examinable).
Supporting results (Useful miscellany)
Auxiliary identities reused inside the proofs above.
- Neumann series ∆neumann-series — $(I-X)^{-1}=\sum X^k$ for $\ \vert X\ \vert <1$; used in the backward-stable solve bound.
- Projector norm identity ∆projector-norm-identity — $\ \vert I-P\ \vert _ 2=\ \vert P\ \vert _ 2$ for any nontrivial projector; used in HMT and CUR proofs.
- Block-triangular inverse ∆block-upper-triangular-inverse-identity — $\big[\begin{smallmatrix}I&X\\0&I\end{smallmatrix}\big]^{-1}=\big[\begin{smallmatrix}I&-X\\0&I\end{smallmatrix}\big]$; the $\mathrm{eig}(AB)=\mathrm{eig}(BA)$ similarity (PS3 Q5).
- Rectangular-orthonormal clarification ∆rectangular-orthonormal-matrix-clarification — $Q^\top Q=I$ but $QQ^\top\ne I$ for tall $Q$.
- Diagonalisability conditions ∆diagonalisability-equivalent-conditions — minimal-polynomial / Jordan-block criteria; field caveat.
- Trace / transpose / one-sided inverse (bite) ∆bite-trace-cyclic-and-frobenius, ∆bite-transpose-inverse-of-product, ∆bite-square-one-sided-inverse, ∆all-ones-matrix-eigenvalues — $\mathrm{Trace}(XY)=\mathrm{Trace}(YX)$; $\ \vert B\ \vert _ F^2=\mathrm{Trace}(B^\top B)$; rank-1 $\mathbf 1\mathbf 1^\top$.
- Jordan normal form ∆bite-na-no-finite-step-algorithm — JNF is the “ultimate” but numerically useless reduction; companion-matrix link (Jordan-block cards in Notes - Linear Algebra MT23, Jordan normal formU).
Source: Lecture 1 §1.7 / Lecture 7 §7.4 (Neumann series, projector identity); Problem Sheets 1, 3, 5.
Methods
LU / Cholesky direct solver
Factor once ($PA=LU$ at $\tfrac23 n^3$, or $A=R^\top R$ at $\tfrac13 n^3$ for $A\succ 0$), then solve $Ax=b$ by two triangular solves at $O(n^2)$; reuse the factorisation for extra right-hand sides. Default solver; backward stable up to the growth factor (LU) / unconditionally (Cholesky).
- LU solve∆pivoted-lu-decomposition-algorithm, ∆bite-lu-cost, ∆bite-lu-multiple-rhs — factorise then triangular-solve.
- Cholesky solve∆cholesky-factorisation, ∆bite-cholesky-cost, ∆bite-cholesky-backward-stable — SPD half-cost stable solver.
- Stability∆lu-backward-stable-solve-condition, ∆bite-stable-vs-unstable-algorithms-summary — stable iff $\ \vert L\ \vert \ \vert U\ \vert =O(\ \vert A\ \vert )$.
Source: Lectures 5, 7.
Householder / Givens QR and QR-based solvers
Orthogonal triangularisation produces $A=QR$ stably; then $Ax=b$ via $Rx=Q^\top b$ and least-squares via $\hat x=R^{-1}Q^\top b$. Givens variant targets sparse/Hessenberg structure. $2\times$ slower than LU but unconditionally backward stable.
- Householder QR∆qr-with-householder, ∆householder-qr-backward-stable, ∆bite-householder-qr-cost — the workhorse stable factorisation.
- Givens QR∆bite-na-givens-hessenberg-qr-cost — $O(n^2)$ Hessenberg QR (the QR-algorithm inner step).
- QR linear solve / least-squares∆qr-vs-lu-for-linear-systems, ∆qr-least-squares-backward-stability, ∆bite-least-squares-qr-solution — $\hat x=R^{-1}Q^\top b$.
Source: Lecture 6.
Power method / inverse iteration / Rayleigh quotient iteration
Single-eigenpair iterations: power method for the dominant pair, shift-and-invert for the pair nearest $s$, Rayleigh quotient iteration for adaptive (cubic, symmetric) speed. Building block for the QR algorithm and Krylov eigensolvers.
- Power / inverse / RQI∆bite-basic-power-method, ∆shifted-inverse-power-method-algorithm, ∆rayleigh-quotient-iteration — the three iterations and their rates.
Source: Lecture 8.
QR algorithm and Golub–Kahan SVD (non-examinable in MT25)
The universal dense eigensolver: Hessenberg/tridiagonal reduction, then shifted QR with deflation, $\approx 25 n^3$ (symmetric $\tfrac43 n^3$ for eigenvalues). Golub–Kahan adapts it to the SVD via bidiagonalisation.
- QR algorithm∆qr-algorithm-basic-description, ∆qr-algorithm-three-practical-modifications — factor–swap with shifts + deflation (carded in Notes - Numerical Analysis HT24, QR algorithmU).
- Golub–Kahan / QZ∆bite-svd-cost-via-golub-kahan, ∆bite-qz-algorithm-for-generalised-eig — bidiagonalisation; generalised-eigenproblem QZ.
Source: Lectures 9–10 (non-examinable 2025-26).
Arnoldi / GMRES (general $A$)
Build an orthonormal Krylov basis (Arnoldi), then minimise the residual over it (GMRES) by a small Hessenberg least-squares; preconditioning and restarting control cost. The general-matrix iterative solver.
- Arnoldi∆arnoldi-iteration, ∆happy-breakdown — orthonormal Krylov basis + Hessenberg compression.
- GMRES∆gmres-algorithm, ∆gmres-convergence, ∆gmres-preconditioner-desiderata, ∆gmres-random-restarts — residual minimisation; polynomial convergence; preconditioning/restart.
Source: Lectures 11–12.
Lanczos / CG (symmetric $A$)
Symmetric specialisation: Lanczos’s three-term recurrence ($O(n)$ memory), then CG for $A\succ 0$ linear systems ($A$-norm minimisation, Chebyshev rate) or Rayleigh–Ritz for extremal eigenpairs (Lanczos algorithm). MINRES handles the indefinite case (non-examinable).
- Lanczos iteration∆lanczos-iteration, ∆lanczos-three-term-recurrence — symmetric Arnoldi, $O(n)$ memory.
- CG∆conjugate-gradient-algorithm, ∆conjugate-gradient-convergence, ∆preconditioning-for-conjugate-gradient — SPD Krylov solver, $\sqrt\kappa$ rate.
- Lanczos eigenvalue algorithm∆lanczos-algorithm, ∆bite-lanczos-extremal-first — iteration + Rayleigh–Ritz, extremal first.
Source: Lecture 13.
Randomised methods
Sketching for the over-sized regime: sketch-and-solve / Blendenpik for least-squares, HMT (and generalised Nyström) for low-rank approximation, CUR for interpretable/sub-matrix low-rank. All rely on orthogonal invariance + Marchenko–Pastur.
- Least-squares∆least-squares-sketch-and-solve, ∆bite-blendenpik-one-sentence — sketch-and-solve; sketch-to-precondition.
- Low-rank∆hmt-algorithm, ∆cur, ∆bite-cur-as-generalised-nystrom — HMT randomised SVD; CUR; generalised Nyström.
Source: Lectures 14–16.