NLA MT25, Overview of results and methods


This is a map of every main result and method in the course. This course studies two problems only: the linear system $Ax = b$ and the eigenproblem $Ax = \lambda x$, and the whole course is the story of decomposing $A$ so that these become trivial, analysing when the decompositions are computable in a backward-stable way, and replacing them by iterative or randomised surrogates when $n$ is too large for the $O(n^3)$ direct methods.

Foundations: norms, subspaces, structure

Vector and matrix norms

Norms are the measuring stick for “how big is $A - \hat A$”. The induced $p$-norm $\ \vert A\ \vert _ p = \max _ {x\ne 0}\ \vert Ax\ \vert _ p/\ \vert x\ \vert _ p$ has the closed forms $\ \vert A\ \vert _ 2 = \sigma _ {\max}(A)$, $\ \vert A\ \vert _ 1$ = max absolute column sum, $\ \vert A\ \vert _ \infty$ = max absolute row sum; the Frobenius and trace (nuclear) norms are not induced but are, like $\ \vert \cdot\ \vert _ 2$, unitarily invariant. Submultiplicativity $\ \vert AB\ \vert _ p \le \ \vert A\ \vert _ p\ \vert B\ \vert _ p$ is the workhorse inequality behind every perturbation bound in the course.

Source: Lecture 1, §1.5–1.6 (vector and matrix norms); Problem Sheets 1 Q2–Q4.

Subspaces and the dimension-count lemma

A $d$-dimensional subspace is represented by a (preferably orthonormal) $n\times d$ matrix via its column span. Lemma 1.1 — if $\dim\mathcal S _ 1+\dim\mathcal S _ 2>n$ then $\mathcal S _ 1\cap\mathcal S _ 2\ne\{0\}$ — is the single combinatorial fact that powers the proofs of Courant–Fischer, Eckart–Young and Weyl.

Source: Lecture 1, §1.7 (subspaces and orthonormal matrices), Lemma 1.1.

Structured matrices and normality

Symmetric, orthogonal, skew-symmetric and (the umbrella) normal matrices are exactly the ones that are unitarily diagonalisable. The two structural lemmas — an upper-triangular normal matrix is diagonal, and a real symmetric matrix is orthogonally diagonalisable with real eigenvalues — feed directly into the Schur-form theory.

Source: Lecture 1, §1.2 (structured matrices); §1.4 (spectral theorem for real symmetric matrices).

The singular value decomposition

SVD: existence and structure

Every $A\in\mathbb R^{m\times n}$ ($m\ge n$) factors as $A=U\Sigma V^\top$ with $U$ orthonormal columns, $V$ orthogonal and $\Sigma=\operatorname{diag}(\sigma _ 1\ge\cdots\ge\sigma _ n\ge 0)$, equivalently $A=\sum _ i\sigma _ i u _ i v _ i^\top$. Geometrically it is rotate–scale–rotate: the image of the unit sphere is an ellipsoid with semi-axes $\sigma _ i$. The singular values are unique (square roots of the eigenvalues of $A^\top A$); the singular vectors are unique only up to coupled sign flips (and rotation within repeated-$\sigma$ subspaces). It is the central object of the course.

Source: Lecture 2, §2 (Theorem 2.1, SVD existence); §2.3 (uniqueness).

SVD properties: rank and the four fundamental subspaces

From $A=U\Sigma V^\top$ everything is read off: $\mathrm{rank}(A)$ = number of positive $\sigma _ i$; column space = first $r$ columns of $U$; row space = first $r$ of $V$; $\ker A$ = last $n-r$ of $V$; $\ker A^\top$ = last $n-r$ of $U$. The proof is the change of variables $y=V^\top x$ then read off $\Sigma$.

Source: Lecture 2, §2.1 (rank, column/row/null space); Lecture 3, §3.1 (image compression).

SVD and eigenvalues; the Jordan–Wielandt matrix

For normal $A$ the singular values are the eigenvalue moduli, $\sigma _ i= \vert \lambda _ i \vert $ (special cases: symmetric, unitary, skew-symmetric). The Jordan–Wielandt (Hermitian-dilation) matrix $\big[\begin{smallmatrix}0&A\\A^\top&0\end{smallmatrix}\big]$ has eigenvalues $\pm\sigma _ i$ (plus $m-n$ zeros) and is the standard tool for porting a symmetric-eigenvalue result to an SVD analogue.

Source: Lecture 2, §2.2 (SVD and symmetric eigenvalue decomposition); Problem Sheet 1 Q6.

Symmetric eigenvalue decomposition

Every real symmetric $A$ has $A=V\Lambda V^\top$ with $V$ real orthogonal and $\Lambda$ real diagonal: the eigenvalues are real and the eigenvectors of distinct eigenvalues are orthogonal. It is the SVD’s twin and the engine of the SVD existence proof; the general normal case gives $A=U\Lambda U^\ast$ with $U$ unitary.

Source: Lecture 2, §2 (symmetric eigenvalue decomposition); Lecture 1 §1.4.

Low-rank approximation: the truncated SVD (Eckart–Young)

The rank-$r$ truncation $A _ r=\sum _ {i\le r}\sigma _ i u _ i v _ i^\top$ is the best rank-$r$ approximation in any unitarily invariant norm, with error $\ \vert A-A _ r\ \vert _ 2=\sigma _ {r+1}$ and $\ \vert A-A _ r\ \vert _ F=\sqrt{\sum _ {i>r}\sigma _ i^2}$. This is the single most consequential SVD application: PCA, image compression, matrix completion and the entire randomised-SVD chapter are corollaries. NLA MT25 leaves this uncarded itself; the statement and proof live in the Part A entry.

Source: Lecture 3, §3 (Theorem 3.1, low-rank approximation via the truncated SVD).

Pseudoinverse

For rank-$r$ $A=U _ r\Sigma _ r V _ r^\top$ the pseudoinverse is $A^\dagger=V _ r\Sigma _ r^{-1}U _ r^\top$; it is the unique matrix satisfying the four Moore–Penrose conditions, equals $A^{-1}$ when nonsingular, gives the minimum-norm least-squares solution $x=A^\dagger b$, and has $\ \vert A^\dagger\ \vert _ 2=1/\sigma _ {\min}^{\mathrm{nz}}(A)$. It is the bridge between the SVD and least-squares, and the central object of the HMT and CUR analyses.

Source: Lecture 15, §15.1 (pseudoinverse, introduced for the HMT analysis).

Variational and perturbation theory

Courant–Fischer minmax theorem

For symmetric $A$, $\lambda _ i(A)=\max _ {\dim\mathcal S=i}\min _ {x\in\mathcal S}\frac{x^\top Ax}{x^\top x}=\min _ {\dim\mathcal S=n-i+1}\max _ {x\in\mathcal S}\frac{x^\top Ax}{x^\top x}$, with the analogous statement for singular values. The two subspace dimensions sum to $n+1$, so the dimension-count lemma forces an intersection — that is the entire proof idea. It is the source of Weyl’s inequality, Cauchy interlacing and the Rayleigh–Ritz bound.

Source: Lecture 4, §4 (Theorem 4.1, Courant–Fischer minmax theorem).

Weyl's inequality

Perturbing $A$ by $E$ moves every singular value, and every eigenvalue of a symmetric matrix, by at most $\ \vert E\ \vert _ 2$ — additively, with no condition-number amplification: $ \vert \sigma _ i(A+E)-\sigma _ i(A) \vert \le\ \vert E\ \vert _ 2$. So these quantities are perfectly conditioned and a backward-stable algorithm computes them to working precision; contrast the $\epsilon^{1/n}$ catastrophe for non-symmetric eigenvalues.

Source: Lecture 4, §4.1 (Theorem 4.2, Weyl’s inequality); §4.1.1 (non-symmetric sensitivity).

Direct factorisations and linear systems

LU factorisation with pivoting

Gaussian elimination is the rank-1 peeling $A=\sum _ i L _ iU _ i$; partial pivoting gives $PA=LU$ with $ \vert L _ {ij} \vert \le 1$, exists for every nonsingular $A$, and costs $\tfrac23 n^3$ flops with $O(n^2)$ per subsequent right-hand side. It is backward stable up to the growth factor $\rho _ n=\ \vert U\ \vert /\ \vert A\ \vert $ — empirically $O(\sqrt n)$ but Wilkinson-$2^{n-1}$ in the worst case, the resolution of which is a famous open problem. It is the de-facto linear solver despite QR’s unconditional stability, because it is $2\times$ faster.

Source: Lecture 5, §5.1–5.2 (LU with partial pivoting); Lecture 7 §7.5.2 (instability/growth factor); Problem Sheets 2 Q1, Q3.

Cholesky factorisation

For $A\succ 0$ the LU peeling becomes symmetric, $A=R^\top R$ with $R$ upper triangular and a positive diagonal, at half the cost ($\tfrac13 n^3$) and with no pivoting ever needed (the active trailing block stays SPD). It is unconditionally backward stable because there is no pivot growth and $ \vert R _ {ij} \vert \le\sqrt{\ \vert A\ \vert }$. The indefinite analogue is $LDL^\top$ with $1\times1$ and $2\times2$ blocks.

Source: Lecture 5, §5.3 (Cholesky for $A\succ 0$); Lecture 7 §7.5.3 (backward stability); Problem Sheet 2 Q4.

QR factorisation

$A=QR$ with orthonormal $Q$ and upper-triangular $R$, computed stably by orthogonal triangularisation (Householder reflectors, or Givens rotations for sparse/Hessenberg matrices) rather than the unstable triangular orthogonalisation of Gram–Schmidt. Householder QR is unconditionally backward stable ($\ \vert \hat Q\hat R-A\ \vert =O(\epsilon\ \vert A\ \vert )$, $\ \vert \hat Q^\top\hat Q-I\ \vert =O(\epsilon)$) at $\tfrac43 n^3$ flops because it is a product of norm-1 orthogonal operations.

Source: Lecture 6, §6.1–6.4 (Gram–Schmidt, Householder, Givens); Lecture 7 §7.7 (stability); Problem Sheet 2 Q2, Q4c.

Solving linear systems

$Ax=b$ is solved by LU + two backward-stable triangular solves (default), Cholesky for $A\succ 0$, QR for guaranteed stability at $2\times$ the cost, or the SVD ($\approx 10\times$ QR) when $A$ is near-singular. The forward error of any backward-stable solve is $\ \vert \hat x-x\ \vert /\ \vert x\ \vert \lesssim\epsilon\,\kappa _ 2(A)$ — the algorithm is exonerated, ill-conditioning is the problem’s fault.

Source: Lecture 5–6, §5–6 (LU/QR/SVD solvers); Lecture 7 Theorem 7.1.

Least-squares problems

For overdetermined full-rank $\min _ x\ \vert Ax-b\ \vert _ 2$ the QR route ($\hat x=R^{-1}Q^\top b$) is backward stable; the normal equations $A^\top Ax=A^\top b$ + Cholesky are fast but have forward error $O(u\kappa _ 2(A)^2)$ because forming $A^\top A$ squares the condition number (the Cholesky solve itself is still backward stable). Geometrically $\hat x$ makes the residual orthogonal to $\mathrm{range}(A)$ — Galerkin orthogonality / the normal equations.

Source: Lecture 6, §6.5–6.7 (least-squares via QR / normal equations); Problem Sheet 2 Q6.

Numerical stability and conditioning

Conditioning vs stability; the matrix condition number

Conditioning is a property of the problem ($\kappa$ = sensitivity of $f(x)$ to perturbations of $x$); backward stability is a property of the algorithm (computed $\hat y=f(x+\Delta x)$ with $\ \vert \Delta x\ \vert =O(\epsilon\ \vert x\ \vert )$). The folk theorem: backward-stable algorithm + well-conditioned problem $\Rightarrow$ accurate answer. The matrix condition number $\kappa _ 2(A)=\sigma _ {\max}/\sigma _ {\min}$ is exactly the amplification factor for $Ax=b$.

Source: Lecture 7, §7.1–7.4 (floating point, conditioning, stability).

Backward-stable linear-system error bound

If $(A+\Delta A)\hat x=b$ with $\ \vert \Delta A\ \vert \le\epsilon\ \vert A\ \vert $ and $\kappa _ 2(A)\ll\epsilon^{-1}$ then $\ \vert \hat x-x\ \vert /\ \vert x\ \vert \le\epsilon\kappa _ 2(A)+O(\epsilon^2)$. The proof is a Neumann-series expansion of $(A+\Delta A)^{-1}$; the hypothesis $\kappa _ 2(A)\ll\epsilon^{-1}$ is precisely what makes that series converge. This single theorem is why $\kappa _ 2$ is defined the way it is.

Source: Lecture 7, §7.4 (Theorem 7.1).

Stability of the building blocks

Vector–vector and matrix–vector products are backward stable; matrix–matrix multiplication is not (only a forward-error bound $\ \vert \mathrm{fl}(AB)-AB\ \vert \le\epsilon\ \vert A\ \vert \ \vert B\ \vert $, hence relative error $\le\epsilon\min(\kappa _ 2(A),\kappa _ 2(B))$). The redeeming exception is multiplication by an orthogonal matrix, which is backward stable since $\ \vert Q\ \vert _ 2=1$ — the reason every stable algorithm in the course is built from orthogonal transformations.

Source: Lecture 7, §7.5–7.7 (stability of triangular systems, LU, Cholesky, matrix multiplication, Householder QR); Problem Sheet 3 Q3.

Eigenvalue problems

The eigenvalue problem and the Schur decomposition

$Ax=\lambda x$ has no closed form — eigenvalues are roots of $\det(\lambda I-A)$ and by Abel–Galois no finite-step algorithm exists for $n\ge 5$ (companion-matrix argument), so all eigensolvers are iterative. The practical goal is the Schur form $A=UTU^\ast$ ($U$ unitary, $T$ upper triangular, $\mathrm{eig}(A)=\mathrm{diag}(T)$) because it is computable backward-stably via orthogonal similarities; $T$ is diagonal iff $A$ is normal, in which case Schur = the eigenvalue decomposition.

Source: Lecture 8, §8.1 (Theorem 8.1, Schur decomposition; Theorem 8.2, diagonal iff normal).

Power method, inverse iteration, Rayleigh quotient iteration

The power method $x\leftarrow Ax/\ \vert Ax\ \vert $ converges to the dominant eigenpair linearly at rate $ \vert \lambda _ 2/\lambda _ 1 \vert $. Shift-and-invert $(A-sI)^{-1}$ targets the eigenvalue closest to $s$ (one $LU$ up front at $O(n^3)$, then $O(n^2)$ per step); Rayleigh quotient iteration adapts $s _ k=x _ k^\top Ax _ k$ for quadratic (cubic if symmetric) convergence. This is the conceptual seed of the QR algorithm and Krylov methods.

Source: Lecture 8, §8.2 (power method); §8.2.2 (shifted inverse / Rayleigh quotient iteration).

The QR algorithm (non-examinable in MT25)

Factor–swap $A _ k=Q _ kR _ k$, $A _ {k+1}=R _ kQ _ k$ keeps $A _ k$ similar to $A$ and drives it to triangular (Schur) form; it is simultaneously the power method on the first column and inverse iteration on the last, so shifts ($s _ k=A _ {nn}$ or Wilkinson) plus a one-off Hessenberg/tridiagonal reduction and deflation give $O(n^3)$ ($\approx 25 n^3$) and $2$–$4$ iterations per eigenvalue. Non-examinable in 2025-26 (lecture notes §9–10) but the underlying mechanism is examinable conceptually.

Source: Lectures 9–10, §9–10 (QR algorithm, Hessenberg reduction, Golub–Kahan; non-examinable 2025-26).

Generalised eigenvalue problems and the QZ algorithm

$Ax=\lambda Bx$ generalises the standard problem; when $B\succ 0$ and $A$ symmetric a Cholesky $B=R^\top R$ reduces it to the symmetric standard problem $R^{-\top}AR^{-1}y=\lambda y$ (preserving symmetry, unlike $B^{-1}A$). The general case is solved by the QZ algorithm.

Source: Lecture 10, §10.3 (QZ algorithm; non-examinable); Problem Sheet 3 Q4 (examinable reduction).

Krylov subspace methods

Krylov subspaces and the polynomial viewpoint

Look for $\hat x=p _ {k-1}(A)v\in\mathcal K _ k(A,v)=\mathrm{span}(v,Av,\dots,A^{k-1}v)$, choosing the polynomial optimally for the problem (the power method is the special case $p(z)=z^{k-1}$). Convergence analysis reduces entirely to polynomial approximation on the spectrum; the naive basis $[v,Av,\dots]$ is hopelessly ill-conditioned, which is why Arnoldi/Lanczos build an orthonormal basis on the fly.

Source: Lecture 11, §11.1–11.2 (Krylov subspaces and the polynomial idea).

Arnoldi iteration and decomposition

Arnoldi builds an orthonormal Krylov basis by multiplying $A$ once and modified-Gram-Schmidt-orthogonalising, producing $AQ _ k=Q _ {k+1}\tilde H _ k$ with $H _ k=Q _ k^\top AQ _ k$ upper Hessenberg (the Rayleigh–Ritz compression of $A$). Cost $O(nk^2)$ and memory $O(nk)$ — it must keep all previous $q _ j$. A “happy breakdown” ($h _ {k+1,k}=0$) means the Krylov subspace is $A$-invariant: good news, the exact solution/eigenpairs are already inside.

Source: Lecture 11, §11.3 (Algorithm 11.1, Arnoldi iteration; Theorem 11.1).

GMRES

GMRES picks $x _ k=\arg\min _ {x\in\mathcal K _ k(A,b)}\ \vert Ax-b\ \vert _ 2$; via the Arnoldi decomposition this becomes the small Hessenberg least-squares problem $\min _ y\ \vert \tilde H _ ky-\ \vert b\ \vert _ 2 e _ 1\ \vert _ 2$, solved incrementally by Givens rotations with the residual norm read off for free. For diagonalisable $A=X\Lambda X^{-1}$, $\ \vert Ax _ k-b\ \vert _ 2\le\kappa _ 2(X)\min _ {p\in\mathcal P _ k,p(0)=1}\max _ {z\in\lambda(A)} \vert p(z) \vert \,\ \vert b\ \vert _ 2$: fast when eigenvalues are clustered away from $0$ or few-distinct, slow otherwise (hence preconditioning and restarting).

Source: Lecture 12, §12 (Theorem 12.1, GMRES convergence; preconditioning, restarting); Problem Sheets 3 Q7, 4 Q1.

Lanczos iteration and decomposition

For symmetric $A$ the Arnoldi Hessenberg collapses to a symmetric tridiagonal $T _ k$ (since $H _ k=Q _ k^\top AQ _ k$ is symmetric), giving the three-term recurrence $t _ {k+1,k}q _ {k+1}=(A-t _ {k,k}I)q _ k-t _ {k-1,k}q _ {k-1}$: cost $O(nk)$ and memory $O(n)$ instead of Arnoldi’s $O(nk^2)$/$O(nk)$. In floating point it suffers loss of orthogonality, fixed by reorthogonalisation.

Source: Lecture 13, §13.1 (Lanczos iteration and decomposition).

Conjugate gradient method

For $A\succ 0$, CG is the Krylov iterate $x _ k=\arg\min _ {x\in\mathcal K _ k(A,b)}\ \vert x-x _ \ast\ \vert _ A$, equivalently Galerkin orthogonality $Q _ k^\top(Ax _ k-b)=0$, equivalently the tridiagonal projection $T _ ky=Q _ k^\top b$. The practical recurrence keeps the search directions mutually $A$-conjugate (and the residuals orthogonal scaled Lanczos vectors) at $O(n)$ memory; the error contracts at the Chebyshev rate $2\big(\tfrac{\sqrt\kappa-1}{\sqrt\kappa+1}\big)^k$, slow only when $\kappa _ 2(A)$ is large (hence preconditioning).

Source: Lecture 13, §13.2–13.3 (Theorem 13.1 correctness, Theorem 13.2 convergence); Problem Sheets 4 Q2, Q3.

Chebyshev polynomials and the CG/GMRES rate

$T _ k(x)=\cos(k\arccos x)$ satisfies $ \vert T _ k \vert \le 1$ on $[-1,1]$ and grows fastest of all $\mathcal P _ k$ outside it; the shifted-scaled $p(x)=c _ kT _ k(\tfrac{2x-b-a}{b-a})$ with $p(0)=1$ minimises $\max _ {[a,b]} \vert p \vert $ and yields the $\sqrt\kappa$ in the CG bound (the $\sqrt{}$ coming from the change of variables $\tfrac12(z+z^{-1})=\tfrac{b+a}{b-a}$).

Source: Lecture 13, §13.3.1–13.3.2 (Chebyshev polynomials in the CG analysis).

Rayleigh–Ritz and the Lanczos eigenvalue algorithm

Rayleigh–Ritz extracts approximate eigenpairs from a subspace: eigendecompose $Q^\top AQ=V\hat\Lambda V^\top$, take Ritz values $\mathrm{diag}\hat\Lambda$ and Ritz vectors $QV$; Courant–Fischer gives the Cauchy-interlacing sandwich $\lambda _ {n-k+i}\le\hat\lambda _ i\le\lambda _ i$, exact iff $\mathrm{range}(Q)$ is $A$-invariant. The Lanczos algorithm = Lanczos iteration + Rayleigh–Ritz on the tridiagonal $T _ k$, converging fast to extremal eigenpairs (the Courant–Fischer sandwich beats the power method).

Source: Lecture 13, §13.6 (Algorithm 13.1 Rayleigh–Ritz, Algorithm 13.2 Lanczos algorithm).

Randomised numerical linear algebra

Gaussian random matrices

Two facts power all of randomised NLA: orthogonal invariance (if $G$ is Gaussian and $Q$ orthogonal independent of $G$, then $QG,GQ$ are Gaussian) and Marchenko–Pastur (rectangular random matrices are well-conditioned). Square Gaussians, by contrast, can be arbitrarily ill-conditioned ($\mathbb E[\sigma _ {\min}^{-2}]=\infty$) — the reason oversampling is needed.

Source: Lecture 14, §14.1.1 (orthogonal invariance of Gaussian matrices).

Marchenko–Pastur

The singular values of an $m\times n$ ($m\ge n$) iid mean-0 variance-1 matrix asymptotically fill $[\sqrt m-\sqrt n,\sqrt m+\sqrt n]$ with the MP density, so $\kappa _ 2\approx\frac{1+\sqrt{n/m}}{1-\sqrt{n/m}}=O(1)$ — “rectangular random matrices are well-conditioned”, with an $\exp(-ct^2)$ failure tail. Distribution-free (no Gaussianity needed for MP itself). This is the workhorse behind sketch-and-solve, Blendenpik and HMT.

Source: Lecture 14, §14.1.2 (Theorem 14.1, Marchenko–Pastur).

Sketch-and-solve and Blendenpik for least-squares

For tall $\min _ x\ \vert Ax-b\ \vert _ 2$, randomly mix the rows with a Gaussian sketch $G\in\mathbb R^{s\times m}$ ($s=cn$) and solve $\min _ x\ \vert G(Ax-b)\ \vert _ 2$: with high probability $\ \vert A\hat x-b\ \vert _ 2\le\frac{\sqrt s+\sqrt{n+1}}{\sqrt s-\sqrt{n+1}}\ \vert Ax _ \ast-b\ \vert _ 2$ ($\le 3\times$ at $s=4(n+1)$), via orthogonal invariance + MP. Blendenpik instead uses the sketch as a preconditioner — $GA=\hat Q\hat R$, then CG on $A\hat R^{-1}$, which MP makes $O(1)$-conditioned — for full-accuracy solutions in $O(mn\log m+n^3)$.

Source: Lecture 14, §14.2–14.5 (Theorem 14.2 sketch-and-solve, Theorem 14.3 Blendenpik); Problem Sheet 4 Q5.

Randomised SVD (HMT)

Halko–Martinsson–Tropp: draw Gaussian $G\in\mathbb R^{n\times r}$, $AG=QR$, return $\hat A=QQ^\top A$ — a rank-$r$ approximation in $O(mnr)$. It is near-optimal: $\mathbb E\ \vert A-\hat A\ \vert _ F\le\sqrt{1+\tfrac{\hat r}{r-\hat r-1}}\,\ \vert A-A _ {\hat r}\ \vert _ F$ for $\hat r<r-1$, the $\hat r<r$ “oversampling” being the artefact of square-Gaussian ill-conditioning. The proof writes the error as $(I-QQ^\top)(A-A _ {\hat r})(I-GM^\top)$ with $M=(V _ 1^\top G)^\dagger V _ 1^\top$ and bounds each factor by MP.

Source: Lecture 15, §15.1–15.2 (Algorithm 15.1 HMT, Theorem 15.1); §15.4 (generalised Nyström, non-examinable); Problem Sheet 4 Q4, Q6.

CUR approximation

$A\approx CUR$ with $C,R$ actual column/row submatrices and (the powerful choice) $U=A[I,J]^{-1}$ — computable without ever reading all of $A$, structure- and interpretability-preserving, and a special case of generalised Nyström with subsampling sketches. With $U=A[I,J]^{-1}$ the residual has the LU Schur-complement block structure. There exists a choice with $\ \vert A-CUR\ \vert _ F\le(r+1)\ \vert A-A _ r\ \vert _ F$ — dimension-free, no $\sqrt m,\sqrt n$; the course proves the weaker $\sqrt{(m-r)r+1}(\sqrt{(n-r)r+1}+1)$ bound (§16.1–16.2 non-examinable).

Source: Lecture 16, §16 (CUR definition, LU connection); §16.1–16.2 (existence theorems and proofs, non-examinable).

Supporting results (Useful miscellany)

Auxiliary identities reused inside the proofs above.

Source: Lecture 1 §1.7 / Lecture 7 §7.4 (Neumann series, projector identity); Problem Sheets 1, 3, 5.

Methods

LU / Cholesky direct solver

Factor once ($PA=LU$ at $\tfrac23 n^3$, or $A=R^\top R$ at $\tfrac13 n^3$ for $A\succ 0$), then solve $Ax=b$ by two triangular solves at $O(n^2)$; reuse the factorisation for extra right-hand sides. Default solver; backward stable up to the growth factor (LU) / unconditionally (Cholesky).

Source: Lectures 5, 7.

Householder / Givens QR and QR-based solvers

Orthogonal triangularisation produces $A=QR$ stably; then $Ax=b$ via $Rx=Q^\top b$ and least-squares via $\hat x=R^{-1}Q^\top b$. Givens variant targets sparse/Hessenberg structure. $2\times$ slower than LU but unconditionally backward stable.

Source: Lecture 6.

QR algorithm and Golub–Kahan SVD (non-examinable in MT25)

The universal dense eigensolver: Hessenberg/tridiagonal reduction, then shifted QR with deflation, $\approx 25 n^3$ (symmetric $\tfrac43 n^3$ for eigenvalues). Golub–Kahan adapts it to the SVD via bidiagonalisation.

Source: Lectures 9–10 (non-examinable 2025-26).

Arnoldi / GMRES (general $A$)

Build an orthonormal Krylov basis (Arnoldi), then minimise the residual over it (GMRES) by a small Hessenberg least-squares; preconditioning and restarting control cost. The general-matrix iterative solver.

Source: Lectures 11–12.

Lanczos / CG (symmetric $A$)

Symmetric specialisation: Lanczos’s three-term recurrence ($O(n)$ memory), then CG for $A\succ 0$ linear systems ($A$-norm minimisation, Chebyshev rate) or Rayleigh–Ritz for extremal eigenpairs (Lanczos algorithm). MINRES handles the indefinite case (non-examinable).

Source: Lecture 13.

Randomised methods

Sketching for the over-sized regime: sketch-and-solve / Blendenpik for least-squares, HMT (and generalised Nyström) for low-rank approximation, CUR for interpretable/sub-matrix low-rank. All rely on orthogonal invariance + Marchenko–Pastur.

Source: Lectures 14–16.