NLA MT25, Krylov subspace methods


Flashcards

Suppose we are solving linear systems $Ax = b$ or eigenvalue problems $Ax = \lambda x$. What is the general idea of Krylov subspace methods?

Look for approximate solutions $\hat x \in \mathcal K _ {r}(A, v)$, where

\[\mathcal K _ r(A, v) = \text{span}(v, Av, \ldots, A^{r-1} v)\]

where $v \in \mathbb R^n$, or equivalently $\hat x = p _ {r-1}(A)v$, where $p \in \mathbb C _ {r-1}[t]$ is a degree $r-1$ polynomial.

Why is this useful?

  • Only matrix-vector products are needed to form this basis,
  • The polynomial in $A^i$ is chosen optimally for the problem (compare to the power method, which fixes $p(z) = z^{r-1}$).
  • Often only $O(1)$ iterations suffice when the spectrum is well-behaved.

@Define the order-$r$ Krylov subspace $\mathcal K _ r(A, v)$.

\[\mathcal K _ r(A, v) = \text{span}(v, Av, \ldots, A^{r-1} v)\]

@exam~

Orthonormal bases of Krylov subspaces

This is worth considering because orthonormal bases of Krylov subspaces are very useful in iterative algorithms, such as the Notes - NLA MT25, GMRES algorithmU. A better approach than using a generic approach (e.g. QR decomposition on a matrix of columns) applied to $\{v, Av, \ldots, A^{r-1}v\}$ is given in Notes - NLA MT25, Arnoldi decomposition and iterationU.

Why is it useful to have an orthonormal basis of the Krylov subspace $\mathcal K _ r(A, v)$, rather than any basis (e.g. $[v, Av, \ldots, A^{r-1}v]$ itself)?

Many Krylov methods (e.g. ∆gmres-algorithm, ∆conjugate-gradient-algorithm, Arnoldi-eigenvalue) seek approximate solutions in the form $\hat x = Q _ r y \in \mathcal K _ r(A, v)$. For this to be efficient and stable, we want $Q _ r$ to be orthonormal.

Consider the order-$r$ Krylov subspace $\mathcal K _ r(A, v)$.

Orthonormal bases of these subspaces turn out to be very useful in iterative algorithms. What’s a naïve way of computing an orthonormal basis of $\mathcal K _ r(A, v)$, and why is this naïve?

Form the matrix

\[K = \Bigg[ v \, \Bigg \vert \, Av \, \Bigg \vert \, \cdots \, \Bigg \vert \, A^{r-1} v \Bigg]\]

and orthogonalise, e.g. by finding the QR decomposition $K = QR$. This is naïve, because $K$ is typically very poorly conditioned (its columns increasingly align with dominant eigenvector of $A$), and so the computed $\hat Q$ loses orthogonality in floating-point arithmetic.

Bite-sized

The power method is a special case of a Krylov method with $\hat x = A^{k-1} v = p _ {k-1}(A) v$ where $p _ {k-1}(z) = $ $z^{k-1}$. Krylov methods improve on this by choosing the polynomial $p _ {k-1}$ optimally for the problem (residual minimisation, $A$-norm error minimisation, etc.), not just as a monomial.

Source: Lecture 11, Basic idea of Krylov: polynomial approximation slide and §11.1 of the lecture notes.

@bite~

Every vector in $\mathcal K _ k(A, v)$ can be written as $\hat x = $ $p _ {k-1}(A) v$ for some polynomial $p _ {k-1}$ of degree at most $k - 1$. This polynomial viewpoint is what makes the convergence analysis of Krylov methods tractable (everything reduces to polynomial approximation on the spectrum of $A$).

Source: Lecture 11, Basic idea of Krylov: polynomial approximation slide and §11.1 of the lecture notes; Problem Sheet 3, Q2(a).

@bite~

The Krylov subspace is shift-invariant: $\mathcal K _ k(A - sI, v) = $ $\mathcal K _ k(A, v)$. This is why shift-and-invert Arnoldi changes only the matrix being inverted, not the subspace structure.

Source: Lecture 13, §13.7 of the lecture notes (“Arnoldi for nonsymmetric eigenvalue problems”, non-examinable but the shift-invariance is a clean fact).

@bite~

When does a Krylov subspace method converge in $O(1)$ iterations independent of the matrix size $n$?

When the spectrum of $A$ is “well-behaved” for polynomial approximation: either (a) eigenvalues clustered away from 0, so a polynomial $p(z) = (1 - z/c)^k$ with $ \vert c \vert > r$ shrinks geometrically on $\lambda(A)$; or (b) only a small number $k \ll n$ of distinct eigenvalues, so $p(z) = \prod _ i (z - \lambda _ i)/(0 - \lambda _ i)$ vanishes at every eigenvalue and convergence terminates in $\le k$ steps. In both cases the rate doesn’t depend on $n$ — the practical appeal of Krylov methods on large sparse problems.

Source: Lecture 12, When does GMRES converge fast? slide and §12.2 of the lecture notes.

@bite~