NLA MT25, Arnoldi decomposition and iteration
Flashcards
@State (and @visualise) the Arnoldi decomposition for $A \in \mathbb R^{n \times n}$ and a vector $b \in \mathbb R^n$.
The Arnoldi iteration produces matrices $Q _ k = [q _ 1 \mid \cdots \mid q _ k] \in \mathbb R^{n \times k}$ and $Q _ {k+1} = [Q _ k \mid q _ {k+1}] \in \mathbb R^{n \times (k+1)}$ with orthonormal columns spanning $\mathcal K _ k(A, b) = \text{span}(b, Ab, \ldots, A^{k-1}b)$ and $\mathcal K _ {k+1}(A, b)$ respectively, plus an upper Hessenberg matrix $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ satisfying
\[AQ _ k = Q _ {k+1} \tilde H _ k\]where
\[\tilde H _ k = \begin{bmatrix} h _ {1,1} & h _ {1,2} & \cdots & h _ {1,k} \\ h _ {2,1} & h _ {2,2} & \cdots & h _ {2,k} \\ & \ddots & \ddots & \vdots \\ & & h _ {k,k-1} & h _ {k,k} \\ & & & h _ {k+1,k} \end{bmatrix}\]and $Q _ {k+1}^\top Q _ {k+1} = I _ {k+1}$.
Schematically:
\[\underbrace{\Big[\;A\;\Big]} _ {n \times n} \underbrace{\Big[\;Q _ k\;\Big]} _ {n \times k} \;=\; \underbrace{\Big[\;Q _ {k+1}\;\Big]} _ {n \times (k+1)} \underbrace{\Big[\;\tilde H _ k\;\Big]} _ {(k+1) \times k}.\]Equivalent “split” form, separating the in-basis part from the new direction $q _ {k+1}$:
\[A Q _ k = Q _ k H _ k + h _ {k+1, k} \, q _ {k+1} e _ k^\top,\]where $H _ k \in \mathbb R^{k \times k}$ is the top $k \times k$ block of $\tilde H _ k$ (also upper Hessenberg, and equal to $Q _ k^\top A Q _ k$), and the rank-1 correction captures the one new direction added at step $k$.
What the Hessenberg factor represents: $H _ k = Q _ k^\top A Q _ k$ is the compression of $A$ onto the Krylov subspace — the orthogonal (Rayleigh-Ritz) projection of $A$ expressed in the basis $Q _ k$, with entries $h _ {ij} = q _ i^\top A q _ j$ (obtained for free during the modified Gram-Schmidt step). Its eigenvalues are the Ritz values, the Arnoldi approximations to $\text{eig}(A)$. It is upper Hessenberg precisely because Arnoldi orthogonalises $A q _ j$ only against $q _ 1, \ldots, q _ j$ (so $h _ {ij} = 0$ for $i > j + 1$); for symmetric $A$ it collapses to tridiagonal — the Lanczos case (∆lanczos-decomposition). The extra subdiagonal row $h _ {k+1,k} e _ k^\top$ in $\tilde H _ k$ measures how far $A$ pushes $\mathcal K _ k$ into the new direction $q _ {k+1}$: it vanishes exactly at a ∆happy-breakdown, and is what lets GMRES reduce residual minimisation to a small Hessenberg least-squares (∆bite-arnoldi-Hk-role).
@Describe the Arnoldi iteration. Suppose:
- $A \in \mathbb R^{n \times n}$
- $b \in \mathbb R^n$
State the @algorithm, the meaning of each step, the outputs, and the time and space.
The Arnoldi iteration constructs an orthonormal basis of the Krylov subspace $\mathcal K _ r(A, b) = \text{span}(b, Ab, \ldots, A^{r-1} b)$ together with an upper Hessenberg matrix $H _ r = Q _ r^\top A Q _ r$, by multiplying $A$ once at a time and orthogonalising each new vector against all previous basis vectors via modified Gram-Schmidt.
Algorithm:
- Set $q _ 1 = b/\|b\| _ 2$.
- For $k = 1, 2, \ldots, r$:
- Set $v = Aq _ k$ (apply $A$ to the latest basis vector).
- For $j = 1, 2, \ldots, k$ (modified Gram-Schmidt against all previous $q _ j$):
- $h _ {j, k} =q _ j^\top v$ (the $(j,k)$ entry of $H _ r$).
- $v = v - h _ {j,k} q _ j$ (orthogonalise against $q _ j$).
- $h _ {k+1, k} = \|v\| _ 2$ (subdiagonal entry).
- If $h _ {k+1, k} = 0$, this is a ∆happy-breakdown and the Krylov subspace is $A$-invariant, and we should stop. Otherwise,
- $q _ {k+1} = v/h _ {k+1, k}$.
Outputs, after $r$ loop iterations:
- $Q _ {r+1} = [q _ 1 \mid \cdots \mid q _ {r+1}] \in \mathbb R^{n \times (r+1)}$, orthonormal columns spanning $\mathcal K _ {r+1}(A, b)$
- $\tilde H _ r \in \mathbb R^{(r+1) \times r}$ upper Hessenberg.
Together these outputs satisfy the ∆arnoldi-decomposition:
\[AQ _ r = Q _ {r+1} \tilde H _ r,\]where $Q _ r = [q _ 1 \mid \cdots \mid q _ r]$ is the first $r$ columns of $Q _ {r+1}$. The top $r \times r$ block of $\tilde H _ r$ is $H _ r = Q _ r^\top A Q _ r$. The entries $h _ {j, k}$ are computed as a side effect of the Gram-Schmidt step, where we see $h _ {j,k} = q _ j^\top A q _ k$ comes from the inner product and $h _ {k+1, k} = \|v\| _ 2$ from the normalisation.
Time:
- $r$ matrix-vector products with $A$: $O(rn^2)$
- $O(r^2)$ inner products of length $n$ for the orthogonalisation
- Total: $O(nr^2)$
Space:
- $O(nr)$ needed to store $Q _ {r+1}$, since each new $Aq _ k$ must be orthogonalised against all previous $q _ j$.
- (Compare to ∆lanczos-iteration, which only requires $O(n)$ memory due to ∆lanczos-three-term-recurrence).
What is the purpose of the Arnoldi iteration?
Given a matrix $A \in \mathbb R^{n \times n}$ and initial vector $v$, find an orthonormal basis $[q _ 1, \ldots, q _ r]$ of its Krylov subspace $\mathcal K _ r (A, v) = \text{span}(v, Av, \ldots, A^{r-1} v)$.
The Arnoldi iteration finds this orthonormal subspace in a much more intelligent way than e.g. a naïve QR factorisation applied to $[v \mid Av \mid \cdots \mid A^{r-1} v]$, since forming this and then orthogonalising can be very poor for conditioning.
Arnoldi also gives you the useful Hessenberg projection as it goes,
\[A Q _ k = Q _ k H _ k + q _ {k+1} \, h _ {k+1,k} \, e _ k^\top, \quad H _ k = Q _ k^\top A Q _ k \text{ is upper Hessenberg}.\]Suppose we have a matrix $A \in \mathbb R^{n \times n}$ and an initial vector $v$. The Arnoldi iteration finds an orthonormal basis $[q _ 1, \ldots, q _ r]$ of its Krylov subspace $\mathcal K _ r(A, v)$. @State the full outputs of this algorithm and what each one represents.
After $r$ steps, Arnoldi outputs:
- $Q _ r = [q _ 1 \mid \cdots \mid q _ r] \in \mathbb R^{n \times r}$. This is an orthonormal basis matrix. Its columns are an orthonormal basis of the Krylov subspace, $\text{span}(Q _ r) = \mathcal K _ r(A, v) = \text{span}(v, Av, \ldots, A^{r-1} v)$, so $Q _ r^\top Q _ r = I _ r$. (Note $Q _ r$ is tall and skinny when $r < n$, so $Q _ r Q _ r^\top \ne I _ n$ — it’s the orthogonal projector onto $\mathcal K _ r(A, v)$.)
- $H _ r \in \mathbb R^{r \times r}$ upper Hessenberg. This is the projection of $A$ onto the Krylov subspace, expressed in the basis $Q _ r$: $H _ r = Q _ r^\top A Q _ r$. The $(j, k)$ entries are $h _ {jk} = q _ j^\top A q _ k$, computed as a side effect of the Gram-Schmidt step in the algorithm.
- $q _ {r+1} \in \mathbb R^n$ and $h _ {r+1, r} \in \mathbb R$. The next Arnoldi vector and the subdiagonal entry. Together with $H _ r$ and $Q _ r$, these encode the Arnoldi decomposition: with $Q _ {r+1} = [Q _ r \mid q _ {r+1}]$ and $\tilde H _ r = \begin{bmatrix}H _ r \\ h _ {r+1, r} e _ r^\top\end{bmatrix} \in \mathbb R^{(r+1) \times r}$ ,
Suppose:
- $A \in \mathbb R^{n \times n}$
- $b \in \mathbb R^n$
The Arnoldi iteration for computing an orthonormal basis $[q _ 1, \ldots, q _ {r}]$ of its Krylov subspace is given by:
- Let $q _ 1 = b / \vert \vert b \vert \vert _ 2$
- For $k = 1, 2, \ldots, r-1$:
- Set $v = Aq _ k$
- For $j = 1, 2, \ldots, k$
- $h _ {jk} = q _ j^\top v$
- $v = v - h _ {jk} q _ j$
- $h _ {k+1, k} = \vert \vert v \vert \vert _ 2$
- $q _ {k+1} = v/h _ {k+1, k}$
and outputs
- $Q _ r = [q _ 1 \mid \cdots \mid q _ r]$ s.t. $\text{span}(q _ 1, \ldots, q _ r) = \mathcal K _ r(A, v)$
- $H _ r$ upper Hessenberg
- $H _ r = Q _ r^\top A Q _ r$
Suppose that $h _ {k+1, k} \ne 0$ for $k = 1, \ldots, \ell$.
@Prove that then for $k = 1, \ldots, \ell$, we have
\[\text{span}(q _ 1, \ldots, q _ k) = \mathcal K _ k(A, b)\]
- Set $v = Aq _ k$
- For $j = 1, 2, \ldots, k$
- $h _ {jk} = q _ j^\top v$
- $v = v - h _ {jk} q _ j$
- $h _ {k+1, k} = \vert \vert v \vert \vert _ 2$
- $q _ {k+1} = v/h _ {k+1, k}$
Induct on $\ell$.
Base case $\ell = 1$: Then $q _ 1 = q _ \ell = p _ {\ell-1}(A)b$ where $p _ {\ell-1}$ is a polynomial of degree $\ell - 1 = 0$, so it is a constant.
Inductive step: Suppose that the result holds for $k = 1, \ldots, \ell$. Then for $\ell + 1$, we obtain
\[q _ {\ell + 1} = \frac{1}{h _ {\ell + 1, \ell}}(Aq _ \ell - \sum^\ell _ {j=1} h _ {j, \ell} q _ j)\]which has exact degree $\ell$, and hence lies in the order-$\ell+1$ Krylov subspace.
Suppose:
- $A \in \mathbb R^{n \times n}$
- $b \in \mathbb R^n$
The Arnoldi iteration for computing an orthonormal basis $[q _ 1, \ldots, q _ {r}]$ of its Krylov subspace is given by:
- Let $q _ 1 = b / \vert \vert b \vert \vert _ 2$
- For $k = 1, 2, \ldots, r-1$:
- Set $v = Aq _ k$
- For $j = 1, 2, \ldots, k$
- $h _ {jk} = q _ j^\top v$
- $v = v - h _ {jk} q _ j$
- $h _ {k+1, k} = \vert \vert v \vert \vert _ 2$
- $q _ {k+1} = v/h _ {k+1, k}$
and outputs
- $Q _ r = [q _ 1 \mid \cdots \mid q _ r]$ s.t. $\text{span}(q _ 1, \ldots, q _ r) = \mathcal K _ r(A, v)$
- $H _ r$ upper Hessenberg
- $H _ r = Q _ r^\top A Q _ r$
What is meant by a “happy breakdown” in this algorithm, and why is it happy?
- Set $v = Aq _ k$
- For $j = 1, 2, \ldots, k$
- $h _ {jk} = q _ j^\top v$
- $v = v - h _ {jk} q _ j$
- $h _ {k+1, k} = \vert \vert v \vert \vert _ 2$
- $q _ {k+1} = v/h _ {k+1, k}$
A happy breakdown occurs at step $k$ when
\[h _ {k+1, k} = \|v\| _ 2 = 0\]i.e. when the residual $v = Aq _ k - \sum^k _ {j=1} h _ {jk} q _ j$ vanishes after Gram-Schmidt orthogonalisation. The algorithm cannot proceed because the next vector $q _ {k+1} = v / h _ {k+1, k}$ requires dividing by zero.
Why this happens: $h _ {k+1, k} = 0$ means $Aq _ k \in \text{span}(q _ 1, \ldots, q _ k) = \mathcal K _ k(A, b)$. Since the Krylov subspace is closed under $A$ at this step, it must be $A$-invariant.
\[A \mathcal K _ k(A, b) \subseteq \mathcal K _ k(A,b)\]Why this is happy: Invariance is good news for downstream Krylov methods!
- For linear systems $Ax = b$ via GMRES, this implies the exact solution $x^\ast = A^{-1} b$ lies in $\mathcal K _ k(A, b)$ (since $A$-invariance plus $b \in \mathcal K _ k$ implies $A^{-1} b \in \mathcal K _ k$ when $A$ is nonsingular). Hence GMRES at step $k$ returns the exact solution.
- For eigenvalue problems $Ax = \lambda x$ via Lanczos, $\mathcal K _ k(A, b)$ being $A$-invariant means the eigenvalues of $H _ k$ are exact eigenvalues of $A$.
What is the cost of the Arnoldi iteration for finding an orthogonal basis for $\mathcal K _ k({A, b})$?
Bite-sized
The naive way to find an orthonormal basis for $\mathcal K _ k(A, b)$ is to form $K = [b, Ab, \ldots, A^{k-1} b]$ and QR it. This is terrible because $K$ is dominated by the leading eigenvector (as in the power method), so $K$ is extremely ill-conditioned. Arnoldi avoids this by orthogonalising each $Aq _ i$ as soon as it is computed.
What is $H _ k := Q _ k^\top A Q _ k$, the upper-left block of $\tilde H _ k$ from Arnoldi, used for?
$H _ k$ is the upper Hessenberg projection of $A$ onto the Krylov subspace expressed in the orthonormal basis $Q _ k$. Its eigenvalues are the Ritz values, which approximate eigenvalues of $A$ (used in Arnoldi-based eigenvalue algorithms); its structure also lets GMRES/MINRES reduce the Krylov-subspace least-squares to a small Hessenberg least-squares.
Each Arnoldi step costs one matrix-vector product $A q _ k$ plus $O(k)$ inner products and axpys for orthogonalising against $q _ 1, \ldots, q _ k$. Over $r$ steps this totals $r$ matvecs plus $O(n r^2)$ orthogonalisation work.
Why does Arnoldi need to store all previous $q _ j$, while Lanczos only needs the last two?
Because in general $H _ k$ is upper Hessenberg with all $k$ entries possibly nonzero in column $k$, so orthogonalising the new $Aq _ k$ requires inner products against every previous $q _ j$. When $A$ is symmetric, $H _ k$ becomes tridiagonal, so only $q _ k$ and $q _ {k-1}$ contribute — the three-term Lanczos recurrence — and earlier $q _ j$ can be discarded.
Strategy for proving the Arnoldi vectors span the Krylov subspace (∆arnoldi-spans-krylov-proof).
- Induct on $k$. Base case: $q _ 1 = b / \|b\| _ 2$ is a degree-$0$ polynomial in $A$ applied to $b$, so spans $\mathcal K _ 1$.
- Inductive step: rearrange the algorithm’s normalisation step to express $q _ {k+1}$ as $\tfrac{1}{h _ {k+1,k}}(Aq _ k - \sum _ {j \le k} h _ {jk} q _ j)$. By the inductive hypothesis each $q _ j$ is a polynomial of degree $\le j - 1$, so $q _ {k+1}$ is a polynomial of degree exactly $k$ in $A$ applied to $b$.
- A degree-$k$ polynomial in $A$ acting on $b$ lies in $\mathcal K _ {k+1}$, completing the induction.