NLA MT25, GMRES algorithm
Flashcards
What is the main idea behind the GMRES algorithm for solving linear systems $Ax = b$?
Minimise the residual in the Krylov subspace:
\[x = \text{argmin} _ {x \in \mathcal K _ k(A, b)} \vert \vert Ax - b \vert \vert _ 2\]
@Describe the steps in the GMRES algorithm for approximately solving linear systems $Ax = b$, and annotate each step with a flop cost.
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
($O(nk^2)$ flops)
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]and return $\hat x = Q _ k y$.
($O(k^2)$ flops)
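The steps above can be sketched in NumPy. This is a minimal, unpreconditioned version with no Givens rotations and no breakdown handling, and `np.linalg.lstsq` stands in for the $O(k^2)$ QR solve of the Hessenberg problem:

```python
import numpy as np

def gmres_k(A, b, k):
    """Order-k GMRES: Arnoldi iteration + Hessenberg least squares."""
    n = len(b)
    Q = np.zeros((n, k + 1))            # columns: orthonormal basis of K_{k+1}(A, b)
    H = np.zeros((k + 1, k))            # \tilde H_k, upper Hessenberg
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    for j in range(k):                  # Arnoldi iteration: O(nk^2) flops overall
        v = A @ Q[:, j]
        for i in range(j + 1):          # orthogonalise against previous basis vectors
            H[i, j] = Q[:, i] @ v
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v) # (assumes no breakdown: H[j+1, j] > 0)
        Q[:, j + 1] = v / H[j + 1, j]
    # Solve min_y || \tilde H_k y - ||b||_2 e_1 ||_2 (lstsq here for brevity)
    e1 = np.zeros(k + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return Q[:, :k] @ y                 # x_k = Q_k y
```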
@Justify why this works.
We aim to minimise the residual in the Krylov subspace:
\[x = \text{argmin} _ {x \in \mathcal K _ k(A, b)} \vert \vert Ax - b \vert \vert _ 2\]Write $x = Q _ k y$. Then
\[\begin{aligned} \min _ y \vert \vert AQ _ k y - b \vert \vert _ 2 &= \min _ y \vert \vert Q _ {k+1} \tilde H _ k y - b \vert \vert _ 2 \\ &= \min _ y \left \vert \left \vert \begin{bmatrix} \tilde H _ k \\ 0 \end{bmatrix} y - \begin{bmatrix} Q _ {k+1}^\top \\ Q _ \perp^\top \end{bmatrix} b \right \vert \right \vert _ 2 \\ &= \min _ y \left \vert \left \vert \begin{bmatrix} \tilde H _ k \\ 0 \end{bmatrix} y - \vert \vert b \vert \vert _ 2 e _ 1 \right \vert \right \vert _ 2 \end{aligned}\]Here $Q _ \perp$ completes $Q _ {k+1}$ to an orthonormal basis of $\mathbb R^n$, and the last step uses $b = \vert \vert b \vert \vert _ 2 q _ 1$, so $Q _ {k+1}^\top b = \vert \vert b \vert \vert _ 2 e _ 1$ and $Q _ \perp^\top b = 0$. This is minimised when $ \vert \vert \tilde H _ k y - Q^\top _ {k+1}b \vert \vert _ 2$ is minimised. This is a Hessenberg least-squares problem and can be solved efficiently via QR.
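The key identity in this derivation, $\vert \vert AQ _ k y - b \vert \vert _ 2 = \vert \vert \tilde H _ k y - \vert \vert b \vert \vert _ 2 e _ 1 \vert \vert _ 2$, can be checked numerically on a small random example (the Arnoldi loop below is just the standard modified Gram-Schmidt version):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Arnoldi iteration (modified Gram-Schmidt): AQ_k = Q_{k+1} \tilde H_k
Q = np.zeros((n, k + 1))
H = np.zeros((k + 1, k))
Q[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    v = A @ Q[:, j]
    for i in range(j + 1):
        H[i, j] = Q[:, i] @ v
        v -= H[i, j] * Q[:, i]
    H[j + 1, j] = np.linalg.norm(v)
    Q[:, j + 1] = v / H[j + 1, j]

# For any y, ||A Q_k y - b|| should equal ||\tilde H_k y - ||b|| e_1||
y = rng.standard_normal(k)
e1 = np.zeros(k + 1)
e1[0] = np.linalg.norm(b)
r1 = np.linalg.norm(A @ Q[:, :k] @ y - b)
r2 = np.linalg.norm(H @ y - e1)    # agree up to roundoff
```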
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
@State a theorem describing the convergence of the GMRES algorithm.
Suppose:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
Then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
@Prove that if:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
Recall that $x _ k \in \mathcal K _ k(A, b)$ implies $x _ k = p _ {k-1}(A)b$ where $p _ {k-1}$ is a polynomial of degree at most $k-1$. Hence the GMRES solution is
\[\begin{aligned} \min _ {x _ k \in \mathcal K _ k(A, b)} \vert \vert Ax _ k - b \vert \vert _ 2 &= \min _ {p _ {k-1} \in \mathbb C _ {k-1}[t]} \vert \vert Ap _ {k-1}(A)b - b \vert \vert _ 2 \\ &= \min _ {\tilde p \in \mathbb C _ k[t], \tilde p(0) = 0} \vert \vert (\tilde p(A) - I)b \vert \vert _ 2 \\ &= \min _ {p \in \mathcal P _ k, p(0) = 1} \vert \vert p(A)b \vert \vert _ 2 \end{aligned}\]where we substitute $\tilde p(t) = t \, p _ {k-1}(t)$ and then $p = 1 - \tilde p$ (flipping the sign inside the norm changes nothing). As $A$ is diagonalisable, we have
\[\begin{aligned} \vert \vert p(A) \vert \vert _ 2 &= \vert \vert Xp(\Lambda)X^{-1} \vert \vert _ 2 \\ &\le \vert \vert X \vert \vert _ 2 \vert \vert X^{-1} \vert \vert _ 2 \vert \vert p(\Lambda) \vert \vert _ 2 \\ &= \kappa _ 2(X) \max _ {z \in \lambda (A)} \vert p(z) \vert \end{aligned}\]Hence overall
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
We have the result that if:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
Can you therefore describe the class of matrices $A$ for which the GMRES algorithm will work well?
In what two common situations might this occur?
GMRES works well for matrices whose eigenvalues $\lambda(A)$ admit a low-degree polynomial $p$ with $p(0) = 1$ such that $\vert p \vert$ is small at every eigenvalue of $A$.
- If the eigenvalues of the matrix are clustered away from $0$, then such a polynomial is relatively easy to find.
- If the matrix has $k \ll n$ distinct eigenvalues, then you can set $p(z) = C \prod^k _ {i = 1}(z - \lambda _ i)$ with $C = \prod^k _ {i = 1}(-\lambda _ i)^{-1}$ so that $p(0) = 1$; then $p$ vanishes on $\lambda(A)$ and GMRES converges exactly in $k$ steps.
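The second point can be verified on a small synthetic example (a hypothetical random diagonalisable $A$ with exactly three distinct eigenvalues): the order-$3$ GMRES iterate, computed here naively as a dense least-squares problem over the Krylov basis, already drives the residual to roundoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
# Diagonalisable A with exactly three distinct eigenvalues {1, 2, 3}
X = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)  # well-conditioned
lam = np.repeat([1.0, 2.0, 3.0], n // 3)
A = X @ np.diag(lam) @ np.linalg.inv(X)
b = rng.standard_normal(n)

# Order-3 GMRES iterate: minimise ||Ax - b|| over span{b, Ab, A^2 b}
K = np.column_stack([b, A @ b, A @ (A @ b)])
c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
x3 = K @ c
residual = np.linalg.norm(A @ x3 - b)   # close to machine precision
```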

What is the cost of GMRES for finding an approximate solution to $Ax = b$ in the order-$k$ Krylov subspace?
Combining the counts above: $O(nk^2)$ flops for the Arnoldi iteration plus $O(k^2)$ flops for the Hessenberg least-squares problem, so $O(nk^2)$ flops overall.
Preconditioning
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
We have the result that if:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
The idea of preconditioning is to instead solve
\[MAx = Mb\]
Given the above analysis, what properties of $M$ would we like?
- Applying $M$ to a vector should be easy, since GMRES requires matrix-vector multiplications
- At least one of:
    - $MA$ has clustered eigenvalues away from $0$
    - $MA$ has a small number of distinct nonzero eigenvalues
    - $MA$ is well-conditioned: $\kappa _ 2(MA) = O(1)$
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
We have the result that if:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
The idea of preconditioning is to instead solve
\[MAx = Mb\]
What is the ILU preconditioner in this context?
We compute an approximate (incomplete) LU factorisation $A \approx LU$, discarding fill-in so that $L$ and $U$ stay as sparse as $A$. Then $M = (LU)^{-1} = U^{-1} L^{-1}$, so applying $M$ is just two cheap sparse triangular solves, and $MA \approx I$.
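A sketch of this with SciPy (assuming `scipy` is available): `spilu` computes a sparse incomplete LU factorisation, and its `solve` method applies $M = U^{-1}L^{-1}$ via two triangular solves. The test matrix here is an arbitrary sparse tridiagonal example, not from the notes.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

# A sparse nonsymmetric, diagonally dominant test matrix
n = 200
A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-4)            # A ~ LU, small fill-in discarded
M = LinearOperator(A.shape, ilu.solve)   # applying M = (LU)^{-1}

x, info = gmres(A, b, M=M)               # preconditioned GMRES; info == 0 on success
```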
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
We have the result that if:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
The idea of preconditioning is to instead solve
\[MAx = Mb\]
Suppose
\[A = \begin{bmatrix}
B & C \\
D & 0
\end{bmatrix}\]
What is one (very non-obvious) choice for a preconditioner of $A$ here, and why does it work?
Take $M = P^{-1}$, where $P = \begin{bmatrix} B & 0 \\ 0 & DB^{-1}C \end{bmatrix}$ is block diagonal, built from $B$ and the Schur complement $S = DB^{-1}C$. Then if $M$ is nonsingular, $MA$ has eigenvalues $\{1, \frac 1 2 (1 \pm \sqrt 5)\}$ and so you have three-step convergence.
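This claim, for the block-diagonal Schur-complement preconditioner $M = P^{-1}$ with $P = \operatorname{diag}(B, DB^{-1}C)$, can be checked numerically on random blocks ($B$ is shifted here to keep it and the Schur complement safely nonsingular):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 8, 5
B = rng.standard_normal((m, m)) + 5 * np.eye(m)   # shifted well away from singular
C = rng.standard_normal((m, p))
D = rng.standard_normal((p, m))

A = np.block([[B, C], [D, np.zeros((p, p))]])
S = D @ np.linalg.solve(B, C)                     # Schur complement D B^{-1} C
P = np.block([[B, np.zeros((m, p))],
              [np.zeros((p, m)), S]])

eigs = np.linalg.eigvals(np.linalg.solve(P, A))   # spectrum of MA = P^{-1} A
targets = np.array([1.0, (1 + np.sqrt(5)) / 2, (1 - np.sqrt(5)) / 2])
# distance from each eigenvalue to the set {1, (1 +- sqrt 5)/2}: ~0 up to roundoff
dist = np.abs(eigs[:, None] - targets[None, :]).min(axis=1)
```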
Recall the GMRES algorithm:
Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:
- $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
- $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
- $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
- $AQ _ k = Q _ {k+1} \tilde H _ k$
Then solve the Hessenberg least-squares problem
\[\min _ y \vert \vert \tilde H _ k y -Q^\top _ {k+1} b \vert \vert _ 2\]
and return $\hat x = Q _ k y$.
We have the result that if:
- $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$
then the $k$-th GMRES iterate $x _ k$ satisfies
\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]
The idea of preconditioning is to instead solve
\[MAx = Mb\]
In what sense can preconditioning be viewed as approximating the inverse of $A$?
The ideal preconditioner would be $M = A^{-1}$, since then
\[MAx = Mb \implies x = A^{-1} b\]
Restarting
Suppose we want to solve $Ax = b$. @Describe the restarting technique for GMRES. Why is it useful?
- Stop GMRES after $k _ \max$ iterations to get an approximate solution $\hat x _ 1$
- Solve $A \tilde x = b - A \hat x _ 1$ via GMRES
- Obtain a solution $\hat x _ 1 + \tilde x$ (this works since $A(\hat x _ 1 + \tilde x) = A \hat x _ 1 + (b - A \hat x _ 1) = b$ when $\tilde x$ is exact)
This is useful because the cost per Arnoldi step and the storage both grow with the iteration count $k$ (each new basis vector must be orthogonalised against all previous ones), so restarting caps both at $k _ \max$.
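A self-contained sketch of the restart loop. The inner solve here is a naive dense Krylov least-squares (standing in for the Arnoldi-based GMRES, purely for illustration); `krylov_lsq` and `restarted_gmres` are names invented for this sketch.

```python
import numpy as np

def krylov_lsq(A, b, k):
    # Order-k GMRES iterate, computed naively: minimise ||Ax - b||_2 over
    # x in span{b, Ab, ..., A^{k-1} b} by dense least squares.
    K = np.empty((len(b), k))
    K[:, 0] = b
    for j in range(1, k):
        K[:, j] = A @ K[:, j - 1]
    c, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    return K @ c

def restarted_gmres(A, b, k_max, n_restarts):
    x = np.zeros_like(b)
    for _ in range(n_restarts):
        # stop after k_max steps, then restart on the residual b - Ax
        x = x + krylov_lsq(A, b - A @ x, k_max)
    return x
```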