NLA MT25, GMRES algorithm


Flashcards

What is the main idea behind the GMRES algorithm for solving linear systems $Ax = b$?


Minimise the residual in the Krylov subspace:

\[x = \text{argmin} _ {x \in \mathcal K _ k(A, b)} \vert \vert Ax - b \vert \vert _ 2\]


@Describe the steps in the GMRES @algorithm for approximately solving linear systems $Ax = b$, and annotate each step with a flop cost.


Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

($O(nk^2)$ flops)

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

($O(k^2)$ flops)
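
The steps above can be sketched as a small dense NumPy implementation (an illustrative sketch of my own, not production code — real implementations update a QR factorisation of $\tilde H _ k$ with Givens rotations rather than calling a general least-squares solver):

```python
import numpy as np

def gmres(A, b, k):
    """Sketch of GMRES: Arnoldi iteration + Hessenberg least squares."""
    n = len(b)
    beta = np.linalg.norm(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))            # tilde H_k, upper Hessenberg
    Q[:, 0] = b / beta                  # q_1 = b / ||b||_2
    for j in range(k):                  # Arnoldi: O(nk^2) flops in total
        v = A @ Q[:, j]
        for i in range(j + 1):          # orthogonalise against q_1, ..., q_{j+1}
            H[i, j] = Q[:, i] @ v
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        if H[j + 1, j] < 1e-12 * beta:  # "happy breakdown": Krylov space is invariant
            k = j + 1
            H = H[: k + 1, : k]
            break
        Q[:, j + 1] = v / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta                       # Q_{k+1}^T b = ||b||_2 e_1
    y = np.linalg.lstsq(H, rhs, rcond=None)[0]  # O(k^2) via Givens in practice
    return Q[:, :k] @ y                 # x_hat = Q_k y
```

For instance, with $k = n$ the Krylov subspace fills the whole space and the returned $\hat x$ solves the system to machine precision.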

The steps in the GMRES @algorithm for approximately solving linear systems $Ax = b$ are as follows:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

@Justify why this works.


We aim to minimise the residual in the Krylov subspace:

\[x = \text{argmin} _ {x \in \mathcal K _ k(A, b)} \vert \vert Ax - b \vert \vert _ 2\]

Write $x = Q _ k y$, and let $Q _ \perp$ be any matrix completing $Q _ {k+1}$ to an orthogonal matrix. Then

\[\begin{aligned} \min _ y \vert \vert AQ _ k y - b \vert \vert _ 2 &= \min _ y \vert \vert Q _ {k+1} \tilde H _ k y - b \vert \vert _ 2 \\ &= \min _ y \left \vert \left \vert \begin{bmatrix} \tilde H _ k \\ 0\end{bmatrix} y - \begin{bmatrix} Q _ {k+1}^\top \\ Q _ \perp^\top \end{bmatrix} b \right \vert \right \vert _ 2 \\ &= \min _ y \left \vert \left \vert \begin{bmatrix} \tilde H _ k \\ 0 \end{bmatrix} y - \begin{bmatrix} \vert \vert b \vert \vert _ 2 e _ 1 \\ 0 \end{bmatrix} \right \vert \right \vert _ 2 \end{aligned}\]

since $q _ 1 = b / \vert \vert b \vert \vert _ 2$, so that $Q _ {k+1}^\top b = \vert \vert b \vert \vert _ 2 e _ 1$ and $Q _ \perp^\top b = 0$. This is minimised when $ \vert \vert \tilde H _ k y - Q^\top _ {k+1}b \vert \vert _ 2$ is minimised. This is a Hessenberg least-squares problem and can be solved efficiently via QR.
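
The reduction above can be checked numerically: for any $y$, the full residual norm $\vert \vert AQ _ k y - b \vert \vert _ 2$ equals the small Hessenberg residual norm. A NumPy sketch (test matrix and sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 12, 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Arnoldi iteration: builds Q_{k+1} and tilde H_k with A Q_k = Q_{k+1} tilde H_k
Q = np.zeros((n, k + 1))
H = np.zeros((k + 1, k))
Q[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    v = A @ Q[:, j]
    for i in range(j + 1):
        H[i, j] = Q[:, i] @ v
        v -= H[i, j] * Q[:, i]
    H[j + 1, j] = np.linalg.norm(v)
    Q[:, j + 1] = v / H[j + 1, j]

# For an arbitrary y: || A Q_k y - b ||_2 == || tilde H_k y - ||b||_2 e_1 ||_2
y = rng.standard_normal(k)
e1 = np.zeros(k + 1)
e1[0] = np.linalg.norm(b)
lhs = np.linalg.norm(A @ Q[:, :k] @ y - b)
rhs = np.linalg.norm(H @ y - e1)
```

The two norms agree to machine precision because multiplying by a matrix with orthonormal columns preserves the 2-norm.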

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

@State a theorem describing the convergence of the GMRES algorithm.


Suppose:

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

Then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

@Prove that if:

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

Recall that $x _ k \in \mathcal K _ k(A, b)$ implies $x _ k = p _ {k-1}(A)b$ where $p _ {k-1}$ is a polynomial of degree at most $k-1$. Hence the GMRES solution is

\[\begin{aligned} \min _ {x _ k \in \mathcal K _ k(A, b)} \vert \vert Ax _ k - b \vert \vert _ 2 &= \min _ {p _ {k-1} \in \mathbb C _ {k-1}[t]} \vert \vert Ap _ {k-1}(A)b - b \vert \vert _ 2 \\ &= \min _ {\tilde p \in \mathbb C _ k[t], \tilde p(0) = 0} \vert \vert (\tilde p(A) - I)b \vert \vert _ 2 \\ &= \min _ {p \in \mathcal P _ k, p(0) = 1} \vert \vert p(A)b \vert \vert _ 2 \end{aligned}\]

where in the last step we set $p = 1 - \tilde p$, so that $p(0) = 1$.

As $A$ is diagonalisable, we have

\[\begin{aligned} \vert \vert p(A) \vert \vert _ 2 &= \vert \vert Xp(\Lambda)X^{-1} \vert \vert _ 2 \\ &\le \vert \vert X \vert \vert _ 2 \vert \vert X^{-1} \vert \vert _ 2 \vert \vert p(\Lambda) \vert \vert _ 2 \\ &= \kappa _ 2(X) \max _ {z \in \lambda (A)} \vert p(z) \vert \end{aligned}\]

Hence overall

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

We have the result that

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

Can you therefore describe the class of matrices $A$ for which the GMRES algorithm will work well?

In what two common situations might this occur?


Matrices for which there is a low-degree polynomial $p$ with $p(0) = 1$ such that $ \vert p \vert $ is small at every eigenvalue of $A$.

  1. If the eigenvalues of the matrix are clustered away from $0$, then such a polynomial is relatively easy to find.
  2. If the matrix has $k \ll n$ distinct (nonzero) eigenvalues $\lambda _ 1, \dots, \lambda _ k$, then you can set $p(z) = \prod^k _ {i = 1}(1 - z/\lambda _ i)$, which satisfies $p(0) = 1$ and vanishes at every eigenvalue, so GMRES converges in at most $k$ steps
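
Situation 2 can be verified numerically. A NumPy sketch (the test matrix with three distinct eigenvalues is my own choice): applying $p(A) = \prod _ i (I - A/\lambda _ i)$ to $b$ gives a vector whose norm bounds the GMRES residual after $k = 3$ steps, and it vanishes:

```python
import numpy as np

# A diagonalisable 8x8 matrix with only three distinct eigenvalues {2, 5, 9}
rng = np.random.default_rng(0)
lam = np.array([2.0, 2.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0])
X = rng.standard_normal((8, 8)) + 4 * np.eye(8)   # well-conditioned eigenvectors
A = X @ np.diag(lam) @ np.linalg.inv(X)
b = rng.standard_normal(8)

# Apply p(A) b where p(z) = prod_i (1 - z/lambda_i): p(0) = 1, p(lambda_i) = 0
r = b.copy()
for l in [2.0, 5.0, 9.0]:
    r = r - (A @ r) / l            # r <- (I - A/l) r
# ||r|| is (up to rounding) zero, so the step-3 GMRES residual is zero
```

Since $p(\Lambda) = 0$ exactly, $p(A) = X p(\Lambda) X^{-1} = 0$ and the residual after three GMRES steps is zero in exact arithmetic.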

What is the cost of GMRES for finding an approximate solution to $Ax = b$ in the order-$k$ Krylov subspace?


\[O(nk^2)\]

Preconditioning

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

We have the result that

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

The idea of preconditioning is to instead solve

\[MAx = Mb\]

Given the above analysis, what properties of $M$ would we like?


  • Applying $M$ to a vector should be easy, since GMRES requires matrix-vector multiplications
  • At least one of:
    1. $MA$ has clustered eigenvalues away from $0$
    2. $MA$ has a small number of distinct nonzero eigenvalues
    3. $MA$ has a well-conditioned eigenvector matrix $X$, so that the factor $\kappa _ 2(X)$ in the bound is $O(1)$

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

We have the result that

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

The idea of preconditioning is to instead solve

\[MAx = Mb\]

What is the ILU preconditioner in this context?


We have some approximate LU factorisation $A \approx LU$, computed so that $L$ and $U$ stay as sparse as $A$ (small fill-in entries are dropped during elimination). Then $M = (LU)^{-1} = U^{-1} L^{-1}$, which is applied to a vector via two sparse triangular solves, and $MA \approx I$ has eigenvalues clustered near $1$.
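
A sketch with SciPy's sparse tools (`scipy.sparse.linalg.spilu` computes an incomplete LU; the test matrix and drop tolerance here are my own illustrative choices):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# A sparse nonsymmetric test matrix (a convection-diffusion-style tridiagonal)
n = 400
A = sp.diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Incomplete LU: small fill-in is dropped so L and U stay sparse;
# ilu.solve applies (LU)^{-1} via two sparse triangular solves
ilu = spla.spilu(A, drop_tol=1e-4)
M = spla.LinearOperator((n, n), matvec=ilu.solve)  # M ~ A^{-1}

x, info = spla.gmres(A, b, M=M)  # info == 0 on convergence
```

Because $MA \approx I$, the preconditioned iteration converges in far fewer Krylov steps than GMRES applied to $A$ directly.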

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

We have the result that

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

The idea of preconditioning is to instead solve

\[MAx = Mb\]

Suppose

\[A = \begin{bmatrix} B & C \\ D & 0 \end{bmatrix}\]

What is one (very non-obvious) choice for a preconditioner of $A$ here, and why does it work?


\[M = \begin{bmatrix} B^{-1} & 0 \\ 0 & (DB^{-1} B)^{-1} \end{bmatrix}\]

Then if $M$ is nonsingular, $MA$ has eigenvalues $\{1, \frac 1 2 (1 \pm \sqrt 5)\}$ and minimal polynomial of degree at most three, and so you have three-step convergence.
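
The eigenvalue claim can be checked numerically. A NumPy sketch (block sizes and the random test blocks are my own choices; the preconditioner is the block diagonal of $B^{-1}$ and the inverse Schur complement $(DB^{-1}C)^{-1}$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
B = rng.standard_normal((n, n)) + 4 * np.eye(n)   # invertible (1,1) block
C = rng.standard_normal((n, m))
D = rng.standard_normal((m, n))

A = np.block([[B, C], [D, np.zeros((m, m))]])
Binv = np.linalg.inv(B)
M = np.block([[Binv, np.zeros((n, m))],
              [np.zeros((m, n)), np.linalg.inv(D @ Binv @ C)]])

# Distance from each eigenvalue of MA to the set {1, (1 +- sqrt 5)/2}
ev = np.linalg.eigvals(M @ A)
targets = np.array([1.0, (1 + np.sqrt(5)) / 2, (1 - np.sqrt(5)) / 2])
dist = np.abs(ev[:, None] - targets[None, :]).min(axis=1)
```

Every computed eigenvalue lands (to rounding error) on one of the three values, for generic blocks $B$, $C$, $D$.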

Recall the ∆gmres-algorithm:

Use the Arnoldi iteration to calculate $Q _ k$, $Q _ {k+1}$, $\tilde H _ k$ where:

  • $Q _ k$ is an orthonormal basis for $\mathcal K _ k(A, b)$
  • $Q _ {k+1}$ is an orthonormal basis for $\mathcal K _ {k+1}(A, b)$
  • $\tilde H _ k \in \mathbb R^{(k+1) \times k}$ is an upper Hessenberg matrix
  • $AQ _ k = Q _ {k+1} \tilde H _ k$

Then solve the Hessenberg least-squares problem

\[\min _ y \vert \vert \tilde H _ k y - Q^\top _ {k+1} b \vert \vert _ 2\]

and return $\hat x = Q _ k y$.

We have the result that

  • $A$ is a diagonalisable matrix, so that $A = X \Lambda X^{-1}$

then the $k$-th GMRES iterate $x _ k$ satisfies

\[\vert \vert Ax _ k - b \vert \vert _ 2 \le \kappa _ 2(X) \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {z \in \lambda(A)} \vert p(z) \vert \, \vert \vert b \vert \vert _ 2\]

The idea of preconditioning is to instead solve

\[MAx = Mb\]

In what sense can preconditioning be viewed as approximating the inverse of $A$?


The ideal preconditioner would be $M = A^{-1}$, since then

\[MAx = Mb \implies x = A^{-1} b\]

and the preconditioned system is solved immediately. Practical preconditioners are cheap approximations to $A^{-1}$.

Restarting

Suppose we want to solve $Ax = b$. @Describe the restarting technique for GMRES. Why is it useful?


  • Stop GMRES after $k _ \max$ iterations to get an approximate solution $\hat x _ 1$
  • Solve $A \tilde x = b - A \hat x _ 1$ via GMRES, again stopping after $k _ \max$ iterations
  • Obtain the improved solution $\hat x _ 1 + \tilde x$, and repeat until the residual is small

This is useful because the cost of $k$ GMRES iterations grows as $O(nk^2)$ flops, and the Arnoldi basis $Q _ k$ requires $O(nk)$ storage; restarting caps both at $k = k _ \max$.
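
The restart loop can be sketched in NumPy (illustrative only; function names are mine, and the inner solve reuses the plain Arnoldi-plus-least-squares formulation of GMRES):

```python
import numpy as np

def gmres_fixed(A, b, k):
    """One GMRES cycle of k Arnoldi steps (dense sketch)."""
    n = len(b)
    beta = np.linalg.norm(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / beta
    for j in range(k):                      # Arnoldi iteration
        v = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta                           # Q_{k+1}^T b = ||b||_2 e_1
    y = np.linalg.lstsq(H, rhs, rcond=None)[0]
    return Q[:, :k] @ y                     # x_hat = Q_k y

def restarted_gmres(A, b, k_max=10, cycles=20, tol=1e-10):
    """Restarting: repeatedly solve A x~ = b - A x_hat and accumulate."""
    x = np.zeros_like(b)
    for _ in range(cycles):
        r = b - A @ x                       # current residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                           # converged
        x = x + gmres_fixed(A, r, k_max)    # correct with a fresh k_max-step cycle
    return x
```

Each cycle costs only $O(n k _ \max^2)$ flops and $O(n k _ \max)$ memory, at the price of discarding the Krylov subspace built so far, so restarted GMRES may need more total matrix-vector products than the unrestarted method.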


