NLA MT25, Conjugate gradient method
- [[Course - Numerical Linear Algebra MT25]]
- Does the conjugate gradient method only work when the matrix is symmetric?
Flashcards
What is the main idea of the conjugate gradient method for solving positive definite linear systems $Ax = b$?
Solve instead
\[Q _ k^\top (AQ _ k y - b) = 0\]or equivalently
\[T _ k y - Q _ k^\top b = 0\]where $x = Q _ k y$, and $Q _ k, T _ k$ come from the Lanczos decomposition
\[A Q _ k = Q _ k T _ k + q _ {k+1} [0, \ldots, 0, t _ {k+1, k}]\]@Define Galerkin orthogonality in the context of the conjugate gradient method.
The requirement that
\[Q _ k^\top (AQ _ k y - b) = 0\]i.e. that the residual $Ax - b$ is orthogonal to $Q _ k$.
In the conjugate gradient method, we solve positive definite linear systems $Ax = b$ by instead solving
\[Q _ k^\top (AQ _ k y - b) = 0\]
where
\[AQ _ k = Q _ k T _ k + q _ {k+1} [0, \ldots, 0, t _ {k+1, k}]\]
and $T _ k$ is symmetric tridiagonal. What is the motivation behind this?
The CG algorithm minimises the $A$-norm of the error in the Krylov subspace
\[x _ k = \operatorname{argmin} _ {x \in \operatorname{range}(Q _ k)} \vert \vert x - x _ \ast \vert \vert _ A\]Writing $x _ k = Q _ k y$, we have
\[\begin{aligned} (x _ k - x _ \ast)^\top A (x _ k - x _ \ast) &= (Q _ k y - x _ \ast)^\top A (Q _ k y - x _ \ast) \\ &= y^\top (Q _ k^\top A Q _ k) y - 2b^\top Q _ k y + b^\top x _ \ast \end{aligned}\]Differentiating with respect to $y$, we obtain the minimum as
\[y = (Q _ k^\top A Q _ k)^{-1} Q _ k^\top b\]so $Q _ k^\top (AQ _ k y - b) = 0$.
Another way to see this is via the results in [[Notes - Numerical Analysis HT24, Best approximation in inner product spaces]]; the minimum in the $A$-norm will be attained when $\langle q _ i, x _ \ast - x\rangle _ A = 0$ for all $q _ i$.
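This can also be checked numerically; the following sketch (an assumed random test problem, sizes and seed arbitrary) verifies that the Galerkin solution has residual orthogonal to $Q _ k$ and minimises the $A$-norm error over the subspace:

```python
import numpy as np

# Sanity check (random test problem; sizes and seed are arbitrary):
# the Galerkin condition Q_k^T (A Q_k y - b) = 0 picks out the
# A-norm-minimising element of range(Q_k).
rng = np.random.default_rng(0)
n, k = 20, 5

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)       # exact solution

# Orthonormal basis of the Krylov subspace span{b, Ab, ..., A^{k-1} b}
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])
Q, _ = np.linalg.qr(K)

# Galerkin solution y = (Q^T A Q)^{-1} Q^T b
y = np.linalg.solve(Q.T @ A @ Q, Q.T @ b)
x = Q @ y

def err_A(v):
    e = v - x_star
    return np.sqrt(e @ A @ e)        # A-norm of the error

# Residual is orthogonal to Q_k ...
assert np.allclose(Q.T @ (A @ x - b), 0)
# ... and no perturbation within range(Q_k) does better in the A-norm
for _ in range(100):
    z = Q @ rng.standard_normal(k)
    assert err_A(x) <= err_A(x + 0.1 * z) + 1e-12
```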
@State the CG algorithm for solving positive definite linear systems $Ax = b$.
- Initialisation
- $x _ 0 = 0$
- $r _ 0 = b$
- $p _ 0 = r _ 0$
- For $k = 0, 1, 2, \ldots$
- $\alpha _ k = \langle r _ k, r _ k \rangle / \langle p _ k, Ap _ k \rangle$
- $x _ {k+1} = x _ k + \alpha _ k p _ k$
- $r _ {k+1} = r _ k - \alpha _ k A p _ k$
- $\beta _ k = \langle r _ {k+1}, r _ {k+1} \rangle / \langle r _ k, r _ k \rangle$
- $p _ {k+1} = r _ {k+1} + \beta _ k p _ k$
where $r _ k = b - Ax _ k$ is the residual and $p _ k$ is the search direction.
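The iteration above transcribes directly into NumPy (a teaching sketch rather than a production solver; the tolerance and iteration cap are arbitrary choices):

```python
import numpy as np

# A direct transcription of the CG iteration: x_0 = 0, r_0 = b, p_0 = r_0,
# then repeat the alpha/x/r/beta/p updates until the residual is small.
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    n = len(b)
    if max_iter is None:
        max_iter = n                  # terminates in n steps in exact arithmetic
    x = np.zeros(n)                   # x_0 = 0
    r = b - A @ x                     # r_0 = b - A x_0 = b
    p = r.copy()                      # p_0 = r_0
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)    # exact line search along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p          # next A-conjugate search direction
        r = r_new
    return x
```

Since $A \succ 0$ guarantees $\langle p _ k, Ap _ k \rangle > 0$ whenever $p _ k \ne 0$, the divisions are safe.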
In the conjugate gradient method, we solve positive definite linear systems $Ax = b$ by instead solving
\[Q _ k^\top (AQ _ k y - b) = 0\]
where
\[AQ _ k = Q _ k T _ k + q _ {k+1} [0, \ldots, 0, t _ {k+1, k}]\]
and $T _ k$ is symmetric tridiagonal.
@Prove that this may be implemented via the algorithm:
- Initialisation
- $x _ 0 = 0$
- $r _ 0 = b$
- $p _ 0 = r _ 0$
- For $k = 0, 1, 2, \ldots$
- $\alpha _ k = \langle r _ k, r _ k \rangle / \langle p _ k, Ap _ k \rangle$
- $x _ {k+1} = x _ k + \alpha _ k p _ k$
- $r _ {k+1} = r _ k - \alpha _ k A p _ k$
- $\beta _ k = \langle r _ {k+1}, r _ {k+1} \rangle / \langle r _ k, r _ k \rangle$
- $p _ {k+1} = r _ {k+1} + \beta _ k p _ k$
where $r _ k = b - Ax _ k$ is the residual and $p _ k$ is the search direction.
@todo, once have done sheet 4.
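Pending the proof, a numerical sanity check (an assumed random SPD test problem) that the recurrences reproduce the Galerkin solution $x _ k = Q _ k y$ in the Krylov subspace:

```python
import numpy as np

# Sanity check (assumed random SPD test problem): after k steps, the CG
# iterate equals the Galerkin solution Q_k y in the Krylov subspace.
rng = np.random.default_rng(1)
n, k = 30, 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
b = rng.standard_normal(n)

# k steps of the recurrences (x_0 = 0, r_0 = b, p_0 = r_0)
x = np.zeros(n)
r = b.copy()
p = r.copy()
for _ in range(k):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new

# Galerkin solution in the Krylov subspace K_k(A, b)
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])
Q, _ = np.linalg.qr(K)
y = np.linalg.solve(Q.T @ A @ Q, Q.T @ b)

assert np.allclose(x, Q @ y)
```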
Suppose:
- $A \in \mathbb R^{n \times n}$, $A \succ 0$
- $b \in \mathbb R^n$
@State a theorem describing the convergence of the CG algorithm applied to the linear system $Ax = b$.
Let $e _ k := x _ \ast - x _ k$ be the error after the $k$th CG iteration where $x _ \ast$ is the exact solution. Then
\[\frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]Suppose:
- $A \in \mathbb R^{n \times n}$, $A \succ 0$
- $b \in \mathbb R^n$
- $Ax _ \ast = b$
- $e _ k := x _ \ast - x _ k$ be the error after the $k$th CG iteration
@Prove that then
\[\frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]
You may assume that
\[\min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {x \in [\lambda _ \min(A), \lambda _ \max(A)]} \vert p(x) \vert \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]
We have $e _ 0 = x _ \ast$ since $x _ 0 = 0$. Then
\[\begin{aligned} \frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} &= \min _ {x \in \mathcal K _ k(A, b)} \frac{ \vert \vert x - x _ \ast \vert \vert _ A}{ \vert \vert x _ \ast \vert \vert _ A} \\ &= \min _ {p _ {k-1} \in \mathcal P _ {k-1}} \frac{ \vert \vert p _ {k-1}(A) b - A^{-1} b \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \\ &= \min _ {p _ {k-1} \in \mathcal P _ {k-1}} \frac{ \vert \vert (p _ {k-1}(A) A - I)e _ 0 \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \\ &= \min _ {p \in \mathcal P _ k, p(0) = 1} \frac{ \vert \vert p(A) e _ 0 \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \\ &= \min _ {p \in \mathcal P _ k, p(0) = 1} \frac{\left \vert \left \vert V\begin{bmatrix} p(\lambda _ 1) & & \\ & \ddots & \\ & & p(\lambda _ n) \end{bmatrix} V^\top e _ 0 \right \vert \right \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \end{aligned}\]where $A = V \Lambda V^\top$ is an eigendecomposition of $A$ with eigenvalues $\lambda _ 1, \ldots, \lambda _ n$. Then
\[\begin{aligned} \left \vert \left \vert V\begin{bmatrix} p(\lambda _ 1) & & \\ & \ddots & \\ & & p(\lambda _ n) \end{bmatrix} V^\top e _ 0 \right \vert \right \vert _ A^2 &= \sum _ i \lambda _ i p(\lambda _ i)^2 (V^\top e _ 0) _ i^2 \\ &\le \max _ j p(\lambda _ j)^2 \sum _ i \lambda _ i (V^\top e _ 0) _ i^2 \\ &= \max _ j p(\lambda _ j)^2 \vert \vert e _ 0 \vert \vert ^2 _ A \end{aligned}\]Thus
\[\begin{aligned} \frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} &\le \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ j \vert p(\lambda _ j) \vert \\ &\le \min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {x \in [\lambda _ \min(A), \lambda _ \max(A)]} \vert p(x) \vert \\ &\le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k \end{aligned}\]@Prove using results about Chebyshev polynomials that
\[\min _ {p \in \mathcal P _ k, p(0) = 1} \max _ {x \in [\lambda _ \min(A), \lambda _ \max(A)]} \vert p(x) \vert \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]
where $A$ is a matrix with minimum and maximum eigenvalue $\lambda _ \min$ and $\lambda _ \max$.
For ease of notation, denote $a = \lambda _ \min$ and $b = \lambda _ \max$. Let
\[p(x) = c _ k T _ k\left(\frac{2x - b - a}{b - a}\right)\]where $c _ k = 1 / T _ k \left(\frac{-b-a}{b-a}\right)$. Then $p(0) = 1$, and by results about Chebyshev polynomials we have
\[\vert p(x) \vert \le 1/ \left\vert T _ k\left(\frac{b+a}{b-a}\right) \right\vert\]for $x \in [a, b]$.
Furthermore, we have
\[T _ k\left(\tfrac{1}{2}(z + z^{-1})\right) = \frac 1 2 (z^k + z^{-k})\]Choosing $z$ so that $\frac 1 2 (z + z^{-1}) = \frac{b+a}{b-a}$, we obtain
\[\begin{aligned} z &= \frac{\sqrt{b/a} + 1}{\sqrt{b / a}-1} \\ &= \frac{\sqrt{\kappa _ 2(A)} + 1}{\sqrt{\kappa _ 2(A)} - 1} \end{aligned}\]Hence
\[\begin{aligned} \vert p(x) \vert &\le \frac{1}{T _ k\left(\frac{b+a}{b-a}\right)} = \frac{2}{z^k + z^{-k}} \\ &\le 2 z^{-k} = 2 \left( \frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1} \right)^k \end{aligned}\]We have the ∆convergence-of-conjugate-gradient result that if:
- $A \in \mathbb R^{n \times n}$, $A \succ 0$
- $b \in \mathbb R^n$
- $e _ k := x _ \ast - x _ k$ be the error after the $k$th CG iteration where $x _ \ast$ is the exact solution
then:
\[\frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]
Intuitively, what does this mean in terms of when CG will be slow?
When the condition number $\kappa _ 2(A)$ is large, i.e. when the eigenvalues are spread out rather than clustered away from $0$.
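This can be illustrated on diagonal test problems (an assumed toy setup, not from the course): the same CG iteration needs far more steps when the spectrum stretches down towards $0$:

```python
import numpy as np

# Count CG iterations to reach a residual tolerance for A = diag(eigs),
# b = ones. Two spectra of the same size: one clustered in [9, 10]
# (kappa ~ 1.1), one spread down towards 0 (kappa = 10^4).
def cg_iters(eigs, tol=1e-8):
    A = np.diag(eigs)
    b = np.ones(len(eigs))
    x = np.zeros(len(eigs))
    r = b.copy()
    p = r.copy()
    for it in range(1, 10 * len(eigs)):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            return it
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return it

n = 100
clustered = np.linspace(9, 10, n)     # eigenvalues clustered away from 0: fast
spread = np.linspace(1e-3, 10, n)     # eigenvalues stretching towards 0: slow
assert cg_iters(clustered) < cg_iters(spread)
```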
We have the ∆convergence-of-conjugate-gradient result that if:
- $A \in \mathbb R^{n \times n}$, $A \succ 0$
- $b \in \mathbb R^n$
- $e _ k := x _ \ast - x _ k$ be the error after the $k$th CG iteration where $x _ \ast$ is the exact solution
then:
\[\frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]
This means that CG will be slow if the eigenvalues are distributed unfavourably, e.g. spread out rather than clustered away from $0$.
To solve this, we can use preconditioning. What does the preconditioning setup look like in this context?
Find some matrix $M$ such that
\[M^\top M \approx A^{-1}\]and solve
\[M^\top A M y = M^\top b, \quad My = x\]We have the ∆convergence-of-conjugate-gradient result that if:
- $A \in \mathbb R^{n \times n}$, $A \succ 0$
- $b \in \mathbb R^n$
- $e _ k := x _ \ast - x _ k$ be the error after the $k$th CG iteration where $x _ \ast$ is the exact solution
then:
\[\frac{ \vert \vert e _ k \vert \vert _ A}{ \vert \vert e _ 0 \vert \vert _ A} \le 2\left(\frac{\sqrt{\kappa _ 2(A)} - 1}{\sqrt{\kappa _ 2(A)} + 1}\right)^k\]
This means that CG will be slow if the eigenvalues are distributed unfavourably, e.g. spread out rather than clustered away from $0$.
To solve this, we can use preconditioning where we find some matrix $M$ such that
\[M^\top M \approx A^{-1}\]
and solve
\[M^\top A M y = M^\top b, \quad My = x\]
What are the desiderata of $M$ in this context?
- $M^\top A M$ is easy to apply to a vector
- $M^\top A M$ has eigenvalues clustered away from $0$
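A minimal sketch of this setup with the simplest such $M$, the Jacobi preconditioner $M = \operatorname{diag}(A)^{-1/2}$ (so that $M^\top M = \operatorname{diag}(A)^{-1}$ crudely approximates $A^{-1}$; the badly scaled SPD test matrix is an arbitrary example):

```python
import numpy as np

# Split-preconditioned system M^T A M y = M^T b, x = M y, with the Jacobi
# preconditioner M = diag(A)^{-1/2}. Assumed badly scaled SPD test matrix.
rng = np.random.default_rng(2)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T + np.diag(10.0 ** rng.uniform(0, 4, n))   # badly scaled SPD
b = rng.standard_normal(n)

M = np.diag(1.0 / np.sqrt(np.diag(A)))    # M^T M = diag(A)^{-1}
A_pre = M.T @ A @ M                        # preconditioned operator

# The preconditioned system is much better conditioned for this example ...
assert np.linalg.cond(A_pre) < np.linalg.cond(A)

# ... and solving M^T A M y = M^T b, x = M y recovers the solution of Ax = b
y = np.linalg.solve(A_pre, M.T @ b)
x = M @ y
assert np.allclose(A @ x, b, atol=1e-6)
```

In practice one applies $M$ and $A$ as operators inside the CG iteration rather than forming $M^\top A M$ explicitly; the dense formulation here is only to check the algebra.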