NLA MT25, Lanczos algorithm
Flashcards
What is the ∆lanczos-algorithm (not the ∆lanczos-iteration!) used for?
Solving symmetric eigenvalue problems
\[Ax = \lambda x.\]@Describe the Lanczos @algorithm for solving symmetric eigenvalue problems $Ax = \lambda x$, and describe the time and space cost.
The Lanczos algorithm computes approximate eigenpairs by the ∆rayleigh-ritz-algorithm applied to the Krylov subspace $\mathcal K _ k(A, v)$ as computed by the ∆lanczos-iteration.
Algorithm:
- Set $q _ 1 = b/\|b\| _ 2$ (typically $b$ random for eigenvalue problems).
- For $k = 1, 2, \ldots, r$:
- One Lanczos iteration step (∆lanczos-iteration):
- $v = Aq _ k$
- $t _ {k, k} = q _ k^\top v$
- $v = v - t _ {k,k} q _ k - t _ {k-1, k} q _ {k-1}$, and if $k = 1$ omit this last subtraction (since $q _ 0$ doesn’t exist)
- $t _ {k+1, k} = \|v\| _ 2$
- $q _ {k+1} = v/t _ {k+1, k}$ (or a ∆happy-breakdown if $t _ {k+1, k} = 0$)
- Rayleigh-Ritz on $Q$ (∆rayleigh-ritz-algorithm):
- Compute the eigendecomposition $T _ k = V _ k \hat \Lambda _ k V _ k^\top$ of the tridiagonal matrix $T _ k = Q _ k^\top A Q _ k$.
- Ritz values: $\text{diag}(\hat \Lambda _ k)$ are the approximate eigenvalues of $A$.
- Ritz vectors: columns of $Q _ k V _ k$, approximate eigenvectors of $A$.
- …stopping once enough Ritz pairs have converged (judged e.g. by the residual norm $\|A\hat v _ i - \hat \lambda _ i \hat v _ i\| _ 2$).
- One Lanczos iteration step (∆lanczos-iteration):
Outputs: After $r$ outer iterations we have $r$ Ritz pairs $(\hat \lambda _ i, \hat v _ i)$, approximating eigenpairs of $A$ with $\hat v _ i \in \mathcal K _ r(A, b)$.
Time:
- Per iteration:
- One $O(n^2)$ matrix-vector product $Aq _ k$
- One $O(n)$ inner product for the Lanczos three-term recurrence
- $O(k^2)$ for the specialised symmetric tridiagonal QR eigendecomposition $T _ k = V _ k \hat \Lambda _ k V _ k^\top$
- $O(nk^2)$ for forming the Ritz vectors (not necessarily needed at every step).
Space:
- $O(nr)$ for storing $Q _ r$, since this is needed to recover the Ritz vectors. (Compare to the pure ∆lanczos-iteration, which doesn’t need to store each $q _ k$ because of the three-term recurrence).
The ∆lanczos-algorithm for solving symmetric eigenvalue problems $Ax = \lambda x$ does iterations of the form:
- Perform Lanczos iterations to obtain $AQ _ k = Q _ k T _ k + q _ {k+1} [0, \ldots, 0, t _ {k+1, k}]$
- Compute the eigenvalue decomposition of $T _ k = V _ k \hat \Lambda V _ k^\top$ (Rayleigh-Ritz with subspace $Q _ k$)
- $\text{diag}(\hat \Lambda)$ are the approximate eigenvalues (Ritz values), and the columns of $Q _ k V _ k$ are the approximate eigenvectors (Ritz vectors)
The ∆rayleigh-ritz-algorithm calculates $Q _ k^\top A Q _ k = T _ k$. Why is it particularly simple in this case?
The Lanczos iteration for a symmetric matrix gives a tridiagonal output, so we only have to solve a tridiagonal eigenproblem. (Perhaps it would be more accurate to say that the ∆arnoldi-iteration becomes the specialised ∆lanczos-iteration when $A$ is symmetric).
The ∆lanczos-algorithm for solving symmetric eigenvalue problems $Ax = \lambda x$ produces Ritz pairs $(\hat \lambda _ i, \hat v _ i)$ where $\hat \lambda _ 1 \ge \cdots \ge \hat \lambda _ k$ are the eigenvalues of the tridiagonal matrix $T _ k = Q _ k^\top A Q _ k$, and $\mathcal K _ k(A, b) = \text{span}(b, Ab, \ldots, A^{k-1} b)$ is the order-$k$ Krylov subspace.
@Justify why the convergence of this algorithm is good for extremal eigenpairs, i.e. why
- $\hat \lambda _ 1 \to \lambda _ \max(A)$ rapidly, and
- $\hat \lambda _ k \to \lambda _ \min(A)$ rapidly,
and briefly discuss why this same argument does not work for non-extremal (interior) eigenvalues?
By ∆courant-fischer-minmax-theorem, the largest eigenvalue of symmetric $A$ has the variational characterisation
\[\lambda _ \max(A) = \max _ {x \in \mathbb R^n \setminus \{0\}} \frac{x^\top A x}{x^\top x}.\]By the variational characterisation of Ritz values (∆rayleigh-ritz-variational-characterisation), the largest Ritz value of Lanczos at step $k$ is the same maximum restricted to the Krylov subspace:
\[\hat \lambda _ 1 = \max _ {x \in \mathcal K _ k(A, b) \setminus \{0\}} \frac{x^\top A x}{x^\top x}\]Since $A^{k-1} b \in \mathcal K _ k(A, b)$, the maximum over the subspace is at least the value of the Rayleigh quotient at this particular vector. Combining this, we obtain
\[\begin{aligned} \lambda _ \max(A) &\ge \max _ {x \in \mathcal K _ k(A, b) \setminus \{0\})} \frac{x^\top A x}{x^\top x} &&(\hat \lambda _ 1, \text{Lanczos' largest Ritz value} ) \\ &\ge \frac{v^\top A v}{v^\top v} &&(\text{where } v = A^{k-1}b \text{ is the power method's est.}) \end{aligned}\]Hence we see that the Lanczos’ $\hat \lambda _ 1$ is sandwiched between the true eigenvalue and the power method estimate.
Symmetrically, we may make the same argument for $\lambda _ \min$. The analysis for the interior eigenpairs is more difficult because the Courant-Fischer inequality here is min-max or max-min, rather than just one max or min.
Bite-sized
The Lanczos algorithm for symmetric eigenproblems combines two pieces: the Lanczos iteration (to build an orthonormal Krylov basis $Q _ k$) and the Rayleigh-Ritz projection (to extract approximate eigenpairs from $Q _ k^\top A Q _ k$).
In the Lanczos algorithm, the projected matrix $T _ k := Q _ k^\top A Q _ k$ is symmetric tridiagonal, so the Ritz eigenvalue subproblem reduces to a tridiagonal eigenproblem (much cheaper than a general Hessenberg one).
In the Lanczos algorithm, what are the Ritz values and Ritz vectors in terms of the eigendecomposition $T _ k = V _ k \hat\Lambda V _ k^\top$ of $T _ k = Q _ k^\top A Q _ k$?
- Ritz values: the diagonal entries of $\hat\Lambda$ (the eigenvalues of $T _ k$); these approximate the eigenvalues of $A$.
- Ritz vectors: the columns of $Q _ k V _ k$; these approximate the corresponding eigenvectors of $A$.
Which eigenpairs of a symmetric $A$ does the Lanczos algorithm tend to find quickly?
The extremal ones — $\lambda _ \max$ and $\lambda _ \min$ converge rapidly as the Krylov dimension grows, with $\lambda _ 2$, $\lambda _ {n-1}$ next, etc. Interior eigenpairs converge much more slowly: the Courant-Fischer sandwich argument that gives the extremal rate involves a single max (or min), but for interior $\lambda _ i$ it becomes a min-max over $i$-dimensional subspaces and the clean lower bound is lost.
After $k$ Lanczos iterations the largest Ritz value $\hat\lambda _ 1$ is at least as good an estimate of $\lambda _ \max(A)$ as the estimate from $k - 1$ steps of the power method on the same initial vector $b$ — because $A^{k-1} b \in \mathcal K _ k(A, b)$ and Lanczos maximises the Rayleigh quotient over the whole subspace.