NLA MT25, Power method
- Course - Numerical Linear Algebra MT25U
- The analysis of the power method is covered here:
- Notes - Numerical Analysis HT24, Power methodU
Flashcards
@Describe the steps in the shifted inverse power method for computing the eigenvalue and eigenvector of $A \in \mathbb R^{n \times n}$ closest to a prescribed value $s \in \mathbb C$.
- Select arbitrary $y \in \mathbb R^n$
- Set $x _ 0 = y/ \vert \vert y \vert \vert $
- Then, for $k = 0, 1, \ldots$
- Compute $y _ k = (A - s I)^{-1} x _ k$
- Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $
Then $x _ {k} \to \pm v _ 1$ and $\hat \lambda = x^\top A x$ gives an estimate for the eigenvalue.
(A justification of why this works is in ∆power-method-analysis).
The shifted inverse power method for computing the eigenvalue of $A \in \mathbb R^{n \times n}$ closest to a prescribed value $s \in \mathbb C$ can be described as follows:
- Select arbitrary $y \in \mathbb R^n$
- Set $x _ 0 = y/ \vert \vert y \vert \vert $
- Then, for $k = 0, 1, \ldots$
- Compute $y _ k = (A - s I)^{-1} x _ k$
- Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $
Then $x _ {k} \to \pm v _ 1$ and $\hat \lambda = x^\top A x$ gives an estimate for the eigenvalue. This improves the rate of convergence to
\[\frac{ \vert \lambda _ {\sigma(2)} - s \vert ^{-1}}{ \vert \lambda _ {\sigma(1)} - s \vert ^{-1}} = \frac{ \vert \lambda _ {\sigma(1)} - s \vert }{ \vert \lambda _ {\sigma(2)} - s \vert }\]
where $\sigma$ is a permutation such that $ \vert \lambda _ {\sigma(1)} - s \vert < \vert \lambda _ {\sigma(2)} - s \vert \le \cdots \le \vert \lambda _ {\sigma(n)} - s \vert $, i.e. $\lambda _ {\sigma(1)}$ is the eigenvalue closest to the shift $s$.
What is the downside of this method?
- Compute $y _ k = (A - s I)^{-1} x _ k$
- Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $
Each step requires solving the linear system $(A - sI)y _ k = x _ k$. We compute the $LU$ factorisation of $(A - sI)$ once at $O(n^3)$ up front, and then each iteration is $O(n^2)$ via two triangular solves. (This per-step cost compares to $O(n^2)$ per step for the basic power method, so inverse iteration is only a constant-factor more expensive.)
The shifted inverse power method for computing the eigenvalue of $A \in \mathbb R^{n \times n}$ closest to a prescribed value $s \in \mathbb C$ can be described as follows:
- Select arbitrary $y \in \mathbb R^n$
- Set $x _ 0 = y/ \vert \vert y \vert \vert $
- Then, for $k = 0, 1, \ldots$
- Compute $y _ k = (A - s I)^{-1} x _ k$
- Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $
Then $x _ {k} \to \pm v _ 1$ and $\hat \lambda = x^\top A x$ gives an estimate for the eigenvalue. @State how the Rayleigh quotient iteration modifies this, the corresponding convergence rate, and how this improves further when $A$ is symmetric.
- Compute $y _ k = (A - s I)^{-1} x _ k$
- Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $
The Rayleigh quotient iteration changes the shift at each iteration to $s _ k = x _ k^\top A x _ k$.
This gives quadratic convergence
\[ \vert \vert Ax^{k+1} - \lambda^{k+1} x^{k+1} \vert \vert = O( \vert \vert Ax^k - \lambda^k x^k \vert \vert ^2)\]and when $A$ is symmetric, we have cubic convergence.
Bite-sized
State the basic (unshifted) power method.
Initialise $x _ 0$ randomly (any starting vector with nonzero $v _ 1$ component). For $k = 0, 1, \ldots$:
\[x _ {k+1} = A x _ k / \|A x _ k\| _ 2, \qquad \hat \lambda _ k = x _ k^\top A x _ k.\]Then $x _ k \to \pm v _ 1$ (the dominant eigenvector) and $\hat \lambda _ k \to \lambda _ 1$ (the dominant eigenvalue), with linear convergence rate $ \vert \lambda _ 2 / \lambda _ 1 \vert $.
Convergence rate of the basic power method: linear with rate $ \vert \lambda _ 2 / \lambda _ 1 \vert $. So convergence is slow when $ \vert \lambda _ 2 \vert $ is close to $ \vert \lambda _ 1 \vert $ (small spectral gap).
Key identity for shifted inverse iteration: the eigenvalues of $(A - sI)^{-1}$ are $(\lambda _ i(A) - s)^{-1}$. The largest in absolute value corresponds to the eigenvalue of $A$ closest to $s$, which is what the iterates converge to.
Rayleigh quotient iteration’s local convergence rate is quadratic for a general matrix and cubic when $A$ is symmetric.
When is the basic (unshifted) power method useful in practice?
When you need just the dominant eigenvalue/eigenvector and the spectral gap $ \vert \lambda _ 1 \vert / \vert \lambda _ 2 \vert $ is large. Notable example: Google PageRank, where $A$ is a stochastic matrix and a few power iterations suffice to rank webpages. It’s also a building block (conceptually) for the QR algorithm, Krylov methods, and randomised SVD analysis.