NLA MT25, Power method


Flashcards

@Describe the steps in the shifted inverse power method for computing the eigenvalue and eigenvector of $A \in \mathbb R^{n \times n}$ closest to a prescribed value $s \in \mathbb C$.

  • Select arbitrary $y \in \mathbb R^n$
  • Set $x _ 0 = y/ \vert \vert y \vert \vert $
  • Then, for $k = 0, 1, \ldots$
    • Compute $y _ k = (A - s I)^{-1} x _ k$
    • Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $

Then $x _ {k} \to \pm v _ 1$ and $\hat \lambda = x^\top A x$ gives an estimate for the eigenvalue.

(A justification of why this works is in ∆power-method-analysis).

@exam~

The shifted inverse power method for computing the eigenvalue of $A \in \mathbb R^{n \times n}$ closest to a prescribed value $s \in \mathbb C$ can be described as follows:

  • Select arbitrary $y \in \mathbb R^n$
  • Set $x _ 0 = y/ \vert \vert y \vert \vert $
  • Then, for $k = 0, 1, \ldots$
    • Compute $y _ k = (A - s I)^{-1} x _ k$
    • Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $

Then $x _ {k} \to \pm v _ 1$ and $\hat \lambda = x^\top A x$ gives an estimate for the eigenvalue. This improves the rate of convergence to

\[\frac{ \vert \lambda _ {\sigma(2)} - s \vert ^{-1}}{ \vert \lambda _ {\sigma(1)} - s \vert ^{-1}} = \frac{ \vert \lambda _ {\sigma(1)} - s \vert }{ \vert \lambda _ {\sigma(2)} - s \vert }\]

where $\sigma$ is a permutation such that $ \vert \lambda _ {\sigma(1)} - s \vert < \vert \lambda _ {\sigma(2)} - s \vert \le \cdots \le \vert \lambda _ {\sigma(n)} - s \vert $, i.e. $\lambda _ {\sigma(1)}$ is the eigenvalue closest to the shift $s$.

What is the downside of this method?

Each step requires solving the linear system $(A - sI)y _ k = x _ k$. We compute the $LU$ factorisation of $(A - sI)$ once at $O(n^3)$ up front, and then each iteration is $O(n^2)$ via two triangular solves. (This per-step cost compares to $O(n^2)$ per step for the basic power method, so inverse iteration is only a constant-factor more expensive.)

The shifted inverse power method for computing the eigenvalue of $A \in \mathbb R^{n \times n}$ closest to a prescribed value $s \in \mathbb C$ can be described as follows:

  • Select arbitrary $y \in \mathbb R^n$
  • Set $x _ 0 = y/ \vert \vert y \vert \vert $
  • Then, for $k = 0, 1, \ldots$
    • Compute $y _ k = (A - s I)^{-1} x _ k$
    • Set $x _ {k+1} = y _ k / \vert \vert y _ k \vert \vert $

Then $x _ {k} \to \pm v _ 1$ and $\hat \lambda = x^\top A x$ gives an estimate for the eigenvalue. @State how the Rayleigh quotient iteration modifies this, the corresponding convergence rate, and how this improves further when $A$ is symmetric.

The Rayleigh quotient iteration changes the shift at each iteration to $s _ k = x _ k^\top A x _ k$.

This gives quadratic convergence

\[ \vert \vert Ax^{k+1} - \lambda^{k+1} x^{k+1} \vert \vert = O( \vert \vert Ax^k - \lambda^k x^k \vert \vert ^2)\]

and when $A$ is symmetric, we have cubic convergence.

Bite-sized

State the basic (unshifted) power method.

Initialise $x _ 0$ randomly (any starting vector with nonzero $v _ 1$ component). For $k = 0, 1, \ldots$:

\[x _ {k+1} = A x _ k / \|A x _ k\| _ 2, \qquad \hat \lambda _ k = x _ k^\top A x _ k.\]

Then $x _ k \to \pm v _ 1$ (the dominant eigenvector) and $\hat \lambda _ k \to \lambda _ 1$ (the dominant eigenvalue), with linear convergence rate $ \vert \lambda _ 2 / \lambda _ 1 \vert $.

Source: Lecture 8, Power method for $Ax = \lambda x$ slide and Algorithm 8.1 in §8.2 of the lecture notes.

@bite~ @exam~

Convergence rate of the basic power method: linear with rate $ \vert \lambda _ 2 / \lambda _ 1 \vert $. So convergence is slow when $ \vert \lambda _ 2 \vert $ is close to $ \vert \lambda _ 1 \vert $ (small spectral gap).

Source: Lecture 8, §8.2 of the lecture notes (convergence analysis).

@bite~ @exam~

Key identity for shifted inverse iteration: the eigenvalues of $(A - sI)^{-1}$ are $(\lambda _ i(A) - s)^{-1}$. The largest in absolute value corresponds to the eigenvalue of $A$ closest to $s$, which is what the iterates converge to.

Source: Lecture 8, Inverse power method slide and §8.2.2 of the lecture notes.

@bite~

Rayleigh quotient iteration’s local convergence rate is quadratic for a general matrix and cubic when $A$ is symmetric.

Source: Lecture 8, Inverse power method slide and §8.2.2 of the lecture notes.

@bite~

When is the basic (unshifted) power method useful in practice?

When you need just the dominant eigenvalue/eigenvector and the spectral gap $ \vert \lambda _ 1 \vert / \vert \lambda _ 2 \vert $ is large. Notable example: Google PageRank, where $A$ is a stochastic matrix and a few power iterations suffice to rank webpages. It’s also a building block (conceptually) for the QR algorithm, Krylov methods, and randomised SVD analysis.

Source: Lecture 8, Why compute eigenvalues? Google PageRank slide and §8.2.1 of the lecture notes.

@bite~