Notes - Numerical Analysis HT24, Power method


Flashcards

Describe the steps in the Power Method for computing the largest eigenvalue of a matrix $A \in \mathbb R^{n \times n}$, and how to establish the eigenvector / eigenvalue from the iterations.


  • Select arbitrary $y \in \mathbb R^n$
  • Set $x _ 0 = y/ \vert \vert y \vert \vert $
  • Then, for $k = 0, 1, \cdots$
    • Compute $y _ k = Ax _ k$
    • Set $x _ {k + 1} = y _ k / \vert \vert y _ k \vert \vert $

Then $x _ k \to \pm v _ 1$ and $ \vert \vert y _ k \vert \vert \to \vert \lambda _ 1 \vert $, where $\lambda _ 1$ is the largest eigenvalue of $A$ in modulus and $v _ 1$ is the corresponding eigenvector of unit length.
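
As a concrete illustration (not from the course notes; the function name and example matrix are my own), a minimal NumPy sketch of the iteration:

```python
import numpy as np

def power_method(A, num_iters=1000, seed=0):
    """Power iteration: returns estimates of (|lambda_1|, ±v_1)."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(A.shape[0])   # arbitrary starting vector y
    x = y / np.linalg.norm(y)             # x_0 = y / ||y||
    for _ in range(num_iters):
        y = A @ x                         # y_k = A x_k
        x = y / np.linalg.norm(y)         # x_{k+1} = y_k / ||y_k||
    return np.linalg.norm(y), x           # ||y_k|| -> |lambda_1|, x_k -> ±v_1

# Example: the dominant eigenvalue of [[2, 1], [1, 3]] is (5 + sqrt(5)) / 2 ≈ 3.618
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(power_method(A))
```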

The Power Method for computing the largest eigenvalue of a matrix $A \in \mathbb R^{n \times n}$ works as follows:

  • Select arbitrary $y \in \mathbb R^n$
  • Set $x _ 0 = y/ \vert \vert y \vert \vert $
  • Then, for $k = 0, 1, \cdots$
    • Compute $y _ k = Ax _ k$
    • Set $x _ {k + 1} = y _ k / \vert \vert y _ k \vert \vert $

Then $x _ k \to \pm v _ 1$ and $ \vert \vert y _ k \vert \vert \to \vert \lambda _ 1 \vert $, where $\lambda _ 1$ is the largest eigenvalue of $A$ in modulus and $v _ 1$ is the corresponding eigenvector of unit length, under the assumptions that:

  • The matrix is diagonalisable
  • The modulus of the largest eigenvalue is strictly bigger than the modulus of the second largest

Quickly:

  • Prove that this method works
  • Give an expression that determines how fast the convergence is
  • Justify why the same idea works for $A^{-1}$ (the “inverse power method”), and give another expression for how fast the convergence is
  • Explain the idea of shifts and why they are more useful in the inverse power method than in the ordinary power method

Proof it works:

Overall idea: Expand $x _ 0$ in a basis of unit-length eigenvectors of $A$ (which exists since $A$ is diagonalisable). Then look at $A^k x _ 0$, and relate $x _ k$ and $y _ k$ to $A^k x _ 0$.

Since each normalisation only rescales by a positive factor, this algorithm computes $x _ k = A^k x _ 0 / \vert \vert A^k x _ 0 \vert \vert $, and hence $y _ {k - 1} = \beta _ {k - 1} A^k x _ 0$ for some coefficients $\beta _ {k - 1} > 0$. Suppose $\{v _ 1, \cdots, v _ n\}$ and $\{\lambda _ 1, \cdots, \lambda _ n\}$ are the eigenvectors and eigenvalues of $A$, arranged in descending order of eigenvalue modulus, with the eigenvectors of unit length. Then $\exists \alpha _ i$ such that

\[x_0 = \sum^n_{i = 1} \alpha_i v_i\]

Hence

\[\begin{aligned} A^k x_0 &= \sum^n_{i = 1} \alpha_i \lambda_i^k v_i \\\\ &= \lambda_1^k \left[ \alpha_1 v_1 + \sum^n_{i = 2} \alpha_i \left(\frac{\lambda_i}{\lambda_1}\right)^k v_i \right] \end{aligned}\]

Since $ \vert \lambda _ 1 \vert > \vert \lambda _ i \vert $ for $i \ge 2$, for $k$ large we have $A^k x _ 0 \approx \alpha _ 1 \lambda^k _ 1 v _ 1$ (provided $\alpha _ 1 \ne 0$, which holds for almost every choice of starting vector). Hence

\[\begin{aligned} x_{k} &= \frac{y_{k-1} }{||y_{k-1}||} \\\\ &= \frac{\beta_{k-1} }{|\beta_{k-1}|} \cdot \frac{A^{k} x_0}{||A^{k} x_0||} \\\\ &\approx \frac{\beta_{k-1} }{|\beta_{k-1}|} \cdot \frac{\lambda_1^k \alpha_1}{|\lambda_1^k \alpha_1|} \cdot \frac{v_1}{||v_1||} \\\\ &= \pm v_1 \end{aligned}\]

and

\[\begin{aligned} y_k &= Ax_{k} \\\\ &\approx \pm A v_1 \\\\ &= \pm \lambda_1 v_1 \end{aligned}\]

so $ \vert \vert y _ k \vert \vert \approx \vert \lambda _ 1 \vert $ (since $v _ 1$ has unit length).

(The assumption that the matrix is diagonalisable is not necessary; it suffices that the dominant eigenvalue of $A$ is unique. The proof that doesn’t assume the matrix is diagonalisable instead uses the basis from the Jordan decomposition of $A$, and analyses the convergence that way.)

How fast is convergence? The speed at which $A^k x _ 0$ tends to $\lambda^k _ 1 \alpha _ 1 v _ 1$ depends on the ratio

\[\left\vert \frac{\lambda_2}{\lambda_1} \right\vert\]

where $\lambda _ 2$ is the second-largest eigenvalue in modulus. Ideally, this ratio is as small as possible.
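
As a quick sanity check of this rate (my own construction, not part of the notes): for a diagonal matrix with eigenvalues $2$, $1.8$ and $0.5$, the error in the eigenvector estimate should shrink roughly like $(1.8/2)^k = 0.9^k$.

```python
import numpy as np

A = np.diag([2.0, 1.8, 0.5])
v1 = np.array([1.0, 0.0, 0.0])       # known dominant eigenvector

x = np.ones(3) / np.sqrt(3.0)        # x_0
for k in range(1, 51):
    y = A @ x
    x = y / np.linalg.norm(y)
    if k % 10 == 0:
        err = np.linalg.norm(x - v1)
        print(k, err, err / 0.9**k)  # the last column stays roughly constant
```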

What about the inverse power method? If $A$ is invertible, the eigenvalues of $A^{-1}$ are $1/\lambda _ i$, with the same eigenvectors. The dominant eigenvalue of $A^{-1}$ is then $1/\lambda _ n$, where $\lambda _ n$ is the eigenvalue of $A$ of smallest modulus, so applying the power method to $A^{-1}$ (the inverse power method) gives convergence to $1/\lambda _ n$ and the corresponding eigenvector $v _ n$. In this case, the speed of convergence depends on

\[\frac{1/\lambda_{n-1} }{1/\lambda_n} = \frac{\lambda_n}{\lambda_{n-1}}\]

Again, for the best convergence, this should be as small as possible.
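
In practice one does not form $A^{-1}$ explicitly; each iteration solves a linear system with $A$ instead. A sketch under that assumption (using SciPy’s `lu_factor`/`lu_solve` so the factorisation is done only once; the function name is mine):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power_method(A, num_iters=200, seed=0):
    """Power iteration applied to A^{-1}: converges towards v_n, the eigenvector
    for the eigenvalue of A of smallest modulus."""
    rng = np.random.default_rng(seed)
    lu, piv = lu_factor(A)                 # factorise A once, reuse it every iteration
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = lu_solve((lu, piv), x)         # y_k = A^{-1} x_k, i.e. solve A y_k = x_k
        x = y / np.linalg.norm(y)          # x_{k+1} = y_k / ||y_k||
    return 1.0 / np.linalg.norm(y), x      # ||y_k|| -> 1/|lambda_n|, so this estimates |lambda_n|

# Example: the smallest-modulus eigenvalue of [[2, 1], [1, 3]] is (5 - sqrt(5)) / 2 ≈ 1.382
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(inverse_power_method(A))
```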

Shifts: For any $s \in \mathbb C$, the eigenvalues of $A - sI$ are $\lambda _ i - s$. So if we could pick the shift well and apply the power method to $A - sI$, we could make

\[\frac{\lambda_{\sigma(2)} - s}{\lambda_{\sigma(1)} - s}\]

small, where $\sigma$ is a permutation such that $ \vert \lambda _ {\sigma(1)} - s \vert > \vert \lambda _ {\sigma(2)} - s \vert \ge \cdots \ge \vert \lambda _ {\sigma(n)} - s \vert $. But without knowing the eigenvalues ahead of time, choosing such a shift is difficult. And the possible speedup is limited anyway: the shift moves every eigenvalue by the same amount, so making the dominant shifted eigenvalue large in modulus also makes the others large.

However, if we apply the inverse power method to $A - sI$, then the speed of convergence depends on

\[\frac{1/(\lambda_{\sigma(n-1)} - s)}{1/(\lambda_{\sigma(n)} - s)} = \frac{\lambda_{\sigma(n)} - s}{\lambda_{\sigma(n-1)} - s}\]

Even if $s$ is only roughly close to an eigenvalue (by construction of $\sigma$, that eigenvalue is $\lambda _ {\sigma(n)}$), $1/(\lambda _ {\sigma(n)} - s)$ can be much bigger in modulus than the runner-up $1/(\lambda _ {\sigma(n-1)} - s)$, so the ratio above is small and the convergence is fast.
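
A sketch of the shifted variant under the same assumptions (the shift $s$ is supplied by the user; the function name is mine, and I read off a signed eigenvalue estimate via the Rayleigh quotient $x^\top A x$, which is not part of the notes):

```python
import numpy as np

def shifted_inverse_power_method(A, s, num_iters=50, seed=0):
    """Inverse power iteration on (A - sI): converges towards the eigenvector for
    the eigenvalue of A closest to the shift s, quickly when s is a good guess."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    B = A - s * np.eye(n)                  # shifted matrix A - sI
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = np.linalg.solve(B, x)          # y_k = (A - sI)^{-1} x_k
        x = y / np.linalg.norm(y)          # x_{k+1} = y_k / ||y_k||
    return x @ (A @ x), x                  # Rayleigh quotient as the eigenvalue estimate

# Example: the eigenvalues of A are ≈ 1.382 and ≈ 3.618; a shift of 1.4 targets the smaller one
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(shifted_inverse_power_method(A, s=1.4))
```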



