Notes - Numerical Analysis HT24, QR algorithm


Flashcards

Describe the basic $QR$ algorithm for determining the eigenvalues of a matrix $A \in \mathbb R^{n \times n}$, and how to establish the eigenvectors and eigenvalues once the iterations are finished, under the assumptions that:

  • $A$ is symmetric
  • $A$ has eigenvalues of distinct modulus

  • Set $A _ 1 = A$
  • For $k = 1, 2, \cdots$
    • Calculate the $QR$ factorisation $A _ k = Q _ k R _ k$
    • Set $A _ {k+1} = R _ k Q _ k$

Then $A _ k$ converges to a diagonal matrix with the eigenvalues on the diagonal. (If $A$ is not symmetric but still has eigenvalues of distinct modulus, then $A _ k$ converges to an upper triangular matrix; if the eigenvalues are not of distinct modulus, $A _ k$ might not converge at all.)

In general, if you know the eigenvalues of a matrix, then it is straightforward to find eigenvectors by solving a linear system. (And likewise, if you know the eigenvectors, you can find the eigenvalues).

But in the case where $A$ is symmetric and $A _ k$ converges to a diagonal matrix, if you let $Q^{(k)} = Q _ 1 \cdots Q _ k$, then

\[A_{k+1} = (Q^{(k)})^\top A Q^{(k)}\]

Since $A _ {k+1}$ is diagonal (or almost diagonal), it follows that the columns of $Q^{(k)}$ form an orthonormal basis of eigenvectors of $A$.
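
As a concrete illustration, here is a minimal numpy sketch of the iteration (the function name and iteration count are my own choices, not from the notes):

```python
import numpy as np

def qr_algorithm_basic(A, iters=500):
    """Basic (unshifted) QR iteration; a sketch, not production code."""
    A_k = A.copy()
    Q_acc = np.eye(A.shape[0])       # accumulates Q^{(k)} = Q_1 ... Q_k
    for _ in range(iters):
        Q, R = np.linalg.qr(A_k)     # A_k = Q_k R_k
        A_k = R @ Q                  # A_{k+1} = R_k Q_k
        Q_acc = Q_acc @ Q
    return A_k, Q_acc

# Example: symmetric matrix with (almost surely) distinct-modulus eigenvalues
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
A_k, Q_acc = qr_algorithm_basic(A)
print(np.diag(A_k))   # approximate eigenvalues, decreasing in modulus
# residual ||A Q - Q Lambda||, small when the eigenvalue moduli are well separated
print(np.linalg.norm(A @ Q_acc - Q_acc @ np.diag(np.diag(A_k))))
```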

The basic QR algorithm for determining the eigenvalues of a matrix $A \in \mathbb R^{n \times n}$ is given by:

  • Set $A _ 1 = A$
  • For $k = 1, 2, \cdots$
    • Calculate the $QR$ factorisation $A _ k = Q _ k R _ k$
    • Set $A _ {k+1} = R _ k Q _ k$

Quickly justify why all matrices $A _ k$ are similar to $A$.

(For the full analysis given in the notes, we also need the assumptions that $A$ is symmetric and that the eigenvalues of $A$ have distinct modulus, but these aren't needed here.)


Use induction on $k$. $A _ 1$ is similar to $A$ by definition. Then for $A _ {k+1}$:

\[\begin{aligned} A_{k+1} &= R_k Q_k \\\\ &= Q^{-1}_k (Q_k R_k) Q_k \\\\ &= Q_k^{-1}A_k Q_k \end{aligned}\]

and this is a similarity transformation.
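
Since similar matrices share a spectrum, this is easy to sanity-check numerically; a quick sketch (any matrix will do):

```python
import numpy as np

# One QR step A_{k+1} = R_k Q_k is a similarity transformation,
# so the spectrum is unchanged:
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Q, R = np.linalg.qr(A)
A_next = R @ Q
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(A_next))))
```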

The basic QR algorithm for determining the eigenvalues of a matrix $A \in \mathbb R^{n \times n}$ under the assumptions that:

  • $A$ has eigenvalues of distinct modulus (this is required)
  • $A$ is symmetric (this is not required; if we drop this assumption, then $A _ k$ instead tends to an upper triangular matrix with the eigenvalues on the diagonal).

is given by:

  • Set $A _ 1 = A$
  • For $k = 1, 2, \cdots$
    • Calculate the $QR$ factorisation $A _ k = Q _ k R _ k$
    • Set $A _ {k+1} = R _ k Q _ k$

Then $A _ k$ converges to a diagonal matrix of eigenvalues.

What three modifications are used in practice so that $A _ k$ converges faster, and why do these help? Describe them, and then present a full algorithm.


  • First reduce $A$ to tridiagonal form by orthogonal similarity transformations (using Givens rotations or Householder reflectors). Tridiagonal form is preserved by the iteration, and it greatly reduces the cost of calculating the QR factorisation at each step.
  • Use shifts: instead calculate the QR factorisation of $A - sI$ for some $s$ chosen so that the convergence ratio is higher (the eigenvalues of $A - sI$ are $\lambda _ i - s$, and the shift is added back at the end of each step). A typical choice is the bottom-right entry of the matrix, since this will approximately equal the smallest eigenvalue in modulus, making the rate of convergence of the final diagonal entry to the smallest eigenvalue much higher.
  • “Deflate” the matrix $A$: Each time the bottom row and last column converge to $(0, 0, \cdots, \lambda)$ within some threshold, remove them from $A$. This reduces the size of the matrix, speeding up matrix operations, and it also means that when we pick a new shift using $a _ {(n-1), (n-1)}$, the speed of the convergence now improves for the second smallest eigenvalue, rather than for the smallest eigenvalue.

A full algorithm:


Reduce $A$ to tridiagonal form by orthogonal similarity transformations.

For $m = n, n -1, \cdots, 2$:

  • While $|a _ {m-1, m}| >$ tolerance: (this detects when we should deflate)
    • $[Q, R] = \mathbf{QRFactorise}(A - a _ {m, m}I)$
    • $A = RQ + a _ {m, m} I$ (note that this shifted step is still a similarity transformation: if $A - sI = QR$, then $RQ + sI = Q^\top (A - sI) Q + sI = Q^\top A Q$, so the iterates remain similar to the original matrix)
  • Record eigenvalue $\lambda _ m = a _ {m, m}$
  • Set $A$ to the leading $m-1$ by $m-1$ submatrix of $A$

Finally: record eigenvalue $\lambda _ 1 = a _ {1, 1}$.
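
A compact Python sketch of this scheme, using `scipy.linalg.hessenberg` for the initial reduction (for a symmetric matrix the Hessenberg form is tridiagonal); the function name, tolerance, and step cap are my own choices:

```python
import numpy as np
from scipy.linalg import hessenberg

def qr_algorithm_practical(A, tol=1e-12, max_steps=10_000):
    """Shifted, deflated QR iteration for symmetric A (a sketch)."""
    T = hessenberg(A)                    # tridiagonal, similar to A
    n = T.shape[0]
    eigenvalues = np.empty(n)
    for m in range(n - 1, 0, -1):        # 0-indexed version of m = n, ..., 2
        steps = 0
        while abs(T[m, m - 1]) > tol and steps < max_steps:
            s = T[m, m]                  # shift by the bottom-right entry
            Q, R = np.linalg.qr(T - s * np.eye(m + 1))
            T = R @ Q + s * np.eye(m + 1)   # still similar to the original
            steps += 1
        eigenvalues[m] = T[m, m]         # converged: record the eigenvalue
        T = T[:m, :m]                    # deflate: drop last row and column
    eigenvalues[0] = T[0, 0]
    return eigenvalues

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
print(np.sort(qr_algorithm_practical(A)))
print(np.sort(np.linalg.eigvalsh(A)))    # reference values, should agree
```

(This simple bottom-right shift can stall on special matrices, which is why the sketch caps the number of steps; more robust shift choices exist but aren't needed here.)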

The basic $QR$ algorithm for determining the eigenvalues of a matrix $A \in \mathbb R^{n \times n}$ under assumptions that

  • $A$ is symmetric
  • $A$ has eigenvalues of distinct modulus

works as follows:

  • Set $A _ 1 = A$
  • For $k = 1, 2, \cdots$
    • Calculate the $QR$ factorisation $A _ k = Q _ k R _ k$
    • Set $A _ {k+1} = R _ k Q _ k$

Quickly prove that under these assumptions, $A _ k$ eventually converges to a diagonal matrix of eigenvalues (assuming that the power method works).


We do this in several steps, each step building off the last:

  1. Showing $A _ k$ are similar for all $k$
  2. Showing $A _ {k+1} = (Q^{(k)})^\top A Q^{(k)}$, where $Q^{(k)} = Q _ 1 \cdots Q _ k$
  3. Showing $A^k = Q^{(k)}R^{(k)}$, where $R^{(k)} = R _ k \cdots R _ 1$ (note $R^{(k)}$ is reversed)
  4. Showing $(A _ k) _ {1,1}$ must converge to the largest eigenvalue (in modulus) by relating it to $A^k e _ 1$
  5. Showing $(A _ k) _ {n,n}$ must converge to the smallest eigenvalue (in modulus) by relating it to $A^{-k}e _ n$
  6. Finally, showing that the rest of the entries of $A _ k$ must also converge

Showing $A _ k$ are similar for all $k$: Use induction on $k$. $A _ 1$ is similar to $A$ by definition. Then for $A _ {k+1}$:

\[\begin{aligned} A_{k+1} &= R_k Q_k \\\\ &= Q^{-1}_k (Q_k R_k) Q_k \\\\ &= Q_k^{-1}A_k Q_k \end{aligned}\]

Showing $A _ {k+1} = (Q^{(k)})^\top A Q^{(k)}$, where $Q^{(k)} = Q _ 1 \cdots Q _ k$: This follows from the above proposition by induction.

Showing $A^k = Q^{(k)}R^{(k)}$, where $R^{(k)} = R _ k \cdots R _ 1$: Again, use induction on $k$. $A^1 = Q^{(1)} R^{(1)}$ by definition. Then, for $A^{k+1}$:

\[\begin{aligned} A^{k+1} &= AA^k \\\\ &= A Q^{(k)} R^{(k)} &&(\text{by induction})\\\\ &= Q^{(k)} Q_{k+1} R_{k+1} R^{(k)} && (\star) \\\\ &= Q^{(k+1)} R^{(k+1)} \end{aligned}\]

Where does $(\star)$ come from? We have that $A _ {k+1} = (Q^{(k)})^\top A Q^{(k)}$ from the previous proposition. But $A _ {k+1}$ also equals $Q _ {k+1} R _ {k+1}$. Then

\[(Q^{(k)})^\top A Q^{(k)} = Q_{k+1} R_{k+1}\]

which, by premultiplying both sides by $Q^{(k)}$, we see

\[AQ^{(k)} = Q^{(k)} Q_{k+1} R_{k+1}\]
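
This telescoping identity is also easy to check numerically; a sketch (note that $R^{(k)}$ accumulates on the left):

```python
import numpy as np

# Check A^k = Q^{(k)} R^{(k)} with Q^{(k)} = Q_1...Q_k, R^{(k)} = R_k...R_1.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

A_k, Q_big, R_big = A.copy(), np.eye(4), np.eye(4)
for _ in range(10):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    Q_big = Q_big @ Q        # multiply on the right
    R_big = R @ R_big        # multiply on the left (reversed order)

print(np.allclose(np.linalg.matrix_power(A, 10), Q_big @ R_big))
```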

Showing $(A _ k) _ {1,1}$ must converge to the largest eigenvalue (in modulus) by relating it to $A^k e _ 1$: Consider

\[A^k e_1 = Q^{(k)} R^{(k)}e_1\]

since $R^{(k)}$ is upper-triangular, $R^{(k)} e _ 1$ will equal $(R^{(k)} _ {1,1}, 0, \cdots, 0)^\top = R^{(k)} _ {1,1} e _ 1$, and so $Q^{(k)} R^{(k)} e _ 1$ is $R _ {1,1}^{(k)} q^{(k)} _ 1$, where $q _ 1^{(k)}$ is the first column of $Q^{(k)}$. But from the analysis of the power method, for large $k$, $A^k e _ 1$ must be approximately parallel to the dominant eigenvector of $A$ (assuming $e _ 1$ has a nonzero component in that direction).

But now considering $A _ k$ (subscript instead of superscript), we know

\[A_k = (Q^{(k-1)})^\top A Q^{(k-1)}\]

so in fact $A _ k$ must have first entry approximately equal to $\lambda _ 1$: the $(1,1)$ entry of $A _ k$ is $(q _ 1^{(k-1)})^\top A q _ 1^{(k-1)}$, which is approximately $v _ 1^\top A v _ 1 = \lambda _ 1$ once $q _ 1^{(k-1)}$ is close to the unit dominant eigenvector $v _ 1$. We also know that the rate of convergence depends on $\left(\frac{\lambda _ 2}{\lambda _ 1}\right)^k$, where $\lambda _ 1$ is the largest eigenvalue in modulus, and $\lambda _ 2$ is the second largest in modulus.

(One concern here is that the analysis of the power method shows convergence to $\pm v _ 1$ where $v _ 1$ is the dominant eigenvector, but this doesn’t matter because $-v _ 1$ is still “the” dominant eigenvector of $A$. Note also that here we are just using facts about $A^k$ to deduce that the first entry of $A _ k$ is approximately $\lambda _ 1$; we never need to explicitly calculate $A _ k$.)

Showing $(A _ k) _ {n,n}$ must converge to the smallest eigenvalue (in modulus) by relating it to $A^{-k}e _ n$: Now consider

\[\begin{aligned} A^{-k} e_n &= (Q^{(k)} R^{(k)})^{-1} e_n \\\\ &= (R^{(k)})^{-1} (Q^{(k)})^\top e_n \\\\ &= Q^{(k)} ((R^{(k)})^{-1})^\top e_n && (\text{since } A^{-k} \text{ is symmetric}) \\\\ &= Q^{(k)} \begin{pmatrix}0 \\\\ \vdots \\\\ c\end{pmatrix} \\\\ &= cq^{(k)}_n \end{aligned}\]
where $c = ((R^{(k)})^{-1}) _ {n,n}$, and the zeros appear because $((R^{(k)})^{-1})^\top$ is lower triangular. But applying $A^{-k}$ to $e _ n$ is precisely the inverse power method, so $q _ n^{(k)}$ must actually converge to the eigenvector corresponding to the smallest eigenvalue (in modulus), and $(A _ k) _ {n,n}$ must converge to $\lambda _ n$. Furthermore, the rate of convergence depends on
\[\left(\frac{\lambda_{n} }{\lambda_{n-1} }\right)^k\]
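
Both corner entries are easy to watch numerically; a sketch with a constructed spectrum ($8, 4, 2, 1$, chosen so both ratios above are $\frac 1 2$):

```python
import numpy as np

# Symmetric matrix with known spectrum; watch both corner entries converge.
rng = np.random.default_rng(4)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q0 @ np.diag([8.0, 4.0, 2.0, 1.0]) @ Q0.T

A_k = A.copy()
for k in range(1, 31):
    Q, R = np.linalg.qr(A_k)
    A_k = R @ Q
    if k % 10 == 0:
        # errors in (A_k)_{1,1} -> 8 and (A_k)_{n,n} -> 1 shrink geometrically
        print(k, abs(A_k[0, 0] - 8.0), abs(A_k[-1, -1] - 1.0))
```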

Finally showing that the rest of the entries of $A _ k$ must also converge: From the above, we know that for large $k$, $A _ k$ looks approximately like

\[A_k = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 & 0 \\\\ 0 & & & & 0 \\\\ \vdots & & A'_k & & \vdots \\\\ 0 & & & & 0 \\\\ 0 & 0 & \cdots & 0 & \lambda_n \end{pmatrix}\]

by the symmetry of $A _ k$. But because of all the zeroes, we can repeat the analysis on $A' _ k$ to see that the rest of the entries must also converge to a diagonal matrix (admittedly this isn’t very precise, but it is also not very precise in the course notes).

What assumptions are really needed?

The analysis of the QR algorithm for a matrix $A$ in our notes uses the additional assumptions that:

  • $A$ is symmetric
  • $A$ has eigenvalues of distinct modulus

This means that the algorithm converges to a diagonal matrix with the eigenvalues on the diagonal. But what about weaker assumptions?

  • If $A$ is not symmetric but still has eigenvalues of distinct modulus, then it converges to an upper triangular matrix. This can be used to find the [[Notes - Numerical Analysis HT24, Schur decomposition]] of $A$.
  • If $A$ does not have eigenvalues of distinct modulus, then the iteration might not converge. For example, consider $\begin{pmatrix}0 & 1 \\\\ 1 & 0\end{pmatrix}$, checked below (though this isn’t an “if and only if”, e.g. consider the identity matrix).
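
The $2 \times 2$ example is easy to verify: the matrix is orthogonal, so one $QR$ factorisation is $Q = A$, $R = I$, and the iteration never moves (numpy's sign convention may instead give $Q = -A$, $R = -I$, but $RQ = A$ either way):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # eigenvalues +1 and -1: equal modulus
Q, R = np.linalg.qr(A)
print(R @ Q)                    # identical to A: the iteration is stuck
```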


