Notes - NLA MT25, Useful miscellany

Course - Numerical Linear Algebra MT25^U

Flashcards

What is

\[\begin{bmatrix} I _ m & X \\ 0 & I _ n \end{bmatrix}^{-1}\]

block-upper-triangular-inverse-identity

\[\begin{bmatrix} I _ m & -X \\ 0 & I _ n \end{bmatrix}\]

Suppose that $ \vert \vert X \vert \vert < 1$ in a submultiplicative norm. How can you rewrite $(I - X)^{-1}$?

neumann-series

\[(I - X)^{-1} = I + X + X^2 + X^3 + \cdots\]

(submultiplicativity is useful so you can say $ \vert \vert X^k \vert \vert \le \vert \vert X \vert \vert ^k \to 0$).

@exam~

Suppose that $Q \in \mathbb R^{m \times n}$ is orthogonal (note that $Q$ might be rectangular!). What does this mean, and what does this not mean?

rectangular-orthonormal-matrix-clarification

\[Q^\top Q = I _ n\]

This does not mean that

\[Q Q^\top = I _ n\]

(although this is true for square matrices, and more generally $A^{-1} A = I _ n$ implies $A A^{-1} = I _ n$, it is not true in general for rectangular matrices).

@State a useful identity about the spectral norm of $I - P$ where $P$ is a projector ($P^2 = P$).

projector-norm-identity

Suppose $P \in \mathbb R^{n \times n}$ is a projector $(P^2 = P)$ with $P \ne 0, I$. Then

\[\|I-P\| _ 2 = \|P\| _ 2.\]

What are the eigenvalues of the $n \times n$ all-ones matrix $J _ n = \mathbf 1 \mathbf 1^\top$ (every entry equal to $1$)?

all-ones-matrix-eigenvalues

$J _ n$ is symmetric and rank-$1$, so it has a single nonzero eigenvalue. The eigenvalues are:

$n$ (multiplicity $1$), with eigenvector $\mathbf 1 = (1, \ldots, 1)^\top$, since $J _ n \mathbf 1 = n \mathbf 1$.
$0$ (multiplicity $n - 1$), with eigenspace $\mathbf 1^\perp = \{x : \sum _ i x _ i = 0\}$.

Rank-$1$ forces a single nonzero eigenvalue, and $\mathrm{Trace}(J _ n) = n$ fixes its value as $n$.

When is $A \in \mathbb F^{n \times n}$ diagonalisable over $\mathbb F$? State the equivalent conditions.

diagonalisability-equivalent-conditions

Diagonalisable over $\mathbb F$ means $A = S \Lambda S^{-1}$ for some invertible $S$ and diagonal $\Lambda$ with entries in $\mathbb F$. The following are equivalent:

$\mathbb F^n$ has a basis of eigenvectors of $A$, i.e. $\bigoplus _ \lambda \ker(A - \lambda I) = \mathbb F^n$.
The characteristic polynomial splits over $\mathbb F$, and for every eigenvalue the geometric multiplicity equals the algebraic multiplicity.
The minimal polynomial $m _ A$ is a product of distinct linear factors in $\mathbb F[t]$ (squarefree and split). This is the most usable criterion.
Every Jordan block of $A$ has size $1$.

Field caveat: over $\mathbb C$ the minimal-polynomial criterion is just “$m _ A$ squarefree”, since splitting is automatic. Over $\mathbb R$ you additionally need all eigenvalues real: a rotation has $m _ A = t^2 + 1$, squarefree but not split over $\mathbb R$, so it is $\mathbb C$-diagonalisable but not $\mathbb R$-diagonalisable. Having $n$ distinct eigenvalues is sufficient but not necessary. Normal (resp. Hermitian/symmetric) matrices satisfy the stronger property of unitary (resp. orthogonal) diagonalisability.

See also other cards on this topic:

Definition (eigenbasis): Notes - Linear Algebra II HT23, Eigenvectors and eigenvalues^U, “what it means for $T$ to be diagonalisable”.
Minimal-polynomial criterion and its proof: Notes - Linear Algebra MT23, Primary decomposition theorem^U, “deduce from $m _ T(x)$ whether $T$ is diagonalisable”.
Jordan-form criterion: Notes - Linear Algebra MT23, Jordan normal form^U, “tell from the JNF whether diagonalisable” (and the algebraic/geometric multiplicity card).
Unitary/orthogonal variants: ∆normal-iff-unitarily-diagonalisable, ∆symmetric-iff-orthogonally-diagonalisable.

@State the push-through identity, give a one-line proof, and name its most common application.

push-through-identity

For $A \in \mathbb R^{m \times n}$ and $B \in \mathbb R^{n \times m}$ (with the relevant inverses existing):

\[(I _ m + AB)^{-1} A = A (I _ n + BA)^{-1}.\]

Proof (one line): from $A(I _ n + BA) = A + ABA = (I _ m + AB)A$, left-multiply by $(I _ m + AB)^{-1}$ and right-multiply by $(I _ n + BA)^{-1}$.

Most common application, the ridge-regression dual: with $A = X^\top$, $B = X$ and $\lambda > 0$, a short rescaling gives

\[(X^\top X + \lambda I _ D)^{-1} X^\top = X^\top (X X^\top + \lambda I _ N)^{-1}.\]

For data $X \in \mathbb R^{N \times D}$ the LHS solves a $D \times D$ system and the RHS an $N \times N$ one, so you pick whichever dimension is smaller. The RHS also exposes the inner-product matrix $X X^\top$, which opens the door for kernel methods (∆muffin-worry-push).

Source: standard NLA identity; ridge instance used in Notes - Machine Learning MT23, Linear regression^U.

@Define the Wirtinger derivatives and explain why they are the right tool for optimising a real-valued function of a complex variable.

wirtinger-derivative

Problem: we want to minimise $J(W)$ where $J : \mathbb C \to \mathbb R$, for example $J(W) = \vert 1 - aW \vert ^2$. The ordinary complex derivative $\partial J / \partial W$ does not exist because a non-constant real-valued function of a complex variable cannot satisfy the Cauchy-Riemann equations.

Workaround (Wirtinger): write $W = u + iv$ with $u, v \in \mathbb R$, then formally treat $W = u + iv$ and $W^\ast = u - iv$ as independent variables. Define

\[\frac{\partial}{\partial W} = \frac{1}{2}\left( \frac{\partial}{\partial u} - i \frac{\partial}{\partial v} \right), \qquad \frac{\partial}{\partial W^\ast} = \frac{1}{2}\left( \frac{\partial}{\partial u} + i \frac{\partial}{\partial v} \right).\]

Under this convention the usual chain and product rules apply, with $W$ and $W^\ast$ behaving as independent symbols: $\partial W / \partial W = 1$, $\partial W^\ast / \partial W = 0$, $\partial W / \partial W^\ast = 0$, $\partial W^\ast / \partial W^\ast = 1$.

Why it works for optimisation: for real-valued $J$, a critical point in $\mathbb C \cong \mathbb R^2$ requires $\partial J / \partial u = \partial J / \partial v = 0$. These two real conditions combine into the single complex condition

\[\frac{\partial J}{\partial W^\ast} = 0.\]

The conjugate condition $\partial J / \partial W = 0$ is equivalent and gives the same critical points. The convention is to use $W^\ast$, matching the fact that $\partial J / \partial W^\ast$ points in the direction of steepest ascent of $J$ in the complex plane.

Practical rule: to differentiate $J(W, W^\ast)$ with respect to $W^\ast$, treat $W$ as constant. The two examples used in the Wiener-filter derivation (∆wiener-derivation):

\[\frac{\partial}{\partial W^\ast} \vert W \vert ^2 = \frac{\partial}{\partial W^\ast}(W W^\ast) = W, \qquad \frac{\partial}{\partial W^\ast} \vert 1 - aW \vert ^2 = \frac{\partial}{\partial W^\ast}((1 - aW)(1 - a^\ast W^\ast)) = -a^\ast(1 - aW).\]

Source: standard signal-processing tool (Wirtinger 1927). Not in the NLA lecture notes but referenced for Wiener filtering and complex-valued optimisation.

Bite-sized

Transpose of a product: $(AB)^\top = $ $B^\top A^\top$. Inverse of a product (when both are invertible): $(AB)^{-1} = $ $B^{-1} A^{-1}$.

Source: Lecture 1, Some useful results slide and §1.7 of the lecture notes.

@bite~

bite-transpose-inverse-of-product

For square matrices, $AB = I \implies $ $BA = I$. Non-trivial: square invertibility is two-sided, but only because the matrices are square (and finite-dimensional).

Source: Lecture 1, Some useful results slide and §1.7 of the lecture notes.

@bite~

bite-square-one-sided-inverse

Trace identities: $\mathrm{Trace}(XY) = $ $\mathrm{Trace}(YX)$ whenever both products are square, and for any $B \in \mathbb R^{m \times n}$, $\ \vert B\ \vert _ F^2 = $ $\mathrm{Trace}(B^\top B)$.

Source: Lecture 1, Some useful results slide and §1.7 of the lecture notes.

@bite~

bite-trace-cyclic-and-frobenius

For a diagonalisable $A = X \Lambda X^{-1}$ and a polynomial $p$ (e.g. $p(z) = 1 - \alpha z$, so $p(A) = I - \alpha A$), what bounds $\|p(A)\| _ 2$ in terms of the eigenvalues, and when is it an equality?

bite-spectral-norm-of-polynomial-via-eigenvalues

Since $p(A) = X\, p(\Lambda)\, X^{-1}$ with $p(\Lambda) = \text{diag}(p(\lambda _ 1), \ldots, p(\lambda _ n))$, submultiplicativity gives

\[\|p(A)\| _ 2 \le \|X\| _ 2\, \|X^{-1}\| _ 2\, \|p(\Lambda)\| _ 2 = \kappa _ 2(X)\, \max _ i \vert p(\lambda _ i) \vert .\]

In particular $\|I - \alpha A\| _ 2 \le \kappa _ 2(X)\, \max _ i \vert 1 - \alpha\lambda _ i \vert $.

Equality (the $\kappa _ 2(X)$ factor becomes $1$) when $A$ is normal — in particular symmetric or Hermitian — since then $A = U \Lambda U^\ast$ with $U$ unitary and $\kappa _ 2(U) = 1$:

\[\|p(A)\| _ 2 = \max _ i \vert p(\lambda _ i) \vert , \qquad \|I - \alpha A\| _ 2 = \max _ i \vert 1 - \alpha\lambda _ i \vert .\]

This is the standard move reducing a matrix contraction factor to a scalar “polynomial small on the spectrum” problem (stationary iterations, GMRES, CG).

Source: Lecture 12, §12.1 of the lecture notes (GMRES convergence, $\|p(A)\| _ 2 \le \kappa _ 2(X)\max _ {z \in \lambda(A)} \vert p(z) \vert $) and Lecture 13, §13.3 (symmetric case gives equality).

@bite~

Angle-addition identities: $\cos(A \pm B) = $ $\cos A \cos B \mp \sin A \sin B$, and $\sin(A \pm B) = $ $\sin A \cos B \pm \cos A \sin B$.

Source: Standard trigonometric identities (general background, not in the NLA lecture notes); flagged in the Numerical Linear Algebra 2024 attempt, Q2(b) Jacobi rotation.

@bite~ @exam~

bite-trig-angle-addition

Double-angle identities: $\sin 2\theta = $ $2 \sin\theta \cos\theta$, and $\cos 2\theta = \cos^2\theta - \sin^2\theta = $ $1 - 2\sin^2\theta = 2\cos^2\theta - 1$; together with $\sin^2\theta + \cos^2\theta = 1$.

Source: Standard trigonometric identities (general background, not in the NLA lecture notes); flagged in the Numerical Linear Algebra 2024 attempt, Q2(b) — the Jacobi off-diagonal $\cos\theta\sin\theta\,(a _ {ii}-a _ {jj}) + (\cos^2\theta - \sin^2\theta)\,a _ {ij}$ collapses to $\tfrac12\sin 2\theta\,(a _ {ii}-a _ {jj}) + \cos 2\theta\,a _ {ij}$.

@bite~ @exam~

bite-trig-double-angle

Two rearrangements worth knowing cold: $\cos\theta \sin\theta = $ $\tfrac12 \sin 2\theta$ (the form that zeroes the Jacobi off-diagonal), and the sum-to-product identity $\cos A + \cos B = $ $2 \cos\tfrac{A+B}{2} \cos\tfrac{A-B}{2}$ (which underlies the Chebyshev three-term recurrence).

Source: Standard trigonometric identities (general background, not in the NLA lecture notes); flagged in the Numerical Linear Algebra 2024 attempt, Q2(b); the sum-to-product form also drives the Chebyshev recurrence (cf. Exam Numerical Linear Algebra 2020, Q2(a)).

@bite~ @exam~

bite-trig-product-and-sum-to-product

Flashcards

Bite-sized

Related posts