NLA MT25, Cholesky factorisation


Flashcards

Suppose $A \in \mathbb R^{n \times n}$. @State necessary and sufficient conditions for it to have a Cholesky factorisation $A = R^\top R$ for $R \in \mathbb R^{n \times n}$?

$A$ has a Cholesky factorisation iff $A$ is symmetric positive semidefinite.

@exam~

@Define the Cholesky factorisation of a symmetric positive semidefinite matrix $A$.

\[A = R^\top R\]

where $R \in \mathbb R^{n \times n}$ and upper triangular.

@exam~

Suppose we are finding the pivoted LU decomposition for a symmetric positive definite matrix $A \succ 0$.

What simplifications can be made as you find the $L _ i$ and $U _ i$ factors, and what is the resulting factorisation called?

Recall the rank-1 step from ∆pivoted-lu-decomposition-algorithm, where we write the active block as

\[A^{(k)} = \begin{pmatrix} \alpha & u^\top \\ v & B \end{pmatrix}\]

and extract

\[L _ k = \begin{pmatrix} 0 _ {k-1} \\ 1 \\ v/\alpha \end{pmatrix} \in \mathbb R^n, \qquad U _ k = \begin{pmatrix} 0 _ {k-1}^\top & \alpha & u^\top \end{pmatrix} \in \mathbb R^n.\]

When $A$ is positive definite, three things simplify.

Since $A^{(k)}$ remains symmetric, we have $u = v$ and so we may rewrite both $L _ k$ and $R _ k$ in terms of

\[R _ k := \sqrt \alpha \begin{pmatrix} 0 _ {k-1}^\top & 1 & v^\top / \alpha \end{pmatrix}, \qquad L _ k = R _ k^\top\]

to give $L _ k U _ k = R _ k^\top R _ k$. This is slightly different from the standard $LU$ decomposition as $L$ no longer has $1$s on its diagonal.

The active block $A^{(k)}$ stays symmetric positive definite, so the $(1, 1)$ entry $\alpha$ remains strictly positive and hence no pivots are required.

The resulting factorisation is the Cholesky factorisation, since $A = R^\top R$, with $R$ upper triangular and with a positive diagonal.

@Prove that if we are doing the pivoted LU decomposition for a symmetric positive definite matrix $A \succ 0$ then no pivots are needed.

Pivoting (a row swap) is only ever needed to avoid a zero leading entry of the current active block. So “no pivots are needed for $A \succ 0$” is exactly the claim that every successive pivot is strictly positive, which down the elimination reduces, by induction, to one fact.

Lemma (the Schur complement of an SPD matrix is SPD). Partition

\[A = \begin{bmatrix} \alpha & v^\top \\ v & B \end{bmatrix} \succ 0, \qquad \alpha = A _ {11}.\]

One elimination step replaces the trailing block by the Schur complement $S := B - \tfrac{1}{\alpha} v v^\top$. Then $\alpha > 0$ and $S \succ 0$.

Proof of the lemma (complete the square). First $\alpha = e _ 1^\top A e _ 1 > 0$ since $A \succ 0$, and $S$ is symmetric ($B$ and $v v^\top$ are). For definiteness take any $y \in \mathbb R^{n-1} \setminus \{0\}$ and set $z = \begin{bmatrix} x \\ y \end{bmatrix}$. Then

\[z^\top A z = \alpha x^2 + 2 x\, v^\top y + y^\top B y,\]

a quadratic in $x$ with positive leading coefficient $\alpha$. Minimising over $x$ (at $x = -v^\top y / \alpha$) gives

\[\min _ x z^\top A z = y^\top B y - \frac{(v^\top y)^2}{\alpha} = y^\top \Big( B - \tfrac{1}{\alpha} v v^\top \Big) y = y^\top S y.\]

Since $A \succ 0$ and $z \ne 0$ (because $y \ne 0$), the left-hand side is $> 0$, so $y^\top S y > 0$ for every $y \ne 0$. Hence $S \succ 0$.

Conclusion. Let $A^{(1)} = A$. Inductively each $A^{(k)} \succ 0$, so its pivot $A^{(k)} _ {11} > 0$ (no swap needed) and the lemma gives the next active block $A^{(k+1)} = S^{(k)} \succ 0$. Every pivot is strictly positive, so pivoting is never invoked. These positive pivots are exactly the squared diagonal entries of the Cholesky factor $R$ in $A = R^\top R$.

The lecturer’s route (existence of Cholesky via eigendecomposition + QR, §6.7). The lecture proves “no pivots” indirectly: it first constructs a Cholesky factorisation, then reads the no-pivot property off the block structure of the elimination. Since $A \succ 0$ is symmetric, the spectral theorem (∆symmetric-eigenvalue-decomposition) gives an orthogonal $V$ and $\Lambda = \text{diag}(\lambda _ 1, \ldots, \lambda _ n)$ with every $\lambda _ i > 0$. Put $D := \text{diag}(\sqrt{\lambda _ 1}, \ldots, \sqrt{\lambda _ n}) \succ 0$, $M := V D V^\top$ (the symmetric PD square root $A^{1/2}$, so $M^\top = M$), and QR-factorise $M = QR$ ($Q$ orthogonal, $R$ upper triangular). Then:

\[\begin{aligned} A &= V \Lambda V^\top = V D^2 V^\top && (\star 1) \\ &= (V D V^\top)(V D V^\top) = M^\top M && (\star 2) \\ &= (QR)^\top (QR) = R^\top Q^\top Q R && (\star 3) \\ &= R^\top R && (\star 4) \end{aligned}\]

with $R$ upper triangular — so a Cholesky factorisation exists. Match this against the rank-1 elimination view (∆rank-1-view-of-lu): writing $R = \begin{bmatrix} \rho & s^\top \\ 0 & R _ 2 \end{bmatrix}$ and $r := \begin{bmatrix} \rho \\ s \end{bmatrix}$ (the first column of $R^\top$, i.e. $r^\top$ is the first row of $R$), the first elimination step peels off exactly the rank-1 term $r r^\top$, leaving the trailing active block

\[A^{(2)} = R _ 2^\top R _ 2\]

a Gram matrix — symmetric PSD, and PD because $A \succ 0$ forces $R$ nonsingular with a strictly positive diagonal (uniqueness of Cholesky for $A \succ 0$, ∆bite-cholesky-uniqueness). Its leading entry is $A^{(2)} _ {11} = \|R _ 2(:,1)\| _ 2^2 > 0$. Repeating, every active block is a Gram matrix $R _ j^\top R _ j \succ 0$ with strictly positive leading entry, so a zero pivot never arises and partial pivoting is never triggered — the same conclusion as the Schur-complement argument, reached from the QR-based existence proof.

The steps:

  • $(\star 1)$ Spectral theorem. $A \succ 0$ is symmetric, so $A = V \Lambda V^\top$ with $V$ orthogonal; positive-definiteness makes every $\lambda _ i > 0$, hence $\Lambda = D^2$ with real $D = \text{diag}(\sqrt{\lambda _ i}) \succ 0$.
  • $(\star 2)$ Split into the square root. $V D^2 V^\top = (V D V^\top)(V D V^\top)$ since $V^\top V = I$; and $M := V D V^\top$ is symmetric ($M^\top = V D^\top V^\top = M$), so this is $M^\top M$.
  • $(\star 3)$ QR-factorise $M = QR$.
  • $(\star 4)$ Orthogonality collapses $Q$. $Q^\top Q = I$, leaving $R^\top R$ with $R$ upper triangular — a Cholesky factor.

Source: Lecture 6, §6.7 of the lecture notes (the lecturer’s existence-of-Cholesky-via-QR route, used to justify the PSD trailing block of §5.4); the direct Schur-complement argument above is the standard motivation. Cf. §5.3 (Cholesky) and §7.5.3 (backward stability).

@Define the $LDL^\top$ factorisation of a symmetric matrix $A$.

\[A = LDL^\top\]

where $L$ is lower triangular with $1$s on the diagonal and $D$ is (block) diagonal. It is a symmetric, square-root-free variant of Cholesky.

Domain: any symmetric $A = A^\top$ (Hermitian $A = A^\ast$ in the complex case) — not just $A \succ 0$. In fact $LDL^\top$ is the tool for the indefinite case, where Cholesky $A = R^\top R$ does not apply.

What $D$ looks like:

  • If $A \succ 0$ (or, more generally, all leading principal minors of $A$ are nonzero), $D$ is strictly diagonal with positive entries — this is just Cholesky with the diagonal pulled out, $R = D^{1/2} L^\top$.
  • For a real symmetric indefinite $A$, a strictly diagonal $D$ need not exist (e.g. $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$). The general factorisation (with a symmetric row/column permutation) has $D$ block diagonal with $1 \times 1$ and $2 \times 2$ blocks; the $2 \times 2$ blocks absorb the indefinite obstruction. See ∆bite-ldlt-real-indefinite-blocks.

Source: Lecture 5, Cholesky factorisation for $A \succ 0$ slide (indefinite-case bullet) and §5.3 of the lecture notes; the $\mathbb R$-vs-$\mathbb C$ caveat is flagged in the §1.1 footnote of the notes. (The block-diagonal-$D$ form is the Bunch–Kaufman/Bunch–Parlett factorisation — the lecture states the $2 \times 2$-block fact but does not give it that name.)

Stability analysis

@Justify that the Cholesky factorisation can be computed in a backwards stable way.

Setup: to solve $Ax = b$, factor $A = R^\top R$ then solve $R^\top y = b$, $R x = y$. Triangular solves are always backward stable (§7.5.1), so the whole solve is backward stable provided the computed factor reproduces $A$ to working precision, $\hat R^\top \hat R = A + \Delta A$ with $\|\Delta A\| = O(\epsilon \|A\|)$. For LU (§7.5.2) this is exactly the condition $\|L\|\,\|U\| = O(\|A\|)$, which can blow up (growth) and is only average-case stable even with pivoting. For Cholesky, two facts make it hold unconditionally.

Fact 1 — no pivoting (∆cholesky-no-pivots-needed-proof): the active trailing block stays symmetric positive definite at every step, so its leading entry is strictly positive and no row swap is ever needed. There is no pivot-growth pathology.

Fact 2 — no element growth: every entry of $R$ satisfies $ \vert R _ {ij} \vert \le \sqrt{\|A\| _ 2}$. Proof: from $A = R^\top R$ the $j$-th diagonal entry is

\[A _ {jj} = \sum _ {i=1}^{j} R _ {ij}^2 \ge R _ {ij}^2 \quad (i \le j),\]

so $ \vert R _ {ij} \vert \le \sqrt{A _ {jj}}$; and for $A \succ 0$, $A _ {jj} = e _ j^\top A e _ j \le \lambda _ {\max}(A) = \|A\| _ 2$. Hence $\|R\|^2 = O(\|A\|)$ — the Cholesky analogue of the $\|L\|\,\|U\| = O(\|A\|)$ condition that fails for LU.

Conclusion: with no pivoting and no growth, the factorisation’s rounding-error analysis gives $\hat R^\top \hat R = A + \Delta A$, $\|\Delta A\| = O(\epsilon \|A\|)$. Composing with the backward-stable triangular solves, $Ax = b$ for any $A \succ 0$ is always solved backward stably — no caveats, unlike LU.

Caveat: the detailed componentwise rounding analysis behind $\|\Delta A\| = O(\epsilon \|A\|)$ is nonexaminable (lecture cites Trefethen-Bau / Higham, as for the triangular-solve and LU analyses).

Source: Lecture 7, Stability of Cholesky for $A \succ 0$ slide and §7.5.3 of the lecture notes. The entry-bound derivation in Fact 2 is the standard one-line justification of the bound the notes assert; the notes explicitly flag the full rounding analysis as nonexaminable.

Worked examples

Compute the Cholesky factorisation $A = R^\top R$ ($R$ upper triangular) of the SPD matrix $A = \begin{bmatrix} 4 & 2 \\ 2 & 5 \end{bmatrix}$ by hand.

$A$ is SPD (leading minors $4 > 0$, $\det = 16 > 0$). Write $R = \begin{bmatrix} r _ {11} & r _ {12} \\ 0 & r _ {22} \end{bmatrix}$ and match $R^\top R = A$ entrywise:

  • $(1,1)$: $r _ {11}^2 = 4 \Rightarrow r _ {11} = 2$.
  • $(1,2)$: $r _ {11} r _ {12} = 2 \Rightarrow r _ {12} = 1$.
  • $(2,2)$: $r _ {12}^2 + r _ {22}^2 = 5 \Rightarrow r _ {22} = 2$.
\[R = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \qquad R^\top R = \begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ 2 & 5 \end{bmatrix} = A \;\checkmark\]

Alternative via the symmetric rank-1 LU step (∆lu-decomposition-gives-cholesky-when-symmetric-positive-definite). At each step, partition the active block as $\begin{pmatrix} \alpha & v^\top \\ v & B \end{pmatrix}$ and read off $R _ k = \sqrt\alpha \begin{pmatrix} 0 _ {k-1}^\top & 1 & v^\top/\alpha \end{pmatrix}$. The next active block is the Schur complement $B - v v^\top / \alpha$.

Step 1 ($k = 1$). $A^{(1)} = A$, so $\alpha = 4$, $v = (2)$, $B = (5)$. Then $R _ 1 = 2 \begin{pmatrix} 1 & 1/2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \end{pmatrix}$, and the Schur complement is $A^{(2)} = 5 - 2 \cdot 2 / 4 = 4$.

Step 2 ($k = 2$). $A^{(2)} = (4)$, so $\alpha = 4$ with $v$ empty. Then $R _ 2 = \begin{pmatrix} 0 & 2 \end{pmatrix}$.

Assembling rows: $R = \begin{bmatrix} R _ 1 \\ R _ 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$ as before.

Both pivots ($r _ {11}^2 = 4$, $r _ {22}^2 = 4$) are strictly positive, so no pivoting is ever needed (∆cholesky-no-pivots-needed-proof).

Source: Lecture 5, §5.3 of the lecture notes (Cholesky for $A \succ 0$).

@example~

Compute the $LDL^\top$ factorisation of (a) the SPD matrix $A = \begin{bmatrix} 4 & 2 \\ 2 & 5 \end{bmatrix}$ and (b) the symmetric indefinite matrix $B = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, by hand.

Write $L = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix}$, $D = \mathrm{diag}(d _ 1, d _ 2)$, so $L D L^\top = \begin{bmatrix} d _ 1 & d _ 1 \ell \\ d _ 1 \ell & d _ 1 \ell^2 + d _ 2 \end{bmatrix}$.

(a) SPD $A$: $d _ 1 = 4$; $d _ 1\ell = 2 \Rightarrow \ell = \tfrac12$; $d _ 1\ell^2 + d _ 2 = 5 \Rightarrow d _ 2 = 4$. So $L = \begin{bmatrix} 1 & 0 \\ 0.5 & 1 \end{bmatrix}$, $D = \mathrm{diag}(4, 4)$ — both entries positive. Pulling out square roots recovers the Cholesky factor: $R = D^{1/2} L^\top = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$ (matches ∆cholesky-worked-example).

(b) Indefinite $B$: eigenvalues $3, -1$, so $B \not\succ 0$ and Cholesky fails — but the leading principal minors ($1$, $\det = -3$) are nonzero, so a strictly-diagonal $LDL^\top$ still exists: $d _ 1 = 1$; $\ell = 2$; $1\cdot 4 + d _ 2 = 1 \Rightarrow d _ 2 = -3$. So $L = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$, $D = \mathrm{diag}(1, -3)$. The sign pattern of $D$ ($+, -$) is the inertia of $B$, and no square roots appear. (If a leading minor vanishes, e.g. $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, a strictly-diagonal $D$ does not exist and a $2 \times 2$ block is needed — see ∆ldlt-factorisation.)


Alternative via the symmetric rank-1 LU step (∆pivoted-lu-decomposition-algorithm with $u = v$). At each step, partition the active block as $\begin{pmatrix} \alpha & v^\top \\ v & B \end{pmatrix}$ and read off $d _ k = \alpha$ and $L _ k = \begin{pmatrix} 0 _ {k-1} \\ 1 \\ v/\alpha \end{pmatrix}$ (the $k$th column of $L$). The next active block is the Schur complement $B - v v^\top / \alpha$. Unlike Cholesky, no square root is taken, so $d _ k$ may have either sign.

(a) $A$ revisited. Step 1: $\alpha = 4$, $v = (2)$, $B = (5)$, giving $d _ 1 = 4$, $L _ 1 = (1, \tfrac12)^\top$, and Schur complement $A^{(2)} = 5 - 4/4 = 4$. Step 2: $\alpha = 4$, so $d _ 2 = 4$ and $L _ 2 = (0, 1)^\top$. Assemble: $L = \begin{bmatrix} 1 & 0 \\ 1/2 & 1 \end{bmatrix}$, $D = \mathrm{diag}(4, 4)$.

(b) $B$ revisited. Step 1: $\alpha = 1$, $v = (2)$, trailing $= (1)$, giving $d _ 1 = 1$, $L _ 1 = (1, 2)^\top$, and Schur complement $B^{(2)} = 1 - 2 \cdot 2 / 1 = -3$. Step 2: $\alpha = -3$, so $d _ 2 = -3$ and $L _ 2 = (0, 1)^\top$. Assemble: $L = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$, $D = \mathrm{diag}(1, -3)$. The negative pivot at step 2 surfaces the $(+, -)$ inertia of $B$, exactly where Cholesky would fail (no real $\sqrt{-3}$) but $LDL^\top$ proceeds untroubled.

Source: Lecture 5, §5.3 of the lecture notes (Cholesky / $LDL^\top$, indefinite-case bullet).

@example~

Bite-sized

The Cholesky factorisation $A = R^\top R$ of $A \succ 0 \in \mathbb R^{n \times n}$ costs $\tfrac{1}{3} n^3$ flops, which is half that of LU on the same matrix.

Source: Lecture 5, Cholesky factorisation for $A \succ 0$ slide and §5.3 of the lecture notes.

@bite~ @exam~

For Cholesky $A = R^\top R$ ($A \succ 0$), the diagonal entries of $R$ are positive, in contrast with LU where the diagonal entries of $L$ are all $1$s.

Source: Lecture 5, Cholesky factorisation for $A \succ 0$ slide and §5.3 of the lecture notes.

@bite~

Why does Cholesky factorisation of $A \succ 0$ never require a pivot?

Because the trailing active block at every step is itself symmetric positive definite, so its leading entry $\alpha$ remains strictly positive — no zero or near-zero pivot ever needs to be avoided.

Source: Lecture 7, Stability of Cholesky for $A \succ 0$ slide and §7.5.3 of the lecture notes.

@bite~

A key reason Cholesky $A = R^\top R$ is backward stable: the entries of the computed $R$ satisfy $ \vert R _ {ij} \vert \le \sqrt{\ \vert A\ \vert }$ for all $i, j$, so there is no entry growth.

Source: Lecture 7, Stability of Cholesky for $A \succ 0$ slide and §7.5.3 of the lecture notes.

@bite~

Is the Cholesky factorisation $A = R^\top R$ of $A \succ 0$ backward stable?

Yes — unconditionally. The active block stays positive definite (no pivoting needed) and $ \vert R _ {ij} \vert \le \sqrt{\|A\|}$ (no entry growth). So $Ax = b$ for $A \succ 0$ is always solved backward stably via Cholesky.

Source: Lecture 7, Stability of Cholesky for $A \succ 0$ slide and §7.5.3 of the lecture notes.

@bite~

Under what condition on $R$ is the Cholesky factorisation $A = R^\top R$ of a symmetric positive definite $A$ unique?

The factorisation is unique if we impose that $R$ has positive diagonal entries.

Source: Problem Sheet 2, Question 4(b).

@bite~

When $A = A^\top \in \mathbb R^{n \times n}$ is symmetric indefinite (not positive semidefinite), there is still an $LDL^\top$ factorisation. In the real case the diagonal $D$ may have $2 \times 2$ blocks rather than being strictly diagonal.

Source: Lecture 5, Cholesky factorisation for $A \succ 0$ slide (indefinite-case bullet) and §5.3 of the lecture notes.

@bite~

Strategy for proving Cholesky $A = R^\top R$ of $A \succ 0$ requires no pivots (∆cholesky-no-pivots-needed-proof).

  • A pivot (row swap) is only ever needed to avoid a zero leading entry of the current active block. So “no pivots for $A \succ 0$” is exactly the claim that every successive pivot is strictly positive — and by induction that reduces to one fact about a single elimination step.
  • State the lemma: one elimination step replaces the trailing block by its Schur complement; for an SPD matrix the leading entry stays strictly positive and the Schur complement is again SPD.
  • Prove the lemma by completing the square: the quadratic form is a quadratic in the eliminated variable with positive leading coefficient; minimising over that variable leaves exactly the Schur-complement quadratic form, which is therefore positive.
  • Induct down the elimination: every active block stays SPD, so every pivot is strictly positive and no swap is invoked. Those positive pivots are the squared diagonal entries of the Cholesky factor $R$.

Source: Lecture 7, Stability of Cholesky for $A \succ 0$ slide and §7.5.3 of the lecture notes.

@bite~ @proofsupport~