NLA MT25, Numerical stability


Flashcards

@Define $\text{fl}(X)$ in the context of numerical stability.

The computed, floating-point version of $X$.

@State bounds on $\text{fl}(x + y)$ and $\text{fl}(xy)$ where $x, y$ are scalars.

  • $\text{fl}(x + y) = x + y + c\epsilon$, where $ \vert c \vert \le \max( \vert x \vert , \vert y \vert )$,
  • $\text{fl}(xy) = xy + c\epsilon$ where $ \vert c \vert \le \vert xy \vert $.

@Define the notation $\Delta A$ or $\Delta x$ in the context of numerical stability.

These are perturbations that are small relative to the original quantity in the sense that $\|\Delta A\| = O(\epsilon \|A\|)$ and $\|\Delta x\| = O(\epsilon \|x\|)$.

Suppose we have a problem where we have $x$ and want to find $y = f(x)$. @Define the absolute conditioning $\kappa$ and the relative conditioning $\kappa _ r$ of the problem.

\[\kappa := \lim _ {\epsilon \to 0^+} \sup _ { \vert \vert \delta x \vert \vert < \epsilon} \frac{ \vert \vert f(x + \delta x) - f(x) \vert \vert }{ \vert \vert \delta x \vert \vert }\] \[\kappa _ r := \lim _ {\epsilon \to 0^+} \sup _ { \vert \vert \delta x \vert \vert < \epsilon} \frac{\frac{ \vert \vert f(x + \delta x) - f(x) \vert \vert }{ \vert \vert f(x) \vert \vert }}{\frac{ \vert \vert \delta x \vert \vert }{ \vert \vert x \vert \vert }}\]

Suppose we have a problem where we have $x$ and want to find $y = f(x)$.

@Define what it means for an algorithm computing $\hat y$ to be backwards stable, and explain the motivation.

An algorithm is backward stable if its output $\hat y$ can be written as

\[\hat y = f(x + \Delta x), \qquad \frac{\|\Delta x\|}{\|x\|} = O(\epsilon)\]

where $\epsilon$ is machine precision.

Backward error can be interpreted as saying that the computed output is the exact output of a slightly perturbed input.

Suppose we have a problem where we have $x$ and want to find $y = f(x)$.

@Define what it means for an algorithm computing $\hat y$ to be forward stable, and explain how this relates to backward stability.

An algorithm is forward stable if its computed output $\hat y$ satisfies

\[\frac{ \vert \vert y - \hat y \vert \vert }{ \vert \vert y \vert \vert } = O(\kappa(f) \epsilon)\]

where $\kappa(f)$ is the relative condition number of $f$ at $x$ (see ∆conditioning-absolute-and-relative) and $\epsilon$ is machine precision.

Forward error can be interpreted as saying that the error is as small as a backward stable algorithm would achieve: a backward-stable algorithm produces $\hat y = f(x + \Delta x)$ with $\|\Delta x\| / \|x\| = O(\epsilon)$. By the definition of conditioning,

\[\begin{aligned} \frac{\|y - \hat y\|}{\|y\|} &= \frac{\|f(x) - f(x + \Delta x)\|}{\|f(x)\|} \\ &\le \kappa(f) \cdot \frac{\|\Delta x\|}{\|x\|} + O(\epsilon^2) \\ &= O(\kappa(f)\epsilon) \end{aligned}\]

so that backward stability implies forward stability (see ∆backward-stable-implies-forward-stable-justification).

An algorithm can be forward stable without being backward stable, i.e. it may produce $\hat y$ close to $y$ without $\hat y$ being the exact output of $f$ on a nearby input.

@Define the condition number of a matrix $A$.

\[\kappa _ 2(A) = \frac{\sigma _ \max(A)}{\sigma _ \min(A)}\]

@Prove that for an invertible $A \in \mathbb R^{n \times n}$,

\[\|A\| _ 2 = \sigma _ {\max}(A), \qquad \|A^{-1}\| _ 2 = \frac{1}{\sigma _ {\min}(A)}, \qquad \|A\| _ 2^{-1} = \frac{1}{\sigma _ {\max}(A)} = \sigma _ {\min}(A^{-1}),\]

assembled into the chain

\[\|A\| _ 2^{-1} = \sigma _ {\max}(A)^{-1} \;\le\; \sigma _ {\min}(A)^{-1} = \sigma _ {\max}(A^{-1}) = \|A^{-1}\| _ 2,\]

so in particular $\kappa _ 2(A) = \|A\| _ 2\, \|A^{-1}\| _ 2 = \dfrac{\sigma _ {\max}(A)}{\sigma _ {\min}(A)}$ (∆matrix-condition-number-definition).

Take the SVD $A = U \Sigma V^\top$ with $\Sigma = \text{diag}(\sigma _ 1, \ldots, \sigma _ n)$, $\sigma _ 1 \ge \cdots \ge \sigma _ n > 0$.

  • $\|A\| _ 2 = \max _ {\|x\| _ 2 = 1}\|Ax\| _ 2 = \sigma _ 1 = \sigma _ {\max}(A)$ (the spectral norm is the largest singular value).
  • $A^{-1} = V \Sigma^{-1} U^\top$ is an SVD of $A^{-1}$: its singular values are $\{1/\sigma _ i\}$, which in decreasing order are $\tfrac{1}{\sigma _ n} \ge \cdots \ge \tfrac{1}{\sigma _ 1}$. Hence $\sigma _ {\max}(A^{-1}) = 1/\sigma _ n = 1/\sigma _ {\min}(A)$ and $\sigma _ {\min}(A^{-1}) = 1/\sigma _ 1 = 1/\sigma _ {\max}(A)$.
  • Applying the first bullet to $A^{-1}$: $\|A^{-1}\| _ 2 = \sigma _ {\max}(A^{-1}) = 1/\sigma _ {\min}(A)$, and likewise $\|A\| _ 2^{-1} = 1/\sigma _ {\max}(A) = \sigma _ {\min}(A^{-1})$.
  • Since $\sigma _ {\min}(A) \le \sigma _ {\max}(A)$, taking reciprocals reverses the inequality, giving $\|A\| _ 2^{-1} \le \|A^{-1}\| _ 2$ (equality iff all $\sigma _ i$ are equal, i.e. $A$ is a scalar multiple of an orthogonal matrix).

Source: Lecture 3, §3 of the lecture notes (Proposition 3.1, $\|A\| _ 2 = \sigma _ {\max}$) and Lecture 7, §7.4 (matrix condition number).

@State a theorem bounding the relative forward error of a backward stable solution to $Ax = b$ in terms of the condition number of $A$.

Suppose we have a backward stable solution for $Ax = b$ such that:

  • $(A + \Delta A)\hat x = b$
  • $ \vert \vert \Delta A \vert \vert \le \epsilon \vert \vert A \vert \vert $
  • $\kappa _ 2(A) \ll \epsilon^{-1}$

Then $ \vert \vert A^{-1} \Delta A \vert \vert \ll 1$ and we have

\[\frac{ \vert \vert \hat x - x \vert \vert }{ \vert \vert x \vert \vert } \le \epsilon \kappa _ 2(A) + O(\epsilon^2)\]

In other words, if you solve $Ax = b$ with a backward stable algorithm, then the forward error is bounded by $\epsilon \kappa _ 2(A)$ up to $O(\epsilon^2)$.

@exam~

@Prove that if we have a backward stable solution for $Ax = b$ such that:

  • $(A + \Delta A)\hat x = b$
  • $ \vert \vert \Delta A \vert \vert \le \epsilon \vert \vert A \vert \vert $
  • $\kappa _ 2(A) \ll \epsilon^{-1}$

Then $ \vert \vert A^{-1} \Delta A \vert \vert \ll 1$ and we have

\[\frac{ \vert \vert \hat x - x \vert \vert }{ \vert \vert x \vert \vert } \le \epsilon \kappa _ 2(A) + O(\epsilon^2)\]
\[\begin{aligned} \vert \vert A^{-1} \Delta A \vert \vert &\le \vert \vert A^{-1} \vert \vert \cdot \vert \vert \Delta A \vert \vert \\ &\le \vert \vert A^{-1} \vert \vert \cdot \epsilon \vert \vert A \vert \vert \\ &= \epsilon \vert \vert A^{-1} \vert \vert \cdot \vert \vert A \vert \vert \\ &= \epsilon \frac{\sigma _ \max(A)}{\sigma _ \min(A)} \\ &\ll \epsilon \epsilon^{-1} \\ &= 1 \end{aligned}\]

Then by Neumann series, we obtain

\[\begin{aligned} (A + \Delta A)^{-1} &= (A(I + A^{-1} \Delta A))^{-1} \\ &= (I - A^{-1} \Delta A + O( \vert \vert A^{-1} \Delta A \vert \vert ^2)) A^{-1} \\ \end{aligned}\]

Therefore since

\[\begin{aligned} \hat x &= (A + \Delta A)^{-1}b \\ &= A^{-1}b - A^{-1} \Delta A A^{-1} b + O( \vert \vert A^{-1}\Delta A \vert \vert ^2) \\ &= x - A^{-1}\Delta Ax + O( \vert \vert A^{-1} \Delta A \vert \vert ^2) \end{aligned}\]

Hence

\[\begin{aligned} \vert \vert x - \hat x \vert \vert &\lesssim \vert \vert A^{-1} \Delta A x \vert \vert \\ &\le \vert \vert A^{-1} \vert \vert \, \vert \vert \Delta A \vert \vert \, \vert \vert x \vert \vert \\ &\le \epsilon \vert \vert A \vert \vert \, \vert \vert A^{-1} \vert \vert \, \vert \vert x \vert \vert \\ &= \epsilon \kappa _ 2(A) \vert \vert x \vert \vert \end{aligned}\]

as required.

(This effectively justifies why we define the condition number of a matrix like we do).

@exam~

@Justify why if $Y = f(X)$ is computed in a backward stable manner, i.e.

\[\hat Y = f(X + \Delta X), \quad \vert \vert \Delta X \vert \vert = O(\epsilon \|X\|)\]

and we have the absolute conditioning $ \vert \vert f(X) - f(X + \Delta X) \vert \vert \lesssim \kappa \vert \vert \Delta X \vert \vert = \kappa \epsilon$, then

\[ \vert \vert \hat Y - Y \vert \vert \lesssim \kappa \epsilon\]
\[\begin{aligned} \vert \vert \hat Y - Y \vert \vert &= \vert \vert f(X + \Delta X) - f(X) \vert \vert \\ &\lesssim \kappa \vert \vert \Delta X \vert \vert \\ &= \kappa \epsilon \end{aligned}\]

Stability of matrix and vector operations

@State bounds on $\text{fl}(y^\top x)$ and $\text{fl}(Ax)$ where $x, y$ are vectors and $A$ is a matrix, and then state a non-bound.

  • $\text{fl}(y^\top x) = (y + \Delta y)^\top x$ for some $\Delta y$ with $\|\Delta y\| = O(\epsilon \|y\|)$, and as a consequence,
  • $\text{fl}(Ax) = (A + \Delta A)x$, for some $\Delta A$ with $\|\Delta A\| = O(\epsilon \|A\|)$… however,
  • The statement “$\text{fl}(AB) = (A + \Delta A)(B + \Delta B)$ for some $\|\Delta A\| = O(\epsilon \|A\|)$, $\|\Delta B\| = O(\epsilon \|B\|)$” is not necessarily true (i.e. matrix multiplication is not necessarily backwards stable).

@State a bound on the forward error of matrix multiplication.

Suppose:

  • $A$, $B$ are matrices

Then:

\[ \vert \vert \text{fl}(AB) - AB \vert \vert \le \epsilon \vert \vert A \vert \vert \, \vert \vert B \vert \vert \]

and so consequently

\[\frac{ \vert \vert \text{fl}(AB) - AB \vert \vert }{ \vert \vert AB \vert \vert } \le \epsilon \min(\kappa _ 2(A), \kappa _ 2(B))\]

@exam~

@Prove that if $A$, $B$ are matrices then we have a bound on the forward error of matrix multiplication:

\[ \vert \vert \text{fl}(AB) - AB \vert \vert \le \epsilon \, \vert \vert A \vert \vert \, \vert \vert B \vert \vert \]

and so consequently

\[\frac{ \vert \vert \text{fl}(AB) - AB \vert \vert }{ \vert \vert AB \vert \vert } \le \epsilon \min(\kappa _ 2(A), \kappa _ 2(B))\]

You may use ∆backward-error-of-matrix-vector-multiplication, i.e. that

\[\text{fl}(Av) = (A + \Delta A)v\]

for some $\Delta A$ with $\|\Delta A\| = O(\epsilon \|A\|)$.

We compute $AB$ column-by-column. The $j$th column of $\text{fl}(AB)$ is $\text{fl}(Ab _ j)$, where $b _ j$ is the $j$th column of $B$. By ∆backward-error-of-matrix-vector-multiplication,

\[\text{fl}(Ab _ j) = (A + \Delta A^{(j)})b _ j, \quad \|\Delta A^{(j)}\| \le \epsilon \|A\|,\]

where $\Delta A^{(j)}$ depends on $j$.

Subtracting $A b _ j$ and taking norms, we obtain

\[\begin{aligned} \|\text{fl}(AB) _ {:,j} - (AB) _ {:,j}\| _ 2 &= \|\Delta A^{(j)} b _ j \| _ 2 \\ &\le \|\Delta A^{(j)}\| \, \|b _ j\| _ 2 \\ &\le \epsilon \, \|A\| \, \|b _ j\| _ 2. \end{aligned}\]

Summing over $j$, we obtain

\[\begin{aligned} \|\text{fl}(AB) - AB\| _ F^2 &= \sum _ j \|\text{fl}(AB) _ {:, j} - (AB) _ {:,j}\| _ 2^2 \\ &\le \epsilon^2 \|A\|^2 \sum _ j\|b _ j\| _ 2^2 \\ &= \epsilon^2 \|A\|^2 \|B\| _ F^2. \end{aligned}\]

Using $\|M\| _ 2 \le \|M\| _ F \le \sqrt n \|M\| _ 2$ and absorbing $\sqrt n$ into $\epsilon$ using the convention that $\epsilon = O(u)$ suppressing polynomials in dimensions,

\[ \vert \vert \text{fl}(AB) - AB \vert \vert \le \epsilon \, \vert \vert A \vert \vert \, \vert \vert B \vert \vert .\]

For the second part, suppose $A$ is square invertible. Then for any $X$,

\[\|X\| _ 2 = \|A^{-1}A X\| _ 2 \le \|A^{-1}\| _ 2 \, \|AX\| _ 2 = \frac{\|AX\| _ 2}{\sigma _ \min(A)}.\]

Applying this with $X = B$, we obtain

\[\|AB\| _ 2 \ge \sigma _ \min(A) \|B\| _ 2.\]

Substituting this into the bound in the first part, we obtain

\[\begin{aligned} \frac{\|\text{fl}(AB) - AB\| _ 2}{\|AB\| _ 2} &\le \frac{\epsilon \|A\| _ 2 \|B\| _ 2}{\sigma _ \min(A) \|B\| _ 2} \\ &= \epsilon \cdot \frac{\sigma _ \max(A)}{\sigma _ \min(A)} \\ &= \epsilon \, \kappa _ 2(A). \end{aligned}\]

Symmetrically, applying the analogous bound $\|AB\| _ 2 \ge \sigma _ \min(B) \|A\| _ 2$, we obtain

\[\frac{\|\text{fl}(AB) - AB\| _ 2}{\|AB\| _ 2} \le \epsilon \, \kappa _ 2(B).\]

Taking the minimum,

\[\frac{\|\text{fl}(AB) - AB\| _ 2}{\|AB\| _ 2} \le \epsilon \, \min (\kappa _ 2(A), \kappa _ 2(B))\]

as required.

(For non-square $A$ tall full-rank or $B$ fat full-rank, we may replace $\sigma _ \min$ with the smallest non-zero singular value and read $\kappa _ 2$ as the ratio $\sigma _ \max / \sigma^\text{nz} _ \min$.)

@exam~

Given that if $A$, $B$ are matrices, then we have a bound ∆forward-error-matrix-multiplication on the forward error of matrix multiplication:

\[ \vert \vert \text{fl}(AB) - AB \vert \vert \le \epsilon \vert \vert A \vert \vert \, \vert \vert B \vert \vert \]

@Prove that orthogonal multiplication is backward stable.

Suppose $A = Q$ is orthogonal. Then the forward error bound gives

\[ \vert \vert \text{fl}(QB) - QB \vert \vert \le \epsilon \vert \vert B \vert \vert \]

so there exists $E$ with $\|E\| \le \epsilon \|B\|$ such that

\[\text{fl}(QB) = QB + E.\]

Define $\Delta B := Q^\top E$. Since $Q$ is orthogonal, $\|\Delta B\| = \|Q^\top E\| = \|E\| \le \epsilon \|B\|$, and using $QQ^\top = I$:

\[\text{fl}(QB) = QB + E = QB + QQ^\top E = Q(B + Q^\top E) = Q(B + \Delta B).\]

So computing $QB$ in floating point is the exact result of multiplying $Q$ by a perturbed $B$ with backward error $\|\Delta B\| \le \epsilon \|B\|$.

Caveat: squareness is essential here — $QQ^\top = I$ needs $Q$ orthogonal (square), not merely orthonormal. For a tall orthonormal $Q$ ($Q^\top Q = I$ but $QQ^\top \ne I$), the rounding error $E$ generally has a component outside $\mathrm{range}(Q)$ that cannot be folded into a perturbation of $B$ (one would need $E = Q\Delta B$, i.e. $E \in \mathrm{range}(Q)$, which fails since $QQ^\top E \ne E$); there one gets only mixed forward-backward stability, $\text{fl}(QB) = Q(B + \Delta B) + F$ with $\|\Delta B\|, \|F\| = O(\epsilon\|B\|)$. This is why Householder-QR backward stability (∆householder-qr-backward-stable) is argued via the square reflectors $H _ i$, never the thin assembled $Q$.

@Prove the Problem Sheet III, Q3(a) result that for an orthogonal matrix $Q$ and any conformable $A$,

\[\|\text{fl}(QA) - QA\| = \epsilon \|QA\|, \qquad \|\text{fl}(AQ) - AQ\| = \epsilon \|AQ\|,\]

deriving it from the two results already proved in this entry (∆forward-error-matrix-multiplication-proof, ∆orthogonal-multiplication-backward-stable-proof) rather than from scratch.

Setup: $Q$ orthogonal means $\|Q\| _ 2 = 1$, and orthogonal multiplication is norm-preserving: $\|QA\| = \|A\| = \|AQ\|$ (true for the $2$- and Frobenius norms; the choice does not matter under the $\epsilon = O(u)$ convention).

From the forward-error bound: ∆forward-error-matrix-multiplication-proof gives $\|\text{fl}(QA) - QA\| \le \epsilon \|Q\| _ 2 \|A\| = \epsilon \|A\|$. Since $\|A\| = \|QA\|$, this is $\|\text{fl}(QA) - QA\| \le \epsilon \|QA\|$. Identically, $\|\text{fl}(AQ) - AQ\| \le \epsilon \|A\| \|Q\| _ 2 = \epsilon \|AQ\|$.

Equivalently, from the backward-stable form: ∆orthogonal-multiplication-backward-stable-proof gives $\text{fl}(QA) = Q(A + \Delta A)$ with $\|\Delta A\| \le \epsilon \|A\|$, so

\[\text{fl}(QA) - QA = Q\,\Delta A, \qquad \|Q\,\Delta A\| = \|\Delta A\| \le \epsilon \|A\| = \epsilon \|QA\|,\]

using $\|Q\| _ 2 = 1$ for the middle equality. So Q3(a) is just the forward-error reading of the orthogonal-multiplication backward-stability result, which is itself the orthogonal specialisation of the general forward-error bound. (The problem sheet’s intended route — entrywise, directly from vector-vector backward stability — is equally valid; this derivation instead reuses the proved norm bounds.)

Source: Problem Sheet III, Q3(a). The two cited proofs correspond to §7.6 of the lecture notes.

@sheets~

Some algorithms have intermediate matrix-multiplication steps whose forward error is amplified by the conditioning of the inputs. What is a class of algorithms that avoids this amplification, and why?

General matrix multiplication is not backward stable: forming $\text{fl}(AB)$ may differ from $(A + \Delta A)(B + \Delta B)$ by a factor of $\min(\kappa _ 2(A), \kappa _ 2(B))$ in the relative forward error. However, multiplying by an orthogonal matrix is backward stable, since $\|Q\| _ 2 = 1$ and the bound $\|\text{fl}(QB) - QB\| \le \epsilon \|B\|$ is of the same magnitude as $\|QB\|$. So we base algorithms (Householder QR, Schur, etc.) on orthogonal transformations whenever possible — the intermediate steps can’t amplify error. (Distinct from problem conditioning, which is an inherent property of the problem $f$ that backward-stable algorithms still inherit.)

Suppose an algorithm forms its output by right-multiplying a fixed matrix $A$ by a sequence of nonsingular matrices, $Q = A R _ 1 R _ 2 \cdots R _ n$ (for example, modified Gram-Schmidt QR, where each $R _ j$ is upper triangular). @Prove that if each factor is perturbed $R _ j \to R _ j + \delta R _ j$, the relative error in the output is controlled by the condition numbers of the factors.

Single-step bound: consider one factor applied to an arbitrary $C$. Let $D = C R _ j$ and $\tilde D = C(R _ j + \delta R _ j)$. Then $\tilde D - D = C\, \delta R _ j$, so

\[\frac{\|\tilde D - D\|}{\|D\|} = \frac{\|C\, \delta R _ j\|}{\|C R _ j\|}.\]

Bound the numerator by submultiplicativity, $\|C\, \delta R _ j\| \le \|C\|\, \|\delta R _ j\|$. Bound the denominator from below: since $\|C\| = \|(C R _ j) R _ j^{-1}\| \le \|C R _ j\|\, \|R _ j^{-1}\|$, we have $\|C R _ j\| \ge \|C\| / \|R _ j^{-1}\|$. Hence

\[\frac{\|\tilde D - D\|}{\|D\|} \le \frac{\|C\|\, \|\delta R _ j\|}{\|C\| / \|R _ j^{-1}\|} = \|R _ j\|\, \|R _ j^{-1}\| \cdot \frac{\|\delta R _ j\|}{\|R _ j\|} = \kappa(R _ j)\, \frac{\|\delta R _ j\|}{\|R _ j\|},\]

where $\kappa(R _ j) = \|R _ j\|\, \|R _ j^{-1}\|$ is the condition number of $R _ j$.

Repeated application: applying this once for each $j = 1, \ldots, n$ along the product, the perturbed output $\tilde Q = A(R _ 1 + \delta R _ 1)(R _ 2 + \delta R _ 2) \cdots (R _ n + \delta R _ n)$ satisfies

\[\frac{\|\tilde Q - Q\|}{\|Q\|} \;\lesssim\; \prod _ {j=1}^{n} \kappa(R _ j)\, \frac{\|\delta R _ j\|}{\|R _ j\|}.\]

Upshot: each factor’s perturbation is amplified only by its own condition number $\kappa(R _ j)$, so a scheme built from repeated multiplication by well-conditioned matrices (e.g. orthogonal factors, for which $\kappa = 1$) is stable; ill-conditioned factors are what make the process sensitive.

Source: Lecture 7, §7.4 (matrix condition number) and §7.6 (forward error of matrix products, submultiplicativity) of the lecture notes — perturbation propagation through a product of factors.

@exam~

Bite-sized

IEEE double-precision unit roundoff is $u \approx $ $10^{-16}$. Throughout the course, $\epsilon$ denotes any quantity that is $O(u)$ — absorbing low-degree polynomials in the matrix dimensions (so $u$, $nu$, $n^2 u$ all count as $\epsilon$).

Source: Lecture 7, Floating-point arithmetic slide and §7.1 of the lecture notes.

@bite~

Folk theorem of numerical stability: backward-stable algorithm + well-conditioned problem ⇒ accurate solution (forward error $\lesssim \kappa \epsilon$). Without well-conditioning, even a backward-stable algorithm can give arbitrarily bad forward error — but blame the problem, not the algorithm.

Source: Lecture 7, Backward stable+well conditioned=accurate solution slide and §7.4.1 of the lecture notes.

@bite~

Triangular linear systems $Rx = b$ solved by forward/back substitution are backward stable: the computed $\hat x$ satisfies $(R + \Delta R) \hat x = b$ with $\ \vert \Delta R\ \vert = O(\epsilon \ \vert R\ \vert )$.

Source: Lecture 7, Backward stability of triangular systems slide and §7.5.1 of the lecture notes.

@bite~

Why doesn’t solving $n$ linear systems $A x _ i = e _ i$ give us a backward-stable matrix inverse?

Each solve gives $(A + \Delta A _ i) \hat x _ i = e _ i$, but the $\Delta A _ i$ are different for each right-hand side $e _ i$. There is no single $\Delta A$ of size $O(\epsilon \|A\|)$ such that $(A + \Delta A) [\hat x _ 1, \ldots, \hat x _ n] = I$. Backward stability of one solve does not aggregate into a backward stability statement for the matrix inverse — the “small perturbation” must be the same across all columns.

Source: Lecture 7, §7.6 of the lecture notes (just before “Orthogonality matters for stability”).

@bite~

Stable algorithms: triangular solve, Cholesky for $A \succ 0$, Householder QR, QR-based linear-system and least-squares, and orthogonal matrix multiplication. Unstable / only conditionally stable: classical Gram-Schmidt (loses orthogonality), general matrix-matrix multiplication (only a forward-error bound), the normal equations for least-squares (forward error $\kappa^2$).

Source: Lecture 7, Key points on stability slide and §7.6-7.7 of the lecture notes.

@bite~

Strategy for proving the relative-forward-error bound $\|\hat x - x\| / \|x\| \le \epsilon \kappa _ 2(A) + O(\epsilon^2)$ for a backward-stable linear-system solve (∆backward-stable-linear-system-error-bound-proof).

  • Step 1: bound $\|A^{-1} \Delta A\| \le \epsilon \kappa _ 2(A) \ll 1$, using submultiplicativity and the hypothesis $\kappa _ 2(A) \ll \epsilon^{-1}$.
  • Step 2: Neumann-series expand $(A + \Delta A)^{-1} = (I - A^{-1}\Delta A + O(\|A^{-1}\Delta A\|^2))A^{-1}$, which is justified by step 1.
  • Step 3: compute $\hat x - x = -A^{-1} \Delta A \, x + O(\epsilon^2)$, take norms, use submultiplicativity once more, identify $\|A\| \|A^{-1}\| = \kappa _ 2(A)$.

Source: Lecture 7, §7.4 of the lecture notes (proof of Theorem 7.1).

@bite~ @proofsupport~

Where does the hypothesis $\kappa _ 2(A) \ll \epsilon^{-1}$ enter the relative-forward-error bound proof (∆backward-stable-linear-system-error-bound-proof)?

It is the precondition that makes the Neumann series for $(A + \Delta A)^{-1}$ converge. Specifically, $\|A^{-1} \Delta A\| \le \epsilon \kappa _ 2(A) \ll 1$ — and we need this $\ll 1$ to write $(I - A^{-1}\Delta A)^{-1} = I + A^{-1}\Delta A + (A^{-1}\Delta A)^2 + \cdots$ with the tail being $O(\epsilon^2)$.

When $\kappa _ 2(A) \sim \epsilon^{-1}$ (i.e. the matrix is “numerically singular”), the bound degenerates and you can no longer trust the linear system to be solvable accurately.

Source: Lecture 7, §7.4 of the lecture notes (hypotheses of Theorem 7.1).

@bite~ @proofsupport~

Strategy for proving the forward-error bound on matrix multiplication, $\|\mathrm{fl}(AB) - AB\| \le \epsilon \|A\| \|B\|$, and the relative-error version $\le \epsilon \min(\kappa _ 2(A), \kappa _ 2(B))$ (∆forward-error-matrix-multiplication-proof).

  • Absolute bound, column by column: each column of $\mathrm{fl}(AB)$ is $\mathrm{fl}(A b _ j) = (A + \Delta A^{(j)}) b _ j$ with $\|\Delta A^{(j)}\| \le \epsilon \|A\|$. Sum-of-squares over $j$ gives $\|\cdot\| _ F^2 \le \epsilon^2 \|A\|^2 \|B\| _ F^2$. Pass to 2-norm absorbing dimension factors into $\epsilon$.
  • Relative bound, division by $\sigma _ \min$: use $\|AB\| _ 2 \ge \sigma _ \min(A) \|B\| _ 2$ (valid for invertible $A$). Divide. Get $\epsilon \kappa _ 2(A)$. Symmetric argument on $B$ gives $\epsilon \kappa _ 2(B)$. Take the min.

Why “min” not “max”: either inequality $\|AB\| \ge \sigma _ \min \|\cdot\|$ holds; you’re free to choose the tighter one for the problem at hand.

Source: Lecture 7, §7.6 of the lecture notes (proof of Theorem 7.2).

@bite~ @proofsupport~