NLA MT25, Randomised least-squares


Flashcards

In the context of randomised least-squares, what kinds of least-squares problems are we considering?


\[\min _ x \vert \vert Ax - b \vert \vert _ 2\]

where $A \in \mathbb R^{m \times n}$ is tall-skinny so that $m \gg n$, and $\text{rank}(A) = n$.

Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$

What’s the intuitive idea behind row subset selection?


Suppose that $Ax^\ast = b$, so that the system is consistent. Then $x^\ast$ can be found by solving the smaller problem

\[x^\ast = \text{argmin} _ x \vert \vert A _ 1 x - b _ 1 \vert \vert _ 2\]

where $A _ 1 \in \mathbb R^{s \times n}$ is any rank-$n$ submatrix of $A$. If this is the case, we can obtain the solution in $O(sn^2)$ operations.
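A quick numerical sketch of the consistent case (NumPy; the dimensions and random matrices here are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 1000, 5, 8

# A consistent system: b = A x_true exactly.
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true

# Solve using only the first s rows, a rank-n submatrix
# (with probability 1 for a Gaussian A).
A1, b1 = A[:s], b[:s]
x_hat, *_ = np.linalg.lstsq(A1, b1, rcond=None)

print(np.allclose(x_hat, x_true))  # True: the consistent case recovers x*
```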

Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$

The intuitive idea behind row subset selection is that if $Ax^\ast = b$ for some $x^\ast$, then $x^\ast$ can be found by solving the smaller problem

\[x^\ast = \text{argmin} _ x \vert \vert A _ 1 x - b _ 1 \vert \vert _ 2\]

where $A _ 1 \in \mathbb R^{s \times n}$ is any rank-$n$ submatrix of $A$. If this is the case, we can obtain the solution in $O(sn^2)$ operations.

Give an @example to show that this need not be the case when $Ax^\ast \ne b$.


\[A = \begin{bmatrix} A _ 1 \\ A _ 2 \\ \vdots \\ A _ k \end{bmatrix}, \quad b = \begin{bmatrix} b _ 1 \\ b _ 2 \\ \vdots \\ b _ k \end{bmatrix}\]

where $A _ 1 = \epsilon I _ n$ and $\epsilon \ll 1$, and $A _ i = I _ n$ for $i \ge 2$, and $b _ i = b _ j$ if $i, j \ge 2$.

Then the least-squares solution satisfies $x^\ast \approx b _ 2$, since the $k - 1$ identity blocks dominate. But if the top $s = n$ rows are chosen, solving $\min _ x \vert \vert A _ 1 x - b _ 1 \vert \vert _ 2$ gives $\hat x = \epsilon^{-1} b _ 1$, which can be arbitrarily far from $x^\ast$.
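This failure mode can be checked numerically; a small sketch (NumPy; the values of $\epsilon$, $k$, $b _ 1$, $b _ 2$ below are illustrative assumptions):

```python
import numpy as np

n, k, eps = 3, 5, 1e-6

# A_1 = eps*I, A_i = I for i >= 2; b_1 arbitrary, b_i equal for i >= 2.
A = np.vstack([eps * np.eye(n)] + [np.eye(n)] * (k - 1))
b2 = np.ones(n)
b = np.concatenate([np.full(n, 7.0)] + [b2] * (k - 1))

# Full least-squares solution is close to b_2 ...
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# ... but selecting only the top n rows gives eps^{-1} b_1.
x_bad = b[:n] / eps

print(np.linalg.norm(x_star - b2))  # tiny
print(np.linalg.norm(x_bad))        # huge: arbitrarily bad as eps -> 0
```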

Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$

Describe the sketch-and-solve @algorithm in this context.


  1. Generate a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for some $c > 1$, typically taken to be Gaussian for the analysis
  2. Compute $GA$ and $Gb$, and solve the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$
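The two steps above can be sketched in a few lines (NumPy; dimensions and $c$ chosen illustratively):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, c = 5000, 50, 4
s = c * n

A = rng.standard_normal((m, n))
b = rng.standard_normal(m)  # generically inconsistent

# Step 1: random Gaussian sketch G in R^{s x m}.
G = rng.standard_normal((s, m))

# Step 2: solve the small s x n problem min_x ||G(Ax - b)||_2.
x_hat, *_ = np.linalg.lstsq(G @ A, G @ b, rcond=None)

# Compare residuals against the exact least-squares solution.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
r_hat = np.linalg.norm(A @ x_hat - b)
r_star = np.linalg.norm(A @ x_star - b)
print(r_hat / r_star)  # modestly above 1
```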

Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$

The ∆least-squares-sketch-and-solve algorithm approximates this problem by:

  1. Generating a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for some $c > 1$, typically taken to be Gaussian for the analysis
  2. Computing $GA$ and $Gb$, and solving the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$

@Justify its connection to the subset selection algorithm.


Consider the square $m \times m$ Gaussian matrix

\[\hat G = \begin{bmatrix} G \\ G _ 2 \end{bmatrix}\]

where $G _ 2 \in \mathbb R^{(m - s) \times m}$, and consider the transformed problem $\min _ x \vert \vert \hat G(Ax - b) \vert \vert _ 2$. Sketch-and-solve is precisely row subset selection applied to this problem, taking the first $s$ rows. The difference from selecting rows of $A$ directly is that each row of $\hat G A$ is a random mixture of all the rows of $A$, so no part of the original problem is discarded outright.

Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$

The ∆least-squares-sketch-and-solve algorithm approximates this problem by:

  1. Generating a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for some $c > 1$, typically taken to be Gaussian for the analysis
  2. Computing $GA$ and $Gb$, and solving the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$

@State a theorem characterising the error in the approximate solutions.


Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
  • $x^\ast = \text{argmin} _ x \vert \vert Ax - b \vert \vert _ 2$
  • $G \in \mathbb R^{s \times m}$ is a Gaussian matrix with $s > cn$ and $c > 1$
  • $\hat x = \text{argmin} _ x \vert \vert G(Ax - b) \vert \vert _ 2$

Then, with high probability:

\[\vert \vert A\hat x - b \vert \vert _ 2 \le \frac{\sqrt s + \sqrt{n+1}}{\sqrt s - \sqrt{n+1}} \vert \vert Ax^\ast - b \vert \vert _ 2\]
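The bound can be sanity-checked empirically over repeated sketches; a sketch (NumPy; dimensions, $c$, and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5000, 20
s = 4 * n  # s = cn with c = 4

A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
r_star = np.linalg.norm(A @ x_star - b)

# Theoretical residual-inflation bound from the theorem.
bound = (np.sqrt(s) + np.sqrt(n + 1)) / (np.sqrt(s) - np.sqrt(n + 1))

# Residual ratios over many independent sketches.
ratios = []
for _ in range(50):
    G = rng.standard_normal((s, m))
    x_hat, *_ = np.linalg.lstsq(G @ A, G @ b, rcond=None)
    ratios.append(np.linalg.norm(A @ x_hat - b) / r_star)

print(max(ratios) <= bound)  # holds with high probability
```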

Suppose:

  • $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
  • We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$

The ∆least-squares-sketch-and-solve algorithm approximates this problem by:

  1. Generating a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for some $c > 1$, typically taken to be Gaussian for the analysis
  2. Computing $GA$ and $Gb$, and solving the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$

@Prove that then if:

  • $x^\ast = \text{argmin} _ x \vert \vert Ax - b \vert \vert _ 2$
  • $G \in \mathbb R^{s \times m}$ is a Gaussian matrix with $s > cn$ and $c > 1$
  • $\hat x = \text{argmin} _ x \vert \vert G(Ax - b) \vert \vert _ 2$

Then, with high probability:

\[\vert \vert A\hat x - b \vert \vert _ 2 \le \frac{\sqrt s + \sqrt{n+1}}{\sqrt s - \sqrt{n+1}} \vert \vert Ax^\ast - b \vert \vert _ 2\]

Let $[A, b] = QR$ be the thin QR factorisation, with $Q \in \mathbb R^{m \times (n+1)}$. We have $G[A, b] = (GQ)R$, and hence for any $v$ we may write

\[\vert \vert Av - b \vert \vert _ 2 = \vert \vert Qw \vert \vert _ 2 = \vert \vert w \vert \vert _ 2\]

where $w = R \begin{bmatrix} v \\ -1 \end{bmatrix} \in \mathbb R^{n+1}$, and likewise $ \vert \vert G(Av - b) \vert \vert _ 2 = \vert \vert (GQ)w \vert \vert _ 2$.

By the ∆orthogonal-invariance-of-gaussian-matrices, we have that $GQ$ is also Gaussian. Furthermore, $GQ$ is rectangular $s \times (n+1)$ and hence by ∆marchenko-pastur-nla, it follows that, with high probability,

\[\sigma _ i(GQ) \in [\sqrt s - \sqrt{n+1}, \sqrt s + \sqrt{n+1}]\]

and hence

\[(\sqrt s - \sqrt{n+1}) \vert \vert w \vert \vert _ 2 \le \vert \vert (GQ)w \vert \vert _ 2 \le (\sqrt s + \sqrt{n+1}) \vert \vert w \vert \vert _ 2\]

for all $w$ simultaneously.

Then by the subordinate property $ \vert \vert XYv \vert \vert _ 2 \le \vert \vert X \vert \vert _ 2 \vert \vert Yv \vert \vert _ 2$, and since $Q$ has orthonormal columns, we obtain

\[\begin{aligned} \vert \vert G(Av - b) \vert \vert _ 2 &= \left \vert \left \vert (GQ) R \begin{bmatrix} v \\ -1 \end{bmatrix} \right \vert \right \vert _ 2 \le \vert \vert GQ \vert \vert _ 2 \left \vert \left \vert R \begin{bmatrix} v \\ -1 \end{bmatrix} \right \vert \right \vert _ 2 \\ &\le (\sqrt s + \sqrt{n+1}) \left \vert \left \vert QR \begin{bmatrix} v \\ -1 \end{bmatrix} \right \vert \right \vert _ 2 = (\sqrt s + \sqrt{n+1}) \vert \vert Av - b \vert \vert _ 2 \end{aligned}\]

and similarly, using $\sigma _ {\min}(GQ) \ge \sqrt s - \sqrt{n+1}$,

\[\vert \vert G(Av - b) \vert \vert _ 2 \ge (\sqrt s - \sqrt{n+1}) \vert \vert Av - b \vert \vert _ 2\]

Since these hold for any vector $v$, applying the lower bound at $\hat x$, the optimality of $\hat x$ for the sketched problem (so that $ \vert \vert G(A \hat x - b) \vert \vert _ 2 \le \vert \vert G(Ax^\ast - b) \vert \vert _ 2$), and the upper bound at $x^\ast$, it follows that

\[\begin{aligned} \vert \vert A\hat x - b \vert \vert _ 2 &\le \frac{1}{\sqrt s - \sqrt{n+1}} \vert \vert G(A\hat x - b) \vert \vert _ 2 \le \frac{1}{\sqrt s - \sqrt{n+1}} \vert \vert G(Ax^\ast - b) \vert \vert _ 2 \\ &\le \frac{\sqrt s + \sqrt{n+1}}{\sqrt s - \sqrt{n+1}} \vert \vert Ax^\ast - b \vert \vert _ 2 \end{aligned}\]

as required.
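The key step of the proof, that the singular values of $GQ$ fall near the stated interval, can also be checked numerically (NumPy; dimensions illustrative; the interval edges are asymptotic, so finite-sample fluctuations around them are expected):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5000, 30
s = 6 * (n + 1)

A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Q: orthonormal basis for the column space of [A, b].
Q, _ = np.linalg.qr(np.column_stack([A, b]))

# GQ is s x (n+1) and, by orthogonal invariance, Gaussian.
G = rng.standard_normal((s, m))
sv = np.linalg.svd(G @ Q, compute_uv=False)

# Marchenko-Pastur-style edges: sqrt(s) -/+ sqrt(n+1).
lo, hi = np.sqrt(s) - np.sqrt(n + 1), np.sqrt(s) + np.sqrt(n + 1)
print(lo, sv.min(), sv.max(), hi)  # singular values cluster inside [lo, hi]
```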



