NLA MT25, Randomised least-squares
Flashcards
In the context of randomised least-squares, what kinds of least-squares problems are we considering?
Problems of the form $\min _ x \vert \vert Ax - b \vert \vert _ 2$, where $A \in \mathbb R^{m \times n}$ is tall-skinny so that $m \gg n$, and $\text{rank}(A) = n$.
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
What’s the intuitive idea behind row subset selection?
Suppose that $Ax^\ast = b$, so that the system is consistent. Then $x^\ast$ can be found by solving the smaller problem
\[x^\ast = \text{argmin} _ x \vert \vert A _ 1 x - b _ 1 \vert \vert _ 2\]where $A _ 1 \in \mathbb R^{s \times n}$ is any rank-$n$ row submatrix of $A$ and $b _ 1$ is the corresponding subvector of $b$. In this case, we can obtain the solution in $O(sn^2)$ operations.
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
The intuitive idea behind row subset selection is that if $Ax^\ast = b$ for some $x^\ast$, then $x^\ast$ can be found by solving the smaller problem
\[x^\ast = \text{argmin} _ x \vert \vert A _ 1 x - b _ 1 \vert \vert _ 2\]
where $A _ 1 \in \mathbb R^{s \times n}$ is any rank-$n$ row submatrix of $A$ and $b _ 1$ is the corresponding subvector of $b$. In this case, we can obtain the solution in $O(sn^2)$ operations.
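A quick numerical sketch of this idea (sizes and names here are illustrative, not from the notes): for a consistent system, solving on a small full-rank row submatrix already recovers $x^\ast$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 500, 10, 20

A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = A @ x_star                    # consistent system: A x* = b exactly

# Take the first s rows; a Gaussian submatrix has rank n with probability 1
A1, b1 = A[:s], b[:s]
x_hat, *_ = np.linalg.lstsq(A1, b1, rcond=None)

print(np.allclose(x_hat, x_star))  # the s-row problem recovers x*
```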
Give an @example to show that this need not be the case when $Ax^\ast \ne b$.
Partition $A$ and $b$ into $n \times n$ blocks $A _ i$ and corresponding subvectors $b _ i$, with $A _ 1 = \epsilon I _ n$ for $\epsilon \ll 1$, $A _ i = I _ n$ for $i \ge 2$, and $b _ i = b _ j$ for $i, j \ge 2$.
Then the least-squares solution is approximately $x^\ast \approx b _ 2$. But if the top $s = n$ rows are chosen, solving $\min _ x \vert \vert A _ 1 x - b _ 1 \vert \vert _ 2$ gives $\hat x = \epsilon^{-1} b _ 1$, which can be arbitrarily far from $x^\ast$.
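The counterexample can be checked numerically; a minimal sketch, with illustrative values $n = 5$, $k = 100$ identity blocks, $\epsilon = 10^{-6}$:

```python
import numpy as np

n, k, eps = 5, 100, 1e-6
rng = np.random.default_rng(1)

# Stack A_1 = eps * I on top of k copies of I; b_1 differs from the rest
A = np.vstack([eps * np.eye(n)] + [np.eye(n)] * k)
b1 = rng.standard_normal(n)        # top block of b
b2 = rng.standard_normal(n)        # repeated block of b
b = np.concatenate([b1] + [b2] * k)

# Full least-squares solution: the k repeated blocks dominate, so x ~ b2
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# Subset selection on the top n rows alone: x = b1 / eps, which blows up
x_top = b1 / eps

print(np.linalg.norm(x_full - b2))  # tiny
print(np.linalg.norm(x_top))        # of order 1/eps
```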
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
Describe the sketch-and-solve @algorithm in this context.
- Generate a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for some $c > 1$; $G$ is typically taken to be Gaussian for the analysis
- Compute $GA$ and $Gb$, and solve the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$
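The two steps above can be sketched in a few lines of NumPy (a minimal illustration, assuming a Gaussian sketch; all names are mine, not from the notes):

```python
import numpy as np

def sketch_and_solve(A, b, c=4, rng=None):
    """Approximate argmin_x ||Ax - b||_2 via a Gaussian sketch with s = c*n rows."""
    rng = rng or np.random.default_rng()
    m, n = A.shape
    s = c * n
    G = rng.standard_normal((s, m)) / np.sqrt(s)  # scaling does not change the argmin
    x_hat, *_ = np.linalg.lstsq(G @ A, G @ b, rcond=None)
    return x_hat

rng = np.random.default_rng(2)
m, n = 2000, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # exact solution, O(mn^2)
x_hat = sketch_and_solve(A, b, c=4, rng=rng)     # sketched solution, O(sn^2) solve

# The sketched residual is within a modest factor of the optimal residual
print(np.linalg.norm(A @ x_star - b), np.linalg.norm(A @ x_hat - b))
```

Note the cost of forming $GA$ still involves $m$; in practice structured sketches (e.g. subsampled trigonometric transforms) are used to make that multiplication fast, whereas the Gaussian choice here is the one the analysis assumes.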
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
The ∆least-squares-sketch-and-solve algorithm approximates this problem by:
- Generating a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for $c > 1$, typically Gaussian for analysis
- Computing $GA$ and $Gb$, and solving the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$
@Justify its connection to the subset selection algorithm.
Consider the square Gaussian matrix
\[\hat G = \begin{bmatrix} G \\ G _ 2 \end{bmatrix} \in \mathbb R^{m \times m}\]and consider the transformed problem $\min _ x \vert \vert \hat G(Ax - b) \vert \vert _ 2$. Sketch-and-solve is then row subset selection applied to this problem: we keep only the first $s$ rows, i.e. solve $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$.
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
The ∆least-squares-sketch-and-solve algorithm approximates this problem by:
- Generating a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for $c > 1$, typically Gaussian for analysis
- Computing $GA$ and $Gb$, and solving the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$
@State a theorem characterising the error in the approximate solutions.
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
- $x^\ast = \text{argmin} _ x \vert \vert Ax - b \vert \vert _ 2$
- $G \in \mathbb R^{s \times m}$ is a Gaussian matrix with $s > cn$ and $c > 1$
- $\hat x = \text{argmin} _ x \vert \vert G(Ax - b) \vert \vert _ 2$
Then, with high probability:
\[\vert \vert A\hat x - b \vert \vert _ 2 \le \frac{\sqrt s + \sqrt{n+1}}{\sqrt{s} - \sqrt{n + 1}} \vert \vert Ax^\ast - b \vert \vert _ 2\]
Suppose:
- $A \in \mathbb R^{m \times n}$ is tall-skinny ($m \gg n$) and $\text{rank}(A) = n$
- We are considering the least-squares problem $\min _ x \vert \vert Ax - b \vert \vert _ 2$
The ∆least-squares-sketch-and-solve algorithm approximates this problem by:
- Generating a random sketch matrix $G \in \mathbb R^{s \times m}$, where $s = cn$ for $c > 1$, typically Gaussian for analysis
- Computing $GA$ and $Gb$, and solving the $s \times n$ least-squares problem $\min _ x \vert \vert G(Ax - b) \vert \vert _ 2$
@Prove that then if:
- $x^\ast = \text{argmin} _ x \vert \vert Ax - b \vert \vert _ 2$
- $G \in \mathbb R^{s \times m}$ is a Gaussian matrix with $s > cn$ and $c > 1$
- $\hat x = \text{argmin} _ x \vert \vert G(Ax - b) \vert \vert _ 2$
Then, with high probability:
\[\vert \vert A\hat x - b \vert \vert _ 2 \le \frac{\sqrt s + \sqrt{n+1}}{\sqrt{s} - \sqrt{n + 1}} \vert \vert Ax^\ast - b \vert \vert _ 2\]
Let $[A, b] = QR$ be the (thin) QR factorisation, where $Q \in \mathbb R^{m \times (n+1)}$. We have $G[A, b] = (GQ)R$, and hence we may write
\[\vert \vert Av - b \vert \vert _ 2 = \vert \vert Qw \vert \vert _ 2\]where $w = R \begin{bmatrix} v \\ -1 \end{bmatrix} \in \mathbb R^{n+1}$, and similarly $ \vert \vert G(Av - b) \vert \vert _ 2 = \vert \vert (GQ)w \vert \vert _ 2$.
By ∆orthogonal-invariance-of-gaussian-matrices, $GQ$ is also Gaussian. Furthermore, $GQ$ is a rectangular $s \times (n+1)$ matrix, and hence by ∆marchenko-pastur-nla, with high probability
\[\sigma _ i(GQ) \in [\sqrt s - \sqrt{n+1}, \sqrt s + \sqrt{n+1}]\]for all $i$. Hence
\[(\sqrt s - \sqrt{n+1}) \vert \vert w \vert \vert _ 2 \le \vert \vert (GQ)w \vert \vert _ 2 \le (\sqrt s + \sqrt{n+1}) \vert \vert w \vert \vert _ 2\]for all $w$, while $\vert \vert Qw \vert \vert _ 2 = \vert \vert w \vert \vert _ 2$ since $Q$ has orthonormal columns.
Then by the subordinate property $ \vert \vert XYv \vert \vert _ 2 \le \vert \vert X \vert \vert _ 2 \vert \vert Yv \vert \vert _ 2$ and $ \vert \vert GQ \vert \vert _ 2 = \sigma _ 1(GQ) \le \sqrt s + \sqrt{n+1}$, we obtain, for any $v$,
\[\begin{aligned} \vert \vert G(Av - b) \vert \vert _ 2 &= \left \vert \left \vert (GQ) R \begin{bmatrix} v \\ -1 \end{bmatrix} \right \vert \right \vert _ 2 \\ &\le (\sqrt s + \sqrt{n+1}) \left \vert \left \vert R \begin{bmatrix} v \\ -1 \end{bmatrix} \right \vert \right \vert _ 2 \\ &= (\sqrt s + \sqrt{n+1}) \vert \vert Av - b \vert \vert _ 2, \end{aligned}\]using $\left \vert \left \vert R \begin{bmatrix} v \\ -1 \end{bmatrix} \right \vert \right \vert _ 2 = \vert \vert Qw \vert \vert _ 2 = \vert \vert Av - b \vert \vert _ 2$, and similarly
\[\vert \vert G(Av - b) \vert \vert _ 2 \ge (\sqrt s - \sqrt{n+1}) \vert \vert Av - b \vert \vert _ 2.\]Since these hold for any vector $v$, applying the lower bound at $v = \hat x$, the upper bound at $v = x^\ast$, and the fact that $ \vert \vert G(A \hat x - b) \vert \vert _ 2 \le \vert \vert G(Ax^\ast - b) \vert \vert _ 2$ (as $\hat x$ minimises the sketched problem), it follows that
\[\begin{aligned} \vert \vert A\hat x - b \vert \vert _ 2 &\le \frac{1}{\sqrt s - \sqrt{n+1}} \vert \vert G(A\hat x - b) \vert \vert _ 2 \\ &\le \frac{1}{\sqrt s - \sqrt{n+1}} \vert \vert G(Ax^\ast - b) \vert \vert _ 2 \\ &\le \frac{\sqrt s + \sqrt{n+1}}{\sqrt s - \sqrt{n+1}} \vert \vert Ax^\ast - b \vert \vert _ 2 \end{aligned}\]as required.
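The key step of the proof — that the singular values of an $s \times (n+1)$ Gaussian matrix land in $[\sqrt s - \sqrt{n+1}, \sqrt s + \sqrt{n+1}]$ — can be sanity-checked numerically (a sketch with illustrative sizes; the bound is a high-probability statement, not a deterministic one):

```python
import numpy as np

rng = np.random.default_rng(3)
s, n = 400, 20

# By orthogonal invariance, GQ has the same distribution as a fresh
# s x (n+1) standard Gaussian matrix, so we sample one directly
GQ = rng.standard_normal((s, n + 1))
sig = np.linalg.svd(GQ, compute_uv=False)

lo = np.sqrt(s) - np.sqrt(n + 1)
hi = np.sqrt(s) + np.sqrt(n + 1)
print(lo, sig.min(), sig.max(), hi)  # extreme singular values vs the bounds
```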