Optimisation for Data Science HT25, Useful miscellany


Flashcards

Useful identity in convex convergence proofs

@State a useful identity between two vectors $g, v \in \mathbb R^n$ which is useful when proving convergence with the assumption of convexity.


\[g^\top v - \frac{1}{2L} ||g||^2 = \frac{L}{2}(||v||^2 - ||v - \frac{1}{L} g||^2)\]

E.g. when $g = \nabla f(x^k)$ and $v = x^k - x^\ast$, this becomes

\[\begin{aligned} \nabla f(x^k)^\top (x^k - x^\ast) - \frac{1}{2L}||x^k - x^\ast||^2 &= \frac{L}{2}(||x^k - x^\ast||^2 - ||x^k - x^\ast - \frac{1}{L}\nabla f(x^k)||^2) \\\\ &= \frac L 2(||x^k - x^\ast||^2 - ||x^{k+1} - x^\ast||^2) \end{aligned}\]

which can then be used in a telescoping sum.

@Prove that for any two vectors $g, v \in \mathbb R^n$,

\[g^\top v - \frac{1}{2L} ||g||^2 = \frac{L}{2}(||v||^2 - ||v - \frac{1}{L} g||^2)\]

\[\begin{aligned} g^\top v - \frac{1}{2L} ||g||^2 &= \frac{L}{2} \left( \frac{2}{L} \langle g, v \rangle - \frac{1}{L^2} \langle g, g \rangle \right) \\\\ &= \frac{L}{2}\left( \langle v, v\rangle - \langle v, v \rangle + \frac{2}{L} \langle g, v \rangle - \frac{1}{L^2} \langle g, g\rangle \right) \\\\ &= \frac{L}{2}\left( \langle v, v\rangle - \langle v, v \rangle + \frac{1}{L} \langle g, v \rangle + \frac{1}{L} \langle g, v \rangle - \frac{1}{L^2} \langle g, g\rangle \right) \\\\ &= \frac{L}{2}\left( \langle v, v\rangle - \langle v, v \rangle + \left\langle \frac{1}{L} g, v \right\rangle + \left\langle v, \frac{1}{L} g \right\rangle - \left\langle \frac{1}{L} g, \frac{1}{L} g\right\rangle \right) \\\\ &= \frac{L}{2} \left(\langle v, v\rangle - \left\langle v - \frac 1 L g, v - \frac 1 L g \right\rangle\right) \\\\ &=\frac{L}{2}(||v||^2 - ||v - \frac{1}{L} g||^2) \end{aligned}\]

Useful consequence of Taylor’s theorem

Suppose:

  • $f : \mathbb R^n \to \mathbb R$ is sufficiently smooth
  • $d \in \mathbb R^n$

@State a useful result about how you can rewrite $f(x + d)$.


\[f(x + d) = f(x) + \int^1_0 \nabla f(x + td)^\top d \text dt\]

Suppose:

  • $f : \mathbb R^n \to \mathbb R$ is sufficiently smooth
  • $d \in \mathbb R^n$

@Prove that then

\[f(x + d) = f(x) + \int^1_0 \nabla f(x + td)^\top d \text dt\]

Consider the function $g : [0, 1] \to \mathbb R$ defined by $g(t) = f(x + td)$. Then by the fundamental theorem of calculus,

\[g(1) - g(0) = \int^1_0 g'(t) \text dt\]

Hence

\[f(x + d) = f(x) + \int^1_0 \nabla f(x + td)^\top d \text dt\]

Condition number of a matrix

What is the condition number of a matrix $\kappa(A)$?


\[\kappa(A) =\frac{\lambda_\text{max} }{\lambda_\text{min} }\]



Related posts