Optimisation for Data Science HT25, Accelerated methods


Flashcards

Obsfvz91VNimLlbsImbP9vtQ

Suppose we wish to minimise the convex quadratic function \(f(x) = \frac 1 2 x^\top A x - b^\top x\) where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$.

In this context, @define the general template for iterative updates used in the heavy ball method.


\(x^{k+1} = x^k - \alpha _ k \nabla f(x^k) + \beta _ k (x^k - x^{k-1})\) where $\alpha _ k > 0$ is a step length and $\beta _ k > 0$ is a momentum parameter (initially, a single steepest descent step is performed so that only one starting point is needed).
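A minimal implementation sketch of this template (the function name and the steepest-descent initialisation are illustrative, assuming constant $\alpha$ and $\beta$):

```python
import numpy as np

def heavy_ball(A, b, x0, alpha, beta, iters):
    """Heavy ball method for f(x) = 0.5 x^T A x - b^T x.

    The first step is plain steepest descent, so only one
    starting point x0 is needed.
    """
    grad = A @ x0 - b                  # nabla f(x) = A x - b
    x_prev, x = x0, x0 - alpha * grad  # initial steepest-descent step
    for _ in range(iters):
        grad = A @ x - b
        # gradient step plus momentum term beta * (x - x_prev)
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x
```

For a convex quadratic the minimiser solves $Ax = b$, so the iterates can be checked against `np.linalg.solve(A, b)`.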

heavy-ball-method-state

Suppose we wish to minimise the convex quadratic function \(f(x) = \frac 1 2 x^\top A x - b^\top x\) where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form \(x^{k+1} = x^k - \alpha _ k \nabla f(x^k) + \beta _ k (x^k - x^{k-1})\)

@State a theorem about the convergence of the heavy ball method.


There exists a constant $C > 0$ such that the heavy ball method, applied with constant step lengths \(\begin{aligned} \alpha _ k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta _ k &= \left( \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \right)^2 \end{aligned}\) satisfies \(\vert \vert x^k - x^\ast \vert \vert \le C\beta^k\) for all $k$, where \(\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}\) (so $\beta _ k = \beta^2$).
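As a numerical sanity check, the sketch below (the dimensions, $\gamma$, $L$ and the spectrum are made up for the demo) runs the method on a small diagonal quadratic and compares the observed average contraction of \(\vert \vert x^k - x^\ast \vert \vert\) to the rate $\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}$; the momentum coefficient is taken to be $\beta^2$, Polyak's classical choice, which makes every per-eigenvalue contraction factor equal to $\beta$.

```python
import numpy as np

# Illustrative check of the linear rate; gamma, L and the spectrum are made up.
gamma, L = 1.0, 100.0
alpha = 4.0 / (np.sqrt(L) + np.sqrt(gamma)) ** 2
beta = (np.sqrt(L) - np.sqrt(gamma)) / (np.sqrt(L) + np.sqrt(gamma))  # predicted rate

A = np.diag(np.linspace(2.0, 90.0, 6))  # symmetric, spectrum inside [gamma, L]
b = np.ones(6)
x_star = np.linalg.solve(A, b)          # unique minimiser of f

x_prev = np.zeros(6)
x = x_prev - alpha * (A @ x_prev - b)   # initial steepest-descent step
errs = [np.linalg.norm(x - x_star)]
for _ in range(100):
    # heavy ball update with momentum coefficient beta**2 (Polyak's choice)
    x, x_prev = x - alpha * (A @ x - b) + beta ** 2 * (x - x_prev), x
    errs.append(np.linalg.norm(x - x_star))

# geometric-mean contraction per iteration; should be close to beta ~ 0.818
rate = (errs[-1] / errs[0]) ** (1 / 100)
```

On this example the error shrinks by roughly a factor of $\beta$ per iteration on average, consistent with the $C\beta^k$ bound (the per-step ratio oscillates, since the per-eigenvalue iteration matrices have complex eigenvalues).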

ObsP6Njw75Xumq9sBUgBGVdq

Suppose we wish to minimise the convex quadratic function \(f(x) = \frac 1 2 x^\top A x - b^\top x\) where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form \(x^{k+1} = x^k - \alpha _ k \nabla f(x^k) + \beta _ k (x^k - x^{k-1})\). There is a result that says there is some constant $C > 0$ such that the heavy ball method applied with constant step lengths \(\begin{aligned} \alpha _ k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta _ k &= \left( \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \right)^2 \end{aligned}\) satisfies \(\vert \vert x^k - x^\ast \vert \vert \le C\beta^k\) for all $k$, where \(\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}\). This gives a result about the convergence of $x^k$ to $x^\ast$. Can you @state and @prove a theorem about the convergence of the objective value, possibly under some additional assumptions?


\(\begin{aligned} f(x^k) - f(x^\ast) &\le \nabla f(x^\ast)^\top (x^k - x^\ast) + \frac L 2 \vert \vert x^k - x^\ast \vert \vert ^2 &&(1\star) \\\\ &\le \frac{LC^2}{2} \beta^{2k} &&(2\star) \\\\ &\approx \frac{LC^2}{2} \left( 1 - 2 \sqrt{\frac \gamma L} \right)^{2k} &&(3\star) \end{aligned}\) where:

  • $(1\star)$ follows from the fact that $f$ is $L$-smooth
  • $(2\star)$ follows since $\nabla f(x^\ast) = 0$ and from the bound $\vert \vert x^k - x^\ast \vert \vert \le C\beta^k$ above
  • $(3\star)$ follows from the approximation $\beta \approx 1 - 2\sqrt{\gamma / L}$, valid when $L \gg \gamma$
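The approximation used in $(3\star)$ can be made explicit by expanding $\beta$ in powers of $\sqrt{\gamma / L}$:

\(\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} = \frac{1 - \sqrt{\gamma / L}}{1 + \sqrt{\gamma / L}} = \left( 1 - \sqrt{\gamma / L} \right) \left( 1 - \sqrt{\gamma / L} + O(\gamma / L) \right) \approx 1 - 2 \sqrt{\frac \gamma L} \quad \text{when } L \gg \gamma.\)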


