Optimisation for Data Science HT25, Accelerated methods
Flashcards
Suppose we wish to minimise the convex quadratic function
\(f(x) = \frac 1 2 x^\top A x - b^\top x\)
where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$.
In this context, @define the general template for iterative updates used in the heavy ball method.
\(x^{k+1} = x^k - \alpha _ k \nabla f(x^k) + \beta _ k (x^k - x^{k-1})\) where $\alpha _ k > 0$ is a step length and $\beta _ k > 0$ is a momentum constant (and initially, a single steepest descent update is performed so that two starting points are not needed).
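As a concrete illustration, here is a minimal sketch of this template in Python; the quadratic $A$, $b$, the step lengths, and the starting point are illustrative choices, not part of the notes:

```python
import numpy as np

# Minimal sketch of the heavy ball template on the convex quadratic
# f(x) = 0.5 x^T A x - b^T x; the matrix A, vector b, step lengths and
# starting point below are illustrative choices.
def heavy_ball(A, b, x0, alpha, beta, iters=500):
    grad = lambda x: A @ x - b              # gradient of the quadratic
    x_prev = x0.copy()
    x = x0 - alpha * grad(x0)               # initial steepest descent step
    for _ in range(iters):
        # x^{k+1} = x^k - alpha * grad f(x^k) + beta * (x^k - x^{k-1})
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x

A = np.diag([1.0, 10.0])                    # eigenvalues in [gamma, L] = [1, 10]
b = np.array([1.0, 1.0])
gamma, L = 1.0, 10.0
alpha = 4.0 / (np.sqrt(L) + np.sqrt(gamma)) ** 2
beta = (np.sqrt(L) - np.sqrt(gamma)) / (np.sqrt(L) + np.sqrt(gamma))
x_star = np.linalg.solve(A, b)              # exact minimiser solves A x = b
x_final = heavy_ball(A, b, np.zeros(2), alpha, beta)
```

The constant step lengths used here anticipate the choices in the convergence theorem below; any sufficiently small $\alpha$ and $\beta \in (0, 1)$ would also converge on this problem.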
Suppose we wish to minimise the convex quadratic function
\(f(x) = \frac 1 2 x^\top A x - b^\top x\)
where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form
\(x^{k+1} = x^k - \alpha _ k \nabla f(x^k) + \beta _ k (x^k - x^{k-1})\)
@State a theorem about the convergence of the heavy ball method.
There exists a constant $C > 0$ such that the heavy ball method, applied with constant step lengths \(\begin{aligned} \alpha _ k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta _ k &= \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \end{aligned}\) satisfies \(\vert \vert x^k - x^\ast \vert \vert \le C\beta^k\) for all $k$, where $\beta$ denotes the common value of $\beta _ k$.
Suppose we wish to minimise the convex quadratic function
\(f(x) = \frac 1 2 x^\top A x - b^\top x\)
where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form
\(x^{k+1} = x^k - \alpha _ k \nabla f(x^k) + \beta _ k (x^k - x^{k-1})\)
There is a result that says there is some constant $C > 0$ such that the heavy ball method, applied with constant step lengths
\(\begin{aligned}
\alpha _ k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\
\beta _ k &= \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}
\end{aligned}\)
satisfies
\(\vert \vert x^k - x^\ast \vert \vert \le C\beta^k\)
for all $k$, where $\beta$ denotes the common value of $\beta _ k$. This gives a result about the convergence of $x^k$ to $x^\ast$. Can you @state and @prove a theorem about the convergence of the objective value, possibly under some additional assumptions?
\(\begin{aligned} f(x^k) - f(x^\ast) &\le \nabla f(x^\ast)^\top (x^k - x^\ast) + \frac L 2 \vert \vert x^k - x^\ast \vert \vert ^2 &&(1\star) \\\\ &\le \frac{LC^2}{2} \beta^{2k} &&(2\star) \\\\ &\approx \frac{LC^2}{2} \left( 1 - 2 \sqrt{\frac \gamma L} \right)^{2k} &&(3\star) \end{aligned}\) where:
- $(1\star)$ follows from the fact that $f$ is $L$-smooth (the eigenvalues of $A$ are bounded above by $L$)
- $(2\star)$ follows from the fact that $\nabla f(x^\ast) = 0$, together with the bound $\vert \vert x^k - x^\ast \vert \vert \le C \beta^k$ above
- $(3\star)$ follows from the approximation \(\beta = \frac{1 - \sqrt{\gamma / L}}{1 + \sqrt{\gamma / L}} \approx 1 - 2\sqrt{\frac \gamma L}\) when $L \gg \gamma$
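The geometric decay of the objective gap can be checked numerically. The following sketch runs the heavy ball method with the constant step lengths above on illustrative problem data (the matrix, vector, and iteration count are assumptions, not from the notes) and records $f(x^k) - f(x^\ast)$:

```python
import numpy as np

# Sketch of a numerical check that the objective gap f(x^k) - f(x*)
# decays geometrically under the constant step lengths above.
# The problem data here are illustrative.
A = np.diag([1.0, 100.0])                   # eigenvalues in [gamma, L] = [1, 100]
b = np.array([1.0, 1.0])
gamma, L = 1.0, 100.0
alpha = 4.0 / (np.sqrt(L) + np.sqrt(gamma)) ** 2
beta = (np.sqrt(L) - np.sqrt(gamma)) / (np.sqrt(L) + np.sqrt(gamma))

f = lambda x: 0.5 * x @ A @ x - b @ x
x_star = np.linalg.solve(A, b)              # exact minimiser solves A x = b
f_star = f(x_star)

x_prev = np.zeros(2)
x = x_prev - alpha * (A @ x_prev - b)       # initial steepest descent step
gaps = [f(x) - f_star]
for _ in range(400):
    x, x_prev = x - alpha * (A @ x - b) + beta * (x - x_prev), x
    gaps.append(f(x) - f_star)
# gaps[-1] ends up many orders of magnitude below gaps[0]
```

Note that the gap need not decrease monotonically — heavy ball iterates oscillate when the recursion's characteristic roots are complex — but the overall envelope shrinks geometrically, consistent with the bound above.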