Notes - Optimisation for Data Science HT25, Accelerated methods



Flashcards

Heavy ball method

Suppose we wish to minimise the convex quadratic function

\[f(x) = \frac 1 2 x^\top A x - b^\top x\]

where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$.

In this context, @define the general template for iterative updates used in the heavy ball method.


\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]

where $\beta _ k > 0$ is some constant (and initially, a steepest descent update is performed to avoid the need for two starting points).
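As an illustration, here is a minimal NumPy sketch of the heavy ball iteration on this quadratic, using the constant step lengths from the result below; `A`, `b`, and `x0` are illustrative placeholders:

```python
import numpy as np

def heavy_ball(A, b, x0, num_iters=200):
    """Minimise f(x) = 0.5 x^T A x - b^T x with the heavy ball method."""
    gamma, L = np.linalg.eigvalsh(A)[[0, -1]]  # smallest/largest eigenvalues
    alpha = 4 / (np.sqrt(L) + np.sqrt(gamma)) ** 2
    beta = (np.sqrt(L) - np.sqrt(gamma)) / (np.sqrt(L) + np.sqrt(gamma))

    grad = lambda x: A @ x - b
    x_prev = x0
    x = x0 - alpha * grad(x0)  # first iterate: a plain steepest descent step
                               # (the step length choice here is illustrative)
    for _ in range(num_iters):
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x
```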

Suppose we wish to minimise the convex quadratic function

\[f(x) = \frac 1 2 x^\top A x - b^\top x\]

where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form

\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]

@State a theorem about the convergence of the heavy ball method.


There exists a constant $C > 0$ such that the heavy ball method, applied with constant step lengths

\[\begin{aligned} \alpha_k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta_k &= \beta := \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \end{aligned}\]

produces iterates satisfying

\[||x^k - x^\ast|| \le C\beta^k\]

for all $k$.

Suppose we wish to minimise the convex quadratic function

\[f(x) = \frac 1 2 x^\top A x - b^\top x\]

where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form

\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]

There is a result that says there is some constant $C > 0$ such that the heavy ball method, applied with constant step lengths

\[\begin{aligned} \alpha_k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta_k &= \beta := \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \end{aligned}\]

produces iterates satisfying

\[||x^k - x^\ast|| \le C\beta^k\]

for all $k$. This gives a result about the convergence of $x^k$ to $x^\ast$. Can you @State and @prove a theorem about the convergence of the objective value, possibly under some additional assumptions?


\[\begin{aligned} f(x^k) - f(x^\ast) &\le \langle \nabla f(x^\ast), x^k - x^\ast \rangle + \frac L 2 ||x^k - x^\ast||^2 &&(1\star) \\\\ &\le \frac{LC^2}{2} \beta^{2k} &&(2\star) \\\\ &\approx \frac{LC^2}{2} \left( 1 - 2 \sqrt{\frac \gamma L} \right)^{2k} &&(3\star) \end{aligned}\]

where:

  • $(1\star)$ follows from the fact $f$ is $L$-smooth
  • $(2\star)$ follows from the fact that $\nabla f(x^\ast) = 0$ and by the above result
  • $(3\star)$ follows from an approximation when $L \gg \gamma$
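As a quick numerical sanity check (reusing the `heavy_ball` sketch from above; the random quadratic is purely illustrative), the objective gap should decay geometrically:

```python
import numpy as np

# Illustrative check: on a positive definite quadratic, f(x^k) - f(x*)
# should decay roughly like beta^(2k), as in the bound above.
rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 50))
A = Q @ Q.T + 0.01 * np.eye(50)   # symmetric positive definite
b = rng.standard_normal(50)
x_star = np.linalg.solve(A, b)    # exact minimiser: A x = b
f = lambda x: 0.5 * x @ A @ x - b @ x

x = heavy_ball(A, b, x0=np.zeros(50), num_iters=500)
print(f(x) - f(x_star))           # should be a tiny positive number
```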

Nesterov acceleration for $L$-smooth $\gamma$-strongly convex functions

Suppose we wish to minimise a strongly convex objective $f(x)$

\[\min_{x \in \mathbb R^n} f(x)\]

In this context, @define the general template of the iterative updates used in Nesterov’s accelerated gradient method.


\[x^{k+1} = x^k - \alpha_k \nabla f(x^k + \beta_k (x^k - x^{k-1})) + \beta_k(x^k - x^{k-1})\]

Suppose we wish to minimise a $\gamma$-strongly convex and $L$-smooth objective $f(x)$

\[\min_{x \in \mathbb R^n} f(x)\]

In this context, Nesterov’s accelerated gradient method uses updates of the form

\[x^{k+1} = x^k - \alpha_k \nabla f(x^k + \beta_k (x^k - x^{k-1})) + \beta_k(x^k - x^{k-1})\]

In this context, @state a result about the convergence of $f(x^k)$.


Nesterov’s accelerated gradient method with constant step lengths

\[\begin{aligned} \alpha_k &= \frac 1 L \\\\ \beta_k &= \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \end{aligned}\]

produces iterates satisfying

\[f(x^k) - f(x^\ast) \le \frac{L + \gamma}{2} ||x^0 - x^\ast||^2 \times \left( 1 - \sqrt{\frac \gamma L} \right)^k\]

for all $k \ge 1$.
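A minimal sketch of this constant-step variant in NumPy, assuming $\gamma$ and $L$ are known; the gradient is passed in as a placeholder:

```python
import numpy as np

def nesterov_strongly_convex(grad, x0, gamma, L, num_iters=500):
    """Nesterov's method with the constant step lengths above."""
    alpha = 1 / L
    beta = (np.sqrt(L) - np.sqrt(gamma)) / (np.sqrt(L) + np.sqrt(gamma))
    x_prev, x = x0, x0
    for _ in range(num_iters):
        y = x + beta * (x - x_prev)          # extrapolation point
        x, x_prev = y - alpha * grad(y), x   # gradient step at y
    return x
```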

Nesterov acceleration for $L$-smooth convex functions

@State the @algorithm corresponding to Nesterov acceleration for the minimisation of an $L$-smooth convex function $f$ at start point $y^0$.


  • Initialisation:
    • Find $z \in \mathbb R^n$ with $z \ne y^0$ and set $\alpha _ {-1} := \frac{ \vert \vert y^0 - z \vert \vert }{ \vert \vert \nabla f(y^0) - \nabla f(z) \vert \vert }$
    • $k := 0$
    • $\lambda _ 0 := 1$
    • $x^{-1} := y^0$
  • Main body: While $ \vert \vert x^k - x^{k-1} \vert \vert > \text{tol}$:
    • Find the minimal $i \in \mathbb N$ such that
    • $f(y^k - 2^{-i} \alpha _ {k-1} \nabla f(y^k)) \le f(y^k) - 2^{-(i+1)} \alpha _ {k-1} \vert \vert \nabla f(y^k) \vert \vert ^2$
    • $\alpha _ k := 2^{-i} \alpha _ {k-1}$
    • $x^k := y^k - \alpha _ k \nabla f(y^k)$
    • $\lambda _ {k+1} := \frac{1}{2} \left( 1 + \sqrt{4 \lambda _ k^2 + 1} \right)$
    • $y^{k+1} := x^k + \frac{\lambda _ k - 1}{\lambda _ {k+1} }(x^k - x^{k-1})$
    • $k := k + 1$

Intuitively, this algorithm produces two sequences of iterates, $x^k$ and $y^k$. The points $x^k$ are obtained as the minimisers of the quadratic upper bound $m _ {k, \alpha}^u(y) = f(y^k) + \langle \nabla f(y^k), y - y^k \rangle + \frac{\alpha^{-1} }{2} \vert \vert y - y^k \vert \vert ^2$ (justified below), and the $y^k$ are approximate second order corrections to the $x^k$.
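A runnable sketch of this procedure, with some hedges: the auxiliary point `z` used to initialise $\alpha _ {-1}$ and the values of `tol` and `max_iters` are illustrative choices, not part of the algorithm as stated:

```python
import numpy as np

def nesterov_backtracking(f, grad, y0, tol=1e-8, max_iters=10_000):
    """Nesterov acceleration with backtracking for an L-smooth convex f."""
    # Initialisation: estimate alpha_{-1} from an auxiliary point z != y0
    # (any z with grad(z) != grad(y0) works; this choice is illustrative).
    z = y0 + np.ones_like(y0)
    alpha = np.linalg.norm(y0 - z) / np.linalg.norm(grad(y0) - grad(z))
    lam = 1.0            # lambda_0
    x_prev, y = y0, y0   # x^{-1} := y^0

    for _ in range(max_iters):
        # Backtracking: halve alpha until the sufficient decrease condition
        # f(y - alpha g) <= f(y) - (alpha / 2) ||g||^2 holds.
        g = grad(y)
        while f(y - alpha * g) > f(y) - 0.5 * alpha * (g @ g):
            alpha /= 2
        x = y - alpha * g                               # gradient step at y
        lam_next = 0.5 * (1 + np.sqrt(4 * lam**2 + 1))  # lambda update
        y = x + ((lam - 1) / lam_next) * (x - x_prev)   # momentum step
        if np.linalg.norm(x - x_prev) <= tol:
            return x
        x_prev, lam = x, lam_next
    return x
```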

Nesterov’s acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $y^0$ proceeds as follows:

  • Initialisation:
    • Find $z \in \mathbb R^n$ with $z \ne y^0$ and set $\alpha _ {-1} := \frac{ \vert \vert y^0 - z \vert \vert }{ \vert \vert \nabla f(y^0) - \nabla f(z) \vert \vert }$
    • $k := 0$
    • $\lambda _ 0 := 1$
    • $x^{-1} := y^0$
  • Main body: While $ \vert \vert x^k - x^{k-1} \vert \vert > \text{tol}$:
    • Find the minimal $i \in \mathbb N$ such that
    • $f(y^k - 2^{-i} \alpha _ {k-1} \nabla f(y^k)) \le f(y^k) - 2^{-(i+1)} \alpha _ {k-1} \vert \vert \nabla f(y^k) \vert \vert ^2$
    • $\alpha _ k := 2^{-i} \alpha _ {k-1}$
    • $x^k := y^k - \alpha _ k \nabla f(y^k)$
    • $\lambda _ {k+1} := \frac{1}{2} \left( 1 + \sqrt{4 \lambda _ k^2 + 1} \right)$
    • $y^{k+1} := x^k + \frac{\lambda _ k - 1}{\lambda _ {k+1} }(x^k - x^{k-1})$
    • $k := k + 1$

@Prove that $\alpha _ k$ is non-increasing and that it stops decreasing once $\alpha _ k \le L^{-1}$, and then motivate this process.


The sequence is non-increasing because of the step $\alpha _ k := 2^{-i} \alpha _ {k-1}$ with $i \ge 0$. Once $\alpha _ {k-1} \le L^{-1}$, $L$-smoothness gives

\[\begin{aligned} f(y^k - \alpha_{k-1} \nabla f(y^k)) &\le f(y^k) + \langle \nabla f(y^k), -\alpha_{k-1} \nabla f(y^k)\rangle + \frac L 2 ||\alpha_{k-1} \nabla f(y^k)||^2 \\\\ &\le f(y^k) - \frac{\alpha_{k-1} }{2} ||\nabla f(y^k)||^2 \end{aligned}\]

Therefore the sufficient decrease condition

\[f(y^k - 2^{-i} \alpha_{k-1} \nabla f(y^k)) \le f(y^k) - 2^{-(i+1)} \alpha_{k-1} ||\nabla f(y^k)||^2\]

is satisfied with $i = 0$, and hence $\alpha _ k = \alpha _ {k-1}$: the step length stops decreasing.

If $L$ were known in advance, $\alpha _ k$ could simply be fixed at the optimal value $\alpha = \frac 1 L$; instead, this algorithm learns the value of $L$ over time. The condition $\alpha \le L^{-1}$ is exactly what is needed to ensure that

\[m_{k, \alpha}^u (y) = f(y^k) + \langle \nabla f(y^k), y - y^k \rangle + \frac{\alpha^{-1} }{2} ||y - y^k||^2\]

is an upper bound on $f(y)$.
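Explicitly, $L$-smoothness gives, for all $y$,

\[f(y) \le f(y^k) + \langle \nabla f(y^k), y - y^k \rangle + \frac L 2 ||y - y^k||^2 \le m_{k, \alpha}^u(y) \quad \text{whenever } \alpha^{-1} \ge L\]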

Nesterov’s acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $y^0$ proceeds as follows:

  • Initialisation:
    • Find $z \in \mathbb R^n$ with $z \ne y^0$ and set $\alpha _ {-1} := \frac{ \vert \vert y^0 - z \vert \vert }{ \vert \vert \nabla f(y^0) - \nabla f(z) \vert \vert }$
    • $k := 0$
    • $\lambda _ 0 := 1$
    • $x^{-1} := y^0$
  • Main body: While $ \vert \vert x^k - x^{k-1} \vert \vert > \text{tol}$:
    • Find the minimal $i \in \mathbb N$ such that
    • $f(y^k - 2^{-i} \alpha _ {k-1} \nabla f(y^k)) \le f(y^k) - 2^{-(i+1)} \alpha _ {k-1} \vert \vert \nabla f(y^k) \vert \vert ^2$
    • $\alpha _ k := 2^{-i} \alpha _ {k-1}$
    • $x^k := y^k - \alpha _ k \nabla f(y^k)$
    • $\lambda _ {k+1} := \frac{1}{2} \left( 1 + \sqrt{4 \lambda _ k^2 + 1} \right)$
    • $y^{k+1} := x^k + \frac{\lambda _ k - 1}{\lambda _ {k+1} }(x^k - x^{k-1})$
    • $k := k + 1$

Can you @justify the update for $x^k$:

  • $x^k := y^k - \alpha _ k \nabla f(y^k)$

$x^k$ is the global minimiser of the function

\[m_{k, \alpha}^u (y) = f(y^k) + \langle \nabla f(y^k), y - y^k \rangle + \frac{\alpha^{-1} }{2} ||y - y^k||^2\]

which is an upper bound on $f$ given that $\alpha _ k \le L^{-1}$.
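To see this, note that $m _ {k, \alpha _ k}^u$ is a strictly convex quadratic in $y$, so its global minimiser is found by setting the gradient to zero:

\[\nabla m_{k, \alpha_k}^u (y) = \nabla f(y^k) + \alpha_k^{-1}(y - y^k) = 0 \iff y = y^k - \alpha_k \nabla f(y^k) = x^k\]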

Nesterov’s acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $y^0$ proceeds as follows:

  • Initialisation:
    • Find $z \in \mathbb R^n$ with $z \ne y^0$ and set $\alpha _ {-1} := \frac{ \vert \vert y^0 - z \vert \vert }{ \vert \vert \nabla f(y^0) - \nabla f(z) \vert \vert }$
    • $k := 0$
    • $\lambda _ 0 := 1$
    • $x^{-1} := y^0$
  • Main body: While $ \vert \vert x^k - x^{k-1} \vert \vert > \text{tol}$:
    • Find the minimal $i \in \mathbb N$ such that
    • $f(y^k - 2^{-i} \alpha _ {k-1} \nabla f(y^k)) \le f(y^k) - 2^{-(i+1)} \alpha _ {k-1} \vert \vert \nabla f(y^k) \vert \vert ^2$
    • $\alpha _ k := 2^{-i} \alpha _ {k-1}$
    • $x^k := y^k - \alpha _ k \nabla f(y^k)$
    • $\lambda _ {k+1} := \frac{1}{2} \left( 1 + \sqrt{4 \lambda _ k^2 + 1} \right)$
    • $y^{k+1} := x^k + \frac{\lambda _ k - 1}{\lambda _ {k+1} }(x^k - x^{k-1})$
    • $k := k + 1$

@State a result related to its convergence.


Suppose:

  • $f : \mathbb R^n \to \mathbb R$ is a convex $L$-smooth function
  • $f$ has at least one finite minimiser $x^\ast \in \text{argmin } f(x)$
  • $f$ has a finite minimum $f^\ast = \min f(x)$

Then:

  • The sequence of iterates $(x^k) _ {k \in \mathbb N}$ satisfies $f(x^k) \le f^\ast + \frac{4L \vert \vert x^0 - x^\ast \vert \vert ^2}{(k+2)^2}$ for all $k \in \mathbb N$, so
  • $x^k$ is $\varepsilon$-optimal for all $k \ge N(\varepsilon) = \left\lfloor \frac{2\sqrt L \times \vert \vert x^0 - x^\ast \vert \vert }{\sqrt \varepsilon} \right\rfloor - 1$
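For a feel for the numbers, here is a small helper implementing the bound $N(\varepsilon)$ (the values plugged in are illustrative):

```python
import math

def n_eps(L, dist0, eps):
    """Iterations after which x^k is guaranteed eps-optimal."""
    return math.floor(2 * math.sqrt(L) * dist0 / math.sqrt(eps)) - 1

print(n_eps(L=100.0, dist0=1.0, eps=1e-6))  # 19999
```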

Nesterov’s acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $y^0$ proceeds as follows:

  • Initialisation:
    • Find $z \in \mathbb R^n$ with $z \ne y^0$ and set $\alpha _ {-1} := \frac{ \vert \vert y^0 - z \vert \vert }{ \vert \vert \nabla f(y^0) - \nabla f(z) \vert \vert }$
    • $k := 0$
    • $\lambda _ 0 := 1$
    • $x^{-1} := y^0$
  • Main body: While $ \vert \vert x^k - x^{k-1} \vert \vert > \text{tol}$:
    • Find the minimal $i \in \mathbb N$ such that
    • $f(y^k - 2^{-i} \alpha _ {k-1} \nabla f(y^k)) \le f(y^k) - 2^{-(i+1)} \alpha _ {k-1} \vert \vert \nabla f(y^k) \vert \vert ^2$
    • $\alpha _ k := 2^{-i} \alpha _ {k-1}$
    • $x^k := y^k - \alpha _ k \nabla f(y^k)$
    • $\lambda _ {k+1} := \frac{1}{2} \left( 1 + \sqrt{4 \lambda _ k^2 + 1} \right)$
    • $y^{k+1} := x^k + \frac{\lambda _ k - 1}{\lambda _ {k+1} }(x^k - x^{k-1})$
    • $k := k + 1$

@Prove that if

  • $f : \mathbb R^n \to \mathbb R$ is a convex $L$-smooth function
  • $f$ has at least one finite minimiser $x^\ast \in \text{argmin } f(x)$
  • $f$ has a finite minimum $f^\ast = \min f(x)$

then:

  • The sequence of iterates $(x^k) _ {k \in \mathbb N}$ satisfies $f(x^k) \le f^\ast + \frac{4L \vert \vert x^0 - x^\ast \vert \vert ^2}{(k+2)^2}$ for all $k \in \mathbb N$, so
  • $x^k$ is $\varepsilon$-optimal for all $k \ge N(\varepsilon) = \left\lfloor \frac{2\sqrt L \times \vert \vert x^0 - x^\ast \vert \vert }{\sqrt \varepsilon} \right\rfloor - 1$

It is easiest to analyse the algorithm in terms of

\[p^k = (\lambda_k - 1)(x^{k-1} - x^k)\]

We have:

\[y^{k+1} = x^k + \frac{\lambda_k - 1}{\lambda_{k+1} }(x^k - x^{k-1})\]

and

\[\begin{aligned} x^{k+1} &= y^{k+1} - \alpha_{k+1} \nabla f(y^{k+1}) \\\\ &= x^k + \frac{\lambda_k - 1}{\lambda_{k+1} } (x^k - x^{k-1}) - \alpha_{k+1} \nabla f(y^{k+1}) \end{aligned}\]

Hence by considering $p^k$, we have

\[\begin{aligned} p^{k+1} - x^{k+1} &= (\lambda_{k+1} - 1)(x^k - x^{k+1}) - x^{k+1} \\\\ &= \lambda_{k+1} (x^k - x^{k+1}) - x^k \\\\ &= (1 - \lambda_k) (x^k - x^{k-1}) - x^k + \lambda_{k+1} \alpha_{k+1} \nabla f(y^{k+1}) &&(1\star) \\\\ &= p^k - x^k + \lambda_{k+1} \alpha_{k+1} \nabla f(y^{k+1}) \end{aligned}\]

where $(1\star)$ follows from the expression for $x^{k+1}$ above. Hence

\[\begin{aligned} &||p^{k+1} - x^{k+1} + x^\ast||^2\\\\ =&||p^k - x^k + x^\ast||^2 + \lambda_{k+1}^2 \alpha_{k+1}^2 ||\nabla f(y^{k+1})||^2 + 2\lambda_{k+1} \alpha_{k+1} \langle \nabla f(y^{k+1}), p^k - x^k + x^\ast \rangle \\\\ =&||p^k - x^k + x^\ast||^2 + \lambda^2_{k+1} \alpha_{k+1}^2 ||\nabla f(y^{k+1})||^2 + 2(\lambda_{k+1} - 1) \alpha_{k+1} \langle \nabla f(y^{k+1}), p^k\rangle + 2\lambda_{k+1} \alpha_{k+1} \langle \nabla f(y^{k+1}), x^\ast - x^k + \lambda^{-1}_{k+1} p^k \rangle \\\\ =&||p^k - x^k + x^\ast||^2 + \lambda_{k+1}^2 \alpha_{k+1}^2 ||\nabla f(y^{k+1})||^2 + 2(\lambda_{k+1} - 1)\alpha_{k+1} \langle \nabla f(y^{k+1}), p^k \rangle + 2 \lambda_{k+1} \alpha_{k+1} \langle \nabla f(y^{k+1}), x^\ast - y^{k+1} \rangle \end{aligned}\]

where the last equality follows from the identity

\[\begin{aligned} y^{k+1} &= x^k + \frac{\lambda_k - 1}{\lambda_{k+1} } (x^k - x^{k-1} ) \\\\ &= x^k - \frac{1}{\lambda_{k+1} } p^k \end{aligned}\]

Now we wish to derive bounds on the last two terms on the right hand side of the big equation.

To bound $2\lambda _ {k+1} \alpha _ {k+1} \langle \nabla f(y^{k+1}), x^\ast - y^{k+1} \rangle$ (the second of the terms), note that by the convexity of $f$, we have

\[\langle \nabla f(y^{k+1}), x^\ast - y^{k+1}\rangle \le f^\ast - f(y^{k+1})\]

The sufficient decrease condition yields that

\[f(y^{k+1}) - f(x^{k+1}) \ge \frac{\alpha_{k+1} }{2} ||\nabla f(y^{k+1})||^2\]

and so substituting this into the convexity bound, we have that

\[\langle \nabla f(y^{k+1}), x^\ast - y^{k+1}\rangle \le -(f(x^{k+1}) - f^\ast) - \frac{\alpha_{k+1}}{2} ||\nabla f(y^{k+1})||^2\]

To bound $2(\lambda _ {k+1} - 1) \alpha _ {k+1} \langle \nabla f(y^{k+1}), p^k\rangle$ (the first of the terms), note that by the definition of $y^{k+1}$, we have that

\[\begin{aligned} y^{k+1} &= x^k + \frac{\lambda_k - 1}{\lambda_{k+1}} (x^k - x^{k-1}) \\\\ &= x^k - \lambda^{-1}_{k+1} p^k \end{aligned}\]

And then convexity at $x^k$ implies that

\[f(y^{k+1}) + \lambda^{-1}_{k+1} \langle \nabla f(y^{k+1}), p^k\rangle \le f(x^k)\]

so by $f(y^{k+1}) - f(x^{k+1}) \ge \frac{\alpha _ {k+1} }{2} \vert \vert \nabla f(y^{k+1}) \vert \vert ^2$ above, we see that

\[\frac{\alpha_{k+1}}{2} ||\nabla f(y^{k+1})||^2 \le f(x^k) - f(x^{k+1}) - \lambda^{-1}_{k+1} \langle \nabla f(y^{k+1}), p^k\rangle\]

Combining everything so far, we have that

\[\begin{aligned} &||p^{k+1} - x^{k+1} + x^\ast||^2 - ||p^k - x^k + x^\ast||^2 \\\\ \le&2(\lambda_{k+1} - 1) \alpha_{k+1} \langle \nabla f(y^{k+1}), p^k \rangle \\\\ & - 2\lambda_{k+1} \alpha_{k+1} (f(x^{k+1}) - f^\ast) \\\\ & + (\lambda_{k+1}^2 - \lambda_{k+1}) \alpha_{k+1}^2 ||\nabla f(y^{k+1})||^2 \\\\ \le& {-2}\lambda_{k+1} \alpha_{k+1} (f(x^{k+1}) - f^\ast) + 2(\lambda_{k+1}^2 - \lambda_{k+1}) \alpha_{k+1} (f(x^k) - f(x^{k+1})) \\\\ =&2\alpha_{k+1} \lambda^2_k (f(x^k) - f^\ast) - 2\alpha_{k+1} \lambda^2_{k+1} (f(x^{k+1}) - f^\ast) \\\\ \le&2\alpha_k \lambda_k^2 (f(x^k) - f^\ast) - 2\alpha_{k+1} \lambda_{k+1}^2 (f(x^{k+1}) - f^\ast) \end{aligned}\]

Here: the first inequality applies the expansion of $||p^{k+1} - x^{k+1} + x^\ast||^2$ together with the bound on $\langle \nabla f(y^{k+1}), x^\ast - y^{k+1}\rangle$; the second inequality applies the bound on $\langle \nabla f(y^{k+1}), p^k \rangle$, whose $||\nabla f(y^{k+1})||^2$ contribution exactly cancels the remaining gradient term; the equality uses $\lambda_{k+1}^2 - \lambda_{k+1} = \lambda_k^2$, obtained by rearranging the update $\lambda _ {k+1} = \frac 1 2 (1 + \sqrt{4\lambda _ k^2 + 1})$; and the final inequality uses $\alpha _ {k+1} \le \alpha _ k$ and $f(x^k) \ge f^\ast$.

Applying this inequality iteratively, we find

\[\begin{aligned} 2 \alpha_{k+1} \lambda^2_{k+1} (f(x^{k+1}) - f^\ast) &\le 2\alpha_{k+1} \lambda_{k+1}^2 (f(x^{k+1}) - f^\ast) + ||p^{k+1} - x^{k+1} + x^\ast||^2 \\\\ &\le 2\alpha_k \lambda_k^2 (f(x^k) - f^\ast) + ||p^k - x^k + x^\ast||^2 \\\\ &\le \cdots \\\\ &\le 2\alpha_0 \lambda_0^2 (f(x^0) - f^\ast) + ||p^0 - x^0 + x^\ast||^2 \\\\ &\le ||y^0 - x^\ast||^2 \end{aligned}\]

where the last inequality follows from $\lambda _ 0 = 1$, $p^0 = (\lambda _ 0 - 1)(x^{-1} - x^0) = 0$, and

\[\begin{aligned} ||p^0 - x^0 + x^\ast||^2 &= ||x^0 - x^\ast||^2 \\\\ &= ||y^0 - \alpha_0 \nabla f(y^0) - x^\ast||^2 \\\\ &= ||y^0 - x^\ast||^2 + \alpha_0^2 ||\nabla f(y^0)||^2 - 2 \alpha_0 \langle \nabla f(y^0), y^0 - x^\ast \rangle \\\\ &\le ||y^0 - x^\ast||^2 + \alpha_0^2 ||\nabla f(y^0)||^2 - 2\alpha_0 \left( f(x^0) - f^\ast + \frac{\alpha_0}{2} ||\nabla f(y^0)||^2 \right) \\\\ &= ||y^0 - x^\ast||^2 - 2\alpha_0 \lambda_0^2 (f(x^0) - f^\ast) \end{aligned}\]

where the inequality step uses $\langle \nabla f(y^0), y^0 - x^\ast \rangle \ge f(y^0) - f^\ast \ge f(x^0) - f^\ast + \frac{\alpha_0}{2} \vert \vert \nabla f(y^0) \vert \vert ^2$ (convexity, then the sufficient decrease condition).

Since $\lambda _ {k+1} \ge \lambda _ k + \frac 1 2$, induction from $\lambda _ 0 = 1$ gives $\lambda _ {k+1} \ge 1 + \frac{k+1}{2}$, and the backtracking halves the step length at most once past $L^{-1}$, so $\alpha _ {k+1} \ge \frac{1}{2L}$ (provided $\alpha _ {-1}$ is not already smaller). Hence

\[2 \alpha_{k+1} \left(1 + \frac{k+1}{2}\right)^2 (f(x^{k+1}) - f^\ast) \le ||y^0 - x^\ast||^2\]

and so

\[f(x^{k+1}) - f^\ast \le \frac{4L||y^0 - x^\ast||^2}{(k+3)^2}\]

as required.
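As an optional numerical check (reusing the `nesterov_backtracking` sketch from above on an illustrative least-squares objective, which is convex and $L$-smooth):

```python
import numpy as np

# Illustrative check of the O(1/k^2) rate on a least-squares problem.
rng = np.random.default_rng(1)
M = rng.standard_normal((100, 40))
b = rng.standard_normal(100)
f = lambda x: 0.5 * np.sum((M @ x - b) ** 2)
grad = lambda x: M.T @ (M @ x - b)

x = nesterov_backtracking(f, grad, y0=np.zeros(40))
x_star, *_ = np.linalg.lstsq(M, b, rcond=None)
print(f(x) - f(x_star))  # should be near zero
```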



