Optimisation for Data Science HT25, Heavy ball method



Flashcards

Suppose we wish to minimise the convex quadratic function

\[f(x) = \frac 1 2 x^\top A x - b^\top x\]

where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$.

In this context, @define the general template for iterative updates used in the heavy ball method.


\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]

where $\alpha_k > 0$ is a step length and $\beta_k > 0$ is a momentum constant (initially, a steepest descent update is performed to avoid the need for two starting points).
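As a concrete sketch (not from the notes), this update can be implemented for the quadratic above, assuming NumPy and taking the origin as the single starting point:

```python
import numpy as np

def heavy_ball(A, b, alpha, beta, num_iters):
    """Heavy ball method for f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.

    The first update is plain steepest descent, so only one starting
    point (here the origin) is needed.
    """
    x_prev = np.zeros_like(b)
    x = x_prev - alpha * (A @ x_prev - b)  # steepest descent warm start
    for _ in range(num_iters):
        # x^{k+1} = x^k - alpha * grad f(x^k) + beta * (x^k - x^{k-1})
        x, x_prev = x - alpha * (A @ x - b) + beta * (x - x_prev), x
    return x
```

Here `alpha` and `beta` are held constant for simplicity; the theorem below gives one standard choice for them.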

Suppose we wish to minimise the convex quadratic function

\[f(x) = \frac 1 2 x^\top A x - b^\top x\]

where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form

\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]

@State a theorem about the convergence of the heavy ball method.


There exists a constant $C > 0$ such that when the heavy ball method is applied with the constant parameter choices

\[\begin{aligned} \alpha_k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta_k &= \left( \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \right)^2 \end{aligned}\]

the iterates satisfy

\[||x^k - x^\ast|| \le C\beta^k\]

for all $k$, where $\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}$.
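As a quick numerical sanity check (on a made-up diagonal instance, not from the notes), one can run the method with Polyak's classical tuning, in which the momentum constant is the square of the contraction factor $\beta = (\sqrt L - \sqrt \gamma)/(\sqrt L + \sqrt \gamma)$, and compare the observed per-iteration error decay with $\beta$:

```python
import numpy as np

# Toy diagonal quadratic with eigenvalues in [gamma, L] (an assumed example).
gamma, L = 1.0, 100.0
diag = np.linspace(gamma, L, 8)
A, b = np.diag(diag), np.ones(8)
x_star = b / diag                        # minimiser solves A x = b

alpha = 4.0 / (np.sqrt(L) + np.sqrt(gamma)) ** 2
beta = (np.sqrt(L) - np.sqrt(gamma)) / (np.sqrt(L) + np.sqrt(gamma))  # predicted rate

x_prev = np.zeros(8)
x = x_prev - alpha * (A @ x_prev - b)    # steepest descent first step
errors = [np.linalg.norm(x - x_star)]
for _ in range(120):
    # momentum constant is beta**2 under Polyak's tuning
    x, x_prev = x - alpha * (A @ x - b) + beta**2 * (x - x_prev), x
    errors.append(np.linalg.norm(x - x_star))

# Empirical per-iteration contraction between iterations 20 and 120.
rate = (errors[120] / errors[20]) ** (1 / 100)
print(f"predicted {beta:.3f}, observed {rate:.3f}")
```

The observed geometric-mean contraction should land close to $\beta$ (the decay is linear but typically non-monotone, since the per-mode iteration has complex eigenvalues).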

Suppose we wish to minimise the convex quadratic function

\[f(x) = \frac 1 2 x^\top A x - b^\top x\]

where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form

\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]

There is a result that says there is some constant $C > 0$ such that when the heavy ball method is applied with the constant parameter choices

\[\begin{aligned} \alpha_k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta_k &= \left( \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \right)^2 \end{aligned}\]

the iterates satisfy

\[||x^k - x^\ast|| \le C\beta^k\]

for all $k$, where $\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}$. This gives a result about the convergence of $x^k$ to $x^\ast$. Can you @state and @prove a theorem about the convergence of the objective value, possibly under some additional assumptions?


\[\begin{aligned} f(x^k) - f(x^\ast) &\le \nabla f(x^\ast)^\top (x^k - x^\ast) + \frac L 2 ||x^k - x^\ast||^2 &&(1\star) \\\\ &\le \frac{LC^2}{2} \beta^{2k} &&(2\star) \\\\ &\approx \frac{LC^2}{2} \left( 1 - 2 \sqrt{\frac \gamma L} \right)^{2k} &&(3\star) \end{aligned}\]

where:

  • $(1\star)$ follows from the fact $f$ is $L$-smooth
  • $(2\star)$ follows from the fact that $\nabla f(x^\ast) = 0$ and by the above result
  • $(3\star)$ follows from an approximation when $L \gg \gamma$
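The approximation in $(3\star)$ can be made explicit: dividing the numerator and denominator of $\beta$ by $\sqrt L$ and expanding in $u = \sqrt{\gamma / L}$,

\[\beta = \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} = \frac{1 - \sqrt{\gamma / L}}{1 + \sqrt{\gamma / L}} = 1 - 2\sqrt{\frac \gamma L} + O\!\left( \frac \gamma L \right)\]

so when $L \gg \gamma$ the higher-order term is negligible and $\beta \approx 1 - 2\sqrt{\gamma / L}$.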
