Optimisation for Data Science HT25, Heavy ball method
Flashcards
Suppose we wish to minimise the convex quadratic function
\[f(x) = \frac 1 2 x^\top A x - b^\top x\]
where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$.
In this context, @define the general template for iterative updates used in the heavy ball method.
where $\beta _ k > 0$ is some constant (and initially, a steepest descent update is performed to avoid the need for two starting points).
Suppose we wish to minimise the convex quadratic function
\[f(x) = \frac 1 2 x^\top A x - b^\top x\]
where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form
\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]
@State a theorem about the convergence of the heavy ball method.
There exists a constant $C > 0$ such that when the heavy ball method is applied with constant step lengths
\[\begin{aligned} \alpha_k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\ \beta_k &= \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma} \end{aligned}\]satisfies
\[||x^k - x^\ast|| \le C\beta^k\]for all $k$.
Suppose we wish to minimise the convex quadratic function
\[f(x) = \frac 1 2 x^\top A x - b^\top x\]
where $A$ is a symmetric matrix with eigenvalues in $[\gamma, L]$. In the heavy ball method, we perform updates of the form
\[x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})\]
There is a result that says there is some constant $C > 0$ such that when the heavy ball method is applied with constant step lengths
\[\begin{aligned}
\alpha_k &= \frac{4}{(\sqrt L + \sqrt \gamma)^2} \\\\
\beta_k &= \frac{\sqrt L - \sqrt \gamma}{\sqrt L + \sqrt \gamma}
\end{aligned}\]
satisfies
\[||x^k - x^\ast|| \le C\beta^k\]
for all $k$. This gives a result about the convergence of $x^k$ to $x^\ast$. Can you @state and @prove a theorem about the convergence of the objective value, possibly under some additional assumptions?
where:
- $(1\star)$ follows from the fact $f$ is $L$-smooth
- $(2\star)$ follows from the fact that $\nabla f(x^\ast) = 0$ and by the above result
- $(3\star)$ follows from an approximation when $L \gg \gamma$