Notes - Optimisation for Data Science HT25, Accelerated methods
Flashcards
Heavy ball method
Suppose we wish to minimise the convex quadratic function
$$ f(x) = \tfrac{1}{2}\,x^\top A x - b^\top x, $$
where $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix with eigenvalues in $[\mu, L]$, $0 < \mu \le L$.
In this context, @define the general template for iterative updates used in the heavy ball method.

$$ x^{(k+1)} = x^{(k)} - \alpha\,\nabla f\big(x^{(k)}\big) + \beta\,\big(x^{(k)} - x^{(k-1)}\big), $$
where $\alpha > 0$ is a step length, $\beta \in [0, 1)$ is a momentum parameter, and $x^{(-1)} := x^{(0)}$.
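To make the template concrete, the following is a minimal NumPy sketch of the heavy ball iteration on a randomly generated quadratic; the test problem, dimension and iteration count are illustrative choices rather than anything prescribed above.

```python
import numpy as np

# Synthetic convex quadratic f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + np.eye(50)            # symmetric, eigenvalues in some [mu, L] with mu >= 1
b = rng.standard_normal(50)

eigs = np.linalg.eigvalsh(A)        # eigenvalues in ascending order
mu, L = eigs[0], eigs[-1]

# Heavy ball template: x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}).
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_star = np.linalg.solve(A, b)      # unique minimiser of the quadratic
x_prev = np.zeros(50)
x = np.zeros(50)
for _ in range(200):
    grad = A @ x - b
    x, x_prev = x - alpha * grad + beta * (x - x_prev), x

print(np.linalg.norm(x - x_star))   # distance to the minimiser after 200 iterations
```

The step length and momentum values used here are the constant choices discussed in the next card.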
Suppose we wish to minimise the convex quadratic function $f(x) = \tfrac12 x^\top A x - b^\top x$, where $A$ is a symmetric matrix with eigenvalues in $[\mu, L]$, $0 < \mu \le L$. In the heavy ball method, we perform updates of the form
$$ x^{(k+1)} = x^{(k)} - \alpha\,\nabla f\big(x^{(k)}\big) + \beta\,\big(x^{(k)} - x^{(k-1)}\big). $$
@State a theorem about the convergence of the heavy ball method.

There exists a constant $C > 0$ such that, when the heavy ball method is applied with the constant step lengths
$$ \alpha = \frac{4}{(\sqrt{L} + \sqrt{\mu})^2}, \qquad \beta = \left(\frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}\right)^2, $$
the sequence of iterates $(x^{(k)})$ satisfies
$$ \big\|x^{(k)} - x^*\big\| \;\le\; C \left(\frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}\right)^{k} \big\|x^{(0)} - x^*\big\| $$
for all $k \ge 0$, where $x^* = A^{-1} b$ is the unique minimiser of $f$.
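For later reference, the rate can be rewritten in terms of the condition number $\kappa = L/\mu$ (plain algebra, no additional assumptions):
$$ \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}} \;=\; \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \;=\; 1 - \frac{2}{\sqrt{\kappa} + 1}. $$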
Suppose we wish to minimise the convex quadratic function $f(x) = \tfrac12 x^\top A x - b^\top x$, where $A$ is a symmetric matrix with eigenvalues in $[\mu, L]$, $0 < \mu \le L$. In the heavy ball method, we perform updates of the form
$$ x^{(k+1)} = x^{(k)} - \alpha\,\nabla f\big(x^{(k)}\big) + \beta\,\big(x^{(k)} - x^{(k-1)}\big). $$
There is a result that says there is some constant $C > 0$ such that when the heavy ball method is applied with the constant step lengths
$$ \alpha = \frac{4}{(\sqrt{L} + \sqrt{\mu})^2}, \qquad \beta = \left(\frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}\right)^2, $$
the sequence of iterates $(x^{(k)})$ satisfies
$$ \big\|x^{(k)} - x^*\big\| \;\le\; C\,\rho^{k}\,\big\|x^{(0)} - x^*\big\|, \qquad \rho = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}, \quad \kappa = \frac{L}{\mu}, $$
for all $k \ge 0$. This gives a result about the convergence of $x^{(k)}$ to $x^*$. Can you @State and @prove a theorem about the convergence of the objective value, possibly under some additional assumptions?

Under the same assumptions and with the same step lengths,
$$ f\big(x^{(k)}\big) - f(x^*) \;\overset{(i)}{\le}\; \big\langle \nabla f(x^*),\, x^{(k)} - x^* \big\rangle + \frac{L}{2}\big\|x^{(k)} - x^*\big\|^2 \;\overset{(ii)}{\le}\; \frac{L}{2}\,C^2 \rho^{2k}\,\big\|x^{(0)} - x^*\big\|^2 \;\overset{(iii)}{\approx}\; \frac{L}{2}\,C^2\,\Big(1 - \frac{4}{\sqrt{\kappa}}\Big)^{k}\,\big\|x^{(0)} - x^*\big\|^2, $$
where:
- (i) follows from the fact that $f$ is $L$-smooth,
- (ii) follows from the fact that $\nabla f(x^*) = 0$ and by the above result,
- (iii) follows from an approximation valid when $\kappa$ is large, namely $\rho^2 = \big(1 - \tfrac{2}{\sqrt{\kappa} + 1}\big)^2 \approx 1 - \tfrac{4}{\sqrt{\kappa}}$.
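As a worked consequence of this bound (assuming it, and that $\kappa > 16$ so that $1 - 4/\sqrt{\kappa} \in (0, 1)$), the linear rate translates into an iteration count of order $\sqrt{\kappa}\,\log(1/\epsilon)$:
$$ \frac{L}{2}\,C^2\,\Big(1 - \frac{4}{\sqrt{\kappa}}\Big)^{k}\,\big\|x^{(0)} - x^*\big\|^2 \;\le\; \epsilon \qquad \text{whenever} \qquad k \;\ge\; \frac{\sqrt{\kappa}}{4}\,\log\!\left(\frac{L\,C^2\,\|x^{(0)} - x^*\|^2}{2\,\epsilon}\right), $$
using $\log\big(1 - 4/\sqrt{\kappa}\big) \le -4/\sqrt{\kappa}$.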
Nesterov acceleration for $L$-smooth, $\mu$-strongly convex functions
Suppose we wish to minimise a $\mu$-strongly convex objective $f: \mathbb{R}^n \to \mathbb{R}$.
In this context, @define the general template of the iterative updates used in Nesterov's accelerated gradient method.

$$ y^{(k)} = x^{(k)} + \beta_k\,\big(x^{(k)} - x^{(k-1)}\big), \qquad x^{(k+1)} = y^{(k)} - \alpha_k\,\nabla f\big(y^{(k)}\big), $$
with $x^{(-1)} := x^{(0)}$, step lengths $\alpha_k > 0$ and momentum parameters $\beta_k \in [0, 1)$.
Suppose we wish to minimise a $\mu$-strongly convex and $L$-smooth objective $f: \mathbb{R}^n \to \mathbb{R}$. In this context, Nesterov's accelerated gradient method uses updates of the form
$$ y^{(k)} = x^{(k)} + \beta\,\big(x^{(k)} - x^{(k-1)}\big), \qquad x^{(k+1)} = y^{(k)} - \alpha\,\nabla f\big(y^{(k)}\big). $$
@State candidate constant step lengths and give a result about the convergence of $f(x^{(k)})$ in this case.

Taking the constant step lengths
$$ \alpha = \frac{1}{L}, \qquad \beta = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}, \qquad \kappa = \frac{L}{\mu}, $$
gives
$$ f\big(x^{(k)}\big) - f(x^*) \;\le\; \frac{L + \mu}{2}\,\big\|x^{(0)} - x^*\big\|^2 \left(1 - \frac{1}{\sqrt{\kappa}}\right)^{k} $$
for all $k \ge 0$, where $x^*$ is the unique minimiser of $f$.
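A minimal sketch of these updates as code, assuming the objective is supplied as a gradient oracle together with the constants $L$ and $\mu$; the quadratic test problem at the end is illustrative only.

```python
import numpy as np

def nesterov_strongly_convex(grad, x0, L, mu, num_iters=200):
    """Accelerated gradient method with the constant step lengths
    alpha = 1/L and beta = (sqrt(kappa) - 1)/(sqrt(kappa) + 1), kappa = L/mu."""
    kappa = L / mu
    alpha = 1.0 / L
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(num_iters):
        y = x + beta * (x - x_prev)           # momentum (extrapolation) step
        x_prev, x = x, y - alpha * grad(y)    # gradient step from the extrapolated point
    return x

# Illustrative use on a strongly convex quadratic f(x) = 0.5 * x^T A x - b^T x.
A = np.diag(np.linspace(1.0, 100.0, 20))      # eigenvalues in [mu, L] = [1, 100]
b = np.ones(20)
x_out = nesterov_strongly_convex(lambda x: A @ x - b, np.zeros(20), L=100.0, mu=1.0)
print(np.linalg.norm(A @ x_out - b))          # gradient norm at the final iterate
```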
Nesterov acceleration for $L$-smooth convex functions
@State the @algorithm corresponding to Nesterov acceleration for the minimisation of an $L$-smooth convex function $f$ at start point $x^{(0)}$.

- Initialisation:
  - Find an initial trial step length $\alpha_0 > 0$ and set $y^{(1)} = x^{(0)}$, $a_1 = 1$.
- Main body: While the chosen stopping criterion is not satisfied, for $k = 1, 2, \dots$:
  - Find the minimal $i \ge 0$ such that
    $$ f\big(y^{(k)}\big) - f\big(y^{(k)} - 2^{-i}\alpha_{k-1}\,\nabla f(y^{(k)})\big) \;\ge\; 2^{-i-1}\alpha_{k-1}\,\big\|\nabla f(y^{(k)})\big\|^2, $$
    and set $\alpha_k = 2^{-i}\alpha_{k-1}$ and $x^{(k)} = y^{(k)} - \alpha_k\,\nabla f(y^{(k)})$.
  - Find $a_{k+1} = \tfrac{1}{2}\big(1 + \sqrt{1 + 4a_k^2}\big)$ and set
    $$ y^{(k+1)} = x^{(k)} + \frac{a_k - 1}{a_{k+1}}\,\big(x^{(k)} - x^{(k-1)}\big), $$
    with $x^{(0)}$ in place of $x^{(k-1)}$ when $k = 1$.

Intuitively, this algorithm produces two sequences of iterates, $(x^{(k)})$ and $(y^{(k)})$: each $x^{(k)}$ is obtained from $y^{(k)}$ by a gradient step whose length is chosen by a backtracking (step-halving) line search, and each $y^{(k+1)}$ adds momentum to $x^{(k)}$ in the direction of the most recent displacement $x^{(k)} - x^{(k-1)}$.
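A sketch of this method in NumPy, with a fixed iteration count in place of a stopping criterion; the test problem and the choice $\alpha_0 = 1$ are illustrative.

```python
import numpy as np

def nesterov_line_search(f, grad, x0, alpha0=1.0, num_iters=200):
    """Nesterov acceleration for an L-smooth convex f, with a backtracking
    (step-halving) line search so that L need not be known in advance."""
    x_prev = x0.copy()
    y = x0.copy()
    a = 1.0
    alpha = alpha0
    for _ in range(num_iters):
        g = grad(y)
        gnorm2 = g @ g
        # Find the minimal i >= 0 such that the sufficient decrease condition
        # f(y) - f(y - alpha * g) >= (alpha / 2) * ||g||^2 holds, alpha = 2^{-i} * alpha_{k-1}.
        while f(y) - f(y - alpha * g) < 0.5 * alpha * gnorm2:
            alpha *= 0.5
        x = y - alpha * g                                  # gradient step from y^{(k)}
        a_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * a * a))  # a_{k+1}
        y = x + ((a - 1.0) / a_next) * (x - x_prev)        # momentum step to y^{(k+1)}
        x_prev, a = x, a_next
    return x_prev

# Illustrative use on a smooth convex (but not strongly convex) quadratic.
A = np.diag(np.linspace(0.0, 10.0, 30))
f = lambda x: 0.5 * x @ (A @ x)
x_out = nesterov_line_search(f, lambda x: A @ x, np.ones(30))
print(f(x_out))   # objective value at the final iterate (the minimum is 0)
```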
Nesterov's acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $x^{(0)}$ proceeds as follows:
- Initialisation: find an initial trial step length $\alpha_0 > 0$ and set $y^{(1)} = x^{(0)}$, $a_1 = 1$.
- Main body: while the stopping criterion is not satisfied, for $k = 1, 2, \dots$:
  - Find the minimal $i \ge 0$ such that $f\big(y^{(k)}\big) - f\big(y^{(k)} - 2^{-i}\alpha_{k-1}\nabla f(y^{(k)})\big) \ge 2^{-i-1}\alpha_{k-1}\big\|\nabla f(y^{(k)})\big\|^2$, and set $\alpha_k = 2^{-i}\alpha_{k-1}$, $x^{(k)} = y^{(k)} - \alpha_k\nabla f(y^{(k)})$.
  - Find $a_{k+1} = \tfrac12\big(1 + \sqrt{1 + 4a_k^2}\big)$ and set $y^{(k+1)} = x^{(k)} + \frac{a_k - 1}{a_{k+1}}\big(x^{(k)} - x^{(k-1)}\big)$.

@Prove that the sequence of step lengths $(\alpha_k)$ is decreasing and that it stops decreasing once $\alpha_k \le 1/L$, and then motivate this process.

Clearly $(\alpha_k)$ is decreasing because of the step $\alpha_k = 2^{-i}\alpha_{k-1}$ with $i \ge 0$, so $\alpha_k \le \alpha_{k-1}$ for every $k$. By $L$-smoothness, for any $y$ and any $\alpha \le 1/L$,
$$ f\big(y - \alpha\nabla f(y)\big) \;\le\; f(y) - \alpha\Big(1 - \frac{L\alpha}{2}\Big)\big\|\nabla f(y)\big\|^2 \;\le\; f(y) - \frac{\alpha}{2}\big\|\nabla f(y)\big\|^2. $$
Therefore the condition that
$$ f\big(y^{(k)}\big) - f\big(y^{(k)} - \alpha\,\nabla f(y^{(k)})\big) \;\ge\; \frac{\alpha}{2}\,\big\|\nabla f(y^{(k)})\big\|^2 $$
is satisfied with $i = 0$ as soon as $\alpha_{k-1} \le 1/L$, so from that iteration onwards $\alpha_k = \alpha_{k-1}$ and the step length stops decreasing; in particular $\alpha_k \ge \min\{\alpha_0,\, 1/(2L)\}$ for all $k$.
If $\widehat{L} \ge L$ is an upper bound on the smoothness constant, the fixed step length $1/\widehat{L}$ would satisfy the condition at every iteration, but such a bound is rarely available in advance: the backtracking procedure automatically finds step lengths of the order $1/L$ without requiring knowledge of $L$, at the cost of a few extra function evaluations.
Nesterov's acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $x^{(0)}$ proceeds as follows:
- Initialisation: find an initial trial step length $\alpha_0 > 0$ and set $y^{(1)} = x^{(0)}$, $a_1 = 1$.
- Main body: while the stopping criterion is not satisfied, for $k = 1, 2, \dots$:
  - Find the minimal $i \ge 0$ such that $f\big(y^{(k)}\big) - f\big(y^{(k)} - 2^{-i}\alpha_{k-1}\nabla f(y^{(k)})\big) \ge 2^{-i-1}\alpha_{k-1}\big\|\nabla f(y^{(k)})\big\|^2$, and set $\alpha_k = 2^{-i}\alpha_{k-1}$, $x^{(k)} = y^{(k)} - \alpha_k\nabla f(y^{(k)})$.
  - Find $a_{k+1} = \tfrac12\big(1 + \sqrt{1 + 4a_k^2}\big)$ and set $y^{(k+1)} = x^{(k)} + \frac{a_k - 1}{a_{k+1}}\big(x^{(k)} - x^{(k-1)}\big)$.

Can you @justify the update for $a_{k+1}$:
$$ a_{k+1} = \frac{1 + \sqrt{1 + 4a_k^2}}{2}\;? $$

This value is the positive root of $a^2 - a = a_k^2$; equivalently, it is the largest value satisfying $a_{k+1}^2 - a_{k+1} \le a_k^2$, which is exactly the relation required by the convergence analysis. Taking $a_{k+1}$ as large as possible is desirable because the analysis produces the quantity
$$ \frac{\|x^{(0)} - x^*\|^2}{2\,\alpha_k\,a_k^2}, $$
which is an upper bound on $f(x^{(k)}) - f(x^*)$, so the faster $a_k$ grows, the faster this bound decays. With this choice, $a_{k+1} \ge a_k + \tfrac12$, so $a_k$ grows at least linearly in $k$ and the bound decays like $1/k^2$.
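For concreteness, the algebra behind this choice (using only the update rule stated above):
$$ a_{k+1} = \frac{1 + \sqrt{1 + 4a_k^2}}{2} \;\iff\; \big(2a_{k+1} - 1\big)^2 = 1 + 4a_k^2 \;\iff\; a_{k+1}^2 - a_{k+1} = a_k^2, $$
and since $\sqrt{1 + 4a_k^2} \ge 2a_k$, we also get $a_{k+1} \ge a_k + \tfrac12$; together with $a_1 = 1$, this gives $a_k \ge \tfrac{k+1}{2}$ by induction.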
Nesterov's acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $x^{(0)}$ proceeds as follows:
- Initialisation: find an initial trial step length $\alpha_0 > 0$ and set $y^{(1)} = x^{(0)}$, $a_1 = 1$.
- Main body: while the stopping criterion is not satisfied, for $k = 1, 2, \dots$:
  - Find the minimal $i \ge 0$ such that $f\big(y^{(k)}\big) - f\big(y^{(k)} - 2^{-i}\alpha_{k-1}\nabla f(y^{(k)})\big) \ge 2^{-i-1}\alpha_{k-1}\big\|\nabla f(y^{(k)})\big\|^2$, and set $\alpha_k = 2^{-i}\alpha_{k-1}$, $x^{(k)} = y^{(k)} - \alpha_k\nabla f(y^{(k)})$.
  - Find $a_{k+1} = \tfrac12\big(1 + \sqrt{1 + 4a_k^2}\big)$ and set $y^{(k+1)} = x^{(k)} + \frac{a_k - 1}{a_{k+1}}\big(x^{(k)} - x^{(k-1)}\big)$.

@State a result related to its convergence.

Suppose:
- $f: \mathbb{R}^n \to \mathbb{R}$ is a convex $L$-smooth function,
- $f$ has at least one finite minimiser $x^*$, and
- $f$ has a finite minimum $f(x^*)$.

Then, with $\bar\alpha := \min\{\alpha_0,\, 1/(2L)\}$ (a lower bound on the step lengths produced by the line search), the iterates satisfy
$$ f\big(x^{(k)}\big) - f(x^*) \;\le\; \frac{2\,\big\|x^{(0)} - x^*\big\|^2}{\bar\alpha\,(k+1)^2} $$
for all $k \ge 1$.
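As a worked consequence (assuming the bound above, and taking $\alpha_0 \ge 1/(2L)$ so that $\bar\alpha = 1/(2L)$):
$$ f\big(x^{(k)}\big) - f(x^*) \;\le\; \frac{4L\,\|x^{(0)} - x^*\|^2}{(k+1)^2} \;\le\; \epsilon \qquad \text{whenever} \qquad k + 1 \;\ge\; 2\,\sqrt{\frac{L}{\epsilon}}\;\big\|x^{(0)} - x^*\big\|, $$
so $\epsilon$-optimality is reached after $O\big(\sqrt{L/\epsilon}\big)$ iterations.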
Nesterov's acceleration method for the minimisation of an $L$-smooth convex function $f$ at start point $x^{(0)}$ proceeds as follows:
- Initialisation: find an initial trial step length $\alpha_0 > 0$ and set $y^{(1)} = x^{(0)}$, $a_1 = 1$.
- Main body: while the stopping criterion is not satisfied, for $k = 1, 2, \dots$:
  - Find the minimal $i \ge 0$ such that $f\big(y^{(k)}\big) - f\big(y^{(k)} - 2^{-i}\alpha_{k-1}\nabla f(y^{(k)})\big) \ge 2^{-i-1}\alpha_{k-1}\big\|\nabla f(y^{(k)})\big\|^2$, and set $\alpha_k = 2^{-i}\alpha_{k-1}$, $x^{(k)} = y^{(k)} - \alpha_k\nabla f(y^{(k)})$.
  - Find $a_{k+1} = \tfrac12\big(1 + \sqrt{1 + 4a_k^2}\big)$ and set $y^{(k+1)} = x^{(k)} + \frac{a_k - 1}{a_{k+1}}\big(x^{(k)} - x^{(k-1)}\big)$.

@Prove that if
- $f: \mathbb{R}^n \to \mathbb{R}$ is a convex $L$-smooth function,
- $f$ has at least one finite minimiser $x^*$, and
- $f$ has a finite minimum $f(x^*)$,

then:
- the sequence of iterates $(x^{(k)})$ satisfies
$$ f\big(x^{(k)}\big) - f(x^*) \;\le\; \frac{2\,\|x^{(0)} - x^*\|^2}{\bar\alpha\,(k+1)^2}, \qquad \bar\alpha := \min\{\alpha_0,\, 1/(2L)\}, $$
for all $k \ge 1$, so $x^{(k)}$ is $\epsilon$-optimal for all $k \ge \sqrt{2\,\|x^{(0)} - x^*\|^2/(\bar\alpha\,\epsilon)}$.
It is easiest to analyse the algorithm in terms of the optimality gaps and auxiliary vectors
$$ \delta_k := f\big(x^{(k)}\big) - f(x^*), \qquad u^{(k)} := a_k\,x^{(k)} - (a_k - 1)\,x^{(k-1)} - x^*. $$
We have
$$ x^{(k)} = y^{(k)} - \alpha_k\,\nabla f\big(y^{(k)}\big) \qquad\text{and}\qquad a_k\,y^{(k)} - (a_k - 1)\,x^{(k-1)} - x^* = u^{(k-1)}, $$
where the second identity is obtained by substituting the update $y^{(k)} = x^{(k-1)} + \frac{a_{k-1} - 1}{a_k}\big(x^{(k-1)} - x^{(k-2)}\big)$ (valid for $k \ge 2$).
Hence, by considering $\|u^{(k)}\|^2$ and expanding the square,
$$ \big\|u^{(k)}\big\|^2 \;=\; \big\|u^{(k-1)} - \alpha_k a_k\,\nabla f(y^{(k)})\big\|^2 \;=\; \big\|u^{(k-1)}\big\|^2 - 2\alpha_k a_k\,\big\langle \nabla f(y^{(k)}),\, u^{(k-1)}\big\rangle + \alpha_k^2 a_k^2\,\big\|\nabla f(y^{(k)})\big\|^2, $$
where the first equality follows from the identity above together with $a_k x^{(k)} = a_k y^{(k)} - \alpha_k a_k\,\nabla f(y^{(k)})$.
Now we wish to derive bounds on the last two terms on the right-hand side of this equation. Writing
$$ u^{(k-1)} = (a_k - 1)\,\big(y^{(k)} - x^{(k-1)}\big) + \big(y^{(k)} - x^*\big), $$
the sufficient decrease condition yields that $f(x^{(k)}) \le f(y^{(k)}) - \tfrac{\alpha_k}{2}\|\nabla f(y^{(k)})\|^2$, and substituting this into the convexity inequality $f(z) \ge f(y^{(k)}) + \langle \nabla f(y^{(k)}),\, z - y^{(k)}\rangle$, applied at $z = x^{(k-1)}$ and at $z = x^*$, we have that
$$ \big\langle \nabla f(y^{(k)}),\, y^{(k)} - x^{(k-1)}\big\rangle \;\ge\; \delta_k - \delta_{k-1} + \tfrac{\alpha_k}{2}\,\big\|\nabla f(y^{(k)})\big\|^2, \qquad \big\langle \nabla f(y^{(k)}),\, y^{(k)} - x^*\big\rangle \;\ge\; \delta_k + \tfrac{\alpha_k}{2}\,\big\|\nabla f(y^{(k)})\big\|^2. $$
Combining everything so far (multiply the first bound by $a_k - 1 \ge 0$, add the second, multiply by $2\alpha_k a_k$, and use $a_k^2 - a_k = a_{k-1}^2$), we have that
$$ \big\|u^{(k)}\big\|^2 + 2\alpha_k a_k^2\,\delta_k \;\le\; \big\|u^{(k-1)}\big\|^2 + 2\alpha_k a_{k-1}^2\,\delta_{k-1} \;\le\; \big\|u^{(k-1)}\big\|^2 + 2\alpha_{k-1} a_{k-1}^2\,\delta_{k-1}, $$
where the last inequality uses $\alpha_k \le \alpha_{k-1}$ and $\delta_{k-1} \ge 0$.
Applying this inequality iteratively (for $k, k-1, \dots, 2$), we find
$$ 2\alpha_k a_k^2\,\delta_k \;\le\; \big\|u^{(k)}\big\|^2 + 2\alpha_k a_k^2\,\delta_k \;\le\; \big\|u^{(1)}\big\|^2 + 2\alpha_1 a_1^2\,\delta_1 \;\le\; \big\|x^{(0)} - x^*\big\|^2, $$
where the last inequality follows from the same argument applied at $k = 1$: since $a_1 = 1$, $y^{(1)} = x^{(0)}$ and $u^{(1)} = x^{(1)} - x^*$,
$$ \big\|x^{(1)} - x^*\big\|^2 = \big\|x^{(0)} - x^*\big\|^2 - 2\alpha_1\big\langle \nabla f(y^{(1)}),\, y^{(1)} - x^*\big\rangle + \alpha_1^2\big\|\nabla f(y^{(1)})\big\|^2 \;\le\; \big\|x^{(0)} - x^*\big\|^2 - 2\alpha_1\,\delta_1. $$
Hence this implies
$$ \delta_k \;\le\; \frac{\|x^{(0)} - x^*\|^2}{2\,\alpha_k\,a_k^2}, $$
and so, using $\alpha_k \ge \bar\alpha = \min\{\alpha_0,\, 1/(2L)\}$ and $a_k \ge \tfrac{k+1}{2}$ (both established in the previous cards),
$$ f\big(x^{(k)}\big) - f(x^*) \;\le\; \frac{2\,\|x^{(0)} - x^*\|^2}{\bar\alpha\,(k+1)^2}, $$
as required.