Course - Optimisation for Data Science HT25


This course analyses optimisation methods suitable for large-scale data science problems, mainly by deriving rates of convergence under progressively stronger assumptions (smoothness, convexity, strong convexity) on the objective function.
The course begins with some optimisation terminology and then covers gradient descent and the proximal method, which makes it possible to apply steepest descent techniques to regularised problems. It then covers acceleration techniques such as the heavy ball method, before moving on to stochastic gradient descent and acceleration techniques in that setting. Finally, it covers coordinate descent methods.
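
As a rough illustration of the first two topics, here is a minimal sketch (mine, not taken from the notes) of proximal gradient descent on an $\ell_1$-regularised least-squares problem. The function names and data are made up for illustration, and the fixed stepsize $1/L$ uses the Lipschitz constant of the gradient of the smooth part.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, alpha, iters=500):
    """Minimise 0.5 * ||Ax - b||^2 + lam * ||x||_1 with a fixed stepsize alpha.

    Each iteration takes a gradient step on the smooth least-squares term,
    then applies the proximal operator of the nonsmooth l1 regulariser.
    (Illustrative sketch only, not the course's implementation.)
    """
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                      # gradient of the smooth part
        x = soft_threshold(x - alpha * grad, alpha * lam)
    return x

# Synthetic example; alpha = 1/L, where L = ||A||_2^2 is the Lipschitz
# constant of the gradient of the smooth part.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2
x_hat = proximal_gradient(A, b, lam=0.1, alpha=1.0 / L)
```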

Notes

Problem Sheets

To-Do List

Differences between notes and slides

  • 6: Stochastic gradient descent

    • More details on taking expectations in the analysis of the algorithm
  • 7: Reducing the noise floor

    • The proof for the dynamic stepsize seems easier to follow; it also uses Jensen’s inequality at one point, which the notes do not
    • Proves a stronger result, about $\lim$ rather than $\liminf$
    • The proof for SVRG seems clearer
  • 8: Coordinate descent

    • Wolfe conditions
    • Convergence of cyclic coordinate descent with inexact line searches
    • Interpretation, in terms of the general convergence results, of why convergence fails on Powell’s objective
    • Convergence of randomised coordinate descent with inexact line searches
    • Dimensionality dilemma
    • More detail about componentwise Lipschitz derivatives, and a cleaner proof that factors out a “foundational lemma”
    • Proof in the general case with a fixed stepsize (a minimal sketch of this fixed-stepsize iteration appears after this list)
    • Proof in the convex case with a fixed stepsize
    • Proof in the $\gamma$-strongly convex case with a fixed stepsize
  • 9: Practical coordinate descent

    • The results on randomised coordinate descent assume that the stepsize is fixed at $1/L$; this lecture is about what happens when you instead use an $\alpha _ k$ satisfying the Wolfe conditions (still assuming that $f$ is CL-smooth)
    • Wolfe conditions for RCD with line search
    • Proof of the overestimation lemma for coordinate line search (deterministic and in expectation)
    • Proof for RCD in the general case
    • Proof for RCD in the convex case
    • Proof for RCD in the $\gamma$-strongly convex case
    • CCD in the convex case (no proof)
    • CCD in the $\gamma$-strongly convex case (no proof)
    • Discussion of cycle complexity and iteration complexity
    • Block coordinate descent algorithm
    • Block Lipschitz smoothness
    • Prove the block Lipschitz overestimation property; also check whether the constant is indeed $L _ \text{bmax}$ rather than the $L _ \text{max}$ given in the notes
    • Proof for RBCD in the CL-smooth, $\gamma$-strongly convex case with known block structure
    • Proof for RBCD in the CL-smooth, $\gamma$-strongly convex case with equal-sized subsets
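
Not from the notes or slides, but as a reference point for the fixed-stepsize coordinate descent results above, here is a minimal sketch of randomised coordinate descent on a strongly convex quadratic. It uses the componentwise Lipschitz constants $L_i$ (the diagonal of the Hessian in this case) as fixed stepsizes $1/L_i$, which may differ from the single constant $1/L$ used in the lectures; all names and data are illustrative.

```python
import numpy as np

def randomised_cd(Q, c, iters=2000, seed=0):
    """Minimise f(x) = 0.5 x^T Q x + c^T x by randomised coordinate descent.

    Each iteration picks a coordinate i uniformly at random and takes a
    gradient step in that coordinate with fixed stepsize 1 / L_i, where
    L_i = Q[i, i] is the componentwise Lipschitz constant of the i-th
    partial derivative for this quadratic.
    (Illustrative sketch only, not the course's implementation.)
    """
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    L = np.diag(Q)                    # componentwise Lipschitz constants
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)           # uniformly random coordinate
        grad_i = Q[i] @ x + c[i]      # i-th partial derivative of f
        x[i] -= grad_i / L[i]         # fixed stepsize 1 / L_i
    return x

# Example usage: a random strongly convex quadratic.
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
Q = M @ M.T + 30.0 * np.eye(30)       # positive definite Hessian
c = rng.standard_normal(30)
x_hat = randomised_cd(Q, c)
```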

Relevant reading



