Course - Optimisation for Data Science HT25


This course analyses optimisation methods suited to large-scale data science problems, mainly by deriving convergence-rate results under progressively stronger assumptions (smooth, convex, strongly convex) on the objective function.
It begins with some optimisation terminology, then covers gradient descent and the proximal method, which extends steepest-descent techniques to regularised problems. It then covers acceleration techniques such as the heavy ball method, moves on to stochastic gradient descent and accelerated variants in that setting, and finishes with coordinate descent methods.
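As a rough illustration of how the proximal step combines with gradient descent on a regularised problem, here is a minimal sketch (my own example, not taken from the course materials) of proximal gradient descent on a lasso objective; the problem choice, function names, and parameter values are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, lam, step, n_iters=500):
    """Proximal gradient descent (ISTA) for the lasso problem
    min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    `step` should be at most 1 / L, where L = ||A||_2^2 is the
    Lipschitz constant of the smooth part's gradient.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                          # gradient of the smooth term
        x = soft_threshold(x - step * grad, step * lam)   # proximal step on the regulariser
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.0, 0.5]                         # sparse ground truth
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    L = np.linalg.norm(A, 2) ** 2                         # Lipschitz constant of the gradient
    print(np.round(proximal_gradient_lasso(A, b, lam=0.1, step=1.0 / L), 2))
```

The same template applies to any objective of the form smooth part plus simple regulariser: only the gradient of the smooth part and the proximal operator of the regulariser change.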

Notes

Problem Sheets

To-Do List

Differences between notes and slides

  • 6: Stochastic gradient descent

    • More details on taking expectations in the analysis of the algorithm
  • 7: Reducing the noise floor

    • Proof for the dynamic stepsize seems easier to follow; it also uses Jensen’s inequality at one point, which the notes’ proof does not
    • Proves a stronger result about $\lim$ rather than $\liminf$
    • Proof for SVRG seems clearer
  • 8: Coordinate descent

    • Wolfe conditions
    • Convergence of cyclic coordinate descent with inexact line searches
    • Interpretation of why convergence fails on Powell’s objective, in terms of the general convergence results
    • Convergence of randomised coordinate descent with inexact line searches
    • Dimensionality dilemma
    • More detail about componentwise Lipschitz derivatives and a cleaner proof that factors out a “foundational lemma”
    • Proof in general case with fixed stepsize
    • Proof in convex case with fixed stepsize
    • Proof in $\gamma$-strongly convex case with fixed stepsize (a sketch of the fixed-stepsize method follows this list)
  • 9: Practical coordinate descent

    • Pretty much all new content
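For concreteness, here is a minimal sketch (again my own illustration, not from the notes or slides) of randomised coordinate descent with a fixed stepsize of $1/L_i$ per coordinate, where $L_i$ is the componentwise Lipschitz constant of the $i$-th partial derivative, applied to a strongly convex quadratic; the function names and the test problem are assumptions.

```python
import numpy as np

def randomised_coordinate_descent(A, b, n_iters=5000, seed=0):
    """Randomised coordinate descent with fixed componentwise stepsizes,
    applied to f(x) = 0.5 * x^T A x - b^T x with A symmetric positive
    definite.  The i-th partial derivative is L_i-Lipschitz with
    L_i = A[i, i], so coordinate i uses the fixed stepsize 1 / L_i.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.integers(n)            # pick a coordinate uniformly at random
        grad_i = A[i] @ x - b[i]       # i-th partial derivative of f
        x[i] -= grad_i / A[i, i]       # fixed stepsize 1 / L_i
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5 * np.eye(5)        # symmetric positive definite test matrix
    b = rng.standard_normal(5)
    x_hat = randomised_coordinate_descent(A, b)
    print(np.allclose(x_hat, np.linalg.solve(A, b), atol=1e-6))
```

For this quadratic the componentwise Lipschitz constants are just the diagonal entries of $A$, so the fixed step $1/L_i$ happens to coincide with exact minimisation along each coordinate.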

Relevant reading
