Course - Optimisation for Data Science HT25
This course analyses optimisation methods suitable for large-scale data science problems, mainly by deriving convergence rate results under progressively stronger assumptions on the objective function (smoothness, convexity, strong convexity).
The course begins with some optimisation terminology, then covers gradient descent and the proximal method, which extends steepest descent techniques to regularised problems. It then covers acceleration techniques such as the heavy ball method, before moving on to stochastic gradient descent and accelerated methods in that setting. Finally, it covers coordinate descent methods.
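As a quick reference for the gradient and proximal steps mentioned above, here is a minimal NumPy sketch of proximal gradient descent applied to an ℓ1-regularised least-squares problem. All names, the step size, and the data are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, step, iters=500):
    """Minimise 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient descent.
    A gradient step on the smooth term is followed by the prox of the
    non-smooth regulariser; with lam = 0 this reduces to steepest descent."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)              # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Illustrative data; step = 1/L with L = largest eigenvalue of A^T A.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2
x_hat = proximal_gradient(A, b, lam=0.1, step=1.0 / L)
```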
- Course Webpage
- Lecture Notes
- From the previous year:
- Lecture Notes
- 1, Scope and examples
- 2, Terminology and prerequisites
- 3, Method of steepest descent
- 4, The proximal method
- 5, Acceleration of gradient methods
- 6, Stochastic gradient descent
- 7, Reducing the noise floor in SGD
- 8, Coordinate descent
- 9, Practical coordinate descent
- 10, Outlook (non-examinable)
- Other courses this term: [[Courses HT25]]
Notes
- [[Notes - Optimisation for Data Science HT25, Overview of convergence results]]
- [[Notes - Optimisation for Data Science HT25, Optimisation terminology]]
- [[Notes - Optimisation for Data Science HT25, Smoothness and convexity]]
- [[Notes - Optimisation for Data Science HT25, Subgradients]]
- [[Notes - Optimisation for Data Science HT25, Steepest descent]]
- [[Notes - Optimisation for Data Science HT25, Line search methods]]
- [[Notes - Optimisation for Data Science HT25, Proximal methods]]
- [[Notes - Optimisation for Data Science HT25, Accelerated methods]]
- [[Notes - Optimisation for Data Science HT25, Stochastic gradient descent]]
- [[Notes - Optimisation for Data Science HT25, Stochastic variance reduction methods]]
- [[Notes - Optimisation for Data Science HT25, Coordinate descent]]
- [[Notes - Optimisation for Data Science HT25, Coordinate descent with inexact line search]]
- [[Notes - Optimisation for Data Science HT25, Useful miscellany]]
Problem Sheets
- From the previous year:
To-Do List
Differences between notes and slides
- 6: Stochastic gradient descent
- More details on taking expectations in the analysis of the algorithm
- 7: Reducing the noise floor
- Proof for the dynamic stepsize seems easier to follow; it also uses Jensen’s inequality at one point, which the notes don’t
- Proves a stronger result, about $\lim$ rather than $\liminf$
- Proof for SVRG seems clearer (see the sketch after this list)
- 8: Coordinate descent
- Wolfe conditions
- Convergence of cyclic coordinate descent with inexact line searches
- Interpretation of why convergence fails on Powell’s objective, in terms of the general convergence results
- Convergence of randomised coordinate descent with inexact line searches
- Dimensionality dilemma
- More detail about componentwise Lipschitz derivatives, and a cleaner proof that factors out a “foundational lemma”
- Proof in general case with fixed stepsize
- Proof in convex case with fixed stepsize
- Proof in $\gamma$-strongly convex case with fixed stepsize
- 9: Practical coordinate descent
- Pretty much all new content
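For comparing the SVRG proofs mentioned in the list above, here is a minimal sketch of the basic SVRG loop for a finite-sum objective; the function names (`svrg`, `grad_i`, `full_grad`) and the inner-loop length are illustrative assumptions rather than the notes’ pseudocode.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, step, n, epochs=20, m=None):
    """Basic SVRG for f(w) = (1/n) * sum_i f_i(w).
    grad_i(w, i) returns the gradient of f_i at w; full_grad(w) the full gradient.
    The variance-reduced direction grad_i(w, i) - grad_i(w_ref, i) + mu avoids the
    noise floor that plain SGD with a fixed stepsize settles at."""
    m = m if m is not None else 2 * n        # common choice for inner-loop length
    rng = np.random.default_rng(0)
    w_ref = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        mu = full_grad(w_ref)                # full gradient at the reference point
        w = w_ref.copy()
        for _ in range(m):
            i = rng.integers(n)
            g = grad_i(w, i) - grad_i(w_ref, i) + mu
            w = w - step * g
        w_ref = w                            # take the last inner iterate as the new reference
    return w_ref
```

Dropping the correction term `- grad_i(w_ref, i) + mu` recovers plain SGD with a fixed stepsize, which is the comparison point in section 7.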
Relevant reading
- “Optimisation for Data Science” course at ETH Zurich:
- #uni #notes #task #read 2: Theory of Convex Functions ✅ 2025-03-29
- #uni #notes #task #read 3: Gradient Descent
- #uni #notes #task #read 4: Projected Gradient Descent
- #uni #notes #task #read 5: Coordinate Descent
- #uni #notes #task #read 10: Subgradient Methods
- #uni #notes #task #read 11: Mirror Descent, Smoothing, Proximal Algorithms
- #uni #notes #task #read 12: Stochastic Optimisation
- #uni #notes #task #read 13: Finite Sum Optimisation
- “Convex Optimisation” at CMU:
- #uni #notes #task #read 1: Convex Sets
- #uni #notes #task #read 2: Convex Functions, Optimization Basics
- #uni #notes #task #read 3: Gradient Descent
- #uni #notes #task #read 4: More Gradient Descent and Subgradients
- #uni #notes #task #read 5: The Subgradient Method and Oracle Lower Bounds
- #uni #notes #task #read 6: Projected Gradient Descent and the Proximal Method
- #uni #notes #task #read 7: More Proximal Method
- #uni #notes #task #read 8: Stochastic Gradient Descent
- Convex Optimisation, Boyd and Vandenberghe
- #uni #notes #task #read 1: Introduction
- #uni #notes #task #read 2: Convex sets
- #uni #notes #task #read 3: Convex functions
- #uni #notes #task #read 4: Convex optimization problems
- #uni #notes #task #read 9: Unconstrained minimisation
- #uni #notes #task #read 10: Equality constrained minimisation
- Numerical Optimisation, Nocedal and Wright
- #uni #notes #task #read 1: Introduction
- #uni #notes #task #read 2: Fundamentals of Unconstrained Optimisation
- #uni #notes #task #read 3: Line-search methods
- Deep Learning, Goodfellow
- #uni #notes #task #read 8: Optimization for Training Deep Models
- Optimisation for Machine Learning, Sra, Nowozin and Wright
- #uni #notes #task #read 1: Optimization and Machine Learning
- #uni #notes #task #read 2: Convex Optimization with Sparsity-Inducing Norms
- #uni #notes #task #read 4: Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A survey