Notes - Optimisation for Data Science HT25, Smoothness and convexity
Flashcards
Convexity
@Define what it means for a function $f : \mathbb R^n \to \mathbb R$ to be convex.
For all $\lambda \in [0, 1]$ and $x, y \in \mathbb R^n$,
\[f((1-\lambda)x + \lambda y) \le (1 - \lambda)f(x) + \lambda f(y)\]
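A minimal numerical sanity check of this definition (an illustrative sketch; the choice $f(x) = ||x||^2$ is an assumption, not part of the notes):

```python
import numpy as np

# Check the convexity inequality for the convex function f(x) = ||x||^2
# at randomly drawn points x, y and weights lambda in [0, 1].
rng = np.random.default_rng(0)
f = lambda x: np.dot(x, x)

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = f((1 - lam) * x + lam * y)
    rhs = (1 - lam) * f(x) + lam * f(y)
    assert lhs <= rhs + 1e-12  # tolerance for floating-point error
print("convexity inequality held on all samples")
```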
Suppose:
- We have a set of functions $\{f _ i(x) : \mathcal D \to \mathbb R \mid i \in \mathcal N\}$ with $\mathcal N$ a finite index set
- Each $f _ i$ is a convex function defined on a common convex domain $\mathcal D \subseteq \mathbb R^n$.
@Justify that
\[x \mapsto \max _ {i \in \mathcal N} f _ i(x)\]
is a convex function.
For all $x _ 1, x _ 2 \in \mathcal D$ and $\lambda \in [0, 1]$,
\[\begin{aligned} \max _ i f _ i(\lambda x _ 1 + (1-\lambda)x _ 2) &\le \max _ i (\lambda f _ i(x _ 1) + (1-\lambda) f _ i(x _ 2)) \\\\ &\le \lambda \max _ i f _ i(x _ 1) + (1 - \lambda)\max _ i f _ i(x _ 2) \end{aligned}\]
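The same inequality can be spot-checked numerically; here is a sketch using affine (hence convex) pieces $f _ i(x) = a _ i^\top x + b _ i$, an assumed example:

```python
import numpy as np

# g(x) = max_i (a_i^T x + b_i): the pointwise max of affine functions.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))  # 5 affine pieces on R^3
b = rng.standard_normal(5)
g = lambda x: np.max(A @ x + b)

for _ in range(1000):
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = g(lam * x1 + (1 - lam) * x2)
    rhs = lam * g(x1) + (1 - lam) * g(x2)
    assert lhs <= rhs + 1e-12
print("max of affine functions passed the convexity check")
```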
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f : \mathcal D \to \mathbb R$ is a convex function
- $f$ has gradient $\nabla f(x)$ at $x \in \mathcal D$
@Justify that then the first-order Taylor approximation is a lower bounding function, i.e. for all $y \in \mathcal D$,
\[f(x) + \nabla f(x)^\top (y - x) \le f(y)\]
Since $f$ is convex, for all $\lambda \in [0, 1]$ and $x, y \in \mathcal D$,
\[f((1-\lambda)x + \lambda y) \le (1 - \lambda)f(x) + \lambda f(y)\]Rearranging and dividing by $\lambda \in (0, 1]$ gives
\[f(x) + \frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y)\]and taking the limit $\lambda \to 0^+$, the difference quotient converges to the directional derivative $\nabla f(x)^\top (y - x)$, which yields the result.
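A numerical illustration of the lower-bounding property (sketch only; $f(x) = ||x||^2$ with $\nabla f(x) = 2x$ is an assumed example):

```python
import numpy as np

# The first-order Taylor model at x never exceeds f(y) for convex f.
rng = np.random.default_rng(2)
f = lambda x: np.dot(x, x)
grad = lambda x: 2 * x

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert f(x) + grad(x) @ (y - x) <= f(y) + 1e-12
print("first-order lower bound held on all samples")
```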
@Define what it means for a function $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ to be proper convex.
$f$ is convex and $f(x) < +\infty$ for at least one point $x$.
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f : \mathcal D \to \mathbb R$ is a convex function
How is it possible to extend $f$ to a proper convex function on all of $\mathbb R^n$?
Set $f(x) := +\infty$ for $x \notin \mathcal D$.
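A sketch of this extension in code (the domain $\mathcal D = [0, 2]$ and $f(x) = (x-1)^2$ are assumptions for illustration); with the usual conventions for arithmetic with $+\infty$, the convexity inequality survives the extension:

```python
import numpy as np

# Extend a convex f defined on D = [0, 2] to all of R by +infinity.
def f_extended(x):
    if 0.0 <= x <= 2.0:           # x in D
        return (x - 1.0) ** 2     # the original convex f
    return np.inf                 # extension outside D

x, y, lam = 0.5, 3.0, 0.25
lhs = f_extended((1 - lam) * x + lam * y)
rhs = (1 - lam) * f_extended(x) + lam * f_extended(y)
print(lhs <= rhs)  # True: rhs = +inf since y lies outside D
```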
Suppose $\Omega \subset \mathbb R^n$. @Define the indicator function $I _ \Omega$, and state a result about its convexity.
\[I _ \Omega(x) := \begin{cases} 0 & \text{if } x \in \Omega \\\\ +\infty & \text{otherwise} \end{cases}\]$\Omega \ne \emptyset$ is a convex set iff $I _ \Omega$ is a proper convex function.
Suppose we have the convex optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{subject to }x \in \mathcal F\]
where $\mathcal F$ is a convex set. How can you convert it into an equivalent unconstrained convex optimisation problem?
Consider
\[\min _ {x \in \mathbb R^n} f(x) + I _ \mathcal F(x)\]
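A small sketch of this reformulation (the objective $f(x) = (x - 3)^2$ and feasible set $\mathcal F = [0, 1]$ are assumed examples); minimising $f + I _ {\mathcal F}$ over a grid recovers the constrained minimiser:

```python
import numpy as np

f = lambda x: (x - 3.0) ** 2

def I_F(x):  # indicator of F = [0, 1]
    return 0.0 if 0.0 <= x <= 1.0 else np.inf

# Brute-force the unconstrained problem min f + I_F over a grid.
grid = np.linspace(-2.0, 4.0, 601)
vals = [f(x) + I_F(x) for x in grid]
print(grid[int(np.argmin(vals))])  # 1.0, the constrained minimiser
```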
Strong convexity
@Define what it means for a function $f : \mathbb R^n \to \mathbb R$ to be strongly convex with modulus of convexity $\gamma > 0$.
For all $\lambda \in [0, 1]$ and $x, y \in \mathbb R^n$:
\[f((1 - \lambda)x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y) - \frac \gamma 2 \lambda (1 - \lambda)||x - y||^2\]
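A numerical check of the definition (illustrative sketch; $f(x) = ||x||^2$, which is strongly convex with $\gamma = 2$ since its Hessian is $2I$, is an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.dot(x, x)
gamma = 2.0  # modulus of convexity for f(x) = ||x||^2

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = f((1 - lam) * x + lam * y)
    rhs = ((1 - lam) * f(x) + lam * f(y)
           - 0.5 * gamma * lam * (1 - lam) * np.dot(x - y, x - y))
    assert lhs <= rhs + 1e-9  # holds with equality for this quadratic
print("strong convexity inequality held on all samples")
```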
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $\gamma$-strongly convex
- $f$ is differentiable at $x$
@State a result which lower bounds $f$ in terms of $\nabla f(x)$ and $\gamma$.
For all $y \in \mathbb R^n$,
\[f(x) + \nabla f(x)^\top ( y - x) + \frac \gamma 2 ||y - x||^2 \le f(y)\]
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $\gamma$-strongly convex
- $f$ is differentiable at $x$
@Prove that then for all $y \in \mathbb R^n$,
\[f(x) + \nabla f(x)^\top ( y - x) + \frac \gamma 2 ||y - x||^2 \le f(y)\]
By the definition of $\gamma$-strong convexity, we have that
\[f( (1 - \lambda)x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y) - \frac \gamma 2 \lambda (1 - \lambda)||y-x||^2\]which implies, after rearranging and dividing by $\lambda \in (0, 1]$,
\[f(x) + \frac{f((1 - \lambda)x + \lambda y) - f(x)}{\lambda} - f(y) \le -\frac \gamma 2 (1-\lambda) ||y-x||^2\]Taking the limit as $\lambda \to 0^+$ (the difference quotient tends to $\nabla f(x)^\top (y - x)$) yields
\[f(x) + \nabla f(x)^\top (y - x) - f(y) \le -\frac \gamma 2 ||y - x||^2\]and multiplying by $-1$ and rearranging gives
\[f(y) \ge f(x) + \nabla f(x)^\top ( y - x) + \frac \gamma 2 ||y - x||^2\]as required.
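The bound can also be verified numerically (sketch; again $f(x) = ||x||^2$ with $\gamma = 2$ and $\nabla f(x) = 2x$ is an assumed example):

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.dot(x, x)
grad = lambda x: 2 * x
gamma = 2.0

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lower = f(x) + grad(x) @ (y - x) + 0.5 * gamma * np.dot(y - x, y - x)
    assert lower <= f(y) + 1e-9  # tight (equality) for this quadratic
print("strong-convexity lower bound held on all samples")
```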
Smoothness
@Define what it means for a function $f : \mathbb R^n \to \mathbb R$ to be $L$-smooth.
It is differentiable everywhere with $L$-Lipschitz continuous gradient, i.e. for all $x, y \in \mathbb R^n$
\[||\nabla f(x) - \nabla f(y) || \le L||x - y||\]
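A numerical check of the Lipschitz condition (sketch with an assumed quadratic: for $f(x) = \frac 1 2 x^\top A x - b^\top x$ with symmetric $A$, the gradient is $Ax - b$ and its Lipschitz constant is the largest eigenvalue magnitude of $A$):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M + M.T                      # symmetric
b = rng.standard_normal(4)
L = np.max(np.abs(np.linalg.eigvalsh(A)))  # spectral norm of A
grad = lambda x: A @ x - b

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
print("gradient Lipschitz bound held on all samples")
```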
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $L$-smooth (and not necessarily convex)
- $x \in \mathbb R^n$
@State a result which upper bounds $f$ in terms of $\nabla f(x)$ and $L$.
For all $y \in \mathbb R^n$,
\[f(y) \le f(x) + \nabla f(x)^\top ( y - x) + \frac L 2 ||y - x||^2\]
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $L$-smooth (and not necessarily convex)
- $x \in \mathbb R^n$
@Prove that then for all $y \in \mathbb R^n$,
\[f(y) - f(x) - \nabla f(x)^\top ( y - x) \le \frac L 2 ||y - x||^2\]
Let $d = y - x$. Then, by the fundamental theorem of calculus,
\[\begin{align*} f(x + d) &= f(x) + \int^1_0 \nabla f(x + \xi d)^\top d \text d\xi \\\\ &= f(x) + \nabla f(x)^\top d + \int^1_0 [\nabla f(x + \xi d) - \nabla f(x)]^\top d \text d\xi \\\\ &\stackrel{\text{C.S.}}{\le} f(x) + \nabla f(x)^\top d + \int^1_0 ||\nabla f(x + \xi d) - \nabla f(x)|| \cdot ||d|| \text d\xi \\\\ &\stackrel{\text{L.S.}}{\le} f(x) + \nabla f(x)^\top d + L \int^1_0 \xi ||d||^2 \text d \xi \\\\ &= f(x) + \nabla f(x)^\top d + \frac{L||d||^2}{2} \end{align*}\]where C.S. is the Cauchy–Schwarz inequality and L.S. uses the $L$-Lipschitz continuity of $\nabla f$, i.e. $||\nabla f(x + \xi d) - \nabla f(x)|| \le L\xi||d||$. Then since $d = y - x$,
\[f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac L 2 ||y-x||^2\]
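Since the result does not require convexity, a non-convex example is instructive; here is a sketch using $f(x) = \cos(x)$ (an assumption), which is $1$-smooth because its derivative $-\sin$ is $1$-Lipschitz:

```python
import numpy as np

rng = np.random.default_rng(6)
L = 1.0  # |f''(x)| = |cos(x)| <= 1, so grad f = -sin is 1-Lipschitz

for _ in range(1000):
    x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
    upper = np.cos(x) - np.sin(x) * (y - x) + 0.5 * L * (y - x) ** 2
    assert np.cos(y) <= upper + 1e-12
print("descent lemma held on all samples")
```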
Connections between strong convexity and smoothness
Suppose:
- $f : \mathbb R^n \to \mathbb R$
- $f$ is $\gamma$-strongly convex
- $f$ is $L$-smooth
@State a result which relates $\gamma$ and $L$.
$\gamma \le L$.
Suppose:
- $f : \mathbb R^n \to \mathbb R$
- $f$ is $\gamma$-strongly convex
- $f$ is $L$-smooth
@Justify (appealing to other results) that then $\gamma \le L$.
By the strong-convexity lower bound and the $L$-smoothness upper bound,
\[\frac \gamma 2 ||y - x||^2 \le f(y) - f(x) - \nabla f(x)^\top ( y - x) \le \frac L 2 ||y - x||^2\]for all $x, y \in \mathbb R^n$. Taking any $y \ne x$ and dividing through by $\frac 1 2 ||y - x||^2$ gives $\gamma \le L$.
Matrix characterisations
Suppose we have a quadratic form
\[f(x) = \frac 1 2 x^\top A x - b^\top x\]
where $A$ is a symmetric matrix. Can you link the properties of $A$ to $\gamma$-strong convexity and $L$-smoothness?
If $A$ has eigenvalues in $[\gamma, L]$ where $0 < \gamma \le L$, then $f$ is $\gamma$-strongly convex and $L$-smooth.
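A sketch tying this together numerically (the random positive definite $A$ is an assumption): $\gamma$ and $L$ are read off as the extreme eigenvalues of $A$, and both quadratic bounds, hence $\gamma \le L$, can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.1 * np.eye(4)    # symmetric positive definite
b = rng.standard_normal(4)
eigs = np.linalg.eigvalsh(A)     # ascending eigenvalues
gamma, L = eigs[0], eigs[-1]
print(gamma <= L)                # True by construction

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    gap = f(y) - f(x) - grad(x) @ (y - x)   # equals 0.5 (y-x)^T A (y-x)
    d2 = np.dot(y - x, y - x)
    assert 0.5 * gamma * d2 - 1e-9 <= gap <= 0.5 * L * d2 + 1e-9
print("both quadratic bounds held on all samples")
```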