Notes - Optimisation for Data Science HT25, Smoothness and convexity
Flashcards
Convexity
@Define what it means for a function $f : \mathbb R^n \to \mathbb R$ to be convex.
For all $\lambda \in [0, 1]$ and $x, y \in \mathbb R^n$,
\[f((1-\lambda)x + \lambda y) \le (1 - \lambda)f(x) + \lambda f(y)\]
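A minimal numerical sanity check of this definition (an illustrative sketch; the choice $f(x) = ||x||^2$ is an assumption, not part of the notes):

```python
import numpy as np

# Check the convexity inequality for the convex function f(x) = ||x||^2
# at randomly drawn points x, y and weights lambda in [0, 1].
rng = np.random.default_rng(0)
f = lambda x: np.dot(x, x)

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = f((1 - lam) * x + lam * y)
    rhs = (1 - lam) * f(x) + lam * f(y)
    assert lhs <= rhs + 1e-12  # tolerance for floating-point error
print("convexity inequality held on all samples")
```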
Suppose:
- We have a set of functions $\{f _ i(x) : \mathcal D \to \mathbb R \mid i \in \mathcal N\}$ with $\mathcal N$ a finite index set
- Each $f _ i$ is a convex function defined on a common convex domain $\mathcal D \subseteq \mathbb R^n$.
@Justify that
\[x \mapsto \max _ {i \in \mathcal N} f _ i(x)\]
is a convex function.
For all $x _ 1, x _ 2 \in \mathcal D$ and $\lambda \in [0, 1]$,
\[\begin{aligned} \max _ i f _ i(\lambda x _ 1 + (1-\lambda)x _ 2) &\le \max _ i (\lambda f _ i(x _ 1) + (1-\lambda) f _ i(x _ 2)) \\\\ &\le \lambda \max _ i f _ i(x _ 1) + (1 - \lambda)\max _ i f _ i(x _ 2) \end{aligned}\]
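The same inequality can be spot-checked numerically; here is a sketch using affine (hence convex) pieces $f _ i(x) = a _ i^\top x + b _ i$, an assumed example:

```python
import numpy as np

# g(x) = max_i (a_i^T x + b_i): the pointwise max of affine functions.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))  # 5 affine pieces on R^3
b = rng.standard_normal(5)
g = lambda x: np.max(A @ x + b)

for _ in range(1000):
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = g(lam * x1 + (1 - lam) * x2)
    rhs = lam * g(x1) + (1 - lam) * g(x2)
    assert lhs <= rhs + 1e-12
print("max of affine functions passed the convexity check")
```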
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f : \mathcal D \to \mathbb R$ is a convex function
- $f$ has gradient $\nabla f(x)$ at $x \in \mathcal D$
@Justify that then the first-order Taylor approximation is a lower bounding function, i.e. for all $y \in \mathcal D$,
\[f(x) + \nabla f(x)^\top (y - x) \le f(y)\]
Since $f$ is convex, for all $\lambda \in [0, 1]$ and $x, y \in \mathcal D$,
\[f((1-\lambda)x + \lambda y) \le (1 - \lambda)f(x) + \lambda f(y)\]Rearranging and dividing by $\lambda \in (0, 1]$ gives
\[f(x) + \frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y)\]and taking the limit $\lambda \to 0^+$, the difference quotient converges to the directional derivative $\nabla f(x)^\top (y - x)$, which yields the result.
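A numerical illustration of the lower-bounding property (sketch only; $f(x) = ||x||^2$ with $\nabla f(x) = 2x$ is an assumed example):

```python
import numpy as np

# The first-order Taylor model at x never exceeds f(y) for convex f.
rng = np.random.default_rng(2)
f = lambda x: np.dot(x, x)
grad = lambda x: 2 * x

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert f(x) + grad(x) @ (y - x) <= f(y) + 1e-12
print("first-order lower bound held on all samples")
```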
@Define what it means for a function $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ to be proper convex.
$f$ is convex and $f(x) < +\infty$ for at least one point $x$.
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f : \mathcal D \to \mathbb R$ is a convex function
How is it possible to extend $f$ to a proper convex function on all of $\mathbb R^n$?
Set $f(x) := +\infty$ for $x \notin \mathcal D$.
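A sketch of this extension in code (the domain $\mathcal D = [0, 2]$ and $f(x) = (x-1)^2$ are assumptions for illustration); with the usual conventions for arithmetic with $+\infty$, the convexity inequality survives the extension:

```python
import numpy as np

# Extend a convex f defined on D = [0, 2] to all of R by +infinity.
def f_extended(x):
    if 0.0 <= x <= 2.0:           # x in D
        return (x - 1.0) ** 2     # the original convex f
    return np.inf                 # extension outside D

x, y, lam = 0.5, 3.0, 0.25
lhs = f_extended((1 - lam) * x + lam * y)
rhs = (1 - lam) * f_extended(x) + lam * f_extended(y)
print(lhs <= rhs)  # True: rhs = +inf since y lies outside D
```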
Suppose $\Omega \subset \mathbb R^n$. @Define the indicator function $I _ \Omega$, and state a result about its convexity.
\[I _ \Omega(x) := \begin{cases} 0 & \text{if } x \in \Omega \\\\ +\infty & \text{otherwise} \end{cases}\]$\Omega \ne \emptyset$ is a convex set iff $I _ \Omega$ is a proper convex function.
Suppose we have the convex optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{subject to }x \in \mathcal F\]
where $\mathcal F$ is a convex set. How can you convert it into an equivalent unconstrained convex optimisation problem?
Consider
\[\min _ {x \in \mathbb R^n} f(x) + I _ \mathcal F(x)\]
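A small sketch of this reformulation (the objective $f(x) = (x - 3)^2$ and feasible set $\mathcal F = [0, 1]$ are assumed examples); minimising $f + I _ {\mathcal F}$ over a grid recovers the constrained minimiser:

```python
import numpy as np

f = lambda x: (x - 3.0) ** 2

def I_F(x):  # indicator of F = [0, 1]
    return 0.0 if 0.0 <= x <= 1.0 else np.inf

# Brute-force the unconstrained problem min f + I_F over a grid.
grid = np.linspace(-2.0, 4.0, 601)
vals = [f(x) + I_F(x) for x in grid]
print(grid[int(np.argmin(vals))])  # 1.0, the constrained minimiser
```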
Strong convexity
@Define what it means for a function $f : \mathbb R^n \to \mathbb R$ to be strongly convex with modulus of convexity $\gamma > 0$.
For all $\lambda \in [0, 1]$ and $x, y \in \mathbb R^n$:
\[f((1 - \lambda)x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y) - \frac \gamma 2 \lambda (1 - \lambda)||x - y||^2\]
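A numerical check of the definition (illustrative sketch; $f(x) = ||x||^2$, which is strongly convex with $\gamma = 2$ since its Hessian is $2I$, is an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.dot(x, x)
gamma = 2.0  # modulus of convexity for f(x) = ||x||^2

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    lhs = f((1 - lam) * x + lam * y)
    rhs = ((1 - lam) * f(x) + lam * f(y)
           - 0.5 * gamma * lam * (1 - lam) * np.dot(x - y, x - y))
    assert lhs <= rhs + 1e-9  # holds with equality for this quadratic
print("strong convexity inequality held on all samples")
```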
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $\gamma$-strongly convex
- $f$ is differentiable at $x$
@State a result which lower bounds $f$ in terms of $\nabla f(x)$ and $\gamma$.
For all $y \in \mathbb R^n$,
\[f(x) + \nabla f(x)^\top ( y - x) + \frac \gamma 2 ||y - x||^2 \le f(y)\]
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $\gamma$-strongly convex
- $f$ is differentiable at $x$
@Prove that then for all $y \in \mathbb R^n$,
\[f(x) + \nabla f(x)^\top ( y - x) + \frac \gamma 2 ||y - x||^2 \le f(y)\]
By the definition of $\gamma$-strong convexity, we have that
\[f( (1 - \lambda)x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y) - \frac \gamma 2 \lambda (1 - \lambda)||y-x||^2\]which implies, after rearranging and dividing by $\lambda \in (0, 1]$,
\[f(x) + \frac{f((1 - \lambda)x + \lambda y) - f(x)}{\lambda} - f(y) \le -\frac \gamma 2 (1-\lambda) ||y-x||^2\]Taking the limit as $\lambda \to 0^+$ (the difference quotient tends to $\nabla f(x)^\top (y - x)$) yields
\[f(x) + \nabla f(x)^\top (y - x) - f(y) \le -\frac \gamma 2 ||y - x||^2\]and multiplying by $-1$ and rearranging gives
\[f(y) \ge f(x) + \nabla f(x)^\top ( y - x) + \frac \gamma 2 ||y - x||^2\]as required.
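The bound can also be verified numerically (sketch; again $f(x) = ||x||^2$ with $\gamma = 2$ and $\nabla f(x) = 2x$ is an assumed example):

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.dot(x, x)
grad = lambda x: 2 * x
gamma = 2.0

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lower = f(x) + grad(x) @ (y - x) + 0.5 * gamma * np.dot(y - x, y - x)
    assert lower <= f(y) + 1e-9  # tight (equality) for this quadratic
print("strong-convexity lower bound held on all samples")
```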
Smoothness
@Define what it means for a function $f : \mathbb R^n \to \mathbb R$ to be $L$-smooth.
It is differentiable everywhere with $L$-Lipschitz continuous gradient, i.e. for all $x, y \in \mathbb R^n$
\[||\nabla f(x) - \nabla f(y) || \le L||x - y||\]
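A numerical check of the Lipschitz condition (sketch with an assumed quadratic: for $f(x) = \frac 1 2 x^\top A x - b^\top x$ with symmetric $A$, the gradient is $Ax - b$ and its Lipschitz constant is the largest eigenvalue magnitude of $A$):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M + M.T                      # symmetric
b = rng.standard_normal(4)
L = np.max(np.abs(np.linalg.eigvalsh(A)))  # spectral norm of A
grad = lambda x: A @ x - b

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
print("gradient Lipschitz bound held on all samples")
```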
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $L$-smooth (and not necessarily convex)
- $x \in \mathbb R^n$
@State a result which upper bounds $f$ in terms of $\nabla f(x)$ and $L$.
For all $y \in \mathbb R^n$,
\[f(y) \le f(x) + \nabla f(x)^\top ( y - x) + \frac L 2 ||y - x||^2\]
Suppose:
- $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$
- $f$ is $L$-smooth (and not necessarily convex)
- $x \in \mathbb R^n$
@Prove that then for all $y \in \mathbb R^n$,
\[f(y) - f(x) - \nabla f(x)^\top ( y - x) \le \frac L 2 ||y - x||^2\]
Let $d = y - x$. Then, by the fundamental theorem of calculus,
\[\begin{align*} f(x + d) &= f(x) + \int^1_0 \nabla f(x + \xi d)^\top d \text d\xi \\\\ &= f(x) + \nabla f(x)^\top d + \int^1_0 [\nabla f(x + \xi d) - \nabla f(x)]^\top d \text d\xi \\\\ &\stackrel{\text{C.S.}}{\le} f(x) + \nabla f(x)^\top d + \int^1_0 ||\nabla f(x + \xi d) - \nabla f(x)|| \cdot ||d|| \text d\xi \\\\ &\stackrel{\text{L.S.}}{\le} f(x) + \nabla f(x)^\top d + L \int^1_0 \xi ||d||^2 \text d \xi \\\\ &= f(x) + \nabla f(x)^\top d + \frac{L||d||^2}{2} \end{align*}\]where C.S. is the Cauchy–Schwarz inequality and L.S. uses the $L$-Lipschitz continuity of $\nabla f$, i.e. $||\nabla f(x + \xi d) - \nabla f(x)|| \le L\xi||d||$. Then since $d = y - x$,
\[f(y) \le f(x) + \nabla f(x)^\top (y - x) + \frac L 2 ||y-x||^2\]
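Since the result does not require convexity, a non-convex example is instructive; here is a sketch using $f(x) = \cos(x)$ (an assumption), which is $1$-smooth because its derivative $-\sin$ is $1$-Lipschitz:

```python
import numpy as np

rng = np.random.default_rng(6)
L = 1.0  # |f''(x)| = |cos(x)| <= 1, so grad f = -sin is 1-Lipschitz

for _ in range(1000):
    x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
    upper = np.cos(x) - np.sin(x) * (y - x) + 0.5 * L * (y - x) ** 2
    assert np.cos(y) <= upper + 1e-12
print("descent lemma held on all samples")
```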
Connections between strong convexity and smoothness
Suppose:
- $f : \mathbb R^n \to \mathbb R$
- $f$ is $\gamma$-strongly convex
- $f$ is $L$-smooth
@State a result which relates $\gamma$ and $L$.
$\gamma \le L$.
Suppose:
- $f : \mathbb R^n \to \mathbb R$
- $f$ is $\gamma$-strongly convex
- $f$ is $L$-smooth
@Justify (appealing to other results) that then $\gamma \le L$.
By the strong-convexity lower bound and the $L$-smoothness upper bound,
\[\frac \gamma 2 ||y - x||^2 \le f(y) - f(x) - \nabla f(x)^\top ( y - x) \le \frac L 2 ||y - x||^2\]for all $x, y \in \mathbb R^n$. Taking any $y \ne x$ and dividing through by $\frac 1 2 ||y - x||^2$ gives $\gamma \le L$.
Matrix characterisations
Suppose we have a quadratic form
\[f(x) = \frac 1 2 x^\top A x - b^\top x\]
where $A$ is a symmetric matrix. Can you link the properties of $A$ to $\gamma$-strong convexity and $L$-smoothness?
If $A$ has eigenvalues in $[\gamma, L]$ where $0 < \gamma \le L$, then $f$ is $\gamma$-strongly convex and $L$-smooth.
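A sketch tying this together numerically (the random positive definite $A$ is an assumption): $\gamma$ and $L$ are read off as the extreme eigenvalues of $A$, and both quadratic bounds, hence $\gamma \le L$, can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.1 * np.eye(4)    # symmetric positive definite
b = rng.standard_normal(4)
eigs = np.linalg.eigvalsh(A)     # ascending eigenvalues
gamma, L = eigs[0], eigs[-1]
print(gamma <= L)                # True by construction

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    gap = f(y) - f(x) - grad(x) @ (y - x)   # equals 0.5 (y-x)^T A (y-x)
    d2 = np.dot(y - x, y - x)
    assert 0.5 * gamma * d2 - 1e-9 <= gap <= 0.5 * L * d2 + 1e-9
print("both quadratic bounds held on all samples")
```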