Optimisation for Data Science HT25, Optimisation terminology
Flashcards
Minimisers
Consider the optimisation model
\[\min _ {x \in \mathbb R^n} f(x) \quad\text{ subject to }x \in \mathcal F\]
@Define what is meant by:
- A local minimiser
- A strict local minimiser
- A global minimiser
$x^\ast \in \mathcal F$ is:
- A local minimiser if $f(x^\ast) \le f(x)$ for all $x \in \mathcal F \cap B _ \varepsilon (x^\ast)$ for some $\varepsilon > 0$.
- A strict local minimiser if $f(x^\ast) < f(x)$ for all $x \in \mathcal F \cap B _ \varepsilon(x^\ast)$
- A global minimiser if $f(x^\ast) \le f(x)$ for all $x \in \mathcal F$.
Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is a proper convex function. Consider the optimisation model
\[\min _ {x \in \mathbb R^n} f(x) \quad\text{ subject to }x \in \mathcal F\]
Recall that $x^\ast \in \mathcal F$ is:
- A local minimiser if $f(x^\ast) \le f(x)$ for all $x \in \mathcal F \cap B _ \varepsilon (x^\ast)$ for some $\varepsilon > 0$.
- A strict local minimiser if $f(x^\ast) < f(x)$ for all $x \in \mathcal F \cap B _ \varepsilon(x^\ast)$
- A global minimiser if $f(x^\ast) \le f(x)$ for all $x \in \mathcal F$.
@State three results about the minimisation of proper convex functions.
(1): $x^\ast$ is a local minimiser iff it is a global minimiser.
(2): The set of minimisers
\[\text{argmin} _ {x \in \mathbb R^n} f(x) := \\{x^\ast \mid f(x^\ast) = \min _ {x \in \mathbb R^n} f(x)\\}\]is a convex set.
(3): $x^\ast \in \text{argmin} _ {x \in \mathbb R^n} f(x)$ if and only if $\pmb 0 \in \partial f(x^\ast)$.
Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is a proper convex function. Consider the optimisation model
\[\min _ {x \in \mathbb R^n} f(x) \quad\text{ subject to }x \in \mathcal F\]
Recall that $x^\ast \in \mathcal F$ is:
- A local minimiser if $f(x^\ast) \le f(x)$ for all $x \in \mathcal F \cap B _ \varepsilon (x^\ast)$ for some $\varepsilon > 0$.
- A strict local minimiser if $f(x^\ast) < f(x)$ for all $x \in \mathcal F \cap B _ \varepsilon(x^\ast)$
- A global minimiser if $f(x^\ast) \le f(x)$ for all $x \in \mathcal F$.
@Prove the following three results about the minimisation of proper convex functions:
- $x^\ast$ is a local minimiser iff it is a global minimiser.
- The set of minimisers $\text{argmin} _ {x \in \mathbb R^n} f(x) := \{x^\ast \mid f(x^\ast) = \min _ {x \in \mathbb R^n} f(x)\}$ is a convex set.
- $x^\ast \in \text{argmin} _ {x \in \mathbb R^n} f(x)$ if and only if $\pmb 0 \in \partial f(x^\ast)$.
For (1), note that every global minimiser is in particular a local minimiser.
Now assume there exist $x$ such that $x$ is a local minimiser but not a global minimiser. Then $\exists \varepsilon > 0$ such that $\forall x’ \in B _ \varepsilon(x)$ such that $f(x) \le f(x’)$, but there is some $y$ such that $f(y) < f(x)$.
Let
\[x_\varepsilon := \left(1 - \frac{\varepsilon}{2||y - x||}\right)x + \left( \frac{\varepsilon}{2||y - x||} \right)y\]and note that $x _ \varepsilon \in B _ \varepsilon (x)$. But then we have
\[f(x_\varepsilon) \le \left(1 - \frac{\varepsilon}{2||y - x||}\right)f(x) + \left( \frac{\varepsilon}{2||y - x||} \right)f(y) < f(x)\]and so $x$ is not a local minimiser, a contradiction.
(2, 3) are immediate from chasing definitions.
Convergence rates
Suppose:
- $(\phi _ k) _ {k \in \mathbb N} \in \mathbb R _ +^\mathbb N$
- $\phi _ k \to 0$ as $k \to \infty$.
@Define what it means for $(\phi _ k) _ {k \in \mathbb N}$ to converge:
- Sublinearly
- $R$-linearly
- $Q$-linearly
- $R$-superlinearly
- $Q$-superlinearly
- $Q$-quadratically
and state the implications between these convergence rates. Which rates guarantee decrease at every iteration?
- Sublinearly: Slower than $R$-linearly
- $R$-linearly: $\exists \sigma \in (0, 1)$ and $C > 0$ such that $\forall k \in \mathbb N$, $\phi _ k \le C(1 - \sigma)^k$.
- $Q$-linearly: $\exists \sigma \in (0, 1)$ such that $\forall k \in \mathbb N$, $\frac{\phi _ {k+1} }{\phi _ k} \le 1 - \sigma$ (guarantees decrease)
- $Q$-superlinearly: $\lim _ {k \to \infty} \frac{\phi _ {k+1} }{\phi _ k} = 0$
- $R$-superlinearly: $\exists (\nu _ k) _ {k \in \mathbb N} \in \mathbb R _ +^{\mathbb N}$ such that $\nu _ k \to 0$ $Q$-superlinearly and $\forall k \in \mathbb N$, $\phi _ k \le \nu _ k$
- $Q$-quadratically: $\exists C > 0$ such that $\forall k \in \mathbb N$, $\frac{\phi _ {k+1} }{\phi _ k^2} \le C$.
All the following implications are strictly one directional:
- $R$-linearity is implied by
- $Q$-linearity is implied by
- $R$-superlinearity is implied by
- $Q$-superlinearity is implied by
- $Q$-quadraticity