Continuous Optimisation HT26, Unconstrained optimality conditions for convex problems
Flashcards
@Define what it means for a function $f : \mathcal D \to \mathbb R$ on a convex domain $\mathcal D \subseteq \mathbb R^n$ to be convex (a zeroth-order characterisation), and give a geometric interpretation.
For all $x, y \in \mathcal D$ and $\alpha \in [0, 1]$,
\[f(x + \alpha(y - x)) \le f(x) + \alpha (f(y) - f(x)),\]equivalently $f((1-\lambda) x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y)$ for $\lambda \in [0, 1]$.
Geometrically: the chord between $(x, f(x))$ and $(y, f(y))$ lies on or above the graph of $f$ over the segment from $x$ to $y$.
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f \in \mathcal C^1(\mathcal D)$
@State a necessary and sufficient first-order condition for $f$ to be convex.
$f$ is convex iff for all $x, y \in \mathcal D$,
\[f(y) \ge f(x) + \nabla f(x)^\top (y - x).\]Geometrically: the tangent hyperplane to $f$ at any point lies on or below the graph of $f$.
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f \in \mathcal C^1(\mathcal D)$
@Prove ∆convexity-first-order: $f$ is convex iff for all $x, y \in \mathcal D$,
\[f(y) \ge f(x) + \nabla f(x)^\top (y - x).\]
($\Rightarrow$ convex implies tangent characterisation): by ∆convex-function-definition, for all $\lambda \in (0, 1]$,
\[f(x + \lambda(y - x)) \le f(x) + \lambda(f(y) - f(x)).\]Rearranging,
\[\frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y) - f(x).\]Letting $\lambda \to 0^+$, the LHS tends to the directional derivative $\nabla f(x)^\top (y - x)$. So $\nabla f(x)^\top (y - x) \le f(y) - f(x)$, which rearranges to the claim.
($\Leftarrow$ tangent characterisation implies convex): let $x, y \in \mathcal D$, $\lambda \in [0, 1]$ and $z = (1 - \lambda)x + \lambda y$. Applying the first-order inequality with basepoint $z$ at both $x$ and $y$:
\[f(x) \ge f(z) + \nabla f(z)^\top (x - z), \qquad f(y) \ge f(z) + \nabla f(z)^\top (y - z).\]Since $x - z = -\lambda (y - x)$ and $y - z = (1 - \lambda)(y - x)$, multiplying the first inequality by $(1 - \lambda)$ and the second by $\lambda$ and summing, the $\nabla f(z)^\top (y - x)$ terms cancel:
\[(1 - \lambda) f(x) + \lambda f(y) \ge (1 - \lambda) f(z) + \lambda f(z) = f((1 - \lambda) x + \lambda y),\]which is the convexity inequality.
Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f \in \mathcal C^2(\mathcal D)$
@State a necessary and sufficient second-order condition for $f$ to be convex.
$f$ is convex iff
\[\nabla^2 f(x) \succeq 0 \quad \text{for all } x \in \mathcal D.\]Suppose:
- $\mathcal D \subseteq \mathbb R^n$ is a convex domain
- $f \in \mathcal C^2(\mathcal D)$
@Prove ∆convexity-second-order: $f$ is convex iff $\nabla^2 f(x) \succeq 0$ for all $x \in \mathcal D$.
($\Rightarrow$ convex implies PSD Hessian): by ∆convexity-first-order, $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$. Setting $y = x + \alpha s$ for $s \in \mathbb R^n$, $\alpha > 0$:
\[f(x + \alpha s) \ge f(x) + \alpha \nabla f(x)^\top s. \qquad (\ast)\] \[f(x + \alpha s) = f(x) + \alpha \nabla f(x)^\top s + \frac{\alpha^2}{2} s^\top \nabla^2 f(x + \tilde \alpha s) s\]for some $\tilde \alpha \in (0, \alpha)$. Subtracting from $(\ast)$:
\[\frac{\alpha^2}{2} s^\top \nabla^2 f(x + \tilde \alpha s) s \ge 0.\]Letting $\alpha \to 0^+$, $\tilde \alpha \to 0$ too, and hence by continuity of $\nabla^2 f$ (since $f \in \mathcal C^2$), $s^\top \nabla^2 f(x) s \ge 0$. Since $s$ was arbitrary, $\nabla^2 f(x) \succeq 0$.
($\Leftarrow$ PSD Hessian implies convex): assume $\nabla^2 f \succeq 0$ on $\mathcal D$. For any $x, y \in \mathcal D$, by ∆second-order-taylor with $\alpha = 1$, $s = y - x$:
\[f(y) = f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2} (y - x)^\top \nabla^2 f(\xi) (y - x)\]for some $\xi$ on the segment from $x$ to $y$. Since $\mathcal D$ is convex, $\xi \in \mathcal D$, so $\nabla^2 f(\xi) \succeq 0$ and the quadratic term is $\ge 0$. Hence
\[f(y) \ge f(x) + \nabla f(x)^\top (y - x).\]This is the first-order characterisation, so by ∆convexity-first-order (reverse direction), $f$ is convex.
Bite-sized
If $f$ is convex on a convex domain $\mathcal D$, then every local minimiser of $f$ is automatically a global minimiser.
If $f \in \mathcal C^1$ is convex on a convex domain $\mathcal D$, then every stationary point ($\nabla f(x^\ast) = 0$) is automatically a global minimiser — first-order conditions are already sufficient in the convex case.
For the quadratic $q(x) := g^\top x + \tfrac{1}{2} x^\top H x$ with $H$ symmetric, $H \succeq 0$ (PSD) implies $q$ is convex; in that case any stationary point $x^\ast$ (i.e. any solution of $Hx + g = 0$) is automatically a global minimiser of $q$.
Give the proof strategy for ∆convexity-first-order (the equivalence: $f \in \mathcal C^1$ convex $\iff$ $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$ for all $x, y$).
- ($\Rightarrow$) Start from the zeroth-order convexity inequality with parameter $\lambda \in (0, 1]$; rearrange to a difference quotient $[f(x + \lambda(y-x)) - f(x)]/\lambda \le f(y) - f(x)$; pass $\lambda \to 0^+$ so the LHS becomes the directional derivative $\nabla f(x)^\top (y - x)$.
- ($\Leftarrow$) For $x, y \in \mathcal D$ and $\lambda \in [0, 1]$, set $z := (1 - \lambda)x + \lambda y$. Apply the first-order inequality with basepoint $z$ at both $x$ and at $y$. Note $x - z = -\lambda(y - x)$ and $y - z = (1 - \lambda)(y - x)$, so the two $\nabla f(z)^\top$ terms are scalar multiples of each other. Multiply the inequalities by $(1 - \lambda)$ and $\lambda$ respectively and sum — the gradient terms cancel — to recover the zeroth-order convexity inequality.
The “difference-quotient” identity driving the ($\Rightarrow$) direction of ∆convexity-first-order. Starting from $f(x + \lambda(y - x)) \le f(x) + \lambda(f(y) - f(x))$ and dividing by $\lambda > 0$:
\[\frac{f(x + \lambda(y - x)) - f(x)}{\lambda} \le f(y) - f(x).\]Passing $\lambda \to 0^+$, the LHS becomes $\nabla f(x)^\top (y - x)$ — the directional derivative at $x$ in direction $y - x$.
Give the proof strategy for ∆convexity-second-order (the equivalence: $f \in \mathcal C^2$ convex $\iff$ $\nabla^2 f(x) \succeq 0$ for all $x$).
- ($\Rightarrow$) Use ∆convexity-first-order to get $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$. Set $y = x + \alpha s$ and subtract from the second-order Taylor expansion ∆second-order-taylor; the result is $\tfrac{\alpha^2}{2} s^\top \nabla^2 f(x + \tilde\alpha s) s \ge 0$. Pass $\alpha \to 0^+$ (so $\tilde\alpha \to 0$) and use continuity of $\nabla^2 f$ to conclude $s^\top \nabla^2 f(x) s \ge 0$; since $s$ is arbitrary, $\nabla^2 f(x) \succeq 0$.
- ($\Leftarrow$) Assume $\nabla^2 f \succeq 0$ everywhere. For any $x, y \in \mathcal D$, apply ∆second-order-taylor with $\alpha = 1$, $s = y - x$. The intermediate point $\xi$ lies in $\mathcal D$ (convex domain), so $\nabla^2 f(\xi) \succeq 0$ and the quadratic remainder is non-negative. Hence $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$, the first-order characterisation. Apply ∆convexity-first-order in the reverse direction to conclude.
The reduced Taylor identity at the heart of the ($\Rightarrow$) direction of ∆convexity-second-order. Subtracting the first-order convexity inequality $f(x + \alpha s) \ge f(x) + \alpha \nabla f(x)^\top s$ from the second-order Taylor expansion $f(x + \alpha s) = f(x) + \alpha \nabla f(x)^\top s + \tfrac{\alpha^2}{2} s^\top \nabla^2 f(x + \tilde\alpha s) s$ leaves
\[\tfrac{\alpha^2}{2} s^\top \nabla^2 f(x + \tilde\alpha s) s \ge 0.\]Dividing by $\alpha^2 / 2 > 0$ and letting $\alpha \to 0^+$ yields $s^\top \nabla^2 f(x) s \ge 0$ by continuity, for every $s$.