Notes - Optimisation for Data Science HT25, Subgradients
Flashcards
Suppose $f : \mathbb R^n \to \mathbb R$ is a proper convex function. Define a subgradient $v$ of $f$ at $x$ and the subdifferential $\partial f(x)$.
- $v \in \mathbb R^n$ is a subgradient of $f$ at $x$ if for all $y \in \mathbb R^n$, $f(y) \ge f(x) + v^\top (y - x)$.
- $\partial f(x)$ is the set of all subgradients of $f$ at $x$.
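A standard one-dimensional example: for $f(x) = |x|$, the inequality $|y| \ge f(x) + v(y - x)$ at $x = 0$ reads $|y| \ge v\,y$ for all $y$, which holds exactly when $|v| \le 1$. Hence
\[\partial f(x) = \begin{cases} \{\operatorname{sign}(x)\} & x \ne 0, \\ [-1, 1] & x = 0. \end{cases}\]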
Suppose:
- $f : \mathbb R^n \to \mathbb R$ is a proper convex function
- $a \in \partial f(x)$
- $b \in \partial f(y)$
@Prove that then:
- $(a - b)^\top (x - y) \ge 0$
- Since $a \in \partial f(x)$, we have $f(y) \ge f(x) + a^\top (y - x)$.
- Since $b \in \partial f(y)$, we have $f(x) \ge f(y) + b^\top (x - y)$, or equivalently $-f(y) \ge -f(x) - b^\top (y - x)$.
- Adding these inequalities together yields $(a - b)^\top (y - x) \le 0$, which is equivalent to $(a - b)^\top (x - y) \ge 0$, so the result follows.
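A quick check with $f(x) = |x|$: take $x = 1$, $y = -1$, $a = 1 \in \partial f(1)$ and $b = -1 \in \partial f(-1)$; then
\[(a - b)(x - y) = 2 \cdot 2 = 4 \ge 0.\]This property is the monotonicity of the subdifferential map.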
Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is proper convex. @State a result that effectively reduces the subdifferential to the ordinary gradient when $f$ is differentiable, and vice versa.
If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$. Likewise, if $\partial f(x) = \{v\}$ for some $v$, then $f$ is differentiable at $x$.
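For example, $f(x) = \tfrac{1}{2}\|x\|_2^2$ is differentiable everywhere, so $\partial f(x) = \{x\}$ for every $x$; by contrast, $f(x) = |x|$ has $\partial f(0) = [-1, 1]$, which is not a singleton, consistent with $|\cdot|$ failing to be differentiable at $0$.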
Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is proper convex. @Prove that:
- If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$
- If $\partial f(x) = \{v\}$ for some $v$, then $f$ is differentiable at $x$.
$f$ differentiable at $x$ implies $\partial f(x) = \{\nabla f(x)\}$:
By convexity, for all $x, y \in \mathbb R^n$ and $t \in (0, 1]$,
\[\begin{aligned} & f((1-t)x + ty) \le (1 - t)f(x) + t f(y) \\ \implies& \frac{f(x + t(y - x)) - f(x)}{t} \le f(y) - f(x). \end{aligned}\]Taking the limit as $t \to 0^+$ and using differentiability of $f$ at $x$, we see
\[\nabla f(x)^\top (y - x) \le f(y) - f(x),\]and hence $\nabla f(x) \in \partial f(x)$.
Conversely, suppose that $\gamma \in \partial f(x)$. Taking $y = x - tw$ with $t > 0$ in the definition of a subgradient and dividing by $t$,
\[\begin{aligned} \gamma^\top ( -w) &\le \frac{f(x - tw) - f(x)}{t} \\ &\stackrel{t \to 0^+}\to \nabla f(x)^\top (-w). \end{aligned}\]But also, taking $y = x + tw$ with $t > 0$,
\[\begin{aligned} \gamma^\top w &\le \frac{f(x + tw) - f(x)}{t} \\ &\stackrel{t \to 0^+}\to \nabla f(x)^\top w, \end{aligned}\]and hence, for every $w$,
\[\nabla f(x)^\top w \le \gamma^\top w \le \nabla f(x)^\top w.\]Therefore
\[\gamma = \nabla f(x).\]
$\partial f(x) = \{v\}$ for some $v$ implies $f$ is differentiable at $x$:
Now suppose that $\partial f(x) = \{v\}$. By convexity, for each $w$ the difference quotient $\frac{f(x + tw) - f(x)}{t}$ is non-decreasing in $t > 0$, and taking $y = x + tw$ in the subgradient inequality shows it is bounded below by $v^\top w$; hence the one-sided directional derivative
\[f'(x; w) = \lim_{t \to 0^+} \frac{f(x + tw) - f(x)}{t}\]exists for every $w$. Since $\partial f(x)$ is nonempty and bounded, $x$ lies in the interior of $\operatorname{dom} f$, and $f'(x; \cdot)$ is then the support function of $\partial f(x)$, so
\[f'(x; w) = \sup_{g \in \partial f(x)} g^\top w = v^\top w \quad \text{for all } w.\]The directional derivative is therefore linear in $w$, and a convex function whose directional derivative at an interior point of its domain is linear in the direction is differentiable there; hence $f$ is differentiable at $x$ with $\nabla f(x) = v$.
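As a check on the support-function step, take $f(x) = |x|$ at $x = 0$: here
\[f'(0; w) = \lim_{t \to 0^+} \frac{|tw|}{t} = |w| = \sup_{g \in [-1, 1]} g\,w,\]which is not linear in $w$, matching the fact that $\partial f(0) = [-1, 1]$ is not a singleton and $|\cdot|$ is not differentiable at $0$.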