Notes - Optimisation for Data Science HT25, Subgradients
Flashcards
Suppose $f : \mathbb R^n \to \mathbb R$ is a proper convex function. Define a subgradient $v$ of $f$ at $x$ and the subdifferential $\partial f(x)$.
- $v \in \mathbb R^n$ is a subgradient of $f$ at $x$ if for all $y \in \mathbb R^n$, $f(y) \ge f(x) + v^\top (y - x)$.
- $\partial f(x)$ is the set of all subgradients of $f$ at $x$.
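A standard one-dimensional example: for $f(x) = |x|$, the inequality $|y| \ge f(x) + v(y - x)$ at $x = 0$ reads $|y| \ge v\,y$ for all $y$, which holds exactly when $|v| \le 1$. Hence
\[\partial f(x) = \begin{cases} \{\operatorname{sign}(x)\} & x \ne 0, \\ [-1, 1] & x = 0. \end{cases}\]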
Suppose:
- $f : \mathbb R^n \to \mathbb R$ is a proper convex function
- $a \in \partial f(x)$
- $b \in \partial f(y)$
@Prove that then:
- $(a - b)^\top (x - y) \ge 0$
- Since $a \in \partial f(x)$, we have $f(y) \ge f(x) + a^\top (y - x)$.
- Since $b \in \partial f(y)$, we have $f(x) \ge f(y) + b^\top (x - y)$, or equivalently $-f(y) \ge -f(x) - b^\top (y - x)$.
- Adding these inequalities together yields $(a - b)^\top (y - x) \le 0$, which is equivalent to $(a - b)^\top (x - y) \ge 0$, so the result follows.
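A quick check with $f(x) = |x|$: take $x = 1$, $y = -1$, $a = 1 \in \partial f(1)$ and $b = -1 \in \partial f(-1)$; then
\[(a - b)(x - y) = 2 \cdot 2 = 4 \ge 0.\]This property is the monotonicity of the subdifferential map.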
Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is proper convex. @State a result that effectively reduces the subdifferential to the ordinary gradient when $f$ is differentiable, and vice versa.
If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$. Likewise, if $\partial f(x) = \{v\}$ for some $v$, then $f$ is differentiable at $x$.
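For example, $f(x) = \tfrac{1}{2}\|x\|_2^2$ is differentiable everywhere, so $\partial f(x) = \{x\}$ for every $x$; by contrast, $f(x) = |x|$ has $\partial f(0) = [-1, 1]$, which is not a singleton, consistent with $|\cdot|$ failing to be differentiable at $0$.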
Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is proper convex. @Prove that:
- If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$
- If $\partial f(x) = \{v\}$ for some $v$, then $f$ is differentiable at $x$.
$f$ differentiable at $x$ implies $\partial f(x) = \{\nabla f(x)\}$:
By convexity, for all $x, y \in \mathbb R^n$ and $t \in (0, 1]$,
\[\begin{aligned} & f((1-t)x + ty) \le (1 - t)f(x) + t f(y) \\ \implies& \frac{f(x + t(y - x)) - f(x)}{t} \le f(y) - f(x). \end{aligned}\]Taking the limit as $t \to 0^+$ and using differentiability of $f$ at $x$, we see
\[\nabla f(x)^\top (y - x) \le f(y) - f(x),\]and hence $\nabla f(x) \in \partial f(x)$.
Conversely, suppose that $\gamma \in \partial f(x)$. Taking $y = x - tw$ with $t > 0$ in the definition of a subgradient and dividing by $t$,
\[\begin{aligned} \gamma^\top ( -w) &\le \frac{f(x - tw) - f(x)}{t} \\ &\stackrel{t \to 0^+}\to \nabla f(x)^\top (-w). \end{aligned}\]But also, taking $y = x + tw$ with $t > 0$,
\[\begin{aligned} \gamma^\top w &\le \frac{f(x + tw) - f(x)}{t} \\ &\stackrel{t \to 0^+}\to \nabla f(x)^\top w, \end{aligned}\]and hence, for every $w$,
\[\nabla f(x)^\top w \le \gamma^\top w \le \nabla f(x)^\top w.\]Therefore
\[\gamma = \nabla f(x).\]
$\partial f(x) = \{v\}$ for some $v$ implies $f$ is differentiable at $x$:
Now suppose that $\partial f(x) = \{v\}$. By convexity, for each $w$ the difference quotient $\frac{f(x + tw) - f(x)}{t}$ is non-decreasing in $t > 0$, and taking $y = x + tw$ in the subgradient inequality shows it is bounded below by $v^\top w$; hence the one-sided directional derivative
\[f'(x; w) = \lim_{t \to 0^+} \frac{f(x + tw) - f(x)}{t}\]exists for every $w$. Since $\partial f(x)$ is nonempty and bounded, $x$ lies in the interior of $\operatorname{dom} f$, and $f'(x; \cdot)$ is then the support function of $\partial f(x)$, so
\[f'(x; w) = \sup_{g \in \partial f(x)} g^\top w = v^\top w \quad \text{for all } w.\]The directional derivative is therefore linear in $w$, and a convex function whose directional derivative at an interior point of its domain is linear in the direction is differentiable there; hence $f$ is differentiable at $x$ with $\nabla f(x) = v$.
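As a check on the support-function step, take $f(x) = |x|$ at $x = 0$: here
\[f'(0; w) = \lim_{t \to 0^+} \frac{|tw|}{t} = |w| = \sup_{g \in [-1, 1]} g\,w,\]which is not linear in $w$, matching the fact that $\partial f(0) = [-1, 1]$ is not a singleton and $|\cdot|$ is not differentiable at $0$.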