Notes - Optimisation for Data Science HT25, Subgradients


Flashcards

Suppose $f : \mathbb R^n \to \mathbb R$ is a proper convex function. Define a subgradient $v$ of $f$ at $x$ and the subdifferential $\partial f(x)$.


  • $v \in \mathbb R^n$ is a subgradient of $f$ at $x$ if for all $y \in \mathbb R^n$, $f(y) \ge f(x) + v^\top (y - x)$.
  • $\partial f(x)$ is the set of all subgradients of $f$ at $x$.
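
As a concrete example (my own sketch, not from the lecture notes): for $f(x) = |x|$ on $\mathbb R$, $\partial f(0) = [-1, 1]$, while $\partial f(x) = \{\operatorname{sign}(x)\}$ for $x \ne 0$. The snippet below simply tests the defining inequality $f(y) \ge f(x) + v(y - x)$ on a grid of points $y$; the helper `is_subgradient` is my own.

```python
import numpy as np

def f(x):
    # f(x) = |x|, a proper convex function on R
    return abs(x)

def is_subgradient(v, x, ys):
    # Test the subgradient inequality f(y) >= f(x) + v*(y - x) on a grid of y's
    return all(f(y) >= f(x) + v * (y - x) - 1e-12 for y in ys)

ys = np.linspace(-5.0, 5.0, 1001)

# At x = 0, every v in [-1, 1] is a subgradient of |x|.
print(is_subgradient(0.3, 0.0, ys))   # True
print(is_subgradient(1.5, 0.0, ys))   # False: the line with slope 1.5 cuts above the graph

# At x = 2, |x| is differentiable and the only subgradient is sign(2) = 1.
print(is_subgradient(1.0, 2.0, ys))   # True
print(is_subgradient(0.5, 2.0, ys))   # False
```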

Suppose:

  • $f : \mathbb R^n \to \mathbb R$ is a proper convex function
  • $a \in \partial f(x)$
  • $b \in \partial f(y)$

@Prove that then:

  • $(a - b)^\top (x - y) \ge 0$

  • Since $a \in \partial f(x)$, we have $f(y) \ge f(x) + a^\top (y - x)$.
  • Since $b \in \partial f(y)$, we have $f(x) \ge f(y) + b^\top (x - y)$, or equivalently $-f(y) \ge -f(x) - b^\top (y - x)$.
  • Adding these inequalities together yields $(a - b)^\top (y - x) \le 0$, so the result follows.
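
A quick numerical sanity check of this monotonicity property (my own sketch; it uses the fact that $\operatorname{sign}(x)$, taken componentwise, is a subgradient of $\|x\|_1$ at $x$):

```python
import numpy as np

rng = np.random.default_rng(0)

def subgrad_l1(x):
    # np.sign(x) is a valid subgradient of f(x) = ||x||_1 at x
    # (any entry in [-1, 1] would also work where x_i = 0)
    return np.sign(x)

# Check (a - b)^T (x - y) >= 0 on many random pairs.
for _ in range(10_000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    a, b = subgrad_l1(x), subgrad_l1(y)
    assert (a - b) @ (x - y) >= 0.0
print("monotonicity held on all sampled pairs")
```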

Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is proper convex. @State a result that reduces the subdifferential to the ordinary gradient when $f$ is differentiable, and vice versa.


If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$. Conversely, if $\partial f(x) = \{v\}$ for some $v$, then $f$ is differentiable at $x$ (and necessarily $v = \nabla f(x)$).
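
For example, $f(x) = \tfrac12 \|x\|_2^2$ is differentiable everywhere, so $\partial f(x) = \{x\}$ at every $x$; by contrast, $f(x) = |x|$ has $\partial f(0) = [-1, 1]$, which is not a singleton, consistent with $|x|$ not being differentiable at $0$.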

Suppose $f : \mathbb R^n \to \mathbb R \cup \{+\infty\}$ is proper convex. @Prove that:

  1. If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$
  2. If $\partial f(x) = \{v\}$ for some $v$, then $f$ is differentiable at $x$.

$f$ differentiable at $x$ implies $\partial f(x) = \{\nabla f(x)\}$:

By convexity, for all $y \in \mathbb R^n$ and $t \in (0, 1]$,

\[\begin{aligned} & f((1-t)x + ty) \le (1 - t)f(x) + t f(y) \\ \implies& \frac{f(x + t(y - x)) - f(x)}{t} \le f(y) - f(x) \end{aligned}\]

Taking the limit as $t \to 0^+$ (which exists because $f$ is differentiable at $x$), we see

\[\nabla f(x)^\top (y - x) \le f(y) - f(x)\]

and hence $\nabla f(x) \in \partial f(x)$.

Conversely, suppose that $\gamma \in \partial f(x)$. Applying the definition of a subgradient with $y = x - tw$, for any $w \in \mathbb R^n$ and $t > 0$,

\[\begin{aligned} \gamma^\top ( -w) &\le \frac{f(x - tw) - f(x)}{t} \\ &\stackrel{t \to 0^+}\to \nabla f(x)^\top (-w) \end{aligned}\]

But also, taking $y = x + tw$ with $t > 0$,

\[\begin{aligned} \gamma^\top w &\le \frac{f(x + tw) - f(x)}{t} \\ &\stackrel{t \to 0^+}\to \nabla f(x)^\top w \end{aligned}\]

and hence, negating the first limit and combining it with the second,

\[\nabla f(x)^\top w \le \gamma^\top w \le \nabla f(x)^\top w\]

Since this holds for every $w \in \mathbb R^n$, we conclude

\[\gamma = \nabla f(x)\]
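
The first direction can also be checked numerically: for a smooth convex quadratic, the gradient satisfies the subgradient inequality at every point (a sketch of my own, using $f(x) = \tfrac12 x^\top A x$ with $A$ positive semidefinite):

```python
import numpy as np

rng = np.random.default_rng(1)

# Smooth convex test function f(x) = 0.5 * x^T A x with A positive semidefinite
M = rng.normal(size=(5, 5))
A = M.T @ M
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

# First-order condition: f(y) >= f(x) + grad(x)^T (y - x), i.e. the gradient
# is a subgradient (and, by the result above, the only one).
for _ in range(10_000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print("the gradient satisfied the subgradient inequality on all sampled pairs")
```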

$\partial f(x) = \{v\}$ for some $v$ implies $f$ is differentiable at $x$:

Now suppose that $\partial f(x) = \{v\}$. We cannot argue exactly as above, since $\nabla f(x)$ is not yet known to exist. A sketch using two standard facts about convex functions: first, for convex $f$ the difference quotient $t \mapsto \frac{f(x + tw) - f(x)}{t}$ is non-decreasing in $t > 0$, so the one-sided directional derivative

\[f'(x; w) = \lim_{t \to 0^+} \frac{f(x + tw) - f(x)}{t}\]

exists for every $w \in \mathbb R^n$ (it is bounded below by $v^\top w$ by the subgradient inequality). Second, for $x$ in the interior of $\operatorname{dom} f$, the directional derivative is the support function of the subdifferential:

\[f'(x; w) = \max_{g \in \partial f(x)} g^\top w\]

Since $\partial f(x) = \{v\}$, this gives $f'(x; w) = v^\top w$, so the directional derivative is linear in $w$. A convex function on $\mathbb R^n$ whose directional derivative at a point is linear in the direction is differentiable at that point, and hence $f$ is differentiable at $x$ with $\nabla f(x) = v$.
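For instance, for $f(x) = |x|$ at $x = 0$ the directional derivative is $f'(0; w) = |w| = \max_{g \in [-1, 1]} g\,w$, which is not linear in $w$, consistent with $\partial f(0) = [-1, 1]$ not being a singleton and $|x|$ not being differentiable at $0$.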


