Notes - Machine Learning MT23, Convex optimisation
Flashcards
What does it mean for a set $C \subseteq \mathbb R^D$ to be convex?
For all $\pmb x, \pmb y \in C$ and $\lambda \in [0, 1]$,
\[\lambda \cdot \pmb x + (1 - \lambda) \cdot \pmb y \in C\]
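For example, the halfspace of points $\pmb x \in \mathbb R^D$ with $\pmb a^\top \pmb x \le b$ is convex: if $\pmb a^\top \pmb x \le b$ and $\pmb a^\top \pmb y \le b$, then for any $\lambda \in [0, 1]$,
\[\pmb a^\top(\lambda \pmb x + (1-\lambda)\pmb y) = \lambda\, \pmb a^\top \pmb x + (1-\lambda)\, \pmb a^\top \pmb y \le \lambda b + (1-\lambda) b = b\]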
In terms of a matrix $\pmb A \in \mathbb R^{m \times n}$ and $\pmb b \in \mathbb R^m$, what is a polyhedron?
The set of points
\[P = \\{ \pmb x \in \mathbb R^n \mid \pmb A \cdot \pmb x \le \pmb b \\}\]
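For example, the unit square $[0, 1]^2$ is a polyhedron: the four constraints $x_1 \le 1$, $-x_1 \le 0$, $x_2 \le 1$, $-x_2 \le 0$ stack into a $4 \times 2$ matrix $\pmb A$ with rows $(1, 0), (-1, 0), (0, 1), (0, -1)$ and $\pmb b = (1, 0, 1, 0)^\top$.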
What does it mean for a function $f : \mathcal C \to \mathbb R$, defined on a convex set $\mathcal C \subseteq \mathbb R^D$, to be convex?
For all $\pmb x, \pmb y \in \mathcal C$ and $0 \le \lambda \le 1$,
\[f(\lambda\pmb x + (1 - \lambda)\pmb y) \le \lambda f(\pmb x) + (1-\lambda)f(\pmb y)\]
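For example, $f(x) = x^2$ is convex, since for any $x, y$ and $\lambda \in [0, 1]$,
\[\lambda x^2 + (1-\lambda)y^2 - (\lambda x + (1-\lambda)y)^2 = \lambda(1-\lambda)(x - y)^2 \ge 0\]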
What makes convex optimisation problems nice?
All locally optimal points are globally optimal.
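Sketch of why: if $\pmb x^\star$ is locally optimal but $f(\pmb y) < f(\pmb x^\star)$ for some feasible $\pmb y$, then every point $\pmb z_\lambda = \lambda \pmb y + (1-\lambda)\pmb x^\star$ with $\lambda \in (0, 1]$ is feasible (the feasible set is convex) and satisfies
\[f(\pmb z_\lambda) \le \lambda f(\pmb y) + (1-\lambda) f(\pmb x^\star) < f(\pmb x^\star)\]
so there are strictly better feasible points arbitrarily close to $\pmb x^\star$, contradicting local optimality.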
What is a subderivative of a convex function $f : \mathbb R \to \mathbb R$ at some non-differentiable point $x_0$?
A real number $c$ such that, for all $x$,
\[f(x) \ge f(x_0) + c(x-x_0)\]
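For example, for $f(x) = |x|$ at $x_0 = 0$, any $c \in [-1, 1]$ is a subderivative, since $|x| \ge cx$ holds for all $x$ exactly when $|c| \le 1$.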
Suppose we have a convex function $f : \mathbb R^n \to \mathbb R$. Can you define a subgradient $\pmb g$ of $f$ at a point $\pmb w_0$?
Any $\pmb g \in \mathbb R^n$ satisfying, for all $\pmb w$,
\[f(\pmb w) \ge f(\pmb w_0) + \pmb g^\top (\pmb w - \pmb w_0)\]
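For example, for $f(\pmb w) = \lVert \pmb w \rVert_1$, a subgradient at $\pmb w_0$ is any $\pmb g$ with $g_i = \operatorname{sign}(w_{0,i})$ where $w_{0,i} \ne 0$ and $g_i \in [-1, 1]$ where $w_{0,i} = 0$. When $f$ is differentiable at $\pmb w_0$, the only subgradient is $\nabla f(\pmb w_0)$.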
What are subgradient methods in the context of gradient descent?
Convex optimisation methods that generalise gradient descent to non-differentiable functions by stepping along a subgradient (rather than the gradient) at each iteration.
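A minimal sketch of the subgradient method in Python, assuming a lasso-style objective $\tfrac{1}{2n}\lVert \pmb X \pmb w - \pmb y\rVert^2 + \alpha \lVert \pmb w \rVert_1$; the data, step-size schedule, and hyperparameters below are illustrative choices, not part of the notes.

```python
import numpy as np

# Illustrative problem: lasso-style objective, non-differentiable because of the L1 term.
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)
alpha = 0.1

def subgradient(w):
    # Gradient of the smooth least-squares part, plus a subgradient of alpha * ||w||_1.
    # np.sign(0) = 0, which is a valid choice from the subdifferential [-1, 1] at 0.
    return X.T @ (X @ w - y) / n + alpha * np.sign(w)

w = np.zeros(d)
for t in range(1, 2001):
    step = 0.1 / np.sqrt(t)   # diminishing step size, standard for subgradient methods
    w = w - step * subgradient(w)

print(np.round(w, 2))
```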
What is the idea behind projected gradient descent for approximately solving constrained convex optimisation problems?
Take standard gradient descent steps, but apply a “projection operator” after each step so that the iterate never leaves the feasible region.
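A minimal sketch of projected gradient descent in Python, assuming the feasible region is the box $[0, 1]^3$ (so the Euclidean projection is just coordinate-wise clipping) and a simple quadratic objective; all constants are illustrative.

```python
import numpy as np

def grad(w, target):
    # Gradient of the smooth objective ||w - target||^2.
    return 2.0 * (w - target)

def project_onto_box(w, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^d is coordinate-wise clipping.
    return np.clip(w, lo, hi)

target = np.array([1.5, -0.3, 0.7])   # unconstrained minimiser lies outside the box
w = np.zeros(3)
for _ in range(100):
    w = project_onto_box(w - 0.1 * grad(w, target))   # gradient step, then project

print(np.round(w, 3))   # constrained minimiser: approximately [1.0, 0.0, 0.7]
```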