Continuous Optimisation HT26, Unconstrained optimality conditions


Flashcards

Consider the unconstrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x)\]

where $f \in \mathcal C^1 (\mathbb R^n)$. @State the first-order necessary conditions for some point $x^\ast$ to be a local minimiser.

\[x^\ast \text{ local minimiser} \implies \nabla f(x^\ast) = 0\]

Consider the unconstrained optimisation problem $\min _ {x \in \mathbb R^n} f(x)$ with $f \in \mathcal C^1(\mathbb R^n)$. @Prove the first-order necessary condition ∆first-order-necessary-condition-statement, i.e. that

\[x^\ast \text{ a local minimiser} \implies \nabla f(x^\ast) = 0.\]

Assume for contradiction that $\nabla f(x^\ast) \ne 0$, and set $s := -\nabla f(x^\ast)$. Then

\[\nabla f(x^\ast)^\top s = -\nabla f(x^\ast)^\top \nabla f(x^\ast) = -\|\nabla f(x^\ast)\|^2 < 0.\]

By ∆negative-inner-product-implies-descent-direction, $f(x^\ast + \alpha s) < f(x^\ast)$ for all sufficiently small $\alpha > 0$. Thus every neighbourhood of $x^\ast$ contains points with strictly smaller objective value, contradicting $x^\ast$ being a local minimiser. Hence $\nabla f(x^\ast) = 0$.

Source: Lecture 1a, Proof of first-order necessary conditions.

@exam~

Consider the unconstrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x)\]

where $f \in \mathcal C^1 (\mathbb R^n)$. @State an exact characterisation for when $x^\ast$ is a stationary point of $f$.

\[x \text{ stationary point of } f \iff \nabla f(x) = 0\]

Suppose:

  • $f \in \mathcal C^1 (\mathbb R^n)$
  • $x \in \mathbb R^n$
  • $s \in \mathbb R^n \setminus \{ 0 \}$

@State a result about always being able to find a descent direction.

If $\nabla f(x)^\top s < 0$, then $f(x + \alpha s) < f(x)$ for all $\alpha > 0$ sufficiently small.

Suppose:

  • $f \in \mathcal C^1 (\mathbb R^n)$
  • $x \in \mathbb R^n$
  • $s \in \mathbb R^n \setminus \{0\}$
  • $\nabla f(x)^\top s < 0$

@Prove that then $f(x + \alpha s) < f(x)$ for all $\alpha > 0$ sufficiently small.

As $f \in \mathcal C^1 (\mathbb R^n)$, there exists some $\bar \alpha > 0$ such that

\[\nabla f(x + \alpha s)^\top s < 0, \quad \forall \alpha \in [0, \bar \alpha]\]

By Taylor’s theorem, we also have

\[f(x + \alpha s) = f(x) + \alpha \nabla f(x + \tilde \alpha s)^\top s\]

for some $\tilde \alpha \in (0, \alpha)$. Then by the above, we have

\[f(x + \alpha s) < f(x) \quad \forall \alpha \in (0, \bar \alpha]\]

When is $-\nabla f(x)$ a descent direction?

Whenever $\nabla f(x) \ne 0$.

Suppose:

  • $f \in \mathcal C^2 (\mathbb R^n)$
  • $x^\ast \in \mathbb R^n$

@State a second-order necessary condition for $x^\ast$ to be a local minimiser of $f$.

\[x^\ast \text{local minimiser} \implies \nabla ^2 f(x^\ast) \text{ positive semidefinite}\]

@exam~

Give an @example of a function that has both the first-order and second-order derivative equal to $0$ at some point $x^\ast$ yet $x^\ast$ is not a local minimiser.

\[f(x) = x^3\]

Suppose $f \in \mathcal C^2 (\mathbb R^n)$. @State second-order sufficient conditions for $x^\ast$ to be a local minimiser of $f$.

\[\nabla f(x^\ast) = 0, \nabla^2 f(x^\ast) \succ 0 \implies x^\ast \text{ strict local minimiser of }f \]

@exam~

Give an @example to show that the sufficient conditions for $x^\ast$ to be a local minimiser of $f$ given by $\nabla f(x^\ast) = 0$ and $\nabla^2 f(x^\ast) \succ 0$ are not necessary.

\[f(x) = x^4\]

Suppose:

  • $f \in \mathcal C^2 (\mathbb R^n)$
  • $x^\ast \in \mathbb R^n$

@Prove the second-order necessary conditions for $x^\ast$ to be a local minimiser of $f$, i.e. that

\[x^\ast \text{ a local minimiser of } f \implies \nabla^2 f(x^\ast) \text{ is positive semidefinite}.\]

Assume for a contradiction that there exists $s \in \mathbb R^n$ with $s^\top \nabla^2 f(x^\ast) s < 0$. Then since $s \ne 0$ and $f \in \mathcal C^2$, we have that there exists some $\hat \alpha > 0$ such that

\[s^\top \nabla^2 f(x^\ast + \alpha s)s < 0 \quad\text{for all }\alpha \in [0, \hat \alpha]\]

Let $\alpha \in [0, \hat \alpha]$. Using $\nabla f(x^\ast) = 0$ (first-order necessary), the Taylor expansion around $x = x^\ast$ gives

\[f(x^\ast + \alpha s) = f(x^\ast) + \frac{\alpha^2}{2} s^\top \nabla^2 f(x^\ast + \tilde \alpha s) s\]

for some $\tilde \alpha \in (0, \alpha)$. Since $0 < \tilde \alpha < \alpha \le \hat \alpha$, by the continuity of $\nabla^2 f$, we obtain $s^\top \nabla^2 f(x^\ast + \tilde \alpha s) s < 0$. This implies $f(x^\ast + \alpha s) < f(x^\ast)$, and this holds for all $\alpha \in (0, \hat \alpha]$. Contradiction, as $x^\ast$ is a local minimiser.

@exam~

Suppose:

  • $f \in \mathcal C^2(\mathbb R^n)$
  • $x^\ast \in \mathbb R^n$

@Prove the second order sufficient conditions that if

\[\nabla f(x^\ast) = 0, \nabla^2 f(x^\ast) \succ 0 \implies x^\ast \text{ strict local minimiser of }f \]

The continuity and positive definiteness imply there exists a neighbourhood $\mathcal N(x^\ast, \delta)$ of $x^\ast$ such that

\[\nabla^2 f(x^\ast + s) \succ 0 \text{ for all }x^\ast + s \in \mathcal N(x^\ast, \delta)\]

By the second-order Taylor expansion we have

\[f(x + \alpha s) = f(x) + \alpha s^\top \nabla f(x) + \frac{\alpha^2}{2} s^\top \nabla^2 f(x + \tilde \alpha s) s\]

Setting $x = x^\ast$ and $\alpha = 1$. Then for any $s$ with $x^\ast + s \in \mathcal N(x^\ast, \delta)$, we obtain

\[f(x^\ast + s) = f(x^\ast) + s^\top \nabla f(x^\ast) + \frac 1 2 s^\top \nabla^2 f(x^\ast + \tilde \alpha s) s\]

for some $\tilde \alpha \in (0, 1)$.

Then note that

\[ \vert \vert x^\ast + \tilde \alpha s - x^\ast \vert \vert = \tilde \alpha \vert \vert s \vert \vert \le \delta\]

since $\tilde \alpha \in (0, 1)$ and $x^\ast + s \in \mathcal N(x^\ast, \delta)$ (so that $ \vert \vert s \vert \vert \le \delta)$, thus $x^\ast + \tilde \alpha s \in \mathcal N(x^\ast, \delta)$, which ensures that $\nabla^2 f(x^\ast + \tilde \alpha s) \succ 0$.

This, the above result and $\nabla f(x^\ast) = 0$ imply $f(x^\ast + s) > f(x^\ast)$ for all $s \ne 0$ with $x^\ast + s \in \mathcal N(x^\ast, \delta)$.

@exam~

Suppose:

  • $H \in \mathbb R^{n \times n}$ is a symmetric matrix
  • $g \in \mathbb R^n$
  • $q(x) := \frac 1 2 x^\top H x + g^\top x$ is a quadratic function
  • $H$ is nonsingular

@State the unique stationary point $x^\ast$ of $q$ and sufficient conditions for it to be a minimiser, maximiser and a saddle point.

\[x^\ast = -H^{-1}g\]
  • $H$ positive definite $\implies x^\ast$ minimiser
  • $H$ negative definite $\implies x^\ast$ maximiser
  • $H$ indefinite $\implies x^\ast$ saddle point

Suppose:

  • $H \in \mathbb R^{n \times n}$ is a symmetric matrix
  • $g \in \mathbb R^n$
  • $q(x) := \frac 1 2 x^\top H x + g^\top x$ is a quadratic function
  • $H$ is singular

Under what conditions will $q$ have a solution and what type of solution will it be?

$g + Hx = 0$ must be consistent.

Bite-sized

For the quadratic $q(x) := g^\top x + \tfrac{1}{2} x^\top H x$ with $H$ symmetric, $\nabla q(x) = <span class="cloze" tabindex="0">Hx + g</span>$ and $\nabla^2 q(x) = <span class="cloze" tabindex="0">H</span>$ for all $x \in \mathbb R^n$.

Source: Lecture 2, Stationary points of quadratic functions discussion.

@bite~

Stationary points of the quadratic $q(x) := g^\top x + \tfrac{1}{2} x^\top H x$ are exactly the solutions of the linear system $Hx + g = 0$.

Source: Lecture 2, Stationary points of quadratic functions discussion.

@bite~

The second-order sufficient condition ($\nabla f(x^\ast) = 0$ and $\nabla^2 f(x^\ast) \succ 0$) implies $x^\ast$ is a strict local minimiser — strictly better than every nearby point. The second-order necessary condition only requires $\nabla^2 f(x^\ast) \succeq 0$, with no strictness guarantee.

Source: Lecture 2, Second-order sufficient conditions slide.

@bite~

The second-order necessary condition for $x^\ast$ to be a local minimiser is $\nabla^2 f(x^\ast) \succeq 0$ (PSD); the sufficient condition demands $\nabla^2 f(x^\ast) \succ 0$ (PD). Give a one-variable example showing each gap.

  • PSD necessary, not sufficient: $f(x) = x^3$ at $x^\ast = 0$. Here $f'(0) = f''(0) = 0$, so the second-order necessary conditions are satisfied vacuously, yet $x^\ast = 0$ is not a local minimiser (it is an inflection point).
  • PD sufficient, not necessary: $f(x) = x^4$ at $x^\ast = 0$. Here $f''(0) = 0$ (only PSD, not PD), yet $x^\ast = 0$ is a strict local minimiser.

The PSD↔PD gap is exactly the room left for borderline cases like $x^3$ and $x^4$ where the second derivative vanishes.

Source: Lecture 2, examples following the second-order conditions slides; cf. ∆x-cubed-second-order-necessary-counterexample and ∆x-fourth-second-order-sufficient-counterexample.

@bite~

Near any stationary point $x^\ast$ of a $\mathcal C^2$ function $f$, the second-order Taylor expansion gives the local quadratic approximation

\[f(x) \approx <span class="cloze" tabindex="0">f(x^\ast) + \tfrac{1}{2} (x - x^\ast)^\top \nabla^2 f(x^\ast) (x - x^\ast)</span>\]

(the linear term vanishes because $\nabla f(x^\ast) = 0$). This is why the quadratic case is the prototype for general unconstrained behaviour near a critical point.

Source: Lecture 2, closing remark “Near a stationary point $x^\ast$, $f$ is approximately locally quadratic.”

@bite~

Give the proof strategy for ∆descent-direction-lemma-proof (Lemma 1: if $\nabla f(x)^\top s < 0$ then $f(x + \alpha s) < f(x)$ for all sufficiently small $\alpha > 0$).

  • Use continuity of $\nabla f$: since $\nabla f(x)^\top s < 0$ and the map $\alpha \mapsto \nabla f(x + \alpha s)^\top s$ is continuous, there exists $\bar\alpha > 0$ with $\nabla f(x + \alpha s)^\top s < 0$ for all $\alpha \in [0, \bar\alpha]$.
  • Apply the mean-value form of the first-order Taylor expansion: $f(x + \alpha s) = f(x) + \alpha \nabla f(x + \tilde\alpha s)^\top s$ for some $\tilde\alpha \in (0, \alpha)$.
  • Combine: for $\alpha \in (0, \bar\alpha]$, $\tilde\alpha \in (0, \bar\alpha)$ too, so the gradient inner product is still negative — hence $f(x + \alpha s) < f(x)$.

Source: Lecture 1, Descent Directions — Lemma 1 proof.

@bite~ @proofsupport~

Give the proof strategy for ∆second-order-necessary-condition-proof (if $x^\ast$ is a local minimiser of $f \in \mathcal C^2$, then $\nabla^2 f(x^\ast) \succeq 0$).

  • Argue by contradiction: assume there exists $s \in \mathbb R^n$ with $s^\top \nabla^2 f(x^\ast) s < 0$.
  • Use continuity of $\nabla^2 f$ to extend: there exists $\hat\alpha > 0$ with $s^\top \nabla^2 f(x^\ast + \alpha s) s < 0$ for all $\alpha \in [0, \hat\alpha]$.
  • Apply the second-order Taylor expansion at $x^\ast$. The linear term $\alpha \nabla f(x^\ast)^\top s$ vanishes by the first-order necessary condition, leaving $f(x^\ast + \alpha s) = f(x^\ast) + \tfrac{\alpha^2}{2} s^\top \nabla^2 f(x^\ast + \tilde\alpha s) s$ for some $\tilde\alpha \in (0, \alpha)$.
  • Since $\tilde\alpha \in (0, \hat\alpha)$, the Hessian term is negative, so $f(x^\ast + \alpha s) < f(x^\ast)$ on $(0, \hat\alpha]$ — contradicting that $x^\ast$ is a local minimiser.

Source: Lecture 2, Proof of second-order necessary conditions.

@bite~ @proofsupport~

The reduced Taylor identity driving the proof of ∆second-order-necessary-condition-proof. Using the first-order necessary condition $\nabla f(x^\ast) = 0$, the second-order Taylor expansion at $x^\ast$ simplifies to

\[f(x^\ast + \alpha s) = f(x^\ast) + <span class="cloze" tabindex="0">\tfrac{\alpha^2}{2} s^\top \nabla^2 f(x^\ast + \tilde\alpha s) s</span>\]

for some $\tilde\alpha \in (0, \alpha)$. The whole proof hinges on extracting the sign of the Hessian-curvature term.

Source: Lecture 2, Proof of second-order necessary conditions, equation (7).

@bite~ @proofsupport~

In ∆second-order-necessary-condition-proof, where does the first-order necessary condition $\nabla f(x^\ast) = 0$ enter, and what changes if we drop it?

  • It enters in the Taylor expansion: starting from the general $f(x^\ast + \alpha s) = f(x^\ast) + \alpha \nabla f(x^\ast)^\top s + \tfrac{\alpha^2}{2} s^\top \nabla^2 f(\cdot) s$, the linear term $\alpha \nabla f(x^\ast)^\top s$ vanishes only because $\nabla f(x^\ast) = 0$. This is what isolates the quadratic curvature term and makes it the sole determinant of sign.
  • Without it: the linear term $\alpha \nabla f(x^\ast)^\top s$ could be positive or negative; the curvature alone wouldn’t control $f(x^\ast + \alpha s) - f(x^\ast)$ for small $\alpha$, since for sufficiently small $\alpha$ the linear term dominates. The conclusion (PSD Hessian) would not follow.
  • Practically: the second-order necessary condition is always paired with the first-order one — together they say “local min ⟹ stationary point with PSD Hessian”.

Source: Lecture 2, Proof of second-order necessary conditions.

@bite~ @proofsupport~

Give the proof strategy for ∆second-order-sufficient-condition-proof (if $\nabla f(x^\ast) = 0$ and $\nabla^2 f(x^\ast) \succ 0$, then $x^\ast$ is a strict local minimiser).

  • Use continuity of $\nabla^2 f$ + PD at $x^\ast$ to obtain a neighbourhood $\mathcal N(x^\ast, \delta)$ on which $\nabla^2 f \succ 0$ everywhere.
  • Apply the second-order Taylor expansion at $x^\ast$ with $\alpha = 1$. The linear term vanishes by $\nabla f(x^\ast) = 0$, giving $f(x^\ast + s) = f(x^\ast) + \tfrac{1}{2} s^\top \nabla^2 f(x^\ast + \tilde\alpha s) s$ for some $\tilde\alpha \in (0, 1)$.
  • Show that $x^\ast + \tilde\alpha s$ stays in $\mathcal N(x^\ast, \delta)$: since $\tilde\alpha \in (0, 1)$ and $\|s\| \le \delta$, we get $\|\tilde\alpha s\| < \delta$, so the Hessian there is PD.
  • Conclude $s^\top \nabla^2 f(x^\ast + \tilde\alpha s) s > 0$ for $s \ne 0$, hence $f(x^\ast + s) > f(x^\ast)$ — strict local minimiser.

Source: Lecture 2, Proof of second-order sufficient conditions.

@bite~ @proofsupport~