Continuous Optimisation HT26, Constrained optimality conditions for convex problems


Flashcards

Linear constraints

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem about the necessity of the ∆kkt-conditions when the constraints are linear.

Suppose:

  • $(c _ E, c _ I)(x) := Ax - b$

Then:

\[x^\ast \text{ local minimiser} \implies x^\ast \text{ is a KKT point}\]

Consider $\min _ {x \in \mathbb R^n} f(x)$ subject to $(c _ E, c _ I)(x) := Ax - b$ (all constraints affine), with $f \in \mathcal C^1$. @Prove ∆kkt-necessary-for-linearly-constrained-problem, i.e. that

\[x^\ast \text{ local minimiser} \implies x^\ast \text{ is a KKT point},\]

with no constraint qualification assumed.

It suffices to show that the ∆constraint-qualifications condition $T _ \Omega(x^\ast) = \mathcal F(x^\ast)$ holds automatically for affine constraints; then ∆kkt-necessity-theorem-statement (which only needs a CQ) applies and $x^\ast$ is a KKT point.

The inclusion $T _ \Omega(x^\ast) \subseteq \mathcal F(x^\ast)$ always holds, so we only show $\mathcal F(x^\ast) \subseteq T _ \Omega(x^\ast)$. Let $s \in \mathcal F(x^\ast)$ be a ∆set-of-linearised-feasible-directions, i.e.

\[A _ E s = 0, \qquad a _ i^\top s \ge 0 \ \text{ for active } i \in \mathcal A(x^\ast) \cap I,\]

where $a _ i^\top$ is row $i$ of $A$. Take the straight-line path $x(\alpha) := x^\ast + \alpha s$, $\alpha \ge 0$. Because the constraints are affine, the first-order linearisation is exact (there is no $\mathcal O(\alpha^2)$ remainder):

\[c(x(\alpha)) = A(x^\ast + \alpha s) - b = \underbrace{(Ax^\ast - b)} _ {=\, c(x^\ast)} + \alpha A s.\]

Checking that $x(\alpha)$ is feasible for small $\alpha \ge 0$:

  • Equalities ($i \in E$): $c _ i(x(\alpha)) = c _ i(x^\ast) + \alpha a _ i^\top s = 0 + 0 = 0$.
  • Active inequalities ($i \in \mathcal A(x^\ast) \cap I$): $c _ i(x(\alpha)) = 0 + \alpha a _ i^\top s \ge 0$, since $a _ i^\top s \ge 0$ and $\alpha \ge 0$.
  • Inactive inequalities ($c _ i(x^\ast) > 0$): $c _ i(x(\alpha)) > 0$ for $\alpha$ small, by continuity.

So $x(\alpha) \in \Omega$ for all small $\alpha \ge 0$, and $x'(0) = s$, giving $s \in T _ \Omega(x^\ast)$ (∆tangent-cone). Hence $\mathcal F(x^\ast) \subseteq T _ \Omega(x^\ast)$, so $T _ \Omega(x^\ast) = \mathcal F(x^\ast)$: the CQ holds with no extra hypothesis.

This is exactly the feasible-path mechanism of the equality case ∆kkt-necessity-equality-only-proof, except that affineness lets the path be a straight line whose feasibility is exact rather than only first-order — which is precisely why no constraint qualification need be assumed. Applying ∆kkt-necessity-theorem-statement now gives that $x^\ast$ is a KKT point.

Source: Lectures 10-11, Theorem 17 (linearly constrained problems); affine constraints satisfy a CQ automatically.

@exam~

Suppose we have the linearly constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) := A _ E x - b _ E = 0, \; c _ I(x) := A _ I x - b _ I \ge 0\]

with $A _ E \in \mathbb R^{m \times n}$, $b _ E \in \mathbb R^m$, $A _ I \in \mathbb R^{p \times n}$, $b _ I \in \mathbb R^p$.

@State the ∆kkt-conditions specialised to this linear case.

Let $A = (A _ E, A _ I)$ and $b = (b _ E, b _ I)$ stack the equality and inequality blocks. Then $x^\ast$ is a KKT point iff there exists $(y^\ast, \lambda^\ast) \in \mathbb R^m \times \mathbb R^p$ such that

  • Stationarity: $\nabla f(x^\ast) = A _ E^\top y^\ast + A _ I^\top \lambda^\ast$
  • Primal feasibility: $A _ E x^\ast - b _ E = 0$, $A _ I x^\ast - b _ I \ge 0$
  • Dual feasibility: $\lambda^\ast \ge 0$
  • Complementary slackness: $(\lambda^\ast)^\top (A _ I x^\ast - b _ I) = 0$

The linear constraints have constant gradients ($\nabla c _ j(x)$ = row $j$ of $A _ E$ for $j \in E$, similarly for $i \in I$), so the stationarity condition’s constraint-gradient sum collapses to the matrix expression $A _ E^\top y^\ast + A _ I^\top \lambda^\ast$. No CQ is required for necessity in the linear case — see ∆kkt-necessary-for-linearly-constrained-problem.

Convex programming problems

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Define what it means for this optimisation problem to be a convex programming problem.

  • $f$ is a convex function
  • $c _ i$ is a concave function for all $i \in I$
  • $c _ E(x) = Ax - b$ (affine equality constraints)

Caveat on generality: The truly general notion of convex programming is just “minimise a convex objective over a convex feasible set”. The lecturer’s functional definition is sufficient for this, but not necessary.

  • $c _ i$ concave is sufficient for $\{x : c _ i(x) \ge 0\}$ to be convex, but the weakest functional condition is $c _ i$ quasi-concave (i.e. super-level sets are convex), which is strictly more general. For example, $c(x) = x^3$ on $\mathbb R$ is not concave, but $\{x : x^3 \ge 0\} = \{x \ge 0\}$ is convex.
  • Affine $c _ E$ is essentially necessary for the equality set $\{c _ E(x) = 0\}$ to be convex in general (any nonlinear $c _ E$ typically gives a non-convex level set).

Why the stronger definition matters for the sufficiency theorem: the proof of ∆kkt-condition-sufficient-when-convex-proof doesn’t just invoke convexity of $\Omega$ and $f$ — it uses the functional concavity of $c _ i$ via the first-order upper bound $c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x)$, and the affineness of $c _ E$ via $A(x - \hat x) = 0$ from primal feasibility. The natural hierarchy is:

  • Variational sufficiency: “$\nabla f(\hat x)^\top (x - \hat x) \ge 0$ for all $x \in \Omega$” + convex $f$ + convex $\Omega$ ⟹ $\hat x$ is a global minimiser. This needs only the geometric facts — quasi-concave $c _ i$ and even nonlinear $c _ E$ with convex level set would do.
  • KKT sufficiency (the lecture’s theorem): KKT stationarity (with multipliers and constraint gradients) ⟹ global min. To convert “$\nabla f = A^\top \hat y + \sum _ i \hat\lambda _ i \nabla c _ i(\hat x)$” back into “$\nabla f(\hat x)^\top (x - \hat x) \ge 0$ on $\Omega$” requires precisely the gradient inequalities that concave $c _ i$ + affine $c _ E$ deliver.

So the lecturer’s stronger functional definition is exactly what bridges the KKT-multiplier form to variational sufficiency. Quasi-concave $c _ i$ makes $\Omega$ convex but breaks the gradient inequality the KKT proof relies on, so the standard proof fails even though the variational form would still go through.

@exam~

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall that this is a convex program if $f$ is a convex function, $c _ i$ is a concave function for all $i \in I$, and $c _ E(x) = Ax - b$. @State a result about what this implies about the structure of $\Omega$.

$\Omega$ is convex.

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Suppose further that this is a convex program, so that:

  • $f$ is a convex function
  • $c _ i$ is a concave function for all $i \in I$
  • $c _ E(x) = Ax - b$

@Justify that this implies $\Omega$ is convex.

Let $x, y \in \Omega$, and define $z = (1 - \alpha) x + \alpha y$ for any $\alpha \in [0, 1]$. We aim to show $z \in \Omega$ for all $\alpha \in [0, 1]$. Note that

\[\begin{aligned} c _ E(z) &= A((1-\alpha)x + \alpha y) - b \\ &= (1-\alpha)(Ax - b) + \alpha (Ay - b) \\ &= 0 \end{aligned}\]

since $Ax - b = 0$ and $Ay - b = 0$. So $z$ satisfies the equality constraints. Furthermore, for each $i \in I$,

\[\begin{aligned} c _ i(z) &= c _ i((1-\alpha)x + \alpha y) \\ &\ge (1-\alpha)c _ i(x) + \alpha c _ i(y) & (\text{convexity})\\ &\ge 0 & (x, y \in \Omega) \end{aligned}\]

Hence $z \in \Omega$.

@exam~

KKT points of convex problems are sufficient for global minimisation

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem that says the ∆kkt-conditions are sufficient for optimality in convex problems.

Suppose further that this is a convex program, so that:

  • $f$ is a convex function
  • $c _ i$ is a concave function for all $i \in I$
  • $c _ E(x) = Ax - b$
  • $\hat x$ is a KKT point (i.e. satisfies the ∆kkt-conditions)

Then:

  • $\hat x$ is a global minimiser of the constrained optimisation problem

Caveat on what the proof actually uses (full discussion in ∆convex-programming-problem): the lecturer’s functional definition of a convex program — concave $c _ i$, affine $c _ E$ — is strictly stronger than the geometric condition “convex $f$ on convex $\Omega$”, and the proof (∆kkt-condition-sufficient-when-convex-proof) genuinely uses the stronger version: concavity of $c _ i$ via the first-order upper bound $c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x)$, and affineness of $c _ E$ via $A(x - \hat x) = 0$. Replacing concave $c _ i$ with merely quasi-concave $c _ i$ (which would still give convex $\Omega$), or nonlinear $c _ E$ with the same level set, would break the gradient-based KKT-multiplier argument — even though a more general variational sufficiency theorem ($\nabla f(\hat x)^\top (x - \hat x) \ge 0$ on convex $\Omega$ + convex $f$ ⟹ global min) would still go through on the weaker geometric assumptions. The lecturer’s stronger functional definition is what translates KKT-with-multipliers into variational sufficiency.

@exam~

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Prove that if further this is a convex program, so that:

  • $f$ is a convex function
  • $c _ i$ is a concave function for all $i \in I$
  • $c _ E(x) = Ax - b$
  • $\hat x$ is a KKT point (i.e. satisfies the ∆kkt-conditions)

then:

  • $\hat x$ is a global minimiser of the constrained optimisation problem

Recall that for a convex function $f$, we have ∆convex-functions-have-first-order-lower-bound so that

\[f(x) \ge f(\hat x) + \nabla f(\hat x)^\top (x - \hat x)\]

for all $x \in \mathbb R^n$. By the KKT conditions, we also have

\[\nabla f(\hat x) = A^\top \hat y + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)\]

and thus combining these we obtain

\[\begin{aligned} f(x) &\ge f(\hat x) + (A^\top \hat y)^\top (x - \hat x) + \sum _ {i \in I} \hat \lambda _ i \left( \nabla c _ i(\hat x)^\top(x - \hat x) \right) \\ &= f(\hat x) + \hat y A (x - \hat x) + \sum _ {i \in I} \hat \lambda _ i (\nabla c _ i(\hat x)^\top (x - \hat x)) \end{aligned}\]

Let $x \in \Omega$ be arbitrary. Then $Ax = b$ and $c(x) \ge 0$. Then $Ax = b$ and $A \hat x = b$ imply together that $A(x - \hat x) = 0$.

Furthermore

\[\begin{aligned} c _ i \text{ concave} &\implies c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x) \\ &\implies \nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x) \\ &\implies \hat \lambda _ i(\nabla c _ i(\hat x)^\top(x - \hat x)) \ge \hat \lambda _ i (c _ i(x) - c _ i(\hat x)) = \hat \lambda _ i c _ i(x) \ge 0 \end{aligned}\]

since $\hat \lambda \ge 0$, $\hat \lambda _ i c _ i(x) = 0$ and $c(x) \ge 0$. Thus from (4) and (5), we have $f(x) \ge f(\hat x)$.

@exam~

Quadratic programming

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Define what it means for this optimisation problem to be a quadratic programming problem.

\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2x^\top H x \quad \text{ s.t. } \quad Ax = b, \tilde A x \ge \tilde b\]

Suppose we have the quadratic programming problem

\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2x^\top H x \quad \text{ s.t. } \quad Ax = b, \tilde A x \ge \tilde b\]

@State the KKT conditions in this special case.

$\hat x$ is a KKT point of a quadratic programming problem iff $\exists (\hat y, \hat \lambda) \in \mathbb R^m \times \mathbb R^p$ such that

  • $H\hat x + c = A^\top \hat y + \tilde A^\top \hat \lambda$
  • $A\hat x = b$, $\tilde A \hat x \ge \tilde b$
  • $\hat \lambda \ge 0$, $\hat \lambda^\top (\tilde A \hat x - \tilde b) = 0$

Duality theory for quadratic programming problems

Suppose we have the quadratic programming problem

\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2x^\top H x \quad \text{ s.t. } \quad Ax = b, \tilde A x \ge \tilde b\]

The KKT conditions in this case say that $\hat x$ is a KKT point of a quadratic programming problem iff $\exists (\hat y, \hat \lambda) \in \mathbb R^m \times \mathbb R^p$ such that

  • $H\hat x + c = A^\top \hat y + \tilde A^\top \hat \lambda$
  • $A\hat x = b$, $\tilde A \hat x \ge \tilde b$
  • $\hat \lambda \ge 0$, $\hat \lambda^\top (\tilde A \hat x - \tilde b) = 0$

Suppose further that $A := 0$ and $H \succ 0$. @State the primal and dual problem in this context, and state a result about when the optimal solution of the primal and dual problems will coincide.

Primal problem:

\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2 x^\top H x \quad\text{ s.t. }\quad \tilde A x \ge \tilde b\]

Dual problem:

\[\max _ {(x, \lambda) \in \mathbb R^n \times \mathbb R} -\frac 1 2 x^\top H x + \tilde b^\top \lambda \quad\text{ s.t. }\quad {-Hx}+\tilde A^\top \lambda = c, \lambda \ge 0\]

Provided these maxima and minima exist, then they are equal in this case.

For the (strictly convex) QP $\min _ {x \in \mathbb R^n} c^\top x + \tfrac 1 2 x^\top H x$ subject to $\tilde A x \ge \tilde b$, with $H \succ 0$ (the $A := 0$ specialisation of ∆qp-duality-statement), @Prove that the Lagrangian dual is

\[\max _ {(x, \lambda)} -\tfrac 1 2 x^\top H x + \tilde b^\top \lambda \quad \text{s.t.} \quad -Hx + \tilde A^\top \lambda = c, \ \lambda \ge 0,\]

and prove weak duality (dual optimum $\le$ primal optimum).

Lagrangian and dual function: with multiplier $\lambda \ge 0$ for $\tilde A x \ge \tilde b$,

\[\mathcal L(x, \lambda) = c^\top x + \tfrac 1 2 x^\top H x - \lambda^\top(\tilde A x - \tilde b) = (c - \tilde A^\top \lambda)^\top x + \tfrac 1 2 x^\top H x + \tilde b^\top \lambda,\]

and the dual function is $g(\lambda) := \min _ x \mathcal L(x, \lambda)$.

Derivation (minimise over $x$): $\mathcal L$ is strictly convex in $x$ (since $H \succ 0$), so its unconstrained minimiser solves

\[\nabla _ x \mathcal L = Hx + c - \tilde A^\top \lambda = 0 \quad\Longleftrightarrow\quad -Hx + \tilde A^\top \lambda = c. \quad (\dagger)\]

By $(\dagger)$, $c - \tilde A^\top \lambda = -Hx$, so the linear term is $(c - \tilde A^\top \lambda)^\top x = -x^\top H x$ and

\[g(\lambda) = -x^\top H x + \tfrac 1 2 x^\top H x + \tilde b^\top \lambda = -\tfrac 1 2 x^\top H x + \tilde b^\top \lambda.\]

The dual is $\max _ {\lambda \ge 0} g(\lambda)$; carrying $x$ as a variable tied to $\lambda$ by the stationarity equation $(\dagger)$ gives exactly the stated dual.

Weak duality: let $\lambda \ge 0$ and let $x^\ast$ be any primal-feasible point, so $\tilde A x^\ast - \tilde b \ge 0$. Then

\[g(\lambda) = \min _ x \mathcal L(x, \lambda) \le \mathcal L(x^\ast, \lambda) = f(x^\ast) - \underbrace{\lambda^\top(\tilde A x^\ast - \tilde b)} _ {\ge\, 0} \le f(x^\ast).\]

Taking $x^\ast$ to be the primal minimiser ($f(x^\ast) = p^\ast$) and maximising the left-hand side over $\lambda \ge 0$,

\[d^\ast = \max _ {\lambda \ge 0} g(\lambda) \le p^\ast,\]

i.e. every dual-feasible value is a lower bound on the primal optimum.

Strong duality ($d^\ast = p^\ast$) additionally holds here because $H \succ 0$ makes this a Slater-feasible convex QP; that equality is stated in lectures (∆qp-duality-statement), not proved.

Source: Lectures 10-11, Duality theory for QP problems (dual derivation; weak-duality lower bound $\min _ x \mathcal L(x, \lambda) \le f(x^\ast)$), the latter examined in Continuous Optimisation 2023, Q3(a)(ii).

@exam~

Second-order optimality conditions

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall that for a convex programming problem, the ∆kkt-conditions are already sufficient for global optimality (∆kkt-condition-sufficient-when-convex).

@Justify why in general (when the problem is not convex) we still need to consider second-order conditions about curvature, even when constraint qualifications hold, and @define the critical cone as the directions of curvature along which we need to check.

Setup: let $x^\ast$ be a KKT point with Lagrange multipliers $(y^\ast, \lambda^\ast) \in \mathbb R^m \times \mathbb R^p$ from the ∆kkt-conditions, so

\[\nabla f(x^\ast) = J _ E(x^\ast)^\top y^\ast + J _ I(x^\ast)^\top \lambda^\ast, \quad \lambda^\ast \ge 0, \quad \lambda _ i^\ast c _ i(x^\ast) = 0.\]

Here $y^\ast \in \mathbb R^m$ multiplies the equality constraints (sign-unrestricted) and $\lambda^\ast \in \mathbb R^p$ the inequalities (non-negative, with complementary slackness killing the inactive ones).

Let $\mathcal F(x^\ast)$ be the ∆set-of-linearised-feasible-directions at $x^\ast$:

\[\mathcal F(x^\ast) = \{s : J _ E(x^\ast)s = 0, \; s^\top \nabla c _ i(x^\ast) \ge 0 \text{ for } i \in \mathcal A(x^\ast) \cap I\}\]

i.e. directions along which equalities stay at zero and active inequalities stay non-negative, to first order. Under a CQ, $\mathcal F(x^\ast) = T _ \Omega(x^\ast)$ (the ∆tangent-cone), so these are exactly the directions a feasible perturbation can actually take.

Calculation: for any $s \in \mathcal F(x^\ast)$, substituting KKT stationarity and using complementary slackness ($\lambda _ i^\ast = 0$ for inactive $i$):

\[\begin{aligned} s^\top \nabla f(x^\ast) &= s^\top J _ E(x^\ast)^\top y^\ast + \sum _ {i \in \mathcal A(x^\ast) \cap I} \lambda _ i^\ast \, s^\top \nabla c _ i(x^\ast) \\ &= \underbrace{(J _ E(x^\ast) s)^\top y^\ast} _ {=\, 0,\; s \in \mathcal F} + \sum _ {i \in \mathcal A(x^\ast) \cap I} \underbrace{\lambda _ i^\ast} _ {\ge\, 0} \underbrace{s^\top \nabla c _ i(x^\ast)} _ {\ge\, 0} \\ &\ge 0. \end{aligned}\]

Conclusion: along any $s \in \mathcal F(x^\ast)$, one of two things happens:

  • $s^\top \nabla f(x^\ast) > 0$: $f$ strictly increases to first order along $s$, no curvature check needed.
  • $s^\top \nabla f(x^\ast) = 0$: first-order information is silent — only the Hessian of the Lagrangian determines whether $f$ increases or decreases.

The latter set is exactly the critical cone

\[F(\lambda^\ast) = \{s \in \mathcal F(x^\ast) : s^\top \nabla c _ i(x^\ast) = 0 \text{ for all } i \in \mathcal A(x^\ast) \cap I \text{ with } \lambda _ i^\ast > 0\}\]

(see ∆critical-cone): for active inequalities with $\lambda _ i^\ast > 0$, the inequality $\lambda _ i^\ast s^\top \nabla c _ i(x^\ast) \ge 0$ above must be tight (equal to zero) to allow the sum to vanish. These are precisely the directions along which curvature actually matters.

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem giving second-order necessary conditions for constrained optimisation problems.

Suppose:

  • Some ∆constraint-qualifications hold for this problem
  • $x^\ast$ is a local minimiser
  • $(y^\ast, \lambda^\ast)$ are the Lagrange multipliers of the KKT conditions at $x^\ast$

Then:

\[s^\top \nabla^2 _ {xx} \mathcal L(x^\ast, y^\ast, \lambda^\ast) s \ge 0 \quad\text{for all }s\in F(\lambda^\ast)\]

where $\mathcal L(x, y, \lambda) = f(x) - y^\top c _ E(x) - \lambda^\top c _ I(x)$ is the Lagrangian function.

@exam~

Suppose we have the constrained optimisation problem

\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem giving second-order sufficient conditions for constrained optimisation problems.

Suppose:

  • $x^\ast$ is a feasible point of the problem
  • $(y^\ast, \lambda^\ast)$ are such that the KKT conditions are satisfied by $(x^\ast, y^\ast, \lambda^\ast)$
  • $s^\top \nabla^2 _ {xx} \mathcal L(x^\ast, y^\ast, \lambda^\ast) s > 0$ for all $s \in \mathcal F(\lambda^\ast)$, $s \ne 0$.

Then:

  • $x^\ast$ is a constrained local minimiser of $f$

@exam~

@Justify, via an example, why second-order constrained optimality conditions involve $\nabla^2 _ {xx} \mathcal L$ (the Lagrangian Hessian), rather than $\nabla^2 f$.

Consider $\min _ x \, x _ 1$ subject to either:

  • (blue) $c(x) = 1 - (x _ 1^2 + x _ 2^2) = 0$, the unit circle centred at the origin.
  • (red) $c(x) = 1 - ((x _ 1 + 2)^2 + x _ 2^2) = 0$, the unit circle centred at $(-2, 0)$.

Figure: the blue unit circle (centre origin) and the red unit circle (centre $(-2, 0)$) meeting at the single point $x^\ast = (-1, 0)$; the objective $x _ 1$ decreases to the left, so $x^\ast$ is the leftmost (minimal-$x _ 1$) point of the blue circle and the rightmost (maximal-$x _ 1$) point of the red circle.

Both circles pass through $x^\ast = (-1, 0)$, and at this point both ∆kkt-conditions hold, but with different multipliers (the constraint is an equality, so the multiplier is sign-unrestricted):

  • Blue: $\nabla c(x^\ast) = (2, 0)$, so stationarity $(1, 0) = \lambda^\ast (2, 0)$ gives $\lambda^\ast = 1/2 > 0$; $x^\ast$ is the global minimiser (leftmost point of the blue circle).
  • Red: $\nabla c(x^\ast) = (-2, 0)$, so stationarity $(1, 0) = \lambda^\ast (-2, 0)$ gives $\lambda^\ast = -1/2 < 0$; $x^\ast$ is a maximiser (rightmost point of the red circle).

But $f(x) = x _ 1$ is linear, so $\nabla^2 f(x) = 0$ everywhere, and it cannot distinguish min from max. The distinction is visible in the Lagrangian Hessian:

\[\nabla^2 _ {xx} \mathcal L(x, \lambda) = \nabla^2 f(x) - \lambda \nabla^2 c(x) = 0 - \lambda \cdot (-2I) = 2\lambda I.\]

This is $2\lambda I = I \succ 0$ for the blue case ($\lambda^\ast = 1/2$), so ∆kkt-conditions-sufficient-second-order confirms $x^\ast$ is a local min; and $2\lambda I = -I \prec 0$ for the red case ($\lambda^\ast = -1/2$), so ∆kkt-conditions-sufficient-second-order fails, correctly since $x^\ast$ is a maximum.

@exam~

Suppose we have the constrained optimisation problem

\[\min f(x) \quad \text{ s.t. }\quad c _ E(x) = 0, c _ I(x) \ge 0\]

where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Suppose $x^\ast$ is a KKT point with multipliers $(y^\ast, \lambda^\ast)$.

@Define the critical cone $F(\lambda^\ast)$ at $x^\ast$, and @justify why it is the natural set of directions to consider for second-order optimality conditions.

\[F(\lambda^\ast) = \{s \in \mathcal F(x^\ast) : s^\top \nabla c _ i(x^\ast) = 0 \text{ for all } i \in \mathcal A(x^\ast) \cap I \text{ with } \lambda _ i^\ast > 0\}\]

where:

  • $\mathcal F(x^\ast)$ is the ∆set-of-linearised-feasible-directions, i.e. $\{s : J _ E(x^\ast)s = 0, \; s^\top \nabla c _ i(x^\ast) \ge 0 \text{ for } i \in \mathcal A(x^\ast) \cap I\}$.
  • $\mathcal A(x^\ast) = E \cup \{i \in I : c _ i(x^\ast) = 0\}$ is the active set, so $\mathcal A(x^\ast) \cap I = \{i \in I : c _ i(x^\ast) = 0\}$ is the active inequality indices only (the intersection strips off the equality indices $E$).

Why these directions: if $x^\ast$ is a KKT point, then for any $s \in \mathcal F(x^\ast)$, KKT stationarity gives

\[s^\top \nabla f(x^\ast) = \underbrace{(J _ E(x^\ast) s)^\top y^\ast} _ {=\,0,\; s \in \mathcal F} + \sum _ {i \in \mathcal A(x^\ast) \cap I} \underbrace{\lambda _ i^\ast} _ {\ge\,0}\, \underbrace{s^\top \nabla c _ i(x^\ast)} _ {\ge\,0} \ge 0\]

(the equality-constraint term vanishes since $J _ E s = 0$ on $\mathcal F$; inactive-inequality terms vanish by complementary slackness $\lambda _ i^\ast = 0$ for inactive $i$, leaving only $i \in \mathcal A \cap I$).

So $f$ can only stay constant or increase to first order along $s \in \mathcal F(x^\ast)$, and curvature only matters where this first-order inequality is tight. The sum vanishes precisely when each contributing term $\lambda _ i^\ast \, s^\top \nabla c _ i(x^\ast)$ does, which for indices with $\lambda _ i^\ast > 0$ forces $s^\top \nabla c _ i(x^\ast) = 0$. Indices with $\lambda _ i^\ast = 0$ contribute zero automatically and impose no constraint on $s$ — that’s why the critical cone’s defining equality is restricted to $\lambda _ i^\ast > 0$.

Worked examples

Consider

\[\min _ {x \in \mathbb R^2} x _ 1^4 - 2x _ 2^2 - x _ 2 \quad \text{s.t.} \quad x _ 1^2 + x _ 2^2 + x _ 2 \le 0.\]

Is the problem convex? Is every minimiser a KKT point? Find all KKT points, classify them, and find the global minimiser.

Convexity: $\nabla^2 f = \operatorname{diag}(12x _ 1^2, -4)$ is indefinite whenever $x _ 1 \ne 0$, so $f$ is not convex and the problem is not convex. (The feasible set is convex: rewrite the constraint as $x _ 1^2 + (x _ 2 + \tfrac12)^2 \le \tfrac14$, a closed disc.)

Every minimiser KKT? Yes: ∆scq holds, since the disc has a strict interior (e.g. $(0,-\tfrac12)$ gives $x _ 1^2 + x _ 2^2 + x _ 2 = -\tfrac14 < 0$), so every local minimiser is a KKT point.

Step 1 (set up in $\ge 0$ form): write $c(x) = -x_1^2 - x_2^2 - x_2 \ge 0$. Then $\nabla f = (4x_1^3,\ -4x_2 - 1)$ and $\nabla c = (-2x_1,\ -2x_2 - 1)$. Stationarity $\nabla f = \lambda \nabla c$ gives

\[2x_1^3 = -\lambda x_1, \qquad 4x_2 + 1 = \lambda(2x_2 + 1).\]

Watch: the constraint is quadratic, so $\nabla(x_1^2 + x_2^2 + x_2) = (2x_1,\ 2x_2 + 1)$; the multiplier multiplies the gradient, giving $\lambda(2x_2 + 1)$, not $\lambda(x_2 + 1)$.

Step 2 (case split on complementary slackness $\lambda c = 0$, $\lambda \ge 0$):

  • $\lambda = 0$: gives $x^1 = (0, -\tfrac14)$, an interior stationary point of $f$.
  • $\lambda > 0$: the constraint is active, $x _ 1^2 + x _ 2^2 + x _ 2 = 0$. The first equation $2x _ 1^3 = -\lambda x _ 1$ with $\lambda > 0$ forces $x _ 1 = 0$ (else $2x _ 1^2 = -\lambda < 0$). Then $x_2^2 + x_2 = 0$, so $x_2 \in \{0, -1\}$: $x^2 = (0,0)$ with $\lambda = 1$, and $x^3 = (0,-1)$ with $\lambda = 3$.

Step 3 (classify): $\nabla^2_{xx}\mathcal L = \nabla^2 f - \lambda \nabla^2 c = \operatorname{diag}(12x_1^2 + 2\lambda,\ -4 + 2\lambda)$.

  • $x^1 = (0,-\tfrac14)$: $\nabla^2 f = \operatorname{diag}(0,-4)$, a saddle, not a minimiser.
  • $x^3 = (0,-1)$, $\lambda = 3$: $\operatorname{diag}(6,2) \succ 0$, a local minimiser.
  • $x^2 = (0,0)$, $\lambda = 1$: $\operatorname{diag}(2,-2)$ is indefinite, but on the critical cone $\nabla c(0,0)^\top s = 0$, i.e. $s_2 = 0$, we get $s^\top \nabla^2_{xx}\mathcal L\, s = 2s_1^2 > 0$. So $x^2$ is also a local minimiser (∆kkt-conditions-sufficient-second-order).

Step 4 (global minimiser): $f$ is continuous and the feasible set is compact (the closed disc — closed and bounded in $\mathbb R^2$), so by Weierstrass (the extreme value theorem) $f$ attains its minimum on it: a global minimiser exists. This existence is what makes the value comparison conclusive — by ∆scq every local (hence the global) minimiser is a KKT point, so the global minimiser must be one of the candidates and we may just compare objective values. Comparing $f(x^2) = 0$ and $f(x^3) = -1$, so $x^3 = (0,-1)$ is the global minimiser. (Without compactness the smallest value among the KKT points need not be the global infimum — it might fail to be attained.)

Source: Continuous Optimisation 2017, Q2(a); method per Lectures 10-11 (SCQ, KKT, second-order constrained conditions).

@example~ @exam~

Bite-sized

A constrained optimisation problem with constraints $c _ E(x) = 0$, $c _ I(x) \ge 0$ is convex in the lecturer’s sense if $f$ is convex, each $c _ i$ for $i \in I$ is concave (so the inequality region $\{c _ i \ge 0\}$ is convex), and $c _ E$ is affine.

Source: Lecture 11, Convex constrained problems slide.

@bite~

For a convex programming problem, the KKT conditions are not just necessary (under a CQ) but also sufficient for global optimality: any KKT point is automatically a global minimiser.

Source: Lecture 11, Theorem 18.

@bite~

A quadratic programming (QP) problem is convex iff the Hessian $H$ in $c^\top x + \tfrac{1}{2} x^\top H x$ is positive semidefinite ($H \succeq 0$).

Source: Lecture 11, Optimality conditions for QP problems slide.

@bite~

At a KKT point $x^\ast$ of a (potentially nonconvex) constrained problem, the second-order conditions are checked along the critical cone $F(\lambda^\ast)$ rather than all of $\mathbb R^n$, because along feasible directions outside this cone the first-order term $s^\top \nabla f(x^\ast)$ is already strictly positive and curvature is irrelevant.

Source: Lecture 11, Second-order optimality conditions discussion; cf. ∆critical-cone.

@bite~

Give the proof strategy for ∆convex-program-feasible-set-convex-proof (a convex programming problem has a convex feasible set $\Omega$).

  • Pick $x, y \in \Omega$ and $\alpha \in [0, 1]$, set $z = (1 - \alpha)x + \alpha y$.
  • Equality constraints: $c _ E$ is affine ($c _ E(x) = Ax - b$), so $c _ E(z) = (1 - \alpha)c _ E(x) + \alpha c _ E(y) = 0$. Affineness preserves equality by linear combination of zeros.
  • Inequality constraints: each $c _ i$ is concave, so $c _ i(z) \ge (1 - \alpha)c _ i(x) + \alpha c _ i(y) \ge 0$ since both $c _ i(x) \ge 0$ and $c _ i(y) \ge 0$. Concavity preserves inequalities by Jensen.
  • Hence $z \in \Omega$.

Source: Lecture 11, Proof that $\Omega$ is convex for convex programs.

@bite~ @proofsupport~

Give the proof strategy for ∆kkt-condition-sufficient-when-convex-proof (Theorem 18: for a convex programming problem, KKT points are global minimisers).

  • Use the first-order convexity inequality $f(x) \ge f(\hat x) + \nabla f(\hat x)^\top (x - \hat x)$ from ∆convexity-first-order.
  • Substitute the KKT stationarity condition $\nabla f(\hat x) = A^\top \hat y + \sum _ {i \in I} \hat\lambda _ i \nabla c _ i(\hat x)$.
  • Equality part: for $x \in \Omega$, $Ax = b = A\hat x$ gives $A(x - \hat x) = 0$, so the equality-multiplier term vanishes.
  • Inequality part: for each $i \in I$, concavity of $c _ i$ gives $\nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x)$. Multiplying by $\hat\lambda _ i \ge 0$ and using complementary slackness $\hat\lambda _ i c _ i(\hat x) = 0$, $\hat\lambda _ i (\nabla c _ i(\hat x)^\top (x - \hat x)) \ge \hat\lambda _ i c _ i(x) \ge 0$ (since $c _ i(x) \ge 0$).
  • Combine: $f(x) \ge f(\hat x)$ for every $x \in \Omega$, so $\hat x$ is a global minimiser.

Source: Lecture 11, Proof of Theorem 18.

@bite~ @proofsupport~

The concavity-driven inequality chain in ∆kkt-condition-sufficient-when-convex-proof (Theorem 18). For each $i \in I$, concavity of $c _ i$ gives $\nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x)$. Multiplying by $\hat \lambda _ i \ge 0$ and using complementary slackness $\hat\lambda _ i c _ i(\hat x) = 0$:

\[\hat\lambda _ i \bigl(\nabla c _ i(\hat x)^\top (x - \hat x)\bigr) \ge \hat\lambda _ i c _ i(x) \ge <span class="cloze" tabindex="0">0</span>\]

(the final $\ge 0$ uses $\hat\lambda _ i \ge 0$ and $c _ i(x) \ge 0$ from feasibility). Summing over $i$ closes the proof.

Source: Lecture 11, Proof of Theorem 18, inequality $(4)$ and discussion.

@bite~ @proofsupport~

In ∆kkt-condition-sufficient-when-convex-proof (Theorem 18), three KKT/structural ingredients all combine to make the inequality $\hat\lambda _ i (\nabla c _ i(\hat x)^\top (x - \hat x)) \ge 0$ work. Identify each and what it contributes.

  • Concavity of $c _ i$ gives the first-order upper bound $c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x)$, equivalently $\nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x)$. Without concavity we have no such bound.
  • Dual feasibility $\hat\lambda _ i \ge 0$ preserves the inequality direction when multiplying through by $\hat\lambda _ i$. With $\hat\lambda _ i < 0$ the inequality would flip and the argument fails.
  • Complementary slackness $\hat\lambda _ i c _ i(\hat x) = 0$ kills the $c _ i(\hat x)$ term so the right-hand side reduces to $\hat\lambda _ i c _ i(x)$, which is $\ge 0$ by primal feasibility $c _ i(x) \ge 0$ combined with $\hat\lambda _ i \ge 0$.

Each of the three KKT pieces (stationarity is used separately to set up the bound) plays a load-bearing role; remove any one and the proof breaks.

Source: Lecture 11, Proof of Theorem 18.

@bite~ @proofsupport~