Continuous Optimisation HT26, Constrained optimality conditions for convex problems
Flashcards
Linear constraints
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem about the necessity of the ∆kkt-conditions when the constraints are linear.
Suppose:
- $(c _ E, c _ I)(x) := Ax - b$
Then:
\[x^\ast \text{ local minimiser} \implies x^\ast \text{ is a KKT point}\]Consider $\min _ {x \in \mathbb R^n} f(x)$ subject to $(c _ E, c _ I)(x) := Ax - b$ (all constraints affine), with $f \in \mathcal C^1$. @Prove ∆kkt-necessary-for-linearly-constrained-problem, i.e. that
\[x^\ast \text{ local minimiser} \implies x^\ast \text{ is a KKT point},\]
with no constraint qualification assumed.
It suffices to show that the ∆constraint-qualifications condition $T _ \Omega(x^\ast) = \mathcal F(x^\ast)$ holds automatically for affine constraints; then ∆kkt-necessity-theorem-statement (which only needs a CQ) applies and $x^\ast$ is a KKT point.
The inclusion $T _ \Omega(x^\ast) \subseteq \mathcal F(x^\ast)$ always holds, so we only show $\mathcal F(x^\ast) \subseteq T _ \Omega(x^\ast)$. Let $s \in \mathcal F(x^\ast)$ be a ∆set-of-linearised-feasible-directions, i.e.
\[A _ E s = 0, \qquad a _ i^\top s \ge 0 \ \text{ for active } i \in \mathcal A(x^\ast) \cap I,\]where $a _ i^\top$ is row $i$ of $A$. Take the straight-line path $x(\alpha) := x^\ast + \alpha s$, $\alpha \ge 0$. Because the constraints are affine, the first-order linearisation is exact (there is no $\mathcal O(\alpha^2)$ remainder):
\[c(x(\alpha)) = A(x^\ast + \alpha s) - b = \underbrace{(Ax^\ast - b)} _ {=\, c(x^\ast)} + \alpha A s.\]Checking that $x(\alpha)$ is feasible for small $\alpha \ge 0$:
- Equalities ($i \in E$): $c _ i(x(\alpha)) = c _ i(x^\ast) + \alpha a _ i^\top s = 0 + 0 = 0$.
- Active inequalities ($i \in \mathcal A(x^\ast) \cap I$): $c _ i(x(\alpha)) = 0 + \alpha a _ i^\top s \ge 0$, since $a _ i^\top s \ge 0$ and $\alpha \ge 0$.
- Inactive inequalities ($c _ i(x^\ast) > 0$): $c _ i(x(\alpha)) > 0$ for $\alpha$ small, by continuity.
So $x(\alpha) \in \Omega$ for all small $\alpha \ge 0$, and $x'(0) = s$, giving $s \in T _ \Omega(x^\ast)$ (∆tangent-cone). Hence $\mathcal F(x^\ast) \subseteq T _ \Omega(x^\ast)$, so $T _ \Omega(x^\ast) = \mathcal F(x^\ast)$: the CQ holds with no extra hypothesis.
This is exactly the feasible-path mechanism of the equality case ∆kkt-necessity-equality-only-proof, except that affineness lets the path be a straight line whose feasibility is exact rather than only first-order — which is precisely why no constraint qualification need be assumed. Applying ∆kkt-necessity-theorem-statement now gives that $x^\ast$ is a KKT point.
Suppose we have the linearly constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) := A _ E x - b _ E = 0, \; c _ I(x) := A _ I x - b _ I \ge 0\]
with $A _ E \in \mathbb R^{m \times n}$, $b _ E \in \mathbb R^m$, $A _ I \in \mathbb R^{p \times n}$, $b _ I \in \mathbb R^p$.
@State the ∆kkt-conditions specialised to this linear case.
Let $A = (A _ E, A _ I)$ and $b = (b _ E, b _ I)$ stack the equality and inequality blocks. Then $x^\ast$ is a KKT point iff there exists $(y^\ast, \lambda^\ast) \in \mathbb R^m \times \mathbb R^p$ such that
- Stationarity: $\nabla f(x^\ast) = A _ E^\top y^\ast + A _ I^\top \lambda^\ast$
- Primal feasibility: $A _ E x^\ast - b _ E = 0$, $A _ I x^\ast - b _ I \ge 0$
- Dual feasibility: $\lambda^\ast \ge 0$
- Complementary slackness: $(\lambda^\ast)^\top (A _ I x^\ast - b _ I) = 0$
The linear constraints have constant gradients ($\nabla c _ j(x)$ = row $j$ of $A _ E$ for $j \in E$, similarly for $i \in I$), so the stationarity condition’s constraint-gradient sum collapses to the matrix expression $A _ E^\top y^\ast + A _ I^\top \lambda^\ast$. No CQ is required for necessity in the linear case — see ∆kkt-necessary-for-linearly-constrained-problem.
Convex programming problems
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Define what it means for this optimisation problem to be a convex programming problem.
- $f$ is a convex function
- $c _ i$ is a concave function for all $i \in I$
- $c _ E(x) = Ax - b$ (affine equality constraints)
Caveat on generality: The truly general notion of convex programming is just “minimise a convex objective over a convex feasible set”. The lecturer’s functional definition is sufficient for this, but not necessary.
- $c _ i$ concave is sufficient for $\{x : c _ i(x) \ge 0\}$ to be convex, but the weakest functional condition is $c _ i$ quasi-concave (i.e. super-level sets are convex), which is strictly more general. For example, $c(x) = x^3$ on $\mathbb R$ is not concave, but $\{x : x^3 \ge 0\} = \{x \ge 0\}$ is convex.
- Affine $c _ E$ is essentially necessary for the equality set $\{c _ E(x) = 0\}$ to be convex in general (any nonlinear $c _ E$ typically gives a non-convex level set).
Why the stronger definition matters for the sufficiency theorem: the proof of ∆kkt-condition-sufficient-when-convex-proof doesn’t just invoke convexity of $\Omega$ and $f$ — it uses the functional concavity of $c _ i$ via the first-order upper bound $c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x)$, and the affineness of $c _ E$ via $A(x - \hat x) = 0$ from primal feasibility. The natural hierarchy is:
- Variational sufficiency: “$\nabla f(\hat x)^\top (x - \hat x) \ge 0$ for all $x \in \Omega$” + convex $f$ + convex $\Omega$ ⟹ $\hat x$ is a global minimiser. This needs only the geometric facts — quasi-concave $c _ i$ and even nonlinear $c _ E$ with convex level set would do.
- KKT sufficiency (the lecture’s theorem): KKT stationarity (with multipliers and constraint gradients) ⟹ global min. To convert “$\nabla f = A^\top \hat y + \sum _ i \hat\lambda _ i \nabla c _ i(\hat x)$” back into “$\nabla f(\hat x)^\top (x - \hat x) \ge 0$ on $\Omega$” requires precisely the gradient inequalities that concave $c _ i$ + affine $c _ E$ deliver.
So the lecturer’s stronger functional definition is exactly what bridges the KKT-multiplier form to variational sufficiency. Quasi-concave $c _ i$ makes $\Omega$ convex but breaks the gradient inequality the KKT proof relies on, so the standard proof fails even though the variational form would still go through.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall that this is a convex program if $f$ is a convex function, $c _ i$ is a concave function for all $i \in I$, and $c _ E(x) = Ax - b$. @State a result about what this implies about the structure of $\Omega$.
$\Omega$ is convex.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Suppose further that this is a convex program, so that:
- $f$ is a convex function
- $c _ i$ is a concave function for all $i \in I$
- $c _ E(x) = Ax - b$
@Justify that this implies $\Omega$ is convex.
Let $x, y \in \Omega$, and define $z = (1 - \alpha) x + \alpha y$ for any $\alpha \in [0, 1]$. We aim to show $z \in \Omega$ for all $\alpha \in [0, 1]$. Note that
\[\begin{aligned} c _ E(z) &= A((1-\alpha)x + \alpha y) - b \\ &= (1-\alpha)(Ax - b) + \alpha (Ay - b) \\ &= 0 \end{aligned}\]since $Ax - b = 0$ and $Ay - b = 0$. So $z$ satisfies the equality constraints. Furthermore, for each $i \in I$,
\[\begin{aligned} c _ i(z) &= c _ i((1-\alpha)x + \alpha y) \\ &\ge (1-\alpha)c _ i(x) + \alpha c _ i(y) & (\text{convexity})\\ &\ge 0 & (x, y \in \Omega) \end{aligned}\]Hence $z \in \Omega$.
KKT points of convex problems are sufficient for global minimisation
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem that says the ∆kkt-conditions are sufficient for optimality in convex problems.
Suppose further that this is a convex program, so that:
- $f$ is a convex function
- $c _ i$ is a concave function for all $i \in I$
- $c _ E(x) = Ax - b$
- $\hat x$ is a KKT point (i.e. satisfies the ∆kkt-conditions)
Then:
- $\hat x$ is a global minimiser of the constrained optimisation problem
Caveat on what the proof actually uses (full discussion in ∆convex-programming-problem): the lecturer’s functional definition of a convex program — concave $c _ i$, affine $c _ E$ — is strictly stronger than the geometric condition “convex $f$ on convex $\Omega$”, and the proof (∆kkt-condition-sufficient-when-convex-proof) genuinely uses the stronger version: concavity of $c _ i$ via the first-order upper bound $c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x)$, and affineness of $c _ E$ via $A(x - \hat x) = 0$. Replacing concave $c _ i$ with merely quasi-concave $c _ i$ (which would still give convex $\Omega$), or nonlinear $c _ E$ with the same level set, would break the gradient-based KKT-multiplier argument — even though a more general variational sufficiency theorem ($\nabla f(\hat x)^\top (x - \hat x) \ge 0$ on convex $\Omega$ + convex $f$ ⟹ global min) would still go through on the weaker geometric assumptions. The lecturer’s stronger functional definition is what translates KKT-with-multipliers into variational sufficiency.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Prove that if further this is a convex program, so that:
- $f$ is a convex function
- $c _ i$ is a concave function for all $i \in I$
- $c _ E(x) = Ax - b$
- $\hat x$ is a KKT point (i.e. satisfies the ∆kkt-conditions)
then:
- $\hat x$ is a global minimiser of the constrained optimisation problem
Recall that for a convex function $f$, we have ∆convex-functions-have-first-order-lower-bound so that
\[f(x) \ge f(\hat x) + \nabla f(\hat x)^\top (x - \hat x)\]for all $x \in \mathbb R^n$. By the KKT conditions, we also have
\[\nabla f(\hat x) = A^\top \hat y + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)\]and thus combining these we obtain
\[\begin{aligned} f(x) &\ge f(\hat x) + (A^\top \hat y)^\top (x - \hat x) + \sum _ {i \in I} \hat \lambda _ i \left( \nabla c _ i(\hat x)^\top(x - \hat x) \right) \\ &= f(\hat x) + \hat y A (x - \hat x) + \sum _ {i \in I} \hat \lambda _ i (\nabla c _ i(\hat x)^\top (x - \hat x)) \end{aligned}\]Let $x \in \Omega$ be arbitrary. Then $Ax = b$ and $c(x) \ge 0$. Then $Ax = b$ and $A \hat x = b$ imply together that $A(x - \hat x) = 0$.
Furthermore
\[\begin{aligned} c _ i \text{ concave} &\implies c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x) \\ &\implies \nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x) \\ &\implies \hat \lambda _ i(\nabla c _ i(\hat x)^\top(x - \hat x)) \ge \hat \lambda _ i (c _ i(x) - c _ i(\hat x)) = \hat \lambda _ i c _ i(x) \ge 0 \end{aligned}\]since $\hat \lambda \ge 0$, $\hat \lambda _ i c _ i(x) = 0$ and $c(x) \ge 0$. Thus from (4) and (5), we have $f(x) \ge f(\hat x)$.
Quadratic programming
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Define what it means for this optimisation problem to be a quadratic programming problem.
Suppose we have the quadratic programming problem
\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2x^\top H x \quad \text{ s.t. } \quad
Ax = b, \tilde A x \ge \tilde b\]
@State the KKT conditions in this special case.
$\hat x$ is a KKT point of a quadratic programming problem iff $\exists (\hat y, \hat \lambda) \in \mathbb R^m \times \mathbb R^p$ such that
- $H\hat x + c = A^\top \hat y + \tilde A^\top \hat \lambda$
- $A\hat x = b$, $\tilde A \hat x \ge \tilde b$
- $\hat \lambda \ge 0$, $\hat \lambda^\top (\tilde A \hat x - \tilde b) = 0$
Duality theory for quadratic programming problems
Suppose we have the quadratic programming problem
\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2x^\top H x \quad \text{ s.t. } \quad
Ax = b, \tilde A x \ge \tilde b\]
The KKT conditions in this case say that $\hat x$ is a KKT point of a quadratic programming problem iff $\exists (\hat y, \hat \lambda) \in \mathbb R^m \times \mathbb R^p$ such that
- $H\hat x + c = A^\top \hat y + \tilde A^\top \hat \lambda$
- $A\hat x = b$, $\tilde A \hat x \ge \tilde b$
- $\hat \lambda \ge 0$, $\hat \lambda^\top (\tilde A \hat x - \tilde b) = 0$
Suppose further that $A := 0$ and $H \succ 0$. @State the primal and dual problem in this context, and state a result about when the optimal solution of the primal and dual problems will coincide.
Primal problem:
\[\min _ {x \in \mathbb R^n} c^\top x + \frac 1 2 x^\top H x \quad\text{ s.t. }\quad \tilde A x \ge \tilde b\]Dual problem:
\[\max _ {(x, \lambda) \in \mathbb R^n \times \mathbb R} -\frac 1 2 x^\top H x + \tilde b^\top \lambda \quad\text{ s.t. }\quad {-Hx}+\tilde A^\top \lambda = c, \lambda \ge 0\]Provided these maxima and minima exist, then they are equal in this case.
For the (strictly convex) QP $\min _ {x \in \mathbb R^n} c^\top x + \tfrac 1 2 x^\top H x$ subject to $\tilde A x \ge \tilde b$, with $H \succ 0$ (the $A := 0$ specialisation of ∆qp-duality-statement), @Prove that the Lagrangian dual is
\[\max _ {(x, \lambda)} -\tfrac 1 2 x^\top H x + \tilde b^\top \lambda \quad \text{s.t.} \quad -Hx + \tilde A^\top \lambda = c, \ \lambda \ge 0,\]
and prove weak duality (dual optimum $\le$ primal optimum).
Lagrangian and dual function: with multiplier $\lambda \ge 0$ for $\tilde A x \ge \tilde b$,
\[\mathcal L(x, \lambda) = c^\top x + \tfrac 1 2 x^\top H x - \lambda^\top(\tilde A x - \tilde b) = (c - \tilde A^\top \lambda)^\top x + \tfrac 1 2 x^\top H x + \tilde b^\top \lambda,\]and the dual function is $g(\lambda) := \min _ x \mathcal L(x, \lambda)$.
Derivation (minimise over $x$): $\mathcal L$ is strictly convex in $x$ (since $H \succ 0$), so its unconstrained minimiser solves
\[\nabla _ x \mathcal L = Hx + c - \tilde A^\top \lambda = 0 \quad\Longleftrightarrow\quad -Hx + \tilde A^\top \lambda = c. \quad (\dagger)\]By $(\dagger)$, $c - \tilde A^\top \lambda = -Hx$, so the linear term is $(c - \tilde A^\top \lambda)^\top x = -x^\top H x$ and
\[g(\lambda) = -x^\top H x + \tfrac 1 2 x^\top H x + \tilde b^\top \lambda = -\tfrac 1 2 x^\top H x + \tilde b^\top \lambda.\]The dual is $\max _ {\lambda \ge 0} g(\lambda)$; carrying $x$ as a variable tied to $\lambda$ by the stationarity equation $(\dagger)$ gives exactly the stated dual.
Weak duality: let $\lambda \ge 0$ and let $x^\ast$ be any primal-feasible point, so $\tilde A x^\ast - \tilde b \ge 0$. Then
\[g(\lambda) = \min _ x \mathcal L(x, \lambda) \le \mathcal L(x^\ast, \lambda) = f(x^\ast) - \underbrace{\lambda^\top(\tilde A x^\ast - \tilde b)} _ {\ge\, 0} \le f(x^\ast).\]Taking $x^\ast$ to be the primal minimiser ($f(x^\ast) = p^\ast$) and maximising the left-hand side over $\lambda \ge 0$,
\[d^\ast = \max _ {\lambda \ge 0} g(\lambda) \le p^\ast,\]i.e. every dual-feasible value is a lower bound on the primal optimum.
Strong duality ($d^\ast = p^\ast$) additionally holds here because $H \succ 0$ makes this a Slater-feasible convex QP; that equality is stated in lectures (∆qp-duality-statement), not proved.
Second-order optimality conditions
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall that for a convex programming problem, the ∆kkt-conditions are already sufficient for global optimality (∆kkt-condition-sufficient-when-convex).
@Justify why in general (when the problem is not convex) we still need to consider second-order conditions about curvature, even when constraint qualifications hold, and @define the critical cone as the directions of curvature along which we need to check.
Setup: let $x^\ast$ be a KKT point with Lagrange multipliers $(y^\ast, \lambda^\ast) \in \mathbb R^m \times \mathbb R^p$ from the ∆kkt-conditions, so
\[\nabla f(x^\ast) = J _ E(x^\ast)^\top y^\ast + J _ I(x^\ast)^\top \lambda^\ast, \quad \lambda^\ast \ge 0, \quad \lambda _ i^\ast c _ i(x^\ast) = 0.\]Here $y^\ast \in \mathbb R^m$ multiplies the equality constraints (sign-unrestricted) and $\lambda^\ast \in \mathbb R^p$ the inequalities (non-negative, with complementary slackness killing the inactive ones).
Let $\mathcal F(x^\ast)$ be the ∆set-of-linearised-feasible-directions at $x^\ast$:
\[\mathcal F(x^\ast) = \{s : J _ E(x^\ast)s = 0, \; s^\top \nabla c _ i(x^\ast) \ge 0 \text{ for } i \in \mathcal A(x^\ast) \cap I\}\]i.e. directions along which equalities stay at zero and active inequalities stay non-negative, to first order. Under a CQ, $\mathcal F(x^\ast) = T _ \Omega(x^\ast)$ (the ∆tangent-cone), so these are exactly the directions a feasible perturbation can actually take.
Calculation: for any $s \in \mathcal F(x^\ast)$, substituting KKT stationarity and using complementary slackness ($\lambda _ i^\ast = 0$ for inactive $i$):
\[\begin{aligned} s^\top \nabla f(x^\ast) &= s^\top J _ E(x^\ast)^\top y^\ast + \sum _ {i \in \mathcal A(x^\ast) \cap I} \lambda _ i^\ast \, s^\top \nabla c _ i(x^\ast) \\ &= \underbrace{(J _ E(x^\ast) s)^\top y^\ast} _ {=\, 0,\; s \in \mathcal F} + \sum _ {i \in \mathcal A(x^\ast) \cap I} \underbrace{\lambda _ i^\ast} _ {\ge\, 0} \underbrace{s^\top \nabla c _ i(x^\ast)} _ {\ge\, 0} \\ &\ge 0. \end{aligned}\]Conclusion: along any $s \in \mathcal F(x^\ast)$, one of two things happens:
- $s^\top \nabla f(x^\ast) > 0$: $f$ strictly increases to first order along $s$, no curvature check needed.
- $s^\top \nabla f(x^\ast) = 0$: first-order information is silent — only the Hessian of the Lagrangian determines whether $f$ increases or decreases.
The latter set is exactly the critical cone
\[F(\lambda^\ast) = \{s \in \mathcal F(x^\ast) : s^\top \nabla c _ i(x^\ast) = 0 \text{ for all } i \in \mathcal A(x^\ast) \cap I \text{ with } \lambda _ i^\ast > 0\}\](see ∆critical-cone): for active inequalities with $\lambda _ i^\ast > 0$, the inequality $\lambda _ i^\ast s^\top \nabla c _ i(x^\ast) \ge 0$ above must be tight (equal to zero) to allow the sum to vanish. These are precisely the directions along which curvature actually matters.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem giving second-order necessary conditions for constrained optimisation problems.
Suppose:
- Some ∆constraint-qualifications hold for this problem
- $x^\ast$ is a local minimiser
- $(y^\ast, \lambda^\ast)$ are the Lagrange multipliers of the KKT conditions at $x^\ast$
Then:
\[s^\top \nabla^2 _ {xx} \mathcal L(x^\ast, y^\ast, \lambda^\ast) s \ge 0 \quad\text{for all }s\in F(\lambda^\ast)\]where $\mathcal L(x, y, \lambda) = f(x) - y^\top c _ E(x) - \lambda^\top c _ I(x)$ is the Lagrangian function.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @State a theorem giving second-order sufficient conditions for constrained optimisation problems.
Suppose:
- $x^\ast$ is a feasible point of the problem
- $(y^\ast, \lambda^\ast)$ are such that the KKT conditions are satisfied by $(x^\ast, y^\ast, \lambda^\ast)$
- $s^\top \nabla^2 _ {xx} \mathcal L(x^\ast, y^\ast, \lambda^\ast) s > 0$ for all $s \in \mathcal F(\lambda^\ast)$, $s \ne 0$.
Then:
- $x^\ast$ is a constrained local minimiser of $f$
@Justify, via an example, why second-order constrained optimality conditions involve $\nabla^2 _ {xx} \mathcal L$ (the Lagrangian Hessian), rather than $\nabla^2 f$.
Consider $\min _ x \, x _ 1$ subject to either:
- (blue) $c(x) = 1 - (x _ 1^2 + x _ 2^2) = 0$, the unit circle centred at the origin.
- (red) $c(x) = 1 - ((x _ 1 + 2)^2 + x _ 2^2) = 0$, the unit circle centred at $(-2, 0)$.
Figure: the blue unit circle (centre origin) and the red unit circle (centre $(-2, 0)$) meeting at the single point $x^\ast = (-1, 0)$; the objective $x _ 1$ decreases to the left, so $x^\ast$ is the leftmost (minimal-$x _ 1$) point of the blue circle and the rightmost (maximal-$x _ 1$) point of the red circle.
Both circles pass through $x^\ast = (-1, 0)$, and at this point both ∆kkt-conditions hold, but with different multipliers (the constraint is an equality, so the multiplier is sign-unrestricted):
- Blue: $\nabla c(x^\ast) = (2, 0)$, so stationarity $(1, 0) = \lambda^\ast (2, 0)$ gives $\lambda^\ast = 1/2 > 0$; $x^\ast$ is the global minimiser (leftmost point of the blue circle).
- Red: $\nabla c(x^\ast) = (-2, 0)$, so stationarity $(1, 0) = \lambda^\ast (-2, 0)$ gives $\lambda^\ast = -1/2 < 0$; $x^\ast$ is a maximiser (rightmost point of the red circle).
But $f(x) = x _ 1$ is linear, so $\nabla^2 f(x) = 0$ everywhere, and it cannot distinguish min from max. The distinction is visible in the Lagrangian Hessian:
\[\nabla^2 _ {xx} \mathcal L(x, \lambda) = \nabla^2 f(x) - \lambda \nabla^2 c(x) = 0 - \lambda \cdot (-2I) = 2\lambda I.\]This is $2\lambda I = I \succ 0$ for the blue case ($\lambda^\ast = 1/2$), so ∆kkt-conditions-sufficient-second-order confirms $x^\ast$ is a local min; and $2\lambda I = -I \prec 0$ for the red case ($\lambda^\ast = -1/2$), so ∆kkt-conditions-sufficient-second-order fails, correctly since $x^\ast$ is a maximum.
Suppose we have the constrained optimisation problem
\[\min f(x) \quad \text{ s.t. }\quad c _ E(x) = 0, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Suppose $x^\ast$ is a KKT point with multipliers $(y^\ast, \lambda^\ast)$.
@Define the critical cone $F(\lambda^\ast)$ at $x^\ast$, and @justify why it is the natural set of directions to consider for second-order optimality conditions.
where:
- $\mathcal F(x^\ast)$ is the ∆set-of-linearised-feasible-directions, i.e. $\{s : J _ E(x^\ast)s = 0, \; s^\top \nabla c _ i(x^\ast) \ge 0 \text{ for } i \in \mathcal A(x^\ast) \cap I\}$.
- $\mathcal A(x^\ast) = E \cup \{i \in I : c _ i(x^\ast) = 0\}$ is the active set, so $\mathcal A(x^\ast) \cap I = \{i \in I : c _ i(x^\ast) = 0\}$ is the active inequality indices only (the intersection strips off the equality indices $E$).
Why these directions: if $x^\ast$ is a KKT point, then for any $s \in \mathcal F(x^\ast)$, KKT stationarity gives
\[s^\top \nabla f(x^\ast) = \underbrace{(J _ E(x^\ast) s)^\top y^\ast} _ {=\,0,\; s \in \mathcal F} + \sum _ {i \in \mathcal A(x^\ast) \cap I} \underbrace{\lambda _ i^\ast} _ {\ge\,0}\, \underbrace{s^\top \nabla c _ i(x^\ast)} _ {\ge\,0} \ge 0\](the equality-constraint term vanishes since $J _ E s = 0$ on $\mathcal F$; inactive-inequality terms vanish by complementary slackness $\lambda _ i^\ast = 0$ for inactive $i$, leaving only $i \in \mathcal A \cap I$).
So $f$ can only stay constant or increase to first order along $s \in \mathcal F(x^\ast)$, and curvature only matters where this first-order inequality is tight. The sum vanishes precisely when each contributing term $\lambda _ i^\ast \, s^\top \nabla c _ i(x^\ast)$ does, which for indices with $\lambda _ i^\ast > 0$ forces $s^\top \nabla c _ i(x^\ast) = 0$. Indices with $\lambda _ i^\ast = 0$ contribute zero automatically and impose no constraint on $s$ — that’s why the critical cone’s defining equality is restricted to $\lambda _ i^\ast > 0$.
Worked examples
Consider
\[\min _ {x \in \mathbb R^2} x _ 1^4 - 2x _ 2^2 - x _ 2 \quad \text{s.t.} \quad x _ 1^2 + x _ 2^2 + x _ 2 \le 0.\]
Is the problem convex? Is every minimiser a KKT point? Find all KKT points, classify them, and find the global minimiser.
Convexity: $\nabla^2 f = \operatorname{diag}(12x _ 1^2, -4)$ is indefinite whenever $x _ 1 \ne 0$, so $f$ is not convex and the problem is not convex. (The feasible set is convex: rewrite the constraint as $x _ 1^2 + (x _ 2 + \tfrac12)^2 \le \tfrac14$, a closed disc.)
Every minimiser KKT? Yes: ∆scq holds, since the disc has a strict interior (e.g. $(0,-\tfrac12)$ gives $x _ 1^2 + x _ 2^2 + x _ 2 = -\tfrac14 < 0$), so every local minimiser is a KKT point.
Step 1 (set up in $\ge 0$ form): write $c(x) = -x_1^2 - x_2^2 - x_2 \ge 0$. Then $\nabla f = (4x_1^3,\ -4x_2 - 1)$ and $\nabla c = (-2x_1,\ -2x_2 - 1)$. Stationarity $\nabla f = \lambda \nabla c$ gives
\[2x_1^3 = -\lambda x_1, \qquad 4x_2 + 1 = \lambda(2x_2 + 1).\]Watch: the constraint is quadratic, so $\nabla(x_1^2 + x_2^2 + x_2) = (2x_1,\ 2x_2 + 1)$; the multiplier multiplies the gradient, giving $\lambda(2x_2 + 1)$, not $\lambda(x_2 + 1)$.
Step 2 (case split on complementary slackness $\lambda c = 0$, $\lambda \ge 0$):
- $\lambda = 0$: gives $x^1 = (0, -\tfrac14)$, an interior stationary point of $f$.
- $\lambda > 0$: the constraint is active, $x _ 1^2 + x _ 2^2 + x _ 2 = 0$. The first equation $2x _ 1^3 = -\lambda x _ 1$ with $\lambda > 0$ forces $x _ 1 = 0$ (else $2x _ 1^2 = -\lambda < 0$). Then $x_2^2 + x_2 = 0$, so $x_2 \in \{0, -1\}$: $x^2 = (0,0)$ with $\lambda = 1$, and $x^3 = (0,-1)$ with $\lambda = 3$.
Step 3 (classify): $\nabla^2_{xx}\mathcal L = \nabla^2 f - \lambda \nabla^2 c = \operatorname{diag}(12x_1^2 + 2\lambda,\ -4 + 2\lambda)$.
- $x^1 = (0,-\tfrac14)$: $\nabla^2 f = \operatorname{diag}(0,-4)$, a saddle, not a minimiser.
- $x^3 = (0,-1)$, $\lambda = 3$: $\operatorname{diag}(6,2) \succ 0$, a local minimiser.
- $x^2 = (0,0)$, $\lambda = 1$: $\operatorname{diag}(2,-2)$ is indefinite, but on the critical cone $\nabla c(0,0)^\top s = 0$, i.e. $s_2 = 0$, we get $s^\top \nabla^2_{xx}\mathcal L\, s = 2s_1^2 > 0$. So $x^2$ is also a local minimiser (∆kkt-conditions-sufficient-second-order).
Step 4 (global minimiser): $f$ is continuous and the feasible set is compact (the closed disc — closed and bounded in $\mathbb R^2$), so by Weierstrass (the extreme value theorem) $f$ attains its minimum on it: a global minimiser exists. This existence is what makes the value comparison conclusive — by ∆scq every local (hence the global) minimiser is a KKT point, so the global minimiser must be one of the candidates and we may just compare objective values. Comparing $f(x^2) = 0$ and $f(x^3) = -1$, so $x^3 = (0,-1)$ is the global minimiser. (Without compactness the smallest value among the KKT points need not be the global infimum — it might fail to be attained.)
Bite-sized
A constrained optimisation problem with constraints $c _ E(x) = 0$, $c _ I(x) \ge 0$ is convex in the lecturer’s sense if $f$ is convex, each $c _ i$ for $i \in I$ is concave (so the inequality region $\{c _ i \ge 0\}$ is convex), and $c _ E$ is affine.
For a convex programming problem, the KKT conditions are not just necessary (under a CQ) but also sufficient for global optimality: any KKT point is automatically a global minimiser.
A quadratic programming (QP) problem is convex iff the Hessian $H$ in $c^\top x + \tfrac{1}{2} x^\top H x$ is positive semidefinite ($H \succeq 0$).
At a KKT point $x^\ast$ of a (potentially nonconvex) constrained problem, the second-order conditions are checked along the critical cone $F(\lambda^\ast)$ rather than all of $\mathbb R^n$, because along feasible directions outside this cone the first-order term $s^\top \nabla f(x^\ast)$ is already strictly positive and curvature is irrelevant.
Give the proof strategy for ∆convex-program-feasible-set-convex-proof (a convex programming problem has a convex feasible set $\Omega$).
- Pick $x, y \in \Omega$ and $\alpha \in [0, 1]$, set $z = (1 - \alpha)x + \alpha y$.
- Equality constraints: $c _ E$ is affine ($c _ E(x) = Ax - b$), so $c _ E(z) = (1 - \alpha)c _ E(x) + \alpha c _ E(y) = 0$. Affineness preserves equality by linear combination of zeros.
- Inequality constraints: each $c _ i$ is concave, so $c _ i(z) \ge (1 - \alpha)c _ i(x) + \alpha c _ i(y) \ge 0$ since both $c _ i(x) \ge 0$ and $c _ i(y) \ge 0$. Concavity preserves inequalities by Jensen.
- Hence $z \in \Omega$.
Give the proof strategy for ∆kkt-condition-sufficient-when-convex-proof (Theorem 18: for a convex programming problem, KKT points are global minimisers).
- Use the first-order convexity inequality $f(x) \ge f(\hat x) + \nabla f(\hat x)^\top (x - \hat x)$ from ∆convexity-first-order.
- Substitute the KKT stationarity condition $\nabla f(\hat x) = A^\top \hat y + \sum _ {i \in I} \hat\lambda _ i \nabla c _ i(\hat x)$.
- Equality part: for $x \in \Omega$, $Ax = b = A\hat x$ gives $A(x - \hat x) = 0$, so the equality-multiplier term vanishes.
- Inequality part: for each $i \in I$, concavity of $c _ i$ gives $\nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x)$. Multiplying by $\hat\lambda _ i \ge 0$ and using complementary slackness $\hat\lambda _ i c _ i(\hat x) = 0$, $\hat\lambda _ i (\nabla c _ i(\hat x)^\top (x - \hat x)) \ge \hat\lambda _ i c _ i(x) \ge 0$ (since $c _ i(x) \ge 0$).
- Combine: $f(x) \ge f(\hat x)$ for every $x \in \Omega$, so $\hat x$ is a global minimiser.
The concavity-driven inequality chain in ∆kkt-condition-sufficient-when-convex-proof (Theorem 18). For each $i \in I$, concavity of $c _ i$ gives $\nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x)$. Multiplying by $\hat \lambda _ i \ge 0$ and using complementary slackness $\hat\lambda _ i c _ i(\hat x) = 0$:
\[\hat\lambda _ i \bigl(\nabla c _ i(\hat x)^\top (x - \hat x)\bigr) \ge \hat\lambda _ i c _ i(x) \ge <span class="cloze" tabindex="0">0</span>\](the final $\ge 0$ uses $\hat\lambda _ i \ge 0$ and $c _ i(x) \ge 0$ from feasibility). Summing over $i$ closes the proof.
In ∆kkt-condition-sufficient-when-convex-proof (Theorem 18), three KKT/structural ingredients all combine to make the inequality $\hat\lambda _ i (\nabla c _ i(\hat x)^\top (x - \hat x)) \ge 0$ work. Identify each and what it contributes.
- Concavity of $c _ i$ gives the first-order upper bound $c _ i(x) \le c _ i(\hat x) + \nabla c _ i(\hat x)^\top (x - \hat x)$, equivalently $\nabla c _ i(\hat x)^\top (x - \hat x) \ge c _ i(x) - c _ i(\hat x)$. Without concavity we have no such bound.
- Dual feasibility $\hat\lambda _ i \ge 0$ preserves the inequality direction when multiplying through by $\hat\lambda _ i$. With $\hat\lambda _ i < 0$ the inequality would flip and the argument fails.
- Complementary slackness $\hat\lambda _ i c _ i(\hat x) = 0$ kills the $c _ i(\hat x)$ term so the right-hand side reduces to $\hat\lambda _ i c _ i(x)$, which is $\ge 0$ by primal feasibility $c _ i(x) \ge 0$ combined with $\hat\lambda _ i \ge 0$.
Each of the three KKT pieces (stationarity is used separately to set up the bound) plays a load-bearing role; remove any one and the proof breaks.