Continuous Optimisation HT26, KKT conditions
Flashcards
Basic definitions
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Define what it means for $\hat x$ to be a Karush-Kuhn-Tucker (KKT) point.
$\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- Feasibility: $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$, the point satisfies the constraints.
- Stationarity: $\nabla f(\hat x) = J _ E(\hat x)^\top \hat y + J _ I(\hat x)^\top \hat \lambda$, so that $\nabla f(\hat x)$ lies in the cone spanned by the active constraints and there is no direction that is both feasible and descent.
- Dual feasibility: $\hat \lambda _ i \ge 0$
- Complementary slackness: $\hat \lambda _ i c _ i(\hat x) = 0$ for all $i \in I$, so that if a constraint is slack, then $\lambda _ i = 0$, or if $c _ i(\hat x) = 0$, then we may have $\lambda _ i > 0$.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. @Define the Lagrangian function of this constrained optimisation problem and state the correct implication between KKT points and stationary points of the Lagrangian.
$\mathcal L : \mathbb R^n \times \mathbb R^m \times \mathbb R^p \to \mathbb R$ is the Lagrangian function defined by
\[\mathcal L(x, y, \lambda) := f(x) - y^\top c _ E(x) - \lambda^\top c _ I(x), \quad x \in \mathbb R^n\]We have the implication that if $\hat x$ is a KKT point of the constrained optimisation problem, then $\nabla _ x \mathcal L(\hat x, \hat y, \hat \lambda) = 0$.
Tangent cone and linearised feasible directions
Suppose we have a feasible set $\Omega$ in a constrained optimisation problem. @Define the tangent cone $T _ \Omega(x)$ to $\Omega$ at $x$, and give intuition for why it’s important in constrained optimisation.
Geometric reading: $s \in T _ \Omega(x)$ iff there is a sequence of feasible points $z^k \in \Omega$ approaching $x$ such that the rescaled offsets $(z^k - x)/t^k$ approach $s$. So $T _ \Omega(x)$ is exactly the set of directions in which you can actually move while staying in $\Omega$ — the feasibility-respecting infinitesimal motions at $x$.
- For a smooth manifold (e.g. $\{c _ E(x) = 0\}$ with $J _ E(x)$ full rank), $T _ \Omega(x)$ is the tangent space $\ker J _ E(x)$.
- For a polyhedron (linear constraints), $T _ \Omega(x)$ is the polyhedral cone of directions that don’t immediately violate any active constraint.
- For a “cusp” or tangency point where the feasible set pinches (see ∆constraint-qualifications-failure-example: disc tangent to a line, $\Omega = \{(0,0)\}$), $T _ \Omega(x)$ collapses to $\{0\}$ — no feasible motion is possible.
Why it’s important: $T _ \Omega(x)$ is the geometric object that determines first-order optimality at constrained minimisers. If $x^\ast$ is a local min and $s \in T _ \Omega(x^\ast)$ were a descent direction, the feasible sequence realising $s$ would give $f(z^k) < f(x^\ast)$, contradicting local optimality. So:
\[x^\ast \text{ local min} \implies \nabla f(x^\ast)^\top s \ge 0 \quad \text{for all } s \in T _ \Omega(x^\ast).\]This is the variational first-order optimality condition — purely geometric, no constraint gradients or multipliers. To translate it into the KKT form (with $\hat y, \hat \lambda$ and $\nabla c _ i$) you need the algebraic ∆set-of-linearised-feasible-directions $\mathcal F(x)$ to match $T _ \Omega(x)$ — that’s exactly what a ∆constraint-qualifications buys you (see ∆kkt-necessity-equality-only-proof for the precise step where this enters).
Suppose we have a feasible set $\Omega$ in a constrained optimisation problem. @Define the set of linearised feasible directions $\mathcal F(x)$ at $x$.
Constraint qualifications
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall the ∆kkt-conditions that $\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- $\nabla f(\hat x) = \sum _ {j \in E} \hat y _ j \nabla c _ j(\hat x) + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)$
- $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$
- $\hat \lambda _ i \ge 0$, $\hat \lambda _ i c _ i(\hat x) = 0$, for all $i \in I$
We would like to be able to say that the KKT conditions are:
- Sufficient: $\hat x \text{ KKT point} \implies \hat x \text{ (constrained) local min}$
- Necessary: $\hat x \text{ (constrained) local minimiser} \implies \hat x \text{ KKT point}$
But in general we need additional assumptions about the problem to make these implications hold. What are they?
- For sufficiency, we require some information about the curvature of the problem
- For necessity, we require “constraint qualifications”
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall the ∆kkt-conditions that $\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- $\nabla f(\hat x) = \sum _ {j \in E} \hat y _ j \nabla c _ j(\hat x) + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)$
- $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$
- $\hat \lambda _ i \ge 0$, $\hat \lambda _ i c _ i(\hat x) = 0$, for all $i \in I$
We would like to be able to say that the KKT conditions are necessary, i.e. that
\[\hat x \text{ (constrained) local minimiser} \implies \hat x \text{ KKT point}\]
To do this, we need constraint qualifications. What is the purpose of these? Answer first in words and then in terms of the ∆tangent-cone and ∆set-of-linearised-feasible-directions.
In the proof of necessity (∆kkt-necessity-equality-only-proof) we argue by perturbing $x^\ast$ along a feasible curve $x(\alpha) = x^\ast + \alpha s + O(\alpha^2)$ tangent to a direction $s$, then Taylor-expanding the constraints and objective. For the argument to extract conclusions about every direction $s$ flagged as “feasible to first order” by the constraint linearisation — i.e. every $s \in \mathcal F(x^\ast)$ — each such $s$ must actually be realised by some feasible curve, i.e. must lie in the geometric ∆tangent-cone $T _ \Omega(x^\ast)$.
Formally, a constraint qualification at $x$ is a sufficient condition for
\[T _ \Omega(x) = \mathcal F(x),\]i.e. the ∆tangent-cone (geometric: limits of feasible sequences, the directions you can actually move) coincides with the ∆set-of-linearised-feasible-directions (algebraic: directions satisfying the first-order linearisation of the active constraints).
The inclusion $T _ \Omega(x) \subseteq \mathcal F(x)$ always holds (any feasible sequence’s limiting direction must satisfy the first-order linearisation). The reverse can fail at cusps or tangency points where the algebraic linearisation overstates what’s geometrically possible — see ∆constraint-qualifications-failure-example, where the disc-tangent-to-line geometry gives $T _ \Omega = \{0\}$ but $\mathcal F = \{(s _ 1, 0) : s _ 1 \in \mathbb R\}$. Without a CQ, the necessity proof’s Taylor argument tries to draw conclusions about directions in $\mathcal F \setminus T _ \Omega$ that no feasible curve actually approaches, and the KKT first-order argument breaks.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall the ∆kkt-conditions that $\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- $\nabla f(\hat x) = \sum _ {j \in E} \hat y _ j \nabla c _ j(\hat x) + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)$
- $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$
- $\hat \lambda _ i \ge 0$, $\hat \lambda _ i c _ i(\hat x) = 0$, for all $i \in I$
We would like to be able to say that the KKT conditions are necessary, i.e. that
\[\hat x \text{ (constrained) local minimiser} \implies \hat x \text{ KKT point}\]
To do this, we need constraint qualifications. @Define the Slater Constraint Qualification (SCQ).
- The equality constraints are affine, so that $c _ E(x) = Ax - b$.
- There exists $x$ such that $c _ E(x) = 0$ and $c _ I(x) > 0$ (all the inequality constraints are strictly satisfied).
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall the ∆kkt-conditions that $\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- $\nabla f(\hat x) = \sum _ {j \in E} \hat y _ j \nabla c _ j(\hat x) + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)$
- $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$
- $\hat \lambda _ i \ge 0$, $\hat \lambda _ i c _ i(\hat x) = 0$, for all $i \in I$
We would like to be able to say that the KKT conditions are necessary, i.e. that
\[\hat x \text{ (constrained) local minimiser} \implies \hat x \text{ KKT point}\]
To do this, we need constraint qualifications. @Define the Linear Independence Constraint Qualification (LICQ).
- At relevant $x$, $\nabla c _ i(x), i \in \mathcal A(x)$ are linearly independent.
Give an @example of an optimisation problem where both ∆scq and ∆licq fail, and describe what goes wrong as a result.
Take any objective $f$ with feasible region
\[\Omega = \{(x _ 1, x _ 2) \mid c _ 1(x) := 1 - x _ 1^2 - (x _ 2 - 1)^2 \ge 0; \; c _ 2(x) := -x _ 2 \ge 0\}.\]$c _ 1 \ge 0$ is the closed disc of radius $1$ centred at $(0, 1)$; $c _ 2 \ge 0$ is the half-plane $x _ 2 \le 0$. The disc is tangent to the line $x _ 2 = 0$ at the origin, so $\Omega = \{(0, 0)\}$.
At $x^\ast = (0, 0)$ both constraints are active, with gradients
\[\nabla c _ 1(0, 0) = (0, 2), \qquad \nabla c _ 2(0, 0) = (0, -1).\]- LICQ fails: $\nabla c _ 1(x^\ast)$ and $\nabla c _ 2(x^\ast)$ are anti-parallel, hence linearly dependent.
- SCQ fails: every feasible point lies on the boundary of both constraints, so no $x$ has $c _ 1(x) > 0$ and $c _ 2(x) > 0$.
What goes wrong: the algebraic linearisation overstates the actual geometry. $T _ \Omega(x^\ast) = \{0\}$ (the only feasible sequence is constant), while
\[\mathcal F(x^\ast) = \{s : s^\top \nabla c _ 1(x^\ast) \ge 0, \; s^\top \nabla c _ 2(x^\ast) \ge 0\} = \{(s _ 1, 0) : s _ 1 \in \mathbb R\},\]so $T _ \Omega(x^\ast) \subsetneq \mathcal F(x^\ast)$.
KKT-necessity fails too: since $\Omega = \{x^\ast\}$, $x^\ast$ is trivially the global minimiser for any $f$. But KKT stationarity at $x^\ast$ requires $\nabla f(x^\ast) = \lambda _ 1^\ast (0, 2) + \lambda _ 2^\ast (0, -1)$ with $\lambda _ 1^\ast, \lambda _ 2^\ast \ge 0$, forcing the first component of $\nabla f(x^\ast)$ to be zero. So for e.g. $f(x) = x _ 1$ (gradient $(1, 0)$), the global minimiser $(0, 0)$ is not a KKT point — exactly the failure mode the CQ hypothesis exists to rule out (see ∆kkt-necessity-equality-only-proof, where the CQ is what guarantees the feasible-path tangent direction $s$ can take any element of $\ker J _ E$).
Necessity of KKT conditions
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall the ∆kkt-conditions that $\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- $\nabla f(\hat x) = \sum _ {j \in E} \hat y _ j \nabla c _ j(\hat x) + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)$
- $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$
- $\hat \lambda _ i \ge 0$, $\hat \lambda _ i c _ i(\hat x) = 0$, for all $i \in I$
@State (poorly) a theorem that describes the necessity of the KKT conditions.
Under suitable constraint qualifications, if $x^\ast$ is a local minimiser of the constrained optimisation problem, then $x^\ast$ is a KKT point of the constrained optimisation problem.
Suppose we have the constrained optimisation problem
\[\min _ {x \in \mathbb R^n} f(x) \quad \text{s.t.} \quad c _ E(x) = 0,\, c _ I(x) \ge 0\]
where $c _ E : \mathbb R^n \to \mathbb R^m$ and $c _ I : \mathbb R^n \to \mathbb R^p$. Recall the ∆kkt-conditions that $\hat x$ is a KKT point if there exists $\hat y \in \mathbb R^m$ and $\hat \lambda \in \mathbb R^p$ such that $(\hat x, \hat y, \hat \lambda)$ satisfies
- $\nabla f(\hat x) = \sum _ {j \in E} \hat y _ j \nabla c _ j(\hat x) + \sum _ {i \in I} \hat \lambda _ i \nabla c _ i(\hat x)$
- $c _ E(\hat x) = 0$, $c _ I(\hat x) \ge 0$
- $\hat \lambda _ i \ge 0$, $\hat \lambda _ i c _ i(\hat x) = 0$, for all $i \in I$
@Prove that, under suitable constraint qualifications,
\[x^\ast \text{ local min} \implies x^\ast \text{ is a KKT point}\]
in the special case where there are no inequality conditions.
Let $I = \emptyset$. Then the KKT conditions are equivalent to:
- $c _ E(x^\ast) = 0$ (which is trivially implied by the definition of $x^\ast$ being a local minimiser), and
- $\nabla f(x^\ast) = J _ E(x^\ast)^\top y^\ast$ for some $y^\ast \in \mathbb R^m$, where $J _ E$ is the Jacobian matrix of the constraints $c _ E$.
Consider now functions $x : \mathbb R \to \mathbb R^n$ which are feasible perturbations and paths around $x^\ast$ parameterised by a scalar $\alpha$. For each such path we require:
- $x \in \mathcal C^1(\mathbb R)$
- $x(0) = x^\ast$
- $c _ E(x(\alpha)) = 0$ for all sufficiently small $\alpha$ (so the curve stays in the feasible set)
- $s := x'(0) \ne 0$ (so the path is non-trivial)
By Taylor’s theorem applied to $x$ itself, the first two bullets imply
\[x(\alpha) = x^\ast + \alpha s + \mathcal O(\alpha^2),\]so $s$ is the velocity of the path at $\alpha = 0$, not an independent algebraic assumption on the form of $x$.
Separately, the theorem’s “suitable constraint qualifications” hypothesis enters not as a condition on any individual path, but as a fact about the family of feasible paths satisfying the four bullets: it guarantees that as $x$ ranges over all such paths, the tangent $s = x'(0)$ can be made to equal any prescribed element of $\text{Null}(J _ E(x^\ast))$. This is what lets us upgrade the per-path conclusion of Step 2 to a “for all $s \in \text{Null}(J _ E(x^\ast))$” statement, which Step 3 then plugs into linear algebra.
Step 1: we show that the tangent $s$ of any feasible path satisfying the bullets above lies in $\text{Null}(J _ E(x^\ast))$. For any $i \in E$, by Taylor’s theorem for $c _ i(x(\alpha))$ around $x^\ast$, we have
\[\begin{aligned} 0 &= c _ i(x(\alpha)) \\ &= c _ i(x^\ast + \alpha s + \mathcal O(\alpha^2)) \\ &= c _ i(x^\ast) + \nabla c _ i(x^\ast)^\top (x^\ast + \alpha s - x^\ast) + \mathcal O(\alpha^2) \\ &= \alpha \nabla c _ i(x^\ast)^\top s + \mathcal O(\alpha^2) \end{aligned}\]where we used $c _ i(x^\ast) = 0$. Dividing both sides by $\alpha$, we obtain
\[0 = \nabla c _ i(x^\ast)^\top s + \mathcal O(\alpha)\]for all $\alpha$ sufficiently small. As $\alpha \to 0$, we obtain
\[\nabla c _ i(x^\ast)^\top s = 0\]for all $i \in E$, and so $J _ E(x^\ast) s = 0$.
Step 2: we show that the same tangent $s$ also satisfies $\nabla f(x^\ast)^\top s = 0$, and then use the constraint qualification to extend this from “the $s$ we chose” to “every $s \in \text{Null}(J _ E(x^\ast))$”. Expanding $f$, we have that
\[\begin{aligned} f(x(\alpha)) &= f(x^\ast) + \nabla f(x^\ast)^\top (x^\ast + \alpha s - x^\ast) + \mathcal O(\alpha^2) \\ &= f(x^\ast) + \alpha \nabla f(x^\ast)^\top s + \mathcal O(\alpha^2) \end{aligned}\]Since $x^\ast$ is a local minimiser of $f$, we have $f(x(\alpha)) \ge f(x^\ast)$ for all $\alpha$ sufficiently small. Thus $\alpha \nabla f(x^\ast)^\top s + \mathcal O(\alpha^2) \ge 0$ for all $\alpha$ sufficiently small. Considering $\alpha > 0$, we divide by $\alpha$ to obtain
\[\nabla f(x^\ast)^\top s + \mathcal O(\alpha) \ge 0\]and letting $\alpha \to 0$, we deduce $\nabla f(x^\ast)^\top s \ge 0$. Similarly by considering $\alpha < 0$, we obtain $\nabla f(x^\ast)^\top s \le 0$. Hence $\nabla f(x^\ast)^\top s = 0$ for the tangent $s$ of the particular path we chose. By the constraint qualification, every $s \in \text{Null}(J _ E(x^\ast))$ is realised as the tangent of some such feasible path, and applying the argument above to each one we conclude
\[\nabla f(x^\ast)^\top s = 0 \text{ for all }s \in \text{Null}(J _ E(x^\ast)) \quad (1)\]Step 3: we deduce the missing half of the KKT conditions, $\nabla f(x^\ast) = J _ E(x^\ast)^\top y^\ast$, from $(1)$ by linear algebra. By the rank-nullity theorem, $(1)$ says that $\nabla f(x^\ast) \in \text{Null}(J _ E(x^\ast))^\perp$. The fundamental theorem of linear algebra (a corollary of rank-nullity) states that for any matrix $A$,
\[\text{Null}(A)^\perp = \text{Range}(A^\top).\]Applied with $A = J _ E(x^\ast)$, we have that $\nabla f(x^\ast)$ must belong to the range space of $J _ E(x^\ast)^\top$ and so $\nabla f(x^\ast) = J _ E(x^\ast)^\top y^\ast$ for some $y^\ast$. as required.
Worked examples
Find all KKT points of the equality-constrained problem
\[\min _ {x \in \mathbb R^2} x _ 1^2 + x _ 2^2 \quad \text{s.t.} \quad x _ 1 + x _ 2^2 = \alpha, \qquad \alpha \ge 0,\]
and classify each as a minimiser or not. A discussion based on $\alpha$ is needed.
Why KKT is necessary: the single constraint $c(x) = x _ 1 + x _ 2^2 - \alpha$ has $\nabla c(x) = (1, 2x _ 2) \ne 0$ everywhere, so ∆licq holds and ∆kkt-necessity-theorem-statement applies: every local minimiser is a KKT point.
Step 1 (Lagrangian + stationarity): with $\mathcal L = x _ 1^2 + x _ 2^2 - \mu(x _ 1 + x _ 2^2 - \alpha)$,
\[\nabla _ x \mathcal L = \begin{pmatrix} 2x _ 1 - \mu \\ 2x _ 2 - 2\mu x _ 2 \end{pmatrix} = 0 \quad\Longrightarrow\quad 2x _ 1 - \mu = 0, \quad 2x _ 2(1 - \mu) = 0.\]Watch: the first equation gives $\mu = 2x _ 1$, not $\mu = x _ 1/2$. This value is load-bearing: get it wrong and the second family below disappears.
Step 2 (case split) on $2x _ 2(1-\mu) = 0$:
- $x _ 2 = 0$: feasibility gives $x _ 1 = \alpha$ and $\mu = 2\alpha$. KKT point $(\alpha, 0)$ for every $\alpha \ge 0$.
- $\mu = 1$: then $x _ 1 = \tfrac12$, and feasibility $x _ 1 + x _ 2^2 = \alpha$ gives $x _ 2^2 = \alpha - \tfrac12$, real only if $\alpha \ge \tfrac12$. KKT points $\left(\tfrac12, \pm\sqrt{\alpha - \tfrac12}\right)$.
Step 3 (classify via second-order conditions): $\nabla^2 _ {xx}\mathcal L = \operatorname{diag}(2,\ 2(1-\mu))$, checked on feasible directions $\nabla c(x)^\top s = 0$ (∆second-order-necessary-conditions-constrained).
- At $(\alpha, 0)$: $\nabla c = (1,0)$, so feasible directions have $s _ 1 = 0$, i.e. $s = (0, s _ 2)$. With $\mu = 2\alpha$, $s^\top \nabla^2 _ {xx}\mathcal L\, s = 2(1 - 2\alpha) s _ 2^2$, which is $\ge 0$ iff $\alpha \le \tfrac12$. So $(\alpha,0)$ is the global minimiser for $\alpha \in [0, \tfrac12]$; for $\alpha > \tfrac12$ the second-order necessary condition fails, so it is not a minimiser.
- At $\left(\tfrac12, \pm\sqrt{\alpha-\tfrac12}\right)$ (only for $\alpha > \tfrac12$): $\mu = 1$, $\nabla^2 _ {xx}\mathcal L = \operatorname{diag}(2,0) \succeq 0$. These are the global minimisers, since $f = \alpha - \tfrac14 < \alpha^2 = f(\alpha, 0)$.
Existence: a global minimiser exists by Weierstrass (continuous $f$, closed feasible set).
Consider
\[\min _ {x \in \mathbb R^n} x _ 1 + x _ 2 + \cdots + x _ n \quad \text{s.t.} \quad x _ 1 x _ 2 \cdots x _ n = 1 \ \text{ and } \ x _ i \ge a _ i,\ i = 1,\ldots,n,\]
where $a _ i > 0$ and $a _ 1 a _ 2 \cdots a _ n < 1$. Show the problem has a unique KKT point.
Step 1 (linearise the awkward constraint): since each $x _ i \ge a _ i > 0$, rewrite $\prod _ i x _ i = 1$ as
\[c(x) := \sum _ {i=1}^n \log x _ i = 0.\]This is the key move: it turns the product into a sum and makes stationarity linear in $1/x _ i$, avoiding the messy $\prod _ {j \ne i} x _ j$.
Step 2 (stationarity): with $\mathcal L = \sum _ i x _ i - \mu \sum _ i \log x _ i - \sum _ i \lambda _ i (x _ i - a _ i)$,
\[\partial _ {x _ i}\mathcal L = 1 - \frac{\mu}{x _ i} - \lambda _ i = 0 \quad\Longrightarrow\quad 1 = \frac{\mu}{x _ i} + \lambda _ i.\]Step 3 (case split on complementary slackness $\lambda _ i(x _ i - a _ i) = 0$, $\lambda _ i \ge 0$):
- $\lambda _ i = 0$: then $x _ i = \mu$.
- $\lambda _ i > 0$: then $x _ i = a _ i$, with $\lambda _ i = 1 - \mu/a _ i \ge 0$.
In both cases $x _ i = \max(a _ i, \mu)$, so a KKT point is fixed entirely by the single scalar $\mu > 0$.
Step 4 (uniqueness via monotonicity): the equality constraint forces
\[\prod _ {i=1}^n \max(a _ i, \mu) = 1.\]The left side equals $\prod _ i a _ i < 1$ for $\mu \le \min_i a_i$, is continuous and strictly increasing for $\mu > \min _ i a _ i$, and $\to \infty$ as $\mu \to \infty$. So there is exactly one $\mu^\ast$ with product $1$, which fixes a unique $x^\ast = (\max(a _ i, \mu^\ast)) _ i$ and unique multipliers. Hence a unique KKT point.
Step 5 (it is a minimiser): objective and inequality constraints are linear, so only the equality contributes to the Lagrangian Hessian. With $c = \sum _ i \log x _ i$, $\nabla^2 c = -\operatorname{diag}(1/x _ i^2)$, so $\nabla^2 _ {xx}\mathcal L = -\mu \nabla^2 c = \mu \operatorname{diag}(1/x _ i^2) \succ 0$ since $\mu > 0$. Hence the KKT point is a strict local minimiser.
Bite-sized
Complementary slackness $\hat \lambda _ i c _ i(\hat x) = 0$ (for $i \in I$) splits inequality constraints into two classes: either the constraint is active ($c _ i(\hat x) = 0$) and $\hat \lambda _ i \ge 0$ may be nonzero, or inactive ($c _ i(\hat x) > 0$) and $\hat \lambda _ i = 0$. The algorithm has to discover this combinatorial structure, which is precisely the difficulty interior-point methods sidestep by replacing $c _ i \lambda _ i = 0$ with $c _ i \lambda _ i = \mu$.
The Lagrangian function of the constrained problem $\min f(x)$ s.t. $c _ E(x) = 0$, $c _ I(x) \ge 0$ is $\mathcal L(x, y, \lambda) := f(x) - y^\top c _ E(x) - \lambda^\top c _ I(x)$. The KKT stationarity condition is then equivalent to $\nabla _ x \mathcal L(\hat x, \hat y, \hat \lambda) = 0$.
The tangent cone $T _ \Omega(x)$ and the set of linearised feasible directions $\mathcal F(x)$ both attempt to capture “feasible directions at $x$”. What does each capture, and what is the role of a constraint qualification?
- $T _ \Omega(x)$ is geometric: limits of normalised feasible sequences $(z^k - x)/t^k$, $z^k \in \Omega$, $t^k \to 0^+$. It depends only on the set $\Omega$, not the constraint functions.
- $\mathcal F(x)$ is algebraic: directions $s$ satisfying the first-order linearisation $s^\top \nabla c _ i(x) = 0$ (for $i \in E$), $s^\top \nabla c _ i(x) \ge 0$ (for active $i \in I$). It depends on the chosen representation of $\Omega$ via constraint functions.
In general $T _ \Omega(x) \subseteq \mathcal F(x)$, with strict inclusion possible (e.g. cusp-like geometries where $\mathcal F$ is too large). A constraint qualification is a sufficient condition for $T _ \Omega(x) = \mathcal F(x)$, so the algebraic linearisation faithfully represents the geometry and the KKT proof goes through.
A feasible descent direction at $x \in \Omega$ for the equality-only case is any $s$ satisfying $J _ E(x) s = 0$ (stays feasible to first order) and $\nabla f(x)^\top s < 0$ (decreases $f$). The non-existence of such $s$ at a local minimiser is the geometric content of the KKT conditions.
Give the proof strategy for ∆kkt-necessity-equality-only-proof (Theorem 16, equality-only case: under a constraint qualification, local minimisers are KKT points).
- Setup: parameterise a $\mathcal C^1$ feasible perturbation $x(\alpha) = x^\ast + \alpha s + O(\alpha^2)$, $x(0) = x^\ast$, $c _ E(x(\alpha)) = 0$ for small $\alpha$. Existence of such $x(\alpha)$ for given $s \ne 0$ is what the constraint qualification buys.
- Step 1 (linearise the constraints): Taylor each $c _ i \in c _ E$ along $x(\alpha)$, use $c _ i(x^\ast) = 0$, divide by $\alpha$, let $\alpha \to 0$: $\nabla c _ i(x^\ast)^\top s = 0$. Hence $J _ E(x^\ast) s = 0$, i.e. $s \in \ker J _ E(x^\ast)$.
- Step 2 (expand $f$): Taylor $f \circ x$: $f(x(\alpha)) = f(x^\ast) + \alpha \nabla f(x^\ast)^\top s + O(\alpha^2) \ge f(x^\ast)$. Divide by $\alpha > 0$, $\alpha \to 0$: $\nabla f(x^\ast)^\top s \ge 0$. Symmetric argument with $\alpha < 0$ gives $\le 0$. So $\nabla f(x^\ast)^\top s = 0$ for every $s \in \ker J _ E(x^\ast)$.
- Step 3 (linear algebra): by $\ker(A)^\perp = \operatorname{Range}(A^\top)$ applied to $A = J _ E(x^\ast)$, $\nabla f(x^\ast) \in \operatorname{Range}(J _ E(x^\ast)^\top)$, so $\nabla f(x^\ast) = J _ E(x^\ast)^\top y^\ast$ for some $y^\ast$. That is the missing stationarity half of KKT.
The constraint linearisation at the heart of Step 1 of ∆kkt-necessity-equality-only-proof. For each $i \in E$, Taylor-expanding $c _ i(x(\alpha))$ with $x(\alpha) = x^\ast + \alpha s + O(\alpha^2)$ and $c _ i(x^\ast) = 0$ gives $c _ i(x(\alpha)) = \alpha \nabla c _ i(x^\ast)^\top s + O(\alpha^2) = 0$. Dividing by $\alpha$ and letting $\alpha \to 0$ yields
\[\nabla c _ i(x^\ast)^\top s = 0 \quad \forall i \in E, \quad \text{i.e.} \quad <span class="cloze" tabindex="0">J _ E(x^\ast) s = 0</span>.\]This identifies the admissible $s$ as the null space of the equality-constraint Jacobian.
In ∆kkt-necessity-equality-only-proof (Theorem 16 / equality case), where does the constraint qualification enter, and what specifically breaks without one?
- The CQ enters at the very first step: the existence of a $\mathcal C^1$ feasible curve $x(\alpha) = x^\ast + \alpha s + O(\alpha^2)$ tangent to any given direction $s \in \ker J _ E(x^\ast)$. Without the CQ, $\ker J _ E(x^\ast)$ may contain directions that are not tangent to any feasible path — i.e. the algebraic tangent space (∆set-of-linearised-feasible-directions $\mathcal F(x^\ast)$) is strictly larger than the geometric one (∆tangent-cone $T _ \Omega(x^\ast)$).
- Without the CQ the proof’s local-minimum step (Step 2) still produces $\nabla f(x^\ast)^\top s \ge 0$ for $s$ tangent to feasible curves, but not for every $s \in \ker J _ E(x^\ast)$. So we cannot conclude $\nabla f(x^\ast) \in \operatorname{Range}(J _ E^\top)$ — the KKT multipliers $y^\ast$ may not exist.
- See ∆constraint-qualifications-failure-example for a concrete cusp where CQ fails and KKT can fail to hold.
For an optimisation problem with a single active constraint (one equality $c(x) = 0$, or one active inequality), what does the ∆licq reduce to, and why?
LICQ requires the gradients of the active constraints to be linearly independent. With only one active constraint $c$, that set is just $\{\nabla c(x)\}$ — and a single vector is linearly independent iff it is nonzero. So LICQ reduces to
\[\nabla c(x) \neq 0\]at the relevant $x$ (the candidate KKT point / local minimiser).
Example (Exam 2016, Q3(a)): for $\min\ x _ 1^2 + x _ 2^2$ s.t. $c(x) = x _ 1 + x _ 2^2 - \alpha = 0$, we have $\nabla c(x) = (1,\ 2x _ 2) \neq 0$ for every $x$, so LICQ holds everywhere and the ∆kkt-conditions are necessary at any local minimiser.
Visualising the KKT conditions
The ∆kkt-conditions are easiest to feel as a force balance. Drag the candidate point $x$ and the objective’s minimiser; the checklist tracks all four conditions live. Stationarity $\nabla f = \sum _ i \lambda _ i \nabla c _ i$ asks whether the downhill pull $-\nabla f$ can be balanced by the active walls pushing back along their inward normals $\nabla c _ i$ with strengths $\lambda _ i$. Dual feasibility needs those pushes to be non-negative (a red arrow flags a $\lambda _ i < 0$, meaning $\nabla f$ leaves the cone and a feasible descent direction exists), and ∆bite-complementary-slackness-intuition forces $\lambda _ i = 0$ on the inactive constraints. All four light up exactly at a constrained minimiser.
Visualising the critical cone
The ∆set-of-linearised-feasible-directions $\mathcal F(x^\ast)$ and the ∆critical-cone $F(\lambda^\ast)$ are clearest in two dimensions. Drag $x^\ast$ around the boundary of the feasible region (it snaps to edges and vertices, fixing the active set) and drag the gradient $\nabla f$. At a KKT point the critical cone collapses to $\{0\}$ when every active constraint is strongly active, and opens to a ray or a line exactly where a multiplier vanishes, picking out the directions on which the ∆second-order-necessary-conditions-constrained must be checked.