Notes - ADS HT24, Randomised rounding


Flashcards

What is the idea behind randomised rounding methods for approximating integral linear programs?


Solve the relaxation of the integer program, then randomly round the resulting fractional solution back to an integral one, typically by interpreting the fractional values as probabilities.

Suppose we have an NP-hard minimisation problem $\mathbf \Pi$. What’s the main idea behind the LP rounding method for finding approximation algorithms?


  • Express $\mathbf \Pi$ as an integer linear program
  • Relax integrality constraints to just inequalities
  • Round a solution to the relaxed problem to integers

The $\mathbf{MinVertexCover}$ optimisation problem is:

  • Input: Graph $G = (V, E)$
  • Output: A vertex cover of minimum size
  • Give an integer linear program for this problem
  • State the relaxed version with a rounding rule
  • Prove that this rounding rule gives a valid vertex cover, and that in fact it is a $2$-approximation algorithm

Let $x _ v$ be variables for each vertex $v \in V$, where $x _ v = 1$ if $v$ belongs to a vertex cover and $0$ otherwise.

Then the integer linear program is:

\[\begin{aligned} \min \quad& \sum_{v \in V} x_v \\\\ \text{s.t.} \quad & x_u + x_v \ge 1 \quad \forall (u, v) \in E \\\\ \quad & x_v \in \\{0, 1\\} \quad \forall v \in V \end{aligned}\]

The relaxed version is:

\[\begin{aligned} \min \quad& \sum_{v \in V} x_v \\\\ \text{s.t.} \quad & x_u + x_v \ge 1 \quad \forall (u, v) \in E \\\\ \quad & 0 \le x_v \le 1 \quad \forall v \in V \end{aligned}\]

Suppose we have an optimal solution $\{x^\star _ v\} _ {v \in V}$ to the relaxed problem. Then we can construct a solution $y _ v$ to the original ILP as follows:

\[y_v = \begin{cases} 0 &\text{if } x^\star_v < 1/2 \\\\ 1 &\text{if } x^\star_v \ge 1/2 \end{cases}\]

Proof that this is a valid vertex cover: We need to show that every edge $(u, v)$ has at least one endpoint in $C = \{v \mid y _ v = 1\}$. Since $x^\star _ u + x^\star _ v \ge 1$ for every edge $(u, v) \in E$, at least one of $x^\star _ u, x^\star _ v$ is $\ge 1/2$, and so at least one of $y _ u$ or $y _ v$ is $1$.

Proof that this is a $2$-approximation algorithm: Since the feasible region of the relaxed problem contains every integral solution, in particular the optimal cover $C^\star$, we must have

\[\sum_{v \in V} x^\star_v \le |C^\star|\]

Hence

\[\begin{aligned} |C| &= \big|\\{v \mid y_v = 1\\}\big| \\\\ &= \sum_{v \in V} y_v \\\\ &\le 2 \sum_{v \in V} x^\star_v \\\\ &\le 2|C^\star| \end{aligned}\]

where the middle inequality holds since $y _ v \le 2x^\star _ v$ for every $v$, by the definition of the rounding rule.
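As a concrete sanity check, here is a minimal sketch of the whole pipeline for $\mathbf{MinVertexCover}$, using `scipy.optimize.linprog` to solve the relaxation. The encoding (vertices $0, \ldots, n-1$ and an edge list) and the small tolerance on the threshold are choices of the sketch, not part of the notes.

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_2approx(n, edges):
    # Objective: minimise sum_v x_v.
    c = np.ones(n)
    # One constraint per edge: x_u + x_v >= 1, i.e. -x_u - x_v <= -1.
    A_ub = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        A_ub[i, u] = A_ub[i, v] = -1.0
    b_ub = -np.ones(len(edges))
    # Relaxed integrality constraints: 0 <= x_v <= 1.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
    # Rounding rule: keep every vertex with x*_v >= 1/2 (small tolerance
    # to absorb floating-point error from the solver).
    return {v for v in range(n) if res.x[v] >= 0.5 - 1e-9}

# Example: a 4-cycle. The LP optimum is 2, so whatever optimal fractional
# solution the solver returns, the rounded cover has size <= 2 * OPT.
print(vertex_cover_2approx(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```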

The $\mathbf{MinSetCover}$ optimisation problem is:

  • Input: Set $U = \{u _ 1, \cdots, u _ n\}$, collection $\mathcal F = \{S _ 1, \cdots, S _ m\}$ of subsets of $U$.
  • Output: The minimum number of sets from $\mathcal F$ whose union is $U$

We can define the frequency $f _ u$ of some $u \in U$ as $ \vert \{S \in \mathcal F \mid u \in S\} \vert $, and then let $f = \max _ {u \in U} f _ u$.

  • Give an integer linear program for this problem
  • State the relaxed version with a rounding rule
  • Prove that this rounding rule gives a valid set cover, and that in fact it is an $f$-approximation algorithm

Let $x _ S$ be variables for each subset $S \in \mathcal F$, where $x _ S = 1$ if $S$ belongs to the set cover and $0$ otherwise.

Then the integer linear program is:

\[\begin{aligned} \min \quad& \sum_{S \in \mathcal F} x_S \\\\ \text{s.t.} \quad & \sum_{S \in \mathcal F \text{ s.t. } u \in S} x_S \ge 1 \quad \forall u \in U \\\\ \quad & x_S \in \\{0, 1\\} \quad \forall S \in \mathcal F \end{aligned}\]

The relaxed linear program is:

\[\begin{aligned} \min \quad& \sum_{S \in \mathcal F} x_S \\\\ \text{s.t.} \quad & \sum_{S \in \mathcal F \text{ s.t. } u \in S} x_S \ge 1 \quad \forall u \in U \\\\ \quad & 0 \le x_S \le 1 \quad \forall S \in \mathcal F \end{aligned}\]

Suppose we have an optimal solution $\{x^\star _ S\} _ {S \in \mathcal F}$ to the relaxed problem. Then we can construct a (feasible, though maybe not optimal) solution $y _ S$ to the original problem like so:

\[C = \\{S \in \mathcal F \mid x^\star_S \ge 1/f\\}\] \[y_S = \begin{cases} 1 &\text{if } S \in C \\\\ 0 &\text{if } S \notin C \end{cases}\]

Proof that this is a valid set cover: We need to show that for every $u \in U$, $\sum _ {S \in \mathcal F \text{ s.t. } u \in S} y _ S \ge 1$. This sum has at most $f$ terms, and since $\sum _ {S \in \mathcal F \text{ s.t. } u \in S} x^\star _ S \ge 1$, at least one $S$ with $u \in S$ has $x^\star _ S \ge 1/f$ (otherwise the sum would be strictly less than $f \cdot 1/f = 1$). Hence at least one such $y _ S$ is $1$.

Proof that this is an $f$-approximation algorithm: Since the feasible region of the relaxed problem contains every integral solution, in particular the optimal cover $C^\star$, we must have

\[\sum_{S \in \mathcal F} x^\star_S \le |C^\star|\]

Hence

\[\begin{aligned} |C| &= \big|\\{S \in \mathcal F \mid y_S = 1\\}\big| \\\\ &= \sum_{S \in \mathcal F} y_S \\\\ &\le f \sum_{S \in \mathcal F} x^\star_S \\\\ &\le f|C^\star| \end{aligned}\]

since $y _ S \le f x^\star _ S$ for every $S$, by the definition of the rounding rule.
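The same pipeline works for $\mathbf{MinSetCover}$ with the $1/f$ threshold; here is a minimal sketch (the set/element encoding is again a choice of the sketch):

```python
import numpy as np
from scipy.optimize import linprog

def set_cover_f_approx(U, sets):
    m = len(sets)
    # f = maximum frequency of any element of U.
    f = max(sum(1 for S in sets if u in S) for u in U)
    c = np.ones(m)  # minimise the number of sets picked
    # One constraint per element u: sum_{S : u in S} x_S >= 1.
    A_ub = np.array([[-1.0 if u in S else 0.0 for S in sets] for u in U])
    b_ub = -np.ones(len(U))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * m)
    # Rounding rule: keep every set with x*_S >= 1/f.
    return [i for i in range(m) if res.x[i] >= 1.0 / f - 1e-9]

# Example: four overlapping pairs over U = {1, 2, 3, 4}, where f = 2.
print(set_cover_f_approx([1, 2, 3, 4], [{1, 2}, {2, 3}, {3, 4}, {1, 4}]))
```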

The $\mathbf{LargeCut}$ problem is:

  • Input: Graph $G$, integer $M > 0$.
  • Output: Does $G$ have a cut of capacity $\ge M$?

Can you:

  • Give an integer linear program for this problem (as an optimisation problem, rather than a feasibility problem)
  • State the relaxed version with a rounding rule
  • Show that this is a $\frac 1 2$-approximation algorithm
  • (don’t worry about showing that the integer linear program is indeed correct)

Integer linear program:

\[\begin{aligned} &\text{max} &&\sum_{e \in E} w(e) z_e \\\\ &\text{s.t.} &&z_e \le x_u &&& \forall e = (u, v) \in E \\\\ & &&z_e \le 1 - x_v &&&\forall e = (u, v) \in E \\\\ & && z_e, x_v \in \\{0, 1\\} &&&\forall e \in E, \forall v \in V \end{aligned}\]

We interpret $u \in A$ if $x _ u = 1$, and $u \in B$ if $x _ u = 0$. Then $z _ e = 1$ exactly when the edge $e = (u, v)$ crosses the cut from $A$ to $B$, since the constraints force $z _ e \le \min(x _ u, 1 - x _ v)$.

Relaxed linear program:

\[\begin{aligned} &\text{max} &&\sum_{e \in E} w(e) z_e \\\\ &\text{s.t.} &&z_e \le x_u &&& \forall e = (u, v) \in E \\\\ & &&z_e \le 1 - x_v &&&\forall e = (u, v) \in E \\\\ & && 0 \le z_e, x_v \le 1 &&&\forall e \in E, \forall v \in V \end{aligned}\]

Rounding rule:

The general strategy in these problems is to relate $z _ e$ to the probability that the corresponding term of the objective survives the rounding, i.e. that the edge $e$ actually ends up crossing the cut. (Below we write $x _ u, z _ e$ for an optimal solution of the relaxed LP, dropping the stars.)

Since we want a $\frac 1 2$-approximation algorithm, we want $p(e \text{ included})$ to be at least $\frac{z _ e}{2}$. To achieve this, we can add each vertex $u$ to $A$ with probability $\frac 1 4 + \frac{x _ u}{2}$. Then we have

\[\begin{aligned} p(e = (u, v) \text{ included}) &= \left(\frac 1 4 + \frac{x_u}{2}\right) \left(\frac 3 4 - \frac{x_v}{2}\right) \\\\ &\ge \left(\frac{1}{4} + \frac{z_e}{2}\right)^2 \\\\ &\ge \frac{z_e}{2} \end{aligned}\]

(where we use the inequality $(a + b)^2 \ge 4ab$, which follows from $(a - b)^2 \ge 0$, with $a = \frac 1 4$ and $b = \frac{z _ e}{2}$). Then

\[\begin{aligned} \mathbb E[\text{weight}] &= \sum_{e \in E} w(e) \cdot \mathbb E[\mathbb 1_e] \\\\ &= \sum_{e \in E} w(e) \cdot p(e = (u, v) \text{ included}) \\\\ &\ge \sum_{e \in E} w(e) \cdot \frac{z_e}{2} \\\\ &= \frac 1 2 \sum_{e \in E} w(e) z_e \\\\ &\ge \frac 1 2 \text{OPT} \end{aligned}\]

since in the relaxed LP we are optimising over a larger region.
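A sketch of just the rounding step, assuming the relaxed LP has already been solved (e.g. with `scipy.optimize.linprog` as in the earlier sketches) and we are handed the fractional $x _ u$ values and the weighted edges:

```python
import random

def round_cut(x_star, weighted_edges):
    # Put u in A with probability 1/4 + x*_u / 2, independently per vertex.
    A = {u for u, x in x_star.items() if random.random() < 0.25 + x / 2}
    # Edge e = (u, v) is counted when it crosses from A to B = V \ A;
    # by the analysis above this happens with probability >= z_e / 2.
    weight = sum(w for (u, v, w) in weighted_edges if u in A and v not in A)
    return A, weight

# Hypothetical fractional solution on a triangle with unit weights.
x_star = {0: 1.0, 1: 0.0, 2: 0.5}
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
print(round_cut(x_star, edges))
```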

The $\mathbf{MaxSAT}$ optimisation problem is:

  • Input: CNF formula $\phi$
  • Output: The max number of satisfied clauses for some assignment $\sigma$.

Describe a randomised $1/2$-approximation algorithm (where the $1/2$ approximation of the optimal solution is in expectation), and prove that after repeating it $10( \vert C \vert + 1)$ times, the probability of it not satisfying at least half the optimal number of clauses is at most $e^{-10}$.


Independently set each variable to true or false, each with probability $1 / 2$.

Let $X$ be the number of clauses satisfied by the random assignment $\sigma$, and let $X _ c$ be the indicator random variable which is $1$ if clause $c$ is satisfied by $\sigma$ and $0$ otherwise. Then

\[X = \sum_{c \in C} X_c\]

Hence

\[\begin{aligned} \mathbb E[X] &= \sum_{c \in C} \mathbb E[X_c] \\\\ &= \sum_{c \in C} \left( 1 - \frac{1}{2^{|c|}\\,} \right) \\\\ &\ge \frac 1 2 |C| \end{aligned}\]

Then, since the iterations are independent, we wish to estimate

\[P\left(\text{after } 10(|C| + 1) \text{ iterations, the best } X < \frac 1 2 |C| \right) = P\left(X < \frac 1 2 |C|\right)^{10(|C|+1)}\]

We use Markov’s inequality which states:

Let $Y$ be a nonnegative random variable and $a > 0$. Then $P(Y \ge a) \le \frac{\mathbb E[Y]}{a}$.

Let $Y$ be $ \vert C \vert - X$, i.e. the number of unsatisfied clauses, and note that $\mathbb E[Y] = \vert C \vert - \mathbb E[X] \le \frac 1 2 \vert C \vert $. Then

\[\begin{aligned} P\left(Y > \frac 1 2 |C|\right) &= P\left(Y \ge \frac 1 2 (|C| + 1)\right)\\\\ &\le^{(\star 1)} \frac{\mathbb E[Y]}{\frac{1}{2} (|C| + 1)} \\\\ &\le \frac{\frac 1 2 |C|}{\frac 1 2(|C| + 1)} \\\\ &= \frac{|C|}{|C| + 1} \\\\ &= \left(1 - \frac{1}{|C| + 1}\right) \end{aligned}\]

Then

\[\begin{aligned} P(\text{after }10(|C| + 1) \text{ iterations, the best }Y > |C|/2 ) &\le \left(1 - \frac{1}{|C| + 1}\right)^{10(|C| + 1)} \\\\ &\le e^{-10} \end{aligned}\]

Since $X = \vert C \vert - Y$, the event $Y > \frac 1 2 \vert C \vert $ is exactly the event $X < \frac 1 2 \vert C \vert $. Thus

\[P\left(\text{after } 10(|C|+1) \text{ iterations, the best } X < \frac 1 2 |C| \right) \le e^{-10}\]

as required.
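A minimal sketch of the repeated algorithm. The clause encoding (a list of clauses, each a list of nonzero integers where $+i$/$-i$ means variable $i$ appears positively/negatively, DIMACS-style) is a convention chosen for the sketch:

```python
import random

def count_satisfied(clauses, assignment):
    # A clause is satisfied if any of its literals agrees with the assignment.
    return sum(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )

def repeated_random_maxsat(clauses, n_vars):
    best = None
    # 10(|C| + 1) independent runs; each satisfies >= |C|/2 clauses in
    # expectation, and all runs fail with probability <= e^{-10}.
    for _ in range(10 * (len(clauses) + 1)):
        sigma = {i: random.random() < 0.5 for i in range(1, n_vars + 1)}
        score = count_satisfied(clauses, sigma)
        if best is None or score > best[0]:
            best = (score, sigma)
    return best

print(repeated_random_maxsat([[1, 2], [-1, 3], [-2, -3]], 3))
```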

The $\mathbf{MaxSAT}$ optimisation problem is:

  • Input: CNF formula $\phi$
  • Output: The max number of satisfied clauses for some assignment $\sigma$.

A $\frac 1 2$-approximation algorithm for $\mathbf{MaxSAT}$ is given by just choosing randomly whether to set each variable to true or false. Can you describe how to use the method of conditional expectations in order to derandomise the algorithm?


Let $G(r _ 1, \cdots, r _ i)$ be the proportion of “good solutions” among the leaves of the tree of all possible choices, after fixing the first $i$ random bits $r _ 1, \cdots, r _ i$. Then, we have

\[\begin{aligned} G(r_1, \cdots, r_i) &= \mathbb P(\text{alg. terminates w/ good solution} \mid r_1, \cdots, r_i) \\\\ &= \frac 1 2 G(r_1, \cdots, r_i, 0) + \frac 1 2 G(r_1, \cdots, r_i, 1) \end{aligned}\]

Thus at least one of $G(r _ 1, \cdots, r _ i, 0)$ or $G(r _ 1, \cdots, r _ i, 1)$ is at least as large as $G(r _ 1, \cdots, r _ i)$. Since we can compute $G$ after fixing one more bit (by simplifying the formula under that partial assignment and recomputing the conditional expectation), assuming there are $n$ variables we can greedily pick each bit so that

\[\begin{aligned} G(r_1, \cdots, r_n) &\ge G(r_1, \cdots, r_{n-1}) \\\\ &\ge \cdots \\\\ &\ge G(r_1) \\\\ &\ge G(\emptyset) \\\\ &= \mathbb P(\text{alg. terminates w/ good solution}) \\\\ &> 0 \end{aligned}\]

Since $G(r _ 1, \cdots, r _ n)$ is either $0$ or $1$ (we are at the bottom of the tree), we can conclude that $G(r _ 1, \cdots, r _ n) = 1$, and we have successfully derandomised the algorithm.
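A concrete sketch of the derandomisation, using the same clause encoding as the earlier MaxSAT sketch. Rather than the proportion $G$ of good leaves, it tracks the conditional expectation $\mathbb E[X \mid r _ 1, \cdots, r _ i]$, which is the standard instantiation for $\mathbf{MaxSAT}$ since it is easy to compute clause by clause; the “always take the better branch” argument is identical:

```python
def expected_satisfied(clauses, partial):
    # E[number of satisfied clauses | the variables in `partial` are fixed,
    # all remaining variables are uniform random bits].
    total = 0.0
    for clause in clauses:
        unset = 0
        satisfied = False
        for lit in clause:
            v = abs(lit)
            if v in partial:
                satisfied = satisfied or (lit > 0) == partial[v]
            else:
                unset += 1
        # P(clause satisfied) is 1 if the fixed part already satisfies it,
        # otherwise 1 - 2^{-(number of free literals)}.
        total += 1.0 if satisfied else 1.0 - 0.5 ** unset
    return total

def derandomised_maxsat(clauses, n_vars):
    partial = {}
    for i in range(1, n_vars + 1):
        # Fix variable i to whichever value keeps the conditional
        # expectation at least as large; it starts at E[X] >= |C|/2.
        if_true = expected_satisfied(clauses, {**partial, i: True})
        if_false = expected_satisfied(clauses, {**partial, i: False})
        partial[i] = if_true >= if_false
    return partial, expected_satisfied(clauses, partial)

print(derandomised_maxsat([[1, 2], [-1, 3], [-2, -3]], 3))
```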

The $\mathbf{MaxSAT}$ optimisation problem is:

  • Input: CNF formula $\phi$
  • Output: The max number of satisfied clauses for some assignment $\sigma$.

One $\frac 1 2$-approximation algorithm is given by picking a random assignment of true/false for each variable. If $X$ is the number of satisfied clauses, this gives:

\[\begin{aligned} \mathbb E[X] &= \sum_{c \in C} \left(1 - \frac{1}{2^{|c|} }\right) \\\\ &\ge \frac 1 2 \text{OPT}(\Phi) \end{aligned}\]

Can you:

  • Describe an algorithm based on randomised LP rounding that achieves a better $1 - 1/e$ approximation ratio
  • Show how you can combine these methods to achieve a $3/4$-approximation algorithm.

Let:

  • $\Phi$ be a CNF formula containing variables $x _ 1, \cdots, x _ n$
  • $\text{OPT}(\Phi)$ be the optimal value of $\mathbf{MaxSAT}$ for $\Phi$, i.e. the maximum number of simultaneously satisfiable clauses
  • $C$ denote all clauses
  • For each clause $c$:
    • $c^+ = \{\text{the indices of variables appearing positively}\}$
    • $c^{-} = \{\text{the indices of variables appearing negatively}\}$

We can formulate $\mathbf{MaxSAT}$ as the following integer LP:

\[\begin{aligned} &\text{max} &&\sum_{c \in C} z_c \\\\ &\text{s.t.} && \sum_{i \in c^+} y_i + \sum_{i \in c^-} (1-y_i) \ge z_c &&& \forall c \in C \\\\ & && y_i, z_c \in \\{0, 1\\} &&& \forall i, \forall c \in C \end{aligned}\]

The relaxed LP is then:

\[\begin{aligned} &\text{max} &&\sum_{c \in C} z_c \\\\ &\text{s.t.} && \sum_{i \in c^+} y_i + \sum_{i \in c^-} (1-y_i) \ge z_c &&& \forall c \in C \\\\ & && 0 \le y_i, z_c \le 1 &&& \forall i, \forall c \in C \end{aligned}\]

Let $y _ i^\star, z _ c^\star$ be an optimal solution to this LP. Construct a solution to the original ILP by picking

\[y_i = \begin{cases} 1 &\text{with probability } y_i^\star \\\\ 0 &\text{with probability } (1-y_i^\star) \end{cases}\]

Then if $X$ is the number of satisfied clauses under this assignment, we have

\[\begin{aligned} \mathbb E[X] &= \sum_{c \in C} \mathbb P(c \text{ is satisfied}) \\\\ &= \sum_{c \in C} (1 - \mathbb P(c \text{ is not satisfied})) \\\\ &= \sum_{c \in C} \left(1- \left( \prod_{i \in c^+}(1-y_i^\star) \prod_{i \in c^-}y_i^\star \right)\right) \\\\ &\ge \sum_{c \in C}\left(1- \left( \frac{\sum_{i \in c^+} (1-y_i^\star) + \sum_{i \in c^-} y_i^\star}{|c|} \right)^{|c|}\right) && (1\star)\\\\ &\ge \sum_{c \in C}\left(1- \left( \frac{|c| - z_c^\star}{|c|} \right)^{|c|}\right) && (2\star) \\\\ &= \sum_{c \in C} \left(1- \left( 1-\frac{z_c^\star}{|c|} \right)^{|c|}\right) \\\\ &\ge \sum_{c \in C} \left(1- \left( 1-\frac{1}{|c|} \right)^{|c|}\right) z_c^\star && (3\star) \\\\ &\ge \sum_{c \in C} (1 - 1/e)z_c^\star \\\\ &\ge (1-1/e)\text{OPT}(\Phi) \end{aligned}\]

where:

  • $(1\star)$ follows from the AM-GM inequality
  • $(2\star)$ follows from the LP constraint for clause $c$
  • $(3\star)$ follows from the general fact that $1 - (1-x/k)^k \ge (1-(1-1/k)^k)x$ for $x \in [0, 1]$, by concavity of the left-hand side in $x$.

We now have two expressions for the expected number of satisfied clauses, one from the algorithm that picks randomly, and the one we have just formulated:

\[\begin{aligned} &\mathbb E[X] = \sum_{c \in C} \left(1 - \frac{1}{2^{|c|} }\right) \ge \frac 1 2 \text{OPT}(\Phi) \\\\ &\mathbb E[X] \ge \sum_{c \in C} \left(1- \left( 1-\frac{1}{|c|} \right)^{|c|}\right) z_c^\star \ge (1 - 1/e)\text{OPT}(\Phi) \end{aligned}\]

Now consider the algorithm that runs one of these two algorithms, each chosen with probability $\frac 1 2$ (equivalently, run both and keep the better assignment). Then

\[\begin{aligned} \mathbb E[X] &\ge \frac 1 2 \sum_{c \in C} \left(1 - \frac{1}{2^{|c|} }\right) + \frac 1 2 \sum_{c \in C} \left(1- \left( 1-\frac{1}{|c|} \right)^{|c|}\right) z_c^\star \\\\ &= \frac 1 2\sum_{c \in C} \left(\left(1 - \frac{1}{2^{|c|} }\right) + \left(1- \left( 1-\frac{1}{|c|} \right)^{|c|}\right) z_c^\star \right) \\\\ &\ge \frac 1 2\sum_{c \in C} \left(\left(1 - \frac{1}{2^{|c|} }\right) + 1- \left( 1-\frac{1}{|c|} \right)^{|c|} \right)z_c^\star \\\\ &\ge \frac 3 4 \sum_{c \in C} z^\star_c \\\\ &\ge \frac 3 4 \text{OPT}(\Phi) \end{aligned}\]

Here the third inequality uses $z _ c^\star \le 1$ to pull the first term inside the bracket, and the final one uses the fact that

\[(1 - 1/2^k) + 1 - (1-1/k)^k \ge \frac 3 2\]

for every integer $k \ge 1$ (with equality at $k = 1$ and $k = 2$).
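A sketch of the combined algorithm under the same clause encoding as before, with the LP solved by `scipy.optimize.linprog`. Returning the better of the two assignments is at least as good as outputting one of them with probability $\frac 1 2$ each, so the $\frac 3 4$ bound on the expectation still holds:

```python
import random
import numpy as np
from scipy.optimize import linprog

def maxsat_three_quarters(clauses, n_vars):
    n, m = n_vars, len(clauses)
    # LP variables: y_1..y_n then z_1..z_m; maximise sum z_c <=> minimise -sum z_c.
    c = np.concatenate([np.zeros(n), -np.ones(m)])
    # Clause constraint sum_{c+} y_i + sum_{c-} (1 - y_i) >= z_c, rewritten
    # as z_c - sum_{c+} y_i + sum_{c-} y_i <= |c^-|.
    A_ub = np.zeros((m, n + m))
    b_ub = np.zeros(m)
    for j, clause in enumerate(clauses):
        A_ub[j, n + j] = 1.0
        for lit in clause:
            A_ub[j, abs(lit) - 1] += -1.0 if lit > 0 else 1.0
            if lit < 0:
                b_ub[j] += 1.0
    y_star = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * (n + m)).x[:n]

    def count(sigma):  # number of clauses satisfied by assignment sigma
        return sum(any((l > 0) == sigma[abs(l)] for l in cl) for cl in clauses)

    # Candidate 1: round y_i to true with probability y*_i.
    lp_rounded = {i + 1: random.random() < y_star[i] for i in range(n)}
    # Candidate 2: a uniformly random assignment.
    uniform = {i: random.random() < 0.5 for i in range(1, n + 1)}
    return max([(count(lp_rounded), lp_rounded), (count(uniform), uniform)],
               key=lambda t: t[0])

print(maxsat_three_quarters([[1, 2], [-1, 3], [-2, -3]], 3))
```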
