Notes - ADS HT24, Matchings
- Shouldn’t need to but algorithm specifies that c_p(u, v) is negative if it is in the matching – only required without prices
- C_p is updated with respect to the original prices.
Flashcards
What is a bipartite graph $G = (V, E)$?
A graph where there exists a “bipartition” $(L, R)$ where edges are only from $L$ to $R$ or vice-versa.
What is a matching $M$ of a graph $G = (V, E)$?
A subset of the edges $E$ such that no two edges of $M$ share an endpoint.
What is the bipartite matching problem?
Given a bipartite graph $G$, find a max cardinality matching.
Suppose we have a bipartite graph $G = (L \cup R, E)$ and we want to find a bipartite matching. Describe the construction used to do this, which reduces it to a flow problem.
- Create $G’ = (V’, E’)$ where
- All edges $(u, v)$ go from left to right
- Add a source node $s$ and sink node $t$
- Add edges from $s$ to every vertex in $L$
- Add edges from every vertex in $R$ to $t$
- Set all edge capacitities to $1$
Suppose we have a bipartite graph $G = (L \cup R, E)$ and we want to find a bipartite matching. One way to do this is to
- Create $G’ = (V’, E’)$ where
- All edges $(u, v)$ go from left to right
- Add a source node $s$ and sink node $t$
- Add edges from $s$ to every vertex in $L$
- Add edges from every vertex in $R$ to $t$
- Set all edge capacitities to $1$
Quickly prove that
\[\text{max flow on }G' = \text{maximum matching on } G\]
- All edges $(u, v)$ go from left to right
- Add a source node $s$ and sink node $t$
- Add edges from $s$ to every vertex in $L$
- Add edges from every vertex in $R$ to $t$
- Set all edge capacitities to $1$
$\text{maximum matching on }G \le \text{max flow on }G’$
Let $M$ be a maximum matching. Then for each $(u, v) \in M$, with $u \in L$ and $v \in R$, send one unit of flow along $(s, u), (u, v), (v, t)$ in $G’$. This gives a flow with value $ \vert M \vert $.
$\text{max flow on }G’ \le \text{maximum matching on } G$
By the integrality theorem there exists an integer flow $f$ on $G’$, say with $ \vert f \vert = K$. Then letting $M = \{e \in E \mid f(e) = 1 \}$ gives a matching of size $k$ on $G$.
What does it mean for a matching $M$ to be perfect?
If every vertex $v$ is matched, i.e. it appears in exactly one edge of $M$.
Can you state Hall’s theorem, that gives you neccessary and sufficient conditions for a bipartite graph $G$ to have a perfect matching?
Suppose:
- $G = (L \cup R, E)$ is a bipartite graph
Then
- $G$ has a perfect matching
If and only if
- For every set $S \subseteq L$, $ \vert N(S) \vert \ge \vert S \vert $ (where $N(S)$ denotes the set of all nodes reachable in one edge from $S$, “neighbours”).
Quickly prove Hall’s theorem, i.e. that if
Suppose:
- $G = (L \cup R, E)$ is a bipartite graph
Then
- $G$ has a perfect matching
If and only if
- For every set $S \subseteq L$, $ \vert N(S) \vert \ge \vert S \vert $ (where $N(S)$ denotes the set of all nodes reachable in one edge from $S$, “neighbours”).
Perfect matching $\implies$ For every set $S \subseteq L, \vert N(S) \vert \ge \vert S \vert $
Each $v \in S$ must be matched to a different node in $N(S)$, so $ \vert N(S) \vert \ge \vert S \vert $.
For every set $S \subseteq L, \vert N(S) \vert \ge \vert S \vert $ $\implies$ Perfect matching
Consider the flow network $G’$ where
- All edges $(u, v)$ go from left to right
- Add a source node $s$ and sink node $t$
- Add edges from $s$ to every vertex in $L$
- Add edges from every vertex in $R$ to $t$
- Set all edge capacitities to $1$
(this is the standard construction for converting bipartite matching problems to max flow problems).
We want to show that the max flow in $G’ = \vert L \vert $, since this implies that the best matching has size $ \vert L \vert $, which implies it is perfect.
Consider an arbitrary $s$-$t$ cut $(A, B)$. Define $A _ L = A\cap L$ and $A _ R = A\cap R$. Then
\[\begin{aligned} \text{cap}(A, B) &= \sum_{e \text{ out of }A} c(e) \\\\ &\ge |L\setminus A_L| + |N(A_L) \setminus A_R| + |A_R| \quad (1\star)\\\\ &\ge |L| - |A_L| + |N(A_L)| - |A_R| + |A_R| \\\\ &\ge |L| - |A_L| + |A_L| - |A_R| + |A_R| \quad (2\star)\\\\ &= |L| \end{aligned}\]i.e. we have a cut in $G$ with capactiy $\ge \vert L \vert $.
- $(1\star)$: This is a fancy way of counting the edges that cross the cut:
- $ \vert L\setminus A _ L \vert $ corresponds to edges from $s$ to $L$, where these edges leave $A$
- $ \vert N(A _ L)\setminus A _ R \vert $ corresponds to edges from $L$ to $R$ that leave $A$
- $ \vert A _ R \vert $ correponds to edges from $R$ to $t$, where the edges leave $A$
- $(2\star)$: This comes from the assumptionthat $ \vert S \vert \le \vert N(S) \vert $ for any subset $S$.
For clearer illustration of $(1\star)$:
By the duality between min-cut and max-flow, and the fact the max-flow is bounded by $ \vert L \vert $ as a consequence of the construction, we have a max-flow with flow $ \vert L \vert $, so we are done.
Suppose that $G = (L \cup R, E)$ does not have a perfect matching. Then by Hall’s theorem, $\exists S \subseteq L$ s.t. $ \vert N(S) \vert < \vert S \vert $. Quickly prove that by modifying the graph $G’$ normally used to convert bipartite matching problems to max flow problems, we can find such a set.
Consider the flow network $G’$ where
- All edges $(u, v)$ go from left to right
- Add a source node $s$ and sink node $t$
- Add edges from $s$ to every vertex in $L$
- Add edges from every vertex in $R$ to $t$
- Set all edge capacitities to $1$
- …but increase the capacity of all $L$-$R$ edges to $\infty$
Since $G$ does not have a perfect matching, the value of the max flow $< n$ and so we can find an $s$-$t$ cut $(A, B)$ with $\text{cap}(A, B) < n$.
Increasing the capacity of all $L$-$R$ edges does not alter the value of the max flow, and by doing this we limit the types of $s$-$t$ cuts we can find corresponding to flows (in particular, $N(A _ L) \subseteq A _ R$, otherwise we would have an $\infty$-edge crossing the cut).
Let $A _ L = A \cap L$ and $A _ R = A \cap R$. ($A _ L$ turns out to be the relevant set later).
Then $N(A _ L) \subseteq A _ R$, otherwise $\text{cap}(A, B)$ contains an $\infty$-edge, so not $< n$.
But since $\text{cap}(A, B) = \vert L \setminus A _ L \vert + \vert A _ R \vert $ (this counts edges crossing the cut by leaving it from $s$, and also edges crossing the cut by leaving from $A _ R$). Therefore
\[\begin{aligned} |N(A_L)| &\le |A_R| \\\\ &= \text{cap}(A, B) - |L \setminus A_L| \\\\ &< |L| - |L \setminus A_L| \\\\ &= |L| - |L| + |A_L| \\\\ &= |A_L| \end{aligned}\]Why in a min-cost bipartitie mathcing problem can you assume that the graph is complete?
Otherwise you can add dummy edges with $\infty$ cost.
Suppose we are trying to solve the min-cost bipartite perfect matching problem.
- What is an $M$-alternating path $P$?
- Why is this useful?
- How can you use this to construct a greedy algorithm which actually solves the problem?
An $M$-alternating path $P$ is an alternating sequence of matched and unmatched vertices that starts and ends with unmatched vertices.
These are useful since if $M$ is a matching and $P$ is a $M$-alternating path, then $M’ = M \oplus P = (M \setminus P) \cup (P \setminus M)$ gives a matching of size $ \vert M \vert + 1$ and $\text{cost}(M’) = \text{cost}(M) + \text{cost}(P)$.
This gives a greedy algorithm where we repeatedly find $M$-alternating paths with minimum cost, and update the matching until we have a perfect matching.
Quickly prove that if $M$ is a matching, then
\[M\text{ is min cost perfect matching} \iff G_M \text{ does not contain a negative cycle}\]
where $G _ M$ is the residual graph for the flow formulation of the problem, annotated with costs / profits where if
- $(u, v) \notin M$, then cost $(u, v)$ in $G _ M$ with cost $c(u, v) = \text{cost}(u, v)$.
- $(u, v) \in M$, then cost $(v, u)$ in $G _ M$ with cost $c(v, u) = -\text{cost}(u, v)$ (i.e. unmatching pays us back, and note we have flipped)
- Edges from $s$ or $t$ to $G$ are flipped if the corresponding path is in the matching, and have cost $0$
and a negative cycle is a cycle $C$ such that the total cost along it is negative, i.e.
\[\sum_{e \in C} \text{cost}(u, v) < 0 \iff
\sum_{e \in C\setminus M} c(e) < \sum_{e \in C\cap M} c(e)\]
$M$ is min cost perfect matching $\implies$ $G _ M$ does not contain a negative cycle
Take the contrapositive, i.e. we aim to show that if $G _ M$ has a negative cycle, then $M$ is not min cost. Let $C$ be the edges in such a $M$-alternating cycle. Then by constructing the new matching $M’$ via $M \oplus C$, we see that $\text{cost}(M’) < \text{cost}(M)$, so $M$ is not min cost.
$G _ M$ does not contain a negative cycle $\implies$ $M$ is min cost perfect matching
Let $M’$ be an arbitrary perfect matching and look at edges $M \oplus M’$ (i.e. all edges in either $M$ or $M’$, but not in $M \cap M’$).
Then note that $M \oplus M’$ is a disjoint union of cycles $C _ 1, \cdots, C _ k$, since both are perfect matchings, each vertex either has degree $0$ or degree $2$.
Furthermore, this implies
\[\text{cost}(M') - \text{cost}(M) = \sum_{i = 1}^k \left( \text{cost}(C_i \cap M') - \text{cost}(C_i \cap M) \right)\](Why do we avoid double counting here? Because $C _ i$ are in $M \oplus M’$, which means we don’t have any edges in the intersection). Since we have no negative cycles, it must be the case that
\[\text{cost}(C_i \cap M') \ge \text{cost}(C_i \cap M)\]otherwise we would have a negative cycle in $G _ M$ (if you draw a picture of two different matchings, it becomes clear since we can augment along this path). Hence we see overall
\[\text{cost}(M') \ge \text{cost}(M)\]and since $M’$ is arbitrary, $M$ must be optimal.
Give the pseudocode for the algorithm that solves the minimum cost perfect matching in bipartite graph $G = (L \cup R, E)$.
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
# augmenting along P means taking symmetric difference
return M
The pseudocode for the algorithm solving the min-cost matching in a bipartite graph is given by:
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
return M
The invariant used to show this algorithm is correct is that
at every loop iteration, prices $p$ are compatible with $M$
Where we say prices are compatible with $M$ if
- $c _ p(u, v) \ge 0$ for all $(u, v) \notin M$
- $c _ p(u, v) = 0$ for all $(u, v) \in M$
Can you prove that this condition implies that $G _ M$ has no negative cycle with respect to both $c$ and $c _ p$?
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
return M
at every loop iteration, prices $p$ are compatible with $M$
Suppose we have a cycle in $G _ M$
\[C = (l_1, r_1, l_2, r_2, \cdots, l_k, r_k, l_1)\]where $l _ i \in L$ and $r _ i \in R$. Then cost with respect to $c _ p$ is
\[\begin{aligned} & && \text{cost w.r.t. } c_p \\\\ & =&& c_p(l_1, r_1) + c_p (r_1, l_2) + \cdots + c_p (l_k, r_k) + c_p(r_k, l_1) \\\\ & =&& c_p(l_1, r_1) - c_p (l_2, r_1) + \cdots + c_p (l_k, r_k) - c_p(l_1, r_k) \\\\ & =&&c(l_1, r_1) - c(l_2, r_1) + \cdots + c(l_k, r_k) - c(l_1, r_k) \\\\ & =&& c(l_1, r_1) + c (r_1, l_2) + \cdots + c (l_k, r_k) + c(r_k, l_1) \\\\ & =&& \text{cost w.r.t. c} \end{aligned}\]so it also equals the cost with respect to $c$ (this is because the prices cancel out in the reduced prices).
By the assumption that the prices are compatible, the cost of forward edges is positive (since if it’s in the residual network, then it’s not in the original matching) and cost of backward edges is $0$ (similarly, these are the edges actually in the matching), then the cost of any cycle is non-negative.
The pseudocode for the algorithm solving the min-cost matching in a bipartite graph is given by:
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
return M
The invariant used to show this algorithm is correct is that
at every loop iteration, prices $p$ are compatible with $M$
Where we say prices are compatible with $M$ if
- $c _ p(u, v) \ge 0$ for all $(u, v) \notin M$
- $c _ p(u, v) = 0$ for all $(u, v) \in M$
Assuming this invariant holds, what is the running time of this algorithm?
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
return M
at every loop iteration, prices $p$ are compatible with $M$
- Each time, the size of the matching increases by $1$, so there are $O(n)$ iterations
- Distances can be found in time $O(m \log n)$ with Dijkstra’s (here we use the assumption that $c _ p(u, v) \ge 0$ in $G _ M$).
Hence running time is $O(mn \log n)$.
Suppose:
- $G = (L \cup R, E)$ is a bipartite graph
- $\mathcal M$ is a path-cycle matching, i.e. a set of paths and cycles that are mutually disjoint
- $t$ is the number of edges that appear in paths or cycles in $\mathcal M$
Quickly describe an algorithm that will find a path-cycle matching $\mathcal M$ of $G$ that uses a maximum number of edges.
Overall idea: a mutually disjoint set of paths and cycles should hint that we need to find a subgraph somehow where all vertices have degree $0$, $1$ or $2$. We can modify the stanard construction of a flow network from a bipartite graph to do this; rather than having capacity $1$ on all $s-L$ and $R-t$ edges, if we have capacity $2$, it means we can also identify cycles between $L$ and $R$. You can argue that after modifying the construction this way and finding a maximum flow, restricting our attention to the edges in $G$ whose flow is $1$ gives us the subgraph that we want. Then you can use the standard argument about how the max flow corresponds to the max matching since any path-cycle matching can give a flow on the original graph.
Let $t^\star$ be the max number of edges that are used by an optimal path/cycle matching.
Construct a flow network by adding a source vertex $s$ and a sink vertex $t$, direct each edge from left to right and set its capacity to $1$.
Add edges from $s$ to each vertex in $L$ with capacity 2 and add edges from each vertex in $T$ to $t$ with capacity $2$.
Let $f^\star$ be a maximum flow on this network, and let $M^\star$ be the induced subgraph in $G$ obtained by edges in $E$ whose flow in $f^\star$ is equal to $1$.
Then $M^\star$ has exactly $ \vert f^\star \vert $ edges (why? the flow must get from $s$ to $t$ somehow, so there must be a set of edges between $L$ and $R$ with flow $1$ such that they sum to $ \vert f^\star$ | ) and every vertex of $G$ has degree $0, 1$ or $2$ in $M^\star$. |
So $M^\star$ is a disjoint union of paths and cycles. Hence $t^\star \ge \vert f^\star$ | . |
In fact, it must be the case that $t^\star = \vert f^\star \vert $, since a path cycle matching $M$ that uses $t$ edges of $G$ yields a flow with value $t$.
Why? Let $E(M)$ be the edges that $M$ uses, then assign flow $1$ to each edge in $E(M)$, and flow $1$ to edges $(s, u)$ and $(u, t)$ if $u$ is the start/end vertex of a path in $M$, set the flow to $2$ otherwise.
Since $f^\star$ is a maximum flow, we can conclude therefore that $t^\star = \vert f^\star \vert $ and that $M^\star$ is a path/cycle matching of $G$ that achieves the optimum.
The pseudocode for the algorithm solving the min-cost matching in a bipartite graph is given by:
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
return M
The invariant used to show this algorithm is correct is that
at every loop iteration, prices $p$ are compatible with $M$
Where we say prices are compatible with $M$ if
- $c _ p(u, v) \ge 0$ for all $(u, v) \notin M$
- $c _ p(u, v) = 0$ for all $(u, v) \in M$
Quickly prove that this invariant holds.
MinCostPerfectMatching(G, c):
M = empty
initialise price[u] = 0 for all u in L \cup R
while M is not a perfect matching:
compute the reduced costs c_p
# these are modifications to the costs given by
# c_p(u, v) = price[u] + c(u, v) - price[v]
# note that it is c rather than c_p
P = min cost s-t path in G_M wrt costs c_p
# G_M is the residual graph, where if
# (x, y) \notin M, then cost (x, y) in G_M is c_p(x, y)
# (x, y) \in M, then cost (y, x) in G_M is -c_p(x, y)
# edges out of s and edges into t cost 0 and are removed
# if nodes they are connected to are in the matching
d = distances in G_M wrt costs c_p from s to all nodes in L \cup R
price[u] = price[u] + d[u] for all u in L \cup R
M = updated matching after augmenting along P
return M
at every loop iteration, prices $p$ are compatible with $M$
Suppose $(M, p)$ is compatible, and consider $(M’, p’)$ after the loop execution.
We want to show that $c _ {p’}(u, v) \ge 0$ for all $(u, v) \notin M$ and $c _ {p’}(u, v) = 0$ if $(u, v) \in M’$.
Note that
\[c_{p'}(u, v) = (\text{price}[u] + d[u]) + c_p(u, v) - (\text{price}[v] + d[v])\]We break into three cases:
$(u, v) \notin M’ \cup M$: Then $(u, v) \in G _ M$ with cost $c _ p(u, v)$. Hence $d[v] \le d[u] + c _ p(u, v)$. So the reduced cost of this edge decreases. (Why is the inequality true?)
$(u, v) \in M$. Then in $G _ M$, $(v, u)$ is the only edge into $u$ and has cost $-c^p(u, v)$. So the shortest path from $s$ to $u$ goes $s \to v \to u$ and hence $d[u] = d[v] - c _ p(u, v)$. So the reduced cost of this edge remains the same.
$(u, v) \in M’ \setminus M$: Then $(u, v)$ is in $G _ M$ with cost $c _ p(u, v)$ and it is on the shortest path from $s$ to $t$. So shortest path from $s$ to $v$ goes $s \to u \to v$, so $d[v] = d[u] + c _ p(u, v)$. Then the reduced cost remains the same.