Notes - ADS HT24, Disjoint sets
Flashcards
Quickly describe how we can create a disjoint set data structure with linked lists, and describe the time complexity of each operation.
We need to implement $\mathbf{MakeSet}(x)$, $\mathbf{FindSet}(x)$, and $\mathbf{Union}(x, y)$. This can be done straightforwardly with a singly linked list:
- $\mathbf{MakeSet}(x)$, create a new linked list, time $O(1)$
- $\mathbf{FindSet}(x)$, follow head pointer, time $O(1)$
- $\mathbf{Union}(x, y)$, if $x$ is a member of $S$ and $y$ is a member of $T$, either append $T$ to $S$ or $S$ to $T$. This takes time $O( \vert T \vert )$ or $O( \vert S \vert )$ respectively, since we need to update the head pointer of every element in the list being appended.
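As a minimal sketch of this representation in Python (the names `Node`, `make_set`, `find_set`, `union`, `head`, `tail` are my own, not from the notes): each node stores a `head` pointer to its list's representative, the head additionally stores a `tail` pointer, and $\mathbf{Union}$ rewrites one head pointer per appended element, which is where the $O(\vert T \vert)$ cost comes from.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.head = self        # pointer to the representative (head of the list)

def make_set(value):
    """Create a singleton list in O(1)."""
    node = Node(value)
    node.tail = node            # the head of a list also remembers its tail
    return node

def find_set(node):
    """Follow the head pointer in O(1)."""
    return node.head

def union(x, y):
    """Append y's list onto x's list; costs O(|T|) where T is y's set."""
    s, t = find_set(x), find_set(y)
    if s is t:
        return s
    s.tail.next = t             # splice t's list after s's tail
    s.tail = t.tail
    cur = t                     # rewrite the head pointer of every appended element
    while cur is not None:
        cur.head = s
        cur = cur.next
    return s
```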
Suppose we are implementing a disjoint set data structure with linked lists like so:
- $\mathbf{MakeSet}(x)$, create a new linked list, time $O(1)$
- $\mathbf{FindSet}(x)$, follow head pointer, time $O(1)$
- $\mathbf{Union}(x, y)$, if $x$ is a member of $S$ and $y$ is a member of $T$, either append $T$ to $S$ or $S$ to $T$. This takes time $O( \vert T \vert )$ or $O( \vert S \vert )$ respectively, since we need to update the head pointer of every element in the list being appended.
Furthermore, we use the weighted-union heuristic, where we always append the shorter list to the longer list. Can you state a bound on the time complexity of $m$ $\mathbf{MakeSet}$, $\mathbf{FindSet}$ or $\mathbf{Union}$ operations on $n$ items?
Any such sequence of $m$ operations on $n$ items takes time
\[O(m + n \log n)\]
Suppose we are implementing a disjoint set data structure with linked lists like so:
- $\mathbf{MakeSet}(x)$, create a new linked list, time $O(1)$
- $\mathbf{FindSet}(x)$, follow head pointer, time $O(1)$
- $\mathbf{Union}(x, y)$, if $x$ is a member of $S$ and $y$ is a member of $T$, either append $T$ to $S$ or $S$ to $T$. This takes time $O( \vert T \vert )$ or $O( \vert S \vert )$ respectively, since we need to update the head pointer of every element in the list being appended.
Furthermore, we use the weighted-union heuristic, where we always append the shorter list to the longer list. Quickly prove that we have a bound on the time complexity of $m$ $\mathbf{MakeSet}$, $\mathbf{FindSet}$ or $\mathbf{Union}$ operations on $n$ items given by
\[O(m + n \log n)\]
We use the aggregate method.
- $\mathbf{MakeSet}$ and $\mathbf{FindSet}$ take $O(1)$ time, so cost at most $O(m)$
- The total cost for $\mathbf{Union}$ is $O(\text{total number of head pointer updates})$.
Note that the head pointer of each element $x$ is updated at most $\log n$ times:
- When $x$’s head pointer is updated, $x$ is in the smaller list that is attached to the larger list
- So $x$’s list grows by a factor $\ge 2$
- Since a list never contains more than $n$ elements, this can happen at most $\log n$ times per element.
Hence the total number of head pointer updates over all $\mathbf{Union}$ operations is at most $n \log n$, giving a total cost of $O(m + n \log n)$.
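To see the doubling argument concretely, here is a small simulation (my own sketch, not from the notes; `charge_counts` is a made-up helper) that tracks only set sizes and charges one head-pointer update to each element of the smaller set at every union; the maximum charge to any element never exceeds $\log_2 n$.

```python
import math
import random

def charge_counts(n):
    """Simulate weighted unions on n singletons, charging one head-pointer
    update to every element of the smaller set at each union."""
    sets = [[i] for i in range(n)]          # each set is a list of its elements
    updates = [0] * n                       # head-pointer updates charged to each element
    for _ in range(n - 1):
        if len(sets) < 2:
            break
        a, b = random.sample(range(len(sets)), 2)
        small, large = (a, b) if len(sets[a]) <= len(sets[b]) else (b, a)
        for x in sets[small]:               # the smaller list is appended to the larger
            updates[x] += 1
        sets[large].extend(sets[small])
        sets.pop(small)
    return updates

n = 1024
updates = charge_counts(n)
assert max(updates) <= math.log2(n)         # each element's set at least doubles per charge
print(max(updates), math.log2(n))
```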
Give the pseudocode for a disjoint-set forest using the union-by-rank heuristic and path-compression. Remember, disjoint sets have 3 operations:
- $\mathbf{MakeSet}(x)$ (simple)
- $\mathbf{FindSet}(x)$
- $\mathbf{Union}(x, y)$ (uses a subroutine)
MakeSet(x):
    x.parent = x
    x.rank = 0

FindSet(x):
    if x != x.parent:
        x.parent = FindSet(x.parent)
    return x.parent

Union(x, y):
    Link(FindSet(x), FindSet(y))

Link(x, y):
    if x.rank > y.rank:
        y.parent = x
    else:
        if x.rank == y.rank:
            y.rank += 1
        x.parent = y
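For reference, a direct Python transcription of this pseudocode (a sketch; the `Node` wrapper class and snake_case names are my own) might look as follows.

```python
class Node:
    """A disjoint-set forest node."""
    def __init__(self):
        self.parent = self
        self.rank = 0

def make_set():
    return Node()

def find_set(x):
    # path compression: point x and all of its ancestors directly at the root
    if x is not x.parent:
        x.parent = find_set(x.parent)
    return x.parent

def link(x, y):
    # union by rank: attach the root of smaller rank below the other root
    if x.rank > y.rank:
        y.parent = x
    else:
        if x.rank == y.rank:
            y.rank += 1
        x.parent = y

def union(x, y):
    link(find_set(x), find_set(y))
```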
What is the link-by-height heuristic for a disjoint-set forest?
When performing Union($x$, $y$), link the root of the shorter tree to the taller tree.
Suppose we are using the link-by-height heuristic on a disjoint-set forest. What bound does this then give on a sequence of $m$ $\mathbf{MakeSet}$, $\mathbf{FindSet}$ or $\mathbf{Union}$ operations on $n$ items?
Any such sequence takes time
\[O(m \log n)\]
Suppose we are using the link-by-height heuristic on a disjoint-set forest. Quickly prove that on a sequence of $m$ $\mathbf{MakeSet}$, $\mathbf{FindSet}$ or $\mathbf{Union}$ operations on $n$ items, we have time complexity
\[O(m \log n)\]
We show that:
\[\text{number of nodes in a tree whose root has height } k \ge 2^k\]
We do this by induction on the number of $\mathbf{Union}$ operations.
If $k = 0$, this is trivial. Otherwise, consider a root node $x$ with height $h[x] = k > 0$, and suppose $h[x]$ became equal to $k$ after the $i$-th $\mathbf{Union}$.
- Just before the $i$-th $\mathbf{Union}$, $h[x] = k - 1$ (since this is the $\mathbf{Union}$ that increased $h[x]$ to $k$).
- Suppose the $i$-th $\mathbf{Union}$ links a tree with root $y$ to $x$. Then $h[y] = k - 1$: if $h[y]$ were smaller, linking the shorter tree under $x$ would leave $h[x]$ at $k - 1$.
- By the inductive hypothesis, both $x, y$ trees contained $\ge 2^{k-1}$ nodes. Hence after the union, $x$’s tree contains $\ge 2^{k-1} + 2^{k-1} = 2^k$ nodes.
Thus every tree in the disjoint-set forest has height at most $\log n$, so each $\mathbf{FindSet}(x)$ and $\mathbf{Union}(x, y)$ operation takes $O(\log n)$ time in the worst case, and $m$ operations take $O(m \log n)$.
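The invariant in this proof is easy to check empirically. Here is a small Python sketch (my own, not from the notes) of link-by-height without path compression, asserting that after a batch of random unions every root's tree has at least $2^{\text{height}}$ nodes.

```python
import random

class Node:
    def __init__(self):
        self.parent = self
        self.height = 0     # exact height of the tree rooted here (valid at roots)
        self.size = 1       # number of nodes in the tree rooted here (valid at roots)

def find_set(x):
    while x is not x.parent:     # no path compression, so stored heights stay exact
        x = x.parent
    return x

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx is ry:
        return
    if rx.height < ry.height:    # make rx the taller (or equally tall) root
        rx, ry = ry, rx
    ry.parent = rx               # link the shorter tree under the taller one
    rx.size += ry.size
    if rx.height == ry.height:   # equal heights: the merged tree grows by one level
        rx.height += 1

nodes = [Node() for _ in range(1000)]
for _ in range(2000):
    union(random.choice(nodes), random.choice(nodes))
for v in nodes:
    r = find_set(v)
    assert r.size >= 2 ** r.height   # the invariant proved above
```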
Suppose we are using the link-by-height heuristic on a disjoint-set forest. Then on a sequence of $m$ $\mathbf{MakeSet}$, $\mathbf{FindSet}$ or $\mathbf{Union}$ operations on $n$ items, we have time complexity
\[O(m \log n)\]
How does this compare if we instead use path compression together with link-by-rank (where the rank is updated exactly as the height would be under link-by-height, and serves as an upper bound on the actual height)?
Then it takes time
\[O(m \log^\ast n)\]where $\log^\ast n$ is the iterated logarithm, defined by
\[\log^\ast (n) = \begin{cases} 0 &\text{if } n \le 1 \\\\ 1 + \log^\ast (\log n) &\text{otherwise} \end{cases}\]
Suppose we are using the link-by-rank heuristic on a disjoint-set forest with path compression. We have the result that a sequence of $m$ $\mathbf{MakeSet}$, $\mathbf{FindSet}$ or $\mathbf{Union}$ operations on $n$ items has cost
\[O(m \log^\ast n)\]
This is quite difficult to prove. Instead, quickly prove that we still have the same bound
\[O(m \log n)\]
as with the link-by-height heuristic, by justifying that the rank of a tree's root is an upper bound on the tree's height.
We induct on the number of disjoint-set operations. For $\mathbf{MakeSet}$ and $\mathbf{FindSet}$:
- $\mathbf{MakeSet}$: immediate, since the rank and the height of a newly created tree are both $0$.
- $\mathbf{FindSet}$: path compression can only decrease the height of the tree, and the rank stays the same, so the invariant is preserved.
$\mathbf{Union}$ is a little more involved. Consider $\mathbf{Union}(x', y')$, where the roots of the trees containing $x'$ and $y'$ are $x$ and $y$ respectively, and write $T _ v$ for the tree rooted at a node $v$. If $x = y$, then $T _ x = T _ y$ and the $\mathbf{Union}$ operation can only decrease the height of the tree, via the path compression in its $\mathbf{FindSet}$ calls. So assume $x \ne y$.
Let $r(v)$ denote the rank of a node and let $h(T)$ denote the height of a tree. Then:
- If $r(x) > r(y)$, after the $\mathbf{Union}$ operation $y$ points to $x$, so the merged tree is rooted at $x$, whose rank is still $r(x)$. Say the height of the merged tree is $h$. Then $h \le \max\{h(T _ x), h(T _ y) + 1\}$. By induction we have $r(x) \ge h(T _ x)$ and $r(y) \ge h(T _ y)$; since ranks are integers and $r(x) > r(y)$, also $r(x) \ge r(y) + 1 \ge h(T _ y) + 1$. It then follows that $r(x) \ge h$.
- If $r(x) < r(y)$, the case is symmetric.
- If $r(x) = r(y)$, assume wlog that $y$ points to $x$, so the merged tree is rooted at $x$, whose rank now equals $r(x) + 1$. As before, the height of the merged tree is $\le \max\{h(T _ x), h(T _ y)+1\}$. Now $r(x) + 1 > r(x) \ge h(T _ x)$, and $r(x) = r(y) \ge h(T _ y)$ so $r(x) + 1 \ge h(T _ y) + 1$; hence the new rank is at least the new height and we are done.
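A small empirical check of this invariant (my own sketch, built on the pseudocode above): after arbitrary interleaved unions and finds with link-by-rank and path compression, the rank of every root is at least the height of its tree.

```python
import random
from collections import defaultdict

class Node:
    def __init__(self):
        self.parent = self
        self.rank = 0

def find_set(x):
    if x is not x.parent:
        x.parent = find_set(x.parent)    # path compression
    return x.parent

def union(x, y):
    x, y = find_set(x), find_set(y)
    if x is y:
        return
    if x.rank > y.rank:                  # link by rank
        y.parent = x
    else:
        if x.rank == y.rank:
            y.rank += 1
        x.parent = y

def root_and_depth(v):
    """Walk to the root without compressing, returning (root, depth of v)."""
    d = 0
    while v is not v.parent:
        v, d = v.parent, d + 1
    return v, d

nodes = [Node() for _ in range(500)]
for _ in range(1500):
    union(random.choice(nodes), random.choice(nodes))
    find_set(random.choice(nodes))       # extra finds to exercise path compression

heights = defaultdict(int)               # root -> height of its tree
for v in nodes:
    r, d = root_and_depth(v)
    heights[r] = max(heights[r], d)
for r, h in heights.items():
    assert r.rank >= h                   # rank upper-bounds height
```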
Can you state a theorem that bounds the total run-time for $m$ operations of $\mathbf{Union}$ or $\mathbf{FindSet}$ in a disjoint-set forest with path compression?
The total run-time is $O(m \log^\ast n)$, where $\log^\ast n$ is the iterated logarithm, defined by
\[\log^\ast (n) = \begin{cases} 0 &\text{if } n \le 1 \\\\ 1 + \log^\ast (\log n) &\text{otherwise} \end{cases}\]
Can you define the iterated logarithm $\log^\ast n$, giving its value for every $n \le 2^{65536}$?
The iterated logarithm is defined by
\[\log^\ast (n) = \begin{cases} 0 &\text{if } n \le 1 \\\\ 1 + \log^\ast (\log n) &\text{otherwise} \end{cases}\]
Its values are:
$n$ | $\log^\ast n$ |
---|---|
$1$ | $0$ |
$2$ | $1$ |
$[3, 4]$ | $2$ |
$[5, \cdots, 16]$ | $3$ |
$[17, \cdots, 65536]$ | $4$ |
$[65537, \cdots, 2^{65536}]$ | $5$ |
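The following sketch (mine, not from the notes) computes $\log^\ast$ directly from the recursive definition, assuming base-2 logarithms, which is the convention the table's breakpoints correspond to, and reproduces the boundary values above.

```python
import math

def log_star(x):
    """Iterated base-2 logarithm: the number of times log2 must be applied
    before the value drops to at most 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

# Boundary values from the table above.
assert [log_star(n) for n in (1, 2, 4, 5, 16, 17, 65536, 65537)] == [0, 1, 2, 3, 3, 4, 4, 5]
# 2**65536 is too large for a float, so peel off one level of the recursion by hand:
# log*(2**65536) = 1 + log*(log2(2**65536)) = 1 + log*(65536) = 5.
assert 1 + log_star(65536) == 5
```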
Call a binary tree $T$ well-balanced if, for every non-leaf node, the height of its left subtree and the height of its right subtree differ by at most two.
Quickly show that the height of a well-balanced tree with $n$ nodes is $O(\log n)$.
As is standard with these types of questions, we show instead that a well-balanced tree with height $h$ has at least $2^{\Omega(h)}$ nodes, or equivalently that
\[\text{number of nodes in well-balanced tree of height }h \ge c^h\]for some $c > 1$. The right constant is awkward to pin down in advance, so instead we collect constraints on $c$ from each case of the argument and show at the end that some $c > 1$ satisfies them all.
To do this, we use induction on the height of the tree $h$.
We have three base cases, $h = 0, 1, 2$, since the inductive step for height $h$ appeals to the hypothesis at heights $h - 1$ and $h - 3$. The case $h = 0$ requires $1 \ge c^0$, which says nothing. The cases $h = 1$ and $h = 2$ require $2 \ge c$ and $2 \ge c^2$ respectively, since any tree of height $\ge 1$ has at least two nodes.
Now suppose we have a well-balanced tree with height $h \ge 3$. Let $x$ be the root of $T$. Both subtrees of $x$ have height at most $h-1$, and at least one of them has height $h-1$. By the well-balanced property, the other one must have height at least $h - 3$, so it follows from the inductive hypothesis that the total number of nodes in $T$ is at least $1 + c^{h-1} + c^{h-3}$. Thus it suffices to show that
\[1 + c^{h-1} + c^{h-3} \ge c^h\]for all $h \ge 3$. To do this, we consider the function
\[f(h) := 1 + c^{h - 1} + c^{h-3} - c^h\]and want to show that $f(h) \ge 0$. Now for $h = 3$, we have
\[f(3) = 2 + c^2 - c^3\]so we add $2 + c^2 - c^3 \ge 0$ as a constraint. Then we show $f$ is non-decreasing for $h \ge 3$: the derivative is
\[\begin{aligned} f'(h) &= (c^{h-1} + c^{h-3} - c^h) \ln c \\\\ &= c^h \left(\frac 1 c + \frac 1 {c^3} - 1\right)\ln c \end{aligned}\]Since $c > 1$, both $\ln c$ and $c^h$ are positive, so $f'(h) \ge 0$ provided
\[\frac 1 c + \frac 1 {c^3} - 1 \ge 0\]So overall we are asking the question: does there exist $c$ such that
- $c > 1$
- $c \le 2$
- $c^2 \le 2$
- $2 + c^2 \ge c^3$
- $1/c + 1/c^3 \ge 1$
And the answer is… yes: e.g. take $c = 1.01$. Hence a well-balanced tree of height $h$ has at least $1.01^h$ nodes, so a well-balanced tree with $n$ nodes has height at most $\log_{1.01} n = O(\log n)$.
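As a sanity check, the sketch below (mine, not from the notes) assumes the extremal recursion suggested by the inductive step, namely that the minimum node count of a well-balanced tree of height $h$ satisfies $N(h) = 1 + N(h-1) + N(h-3)$ with $N(0)=1$, $N(1)=2$, $N(2)=3$, and confirms $N(h) \ge 1.01^h$ over a range of heights.

```python
def min_nodes(h_max):
    """Minimum node counts N(h) of a well-balanced tree of each height h,
    following the extremal recursion from the inductive step:
    N(h) = 1 + N(h-1) + N(h-3), with N(0) = 1, N(1) = 2, N(2) = 3."""
    n = [1, 2, 3]
    for h in range(3, h_max + 1):
        n.append(1 + n[h - 1] + n[h - 3])
    return n

c = 1.01
for h, count in enumerate(min_nodes(500)):
    assert count >= c ** h        # so height <= log_{1.01}(n) = O(log n)
```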