Notes - DAA HT23, Depth-first search


In the DFS algorithm, the notion of a “colour” is used. What are the three colours of a node, and what do they each represent?


  • White, a node has not been explored yet.
  • Gray, a node has been visited but its adjacency list has not been fully explored.
  • Black, every node in its adjacency list has been fully explored.

The output of DFS can be seen as three arrays, $d, f, \pi$, all indexed by vertices. What does each represent?


  • $d[v]$, the discovery time of a vertex
  • $f[v]$, the finishing time of a vertex
  • $\pi[v]$, the predecessor of a vertex

What is the time complexity of DFS, in terms of $(V, E)$?


\[\Theta(|V| + |E|)\]

The output of DFS includes an array indexed by vertices $\pi[v]$ of “backpointers”, i.e. the predecessor of a given node in the exploration process. This can be used to define a set of edges $E _ \pi$ to make a graph $G _ \pi = (V, E _ \pi)$ called the “depth-first search forest”. How is $E _ \pi$ defined in this case?


\[E_\pi = \\{ (\pi[v], v) : v\in V, \pi[v] \ne \text{nil} \\}\]

The output of DFS includes an array indexed by vertices $\pi[v]$ of “backpointers”, i.e. the predecessor of a given node in the exploration process. This can be used to define a set of edges $E _ \pi$ to make a graph $G _ \pi = (V, E _ \pi)$ called the “depth-first search forest”. What is true about the colour of each node $u$ and $v$ when $(u, v)$ is in $E _ \pi$?


When $(u, v)$ is explored, $u$ is grey and $v$ is white.

Do the outputs of DFS like the finishing times $f$ and the backpointers $\pi$ depend on the order in which the nodes are visited?


Yes.

Can you state the parenthesis theorem about the discovery and finishing times for vertices $u$ and $v$ in a graph explored by depth first search?


Exactly one of the following holds:

  • $d[v] < d[u] < f[u] < f[v]$ and $u$ is a descendant of $v$.
  • $d[u] < d[v] < f[v] < f[u]$ and $v$ is a descendant of $u$.
  • $d[u] < f[u] < d[v] < f[v]$ or $d[v] < f[v] < d[u] < f[u]$ and neither are a descendant of the other.

The parenthesis theorem states that for all $u, v$ in a graph explored by DFS, exactly one of the following holds:

  • $d[v] < d[u] < f[u] < f[v]$ and $u$ is a descendant of $v$.
  • $d[u] < d[v] < f[v] < f[u]$ and $v$ is a descendant of $u$.
  • $d[u] < f[u] < d[v] < f[v]$ or $d[v] < f[v] < d[u] < f[u]$ and neither are a descendant of the other.

Why is this called the parenthesis theorem?


Let

  • $( _ u$ correspond to $d[u]$,
  • ${} _ u)$ correspond to $f[u]$,
  • $[ _ v$ correspond to $d[v]$,
  • ${} _ v]$ correspond to $f[v]$

Then the following are valid

\[\begin{aligned} (_u \, {}_u) [_v \, {}_v] \\\\ (_u \, [_v \, {}_v] \, {}_u) \\\\ \end{aligned}\]

but

\[(_u \, [_v \, {}_u) \, {}_v]\]

is not.

The parenthesis theorem states that for all $u, v$ in a graph explored by DFS, exactly one of the following holds:

  • $d[v] < d[u] < f[u] < f[v]$ and $u$ is a descendant of $v$.
  • $d[u] < d[v] < f[v] < f[u]$ and $v$ is a descendant of $u$.
  • $d[u] < f[u] < d[v] < f[v]$ or $d[v] < f[v] < d[u] < f[u]$ and neither are a descendant of the other.

What does this let you say in general about when $v$ is a descendant of $u$?


$v$ is a descendant of $u$ if and only if $d[u] < d[v] < f[v] < f[u]$.

Running depth-first search gives three outputs,

  • $d[v]$, an array of discovery times,
  • $f[v]$, an array of finishing times,
  • $\pi[v]$, an array of backpointers.

The array $\pi$ gives rise to a “depth-first search forest”. But there is also an implicit characterisation of the different edges of the original graph, as “tree” edges, “back” edges, “forward” edges and “cross” edges. What does each represent?


  • Tree edge; edges of the DFS forest
  • Back edge, edge from a node in a DFS forest to a previous (ancestor) node in the same DFS forest
  • Forward edge, edge from a node in a DFS forest to a successive (non-child descendant) node in the DFS tree.
  • Cross edge, edge from a node to neither a descendant or a successor in the tree.

Running depth-first search gives three outputs,

  • $d[v]$, an array of discovery times,
  • $f[v]$, an array of finishing times,
  • $\pi[v]$, an array of backpointers.

The array $\pi$ gives rise to a “depth-first search forest”. But there is also an implicit characterisation of the different edges of the original graph, as “tree” edges, “back” edges, “forward” edges and “cross” edges:

  • Tree edge; edges of the DFS forest
  • Back edge, edge from a node in a DFS forest to a previous (ancestor) node in the same DFS forest
  • Forward edge, edge from a node in a DFS forest to a successive (non-child descendant) node in the DFS tree.
  • Cross edge, edge from a node to neither a descendant or a successor in the tree.

When does each occur in terms of the starting a finishing times for $(u, v)$?


  • Tree or forward edge, $d[u] < d[v] < f[v] < f[u]$
  • Back edge, $d[v] < d[u] < f[u] < f[v]$
  • Cross edge, $d[v] < f[v] < d[u] < f[u]$

Can you give the characterisation of a tree or forward edge $(u, v)$ in terms of its discovery and finishing times?


\[d_u < d_v < f_v < f_u\]

Can you give the characterisation of a back edge $(u, v)$ in terms of its discovery and finishing times?


\[d_v < d_u < f_u < f_v\]

Can you give the characterisation of a cross edge $(u, v)$ in terms of its discovery and finishing times?


\[d_v < f_v < d_u < f_u\]

What lemma gives rise the linear time DFS-based algorithm for detecting if a graph has a cycle?


A directed graph has a cycle if and only if its DFS has a back edge.

What is a topological sorting of a DAG?


An ordering of the vertices so that for every directed edge $(u, v)$, $u$ comes before $v$ in the ordering.

Why couldn’t you have the topological sorting of a graph that isn’t a DAG?


Because then it would be impossible for some nodes to be before their predecessors.

What’s a very simple algorithm to compute a topological ordering of a directed graph $(V, E)$?


Run DFS to find finishing times, and then output nodes in the reverse order of finishing time.

What type of data structure is used when implementing depth-first search non-recursively?


A stack.

Can you give pseudocode for DFS on a graph $G$?


DFS(G):
	for every vertex v in G:
		v.color = white
		v.pi = null

	time = 0

	for every vertex v in G:
		if v.color = white:
			DFS-VISIT(G, v)

DFS-VISIT(G, v):
	v.color = grey
	v.d = time
	time += 1

	for every vertex u adjacent to v:
		if u.color = white:
			u.pi = v
			DFS-VISIT(G, u)

	v.color = black
	v.f = time
	time += 1

Can you quickly prove that a directed graph will be cyclic if and only if the DFS contains a back edge?


  • Back edge implies cyclic: Suppose that DFS produces a back edge $(u, v)$. Then $v$ must be an ancestor of $u$ in the DFS forest. Therefore the simple path formed by $u$ to $v$ in the DFS forest and the edge $(u, v)$ forms a cycle.
  • Cycle implies back edge: Suppose $c$ is a cycle and $v$ is the vertex to be discovered in $c$. Let $(u, v)$ be the preceding edge in the cycle. Then at time $v.d$ the vertices of $c$ form a path of white vertices from $v$ to $u$. Then by the white path theorem, $u$ becomes a descendant of $v$ in the DFS forest. Therefore $(u, v)$ is a back edge.

Can you state the white-path theorem about the depth-first search?


In a DFS forest, $u$ has $v$ as a descendant if and only if at time $u.d$ there is a path from $u$ to $v$ consisting of white vertices.

Can you justify why the DFS topological sort algorithm is correct?


Need to show that for any $u, v \in V$ where $u$ and $v$ are connected by an edge, $f _ v < f _ u$. Suppose we reach edge $u$. $v$ must either be white or black, otherwise $(u, v)$ would be a back edge, and there can be no back edges in a DAG. If it is white, then it comes a descendant of $u$ and so $f _ v < f _ u$. If it is black, it has already been explored, and so it must have a finishing time smaller than $u$.

What is the difference between a cycle and a simple cycle?


A simple cycle cannot repeat vertices.




Related posts