Notes - Models of Computation MT23, Regular expressions
Flashcards
Define regular expressions inductively, describing both all regular expressions and the language recognised by regular expressions.
- The constants $\epsilon$ and $\varnothing$ are regular expressions.
- $L(\epsilon) := \{\epsilon\}$
- $L(\varnothing) := \varnothing$
- For $a \in \Sigma$, $a$ is a regular expression.
- $L(a) := \{a\}$
- If $E$ and $F$ are regular expressions, then so are $(E + F)$, $(E \cdot F)$ and $(E^\star)$.
- $L((E + F)) := L(E) \cup L(F)$
- $L((E \cdot F)) := L(E) \cdot L(F)$
- $L((E^\star)) := (L(E))^\star$
Can you state Kleene’s theorem?
Let $L \subseteq \Sigma^\star$. Then the following are equivalent:
- $L$ is regular
- $L$ is denoted by some regular expression $E$.
Quickly prove the “easy” part of Kleene’s theorem that establishes that if $L$ is denoted by some regular expression $E$, then $L$ is regular.
- Base cases: $E = \epsilon$, $E = a \in \Sigma$. We have DFAs that recognise both of these.
- Induction: Take two regular expressions $E$ and $F$. As regular languages are closed under concatenation, union and star, we have DFAs that recognise $E + F$, $E \cdot F$ and $E^\ast$.
When proving the hard part of Kleene’s theorem, i.e. that if $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$, you define $E^X _ {q,q’}$ and induct on the size of $X$. What does $E^X _ {q, q’}$ represent in this case?
- If $L$ is regular, then there exists some NFA $M = (Q, \Sigma, \delta, q _ 0, F)$ that recognises $L$.
- Then $X \subseteq Q$ and $E^X _ {q, q’}$ is the regular expression which expresses all strings $w$ such that there is a path from $q$ to $q’$ in $M$ labelled by $w$ (i.e. $q \xRightarrow[]{w} q’$) such that all intermediate states along the path lie in $X$.
When proving the hard part of Kleene’s theorem, i.e. that if $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$, you define $E^X _ {q,q’}$ and induct on the size of $X$. Specifically, if $L$ is regular, then there exists some NFA $M = (Q, \Sigma, \delta, q _ 0, F)$. Then $X \subseteq Q$ and $E^X _ {q, q’}$ is the regular expression which expresses all strings $w$ such that there is a path from $q$ to $q’$ in $M$ labelled by $w$ (i.e. $q \xRightarrow[]{w} q’$) such that all intermediate states along the path lie in $X$. How can you then use this construction to create a regular expression that denotes $L$?
For two regular expressions $E$ and $F$, what does it mean to say $E \le F$?
or equivalently,
\[E + F \equiv F\]Quickly prove the hard part of Kleene’s theorem:
If $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$.
If $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$.
Let $L$ be given by the DFA $(Q, \Sigma, q _ 0, \delta, F)$. Then for all $X \subseteq Q$, $q, q’ \in Q$, we aim to construct a regular expression $E^X _ {q, q’}$ s.t.
\[L(E_{q, q'}^X) = \\{s \in \Sigma^\star \mid q \Longrightarrow q', \text{ all intermediate states in } X \\}\]If we can do this, then we can find a regular expression $E$ s.t. $L(E) = L$ by considering
\[E = \sum_{q \in F} E^Q_{q_0, q}\]We construct $E^X _ {q, q’}$ by induction. For the base case, $X = \emptyset$ and define $E^\emptyset _ {q, q’}$ in two cases
\[E^\emptyset_{q, q'} = \begin{cases} \sum_{a \text{ s.t. } \delta(q, a) = q'} a &\text{ if }q \ne q' \\\\ \epsilon + \sum_{a \text{ s.t. } \delta(q, a) = q'} a &\text{ if }q = q' \\\\ \end{cases}\]where we interpret the empty sum as $\emptyset$.
For the inductive step, suppose $X$ is nonempty. Choose $r \in X$ as a “seperating state” and consider the ways we can get from $q$ to $q’$ in two different cases:
- We never visit $r$, the seperating state.
- We
- Visit $r$ for the first time, travelling from $q$ to $r$
- Followed by zero or more repetions of loops from $r$ back to $r$ (not visiting $r$ in the middle)
- Visit $r$ for the last time, travelling from $r$ to $q’$
Hence we see
\[E^X_ {q, q'} = E^{X\setminus \\{r\\} }_ {q, q'} + \left( E^{X\setminus \\{r\\} }_ {q, r} \right) \cdot\left(E^{X\setminus \\{r\\} }_ {r, r}\right)^\star \cdot \left(E^{X\setminus \\{r\\} }_ {r, q'}\right)\]and each of these have smaller sets of intermediate states, so we are done.
When proving that
If $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$.
We aim to construct for all $X \subseteq Q$, $q, q’ \in Q$ a regular expression $E^X _ {q, q’}$ s.t.
\[L(E_{q, q'}^X) = \\{s \in \Sigma^\star \mid q \Longrightarrow q', \text{ all intermediate states in } X \\}\]
Suppose $ \vert X \vert > 1$ and that we can come up with regular expressions for any set of intermediate states of size less than $ \vert X \vert $. How can you break down $E^X _ {q, q’}$ into a sum of smaller regular expressions?
If $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$.
Choose $r \in X$ as a “seperating state” and consider the ways we can get from $q$ to $q’$ in two different cases:
\[E^X_{q, q'} = E^{X\setminus \\{r\\} }_{q, q'} + \left( E^{X\setminus \\{r\\} }_{q, r} \right) \cdot\left(E^{X\setminus \\{r\\} }_{r, r}\right)^\star \cdot \left(E^{X\setminus \\{r\\} }_{r, q'}\right)\]Write a regular expression for the language
\[L = \\{w \in \\{a,b\\}^\star \mid w \text{ has at least two occurences of } aa\\}\]
and highlight why this is actually slightly ambigious question.
Slightly ambigious because it doesn’t specify whether to treat $aaa$ as one occurence or two, but exam questions count this as two.