Models of Computation MT23, Regular expressions


Flashcards

Define regular expressions inductively, describing both all regular expressions and the language recognised by regular expressions.


  • The constants $\epsilon$ and $\varnothing$ are regular expressions.
    • $L(\epsilon) := \{\epsilon\}$
    • $L(\varnothing) := \varnothing$
  • For $a \in \Sigma$, $a$ is a regular expression.
    • $L(a) := \{a\}$
  • If $E$ and $F$ are regular expressions, then so are $(E + F)$, $(E \cdot F)$ and $(E^\star)$.
    • $L((E + F)) := L(E) \cup L(F)$
    • $L((E \cdot F)) := L(E) \cdot L(F)$
    • $L((E^\star)) := (L(E))^\star$

Can you state Kleene’s theorem?


Let $L \subseteq \Sigma^\star$. Then the following are equivalent:

  1. $L$ is regular
  2. $L$ is denoted by some regular expression $E$.

Quickly prove the “easy” part of Kleene’s theorem that establishes that if $L$ is denoted by some regular expression $E$, then $L$ is regular.


  • Base cases: $E = \epsilon$, $E = a \in \Sigma$. We have DFAs that recognise both of these.
  • Induction: Take two regular expressions $E$ and $F$. As regular languages are closed under concatenation, union and star, we have DFAs that recognise $E + F$, $E \cdot F$ and $E^\ast$.

When proving the hard part of Kleene’s theorem, i.e. that if $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$, you define $E^X _ {q,q’}$ and induct on the size of $X$. What does $E^X _ {q, q’}$ represent in this case?


  • If $L$ is regular, then there exists some NFA $M = (Q, \Sigma, \delta, q _ 0, F)$ that recognises $L$.
  • Then $X \subseteq Q$ and $E^X _ {q, q’}$ is the regular expression which expresses all strings $w$ such that there is a path from $q$ to $q’$ in $M$ labelled by $w$ (i.e. $q \xRightarrow[]{w} q’$) such that all intermediate states along the path lie in $X$.

When proving the hard part of Kleene’s theorem, i.e. that if $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$, you define $E^X _ {q,q’}$ and induct on the size of $X$. Specifically, if $L$ is regular, then there exists some NFA $M = (Q, \Sigma, \delta, q _ 0, F)$. Then $X \subseteq Q$ and $E^X _ {q, q’}$ is the regular expression which expresses all strings $w$ such that there is a path from $q$ to $q’$ in $M$ labelled by $w$ (i.e. $q \xRightarrow[]{w} q’$) such that all intermediate states along the path lie in $X$. How can you then use this construction to create a regular expression that denotes $L$?


\[\sum_{f \in F} E^Q_{q_0, f}\]

For two regular expressions $E$ and $F$, what does it mean to say $E \le F$?


\[L(E) \subseteq L(F)\]

or equivalently,

\[E + F \equiv F\]

Quickly prove the hard part of Kleene’s theorem:

If $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$.


Let $L$ be given by the DFA $(Q, \Sigma, q _ 0, \delta, F)$. Then for all $X \subseteq Q$, $q, q’ \in Q$, we aim to construct a regular expression $E^X _ {q, q’}$ s.t.

\[L(E_{q, q'}^X) = \\{s \in \Sigma^\star \mid q \Longrightarrow q', \text{ all intermediate states in } X \\}\]

If we can do this, then we can find a regular expression $E$ s.t. $L(E) = L$ by considering

\[E = \sum_{q \in F} E^Q_{q_0, q}\]

We construct $E^X _ {q, q’}$ by induction. For the base case, $X = \emptyset$ and define $E^\emptyset _ {q, q’}$ in two cases

\[E^\emptyset_{q, q'} = \begin{cases} \sum_{a \text{ s.t. } \delta(q, a) = q'} a &\text{ if }q \ne q' \\\\ \epsilon + \sum_{a \text{ s.t. } \delta(q, a) = q'} a &\text{ if }q = q' \\\\ \end{cases}\]

where we interpret the empty sum as $\emptyset$.

For the inductive step, suppose $X$ is nonempty. Choose $r \in X$ as a “seperating state” and consider the ways we can get from $q$ to $q’$ in two different cases:

  1. We never visit $r$, the seperating state.
  2. We
    • Visit $r$ for the first time, travelling from $q$ to $r$
    • Followed by zero or more repetions of loops from $r$ back to $r$ (not visiting $r$ in the middle)
    • Visit $r$ for the last time, travelling from $r$ to $q’$

Hence we see

\[E^X_ {q, q'} = E^{X\setminus \\{r\\} }_ {q, q'} + \left( E^{X\setminus \\{r\\} }_ {q, r} \right) \cdot\left(E^{X\setminus \\{r\\} }_ {r, r}\right)^\star \cdot \left(E^{X\setminus \\{r\\} }_ {r, q'}\right)\]

and each of these have smaller sets of intermediate states, so we are done.

When proving that

If $L$ is a regular language, then there exists a regular expression $E$ that denotes $L$.

We aim to construct for all $X \subseteq Q$, $q, q’ \in Q$ a regular expression $E^X _ {q, q’}$ s.t.

\[L(E_{q, q'}^X) = \\{s \in \Sigma^\star \mid q \Longrightarrow q', \text{ all intermediate states in } X \\}\]

Suppose $ \vert X \vert > 1$ and that we can come up with regular expressions for any set of intermediate states of size less than $ \vert X \vert $. How can you break down $E^X _ {q, q’}$ into a sum of smaller regular expressions?


Choose $r \in X$ as a “seperating state” and consider the ways we can get from $q$ to $q’$ in two different cases:

\[E^X_{q, q'} = E^{X\setminus \\{r\\} }_{q, q'} + \left( E^{X\setminus \\{r\\} }_{q, r} \right) \cdot\left(E^{X\setminus \\{r\\} }_{r, r}\right)^\star \cdot \left(E^{X\setminus \\{r\\} }_{r, q'}\right)\]

Write a regular expression for the language

\[L = \\{w \in \\{a,b\\}^\star \mid w \text{ has at least two occurences of } aa\\}\]

and highlight why this is actually slightly ambigious question.


\[E = (a+b)^\star aa (a + b)^\star aa (a + b) + (a+b)^\star aaa(a+b)^\star\]

Slightly ambigious because it doesn’t specify whether to treat $aaa$ as one occurence or two, but exam questions count this as two.

Give a regular expression for the language over ${0, 1}$ containing no adjacent $1$s.


\[(0 + 10)^\star (\epsilon + 1)\]

Give a regular expression for the language over ${0, 1}$ containg no occurences of $00$.


\[(1 + 01 + 001)^\star (\epsilon + 0 + 00)\]



Related posts