Languages and Machines#


Strings#

Alphabet and Symbol

An alphabet is a non-empty, finite set and the elements of an alphabet are called the symbols of the alphabet.

We use capital Greek letters to denote alphabets.

String

A string (or word) over an alphabet is a finite sequence of symbols from that alphabet.

For example, \(01001\) is a string over the alphabet \(\Sigma = \{ 0, 1 \}\) and \(abracadabra\) is a word over the English alphabet \(\Sigma = \{ a, \dots, z \}\).

Sets of strings

The set of all strings of length \(n\) over the alphabet \(\Sigma\) is denoted \(\Sigma^n\).

The set of all strings over the alphabet \(\Sigma\) including the empty string \(\varepsilon\) is denoted \(\Sigma^* := \bigcup_{i \in \mathbb{N}} \Sigma^i\).

String Length

If \(w\) is a string over \(\Sigma\) then \(|w|\) is called the length of \(w\) and is defined as the number of symbols that it contains.

The string of length zero is called the empty string and is written \(\varepsilon\).

If \(w\) has length \(n\) we can write \(w = w_1 w_2 \dots w_n\) where each \(w_i \in \Sigma\).

String Reversal

The reverse of \(w\) is the string obtained by writing \(w\) in the opposite order \(w_n w_{n - 1} \dots w_1\) and is denoted \(w^\mathcal{R}\).

Substring

String \(z\) is a substring of \(w\) if \(z\) appears consecutively within \(w\). For example, \(cad\) is a substring of \(abracadabra\).

String Concatenation

Given string \(x\) of length \(m\) and string \(y\) of length \(n\) the concatenation of \(x\) and \(y\) is denoted \(xy\) and is defined as the string obtained by appending \(y\) to the end of \(x\) as in \(x_1 \dots x_m y_1 \dots y_n\).

A string concatenated with itself \(k\) times is denoted \(x^k = \underbrace{x \dots x}_{k}\)

Lexicographic Order

The lexicographic order of strings is just dictionary order.

For example, the lexicographic order of all strings over the alphabet \(\{ 0, 1 \}\) is \(\{ \varepsilon, 0, 00, 000, \dots, 001, 01, 010, \dots, 011, 1, 10, 100, \dots, 101, 11, 110, \dots, 111, \dots \}\).

Shortlex Order

The shortlex order (or string order) of strings is a modified lexicographic order in which shorter strings precede longer strings.

For example, the shortlex order of all strings over the alphabet \(\{ 0, 1 \}\) is \(\{ \varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, \dots \}\).

Prefix and Proper Prefix

String \(x\) is a prefix of string \(y\) if there is a string \(z\) s.t. \(xz = y\).

String \(x\) is a proper prefix of string \(y\) if there is a string \(z\) s.t. \(xz = y\) and \(x \ne y\) (i.e., \(z \ne \varepsilon\)).


Languages#

Language

A language \(L\) is a set of strings over an alphabet \(\Sigma\).

Note that \(L \subseteq \Sigma^*\).

The language with no words is called the empty language and is denoted \(\varnothing\).

A language is prefix free if no element is a proper prefix of another element.


Regular Languages#

A language is called regular if some finite automaton regonizes it.

Regular Operations#

Regular operations are operations defined on regular languages.

Union#

Let \(A\) and \(B\) be languages. The regular operation union is defined as

\[A \cup B = \{ x : x \in A \,\,\text{or}\,\, x \in B \}\]
Concatenation#

Let \(A\) and \(B\) be languages. The regular operation concatenation is defined as

\[A \circ B = \{ xy : x \in A \,\,\text{and}\,\, y \in B \}\]
Star#

Let \(A\) be a language. The regular operation star is defined as

\[A^* = \{ x_1 x_2 \dots x_k : k \ge 0, x_i \in A \}\]

[Example]

\( \begin{aligned} A &= \{ 0, 1 \} \\ B &= \{ 0, 1 \} \\ A \cup B &= \{ 0, 1 \} \\ A \circ B &= \{ 00, 01, 10, 11 \} \\ A^* = B^* &= \{ \varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, \dots \} \\ \end {aligned} \)

[Example]

\( \begin{aligned} A &= \{ \text{bat}, \text{cat} \} \\ B &= \{ \text{nat}, \text{rat} \} \\ A \cup B &= \{ \text{bat}, \text{cat}, \text{nat}, \text{rat} \} \\ A \circ B &= \{ \text{batnat}, \text{batrat}, \text{catnat}, \text{catrat} \} \\ A^* &= \{ \varepsilon, \text{bat}, \text{cat}, \text{batbat}, \text{batcat}, \text{catbat}, \text{catcat}, \text{batbatbat}, \text{batbatcat}, \text{batcatbat}, \text{batcatcat}, \dots \} \\ \end {aligned} \)


Properties of Regular Languages#


Closure under Regular Union#

THEOREM [Sipser pp. 45, 1.45 pp. 59]

The class of regular languages is closed under the union operation.

In other words, if \(A\) and \(B\) are regular languages then \(A \cup B\) is a regular language.

Proof Idea

Since \(A\) and \(B\) are regular there are finite automata \(M_1\) and \(M_2\) that recognize them. To show that \(A \cup B\) is regular is to show that some finite automaton \(M\) constructed from \(M_1\) and \(M_2\) recognizes it.

\(M\) is constructed from \(M_1\) and \(M_2\) in the sense that it simulates them.

\(M\) accepts its input whenever \(M_1\) or \(M_2\) accepts their input.

If \(M_1\) has \(k_1\) states and \(M_2\) has \(k_2\) states then the number of states in \(M\) is \(k_1 \times k_2\).

The transitions of \(M\) go from pair to pair and update the current state for both \(M_1\) and \(M_2\).

The accept states of \(M\) are those pairs wherein either \(M_1\) or \(M_2\) is in an accept state.

\(Proof \,\, by \,\, construction\) via deterministic simulation

Let \(M_1 = (Q_1, \Sigma_1, \delta_1, q_1, F_1)\) recognize \(A_1\) and let \(M_2 = (Q_2, \Sigma_2, \delta_2, q_2, F_2)\) recognize \(A_2\).
Construct \(M = (Q, \Sigma, \delta, q_0, F)\) to recognize \(A_1 \cup A_2\).

\[\begin{split} \begin{align*} Q &= \{ (r_1, r_2) : r_1 \in Q_1 \,\,\text{and}\,\, r_2 \in Q_2 \} \\ &= Q_1 \times Q_2 \\ \end {align*} \end{split}\]

\(\Sigma = \Sigma_1 \cup \Sigma_2\)
For each \((r_1, r_2) \in Q\) and each \(a \in \Sigma\) let \(\delta((r_1, r_2), a) = (\delta_1 (r_1, a), \delta_2 (r_2, a))\)
\(q_0 = (q_1, q_2)\)
\(F\) is the set of pairs in which either member is an accept state of \(M_1\) or \(M_2\).

\[\begin{split} \begin{align*} F &= \{ (r_1, r_2) : r_1 \in F_1 \,\,\text{or}\,\, r_2 \in F_2 \} \\ &= (F_1 \times Q_2) \cup (Q_1 \times F_2) \\ \end {align*} \end{split}\]

\(\blacksquare\)

Proof Idea

We have two regular languages \(A_1, A_2\) and we want to show that \(A_1 \cup A_2\) is a regular language.

The idea is to take two NFAs \(N_1, N_2\) for regular languages \(A_1, A_2\) and combine them into a new NFA \(N\) to recognize \(A_1 \cup A_2\).

This is achieved in the following way.

\(N\) accepts its input if either \(N_1\) or \(N_2\) accepts the input. \(N\) has a start state that branches to the start states of the old machines with \(\varepsilon\) arrows. \(N\) non-deterministically guesses which of the two machines accepts the input: if one of them accepts the input then \(N\) accepts it too.

\(Proof\) via non-deterministic simulation

Let NFA \(N_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)\) recognize \(A_1\) and let NFA \(N_2 = (Q_2, \Sigma, \delta_2, q_2, F_2)\) recognize \(A_2\).

Construct NFA \(N = (Q, \Sigma, \delta, q_0, F)\) to recognize \(A_1 \cup A_2\).

\(1\quad\) \(Q = \{ q_0 \} \cup Q_1 \cup Q_2\) \(\quad\) The states of \(N\) are all the states of \(N_1\) and \(N_2\) with the addition of a new start state \(q_0\).
\(2\quad\) The state \(q_0\) is the start state of \(N\).
\(3\quad\) The set of accept states of \(N\) are all the accept states of \(N_1\) and \(N_2\). That way, \(N\) accepts if either \(N_1\) accepts or \(N_2\) accepts.
\(4\quad\) Define \(\delta\) so that for any \(q \in Q\) and any \(a \in \Sigma_\varepsilon\)

\[\begin{split} \delta(q, a) = \begin{cases} \delta_1 (q, a) & q \in Q_1 \\ \delta_2 (q, a) & q \in Q_2 \\ \{ q_1, q_2 \} & q = q_0 \,\,\text{and}\,\, a = \varepsilon \\ \varepsilon & q = q_0 \,\,\text{and}\,\, a \ne \varepsilon \\ \end {cases} \end{split}\]

\(\blacksquare\)


Closure under Regular Intersection#

\(Theorem\)

If \(A_1\) and \(A_2\) are regular languages then \(A_1 \cap A_2\) is a regular language.

\(Proof \,\, by \,\, construction\)

\(F\) is the set of pairs in which both members are accept states of \(M_1\) and \(M_2\) respectively.

\[\begin{split} \begin{align*} F &= \{ (r_1, r_2) : r_1 \in F_1 \,\,\text{and}\,\, r_2 \in F_2 \} \\ &= F_1 \times F_2 \\ \end {align*} \end{split}\]

\(\blacksquare\)


Closure under Regular Concatenation#

THEOREM [Sipser 1.47 pp. 60]

The class of regular languages is closed under the concatenation operation.

In other words, if \(A_1\) and \(A_2\) are regular languages then \(A_1 \circ A_2\) is a regular language.

Proof Idea

We have two regular languages \(A_1, A_2\) and we want to show that \(A_1 \circ A_2\) is a regular language.

The idea is to take two NFAs \(N_1, N_2\) for regular languages \(A_1, A_2\) and combine them into a new NFA \(N\) to recognize \(A_1 \circ A_2\).

This is achieved in the following way.

Assign \(N\)’s start state to be the start state of \(N_1\). The accept states of \(N_1\) have additional \(\varepsilon\) arrows that non-deterministically allow branching to \(N_2\) whenever \(N_1\) is in an accept state, signifying that it has found an initial piece of the input that constitutes a string in \(A_1\). The accept states of \(N\) are the accept states of \(N_2\) only. Therefore, it accepts when the input can be split into two parts, the first accepted by \(N_1\) and the second accepted by \(N_2\). \(N\) non-deterministically guesses where to make the split.

\(Proof\) via non-deterministic simulation

Let NFA \(N_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)\) recognize \(A_1\) and let NFA \(N_2 = (Q_2, \Sigma, \delta_2, q_2, F_2)\) recognize \(A_2\).

Construct NFA \(N = (Q, \Sigma, \delta, q_1, F_2)\) to recognize \(A_1 \circ A_2\).

\(1\quad\) \(Q = Q_1 \cup Q_2\) \(\quad\) The states of \(N\) are all the states of \(N_1\) and \(N_2\).
\(2\quad\) The state \(q_1\) is the same as the start state of \(N_1\).
\(3\quad\) The accept states \(F_2\) are the same as the accept states of \(N_2\).
\(4\quad\) Define \(\delta\) so that for any \(q \in Q\) and any \(a \in \Sigma_\varepsilon\)

\[\begin{split} \delta(q, a) = \begin{cases} \delta_1 (q, a) & q \in Q_1 \,\,\text{and}\,\, q \not\in F_1 \\ \delta_1 (q, a) & q \in F_1 \,\,\text{and}\,\, a \ne \varepsilon \\ \delta_1 (q, a) \cup \{ q_2 \} & q \in F_1 \,\,\text{and}\,\, a = \varepsilon \\ \delta_2 (q, a) & q \in Q_2 \\ \end {cases} \end{split}\]

\(\blacksquare\)


Closure under Regular Star#

THEOREM [Sipser 1.49 pp. 62]

The class of regular languages is closed under the star operation.

In other words, if \(A_1\) is a regular language then \(A_1^*\) is a regular language.

Proof Idea

We have a regular language \(A_1\) and we want to show that \(A_1^*\) is a regular language.

The idea is to take an NFA \(N_1\) for regular language \(A_1\) and modify it into a new NFA \(N\) to recognize \(A_1^*\).

This is achieved in the following way.

\(N\) accepts its input whenever it can be broken into several pieces and \(N_1\) accepts each piece. We can construct \(N\) like \(N_1\) with additional \(\varepsilon\) arrows returning to the start state from the accept states. This way, when processing gets to the end of a piece that \(N_1\) accepts, the machine \(N\) has the option of jumping back to the start state to try to read another piece that \(N_1\) accepts. In addition, we must modify \(N\) so that it accepts \(\varepsilon\) which is always a member of \(A_1^*\).

A bad approach is to add the start state to the set of accept states. This has the effect of adding \(\varepsilon\) to the recognized language, but it may also add other undesirable strings.

Instead, add a new start state, which is also an accept state and which has an \(\varepsilon\) arrow to the old start state. This solution has the desired effect of adding \(\varepsilon\) to the language without adding anything else.

\(Proof\) via non-deterministic simulation

Let NFA \(N_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)\) recognize \(A_1\).

Construct NFA \(N = (Q, \Sigma, \delta, q_0, F)\) to recognize \(A_1^*\).

\(1\quad\) \(Q = \{ q_0 \} \cup Q_1\) \(\quad\) The states of \(N\) are the states of \(N_1\) and a new start state \(q_0\).
\(2\quad\) The state \(q_0\) is the new start state of \(N\).
\(3\quad\) \(F = \{ q_0 \} \cup F_1\) \(\quad\) The accept states are the old accept states and the new start state.
\(4\quad\) Define \(\delta\) so that for any \(q \in Q\) and any \(a \in \Sigma_\varepsilon\)

\[\begin{split} \delta(q, a) = \begin{cases} \delta_1 (q, a) & q \in Q_1 \,\,\text{and}\,\, q \not\in F_1 \\ \delta_1 (q, a) & q \in F_1 \,\,\text{and}\,\, a \ne \varepsilon \\ \delta_1 (q, a) \cup \{ q_1 \} & q \in F_1 \,\,\text{and}\,\, a = \varepsilon \\ \{ q_1 \} & q = q_0 \,\,\text{and}\,\, a = \varepsilon \\ \varnothing & q = q_0 \,\,\text{and}\,\, a \ne \varepsilon \\ \end {cases} \end{split}\]

\(\blacksquare\)


Precedence of the regular operations#

\(1\quad\) star
\(2\quad\) concatenation
\(3\quad\) union


Regular Expressions#


We use arithmetical operators over numbers to construct arithmetic expressions which evaluate to numbers. And we use regular operators over symbols to construct regular expressions which evaluate to languages.

The following definition follows from the definition of the regular operations.

DEFINITION [Sipser 1.51 pp. 64]

Let \(\Sigma\) be any alphabet.
The regular expression \(\Sigma\) describes the language consisting of all strings of length \(1\) over this alphabet.
The regular expression \(\Sigma^*\) describes the language consisting of all strings over this alphabet.

INDUCTIVE DEFINITION Regular Expression [Sipser 1.52 pp. 64]

\(R\) is a regular expression if \(R\) is
\(1\quad\) \(a\) for some \(a\) in the alphabet \(\Sigma\) \(\quad\) The regular expression \(a\) represents the language \(\{ a \}\)
\(2\quad\) \(\varepsilon\) \(\quad\) The regular expression \(\varepsilon\) represents the language \(\{ \varepsilon \}\)
\(3\quad\) \(\varnothing\) \(\quad\) The regular expression \(\varnothing\) represents the empty language
\(4\quad\) \((R_1 \cup R_2)\) where \(R_1\) and \(R_2\) are regular expressions \(\quad\) The regular expression \((R_1 \cup R_2)\) represents the language produced by the union of \(R_1\) and \(R_2\)
\(5\quad\) \((R_1 \circ R_2)\) where \(R_1\) and \(R_2\) are regular expressions \(\quad\) The regular expression \((R_1 \circ R_2)\) represents the language produced by the concatenation of \(R_1\) and \(R_2\)
\(6\quad\) \((R_1^*)\) where \(R_1\) is a regular expression \(\quad\) The regular expression \((R_1^*)\) represents the language produced by the star of \(R_1\)

\( \begin{aligned} R^+ :=&& RR^* && \text{one or more} \\ R^* =&& R^+ \cup \{ \varepsilon \} && \text{zero or more} \\ R^k :=&& \underbrace{R \dots R}_{k} && k \,\,\text{times} \\ \end {aligned} \)

To distinguish between a regular expression and the language that it describes we write \(L(R)\) to be the language of \(R\).


Examples#

Example

\(0 \cup 1\)

s.h. for \(\{ 0 \} \cup \{ 1 \} = \{ 0, 1 \}\)

\(L(0 \cup 1) = \{ w : w \,\,\text{is either}\,\, 0 \,\,\text{or}\,\, 1 \}\)

Example

\(0^*\)

s.h. for \(\{ 0 \}^* = \{ \varepsilon, 0, 00, 000, \dots \}\)

\(L(0^*) = \{ w : w \,\,\text{consists of zero or more}\,\, 0 \,\,\text{s} \}\)

Example

\(\Sigma 0^* = (0 \cup 1) 0^* = 0^+ \cup 10^*\) where \(\Sigma = \{ 0, 1 \}\)

s.h. for \( (\{ 0 \} \cup \{ 1 \}) \circ \{ 0 \}^* = \{ 0, 1 \} \circ \{ 0 \}^* = \{ 0 \} \circ \{ 0 \}^* \cup \{ 1 \} \circ \{ 0 \}^* = 00^* \cup 10^* = 0^+ \cup 10^* \)

\(L(\Sigma 0^*) = \{ w : w \,\,\text{consists of a}\,\, 0 \,\,\text{or a}\,\, 1 \,\,\text{followed by zero or more}\,\, 0 \,\,\text{s}\}\)

Example [Sipser 1.51 pp.64]

\((0 \cup 1)^* = \Sigma^*\)

s.h. for \((\{ 0 \} \cup \{ 1 \})^* = \{ 0, 1 \}^* = \{ \varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, \dots \}\)

The language consisting of all possible strings of \(0\) s and \(1\) s.

Example

\((0 \cup 1)^* 1 = \Sigma^* 1\)

s.h. for \((\{ 0 \} \cup \{ 1 \})^* \circ \{ 1 \} = \{ 0, 1 \}^* \circ \{ 1 \} = \{ \varepsilon, \textcolor{red}{\cancel{0}}, 1, \textcolor{red}{\cancel{00}}, 01, \textcolor{red}{\cancel{10}}, 11, \textcolor{red}{\cancel{000}}, 001, \textcolor{red}{\cancel{010}}, 011, \textcolor{red}{\cancel{100}}, 101, \textcolor{red}{\cancel{110}}, 111, \dots \}\)

The language consisting of all strings that end in a \(1\).

Example

\((0 (0 \cup 1)^*) \cup ((0 \cup 1)^* 1) = (0 \Sigma^*) \cup (\Sigma^* 1)\)

s.h. for \( (\{ 0 \} \circ (\{ 0 \} \cup \{ 1 \})^*) \cup ((\{ 0 \} \cup \{ 1 \})^* \circ \{ 1 \}) = (\{ 0 \} \circ \{ 0, 1 \}^*) \cup (\{ 0, 1 \}^* \circ \{ 1 \}) = \{ \varepsilon, 0, 1, 00, 01, \textcolor{red}{\cancel{10}}, 11, 000, 001, 010, 011, \textcolor{red}{\cancel{100}}, 101, \textcolor{red}{\cancel{110}}, 111, \dots \} \)

The language consisting of all strings that start with a \(0\) or end with a \(1\).

Example [Sipser 1.53 pp. 65]

\(0^* 1 0^*\)

s.h. for \(\{ 0 \}^* \circ \{ 1 \} \circ \{ 0 \}^*\)

\(L(0^* 1 0^*) = \{ w : w \,\,\text{contains a single}\,\, 1 \}\)

Example [Sipser 1.53 pp. 65]

\(\Sigma^* 1 \Sigma^*\) where \(\Sigma = \{ 0, 1 \}\)

s.h. for \(\{ 0, 1 \}^* \circ \{ 1 \} \circ \{ 0, 1 \}^*\)

\(L(\Sigma^* 1 \Sigma^*) = \{ w : w \,\,\text{has at least one}\,\, 1 \}\)

Example [Sipser 1.53 pp. 65]

\(\Sigma^* 001 \Sigma^*\) where \(\Sigma = \{ 0, 1 \}\)

s.h. for \( \{ 0, 1 \}^* \circ \{ 0 \} \circ \{ 0 \} \circ \{ 1 \} \circ \{ 0, 1 \}^* = \{ 0, 1 \}^* \circ \{ 001 \} \circ \{ 0, 1 \}^* \)

\(L(\Sigma^* 001 \Sigma^*) = \{ w : w \,\,\text{contains the substring}\,\, 001 \}\)

Example [Sipser 1.53 pp. 65]

\(1^* (0 1^+)^*\)

s.h. for \( \{ 1 \}^* \circ (\{ 0 \} \circ \{ 1 \} \circ \{ 1 \}^*)^* = \{ 1 \}^* \circ (\{ 01 \} \circ \{ 1 \}^*)^* \)

\(L(1^* (0 1^+)^*) = \{ w : w \,\,\text{every}\,\, 0 \,\,\text{in}\,\, w \,\,\text{is followed by at least one}\,\, 1 \}\)

Example [Sipser 1.53 pp. 65]

\((\Sigma\Sigma)^*\) where \(\Sigma = \{ 0, 1 \}\)

s.h. for \( (\{ 0, 1 \} \circ \{ 0, 1 \})^* = \{ 00, 01, 10, 11 \}^* \)

\(L((\Sigma\Sigma)^*) = \{ w : w \,\,\text{is a string of even length} \}\)

Example [Sipser 1.53 pp. 65]

\((\Sigma\Sigma\Sigma)^*\) where \(\Sigma = \{ 0, 1 \}\)

s.h. for \( (\{ 0, 1 \} \circ \{ 0, 1 \} \circ \{ 0, 1 \})^* = \{ 000, 001, 010, 011, 100, 101, 110, 111 \}^* \)

\(L((\Sigma\Sigma\Sigma)^*) = \{ w : \text{the length of}\,\, w \,\,\text{is a multiple of}\,\, 3 \}\)

Example [Sipser 1.53 pp. 65]

\(01 \cup 10\)

s.h. for \( (\{ 0 \} \circ \{ 1 \}) \cup (\{ 1 \} \circ \{ 0 \}) = \{ 01 \} \cup \{ 10 \} = \{ 01, 10 \} \)

\(L(01 \cup 10) = \{ 01, 10 \}\)

Example [Sipser 1.53 pp. 65]

\(0 \Sigma^* 0 \cup 1 \Sigma^* 1 \cup 0 \cup 1\) where \(\Sigma = \{ 0, 1 \}\)

s.h. for \( (\{ 0 \} \circ \{ 0, 1 \}^* \circ \{ 0 \}) \cup (\{ 1 \} \circ \{ 0, 1 \}^* \circ \{ 1 \}) \cup \{ 0 \} \cup \{ 1 \} = (\{ 0 \} \circ \{ 0, 1 \}^* \circ \{ 0 \}) \cup (\{ 1 \} \circ \{ 0, 1 \}^* \circ \{ 1 \}) \cup \{ 0, 1 \} \)

\(L(0 \Sigma^* 0 \cup 1 \Sigma^* 1 \cup 0 \cup 1) = \{ w : w \,\,\text{starts and ends with the same symbol} \}\)

Example [Sipser 1.53 pp. 65]

\((0 \cup \varepsilon) 1^* = 01^* \cup 1^*\)

s.h. for \( (\{ 0 \} \cup \{ \varepsilon \}) \circ \{ 1 \}^* = (\{ 0, \varepsilon \}) \circ \{ 1 \}^* \)

\(L((0 \cup \varepsilon) 1^*) = \{ w : w \,\,\text{starts and ends with the same symbol} \}\)

Example [Sipser 1.53 pp. 65]

\(R = (0 \cup \varepsilon)(1 \cup \varepsilon)\)

\( (\{ 0 \} \cup \{ \varepsilon \}) \circ (\{ 1 \} \cup \{ \varepsilon \}) = \{ 0, \varepsilon \} \circ \{ 1, \varepsilon \} = \{ \varepsilon, 0, 1, 01 \} \)

\(L(R) = \{ \varepsilon, 0, 1, 01 \}\)

Example [Sipser 1.53 pp. 65]

\(R = 1^* \varnothing = \{ 1 \}^* \circ \{ \} = \varnothing = L(R)\)

The concatenation of any language and the empty language yields the empty language.

Example [Sipser 1.53 pp. 65]

\(R = \varnothing^* = \{ \}^* = \{ \varepsilon \} = L(R)\)

The star of the empty language yields the language consisting of the empty string.

Example [Sipser pp. 66]

\( \begin{aligned} D &= \{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 \} \\ R &= (+ \cup - \cup \varepsilon) (D^+ \cup D^+ . D^* \cup D^* . D^+) \\ &= ( \{ + \} \cup \{ - \} \cup \{ \varepsilon \} ) \circ ( D \circ D^* \cup D \circ D^* \circ \{ . \} \circ D^* \cup D^* \circ \{ . \} \circ D \circ D^* ) \\ &= \{ +, -, \varepsilon \} \circ ( D \circ D^* \cup D \circ D^* \circ \{ . \} \circ D^* \cup D^* \circ \{ . \} \circ D \circ D^* ) \\ &= L(R) \\ \end {aligned} \)

\( \begin{aligned} 72& \\ 3&.14159 \\ +7&. \\ -&.01 \\ \end {aligned} \)


Algebraic properties of the regular operations#

CLAIM

Let \(\text{REG}\) be the class of regular languages.

\( \begin{aligned} \forall L_1, L_2 \in \text{REG} && L_1 \cup L_2 \in \text{REG} &&& \text{regular union is closed} \\ \forall L_1, L_2, L_3 \in \text{REG} && (L_1 \cup L_2) \cup L_3 = L_1 \cup (L_2 \cup L_3) &&& \text{regular union is associative} \\ \exists \varnothing \in \text{REG} \,\, \forall L \in \text{REG} && L \cup \varnothing = L &&& \text{identity} \\ \forall L_1, L_2 \in \text{REG} && L_1 \cup L_2 = L_2 \cup L_1 &&& \text{regular union is commutative} \\ \\ \forall L_1, L_2 \in \text{REG} && L_1 \circ L_2 \in \text{REG} &&& \text{regular concatenation is closed} \\ \forall L_1, L_2, L_3 \in \text{REG} && (L_1 \circ L_2) \circ L_3 = L_1 \circ (L_2 \circ L_3) &&& \text{regular concatenation is associative} \\ \exists \{ \varepsilon \} \in \text{REG} \,\, \forall L \in \text{REG} && \{ \varepsilon \} \circ L = L = L \circ \{ \varepsilon \} &&& \text{left and right identity} \\ && \textcolor{red}{L_1 \circ L_2 \ne L_2 \circ L_1} &&& \text{regular concatenation is not commutative} \\ \\ \forall L_1, L_2, L_3 \in \text{REG} && L_1 \circ (L_2 \cup L_3) = (L_1 \circ L_2) \cup (L_1 \circ L_3) &&& \text{regular concatenation is left-distributive over regular union} \\ \forall L_1, L_2, L_3 \in \text{REG} && (L_1 \cup L_2) \circ L_3 = (L_1 \circ L_3) \cup (L_2 \circ L_3) &&& \text{regular concatenation is right-distributive over regular union} \\ \\ \forall L_1 \in \text{REG} && L_1^* \in \text{REG} &&& \text{regular star is closed} \\ \end {aligned} \)


Equivalence of Finite Automata and Regular Expressions#

Regular expressions and finite automata are equivalent in their descriptive power: any regular expression can be converted into a finite automaton that recognizes the language it describes and vice versa.

THEOREM [Sipser 1.54 pp. 66]

A language is regular iff some regular expression describes it.

If a language is described by a regular expression then it is regular.

\(Proof\)

Let \(R\) be a regular expression that describes a language \(A\). We convert \(R\) into an NFA \(N\) that recognizes \(A\) for each of the six cases in the formal definition of a regular expression.

\(1\quad\) \(R = a\) for some \(a \in \Sigma\) \(\quad\) Then \(L(R) = \{ a \}\) and the following NFA recognizes \(L(R)\).

\(N = (\{ q_0, q_1 \}, \{ a \}, \delta, q_0, \{ q_1 \})\)

\( \begin{array}{c|c} \delta & a & \varepsilon \\ \hline q_0 & \{ q_1 \} & \varnothing \\ q_1 & \varnothing & \varnothing \\ \end {array} \)

\(\delta (q_1, a) = \{ q_2 \}\) and \(\delta (r, b) = \varnothing\) for \(r \ne q_1\) or \(b \ne a\).

\(2\quad\) \(R = \varepsilon\) \(\quad\) Then \(L(R) = \{ \varepsilon \}\) and the following NFA recognizes \(L(R)\).

\(N = (\{ q_0 \}, \varnothing, \delta, q_0, \{ q_0 \})\)

\( \begin{array}{c|c} \delta & \varepsilon \\ \hline q_0 & \varnothing \\ \end {array} \)

\(\delta (r, b) = \varnothing\) for any \(r\) and \(b\).

\(3\quad\) \(R = \varnothing\) \(\quad\) Then \(L(R) = \varnothing\) and the following NFA recognizes \(L(R)\).

\(N = (\{ q_0 \}, \varnothing, \delta, q_0, \varnothing)\)

\( \begin{array}{c|c} \delta & \varepsilon \\ \hline q_0 & \varnothing \\ \end {array} \)

\(\delta (r, b) = \varnothing\) for any \(r\) and \(b\).

\(4\quad\) \(R = R_1 \cup R_2\)

\( \begin{aligned} N &= (Q, \Sigma, \delta, q_0, F) \\ Q &= \{ q_0 \} \cup Q_1 \cup Q_2 \\ \Sigma &= \Sigma_1 \cup \Sigma_2 \\ F &= F_1 \cup F_2 \\ \delta(q, a) &= \begin{cases} \delta_1 (q, a) & q \in Q_1 \\ \delta_2 (q, a) & q \in Q_2 \\ \{ q_1, q_2 \} & q = q_0 \,\,\text{and}\,\, a = \varepsilon \\ \varepsilon & q = q_0 \,\,\text{and}\,\, a \ne \varepsilon \\ \end {cases} \end {aligned} \)

\(5\quad\) \(R = R_1 \circ R_2\)

\( \begin{aligned} N &= (Q, \Sigma, \delta, q_1, F_2) \\ Q &= Q_1 \cup Q_2 \\ \Sigma &= \Sigma_1 \cup \Sigma_2 \\ \delta(q, a) &= \begin{cases} \delta_1 (q, a) & q \in Q_1 \,\,\text{and}\,\, q \not\in F_1 \\ \delta_1 (q, a) & q \in F_1 \,\,\text{and}\,\, a \ne \varepsilon \\ \delta_1 (q, a) \cup \{ q_2 \} & q \in F_1 \,\,\text{and}\,\, a = \varepsilon \\ \delta_2 (q, a) & q \in Q_2 \\ \end {cases} \end {aligned} \)

\(6\quad\) \(R = R_1^*\)

\( \begin{aligned} N &= (Q, \Sigma, \delta, q_0, F) \\ Q &= \{ q_0 \} \cup Q_1 \\ \Sigma &= \Sigma_1 \cup \Sigma_2 \\ F &= \{ q_0 \} \cup F_1 \\ \delta(q, a) &= \begin{cases} \delta_1 (q, a) & q \in Q_1 \,\,\text{and}\,\, q \not\in F_1 \\ \delta_1 (q, a) & q \in F_1 \,\,\text{and}\,\, a \ne \varepsilon \\ \delta_1 (q, a) \cup \{ q_1 \} & q \in F_1 \,\,\text{and}\,\, a = \varepsilon \\ \{ q_1 \} & q = q_0 \,\,\text{and}\,\, a = \varepsilon \\ \varnothing & q = q_0 \,\,\text{and}\,\, a \ne \varepsilon \\ \end {cases} \end {aligned} \)

\(\blacksquare\)

If a language is regular then it is described by a regular expression.


Regex -> NFA -> DFA

Example [Sipser 1.56 pp. 68]

\( \begin{aligned} R &= (ab \cup a)^* \\ &= (\{ a \} \circ \{ b \} \cup \{ a \})^* \\ &= (\{ ab \} \cup \{ a \})^* \\ &= \{ ab, a \}^* \\ \end {aligned} \)

\(R = a \quad N = (\{ q_1, q_2 \}, \{ a \}, \delta, q_1, \{ q_2 \})\)

\( \begin{array}{c|c} \delta & a & \varepsilon \\ \hline q_1 & \{ q_2 \} & \varnothing \\ q_2 & \varnothing & \varnothing \\ \end {array} \)

\(R = b \quad N = (\{ q_3, q_4 \}, \{ b \}, \delta, q_3, \{ q_4 \})\)

\( \begin{array}{c|c} \delta & b & \varepsilon \\ \hline q_3 & \{ q_4 \} & \varnothing \\ q_4 & \varnothing & \varnothing \\ \end {array} \)

\( \begin{aligned} R &= ab \\ Q &= Q_1 \cup Q_2 = \{ q_1, q_2 \} \cup \{ q_3, q_4 \} = \{ q_1, q_2, q_3, q_4 \} \\ \Sigma &= \Sigma_1 \cup \Sigma_2 = \{ a \} \cup \{ b \} = \{ a, b \} \\ N &= (\{ q_1, q_2, q_3, q_4 \}, \{ a, b \}, \delta, q_1, \{ q_4 \}) \\ \end {aligned} \)

\( \begin{array}{c|c} \delta & a & b & \varepsilon \\ \hline q_1 & \{ q_2 \} & \varnothing & \varnothing \\ q_2 & \varnothing & \varnothing & \{ q_3 \} \\ q_3 & \varnothing & \{ q_4 \} & \varnothing \\ q_4 & \varnothing & \varnothing & \varnothing \\ \end {array} \)

\( \begin{aligned} R &= ab \cup a \\ Q &= \{ s \} \cup Q_1 \cup Q_2 = \{ s \} \cup \{ q_0, q_1, q_2, q_3 \} \cup \{ q_0, q_1 \} = \{ s, q_0, q_1, q_2, q_3 \} \\ \Sigma &= \Sigma_1 \cup \Sigma_2 = \{ a, b \} \cup \{ a \} = \{ a, b \} \\ F &= \{ q_3 \} \cup \{ q_1 \} = \{ q_1, q_3 \} \\ N &= (Q, \Sigma, \delta, s, F) \\ \end {aligned} \)

\( \begin{array}{c|c} \delta & a & b & \varepsilon \\ \hline s & \varepsilon & \varepsilon & \{ q_0 \} \\ q_0 & \{ q_1 \} & \varnothing & \varnothing \\ q_1 & \varnothing & \varnothing & \{ q_2 \} \\ q_2 & \varnothing & \{ q_3 \} & \varnothing \\ q_3 & \varnothing & \varnothing & \varnothing \\ \end {array} \)


Resources#

https://math.mit.edu/~sipser/18404/

https://www.cs.odu.edu/~toida/nerzic/390teched/web_course.html

https://www.cs.ucr.edu/~jiang/cs150/slides4week3_AlgebraicLaws+PL.pdf


Figures#

  • [ w ] Chomsky, Noam (1928-)

  • [ w ] Kleene, Stephen (1909-1994)

  • [ w ] Mealy, George (1927-2010)


Terms#

  • [ w ] Abstract Machine

  • [ w ] Alphabet

  • [ w ] Automata Theory

  • [ w ] Automaton

  • [ w ] Chomsky Hierarchy

  • [ w ] Finite-State Machine

  • [ w ] Formal Grammar

  • [ w ] Formal Language

  • [ w ] Kleene Star

  • [ w ] Mealy Machine

  • [ w ] Non Terminal Symbol

  • [ w ] Production Rule

  • [ w ] Pushdown Automaton

  • [ w ] Regular Language

  • [ w ] Symbol

  • [ w ] Terminal Symbol

  • [ w ] Turing Machine