\documentclass[../generics]{subfiles}
\begin{document}
\chapter{Monoids}\label{monoids}
\lettrine{M}{onoids} are one of the fundamental objects that we study in abstract algebra. Their simplicity allows for a great deal of generality, so we will quickly narrow our focus to \emph{finitely-presented} monoids. We will see that every finitely-presented monoid can be encoded by a Swift generic signature. After a brief detour into computability theory, we will introduce the \emph{word problem}, and relate the word problem to our derived requirements formalism of \SecRef{derived req}. The word problem turns out to be undecidable in general, but we will see that it can be solved in those finitely-presented monoids that admit a \emph{convergent} presentation. This prepares us for the next chapter, where we attempt to solve derived requirements by encoding a generic signature as a finitely-presented monoid. We begin with standard definitions, found in texts such as \cite{semigroup} or \cite{postmodern}.
\begin{definition}
A \IndexDefinition{monoid}\emph{monoid} \index{$\cdot$}\index{$\cdot$!z@\igobble|seealso{monoid}}$(M,\, \cdot,\, \varepsilon)$ is a structure consisting of a \index{set}set~$M$, a \index{binary operation}binary operation~``\;$\cdot$\;'', and an identity element~$\varepsilon\in M$, which together satisfy the following three axioms:
\begin{itemize}
\item The set $M$ is \emph{closed} under the binary operation: for all $x$, $y \in M$, $x\cdot y\in M$.
\item The binary operation is \IndexDefinition{associative operation}\emph{associative}: for all $x, y, z \in M$, $x\cdot(y\cdot z)=(x\cdot y)\cdot z$.
\item The element $\varepsilon$ acts as the \IndexDefinition{identity element}\emph{identity}: for all $x\in M$, $x\cdot \varepsilon=\varepsilon\cdot x=x$.
\end{itemize}
When the binary operation and identity element are clear from context, we can write~$M$ instead of $(M,\,\cdot,\,\varepsilon)$. We also sometimes omit the symbol ``\;$\cdot$\;'', so if $x$, $y\in M$, then we can write $xy$ instead of $x\cdot y$. Finally, associativity allows us to omit parentheses without ambiguity in an expression like $x\cdot y\cdot z$ or $xyz$.
\end{definition}
Specific instances of monoids can be quite complicated and abstract, but our first example is easy to describe.
\begin{example}
The set of \index{natural numbers!addition}natural numbers $\NN$ satisfies the monoid axioms if we take addition as our operation and zero as the identity element:
\begin{itemize}
\item The sum of two natural numbers is again a natural number.
\item Addition is associative: for all $x$, $y$, $z\in\NN$, $(x+y)+z=x+(y+z)$.
\item Zero is the identity: for all $x\in\NN$, $x+0=0+x=x$.
\end{itemize}
\end{example}
We will see shortly that $(\NN,+,0)$ is an instance of the following construction.
\begin{definition}
Let $A$ be a \index{set!generators}set. The \IndexDefinition{free monoid}\emph{free monoid} generated by $A$, denoted $A^*$, is the \index{set!free monoid}set of all finite strings of elements of $A$. (This is the same notation as the ``\;\texttt{*}\;'' operator in a \index{regular language}\emph{regular expression}.) The set $A$ is called the \index{alphabet!z@\igobble|seealso{generating set}}\emph{alphabet} or \IndexDefinition{generating set}\emph{generating set} of $A^*$. The generating set may be finite or infinite, but we assume it is finite unless stated otherwise. The \index{binary operation!free monoid}binary operation on $A^*$ is \emph{string concatenation}, and the \index{identity element!free monoid}identity element $\varepsilon$ is the empty string. The elements of $A^*$ are also called \IndexDefinition{term}\emph{terms}. We say that $u$ is a \IndexDefinition{subterm}\emph{subterm} of $t$ if $t=xuy$ for some $x$, $y\in A^*$. The \IndexDefinition{term length}\emph{length} of a term $t\in A^*$ is denoted $|t|\in\NN$.
\end{definition}
The reader might wish to review the discussion of the type parameter graph from \SecRef{type parameter graph}. In abstract algebra, the \index{directed graph!Cayley graph}\index{Cayley graph!free monoid}\emph{Cayley graph} of a monoid plays a similar role, and we will discover many parallels between the two. We will describe the general construction in the next section, but for now we'll just look at the Cayley graph of the free monoid~$A^*$. The vertices in this graph are the terms of $A^*$, and the vertex for the \index{identity element!Cayley graph}identity element is the distinguished \index{root vertex!Cayley graph}root vertex. Then, for each vertex~$t$ and each generator $g\in A$, we also add an edge with \index{source vertex!Cayley graph}source~$t$ and \index{destination vertex!Cayley graph}destination~$tg$, and label this edge ``$g$''. (This is sometimes called the \emph{right} Cayley graph, and the left Cayley graph can then be defined in the opposite way by joining $t$ with $gt$.)
\begin{example}
The free monoid with two generators $\{a,b\}^*$ consists of all finite strings made up of $a$ and $b$. Two typical elements are $abba$ and $bab$, and their concatenation is $abba\cdot bab=abbabab$. Unlike $(\NN,+,0)$, this monoid operation is not \index{commutative operation}\emph{commutative}, so for example, $abba\cdot bab\neq bab\cdot abba$. The Cayley graph of $\{a,b\}^*$ is an infinite binary \index{tree!Cayley graph of free monoid}tree. Every vertex has two successors, corresponding to multiplication on the right by $a$~and~$b$, respectively:
\begin{center}
\begin{tikzpicture}[x=1.6cm,y=1.2cm]
\node (E) [root] at (0,0) {\strut $\varepsilon$};
\node (A) [interior] at (-2,0) {\strut $a$};
\node (B) [interior] at (2,0) {\strut $b$};
\node (AA) [interior] at (-2,2) {\strut $aa$};
\node (AB) [interior] at (-2,-2) {\strut $ab$};
\node (BA) [interior] at (2,2) {\strut $ba$};
\node (BB) [interior] at (2,-2) {\strut $bb$};
\node (AAA) [interior] at (-3,2) {\strut $aaa$};
\node (AAB) [interior] at (-1,2) {\strut $aab$};
\node (ABA) [interior] at (-3,-2) {\strut $aba$};
\node (ABB) [interior] at (-1,-2) {\strut $abb$};
\node (BAA) [interior] at (1,2) {\strut $baa$};
\node (BAB) [interior] at (3,2) {\strut $bab$};
\node (BBA) [interior] at (1,-2) {\strut $bba$};
\node (BBB) [interior] at (3,-2) {\strut $bbb$};
\node (AAAA) [] at (-3,3) {\strut $\ldots$};
\node (AAAB) [] at (-3,1) {\strut $\ldots$};
\node (AABA) [] at (-1,3) {\strut $\ldots$};
\node (AABB) [] at (-1,1) {\strut $\ldots$};
\node (ABAA) [] at (-3,-1) {\strut $\ldots$};
\node (ABAB) [] at (-3,-3) {\strut $\ldots$};
\node (ABBA) [] at (-1,-1) {\strut $\ldots$};
\node (ABBB) [] at (-1,-3) {\strut $\ldots$};
\node (BAAA) [] at (1,3) {\strut $\ldots$};
\node (BAAB) [] at (1,1) {\strut $\ldots$};
\node (BABA) [] at (3,3) {\strut $\ldots$};
\node (BABB) [] at (3,1) {\strut $\ldots$};
\node (BBAA) [] at (1,-1) {\strut $\ldots$};
\node (BBAB) [] at (1,-3) {\strut $\ldots$};
\node (BBBA) [] at (3,-1) {\strut $\ldots$};
\node (BBBB) [] at (3,-3) {\strut $\ldots$};
\path [arrow] (E) edge [above] node {\scriptsize{$a$}} (A);
\path [arrow] (E) edge [above] node {\scriptsize{$b$}} (B);
\path [arrow] (A) edge [left] node {\scriptsize{$a$}} (AA);
\path [arrow] (A) edge [left] node {\scriptsize{$b$}} (AB);
\path [arrow] (B) edge [right] node {\scriptsize{$a$}} (BA);
\path [arrow] (B) edge [right] node {\scriptsize{$b$}} (BB);
\path [arrow] (AA) edge [above] node {\scriptsize{$a$}} (AAA);
\path [arrow] (AA) edge [above] node {\scriptsize{$b$}} (AAB);
\path [arrow] (AB) edge [below] node {\scriptsize{$a$}} (ABA);
\path [arrow] (AB) edge [below] node {\scriptsize{$b$}} (ABB);
\path [arrow] (BA) edge [above] node {\scriptsize{$a$}} (BAA);
\path [arrow] (BA) edge [above] node {\scriptsize{$b$}} (BAB);
\path [arrow] (BB) edge [below] node {\scriptsize{$a$}} (BBA);
\path [arrow] (BB) edge [below] node {\scriptsize{$b$}} (BBB);
\path [arrow] (AAA) edge [left] node {\scriptsize{$a$}} (AAAA);
\path [arrow] (AAA) edge [left] node {\scriptsize{$b$}} (AAAB);
\path [arrow] (AAB) edge [right] node {\scriptsize{$a$}} (AABA);
\path [arrow] (AAB) edge [right] node {\scriptsize{$b$}} (AABB);
\path [arrow] (ABA) edge [left] node {\scriptsize{$a$}} (ABAA);
\path [arrow] (ABA) edge [left] node {\scriptsize{$b$}} (ABAB);
\path [arrow] (ABB) edge [right] node {\scriptsize{$a$}} (ABBA);
\path [arrow] (ABB) edge [right] node {\scriptsize{$b$}} (ABBB);
\path [arrow] (BAA) edge [left] node {\scriptsize{$a$}} (BAAA);
\path [arrow] (BAA) edge [left] node {\scriptsize{$b$}} (BAAB);
\path [arrow] (BAB) edge [right] node {\scriptsize{$a$}} (BABA);
\path [arrow] (BAB) edge [right] node {\scriptsize{$b$}} (BABB);
\path [arrow] (BBA) edge [left] node {\scriptsize{$a$}} (BBAA);
\path [arrow] (BBA) edge [left] node {\scriptsize{$b$}} (BBAB);
\path [arrow] (BBB) edge [right] node {\scriptsize{$a$}} (BBBA);
\path [arrow] (BBB) edge [right] node {\scriptsize{$b$}} (BBBB);
\end{tikzpicture}
\end{center}
Every term defines a \index{path!Cayley graph}path in the Cayley graph; we start from the root $\varepsilon$, and follow a series of successive ``$a$'' or ``$b$'' edges until we arrive at the vertex representing our term. In general, every vertex in the Cayley graph has the same number of successors, equal to the number of elements in our generating set~$A$.
\end{example}
Before we continue, we need a shorthand where $xxx$ becomes $x^3$, and so on:
\begin{definition}
Let $M$ be a monoid. If $x\in M$ and $n\in\NN$, we can define $x^n$ as the ``$n$th power'' of $x$:
\[
x^n = \begin{cases}
\varepsilon&n=0\\
x&n=1\\
x^{n-1}\cdot x&n>1
\end{cases}
\]
\end{definition}
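This recursive definition translates directly into code. Here is a minimal Swift sketch; the protocol \texttt{Monoid} and the names \texttt{combined(with:)} and \texttt{power} are our own illustrative choices, not standard library API:
\begin{Verbatim}
// A sketch of the monoid axioms as a Swift protocol.
protocol Monoid {
    static var identity: Self { get }        // the identity element
    func combined(with other: Self) -> Self  // the binary operation
}

extension Monoid {
    // The "nth power": identity if n == 0, otherwise x^(n-1) . x.
    func power(_ n: Int) -> Self {
        precondition(n >= 0)
        return n == 0 ? Self.identity : power(n - 1).combined(with: self)
    }
}

// The free monoid is string concatenation, with the empty string
// as the identity element.
extension String: Monoid {
    static var identity: String { "" }
    func combined(with other: String) -> String { self + other }
}

// "ab".power(3) == "ababab"
\end{Verbatim}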
The choice of notation is justified by the following ``law of exponents.''
\begin{proposition}\label{monoid exp law}
Let $M$ be a monoid. If $x\in M$ is some element, then $x^m\cdot x^n=x^{m+n}$ for all $m$, $n\in\NN$.
\end{proposition}
\begin{proof}
\index{induction}
\index{proof by induction|see{induction}}
We start with a fixed but arbitrary $m\in\NN$, and then prove the result by \index{induction}induction on $n\in\NN$ (\SecRef{generic signature validity}).
\BaseCase We need to show that $x^m\cdot x^0=x^{m+0}$. We make use of the definition of~$x^0$, the identity element axiom of~$M$, and the fact that $m=m+0$ in $\NN$:
\[x^m\cdot x^0=x^m\cdot\varepsilon=x^{m}=x^{m+0}\]
\InductiveStep Now suppose $n>0$, and assume the induction hypothesis:
\[x^m\cdot x^{n-1}=x^{m+n-1}\]
Then, we use the monoid operation of $M$ to multiply both sides by $x$:
\[(x^m\cdot x^{n-1})\cdot x =x^{m+n-1}\cdot x\]
We use the associativity of $M$ and the definition of $x^n$ to rewrite the left-hand side:
\[
(x^m\cdot x^{n-1})\cdot x = x^m\cdot (x^{n-1}\cdot x) = x^m\cdot x^{n-1+1}=x^m\cdot x^n
\]
Finally, we use the definition of $x^n$ to rewrite the right-hand side:
\[
x^{m+n-1}\cdot x = x^{m+n-1+1}=x^{m+n}
\]
Putting everything together, we get $x^m\cdot x^n=x^{m+n}$. By induction, this is actually true for all $n\in\NN$, and since we also started with an arbitrary $m\in\NN$, we're done.
\end{proof}
\begin{example}\label{monoid n graph}
The free monoid over one generator $\{a\}^*$ is \index{monoid isomorphism}\emph{isomorphic} to $(\NN,+,0)$. We will talk about isomorphisms later, but essentially this means that both define the same object, up to a choice of notation.
Indeed, every non-zero element of $\NN$ can be uniquely expressed as a sum of $1$'s. For example, $3=1+1+1$ and $5=1+1+1+1+1$. This means we can write $\varepsilon$ instead of~$0$, $a$ instead of~$1$, and $\cdot$ instead of~$+$. Now, the equation $3+5=8$ translates into $aaa\cdot aaaaa=aaaaaaaa$, or using exponent notation, $a^3\cdot a^5 = a^8$.
Next, consider the \index{Cayley graph!free monoid}Cayley graph of $\{a\}^*$. Here, every vertex has one successor, because the generating set has a single element~$a$. This looks like the type parameter graph of protocol~\tN\ from \ExRef{protocol n graph}, and we will see why in \SecRef{monoidsasprotocols}:
\begin{center}
\begin{tikzpicture}[x=2cm]
\node (T) [root] {\strut $\varepsilon$};
\node (TA) [interior, right=of T] {\strut $a$};
\node (TAA) [interior, right=of TA] {\strut $a^2$};
\node (TAAA) [interior, right=of TAA] {\strut $a^3$};
\node (Rest) [right=of TAAA] {$\cdots$};
\path [arrow] (T) edge [above] node {\footnotesize{$a$}} (TA);
\path [arrow] (TA) edge [above] node {\footnotesize{$a$}} (TAA);
\path [arrow] (TAA) edge [above] node {\footnotesize{$a$}} (TAAA);
\path [arrow] (TAAA) edge [above] node {\footnotesize{$a$}} (Rest);
\end{tikzpicture}
\end{center}
\end{example}
\begin{example}
The free monoid over an \index{empty set}empty set $\varnothing^*$ is $\{\varepsilon\}$, the one-element set that consists of the empty string. This is called the \IndexDefinition{trivial monoid}\emph{trivial monoid}. Its \index{Cayley graph!trivial monoid}Cayley graph is a single vertex.
\end{example}
\section{Finitely-Presented Monoids}\label{finitely presented monoids}
Every element of a free monoid has a \emph{unique} expression as a product of generators. Finitely-presented monoids are more general, because multiple distinct combinations of generators can name the same element. To model this phenomenon, we start from a finite set of \index{rule|see{rewrite rule}}\IndexDefinition{rewrite rule}\emph{rewrite rules}, which then generate an equivalence relation on terms. We first turned to equivalence relations in \SecRef{valid type params}, to understand the same-type requirements of a generic signature; the construction below is similar.
A \IndexDefinition{monoid presentation}\emph{monoid presentation} is a list of generators and rewrite rules:
\[\Pres{ \underbrace{a_1,\,\ldots,\, a_m}_{\text{generators}} }{ \underbrace{u_1 \sim v_1,\,\ldots,\, u_n \sim v_n}_{\text{rewrite rules}}} \]
Also, if $A := \{a_1,\ldots, a_m\}$ and $R := \{(u_1,v_1),\,\ldots,\,(u_n,v_n)\}\subseteq A^* \times A^*$ are finite sets, we can form the monoid presentation~$\AR$. The equivalence relation that~$R$ generates on the terms of~$A^*$ is denoted by $x\sim_R y$, or $x \sim y$ when $R$ is clear from context. Note that when we write $x=y$, we always mean that $x$ and $y$ are \emph{identical} in $A^*$, not just equivalent in $\AR$.
We will describe the intuitive model behind the \index{term equivalence relation}term equivalence relation first, and then give a rigorous definition in the next section. Syntactically, a rewrite rule $(u,v)\in R$ is a pair of terms. Semantically, a rewrite rule tells us that if we find $u$ anywhere within a term, we can replace this subterm with $v$, and obtain another equivalent term. Term equivalence is symmetric, so we can also replace $v$ with $u$. Finally, it is transitive, so we can iterate these rewrite steps any number of times to prove an equivalence.
A monoid presentation $\AR$ then defines a \IndexDefinition{finitely-presented monoid}\emph{finitely-presented monoid}. The \index{set!finitely-presented monoid}elements of a finitely-presented monoid are the \index{equivalence class!terms}equivalence classes of $\sim_R$, the \index{binary operation!finitely-presented monoid}binary operation is concatenation of terms, and the \index{identity element!finitely-presented monoid}identity element is the equivalence class of $\varepsilon$.
\begin{example}
Finitely-presented monoids generalize the free monoids, because every free monoid (with a finite generating set) is also finitely presented if we start from an empty set of rewrite rules. In this case $x\sim y$ if and only if $x=y$.
\end{example}
\begin{example}\label{monoid z4 example}
Consider the finite set $\{0,1,2,3\}$ with the binary operation $+$ given by the following table. The operation is \index{modular arithmetic}addition modulo 4. We call this monoid $\mathbb{Z}_4$. This method of specifying a finite monoid is called a \IndexDefinition{Cayley table}\emph{Cayley table}:
\begin{center}
\begin{tabular}{c|cccc}
$+$&0&1&2&3\\
\hline
0&0&1&2&3\\
1&1&2&3&0\\
2&2&3&0&1\\
3&3&0&1&2
\end{tabular}
\end{center}
If we write $a$ instead of $1$, $\varepsilon$ instead of $0$, and $\cdot$ instead of $+$, we can also describe $\mathbb{Z}_4$ as a finitely-presented monoid with one generator and one rewrite rule:
\[\mathbb{Z}_4 := \Pres{a}{a^4\sim\varepsilon}\]
The rule $a^4\sim\varepsilon$ allows us to insert or delete $aaaa$, which gives us this general principle:
\[a^m\sim a^n\qquad\text{if and only if}\qquad m\equiv n\pmod 4\]
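For instance, deleting one occurrence of $aaaa$ shows that $a^6=a^2\cdot a^4\sim a^2\cdot\varepsilon=a^2$, and indeed $6\equiv 2\pmod{4}$.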
Therefore, the rule partitions the terms of $\{a\}^*$ into four infinite equivalence classes:
\begin{gather*}
\{\varepsilon,\, a^4,\, a^8, \ldots,\, a^{4k},\, \ldots\}\\
\{a,\, a^5,\, a^9,\, \ldots,\, a^{4k+1},\, \ldots\}\\
\{a^2,\, a^6,\, a^{10},\, \ldots,\, a^{4k+2},\, \ldots\}\\
\{a^3,\, a^7,\, a^{11},\, \ldots,\, a^{4k+3},\, \ldots\}
\end{gather*}
In the \index{Cayley graph!finitely-presented monoid}Cayley graph of a finitely-presented monoid, the \index{vertex!Cayley graph}vertices are \index{equivalence class!terms}equivalence classes of terms. The \index{edge!Cayley graph}edge set is now defined on these equivalence classes, so for each equivalence class $\EquivClass{t}$ and generator $g\in A$, we have an edge with source~$\EquivClass{t}$ and destination~$\EquivClass{tg}$. We will see later that this is well-defined, independent of the choice of representative term~$t$. For $\mathbb{Z}_4$, we label each vertex with the shortest term in its equivalence class. The Cayley graph of $\mathbb{Z}_4$ is the same as the graph we associated with the \texttt{Z4} protocol from \ExRef{protocol z4 graph}:
\begin{center}
\begin{tikzpicture}
\node (T) [root] at (1,2) {\strut $\varepsilon$};
\node (TA) [interior] at (2,1) {\strut $a$};
\node (TAA) [interior] at (1,0) {\strut $a^2$};
\node (TAAA) [interior] at (0,1) {\strut $a^3$};
\path [arrow, bend left] (T) edge [right] node {\footnotesize{$a$}} (TA);
\path [arrow, bend left] (TA) edge [right] node {\footnotesize{$a$}} (TAA);
\path [arrow, bend left] (TAA) edge [left] node {\footnotesize{$a$}} (TAAA);
\path [arrow, bend left] (TAAA) edge [left] node {\footnotesize{$a$}} (T);
\end{tikzpicture}
\end{center}
\end{example}
\begin{example}\label{commutative free monoid example}
We saw that the free monoid $\{a,b\}^*$ is not \index{commutative operation}commutative. If we wish to make it commutative in the ``most general'' way, we can consider the \IndexDefinition{free commutative monoid}\emph{free commutative monoid} on two generators:
\[\Pres{a,b}{ba\sim ab}\]
The rule says we can permute the $a$'s and $b$'s within a term, but nothing else, so in particular the terms in an equivalence class must all have the same \index{term length}length. For this reason, every equivalence class is finite, and furthermore, we can transform an arbitrary term into a ``canonical form'' where the $a$'s come first. This allows us to calculate by summing exponents (in fact, this monoid is isomorphic to the \index{Cartesian product}Cartesian product $\NN\times\NN$, whose elements are ordered pairs added component-wise). For example:
\[a^2 b^3 \cdot a^7 b = a^9 b^4\]
The existence of this canonical form means that every equivalence class contains a unique term of the form $a^mb^n$ for $m$, $n\in\NN$, so we will use these as the labels in our \index{Cayley graph}Cayley graph. Now, multiplication by $a$ increments the first exponent, and multiplication by $b$ increments the second. If we arrange the vertices on a grid, then we see that each vertex is joined with the vertex immediately below and to the right:
\begin{center}
\begin{tikzpicture}[x=1.8cm,y=1.8cm]
\node (A0B0) [root] at (0,4) {\strut $\varepsilon$};
\node (A1B0) [interior] at (1,4) {\strut $a$};
\node (A2B0) [interior] at (2,4) {\strut $a^2$};
\node (A3B0) [interior] at (3,4) {\strut $a^3$};
\node (A4B0) [] at (4,4) {\strut $\ldots$};
\node (A0B1) [interior] at (0,3) {\strut $b$};
\node (A1B1) [interior] at (1,3) {\strut $ab$};
\node (A2B1) [interior] at (2,3) {\strut $a^2b$};
\node (A3B1) [interior] at (3,3) {\strut $a^3b$};
\node (A4B1) [] at (4,3) {\strut $\ldots$};
\node (A0B2) [interior] at (0,2) {\strut $b^2$};
\node (A1B2) [interior] at (1,2) {\strut $ab^2$};
\node (A2B2) [interior] at (2,2) {\strut $a^2b^2$};
\node (A3B2) [interior] at (3,2) {\strut $a^3b^2$};
\node (A4B2) [] at (4,2) {\strut $\ldots$};
\node (A0B3) [interior] at (0,1) {\strut $b^3$};
\node (A1B3) [interior] at (1,1) {\strut $ab^3$};
\node (A2B3) [interior] at (2,1) {\strut $a^2b^3$};
\node (A3B3) [interior] at (3,1) {\strut $a^3b^3$};
\node (A4B3) [] at (4,1) {\strut $\ldots$};
\node (A0B4) [] at (0,0) {\strut $\ldots$};
\node (A1B4) [] at (1,0) {\strut $\ldots$};
\node (A2B4) [] at (2,0) {\strut $\ldots$};
\node (A3B4) [] at (3,0) {\strut $\ldots$};
\foreach \a in {0,1,2,3} {
\pgfmathtruncatemacro\aa{\a+1}
\foreach \b in {0,1,2,3} {
\pgfmathtruncatemacro\bb{\b+1}
\path [arrow] (A\a B\b) edge [above] node {\scriptsize{$a$}} (A\aa B\b);
\path [arrow] (A\a B\b) edge [left] node {\scriptsize{$b$}} (A\a B\bb);
}
}
\end{tikzpicture}
\end{center}
This Cayley graph (as well as the one for $\mathbb{Z}_4$) has something we didn't observe in free monoids: multiple \index{path!Cayley graph}paths with the same source and destination. For example, because $ababab\sim aaabbb$, we can start from~$\varepsilon$ and then follow the edges by reading either term from left to right. In either case, we end up at $a^3b^3$. The equivalence class of $a^3b^3$ describes the ``walking routes'' that take us to the bottom right corner of the grid.
\end{example}
\newcommand{\Bicyclic}{\Pres{a, b}{ab\sim\varepsilon}}
\begin{example}\label{bicyclic}
For the next example, we also start with two generators and a single rewrite rule, but the resulting monoid looks rather different.
If we're given a string of ``\texttt{(}'' and ``\texttt{)}'' and we wish to determine if the parentheses are balanced, we can repeatedly delete occurrences of the substring ``\texttt{()}'' until none remain. If we end up with the empty string, then we know that every opening parenthesis was matched by a closing parenthesis in the original string. Now, write $a$ instead of ``\texttt{(}'' and $b$ instead of ``\texttt{)}''. This is the \IndexDefinition{bicyclic monoid}\emph{bicyclic monoid}, where balanced strings of parentheses are precisely those terms equivalent to~$\varepsilon$:
\[\Bicyclic\]
Consider the term $baaba$, which contains $ab$ as a subterm. Our rule allows us to replace this subterm with~$\varepsilon$, thus $baaba\sim baa$. We can then insert $ab$ in a different position, so for example $baa\sim babaa$. Equivalence is \index{transitive relation}transitive, so we also have $baaba\sim babaa$. Note that this monoid is not commutative, for instance $ab\not\sim ba$. In this monoid, every equivalence class has a unique term of minimum \index{term length}length, the one without any $ab$'s. This term has the form $b^ma^n$ for $m$, $n\in\NN$, so we will label our vertices with those. As in the previous example, we arrange the vertices of the \index{Cayley graph}Cayley graph on a grid:
\begin{center}
\begin{tikzpicture}[x=1.8cm,y=1.8cm]
\node (B0A0) [root] at (0,4) {\strut $\varepsilon$};
\node (B0A1) [interior] at (1,4) {\strut $a$};
\node (B0A2) [interior] at (2,4) {\strut $a^2$};
\node (B0A3) [interior] at (3,4) {\strut $a^3$};
\node (B0A4) [] at (4,4) {\strut $\ldots$};
\node (B1A0) [interior] at (0,3) {\strut $b$};
\node (B1A1) [interior] at (1,3) {\strut $ba$};
\node (B1A2) [interior] at (2,3) {\strut $ba^2$};
\node (B1A3) [interior] at (3,3) {\strut $ba^3$};
\node (B1A4) [] at (4,3) {\strut $\ldots$};
\node (B2A0) [interior] at (0,2) {\strut $b^2$};
\node (B2A1) [interior] at (1,2) {\strut $b^2a$};
\node (B2A2) [interior] at (2,2) {\strut $b^2a^2$};
\node (B2A3) [interior] at (3,2) {\strut $b^2a^3$};
\node (B2A4) [] at (4,2) {\strut $\ldots$};
\node (B3A0) [interior] at (0,1) {\strut $b^3$};
\node (B3A1) [interior] at (1,1) {\strut $b^3a$};
\node (B3A2) [interior] at (2,1) {\strut $b^3a^2$};
\node (B3A3) [interior] at (3,1) {\strut $b^3a^3$};
\node (B3A4) [] at (4,1) {\strut $\ldots$};
\node (B4A0) [] at (0,0) {\strut $\ldots$};
\foreach \a in {0,1,2,3} {
\pgfmathtruncatemacro\aa{\a+1}
\foreach \b in {0,1,2,3} {
\path [arrow, bend left] (B\b A\a) edge [above] node {\scriptsize{$a$}} (B\b A\aa);
\path [arrow, bend left] (B\b A\aa) edge [below] node {\scriptsize{$b$}} (B\b A\a);
}
}
\foreach \b in {0,1,2,3} {
\pgfmathtruncatemacro\bb{\b+1}
\path [arrow] (B\b A0) edge [left] node {\scriptsize{$b$}} (B\bb A0);
}
\end{tikzpicture}
\end{center}
To understand the edge relation better, observe that if we multiply any term by $a$ on the right, we can then multiply by $b$ to ``cancel out'' the $a$ and end up back where we started. On the other hand, if there are no more $a$'s to cancel on the right, adding a new~$b$ takes us into a new ``state'' from which we cannot return. There is a unique shortest path from $\varepsilon$ to every other vertex; it is the one that does not contain $ab$.
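In fact, the normal form $b^ma^n$ of a term can be computed with a single left-to-right scan, which amounts to repeatedly deleting occurrences of $ab$. Here is a minimal Swift sketch of this procedure; the function name is our own:
\begin{Verbatim}
// Reduce a term over {a, b} to its normal form b^m a^n in the
// bicyclic monoid. Each "b" either cancels a pending "a" (deleting
// an occurrence of the subterm ab), or becomes an unmatched leading b.
func bicyclicNormalForm(_ term: String) -> (m: Int, n: Int) {
    var m = 0  // unmatched b's on the left
    var n = 0  // unmatched a's on the right
    for generator in term {
        switch generator {
        case "a": n += 1
        case "b": if n > 0 { n -= 1 } else { m += 1 }
        default: preconditionFailure("unknown generator")
        }
    }
    return (m, n)
}

// bicyclicNormalForm("baaba") == (m: 1, n: 2), that is, baaba ~ baa.
// A term is balanced (equivalent to the identity element) exactly
// when m == 0 and n == 0.
\end{Verbatim}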
\end{example}
\begin{example}
Not every ``elementary'' monoid is finitely presented, one example being the \index{natural numbers!multiplication}natural numbers under \emph{multiplication}. This is a consequence of these four properties of the natural numbers:
\begin{enumerate}
\item Every non-zero \index{prime number}natural number has a unique representation as a product of prime numbers (\emph{The Fundamental Theorem of Arithmetic}).
\item There are infinitely many \index{prime number}prime numbers (\emph{Euclid's Theorem}).
\item Multiplication of natural numbers is \index{commutative operation}commutative.
\item Multiplication by zero always yields zero.
\end{enumerate}
These give us an \index{infinite monoid presentation}\emph{infinite} presentation of $(\NN,\cdot,1)$. This monoid is generated by zero and all the prime numbers: $\{0,\,2,\,3,\,5,\,7,\,11,\,\ldots\}$. Then, we add the rule $0\cdot 0\sim 0$ because zero times zero is zero, followed by two infinite sets of rewrite rules:
\begin{itemize}
\item For every pair of distinct prime numbers $p$ and $q$, we have $p\cdot q\sim q\cdot p$, so our binary operation is commutative.
\item For every prime number $p$, we have $0\cdot p\sim 0$ and $p\cdot 0\sim 0$, so that zero behaves like the \index{zero element}zero element.
\end{itemize}
The unique prime factorization of a natural number is really a term written using these generators, for example $84=2^2\cdot 3\cdot 7$. This monoid is a far more intricate object than $(\NN,+,0)$, which we recall was just the free monoid with one generator.
\end{example}
\paragraph{Mathematical aside.}
A typical course in abstract algebra starts with the study of \emph{groups}, which are monoids where each element $x$ has an inverse element $x^{-1}$, such that $x\cdot x^{-1}=x^{-1}\cdot x=\varepsilon$. Groups have a lot more structure than monoids, all because of this one additional axiom. The natural numbers under addition are not a group, but if we instead consider the set of all integers $\mathbb{Z}$, then $\mathbb{Z}$ is a group under addition, because each $n\in\mathbb{Z}$ has an inverse element: $n+(-n)=(-n)+n=0$. Addition is commutative, and a group with a \index{commutative operation}commutative binary operation is also called an \index{Abelian group}\emph{Abelian group}.
\index{group}
\index{ring}
\index{field}
\index{integers}
\index{rational numbers}
\index{complex numbers}
\index{quaternions}
\index{matrices}
\index{polynomials}
The next level of structure is the \emph{ring}. A ring is a set $R$ with \emph{two} binary operations, denoted $+$ and $\cdot$, and two special elements, $0$~and~$1$. The ring axioms state that $(R,+,0)$ is an Abelian group, $(R,\cdot,1)$ is a monoid, and $+$ and $\cdot$ are related via the \index{distributive law}\emph{distributive law}, so $a(b+c)=ab+ac$ and $(a+b)c=ac+bc$. With the usual addition and multiplication operations, $\mathbb{Z}$ becomes a ring. If every non-zero element has a multiplicative inverse, we have a \emph{division ring}. If we also say that multiplication is commutative, we have a \emph{field}. In a field, the non-zero elements define an Abelian group under multiplication. The set of rational numbers $\mathbb{Q}$ is a field.
This tower of abstractions also describes the complex numbers, quaternions, matrices, polynomials, modular arithmetic, and other ``number-like'' objects which commonly arise in various applications of mathematics.
\section{Equivalence of Terms}\label{rewrite graph}
The \index{Cayley graph}Cayley graph of a monoid presentation helped us understand the relationships between equivalence classes. To better see the structure within a single equivalence class, we follow \cite{SQUIER1994271} and introduce another directed graph, called the \emph{rewrite graph} of a monoid presentation $\AR$. We begin by describing the edges of our graph.
\begin{definition}\label{rewrite step def} A monoid presentation $\AR$ generates a set of \emph{rewrite steps}:
\begin{enumerate}
\item A \IndexDefinition{rewrite step}rewrite step is an \index{ordered tuple}ordered tuple of terms, written \index{$\Rightarrow$}\index{$\Rightarrow$!z@\igobble|seealso{rewrite path, rewrite step}}$x(u\Rightarrow v)y$, where $x$, $y\in A^*$, and one of $(u,v)$ or $(v,u)\in R$.
\item If $(u,v)\in R$, we say that $x(u\Rightarrow v)y$ is a \IndexDefinition{positive rewrite step}\emph{positive} rewrite step. If $(v,u)\in R$, then $x(u\Rightarrow v)y$ is a \IndexDefinition{negative rewrite step}\emph{negative} rewrite step.
\item The terms $x$ and $y$ are called the \IndexDefinition{left whisker}left and \IndexDefinition{right whisker}right \emph{whiskers}, respectively. If $x$ or $y$ is identically~$\varepsilon$, we omit them in the notation.
\item For rewrite steps to serve as edges in our graph, we define the \emph{source} and \emph{destination} of a rewrite step $s:= x(u\Rightarrow v)y$ as the following terms of~$A^*$:
\begin{gather*}
\Src(s) := xuy\\
\Dst(s) := xvy
\end{gather*}
\end{enumerate}
\end{definition}
The rewrite step $x(u\Rightarrow v)y$ represents the transformation of $xuy$ into $xvy$ via the replacement of $u$ with $v$, while leaving the whiskers $x$ and $y$ unchanged. (We can assume without loss of generality that $u\neq v$, ruling out redundant steps where $\Src(s)=\Dst(s)$.) We can draw a picture:
\begin{ceqn}
\[
\begin{array}{|c|c|c|}
\hline
\multicolumn{3}{|c|}{xuy}\\
\hline
\hline
&u&\\
x&\Downarrow&y\\
&v&\\
\hline
\hline
\multicolumn{3}{|c|}{xvy}\\
\hline
\end{array}
\]
\end{ceqn}
\begin{definition}
The \IndexDefinition{rewrite graph}\emph{rewrite graph} of a \index{monoid presentation!rewrite graph}monoid presentation $\AR$ has the terms of~$A^*$ as \index{vertex!rewrite graph}vertices, and rewrite steps as \index{edge!rewrite graph}edges. Except in the trivial case where $A=\varnothing$, this graph is \index{infinite graph!rewrite graph}infinite, but we will only look at small \index{subgraph!rewrite graph}subgraphs at any one time. For example, every rewrite step determines a subgraph with two vertices and one edge:
\begin{center}
\begin{tikzcd}[column sep=huge]
xuy\arrow[r, Rightarrow, "x(u\Rightarrow v)y"] &xvy
\end{tikzcd}
\end{center}
\end{definition}
From the rewrite graph, we will then define $x\sim_R y$ to mean that this graph has a \index{path!rewrite graph}\emph{path} with source~$x$ and destination~$y$. The following definition of a rewrite path is identical to the general case of a path in a \index{directed graph!rewrite graph}directed graph from \SecRef{type parameter graph}.
\begin{definition}
A \IndexDefinition{rewrite path}\emph{rewrite path} consists of an initial term $t\in A^*$ together with a sequence of zero or more rewrite steps $(s_1,\ldots,s_n)$ which satisfy the following conditions:
\begin{enumerate}
\item If the rewrite path contains at least one rewrite step, the source of the first rewrite step must equal the initial term: $\Src(s_1)=t$.
\item If the rewrite path contains at least two steps, the source of each subsequent step must equal the destination of the preceding step: $\Src(s_{i+1})=\Dst(s_i)$ for $1\leq i<n$.
\end{enumerate}
Like a rewrite step, each rewrite path also has a source and destination. We define the source and destination of a rewrite path $p$ as follows, where $n$ is the \IndexDefinition{rewrite path length}length of~$p$:
\begin{gather*}
\Src(p):=t\\
\Dst(p):=\begin{cases}
\Dst(s_n)&\text{if $n>0$}\\
t&\text{if $n=0$}
\end{cases}
\end{gather*}
To each term $t\in A^*$ we can associate a rewrite path with an empty sequence of rewrite steps, called the \index{$1_t$}\index{$1_t$!z@\igobble|seealso{empty rewrite path}}\IndexDefinition{empty rewrite path}\emph{empty rewrite path} and denoted $1_t$. We have $\Src(1_t)=\Dst(1_t)=t$. (This is why a rewrite path must store its initial term, to specify the source and destination in the empty case.) A non-empty rewrite path $p$ can be written as a composition of steps: $p=s_1\circ\cdots\circ s_n$. We can also visualize a rewrite path as a path in the rewrite graph:
\begin{center}
\begin{tikzcd}[column sep=large]
\Src(p)\arrow[r, "s_1", Rightarrow]&\cdots\arrow[r, Rightarrow, "\cdots"]&\cdots\arrow[r, "s_n", Rightarrow]&\Dst(p)
\end{tikzcd}
\end{center}
\end{definition}
\newcommand{\BRule}{(ab\Rightarrow\varepsilon)}
\newcommand{\BRuleInv}{(\varepsilon\Rightarrow ab)}
\begin{example}\label{bicyclic 2}
Recall the bicyclic monoid $\Bicyclic$ from \ExRef{bicyclic}. We saw that $baaba\sim baa$ because we obtain $baa$ by deleting the subterm $ab$ from $baaba$. We can encode this with a rewrite step $ba\BRule a$:
\begin{center}
\begin{tikzcd}[column sep=huge]
baaba\arrow[r, Rightarrow, "ba\BRule a"] & baa
\end{tikzcd}
\end{center}
We also saw that $baa\sim babaa$. We can encode the transformation of $baa$ into $babaa$ with a rewrite step $b\BRuleInv aa$:
\begin{center}
\begin{tikzcd}[column sep=huge]
baa\arrow[r, Rightarrow, "b\BRuleInv aa"] & babaa
\end{tikzcd}
\end{center}
The destination of the first rewrite step is also the source of the second rewrite step. We can compose them to form the rewrite path $ba\BRule a\circ b\BRuleInv aa$. This rewrite path encodes the fact that $baaba\sim babaa$:
\begin{center}
\begin{tikzcd}[column sep=huge]
baaba\arrow[r, "ba\BRule a", Rightarrow]&baa\arrow[r, "b\BRuleInv aa", Rightarrow]&babaa
\end{tikzcd}
\end{center}
\end{example}
\paragraph{The algebra of rewriting.} To ensure that the term equivalence relation ultimately satisfies the necessary axioms, we introduce some algebraic operations on rewrite steps and paths, called inversion, composition, and whiskering.
\begin{definition}
The \IndexDefinition{inverse rewrite step}\emph{inverse} of a rewrite step $s:=x(u\Rightarrow v)y$, denoted $s^{-1}$, is defined as $x(v\Rightarrow u)y$, so we swap the $u$ and the $v$. The inverse of a positive rewrite step is negative, and vice versa, which means that the edges in the rewrite graph come in complementary pairs:
\begin{center}
\begin{tikzcd}[column sep=huge]
xuy\arrow[r, Rightarrow, "x(u\Rightarrow v)y", bend left] \arrow[r, Leftarrow, "x(v\Rightarrow u)y"', bend right] & xvy
\end{tikzcd}
\end{center}
We can also \IndexDefinition{inverse rewrite path}invert a rewrite \emph{path}, to get a new path which undoes the transformation described by the original path. The \index{empty rewrite path}empty rewrite path is its own inverse. For a non-empty rewrite path, we invert each rewrite step and perform them in reverse order:
\[p^{-1}:=\begin{cases}
1_t&\text{if $p=1_t$}\\
s_n^{-1}\circ\cdots\circ s_1^{-1}&\text{if $p=s_1\circ\cdots\circ s_n$}
\end{cases}
\]
When shown in the rewrite graph, the inverse path $p^{-1}$ is $p$ but backwards:
\begin{center}
\begin{tikzcd}[column sep=large]
\Src(p)\arrow[r, "s_1", Leftarrow]&\cdots\arrow[r, Leftarrow, "\cdots"]&\cdots\arrow[r, "s_n", Leftarrow]&\Dst(p)
\end{tikzcd}
\end{center}
\end{definition}
We can think of a rewrite step as a rewrite path with length~1. Then, if $p$ is \emph{either} a rewrite step or a rewrite path, we see that taking the inverse of $p$ interchanges its source and destination:
\begin{gather*}
\Src(p^{-1})=\Dst(p)\\
\Dst(p^{-1})=\Src(p)
\end{gather*}
\begin{example}\label{bicyclic 3}
We exhibited this rewrite path in \ExRef{bicyclic 2}, with source $baaba$ and destination $babaa$:
\begin{center}
\begin{tikzcd}[column sep=huge]
baaba\arrow[r, "ba\BRule a", Rightarrow]&baa\arrow[r, "b\BRuleInv aa", Rightarrow]&babaa
\end{tikzcd}
\end{center}
We can construct the inverse rewrite path from $babaa$ to $baaba$ by inverting both steps and swapping the order in which we apply them:
\begin{center}
\begin{tikzcd}[column sep=huge,row sep=large]
babaa\arrow[r, "b\BRule aa", Rightarrow]&baa\arrow[r, Rightarrow, "ba\BRuleInv a"]&baaba
\end{tikzcd}
\end{center}
\end{example}
\begin{definition}
\IndexDefinition{rewrite path composition}%
If $p_1$ and $p_2$ are two rewrite paths with $\Dst(p_1)=\Src(p_2)$, we define their \emph{composition} \index{$\circ$}\index{$\circ$!z@\igobble|seealso{rewrite path}}$p_1 \circ p_2$ as the rewrite path consisting of the initial term of~$p_1$, followed by the rewrite steps of~$p_1$, and finally the steps of~$p_2$. This gives us a new rewrite path with the same source as~$p_1$ and the same destination as~$p_2$:
\begin{gather*}
\Src(p_1 \circ p_2) = \Src(p_1)\\
\Dst(p_1 \circ p_2) = \Dst(p_2)
\end{gather*}
Shown as a diagram, composition joins two paths (the ``$=$'' in the middle is not an edge in the graph, but denotes that the two terms are identical):
\begin{center}
\begin{tikzcd}
\Src(p_1)\arrow[r, Rightarrow]&\cdots\arrow[r, Rightarrow]&(\Dst(p_1)=\Src(p_2))\arrow[r, Rightarrow]&\cdots\arrow[r, Rightarrow]&\Dst(p_2)
\end{tikzcd}
\end{center}
Empty rewrite paths \index{empty rewrite path}act as the identity with respect to composition:
\begin{gather*}
1_{\Src(p)}\circ p = p\\
p\circ 1_{\Dst(p)} = p
\end{gather*}
The inverse of the composition is the composition of the inverses, but flipped:
\[(p_1 \circ p_2)^{-1} = p_2^{-1} \circ p_1^{-1}\]
Finally, rewrite path composition is associative. If $p_3$ is another rewrite path such that $\Dst(p_2)=\Src(p_3)$:
\[p_1 \circ (p_2 \circ p_3) = (p_1 \circ p_2) \circ p_3\]
\end{definition}
\begin{example}\label{bicyclic 4}
We continue with \ExRef{bicyclic 3}. Consider these two rewrite paths in the bicyclic monoid, which we will call $p_1$ and $p_2$:
\begin{center}
\begin{tikzcd}[column sep=huge,row sep=large]
baabb\arrow[r, "ba\BRule b", Rightarrow]&bab\arrow[r, Rightarrow, "b\BRule"]&b\\
aabbb\arrow[r, "a\BRule bb", Rightarrow]&abb\arrow[r, Rightarrow, "\BRule b"]&b
\end{tikzcd}
\end{center}
The two paths have the same destination term~$b$, but different source terms. If we invert the second path and compose it with the first, we get the rewrite path $p_1 \circ p_2^{-1}$, which transforms $baabb$ into $aabbb$:
\begin{center}
\begin{tikzcd}[column sep=huge,row sep=large]
baabb\arrow[r, "ba\BRule b", Rightarrow]&bab\arrow[r, Rightarrow, "b\BRule"]&b\arrow[r, Rightarrow, "\BRuleInv b"]&abb\arrow[r, "a\BRuleInv bb", Rightarrow]&aabbb
\end{tikzcd}
\end{center}
\end{example}
\begin{definition}
If $s:=x(u\Rightarrow v)y$ and $z\in A^*$, we define the \index{$\WL$}\index{$\WL$!z@\igobble|seealso{rewrite path whiskering}}left and \index{$\WR$}\index{$\WR$!z@\igobble|seealso{rewrite path whiskering}}right \IndexDefinition{rewrite step whiskering}\emph{whiskering} action of $z$ on $s$. Whiskering extends a rewrite step by lengthening its whiskers:
\begin{gather*}
z\WL s := zx(u\Rightarrow v)y\\
s\WR z := x(u\Rightarrow v)yz
\end{gather*}
The \IndexDefinition{rewrite path whiskering}whiskering operation generalizes to rewrite paths as follows. First, we apply the monoid operation to the initial term, and then we whisker each rewrite step. If the path is empty, whiskering produces another empty rewrite path on a new term:
\begin{gather*}
z\WL 1_t = 1_{z\cdot t}\\
1_t\WR z = 1_{t\cdot z}
\end{gather*}
If the path is non-empty, the \index{distributive law}distributive law relates the whiskering of paths and steps:
\begin{gather*}
z\WL (s_1\circ\cdots\circ s_n) = (z\WL s_1)\circ\cdots\circ (z\WL s_n)\\
(s_1\circ\cdots\circ s_n)\WR z = (s_1\WR z)\circ\cdots\circ (s_n\WR z)
\end{gather*}
Here is a diagram for $z\WL p$ and $p\WR z$:
\begin{center}
\begin{tikzcd}[column sep=large]
z\cdot\Src(p)\arrow[r, "z\WL s_1", Rightarrow]&z\cdots\arrow[r, Rightarrow]&z\cdots\arrow[r, "z\WL s_n", Rightarrow]&z\cdot \Dst(p)\\
\Src(p)\cdot z\arrow[r, "s_1\WR z", Rightarrow]&\cdots z\arrow[r, Rightarrow]&\cdots z\arrow[r, "s_n\WR z", Rightarrow]&\Dst(p)\cdot z
\end{tikzcd}
\end{center}
Once again, if we let $p$ denote either a rewrite path or a single rewrite step, we see that whiskering is related to the monoid operation of $A^*$ in two ways:
\begin{gather*}
\Src(z\WL p)=z\cdot \Src(p)\\
\Dst(z\WL p)=z\cdot \Dst(p)\\[\medskipamount]
\Src(p\WR z)=\Src(p)\cdot z\\
\Dst(p\WR z)=\Dst(p)\cdot z
\end{gather*}
Also, if $z^\prime\in A^*$ is some other term:
\begin{gather*}
z\WL (z^\prime \WL p)=(z\cdot z^\prime) \WL p\\
(p\WR z)\WR z^\prime=p\WR (z\cdot z^\prime)
\end{gather*}
The left and right whiskering actions are compatible with each other:
\[(z\WL p)\WR z^\prime = z\WL(p\WR z^\prime)\]
Finally, the distributive law generalizes from rewrite steps to rewrite paths:
\begin{gather*}
z\WL (p_1 \circ p_2) = (z\WL p_1) \circ (z\WL p_2)\\
(p_1 \circ p_2)\WR z = (p_1\WR z) \circ (p_2\WR z)
\end{gather*}
\end{definition}
\begin{proposition}
Let $\AR$ be a monoid presentation. Let $P$ be the set of rewrite paths of $\AR$. The following axioms characterize~$P$:
\begin{enumerate}
\item (Base case) For each $(u,v)\in R$, there is a rewrite path $(u\Rightarrow v)\in P$.
\item (Closure under inversion) If $p\in P$, then $p^{-1}\in P$.
\item (Closure under composition) If $p_1$, $p_2\in P$ with $\Dst(p_1)=\Src(p_2)$, then $p_1\circ p_2\in P$.
\item (Closure under whiskering) If $p\in P$ and $z\in A^*$, then $z\WL p$ and $p\WR z\in P$.
\end{enumerate}
\end{proposition}
\begin{proof}
By \DefRef{rewrite step def} there is a positive rewrite step $(u\Rightarrow v)$ for each $(u,v)\in R$. Since every rewrite step is also a rewrite path of length~1, this establishes (1). Also, rewrite steps are closed under inverses and whiskering, which establishes (2) and (4) for rewrite paths of length~1. The general case of (2), (3) and (4) can be shown by induction on the \index{rewrite path length}length of the rewrite path.
\end{proof}
Recall that $x\sim_R y$ means our rewrite graph has a path from $x$~to~$y$.
\begin{proposition}
Let $\AR$ be a monoid presentation. Then $\sim_R$ is an equivalence relation on $A^*$.
\end{proposition}
\begin{proof}
Let $P$ be the set of rewrite paths of $\AR$. We check each axiom in turn.
\begin{enumerate}
\item \index{reflexive relation}(Reflexivity) For each $x\in A^*$, there is an \index{empty rewrite path}empty rewrite path $1_x\in P$. Since $\Src(1_x)=\Dst(1_x)=x$, it follows that $x\sim x$.
\item \index{symmetric relation}(Symmetry) Assume $x\sim y$. Then there is a $p\in P$ such that $x=\Src(p)$ and $y=\Dst(p)$. Since $P$ is closed under taking inverses, $p^{-1}\in P$. But $y=\Src(p^{-1})$ and $x=\Dst(p^{-1})$, so we have $y\sim x$.
\item \index{transitive relation}(Transitivity) Assume $x\sim y$ and $y\sim z$. Then there is a $p_1\in P$ with $x=\Src(p_1)$ and $y=\Dst(p_1)$, and a $p_2\in P$ with $y=\Src(p_2)$ and $z=\Dst(p_2)$. We have $\Dst(p_1)=\Src(p_2)$, and $P$ is closed under composition so $p_1\circ p_2 \in P$. We also know that:
\begin{gather*}
\Src(p_1\circ p_2)=\Src(p_1)=x\\
\Dst(p_1\circ p_2)=\Dst(p_2)=z
\end{gather*}
Thus, $x\sim z$.
\end{enumerate}
Notice how we used inversion and composition above, but not whiskering.
\end{proof}
We will use whiskering to show that $\sim$ is \emph{compatible} with term concatenation:
\begin{definition}\label{translation invariant def}
Let $M$ be a monoid. A relation $R\subseteq M\times M$ is \IndexDefinition{translation-invariant relation}\emph{translation-invariant} if for all $(x,y)\in R$ and $z\in M$, we have both $(zx,zy)$ and $(xz,yz)\in R$.
\end{definition}
A translation-invariant equivalence relation is known as a \IndexDefinition{monoid congruence}\emph{monoid congruence}.
\begin{example}
Consider the monoid $(\NN,+,0)$. The standard \index{linear order}linear order $<$ on \index{natural numbers!linear order}$\NN$ is translation-invariant (but not an equivalence relation). For instance $5<7$ also implies that $5+2<7+2$. (Hence we're using ``translation'' in the geometric sense.)
\end{example}
\begin{theorem}
Let $\AR$ be a monoid presentation. Then the term equivalence relation~$\sim_R$ is a monoid congruence.
\end{theorem}
\begin{proof}
We've seen that $\sim$ is an equivalence relation, so it remains to establish translation invariance. Let $P$ be the set of rewrite paths of $\AR$. Suppose that $x$, $y$, $z\in A^*$, and also that $x\sim y$, so we have $p\in P$ such that $x=\Src(p)$ and $y=\Dst(p)$. We must show $zx\sim zy$ and $xz\sim yz$. For the first claim, we notice that $z\WL p\in P$, so $zx\sim zy$:
\begin{gather*}
\Src(z\WL p)=z\cdot \Src(p)=zx\\
\Dst(z\WL p)=z\cdot \Dst(p)=zy
\end{gather*}
To show that $xz \sim yz$, consider $p\WR z$ instead.
\end{proof}
Thus, \emph{the set of equivalence classes} of $\sim_R$ has the structure of a monoid:
\begin{theorem}
Let $\AR$ be a monoid presentation. Term concatenation is \index{well-defined}well-defined on the \index{equivalence class!monoid operation}equivalence classes of $\sim_R$; that is, if we pick two representative terms and concatenate them together, the equivalence class of the result does not depend on the choice of representatives.
\end{theorem}
\begin{proof}
Suppose we're given $x$, $x^\prime$, $y$, $y^\prime\in A^*$ such that $x\sim x^\prime$ and $y\sim y^\prime$. We must show that $xy\sim x^\prime y^\prime$. By assumption, there exist a pair of rewrite paths $p_x$, $p_y$ such that:
\begin{gather*}
\Src(p_x)=x\\
\Dst(p_x)=x^\prime\\[\medskipamount]
\Src(p_y)=y\\
\Dst(p_y)=y^\prime
\end{gather*}
Now, we form the rewrite path $p := (p_x\WR y) \circ (x^\prime\WL p_y)$ by whiskering and composing $p_x$~and~$p_y$. The composition is valid:
\begin{gather*}
\Dst(p_x\WR y)=\Dst(p_x)\cdot y=x^\prime\cdot y\\
\Src(x^\prime\WL p_y)=x^\prime\cdot\Src(p_y)=x^\prime\cdot y
\end{gather*}
We also have:
\begin{gather*}
\Src(p) = \Src(p_x\WR y) = \Src(p_x)\cdot y = x\cdot y\\
\Dst(p) = \Dst(x^\prime\WL p_y) = x^\prime \cdot \Dst(p_y) = x^\prime\cdot y^\prime
\end{gather*}
The existence of this rewrite path establishes that $xy\sim x^\prime y^\prime$. A pictorial proof is shown in \FigRef{monoid operation well defined}.
\end{proof}
\begin{figure}\captionabove{Term concatenation is well-defined on the equivalence classes of $\sim$}\label{monoid operation well defined}
\begin{ceqn}
\[
\begin{array}{c}
\begin{tikzcd}
x\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime
\end{tikzcd}\\
\\
\begin{tikzcd}
y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
y^\prime
\end{tikzcd}
\end{array} \qquad \xrightarrow{whiskering} \qquad
\begin{array}{c}
\begin{tikzcd}
x\cdot y\arrow[d, Rightarrow]\\
\cdots \arrow[d, Rightarrow]\\
x^\prime\cdot y
\end{tikzcd}\\
\\
\begin{tikzcd}
x^\prime\cdot y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime\cdot y^\prime
\end{tikzcd}
\end{array} \qquad \xrightarrow{composition} \qquad
\begin{tikzcd}
x\cdot y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime\cdot y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime\cdot y^\prime
\end{tikzcd}
\]
\end{ceqn}
\end{figure}
\begin{example}\label{bicyclic 5}
We can apply this theorem to the bicyclic monoid $\Bicyclic$ from \ExRef{bicyclic}. Consider the four terms $x := baba$, $x^\prime := ba$, $y := a$, $y^\prime := aba$. We have $x\sim x^\prime$ and $y\sim y^\prime$ via these two rewrite paths:
\begin{gather*}
p_x := b\BRule a\\
p_y := \BRuleInv a
\end{gather*}
Now, $(p_x\WR a)\circ (ba\WL p_y)$ is a rewrite path from $babaa$ to $baaba$, showing that $xy\sim x^\prime y^\prime$:
\[b\BRule aa\circ ba\BRuleInv a\]
Our original example showed that the bicyclic monoid can encode the problem of matching parentheses, by checking if a term is equivalent to~$\varepsilon$. This was somewhat trivial, because we only looked within a single equivalence class, without making use of the monoid operation. A more impressive party trick is to use the bicyclic monoid to model the \emph{stack effects} of programs in a \IndexDefinition{stack language}\emph{stack language}. This encoding relies on the monoid operation.
A stack language program is a sequence of tokens, where each token is either a \emph{literal} or a \emph{word}. To run the program, we evaluate each token sequentially from left to right:
\begin{enumerate}
\item A literal is immediately pushed on the \index{stack}stack.
\item A word, which is analogous to a function in other languages, pops some number of input values from the stack, processes those values, and then pushes zero or more results back on the stack.
\end{enumerate}
To each token, we assign a \emph{stack effect}, which is a pair of natural numbers $(m,n)$ that encodes the number of inputs~$m$ and the number of outputs~$n$ that result from evaluating this token on the stack. It turns out that if we encode each stack effect as an element $b^m a^n$ in the bicyclic monoid, then the monoid operation gives us a composition operation on stack effects.
Consider a stack language whose literals are integers and whose words are as below.
\begin{center}
\begin{tabular}{lll}
\toprule
\textbf{Token}&\textbf{Description}&\textbf{Stack effect}\\
\midrule
A literal&Any integer value.&$a$\\
\texttt{neg}&Pop the top of the stack and push its negation.&$ba$\\
\texttt{+}&Pop two values from the stack and push their sum.&$b^2a$\\
\texttt{*}&Pop two values from the stack and push their product.&$b^2a$\\
\texttt{dup}&Duplicate the top of the stack.&$ba^2$\\
\texttt{.}&Pop the top of the stack and print it out.&$b$\\
\bottomrule
\end{tabular}
\end{center}
We can use this language to evaluate simple postfix arithmetic expressions:
\begin{quote}
\begin{verbatim}
5 dup * 3 neg + .
\end{verbatim}
\end{quote}
The above program has no overall effect on the stack; it will execute correctly when run on an empty stack, but if there were values on the stack, they remain unchanged. On the other hand, we can also have a program like ``\verb|dup * .|'' which expects at least one input value to be present, otherwise the evaluation of ``\texttt{dup}'' will attempt to pop the top of an empty stack. Finally, we can write a program like ``\verb|5 7 + 1|'' which expects no inputs, but leaves values behind.
We can determine the stack effect of each of these programs without evaluating them, as follows. We map each token to its stack effect using the above table, and then compose them together. The following closed form expression for the monoid operation is helpful here:
\[
b^i a^j \cdot b^k a^l = \begin{cases}
b^{i-j+k} a^l &\text{if $j<k$}\\
b^i a^{j-k+l} &\text{if $j\geq k$}
\end{cases}
\]
For example, we can calculate the stack effect of ``\verb|5 dup * 3 neg + .|'' as follows, confirming that this program has no effect on the stack:
\[a\cdot ba^2 \cdot b^2a \cdot a \cdot ba \cdot b^2a \cdot b = \varepsilon\]
Because the monoid operation is associative, it follows that the stack effect of the \emph{concatenation} of two programs is equal to the composition of their stack effects. For this reason, stack-based languages are also known as \IndexDefinition{concatenative language}\emph{concatenative languages}. Examples of stack-based languages include \index{Forth}Forth~\cite{forth}, \index{Joy}Joy~\cite{joy}, and \index{Factor}Factor~\cite{factor}. The Factor compiler implements a static stack effect checker based on the idea we just described.
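To make the idea concrete, here is a minimal Swift sketch of such a stack effect checker. The \texttt{StackEffect} type and token table are our own encoding of the closed form above, and not Factor's actual implementation:
\begin{Verbatim}
// A stack effect b^i a^j is a pair: i values popped, j values pushed.
struct StackEffect {
    var inputs: Int   // i, the exponent of b
    var outputs: Int  // j, the exponent of a

    static let identity = StackEffect(inputs: 0, outputs: 0)

    // Composition implements the closed form b^i a^j . b^k a^l.
    func composed(with next: StackEffect) -> StackEffect {
        let (j, k) = (outputs, next.inputs)
        return j < k
            ? StackEffect(inputs: inputs - j + k, outputs: next.outputs)
            : StackEffect(inputs: inputs, outputs: j - k + next.outputs)
    }
}

// The token table from above; integer literals push one value.
let effects: [String: StackEffect] = [
    "neg": StackEffect(inputs: 1, outputs: 1),
    "+":   StackEffect(inputs: 2, outputs: 1),
    "*":   StackEffect(inputs: 2, outputs: 1),
    "dup": StackEffect(inputs: 1, outputs: 2),
    ".":   StackEffect(inputs: 1, outputs: 0),
]

func stackEffect(of program: String) -> StackEffect {
    program.split(separator: " ").reduce(StackEffect.identity) { acc, tok in
        let effect = Int(tok) != nil
            ? StackEffect(inputs: 0, outputs: 1)  // a literal
            : effects[String(tok)]!               // a word
        return acc.composed(with: effect)
    }
}

// stackEffect(of: "5 dup * 3 neg + .") yields inputs == 0, outputs == 0,
// the identity element: the program has no overall effect on the stack.
\end{Verbatim}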
\end{example}
\section{A Swift Connection}\label{monoidsasprotocols}
In this section, we will show that finitely-presented monoids map to Swift protocols in a very natural way. We make use of the \index{derived requirement!encoding monoid}derived requirements formalism we defined in \SecRef{derived req}, and some further results from \SecRef{generic signature validity}. By now, it should be apparent that there are some similarities between the two theories, even if the concepts do not completely overlap:
\begin{center}
\begin{tabular}{ll}
\toprule
\textbf{Swift generics}&\textbf{F-P monoids}\\
\midrule
Type parameters&Terms\\
Explicit requirements&Rewrite rules\\
Derived requirements&Rewrite paths\\
Reduced type equality&Term equivalence\\
\index{type parameter graph}Type parameter graph&\index{Cayley graph}Cayley graph\\
\bottomrule
\end{tabular}
\end{center}
\newcommand{\GM}{G_\texttt{M}}
\paragraph{Free monoids.} The protocol below---or rather the \index{protocol generic signature!free monoid}protocol generic signature~$\GM$---models the \index{free monoid}free monoid $\{a,b\}^*$ on two generators:
\begin{Verbatim}
protocol M {
associatedtype A: M
associatedtype B: M
}
\end{Verbatim}
We define a function $\varphi$ that sends each element of $\{a,b\}^*$ to a type parameter of $\GM$ by recursively constructing a \index{dependent member type!free monoid}dependent member type for each ``$a$'' and ``$b$'', until we finally get to the empty term, which becomes the \Index{protocol Self type@protocol \tSelf\ type!free monoid}protocol \tSelf\ type:
\begin{gather*}
\varphi(\varepsilon):=\tSelf\\
\varphi(ua):=\texttt{$\varphi(u)$.A}\\
\varphi(ub):=\texttt{$\varphi(u)$.B}
\end{gather*}
For example, the four distinct terms of length 2 map to the following type parameters:
\begin{gather*}
\varphi(aa)=\texttt{Self.A.A}\\
\varphi(ab)=\texttt{Self.A.B}\\
\varphi(ba)=\texttt{Self.B.A}\\
\varphi(bb)=\texttt{Self.B.B}
\end{gather*}
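The mapping $\varphi$ is easy to implement. Here is a minimal Swift sketch, with terms modeled as arrays of generators; the names are our own:
\begin{Verbatim}
// phi sends a term of {a, b}* to the spelling of a type parameter,
// following the recursion above: the empty term maps to Self,
// ua maps to phi(u).A, and ub maps to phi(u).B.
enum Generator { case a, b }

func phi(_ term: [Generator]) -> String {
    term.reduce("Self") { typeParam, generator in
        typeParam + (generator == .a ? ".A" : ".B")
    }
}

// phi([]) == "Self"
// phi([.a, .b]) == "Self.A.B"
\end{Verbatim}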
The requirement signature of \texttt{M} also declares two \index{associated conformance requirement!free monoid}associated conformance requirements:
\begin{gather*}
\AssocConfReq{Self.A}{M}{M}\\
\AssocConfReq{Self.B}{M}{M}
\end{gather*}
As a consequence of these requirements, every type parameter output by $\varphi$ is a valid type parameter of $\GM$, and every such type parameter also conforms to~\texttt{M}:
\begin{proposition}\label{monoid type lemma} If $t\in \{a,b\}^*$ and $\tT:=\varphi(t)$, then $\GM\vdash\tT$ and $\GM\vdash\ConfReq{T}{M}$.
\end{proposition}
\begin{proof}
We proceed by \index{induction}induction on the \index{term length}length of $t$.
\BaseCase If $|t|=0$, then $t=\varepsilon$, and $\varphi(\varepsilon)=\tSelf$. We can derive \tSelf\ via a \IndexStep{Generic}\textsc{Generic} elementary statement:
\begin{gather*}
\GenericStep{Self}{1}
\end{gather*}
Likewise we get the conformance via a \IndexStep{Conf}\textsc{Conf} elementary statement:
\begin{gather*}
\ConfStep{Self}{M}{2}
\end{gather*}
\InductiveStep We know that $t=ua$ or $t=ub$ for some $u\in A^*$, so let $\texttt{U}:=\varphi(u)$. By induction, $\GM\vdash\texttt{U}$ and $\GM\vdash\ConfReq{U}{M}$. (We actually only need the latter.) First, we derive \texttt{U.A} or \texttt{U.B} by applying the \IndexStep{AssocName}\textsc{AssocName} inference rule to $\ConfReq{U}{M}$:
\begin{gather*}
\AnyStep{\ConfReq{U}{M}}{1}\\
\AssocNameStep{1}{U.A}{2}\\
\AssocNameStep{1}{U.B}{3}
\end{gather*}
To derive $\ConfReq{U.A}{M}$ or $\ConfReq{U.B}{M}$ from $\ConfReq{U}{M}$, we apply the \IndexStep{AssocConf}\textsc{AssocConf} inference rule for the appropriate associated requirement:
\begin{gather*}
\AnyStep{\ConfReq{U}{M}}{1}\\
\AssocConfStep{1}{U.A}{M}{2}\\
\AssocConfStep{1}{U.B}{M}{3}
\end{gather*}
This completes the induction.
\end{proof}
We wish to equip the type parameters of $\GM$ with the structure of a monoid. The identity element we take to be \tSelf, since this is $\varphi(\varepsilon)$.
The \index{binary operation}binary operation ``\;$\cdot$\;'' is defined by \index{formal substitution}\emph{formal substitution}, replacing \tSelf\ in the type parameter on the right-hand side with the entire type parameter on the left. Consider the two type parameters \texttt{Self.A.A} and \texttt{Self.B.B}:
\[(\texttt{Self.A.A})\cdot(\texttt{Self.B.B})=\texttt{Self.A.A.B.B}\]
To ensure this gives us a valid type parameter, we turn to \LemmaRef{subst lemma}. Suppose we are given two valid type parameters $\GM\vdash\texttt{U}$ and $\GM\vdash\texttt{V}$. By \PropRef{monoid type lemma}, we know that $\GM\vdash\ConfReq{U}{M}$. The conditions of \LemmaRef{subst lemma} are satisfied, thus $\GM\vdash\texttt{U}\cdot\texttt{V}$.
We now have a set of elements, a binary operation, and an identity element. The final question is, in what sense is this monoid ``equivalent'' to the free monoid $\{a,b\}^*$? The answer lies in the fact that $\varphi$ is an \emph{isomorphism} of monoids.
\begin{definition}
Let $M$ and $N$ be monoids.
\begin{enumerate}
\item A \index{homomorphism}\IndexDefinition{monoid homomorphism}\emph{monoid homomorphism} is a function $f\colon M\rightarrow N$ that sends the identity to the identity, and is compatible with the binary operation. That is, $f(\varepsilon_M)=\varepsilon_N$, and for all $x$, $y\in M$, $f(x\cdot_M y)=f(x)\cdot_N f(y)$.
\item A \IndexDefinition{monoid isomorphism}\emph{monoid isomorphism} is a monoid homomorphism $f$ that has an inverse $f^{-1}$, so $f^{-1}(f(x))=x$ for all $x\in M$ and $f(f^{-1}(y))=y$ for all $y\in N$. In this case we also say that $M$ and $N$ are \emph{isomorphic}.
\end{enumerate}
\end{definition}
We already saw that $\varphi$ maps the identity element $\varepsilon$ of $\{a,b\}^*$ to the identity element \tSelf\ of $\GM$. It also respects the monoid operation. For example,
\begin{gather*}
\varphi(aa)\cdot\varphi(bb)\\
\qquad {}=(\texttt{Self.A.A})\cdot(\texttt{Self.B.B})\\
\qquad {}=\texttt{Self.A.A.B.B}\\
\qquad {}=\varphi(aabb)
\end{gather*}
Finally, there is an inverse $\PhiInv$ mapping the type parameters of $\GM$ to elements of $\{a,b\}^*$:
\begin{gather*}
\PhiInv(\tSelf):=\varepsilon\\
\PhiInv(\texttt{U.A}):=\PhiInv(\texttt{U})\cdot a\\
\PhiInv(\texttt{U.B}):=\PhiInv(\texttt{U})\cdot b
\end{gather*}
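To make the maps concrete, here is a minimal Swift sketch that treats terms and type parameters as plain strings over the alphabet $\{a,b\}$. The names \texttt{phi}, \texttt{phiInverse} and \texttt{concat} are hypothetical; nothing like them appears in the compiler:
\begin{Verbatim}
// phi: start from "Self" and turn each generator into a member type name.
func phi(_ term: String) -> String {
  var result = "Self"
  for generator in term {
    result += (generator == "a") ? ".A" : ".B"
  }
  return result
}

// phiInverse: drop "Self" and map each member type name to a generator.
func phiInverse(_ typeParameter: String) -> String {
  typeParameter.split(separator: ".").dropFirst()
    .map { $0 == "A" ? "a" : "b" }.joined()
}

// The binary operation: substitute the left operand for the "Self"
// at the base of the right operand.
func concat(_ u: String, _ v: String) -> String {
  u + v.dropFirst("Self".count)
}

phi("aa")                     // "Self.A.A"
concat(phi("aa"), phi("bb"))  // "Self.A.A.B.B"
phiInverse("Self.A.A.B.B")    // "aabb"
\end{Verbatim}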
We've shown that the type parameters of $\GM$ have a monoid structure isomorphic to~$\{a,b\}^*$. Furthermore, nothing in our construction relied on there being exactly two generators; we can add or remove associated type declarations in \texttt{M} as needed. For example, if we apply this construction to the free monoid with one generator, we get protocol \tN\ from \ExRef{protocol n graph}. This monoid is isomorphic to $(\NN,+,0)$, which justifies the name \tN.
\paragraph{Finitely-presented monoids.}
We extend our encoding from a free monoid~$A^*$ to a \index{finitely-presented monoid}finitely-presented monoid~$\AR$. We start with a protocol~\texttt{M} that encodes $A^*$, then for each $(u,v)\in R$, we apply $\varphi$ to $u$ and $v$ to obtain an \index{associated same-type requirement!finitely-presented monoid}associated same-type requirement $\AssocSameReq{$\varphi(u)$}{$\varphi(v)$}{M}$. Finally, we attach a \Index{where clause@\texttt{where} clause!protocol declaration}trailing \texttt{where} clause to \texttt{M} where we state these requirements. Let's try the \index{bicyclic monoid}bicyclic monoid $\Bicyclic$ from \ExRef{bicyclic}:
\begin{Verbatim}
protocol M {
associatedtype A: M
associatedtype B: M where Self.A.B == Self
}
\end{Verbatim}
\newcommand{\SameReqPhi}[2]{\SameReq{$\varphi(#1)$}{$\varphi(#2)$}}
\newcommand{\ConfReqPhi}[1]{\ConfReq{$\varphi(#1)$}{M}}
The same-type requirements of \texttt{M} give the protocol generic signature $\GM$ a non-trivial \index{equivalence class!type parameters}equivalence class structure under the \index{reduced type equality!finitely-presented monoid}reduced type equality relation.
\begin{example}
Recalling \ExRef{bicyclic 2}, we saw that $baaba\sim babaa$ in the bicyclic monoid, with $ba\BRule a \circ b\BRuleInv aa$ being one possible rewrite path from $baaba$ to $babaa$. Let \texttt{M} be the protocol that encodes this monoid. We can construct a derivation of $\GM\vdash\SameReq{Self.B.A.A.B.A}{Self.B.A.B.A.A}$ by translating each rewrite step:
\begin{align*}
&ba\BRule a&&\GM\vdash\SameReq{Self.B.A.A.B.A}{Self.B.A.A}\\
&b\BRuleInv aa&&\GM\vdash\SameReq{Self.B.A.A}{Self.B.A.B.A.A}
\end{align*}
The first rewrite step, $ba\BRule a$, proves an equivalence between $baaba$ and $baa$. We will build a derivation piecemeal. We start with the left whisker $ba$:
\begin{gather*}
\ConfStep{Self}{M}{1}\\
\AssocConfStep{1}{Self.B}{M}{2}\\
\AssocConfStep{2}{Self.B.A}{M}{3}
\end{gather*}
We then apply the \textsc{AssocSame} inference rule to get $ba\BRule$:
\begin{gather*}
\AssocSameStep{3}{Self.B.A.A.B}{Self.B.A}{4}
\end{gather*}
Finally, we apply the \IndexStep{SameName}\textsc{SameName} inference rule to get $ba\BRule a$:
\begin{gather*}
\SameNameStep{3}{4}{Self.B.A.A.B.A}{Self.B.A.A}{5}
\end{gather*}
The second rewrite step, $b\BRuleInv aa$, proves an equivalence between $baa$ and $babaa$. We can reuse some of the above derivation steps to get $b\BRuleInv$:
\begin{gather*}
\AssocSameStep{2}{Self.B.A.B}{Self.B}{6}\\
\SymStep{6}{Self.B}{Self.B.A.B}{7}
\end{gather*}
We then build $b\BRuleInv a$:
\begin{gather*}
\AssocConfStep{3}{Self.B.A.B}{M}{8}\\
\SameNameStep{8}{7}{Self.B.A}{Self.B.A.B.A}{9}
\end{gather*}
One more $a$, and we get $b\BRuleInv aa$:
\begin{gather*}
\AssocConfStep{8}{Self.B.A.B.A}{M}{10}\\
\SameNameStep{10}{9}{Self.B.A.A}{Self.B.A.B.A.A}{11}
\end{gather*}
Finally, we compose the two rewriting steps with \IndexStep{Trans}\textsc{Trans}, and we're done:
\begin{gather*}
\TransStep{5}{11}{Self.B.A.A.B.A}{Self.B.A.B.A.A}{12}
\end{gather*}
\end{example}
We now prove that for any monoid presentation $\AR$, the corresponding generic signature $\GM$ has the same equivalence class structure as~$\sim_R$:
\begin{enumerate}
\item If $x\sim y$, then $\varphi(x)$~and~$\varphi(y)$ are equivalent type parameters in $\GM$.
\item If $\varphi(x)$ and $\varphi(y)$ are equivalent type parameters in $\GM$, then $x\sim y$.
\end{enumerate}
\begin{theorem}\label{path to derivation}
Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. Then $x\sim_R y$ implies $\GM\vdash\SameReqPhi{x}{y}$ for all $x$, $y\in A^*$.
\end{theorem}
\begin{proof}
We are given a \index{rewrite path!to derivation}rewrite path $p$ with $\Src(p)=x$ and $\Dst(p)=y$, and we must construct a derivation of $\GM\vdash\SameReqPhi{x}{y}$. We proceed by \index{induction}induction on the \index{rewrite path length}length of $p$.
\BaseCase We have an \index{empty rewrite path}empty rewrite path, so $p=1_t$ for some $t\in A^*$. We construct a derivation of $\GM\vdash\varphi(t)$ using \PropRef{monoid type lemma}, and then apply the \IndexStep{Reflex}\textsc{Reflex} inference rule to derive $\SameReq{$\varphi(t)$}{$\varphi(t)$}$:
\begin{gather*}
\AnyStep{\varphi(t)}{1}\\
\ReflexStep{1}{$\varphi(t)$}{2}
\end{gather*}
\InductiveStep Suppose that $p$ has length at least~1, so that $p=p^\prime \circ s$ for some shorter rewrite path $p^\prime$ and rewrite step~$s$. Let's say that $s = t(u\Rightarrow v)w$ where $t$, $w\in A^*$, and $(u,v)$ or $(v,u)\in R$. Let's look at the source and destination of $p^\prime$ and $s$:
\begin{align*}
&\Src(p^\prime)=x
&&\Src(s) = tuw\\
&\Dst(p^\prime)=tuw
&&\Dst(s) = tvw=y
\end{align*}
By induction, we know that $\GM\vdash\SameReqPhi{x}{tuw}$. It remains to construct a derivation of $\GM\vdash\SameReqPhi{tuw}{tvw}$. We do this by first constructing $\GM\vdash\SameReqPhi{tu}{tv}$.
By \PropRef{monoid type lemma}, we know $\GM\vdash\ConfReqPhi{t}$. Now, consider the direction of~$s$:
\begin{enumerate}
\item If $s$ is \index{positive rewrite step}positive, then \texttt{M} states an associated same-type requirement $\SameReqPhi{u}{v}_\texttt{M}$. We apply \IndexStep{AssocSame}\textsc{AssocSame} to $\ConfReqPhi{t}$, so we get $\GM\vdash\SameReqPhi{tu}{tv}$.
\item If $s$ is \index{negative rewrite step}negative, \textsc{AssocSame} gives us $\GM\vdash\SameReqPhi{tv}{tu}$ instead, so we use the \IndexStep{Sym}\textsc{Sym} inference rule to get $\GM\vdash\SameReqPhi{tu}{tv}$.
\end{enumerate}
Next, \PropRef{monoid type lemma} gives us $\GM\vdash\varphi(w)$. Since we have $\GM\vdash\SameReqPhi{tu}{tv}$ and $\GM\vdash\varphi(w)$, \LemmaRef{subst lemma} implies that $\GM\vdash\SameReqPhi{tuw}{tvw}$.
Finally, we use \IndexStep{Trans}\textsc{Trans} to derive $\GM\vdash\SameReqPhi{x}{y}$ (recall that $y=tvw$):
\begin{gather*}
\AnyStep{\SameReqPhi{x}{tuw}}{1}\\
\AnyStep{\SameReqPhi{tuw}{y}}{2}\\
\TransStep{1}{2}{$\varphi(x)$}{$\varphi(y)$}{3}
\end{gather*}
Thus, every rewrite path $p$ maps to a derived same-type requirement in $\GM$.
\end{proof}
\begin{theorem}\label{derivation to path}
Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. Then $\GM\vdash\SameReqPhi{x}{y}$ implies that $x\sim_R y$ for all $x$, $y\in A^*$.
\end{theorem}
\begin{proof}
We are given a derivation of $\SameReqPhi{x}{y}$, and we must show that $x\sim y$ by constructing a rewrite path $p$ with source $x$ and destination $y$. We proceed by structural \index{induction}induction on derived requirements.
The following derivation steps produce same-type requirements (however, \textsc{Same} does not come up in our case, because $\GM$ does not have explicit same-type requirements; all same-type requirements are derived from the requirement signature of~\texttt{M}):
\begin{center}
\begin{tabular}{lll}
\toprule
\textsc{Same}&\textsc{AssocSame}&\textsc{SameName}\\
\textsc{Reflex}&\textsc{Sym}&\textsc{Trans}\\
\bottomrule
\end{tabular}
\end{center}
In each case, we must construct a new rewrite path~$p$ whose source and destination match the same-type requirement in the conclusion of the step, from the rewrite paths corresponding to each assumption used by the step, if any.
\BaseCase The base case is when our derivation step does not have any same-type requirement assumptions, but the conclusion is some same-type requirement. There are two such kinds of steps, and in both we construct the rewrite path~$p$ directly.
An application of the \IndexStep{AssocSame}\textsc{AssocSame} inference rule, when written in the form below, implies that we have a rewrite rule $(u,v)\in R$. We set $p:=t(u\Rightarrow v)$:
\[\SameReqPhi{tu}{tv}\tag{\textsc{AssocSame} $\ConfReqPhi{t}$}\]
The other base case is a \IndexStep{Reflex}\textsc{Reflex} step, where we set $p:=1_t$:
\[\SameReqPhi{t}{t}\tag{\textsc{Reflex} $\varphi(t)$}\]
\InductiveStep We handle \IndexStep{Sym}\textsc{Sym} by \index{rewrite path inverse}inverting our rewrite path. Induction gives us $p^\prime$ with $\Src(p^\prime)=t$ and $\Dst(p^\prime)=u$. We set $p:=(p^\prime)^{-1}$, because $\Src(p)=u$ and $\Dst(p)=t$:
\[\SameReqPhi{u}{t}\tag{\textsc{Sym} $\SameReqPhi{t}{u}$}\]
We handle \IndexStep{Trans}\textsc{Trans} by \index{rewrite path composition}composition. Induction gives us $p_1$, $p_2$ with $\Src(p_1)=t$, $\Dst(p_1)=\Src(p_2)=u$, and $\Dst(p_2)=v$. We set $p:=p_1\circ p_2$, because $\Src(p)=t$ and $\Dst(p)=v$:
\[\SameReqPhi{t}{v}\tag{\textsc{Trans} $\SameReqPhi{t}{u}$ $\SameReqPhi{u}{v}$}\]
Finally, we handle \IndexStep{SameName}\textsc{SameName} by \index{rewrite path whiskering}whiskering. Here, $g\in A$ is a generator, corresponding to an associated type declaration in~\texttt{M}. Induction gives us $p^\prime$ with $\Src(p^\prime)=t$ and $\Dst(p^\prime)=u$. We set $p:=p^\prime\WR g$, because $\Src(p)=tg$ and $\Dst(p)=ug$:
\[\SameReqPhi{tg}{ug}\tag{\textsc{SameName} $\ConfReqPhi{u}$ $\SameReqPhi{t}{u}$}\]
Thus, every derived same-type requirement of $\GM$ maps to a rewrite path $p$.
\end{proof}
In general, the Cayley graph is not preserved by a monoid isomorphism, because it depends on the choice of generating set. However, in our Swift encoding, the Cayley graph manifests directly. We now know that \tN\ and \texttt{Z4} encode the monoids $\{a\}^*$ and $\Pres{a}{a^4\sim\varepsilon}$ respectively. We also previously remarked that:
\begin{itemize}
\item The Cayley graph of $\{a\}^*$ (\ExRef{monoid n graph}) looks like the type parameter graph of~$\GN$ (\ExRef{protocol n graph}).
\item The Cayley graph of $\Pres{a}{a^4\sim\varepsilon}$ (\ExRef{monoid z4 example}) looks like the type parameter graph of~$G_\texttt{Z4}$ (\ExRef{protocol z4 graph}).
\end{itemize}
This always works, and furthermore the conformance path graph of \SecRef{finding conformance paths} coincides with the other two graphs in this case as well:
\begin{theorem}
Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. The following three graphs are isomorphic:
\begin{enumerate}
\item The \index{Cayley graph}Cayley graph of $\AR$.
\item The \index{type parameter graph!monoid}type parameter graph of $\GM$.
\item The \index{conformance path graph!monoid}conformance path graph of $\GM$.
\end{enumerate}
\end{theorem}
\begin{proof}
First, consider the vertex set of each graph. We've already established that terms and type parameters have the same equivalence class structure. From this it follows that the only difference between the vertex sets is essentially our choice of labels:
\begin{enumerate}
\item A vertex in (1) is an equivalence class of terms: $\EquivClass{t}$ for some $t\in A^*$.
\item A vertex in (2) is an equivalence class of type parameters: $\EquivClass{\varphi(t)}$ for some $t\in A^*$.
\item A vertex in (3) is an equivalence class of \index{abstract conformance!monoid}abstract conformances. Since every type parameter of~$\GM$ conforms to exactly one protocol---\texttt{M}---every abstract conformance has the form $\ConfReqPhi{t}$ for some $t\in A^*$.
\end{enumerate}
The same can be said about the edge sets as well:
\begin{enumerate}
\item An edge in (1) joins each $\EquivClass{t}$ with $\EquivClass{tg}$ for all $g\in A$.
\item An edge in (2) joins each $\EquivClass{\varphi(t)}$ with $\EquivClass{\varphi(t)\texttt{.A}}$ for all associated types~\texttt{A} of~\texttt{M}.
\item An edge in (3) joins each $\ConfReq{$\EquivClass{\varphi(t)}$}{M}$ with $\ConfReq{$\EquivClass{\varphi(t)\texttt{.A}}$}{M}$ for all associated conformance requirements $\AssocConfReq{Self.A}{M}{M}$ of~\texttt{M}.
\end{enumerate}
\end{proof}
One final remark. Recall that in the Cayley graph of a monoid presentation~$\AR$, every vertex has the same number of \index{successor!vertex}successors, and we just showed that the same is true of the type parameter graph of $\GM$. On the other hand, in an arbitrary generic signature, the number of successors of a vertex varies, because equivalence classes conform to various sets of protocols with distinct associated types. The type parameter graph of a ``typical'' generic signature is almost never the Cayley graph of a monoid.
\section{The Word Problem}\label{word problem}
The previous section's \ThmRef{path to derivation} and \ThmRef{derivation to path} tell us that we can write a program that type checks if and only if two terms in a finitely-presented monoid are equivalent. The compiler must be able to verify the equivalence, or reject the program if it does not hold.
For example, we can start with our protocol encoding the \index{bicyclic monoid}bicyclic monoid, and also declare a method in a protocol extension:
\begin{Verbatim}
protocol M {
associatedtype A: M
associatedtype B: M where Self.A.B == Self
}
extension M {
static func testBicyclicMonoid() {
sameType(Self.B.A.A.B.A.self, Self.B.A.B.A.A.self) // ok
sameType(Self.A.A.A.self, Self.B.B.B.self) // error!
}
}
func sameType<T>(_: T.Type, _: T.Type) {}
\end{Verbatim}
A call to \texttt{sameType()} requires that both arguments are \index{metatype type}metatypes with the same instance type. For this program to type check, the compiler must verify two statements, corresponding to each call of \texttt{sameType()}:
\begin{gather*}
\GM\vdash\SameReq{Self.B.A.A.B.A}{Self.B.A.B.A.A}\\
\GM\vdash\SameReq{Self.A.A.A}{Self.B.B.B}
\end{gather*}
The first statement is true, because $baaba\sim babaa$, as we've seen. However, the second statement is false: in fact, $aaa\not\sim bbb$. The compiler accepts the first call, and diagnoses an error on the second call to \texttt{sameType()}, as we would expect.
We can write a similar program to encode every other finitely-presented monoid we've seen so far, and the Swift compiler will happily accept them, and correctly decide if arbitrary terms are equivalent to each other. We now ask, does this always work, or are there some protocol declarations constructed this way that we cannot accept?
This is the well-known \IndexDefinition{word problem}\emph{word problem}:
\begin{itemize}
\item
\textbf{Input:} A monoid presentation $\AR$, and two terms $x$, $y\in A^*$.
\textbf{Result:} True or false: $x\sim_R y$?
\end{itemize}
The word problem was perhaps first posed in the early 20th century by \index{Axel Thue}Axel~Thue, who studied various formalisms defined by the replacement of substrings within larger strings via a fixed set of rules. Thue described what later became known as \index{Thue system}\index{Thue system!z@\igobble|seealso{finitely-presented monoid}}\emph{Thue~systems} and \index{semi-Thue system}\index{semi-Thue system!z@\igobble|seealso{reduction relation}}\emph{semi-Thue~systems} in a 1914 paper \cite{thue_translation}. A Thue system is a finitely-presented monoid, while a semi-Thue system is what we get if we replace the symmetric term equivalence relation with a \emph{reduction relation}, which we will study in the next section.
It seems that Thue's goal was in many ways similar to ours: to analyze a formal system by encoding its statements as terms in a finitely-presented monoid, such that the proof of a statement becomes a sequence of rewrite steps. Questions about the formal system then become instances of the word problem. The final task was to figure out this missing piece, and---voil\`a---we've solved all of mathematics.
Thue was able to solve the word problem in certain restricted instances. For example, if the rewrite rules preserve \index{length-preserving rewrite rule}\index{term length}term length, then every equivalence class must be finite, and $x\sim y$ can be decided by exhaustive enumeration of $\EquivClass{y}$. What eluded Thue, though, was a general approach that worked in all cases.
The next twist in this saga came after the development of computability theory. We already sketched out \index{Turing machine}Turing machines and the \index{halting problem}halting problem in \SecRef{tag systems}, where we saw that the question of \emph{termination checking} is undecidable in our type substitution algebra. Now, we will see another such undecidable problem. In a 1947~paper~\cite{post_1947}, \index{Emil Post}Emil~Post established that no computable algorithm can solve the \index{undecidable problem!word problem}word problem. Post did this by defining an encoding of a Turing machine as a finitely-presented monoid. At a very high level, this encoding works as follows:
\begin{enumerate}
\item At any point in time, the complete state of the Turing machine, consisting of the current state symbol and the contents of the tape, can be described by a term in the finitely-presented monoid.
\item The single-step transitions, where the machine replaces the symbol at the current head position and moves left or right, define the monoid's rewrite rules.
\item A sequential execution defines a rewrite path. If the machine halts, we know there exists a rewrite path from the initial state to the halting state. Thus, the Turing machine halts if and only if the initial and halting states are equivalent terms.
\end{enumerate}
If we could solve the word problem in an \emph{arbitrary} finitely-presented monoid, we could also solve the halting problem for an arbitrary Turing machine---a contradiction, so no such algorithm exists. Even more is true. Not only is there no ``universal'' algorithm that takes the monoid presentation as part of its input, but we can construct a \emph{specific} finitely-presented monoid with an undecidable word problem.
One approach is to start with Turing's construction of a \index{universal Turing machine}\emph{universal machine}. This is a Turing machine that simulates the execution of another Turing machine one step at a time, receiving the \index{standard description}standard description of the other Turing machine as input on its tape. These days, we might call this an ``emulator'' or ``interpreter.''
By applying Post's construction to a universal Turing machine, we can get a specific monoid presentation with an undecidable word problem, because we can now encode an arbitrary Turing machine as a term in this specific monoid, and then determine if the Turing machine halts by solving the word problem. The ``universal Turing machine'' monoid has a large number of generators and rewrite rules, though.
A much simpler monoid with an undecidable word problem appeared in a 1958~paper by \index{Grigori Tseitin}G.~S.~Tseitin \cite{undecidablesemigroup}. For an English translation with commentary, see~\cite{nybergbrodda2024g}.
\newcommand{\Ts}{\mathfrak{C}_1}
\begin{theorem}\label{undecidablemonoid}
This finitely-presented monoid has an undecidable word problem:
\begin{align*}
\Ts := \Pres{a,b,c,d,e}{&ac\sim ca,\,ad\sim da,\,bc\sim cb,\,bd\sim db,\\
&eca\sim ce,\,edb\sim de,\\
&cca\sim ccae}
\end{align*}
\end{theorem}
\begin{corollary}
No computable algorithm can decide if two arbitrary type parameters are equivalent in the protocol generic signature $G_\texttt{C1}$:
\begin{Verbatim}
protocol C1 {
associatedtype A: C1
associatedtype B: C1
associatedtype C: C1
associatedtype D: C1
associatedtype E: C1
where A.C == C.A, A.D == D.A, B.C == C.B, B.D == D.B,
E.C.A == C.E, E.D.B == D.E,
C.C.A == C.C.A.E
}
\end{Verbatim}
The Swift compiler \index{limitation!undecidable generic signature}\emph{rejects} this protocol and \index{diagnostic!undecidable generic signature}diagnoses an error:
\begin{quote}
\begin{verbatim}
error: cannot build rewrite system for protocol;
rule length limit exceeded
\end{verbatim}
\end{quote}
The ``rule length'' in the error refers not to the rules we wrote down, but the rules generated while constructing a \emph{convergent} presentation for~$\Ts$, something we get to in \SecRef{rewritesystemintro}. If our monoid has an undecidable word problem, this cannot be done, and the procedure for generating such a presentation uses various heuristics to know when to give up. On the author's machine, the Swift compiler rejects protocol~\texttt{C1} after about 30 milliseconds of processing.
\end{corollary}
To understand~$\Ts$, we need to introduce a new concept. A \index{special monoid presentation}\emph{special monoid presentation} is one where the right-hand side of each \index{rewrite rule!special monoid presentation}rewrite rule is $\varepsilon$, and a \emph{special monoid} is one that has such a presentation:
\[\Pres{a_1, \ldots, a_m}{u_1 \sim \varepsilon,\, \ldots,\, u_n \sim \varepsilon}\]
The \index{bicyclic monoid!special monoid presentation}bicyclic monoid from \ExRef{bicyclic} is special. Every \index{group!special monoid presentation}group is special, because of the inverse operation; we can always replace a rule $u\sim v$ with $uv^{-1}\sim\varepsilon$.
Tseitin's monoid $\Ts$ is a \textsl{universal machine for solving word problems in special monoids}. If we take a special monoid presentation $\AR$ and a pair of terms $x$, $y\in A^*$, we can encode $\AR$, $x$, and $y$ as three terms $W\!$, $X$, and $Y$ over the generators of~$\Ts$. Then, Tseitin's key result is the following:
\[x\sim y\quad\text{in $\AR$}\qquad \text{if and only if} \qquad W\! X\sim WY \quad \text{in $\Ts$}\]
We will sketch this out by encoding a word problem in the \index{dihedral group}\emph{dihedral group of order~12}, which describes the symmetries of a regular hexagon, as a word problem in $\Ts$. Think of $s$ as rotation by~$60^\circ$, and $t$ as reflection by an axis of symmetry:
\[D_{12} := \Pres{s,t}{s^6\sim\varepsilon,\, t^2\sim\varepsilon,\, (st)^2\sim \varepsilon}\]
\smallskip
\emph{(First step.)} We encode the presentation of $D_{12}$ first. We translate each rewrite rule of~$D_{12}$ into an element of~$\Ts$, by replacing $s$~with~$cd$ and $t$~with~$cd^2$ (if $D_{12}$ had a third generator, it would map to~$cd^3$, and so on):
\begin{align*}
s^6 &\mapsto (cd)^6\\
t^2 &\mapsto (cd^2)^2\\
(st)^2 &\mapsto (cdcd^2)^2
\end{align*}
\smallskip
\emph{(Second step.)} We append $c$ at the end of each term above, concatenate them together, and add one more~$c$ at the end. We've encoded our \emph{presentation} of $D_{12}$ as an \emph{element} of~$\Ts$. This is our term~$W$:
\[
W := \underbrace{(cd)^6}_{s^6}{} \cdot c\cdot \underbrace{(cd^2)^2}_{t^2}{} \cdot c\cdot \underbrace{(cdcd^2)^2}_{(st)^2} {}\cdot c\cdot c
\]
\smallskip
\emph{(Third step.)} Now we pick a pair of terms from $D_{12}$. Let's take $x:=ts$ and $y:=s^5t$. A rotation followed by a reflection is the same as a reflection followed by a rotation in the opposite direction, so $ts\sim s^5t$. We again translate these terms into~$\Ts$, except using $a$~and~$b$ instead of $c$~and~$d$. This gives us our $X$ and $Y$:
\begin{gather*}
X := ab^2ab\\
Y := (ab)^5ab^2
\end{gather*}
Tseitin's rewrite rules define a (rather long) rewrite path from $W\! X$ to $WY$.
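The three steps are mechanical enough to sketch in Swift. Everything below is hypothetical and merely restates the construction just described, with the $i$th generator mapping to $cd^{\,i}$ (or $ab^{\,i}$):
\begin{Verbatim}
// Replace the i-th generator of the special presentation with
// first followed by i copies of second.
func encode(_ term: String, using first: Character,
            _ second: Character, generators: [Character]) -> String {
  term.map { g in
    let i = generators.firstIndex(of: g)! + 1
    return String(first) + String(repeating: String(second), count: i)
  }.joined()
}

let gens: [Character] = ["s", "t"]
let lhs = ["ssssss", "tt", "stst"]  // left-hand sides of the rules of D12

let W = lhs.map { encode($0, using: "c", "d", generators: gens) + "c" }
           .joined() + "c"
let X = encode("ts", using: "a", "b", generators: gens)      // "abbab"
let Y = encode("ssssst", using: "a", "b", generators: gens)  // "ababababababb"
\end{Verbatim}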
Our example does not exhibit undecidability, because the word problem in~$D_{12}$ is very easy to solve. There are only 12 elements, so we can just write down the \index{Cayley table}Cayley table. However, as with monoids, the word problem for groups is undecidable in the general case \cite{undecidablegroup} (for a more accessible introduction, see \cite{undecidablegroup2}, or Chapter~12 of \cite{rotman}). Because~$\Ts$ can encode the word problem in \emph{all} groups, we certainly cannot hope to write down an algorithm to solve the word problem in~$\Ts$.
\paragraph{The big picture.} We've shown that every finitely-presented monoid can be encoded as a specific kind of Swift protocol, almost as if Swift's protocols are a generalization of the finitely-presented monoids! This doesn't immediately help us implement Swift generics, however. \ChapRef{symbols terms rules} shows that by taking a suitable alphabet and set of rewrite rules, we can go in the other direction and encode any Swift generic signature as a finitely-presented monoid. While the undecidability of the word problem prevents us from accepting \emph{all} generic signatures, the next section will describe a large class of finitely-presented monoids where the word problem is easy to solve. All reasonable generic signatures fit within this model.
\paragraph{Closing remarks.} In \SecRef{tag systems}, we showed that the \index{type substitution}type substitution algebra can encode arbitrary computation with recursive conformance requirements~\cite{se0157}. Now we see that using recursive conformance requirements and protocol \texttt{where} clauses~\cite{se0142}, we can encode an undecidable problem in the derived requirements formalism as well.
Tseitin's monoid is not a special monoid, so it cannot encode its own word problem. D.~J.~Collins~\cite{universalsemigroup} later discovered a ``more'' universal word problem interpreter having only a few more rules; this can encode word problems in $\Ts$ or even itself:
\begin{align*}
\mathfrak{U} := \Pres{ a,b,c,d,e_1,e_2,f\Tight ,g}{&ac\sim ca,\,ad\sim da,\,a\Tight f\sim f\Tight a,\,ag\sim ga,\\
&bc\sim cb,\, bd\sim db,\, b\Tight f\sim f\Tight b,\, bg\sim gb,\\
&ce_1\sim e_1ca,\, de_1\sim e_1db,\\
&e_2c\sim cae_2,\, e_2d\sim dbe_2,\\
&e_1g\sim ge_2,\\
&f\sim f\Tight e_1,\, e_2f\sim f}
\end{align*}
\section{The Normal Form Algorithm}\label{rewritesystemintro}
There is a ``naive'' solution to the word problem: we start from our first term~$s$, and attempt different combinations of rewrite steps, until we end up at~$t$. In fact, we can arrange the search so that if such a path does exist, the search always terminates with the appropriate rewrite path.
However, if there is no such path, we will never know when to stop. Recall that a rewrite rule $(u, v)\in R$ defines a \emph{symmetric} equivalence: a rewrite path from $s$~to~$t$, if it exists, may contain multiple occurrences of the same rewrite rule, applied in either or both directions over the course of the path. The rewrite graph is infinite, and the intermediate terms visited along our path might be arbitrarily long.
\begin{center}
\begin{tikzcd}
s\arrow[rd, Rightarrow]&&\cdots\arrow[rd, Rightarrow]&&t\\
&\cdots\arrow[ru, Rightarrow]&&\cdots\arrow[ru, Rightarrow]
\end{tikzcd}
\end{center}
We now change our point of view, and think of a rewrite rule $(u,v)$ as a directed \emph{reduction} from $u$~to~$v$. That is, we're allowed to replace $u$ with $v$ anywhere it appears in a term, but not vice versa. Recall that a rewrite step $x(u\Rightarrow v)y$ is positive if $(u,v)\in R$, and negative if $(v,u)\in R$.
\begin{definition}
A \IndexDefinition{positive rewrite path}\emph{positive rewrite path} is one where each step taken is positive.
\end{definition}
An \index{empty rewrite path}empty rewrite path is positive, because it contains no steps. Positive rewrite paths are also closed under \index{rewrite path composition}composition and \index{rewrite path whiskering}whiskering, meaning that a) if $p_1$ and $p_2$ are positive, then so is $p_1\circ p_2$; b) if $p$ is positive and $z\in A^*$, so are $z\WL p$ and $p\WR z$. (On the other hand, if $p$ is positive, then $p^{-1}$ is positive if and only if $p$ is empty.)
\begin{definition}
Let $\AR$ be a monoid presentation.
\begin{enumerate}
\item The \IndexDefinition{reduction relation}\emph{reduction relation} on $A^*$ generated by $R$, denoted \index{$\rightarrow$}\index{$\rightarrow$!z@\igobble|seealso{reduction relation}}$s\rightarrow_R t$ or $s\rightarrow t$ when $R$ is clear from context, relates each pair of terms $s$ and $t$ such that there exists a positive rewrite path with source~$s$ and destination~$t$.
\item A term $s$ is \index{irreducible term|see{reduced term}}\emph{irreducible} or \IndexDefinition{reduced term}\emph{reduced} if $s\rightarrow t$ implies that $s=t$.
\item If $s\rightarrow t$ and $t$ is irreducible, we say $t$ is a \IndexDefinition{normal form}\emph{normal form} of $s$, denoted by $\tilde{s} = t$.
\end{enumerate}
\end{definition}
We can describe an \IndexDefinition{normal form algorithm}algorithm that attempts to find normal forms.
\begin{algorithm}[Normal form algorithm]\label{term reduction algo}
As input, takes a term $t\in A^*$ and a list of rewrite rules $R:=\{(u_1,v_1),\,\ldots,\,(u_n,v_n)\}$. As output, returns some normal form of~$t$, which we denote~$\tilde{t}$, together with a \index{positive rewrite path}positive rewrite path $p$ such that $\Src(p)=t$ and $\Dst(p)=\tilde{t}$.
\begin{enumerate}
\item Initialize the return value $p$ with the empty rewrite path $1_t$.
\item If $t=xuy$ for some $x$, $y\in A^*$ and $(u,v)\in R$: set $t\leftarrow xvy$, set $p\leftarrow p \circ x(u\Rightarrow v)y$, and go back to Step~2.
\item Otherwise, return $t$ and $p$.
\end{enumerate}
\end{algorithm}
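Here is a minimal Swift sketch of this algorithm, assuming terms are modeled as strings and rewrite rules as pairs whose first component is the left-hand side. The name \texttt{normalForm} is hypothetical, and for brevity the sketch returns only the reduced term, not the rewrite path:
\begin{Verbatim}
import Foundation

func normalForm(of term: String, rules: [(String, String)]) -> String {
  var t = term
  reduce: while true {
    for (u, v) in rules {
      // Step 2: if t = xuy for some rule (u, v), replace u with v.
      if let match = t.range(of: u) {
        t.replaceSubrange(match, with: v)
        continue reduce
      }
    }
    return t  // Step 3: no rule applies, so t is irreducible.
  }
}
\end{Verbatim}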
Suppose we are given a pair of terms $s$, $t\in A^*$ and we wish to decide if $s\sim t$. If our algorithm tells us that both have the same normal form, it means that $s\rightarrow \tilde{s}$ via~$p_s$, $t\rightarrow \tilde{t}$ via~$p_t$, and also $\tilde{s}=\tilde{t}$. We can then conclude that $s\sim t$, because $p_s \circ p_t^{-1}$ is a rewrite path (no longer positive, unless $p_t$ is empty) with source~$s$ and destination~$t$:
\begin{center}
\begin{tikzcd}
s\arrow[rd, Rightarrow]&&&&t\arrow[ld, Leftarrow]\\
&\cdots\arrow[rd, Rightarrow]&&\cdots\arrow[ld, Leftarrow]\\
&&(\tilde{s} = \tilde{t})
\end{tikzcd}
\end{center}
Solving the word problem by comparing normal forms is very appealing, but it cannot work in the general case, because we already know the word problem is undecidable. Broadly speaking, two things can go wrong:
\begin{enumerate}
\item \textbf{Existence:} Our description of the normal form algorithm apparently returns a value in Step~3, so what does it mean for the normal form of some term~$t$ to not exist? The trouble is that the algorithm might not \index{termination}\emph{terminate}. In this case, we say that $t$ does not have a normal form.
\item \textbf{Uniqueness:} We demonstrated that if $\tilde{s}=\tilde{t}$ for some $s$, $t\in A^*$, then $s\sim t$. The converse does not always hold, however: a single equivalence class of $\sim$ may contain more than one irreducible term of $\rightarrow$. In other words, there may exist $s$~and~$t$ such that $s\sim t$, and yet $\tilde{s}\neq \tilde{t}$.
\end{enumerate}
To understand when and how we can overcome these difficulties, we turn to the theory of \index{string rewriting}\emph{string rewriting}. The text by Book and Otto \cite{book2012string} is a classic. Alternatively, instead of introducing string rewriting by way of finitely-presented monoids as we do here, one could instead start with an \emph{abstract} reduction relation operating on ``tree-like'' expressions, of which strings are a special case. This is the theory of \index{term rewriting}\emph{term rewriting}, and we invite the reader to also read \cite{andallthat} and \cite{formalmans6} for a thorough study of this approach.
\paragraph{Existence.} Consider the monoid with two elements $\{0,1\}$ and the binary operation $x\cdot y:=\max(x,y)$. (The identity is 0.) Here is one way to present this monoid:
\[ \Pres{a}{a\sim aa} \]
Every non-empty term is \emph{equivalent} to~$a$, but \emph{no} non-empty term has a normal form:
\[ a \rightarrow aa \rightarrow aaa \rightarrow aaaa \rightarrow \ldots \]
The normal form algorithm is quite useless here. However, if we replace our rewrite rule $(a, aa)$ with $(aa, a)$, the normal form algorithm now \emph{always} terminates, with no change to the term equivalence relation. For example:
\[ aaaa \rightarrow aaa \rightarrow aa \rightarrow a \]
Now, every non-empty term has the normal form~$a$. In fact, if $\AR$ has the property that for each $(u,v)\in R$, either $|u|<|v|$ or $|v|<|u|$, we can always \IndexDefinition{oriented rewrite rule}\emph{orient} our rewrite rules so that $|v|<|u|$. We say such rewrite rules are \IndexDefinition{length-reducing rewrite rule}\emph{length-reducing}. This ensures that the intermediate term $t$ becomes shorter on each iteration of Step~2 in \AlgRef{term reduction algo}, after we replace $u$ with $v$. This can only go on for finitely many steps, so we must always get a normal form.
Limiting ourselves to length-reducing rewrite rules is too restrictive; for example, recall that the free commutative monoid has the \IndexDefinition{length-preserving rewrite rule}\emph{length-preserving} rewrite rule $(ba, ab)$. Thus, we must generalize the idea of comparing \index{term length}term length to establish termination.
\begin{definition}\label{reduction order def}
Let $A$ be a finite alphabet of symbols. A \IndexDefinition{reduction order}\emph{reduction order} $<$ is a \index{partial order}partial order on $A^*$ that is also \index{well-founded order}well-founded (\DefRef{well founded def}) and \index{translation-invariant relation}translation-invariant (\DefRef{translation invariant def}).
\end{definition}
\begin{definition}
Let $\AR$ be a monoid presentation. A rewrite rule $(u,v)\in R$ is \IndexDefinition{oriented rewrite rule}\emph{oriented} with respect to some reduction order $<$, if the right-hand side is smaller with respect to this order, so~$u>v$. (Of course, if $<$ is not a \index{linear order}linear order, it may happen that we have a \IndexDefinition{non-orientable rewrite rule}\emph{non-orientable} rewrite rule $(u,v)\in R$ where $u\not< v$ and $v\not< u$.)
\end{definition}
If we can orient the rewrite rules of our monoid presentation using a reduction order, then the normal form algorithm will always find \emph{some} normal form in a finite number of steps. This solves the first of our two problems:
\begin{theorem}
Let $\AR$ be a monoid presentation. If the rewrite rules of~$R$ are oriented with respect to a reduction order~$<$, then every equivalence class of $\sim$ contains \emph{at least} one reduced term. We also say that the reduction relation $\rightarrow$ is \IndexDefinition{terminating reduction relation}\index{Noetherian reduction relation|see{terminating reduction relation}}\emph{terminating} or \emph{Noetherian} in this case.
\end{theorem}
\begin{proof}
Assume we have a non-terminating execution of \AlgRef{term reduction algo}, starting with some term~$t$. This means we have an infinite sequence of terms, each one reducing to the next by a positive rewrite step:
\[ t\rightarrow t_1\rightarrow t_2 \rightarrow \cdots \rightarrow t_n \rightarrow \cdots \]
Now, let $s=x(u\Rightarrow v)y$ be a \index{positive rewrite step}positive rewrite step. Since $(u,v)\in R$ is oriented, we have $u>v$. Also, $<$ is translation-invariant, so $xuy>xvy$. Thus, $\Src(s)>\Dst(s)$. This means our infinite reduction sequence gives us an \index{infinite descending chain}infinite descending chain of terms under our reduction order:
\[ t > t_1 > t_2 > \cdots > t_n > \cdots \]
However, this contradicts the assumption that $<$ is well-founded. Thus, no such infinite sequence of terms exists, and the normal form algorithm must terminate.
\end{proof}
If we equip our alphabet of symbols~$A$ with a well-founded partial order, we can always get a reduction order on $A^*$, called the \IndexDefinition{shortlex order}\emph{shortlex order}:
\begin{enumerate}
\item Terms of shorter length precede terms of longer length.
\item Otherwise, if two terms have the same length, we compare the symbols pairwise using the order on~$A$.
\end{enumerate}
For example, if $A:=\{a,b,c\}$, we might say that $a<b<c$ to define a linear order on $A$; then we can order various terms of $A^*$ using the shortlex order: \[a<b<ab<ac<aab\]
\begin{algorithm}[Shortlex order]\label{shortlex}
Takes two terms $x$, $y\in A^*$ as input, and returns one of ``$<$'', ``$>$'', ``$=$'' or \index{$\bot$}\index{$\bot$!z@\igobble|seealso{partial order}}``$\bot$'' as output.
\begin{enumerate}
\item (Shorter) If $|x|<|y|$, return ``$<$''.
\item (Longer) If $|x|>|y|$, return ``$>$''.
\item (Initialize) If $|x|=|y|$, we compare elements. Let $i:=0$.
\item (Equal) If $i=|x|$, we didn't find any differences, so $x=y$. Return ``$=$''.
\item (Subscript) Let $x_i$ and $y_i\in A$ be the $i$th symbol of $x$ and $y$, respectively.
\item (Compare) Compare $x_i$ with $y_i$ using the order on $A$. Return the result if it is ``$<$'', ``$>$'', or ``$\bot$''. Otherwise, we must have $x_i=y_i$, so keep going.
\item (Next) Increment $i$ and go back to Step~4.
\end{enumerate}
\end{algorithm}
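A Swift sketch of this comparison, under the simplifying assumption that the order on the alphabet is linear (here, the standard ordering of \texttt{Character}), so the incomparable case never arises. The name \texttt{shortlexLess} is hypothetical:
\begin{Verbatim}
func shortlexLess(_ x: String, _ y: String) -> Bool {
  if x.count != y.count {
    return x.count < y.count  // shorter terms precede longer terms
  }
  for (xi, yi) in zip(x, y) where xi != yi {
    return xi < yi  // the first differing symbol decides
  }
  return false  // the terms are equal
}

shortlexLess("b", "ab")   // true: shorter terms come first
shortlexLess("ab", "ac")  // true: equal length, and 'b' < 'c'
\end{Verbatim}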
The shortlex order will return ``$\bot$'' if and only if two equal-length terms $x$~and~$y$ have an \index{incomparable elements}incomparable pair of symbols at the same position, under the partial order on~$A$. In particular, if the order on~$A$ is linear, the induced shortlex order is a linear order on~$A^*$. We omit two simple proofs: the shortlex order is well-founded, by the same argument as in \PropRef{well founded type order}, and it is translation-invariant, by \index{induction}induction on term length.
\begin{example}
The shortlex order is \emph{not} the same as the \index{lexicographic order}lexicographic ``dictionary'' order on strings. For example, $b<ab$ under the shortlex order, but $ab<b$ under the lexicographic order. The lexicographic order is \emph{not} well-founded, thus it is unsuitable for use as a reduction order:
\[b>ab>aab>aaab>aaaab>\cdots\]
\end{example}
At this point, we pause to observe that in all of our examples so far except the undecidable $\mathfrak{C}_1$, the normal form algorithm solves the word problem once we establish termination:
\begin{enumerate}
\item The \index{modular arithmetic}monoid $\Pres{a}{a^4\sim\varepsilon}$ from \ExRef{monoid z4 example} has a single length-reducing rewrite rule. The normal form algorithm will output one of $\{\varepsilon,a,a^2,a^3\}$.
\item The \index{bicyclic monoid}bicyclic monoid $\Bicyclic$ from \ExRef{bicyclic} has a single length-reducing rewrite rule. The normal form algorithm will find the unique term of the form $b^m a^n$ in each equivalence class.
\item The \index{free commutative monoid}free commutative monoid $\Pres{a,b}{ba\sim ab}$ from \ExRef{commutative free monoid example} can be equipped with the shortlex order where $a<b$ to establish termination. The normal form algorithm will find in each equivalence class a unique term of the form $a^m b^n$.
\end{enumerate}
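The hypothetical \texttt{normalForm()} sketch from earlier in this section reproduces all three computations:
\begin{Verbatim}
normalForm(of: "aaaaaa", rules: [("aaaa", "")])  // "aa"
normalForm(of: "baab", rules: [("ab", "")])      // "ba"
normalForm(of: "bab", rules: [("ba", "ab")])     // "abb"
\end{Verbatim}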
The three examples above all enjoy unique normal forms, something that is not true of an arbitrary monoid presentation.
\paragraph{Uniqueness.}
We now describe monoid presentations with unique normal forms:
\begin{theorem}[Church-Rosser Theorem]\label{church rosser theorem}
Let $\AR$ be a monoid presentation. The following two conditions are equivalent:
\begin{enumerate}
\item \IndexDefinition{confluence}(Confluence) For all $t\in A^*$, if $t\rightarrow u$ and $t\rightarrow v$ for some $u$, $v\in A^*$, there exists a term $z\in A^*$ such that $u\rightarrow z$ and $v\rightarrow z$.
\item \IndexDefinition{Church-Rosser property}(Church-Rosser property) For all $x$, $y\in A^*$ such that $x\sim y$, there exists $z\in A^*$ such that $x\rightarrow z$ and $y\rightarrow z$.
\end{enumerate}
\end{theorem}
The Church-Rosser Theorem was discovered by \index{Alonzo Church}Alonzo Church and \index{John Rosser}John Rosser~\cite{conversion}. The original result was about the \index{lambda calculus}lambda calculus, which is a rewriting system whose terms are combinations of variables, function applications, and anonymous functions. The lambda calculus is Turing-complete, so reduction of a given lambda term may not \index{terminating reduction relation}terminate; reduction is always confluent, however, so the Church-Rosser property holds. \index{Haskell Curry}Haskell~Curry (whom the \index{Haskell}Haskell language was named after) subsequently generalized the theorem beyond the lambda calculus in \cite{combinatory}. We use the form of the statement where the terms are elements of a free monoid; a proof can be found in \cite{book2012string}.
\begin{example}\label{non confluent example}
Consider the following monoid presentation:
\[\Pres{a,b,c}{ab\sim a,\,bc\sim b}\]
The reduction relation is Noetherian, because the rules are \index{length-reducing rewrite rule}length-reducing. However, we will see the reduction relation is \emph{not} confluent, so the Church-Rosser property fails: we can find two irreducible terms in the same equivalence class.
First, observe that both $ac$ and $a$ are irreducible: neither term has any subterm equal to the left-hand side of either rule. Now, let's play with the term $abc$. We can begin by applying either of the two rewrite rules:
\begin{enumerate}
\item Applying the first rule to $abc$ gives us $ac$, which is irreducible.
\item Applying the second rule to $abc$ gives us $ab$, which reduces to $a$ via the first rule, which is irreducible.
\end{enumerate}
It appears that \AlgRef{term reduction algo} is not deterministic; in Step~2, we have a choice of rewrite rules to apply, and so we output one of two rewrite paths, each ending at a different irreducible term:
\begin{gather*}
p_1 := (ab\Rightarrow a)c\\
p_2 := a(bc\Rightarrow b)\circ(ab\Rightarrow a)
\end{gather*}
We've found a \IndexDefinition{confluence violation}\emph{confluence violation}. Note that $\Src(p_1)=\Src(p_2)=abc$, so $p_1^{-1}\circ p_2$ is a rewrite path from $ac$ to $a$. Thus $ac\sim a$, and yet, they're distinct and irreducible. If we draw the rewrite path $p_1^{-1}\circ p_2$ as a diagram, we see that no positive rewrite path can take us from $ac$ to $a$; we must go ``over the hump'':
\begin{center}
\begin{tikzcd}
&abc \arrow[dl, "(a\Rightarrow ab)c"', Leftarrow, bend right] \arrow[dr, "a(bc\Rightarrow b)", Rightarrow, bend left] &\\
ac &&ab \arrow[dl, "(ab\Rightarrow a)", Rightarrow, bend left] \\
&a&
\end{tikzcd}
\end{center}
Now, let's \emph{add} the rewrite rule $(ac,a)$ to our monoid presentation:
\[\Pres{a,b,c}{ab\sim a,\,bc\sim b,\,ac\sim a}\]
We \emph{already had} a rewrite path from $ac$ to $a$, so adding $(ac, a)$ does not change the equivalence relation $\sim$ (we'll see this is a so-called \index{Tietze transformation}Tietze transformation in \SecRef{tietze transformations}). It \emph{does} change $\rightarrow$; in particular, the reduction relation becomes confluent.
In our new presentation, the non-determinism disappears; the normal form algorithm always reduces $abc$ to~$a$, regardless of which rewrite rule we choose to apply first. In fact, every equivalence class now has a unique normal form $c^i b^{\hspace{0.08em}j}\! a^k$ for $i$, $j$, $k\in\NN$.
\end{example}
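The hypothetical \texttt{normalForm()} sketch makes the non-determinism, and its repair, easy to observe; with the original two rules the answer depends on which rule is tried first, while the completed presentation gives the same answer either way:
\begin{Verbatim}
// Not confluent: the result depends on rule order.
normalForm(of: "abc", rules: [("ab", "a"), ("bc", "b")])  // "ac"
normalForm(of: "abc", rules: [("bc", "b"), ("ab", "a")])  // "a"

// With (ac, a) added, both orders reduce abc to "a".
normalForm(of: "abc",
           rules: [("ab", "a"), ("bc", "b"), ("ac", "a")])  // "a"
normalForm(of: "abc",
           rules: [("bc", "b"), ("ab", "a"), ("ac", "a")])  // "a"
\end{Verbatim}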
We give a name to this kind of monoid presentation where the normal form algorithm always terminates and outputs a unique reduced term from each equivalence class.
\begin{definition}
A \IndexDefinition{convergent rewriting system}\emph{convergent rewriting system} is a monoid presentation $\AR$ whose reduction relation $\rightarrow_R$ is Noetherian and confluent.
\end{definition}
A monoid presented by a convergent rewriting system has a decidable word problem:
\begin{corollary}\label{convergent decidable}
Let $\AR$ be a convergent rewriting system. Then for all $x$, $y\in A^*$, we have $x\sim y$ if and only if $\tilde{x}=\tilde{y}$.
\end{corollary}
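In a convergent rewriting system, then, deciding the word problem reduces to a string comparison. A sketch, again using the hypothetical \texttt{normalForm()} from above:
\begin{Verbatim}
// Only valid when the rewriting system is convergent.
func areEquivalent(_ x: String, _ y: String,
                   rules: [(String, String)]) -> Bool {
  normalForm(of: x, rules: rules) == normalForm(of: y, rules: rules)
}

areEquivalent("abc", "a",
              rules: [("ab", "a"), ("bc", "b"), ("ac", "a")])  // true
\end{Verbatim}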
We add two final entries to the Rosetta stone we started in \SecRef{monoidsasprotocols}:
\begin{center}
\begin{tabular}{ll}
\toprule
\textbf{Swift generics}&\textbf{F-P monoids}\\
\midrule
Type parameter order&Shortlex order\\
Reduced type parameter&Reduced term\\
\bottomrule
\end{tabular}
\end{center}
In fact, the type parameter order of \AlgRef{type parameter order} is basically a specific instance of the shortlex order, except we described it with a recursive algorithm, whereas \AlgRef{shortlex} is iterative.
\paragraph{Completion.} The process of repairing confluence violations by adding rules is called \IndexDefinition{completion}\emph{completion}. If completion succeeds, we get a convergent rewriting system. \ChapRef{completion} will explain the Knuth-Bendix algorithm that is used for this purpose. For example, the Swift compiler accepts a protocol \texttt{M} encoding the monoid presentation from \ExRef{non confluent example}; completion allows us to establish that $\GM\vdash\SameReq{Self.A.C}{Self.A}$.
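Following our encoding of finitely-presented monoids as protocols, such a protocol might look like the sketch below, where the \texttt{sameType()} helper from \SecRef{word problem} type checks precisely because completion adds the rule $(ac,a)$:
\begin{Verbatim}
protocol M {
  associatedtype A: M
  associatedtype B: M
  associatedtype C: M where Self.A.B == Self.A, Self.B.C == Self.B
}
extension M {
  static func testCompletion() {
    sameType(Self.A.C.self, Self.A.self)  // ok, thanks to completion
  }
}
\end{Verbatim}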
Completion is only a \emph{semi-decision procedure} that may fail to terminate; in practice, we impose an iteration limit. This cannot be improved upon, because ultimately, it is \index{undecidable problem}undecidable if a monoid can be presented by a convergent rewriting system \cite{ODUNLAING1983339}. A monoid with an undecidable word problem cannot have such a presentation, so completion must fail in that case. Is the converse true? That is, if a finitely-presented monoid is known to have a decidable word problem, will completion always terminate in a finite number of steps and output a convergent rewriting system? The answer is also no:
\begin{enumerate}
\item With a fixed presentation $\AR$, completion may succeed or fail depending on the choice of reduction order $<$.
\item Two presentations $\AR$ and $\Pres{A^\prime}{R^\prime}$ over different \index{generating set}generating sets (alphabets) may present the same monoid up to \index{monoid isomorphism}isomorphism, but completion may succeed with one and fail with the other.
\item Some finitely-presented monoids are known to have a decidable word problem, but no convergent presentation over \emph{any} generating set.
\end{enumerate}
We will see in \SecRef{recursive conformances redux} that the avoidance of the first two issues motivates several design decisions in the lowering of generic signatures to monoid presentations. The second issue was originally explored by \cite{KAPUR1985337}, where we learn that the monoid presentation $\Pres{a,b}{aba\sim bab}$ cannot be extended to a convergent presentation on the same alphabet, but by adding a new symbol $c$ together with the rewrite rule $ab\sim c$, completion will quickly arrive at a convergent presentation.
The third case is more difficult. The below example is from a paper by \index{Craig Squier}Craig C.~Squier \cite{SQUIER1994271}; also, see \cite{Lafont1991ChurchRooserPA} and \cite{LAFONT1995229}.
\begin{theorem}\label{squier s1} The following finitely-presented monoid has a word problem decidable via a ``bespoke'' algorithm, but no convergent presentation over any generating set:
\[S_1:=\Pres{a,b,t,x,y}{ab\sim \varepsilon,\,xa\sim atx,\,xt\sim tx,\,xb\sim bx,\,xy\sim \varepsilon}\]
\end{theorem}
To prove this theorem, Squier introduced a new combinatorial property of the \index{rewrite graph}rewrite graph (the term \emph{derivation graph} is used in the paper), called \emph{finite derivation type}. It is shown that the rewrite graph of a convergent rewriting system has finite derivation type, and that finite derivation type does not depend on the choice of presentation, so it is an invariant of the monoid itself, and not just a particular presentation.
For example, $\Pres{a,b}{aba\sim bab}$ from above has finite derivation type, because a \emph{different} presentation of the same monoid happens to be convergent.
On the other hand, if it can be shown that some monoid does not have finite derivation type, we can conclude that no convergent presentation exists for this monoid, even if its word problem is decidable by other means. Squier was able to show that while $S_1$ does not have finite derivation type, it does have a decidable word problem, and he gives an explicit decision procedure.
So, each of the below classes is a \index{proper subset}proper subset of the next, and we shall be content to play in the shallow end of the pool from here on out:
\begin{center}
\begin{tabular}{c}
convergent rewriting system \\
$\subsetneq$ \\
decidable word problem \\
$\subsetneq$ \\
all finitely-presented monoids
\end{tabular}
\end{center}
More conditions satisfied by monoids presented by convergent rewriting systems were explored in \cite{fptype}, \cite{fdtfp3}, and \cite{mild}. For a survey of results about convergent rewriting systems, see \cite{Otto1997}. Many open questions remain. Notice how many of the example monoids we saw only had a single rewrite rule. Even with such ``one-relation'' monoids, the situation is far from trivial. For example, it is not known if every one-relation monoid can be presented by a convergent rewriting system, or more generally, if the word problem for such monoids is decidable or not. However, one-relation monoids are known to have finite derivation type \cite{KOBAYASHI2000547}. For a survey of results about one-relation monoids and a good overview of the word problem in general, see \cite{onerelation}. Apart from string rewriting, another approach to solving the word problem is to use finite state automata to describe the structure of a monoid or group~\cite{epstein1992word}. Other interesting undecidable problems in abstract algebra appear in~\cite{tarski1953undecidable}.
\paragraph{Type substitution.} In fact, we could have formalized the \index{type substitution}type substitution algebra as a term rewriting system. This would be a rewriting system over tree-structured terms, because the types, substitution maps and conformances in this algebra all recursively nest within one another.
The ``evaluation rules'' for the $\otimes$ operator would instead define a reduction relation, and the \index{associative operation}associativity of $\otimes$ would be equivalent to proving that the reduction relation is \index{confluence}confluent. (The intrepid reader can quickly review \AppendixRef{notation summary} to see why this might be the case.) This reduction relation is not \index{terminating reduction relation}terminating, though, because as \SecRef{tag systems} demonstrates, it can encode arbitrary computation.
When viewed as a term rewriting system, type substitution is a lot like the \index{lambda calculus}lambda calculus---confluent but non-terminating. While this is an interesting observation, we will not pursue this direction further.
\begin{ceqn}
\[\ast \ast \ast\]
\end{ceqn}
This chapter does not have a \textbf{Source Code Reference} section, because we only talked about math.
\end{document}