\documentclass[a4paper,headsepline,bibliography=totoc,toc=flat,fleqn,twoside=semi]{scrbook} \usepackage{imakeidx} \usepackage[pdfborder={0 0 0},linkcolor=blue,citecolor=blue,linktocpage=true,colorlinks=true]{hyperref} \usepackage{upgreek} \usepackage{bm} \usepackage{amsthm} \usepackage{amsmath} \usepackage{amssymb} \usepackage{multirow} \usepackage{textgreek} \usepackage{tikz-cd} \usepackage{mathtools} \usepackage{fancyvrb} \usepackage{array} \usepackage{xcolor} \usepackage{mdframed} %\overfullrule=10pt \usetikzlibrary{positioning,shapes,shadows,arrows,trees,calc} \tikzstyle{class}=[rounded corners,draw=black,thick,anchor=west] \theoremstyle{definition} \newtheorem{definition}{Definition}[chapter] \theoremstyle{definition} \newtheorem{example}{Example}[chapter] \theoremstyle{definition} \newtheorem{algorithm}{Algorithm}[chapter] \newtheorem{theorem}{Theorem}[chapter] \newtheorem{lemma}{Lemma}[chapter] \RecustomVerbatimEnvironment{Verbatim}{Verbatim}{frame=single,numbers=left} \DeclareNewTOC[ type = listing, counterwithin=chapter, float ]{listing} \newcommand{\mathboxed}[1]{\boxed{\mbox{\vphantom{pI\texttt{pI}}#1}}} \newcommand{\ttbox}[1]{\boxed{\mbox{\vphantom{pI\texttt{pI}}\texttt{#1}}}} \newcommand{\ttgp}[2]{\texttt{$\uptau$\_#1\_#2}} \newcommand{\SubMap}[1]{\begin{tabular}{|lll|} \hline \multicolumn{3}{|l|}{\textbf{Types}}\\ \hline #1\\ \hline \end{tabular}} \newcommand{\SubMapC}[2]{\begin{tabular}{|lll|} \hline \multicolumn{3}{|l|}{\textbf{Types}}\\ \hline #1\\ \hline \hline \multicolumn{3}{|l|}{\textbf{Conformances}}\\ \hline #2\\ \hline \end{tabular}} \newcommand{\SubType}[2]{\texttt{#1}&$:=$&\texttt{#2}} \newcommand{\SubConf}[1]{\multicolumn{3}{|l|}{\texttt{#1}}} \newcommand{\archetype}[1]{$[\![\texttt{#1}]\!]$} \newcommand{\namesym}[1]{\mathsf{#1}} \newcommand{\genericparam}[1]{\bm{\mathsf{#1}}} \newcommand{\proto}[1]{\bm{\mathsf{#1}}} \newcommand{\protosym}[1]{[\proto{#1}]} \newcommand{\gensig}[2]{\langle #1\;\textit{where}\;#2\rangle} \newcommand{\genericsym}[2]{\bm{\uptau}_{#1,#2}} \newcommand{\assocsym}[2]{[\proto{#1}\colon\namesym{#2}]} \newcommand{\layoutsym}[1]{[\mathsf{layout\;#1}]} \newcommand{\supersym}[1]{[\mathsf{superclass}\;#1]} \newcommand{\concretesym}[1]{[\mathsf{concrete}\;#1]} \DeclareMathOperator{\gpdepth}{depth} \DeclareMathOperator{\gpindex}{index} \DeclareMathOperator{\domain}{domain} \newcommand{\SourceFile}[1]{\href{https://github.com/apple/swift/tree/main/#1}{\texttt{#1}}} % Note: \apiref{foo}{bar} must be followed by a newline, because of a quirk with \noindent \def\apiref#1#2 {\bigskip\hrule\smallskip\noindent\texttt{#1}\hfill\textsl{#2}\smallskip\hrule\smallskip\noindent} % Comment this out and uncomment the next line to see the work-in-progress sections \newcommand{\ifWIP}{\iffalse} %\newcommand{\ifWIP}{\iftrue} \makeindex[intoc] \title{\begin{center} $\left< \begin{array}{cc} :&= \\ \uptau&\rightarrow \end{array} \right>$ \end{center} \bigskip Compiling Swift Generics} \author{Slava Pestov} \pagestyle{headings} \begin{document} \maketitle \chapter*{Preface} This is a book about the implementation of generic programming in Swift. While it is primarily meant to be a reference for Swift compiler contributors, it should also be of interest to other language designers, type system researchers, and even just curious Swift programmers. Some familiarity with general compiler design and the Swift language is assumed. A basic understanding of abstract algebra is also helpful. 
This work began as a paper about the Requirement Machine, a new implementation of the core algorithms in Swift generics which shipped with Swift~5.6. After making some progress on writing the paper, I realized that a reference guide for the entire generics implementation would be more broadly useful to the community. I worked backwards, adding more preliminary material and revising subsequent sections until reaching a fixed point, hopefully converging on something approximating a coherent and self-contained treatment of this cross-section of the compiler.

Part~\ref{part fundamentals} of this book outlines the basic building blocks. Each chapter in the first part depends on the previous chapters; a determined (or stubborn) reader should be able to work through them sequentially, but you might find it easier to skim some sections and refer back later. Part~\ref{part odds and ends} details how various language features are built up from the core concepts of generics. In the second part, each chapter is mostly independent of the others. Part~\ref{part rqm} dives into the Requirement Machine, which implements generic signature queries and requirement minimization. This is the most technical part of the book.

Occasional historical asides explain when major features were introduced, citing the relevant Swift evolution proposals. The bibliography lists all cited proposals. There is also an automatically-generated index at the end; you might find it useful for looking up unfamiliar terminology.

The Swift compiler is implemented in C++. To help separate essential from incidental complexity, concepts are described without immediately referencing the source code. Every chapter ends with a ``Source Code Reference'' section, structured somewhat like an API reference, which translates what was previously explained into code. You can skip this material if you're not interested in the practicalities of the compiler implementation itself. No knowledge of C++ is required outside of these sections.

This book was typeset with \TeX. You can find the latest version in our git repository:
\begin{quote}
\url{https://github.com/apple/swift/tree/main/docs/Generics}
\end{quote}

\tableofcontents

\part{Nuts and Bolts}\label{part fundamentals}

\chapter{Introduction}\label{roadmap}

Swift generics were designed with four primary goals in mind:
\begin{enumerate}
\item Generic definitions should be independently type checked, without knowledge of all possible concrete type substitutions that they are invoked with.
\item Shared libraries that export generic definitions should be able to evolve resiliently without requiring recompilation of clients.
\item Layouts of generic types should be determined by their concrete substitutions, with fields of generic parameter type stored inline.
\item Abstraction over concrete types with generic parameters should only impose a cost across module boundaries, or in other situations where type information is not available at compile time.
\end{enumerate}

The Swift compiler achieves these goals as follows:
\begin{enumerate}
\item The interface between a generic definition and its uses is mediated by \textbf{generic requirements}. The generic requirements describe the behavior of the generic parameter types inside the function body, and the generic arguments at the call site are checked against the declaration's generic requirements at compile time.
\item Generic functions receive \textbf{runtime type metadata} for each generic argument from the caller.
Type metadata defines operations to abstractly manipulate values of its type without knowledge of their concrete layout.
\item Runtime type metadata is constructed for each type in the language. The \textbf{runtime type layout} of a generic type is computed recursively from the type metadata of the generic arguments. Generic types always store their contents without boxing or indirection.
\item The optimizer can generate a \textbf{specialization} of a generic function in the case where the definition is visible at the call site. This eliminates the overhead of runtime type metadata and abstract value manipulation.
\end{enumerate}

An important part of compiler implementation is the design of domain objects to model concepts in the language being compiled. One way to think of a compiler is that it is \emph{a library for implementing the target language}. A well-designed set of domain objects facilitates the introduction of new language features that compose existing functionality in new ways.

The generics implementation deals with four fundamental domain objects: \emph{generic signatures}, \emph{substitution maps}, \emph{requirement signatures}, and \emph{conformances}. As you will see, they are defined as much by their inherent structure as by their relationships with each other. Subsequent chapters will dive into all the details, but first, we're going to look at a series of worked examples to help you understand the big picture.

\section{Generic Functions}

Consider these two rather contrived function declarations:
\begin{Verbatim}
func identity(_ x: Int) -> Int { return x }
func identity(_ x: String) -> String { return x }
\end{Verbatim}
Apart from the parameter and return type, both have the exact same definition, and indeed you can write the same function for any concrete type. Your aesthetic sense might lead you to replace both with a single generic function:
\begin{Verbatim}
func identity<T>(_ x: T) -> T { return x }
\end{Verbatim}
While this function declaration is trivial, it illustrates some important concepts and allows us to introduce terminology. You'll see a full description of the compilation pipeline in the next chapter, but for now, let's consider a simplified view where we begin with parsing, then type checking, and finally code generation.
\begin{figure}\captionabove{The abstract syntax tree for \texttt{identity(\_:)}}\label{identity ast}
\begin{center}
\begin{tikzpicture}[%
  grow via three points={one child at (0.5,-0.7) and two children at (0.5,-0.7) and (0.5,-1.4)},
  edge from parent path={[->] (\tikzparentnode.south) |- (\tikzchildnode.west)}]
  \node [class] {\vphantom{p}function declaration: \texttt{identity}}
  child { node [class] {\vphantom{p}generic parameter list: \texttt{<T>}}
    child { node [class] {\vphantom{p}generic parameter declaration: \texttt{T}}}}
  child [missing] {}
  child { node [class] {\vphantom{p}parameter declaration: \texttt{\_ x:\ T}}
    child { node [class] {\vphantom{p}type representation: \texttt{T}}}}
  child [missing] {}
  child { node [class] {\vphantom{p}type representation: \texttt{T}}}
  child { node [class] {\vphantom{p}body}
    child { node [class] {\vphantom{p}statement: \texttt{return x}}
      child { node [class] {\vphantom{p}expression: \texttt{x}}}}
    child [missing] {}}
  child [missing] {}
  child [missing] {};
\end{tikzpicture}
\end{center}
\end{figure}

\index{function declaration} \index{generic parameter list} \index{generic parameter declaration} \index{type representation} \index{identifier} \index{name lookup}

\paragraph{Parsing} Figure~\ref{identity ast} shows the abstract syntax tree produced by the parser before type checking. The key elements:
\begin{enumerate}
\item The \emph{generic parameter list} \texttt{<T>} introduces a single \emph{generic parameter declaration} named \texttt{T}. As its name suggests, this declares the generic parameter type \texttt{T}, scoped to the entire source range of this function.
\item The \emph{type representation} \texttt{T} appears twice, first in the parameter declaration \verb|_ x: T| and then as the return type of \verb|identity(_:)|. A type representation is the purely syntactic form of a type. The parser does not perform name lookup, so the type representation stores the identifier \texttt{T} and does not refer to the generic parameter declaration of \texttt{T} in any way.
\item The function body contains an expression referencing \texttt{x}. Again, the parser does not perform name lookup, so this is just the identifier \texttt{x} and is not associated with the parameter declaration \verb|_ x: T|.
\end{enumerate}

\index{generic parameter type} \index{generic signature} \index{type resolution} \index{type} \index{interface type} \index{generic function type}

\paragraph{Type checking} Some additional structure is formed during type checking:
\begin{enumerate}
\item The generic parameter declaration \texttt{T} declares the generic parameter type \texttt{T}. Types are distinct from type declarations in Swift; some types denote a \emph{reference} to a type declaration, and some are \emph{structural} (such as function types or tuple types).
\item The type checker constructs a \emph{generic signature} for our function declaration. The generic signature has the printed representation \texttt{<T>} and contains the single generic parameter type \texttt{T}. This is the simplest possible generic signature, apart from the empty generic signature of a non-generic declaration.
\item The type checker performs \emph{type resolution} to transform the type representation \texttt{T} appearing in our parameter declaration and return type into a semantic \emph{type}. Type resolution queries name lookup for the identifier \texttt{T} at the source location of each type representation, which finds the generic parameter declaration \texttt{T} in both cases.
This type declaration declares the generic parameter type \texttt{T}, which becomes the resolved type.
\item There is now enough information to form the function's \emph{interface type}, which is the type of a reference to this function from expression context. The interface type of a generic function declaration is a \emph{generic function type}, composed from the function's generic signature, parameter types, and return type:
\begin{quote}
\begin{verbatim}
<T> (T) -> T
\end{verbatim}
\end{quote}
\end{enumerate}

The final step is the type checking of the function's body. The expression type checker queries name lookup for the identifier \texttt{x}, which finds the parameter declaration \verb|_ x: T|.

\index{archetype type} \index{primary archetype type}

While the type of our function parameter is the generic parameter type \texttt{T}, inside the body of a generic function it becomes a different kind of type, called a \emph{primary archetype}. The distinction isn't terribly important right now, and it will be covered in Chapter~\ref{genericenv}. It suffices to say that we'll use the notation \archetype{T} for the primary archetype corresponding to the generic parameter type \texttt{T}. With that out of the way, the expression type checker assigns the type \archetype{T} to the expression \texttt{x} appearing in the return statement. As expected, this matches the declared return type of the function.

\paragraph{Code generation} We've now successfully type checked our function declaration. How might we generate code for it? Recall the two concrete implementations that we folded into our single generic function:
\begin{Verbatim}
func identity(_ x: Int) -> Int { return x }
func identity(_ x: String) -> String { return x }
\end{Verbatim}
The calling conventions of these functions differ significantly:
\begin{enumerate}
\item The first function receives and returns the \texttt{Int} value in a machine register. The \texttt{Int} type is \emph{trivial},\footnote{Or POD, for you C++ folks.} meaning it can be copied and moved at will.
\item The second function is trickier. A \texttt{String} is stored as a 16-byte value in memory, and contains a pointer to a reference-counted buffer. When manipulating values of a non-trivial type like \texttt{String}, memory ownership comes into play. The standard ownership semantics for a Swift function call are defined such that the caller retains ownership over the parameter values passed into the callee, while the callee transfers ownership of the return value to the caller. This means that the \verb|identity(_:)| function cannot just return the value \texttt{x}; instead, it must first create a logical copy of \texttt{x} that it owns, and then return this owned copy. This is achieved by incrementing the string value's buffer reference count via a call to a runtime function.
\end{enumerate}

More generally, every Swift type has a size and alignment, and defines three fundamental operations that can be performed on all values of that type: moving the value, copying the value, and destroying the value. A move is semantically equivalent to, but more efficient than, copying a value followed by destroying the old copy.\footnote{Of course if move-only types are ever introduced into the language, this will no longer be so; a new kind of value will exist which cannot be copied.}

With a trivial type, moving or copying a value simply copies the value's bytes from one memory location to another, and destroying a value does nothing. With a reference type, these operations update the reference count. Copying a reference increments the reference count on its heap-allocated backing storage, and destroying a reference decrements the reference count, deallocating the backing storage when the reference count reaches zero. Even more complex behaviors are also possible; a struct might contain a mix of trivial types and references, for example. Weak references and existential types also have non-trivial value operations.

As the joke goes, every problem in computer science can be solved with an extra level of indirection. The calling convention for a generic function takes \emph{runtime type metadata} for every generic parameter in the function's generic signature. Every type in the language has a reified representation as runtime type metadata, storing the type's size and alignment together with function pointers implementing the move, copy and destroy operations. The generated code for a generic function abstractly manipulates values of generic parameter type using the runtime type metadata provided by the caller. An important property of runtime type metadata is \emph{identity}; two pointers to runtime type metadata are equal if and only if they represent the same type in the language.
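As a mental model, we can sketch the information carried by runtime type metadata in Swift itself. This is only an illustration: the names below are hypothetical, and the actual representation is defined by the runtime, which is written in C++:
\begin{Verbatim}
// A hypothetical sketch of the contents of runtime type metadata.
struct TypeMetadata {
  let size: Int       // size of a value of this type, in bytes
  let alignment: Int  // alignment of a value of this type, in bytes

  // Function pointers implementing the fundamental value operations.
  let copy: (UnsafeMutableRawPointer, UnsafeRawPointer) -> Void
  let move: (UnsafeMutableRawPointer, UnsafeMutableRawPointer) -> Void
  let destroy: (UnsafeMutableRawPointer) -> Void
}
\end{Verbatim}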
\newenvironment{MoreDetails}{\medskip\begin{mdframed}[rightline=true,frametitlerule=true,frametitlerulecolor=gray,frametitlebackgroundcolor=light-gray,frametitlerulewidth=2pt,backgroundcolor=light-gray,linecolor=gray,frametitle={More details}] \begin{itemize}}{\end{itemize} \end{mdframed}}
\definecolor{light-gray}{gray}{0.90}

\begin{MoreDetails}
\item Types: Chapter~\ref{types}
\item Function declarations: Section~\ref{func decls}
\item Generic parameter lists: Chapter~\ref{generic declarations}
\item Type resolution: Chapter~\ref{typeresolution}
\end{MoreDetails}

\paragraph{Substitution maps} Let us now turn our attention to the callers of generic functions. A \emph{call expression} references a \emph{callee} together with a list of arguments. The callee is some other expression with a function type. Some possible callees include references to named function declarations, type expressions (which invoke a constructor), function parameters and local variables of function type, and results of other calls which return functions. In our example, we might call the \verb|identity(_:)| function as follows:
\begin{Verbatim}
identity(3)
identity("Hello, Swift")
\end{Verbatim}
The callee here is a direct reference to the declaration of \verb|identity(_:)|. In Swift, calls to generic functions never specify their generic arguments explicitly; instead, the type checker infers them from the types of call argument expressions. A reference to a named generic function stores a \emph{substitution map} mapping each generic parameter type of the callee's generic signature to the inferred generic argument, also called the \emph{replacement type}. The generic signature of \verb|identity(_:)| has a single generic parameter type. The two references to \verb|identity(_:)| have different substitution maps; the first substitution map has the replacement type \texttt{Int}, and the second \texttt{String}.
We will use the following notation for these substitution maps:
\[
\SubMap{\SubType{T}{Int}}
\qquad
\SubMap{\SubType{T}{String}}
\]
We can apply a substitution map to the interface type of our function declaration to get the \emph{substituted type} of the callee:
\[\ttbox{<T> (T) -> T} \times \SubMap{\SubType{T}{Int}} = \ttbox{(Int) -> Int}\]

Substitution maps also play a role in code generation. When generating a call to a generic function, the compiler emits code to realize the runtime type metadata for each replacement type in the substitution map. The types \texttt{Int} and \texttt{String} are \emph{nominal types} defined in the standard library. These types are non-generic and have a fixed layout, so their runtime type metadata can be recovered by taking the address of a constant symbol exported by the standard library.

\index{structural type} \index{metadata access function}

Structural types are slightly more complicated. Suppose we were instead compiling a call to \verb|identity(_:)| where the replacement type for \texttt{T} was some function type, say \verb|(Int, String) -> Float|. Function types can have arbitrary parameter and return types. Therefore, structural type metadata is \emph{instantiated} by calling one of several \emph{metadata access functions}, declared in the runtime. These runtime entry points take metadata for the parameter types and return type, construct metadata representing the function type, and cache the result for future accesses.

\begin{MoreDetails}
\item Substitution maps: Chapter~\ref{substmaps}
\end{MoreDetails}

\index{inlinable function}

\paragraph{Specialization} The passing of runtime type metadata and the resulting indirect manipulation of values incurs a performance penalty. As an alternative, if the definition of a generic function is visible at the call site, the optimizer can generate a \emph{specialization} of the generic function by cloning the definition and applying the substitution map to all types appearing in the function's body. Definitions of generic functions are always visible to the specializer within their defining module. Shared library developers can also opt-in to exporting the body of a function across module boundaries with the \texttt{@inlinable} attribute.
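For example, a shared library author could write the following; the snippet is just an illustration, using our \verb|identity(_:)| function once more:
\begin{Verbatim}
// The body of an @inlinable function is exported across module
// boundaries, so the optimizer can specialize calls to it from
// client code as well.
@inlinable public func identity<T>(_ x: T) -> T {
  return x
}
\end{Verbatim}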
\begin{MoreDetails}
\item \texttt{@inlinable} attribute: Section~\ref{module system}
\end{MoreDetails}

\section{Generic Types}

\index{struct declaration} \index{stored property declaration}

For our next example, consider this simple generic struct storing two values of the same type:
\begin{Verbatim}
struct Pair<T> {
  let first: T
  let second: T

  init(first: T, second: T) {
    self.first = first
    self.second = second
  }
}
\end{Verbatim}
This struct declaration contains three members: two stored property declarations, and a constructor declaration. Recall that declarations have an \emph{interface type}, which is the type of a reference to the declaration from expression context. The interface type of \texttt{first} and \texttt{second} is the generic parameter type \texttt{T}.

\index{metatype type}

When a type declaration is referenced from expression context the result is a value representing the type, and the type of this value is a metatype type, so the interface type of \texttt{Pair} is the metatype type \texttt{Pair<T>.Type}.

\index{declared interface type} \index{generic nominal type}

Type declarations also have a more primitive notion of a \emph{declared interface type}, which is the type assigned to a reference to the declaration from type context. The declared interface type of \texttt{Pair} is the \emph{generic nominal type} \texttt{Pair<T>}. The interface type of a type declaration is the metatype of its declared interface type.

\index{context substitution map}

Instances of \texttt{Pair} store their fields inline without boxing, and the layout of \texttt{Pair<T>} depends on the generic parameter \texttt{T}. If you declare a local variable whose type is the generic nominal type \texttt{Pair<Int>}, the compiler can directly compute the type's layout to determine the size of the stack allocation:
\begin{Verbatim}
let twoIntegers: Pair<Int> = ...
\end{Verbatim}
To compute the layout, the compiler first factors the type \texttt{Pair<Int>} into the application of a substitution map to the declared interface type:
\[\ttbox{Pair<Int>} = \ttbox{Pair<T>} \times \SubMap{\SubType{T}{Int}}\]
The compiler then computes the substituted type of each stored property by applying this substitution map to each stored property's interface type:
\[\ttbox{T} \times \SubMap{\SubType{T}{Int}} = \ttbox{Int}\]
Therefore both fields of \texttt{Pair<Int>} have a substituted type of \texttt{Int}. The \texttt{Int} type has a size of 8 bytes and an alignment of 8 bytes, from which we derive that \texttt{Pair<Int>} has a size of 16 bytes and alignment of 8 bytes.
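We can observe this computed layout from within the language itself using the standard library's \texttt{MemoryLayout} type; the specific numbers assume a 64-bit platform:
\begin{Verbatim}
print(MemoryLayout<Pair<Int>>.size)       // prints 16
print(MemoryLayout<Pair<Int>>.alignment)  // prints 8
\end{Verbatim}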
\index{metadata access function}

However, the layout is not always known at compile time, in which case we need the runtime type metadata for \texttt{Pair<T>}. When compiling the declaration of \texttt{Pair}, the compiler emits a \emph{metadata access function} which takes the type metadata for \texttt{T} as an argument. The metadata access function calculates the layout of \texttt{Pair<T>} for this \texttt{T} with the same algorithm as the compiler, but at runtime, and caches the result. Note that the runtime type metadata for \texttt{Pair<T>} has two parts:
\begin{enumerate}
\item A common prefix present in all runtime type metadata, which includes the total size and alignment of a value.
\item A private area specific to the declaration of \texttt{Pair}, such as the \emph{field offset vector} storing the starting offset of each field within a value.
\end{enumerate}

The first part comes into play if we call our \verb|identity(_:)| function with a value of type \texttt{Pair<Int>}. The generated code for the call invokes a metadata access function for \texttt{Pair} with the metadata for \texttt{Int} as an argument, and passes the resulting metadata for \texttt{Pair<Int>} to \verb|identity(_:)|. The implementation of \verb|identity(_:)| doesn't know that it is dealing with a \texttt{Pair<Int>}, but it uses the provided metadata to abstractly manipulate the value.

The second part is used by the constructor implementation. The constructor does not have a generic parameter list of its own, but it is nested inside of a generic type, so it inherits the generic signature of the type, which is \texttt{<T>}. The interface type of this constructor is the generic function type:
\begin{quote}
\begin{verbatim}
<T> (T, T) -> Pair<T>
\end{verbatim}
\end{quote}
Recall our declaration of the \texttt{twoIntegers} variable. Let's complete the declaration by writing down an initial value expression which calls the constructor:
\begin{Verbatim}
let twoIntegers: Pair<Int> = Pair(first: 1, second: 2)
\end{Verbatim}
At the call site, we have full knowledge of the layout of \texttt{twoIntegers}. However, the implementation of \texttt{Pair.init} only knows that it is working with a \texttt{Pair<T>}, and not a \texttt{Pair<Int>}. The generated code for the constructor calls the metadata access function for \texttt{Pair} with the provided metadata for \texttt{T}. Since it knows it is working with a \texttt{Pair<T>}, it can look inside the private area to get the field offsets of \texttt{first} and \texttt{second}, and assign the two parameters into the \texttt{first} and \texttt{second} stored properties of \texttt{self}.

\begin{MoreDetails}
\item Type declarations: Section~\ref{type declarations}
\item Context substitution map: Section~\ref{contextsubstmap}
\end{MoreDetails}

\section{Protocols}

Our \verb|identity(_:)| function and the \texttt{Pair} type did not state any generic requirements, so they couldn't do much with their generic values except pass them around, which the compiler expresses in terms of the fundamental value operations---move, copy and destroy.

\index{generic requirement} \index{conformance requirement} \index{opaque parameter} \index{where clause}

We can do more interesting things with our generic parameter types by writing down generic requirements. The most important kind is the \emph{protocol conformance requirement}, which states that the replacement type for a generic parameter must conform to the given protocol.
\begin{Verbatim}
protocol Shape {
  func draw()
}

func drawShapes<S: Shape>(_ shapes: [S]) {
  for shape in shapes {
    shape.draw()
  }
}
\end{Verbatim}
The \verb|drawShapes(_:)| function takes an array of values whose type conforms to \texttt{Shape}. You can also write the declaration of \verb|drawShapes(_:)| using a trailing \texttt{where} clause, or avoid the explicit generic parameter list altogether and declare an \emph{opaque parameter type} instead:
\begin{Verbatim}
func drawShapes<S>(_ shapes: [S]) where S: Shape
func drawShapes(_ shapes: [some Shape])
\end{Verbatim}

\index{generic signature}

The generic signatures we've seen previously were rather trivial, only storing a single generic parameter type. More generally, a generic signature actually consists of a list of generic parameter types together with a list of requirements. Irrespective of the surface syntax, the generic signature of \verb|drawShapes(_:)| will have a single requirement. We will use the following notation for generic signatures with requirements:
\begin{quote}
\begin{verbatim}
<S where S: Shape>
\end{verbatim}
\end{quote}
The interface type of \verb|drawShapes(_:)| is a generic function type incorporating this generic signature:
\begin{quote}
\begin{verbatim}
<S where S: Shape> ([S]) -> ()
\end{verbatim}
\end{quote}

\index{qualified lookup}

Inside the body of \verb|drawShapes(_:)|, the \texttt{shape} local variable bound by the \texttt{for}~loop is a value of type \archetype{S} (remember, generic parameter types become archetype types inside the function body; but as before, the distinction doesn't matter right now). Since \texttt{S} is subject to the conformance requirement \verb|S: Shape|, we can call the \texttt{draw()} method of the \texttt{Shape} protocol on \texttt{shape}. More precisely, a \emph{qualified lookup} of the identifier \texttt{draw} with a base type of \archetype{S} will find the \texttt{draw()} method of \texttt{Shape} as a consequence of the conformance requirement.

\index{witness table}

How does the compiler generate code for the call \verb|shape.draw()|? Once again, we need to introduce some indirection. For each conformance requirement in the generic signature of a generic function, the generic function receives a \emph{witness table} from the caller. The layout of a witness table is determined by the protocol's requirements; a method becomes an entry storing a function pointer. To call our protocol method, the compiler loads the function pointer from the witness table, and invokes it with the argument value of \texttt{shape}.
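To build intuition, here is a hand-written analogue of this calling convention, with the witness table modeled as an ordinary value passed alongside the array. This is only a sketch of what the compiler does behind the scenes, and the names are hypothetical:
\begin{Verbatim}
// A hypothetical model of a witness table for S: Shape, with one
// entry per protocol requirement.
struct ShapeWitnessTable<S> {
  let draw: (S) -> Void
}

// A model of the compiled form of drawShapes(_:), with the witness
// table made into an explicit parameter.
func drawShapesModel<S>(_ shapes: [S],
                        _ witnessTable: ShapeWitnessTable<S>) {
  for shape in shapes {
    witnessTable.draw(shape)  // load the entry and call it
  }
}
\end{Verbatim}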
Note that \verb|drawShapes(_:)| operates on a homogeneous array of shapes. While the array contains an arbitrary number of elements, \verb|drawShapes(_:)| only receives a single runtime type metadata for \texttt{S}, and one witness table for the conformance requirement \verb|S: Shape|, which together describe all elements of the array.

\begin{MoreDetails}
\item Protocols: Section~\ref{protocols}
\item Constraint types: Section~\ref{constraints}
\item Trailing \texttt{where} clauses: Section~\ref{trailing where clauses}
\item Opaque parameters: Section~\ref{opaque parameters}
\item Name lookup: Section~\ref{name lookup}
\end{MoreDetails}

\index{conformance} \index{normal conformance}

\paragraph{Conformances} We can write a struct declaration conforming to \texttt{Shape}:
\begin{Verbatim}
struct Circle: Shape {
  let radius: Double

  func draw() {...}
}
\end{Verbatim}
The declaration of \texttt{Circle} states a \emph{conformance} to the \texttt{Shape} protocol in its inheritance clause. The type checker constructs an object called a \emph{normal conformance}, which records the mapping from the protocol's requirements to the members of the conforming type which \emph{witness} those requirements.

When the compiler generates the code for the declaration of \texttt{Circle}, it emits a witness table for each normal conformance defined on the type declaration. In our case, there is just a single requirement \texttt{Shape.draw()}, witnessed by the method \texttt{Circle.draw()}. The witness table for this conformance references the witness indirectly: the witness is always wrapped in a \emph{thunk}, a small function which shuffles some registers around and then calls the actual witness. This indirection is necessary because protocol requirements use a slightly different calling convention than ordinary generic functions.

Now, let's look at a call to \verb|drawShapes(_:)| with an array of circles:
\begin{Verbatim}
drawShapes([Circle(radius: 1), Circle(radius: 2)])
\end{Verbatim}
Recall that a reference to a generic function declaration comes with a substitution map. Substitution maps store a replacement type for each generic parameter of a generic signature, so our substitution map maps \texttt{S} to the replacement type \texttt{Circle}. When the generic signature has conformance requirements, the substitution map also stores a conformance for each conformance requirement. This is the ``proof'' that the concrete replacement type actually conforms to the protocol.

\index{global conformance lookup}

The type checker finds conformances by \emph{global conformance lookup}. The call to \verb|drawShapes(_:)| will only type check if the replacement type conforms to \texttt{Shape}; the type checker rejects a call that provides an array of integers, for example, because there is no conformance of \texttt{Int} to \texttt{Shape}.\footnote{Of course, you could define this conformance with an extension.}
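For illustration, such a retroactive conformance would look like this; with this extension in scope, the previously rejected call type checks:
\begin{Verbatim}
// A retroactive conformance of Int to Shape, defined in an extension.
extension Int: Shape {
  func draw() {
    print("I am the number \(self)")
  }
}

drawShapes([1, 2, 3])  // now valid; the replacement type for S is Int
\end{Verbatim}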
We will use the following notation for substitution maps storing a conformance:
\[\SubMapC{\SubType{S}{Circle}}{\SubConf{Circle:\ Shape}}\]
When emitting code to call a generic function, the compiler looks at the substitution map and emits a reference to runtime type metadata for each replacement type, and a reference to the witness table for each conformance. In our case, \verb|drawShapes(_:)| takes a single runtime type metadata and a single witness table for the conformance. (The contents of the witness table were emitted when compiling the declaration of \texttt{Circle}; compiling the substitution map references this existing witness table.)

\begin{MoreDetails}
\item Conformances: Chapter~\ref{conformances}
\item Conformance lookup: Section~\ref{conformance lookup}
\end{MoreDetails}

\index{identifier type representation} \index{associated type}

\paragraph{Associated types} Perhaps the simplest example of a protocol with an associated type is the \texttt{IteratorProtocol} protocol in the standard library. This protocol abstracts over an iterator which produces elements of a type that depends on the conformance:
\begin{Verbatim}
protocol IteratorProtocol {
  associatedtype Element

  mutating func next() -> Element?
}
\end{Verbatim}
Consider a generic function which returns the first element produced by an iterator:
\begin{Verbatim}
func firstElement<I: IteratorProtocol>(_ iter: inout I) -> I.Element {
  return iter.next()!
}
\end{Verbatim}
The return type of our function is the \emph{identifier type representation} \texttt{I.Element} with two components, ``\texttt{I}'' and ``\texttt{Element}''. Type resolution resolves this type representation to a type by performing a qualified lookup of \texttt{Element} on the base type \texttt{I}. The generic parameter type \texttt{I} is subject to a conformance requirement, and qualified lookup finds the associated type declaration \texttt{Element}.

\index{dependent member type}

The resolved type is a \emph{dependent member type} composed from the generic parameter type \texttt{I} and associated type declaration \texttt{Element}. We will denote this dependent member type as \verb|I.[IteratorProtocol]Element| to make explicit the fact that a name lookup has resolved the identifier \texttt{Element} to an associated type. The interface type of \verb|firstElement(_:)| is therefore this generic function type:
\begin{quote}
\begin{verbatim}
<I where I: IteratorProtocol> (inout I) -> I.[IteratorProtocol]Element
\end{verbatim}
\end{quote}

\begin{MoreDetails}
\item Identifier type representations: Section \ref{identtyperepr}
\end{MoreDetails}

\index{type parameter}

\paragraph{Type parameters} A \emph{type parameter} in some fixed generic signature is either a generic parameter type, or a dependent member type whose base type is itself a type parameter conforming to the protocol that declares the associated type.
The generic signature of \verb|firstElement(_:)| has two valid type parameters:
\begin{quote}
\begin{verbatim}
I
I.[IteratorProtocol]Element
\end{verbatim}
\end{quote}
As with generic parameter types, dependent member types become primary archetypes in the body of a generic function; we can reveal a little more about the structure of primary archetypes now, and say that a primary archetype packages a type parameter together with a generic signature.

Inside the body of \verb|firstElement(_:)|, the result of the call expression \verb|iter.next()| is a value of the optional type \texttt{\archetype{I.Element}?}, which the force-unwrap operator unwraps to yield a value of the archetype type \archetype{I.Element}. To manipulate a value of the element type abstractly, the compiler must be able to recover its runtime type metadata. While metadata for generic parameters is passed in directly, for dependent member types the metadata is recovered from one or more witness tables provided by the caller. A witness table for a conformance to \texttt{IteratorProtocol} stores two entries, one for each of the protocol's requirements:
\begin{itemize}
\item A metadata access function to witness the \texttt{Element} associated type.
\item A function pointer to witness the \texttt{next()} protocol requirement.
\end{itemize}

\index{type witness}

\paragraph{Type witnesses} When a concrete type conforms to a protocol, the normal conformance stores a \emph{type witness} for each of the protocol's associated types; this information is populated by the type checker during conformance checking.

\begin{listing}\captionabove{Iterator producing the natural numbers}\label{natural numbers listing}
\begin{Verbatim}
struct NaturalNumbers: IteratorProtocol {
  typealias Element = Int

  var x = 0

  mutating func next() -> Int? {
    defer { x += 1 }
    return x
  }
}
\end{Verbatim}
\end{listing}

Listing~\ref{natural numbers listing} shows a type that conforms to \texttt{IteratorProtocol} by producing an infinite stream of incrementing integers. Here, the associated type \texttt{Element} is witnessed by a type alias declaration with an underlying type of \texttt{Int}. This matches the return type of \texttt{NaturalNumbers.next()}. Indeed, we can omit the type alias entirely in this case, and instead rely on \emph{associated type inference} to derive it from the interface type of the witness.
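Concretely, this hypothetical variant conforms in exactly the same way, except that the type witness for \texttt{Element} is now inferred rather than written down:
\begin{Verbatim}
// No typealias: associated type inference derives Element == Int
// from the return type of the next() witness.
struct NaturalNumbers2: IteratorProtocol {
  var x = 0

  mutating func next() -> Int? {
    defer { x += 1 }
    return x
  }
}
\end{Verbatim}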
Suppose we call \verb|firstElement(_:)| with a value of type \texttt{NaturalNumbers}:
\begin{Verbatim}
var iter = NaturalNumbers()
print(firstElement(&iter))
\end{Verbatim}
The substitution map for the call stores the replacement type \texttt{NaturalNumbers} and the conformance of \texttt{NaturalNumbers} to \texttt{IteratorProtocol}:
\begin{quote}
\SubMapC{\SubType{I}{NaturalNumbers}}{\SubConf{NaturalNumbers:\ IteratorProtocol}}
\end{quote}
To compute the substituted type of the call, we apply our substitution map to the interface type of \verb|firstElement(_:)|. Substitution transforms the parameter type \texttt{I} to the replacement type \texttt{NaturalNumbers}. To compute the substituted return type for \verb|I.[IteratorProtocol]Element|, we can look up the type witness in the conformance stored in the substitution map. This is entirely analogous to how the generated code for our function is able to recover the runtime type metadata for this dependent member type from a witness table at run time. The normal conformance of \verb|NaturalNumbers: IteratorProtocol| can be found in the substitution map, and it stores the type witness for \verb|Element|, which is \verb|Int|. The substituted return type is \verb|Int|, and the substituted function type for the call is therefore:
\begin{quote}
\begin{verbatim}
(inout NaturalNumbers) -> Int
\end{verbatim}
\end{quote}

\begin{MoreDetails}
\item Type witnesses: Section~\ref{type witnesses}
\item Dependent member type substitution: Section~\ref{abstract conformances}
\end{MoreDetails}

\index{associated conformance} \index{requirement signature}

\paragraph{Associated conformances} Protocols can also impose requirements on their associated types. The \texttt{Sequence} protocol in the standard library is one such example:
\begin{Verbatim}
protocol Sequence {
  associatedtype Element
  associatedtype Iterator: IteratorProtocol
    where Element == Iterator.Element

  func makeIterator() -> Iterator
}
\end{Verbatim}
There are two requirements here:
\begin{enumerate}
\item The conformance requirement \verb|Iterator: IteratorProtocol|, which is written as a constraint type in the inheritance clause of the \texttt{Iterator} associated type.
\item The same-type requirement \verb|Element == Iterator.Element|, written in a trailing \texttt{where} clause.
\end{enumerate}

Requirements on the generic parameters of a generic function or generic type are collected in the declaration's generic signature. A protocol analogously has a \emph{requirement signature} which collects the requirements imposed on its associated types. A protocol always declares a single generic parameter named \texttt{Self}, and our notation for a requirement signature looks like a generic signature over the protocol \texttt{Self} type:
\begin{quote}
\begin{verbatim}
<Self where Self.[Sequence]Element ==
              Self.[Sequence]Iterator.[IteratorProtocol]Element,
            Self.[Sequence]Iterator: IteratorProtocol>
\end{verbatim}
\end{quote}
The conformance requirement \verb|Self.[Sequence]Iterator: IteratorProtocol| is an \emph{associated conformance requirement}, and associated conformance requirements appear in protocol witness tables. Therefore a witness table for a conformance to \texttt{Sequence} has \emph{four} entries:
\begin{enumerate}
\item A metadata access function to witness the \texttt{Element} associated type.
\item A metadata access function to witness the \texttt{Iterator} associated type.
\item A witness table access function to witness the associated conformance requirement \verb|Iterator: IteratorProtocol|.
\item A function pointer to witness the \texttt{makeIterator()} protocol requirement.
\end{enumerate}

\index{abstract conformance}

\paragraph{Abstract conformances} Let's define a \verb|firstElementSeq(_:)| function which operates on a sequence.\footnote{We could give both functions the same name and take advantage of function overloading, but for clarity we're not going to do that.} We can call the \verb|makeIterator()| protocol requirement to create an iterator for our sequence, and then hand off this iterator to the \verb|firstElement(_:)| function we defined previously:
\begin{Verbatim}
func firstElementSeq<S: Sequence>(_ sequence: S) -> S.Element {
  var iter = sequence.makeIterator()
  return firstElement(&iter)
}
\end{Verbatim}
The substitution map for the call to \verb|firstElement(_:)| is interesting. The argument \texttt{iter} has the type \archetype{S.Iterator}, which becomes the replacement type for the generic parameter \texttt{I} of \verb|firstElement(_:)|. Recall that this substitution map also needs to store a conformance. Since the conforming type is an archetype and not a concrete type, global conformance lookup returns an \emph{abstract conformance}.
So our substitution map looks like this:
\[\SubMapC{\SubType{I}{\archetype{S.Iterator}}}{\SubConf{\archetype{S.Iterator}:\ IteratorProtocol}}\]
When generating code for the call, we need to emit runtime type metadata for \texttt{I} as well as a witness table for \verb|I: IteratorProtocol|. Both of these are recovered from the witness table for the conformance \verb|S: Sequence| that was passed by the caller of \verb|firstElementSeq(_:)|:
\begin{enumerate}
\item The replacement type for \texttt{I} is \archetype{S.Iterator}. Runtime type metadata for this type is recovered by calling the metadata access function for the \texttt{Iterator} associated type stored in the \verb|S: Sequence| witness table.
\item The conformance for \verb|I: IteratorProtocol| is an abstract conformance. We know the type \archetype{S.Iterator} conforms to \verb|IteratorProtocol| because the \texttt{Sequence} protocol says that it does. Therefore, the witness table for this conformance is recovered by calling the witness table access function for the \verb|Iterator: IteratorProtocol| associated conformance in our \verb|S: Sequence| witness table.
\end{enumerate}

Recall that the shape of the substitution map is determined by the generic signature of the callee. In our earlier examples, the replacement types and conformances were fully concrete, which allowed us to emit runtime type metadata and witness tables for a call by referencing global symbols. More generally, the replacement types and conformances are defined in terms of the type parameters of the caller's generic signature. This makes sense, because we start with the runtime type metadata and witness tables received by the caller, from which we recover the runtime metadata and witness tables required by the callee. Here, the caller is \verb|firstElementSeq(_:)| and the callee is \verb|firstElement(_:)|.
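To tie the pieces together, here is a complete caller, using a hypothetical \texttt{Naturals} type which conforms to our \texttt{Sequence} protocol by vending the \texttt{NaturalNumbers} iterator from Listing~\ref{natural numbers listing}:
\begin{Verbatim}
// Iterator is inferred to be NaturalNumbers, and Element is
// inferred to be Int via the protocol's same-type requirement.
struct Naturals: Sequence {
  func makeIterator() -> NaturalNumbers {
    return NaturalNumbers()
  }
}

print(firstElementSeq(Naturals()))  // prints 0
\end{Verbatim}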
\section{Language Comparison}

\index{C++} \index{Java} \index{Rust}

Swift generics occupy a unique point in the design space, which avoids some of the tradeoffs inherent in the design of other popular languages:
\begin{itemize}
\item C++ templates do not allow for separate compilation and type checking. When a template declaration is compiled, only minimal semantic checks are performed and no code is actually generated. The body of a template declaration must be visible at each expansion point, and full semantic checks are performed after template expansion. There is no formal notion of requirements on template parameters; at a given expansion point, template expansion either succeeds or fails depending on how the substituted template parameters are used in the body of the template.
\item Rust generics are separately type checked with the use of generic requirements. Unlike C++, specialization is not part of the semantic model of the language, but it is mandated by the implementation because Rust does not define a calling convention for unspecialized generic code. After type checking, the compiler completely specializes all usages of generic definitions for every set of provided generic arguments.
\item Java generics are separately type checked and compiled. Only reference types can be used as generic arguments; primitive value types must be boxed on the heap. The implementation strategy uses a uniform runtime layout for all generic types, and generic argument types are not reified at runtime. This avoids the complexity of generic type layout at the virtual machine level, but it comes at the cost of runtime type checks and heap allocation.
\end{itemize}

We can summarize this with a table.
\begin{quote}
\begin{tabular}{|l|>{\centering}p{1.3cm}|>{\centering}p{1.3cm}|>{\centering}p{1.3cm}|>{\centering\arraybackslash}p{1.3cm}|}
\hline
&C++&Rust&Java&Swift\\
\hline
Separate compilation&$\times$&$\times$&\checkmark&\checkmark\\
Specialization&\checkmark&\checkmark&$\times$&\checkmark\\
Generic requirements&$\times$&\checkmark&\checkmark&\checkmark\\
Unboxed values&\checkmark&\checkmark&$\times$&\checkmark\\
\hline
\end{tabular}
\end{quote}

\chapter{Compilation Model}\label{compilation model}

\index{Xcode} \index{Swift package manager} \index{Swift driver} \index{shared library} \index{framework}

Most developers interact with the Swift compiler through Xcode and the Swift package manager, but for simplicity let's just consider direct invocation of \texttt{swiftc} from the command line. You can invoke \texttt{swiftc}, passing a list of all source files in your module as command line arguments:
\begin{Verbatim}
$ swiftc m.swift v.swift c.swift
\end{Verbatim}
The \texttt{swiftc} command runs the \emph{Swift driver}. By default, the driver emits an executable. When building frameworks (or libraries, if you're not versed in Apple jargon), the driver is invoked with the \texttt{-emit-library} and \texttt{-emit-module} flags, which generate a shared library and binary module file instead. Binary modules are consumed by the compiler when importing the framework, and are discussed in Section~\ref{module system}.

\index{main function} \index{main source file} \index{top-level code declaration}

Executables must define a \emph{main function}, which is the entry point invoked when the executable is run. There are three mechanisms for doing so:
\begin{enumerate}
\item If the module consists of a single source file, or if there are multiple source files and one of them is named \texttt{main.swift}, then this file becomes the \emph{main source file} of the module. The main source file can contain statements at the top level, outside of a function body; consecutive top-level statements are collected into \emph{top-level code declarations}. The main function executes the statements of each top-level code declaration in order. Source files other than the main source file cannot contain top-level code declarations.
\item If a struct, enum or class declaration is annotated with the \texttt{@main} attribute, the declaration must contain a static method named \texttt{main()}; this method becomes the main entry point, as shown in the example below. This attribute was introduced in Swift 5.3~\cite{se0281}.
\item The \texttt{@NSApplicationMain} and \texttt{@UIApplicationMain} attributes are an older way to specify the main entry point on Apple platforms. When applied to a class adopting the \texttt{NSApplicationDelegate} or \texttt{UIApplicationDelegate} protocol, a main entry point is generated which calls the \texttt{NSApplicationMain()} or \texttt{UIApplicationMain()} system framework function.
\end{enumerate}
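A minimal \texttt{@main} entry point looks like this:
\begin{Verbatim}
// The static main() method of the @main type becomes the
// program's entry point.
@main
struct HelloApp {
  static func main() {
    print("Hello, Swift")
  }
}
\end{Verbatim}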
\index{frontend job} \index{Swift frontend} \index{batch mode} \index{whole module optimization} \index{single file mode}

The Swift driver schedules \emph{frontend jobs} to perform the actual compilation work. Each frontend job runs the \emph{Swift frontend} process, which is what compiler developers think of as ``the compiler.'' Multiple frontend jobs can run in parallel, leveraging multi-core concurrency. By default, the number of concurrent frontend jobs is determined by the number of CPU cores; this can be overridden with the \texttt{-j} driver flag. If there are more frontend jobs than can be run simultaneously, the driver queues them and kicks them off as other frontend jobs complete. Source files are divided among frontend jobs according to the \emph{compilation mode}:
\begin{enumerate}
\item In \emph{batch mode}, source files are partitioned into fixed-size batches, up to the maximum batch size. Each frontend job compiles the source files of a single batch. This is the default.
\item In \emph{single file mode}, there is one frontend job per source file, which is effectively the same as batch mode with a maximum batch size of one. Single file mode is only used for debugging and performance testing the compiler itself. The \texttt{-disable-batch-mode} command line flag instructs the driver to run in single file mode.
\item In \emph{whole module optimization mode}, there is no parallelism; a single frontend job is scheduled to build all source files. This trades build time for quality of generated code, because the compiler is able to perform more aggressive optimization across source file boundaries. The \texttt{-wmo} driver flag enables whole module optimization.
\end{enumerate}
The Swift frontend itself is single-threaded, therefore a source file is the minimum unit of parallelism.

\index{incremental build}

In batch mode and single file mode, the driver can also perform an \emph{incremental build} by re-using the results of previous compilations, providing an additional compile-time speedup. Incremental builds are described in Section~\ref{request evaluator}.

\index{primary file} \index{secondary file}

The driver invokes the frontend with a list of \emph{primary files} and \emph{secondary files}. The primary files are those that this specific frontend job is tasked with building, and the secondary files are the remaining source files in the module. Each source file is a primary file of exactly one frontend job, and each frontend job's primary files and secondary files together form the full list of source files in the module. The \verb|-###| driver flag performs a ``dry run,'' printing the commands that would be run without actually running them:
\begin{Verbatim}
$ swiftc m.swift v.swift c.swift -###
swift-frontend -frontend -c -primary-file m.swift v.swift c.swift ...
swift-frontend -frontend -c m.swift -primary-file v.swift c.swift ...
swift-frontend -frontend -c m.swift v.swift -primary-file c.swift ...
ld m.o v.o c.o -o main
\end{Verbatim}
In the above, we're performing a batch mode build, but the module only has three source files, so for maximum parallelism each batch consists of a single source file. Therefore, each frontend job has a single primary file, with the other two source files becoming the secondary files for the job. The final command is the linker invocation, which combines the output of each frontend job into our binary executable.
\begin{figure}\captionabove{The compilation pipeline}\label{compilerpipeline} \begin{center} \begin{tikzpicture}[node distance=1.5cm] \tikzstyle{stage} = [rectangle, draw=black, text centered] \tikzstyle{arrow} = [->,>=stealth] \node (Parse) [stage] {Parse}; \node (Sema) [stage, below of=Parse] {Sema}; \node (SILGen) [stage, below of=Sema] {SILGen}; \node (SILOptimizer) [stage, below of=SILGen] {SILOptimizer}; \node (IRGen) [stage, below of=SILOptimizer] {IRGen}; \node (LLVM) [stage, below of=IRGen] {LLVM}; \draw [arrow] (Parse) -- (Sema); \draw [arrow] (Sema) -- (SILGen); \draw [arrow] (SILGen) -- (SILOptimizer); \draw [arrow] (SILOptimizer) -- (IRGen); \draw [arrow] (IRGen) -- (LLVM); \end{tikzpicture} \end{center} \end{figure} \medskip \index{parser} \index{Sema} \index{SILGen} \index{SIL optimizer} \index{IRGen} \index{LLVM} \index{SIL mandatory pass} \index{SIL performance pass} \index{raw SIL} \index{canonical SIL} The frontend implements a classic multi-stage compiler pipeline, shown in Figure~\ref{compilerpipeline}: \begin{itemize} \item \textbf{Parse:} First, all source files are parsed into an abstract syntax tree. \item \textbf{Sema:} Semantic analysis type-checks and validates the abstract syntax tree. \item \textbf{SILGen:} The type-checked syntax tree is lowered to \emph{raw} SIL. \item \textbf{SILOptimizer:} The raw SIL is transformed into \emph{canonical} SIL by a series of \emph{mandatory passes}, which analyze the control flow graph and emit diagnostics; for example, \emph{definite initialization} ensures that all storage locations are initialized. When the \texttt{-O} command line flag is specified, the canonical SIL is further optimized by a series of \emph{performance passes} with the goal of improving run-time performance and reducing code size. \item \textbf{IRGen:} The optimized SIL is then transformed into LLVM IR. \item \textbf{LLVM:} Finally, the LLVM IR is handed off to LLVM, which performs various lower level optimizations before generating machine code. \end{itemize} Each pipeline phase can emit warnings and errors. The parser attempts to recover from errors; the presence of parse errors does not prevent Sema from running. On the other hand, if Sema emits errors, compilation stops; SILGen does not attempt to lower an invalid abstract syntax tree to SIL. \index{TBD} \index{textual interface} The pipeline will be slightly different depending on what the driver and frontend were asked to produce. When the frontend is instructed to emit a binary module file only, and not an object file, compilation stops after the SIL optimizer. When generating a textual interface file or TBD file, compilation stops after Sema. (Textual interfaces are discussed in Section~\ref{module system}. A TBD file is a list of symbols in a shared library, which can be consumed by the linker and is faster to generate than the shared library itself; we're not going to talk about them here.) \index{synthesized declaration} \index{s-expression} \index{Lisp} \index{assembly language} Various command-line flags print the output of each phase to the terminal (or some other file in conjunction with the \texttt{-o} flag), useful for debugging the compiler: \begin{itemize} \item \texttt{-dump-parse} prints the parsed syntax tree as an s-expression.\footnote{The term comes from Lisp. 
An s-expression represents a tree structure as nested parenthesized lists; e.g.\ \texttt{(a (b c) d)} is a node with three children \texttt{a}, \texttt{(b c)} and \texttt{d}, and \texttt{(b c)} has two children \texttt{b} and \texttt{c}.}
\item \texttt{-dump-ast} prints the type-checked syntax tree as an s-expression.
\item \texttt{-print-ast} prints the type-checked syntax tree in a form that approximates what was written in source code. This is useful for getting a sense of what declarations the compiler synthesized, for example for derived conformances to protocols like \texttt{Equatable}.
\item \texttt{-emit-silgen} prints the raw SIL output by SILGen.
\item \texttt{-emit-sil} prints the canonical SIL output by the SIL optimizer. To see the output of the performance pipeline, also pass \texttt{-O}.
\item \texttt{-emit-ir} prints the LLVM IR output by IRGen.
\item \texttt{-S} prints the assembly output by LLVM.
\end{itemize}

\index{frontend flag}

Some command-line flags, such as those listed above, are understood by both the driver and the frontend. Certain other flags, used for compiler development and debugging, are only known to the frontend. If the driver is invoked with the \texttt{-frontend} flag as the first command line flag, then instead of scheduling frontend jobs, the driver spawns a single frontend job, passing it the rest of the command line without further processing:
\begin{Verbatim}
$ swiftc -frontend -typecheck -primary-file a.swift b.swift
\end{Verbatim}
Another mechanism for passing flags to the frontend is the \texttt{-Xfrontend} flag. When this flag appears in a command-line invocation of the driver, the command line argument that comes immediately after is passed to the frontend:
\begin{Verbatim}
$ swiftc a.swift b.swift -Xfrontend -dump-requirement-machine
\end{Verbatim}
The SIL intermediate form is described in \cite{sil}.

\section{Name Lookup}\label{name lookup}

\index{qualified lookup} \index{unqualified lookup}

Name lookup is the process of resolving identifiers to declarations. The Swift compiler does not have a distinct ``name binding'' phase; instead, name lookup is queried from various points in the compilation process. Broadly speaking, there are two kinds of name lookup: \emph{unqualified lookup} and \emph{qualified lookup}. An unqualified lookup resolves a single identifier \texttt{foo}, while qualified lookup resolves an identifier \texttt{bar} relative to a base, such as \texttt{foo.bar}. There are also three important variations, which are described immediately after the two fundamental kinds.

\paragraph{Unqualified lookup} An unqualified lookup is always performed relative to the source location where the identifier actually appears. The source location may be inside of a primary file or secondary file. The first time an unqualified lookup is performed inside a source file, a \emph{scope tree} is constructed by walking the source file's abstract syntax tree. The root scope is the source file itself. Each scope has an associated source range, and zero or more child scopes; each child scope's source range must be a subrange of the source range of its parent, and the source ranges of sibling scopes are disjoint. Each scope introduces zero or more \emph{variable bindings}.
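For example, in this illustrative snippet, each braced region introduces a child scope with its own binding of \texttt{x}:
\begin{Verbatim}
let x = "outer"           // bound in the source file scope

func f() {
  let x = "inner"         // bound in the function body scope
  do {
    let x = "innermost"   // bound in the brace statement scope
    print(x)              // prints "innermost"
  }
  print(x)                // prints "inner"
}
\end{Verbatim}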
\index{top-level lookup} \index{scope tree} \index{source range} \index{source location} Unqualified lookup first finds the innermost scope containing the source location, and proceeds to walk the scope tree up to the root, searching each parent node for bindings named by the given identifier. If the lookup reaches the root node, a \emph{top-level lookup} is performed next. This will look for top-level declarations named by the given identifier, first in all source files of the current module, followed by all imported modules. \index{direct lookup} \paragraph{Qualified lookup} A qualified lookup looks inside a list of type declarations for members with a given name. Starting from an initial list of type declarations, qualified lookup also visits the superclass of each class declaration, as well as any conformed protocols. The more primitive operation performed at each step is called a \emph{direct lookup}, which searches inside a single type declaration and its extensions only, by consulting the type declaration's \emph{lookup table}. \index{module lookup} \paragraph{Module lookup} A qualified lookup where the base is a module declaration searches for a top-level declaration in the given module and any other modules that it re-exports via \texttt{@\_exported import}. \index{dynamic lookup} \index{AnyObject lookup} \index{Objective-C} \paragraph{Dynamic lookup} A qualified lookup where the base is the \texttt{AnyObject} type implements the legacy Objective-C behavior of a message send to \texttt{id}, which can invoke any method defined in any Objective-C class or protocol. In Swift, a dynamic lookup searches a global lookup table constructed from all \texttt{@objc} members of all classes and protocols. Any class can contain \texttt{@objc} members; the attribute can either be explicitly stated, or inferred if the method overrides an \texttt{@objc} method from the superclass. Protocol members are \texttt{@objc} only if the protocol itself is \texttt{@objc}. \index{partial order} \paragraph{Operator lookup} Operator symbols are declared at the top level of a module. Operator symbols have a fixity (prefix, infix, or postfix), and infix operators also have a \emph{precedence group}. Precedence groups are partially ordered with respect to each other. Standard operators like \texttt{+} and \texttt{*} and their precedence groups are thus defined in the standard library, rather than being built into the language itself. \index{sequence expression} \index{operator symbol} \index{precedence group} \index{operator lookup} An arithmetic expression like \texttt{2 + 3 * 6} is parsed as a \emph{sequence expression}, which is a flat list of nodes and operator symbols. The parser does not know the precedence, fixity or associativity of the \texttt{+} and \texttt{*} operators. Indeed, it does not know that they exist at all. The \emph{pre-check} phase of the expression type checker looks up operator symbols and transforms sequence expressions into the more familiar nested tree form. Operator symbols do not themselves have an implementation; they are just names. An operator symbol can be used as the name of a function implementing the operator on a specific type (for prefix and postfix operators) or a specific pair of types (for infix operators). Operator functions can be declared either at the top level, or as a member of a type. As far as name lookup is concerned, the interesting thing about operator functions is that they are visible globally, even when declared inside of a type.
Operator functions are found by consulting the operator lookup table, which contains top-level operator functions as well as member operator functions of all declared types. When the compiler type checks the expression \texttt{2 + 3 * 6}, it must pick two specific operator functions for \texttt{+} and \texttt{*} among all the possibilities in order to make this expression type check. In this case, the overloads for \texttt{Int} are chosen, because \texttt{Int} is the default literal type for the literals \texttt{2}, \texttt{3} and \texttt{6}.
\begin{listing}\captionabove{Operator lookup in action}\label{customops}
\begin{Verbatim}
prefix operator <&>
infix operator ++: MyPrecedence
infix operator **: MyPrecedence

precedencegroup MyPrecedence {
  associativity: right
  higherThan: AdditionPrecedence
}

// Member operator examples
struct Chicken {
  static prefix func <&>(x: Chicken) {}
  static func ++(lhs: Chicken, rhs: Chicken) -> Int { return 0 }
}

struct Sausage {
  static func ++(lhs: Sausage, rhs: Sausage) -> Bool { return true }
}

// Top-level operator example
func **(lhs: Sausage, rhs: Sausage) -> Sausage { return lhs }

// Global operator lookup finds Sausage.++
// `fn' has type (Sausage, Sausage) -> Bool
let fn = { ($0 ++ $1) as Bool }
\end{Verbatim}
\end{listing}
Listing~\ref{customops} shows the definition of some custom operators and precedence groups. Note that the overload of \texttt{++} inside struct \texttt{Chicken} returns \texttt{Int}, and the overload of \texttt{++} inside struct \texttt{Sausage} returns \texttt{Bool}. The closure value stored in \texttt{fn} applies \texttt{++} to two anonymous closure parameters, \verb|$0| and \verb|$1|. While they do not have declared types, by simply coercing the \emph{return type} to \texttt{Bool}, we are able to unambiguously pick the overload of \texttt{++} declared in \texttt{Sausage}. (Whether this is good style is an exercise for the reader.) Initially, infix operators defined their precedence as an integer value; Swift~3 introduced named precedence groups \cite{se0077}. The global lookup for operator functions dates back to when all operator functions were declared at the top level. Swift~3 also introduced the ability to declare operator functions as members of types, but the global lookup behavior was retained \cite{se0091}. \section{Delayed Parsing} \index{primary file} \index{secondary file} The above ``compilation pipeline'' model is a simplification of the actual state of affairs. Recall that in the case where the driver schedules multiple frontend jobs, the list of source files is partitioned into disjoint subsets, where each subset becomes the primary files of some frontend job. Ultimately, each frontend job only needs to generate machine code from the declarations in its primary files, so all stages from SILGen onward operate on the frontend job's primary files only. However, the situation with parsing and type checking is more subtle. At a minimum, each frontend job must parse and type check its primary files. Furthermore, the partition of source files into frontend jobs is artificial and not visible to the user, and a declaration in a primary file can certainly reference declarations in secondary files. Therefore, in the general case, the abstract syntax tree for all secondary files must be available to a frontend job as well.
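To make this concrete, suppose the driver partitions three source files among three frontend jobs, with one primary file each. The generated invocations look roughly like the following (a schematic sketch; the exact flags the driver passes depend on the build configuration):
\begin{Verbatim}
$ swiftc -frontend -c -primary-file a.swift b.swift c.swift -o a.o
$ swiftc -frontend -c a.swift -primary-file b.swift c.swift -o b.o
$ swiftc -frontend -c a.swift b.swift -primary-file c.swift -o c.o
\end{Verbatim}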
On the other hand, it would be inefficient if every frontend job were required to fully parse all secondary files, because the time spent in the parser would be proportional to the number of frontend jobs multiplied by the number of source files, negating the benefits of parallelism. The \emph{delayed parsing} optimization solves this dilemma. When parsing a secondary file for the first time, syntax tree nodes for the bodies of top-level types, extensions and functions are not actually built. Instead, the parser operates in a high-speed mode where comments are skipped and pairs of braces are matched, but very little other work is performed. This constructs a ``skeleton'' representation of each secondary file. If the body of a type or extension in a secondary file is needed later---for example, because the type checking of a declaration in a primary file needs to perform a name lookup into this type---the source range of the declaration is parsed again, this time building the full syntax tree. \index{operator lookup} Operator lookup is incompatible with delayed parsing, because operator functions defined inside types are globally visible, as explained in the previous section. To deal with this, the parser looks for the keyword ``\texttt{func}'' followed by an operator symbol when skipping a type or extension body in a secondary file. The presence of this token sequence effectively disables delayed parsing for this declaration, because the first time an operator lookup is performed in the expression pre-checking pass, the bodies of all types containing operator functions are parsed again. Most types and extensions do not define operator functions, so this occurs rarely in practice. \index{AnyObject lookup} \index{dynamic lookup} \index{Objective-C} The situation with \texttt{AnyObject} lookup is similar, since a method call on a value of type \texttt{AnyObject} must consult a global lookup table constructed from \texttt{@objc} members of classes, and the (implicitly \texttt{@objc}) members of \texttt{@objc} protocols. Unlike operator functions, classes and \texttt{@objc} protocols are quite common in Swift programs, so it would be unfortunate to penalize compile-time performance for the sake of \texttt{AnyObject} lookup, which is a rarely-used feature. Instead, the solution is to eagerly parse classes and \texttt{@objc} protocols the first time a frontend job encounters a dynamic \texttt{AnyObject} method call. There's actually one more complication here. Classes can be nested inside of other types, whose bodies are skipped if they appear in a secondary file. This is resolved with the same trick as operator lookup. When skipping the body of a type, the parser looks for occurrences of the ``\texttt{class}'' keyword. If the body contains this keyword, the type is parsed and its members are visited recursively when building the \texttt{AnyObject} global lookup table. Most Swift programs, even those making heavy use of Objective-C interoperability, do not contain a dynamic \texttt{AnyObject} method call in every source file, so delayed parsing remains effective. \begin{example}\label{anyobjectdelayedparseex} Listing~\ref{anyobjectdelayedparse} shows an example of this behavior. This program consists of three files. Suppose that the driver kicks off three frontend jobs, with a single primary file for each frontend job: \begin{itemize} \item The frontend job with the primary file \texttt{a.swift} will parse \texttt{b.swift} and \texttt{c.swift} as secondary files.
The body of \texttt{g()} in \texttt{b.swift} is skipped, and the body of \texttt{Outer} in \texttt{c.swift} is skipped. The parser makes a note that \texttt{Outer} contains the \texttt{class} keyword. The function \texttt{f()} in \texttt{a.swift} contains a dynamic \texttt{AnyObject} method call, so this frontend job will construct the global lookup table, triggering parsing of \texttt{Outer} and \texttt{Inner} in \texttt{c.swift}. \item The frontend job with the primary file \texttt{b.swift} will parse \texttt{a.swift} and \texttt{c.swift} as secondary files. This primary file does not reference anything from \texttt{c.swift} at all, so \texttt{Outer} remains unparsed in this frontend job. Type checking the call to \texttt{f()} from \texttt{g()} also does not require parsing the \emph{body} of \texttt{f()}. \item The frontend job with the primary file \texttt{c.swift} will parse \texttt{a.swift} and \texttt{b.swift} as secondary files, skipping parsing the bodies of \texttt{f()} and \texttt{g()}. \end{itemize} \end{example}
\begin{listing}\captionabove{Delayed parsing with \texttt{AnyObject} lookup}\label{anyobjectdelayedparse}
\begin{Verbatim}
// a.swift
func f(x: AnyObject) {
  x.foo()!
}
\end{Verbatim}
\begin{Verbatim}
// b.swift
func g(x: AnyObject) {
  f(x: x)
}
\end{Verbatim}
\begin{Verbatim}
// c.swift
struct Outer {
  class Inner {
    @objc func foo() {}
  }
}
\end{Verbatim}
\end{listing}
\begin{example} It is possible to construct a program where type checking of each primary file triggers complete parsing of all type and extension bodies in every secondary file, either because of pathological dependencies between source files, or extreme reliance on operator lookup and \texttt{AnyObject} lookup. Listing~\ref{defeatdelayparse} shows an example of the first kind. Again, if you assume the driver kicks off three frontend jobs with a single primary file for each frontend job, then each frontend job will eventually parse all type bodies in the other two secondary files. \end{example}
\begin{listing}\captionabove{Defeating delayed parsing}\label{defeatdelayparse}
\begin{Verbatim}
// x.swift
struct A {
  typealias T = B.T
  typealias U = C.T
}
\end{Verbatim}
\begin{Verbatim}
// y.swift
struct B {
  typealias T = C.T
  typealias U = A.T
}
\end{Verbatim}
\begin{Verbatim}
// z.swift
struct C {
  typealias T = Int
  typealias U = B.T
}
\end{Verbatim}
\end{listing}
\section{Request Evaluator}\label{request evaluator} \index{request} \index{request evaluator} \index{evaluation function} The \emph{request evaluator} is central to the architecture of the Swift compiler. Essentially, the request evaluator is a framework for performing queries against the abstract syntax tree. A \emph{request} packages a list of input parameters together with an \emph{evaluation function}. With the exception of emitting diagnostics, the evaluation function should be referentially transparent. Only the request evaluator should directly invoke the evaluation function; the request evaluator caches the result of the evaluation function for subsequent requests. As well as caching results, the request evaluator implements automatic cycle detection and dependency tracking for incremental builds. The request evaluator is used to implement a form of lazy type checking. We saw from the previous section that in any given frontend job, declarations in primary files can reference declarations in secondary files without restriction.
Swift programmers also know that declarations in a source file can appear in any order; there is no need to forward declare names, and certain kinds of circular references are permitted. \index{type-check source file request} \index{AST lowering request} \index{interface type request} \index{generic signature request} \index{qualified lookup request} \index{unqualified lookup request} For this reason, the classic compiler design of a single type-checking pass that walks declarations in source order is not well-suited for Swift. Indeed, while the Swift type checker does walk the declarations in each primary file in source order, instead of directly performing type checking work it kicks off a series of requests, which perform queries against declarations that may appear further down in the primary file, or in secondary files. The compiler defines over two hundred kinds of requests. Important request kinds include: \begin{itemize} \item The \textbf{type-check source file request} is the key entry point into the type checker, explained below. \item The \textbf{AST lowering request} is the entry point into SILGen, generating SIL from the abstract syntax tree for a source file. \item The \textbf{unqualified lookup request} and \textbf{qualified lookup request} perform the two kinds of name lookup described in the previous section. \item The \textbf{interface type request} is explained in Chapter~\ref{decls}. \item The \textbf{generic signature request} is explained in Chapter~\ref{building generic signatures}. \end{itemize}
\begin{listing}\captionabove{Forward reference example}\label{forwardref}
\begin{Verbatim}
let food = cook()

func cook() -> Food { return Food() }

struct Food {}
\end{Verbatim}
\end{listing}
The \textbf{type-check source file request}'s evaluation function visits each declaration in a primary source file. It is responsible for kicking off enough requests to ensure that SILGen can proceed, as long as all requests succeeded without emitting diagnostics. Consider what happens when type checking the program in Listing~\ref{forwardref}: \begin{enumerate} \item The \textbf{type-check source file request} begins by visiting the declaration of \texttt{food} and performing various semantic checks. \item One of these checks evaluates the \textbf{interface type request} with the declaration of \texttt{food}. This is a variable declaration, so the evaluation function type checks the initial value expression and returns the type of the result. \begin{enumerate} \item In order to type check the expression \texttt{cook()}, the \textbf{interface type request} is evaluated again, this time with the declaration of \texttt{cook} as its input parameter. \item The interface type of \texttt{cook()} has not been computed yet, so the request evaluator calls the evaluation function for this request. \end{enumerate} \item After computing the interface type of \texttt{food} and performing other semantic checks, the \textbf{type-check source file request} moves on to the declaration of \texttt{cook}: \begin{enumerate} \item The \textbf{interface type request} is evaluated once again, with the input parameter being the declaration of \texttt{cook}. \item The result was already cached, so the request evaluator immediately returns the cached result without computing it again. \end{enumerate} \end{enumerate} The \textbf{type-check source file request} is special: unlike most other requests, it does not return a value; it is evaluated purely for the side effect of emitting diagnostics.
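The caching behavior at the heart of this walkthrough can be pictured with a minimal sketch. The following is hypothetical Swift code; the real implementation is in C++ and is covered in the Source Code Reference section at the end of this chapter, and the types \texttt{Decl}, \texttt{InterfaceTypeRequest} and \texttt{Evaluator} below are toy stand-ins rather than the compiler's own:
\begin{Verbatim}
struct Decl: Hashable { let name: String }

struct InterfaceTypeRequest: Hashable { let decl: Decl }

final class Evaluator {
  private var cache: [InterfaceTypeRequest: String] = [:]

  // The evaluation function. Apart from emitting diagnostics, it is
  // meant to be referentially transparent, so its result can be cached.
  private func evaluate(_ request: InterfaceTypeRequest) -> String {
    return "() -> Food"  // toy result standing in for an interface type
  }

  func evaluateOrCached(_ request: InterfaceTypeRequest) -> String {
    if let cached = cache[request] { return cached }  // cache hit
    let result = evaluate(request)                    // cache miss
    cache[request] = result
    return result
  }
}

let evaluator = Evaluator()
let decl = Decl(name: "cook")
_ = evaluator.evaluateOrCached(InterfaceTypeRequest(decl: decl)) // computed
_ = evaluator.evaluateOrCached(InterfaceTypeRequest(decl: decl)) // cached
\end{Verbatim}
The real request evaluator additionally maintains a stack of active requests, used for the cycle detection and dependency tracking described below.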
The implementation of the \textbf{type-check source file request} guarantees that if no diagnostics were emitted, then SILGen can generate valid SIL for all declarations in a primary file. However, SILGen can still evaluate other requests which result in diagnostics being emitted in secondary files.
\begin{listing}\captionabove{Diagnostic emitted during SILGen}\label{silgendiag}
\begin{Verbatim}
// a.swift
struct Box {
  let contents: DoesNotExist
}
\end{Verbatim}
\begin{Verbatim}
// b.swift
func open(_: Box) {}
\end{Verbatim}
\end{listing}
\begin{example} Listing~\ref{silgendiag} shows a program with two files. The first file declares a struct with a stored property naming a non-existent type. The second file declares a function whose parameter type is the struct declared in the first file. A frontend job with the primary file \texttt{b.swift} and the secondary file \texttt{a.swift} does not emit any diagnostics in the type checking pass, because the stored property \texttt{contents} of \texttt{Box} is not actually referenced. However, when SILGen runs, it needs to determine whether the \texttt{Box} parameter of the \texttt{open()} function should be passed directly in registers or indirectly via an address; it does this by computing the \emph{type lowering} of the \texttt{Box} type. Type lowering recursively visits the stored properties of \texttt{Box} and computes their type lowering; this evaluates the \textbf{interface type request} for the \texttt{contents} property of \texttt{Box}, which emits a diagnostic because the identifier ``\texttt{DoesNotExist}'' does not resolve to a valid type. \end{example} The request evaluator framework was first introduced in Swift~4.2 \cite{reqeval}. In subsequent releases, various ad-hoc mechanisms were gradually converted into request evaluator requests, with resulting gains to compiler performance, stability, and implementation maintainability. \index{active request} \index{request cycle} \index{circular inheritance} \index{frontend flag} \paragraph{Cycles} In a language that supports forward references, it is possible to write a program that is syntactically well-formed and where every identifier resolves to a valid declaration, but that is nonetheless invalid because of circularity. The classic example of this is a pair of classes where each class inherits from the other:
\begin{Verbatim}
class A: B {}
class B: A {}
\end{Verbatim}
Implementing bespoke logic to detect circularity is error-prone and tedious, and a missing circularity check can result in a crash or infinite loop when the compiler encounters an invalid input program. Instead, the request evaluator solves this problem in a more elegant way by maintaining a stack of \emph{active requests}. When a request is evaluated, the request evaluator first checks if the active request stack already contains a request of the same kind with equal input parameters. If it does, calling the evaluation function would result in infinite recursion, so the request evaluator instead diagnoses an error and returns a request-specific sentinel value.
The circularity diagnostic can be customized for each request kind; the default just reports a ``circular reference.'' If the compiler is invoked with the \texttt{-debug-cycles} frontend flag, the active request stack is also printed:
\begin{Verbatim}
$ swiftc cycle.swift -Xfrontend -debug-cycles
===CYCLE DETECTED===
`--TypeCheckSourceFileRequest(source_file "cycle.swift")
  `--SuperclassDeclRequest(cycle.(file).A@cycle.swift:1:7)
    `--SuperclassDeclRequest(cycle.(file).B@cycle.swift:2:7)
      `--SuperclassDeclRequest(cycle.(file).A@cycle.swift:1:7)
cycle.swift:1:7: error: `A' inherits from itself
class A: B {}
      ^
cycle.swift:2:7: note: class `B' declared here
class B: A {}
      ^
\end{Verbatim}
\paragraph{Debugging} In addition to \texttt{-debug-cycles}, a couple of command-line flags help with debugging compile-time performance issues. The \texttt{-stats-output-dir} flag is followed by the name of a directory, which must already exist. Each frontend job writes a new JSON file to this directory, with various counters and timers. For each kind of request, there is a counter for the number of unique requests of this kind that were evaluated, not counting requests whose results were cached. The timer records the time spent in the request's evaluation function. The output can be sliced and diced in various ways; you can actually make pretty effective use of \texttt{awk}, despite the JSON format:
\begin{Verbatim}
$ mkdir /tmp/stats
$ swiftc ... -stats-output-dir /tmp/stats
$ awk '/InterfaceTypeRequest.wall/ { x += $2 } END { print x }' \
    /tmp/stats/*.json
\end{Verbatim}
The second command-line flag is \texttt{-trace-stats-events}. It must be passed in conjunction with \texttt{-stats-output-dir}, and enables output of a trace file to the statistics directory. The trace file records a time-stamped event for the start and end of each request evaluation function, in CSV format. \section{Incremental Builds}\label{incremental builds} \index{incremental build} The request evaluator also records dependencies for incremental compilation. The goal of incremental compilation is to prove which files do not need to be rebuilt, in the least conservative way possible. The quality of an incremental compilation implementation can be judged as follows:\footnote{Credit for this idea goes to David Ungar.} \begin{enumerate} \item Perform a clean build of all source files in the program, and collect the object files. \item Make a change to one or more source files in the input program. \item Do an incremental build, which rebuilds some subset of source files in the input program. If a source file was rebuilt but the resulting object file is identical to the one saved in Step~1, the incremental build performed \emph{wasted work}. \item Finally, do another clean build, which yet again rebuilds all source files in the input program. If a source file was \emph{not} rebuilt in Step~3, but the object file produced by this final clean build differs from the one saved in Step~1, the incremental build was \emph{incorrect}. \end{enumerate} This highlights the difficulty of the incremental compilation problem. Rebuilding too many files is an annoyance; rebuilding \emph{too few} files is an error. A correct but ineffective implementation would rebuild all source files every time. The opposite approach of only rebuilding the subset of source files that have changed since the last compiler invocation is also too aggressive. To see why it is incorrect, consider the program shown in Listing~\ref{incrlisting1}.
Let's say the programmer builds the program, adds the overload \verb|f: (Int) -> ()|, then builds it again. The new overload is more specific, so the call \texttt{f(123)} in \texttt{b.swift} now refers to the new overload; therefore, \texttt{b.swift} must also be rebuilt.
\begin{listing}\captionabove{Rebuilding a file after adding a new overload}\label{incrlisting1}
\begin{Verbatim}
// a.swift
func f<T>(_: T) {}

// new overload added in second version of file
func f(_: Int) {}
\end{Verbatim}
\begin{Verbatim}
// b.swift
func g() {
  f(123)
}
\end{Verbatim}
\end{listing}
\index{dependency file} The approach used by the Swift compiler is to construct a \emph{dependency graph}. The frontend outputs a \emph{dependency file} for each source file, recording all names the source file \emph{provides}, and all names the type checker \emph{requires} while compiling the source file. When performing an incremental build, the driver begins by rebuilding all source files which have changed since the last compilation, because at a minimum, these files need to be rebuilt. Then, the driver reads the dependency files, collecting all names provided by the changed source files, and rebuilds all source files which require those names. Dependency files use a binary serialization format and have the ``\texttt{.swiftdeps}'' file name extension. The list of provided names in the dependency file is generated by walking the abstract syntax tree, collecting all visible declarations in each source file. The list of required names is generated by the request evaluator, using the stack of active requests. Every cached request has a list of required names, and a request can optionally be either a dependency sink or a dependency source. \index{dependency sink} A \emph{dependency sink} is a name lookup request which records a required name. When a dependency sink request is evaluated, the request evaluator walks the stack of active requests, adding the identifier to each active request's list of required names. When a request with a cached value is evaluated again, the request's existing list of required names is ``replayed,'' adding them to each active request that depends on the cached value. \index{dependency source} \index{type-check source file request} \index{AST lowering request} A \emph{dependency source} is a request which appears at the top of the request stack, such as the \textbf{type-check source file request} or the \textbf{AST lowering request}. After a dependency source request has been evaluated, its list of required names is added to the corresponding source file's list of required names.
\begin{listing}\captionabove{Recording incremental dependencies}\label{dependencyexample}
\begin{Verbatim}
// a.swift
func breakfast() {
  soup(nil)
}
\end{Verbatim}
\begin{Verbatim}
// b.swift
func lunch() {
  soup(nil)
}
\end{Verbatim}
\begin{Verbatim}
// c.swift
func soup(_: Pumpkin?) {}

struct Pumpkin {}
\end{Verbatim}
\end{listing}
\begin{example} The replay mechanism just described is a subtle but important trick in the handling of requests whose results have already been cached. Listing~\ref{dependencyexample} shows a program with three source files. Suppose now that the driver decides to compile \emph{both} \texttt{a.swift} and \texttt{b.swift} in the same frontend job. This frontend job proceeds as follows: \begin{enumerate} \item First, the \textbf{type-check source file request} runs with the source file \texttt{a.swift}.
\begin{enumerate} \item While type checking the body of \texttt{breakfast()}, the type checker evaluates the \textbf{unqualified lookup request} with the identifier ``\texttt{soup}.'' \item This records the identifier ``\texttt{soup}'' in the requires list of each active request. There is one active request, the \textbf{type-check source file request} for \texttt{a.swift}. \item The lookup finds the declaration of \texttt{soup()} in \texttt{c.swift}. \item The type checker evaluates the \textbf{interface type request} with the declaration of \texttt{soup()}. \begin{enumerate} \item The \textbf{interface type request} evaluates the \textbf{unqualified lookup request} with the identifier ``\texttt{Pumpkin}.'' \item This records the identifier ``\texttt{Pumpkin}'' in the requires list of each active request, of which there are now two: the \textbf{interface type request} for \texttt{soup()}, and the \textbf{type-check source file request} for \texttt{a.swift}. \end{enumerate} \item The \textbf{type-check source file request} for \texttt{a.swift} has now finished. The requires list for this request contains two identifiers, ``\texttt{soup}'' and ``\texttt{Pumpkin}''; both are added to the requires list of the source file \texttt{a.swift}. \end{enumerate} \item Next, the \textbf{type-check source file request} runs with the source file \texttt{b.swift}. \begin{enumerate} \item While type checking the body of \texttt{lunch()}, the type checker evaluates the \textbf{unqualified lookup request} with the identifier ``\texttt{soup}.'' \item This records the identifier ``\texttt{soup}'' in the requires list of each active request. There is one active request, the \textbf{type-check source file request} for \texttt{b.swift}. \item The lookup finds the declaration of \texttt{soup()} in \texttt{c.swift}. \item The type checker evaluates the \textbf{interface type request} with the declaration of \texttt{soup()}. \item This request has already been evaluated, and the cached result is returned. The requires list for this request is the single identifier ``\texttt{Pumpkin}.'' This requires list is replayed, as if the request were being evaluated for the first time. This adds the identifier ``\texttt{Pumpkin}'' to the requires list of each active request, of which there is just one: the \textbf{type-check source file request} for \texttt{b.swift}. \item The \textbf{type-check source file request} for \texttt{b.swift} has now finished. The requires list for this request contains two identifiers, ``\texttt{soup}'' and ``\texttt{Pumpkin}''; both are added to the requires list of the source file \texttt{b.swift}. \end{enumerate} \end{enumerate} Once this frontend job completes, dependency files for \texttt{a.swift} and \texttt{b.swift} are written out. Both source files require the names ``\texttt{soup}'' and ``\texttt{Pumpkin}.'' The dependency of \texttt{b.swift} on ``\texttt{Pumpkin}'' is correctly recorded because evaluating a request with a cached value replays the request's requires list in Step~(2.e) above. \end{example} There's a bit more to the story than this, but we're already far afield from the goal of describing Swift generics; you can find more details in \cite{reqeval} and \cite{incremental}. \section{Module System}\label{module system} \index{source file} \index{file unit} \index{Clang file unit} \index{serialized AST file unit} The list of source files in a compiler invocation together form the \emph{main module}.
The main module is special, because its abstract syntax tree is constructed directly by parsing source code. There are three other kinds of modules: serialized modules, imported modules, and the built-in module. \index{main module} \index{module declaration} \index{file unit} A module is represented by a \emph{module declaration} containing one or more \emph{file units}. In the main module, the file units are \emph{source files}, each of which stores the parsed syntax tree for one source file. A serialized module contains one or more \emph{serialized AST file units}, and an imported module consists of one or more \emph{Clang file units}. \index{import declaration} The \texttt{import} keyword parses as an \emph{import declaration}. After parsing, one of the first stages of type checking loads all modules imported by the main module. The standard library is defined in the \texttt{Swift} module, which is imported automatically unless the frontend was invoked with the \texttt{-parse-stdlib} flag; this flag is used when building the standard library itself. As for the special \texttt{Builtin} module, it contains types and intrinsics implemented by the compiler itself, to be used when implementing the standard library. The \texttt{-parse-stdlib} flag also causes the built-in module to be implicitly imported (Section~\ref{builtin type}). \index{serialized module} \index{binary module} \paragraph{Serialized Modules} A serialized module is output when the Swift compiler is invoked with the \texttt{-emit-module} flag. Serialized module files use the ``\texttt{.swiftmodule}'' file name extension. Serialized modules are stored in a binary format, closely tied to the specific version of the Swift compiler (when building a shared library for distribution, it is better to publish a textual interface instead, as described at the end of this section). Name lookup into a serialized module lazily constructs declarations by deserializing records from this binary format as needed. Deserialized declarations generally look like parsed declarations that have already been type checked, but they sometimes contain less information. For example, in Chapter~\ref{generic declarations}, you will see various syntactic representations of generic parameter lists, \texttt{where} clauses, and so on. Since this information is only used when type checking the declaration, it is not serialized. Instead, deserialized declarations only need to store a generic signature, described in Chapter~\ref{genericsig}. \index{expression} \index{statement} \index{inlinable function} \index{inlinable attribute} \index{serialized SIL} Another key difference between parsed declarations and deserialized declarations is that parsed function declarations have a body, consisting of statements and expressions. This body is never serialized, so deserialized function declarations never have a body. The one case where the body of a function is made available across module boundaries is when the function is annotated with the \texttt{@inlinable} attribute; this is implemented by serializing the SIL representation of the function instead. \index{imported module} \index{ClangImporter} \paragraph{Imported Modules} An imported module is implemented in C, Objective-C or C++. The Swift compiler embeds a copy of Clang and uses it to parse module maps, header files, and binary precompiled headers. Name lookup into an imported module lazily constructs Swift declarations from their corresponding Clang declarations.
The Swift compiler component responsible for this is known as the ``ClangImporter.'' Imported function declarations generally do not have bodies if the entry point was previously emitted by Clang and is available externally. Occasionally the ClangImporter synthesizes accessor methods and other such trivia, which do have bodies represented as Swift statements and expressions. C functions not available externally, such as \texttt{static inline} functions declared in header files, are emitted by having Swift IRGen call into Clang. \index{bridging header} Invoking the compiler with the \texttt{-import-objc-header} flag followed by a header file name specifies a \emph{bridging header}. This is a shortcut for making the C declarations in the bridging header visible to all source files in the main module, without having to define a separate Clang module first. This is implemented by adding a Clang file unit corresponding to the bridging header to the main module. For this reason, you should not assume that all file units in the main module are necessarily source files. \index{resilience} \index{library evolution} \index{textual interface} \paragraph{Textual Interfaces} The Swift binary module format depends on compiler internals and no attempt is made to preserve compatibility across compiler releases. When building a shared library for distribution, you can instead generate a \emph{textual interface}:
\begin{Verbatim}
$ swiftc Horse.swift -enable-library-evolution -emit-module-interface
\end{Verbatim}
The \texttt{-enable-library-evolution} flag enables \emph{resilience}, which instructs client code to use more abstract access patterns that are guaranteed to depend only on the published public declarations of a module. For example, this allows adding new fields to a public struct, since client code is required to pass the struct indirectly. Library evolution is a prerequisite for emitting a textual interface; unlike the serialized module format, textual interfaces only describe the public declarations of a module. \index{inlinable function} \index{inlinable attribute} \index{synthesized declaration} \index{ASTPrinter} Textual interface files use the ``\texttt{.swiftinterface}'' file name extension. They are generated by the AST printer, which prints declarations in a format that looks very much like Swift source code, with a few exceptions: \begin{enumerate} \item Non-\texttt{@inlinable} function bodies are skipped. Bodies of \texttt{@inlinable} functions are printed verbatim, including comments, except that \verb|#if| conditions are evaluated. \item Various synthesized declarations, such as type alias declarations from associated type inference, witnesses for derived conformances such as \texttt{Equatable}, and so on, are written out explicitly. \item Opaque return types also require special handling (Section~\ref{reference opaque archetype}). \end{enumerate} Note that (1) above means the textual interface format is target-specific; a separate textual interface needs to be generated for each target platform, alongside the shared library itself. When a module defined by a textual interface is imported for the first time, a frontend job parses and type checks the textual interface, and generates a serialized module file which is then consumed by the original frontend job. Serialized module files generated in this manner are cached, and can be reused between invocations of the same compiler version. The \texttt{@inlinable} attribute was introduced in Swift 4.2~\cite{se0193}.
The Swift ABI was formally stabilized in Swift 5.0, when the standard library became part of the operating system on Apple platforms. Library evolution support and textual interfaces became user-visible features in Swift 5.1~\cite{se0260}. \section{Source Code Reference}\label{compilation model source reference} The Swift driver is now implemented in Swift, and lives in a separate repository from the rest of the compiler: \begin{quote} \url{https://github.com/apple/swift-driver} \end{quote} The Swift frontend, standard library and runtime are found in the main repository: \begin{quote} \url{https://github.com/apple/swift} \end{quote} The major components of the Swift frontend live in their own subdirectories of the main repository. The entities modeling the abstract syntax tree are defined in \SourceFile{lib/AST/} and \SourceFile{include/swift/AST/}; among these, types and declarations are important for the purposes of this book, and will be covered in Chapter~\ref{types} and Chapter~\ref{decls}. The core of the SIL intermediate language is implemented in \SourceFile{lib/SIL/} and \SourceFile{include/swift/SIL/}. Each stage of the compilation pipeline has its own subdirectory: \begin{itemize} \item \SourceFile{lib/Parse/} \item \SourceFile{lib/Sema/} \item \SourceFile{lib/SILGen/} \item \SourceFile{lib/SILOptimizer/} \item \SourceFile{lib/IRGen/} \end{itemize} \subsection*{The AST Context} Key source files: \begin{itemize} \item \SourceFile{include/swift/AST/ASTContext.h} \item \SourceFile{lib/AST/ASTContext.cpp} \end{itemize} \apiref{ASTContext}{class} The global singleton for a single frontend instance. An AST context provides a memory allocation arena, unique allocation for various immutable data types used throughout the compiler, and storage for various other global singletons. \subsection*{Request Evaluator} Key source files: \begin{itemize} \item \SourceFile{include/swift/AST/Evaluator.h} \item \SourceFile{lib/AST/Evaluator.cpp} \end{itemize} \apiref{SimpleRequest}{template class} Each request kind is a subclass of \texttt{SimpleRequest}. The evaluation function is implemented by overriding the \texttt{evaluate()} method of \texttt{SimpleRequest}. \index{dependency source} \index{dependency sink} \apiref{RequestFlags}{enum class} One of the template parameters to \texttt{SimpleRequest} is a set of flags: \begin{itemize} \item \texttt{RequestFlags::Uncached}: indicates that the result of the evaluation function should not be cached. \item \texttt{RequestFlags::Cached}: indicates that the result of the evaluation function should be cached by the request evaluator, which uses a per-request kind \texttt{DenseMap} for this purpose. \item \texttt{RequestFlags::SeparatelyCached}: the result of the evaluation function should be cached by the request implementation itself, as described below. \item \texttt{RequestFlags::DependencySource}, \texttt{DependencySink}: if one of these is set, the request kind becomes a dependency source or sink, as described in Section~\ref{incremental builds}. \end{itemize} Separate caching can be more performant if it allows the cached value to be stored directly inside of an AST node, instead of requiring the request evaluator to consult a side table. For example, many requests taking a declaration as input store the result directly inside of the \texttt{Decl} instance or some subclass thereof. Due to expressivity limitations in C++, a bit of boilerplate is involved in the definition of a new request kind. 
For example, consider the \texttt{InterfaceTypeRequest}, which takes a \texttt{ValueDecl} as input and returns a \texttt{Type} as output: \begin{itemize} \item \begingroup \raggedright The request type ID is declared in \SourceFile{include/swift/AST/TypeCheckerTypeIDZone.def}. \item The \texttt{InterfaceTypeRequest} class is declared in \SourceFile{include/swift/AST/TypeCheckRequests.h}. \item The \texttt{InterfaceTypeRequest::evaluate()} method is defined in \SourceFile{lib/Sema/TypeCheckDecl.cpp}. \item \endgroup The request is separately cached. The \texttt{InterfaceTypeRequest} class overrides the \texttt{isCached()}, \texttt{getCachedResult()} and \texttt{cacheResult()} methods to store the declaration's interface type inside the \texttt{ValueDecl} instance itself. These methods are implemented in \SourceFile{lib/AST/TypeCheckRequestFunctions.cpp}. \end{itemize} \index{request evaluator} \apiref{Evaluator}{class} Request evaluation is performed by calling the \texttt{evaluateOrDefault()} top-level function, passing it an instance of the request evaluator, the request to evaluate, and a sentinel value to return in case of circularity. The \texttt{Evaluator} class is a singleton, stored in the \texttt{evaluator} instance variable of the global \texttt{ASTContext} singleton. The request evaluator will either return a cached value, or invoke the evaluation function and cache the result. For example, the \texttt{getInterfaceType()} method of \texttt{ValueDecl} is implemented as follows:
\begin{Verbatim}
Type ValueDecl::getInterfaceType() const {
  auto &ctx = getASTContext();
  return evaluateOrDefault(
      ctx.evaluator,
      InterfaceTypeRequest{const_cast<ValueDecl *>(this)},
      ErrorType::get(ctx));
}
\end{Verbatim}
\subsection*{Name Lookup} Key source files: \begin{itemize} \item \SourceFile{include/swift/AST/NameLookup.h} \item \SourceFile{include/swift/AST/NameLookupRequests.h} \item \SourceFile{lib/AST/NameLookup.cpp} \item \SourceFile{lib/AST/UnqualifiedLookup.cpp} \end{itemize} The ``AST scope'' subsystem implements unqualified lookup for local bindings. Outside of the name lookup implementation itself, the rest of the compiler does not generally interact with it directly: \begin{itemize} \item \SourceFile{include/swift/AST/ASTScope.h} \item \SourceFile{lib/AST/ASTScope.cpp} \item \SourceFile{lib/AST/ASTScopeCreation.cpp} \item \SourceFile{lib/AST/ASTScopeLookup.cpp} \item \SourceFile{lib/AST/ASTScopePrinting.cpp} \item \SourceFile{lib/AST/ASTScopeSourceRange.cpp} \end{itemize} \apiref{UnqualifiedLookupRequest}{class} Unqualified lookups are performed by evaluating an instance of this request kind. The request takes an \texttt{UnqualifiedLookupDescriptor} as input. \index{top-level lookup} \index{unqualified lookup} \apiref{UnqualifiedLookupDescriptor}{class} Encapsulates the input parameters for an unqualified lookup: \begin{itemize} \item The name to look up. \item The declaration context where the lookup starts. \item The source location where the name was written in source. If not specified, this becomes a top-level lookup. \item Various flags, described below. \end{itemize} \apiref{UnqualifiedLookupFlags}{enum class} Flags passed as part of an \texttt{UnqualifiedLookupDescriptor}. \begin{itemize} \item \texttt{UnqualifiedLookupFlags::TypeLookup}: if set, lookup ignores declarations other than type declarations. This is used in type resolution. \item \texttt{UnqualifiedLookupFlags::AllowProtocolMembers}: if set, lookup finds members of protocols and protocol extensions.
Generally should always be set, except to avoid request cycles in cases where it is known the result of the lookup cannot appear in a protocol or protocol extension. \item \texttt{UnqualifiedLookupFlags::IgnoreAccessControl}: if set, lookup ignores access control. Generally should never be set, except when recovering from errors in diagnostics. \item \texttt{UnqualifiedLookupFlags::IncludeOuterResults}: if set, lookup does not stop after finding results in an inner scope, but proceeds all the way to a top-level lookup, including results from outer scopes. \end{itemize} \index{declaration context} \index{qualified lookup} \apiref{DeclContext}{class} Declaration contexts will be introduced in Chapter~\ref{decls}, and the \texttt{DeclContext} class in Section~\ref{declarationssourceref}. \begin{itemize} \item \texttt{lookupQualified()} has various overloads, which perform a qualified name lookup into one of various combinations of types or declarations. The ``\texttt{this}'' parameter (the \texttt{DeclContext~*} on which the method is called) determines the visibility of the declarations found by the lookup, via imports and access control; it is not the base type of the lookup. \end{itemize} \apiref{NLOptions}{enum} Similar to \texttt{UnqualifiedLookupFlags}, but for \texttt{DeclContext::lookupQualified()}. \begin{itemize} \item \verb|NL_OnlyTypes|: if set, lookup ignores declarations other than type declarations. This is used in type resolution. \item \verb|NL_ProtocolMembers|: if set, lookup finds members of protocols and protocol extensions. Generally should always be set, except to avoid request cycles in cases where it is known the result of the lookup cannot appear in a protocol or protocol extension. \item \verb|NL_IgnoreAccessControl|: if set, lookup ignores access control. Generally should never be set, except when recovering from errors in diagnostics. \end{itemize} \apiref{NominalTypeDecl}{class} Nominal type declarations will be introduced in Chapter~\ref{decls}, and the \texttt{NominalTypeDecl} class in Section~\ref{declarationssourceref}. \begin{itemize} \item \texttt{lookupDirect()} performs a direct lookup, which only searches the nominal type declaration itself, and its extensions. \end{itemize} \subsection*{Primary File Type Checking} \index{primary file} \index{type-check source file request} Key source files: \begin{itemize} \item \SourceFile{lib/Sema/TypeCheckDeclPrimary.cpp} \end{itemize} The \texttt{TypeCheckSourceFileRequest} calls the \texttt{typeCheckDecl()} global function, which uses the visitor pattern to switch on the declaration kind. For each declaration kind, it performs various semantic checks and kicks off requests which may emit diagnostics. \subsection*{Module System} \index{module declaration} \apiref{ModuleDecl}{class} A module. \begin{itemize} \item \texttt{getName()} returns the module's name. \item \texttt{getFiles()} returns an array of \texttt{FileUnit}. \item \texttt{isMainModule()} answers if this is the main module. \end{itemize} \apiref{FileUnit}{class} Abstract base class representing a file unit. \index{primary file} \index{secondary file} \index{top-level declaration} \index{scope tree} \index{main source file} \apiref{SourceFile}{class} Represents a parsed source file from disk. Inherits from \texttt{FileUnit}. \begin{itemize} \item \texttt{getTopLevelItems()} returns an array of all top-level items in this source file. \item \texttt{isPrimary()} returns \texttt{true} if this is a primary file, \texttt{false} if this is a secondary file.
\item \texttt{isScriptMode()} answers if this is the main file of a module. \item \texttt{getScope()} returns the root of the scope tree for unqualified lookup. \end{itemize} \index{imported module} \index{serialized module} \index{textual interface} Imported and serialized modules get a subdirectory each: \begin{itemize} \item \SourceFile{lib/ClangImporter/} \item \SourceFile{lib/Serialization/} \end{itemize} The AST printer for generating textual interfaces is implemented in a pair of files: \begin{itemize} \item \SourceFile{include/swift/AST/ASTPrinter.h} \item \SourceFile{lib/AST/ASTPrinter.cpp} \end{itemize} The interface between name lookup and the module system is mediated by a pair of abstract base classes defined in the following header file: \begin{itemize} \item \SourceFile{include/swift/AST/LazyResolver.h} \end{itemize} \apiref{LazyMemberLoader}{class} Abstract base class implemented by different kinds of modules to look up top-level declarations and members of types and extensions. For the main module, this consults lookup tables; for serialized modules, it deserializes records and builds declarations from them; for imported modules, it constructs Swift declarations from Clang declarations. \apiref{LazyConformanceLoader}{class} Abstract base class implemented by different kinds of modules to fill out conformances (Chapter~\ref{conformances}). \chapter{Types}\label{types} \index{type representation} \index{type} \index{type resolution} Swift makes a distinction between a \emph{type representation}, read by the parser, and a \emph{type}, which is a semantic object understood by the type checker. Type representations are resolved to types by performing \emph{type resolution}. Not all types are constructed by resolving type representations written in source; building types and taking them apart is an extremely common activity throughout the compiler. \paragraph{Notation} A few words about the notation used throughout this book. For convenience, we identify the printed form of a type, such as \texttt{Array<Int>}, with its type representation and the semantic type, depending on context. But what does it actually mean to say that the string ``\texttt{Array<Int>}'' parses into the type representation \texttt{Array<Int>}, which resolves into the type \texttt{Array<Int>}? First, we have the string ``\texttt{Array<Int>}'' written somewhere in a source file. The lexer splits the string up into a sequence of tokens: ``\texttt{Array}'', ``\texttt{<}'', ``\texttt{Int}'', and ``\texttt{>}''. The parser reads each token, building up a type representation. A type representation has a tree structure, so when we talk about the type representation \texttt{Array<Int>}, we really mean this: \begin{quote} ``An identifier type representation with a single component, storing the identifier \texttt{Array} together with a single generic argument. The generic argument is another identifier type representation, again with a single component, storing the identifier \texttt{Int}.'' \end{quote} Types also have a tree structure, so when we talk about the type \texttt{Array<Int>}, what we really mean is: \begin{quote} ``A generic nominal type for the struct declaration named \texttt{Array}, with a single generic argument. The single generic argument is a nominal type for the struct declaration named \texttt{Int}.'' \end{quote} The difference between the type representation \texttt{Array<Int>} and the type \texttt{Array<Int>} is that the type representation only stores identifiers, with no connection to the declarations of \texttt{Array} and \texttt{Int}.
The semantic type points at the declarations themselves. There is also a notion of validity here. The strings ``\texttt{Array<Int, String>}'' and ``\texttt{Pasta<TomatoSauce>}'' can both be parsed as type representations, but the former does not resolve to a valid type, because \texttt{Array} only has a single generic argument. The latter only resolves to a valid type if a type declaration named \texttt{Pasta} exists, and if \texttt{TomatoSauce} also resolves to a valid type. Type representations are rarely encountered outside of the type resolution process and the parser itself, so we will leave them aside until Chapter~\ref{typeresolution}. \index{structural components} \index{type substitution} \paragraph{Tree structure} Types are categorized into kinds, such as nominal types, function types, and so on. Each kind of type is composed of smaller structural components, including other types, pointers to declarations, and various attributes. Factory methods for each kind construct types from their structural components, and analogously, each kind has getter methods to take the type apart. Once created, types are immutable. This gives types a recursive tree structure. To say that a type \emph{contains} another type means that the latter appears as a child node of the former. This concept is most useful when talking about types containing generic parameters, because those types can be substituted to form concrete types. For example, if \texttt{T} is a generic parameter type, the type \texttt{Array<T>} can be substituted by replacing \texttt{T} with \texttt{Int}. This gives you the type \texttt{Array<Int>}, which no longer contains any generic parameters. Various utility operations exist to walk the recursive structure of a type, check if it contains a type with certain properties, or transform its contained types, forming a new type with the same tree structure. \index{canonical type} \index{sugared type} \paragraph{Canonical types} The Swift grammar defines some shorthand spellings for common types, such as \texttt{[T]} for \texttt{Array<T>} and \texttt{T?} for \texttt{Optional<T>}. Type alias declarations are another kind of shorthand; declaring a type alias introduces a new name for some existing type. The various alternate spellings for existing types are called \emph{sugared types}; Section~\ref{sugared types} gives a full account of the possible kinds. A type is \emph{canonical} if it does not contain any sugared types. Computing the canonical type of an arbitrary type returns the original type if it was already canonical; otherwise, it transforms the type by replacing all sugared types that it contains with their desugared form. \index{SILGen} \index{IRGen} The compiler tries to preserve type sugar when resolving type representations into types and when transforming types, ensuring that types mentioned in diagnostics look like the types written by the user. After type checking, compiler passes such as SILGen and IRGen mostly deal with canonical types. For the most part, type sugar has no semantic effect. For example, it would not make sense to define two overloads of the same function that only differ by sugared types. One notable exception is the rule for default initialization of variables: if the variable's type is declared as the sugared optional type \texttt{T?} for some \texttt{T}, the variable's initial value expression is assumed to be \texttt{nil} if none was provided. Spelling the type as \texttt{Optional<T>} avoids the default initialization behavior. Listing~\ref{optional initialization} shows an example of this rule.
\begin{listing}\captionabove{The sugared optional type has a semantic effect}\label{optional initialization}
\begin{Verbatim}
var x: Int?
print(x) // prints `nil'

var y: Optional<Int>
print(y) // error: use of uninitialized variable `y'
\end{Verbatim}
\end{listing}
\index{reduced type} \index{pointer equality} \index{canonical equality} \index{reduced equality} \paragraph{Type equality} In addition to being immutable, types are uniquely allocated; if the type \texttt{Array<Int>} is constructed twice in the same compilation instance, both values will point at the same type object in memory. Three levels of equality are defined on types, from strongest to weakest: \begin{enumerate} \item \textbf{Pointer equality} determines if two types are exactly equal as trees. The type \texttt{Array<Int>} is not pointer-equal to the sugared type \texttt{[Int]}. \item \textbf{Canonical equality} determines if two types have the same canonical type. The types \texttt{Array<Int>} and \texttt{[Int]} are canonical-equal, because the canonical type of \texttt{[Int]} is \texttt{Array<Int>}. \item \textbf{Reduced equality} determines if two types have the same reduced type with respect to a generic signature. Two different type parameters can reduce to the same type parameter when same-type requirements are in play. Reduced types are formally introduced in Section~\ref{genericsigqueries}. \end{enumerate} If both types are already canonical, the first two relations coincide; if both types are reduced, all three coincide. The remainder of this chapter describes, for each kind of type, the role it plays in the language and how it breaks down into structural components. \index{nominal type} \index{generic nominal type} \index{parent type} \paragraph{Nominal types} A \emph{nominal type} is the type declared by a non-generic struct, enum or class declaration, such as \texttt{Int}. A \emph{generic nominal type} is a type declared by a generic struct, enum or class declaration, \emph{specialized} with a list of generic arguments, such as \texttt{Array<Int>}. Both kinds of nominal types point at their declaration. They also store a parent type if the nominal type declaration is nested inside of another nominal type declaration. The parent type can be a sugared type, but its canonical type must always be the correct nominal type for the original type declaration's parent type declaration. Nested type declarations are described in Section~\ref{type declarations}. \section{Structural Types} \index{structural type} Structural types are those built into the language, rather than being defined in the standard library or user code. Structural types are not to be confused with the types produced by the \emph{structural resolution stage}, which is discussed in Chapter~\ref{typeresolution}. \index{tuple type} \paragraph{Tuple types} A tuple type is an ordered list of element types with optional labels. A value of a tuple type is a list of values with the corresponding element types. The list of element types can be empty, which gives the unique empty tuple type \texttt{()}. The standard library declares a type alias \texttt{Void} whose underlying type is \texttt{()}. \index{SILGen} From the user's point of view, tuple types are either empty or have at least two elements. An unlabeled one-element tuple type cannot be formed at all; \texttt{(T)} resolves to the same type as \texttt{T} in the language. Labeled one-element tuple types have a production in the grammar, but are explicitly rejected by type resolution.
They can still appear in the implementation when SILGen needs to materialize the associated value of an enum case as a single value (for instance, \texttt{case person(name:\ String)}), but such types cannot arise as the types of expressions, nor can they be written in source.

\index{function type}
\paragraph{Function types} The type of a function declaration or closure expression is a function type. In the expression grammar, \emph{call expressions} are formed from an expression with a function type, such as a declaration reference, closure expression or result of another call, together with an argument list.

\index{non-escaping function type}
A function type stores a parameter list, a return type, and attributes. The attributes include the function's effects, the \texttt{@escaping} attribute, and an optional calling convention:
\begin{itemize}
\item The two effect kinds are \texttt{throws} and \texttt{async}; the latter was introduced as part of the concurrency model in Swift~5.5 \cite{se0296}.
\item Non-escaping functions are second-class and can only be passed to other functions, captured by non-escaping closures, or immediately called; they cannot be stored inside other values.
\item A \texttt{@convention(thin)} function is passed as a single function pointer, without a closure context; it cannot capture values from outer scopes.
\item A \texttt{@convention(c)} function is similarly restricted, and also must have parameter and return types representable in C.
\item A \texttt{@convention(block)} function is an Objective-C block, which allows captures but must have parameter and return types representable in Objective-C.
\end{itemize}

\index{SILGen}
Each entry in the parameter list consists of a parameter type and, again, some non-type bits:
\begin{itemize}
\item the \textbf{value ownership kind}, which is one of default, \texttt{inout}, \texttt{\_\_owned} or \texttt{\_\_shared};
\item the \textbf{variadic} flag, in which case the parameter type must be an array type;
\item the \texttt{@autoclosure} attribute, in which case the parameter type must be another function type of the form \texttt{() -> T} for some type \texttt{T}.
\end{itemize}
When type checking a call to a function value with a variadic parameter, the type checker collects multiple expressions from the call argument list into an implicit array expression. Otherwise, variadic parameters behave exactly like arrays once you get to SILGen and below.

\index{autoclosure function type}
\index{tuple splat}
The \texttt{@autoclosure} attribute instructs the type checker to treat the corresponding argument in the caller as if it were a value of type \texttt{T}, rather than a function type \texttt{()~->~T}. The argument is then wrapped inside an implicit closure expression. In the body of the callee, an \texttt{@autoclosure} parameter behaves exactly like an ordinary function value, and can be called to evaluate the expression provided by the caller.

\medskip

Note that the following are two different function types:
\begin{quote}
\begin{verbatim}
(Int, Int) -> Bool
((Int, Int)) -> Bool
\end{verbatim}
\end{quote}
The first has a parameter list with two entries, both storing the parameter type \texttt{Int}. The second has a parameter list with a single entry, storing a tuple type of two elements, \texttt{(Int, Int)}. The type checker does define an implicit conversion between them though, in the special case of passing a call argument. This allows the code in Listing~\ref{tuple splat example} to type check.
\begin{listing}\captionabove{``Tuple splat'' function conversion example}\label{tuple splat example}
\begin{Verbatim}
func apply<T, U>(fn: (T) -> U, arg: T) -> U {
  return fn(arg)
}

print(apply(fn: (+), arg: (1, 2)))  // prints 3
\end{Verbatim}
\end{listing}

\index{argument label}
\index{closure expression}
Listing \ref{function type labels} demonstrates another subtle point. Argument labels are part of a function declaration's \emph{name}, not a function declaration's \emph{type}. A closure is always called without argument labels. This includes the case of a closure formed from an unapplied reference to a function declaration---even if the function declaration has argument labels.

\begin{listing}\captionabove{Argument labels are not part of a function declaration's type}\label{function type labels}
\begin{Verbatim}
func subtract(minuend x: Int, subtrahend y: Int) -> Int {
  return x - y
}

print(subtract(minuend: 3, subtrahend: 1))  // prints 2

let fn1 = subtract  // declaration name can omit argument labels
print(fn1(3, 1))    // prints 2

let fn2 = subtract(minuend:subtrahend:)  // full declaration name
print(fn2(3, 1))    // prints 2
\end{Verbatim}
\end{listing}

The history of Swift function types is an interesting case study in language evolution. Originally, a function type always had a \emph{single} input type, which could be a tuple type to model a function of multiple arguments. Tuple types used to be able to represent \texttt{inout} and variadic elements, and furthermore, the argument labels of a function declaration were part of the function declaration's type. The existence of such ``non-materializable'' tuple types introduced complications throughout the type system, and argument labels had inconsistent behavior in different contexts.

The syntax for referencing a declaration name with argument labels was adopted in Swift~2.2~\cite{se0021}. Subsequently, argument labels were dropped from function types in Swift~3~\cite{se0111}. The distinction between a function taking multiple arguments and a function taking a single tuple argument was first hinted at in Swift~3 with \cite{se0029} and \cite{se0066}, and became explicit in Swift~4 \cite{se0110}. At the same time, Swift~4 also introduced the ``tuple splat'' function conversion, which simulated the Swift~3 model in a limited way for the cases where the old behavior was convenient. For example, the element type of \texttt{Dictionary} is a key/value tuple, but you often want to call the \texttt{Collection.map()} method with a closure taking two arguments, and not a closure with a single tuple argument. Even after the above proposals were implemented, the compiler continued to model a function type as having a single input type for quite some time, despite this being completely hidden from the user. After Swift~5, the function type representation fully converged with the semantic model of the language.

\index{generic function type}
\paragraph{Generic function types} A generic function type is a function type adorned with a generic signature. Generic function types only appear as the interface type of a function declaration in a generic context. Swift's type system does not support rank-2 polymorphism, so an expression in the Swift language can never have a generic function type. When referenced from within an expression, the interface type of a generic function declaration always has substitutions applied, making the type of the expression into a non-generic function type.
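For example (a small sketch with our own declaration names), referencing a generic function declaration from an expression immediately applies substitutions:
\begin{Verbatim}
func identity<T>(_ value: T) -> T {
  return value
}

// The interface type of `identity' is the generic function type
// `<T> (T) -> T'. The expression `identity' below has the
// non-generic function type `(Int) -> Int', with T := Int.
let f: (Int) -> Int = identity
\end{Verbatim}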
Generic function types have a special behavior when their canonical type is computed. Since generic function types carry a generic signature, the parameter types and return type of a \emph{canonical} generic function type are actually \emph{reduced} types with respect to this generic signature (Section~\ref{reducedtypes}).

\index{metatype type}
\index{concrete metatype type}
\index{instance type}
\paragraph{Metatype types} Types are values in Swift, and a metatype is the type of a type used as a value. The metatype of a type \texttt{T} is written as \texttt{T.Type}. The type \texttt{T} is the \emph{instance type} of the metatype. For example, \texttt{(()~->~()).Type} is the metatype type with the instance type \verb|() -> ()|. This metatype has one value, the function type \verb|() -> ()|. Metatypes are sometimes referred to as \emph{concrete metatypes}, to distinguish them from \emph{existential metatypes}. Most concrete metatypes are singleton types, where the only value is the instance type itself. One exception is class metatypes for non-final classes; the values of a class metatype include the class type itself, but also all subclasses of the class.

\section{Abstract Types}
\index{generic parameter type}
\paragraph{Generic parameter types} A generic parameter type is the declared interface type of a generic parameter declaration. The sugared form references the declaration, and the canonical form only stores a depth and an index. This is all described in Chapter~\ref{generic declarations}. Care must be taken not to print canonical generic parameter types in diagnostics, to avoid surfacing the ``\ttgp{1}{2}'' notation to the user. Section~\ref{genericsigsourceref} shows a trick to transform a canonical generic parameter type back into its sugared form using a generic signature.

\index{bound dependent member type}
\index{unbound dependent member type}
\index{dependent member type}
\paragraph{Dependent member types} A dependent member type stores a base type together with an identifier or an associated type declaration. The former is called an \emph{unbound} dependent member type, and the latter is \emph{bound}. Unbound and bound dependent member types do not present as different concepts in the language. Instead, Chapter~\ref{typeresolution} describes how dependent member types written in source can be first resolved from a type representation into their unbound form, and then resolved again into a bound form once a generic signature is available. We will write \texttt{T.A} for the unbound dependent member type with base type \texttt{T} and identifier \texttt{A}, or \texttt{T.[P]A} for the bound dependent member type with base type \texttt{T} and associated type declaration \texttt{A} from protocol \texttt{P}. The latter is not valid Swift syntax, but the notation is useful to distinguish the two.

A dependent member type is \emph{proper} if the base type is a generic parameter or another proper dependent member type. Improper dependent member types appear internally in the expression type checker and associated type inference, but are never constructed by type resolution. For the most part you can ignore them.

\index{type parameter}
\index{interface type}
A \emph{type parameter} is a generic parameter type or a proper dependent member type. A type that might contain type parameters but is not necessarily a type parameter itself is called an \emph{interface type}. Type parameters are discussed in Section~\ref{typeparams} and Section~\ref{reducedtypes}.
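As an illustration (the function is our own), the generic function below mentions the dependent member type \texttt{T.Element}; in source it is written in its unbound form, and type resolution later binds it to \texttt{T.[Sequence]Element}, since \texttt{Element} is an associated type of \texttt{Sequence}:
\begin{Verbatim}
// `T' is a generic parameter type, and `T.Element' is a proper
// dependent member type whose base type is `T'.
func firstAndLast<T: Sequence>(_ items: T) -> (T.Element, T.Element)? {
  var iterator = items.makeIterator()
  guard let first = iterator.next() else { return nil }
  var last = first
  while let next = iterator.next() { last = next }
  return (first, last)
}
\end{Verbatim}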
\index{archetype type}
\index{reduced type}
\paragraph{Archetype types} Type parameters only have meaning when considered together with a generic signature; archetypes are an alternate ``self-describing'' representation that stores their local requirements. An archetype is always part of a \emph{generic environment}, a concept introduced in Chapter~\ref{genericenv}. Archetypes store a reduced type parameter together with their generic environment, and the generic environment stores its generic signature; this signature describes the requirements imposed on the archetype's type parameter. In the source language, archetypes and type parameters do not present as distinct concepts, but the compiler uses them internally in different contexts. In diagnostics, an archetype is printed as the type parameter it represents. To distinguish an archetype from its type parameter, we're going to use the notation \archetype{T}. Archetypes are also used to represent opaque return types (Section~\ref{opaquearchetype}) and opened existential types (Section~\ref{open existential archetypes}).

\index{contextual type}
A type that might contain archetypes but is not necessarily an archetype itself is called a \emph{contextual type}.

\index{protocol type}
\paragraph{Protocol types} A protocol type is the declared interface type of a protocol declaration. Protocol types are nominal types, but they never have generic arguments or parent types, and so there is exactly one protocol type for a given protocol declaration. A protocol type is also a special kind of type called a \emph{constraint type}, described in Section~\ref{constraints}. A protocol type represents a conformance requirement to its protocol. A protocol type is never the type of a value in Swift; the concept of a type-erased container is represented with an existential type.

\index{constraint type}
\index{protocol composition type}
\index{AnyObject}
\paragraph{Protocol composition types} A protocol composition type is a constraint type with a list of members. On the right hand side of a conformance requirement, protocol compositions \emph{expand} into a list of generic requirements, one for each member of the composition, as described in Section~\ref{requirement desugaring}. The members can include protocol types, a class type (at most one), and the \texttt{AnyObject} layout constraint:
\begin{quote}
\begin{verbatim}
P & Q
P & AnyObject
SomeClass & P
\end{verbatim}
\end{quote}

\index{parameterized protocol type}
\index{constrained protocol type}
\paragraph{Parameterized protocol types} A parameterized protocol type\footnote{The evolution proposal calls them ``constrained protocol types''; here we're going to use the terminology that appeared in the compiler implementation itself. Perhaps the latter should be renamed to match the evolution proposal at some point.} stores a protocol type together with a list of generic arguments. As a constraint type, it expands to a conformance requirement together with one or more same-type requirements. The same-type requirements constrain the protocol's \emph{primary associated types}, which are declared with a syntax similar to a generic parameter list. The written representation looks like a generic nominal type, except the named declaration is a protocol, for example, \texttt{Sequence<Int>}. Full details appear in Section~\ref{protocols}. Parameterized protocol types were introduced in Swift 5.7 \cite{se0346}.
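As a usage sketch (the function is our own), the constraint \texttt{Sequence<Int>} below expands to the conformance requirement \texttt{S:~Sequence} together with the same-type requirement \texttt{S.Element~==~Int}, because \texttt{Element} is the primary associated type of \texttt{Sequence}:
\begin{Verbatim}
// Requires Swift 5.7: `Sequence<Int>' is a parameterized
// protocol type used as a generic constraint.
func total<S: Sequence<Int>>(_ values: S) -> Int {
  var sum = 0
  for value in values { sum += value }
  return sum
}
\end{Verbatim}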
\index{existential type}
\paragraph{Existential types} An existential type wraps a constraint type and represents a value with some unknown dynamic type satisfying this constraint. The written syntax is \texttt{any~P}, where \texttt{P} is a constraint type. The \texttt{any} keyword was introduced in Swift 5.6 \cite{se0335}. Prior to Swift 5.6, existential types and constraint types were the same concept, both in the language and in the compiler implementation. The existential type wrapper was introduced at the same time as the \texttt{any} keyword. Existential types are described in Chapter~\ref{existentialtypes}.

\index{existential metatype type}
\paragraph{Existential metatype types} An existential metatype represents a metatype value whose instance type satisfies some constraint type. For example, if \texttt{P} is a protocol type, the values of the existential metatype \texttt{any P.Type} are the concrete metatypes of types conforming to \texttt{P}. An existential metatype is distinct from the \emph{concrete} metatype of an existential type, which is the type with one value, the existential type \texttt{any~P}. Before the introduction of the \texttt{any} keyword, existential metatypes were written as \texttt{P.Type}, and the concrete metatype of an existential as \texttt{P.Protocol}. This created an edge case, because for all non-protocol types \texttt{T}, \texttt{T.Type} is always the concrete metatype. The new spelling for an existential metatype is \texttt{any P.Type} or \texttt{any (P.Type)}, while the concrete metatype for the existential type itself is now written as \texttt{(any~P).Type}---note that the parentheses follow the tree structure of the types.

\begin{listing}\captionabove{Dynamic Self type example}\label{dynamic self example}
\begin{Verbatim}
class Base {
  required init() {}

  func dynamicSelf() -> Self {
    // the type of `self' in a method returning `Self' is
    // the dynamic Self type.
    return self
  }

  func clone() -> Self {
    return Self()
  }

  func invalid1() -> Self {
    return Base()
  }

  func invalid2(_: Self) {}
}

class Derived: Base {}

let y = Derived().dynamicSelf()  // y has type `Derived'
let z = Derived().clone()        // z has type `Derived'
\end{Verbatim}
\end{listing}

\index{dynamic Self type}
\paragraph{Dynamic Self types} The dynamic Self type appears when a class method declares a return type of \texttt{Self}. In this case, the object is known to have the same dynamic type as the base of the method call, which might be a subclass of the method's class. The dynamic Self type wraps a class type, which is the static upper bound for the type.

\index{SILGen}
This concept comes from Objective-C, where it is called \texttt{instancetype}. The dynamic Self type in many ways behaves like a generic parameter, but it is not represented as one; the type checker and SILGen implement support for it directly.

\begin{example}
Listing~\ref{dynamic self example} demonstrates some of the behaviors of the dynamic Self type. Two invalid cases are shown; \texttt{invalid1()} is rejected because the type checker cannot prove that the return type is always an instance of the dynamic type of \texttt{self}, and \texttt{invalid2()} is rejected because \texttt{Self} appears in contravariant position. Note that \texttt{Self} has a different interpretation inside a non-class type declaration. In a protocol declaration, \texttt{Self} is the implicit generic parameter (Section~\ref{protocols}). In a struct or enum declaration, \texttt{Self} is the declared interface type (Section~\ref{identtyperepr}).
\end{example}

\section{Sugared Types}\label{sugared types}
\index{sugared type}
Sugared generic parameter types were already described in the previous section. Of the remaining kinds of sugared types, type alias types are defined by the user, and the other three are built into the language.

\index{type alias type}
\paragraph{Type alias types} A type alias type represents a reference to a type alias declaration. It contains an optional parent type, a substitution map, and the substituted underlying type. The canonical type of a type alias type is the substituted underlying type. The type alias type's substitution map is formed in type resolution, from any generic arguments applied to the type alias declaration itself, together with the generic arguments of the base type (Section~\ref{identtyperepr}). Type resolution applies this substitution map to the underlying type of the type alias declaration to compute the substituted underlying type. The type alias type also preserves this substitution map for printing, and for requirement inference (Section~\ref{requirementinference}).

\index{optional sugared type}
\paragraph{Optional types} The optional type is written as \texttt{T?} for some object type \texttt{T}; its canonical type is \texttt{Optional<T>}.

\index{array sugared type}
\paragraph{Array types} The array type is written as \texttt{[E]} for some element type \texttt{E}; its canonical type is \texttt{Array<E>}.

\index{dictionary sugared type}
\paragraph{Dictionary types} The dictionary type is written as \texttt{[K: V]} for some key type \texttt{K} and value type \texttt{V}; its canonical type is \texttt{Dictionary<K, V>}.

\section{Built-in Types}\label{builtin type}
\index{built-in type}
What users think of as fundamental types, such as \texttt{Int} and \texttt{Bool}, are defined as structs in the standard library. These structs wrap the various \emph{built-in types} which are understood directly by the compiler.

\index{compiler intrinsic}
Built-in types are not nominal types, so they cannot contain members, cannot have new members added via extensions, and cannot conform to protocols. Values of built-in types are manipulated using special \emph{compiler intrinsics}. The standard library wraps built-in types in nominal types, and defines methods and operators on those nominal types which call the intrinsic functions, thereby presenting the actual interface expected by users.

For example, the \texttt{Int} struct defines a single stored property named \texttt{\_value} with type \texttt{Builtin.Int64}. The \texttt{+} operator on \texttt{Int} is implemented by extracting the \texttt{\_value} stored property from a pair of \texttt{Int} values, calling the \texttt{Builtin.sadd\_with\_overflow\_Int64} compiler intrinsic to add them together, and finally, wrapping the resulting \texttt{Builtin.Int64} in a new instance of \texttt{Int}.

\index{built-in module}
Built-in types and their intrinsics are defined in the \texttt{Builtin} module, a special module constructed by the compiler itself and not built from source code. The \texttt{Builtin} module is only visible when the compiler is invoked with the \texttt{-parse-stdlib} frontend flag; the standard library is built with this flag, but user code never interacts with the \texttt{Builtin} module directly.

\section{Miscellaneous Types}
A handful of special types do not describe the types of values, but are used by the type checker as part of the type checking process.
\index{reference storage type}
\index{weak reference type}
\index{unowned reference type}
\paragraph{Reference storage types} A reference storage type is the type of a variable declaration adorned with the \texttt{weak}, \texttt{unowned} or \texttt{unowned(unsafe)} attribute. The wrapped type must be a class type, a class-constrained archetype, or a class-constrained existential type. Reference storage types arise as the interface types of variable declarations, and as the types of SIL instructions. The types of expressions never contain reference storage types.

\index{placeholder type}
\paragraph{Placeholder types} A placeholder type represents a generic argument to be inferred by the type checker. The written representation is the underscore ``\texttt{\_}''. They can only appear in a handful of restricted contexts and do not remain after type checking. The expression type checker replaces placeholder types with type variables, solves the constraint system, and finally replaces the type variables with their fixed concrete types. For example, here the interface type of the \texttt{myPets} local variable is inferred as \texttt{Array<String>}:
\begin{Verbatim}
let myPets: Array<_> = ["Zelda", "Giblet"]
\end{Verbatim}
Placeholder types were introduced in Swift 5.6~\cite{se0315}.

\index{unbound generic type}
\paragraph{Unbound generic types} Unbound generic types predate placeholder types, and can be seen as a special case. An unbound generic type is written as a named reference to a generic type declaration, without generic arguments applied. An unbound generic type behaves like a generic nominal type where all generic arguments are placeholder types. In the example above, the generic nominal type \texttt{Array<\_>} contains a placeholder type. The unbound generic type \texttt{Array} could have been used instead:
\begin{Verbatim}
let myPets: Array = ["Zelda", "Giblet"]
\end{Verbatim}

\index{underlying type}
One other place where unbound generic types can appear is in the underlying type of a non-generic type alias, which is shorthand for declaring a generic type alias that forwards its generic arguments. For example, the following two declarations are equivalent:
\begin{Verbatim}
typealias MyDictionary = Dictionary
typealias MyDictionary<Key: Hashable, Value> = Dictionary<Key, Value>
\end{Verbatim}
Unbound generic types are also occasionally useful in diagnostics when you want to print the name of a type declaration only (like \texttt{Outer.Inner}) without the generic parameters of its declared interface type (\texttt{Outer<T>.Inner<U>}, for example).

\index{type variable type}
\index{constraint solver arena}
\paragraph{Type variable types} A type variable represents the future inferred type of an expression in the expression type checker's constraint system. The expression type checker builds the constraint system by walking an expression recursively, assigning new type variables to the types of sub-expressions and recording constraints between these type variables. Solving the constraint system can have three possible outcomes:
\begin{itemize}
\item \textbf{One solution}---every type variable has exactly one fixed type assignment; the expression is well-typed.
\item \textbf{No solutions}---some constraints could not be solved, indicating erroneous input.
\item \textbf{Multiple solutions}---the constraint system is underspecified and some type variables can have multiple valid fixed type assignments.
\end{itemize}
In the case of multiple solutions, the type checker uses heuristics to pick the ``best'' solution for the entire expression; if none of the solutions are clearly better than the others, an ambiguity error is diagnosed. Otherwise, we proceed as if the solver only found the best solution. The final step applies the solution to the expression by replacing type variables appearing in the types of sub-expressions with their fixed types.

The utmost care must be taken when working with type variables because unlike all other types, they are not allocated with indefinite lifetime. Type variables live in the constraint solver arena, which grows and shrinks as the solver explores branches of the solution space. Types that \emph{contain} type variables, and other structures that recursively contain such types, also need to be allocated in the constraint solver arena. Type variables ``escaping'' from the constraint solver can crash the compiler in odd ways. Assertions should be used to rule out type variables from appearing in the wrong places.

The printed representation of a type variable is \texttt{\$Tn}, where \texttt{n} is an incrementing integer local to the constraint system. One way you can see type variables in action is by passing the \texttt{-Xfrontend~-debug-constraints} compiler flag.

\index{l-value type}
\index{object type}
\paragraph{L-value types} An l-value type represents the type of an expression appearing on the left hand side of an assignment operator (hence the ``l'' in l-value), or as an argument to an \texttt{inout} parameter in a function call. L-value types wrap an \emph{object type}, which is the type of the stored value; they print out as \texttt{@lvalue~T} where \texttt{T} is the object type, but this is not valid syntax in the language.

\index{SILGen}
L-value types appear in type-checked assignment expressions and call arguments for \texttt{inout} parameters. If you're familiar with C++, you can think of an l-value type as somewhat analogous to a C++ mutable reference type ``\texttt{T \&}''---unlike C++ though, they are not directly visible in the source language.

\index{error type}
\paragraph{Error types} Error types are returned when type substitution encounters an invalid or missing conformance (Chapter~\ref{substmaps}). In this case, the error type wraps the original type, and prints as the original type to make types coming from malformed conformances more readable in diagnostics. The expression type checker also assigns error types to invalid declaration references. This uses the singleton form of the error type, which prints as \texttt{<<error type>>}. To avoid user confusion, diagnostics containing the singleton error type should not be emitted. Generally, any expression whose type contains an error type does not need to be diagnosed, because a diagnostic should have been emitted elsewhere.

\section{Source Code Reference}\label{typesourceref}
Key source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/Type.h}
\item \SourceFile{include/swift/AST/Types.h}
\item \SourceFile{lib/AST/Type.cpp}
\end{itemize}
Other source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/TypeNodes.def}
\item \SourceFile{include/swift/AST/TypeVisitor.h}
\item \SourceFile{include/swift/AST/CanTypeVisitor.h}
\end{itemize}

\apiref{Type}{class}
Represents an immutable, uniqued type. Meant to be passed as a value, it stores a single instance variable, a \texttt{TypeBase *} pointer. The \texttt{getPointer()} method returns this pointer.
The pointer is not \texttt{const}; however, neither \texttt{TypeBase} nor any of its subclasses define any mutating methods. The pointer may be \texttt{nullptr}; the default constructor \texttt{Type()} constructs a null \texttt{Type}. Most methods will crash if called on a null type; only the implicit \texttt{bool} conversion and \texttt{getPointer()} are safe. The \texttt{getPointer()} method is only used occasionally, because types are usually passed as \texttt{Type} and not \texttt{TypeBase *}, and \texttt{Type} overloads \texttt{operator->} to forward method calls to the \texttt{TypeBase *} pointer. While most operations on types are actually methods on \texttt{TypeBase}, a few methods are also defined on \texttt{Type} itself (these are called with ``\texttt{.}'' instead of ``\texttt{->}'').

\index{s-expression}
\begin{description}
\item[Various traversals] \texttt{walk()} is a general pre-order traversal where the callback returns a tri-state value---continue, stop, or skip a sub-tree. Built on top of this are two simpler variants; \texttt{findIf()} takes a boolean predicate, and \texttt{visit()} takes a void-returning callback which offers no way to terminate the traversal.
\item[Transformations] \texttt{transformWithPosition()}, \texttt{transformRec()}, \texttt{transform()}. As with the traversals, the first of the three is the most general, and the other two are built on top. In all three cases, the callback is invoked on all types contained within a type, recursively. It can either elect to replace a type with a new type, or leave a type unchanged and instead try to transform any of its child types.
\item[Substitution] \texttt{subst()} implements type substitution, which is a particularly common kind of transform which replaces generic parameters or archetypes with concrete types (Section~\ref{substmapsourcecoderef}).
\item[Printing] \texttt{print()} outputs the string form of a type, with many customization options; \texttt{dump()} prints the tree structure of a type in an s-expression form. The latter is extremely useful for invoking from inside a debugger, or ad-hoc print debug statements.
\end{description}

The \texttt{Type} class explicitly deletes the overloads of \texttt{operator==} and \texttt{operator!=} to make the choice between pointer and canonical equality explicit. To check pointer equality of possibly-sugared types, first unwrap both sides with a \texttt{getPointer()} call:
\begin{Verbatim}
if (lhsType.getPointer() == rhsType.getPointer()) ...;
\end{Verbatim}
The more common canonical type equality check is implemented by the \texttt{isEqual()} method on \texttt{TypeBase}:
\begin{Verbatim}
if (lhsType->isEqual(rhsType)) ...;
\end{Verbatim}

\apiref{TypeBase}{class}
The root of the type kind hierarchy. Its instances are always uniqued and allocated by the AST context, either in the permanent arena or the constraint solver arena. Instances are usually wrapped in \texttt{Type}. The various subclasses correspond to the different kinds of types:
\begin{itemize}
\item \texttt{NominalType} and its four subclasses:
\begin{itemize}
\item \texttt{StructType},
\item \texttt{EnumType},
\item \texttt{ClassType},
\item \texttt{ProtocolType}.
\end{itemize}
\item \texttt{BoundGenericNominalType} and its three subclasses:
\begin{itemize}
\item \texttt{BoundGenericStructType},
\item \texttt{BoundGenericEnumType},
\item \texttt{BoundGenericClassType}.
\end{itemize}
\item The structural types \texttt{TupleType}, \texttt{MetatypeType}.
\item \texttt{AnyFunctionType} and its two subclasses:
\begin{itemize}
\item \texttt{FunctionType},
\item \texttt{GenericFunctionType}.
\end{itemize}
\item \texttt{GenericTypeParamType}, \texttt{DependentMemberType}, the two type parameter types.
\item \texttt{ArchetypeType}, and its three subclasses:
\begin{itemize}
\item \texttt{PrimaryArchetypeType},
\item \texttt{OpenedArchetypeType},
\item \texttt{OpaqueArchetypeType}.
\end{itemize}
\item The abstract types:
\begin{itemize}
\item \texttt{ProtocolCompositionType},
\item \texttt{ParameterizedProtocolType},
\item \texttt{ExistentialType},
\item \texttt{ExistentialMetatypeType},
\item \texttt{DynamicSelfType}.
\end{itemize}
\item \texttt{SugarType} and its four subclasses:
\begin{itemize}
\item \texttt{TypeAliasType},
\item \texttt{OptionalType},
\item \texttt{ArrayType},
\item \texttt{DictionaryType}.
\end{itemize}
\item \texttt{BuiltinType} and its subclasses (there are a bunch of esoteric ones; only a few are shown below):
\begin{itemize}
\item \texttt{BuiltinRawPointerType},
\item \texttt{BuiltinVectorType},
\item \texttt{BuiltinIntegerType},
\item \texttt{BuiltinIntegerLiteralType},
\item \texttt{BuiltinNativeObjectType},
\item \texttt{BuiltinBridgeObjectType}.
\end{itemize}
\item \texttt{ReferenceStorageType} and its two subclasses:
\begin{itemize}
\item \texttt{WeakStorageType},
\item \texttt{UnownedStorageType}.
\end{itemize}
\item Miscellaneous types:
\begin{itemize}
\item \texttt{UnboundGenericType},
\item \texttt{PlaceholderType},
\item \texttt{TypeVariableType},
\item \texttt{LValueType},
\item \texttt{ErrorType}.
\end{itemize}
\end{itemize}

Each concrete subclass defines some set of static factory methods, usually named \texttt{get()} or similar, which take the structural components and construct a new, uniqued type of this kind. There are also getter methods, prefixed with \texttt{get}, which project the structural components of each kind of type. It would be needlessly duplicative to list all of the getter methods for each subclass of \texttt{TypeBase}; you can peruse them yourself by looking at \SourceFile{include/swift/AST/Types.h}.

\paragraph{Dynamic casts} Subclasses of \texttt{TypeBase} are identifiable at runtime via the \verb|is<>|, \verb|castTo<>| and \verb|getAs<>| template methods. To check if a type has a specific kind, use \verb|is<>|:
\begin{Verbatim}
Type type = ...;
if (type->is<FunctionType>()) ...;
\end{Verbatim}
To conditionally cast a type to a specific kind, use \verb|getAs<>|, which returns \verb|nullptr| if the cast fails:
\begin{Verbatim}
if (FunctionType *funcTy = type->getAs<FunctionType>()) ...;
\end{Verbatim}
Finally, \verb|castTo<>| is an unconditional cast which asserts that the type has the required kind:
\begin{Verbatim}
FunctionType *funcTy = type->castTo<FunctionType>();
\end{Verbatim}
These template methods desugar the type if it is a sugared type, and the casted type can never itself be a sugared type. This is usually what you want; for example, if \texttt{type} is the \texttt{Swift.Void} type alias type, then \texttt{type->is<TupleType>()} returns true, because it is for all intents and purposes a tuple (an empty tuple), except when printed in diagnostics.

There are also top-level template functions \verb|isa<>|, \verb|dyn_cast<>| and \verb|cast<>| that operate on \texttt{TypeBase *}. Using these with \texttt{Type} is an error; you must explicitly unwrap the pointer with \texttt{getPointer()}. These casts do not desugar, and permit casting to sugared types.
This is occasionally useful if you need to handle sugared types differently from canonical types for some reason:
\begin{Verbatim}
Type type = ...;
if (isa<TypeAliasType>(type.getPointer())) ...;
\end{Verbatim}

\paragraph{Canonical types} The \texttt{getCanonicalType()} method outputs a \texttt{CanType} wrapping the canonical form of this \texttt{TypeBase *}. The \texttt{isCanonical()} method checks if a type is canonical.

\index{type kind}
\index{exhaustive switch}
\index{visitor pattern}
\paragraph{Visitors} If you need to exhaustively handle each kind of type, the simplest way is to switch over the kind, which is an instance of the \texttt{TypeKind} enum, like this:
\begin{Verbatim}
Type ty = ...;
switch (ty->getKind()) {
case TypeKind::Struct: {
  auto *structTy = ty->castTo<StructType>();
  ...
}
case TypeKind::Enum:
  ...
case TypeKind::Class:
  ...
}
\end{Verbatim}
However, in most cases it is more convenient to use the \emph{visitor pattern} instead. You can subclass \texttt{TypeVisitor} and override various \texttt{visit\emph{Kind}Type()} methods, then hand the type to the visitor's \texttt{visit()} method, which performs the switch and dynamic cast dance above:
\begin{Verbatim}
class MyVisitor: public TypeVisitor<MyVisitor> {
public:
  void visitStructType(StructType *ty) { ... }
};

MyVisitor visitor;
Type ty = ...;
visitor.visit(ty);
\end{Verbatim}
The \texttt{TypeVisitor} also defines various methods corresponding to abstract base classes in the \texttt{TypeBase} hierarchy, so, for example, you can override \texttt{visitNominalType()} to handle all nominal types at once.

The \texttt{TypeVisitor} preserves information if it receives a sugared type; for example, visiting \texttt{Int?}\ will call \texttt{visitOptionalType()}, while visiting \texttt{Optional<Int>} will call \texttt{visitBoundGenericEnumType()}. In the common situation where the semantics of your operation do not depend on type sugar, you can use the \texttt{CanTypeVisitor} template class instead. Here, the \texttt{visit()} method takes a \texttt{CanType}, so \texttt{Int?}\ will need to be canonicalized to \texttt{Optional<Int>} before being passed in.

\paragraph{Nominal types} A handful of methods on \texttt{TypeBase} exist which perform a desugaring cast to a nominal type (so they will also accept a type alias type or other sugared type), and return the nominal type declaration, or \texttt{nullptr} if the type isn't of a nominal kind:
\begin{itemize}
\item \texttt{getAnyNominal()} returns the nominal type declaration of an \texttt{UnboundGenericType}, \texttt{NominalType} or \texttt{BoundGenericNominalType}.
\item \texttt{getNominalOrBoundGenericNominal()} returns the nominal type declaration of a \texttt{NominalType} or \texttt{BoundGenericNominalType}.
\item \texttt{getStructOrBoundGenericStruct()} returns the type declaration of a \texttt{StructType} or \texttt{BoundGenericStructType}.
\item \texttt{getEnumOrBoundGenericEnum()} returns the type declaration of an \texttt{EnumType} or \texttt{BoundGenericEnumType}.
\item \texttt{getClassOrBoundGenericClass()} returns the class declaration of a \texttt{ClassType} or \texttt{BoundGenericClassType}.
\item \texttt{getNominalParent()} returns the parent type stored by an \texttt{UnboundGenericType}, \texttt{NominalType} or \texttt{BoundGenericNominalType}.
\end{itemize}

\paragraph{Recursive properties} Various predicates are computed when a type is constructed and are therefore cheap to check:
\begin{itemize}
\item \texttt{hasTypeVariable()} determines whether the type was allocated in the permanent arena or the constraint solver arena.
\item \texttt{hasArchetype()}, \texttt{hasOpaqueArchetype()}, \texttt{hasOpenedExistential()}.
\item \texttt{hasTypeParameter()}.
\item \texttt{hasUnboundGenericType()}, \texttt{hasDynamicSelf()}, \texttt{hasPlaceholder()}.
\item \texttt{isLValue()}---despite the ``\texttt{is}'' in the name, this is a recursive property and not the same as \verb|ty->is<LValueType>()|.
\end{itemize}

\paragraph{Utility operations} These encapsulate frequently-useful patterns.
\begin{itemize}
\item \texttt{getOptionalObjectType()} returns the type \texttt{T} if the type is some \texttt{Optional<T>}, otherwise it returns the null type.
\item \texttt{getMetatypeInstanceType()} returns the instance type \texttt{T} if the type is some metatype \texttt{T.Type}, otherwise it returns the type itself.
\item \texttt{mayHaveMembers()} answers whether this is a nominal type, archetype, existential type or dynamic Self type.
\end{itemize}

\paragraph{Recovering the AST context} All non-canonical types point at their canonical type, and canonical types point at the AST context.
\begin{itemize}
\item \texttt{getASTContext()} returns the singleton AST context from a type.
\end{itemize}

\apiref{CanType}{class}
The \texttt{CanType} class wraps a \texttt{TypeBase *} pointer which is known to be canonical. The pointer can be recovered with the \texttt{getPointer()} method. It forwards various methods to either \texttt{Type} or \texttt{TypeBase~*}. There is an implicit conversion from \texttt{CanType} to \texttt{Type}. In the other direction, the explicit one-argument constructor \texttt{CanType(Type)} asserts that the type is canonical; however, most of the time the \texttt{getCanonicalType()} method on \texttt{TypeBase} is used instead.

The \texttt{operator==} and \texttt{operator!=} operators are used to test \texttt{CanType} for pointer equality. The \texttt{isEqual()} method described earlier implements canonical equality on sugared types by first canonicalizing both sides, and then checking the resulting canonical types for pointer equality. Therefore, the following are equivalent:
\begin{Verbatim}
if (lhsType->isEqual(rhsType)) ...;
if (lhsType->getCanonicalType() == rhsType->getCanonicalType()) ...;
\end{Verbatim}
The \texttt{CanType} class can be used with the \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| templates. Instead of returning the actual \texttt{TypeBase} subclass, the latter two return a \emph{canonical type wrapper} for that subclass. Every subclass of \texttt{TypeBase} has a corresponding canonical type wrapper; if the subclass is named \texttt{FooType}, the canonical wrapper is named \texttt{CanFooType}. Canonical type wrappers forward \texttt{operator->} to the specific \texttt{TypeBase} subclass, and define methods of their own (called with ``\texttt{.}'') which project the known-canonical components of the type. For example, \texttt{FunctionType} has a \texttt{getResult()} method returning \texttt{Type}, so the canonical type wrapper \texttt{CanFunctionType} has a \texttt{getResult()} method returning a \texttt{CanType}. The wrapper methods are not exhaustive, and their use is not required, because you can instead make explicit calls to \texttt{CanType(Type)} or \texttt{getCanonicalType()} after projecting a type that is known to be canonical.
\begin{Verbatim}
CanType canTy = ...;
CanFunctionType canFuncTy = cast<FunctionType>(canTy);

// method on CanFunctionType: returns CanType(canFuncTy->getResult())
CanType canResultTy = canFuncTy.getResult();

// operator-> forwards to method on FunctionType: returns Type
CanType resultTy = CanType(canFuncTy->getResult());
\end{Verbatim}

\apiref{AnyFunctionType}{class}
This is the base class of \texttt{FunctionType} and \texttt{GenericFunctionType}.
\begin{itemize}
\item \texttt{getParams()} returns an array of \texttt{AnyFunctionType::Param}.
\item \texttt{getResult()} returns the result type.
\item \texttt{getExtInfo()} returns an instance of \texttt{AnyFunctionType::ExtInfo} storing the additional non-type attributes.
\end{itemize}

\apiref{AnyFunctionType::Param}{class}
This represents a parameter in a function type's parameter list.
\begin{itemize}
\item \texttt{getPlainType()} returns the type of the parameter. If the parameter is variadic (\texttt{T...}), this is the element type \texttt{T}.
\item \texttt{getParameterType()} same as above, but if the parameter is variadic, returns the type \texttt{Array<T>}.
\item \texttt{isVariadic()}, \texttt{isAutoClosure()} return the special behavior flags.
\item \texttt{getValueOwnership()} returns an instance of the \texttt{ValueOwnership} enum.
\end{itemize}

\apiref{ValueOwnership}{enum class}
The possible ownership attributes on a function parameter.
\begin{itemize}
\item \texttt{ValueOwnership::Default}
\item \texttt{ValueOwnership::InOut}
\item \texttt{ValueOwnership::Shared}
\item \texttt{ValueOwnership::Owned}
\end{itemize}

\apiref{AnyFunctionType::ExtInfo}{class}
This represents the non-type attributes of a function type.

\chapter{Declarations}\label{decls}
\index{value declaration}
\index{interface type}
\index{extension declaration}
\index{top-level code declaration}
The different kinds of declarations are categorized into a taxonomy. A \emph{value declaration} has a name that can be directly referenced from an expression. Each value declaration also has an \emph{interface type}. Roughly speaking, this is the type of an expression referencing the declaration. Most declarations are value declarations, but there are some important exceptions. Extensions, described in Chapter~\ref{extensions}, add members to a type but do not themselves have names. A \emph{top-level code declaration} is another kind of declaration that is not a value declaration; it holds the statements and expressions at the top level of a source file.

\index{declared interface type}
\index{type declaration}
\index{metatype type}
A \emph{type declaration} is an important kind of value declaration. A type declaration declares a new type that you can write down in a type annotation; this is the \emph{declared interface type} of the type declaration. Since type declarations are value declarations, they also have an interface type, which is the type of an expression referencing the type declaration. When a type is used as a value, the type of the value is a metatype. A type declaration's interface type is therefore the metatype of its declared interface type.

\index{struct declaration}
\index{enum declaration}
\index{class declaration}
\index{nominal type declaration}
\index{protocol declaration}
Struct, enum and class declarations are called \emph{nominal type declarations}. Protocols are also nominal type declarations, but they are special enough that it is best to think of them as a separate kind of entity.
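For example (a small sketch of our own), referencing a type declaration in an expression yields a value whose type is the metatype of the declared interface type:
\begin{Verbatim}
struct Salmon {}

// `Salmon.self' references the type declaration as a value.
// The declared interface type is `Salmon'; the interface type
// of the declaration is the metatype `Salmon.Type'.
let metatype: Salmon.Type = Salmon.self
\end{Verbatim}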
\index{self interface type}
The \emph{self interface type} of a type or extension declaration is the type from which the \texttt{self} parameter type of a method is derived. In a struct, enum or class declaration, the self interface type and declared interface type coincide. In a protocol, the self interface type is the protocol \texttt{Self} type (Section~\ref{protocols}).

In the following, the nominal type declaration \texttt{Fish} is referenced twice, first as a type annotation, and then in an expression:
\begin{Verbatim}
struct Fish {}

let myFish: Fish = Fish()
\end{Verbatim}
This is a very simple piece of code, but there is more going on than first appears. The first occurrence of \texttt{Fish} is the type annotation for the variable declaration \texttt{myFish}, so the interface type of \texttt{myFish} becomes the nominal type \texttt{Fish}. The second occurrence is inside the initial value expression of \texttt{myFish}. The callee of the call expression \texttt{Fish()} is the type expression \texttt{Fish}, whose type is the metatype \texttt{Fish.Type}. A call of a metatype is transformed into a call of the \texttt{init} member, which names a constructor declaration. Constructors are called on an instance of the metatype of a type, and return an instance of the type. So the initial value expression has the type \texttt{Fish}, which matches the interface type of \texttt{myFish}. The constructor has the interface type \verb|(Fish.Type) -> () -> Fish|.

\index{declaration context}
\index{module declaration}
\index{function declaration}
\index{variable declaration}
\index{generic parameter declaration}
\index{closure expression}
\index{source file}
A \emph{declaration context} is an entity that can contain declarations. Declaration contexts are distinct from declarations. Module declarations, nominal type declarations, extension declarations and function declarations are also declaration contexts. Not all declarations are declaration contexts; variable declarations and generic parameter declarations are not. Furthermore, some declaration contexts are not declarations. A closure expression is not a declaration, but it is a declaration context, because the body of a closure can contain variable, function and type declarations. A source file is another kind of declaration context that is not a declaration. A summary of the examples so far is shown in Table~\ref{taxonomy examples}.

\begin{table}\captionabove{Classifying various entities in our taxonomy}\label{taxonomy examples}
\begin{tabular}{|l|>{\centering}p{2.3cm}|>{\centering}p{2.3cm}|>{\centering}p{2.3cm}|>{\centering\arraybackslash}p{2.3cm}|}
\hline
Entity kind&Decl?&Value decl?&Type decl?&Decl context?\\
\hline
\hline
Module&\checkmark&\checkmark&\checkmark&\checkmark\\
Source file&$\times$&$\times$&$\times$&\checkmark\\
Nominal type&\checkmark&\checkmark&\checkmark&\checkmark\\
Extension&\checkmark&$\times$&$\times$&\checkmark\\
Generic parameter&\checkmark&\checkmark&\checkmark&$\times$\\
Function&\checkmark&\checkmark&$\times$&\checkmark\\
Variable&\checkmark&\checkmark&$\times$&$\times$\\
Top-level code&\checkmark&$\times$&$\times$&\checkmark\\
Closure expression&$\times$&$\times$&$\times$&\checkmark\\
Call expression&$\times$&$\times$&$\times$&$\times$\\
\hline
\end{tabular}
\end{table}

Declarations and declaration contexts are nested within each other. The roots in this hierarchy are module declarations; all other declarations and declaration contexts point at a parent declaration context.
Source files are always immediate children of module declarations.

\index{local declaration context}
\index{subscript declaration}
\index{initializer declaration context}
A \emph{local context} is any declaration context that is not a module, source file, type declaration or extension. Top-level code declarations, function declarations and closure expressions are three kinds of local contexts we've already seen. The three remaining kinds of local context are subscript declarations, enum element declarations and initializer contexts:
\begin{itemize}
\item Subscripts and enum elements are local contexts, because they contain their parameter declarations.
\item Subscript declarations can also be generic, so they need to contain their generic parameters.
\item Initializer contexts represent the initial value expression of a variable that is itself not a child of a local context. This ensures that any declarations appearing in the initial value expression of a variable are always children of a local context.
\end{itemize}

\index{top-level type declaration}
\index{nested type declaration}
\index{local type declaration}
\index{top-level function declaration}
\index{method declaration}
\index{local function declaration}
\index{global variable declaration}
\index{stored property declaration}
\index{local variable declaration}
There is special terminology for type declarations in different kinds of declaration contexts:
\begin{itemize}
\item A \emph{top-level type} is an immediate child of a source file.
\item A \emph{nested type} or \emph{member type} is an immediate child of a nominal type declaration or an extension.
\item A \emph{local type} is an immediate child of a local context.
\end{itemize}
Similarly, for functions:
\begin{itemize}
\item A \emph{top-level function} or \emph{global function} is an immediate child of a source file.
\item A \emph{method} is an immediate child of a nominal type declaration or an extension.
\item A \emph{local function} is an immediate child of a local context.
\end{itemize}
And finally, for variables:
\begin{itemize}
\item A \emph{global variable} is an immediate child of a source file.
\item A \emph{property} is an immediate child of a nominal type declaration or an extension.
\item A \emph{local variable} is an immediate child of a local context.
\end{itemize}

\section{Type Declarations}\label{type declarations}
\index{struct declaration}
\index{enum declaration}
\index{class declaration}
\index{declared interface type}
\paragraph{Struct, enum and class declarations} These are the concrete nominal types. The declared interface type of a non-generic nominal type declaration is a nominal type. If the nominal type declaration is generic, the declared interface type is a generic nominal type where the generic arguments are the declaration's generic parameters. Concrete nominal types can be nested inside of other declaration contexts, with a few limitations described in Section~\ref{nested nominal types}. The declared interface type reflects this nesting. For example, the declared interface type of \texttt{Outer.Inner} is the generic nominal type \texttt{Outer<T>.Inner<U>}:
\begin{Verbatim}
struct Outer<T> {
  struct Inner<U> {}
}
\end{Verbatim}
Classes can inherit from other classes; Chapter~\ref{classinheritance} describes how inheritance interacts with generics.
\index{protocol declaration}
\paragraph{Protocol declarations} The declared interface type of a protocol declaration \texttt{P} is the protocol type \texttt{P}. Protocols are the fourth kind of nominal type, but they behave differently in many ways, because they do not have concrete instances. Protocol declarations are described in Chapter~\ref{protocols}.

\index{type alias declaration}
\paragraph{Type alias declarations} Type aliases assign a new name to an underlying type. The declared interface type is a type alias type whose canonical type is the underlying type of the type alias. The special case of type aliases in protocols is discussed in Section~\ref{protocol type alias}.

\index{generic parameter declaration}
\paragraph{Generic parameter declarations} Generic parameter declarations appear inside generic parameter lists of generic declarations. The declared interface type of a generic parameter declaration is the sugared generic parameter type that prints as the name of the declaration. The canonical type of this type is the generic parameter type \ttgp{d}{i}, where \texttt{d} is the depth and \texttt{i} is the index. Generic parameter declarations are described in Chapter~\ref{generic declarations}.

\index{associated type declaration}
\paragraph{Associated type declarations} Associated type declarations appear inside protocols. The declared interface type of an associated type \texttt{A} is a bound dependent member type \texttt{Self.[P]A} referencing the declaration of \texttt{A}, with the \texttt{Self} generic parameter of the protocol as the base type. Associated type declarations are described in Section~\ref{protocols}.

\section{Function Declarations}\label{func decls}
\index{function declaration}
\index{generic function type}
\paragraph{Function declarations} Functions can appear at the top level, inside of a local context such as another function, or as members of a type, in which case they are called methods. If a function is itself generic or nested inside of a generic context, its interface type is a generic function type; otherwise, it is a function type.

The interface type of a function is constructed from the interface types of the function's parameter declarations, and the function's return type. If the return type is omitted, it becomes the empty tuple type \texttt{()}. For methods, this function type is then wrapped in another level of function type representing the base of the call, which becomes the \texttt{self} parameter of the method.

\index{self interface type}
\index{method self parameter}
The \texttt{self} parameter's type and parameter flags are constructed from the self interface type of the method's type declaration, and various attributes of the method:
\begin{itemize}
\item If the method is \texttt{mutating}, the \texttt{self} parameter becomes \texttt{inout}.
\item If the method returns the dynamic Self type, the \texttt{self} parameter type is wrapped in the dynamic Self type.
\item Finally, if the method is \texttt{static}, the \texttt{self} parameter is wrapped in a metatype.
\end{itemize}
This can be summarized as follows; note that the \texttt{(Self)} parameter list means the self interface type of the method's type declaration, together with any additional parameter flags computed via the above:
\begin{quote}
\begin{tabular}{|c|c|l|}
\hline
Generic?&Method?&Interface type\\
\hline
$\times$&$\times$&\texttt{(Params...)\ -> Result}\\
\checkmark&$\times$&\texttt{<Generic Params> (Params...)\ -> Result}\\
$\times$&\checkmark&\texttt{(Self) -> (Params...)\ -> Result}\\
\checkmark&\checkmark&\texttt{<Generic Params> (Self) -> (Params...)\ -> Result}\\
\hline
\end{tabular}
\end{quote}
The two levels of function type in the interface type of a method mirror the two-level structure of a method call expression \texttt{foo.bar(1, 2)}, shown in Figure~\ref{method call expr}:
\begin{itemize}
\item The self apply expression \texttt{foo.bar} applies the single argument \texttt{foo} to the method's \texttt{self} parameter. The type of the self apply expression is the method's inner function type.
\item The outer call applies the argument list \texttt{(1, 2)} to the inner function type. The type of the outer call expression is the method's return type.
\end{itemize}

\begin{figure}\captionabove{Two levels of function application in a method call \texttt{foo.bar(1, 2)}}\label{method call expr}
\begin{center}
\begin{tikzpicture}[%
grow via three points={one child at (0.5,-0.7) and two children at (0.5,-0.7) and (0.5,-1.4)},
edge from parent path={[->] (\tikzparentnode.south) |- (\tikzchildnode.west)}]
\node [class] {\vphantom{p}call expression: \texttt{Result}}
child { node [class] {\vphantom{p}self apply expression: \texttt{(Int, Int) -> ()}}
child { node [class] {\vphantom{p}declaration reference expression: \texttt{Foo.bar}}}
child { node [class] {\vphantom{p}declaration reference expression: \texttt{foo}}}}
child [missing] {}
child [missing] {}
child { node [class] {\vphantom{p}argument list}
child { node [class] {\vphantom{p}integer literal expression: \texttt{1}}}
child { node [class] {\vphantom{p}integer literal expression: \texttt{2}}}}
child [missing] {}
child [missing] {}
child [missing] {};
\end{tikzpicture}
\end{center}
\end{figure}

\index{constructor declaration}
\paragraph{Constructor declarations} Constructor declarations always appear as members of other types, and are named \texttt{init}. The interface type of a constructor takes a metatype and returns an instance of the constructed type, possibly wrapped in an \texttt{Optional}.
\begin{quote}
\begin{tabular}{|c|l|}
\hline
Generic?&Interface type\\
\hline
$\times$&\texttt{(Self.Type) -> (Params...)\ -> Self}\\
\checkmark&\texttt{<Generic Params> (Self.Type) -> (Params...)\ -> Self}\\
\hline
\end{tabular}
\end{quote}

\index{initializer interface type}
Class constructors also have an \emph{initializer interface type}, used when a subclass initializer delegates to an initializer in the superclass. The initializer interface type is the same as the interface type, except it takes a self value instead of a self metatype.
\begin{quote}
\begin{tabular}{|c|l|}
\hline
Generic?&Initializer interface type\\
\hline
$\times$&\texttt{(Self) -> (Params...)\ -> Self}\\
\checkmark&\texttt{<Generic Params> (Self) -> (Params...)\ -> Self}\\
\hline
\end{tabular}
\end{quote}

\index{destructor declaration}
\paragraph{Destructor declarations} Destructor declarations cannot have a generic parameter list, a \texttt{where} clause, or a parameter list. Formally, they take no parameters and return \texttt{()}.
\section{Storage Declarations}
\index{storage declaration}
\index{l-value type}
Storage declarations represent the declaration of an l-value. Storage declarations can have zero or more associated accessor declarations. The accessor declarations are siblings of the storage declaration in the declaration context hierarchy.
\index{variable declaration}
\paragraph{Variable declarations}
The interface type of a variable is the stored value type, possibly wrapped in a reference storage type if the variable is declared as \texttt{weak} or \texttt{unowned}. The \emph{value interface type} of a variable is the storage type without any wrapping. For historical reasons, the interface type of a property (a variable appearing inside of a type) does not include the \texttt{Self} clause, the way that method declarations do.
\index{pattern binding declaration}
\index{pattern binding entry}
\index{pattern}
\index{initial value expression}
Variable declarations are always created alongside a \emph{pattern binding declaration} which represents the various ways in which variables can be bound to values in Swift. A pattern binding declaration consists of one or more \emph{pattern binding entries}. Each pattern binding entry has a \emph{pattern} and an optional \emph{initial value expression}. A pattern declares zero or more variables.
\begin{example}
A pattern binding declaration with a single entry, where the pattern declares a single variable:
\begin{Verbatim}
let x = 123
\end{Verbatim}
Same as the above, except with a more complex pattern which declares a variable storing the first element of a tuple while discarding the second element:
\begin{Verbatim}
let (x, _) = (123, "hello")
\end{Verbatim}
A pattern binding declaration with a single entry, where the pattern declares two variables \texttt{x} and \texttt{y}:
\begin{Verbatim}
let (x, y) = (123, "hello")
\end{Verbatim}
A pattern binding declaration with two entries, where the first pattern declares \texttt{x} and the second declares \texttt{y}:
\begin{Verbatim}
let x = 123, y = "hello"
\end{Verbatim}
A pattern binding declaration with a single entry that does not declare any variables:
\begin{Verbatim}
let _ = ignored()
\end{Verbatim}
And finally, two pattern binding declarations, where each one has a single entry declaring a single variable:
\begin{Verbatim}
let x = 123
let y = "hello"
\end{Verbatim}
\end{example}
\begin{example}
If the pattern binding declaration appears outside of a local context, each entry must declare at least one variable, so both pattern binding declarations are rejected here:
\begin{Verbatim}
let _ = 123

struct S {
  let _ = "hello"
}
\end{Verbatim}
\end{example}
\index{typed pattern}
\index{tuple pattern}
\begin{example}
A funny quirk of the pattern grammar is that typed patterns and tuple patterns do not compose in the way one might think.
If ``\texttt{let x:~Int}'' is a typed pattern declaring a variable \texttt{x} with type annotation \texttt{Int}, and ``\texttt{let (x, y)}'' is a tuple pattern declaring two variables \texttt{x} and \texttt{y}, you might expect ``\texttt{let~(x:~Int,~y:~String)}'' to declare two variables \texttt{x} and \texttt{y} with type annotations \texttt{Int} and \texttt{String} respectively; what actually happens is you get a tuple pattern declaring two variables named \texttt{Int} and \texttt{String} that binds a two-element tuple with \emph{labels} \texttt{x} and \texttt{y}:
\begin{Verbatim}
let (x: Int, y: String) = (x: 123, y: "hello")

print(Int)    // huh? prints 123
print(String) // weird! prints "hello"
\end{Verbatim}
\end{example}
\index{parameter declaration}
\paragraph{Parameter declarations}
Functions, enum elements and subscripts can have parameter lists; each parameter is represented by a parameter declaration. The interface type of a declaration with a parameter list is built by first computing the interface type of each parameter. Closure expressions also have parameter lists, so parameter declarations can have a closure expression as their parent.
\index{value ownership kind}
\index{autoclosure function type}
Among other things, the parameter declaration stores the value ownership kind, the variadic flag, and the \texttt{@autoclosure} attribute. This is exactly the same information encoded in the parameter list of a function type.
\index{argument label}
\index{default argument expression}
\index{closure expression}
Parameter declarations of named declarations can also have argument labels and default argument expressions, which are not encoded in a function type. These phenomena are only visible when directly calling a named declaration and not a closure value.
\index{subscript declaration}
\paragraph{Subscript declarations}
Subscripts always appear as members of types, with a special declaration name. The interface type of a subscript is a function type taking the index parameters and returning the storage type. The value interface type of a subscript is just the storage type. For historical reasons, the interface type of a subscript does not include the \texttt{Self} clause, the way that method declarations do.
\begin{quote}
\begin{tabular}{|c|l|}
\hline
Generic?&Interface type\\
\hline
$\times$&\texttt{(Indices...)\ -> Value}\\
$\checkmark$&\texttt{<Generic Signature> (Indices...)\ -> Value}\\
\hline
\end{tabular}
\end{quote}
\index{accessor declaration}
\paragraph{Accessor declarations}
The interface type of an accessor depends on the accessor kind. For example, getters return the value, and setters take the new value as a parameter. Property accessors do not take any other parameters; subscript accessors also take the subscript's index parameters. There is a lot more to say about accessors and storage declarations, but unfortunately, you'll have to wait for the next book.

\section{Source Code Reference}\label{declarationssourceref}
Key source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/Decl.h}
\item \SourceFile{include/swift/AST/DeclContext.h}
\item \SourceFile{lib/AST/Decl.cpp}
\item \SourceFile{lib/AST/DeclContext.cpp}
\end{itemize}
Other source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/DeclNodes.def}
\item \SourceFile{include/swift/AST/ASTVisitor.h}
\item \SourceFile{include/swift/AST/ASTWalker.h}
\end{itemize}
\apiref{Decl}{class}
Base class of declarations.
Figure~\ref{declhierarchy} shows various subclasses, which correspond to the different kinds of declarations defined previously in this chapter. \begin{figure}\captionabove{The \texttt{Decl} class hierarchy}\label{declhierarchy} \begin{center} \begin{tikzpicture}[% grow via three points={one child at (0.5,-0.7) and two children at (0.5,-0.7) and (0.5,-1.4)}, edge from parent path={[->] (\tikzparentnode.south) |- (\tikzchildnode.west)}] \node [class] {\texttt{\vphantom{p}Decl}} child { node [class] {\texttt{\vphantom{p}ValueDecl}} child { node [class] {\texttt{\vphantom{p}TypeDecl}} child { node [class] {\texttt{\vphantom{p}NominalTypeDecl}} child { node [class] {\texttt{\vphantom{p}StructDecl}}} child { node [class] {\texttt{\vphantom{p}EnumDecl}}} child { node [class] {\texttt{\vphantom{p}ClassDecl}}} child { node [class] {\texttt{\vphantom{p}ProtocolDecl}}} } child [missing] {} child [missing] {} child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}TypeAliasDecl}}} child { node [class] {\texttt{\vphantom{p}AbstractTypeParamDecl}} child { node [class] {\texttt{\vphantom{p}GenericTypeParamDecl}}} child { node [class] {\texttt{\vphantom{p}AssociatedTypeDecl}}} } } child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}AbstractFunctionDecl}} child { node [class] {\texttt{\vphantom{p}FuncDecl}} child { node [class] {\texttt{\vphantom{p}AccessorDecl}}} } child [missing] {} child { node [class] {\texttt{\vphantom{p}ConstructorDecl}}} child { node [class] {\texttt{\vphantom{p}DestructorDecl}}} } child [missing] {} child [missing] {} child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}AbstractStorageDecl}} child { node [class] {\texttt{\vphantom{p}VarDecl}} child { node [class] {\texttt{\vphantom{p}ParamDecl}}} } child [missing] {} child { node [class] {\texttt{\vphantom{p}SubscriptDecl}}} } child [missing] {} child [missing] {} child [missing] {} } child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}ExtensionDecl}}}; \end{tikzpicture} \end{center} \end{figure} \index{synthesized declaration} Instances are always allocated in the permanent arena of the \texttt{ASTContext}, either when the declaration is parsed or synthesized. The top-level \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| template functions support dynamic casting from \texttt{Decl *} to any of its subclasses. \begin{itemize} \item \texttt{getDeclContext()} returns the parent \texttt{DeclContext} of this declaration. \item \texttt{getInnermostDeclContext()} if this declaration is also a declaration context, returns the declaration as a \texttt{DeclContext}, otherwise returns the parent \texttt{DeclContext}. \item \texttt{getASTContext()} returns the singleton AST context from a declaration. 
\end{itemize}
\index{visitor pattern}
\index{declaration kind}
\index{exhaustive switch}
\index{statement}
\index{expression}
\index{type representation}
\paragraph{Visitors}
If you need to exhaustively handle each kind of declaration, the simplest way is to switch over the kind, which is an instance of the \texttt{DeclKind} enum, like this:
\begin{Verbatim}
Decl *decl = ...;

switch (decl->getKind()) {
case DeclKind::Struct: {
  auto *structDecl = cast<StructDecl>(decl);
  ...
}
case DeclKind::Enum:
  ...
case DeclKind::Class:
  ...
}
\end{Verbatim}
However, just as with types, it can be more convenient to use the visitor pattern. You can subclass \texttt{ASTVisitor} and override various \texttt{visit\emph{Kind}Decl()} methods, then hand the declaration to the visitor's \texttt{visit()} method, which performs the switch and dynamic cast dance above:
\begin{Verbatim}
class MyVisitor: public ASTVisitor<MyVisitor> {
public:
  void visitStructDecl(StructDecl *decl) {
    ...
  }
};

MyVisitor visitor;

Decl *decl = ...;
visitor.visit(decl);
\end{Verbatim}
The \texttt{ASTVisitor} also defines various methods corresponding to abstract base classes in the \texttt{Decl} hierarchy, so for example you can override \texttt{visitNominalTypeDecl()} to handle all nominal type declarations at once. The \texttt{ASTVisitor} is more general than just visiting declarations; it also supports visiting statements, expressions, and type representations. A more elaborate form is implemented by the \texttt{ASTWalker}. While the visitor visits a single declaration, the walker traverses nested declarations, statements and expressions for you in a pre-order walk.
\index{value declaration}
\apiref{ValueDecl}{class}
Base class of named declarations.
\index{interface type}
\begin{itemize}
\item \texttt{getDeclName()} returns the declaration's name.
\item \texttt{getInterfaceType()} returns the declaration's interface type.
\end{itemize}
\index{type declaration}
\index{declared interface type}
\apiref{TypeDecl}{class}
Base class of type declarations.
\begin{itemize}
\item \texttt{getDeclaredInterfaceType()} returns the type of an instance of this declaration.
\end{itemize}
\index{nominal type declaration}
\index{self interface type}
\apiref{NominalTypeDecl}{class}
Base class of nominal type declarations. Also a \texttt{DeclContext}.
\begin{itemize}
\item \texttt{getSelfInterfaceType()} returns the type of the \texttt{self} value inside the body of this declaration. Different from the declared interface type for protocols, where the declared interface type is a nominal but the declared self type is the generic parameter \texttt{Self}.
\item \texttt{getDeclaredType()} returns the type of an instance of this declaration, without generic arguments. If the declaration is generic, this is an unbound generic type. If this declaration is not generic, this is a nominal type. This is occasionally used in diagnostics instead of the declared interface type, when the generic parameter types are irrelevant.
\end{itemize}
\index{type alias declaration}
\apiref{TypeAliasDecl}{class}
A type alias declaration. Also a \texttt{DeclContext}.
\begin{itemize}
\item \texttt{getDeclaredInterfaceType()} returns the underlying type of the type alias declaration, wrapped in type alias type sugar.
\item \texttt{getUnderlyingType()} returns the underlying type of the type alias declaration, without wrapping it in type alias type sugar.
\end{itemize}
\index{function declaration}
\apiref{AbstractFunctionDecl}{class}
Base class of function-like declarations. Also a \texttt{DeclContext}.
\begin{itemize}
\item \texttt{getImplicitSelfDecl()} returns the implicit \texttt{self} parameter, if there is one.
\item \texttt{getParameters()} returns the function's parameter list.
\item \texttt{getMethodInterfaceType()} returns the type of a method without the \texttt{Self} clause.
\item \texttt{getResultInterfaceType()} returns the return type of this function or method.
\end{itemize}
\apiref{ParameterList}{class}
The parameter list of an \texttt{AbstractFunctionDecl}, \texttt{EnumElementDecl} or \texttt{SubscriptDecl}.
\begin{itemize}
\item \texttt{size()} returns the number of parameters.
\item \texttt{get()} returns the \texttt{ParamDecl} at the given index.
\end{itemize}
\index{constructor declaration}
\apiref{ConstructorDecl}{class}
A constructor declaration.
\begin{itemize}
\item \texttt{getInitializerInterfaceType()} returns the initializer interface type, used when type checking \texttt{super.init()} delegation.
\end{itemize}
\index{storage declaration}
\apiref{AbstractStorageDecl}{class}
Base class for storage declarations.
\begin{itemize}
\item \texttt{getValueInterfaceType()} returns the type of the stored value, without \texttt{weak} or \texttt{unowned} storage qualifiers.
\end{itemize}
\index{declaration context}
\index{file unit}
\apiref{DeclContext}{class}
Base class for declaration contexts. The top-level \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| template functions also support dynamic casting from a \texttt{DeclContext *} to any of its subclasses. There are a handful of subclasses which are not also subclasses of \texttt{Decl}:
\begin{itemize}
\item \texttt{ClosureExpr}.
\item \texttt{FileUnit} and its various subclasses, such as \texttt{SourceFile}.
\item A few other less interesting ones you can find in the source.
\end{itemize}
Utilities for understanding the nesting of declaration contexts:
\begin{itemize}
\item \texttt{getAsDecl()} if this declaration context is also a declaration, returns the declaration, otherwise returns \texttt{nullptr}.
\item \texttt{getParent()} returns the parent declaration context.
\item \texttt{isModuleScopeContext()} returns true if this is a \texttt{ModuleDecl} or \texttt{FileUnit}.
\item \texttt{isTypeContext()} returns true if this is a nominal type declaration or an extension.
\item \texttt{isLocalContext()} returns true if this is not a module scope context or type context.
\item \texttt{getParentModule()} returns the module declaration at the root of the hierarchy.
\item \texttt{getModuleScopeContext()} returns the innermost parent which is a \texttt{ModuleDecl} or \texttt{FileUnit}.
\item \texttt{getParentSourceFile()} returns the innermost parent which is a source file, or \texttt{nullptr} if this declaration context was not parsed from source.
\item \texttt{getInnermostDeclarationDeclContext()} returns the innermost parent which is also a declaration, or \texttt{nullptr}.
\item \texttt{getInnermostDeclarationTypeContext()} returns the innermost parent which is also a nominal type or extension, or \texttt{nullptr}.
\end{itemize}
Operations on type contexts:
\begin{itemize}
\item \texttt{getSelfNominalDecl()} returns the nominal type declaration if this is a type context, or \texttt{nullptr}.
\item \texttt{getSelfStructDecl()} as above, but the result is a \texttt{StructDecl *} or \texttt{nullptr}.
\item \texttt{getSelfEnumDecl()} as above, but the result is an \texttt{EnumDecl *} or \texttt{nullptr}.
\item \texttt{getSelfClassDecl()} as above, but the result is a \texttt{ClassDecl *} or \texttt{nullptr}.
\item \texttt{getSelfProtocolDecl()} as above, but the result is a \texttt{ProtocolDecl *} or \texttt{nullptr}.
\item \texttt{getDeclaredInterfaceType()} delegates to the method on \texttt{NominalTypeDecl} or \texttt{ExtensionDecl} as appropriate.
\item \texttt{getSelfInterfaceType()} is similar.
\end{itemize}
Generics-related methods on \texttt{DeclContext} are described in Section~\ref{genericdeclsourceref}.

\chapter{Generic Declarations}\label{generic declarations}
\index{generic declaration}
\index{generic parameter list}
A \emph{generic declaration} is a declaration with a generic parameter list. The following kinds of declarations can be generic:
\begin{itemize}
\item classes, structs and enums,
\item type aliases,
\item functions,
\item constructors,
\item subscripts.
\end{itemize}
Generic type aliases were introduced in Swift 3 \cite{se0048}. Generic subscripts were introduced in Swift 4 \cite{se0148}.
\index{parsed generic parameter list}
\index{protocol Self type}
\index{opaque parameter}
The \emph{parsed} generic parameter list of a declaration is the subset of generic parameter declarations written explicitly in source, with the \texttt{<...>} syntax following the declaration name. The declaration's generic parameter list includes the parsed generic parameter list together with any implicit generic parameters:
\begin{enumerate}
\item Functions and subscripts may have a parsed generic parameter list, or they can declare opaque parameters with the \texttt{some} keyword, or both (Section~\ref{opaque parameters}).
\item Protocols always have a single implicit \texttt{Self} generic parameter, and no parsed generic parameter list (Section~\ref{protocols}).
\item Extensions always have an implicit set of generic parameters inherited from the extended type, and no parsed generic parameter list (Chapter~\ref{extensions}).
\end{enumerate}
Parsed generic parameters, the protocol \texttt{Self} type, and the implicit generic parameters of an extension all have names that remain in scope for the entire source range of the generic declaration. Generic parameters introduced by opaque parameter declarations are unnamed; only the value declared by the opaque parameter has a name.
\index{declaration context}
\index{generic context}
All generic declarations are declaration contexts, because they contain their generic parameter declarations. A \emph{generic context} is a declaration context where at least one parent context is a generic declaration. Note the subtle distinction in the meaning of ``generic'' when talking about declarations and declaration contexts; a declaration is generic only if it has generic parameters of its own, whereas a declaration context being a generic context is a transitive property inherited from the parent context.
\index{depth}
\index{index}
Inside a generic context, unqualified name lookup will find all outer generic parameters. Each generic parameter is therefore uniquely identified within a generic context by its \emph{depth} and \emph{index}:
\begin{itemize}
\item The depth identifies a specific generic declaration, starting from zero for the top-level generic declaration and incrementing for each nested generic declaration.
\item The index identifies a generic parameter within a single generic parameter list.
\end{itemize}
\index{sugared type}
The declared interface type of a generic parameter declaration is a sugared type that prints as the generic parameter name. The canonical type of this type only stores the depth and index.
The notation for a canonical generic parameter type is \ttgp{d}{i}, where \texttt{d} is the depth and \texttt{i} is the index.
\begin{example}
Listing~\ref{linkedlistexample} declares a \texttt{LinkedList} type with a single generic parameter named \texttt{Element}, and a \texttt{mapReduce()} method with two generic parameters named \texttt{T} and \texttt{A}. All three generic parameters are visible from inside the method:
\begin{quote}
\begin{tabular}{|l|l|l|l|}
\hline
Name&Depth&Index&Canonical type\\
\hline
\texttt{Element}&0&0&\ttgp{0}{0}\\
\texttt{T}&1&0&\ttgp{1}{0}\\
\texttt{A}&1&1&\ttgp{1}{1}\\
\hline
\end{tabular}
\end{quote}
\end{example}
\begin{listing}\captionabove{Two nested generic declarations}\label{linkedlistexample}
\begin{Verbatim}
enum LinkedList<Element> {
  case none
  indirect case entry(Element, LinkedList<Element>)

  func mapReduce<T, A>(_ f: (Element) -> T,
                       _ m: (A, T) -> A,
                       _ a: A) -> A {
    switch self {
    case .none:
      return a
    case .entry(let x, let xs):
      return m(xs.mapReduce(f, m, a), f(x))
    }
  }
}
\end{Verbatim}
\end{listing}

\section{Constraint Types}\label{constraints}
\index{constraint type}
\index{requirement}
\index{inheritance clause}
\index{generic parameter declaration}
A generic requirement adds new capabilities to a generic parameter type, by restricting the possible substituted concrete types to those that provide this capability. The next section will introduce the trailing \texttt{where} clause syntax for stating generic requirements in a fully general way. Before doing that, we'll take a look at the simpler mechanism of stating a \emph{constraint type} in the inheritance clause of a generic parameter declaration:
\begin{Verbatim}
func allEqual<T: Equatable>(_ elements: [T]) {...}
\end{Verbatim}
\index{protocol type}
\index{protocol composition type}
\index{parameterized protocol type}
\index{class type}
\index{AnyObject}
\index{layout constraint}
\index{Any}
A constraint type is one of the following:
\begin{enumerate}
\item A protocol type, like \texttt{Hashable}.
\item A parameterized protocol type, like \texttt{Sequence<Int>} (Section~\ref{protocols}).
\item A protocol composition, like \texttt{ShapeProtocol \& MyClass}. Protocol compositions were originally just compositions of protocol types, but they can include class types as of Swift 4 \cite{se0156}.
\item A class type, like \texttt{NSObject}.
\item The \texttt{AnyObject} \emph{layout constraint}, which restricts the possible concrete types to those represented as a single reference-counted pointer.
\item The empty protocol composition, written \texttt{Any}. Writing \texttt{Any} in a generic parameter's inheritance clause is pointless, but it is allowed for completeness.
\end{enumerate}
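For instance, each kind of constraint type can appear in a generic parameter's inheritance clause. The following sketch shows one hypothetical declaration per kind, numbered to match the list above, using local stand-ins for the class and protocol names:
\begin{Verbatim}
protocol ShapeProtocol {}
class MyClass {}

func a<T: Hashable>(_: T) {}                 // 1. protocol type
func b<T: Sequence<Int>>(_: T) {}            // 2. parameterized protocol type
func c<T: ShapeProtocol & MyClass>(_: T) {}  // 3. protocol composition
func d<T: MyClass>(_: T) {}                  // 4. class type
func e<T: AnyObject>(_: T) {}                // 5. AnyObject layout constraint
func f<T: Any>(_: T) {}                      // 6. Any (legal, but pointless)
\end{Verbatim}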
Constraint types can appear in various positions:
\begin{enumerate}
\item In the inheritance clause of a generic parameter declaration, which is the focus of this section.
\item On the right hand side of a conformance, superclass or layout requirement in a \texttt{where} clause, which you will see shortly.
\item In the inheritance clauses of protocols and associated types (Section~\ref{protocols}).
\item Following the \texttt{some} keyword in an opaque parameter (Section~\ref{opaque parameters}) or return type (Chapter~\ref{opaqueresult}).
\item Following the \texttt{any} keyword in an existential type (Chapter~\ref{existentialtypes}). A single class type cannot be the constraint type of an existential; \texttt{any~NSObject} is just written as \texttt{NSObject}. Existential types where the constraint type is \texttt{AnyObject} or \texttt{Any} can also be written without the \texttt{any} keyword.
\end{enumerate}
\begin{example}
Listing~\ref{dependentconstrainttype} exhibits a generic parameter whose constraint type references another generic parameter visible from the current scope. The generic parameter \texttt{C} is visible in the entire declaration of \texttt{open(box:)}, including the generic parameter list.
\end{example}
\begin{listing}\captionabove{The constraint type of \texttt{B} in \texttt{open(box:)} refers to \texttt{C}}\label{dependentconstrainttype}
\begin{Verbatim}
class Box<Contents> {
  var contents: Contents
}

func open<B: Box<C>, C>(box: B) -> C {
  return box.contents
}

struct Vegetables {}
class FarmBox: Box<Vegetables> {}

let vegetables: Vegetables = open(box: FarmBox())
\end{Verbatim}
\end{listing}

\section{Requirements}\label{trailing where clauses}
\index{where clause}
\index{requirement representation}
\index{constraint requirement representation}
\index{same-type requirement representation}
\index{requirement}
A constraint type in the inheritance clause of a generic parameter declaration is syntax sugar for a \texttt{where} clause with a single entry whose subject type is the generic parameter type:
\begin{Verbatim}
struct Set<Element: Hashable> {...}

struct Set<Element> where Element: Hashable {...}
\end{Verbatim}
The requirements in a \texttt{where} clause state the subject type explicitly, allowing stating requirements on dependent member types, for example:
\begin{Verbatim}
func isSorted<S: Sequence>(_: S)
    where S.Element: Comparable {...}
\end{Verbatim}
Another generalization over generic parameter inheritance clauses is that \texttt{where} clauses can define same-type requirements:
\begin{Verbatim}
func merge<S: Sequence, T: Sequence>(_: S, _: T) -> [S.Element]
    where S.Element: Comparable,
          S.Element == T.Element {...}
\end{Verbatim}
Formally, a \texttt{where} clause is a list of one or more \emph{requirement representations}. There are three kinds of requirement representations, with the first two kinds storing a pair of type representations, and the third storing a type representation and layout constraint:
\begin{enumerate}
\item \textbf{Constraint requirement representations}, written as \texttt{T:\ C}, where \texttt{T} and \texttt{C} are type representations, called the subject type and constraint type, respectively.
\item \textbf{Same-type requirement representations}, written as \texttt{T == U}, where \texttt{T} and \texttt{U} are type representations.
\item \textbf{Layout requirement representations}, written as \texttt{T:\ L} where \texttt{L} is a layout constraint. The only type of layout constraint which can be written in the source language is \texttt{AnyObject}, but this is actually parsed as a constraint requirement representation. Bona fide layout requirement representations only appear within the \texttt{@\_specialize} attribute.
\end{enumerate}
Just as type resolution resolves type representations to types, \emph{requirement resolution} resolves requirement representations to \emph{requirements}. A requirement is the equivalent of a requirement representation at the semantic layer; requirements store types instead of type representations. Figure~\ref{typerequirementrepresentation} shows the correspondence.
\begin{figure}\captionabove{Types and requirements, at the syntactic and semantic layers}\label{typerequirementrepresentation}
\begin{center}
\begin{tikzcd}[column sep=3cm,row sep=1cm]
\mathboxed{requirement representation} \arrow[d, "\text{contains}"{left}] \arrow[r, "\text{resolves to}"] &\mathboxed{requirement} \arrow[d, "\text{contains}"] \\
\mathboxed{type representation} \arrow[r, "\text{resolves to}"]&\mathboxed{type}
\end{tikzcd}
\end{center}
\end{figure}
\index{conformance requirement}
\index{superclass requirement}
\index{layout requirement}
\index{same-type requirement}
Requirement resolution resolves each type representation to a type, and computes the requirement kind. The requirement kind encodes more detail than the requirement representation kind:
\begin{itemize}
\item \textbf{Conformance requirements} state that a type must conform to a protocol, protocol composition or parameterized protocol type.
\item \textbf{Superclass requirements} state that a type must either equal, or be a subclass of, the superclass type.
\item \textbf{Layout requirements} state that a type must satisfy a layout constraint.
\item \textbf{Same-type requirements} state that two interface types are reduced-equal (this concept was first introduced in Chapter~\ref{types} and will be detailed in Section~\ref{reducedtypes}).
\end{itemize}
Constraint requirement representations resolve to conformance, superclass and layout requirements; the exact kind of requirement is only known after type resolution resolves the constraint type by performing name lookups. Same-type requirement representations always resolve to same-type requirements. The simpler syntax introduced in the previous section, where a constraint type can be written in the inheritance clause of a generic parameter declaration, also resolves to a requirement. The requirement's subject type is the generic parameter type. The requirement kind is always a conformance, superclass or layout requirement, never a same-type requirement.
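The following sketch states one requirement of each kind in a single \texttt{where} clause (hypothetical declarations; as noted above, the \texttt{U:~AnyObject} entry is parsed as a constraint requirement representation and only resolves to a layout requirement):
\begin{Verbatim}
class Base {}

func f<S, T, U>(_: S, _: T, _: U)
    where S: Sequence,    // conformance requirement
          T: Base,        // superclass requirement
          U: AnyObject,   // layout requirement
          S.Element == T  // same-type requirement
{}
\end{Verbatim}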
\paragraph{History}
The \texttt{where} clause syntax used to be part of the generic parameter list itself, but was moved to the modern ``trailing'' form in Swift 3 \cite{se0081}. Implementation limitations prevented \texttt{where} clause requirements from constraining outer generic parameters until Swift 3. Once these implementation difficulties were solved, it no longer made sense to restrict a \texttt{where} clause to appear only on a declaration that has its own generic parameter list; this restriction was lifted in Swift 5.3 \cite{se0261}, allowing any declaration in a generic context to declare a \texttt{where} clause. For example, the following became valid:
\begin{Verbatim}
enum LinkedList<Element> {
  ...
  func sum() -> Element where Element: AdditiveArithmetic {...}
}
\end{Verbatim}
There is no semantic distinction between attaching a \texttt{where} clause to a member of a type, or moving the member to a constrained extension, so the above is equivalent to the following:
\begin{Verbatim}
extension LinkedList where Element: AdditiveArithmetic {
  func sum() -> Element {...}
}
\end{Verbatim}
Unfortunately, due to historical quirks in the name mangling scheme, the above is not an ABI-compatible transformation.
\index{value requirements}
\paragraph{Protocol requirements}
There is still one situation where constraining outer generic parameters is prohibited, for usability reasons. The \emph{value requirements} of a protocol (properties, subscripts and methods) cannot constrain \texttt{Self} or its associated types in their \texttt{where} clause. The reason is that these value requirements must be fulfilled by all concrete conforming types. If a value requirement's \texttt{where} clause imposed additional constraints on \texttt{Self}, it would be impossible for a concrete type which did not otherwise satisfy those constraints to declare a witness for this value requirement. Rather than allow defining a protocol which cannot be conformed to, the type checker diagnoses an error.
\begin{example}
The following protocol attempts to define an \texttt{Element} associated type with no requirements, and a \texttt{minElement()} method which requires that \texttt{Element} conform to the \texttt{Comparable} protocol:
\begin{Verbatim}
protocol SetProtocol {
  associatedtype Element

  func minElement() -> Element where Element: Comparable
}
\end{Verbatim}
This is not allowed, because there is no way to implement the \texttt{minElement()} requirement in a concrete conforming type whose \texttt{Element} type is not \texttt{Comparable}. One way to fix the error is to move the \texttt{where} clause from the protocol method to the associated type, which would instead impose the requirement on all conforming types.
\end{example}

\section{Opaque Parameters}\label{opaque parameters}
\index{opaque parameter}
\index{depth}
\index{index}
In the type of a function or subscript parameter, the \texttt{some} keyword declares an \emph{opaque parameter type}. The \texttt{some} keyword is followed by a constraint type. This introduces an unnamed generic parameter, and the constraint type imposes a conformance, superclass or layout requirement on this generic parameter.
\index{parsed generic parameter list}
If a declaration has both a parsed generic parameter list and opaque parameters, the opaque parameters have the same depth as the parsed generic parameters, and appear after the parsed generic parameters in index order. Opaque parameter types are unnamed, and therefore are not visible to type resolution. In particular, there is no way to refer to an opaque parameter type within the function's \texttt{where} clause, or from a type annotation on a declaration nested in the function's body. From expression context however, the type of an opaque parameter can be obtained via the built-in \texttt{type(of:)} pseudo-function,\footnote{It looks like a function call, but the type checking behavior of \texttt{type(of:)} cannot be described by a Swift function type; it is not a real function.} which produces a metatype value. This allows for invoking static methods and such.
\begin{example}
These two definitions are equivalent:
\begin{Verbatim}
func merge<E>(_: some Sequence<E>, _: some Sequence<E>) -> [E] {}

func merge<E, S: Sequence<E>, T: Sequence<E>>(_: S, _: T) -> [E] {}
\end{Verbatim}
The constraint types here are parameterized protocol types, which are described in the next section.
\end{example}
Opaque parameter declarations were introduced in Swift 5.7 \cite{se0341}. Note that \texttt{some} appearing in the return type of a function declares an \emph{opaque return type}, which is a related but quite different feature (Chapter~\ref{opaqueresult}).

\section{Protocol Declarations}\label{protocols}
\index{protocol declaration}
\index{protocol Self type}
Protocols have an implicit generic parameter list with a single generic parameter named \texttt{Self}. Conceptually the \texttt{Self} type stands in for the concrete conforming type.
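For example, in the following hypothetical protocol, \texttt{Self} appears as the return type of a requirement; each concrete conforming type witnesses the requirement with itself substituted for \texttt{Self}:
\begin{Verbatim}
protocol Cloneable {
  // Self stands in for the concrete conforming type.
  func clone() -> Self
}

struct Point: Cloneable {
  var x, y: Int

  // In this conformance, Self is Point.
  func clone() -> Point { return self }
}
\end{Verbatim}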
Protocols cannot be nested inside any declaration context other than a source file; structs, classes and enums cannot be nested inside of protocols. This restriction is discussed in Section~\ref{nested nominal types}. Protocols can specify generic requirements on the \texttt{Self} type and its associated types, using similar syntax to other generic declarations. The type checker ensures that these requirements are satisfied by any concrete type conforming to the protocol.
\begin{listing}\captionabove{A protocol declaration with a primary associated type which is then used as a parameterized protocol type}\label{primaryassoctypelisting}
\begin{Verbatim}
protocol IteratorProtocol<Element> {
  associatedtype Element

  mutating func next() -> Element?
}

// The first declaration is equivalent to the second:
func sumOfSquares<I: IteratorProtocol<Int>>(_: I) -> Int {...}

func sumOfSquares<I: IteratorProtocol>(_: I) -> Int
    where I.Element == Int {...}
\end{Verbatim}
\end{listing}
\index{primary associated type}
\index{parameterized protocol type}
\index{same-type requirement}
\paragraph{Primary associated types}
A protocol can declare a list of \emph{primary associated types} with a syntax resembling a generic parameter list. While generic parameter lists introduce new generic parameter declarations, the entries in the primary associated type list reference existing associated types declared in the protocol's body. A protocol with primary associated types can be used as a parameterized protocol type. As a constraint type, a parameterized protocol type is equivalent to a conformance requirement between the subject type and the protocol, together with one or more same-type requirements. The same-type requirements relate the primary associated types of the subject type with the arguments of the parameterized protocol type.
\begin{example}
Listing~\ref{primaryassoctypelisting} shows the standard library's iterator protocol, which defines a single primary associated type, together with a use of the protocol as a parameterized protocol type.
\end{example}
Parameterized protocol types and primary associated types were added to the language in Swift~5.7~\cite{se0346}. This \emph{desugaring} will receive a more formal treatment in Section~\ref{requirement desugaring}.
\index{associated type declaration}
\index{where clause}
\index{inheritance clause}
\paragraph{Associated type requirements}
Associated types can state one or more constraint types in their inheritance clause, in addition to an optional \texttt{where} clause. Constraint types in the inheritance clause resolve to requirements whose subject type is the associated type declaration's declared interface type---which you might recall is the dependent member type \texttt{Self.[P]A}, where \texttt{A} is an associated type declaration in some protocol \texttt{P}.
The standard library \texttt{Sequence} protocol demonstrates all of these features:
\begin{Verbatim}
protocol Sequence {
  associatedtype Iterator: IteratorProtocol
  associatedtype Element where Iterator.Element == Element

  func makeIterator() -> Iterator
}
\end{Verbatim}
The conformance requirement on \texttt{Iterator} could have been written with a \texttt{where} clause as well:
\begin{Verbatim}
associatedtype Iterator where Iterator: IteratorProtocol
\end{Verbatim}
Finally, a \texttt{where} clause can be attached to the protocol itself; there is no semantic difference between that and attaching it to an associated type:
\begin{Verbatim}
protocol Sequence where Iterator: IteratorProtocol,
                        Iterator.Element == Element {...}
\end{Verbatim}
Unlike generic parameters, associated type inheritance clauses allow multiple entries, separated by commas. This is effectively equivalent to a single inheritance clause entry containing a protocol composition:
\begin{Verbatim}
associatedtype Data: Codable & Hashable

associatedtype Data: Codable, Hashable
\end{Verbatim}
\paragraph{Unqualified lookup inside protocols}
Within the entire source range of the protocol declaration, unqualified references to associated types, like \texttt{Element} and \texttt{Iterator} above, resolve to their declared interface type. This is a shorthand for accessing the associated type as a member type of the protocol \texttt{Self} type. The \texttt{Sequence} protocol above could instead have been declared as follows:
\begin{Verbatim}
protocol Sequence where Self.Iterator: IteratorProtocol,
                        Self.Iterator.Element == Self.Element {...}
\end{Verbatim}
\index{inheritance clause}
\index{protocol inheritance}
\paragraph{Protocol inheritance clauses}
Constraint types appearing in the protocol's inheritance clause become generic requirements on \texttt{Self} in the same manner that constraint types in generic parameter inheritance clauses become requirements on the generic parameter type. Requirements on \texttt{Self} are imposed by the conformance checker on concrete types conforming to the protocol. If the constraint type is another protocol, we call the protocol stating the requirement the \emph{derived protocol} and the protocol named by the constraint type the \emph{base protocol}. The derived protocol is said to \emph{inherit} from (or sometimes, \emph{refine}) the base protocol. Protocol inheritance can be observed in two ways; first, every concrete type conforming to the derived protocol must also conform to the base protocol. Second, qualified name lookup will search through inherited protocols when the lookup begins from the derived protocol or one of its concrete conforming types. For example, the standard library's \texttt{Collection} protocol inherits from \texttt{Sequence}, therefore any concrete type conforming to \texttt{Collection} must also conform to \texttt{Sequence}. If some type parameter \texttt{T} is known to conform to \texttt{Collection}, members of both the \texttt{Collection} and \texttt{Sequence} protocols will be visible to qualified name lookup on a value of type \texttt{T}.
\begin{Verbatim}
protocol Collection: Sequence {...}
\end{Verbatim}
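As a sketch of this lookup behavior, suppose we fill in the \texttt{\{...\}} above with a hypothetical \texttt{count} property; members of both protocols are then visible on a value whose type conforms to \texttt{Collection}:
\begin{Verbatim}
protocol Collection: Sequence {
  var count: Int { get }
}

func describe<T: Collection>(_ value: T) {
  // Qualified name lookup on T finds members of both protocols:
  _ = value.count           // declared in Collection
  _ = value.makeIterator()  // declared in the inherited Sequence
}
\end{Verbatim}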
Protocols can restrict their conforming types to those with a reference-counted pointer representation by stating an \texttt{AnyObject} layout constraint:
\begin{Verbatim}
protocol BoxProtocol: AnyObject {...}
\end{Verbatim}
Protocols can also impose a superclass requirement on their conforming types:
\begin{Verbatim}
class Plant {}
class Animal {}

protocol Duck: Animal {}

// error: MockDuck is not a subclass of Animal
class MockDuck: Plant, Duck {}
\end{Verbatim}
Just like with protocol inheritance, qualified name lookup understands a superclass in a protocol's inheritance clause, making the members of the superclass visible to all lookups that look into the protocol.
\index{class-constrained protocol}
A protocol is \emph{class-constrained} if the \texttt{Self:~AnyObject} requirement can be proven from its inheritance clause; either directly stated, implied by a superclass requirement, or inherited from another protocol.
\paragraph{History}
In older releases of Swift, protocols could only constrain associated types by writing a constraint type in the associated type's inheritance clause, which limited the kinds of requirements that could be imposed on the concrete conforming type. The general trailing \texttt{where} clause syntax on associated types and protocols was introduced in Swift~4~\cite{se0142}.
\index{recursive conformance}
Another important generalization was allowing an associated type to conform to the same protocol that it appears in, either directly or indirectly. For example, the SwiftUI \texttt{View} protocol has a \texttt{Body} associated type that itself conforms to \texttt{View}:
\begin{Verbatim}
protocol View {
  associatedtype Body: View

  var body: Body { get }
}
\end{Verbatim}
The ability to declare a so-called \emph{recursive conformance} was introduced in Swift 4.1 \cite{se0157}. This feature has some profound implications. In particular, it means that a generic signature with a conformance to a protocol such as the above has an infinite number of type parameters; for example, consider \texttt{<T:\ View>}:
\begin{Verbatim}
T
T.Body
T.Body.Body
T.Body.Body.Body
...
\end{Verbatim}

\section{Source Code Reference}\label{genericdeclsourceref}
Key source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/Decl.h}
\item \SourceFile{include/swift/AST/DeclContext.h}
\item \SourceFile{include/swift/AST/GenericParamList.h}
\item \SourceFile{lib/AST/Decl.cpp}
\item \SourceFile{lib/AST/DeclContext.cpp}
\item \SourceFile{lib/AST/GenericParamList.cpp}
\end{itemize}
Other source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/Types.h}
\item \SourceFile{lib/AST/NameLookup.cpp}
\end{itemize}
\index{declaration context}
\index{generic context}
\index{generic declaration}
\index{parsed generic parameter list}
\apiref{DeclContext}{class}
See also Section~\ref{name lookup}, Section~\ref{declarationssourceref} and Section~\ref{genericsigsourceref}.
\begin{itemize}
\item \texttt{isGenericContext()} answers if this declaration context or one of its parent contexts has a generic parameter list.
\item \texttt{isInnermostContextGeneric()} answers if this declaration context is a generic context with its own generic parameter list, that is, if its declaration is a generic declaration.
\end{itemize}
\apiref{GenericContext}{class}
Base class for declarations which can be generic. See also Section~\ref{genericsigsourceref}.
\begin{itemize}
\item \texttt{getParsedGenericParams()} returns the declaration's parsed generic parameter list, or \texttt{nullptr}.
\item \texttt{getGenericParams()} returns the declaration's full generic parameter list, which includes any implicit generic parameters. Evaluates a \texttt{GenericParamListRequest}.
\item \texttt{isGeneric()} answers if this declaration has a generic parameter list.
\item \texttt{getGenericContextDepth()} returns the depth of the innermost generic parameter list, or \texttt{(unsigned)-1} if neither this declaration nor any outer declaration is generic.
\item \texttt{getTrailingWhereClause()} returns the trailing \texttt{where} clause, or \texttt{nullptr}.
\end{itemize}
Trailing \texttt{where} clauses are not preserved in serialized generic contexts. Except when actually building the generic signature, most code uses \texttt{getGenericSignature()} from Section~\ref{genericsigsourceref} instead.
\index{generic parameter list}
\apiref{GenericParamList}{class}
A generic parameter list.
\begin{itemize}
\item \texttt{getParams()} returns an array of generic parameter declarations.
\item \texttt{getOuterParameters()} returns the outer generic parameter list, linking multiple generic parameter lists for the same generic context. Only used for extensions of nested generic types.
\end{itemize}
\index{protocol Self type}
\apiref{GenericParamListRequest}{class}
This request creates the full generic parameter list for a declaration. Kicked off from \texttt{GenericContext::getGenericParams()}.
\begin{itemize}
\item For protocols, this creates the implicit \texttt{Self} parameter.
\item For functions and subscripts, calls \texttt{createOpaqueParameterGenericParams()} to walk the formal parameter list and look for \texttt{OpaqueReturnTypeRepr}s.
\item For extensions, calls \texttt{createExtensionGenericParams()} which clones the generic parameter lists of the extended nominal itself and all of its outer generic contexts, and links them together via \texttt{GenericParamList::getOuterParameters()}.
\end{itemize}
\index{generic parameter declaration}
\apiref{GenericTypeParamDecl}{class}
A generic parameter declaration.
\begin{itemize}
\item \texttt{getDepth()} returns the depth of the generic parameter declaration.
\item \texttt{getIndex()} returns the index of the generic parameter declaration.
\item \texttt{getName()} returns the name of the generic parameter declaration.
\item \texttt{getDeclaredInterfaceType()} returns the non-canonical generic parameter type for this declaration.
\item \texttt{isOpaque()} answers if this generic parameter is associated with an opaque parameter.
\item \texttt{getOpaqueTypeRepr()} returns the associated \texttt{OpaqueReturnTypeRepr} if this is an opaque parameter, otherwise \texttt{nullptr}.
\item \texttt{getInherited()} returns the generic parameter declaration's inheritance clause.
\end{itemize}
Inheritance clauses are not preserved in serialized generic parameter declarations. Requirements stated on generic parameter declarations are part of the corresponding generic context's generic signature, so except when actually building the generic signature, most code uses \texttt{getGenericSignature()} from Section~\ref{genericsigsourceref} instead.
\index{generic parameter type}
\index{depth}
\index{index}
\apiref{GenericTypeParamType}{class}
A generic parameter type.
\begin{itemize}
\item \texttt{getDepth()} returns the depth of the generic parameter declaration.
\item \texttt{getIndex()} returns the index of the generic parameter declaration.
\item \texttt{getName()} returns the name of the generic parameter declaration, only if this is a non-canonical type.
\end{itemize}
\index{where clause}
\apiref{TrailingWhereClause}{class}
The syntactic representation of a trailing \texttt{where} clause.
\begin{itemize}
\item \texttt{getRequirements()} returns an array of \texttt{RequirementRepr}.
\end{itemize}
\index{requirement representation}
\apiref{RequirementRepr}{class}
The syntactic representation of a requirement in a trailing \texttt{where} clause.
\begin{itemize}
\item \texttt{getKind()} returns a \texttt{RequirementReprKind}.
\item \texttt{getFirstTypeRepr()} returns the first \texttt{TypeRepr} of a same-type requirement.
\item \texttt{getSecondTypeRepr()} returns the second \texttt{TypeRepr} of a same-type requirement.
\item \texttt{getSubjectTypeRepr()} returns the first \texttt{TypeRepr} of a constraint or layout requirement.
\item \texttt{getConstraintTypeRepr()} returns the second \texttt{TypeRepr} of a constraint requirement.
\item \texttt{getLayoutConstraint()} returns the layout constraint of a layout requirement.
\end{itemize}
\apiref{RequirementReprKind}{enum class}
\begin{itemize}
\item \texttt{RequirementReprKind::TypeConstraint}
\item \texttt{RequirementReprKind::SameType}
\item \texttt{RequirementReprKind::LayoutConstraint}
\end{itemize}
\apiref{WhereClauseOwner}{class}
Represents a reference to some set of requirement representations which can be resolved to requirements, for example a trailing \texttt{where} clause. This is used by various requests, such as the \texttt{RequirementRequest} below, and the \texttt{InferredGenericSignatureRequest} in Section~\ref{buildinggensigsourceref}.
\begin{itemize}
\item \texttt{getRequirements()} returns an array of \texttt{RequirementRepr}.
\item \texttt{visitRequirements()} resolves each requirement representation and invokes a callback with the \texttt{RequirementRepr} and resolved \texttt{Requirement}.
\end{itemize}
\apiref{RequirementRequest}{class}
Request which can be evaluated to resolve a single requirement representation in a \texttt{WhereClauseOwner}. Used by \texttt{WhereClauseOwner::visitRequirements()}.
\index{protocol declaration}
\apiref{ProtocolDecl}{class}
A protocol declaration.
\begin{itemize}
\item \texttt{getTrailingWhereClause()} returns the protocol \texttt{where} clause, or \texttt{nullptr}.
\item \texttt{getAssociatedTypes()} returns an array of all associated type declarations in the protocol.
\item \texttt{getPrimaryAssociatedTypes()} returns an array of all primary associated type declarations in the protocol.
\item \texttt{getInherited()} returns the parsed inheritance clause.
\end{itemize}
Trailing \texttt{where} clauses and inheritance clauses are not preserved in serialized protocol declarations. Except when actually building the requirement signature, most code uses \texttt{getRequirementSignature()} from Section~\ref{genericsigsourceref} instead. The last four utility methods operate on the requirement signature, so are safe to use on deserialized protocols:
\begin{itemize}
\item \texttt{getInheritedProtocols()} returns an array of all protocols directly inherited by this protocol, computed from the inheritance clause.
\item \texttt{inheritsFrom()} determines if this protocol inherits from the given protocol, possibly transitively.
\item \texttt{getSuperclass()} returns the protocol's superclass type.
\item \texttt{getSuperclassDecl()} returns the protocol's superclass declaration.
\end{itemize}
\index{associated type declaration}
\apiref{AssociatedTypeDecl}{class}
An associated type declaration.
\begin{itemize}
\item \texttt{getTrailingWhereClause()} returns the associated type's trailing \texttt{where} clause, or \texttt{nullptr}.
\item \texttt{getInherited()} returns the associated type's inheritance clause.
\end{itemize}
Trailing \texttt{where} clauses and inheritance clauses are not preserved in serialized associated type declarations. Requirements on associated types are part of a protocol's requirement signature, so except when actually building the requirement signature, most code uses \texttt{getRequirementSignature()} from Section~\ref{genericsigsourceref} instead.

\chapter{Generic Signatures}\label{genericsig}
\index{generic signature}
\index{generic context}
\index{requirement}
\index{where clause}
\index{inheritance clause}
\index{opaque parameter}
We've now seen all the syntactic building blocks that go into constructing the \emph{generic signature} of a generic context. Each level of generic context nesting can introduce new generic parameters or requirements, so the generic signature collects information from each outer generic declaration. This records in one place a complete description of a generic context:
\begin{itemize}
\item A list of all visible generic parameters, including outer parameters. This includes generic parameters explicitly defined in source, as well as those generic parameters implicitly introduced by opaque parameter declarations.
\item A list of all generic requirements that apply to these generic parameters, which includes those from outer declarations. We've seen three syntactic forms that define requirements so far: generic parameter inheritance clauses, trailing \texttt{where} clauses, and opaque parameters. A fourth and final mechanism, requirement inference, is described later in Section~\ref{requirementinference}.
\end{itemize}
The \texttt{-debug-generic-signatures} frontend flag prints the generic signature of each declaration as it is being type checked. In debug output, the printed representation of a generic signature resembles the language syntax; we're going to use this written notation throughout when talking about generic signatures:
\[\underbrace{\texttt{<T, U}}_{\text{generic parameters}}\underbrace{\texttt{ where T:\ Sequence, T.Element == U>}}_{\text{requirements}}\]
\begin{listing}\captionabove{Example program and \texttt{-debug-generic-signatures} output}\label{debuggenericsignatures}
\begin{Verbatim}
struct Outer<T: Sequence> {
  struct Inner<U> {
    func transform() -> (T, U) where T.Element == U {
      ...
    }
  }
}
\end{Verbatim}
\begin{Verbatim}
debug.(file).Outer@debug.swift:1:8
Generic signature: <T where T: Sequence>

debug.(file).Outer.Inner@debug.swift:2:10
Generic signature: <T, U where T: Sequence>

debug.(file).Outer.Inner.transform()@debug.swift:3:10
Generic signature: <T, U where T: Sequence, U == T.[Sequence]Element>
\end{Verbatim}
\end{listing}
\begin{example}
Listing \ref{debuggenericsignatures} shows three generic declarations and the compiler output from the \texttt{-debug-generic-signatures} flag.
\end{example}
The requirements in a generic signature are constructed from syntactic representations, but they do not always look like the requirements written by the user. The requirements in a generic signature satisfy certain invariants and are sorted by comparing their subject types. The type parameter order is introduced in Section~\ref{typeparams}. The multi-step process for transforming user-written requirements into the correct minimal form that appears in a generic signature is described in Chapter~\ref{building generic signatures}.
For now though, we're just going to assume you're working with an existing generic signature that was given to you by the type checker or some other part of the compiler.
\index{canonical generic signature}
\index{generic signature equality}
\paragraph{Canonical signatures}
Generic signatures are immutable and uniqued, so two generic signatures with the same structure and the same sugared types are pointer-equal. A generic signature is \emph{canonical} if all listed generic parameter types are canonical, and any types appearing in requirements are canonical. A canonical signature is computed from an arbitrary generic signature by replacing any sugared types appearing in the signature with canonical types. Two generic signatures are canonical-equal if their canonical signatures are pointer-equal.
\begin{example}
These two declarations state their requirements in different ways, and you might even spot that the second one has a redundant requirement:
\begin{Verbatim}
func allEqual1<T: Sequence, U: Sequence<T.Element>>(_: T, _: U) -> Bool {}

func allEqual2<A, B>(_: A, _: B) -> Bool
    where A: Sequence, B: Sequence,
          B.Element == A.Element,
          A.Iterator: IteratorProtocol {}
\end{Verbatim}
The requirements of both \texttt{allEqual1()} and \texttt{allEqual2()} reduce to the same form in their generic signatures. The first declaration's generic signature:
\begin{quote}
\begin{verbatim}
<T, U where T: Sequence, U: Sequence,
 T.[Sequence]Element == U.[Sequence]Element>
\end{verbatim}
\end{quote}
The second declaration's generic signature:
\begin{quote}
\begin{verbatim}
<A, B where A: Sequence, B: Sequence,
 A.[Sequence]Element == B.[Sequence]Element>
\end{verbatim}
\end{quote}
The two generic signatures only differ by type sugar; namely, they use the corresponding sugared generic parameter types from their declaration. This means they are not pointer-equal, but they are canonical-equal. The canonical generic signature of both is obtained by replacing generic parameters with their canonical types:
\begin{quote}
\begin{verbatim}
<τ_0_0, τ_0_1 where τ_0_0: Sequence, τ_0_1: Sequence,
 τ_0_0.[Sequence]Element == τ_0_1.[Sequence]Element>
\end{verbatim}
\end{quote}
\end{example}
\paragraph{Reduced signatures?}
There is no notion of a ``reduced generic signature'' the way we have reduced types. The generic requirements in a generic signature are always written in a minimal, reduced form (Section~\ref{minimal requirements}); the only variation allowed is type sugar.
\index{empty generic signature}
\paragraph{Empty generic signature}
If a nominal type declaration is not a generic context (that is, neither it nor any parent context has any generic parameters), then its generic signature will have no generic parameters or generic requirements. This is called the \emph{empty generic signature}. Lacking any generic parameters, the empty generic signature more generally has no type parameters, either. The valid interface types of the empty generic signature are the fully concrete types, that is, types that do not contain any type parameters.

\section{Requirement Signatures}\label{requirement sig}
\index{requirement signature}
\index{protocol Self type}
\index{inheritance clause}
\index{associated type declaration}
\index{where clause}
The generic signature of a protocol \texttt{P} always has a single generic parameter \texttt{Self} together with a single conformance requirement \texttt{Self:\ P}:
\begin{quote}
\begin{verbatim}
<Self where Self: P>
\end{verbatim}
\end{quote}
The structure ``inside'' the \texttt{Self} type is described by the \emph{requirement signature} of the protocol.
The requirement signature is constructed by collecting requirements from the protocol's inheritance clause, associated type inheritance clauses, and \texttt{where} clauses on the protocol's associated types and the protocol itself. Just like with generic signatures, the requirements in a requirement signature are always converted into a \emph{minimal} and \emph{reduced} form. The \texttt{-Xfrontend -debug-generic-signatures} flag prints the requirement signature of each protocol that is type checked. The written representation of a requirement signature looks like a generic signature over the protocol's single \texttt{Self} generic parameter. For example, the requirement signature of the \texttt{Sequence} protocol is the following: \begin{quote} \begin{verbatim}
<Self where Self.[Sequence]Element == Self.[Sequence]Iterator.[IteratorProtocol]Element,
       Self.[Sequence]Iterator: IteratorProtocol>
\end{verbatim} \end{quote} \index{protocol type alias} Requirement signatures also store a compact description of all protocol type aliases defined within the protocol; these are used when resolving \texttt{where} clause requirements involving subject types that name protocol type aliases. Protocol type aliases are not shown by the \texttt{-Xfrontend -debug-generic-signatures} flag.

\index{conformance checking} \paragraph{Conformance checking} When checking a conformance to a protocol, the type checker must ensure the concrete type satisfies all requirements in the requirement signature: \begin{enumerate} \item The concrete type must conform to any inherited protocols, which are encoded as conformance requirements on the \texttt{Self} type. \item The concrete type must be a class if the requirement signature imposes a superclass or \texttt{AnyObject} requirement on \texttt{Self}. \item Finally, the type witnesses must satisfy any requirements imposed on them by the protocol. \end{enumerate} All of the above are instances of the more general problem of checking whether concrete types satisfy generic requirements (Section~\ref{checking generic arguments}). The concrete type must also declare a \emph{type witness} for each of the protocol's associated types (Section~\ref{type witnesses}). The conformance requirements of a protocol's requirement signature are known as \emph{associated conformance requirements} and each corresponding conformance is an \emph{associated conformance} (Section~\ref{associated conformances}).

\paragraph{A mildly interesting observation} The printed representation of a requirement signature is almost never going to form a valid \emph{generic} signature. The requirement signature of \texttt{Sequence} as shown above does not state a conformance requirement \texttt{Self:\ Sequence}, so it does not make sense to talk about requirements involving the \texttt{Iterator} and \texttt{Element} member types of \texttt{Self}. This is certainly a valid requirement signature though, because the \texttt{Sequence} protocol does not inherit from itself. If we try to build a generic signature from a requirement signature by adding a conformance requirement to \texttt{Self}, then \emph{all other} requirements in the requirement signature will become redundant; they are, after all, implied by the conformance of \texttt{Self} to the protocol.
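Returning to conformance checking, here is a minimal sketch (hypothetical declarations) of the three checks listed above: \begin{Verbatim}
protocol Base {}

protocol Derived: AnyObject, Base {
  associatedtype A: Equatable
}

// Checking the conformance `C: Derived' verifies that:
// 1. C conforms to the inherited protocol Base (`Self: Base').
// 2. C is a class, satisfying the AnyObject requirement on `Self'.
// 3. The type witness for A, here Int, conforms to Equatable.
final class C: Derived {
  typealias A = Int
}
\end{Verbatim}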
\index{inheritance clause} \paragraph{Protocol inheritance clauses} Recall from Section~\ref{protocols} that a constraint type written in a protocol's inheritance clause is equivalent to a \texttt{where} clause requirement with a subject type of \texttt{Self}, just like a constraint type in a generic parameter's inheritance clause is equivalent to a \texttt{where} clause requirement with the generic parameter type as the subject type. This correspondence comes with an important caveat, though. Qualified name lookup into a protocol type must also look into inherited protocols and the protocol's superclass type, if there is one. However, name lookup can only look directly at syntactic constructs, because in the compiler implementation, name lookup is ``upstream'' of generics. Building a protocol's requirement signature performs type resolution, which queries name lookup; those name lookups cannot in turn depend on the requirement signature having already been constructed. For this reason, any \texttt{where} clause requirements which introduce protocol inheritance relationships must be written with a subject type of exactly \texttt{Self} for qualified name lookup to ``understand'' them. Protocol inheritance implied by some combination of same-type requirements is not allowed. After a protocol's requirement signature has been built, conformance requirements on \texttt{Self} are compared against the protocol's inheritance clause; any unexpected conformance requirements are diagnosed with a warning. \begin{listing}\captionabove{Example showing non-obvious protocol inheritance relationship}\label{badinheritance} \begin{Verbatim}
protocol Base {
  associatedtype Other: Base
  typealias Salary = Int
}

protocol Good: Base {
  typealias Income = Salary
}

// warning: protocol `Bad' should be declared to refine `Base' due to a
// same-type constraint on `Self'
protocol Bad {
  associatedtype Tricky: Base where Tricky.Other == Self
  typealias Income = Salary  // error: cannot find type `Salary' in scope
}
\end{Verbatim} \end{listing} \begin{example} In Listing~\ref{badinheritance}, the \texttt{Self} type of the \texttt{Bad} protocol is equivalent to the type parameter \texttt{Self.Tricky.Other} via a same-type requirement. The \texttt{Tricky} associated type conforms to \texttt{Base}, and the \texttt{Other} associated type of \texttt{Base} also conforms to \texttt{Base}. For this reason, the \texttt{Self} type of \texttt{Bad} actually conforms to \texttt{Base}. However, this inheritance relationship is invisible to name lookup, so resolution of the underlying type of \texttt{Income} fails to find the declaration of \texttt{Salary}. After building the protocol's requirement signature, the type checker discovers the unexpected conformance requirement on \texttt{Self}, but at this stage, it is too late to attempt the failed name lookup again! For this reason, the compiler instead emits a warning suggesting the user change the declaration of the protocol to \texttt{protocol Bad:~Base}. \end{example}

\section{Type Parameter Order}\label{typeparams}

\index{partial order} \index{linear order} The type parameters of a generic signature are linearly ordered with respect to each other. Let's begin by defining partial orders and linear orders, which are a special kind of partial order. \begin{definition} A \emph{partial order} over some set of objects $S$ is a binary relation $<$ satisfying the following: \begin{itemize} \item For all $a\in S$, $a\not< a$.
\item For all $a$, $b$, $c\in S$, if $a<b$ and $b<c$, then $a<c$. \end{itemize} \end{definition} \begin{definition} A \emph{linear order} over $S$ is a partial order with an additional property: for all $a$, $b\in S$, exactly one of $a<b$, $a=b$ or $a>b$ holds. \end{definition} \index{generic parameter order} \begin{definition}[Linear order on generic parameters]\label{generic parameter order} Two generic parameter types $\ttgp{d}{i}$ and $\ttgp{D}{I}$ are compared by depth first, then by index: \begin{itemize} \item If $\texttt{d}<\texttt{D}$, then $\ttgp{d}{i} < \ttgp{D}{I}$. \item If $\texttt{d}>\texttt{D}$, then $\ttgp{d}{i} > \ttgp{D}{I}$. \item If $\texttt{d}=\texttt{D}$ and $\texttt{i}<\texttt{I}$, then $\ttgp{d}{i} < \ttgp{D}{I}$. \item If $\texttt{d}=\texttt{D}$ and $\texttt{i}>\texttt{I}$, then $\ttgp{d}{i} > \ttgp{D}{I}$. \item If $\texttt{d}=\texttt{D}$ and $\texttt{i}=\texttt{I}$, then $\ttgp{d}{i} = \ttgp{D}{I}$. \end{itemize} The linear order on generic parameters is simply the lexicographic order on (depth, index) pairs. \end{definition} \index{root associated type} \begin{definition}\label{root associated type} A \emph{root associated type} is an associated type defined in a protocol such that no inherited protocol has an associated type with the same name. \end{definition} \begin{example} In the following, \texttt{Q.A} is \emph{not} a root associated type, because \texttt{Q} inherits \texttt{P} and \texttt{P} also declares an associated type named \texttt{A}: \begin{Verbatim}
protocol P {
  associatedtype A  // root
}

protocol Q: P {
  associatedtype A  // not a root
  associatedtype B  // root
}
\end{Verbatim} \end{example} \index{protocol order} \begin{definition}\label{linear protocol order} The linear order on protocols is a lexicographic order on fully-qualified protocol names; that is, their module names are compared first, followed by the protocol declaration name if both are defined in the same module. \end{definition} \begin{example} Say the \texttt{Barn} module defines a \texttt{Horse} protocol, and the \texttt{Swift} module defines \texttt{Collection}. We have $\mathtt{Barn.Horse}<\mathtt{Swift.Collection}$, since $\mathtt{Barn}<\mathtt{Swift}$. If the \texttt{Barn} module also defines a \texttt{Saddle} protocol, then $\mathtt{Barn.Horse}<\mathtt{Barn.Saddle}$; both are from the same module, so we compare protocol names, $\mathtt{Horse}<\mathtt{Saddle}$. \end{example} \index{associated type order} \begin{definition}\label{associated type order} The linear order on associated type declarations is a lexicographic order on triples, composed from Definitions~\ref{root associated type} and \ref{linear protocol order}: \begin{enumerate} \item Root associated types always precede non-root associated types. \item Two associated types with the same ``root-ness'' (meaning both are roots or both are non-roots) but from different protocols are compared with the linear protocol order. \item Two associated types with the same ``root-ness'' and the same protocol are compared by name. \end{enumerate} If an erroneous protocol declares two associated types with the same name, the source location or any other arbitrary tie-breaker can also be used, since invalid code is never part of the ABI. \end{definition} Finally, we can define the linear order on type parameters. In the literature, this is known as a \emph{shortlex order}. \index{shortlex order} \index{generic parameter type} \index{dependent member type} \index{type parameter length} \begin{definition}[Linear order on type parameters]\label{type parameter order} When two type parameters differ in length, the one with shorter length precedes the other. For example, we have $\ttgp{2}{0}<\texttt{\ttgp{1}{0}.Element}$. \index{bound dependent member type} \index{unbound dependent member type} When two type parameters have the same length, elements are compared pairwise: \begin{enumerate} \item The first pair of elements are always generic parameter types, so they are compared by Definition~\ref{generic parameter order}. \item Subsequent pairs are identifiers or associated types.
If one is an identifier and the other is an associated type, the associated type declaration precedes the identifier; that is, unbound type parameters ``come after'' bound type parameters. If both elements are identifiers, they are compared with the lexicographic order on strings. Associated types are compared by Definition~\ref{associated type order}. \end{enumerate} Comparison stops at the first index where the two corresponding elements of each type parameter are distinct. The outcome of the final comparison determines the relative order of the two type parameters. If all elements are pairwise equal, the type parameters have the same length and same elements, so must be canonical-equal. \end{definition} \begin{table}\captionabove{Type parameters defined by the generic signature in Example~\ref{typeparameterorderexample}.}\label{typeparameterordertable} \begin{tabular}{|l|} \hline Length 1\\ \hline \texttt{T}\\ \texttt{U}\\ \hline \hline Length 2\\ \hline \texttt{T.[Sequence]Element}\\ \texttt{T.[Sequence]Iterator}\\ \texttt{T.Element}\\ \texttt{T.Iterator}\\ \texttt{U.[Sequence]Element}\\ \texttt{U.[Sequence]Iterator}\\ \texttt{U.Element}\\ \texttt{U.Iterator}\\ \hline \hline Length 3\\ \hline \texttt{T.[Sequence]Iterator.[IteratorProtocol]Element}\\ \texttt{T.Iterator.Element}\\ \texttt{U.[Sequence]Iterator.[IteratorProtocol]Element}\\ \texttt{U.Iterator.Element}\\ \hline \end{tabular} \end{table} \begin{example}\label{typeparameterorderexample} Table~\ref{typeparameterordertable} shows all type parameters in the following generic signature, written in type parameter order: \begin{quote} \begin{verbatim}
<T, U where T: Sequence, U: Sequence, T.[Sequence]Element == U.[Sequence]Element>
\end{verbatim} \end{quote} A few unbound type parameters are also thrown in the mix to show how they are ordered with respect to the bound type parameters. Notice how type parameters are ordered by length first; all type parameters of length 1 precede those of length 2, which precede those of length 3. \end{example}

\section{Reduced Types}\label{reducedtypes}

\index{reduced type} \index{equivalence class} \index{same-type requirement} Two type parameters are \emph{equivalent} with respect to a generic signature if one can be transformed into the other via a series of same-type requirements. The set of all type parameters equivalent to a given type parameter is called its \emph{equivalence class}. Every type parameter is part of exactly one equivalence class, so the set of all type parameters described by a generic signature can be partitioned into disjoint equivalence classes. \begin{definition} A type parameter is a \emph{reduced type parameter} with respect to a generic signature if it is not fixed to a concrete type, and precedes every other type parameter in its own equivalence class. Type parameters fixed to concrete types are never considered to be reduced. \end{definition} \begin{definition} An interface type is a \emph{reduced type} with respect to a generic signature if all type parameters appearing inside the interface type are reduced type parameters. It follows that an interface type containing a type parameter that is fixed to a concrete type is not a reduced type.
\end{definition} \begin{table}\captionabove{Equivalence classes defined by the generic signature in Example~\ref{typeparameterorderexample}}\label{equivalenceclassestable} \begin{tabular}{|l|l|} \hline Reduced type parameter&Representatives\\ \hline \texttt{T}&\texttt{T}\\ \hline \texttt{U}&\texttt{U}\\ \hline \texttt{T.[Sequence]Element}&\texttt{T.[Sequence]Element}\\ &\texttt{T.Element}\\ &\texttt{U.[Sequence]Element}\\ &\texttt{U.Element}\\ &\texttt{T.[Sequence]Iterator.[IteratorProtocol]Element}\\ &\texttt{T.Iterator.Element}\\ &\texttt{U.[Sequence]Iterator.[IteratorProtocol]Element}\\ &\texttt{U.Iterator.Element}\\ \hline \texttt{T.[Sequence]Iterator}&\texttt{T.[Sequence]Iterator}\\ &\texttt{T.Iterator}\\ \hline \texttt{U.[Sequence]Iterator}&\texttt{U.[Sequence]Iterator}\\ &\texttt{U.Iterator}\\ \hline \end{tabular} \end{table} \begin{example} Table~\ref{equivalenceclassestable} groups the type parameters from Example~\ref{typeparameterorderexample} into equivalence classes. The type parameters in the first column are the reduced types of the type parameters in the second column. The generic parameters \texttt{T} and \texttt{U} are each in their own equivalence class. The equivalence class of \texttt{T.[Sequence]Element} contains multiple type parameters, because of the same-type requirement between the element types of \texttt{T} and \texttt{U}. Then there are two equivalence classes for the iterator types, \texttt{T.[Sequence]Iterator} and \texttt{U.[Sequence]Iterator}. The type parameters of an equivalence class are ordered; the first type parameter is the reduced type parameter for all members of that equivalence class. The equivalence classes themselves are also ordered, by comparing their reduced type parameters. You can think of a same-type requirement as merging two equivalence classes together into a larger equivalence class. The equivalence class of \texttt{T.[Sequence]Element} was formed by two same-type requirements: \begin{enumerate} \item \texttt{Self.Element == Self.Iterator.Element}, in the \texttt{Sequence} protocol. \item \texttt{T.Element == U.Element}, in our generic signature. \end{enumerate} If we omit the second requirement, \texttt{T.[Sequence]Element} and \texttt{U.[Sequence]Element} would belong to two different equivalence classes. Each equivalence class would still contain the element type of the corresponding iterator, because of the first same-type requirement. \end{example} \index{equivalence class graph} \index{directed graph} For a generic signature, we can construct a directed graph called the \emph{equivalence class graph}. A directed graph is defined by a set of vertices, and a set of edges, which are ordered pairs of vertices. The vertices here are reduced type parameters. There is an edge from a type parameter \texttt{T} to a type parameter \texttt{U} if for some associated type declaration \texttt{A} in a protocol \texttt{P}, \texttt{T} conforms to \texttt{P}, and \texttt{T.[P]A} reduces to \texttt{U}. Edges are labeled with their associated type declarations. A type parameter can be thought of as a \emph{path} through this directed graph, starting from a generic parameter, then traversing successive edges for each associated type declaration until reaching the type parameter's equivalence class. Two reduced-equal type parameters represent two different paths that end at the same equivalence class.
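Here is a small sketch (a hypothetical function, not part of the running example) of an equivalence class at work in the type checker; the assignment below is accepted because both member types reduce to the same reduced type parameter: \begin{Verbatim}
func demo<T: Sequence, U: Sequence>(_: T, _ iter: inout U.Iterator)
    where T.Element == U.Element {
  // `next()' returns `U.Iterator.Element?', which is equivalent
  // to `T.Element?' via the same-type requirements in play.
  let x: T.Element? = iter.next()
  _ = x
}
\end{Verbatim}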
\begin{figure}\captionabove{The directed graph of equivalence classes from Example~\ref{typeparameterorderexample}}\label{archetypegraph} \begin{center} \begin{tikzpicture}[node distance=3cm] \tikzstyle{archetype} = [rectangle, draw=black, text centered] \tikzstyle{arrow} = [->,>=stealth] \node (dummy1) [] {}; \node (S1) [archetype, left of=dummy1] {\texttt{T}}; \node (S2) [archetype, right of=dummy1] {\texttt{U}}; \node (S1Element) [archetype, below of=dummy1] {\texttt{T.Element}}; \node (dummy2) [below of=S1Element] {}; \node (S1Iterator) [archetype, left of=dummy2] {\texttt{T.Iterator}}; \node (S2Iterator) [archetype, right of=dummy2] {\texttt{U.Iterator}}; \draw [arrow] (S1) -- (S1Iterator) node[midway,left] {\footnotesize{\texttt{.Iterator}}}; \draw [arrow] (S2) -- (S2Iterator) node[midway,right] {\footnotesize{\texttt{.Iterator}}}; \draw [arrow] (S1Iterator) -- (S1Element) node[midway,right] {\footnotesize{\texttt{.Element}}}; \draw [arrow] (S2Iterator) -- (S1Element) node[midway,left] {\footnotesize{\texttt{.Element}}}; \draw [arrow] (S1) -- (S1Element) node[midway,right] {\footnotesize{\texttt{.Element}}}; \draw [arrow] (S2) -- (S1Element) node[midway,left] {\footnotesize{\texttt{.Element}}}; \end{tikzpicture} \end{center} \end{figure} \begin{example} Figure~\ref{archetypegraph} shows the equivalence class graph for the generic signature of Example~\ref{typeparameterorderexample}: \begin{itemize} \item If you begin at the generic parameter \texttt{T} and follow the \texttt{.Element} edge, you end up at the equivalence class whose reduced type is \texttt{T.Element}. \item Similarly, if you begin at the generic parameter \texttt{U}, then follow the \texttt{.Iterator} edge, and finally follow the \texttt{.Element} edge, you also end up at the equivalence class with the reduced type \texttt{T.Element}. \end{itemize} This shows that the type parameters \texttt{T.Element} and \texttt{U.Iterator.Element} belong to the same equivalence class, with the reduced type of \texttt{T.Element}. \end{example} The equivalence class graph is a useful mental model, but it does not directly yield an algorithm. The set of equivalence classes may be infinite, as you saw with the SwiftUI \texttt{View} protocol shown in Section~\ref{protocols}. It is also possible for a single equivalence class to consist of an infinite set of type parameters. An example appears in the standard library \texttt{Collection} protocol, which has a \texttt{SubSequence} associated type: \begin{Verbatim}
protocol Collection: Sequence {
  ...
  associatedtype SubSequence: Collection
      where SubSequence == SubSequence.SubSequence
  ...
}
\end{Verbatim} In the generic signature \verb|<T where T: Collection>|, all of the following type parameters belong to the same equivalence class, via the same-type requirement: \begin{quote} \begin{verbatim}
T.SubSequence
T.SubSequence.SubSequence
T.SubSequence.SubSequence.SubSequence
...
\end{verbatim} \end{quote} \index{linear order} \index{infinite descending chain} \index{well-founded order} \paragraph{Mathematical aside} When we defined reduced type parameters, we assumed that each equivalence class has a unique smallest type parameter. This might seem obvious, but it is not always true for arbitrary infinite sets and linear orders.
For example, the set of negative integers is an infinite set that can be linearly ordered with the standard ``less-than'' relation, but it does not have a minimum element, because we can exhibit an \emph{infinite descending chain} where each integer is smaller than the next: \[\cdots < -3 < -2 < -1\] With the type parameter order, this cannot happen; it is a \emph{well-founded order}. This allows us to reduce the problem of finding the minimum element of an equivalence class to the problem of finding the minimum element of a \emph{finite} set of type parameters, as follows: \begin{enumerate} \item The set of type parameters of any fixed length $N$ is finite, because there are a finite number of generic parameters and associated type declarations in a program, and each type parameter is obtained by combining them in a finite number of possible ways. \item Therefore, the set of type parameters of length $\leq N$ is also finite. \item This means that the set of type parameters that precede some type parameter $T$ of length $N$ under our linear order is always finite, because it is a subset of the set of type parameters of length $\leq N$. \item Now, we pick an arbitrary type parameter from our equivalence class, and consider the subset of the equivalence class consisting of all type parameters smaller than or equal to our chosen type parameter. By the previous step, this subset is finite, and it is non-empty because it contains our chosen type parameter. A finite non-empty linearly-ordered set always has a minimum element; this is the smallest type parameter of our equivalence class. \end{enumerate} There is an interesting corollary to the above argument: any infinite equivalence class of type parameters must contain type parameters of arbitrary length.

\section{Generic Signature Queries}\label{genericsigqueries}

\index{generic signature query} \index{requirement} A few times, we've mentioned ``proving'' properties that are implied by some combination of generic requirements. A fundamental set of \emph{generic signature queries} are used by the rest of the compiler to reason about the type parameters of a generic signature. This section just defines their behavior; a full accounting of how generic signature queries are \emph{implemented} will have to wait until Chapter~\ref{propertymap}. The various kinds of queries are grouped into three categories, shown in Table~\ref{genericsigquerytable}. \begin{table}\captionabove{Generic signature queries}\label{genericsigquerytable} \begin{center} \begin{tabular}{|l|l|} \hline Predicates&\texttt{isValidTypeParameter()}\\ &\texttt{requiresProtocol()}\\ &\texttt{requiresClass()}\\ &\texttt{isConcreteType()}\\ \hline Properties&\texttt{getRequiredProtocols()}\\ &\texttt{getSuperclassBound()}\\ &\texttt{getConcreteType()}\\ &\texttt{getLayoutConstraint()}\\ \hline Reduced types&\texttt{areReducedTypeParametersEqual()}\\ &\texttt{isReducedType()}\\ &\texttt{getReducedType()}\\ \hline \end{tabular} \end{center} \end{table} \index{isValidTypeParameter()} \index{requiresProtocol()} \index{requiresClass()} \index{isConcreteType()} \index{conformance requirement} \index{class-constrained protocol} \paragraph{Predicate queries} The simplest of all queries are the binary predicates, which respond with \texttt{true} or \texttt{false}. \begin{description} \item [\texttt{isValidTypeParameter()}] answers if a type parameter is valid for this generic signature. \item [\texttt{requiresProtocol()}] answers if a type parameter conforms to a protocol. \item [\texttt{requiresClass()}] answers if a type parameter is subject to an \texttt{AnyObject} layout constraint, meaning it is represented at runtime as a single retainable pointer.
This can either be stated explicitly, or implied by a superclass requirement. \item [\texttt{isConcreteType()}] answers if a type parameter is fixed to a concrete type. \end{description} \begin{example} Consider this pair of generic signatures: \begin{quote} \begin{verbatim}
<E where E: Sequence>

<E, F where E: Sequence, E.Element: Sequence>
\end{verbatim} \end{quote} \begin{itemize} \item \texttt{isValidTypeParameter(E)} is true in both signatures. \item \texttt{isValidTypeParameter(F)} is only true in the second signature, because the first signature only has one generic parameter. \item \texttt{isValidTypeParameter(E.Element)} is true in both signatures. \item \texttt{isValidTypeParameter(E.Element.Element)} is only true in the second signature, because \texttt{E.Element} does not conform to \texttt{Sequence} in the first signature. \end{itemize} \end{example} \begin{example} Consider this generic signature: \begin{quote} \begin{verbatim}
<T, U, V where T: Collection<Int>, U: Executor, V: NSObject>
\end{verbatim} \end{quote} The following queries all return true: \begin{itemize} \item \texttt{requiresProtocol(T, Collection)}, because the requirement is directly stated. \item \texttt{requiresProtocol(T, Sequence)}, because \texttt{Collection} inherits from \texttt{Sequence}. \item \texttt{requiresProtocol(T.Iterator, IteratorProtocol)}, because the \texttt{Iterator} associated type of \texttt{Sequence} conforms to \texttt{IteratorProtocol}. \item \texttt{requiresClass(U)}, because \texttt{Executor} is a class-constrained protocol. \item \texttt{requiresClass(V)}, because \texttt{NSObject} is a class. \item \texttt{isConcreteType(T.Element)}, because the requirement is directly stated. \item \texttt{isConcreteType(T.Iterator.Element)}, implied by the same-type requirement in the requirement signature of \texttt{Sequence}. \end{itemize} \end{example} \index{getRequiredProtocols()} \index{getSuperclassBound()} \index{getConcreteType()} \index{getLayoutConstraint()} \index{AnyObject} \index{superclass requirement} \index{layout requirement} \paragraph{Property queries} The next set of queries derive more complex properties that are not just true/false predicates. \begin{description} \item [\texttt{getRequiredProtocols()}] returns the list of all protocols that a type parameter must conform to. The list is minimal in the sense that no protocol inherits from any other protocol in the list, and sorted in canonical protocol order (Definition~\ref{linear protocol order}). \item [\texttt{getSuperclassBound()}] returns the superclass bound of a type parameter if there is one. \item [\texttt{getConcreteType()}] returns the concrete type to which a type parameter is fixed if there is one. \item [\texttt{getLayoutConstraint()}] returns the layout constraint describing a type parameter's runtime representation if there is one. The \texttt{AnyObject} layout constraint is the only one that can be explicitly written in source. A second kind of layout constraint, \texttt{\_NativeClass}, is implied by a superclass requirement whose superclass is a native Swift class, meaning a class not inheriting from \texttt{NSObject}. The \texttt{\_NativeClass} layout constraint implies the \texttt{AnyObject} layout constraint. The two differ in how reference counting operations on their instances are lowered in code generation; arbitrary class instances use the Objective-C runtime entry points for retain and release operations, whereas native class instances use a more efficient calling convention.
\end{description} \begin{example} In the following generic signature, \texttt{getSuperclassBound(T)} is \texttt{G<Int>}: \begin{Verbatim}
<T where T: G<Int>>

class G<T> {}
\end{Verbatim} \end{example} \begin{example} In the following generic signature, \texttt{getConcreteType(T.Index)} is \texttt{Int}: \begin{quote} \begin{verbatim}
<T where T: Collection, T.Indices == Range<Int>>
\end{verbatim} \end{quote} This is a non-trivial consequence of several requirements: \begin{itemize} \item The type parameter \texttt{T.[Collection]Index} is in the equivalence class of the type parameter \texttt{T.[Collection]Indices.[Sequence]Element}, via the same-type requirement in the \texttt{Collection} protocol. \item The base type of this type parameter is \texttt{T.[Collection]Indices}, which is fixed to the concrete type \texttt{Range<Int>} in our generic signature. \item Therefore, any member types of this type parameter are fixed to the corresponding type witnesses in the concrete type's conformance. \item The standard library defines a conditional conformance of \texttt{Range} to \texttt{Collection} when the \texttt{Element} generic parameter of \texttt{Range} conforms to the \texttt{Strideable} protocol: \begin{Verbatim}
extension Range: Collection where Element: Strideable {...}
\end{Verbatim} Since \texttt{Int} conforms to \texttt{Strideable}, the type \texttt{Range<Int>} satisfies the conditional requirements of this conditional conformance. The \texttt{Element} associated type is witnessed by the \texttt{Element} generic parameter in the conformance of \texttt{Range} to \texttt{Sequence}. \item The type parameter \texttt{T.[Collection]Indices.[Sequence]Element} is therefore fixed to the concrete type \texttt{Int}, which gives us the final result. \end{itemize} \end{example} \index{reduced type} \index{areReducedTypeParametersEqual()} \index{isReducedType()} \index{getReducedType()} \paragraph{Reduced type queries} The final three generic signature queries concern reduced types: \begin{description} \item [\texttt{areReducedTypeParametersEqual()}] answers if two type parameters have the same reduced type. Does not produce a useful result if one or the other is concrete. \item [\texttt{isReducedType()}] answers if an arbitrary type is already reduced. \item [\texttt{getReducedType()}] computes the reduced type of an arbitrary type. \end{description} \begin{example} In the generic signature \texttt{<T, U where T == U>}, the reduced type of \texttt{Array<U>} is \texttt{Array<T>}. \end{example}

\section{Source Code Reference}\label{genericsigsourceref}

Key source files: \begin{itemize} \item \SourceFile{include/swift/AST/GenericSignature.h} \item \SourceFile{include/swift/AST/Requirement.h} \item \SourceFile{include/swift/AST/RequirementSignature.h} \item \SourceFile{lib/AST/GenericSignature.cpp} \end{itemize} Other source files: \begin{itemize} \item \SourceFile{include/swift/AST/Decl.h} \item \SourceFile{include/swift/AST/DeclContext.h} \item \SourceFile{lib/AST/Decl.cpp} \item \SourceFile{lib/AST/DeclContext.cpp} \end{itemize} \index{declaration context} \apiref{DeclContext}{class} See also Section~\ref{declarationssourceref} and Section~\ref{genericdeclsourceref}. \begin{itemize} \item \texttt{getGenericSignatureOfContext()} returns the generic signature of the innermost generic context, or the empty generic signature if there isn't one. \end{itemize} \index{generic context} \apiref{GenericContext}{class} See also Section~\ref{genericdeclsourceref}. \begin{itemize} \item \texttt{getGenericSignature()} returns the declaration's generic signature, computing it first if necessary.
If the declaration does not have a generic parameter list or trailing \texttt{where} clause, returns the generic signature of the parent context. \end{itemize} \index{generic signature} \index{sugared type} \apiref{GenericSignature}{class} Represents an immutable, uniqued generic signature. Meant to be passed as a value, it stores a single instance variable, a \texttt{GenericSignatureImpl *} pointer. The \texttt{getPointer()} method returns this pointer. The pointer is not \texttt{const}; however, \texttt{GenericSignatureImpl} does not define any mutating methods. The pointer may be \texttt{nullptr}, representing an empty generic signature; the default constructor \texttt{GenericSignature()} constructs this value. There is an implicit \texttt{bool} conversion, which returns \texttt{false} for the empty generic signature. The \texttt{getPointer()} method is only used occasionally, because the \texttt{GenericSignature} class overloads \texttt{operator->} to forward method calls to the \texttt{GenericSignatureImpl *} pointer. Some operations on generic signatures are methods on \texttt{GenericSignature} (called with ``\texttt{.}'') and some on \texttt{GenericSignatureImpl} (called with ``\texttt{->}''). Methods of \texttt{GenericSignature} are safe to call with an empty generic signature, which is presented as having no generic parameters or requirements. Methods forwarded to \texttt{GenericSignatureImpl} can only be invoked if the signature is non-empty. \index{generic signature equality} The \texttt{GenericSignature} class explicitly deletes \texttt{operator==} and \texttt{operator!=} to make the choice between pointer and canonical equality explicit. To check pointer equality of generic signatures, first unwrap both sides with a \texttt{getPointer()} call: \begin{Verbatim}
if (lhsSig.getPointer() == rhsSig.getPointer()) ...;
\end{Verbatim} The more common canonical signature equality check is implemented by the \texttt{isEqual()} method on \texttt{GenericSignatureImpl}: \begin{Verbatim}
if (lhsSig->isEqual(rhsSig)) ...;
\end{Verbatim} \index{reduced type} Various accessor methods: \begin{itemize} \item \texttt{getGenericParams()} returns an array of \texttt{GenericTypeParamType}. If the generic signature is empty, this is the empty array, otherwise it contains at least one generic parameter. \item \texttt{getInnermostGenericParams()} returns an array of \texttt{GenericTypeParamType} with the innermost generic parameters only, that is, those with the highest depth. If the generic signature is empty, this is the empty array, otherwise it contains at least one generic parameter. \item \texttt{getRequirements()} returns an array of \texttt{Requirement}. If the generic signature is empty, this is the empty array. \item \texttt{getCanonicalSignature()} returns the canonical signature. If the generic signature is empty, returns the canonical empty generic signature. \item \texttt{getPointer()} returns the underlying \texttt{GenericSignatureImpl *}. \end{itemize} Computing reduced types: \begin{itemize} \item \texttt{getReducedType()} returns the reduced type of an interface type for this generic signature. If the generic signature is empty, the type must be fully concrete, and is returned unchanged. \end{itemize} Other: \begin{itemize} \item \texttt{print()} prints the generic signature, with various options to control the output. \item \texttt{dump()} prints the generic signature, meant for use from the debugger or ad-hoc print debug statements.
\end{itemize} \index{generic signature query} \apiref{GenericSignatureImpl}{class} The backing storage of a generic signature. Instances of this class are allocated in the AST context, and are always passed by pointer. \begin{itemize} \item \texttt{isEqual()} checks if two generic signatures are canonically equal. \item \texttt{getSugaredType()} takes a type containing canonical type parameters, understood to be written with respect to this generic signature, and replaces the generic parameter types with their ``sugared'' forms, so that names are preserved when the type is printed out to a string. \item \texttt{forEachParam()} invokes a callback on each generic parameter of the signature; the callback also receives a boolean indicating if the generic parameter type is reduced or not---a generic parameter on the left hand side of a same-type requirement is not reduced. \item \texttt{areAllParamsConcrete()} answers if all generic parameters are fixed to concrete types via same-type requirements, which makes the generic signature somewhat like an empty generic signature. Fully-concrete generic signatures are lowered away at the SIL level. \end{itemize} The generic signature queries from Section~\ref{genericsigqueries} are methods on \texttt{GenericSignatureImpl}: \begin{itemize} \item Predicate queries: \begin{itemize} \item \texttt{isValidTypeParameter()} \item \texttt{requiresProtocol()} \item \texttt{requiresClass()} \item \texttt{isConcreteType()} \end{itemize} \item Property queries: \begin{itemize} \item \texttt{getRequiredProtocols()} \item \texttt{getSuperclassBound()} \item \texttt{getConcreteType()} \item \texttt{getLayoutConstraint()} \end{itemize} \item Reduced type queries: \begin{itemize} \item \texttt{areReducedTypeParametersEqual()} \item \texttt{isReducedType()} \item \texttt{getReducedType()} \end{itemize} \end{itemize} \index{canonical generic signature} \apiref{CanGenericSignature}{class} The \texttt{CanGenericSignature} class wraps a \texttt{GenericSignatureImpl *} pointer which is known to be canonical. The pointer can be recovered with the \texttt{getPointer()} method. There is an implicit conversion from \texttt{CanGenericSignature} to \texttt{GenericSignature}. The \texttt{operator->} forwards method calls to the underlying \texttt{GenericSignatureImpl}. The \texttt{operator==} and \texttt{operator!=} operators test \texttt{CanGenericSignature} for pointer equality. The \texttt{isEqual()} method of \texttt{GenericSignatureImpl} implements canonical equality on arbitrary generic signatures by first canonicalizing both sides, then checking the resulting canonical signatures for pointer equality. Therefore, the following are equivalent: \begin{Verbatim}
if (lhsSig->isEqual(rhsSig)) ...;

if (lhsSig.getCanonicalSignature() == rhsSig.getCanonicalSignature()) ...;
\end{Verbatim} The \texttt{CanGenericSignature} class inherits from \texttt{GenericSignature}, and so inherits all of the same methods. Additionally, it overrides \texttt{getGenericParams()} to return an array of \texttt{CanGenericTypeParamType}. \index{requirement} \apiref{Requirement}{class} A generic requirement. \begin{itemize} \item \texttt{getKind()} returns the \texttt{RequirementKind}. \item \texttt{getSubjectType()} returns the subject type. \item \texttt{getConstraintType()} returns the constraint type if the requirement kind is not \texttt{RequirementKind::Layout}, otherwise asserts.
\item \texttt{getProtocolDecl()} returns the protocol declaration of the constraint type if this is a conformance requirement with a protocol type as the constraint type. \item \texttt{getLayoutConstraint()} returns the layout constraint if the requirement kind is \texttt{RequirementKind::Layout}, otherwise asserts. \end{itemize} \apiref{RequirementKind}{enum class} An enum encoding the four kinds of requirements. \begin{itemize} \item \texttt{RequirementKind::Conformance} \item \texttt{RequirementKind::Superclass} \item \texttt{RequirementKind::Layout} \item \texttt{RequirementKind::SameType} \end{itemize} \index{protocol declaration} \apiref{ProtocolDecl}{class} See also Section~\ref{genericdeclsourceref}. \begin{itemize} \item \texttt{getRequirementSignature()} returns the protocol's requirement signature, computing it first if necessary. \end{itemize} \index{requirement signature} \apiref{RequirementSignature}{class} A protocol requirement signature. \begin{itemize} \item \texttt{getRequirements()} returns an array of \texttt{Requirement}. \item \texttt{getTypeAliases()} returns an array of \texttt{ProtocolTypeAlias}. \end{itemize} \index{protocol type alias} \apiref{ProtocolTypeAlias}{class} A protocol type alias descriptor. \begin{itemize} \item \texttt{getName()} returns the name of the alias. \item \texttt{getUnderlyingType()} returns the underlying type of the type alias. This is a type written in terms of the type parameters of the requirement signature. \end{itemize} \index{type parameter} \index{interface type} \apiref{TypeBase}{class} See also Section~\ref{typesourceref}. \begin{itemize} \item \texttt{isTypeParameter()} answers if this type is a type parameter; that is, a generic parameter type, or a \texttt{DependentMemberType} whose base is another type parameter. \item \texttt{hasTypeParameter()} answers if this type is itself a type parameter, or if it contains a type parameter in structural position. For example, \texttt{Array<\ttgp{0}{0}>} will answer \texttt{false} to \texttt{isTypeParameter()}, but \texttt{true} to \texttt{hasTypeParameter()}. \end{itemize} \index{dependent member type} \apiref{DependentMemberType}{class} A type representing a reference to an associated type. \begin{itemize} \item \texttt{getBase()} returns the base type; for example, given \texttt{\ttgp{0}{0}.Foo.Bar}, will answer \texttt{\ttgp{0}{0}.Foo}. \item \texttt{getName()} returns the identifier naming the associated type. \item \texttt{getAssocType()} returns the associated type declaration if this is a resolved \texttt{DependentMemberType}; otherwise, if it is unresolved, returns \texttt{nullptr}. \end{itemize} \index{type declaration} \index{protocol order} \apiref{TypeDecl}{class} See also Section~\ref{declarationssourceref}. \begin{itemize} \item \texttt{compare()} compares two protocols by the protocol order (Definition~\ref{linear protocol order}), returning one of the following: \begin{itemize} \item $-1$ if this protocol precedes the given protocol, \item 0 if both protocol declarations are equal, \item 1 if this protocol follows the given protocol.
\end{itemize} \end{itemize} \index{type parameter order} \index{generic parameter order} \apiref{swift::compareDependentTypes()}{function} Implements the type parameter order (Definition~\ref{type parameter order}), returning one of the following: \begin{itemize} \item $-1$ if the left hand side precedes the right hand side, \item 0 if the two type parameters are equal as canonical types, \item 1 if the left hand side follows the right hand side. \end{itemize}

\chapter{Substitution Maps}\label{substmaps}

\index{substitution map} \index{input generic signature} \index{replacement type} \index{conformance} A \emph{substitution map} describes a mapping from type parameters of a generic signature to replacement types which satisfy the requirements of this generic signature. Substitution maps arise when a reference to a generic declaration is \emph{specialized} by applying generic arguments. The generic signature of a substitution map is called the \emph{input generic signature}. A substitution map stores a reference to its input generic signature, and the list of generic parameters and conformance requirements in this signature determine the substitution map's shape: \begin{quote} \texttt{<\ttbox{A}, \ttbox{B} where \ttbox{B:\ Sequence}, B.[Sequence]Element == Int>} \end{quote} A substitution map consists of a replacement type for each generic parameter, and a conformance for each conformance requirement: \begin{quote} \begin{tabular}{ccc} \ttbox{A}&\ttbox{B}&\ttbox{B:\ Sequence}\\ $\Downarrow$&$\Downarrow$&$\Downarrow$\\ \ttbox{String}&\ttbox{Array<Int>}&\ttbox{Array<Int>:\ Sequence} \end{tabular} \end{quote} We can collect all of the above information in a table: \begin{quote} \begin{tabular}{|lcl|} \hline \textbf{Generic parameters}&&\textbf{Types}\\[\smallskipamount] \ttbox{A}&$\Rightarrow$&\ttbox{String}\\[\medskipamount] \ttbox{B}&$\Rightarrow$&\ttbox{Array<Int>}\\[\medskipamount] \textbf{Requirements}&&\textbf{Conformances}\\[\smallskipamount] \ttbox{B:\ Sequence}&$\Rightarrow$&\ttbox{Array<Int>:\ Sequence}\\[\medskipamount] \hline \end{tabular} \end{quote} Or more concisely, \begin{quote} \SubMapC{ \SubType{A}{String}\\ \SubType{B}{Array<Int>} }{ \SubConf{Array<Int>:\ Sequence} } \end{quote} \begin{listing}\captionabove{Substitution maps in type checking}\label{substmaptypecheck} \begin{Verbatim}
func genericFunction<A, B: Sequence>(_: A, _: B)
    where B.Element == Int {}

struct GenericType<A, B: Sequence> where B.Element == Int {
  func nonGenericMethod() {}
}

// substitution map for the call is {A := String, B := Array<Int>}.
genericFunction("hello", [1, 2, 3])

// the type of `value' is GenericType<String, Array<Int>>.
let value = GenericType<String, Array<Int>>()

// the context substitution map for the type of `value' is
// {A := String, B := Array<Int>}.
value.nonGenericMethod()
\end{Verbatim} \end{listing} \begin{example} Listing~\ref{substmaptypecheck} shows how our substitution map arises when type checking some code: \begin{quote} \SubMapC{ \SubType{A}{String}\\ \SubType{B}{Array<Int>} }{ \SubConf{Array<Int>:\ Sequence} } \end{quote} Here, all three of \texttt{genericFunction()}, \texttt{GenericType} and \texttt{nonGenericMethod()} have the same generic signature, \texttt{<A, B where B: Sequence, B.[Sequence]Element == Int>}. When type checking a generic function call, the expression type checker infers the generic arguments from the types of the argument expressions. When referencing a generic type, the generic arguments can be written explicitly. All three generic declarations are referenced with the same substitution map in this example.
(When referencing a generic type declaration, this substitution map is called the \emph{context substitution map} of the specialized type, which is \texttt{GenericType<String, Array<Int>>} here. Context substitution maps are coming right up in Section~\ref{contextsubstmap}.) \end{example} \index{interface type} \index{original type} \index{substituted type} \index{type substitution} \paragraph{Type substitution} Applying a substitution map to a generic parameter projects the corresponding replacement type from the substitution map. A type parameter is not necessarily a generic parameter type; it might be a dependent member type as well. Applying a substitution map to a dependent member type derives the replacement type from one of the substitution map's conformances. Now, we haven't talked about conformances yet. There is an inherent circularity between substitution maps and conformances---substitution maps store conformances, and conformances can store substitution maps, which means that whichever one you choose to explain first, you necessarily have to hand-wave the existence of the other. We will look at conformances in great detail in Chapter~\ref{conformances}. The derivation of replacement types for dependent member types is discussed in Section~\ref{abstract conformances}. Recall that an interface type is a type \emph{containing} type parameters valid for some generic signature. A substitution map can be more generally applied to an interface type, not just a type parameter. Called \emph{type substitution}, this operation recursively transforms any type parameters appearing in the interface type with their replacement types, preserving the ``concrete structure'' of the interface type. The interface type here is called the \emph{original type}, and the result type the \emph{substituted type}. It can be helpful to think of applying a substitution map to an interface type as a \emph{right action}: \[\mathboxed{original type}\times \mathboxed{substitution map} = \mathboxed{substituted type}\] Type substitution does not care about generic parameter sugar in the original type; replacement types for generic parameters are always looked up by depth and index in the substitution map. \begin{example} Applying the substitution map from our running example to sugared and canonical generic parameter types produces the same results: \[ \left\{ \begin{array}{l} \ttbox{A}\\[\medskipamount] \ttbox{\ttgp{0}{0}}\\[\medskipamount] \ttbox{B}\\[\medskipamount] \ttbox{\ttgp{0}{1}} \end{array}\right\} \times \SubMapC{ \SubType{A}{String}\\ \SubType{B}{Array<Int>} }{ \SubConf{Array<Int>:\ Sequence} } = \left\{ \begin{array}{l} \ttbox{String}\\[\medskipamount] \ttbox{String}\\[\medskipamount] \ttbox{Array<Int>}\\[\medskipamount] \ttbox{Array<Int>} \end{array}\right\} \] \end{example} \begin{listing}\captionabove{Applying a substitution map to four interface types}\label{typealiassubstlisting} \begin{Verbatim}
struct GenericType<A, B: Sequence> where B.Element == Int {
  typealias T1 = A
  typealias T2 = B
  typealias T3 = (A.Type, Float)
  typealias T4 = (Optional<A>) -> B
}

let t1: GenericType<String, Array<Int>>.T1 = ...
let t2: GenericType<String, Array<Int>>.T2 = ...
let t3: GenericType<String, Array<Int>>.T3 = ...
let t4: GenericType<String, Array<Int>>.T4 = ...
\end{Verbatim} \end{listing} \begin{example} Listing~\ref{typealiassubstlisting} shows a generic type with four member type alias declarations. There are four global variables, and the type of each global variable is written as a member type alias reference with the same base type, \texttt{GenericType<String, Array<Int>>}.
Type resolution resolves a member type alias reference by applying a substitution map to the underlying type of the type alias declaration. Here, the underlying type of each type alias declaration is an interface type for the generic signature of \texttt{GenericType}, and the substitution map is the same substitution map as Example~\ref{substmaptypecheck}. Applying the substitution map to the underlying type of each type alias declaration yields the type of each global variable: \begin{quote} \begin{tabular}{|l|l|l|} \hline &\textbf{Original type}&\textbf{Substituted type}\\ \hline \texttt{T1}&\texttt{A}&\texttt{String}\\ \texttt{T2}&\texttt{B}&\texttt{Array<Int>}\\ \texttt{T3}&\texttt{(A.Type, Float)}&\texttt{(String.Type, Float)}\\ \texttt{T4}&\texttt{(Optional<A>) -> B}&\texttt{(Optional<String>) -> Array<Int>}\\ \hline \end{tabular} \end{quote} The first two original types are generic parameters, and substitution directly projects the corresponding replacement type from the substitution map; the last two original types are substituted by recursively replacing the generic parameters they contain. References to generic type alias declarations are more complex because in addition to the generic parameters of the base type, the generic type alias will have generic parameters of its own. Section~\ref{identtyperepr} describes how the substitution map is computed in this case. \end{example} \index{substitution failure} \index{SILGen} Substitution can \emph{fail} if the interface type contains member types and some of the conformances in the substitution map are invalid. In this case, an error type is returned instead of asserting. Invalid conformances can appear in substitution maps when the user's own code is invalid; it is not an invariant violation as long as other errors are diagnosed elsewhere and the compiler does not proceed to SILGen with error types in the abstract syntax tree. \index{fully-concrete type} \index{output generic signature} \paragraph{Output generic signature} If the replacement types in the substitution map are fully concrete---that is, they do not contain type parameters---then all possible substituted types produced by this substitution map will always be fully concrete. If the replacement types are interface types for some \emph{output} generic signature, the substitution map's substituted types will be written in terms of the type parameters of the output generic signature. The output generic signature might be a different generic signature than the \emph{input} generic signature of the substitution map. This leads naturally to the concept of substitution map composition, described in Section~\ref{submapcomposition}. The output generic signature is not stored in the substitution map; it is implicit from context. Also, fully concrete types can be seen as valid interface types for \emph{any} generic signature, because they do not contain type parameters at all. Keeping that in mind, we have this rule: \begin{quote} \textbf{Substitution maps transform the interface types of an input generic signature into the interface types of an output generic signature.} \end{quote} We haven't introduced archetypes yet, but substitution maps whose replacement types are archetypes will be discussed in Section~\ref{archetypesubst}. \index{canonical substitution map} \index{substitution map equality} \paragraph{Canonical substitution maps} Substitution maps are immutable and uniqued, just like types and generic signatures.
A substitution map is canonical if all replacement types are canonical types and all conformances are canonical conformances. A substitution map is canonicalized by constructing a new substitution map from the original substitution map's canonicalized replacement types and conformances. As with types, canonicalization gives substitution maps two levels of equality; two substitution maps are pointer-equal if their replacement types and conformances are pointer-equal. Two substitution maps are canonical-equal if their canonical substitution maps are pointer-equal; or equivalently, if their replacement types and conformances are canonical-equal. Applying a canonical substitution map to a canonical original type is not guaranteed to produce a canonical substituted type. However, there are two important invariants that do hold: \begin{enumerate} \item Given two canonical-equal original types, applying the same substitution map to both will produce two canonical-equal substituted types. \item Given an original type and two canonical-equal substitution maps, applying the two substitution maps to this type will also produce two canonical-equal substituted types. \end{enumerate}

\section{Context Substitution Maps}\label{contextsubstmap}

\index{context substitution map} \index{declared interface type} \index{specialized type} \index{parent type} A nominal type is \emph{specialized} if the type itself or one of its parent types is a generic nominal type. That is, \texttt{Array<Int>} and \texttt{Array<Int>.Iterator} are both specialized types, but \texttt{Int} and \texttt{String.UTF8View} are not. Equivalently, a nominal type is specialized if the nominal type declaration is a generic context---that is, the type declaration itself has a generic parameter list, or an outer declaration context has one. Every specialized type determines a unique substitution map for the generic signature of its declaration, called the \emph{context substitution map}. The context substitution map replaces the generic parameters of the type declaration with the corresponding generic arguments of the specialized type. The defining property is that applying a specialized type's context substitution map to the declared interface type of the type declaration gives us back the specialized type: \[ \mathboxed{declared interface type}\times \mathboxed{context substitution map} = \mathboxed{specialized type} \] To demonstrate the above identity, consider the generic signature of the \texttt{Dictionary} type declaration in the standard library: \begin{quote} \texttt{<Key, Value where Key: Hashable>} \end{quote} One possible specialized type for \texttt{Dictionary} is the type \texttt{Dictionary<Int, String>}; this type, its context substitution map and the declared interface type of \texttt{Dictionary} are related as follows: \[ \ttbox{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>}\times \SubMapC{ \SubType{\ttgp{0}{0}}{Int}\\ \SubType{\ttgp{0}{1}}{String} }{ \SubConf{Int:\ Hashable} } = \ttbox{Dictionary<Int, String>} \] \index{identity substitution map} \paragraph{The identity substitution map} What about the context substitution map of a type declaration's declared interface type? By definition, this substitution map must leave the declared interface type unchanged. That is, it maps every generic parameter of the type declaration's generic signature to itself.
If we look at the \texttt{Dictionary} type again, we get \[ \ttbox{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>}\times \SubMapC{ \SubType{\ttgp{0}{0}}{\ttgp{0}{0}}\\ \SubType{\ttgp{0}{1}}{\ttgp{0}{1}} }{ \SubConf{\ttgp{0}{0}:\ Hashable} } = \ttbox{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>} \] Every generic signature has such a substitution map, called the \emph{identity substitution map}. \[\mathboxed{interface type} \times \mathboxed{identity substitution map} = \mathboxed{interface type}\] Applying the identity substitution map to any interface type leaves it unchanged, with three caveats: \begin{enumerate} \item The interface type must only contain type parameters which are valid in the input generic signature of this identity substitution map. \item Substitution might change type sugar, because generic parameters appearing in the original interface type might be sugared differently than the input generic signature of this identity substitution map. Therefore, canonical equality of types is preserved, not necessarily pointer equality. \item We won't talk about archetypes until Chapter~\ref{genericenv}, but you may have met them already. Applying the identity substitution map to a contextual type containing archetypes replaces the archetypes with equivalent type parameters. There is a corresponding \emph{forwarding substitution map} which maps all generic parameters to archetypes; the forwarding substitution map acts as the identity in the world of contextual types. \end{enumerate} \index{empty generic signature} \index{empty substitution map} \index{fully-concrete type} \paragraph{The empty substitution map} The empty generic signature only has a single unique substitution map, the \emph{empty substitution map}, so the context substitution map of a non-specialized nominal type is the empty substitution map. Recall that the only valid interface types of the empty generic signature are the fully concrete types. The action of the empty substitution map leaves fully concrete types unchanged. \begin{gather*} \mathboxed{fully-concrete type} \times \mathboxed{empty substitution map} = \mathboxed{fully-concrete type}\\ \ttbox{Int}\times\mathboxed{empty substitution map}=\ttbox{Int} \end{gather*} In general, the empty substitution map is not the same as the identity substitution map. The empty substitution map is the identity substitution map of the empty generic signature only. Applying the empty substitution map to an interface type containing type parameters is a substitution failure and returns an error type. \[\ttbox{\ttgp{0}{0}.[Sequence]Element} \times \mathboxed{empty substitution map} = \ttbox{<<error type>>}\] \index{declaration context} \index{qualified lookup} \index{member reference expression} \paragraph{Other declaration contexts} A more general notion is the context substitution map of a type \emph{with respect to a declaration context}. This is where the ``context'' comes from in ``context substitution map.'' Recall that a qualified name lookup \texttt{foo.bar} looks for a member named \texttt{bar} on some base type, here the type of \texttt{foo}. The context substitution map for the member's declaration context describes the substitutions for computing the type of the member reference expression. When the declaration context is the type declaration itself, ``context substitution map with respect to its own declaration context'' coincides with the earlier notion of ``the'' context substitution map of a base type.
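As a quick sketch of how this is used when type checking a member reference expression (a hypothetical extension of the standard library's \texttt{Array}): \begin{Verbatim}
extension Array {
  func f() -> Element { return self[0] }
}

// The context substitution map of the base type Array<Int> with
// respect to the extension maps Element := Int, so `x.f()' has
// type Int.
let x = [1, 2, 3]
let y: Int = x.f()
\end{Verbatim}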
\index{direct lookup}
Recall from Section~\ref{name lookup} that qualified name lookup performs a series of \emph{direct lookups}, first into the type declaration itself, then its superclass if any, and finally any protocols it conforms to. A \emph{direct lookup} in turn searches the immediate members of the type declaration and any of its extensions. Thus, we can talk about the set of declaration contexts \emph{reachable} from a qualified name lookup on a base type:
\begin{enumerate}
\item The type declaration itself and its extensions.
\item The superclass declaration and its extensions, and everything reachable recursively via the superclass declaration.
\item All protocol conformances of the type declaration, and their protocol extensions.
\end{enumerate}
The declaration context for computing a context substitution map must be reachable via qualified name lookup from the base type.

\index{constrained extension}
\index{conformance requirement}
\begin{definition}\label{context substitution map for decl context}
The context substitution map with respect to a declaration context is defined as follows for the three kinds of reachable declaration contexts:
\begin{enumerate}
\item When the declaration context is the generic type or an extension, the replacement types of the substitution map are the corresponding generic arguments of the base type. If the context is a constrained extension, the substitution map will store additional conformances for the conformance requirements of the extension.
\item When the declaration context is a protocol or a protocol extension, the generic signature is the protocol generic signature, possibly with additional requirements if the context is a constrained protocol extension. The substitution map's single replacement type is the entire base type.
\item When the declaration context is a superclass of the generic type (which must be a class type or an archetype with a superclass requirement), the context substitution map is constructed recursively from the type declaration's superclass type. This case will be described in Chapter~\ref{classinheritance}.
\end{enumerate}
The context substitution map's input generic signature is the generic signature of the declaration context; thus it can be applied to the interface type of a member of this context.
\end{definition}

\begin{listing}\captionabove{Context substitution map with respect to an extension context}\label{context substitution map of constrained extension listing}
\begin{Verbatim}
struct Outer<T> {
  struct Inner<U> {}
}

extension Outer.Inner where U: Sequence {
  typealias A = (U.Element) -> ()
}

// What is the type of `x'?
let x: Outer<Int>.Inner<String>.A = ...
\end{Verbatim}
\end{listing}

\begin{example}
Case~1 determines the type of \texttt{x} in Listing~\ref{context substitution map of constrained extension listing}. The base type is the generic nominal type \texttt{Outer<Int>.Inner<String>} and the type alias \texttt{A} is a member of the constrained extension of \texttt{Outer.Inner}. The generic nominal type \texttt{Outer<Int>.Inner<String>} sets \texttt{T} to \texttt{Int} and \texttt{U} to \texttt{String}. The extension defines the additional conformance requirement \texttt{U:~Sequence}.
Therefore, the context substitution map with respect to the extension's declaration context is:
\begin{quote}
\SubMapC{ \SubType{T}{Int}\\ \SubType{U}{String} }{ \SubConf{String:\ Sequence} }
\end{quote}
Applying the above substitution map to the declared interface type of the type alias \texttt{A} gives us the final result:
\[ \ttbox{(U.[Sequence]Element) -> ()} \times \SubMapC{ \SubType{T}{Int}\\ \SubType{U}{String} }{ \SubConf{String:\ Sequence} } = \ttbox{(Character) -> ()}\]
\end{example}

\begin{example}
In the previous example, we could instead compute the context substitution map for the type declaration context itself. We get almost the same substitution map, except without the conformance requirement:
\begin{quote}
\SubMap{ \SubType{T}{Int}\\ \SubType{U}{String} }
\end{quote}
Applying this substitution map to the declared interface type of the type alias \texttt{A} will produce an error type, because the dependent member type \texttt{U.[Sequence]Element} is not a valid type parameter for this substitution map's input generic signature:
\[ \ttbox{(U.[Sequence]Element) -> ()} \times \SubMap{ \SubType{T}{Int}\\ \SubType{U}{String} } = \ttbox{<<error type>>}\]
\end{example}

\begin{example}
What if we use the correct declaration context, but the base type does not satisfy the requirements of the constrained extension? For example, consider the type \texttt{Outer<Int>.Inner<Int>}. Computing the context substitution map of our base type for the constrained extension's declaration context will output a substitution map containing an invalid conformance, because \texttt{Int} does not conform to \texttt{Sequence}:
\[ \SubMapC{ \SubType{T}{Int}\\ \SubType{U}{Int} }{ \multicolumn{3}{|l|}{invalid conformance} } \]
\end{example}
In fact, the type alias \texttt{A} cannot be referenced as a member of this base type at all, because name lookup checks whether the generic requirements of a type declaration are satisfied. Checking generic requirements will first be introduced as part of type resolution (Section~\ref{identtyperepr}), and will come up elsewhere as well.

\index{protocol substitution map}
\index{protocol Self type}
\paragraph{Protocol substitution map} The context substitution map of a type with respect to a protocol declaration context is called the \emph{protocol substitution map}. Every protocol's generic signature has a single generic parameter with a single conformance requirement, so a substitution map for this generic signature consists of a conformance together with its conforming type. In this manner, there is a one-to-one correspondence between conformances to a specific protocol and the substitution maps of the protocol's generic signature; this mapping is defined by the protocol substitution map construction. The protocol substitution map for a conformance \texttt{T:~P} has the following form:
\begin{quote}
\SubMapC{ \SubType{Self}{T} }{ \SubConf{T:~P} }
\end{quote}

\begin{listing}\captionabove{The context substitution map with respect to a protocol context}\label{protocolsubstitutionmaplisting}
\begin{Verbatim}
struct S: P {
  typealias Element = Int
}

protocol P {
  associatedtype Element

  typealias B = Array<Element>
}

// What is the type of `x'?
let x: S.B = ...
\end{Verbatim}
\end{listing}

\begin{example}
The type of \texttt{x} in Listing~\ref{protocolsubstitutionmaplisting} is determined by the context substitution map of \texttt{S} for the protocol declaration context \texttt{P}, which is the protocol substitution map for the conformance \texttt{S:~P}:
\begin{quote}
\SubMapC{ \SubType{Self}{S} }{ \SubConf{S:~P} }
\end{quote}
The declared interface type of \texttt{B} is \texttt{Array<Self.Element>}.
Applying our substitution map replaces the dependent member type \texttt{Self.Element} with the type witness \texttt{Int} from the conformance, giving us the final substituted type \texttt{Array<Int>}.
\end{example}

\section{Composing Substitution Maps}\label{submapcomposition}

\index{substitution map composition}
Just as a substitution map can be applied to an original type to produce a substituted type, a substitution map can also be applied to \emph{another substitution map} to produce a new substitution map. The substitution maps are assumed to be \emph{compatible}, meaning the output generic signature of the first must equal the input generic signature of the second. This is called the \emph{composition} of two substitution maps:
\[\mathboxed{substitution map 1}\times \mathboxed{substitution map 2} = \mathboxed{substitution map 3}\]
The action of the composed substitution map is equal to first applying the left hand side substitution map, followed by the right hand side:\footnote{This is why substitution maps act on the right and not the left; it makes our equations more natural.}
\begin{multline*} \mathboxed{type}\times \left(\,\mathboxed{substitution map 1} \times \mathboxed{substitution map 2}\,\right)\\ = \left(\,\mathboxed{type}\times \mathboxed{substitution map 1}\,\right)\times \mathboxed{substitution map 2} \end{multline*}
\index{input generic signature}
\index{output generic signature}
Therefore, the input generic signature of a composed substitution map is the input generic signature of the left hand side; its output generic signature is the output generic signature of the right hand side.
\[ \underbrace{ \overbrace{\mathboxed{substitution map 1}}^{\text{signature 1 to signature 2}} \times \overbrace{\mathboxed{substitution map 2}}^{\text{signature 2 to signature 3}} }_{ \text{signature 1 to signature 3}} \]
Composition is defined by applying the second substitution map to each replacement type and conformance of the first substitution map, and collecting the results in a new substitution map. We haven't explained what it means to apply a substitution map to a conformance yet; this will be revisited in Section~\ref{conformance subst}.

\newcommand{\FirstMapInExample}{\SubMap{ \SubType{T}{Array<A>}\\ \SubType{U}{A} }}
\newcommand{\SecondMapInExample}{\SubMap{ \SubType{A}{Int} }}
\newcommand{\ThirdMapInExample}{\SubMap{ \SubType{T}{Array<Int>}\\ \SubType{U}{Int} }}

\begin{listing}\captionabove{Motivating substitution map composition}\label{composesubstmaplisting}
\begin{Verbatim}
struct Outer<A> {
  var inner: Inner<Array<A>, A>
}

struct Inner<T, U> {
  var value: (T) -> U
}

let outer: Outer<Int> = ...
let x = outer.inner.value
\end{Verbatim}
\end{listing}

\begin{example}\label{composesubstmapexample}
Listing~\ref{composesubstmaplisting} shows an example where substitution map composition can help reason about the types of chained member reference expressions. The \texttt{inner} stored property of \texttt{Outer} has type \texttt{Inner<Array<A>, A>}. Here is the context substitution map of this type:
\begin{quote}
\FirstMapInExample
\end{quote}
The substitution map's input generic signature is the generic signature of the type declaration \texttt{Inner}, which is \texttt{<T, U>}. This type is an interface type for the generic signature of \texttt{Outer}, so the output generic signature of the above substitution map is the generic signature of \texttt{Outer}, which is \texttt{<A>}.

Now, let's look at the \texttt{outer} global variable.
It has the type \texttt{Outer<Int>}, with the following context substitution map:
\begin{quote}
\SecondMapInExample
\end{quote}
The input generic signature of the context substitution map is the generic signature of \texttt{Outer}. The output generic signature is the empty generic signature, because the replacement type is fully concrete.

We can compose these two substitution maps, because the first substitution map's output generic signature is the same as the second substitution map's input generic signature. The composition is defined as applying the second substitution map to each replacement type of the first:
\[\FirstMapInExample\times\SecondMapInExample = \ThirdMapInExample\]
Now, the substituted type of \texttt{outer.inner.value} is derived from the interface type of \texttt{value}, which is the function type \verb|(T) -> U|. Substitution map composition gives us two equivalent ways to compute the substituted type:
\begin{enumerate}
\item By applying the first substitution map to the original type \verb|(T) -> U| to get an intermediate substituted type, and then applying the second substitution map to the intermediate substituted type to produce the final substituted type:
\begin{gather*} \left(\,\ttbox{(T) -> U}\times\FirstMapInExample\,\right)\times \SecondMapInExample\\[\medskipamount] = \ttbox{(Array<A>) -> A}\times \SecondMapInExample\\[\medskipamount] = \ttbox{(Array<Int>) -> Int} \end{gather*}
\item By composing the two substitution maps to get a third substitution map, and then applying the third substitution map to the original type \texttt{(T) -> U}:
\begin{gather*} \ttbox{(T) -> U}\times\left(\,\FirstMapInExample\times \SecondMapInExample\,\right)\\[\medskipamount] = \ttbox{(T) -> U}\times \ThirdMapInExample\\[\medskipamount] = \ttbox{(Array<Int>) -> Int} \end{gather*}
\end{enumerate}
The final substituted type, \texttt{(Array<Int>) -> Int}, is the same in both cases.
\end{example}

Composing a generic signature's identity substitution map with another substitution map for the same input generic signature leaves the substitution map unchanged:
\[ \mathboxed{identity substitution map}\times \mathboxed{original substitution map} = \mathboxed{original substitution map} \]
The identity substitution map is also an identity for composition on the right, with the same caveat as for types; it is only true if the other substitution map's replacement types are interface types. If they are contextual types, the archetypes will be replaced with equivalent type parameters.
\[ \mathboxed{original substitution map}\times \mathboxed{identity substitution map} = \mathboxed{original substitution map} \]
\begin{example}
The above identities hold for the first substitution map from Example~\ref{composesubstmapexample}:
\begin{gather*} \underbrace{\SubMap{\SubType{T}{T}\\ \SubType{U}{U}}}_{\text{left identity}} \times \underbrace{\FirstMapInExample}_{\text{original substitution map}} = \underbrace{\FirstMapInExample}_{\text{original substitution map}}\\[\medskipamount] \underbrace{\FirstMapInExample}_{\text{original substitution map}} \times \underbrace{\SubMap{\SubType{A}{A}}}_{\text{right identity}} = \underbrace{\FirstMapInExample}_{\text{original substitution map}} \end{gather*}
Note that the left and right identity substitution maps are different in this case, because our substitution map has different input and output generic signatures.
\end{example}

\index{associative operation}
Substitution map composition is \emph{associative}.
This means that both possible ways of composing three substitution maps will output the same result:
\begin{multline*} \left(\,\mathboxed{substitution map 1}\times \mathboxed{substitution map 2}\,\right)\times \mathboxed{substitution map 3}\\ = \mathboxed{substitution map 1}\times \left(\,\mathboxed{substitution map 2} \times \mathboxed{substitution map 3}\,\right) \end{multline*}

\paragraph{Mathematical aside} These sorts of rules are occasionally useful when writing code in the compiler, but more importantly, they teach us how to \emph{think} about substitution maps. If you have a background in higher math, you will be familiar with the idea of \emph{equational reasoning}: describing a set of objects by writing down the fundamental equations they satisfy.

\index{vector space}
\index{linear transformation}
Linear algebra is the study of vector spaces and linear transformations. A linear transformation is a function from one vector space into another which preserves vector addition and scalar multiplication. While a vector space over a non-finite field is an infinite set, a linear transformation from a finite-dimensional vector space is completely determined by its values on a finite set of basis vectors.

This is similar in a sense to substitution maps. While the input generic signature of a substitution map might have an infinite set of unique type parameters, the substitution map is not an arbitrary transformation of types; it preserves the ``concrete shape'' of the original type and transforms dependent member types in a certain way. From this, it follows that the structure of a substitution map is entirely determined by its behavior on a finite set of replacement types and conformances.

\index{category}
\index{morphism}
\paragraph{An even more mathematical aside} In abstract algebra, a \emph{category} is a collection of \emph{objects} and \emph{morphisms} with certain properties. Each morphism is associated with a pair of objects, the \emph{source} and \emph{target}. The set of morphisms with source $A$ and target $B$ is denoted $\mathrm{Hom}(A,~B)$. The morphisms of a category must obey certain properties:
\begin{enumerate}
\item For every object $A$, there is an \emph{identity morphism} $1_A\in\mathrm{Hom}(A, A)$.
\item If $f\in\mathrm{Hom}(A, B)$ and $g\in\mathrm{Hom}(B, C)$ are a pair of morphisms, there is a third morphism $g\circ f\in\mathrm{Hom}(A, C)$, called the \emph{composition} of $f$ and $g$.
\item Composition respects the identity: if $f\in\mathrm{Hom}(A, B)$, then $f\circ 1_A=1_B\circ f=f$.
\item Composition is associative: if $f\in\mathrm{Hom}(A, B)$, $g\in\mathrm{Hom}(B, C)$ and $h\in\mathrm{Hom}(C, D)$, then $h\circ(g\circ f)=(h\circ g)\circ f$.
\end{enumerate}
We can define \emph{the category of vector spaces} by taking the objects to be vector spaces and the morphisms to be linear transformations. We can also define \emph{the category of generic signatures}, where the objects are generic signatures and the morphisms are substitution maps, with two caveats. First, the morphism composition notation ($g\circ f$) is the opposite of our notation for substitution maps ($f \times g$). Second, in order for the identity substitution map to act as an identity morphism, we need to restrict our category to those substitution maps whose replacement types are interface types only.
We can similarly define a category where the objects are generic signatures and the morphisms are substitution maps containing contextual types only, if we take the identity morphism to be the forwarding substitution map instead of the identity substitution map.

\section{Building Substitution Maps}\label{buildingsubmaps}

Now that you've seen how to get substitution maps from types, and how to compose existing substitution maps, it's time to talk about building substitution maps from scratch using the two variants of the \textbf{get substitution map} operation.

\newcommand{\InvalidSubjectTypeSubMap}{\SubMapC{ \SubType{T}{Array<Int>} }{ \SubConf{Array<Int>:\ Sequence}\\ \SubConf{String:\ Comparable} }}

\index{get substitution map}
\index{conformance requirement}
\index{protocol substitution map}
The first variant constructs a substitution map directly from its three constituent parts: a generic signature, an array of replacement types, and an array of conformances. The arrays must have the correct length for the given generic signature---equal to the number of generic parameters for the replacement types array, and equal to the number of conformance requirements for the conformances array.

The conformances array must satisfy an additional validity condition. Recall that every conformance stores its conforming type and protocol. Each conformance in a substitution map must match the conformance requirements of the generic signature as follows:
\begin{enumerate}
\item The conforming type of a conformance must be canonically equal to the result of applying the substitution map to the subject type of the corresponding conformance requirement.
\item The protocol of a conformance must be the same as the protocol on the right hand side of the corresponding conformance requirement.
\end{enumerate}
This variant of \textbf{get substitution map} is used when constructing a substitution map from a deserialized representation, because a serialized substitution map is guaranteed to satisfy the above invariants. It is also used when building a protocol substitution map, because the shape is sufficiently simple---just a single replacement type and a single conformance.

\index{replacement type callback}
\index{type variable type}
\index{type parameter}
\index{archetype type}
\index{query substitution map functor}
\index{query type map functor}
The second variant takes the input generic signature and a pair of callbacks:
\begin{enumerate}
\item The \textbf{replacement type callback} maps a generic parameter type to a replacement type. It is invoked with each generic parameter type to populate the replacement types array.
\item The \textbf{conformance lookup callback} maps a protocol conformance requirement to a conformance. It is invoked with each conformance requirement to populate the conformances array.
\end{enumerate}
The conformance lookup callback takes three parameters:
\begin{enumerate}
\item The \emph{original type}; this is the subject type of the conformance requirement.
\item The \emph{substituted type}; this is the result of applying the substitution map to the original type, which should be canonically equal to the conforming type of the conformance that will be returned.
\item The protocol declaration named by the conformance requirement.
\end{enumerate}
The callbacks can be arbitrarily defined by the caller, but several pre-existing ``functors'' implement common behaviors.
For the replacement type callback,
\begin{enumerate}
\item The \textbf{query substitution map} functor looks up a generic parameter in an existing substitution map.
\item The \textbf{query type map} functor looks up a generic parameter in an LLVM \texttt{DenseMap}.
\end{enumerate}
For the conformance lookup callback,
\begin{enumerate}
\item The \textbf{global conformance lookup} functor performs a global conformance lookup (Section~\ref{conformance lookup}).
\item The \textbf{local conformance lookup} functor performs a local conformance lookup into another substitution map (Section~\ref{abstract conformances}).
\item The \textbf{make abstract conformance} functor asserts that the substituted type is a type variable, type parameter or archetype, and returns an abstract conformance (also in Section~\ref{abstract conformances}). It is used when it is known that the substitution map can be constructed without performing any conformance lookups, as is the case with the identity substitution map.
\end{enumerate}
\index{conformance lookup callback}
\index{abstract conformance}
\index{global conformance lookup}
\index{local conformance lookup}
\index{global conformance lookup functor}
\index{local conformance lookup functor}
\index{make abstract conformance functor}
\index{context substitution map}
Specialized types only store their generic arguments, not conformances, so the context substitution map of a specialized type is constructed by first populating a \texttt{DenseMap} with the generic arguments of the specialized type and all of its parent types, and then invoking the \textbf{get substitution map} operation with the \textbf{query type map} and \textbf{global conformance lookup} functors.

\index{identity substitution map}
The identity substitution map of a generic signature is constructed from a replacement type callback which just returns the input generic parameter, together with the \textbf{make abstract conformance} functor.

\begin{example}
Here is a substitution map which does \emph{not} satisfy the invariants specified above, and thus cannot be constructed. First, the generic signature:
\begin{quote}
\texttt{<T where T: Sequence, T.Element: Comparable>}
\end{quote}
And the substitution map:
\begin{quote}
\InvalidSubjectTypeSubMap
\end{quote}
The generic signature has two conformance requirements:
\begin{gather*} \ttbox{T:\ Sequence}\\ \ttbox{T.[Sequence]Element:\ Comparable} \end{gather*}
Applying the substitution map to the subject type of each requirement produces the expected conforming types:
\begin{gather*} \ttbox{T} \times \mathboxed{substitution map} = \ttbox{Array<Int>}\\ \ttbox{T.[Sequence]Element} \times \mathboxed{substitution map} = \ttbox{Int} \end{gather*}
The actual conforming type of the second conformance violates our invariant:
\begin{quote}
\begin{tabular}{|l|l|c|}
\hline
Actual&Expected&Correct?\\
\hline
\texttt{Array<Int>}&\texttt{Array<Int>}&\checkmark\\
\texttt{String}&\texttt{Int}&$\times$\\
\hline
\end{tabular}
\end{quote}
\end{example}

\section{Nested Nominal Types}\label{nested nominal types}

\index{nested type declaration}
Nominal type declarations can appear inside other declaration contexts, subject to the following restrictions:
\begin{enumerate}
\item Structs, enums and classes cannot be nested in generic local contexts.
\item Structs, enums and classes cannot be nested in protocols or protocol extensions.
\item Protocols cannot be nested in any declaration context other than a source file.
\end{enumerate}
We're going to explore the implementation limitations behind these restrictions, and possible future directions for lifting them. (The rest of the book talks about what the compiler does, but this section is about what the compiler \emph{doesn't} do.)

\index{local context}
\index{local type declaration}
\index{generic context}
\index{context substitution map}
\paragraph{Types in generic local contexts} This restriction is a consequence of a shortcoming in the representation of a nominal type. Recall from Chapter~\ref{types} that nominal types and generic nominal types store a parent type, and generic nominal types additionally store a list of generic arguments, corresponding to the generic parameter list of the nominal type declaration. This means there is no place to store the generic arguments from outer generic local contexts.

\begin{listing}\captionabove{A nominal type declaration in a generic local context}\label{nominal type in generic local context}
\begin{Verbatim}
func f<T>(t: T) {
  struct Nested {
    let t: T

    func printT() {
      print(t)
    }
  }

  Nested(t: t).printT()
}

func g() {
  f(t: 123)
  f(t: "hello")
}
\end{Verbatim}
\end{listing}

Listing~\ref{nominal type in generic local context} shows a nominal type nested inside of a generic function. The generic signature of \texttt{Nested} contains the generic parameter \texttt{T} from the outer generic function \texttt{f()}. However, under our rules, the declared interface type of \texttt{Nested} is a singleton nominal type, because \texttt{Nested} does not have its own generic parameter list, and its parent context is not a nominal type declaration. This means there is no way to recover a context substitution map for this type, because the generic argument for \texttt{T} is not actually stored anywhere.

In the source language, there is no way to specialize \texttt{Nested}; the reference to \texttt{T} inside \texttt{f()} is always understood to be the generic parameter \texttt{T} of the outer function. However, inside the compiler, different generic specializations can still arise. If, for example, the two calls to \texttt{f()} inside \texttt{g()} are specialized and inlined by the SIL optimizer, the two temporary instances of \texttt{Nested} must have different in-memory layouts, because in one call \texttt{T} is \texttt{Int}, and in the other \texttt{T} is \texttt{String}.

A better representation for the specializations of nominal types would replace the parent type and list of generic arguments with a single ``flat'' list that includes all outer generic arguments as well. This approach could represent generic arguments coming from outer local contexts without loss of information.

\index{runtime type metadata}
Luckily, this ``flat'' representation is already implemented in the Swift runtime. The runtime type metadata for a nominal type includes all the generic parameters from the nominal type declaration's generic signature, not just the generic parameters of the nominal type declaration itself. So while lifting this restriction would require some engineering effort on the compiler side, it would be a backward-deployable and ABI-compatible change.

\index{protocol Self type}
\paragraph{Types in protocol contexts} To allow struct, enum and class declarations to appear inside protocols and protocol extensions, a decision needs to be made as to whether the protocol \texttt{Self} type should be ``captured'' by the nested type.
\begin{listing}\captionabove{A nominal type declaration nested in a protocol context}\label{nominal type in protocol context}
\begin{Verbatim}
protocol P {}

extension P {
  struct Nested {
    let value: Self

    func method() {
      print(value)
    }
  }

  func f() {
    Nested(value: self).method()
  }
}

struct S: P {}
\end{Verbatim}
\end{listing}

If the nested type captures \texttt{Self}, the code shown in Listing~\ref{nominal type in protocol context} would become valid. With this model, the \texttt{Nested} struct depends on \texttt{Self}, so it would not make sense to reference it as a member of the protocol itself, like \texttt{P.Nested}. Instead, \texttt{Nested} would behave as if it was a member of every conforming type, like \texttt{S.Nested} above (or even \texttt{T.Nested}, if \texttt{T} is a generic parameter conforming to \texttt{P}). At the implementation level, this would mean that the generic signature of a nominal type nested inside of a protocol context would include the protocol \texttt{Self} type, and the \emph{entire} parent type, for example \texttt{S} in \texttt{S.Nested}, would become the replacement type for \texttt{Self} in the context substitution map.

The alternative approach would prohibit the nested type from referencing the protocol \texttt{Self} type. The nested type's generic signature would \emph{not} include the protocol \texttt{Self} type, and \texttt{P.Nested} would be a valid member type reference. The protocol would effectively act as a namespace for the nominal types it contains, with the nested type not depending on the conformance to the protocol in any way.

\begin{listing}\captionabove{Protocol declaration nested inside other declaration contexts}\label{protocol nested inside type}
\begin{Verbatim}
struct Outer {
  protocol P {
    func f()
  }
}

func generic<T>(_: T) {
  protocol P {
    // What does this mean?
    func f(_: T)
  }
}
\end{Verbatim}
\end{listing}

\index{protocol Self type}
\index{Haskell}
\index{multi-parameter type class}
\paragraph{Protocols in other declaration contexts} The final generalization is the ability to nest protocols inside other declaration contexts, such as functions or nominal types. This can be broken down into two cases:
\begin{enumerate}
\item Protocols inside non-generic declaration contexts.
\item Protocols inside generic declaration contexts.
\end{enumerate}
Listing~\ref{protocol nested inside type} shows both possibilities. The first case is relatively straightforward; the non-generic declaration context acts as a namespace to which the protocol declaration is scoped. In contrast, the second case would introduce significant complexity to the language design, by allowing ``generic protocols'' with more generic parameters than just the protocol \texttt{Self} type. Such a protocol would be what Haskell calls a ``multi-parameter type class.'' Unlike the prior generalizations, this one carries profound implications and tradeoffs, and it is not clear that it belongs in the Swift language.

\section{Source Code Reference}\label{substmapsourcecoderef}

Key source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/SubstitutionMap.h}
\item \SourceFile{lib/AST/SubstitutionMap.cpp}
\end{itemize}
Other source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/GenericSignature.h}
\item \SourceFile{include/swift/AST/Type.h}
\item \SourceFile{include/swift/AST/Types.h}
\end{itemize}

\index{type substitution}
\apiref{Type}{class}
See also Section~\ref{typesourceref}.
\begin{itemize}
\item \texttt{subst()} applies a substitution map to this type and returns the substituted type.
\end{itemize}

\index{declaration context}
\index{context substitution map}
\apiref{TypeBase}{class}
See also Section~\ref{typesourceref} and Section~\ref{genericsigsourceref}.
\begin{itemize}
\item \texttt{getContextSubstitutionMap()} returns this type's context substitution map with respect to the given \texttt{DeclContext}.
\end{itemize}

\index{substitution map}
\index{input generic signature}
\index{empty substitution map}
\index{substitution map composition}
\apiref{SubstitutionMap}{class}
Represents an immutable, uniqued substitution map. As with \texttt{Type} and \texttt{GenericSignature}, this class stores a single pointer, so substitution maps are cheap to pass around as values. The default constructor \texttt{SubstitutionMap()} constructs an empty substitution map. The implicit \texttt{bool} conversion tests for a non-empty substitution map.

Accessor methods:
\begin{itemize}
\item \texttt{empty()} answers if this is the empty substitution map; this is the logical negation of the \texttt{bool} implicit conversion.
\item \texttt{getGenericSignature()} returns the substitution map's input generic signature.
\item \texttt{getReplacementTypes()} returns an array of \texttt{Type}.
\item \texttt{hasAnySubstitutableParams()} answers if the input generic signature contains at least one generic parameter not fixed to a concrete type; that is, it must be non-empty and not fully concrete (see the \texttt{areAllParamsConcrete()} method of \texttt{GenericSignatureImpl} from Section~\ref{genericsigsourceref}).
\end{itemize}
Recursive properties computed from replacement types:
\begin{itemize}
\item \texttt{hasArchetypes()} answers if any of the replacement types contain a primary archetype or opened existential archetype.
\item \texttt{hasOpenedExistential()} answers if any of the replacement types contain an opened existential archetype.
\item \texttt{hasDynamicSelf()} answers if any of the replacement types contain the dynamic Self type.
\end{itemize}
Canonical substitution maps:
\begin{itemize}
\item \texttt{isCanonical()} answers if the replacement types and conformances stored in this substitution map are canonical.
\item \texttt{getCanonical()} constructs a new substitution map by canonicalizing the replacement types and conformances of this substitution map.
\end{itemize}
Composing substitution maps (Section~\ref{submapcomposition}):
\begin{itemize}
\item \texttt{subst()} applies another substitution map to this substitution map, producing a new substitution map.
\end{itemize}
Two overloads of the \texttt{get()} static method are defined for constructing substitution maps (Section~\ref{buildingsubmaps}).

\medskip
\noindent
\texttt{get(GenericSignature, ArrayRef<Type>, ArrayRef<ProtocolConformanceRef>)}\newline
builds a new substitution map from an input generic signature, an array of replacement types, and an array of conformances.

\medskip
\noindent
\texttt{get(GenericSignature, TypeSubstitutionFn, LookupConformanceFn)} builds a new substitution map by invoking a pair of callbacks to produce each replacement type and conformance.

\index{replacement type callback}
\apiref{TypeSubstitutionFn}{type alias}
The type signature of a replacement type callback for \texttt{SubstitutionMap::get()}.
\begin{verbatim}
using TypeSubstitutionFn = llvm::function_ref<
    Type(SubstitutableType *dependentType)>;
\end{verbatim}
The parameter type is always a \texttt{GenericTypeParamType *} when the callback is used with \texttt{SubstitutionMap::get()}.
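For illustration, here is a minimal sketch that builds the identity substitution map of a generic signature by hand, combining a lambda matching \texttt{TypeSubstitutionFn} with the \texttt{MakeAbstractConformance} functor described below. The helper name is hypothetical; this mirrors the description of the identity substitution map from Section~\ref{buildingsubmaps}:
\begin{Verbatim}
// Hypothetical helper, not real compiler code; in practice you
// would call GenericSignature::getIdentitySubstitutionMap().
SubstitutionMap buildIdentitySubstitutionMap(GenericSignature sig) {
  return SubstitutionMap::get(
      sig,
      [](SubstitutableType *type) -> Type {
        // Replacement type callback: map each generic
        // parameter type to itself.
        return Type(type);
      },
      MakeAbstractConformance());
}
\end{Verbatim}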
\index{query substitution map functor}
\apiref{QuerySubstitutionMap}{struct}
A functor intended to be used with \texttt{SubstitutionMap::get()} as a replacement type callback. Overloads \texttt{operator()} with the signature of \texttt{TypeSubstitutionFn}. Constructed from a \texttt{SubstitutionMap}:
\begin{Verbatim}
QuerySubstitutionMap{subMap}
\end{Verbatim}

\index{query type map functor}
\apiref{QueryTypeSubstitutionMap}{struct}
A functor intended to be used with \texttt{SubstitutionMap::get()} as a replacement type callback. Overloads \texttt{operator()} with the signature of \texttt{TypeSubstitutionFn}. Constructed from an LLVM \texttt{DenseMap}:
\begin{Verbatim}
DenseMap<SubstitutableType *, Type> typeMap;
QueryTypeSubstitutionMap{typeMap}
\end{Verbatim}

\index{conformance lookup callback}
\apiref{LookupConformanceFn}{type alias}
The type signature of a conformance lookup callback for \texttt{SubstitutionMap::get()}.
\begin{verbatim}
using LookupConformanceFn = llvm::function_ref<
    ProtocolConformanceRef(CanType origType,
                           Type substType,
                           ProtocolDecl *conformedProtocol)>;
\end{verbatim}

\index{global conformance lookup functor}
\apiref{LookUpConformanceInModule}{struct}
A functor intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn}. Constructed with a \texttt{ModuleDecl *}:
\begin{Verbatim}
LookUpConformanceInModule{moduleDecl}
\end{Verbatim}

\index{local conformance lookup functor}
\apiref{LookUpConformanceInSubstitutionMap}{struct}
A functor intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn}. Constructed with a \texttt{SubstitutionMap}:
\begin{Verbatim}
LookUpConformanceInSubstitutionMap{subMap}
\end{Verbatim}

\index{make abstract conformance functor}
\apiref{MakeAbstractConformance}{struct}
A functor intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn}. Constructed without arguments:
\begin{Verbatim}
MakeAbstractConformance()
\end{Verbatim}

\index{generic signature}
\index{identity substitution map}
\apiref{GenericSignature}{class}
See also Section~\ref{genericsigsourceref}.
\begin{itemize}
\item \texttt{getIdentitySubstitutionMap()} returns the substitution map that replaces each generic parameter with itself.
\end{itemize}

\chapter{Conformances}\label{conformances}

\index{invalid conformance}
\index{abstract conformance}
\index{concrete conformance}
\index{normal conformance}
\index{specialized conformance}
\index{inherited conformance}
\index{self conformance}
\index{specialized type}
A \emph{conformance} describes how a type satisfies the requirements of a protocol. In the previous chapter, you saw that conformances appear in substitution maps, populated by a global conformance lookup operation. Now, we will discuss their structure and the role that conformances play in type substitution, and look at global conformance lookup in more detail.

There are three kinds of conformance:
\begin{enumerate}
\item An \textbf{invalid conformance} denotes that a type does not actually conform to the protocol.
\item An \textbf{abstract conformance} denotes that a type conforms to the protocol, but it is not known where this conformance was declared. Described in Section~\ref{abstract conformances}.
\item A \textbf{concrete conformance} represents a conformance with a known definition.
\end{enumerate}
Concrete conformances are further broken down into four sub-kinds, with the first two sub-kinds being the main focus of this chapter:
\begin{enumerate}
\item A \textbf{normal conformance} represents the actual declaration of a conformance on a type or extension.
\item A \textbf{specialized conformance} is how an arbitrary specialized type conforms to a protocol.
\item A \textbf{self conformance} is how a protocol conforms to itself, which is only possible in a few very special cases. Described in Section~\ref{selfconformingprotocols}.
\item An \textbf{inherited conformance} is how a subclass conforms to a protocol when the conformance was declared on the superclass. Described in Section~\ref{inheritedconformance}.
\end{enumerate}

\index{extension declaration}
\index{nominal type declaration}
\index{inheritance clause}
\index{conformance}
\paragraph{Normal conformances} Structs, enums and classes can conform to protocols. A normal conformance represents the \emph{declaration} of such a conformance. Normal conformances are declared by referencing a protocol from the inheritance clause of a type or extension declaration:
\begin{Verbatim}
struct Horse: Animal {...}

struct Cow {...}
extension Cow: Animal {...}
\end{Verbatim}
\index{conformance lookup table}
\index{local conformance}
Each type or extension declaration has a list of \emph{local conformances}, which are the normal conformances declared on that type or extension. In the above, the struct declaration \texttt{Horse} has a single local conformance. The struct declaration \texttt{Cow} does not have any local conformances itself, but the \emph{extension} of \texttt{Cow} has one.

\index{extension declaration}
\index{extended type}
Nominal type declarations have a \emph{conformance lookup table}, which stores the local conformances of the type and any of its extensions, together with conformances inherited from the superclass, if the type declaration is a class declaration. Extension declarations do not have a conformance lookup table of their own; their local conformances are part of the extended type's conformance lookup table. The conformance lookup table is used to implement global conformance lookup; the rest of the compiler does not interact with conformance lookup tables directly.

\index{declared interface type}
\index{declaration context}
\index{generic signature}
\index{value witness}
\index{value requirement}
Broken down into constituent parts, a normal conformance stores the following:
\begin{itemize}
\item \textbf{The type:} the declared interface type of the conforming context.
\item \textbf{The protocol:} the protocol being conformed to.
\item \textbf{The conforming context:} either a nominal type declaration (if the conformance is stated on the type) or an extension thereof (if the conformance is stated on an extension).
\item \textbf{The generic signature:} the generic signature of the conforming context. If the conforming context is a nominal type declaration or an unconstrained extension, this is the generic signature of the nominal type. If the conforming context is a constrained extension, this generic signature will have additional requirements, and the conformance becomes a conditional conformance. Conditional conformances are described in Section~\ref{conditional conformance}.
\item \textbf{Type witnesses:} a mapping from each associated type of the protocol to the concrete type witnessing the associated type requirement. This is an interface type written in terms of the generic signature of the conformance. Section~\ref{type witnesses} will talk about type witnesses.
\item \textbf{Associated conformances:} a mapping from the conformance requirements of the requirement signature to a conformance of the substituted subject type to the requirement's protocol. Section~\ref{associated conformances} will talk about associated conformances.
\item \textbf{Value witnesses:} for each value requirement of the protocol, the declaration witnessing the requirement. This declaration is either a member of the conforming nominal type, an extension of the conforming nominal type, or a default implementation from a protocol extension. The mapping is more elaborate than just storing the witness declaration; Chapter~\ref{valuerequirements} goes into the details.
\end{itemize}

\index{conformance substitution map}
\paragraph{Specialized conformances} In Section~\ref{submapcomposition}, you saw that substitution maps apply to types, other substitution maps, and conformances. Applying a non-identity substitution map to a normal conformance produces a \emph{specialized conformance}, which wraps the underlying normal conformance together with the substitution map that was applied; we call this map the \emph{conformance substitution map}:
\[ \mathboxed{normal conformance} \times \mathboxed{substitution map} = \mathboxed{specialized conformance} \]
The conformance substitution map is never the identity substitution map; applying the identity substitution map to a normal conformance simply returns the original normal conformance. This avoids the overhead of constructing a specialized conformance in this case, and also has a nice mathematical interpretation: it ensures that the identity substitution map actually acts as the identity on a normal conformance:
\[ \mathboxed{normal conformance} \times \mathboxed{identity substitution map} = \mathboxed{normal conformance} \]
\paragraph{Canonical conformances}\index{canonical conformance}
Like types and substitution maps, specialized conformances are immutable and uniqued. A specialized conformance is \emph{canonical} if its substitution map is canonical. Canonicalizing a specialized conformance returns a new conformance with the same underlying conformance, and the canonicalized conformance substitution map. Normal conformances are always canonical.

\section{Conformance Lookup}\label{conformance lookup}

\index{global conformance lookup}
Conformances are typically found via \emph{global conformance lookup}, which takes a type and a protocol and returns a conformance. Global conformance lookup answers the question ``does a type conform to a protocol'': when the result is an invalid conformance, the type does not conform; otherwise, it conforms.

\index{action}
\index{protocol declaration}
\index{commutative diagram}
Global conformance lookup can be understood as the action of a protocol declaration on the left of a type:
\[\mathboxed{protocol declaration} \times \mathboxed{type} = \mathboxed{conformance}\]
All conformances store the conformed protocol and conforming type. The conforming type of a valid conformance found by global conformance lookup is canonical-equal to the type that was handed to the lookup operation (the two types might differ by type sugar, so are not required to be pointer-equal).
We can exhibit this with a \emph{commutative diagram}:
\begin{quote}
\begin{tikzcd}[column sep=3cm,row sep=1cm] \mathboxed{type} \arrow[r, bend left, "\text{look up conformance}"] &\mathboxed{conformance}\arrow[l, bend left, "\text{get conforming type}"] \end{tikzcd}
\end{quote}
A commutative diagram is a diagram where every path with the same start and end leads to the same result. The above commutative diagram shows two pairs of paths:
\begin{enumerate}
\item Starting from a type, we look up its conformance to a fixed protocol, and get the conforming type of this conformance. This takes us back to the original type.
\item Starting from a conformance, we get its conforming type, and perform a global conformance lookup with this type and our protocol. This gives us the original conformance.
\end{enumerate}
Global conformance lookup always returns a normal conformance when given the declared interface type of a type declaration that directly conforms to the protocol:
\[\mathboxed{protocol declaration}\times\mathboxed{declared interface type}=\mathboxed{normal conformance}\]
The conforming type of a normal conformance is the declared interface type; plugging this information into our commutative diagram gives us the following:
\begin{quote}
\begin{tikzcd}[column sep=3cm,row sep=1cm] \mathboxed{declared interface type} \arrow[r, bend left, "\text{look up conformance}"] &\mathboxed{normal conformance}\arrow[l, bend left, "\text{get conforming type}"] \end{tikzcd}
\end{quote}
You will see more equations and commutative diagrams in the next section, after a brief interlude where we discuss a conceptual difficulty.

\index{coherence}
\index{module declaration}
\index{dynamic cast}
\paragraph{Coherence} In reality, our diagram above hand-waves away a significant complication. Since a conformance can be declared on an extension, and the extended type might be defined in a different module, it is possible for two modules to define the same conformance in two different ways. Global conformance lookup is not guaranteed to be \emph{coherent}.

For example, imagine if there were two different conformances of some concrete type \texttt{K} to \texttt{Hashable}. Then it would be possible for two different modules to construct values of type \texttt{Set<K>} with incompatible hash functions; passing such a value from one module to the other would result in undefined behavior.

For now, there's no real answer to this dilemma. The compiler rejects duplicate conformance definitions if an existing conformance is statically visible. This scenario cannot occur with \texttt{Int} and \texttt{Hashable}, for instance, because the conformance of \texttt{Int} to \texttt{Hashable} in the standard library is always visible, so any attempt to define a new conformance would be diagnosed as an error. However, if the concrete type \texttt{K} is defined in some common module, and two separately-compiled modules both define a conformance of \texttt{K} to \texttt{Hashable}, a module that imports all three will observe both conformances statically, with unpredictable results.

A similar scenario can occur with library evolution. Suppose a library publishes the concrete type \texttt{K}, and a third party defines a conformance of \texttt{K} to \texttt{Hashable}. If the library vendor then adds their own conformance of \texttt{K} to \texttt{Hashable}, the previously-compiled client might encounter incorrect behavior at runtime.
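In terms of the compiler entry points (with approximate signatures; the helper name is hypothetical), the round trip in the first commutative diagram looks something like this:
\begin{Verbatim}
// Hypothetical helper; signatures approximate. Performs a global
// conformance lookup and checks that the conforming type of a
// valid concrete conformance round-trips to the original type.
void checkRoundTrip(ModuleDecl *module, Type type,
                    ProtocolDecl *proto) {
  auto ref = module->lookupConformance(type, proto);
  if (ref.isInvalid())
    return;  // the type does not conform to the protocol

  if (ref.isConcrete()) {
    // Canonical-equal to `type', but not necessarily
    // pointer-equal, because of type sugar.
    Type conformingType = ref.getConcrete()->getType();
    assert(conformingType->isEqual(type));
  }
}
\end{Verbatim}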
The global conformance lookup operation as implemented by the compiler actually takes a module declaration as an input, along with the type and protocol. The intent behind passing the module was that it should be taken into account somehow, perhaps restricting the search to those conformances that are transitively visible via import declarations, with an error diagnostic in the case of a true ambiguity. At the time of writing, this module declaration is ignored.

The runtime equivalent of a global conformance lookup is a \emph{dynamic cast} from a concrete type to an existential type. Dynamic casts suffer from a similar ambiguity issue. To be coherent, this dynamic cast operation would need to inspect something akin to a module import graph reified at the call site to be able to disambiguate duplicate conformances.

In the absence of proper compiler support for addressing this problem, there is a rule of thumb that, if followed by Swift users, mostly guarantees coherence. The rule is that when defining a conformance on an extension, either the extended type or the protocol should be part of the current module. That is, the following is fine, because our own type conforms to a standard library protocol:
\begin{Verbatim}
struct MyType {...}
extension MyType: Hashable {...}
\end{Verbatim}
This is fine too, because a standard library type conforms to our own protocol:
\begin{Verbatim}
protocol MyProtocol {...}
extension Int: MyProtocol {...}
\end{Verbatim}
However, the next example is potentially problematic; we're defining the conformance of a standard library type to a standard library protocol, and nothing prevents some other module from declaring the same conformance:
\begin{Verbatim}
extension String.UTF8View: Hashable {...}
\end{Verbatim}
\index{retroactive conformance}
A conformance where neither the conforming type nor the protocol is part of the current module is called a \emph{retroactive conformance}. Today, retroactive conformances are allowed without any restrictions. In a future compiler version, they might generate a warning.

Unfortunately, avoiding retroactive conformances does not completely solve the issue either, because there is another possible hole with class inheritance and library evolution. Consider a framework which defines an open class and a protocol:
\begin{Verbatim}
public protocol MyProtocol {}

open class BaseClass {}
\end{Verbatim}
A client might declare a subclass of \texttt{BaseClass} and conform it to \texttt{MyProtocol}, concluding that it is safe to do so because the conforming type, \texttt{DerivedClass}, is owned by the client, and thus this is not a retroactive conformance:
\begin{Verbatim}
import OtherLibrary

class DerivedClass: BaseClass {}
extension DerivedClass: MyProtocol {}
\end{Verbatim}
However, in the next version of the framework, the framework author might decide to conform \texttt{BaseClass} to \texttt{MyProtocol}. At this point, \texttt{DerivedClass} has two duplicate conformances to \texttt{MyProtocol}: the inherited conformance from \texttt{BaseClass}, and the local conformance of \texttt{DerivedClass}.

\section{Conformance Substitution}\label{conformance subst}

\index{normal conformance}
\index{declared interface type}
\index{specialized type}
\index{specialized conformance}
If global conformance lookup returns a normal conformance when given the declared interface type of a nominal type declaration, the natural question is what it should return given an arbitrary specialized type of a nominal type declaration.
As you might guess, the answer is that it returns a specialized conformance. Recall the following three equations: \begin{enumerate} \item First, the factorization of a specialized type into the declared interface type of some nominal type declaration, together with a substitution map, from Section~\ref{contextsubstmap}: \[\mathboxed{declared interface type} \times \mathboxed{substitution map} = \mathboxed{specialized type}\] \item Next, the notation for global conformance lookup from the previous section: \[\mathboxed{protocol declaration} \times \mathboxed{type} = \mathboxed{conformance}\] \item And finally, the fact that global conformance lookup returns a normal conformance when given a declared interface type, also from the previous section: \[\mathboxed{protocol declaration} \times \mathboxed{declared interface type} = \mathboxed{normal conformance}\] \end{enumerate} We want to answer the question of what it means to perform a global conformance lookup with an arbitrary specialized type: \[ \mathboxed{protocol declaration} \times \mathboxed{specialized type} = \mathboxed{?} \] If we substitute equation (1) into the above, we get the following: \begin{gather*} \mathboxed{protocol declaration} \times \mathboxed{specialized type}\\ = \mathboxed{protocol declaration} \times \left(\,\mathboxed{declared interface type} \times \mathboxed{substitution map}\,\right) \end{gather*} Now, here's the trick. A binary operation $\times$ is \emph{associative} if the placement of parentheses doesn't matter; that is, if $(A\times B)\times C=A\times (B\times C)$. We want global conformance lookup and type substitution to be associative. This means we should be able to change the placement of the parentheses in the above equation while getting the same result: \begin{gather*} \mathboxed{protocol declaration} \times \mathboxed{specialized type}\\ = \mathboxed{protocol declaration} \times \left(\,\mathboxed{declared interface type} \times \mathboxed{substitution map}\,\right)\\ = \left(\,\mathboxed{protocol declaration} \times \mathboxed{declared interface type}\,\right) \times \mathboxed{substitution map} \end{gather*} Now, by equation (3), the term inside the parentheses gives us a normal conformance: \begin{multline*} \mathboxed{protocol declaration} \times \mathboxed{specialized type}\\ = \mathboxed{protocol declaration} \times \left(\,\mathboxed{declared interface type} \times \mathboxed{substitution map}\,\right)\\ = \left(\,\mathboxed{protocol declaration} \times \mathboxed{declared interface type}\,\right) \times \mathboxed{substitution map}\\ = \mathboxed{normal conformance} \times \mathboxed{substitution map} \end{multline*} So there you go---global conformance lookup with a specialized type returns a specialized conformance, whose conformance substitution map is the context substitution map of the specialized type. 
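As a hedged sketch of this conclusion in terms of the compiler API (the helper name is hypothetical, signatures are approximate, and we assume the base type is a specialized struct or enum type, so the result is neither inherited nor abstract):
\begin{Verbatim}
// Hypothetical helper; signatures approximate. For a specialized
// struct or enum type, global conformance lookup returns a
// specialized conformance; its conformance substitution map is
// the context substitution map of the specialized type.
SubstitutionMap conformanceSubMap(ModuleDecl *module,
                                  Type specializedType,
                                  ProtocolDecl *proto) {
  auto ref = module->lookupConformance(specializedType, proto);
  auto *conf =
      cast<SpecializedProtocolConformance>(ref.getConcrete());
  return conf->getSubstitutionMap();
}
\end{Verbatim}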
\index{conformance substitution map}
Similarly, the conforming type of a specialized conformance is the specialized type we get by applying the conformance substitution map to the conforming type of the normal conformance; this has to be the case, because of our commutative diagram:
\begin{quote}
\begin{tikzcd}[column sep=3cm,row sep=1cm] \mathboxed{specialized type} \arrow[r, bend left, "\text{conformance lookup}"] &\mathboxed{specialized conformance}\arrow[l, bend left, "\text{conforming type}"] \end{tikzcd}
\end{quote}
You've now seen what it means to apply a substitution map to a normal conformance, and how this operation arises naturally in the implementation of global conformance lookup. As it turns out, you can apply substitution maps to specialized conformances as well.
\[ \mathboxed{specialized conformance}\times\mathboxed{substitution map 2}=\mathboxed{?} \]
(There are going to be two substitution maps in play now, the conformance substitution map of the specialized conformance, and the substitution map being applied, so let's label them ``substitution map 1'' and ``substitution map 2.'')

\index{output generic signature}
\index{input generic signature}
First, we need a notion of the output generic signature of a conformance; we want the input generic signature of the applied substitution map to equal the output generic signature of the conformance. For a normal conformance, the output generic signature is the generic signature of the conforming context. For a specialized conformance, the output generic signature is the output generic signature of the conformance substitution map. As with substitution maps and interface types, the output generic signature of a specialized conformance isn't actually stored; it is implicit from usage.

\index{conformance substitution map}
\index{substitution map composition}
With this out of the way, we can proceed to derive our equation. First, we factor the specialized conformance into a normal conformance together with a substitution map:
\begin{gather*} \mathboxed{specialized conformance}\times\mathboxed{substitution map 2}\\ =\left(\,\mathboxed{normal conformance} \times \mathboxed{substitution map 1}\,\right) \times \mathboxed{substitution map 2} \end{gather*}
Once again, we move the parentheses around, because we want this action to be associative:
\begin{multline*} \mathboxed{specialized conformance}\times\mathboxed{substitution map 2}\\ =\left(\,\mathboxed{normal conformance} \times \mathboxed{substitution map 1}\,\right) \times \mathboxed{substitution map 2}\\ =\mathboxed{normal conformance} \times \left(\,\mathboxed{substitution map 1} \times \mathboxed{substitution map 2}\,\right) \end{multline*}
Finally, we can simplify inside of the parentheses on the third line above, by composing the two substitution maps. This gives us our answer: applying a substitution map to a specialized conformance builds a new specialized conformance with the same underlying normal conformance, and a new conformance substitution map obtained by composing the old conformance substitution map with the given substitution map.
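The same derivation can be sketched against the implementation. The helper name below is hypothetical and the signatures are approximate; the relevant entry points are assumed to be \texttt{ProtocolConformance::subst()} and the \texttt{subst()} method on \texttt{SubstitutionMap} from Section~\ref{submapcomposition}:
\begin{Verbatim}
// Hypothetical helper; signatures approximate. Both sides should
// produce the same specialized conformance.
void checkConformanceSubst(SpecializedProtocolConformance *conf,
                           SubstitutionMap subMap2) {
  // Left hand side: apply `subMap2' to the specialized conformance.
  auto *lhs = conf->subst(subMap2);

  // Right hand side: compose the conformance substitution map with
  // `subMap2', then apply the composition to the underlying normal
  // conformance.
  auto composed = conf->getSubstitutionMap().subst(subMap2);
  auto *rhs = conf->getGenericConformance()->subst(composed);

  assert(lhs == rhs);
}
\end{Verbatim}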
Recall that substitution map composition is associative:
\begin{multline*}
\left(\,\mathboxed{substitution map 1}\times\mathboxed{substitution map 2}\,\right)\times \mathboxed{substitution map 3}\\
=\mathboxed{substitution map 1} \times \left(\,\mathboxed{substitution map 2} \times \mathboxed{substitution map 3}\,\right)
\end{multline*}
The above together with everything else from this section can be combined into one final identity. You just saw the same identity for normal conformances; it holds for specialized conformances too:
\begin{multline*}
\left(\,\mathboxed{specialized conformance} \times \mathboxed{substitution map 2}\,\right) \times \mathboxed{substitution map 3}\\
= \mathboxed{specialized conformance} \times \left(\,\mathboxed{substitution map 2} \times \mathboxed{substitution map 3}\,\right)
\end{multline*}
\section{Type Witnesses}\label{type witnesses}
\index{type witness}
\index{type alias declaration}
\index{generic parameter declaration}
\index{associated type inference}
\index{conformance checking}
\index{value requirement}
A concrete type fulfills the associated type requirements of a protocol by declaring a \emph{type witness} for each associated type. Type witnesses are declared in one of four ways:
\begin{enumerate}
\item Via a \textbf{member type declaration} having the same name as the associated type. Usually this member type is a type alias, but it is legal to use a nested nominal type declaration as well. If the conforming type is a class, the member type may also be defined in a superclass.
\item Via a \textbf{generic parameter} having the same name as the associated type. If the conforming type is not generic but is nested inside of a generic context, a generic parameter of the innermost generic context can be used.\footnote{The latter being allowed was probably an oversight, but it's the behavior implemented today.}
\item Via \textbf{associated type inference}, where it is implicitly derived from the declaration of a witness to a value requirement.
\item Via a \textbf{default type witness} on the associated type declaration, which is used if all else fails.
\end{enumerate}
The conformance checker is responsible for resolving type witnesses and ensuring they satisfy the requirements of the protocol's requirement signature, as described earlier in Section~\ref{requirement sig}. The problem of checking whether concrete types satisfy generic requirements is covered in Section~\ref{checking generic arguments}.
\begin{listing}\captionabove{Different ways of declaring a type witness in a conformance}\label{type witness listing}
\begin{Verbatim}
protocol P {
  associatedtype T = Int

  func f(_: T)
}

extension P {
  func f(_: T) {}
}

struct WithMemberType: P {
  struct T {}
}

struct WithGenericParam<T>: P {}

struct WithInferredType: P {
  func f(_: String) {}
}

struct WithDefault: P {}
\end{Verbatim}
\end{listing}
\index{synthesized declaration}
\begin{example}
Listing~\ref{type witness listing} illustrates all four possibilities. In all cases other than the first, the conformance checker synthesizes a type alias declaration with the same name as the associated type. This type alias declaration is visible as a member of the concrete conforming type.
For this reason, it appears at first glance that the generic parameter \texttt{T} is a member type of \texttt{WithGenericParam}:
\begin{Verbatim}
func squared(_ x: WithGenericParam<Int>.T) -> Int {
  return x * x
}
\end{Verbatim}
However, the member type \texttt{T} is not the generic parameter declaration itself, but the synthesized type alias declaration. If \texttt{WithGenericParam} did not declare a conformance to \texttt{P}, there would be no member type named \texttt{T}, because generic parameter declarations are not visible as member types.
\end{example}
\index{type witness}
\index{associated type declaration}
\index{protocol substitution map}
\paragraph{Projection}
Given a conformance and an associated type of the conformed protocol, we can ask the conformance for the corresponding type witness. The next section will explain how type substitution of dependent member types uses the type witnesses of a conformance, but first we need to develop the ``algebra'' of type witnesses.
\index{action}
\index{conformance substitution map}
We can understand getting a type witness out of a conformance as the action of an associated type declaration (of a protocol) on the left of a conformance (to this protocol):
\[
\mathboxed{associated type} \times \mathboxed{conformance} = \mathboxed{type witness}
\]
Normal conformances directly store type witnesses as interface types for the conforming context's generic signature; getting a type witness from a normal conformance projects this stored value:
\[
\mathboxed{associated type} \times \mathboxed{normal conformance} = \mathboxed{type witness}
\]
Next, we need to understand what it means to get a type witness from a specialized conformance:
\[
\mathboxed{associated type} \times \mathboxed{specialized conformance} = \mathboxed{?}
\]
As it turns out, this is implemented by applying the conformance substitution map to the corresponding type witness from the underlying normal conformance. To understand why, we start by writing the specialized conformance as a substitution map applied to a normal conformance:
\begin{gather*}
\mathboxed{associated type} \times \mathboxed{specialized conformance}\\
= \mathboxed{associated type} \times \left(\,\mathboxed{normal conformance} \times \mathboxed{substitution map}\,\right)
\end{gather*}
Then, we repeat our magic trick---we want this action to be associative, so we move the parentheses around:
\begin{gather*}
\mathboxed{associated type} \times \mathboxed{specialized conformance}\\
= \mathboxed{associated type} \times \left(\,\mathboxed{normal conformance} \times \mathboxed{substitution map}\,\right)\\
= \left(\,\mathboxed{associated type} \times \mathboxed{normal conformance}\,\right) \times \mathboxed{substitution map}
\end{gather*}
Therefore, the problem of projecting a type witness of a specialized conformance reduces to applying the conformance substitution map to a type witness of the underlying normal conformance:
\begin{multline*}
\mathboxed{associated type} \times \mathboxed{specialized conformance}\\
= \mathboxed{associated type} \times \left(\,\mathboxed{normal conformance} \times \mathboxed{substitution map}\,\right)\\
= \left(\,\mathboxed{associated type} \times \mathboxed{normal conformance}\,\right) \times \mathboxed{substitution map}\\
= \mathboxed{type witness} \times \mathboxed{substitution map}
\end{multline*}
\index{commutative diagram}
The above equations show that getting a type witness of a specialized conformance fits nicely with our notational formalism.
Another way to convince yourself that this makes sense is with a commutative diagram. Figure~\ref{type witness diagram} shows a commutative diagram relating global conformance lookup with getting a type witness from a specialized conformance:
\begin{enumerate}
\item Starting from the declared interface type of a nominal type declaration, we can look up the conformance to a protocol, and get the type witness for an associated type out of this conformance.
\item If we apply a substitution map to the declared interface type, we get a specialized type. Global conformance lookup with the specialized type returns a specialized conformance. Getting a type witness from the specialized conformance applies the substitution map to the type witness of the normal conformance.
\item Each horizontal arrow applies \emph{the same} substitution map, which is the context substitution map of the specialized type.
\end{enumerate}
\begin{figure}\captionabove{Type witnesses of normal and specialized conformances}\label{type witness diagram}
\begin{center}
\begin{tikzcd}[column sep=3cm,row sep=1cm]
\mathboxed{declared interface type} \arrow[d, "\text{look up conformance}"{left}] \arrow[r, "\text{substitution}"]
&\mathboxed{specialized type} \arrow[d, "\text{look up conformance}"] \\
\mathboxed{normal conformance} \arrow[r, "\text{substitution}"] \arrow[d, "\text{get type witness}"{left}]&\mathboxed{specialized conformance} \arrow[d, "\text{get type witness}"]\\
\mathboxed{type witness} \arrow[r, "\text{substitution}"]&\mathboxed{specialized type witness}
\end{tikzcd}
\end{center}
\end{figure}
\index{output generic signature}
\index{conformance substitution map}
We saw that the type witnesses of a normal conformance are written in terms of the conforming context's generic signature. For a specialized conformance, they are written in terms of the output generic signature of the conformance substitution map.
\begin{example}
To make this concrete, say we look up the conformance of \texttt{Array} to \texttt{Sequence}, and then get the type witness for the \texttt{Element} associated type. The declared interface type of \texttt{Array} is \texttt{Array<Element>}, where \texttt{Element} is the generic parameter of \texttt{Array}. The type witness of the \texttt{Element} associated type in the normal conformance of \texttt{Array} to \texttt{Sequence} is the \texttt{Element} generic parameter type. Our specialized type is \texttt{Array<Int>}. The context substitution map of \texttt{Array<Int>} replaces $\texttt{Element}$ with $\texttt{Int}$:
\[\SubMap{\SubType{Element}{Int}}\]
Figure~\ref{type witness diagram example} shows the commutative diagram for this case. Each horizontal arrow in the commutative diagram represents the application of this substitution map to a type or conformance. Since the diagram is commutative, we can start at the top left and always end up at the bottom right, independent of which of the three paths we take.
\begin{figure}\captionabove{Type witnesses of the conformances of \texttt{Array<Element>} and \texttt{Array<Int>} to \texttt{Sequence}}\label{type witness diagram example}
\begin{center}
\begin{tikzcd}[column sep=3cm,row sep=1cm]
\ttbox{Array<Element>} \arrow[d, "\text{look up conformance}"{left}] \arrow[r, "\text{substitution}"]
&\ttbox{Array<Int>} \arrow[d, "\text{look up conformance}"] \\
\ttbox{Array<Element>:\ Sequence} \arrow[r, "\text{substitution}"] \arrow[d, "\text{get type witness}"{left}]&\ttbox{Array<Int>:\ Sequence} \arrow[d, "\text{get type witness}"]\\
\ttbox{Element} \arrow[r, "\text{substitution}"]&\ttbox{Int}
\end{tikzcd}
\end{center}
\end{figure}
\end{example}
\section{Abstract Conformances}\label{abstract conformances}
\index{abstract conformance}
\index{generic signature}
\index{archetype type}
\index{requiresProtocol()}
An \emph{abstract conformance} represents the conformance of a type parameter (or archetype) to a protocol, where the type parameter is understood to satisfy the \texttt{requiresProtocol()} generic signature query of some generic signature:
\[
\mathboxed{protocol declaration} \times \mathboxed{type parameter} = \mathboxed{abstract conformance}
\]
The conforming type of an abstract conformance is a type parameter. Similar to normal conformances and specialized conformances, we can show the relationship between an abstract conformance and its conforming type with a commutative diagram:
\begin{quote}
\begin{tikzcd}[column sep=3cm,row sep=1cm]
\mathboxed{type parameter} \arrow[r, bend left, "\text{look up conformance}"]
&\mathboxed{abstract conformance}\arrow[l, bend left, "\text{get conforming type}"]
\end{tikzcd}
\end{quote}
\index{commutative diagram}
\index{dependent member type}
\index{bound dependent member type}
\index{type substitution}
Abstract conformances allow us to formalize the behavior of type substitution with a dependent member type:
\[
\mathboxed{dependent member type} \times \mathboxed{substitution map} = \mathboxed{?}
\]
Before we can solve the above, consider a bound dependent member type \texttt{T.[P]A} in some generic signature, with base type \texttt{T} and associated type \texttt{A} of protocol \texttt{P}. If the compiler was able to form this dependent member type, either by type resolution or some other means, it necessarily follows that \texttt{T} conforms to \texttt{P} in the type parameter's generic signature. This allows us to factor the dependent member type into an associated type declaration together with an abstract conformance:
\[
\ttbox{[P]A} \times \ttbox{T:\ P} = \ttbox{T.[P]A}
\]
In other words, dependent member types are the type witnesses of abstract conformances:
\[
\mathboxed{associated type} \times \mathboxed{abstract conformance} = \mathboxed{dependent member type}
\]
This gives us the following equation:
\begin{gather*}
\mathboxed{dependent member type} \times \mathboxed{substitution map}\\
= \left(\,\mathboxed{associated type} \times \mathboxed{abstract conformance} \,\right) \times \mathboxed{substitution map}
\end{gather*}
Next we switch the parentheses around:
\begin{multline*}
\mathboxed{dependent member type} \times \mathboxed{substitution map}\\
= \left(\,\mathboxed{associated type} \times \mathboxed{abstract conformance} \,\right) \times \mathboxed{substitution map}\\
= \mathboxed{associated type} \times \left(\,\mathboxed{abstract conformance} \times \mathboxed{substitution map} \,\right)
\end{multline*}
You saw how normal and specialized conformances are substituted in Section~\ref{conformance subst}.
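To make the factoring concrete: for the dependent member type \texttt{S.[Sequence]Element}, written in some generic signature where \texttt{S} conforms to \texttt{Sequence}, the derivation specializes to:
\begin{multline*}
\ttbox{S.[Sequence]Element} \times \mathboxed{substitution map}\\
= \left(\,\ttbox{[Sequence]Element} \times \ttbox{S:\ Sequence}\,\right) \times \mathboxed{substitution map}\\
= \ttbox{[Sequence]Element} \times \left(\,\ttbox{S:\ Sequence} \times \mathboxed{substitution map}\,\right)
\end{multline*}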
Now, it appears we need the ability to apply a substitution map to an abstract conformance. This operation is called \emph{local conformance lookup}. Whereas global conformance lookup takes a specialized type and a protocol, local conformance lookup starts from a substitution map, a protocol, and a type parameter for the substitution map's input generic signature.
\index{local conformance lookup}
\index{substitution map}
Indeed, local conformance lookup is the missing piece of the puzzle for understanding the implementation of type substitution with a dependent member type:
\begin{enumerate}
\item First, we factor the dependent member type into an associated type declaration together with an abstract conformance. The abstract conformance can be further broken down into a conforming type (the dependent member type's base type) and a protocol (the associated type declaration's parent protocol).
\item Next, we perform a local conformance lookup into the substitution map, with the base type and protocol.
\item Finally, we get the associated type declaration's corresponding type witness from the conformance returned by local conformance lookup.
\end{enumerate}
Local conformance lookup is \emph{compatible} with global conformance lookup, in the following sense: a local conformance lookup with some substitution map, base type and protocol returns the same conformance as first applying the substitution map to the base type, followed by a global conformance lookup with the substituted type and our protocol. This can be expressed in our formalism with the following equation:
\begin{multline*}
\mathboxed{abstract conformance} \times \mathboxed{substitution map}\\
= \left(\, \mathboxed{protocol declaration} \times \mathboxed{type parameter} \,\right) \times \mathboxed{substitution map}\\
= \mathboxed{protocol declaration} \times \left(\, \mathboxed{type parameter} \times \mathboxed{substitution map} \,\right)\\
= \mathboxed{protocol declaration} \times \mathboxed{substituted type}
\end{multline*}
Local conformance lookup is not actually implemented in terms of global conformance lookup, though. Instead, the result is derived directly from the conformances stored in the substitution map itself. The simplest case is when the abstract conformance directly names a conformance requirement in the substitution map's input generic signature; local conformance lookup returns the corresponding conformance stored in the substitution map. In the general case, local conformance lookup derives the conformance via a \emph{conformance path}. This will be revealed in Chapter~\ref{conformance paths}.
\begin{listing}\captionabove{Applying a substitution map to a dependent member type}\label{dmt subst map listing}
\begin{Verbatim}
struct Concatenation<Elements: Sequence>
    where Elements.Element: Sequence {
  typealias InnerIterator = Elements.Element.Iterator
}

// What is the type of `iter'?
let iter: Concatenation<Array<Array<Int>>>.InnerIterator = ...
\end{Verbatim}
\end{listing}
\eject
\begin{example}
Listing~\ref{dmt subst map listing} shows an example of dependent member type substitution.\footnote{Are you getting bored of endless variations on \texttt{Array} yet? Feel free to suggest more varied examples!} We're going to work through how the compiler derives the type of the \texttt{iter} variable. The type annotation references the \texttt{InnerIterator} member type alias with a base type of \texttt{Concatenation<Array<Array<Int>>>}, so we need to apply the context substitution map of this base type to the underlying type of the type alias declaration.
The generic signature of \texttt{Concatenation} is the following:
\begin{quote}
\begin{verbatim}
<Elements where Elements : Sequence, Elements.Element : Sequence>
\end{verbatim}
\end{quote}
The context substitution map of \texttt{Concatenation<Array<Array<Int>>>} is a substitution map for the above input generic signature:
\begin{quote}
\SubMapC{
\SubType{Elements}{Array<Array<Int>>}
}{
\SubConf{Array<Array<Int>>:\ Sequence}\\
\SubConf{Array<Int>:\ Sequence}
}
\end{quote}
The underlying type of the \texttt{InnerIterator} type alias is the bound dependent member type \verb|Elements.[Sequence]Element.[Sequence]Iterator|. To apply our substitution map to this dependent member type, the compiler performs the three steps outlined earlier in this section:
\begin{enumerate}
\item The base type of the dependent member type is \verb|Elements.[Sequence]Element|, and the associated type \texttt{Iterator} is defined in the \texttt{Sequence} protocol. Therefore the abstract conformance is
\begin{quote}
\verb|Elements.[Sequence]Element: Sequence|
\end{quote}
\item Applying the substitution map to this abstract conformance performs a local conformance lookup into the substitution map. The conforming type and protocol of the abstract conformance is exactly equal to the second conformance requirement in the generic signature, so the local conformance lookup returns the conformance \verb|Array<Int>: Sequence|.
\item The final step projects the type witness for \texttt{Iterator} from this conformance. This is a specialized conformance, with the conformance substitution map:
\begin{quote}
\SubMap{\SubType{Element}{Int}}
\end{quote}
Recall that projecting a type witness from a specialized conformance is defined by first projecting the type witness from the underlying normal conformance, in our case \verb|Array<Element>: Sequence|, and then applying the conformance substitution map, shown above. The type witness for \texttt{Iterator} in our normal conformance is an interface type written with respect to the generic signature of \texttt{Array}, which is \verb|<Element>|:
\begin{quote}
\verb|IndexingIterator<Array<Element>>|
\end{quote}
Applying the conformance substitution map from our specialized conformance to this interface type replaces the \texttt{Element} generic parameter with \texttt{Int}:
\begin{quote}
\verb|IndexingIterator<Array<Int>>|
\end{quote}
\end{enumerate}
So the type of \texttt{iter} is \verb|IndexingIterator<Array<Int>>|.
\end{example}
\begin{example}
If you're particularly attentive, you'll remember from Section~\ref{buildingsubmaps} that the construction of the context substitution map of a specialized type is a little tricky, because we have to recursively compute the substituted subject type of each conformance requirement in the generic signature and then perform a global conformance lookup. In the previous example, the generic signature of \texttt{Concatenation} has two conformance requirements, and their original and substituted subject types are as follows:
\begin{gather*}
\ttbox{Elements} \Rightarrow \ttbox{Array<Array<Int>>}\\
\ttbox{Elements.[Sequence]Element} \Rightarrow \ttbox{Array<Int>}
\end{gather*}
The computation of each substituted subject type can be understood as applying the \emph{partially-constructed} context substitution map that has been built so far to each original subject type.
\eject
For the first subject type, the substitution trivially projects the replacement type of the \texttt{Elements} generic parameter:
\[
\ttbox{Elements} \times \SubMapC{
\SubType{Elements}{Array<Array<Int>>}
}{
\multicolumn{3}{|l|}{---}\\
\multicolumn{3}{|l|}{---}
} = \ttbox{Array<Array<Int>>}
\]
The second time around, the original subject type is itself a dependent member type, so type substitution recursively performs the same dance with a local conformance lookup and type witness projection---if you like, you can work this one out with pen and paper to convince yourself that it is so:
\begin{multline*}
\ttbox{Elements.[Sequence]Element} \times \SubMapC{
\SubType{Elements}{Array<Array<Int>>}
}{
\SubConf{Array<Array<Int>>:\ Sequence}\\
\multicolumn{3}{|l|}{---}
}\\
= \ttbox{Array<Int>}
\end{multline*}
\end{example}
\index{protocol substitution map}
\paragraph{Protocol substitution maps}
Recall the protocol substitution map construction from Section~\ref{contextsubstmap}, which wraps a conformance \texttt{T:\ P} in a substitution map for the protocol's generic signature \verb|<Self where Self : P>|. Suppose that our protocol \texttt{P} declares an associated type \texttt{A}, and the type witness for \texttt{A} in the conformance \verb|T: P| is some type \texttt{X}. We can now show that the following two are equivalent:
\begin{enumerate}
\item Projecting the type witness for \texttt{A} from the conformance \verb|T: P|:
\[\ttbox{A} \times \ttbox{T:\ P} = \ttbox{X}\]
\item Applying the protocol substitution map to the declared interface type of \texttt{A}, which is the dependent member type \texttt{Self.[P]A}:
\[\ttbox{Self.[P]A} \times \SubMapC{
\SubType{Self}{T}
}{
\SubConf{T:\ P}
} = \ttbox{X}\]
\end{enumerate}
To see why, we need to recall two facts. First, the dependent member type \texttt{Self.[P]A} can be written as the type witness of \texttt{A} in the abstract conformance \verb|Self: P|. Second, applying the protocol substitution map to \verb|Self: P| performs a local conformance lookup which simply projects the original conformance from the substitution map. Therefore, we have:
\begin{gather*}
\ttbox{Self.[P]A} \times \SubMapC{
\SubType{Self}{T}
}{
\SubConf{T:\ P}
}\\[\medskipamount]
= \left(\,\ttbox{A} \times \ttbox{Self:\ P}\,\right) \times \SubMapC{
\SubType{Self}{T}
}{
\SubConf{T:\ P}
}\\[\medskipamount]
= \ttbox{A} \times \left(\,\ttbox{Self:\ P} \times \SubMapC{
\SubType{Self}{T}
}{
\SubConf{T:\ P}
}\,\right)\\[\medskipamount]
= \ttbox{A} \times \ttbox{T:\ P}\\[\medskipamount]
= \ttbox{X}
\end{gather*}
\section{Associated Conformances}\label{associated conformances}
There is an interesting duality between substitution maps and (normal) conformances, illustrated in Table~\ref{substitution map conformance duality}.
\index{associated conformance}
\index{requirement signature}
\index{conformance requirement}
\index{substitution map}
A substitution map records a replacement type for each generic parameter of a generic signature, and as you saw in Section~\ref{type witnesses}, a normal conformance records a type witness for each associated type of a protocol. A substitution map also stores a conformance for each conformance requirement in its generic signature. A normal conformance stores an \emph{associated conformance} for each conformance requirement in the protocol's requirement signature. Recall from Section~\ref{requirement sig} that the printed representation of a requirement signature looks like a generic signature with a single \texttt{Self} generic parameter.
For example, here is the abridged requirement signature of the standard library's \texttt{Collection} protocol:
\begin{quote}
\texttt{<Self where Self : Sequence, Self.Index : Comparable, Self.Indices : Collection, Self.SubSequence : Collection>}
\end{quote}
The special case of an associated conformance requirement with a subject type of \texttt{Self} represents a protocol inheritance relationship, as you already saw in Section~\ref{requirement sig}. Other associated conformance requirements constrain the protocol's associated types.
\begin{table}\captionabove{Duality between substitution maps and conformances}\label{substitution map conformance duality}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\textbf{Substitution map}&\textbf{Normal conformance}\\
\hline
Input generic signature&Requirement signature\\
Generic parameter&Associated type declaration\\
Replacement type&Type witness\\
Conformance requirement&Associated conformance requirement\\
Conformance in substitution map&Associated conformance\\
\hline
\end{tabular}
\end{center}
\end{table}
The conformance checker populates the associated conformance mapping in a normal conformance by computing the substituted subject type of each associated conformance requirement, and then performing a global conformance lookup with this subject type. This is analogous to the conformance lookup performed during the construction of a substitution map (Section~\ref{buildingsubmaps}). The substituted subject type is obtained by applying the protocol substitution map to the subject type of each associated conformance requirement. For example, in the conformance of \texttt{Array<Int>} to \texttt{Collection}, the substituted subject type of the requirement \verb|Self: Sequence| is just the conforming type:
\[
\ttbox{Self} \times \SubMapC{\SubType{Self}{Array<Int>}}{\SubConf{Array<Int>:\ Collection}} = \ttbox{Array<Int>}
\]
The substituted subject type of \verb|Self.Index| is the type witness for \verb|Index|, which is \verb|Int|:
\begin{gather*}
\ttbox{Self.Index} \times \SubMapC{\SubType{Self}{Array<Int>}}{\SubConf{Array<Int>:\ Collection}}\\[\medskipamount]
= \ttbox{[Collection]Index} \times \ttbox{Array<Int>:\ Collection}\\[\medskipamount]
= \ttbox{Int}
\end{gather*}
With the substituted subject types on hand, the conformance checker then performs a global conformance lookup to find each associated conformance:
\begin{gather*}
\ttbox{Sequence} \times \ttbox{Array<Int>} = \ttbox{Array<Int>:\ Sequence}\\[\medskipamount]
\ttbox{Comparable} \times \ttbox{Int} = \ttbox{Int:\ Comparable}
\end{gather*}
\paragraph{Notation}
We're going to use the notation \verb|(Self.Index: Comparable)| for associated conformance requirements. The parentheses will serve as a visual reminder that they are different from abstract conformances, which use the notation \verb|T: P|. The distinction is important; an abstract conformance describes a type parameter that is known to conform to a protocol in some \emph{generic} signature (possibly as a non-trivial consequence of other requirements), whereas an associated conformance requirement is a \emph{specific} requirement directly appearing in a protocol's \emph{requirement} signature.
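For example, consider the following protocol (a made-up declaration, not from the standard library), which states two associated conformance requirements:
\begin{Verbatim}
protocol Graph {
  associatedtype Node: Hashable
  associatedtype Edges: Sequence
    where Edges.Element == Node
}
\end{Verbatim}
Its requirement signature can be written as \texttt{<Self where Self.Edges : Sequence, Self.Node : Hashable, Self.Edges.[Sequence]Element == Self.Node>}; in our notation, the two associated conformance requirements are \verb|(Self.Node: Hashable)| and \verb|(Self.Edges: Sequence)|.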
\paragraph{Projection}
Projecting an associated conformance from a normal conformance can be understood as the action of an associated conformance requirement (from a protocol's requirement signature) on the left of a normal conformance (to this protocol):
\[
\mathboxed{conformance requirement} \times \mathboxed{normal conformance} = \mathboxed{associated conformance}
\]
With a specialized conformance, we do the same thing as when getting a type witness; first, we get the associated conformance from the underlying normal conformance, and then we apply the conformance substitution map:
\begin{gather*}
\mathboxed{conformance requirement} \times \mathboxed{specialized conformance}\\
= \mathboxed{conformance requirement} \times \left(\, \mathboxed{normal conformance} \times \mathboxed{substitution map} \,\right)\\
= \left(\,\mathboxed{conformance requirement} \times \mathboxed{normal conformance}\,\right) \times \mathboxed{substitution map}\\
= \mathboxed{associated conformance} \times \mathboxed{substitution map}
\end{gather*}
Now we can project associated conformances from normal conformances and specialized conformances. Last but not least, we need to define associated conformance projection from an abstract conformance. Just as the type witnesses of an abstract conformance are dependent member types, associated conformances of an abstract conformance are other abstract conformances:
\[
\ttbox{(Self.[P]A:\ Q)} \times \ttbox{T:\ P} = \ttbox{T.[P]A:\ Q}
\]
\begin{example}
The associated conformances of a normal conformance can themselves be any kind of conformance, including normal, specialized or abstract. Listing~\ref{associated conformance example} shows these possibilities. The protocol \texttt{P} states three associated conformance requirements, and each of the associated conformances of the normal conformance \verb|S<T>: P| is a different kind of conformance:
\begin{quote}
\begin{tabular}{|l|l|l|}
\hline
\textbf{Requirement}&\textbf{Associated conformance}&\textbf{Kind}\\
\hline
\verb|(Self.A: Equatable)|&\verb|Int: Equatable|&Normal\\
\verb|(Self.B: Equatable)|&\verb|Array<Int>: Equatable|&Specialized\\
\verb|(Self.C: Equatable)|&\verb|T: Equatable|&Abstract\\
\hline
\end{tabular}
\end{quote}
The case where the associated conformance is abstract is important, because it arises when the type witness is a type parameter of the conforming type's generic signature.
\begin{listing}\captionabove{Different kinds of associated conformances}\label{associated conformance example}
\begin{Verbatim}
protocol P {
  associatedtype A: Equatable
  associatedtype B: Equatable
  associatedtype C: Equatable
}

struct S<T: Equatable>: P {
  typealias A = Int
  typealias B = Array<Int>
  typealias C = T
}
\end{Verbatim}
\end{listing}
Now consider what happens when we project the associated conformance \verb|(Self.C: Equatable)| from the specialized conformance \verb|S<String>: P|:
\begin{gather*}
\ttbox{(Self.C:\ Equatable)} \times \ttbox{S<String>:\ P}\\[\medskipamount]
= \left(\, \ttbox{(Self.C:\ Equatable)} \times \ttbox{S<T>:\ P} \,\right) \times \SubMapC{\SubType{T}{String}}{\SubConf{String:\ Equatable}}\\[\medskipamount]
= \ttbox{T:\ Equatable} \times \SubMapC{\SubType{T}{String}}{\SubConf{String:\ Equatable}}
\end{gather*}
The associated conformance projection operation actually turns around and reduces to a local conformance lookup into the substitution map, which gives us the final result:
\[
\ttbox{T:\ Equatable} \times \SubMapC{\SubType{T}{String}}{\SubConf{String:\ Equatable}} = \ttbox{String:\ Equatable}
\]
This has some unexpected consequences, which are explored in Section~\ref{recursive conformances}.
\end{example}
\section{Source Code Reference}\label{conformancesourceref}
Key source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/ProtocolConformanceRef.h}
\item \SourceFile{include/swift/AST/ProtocolConformance.h}
\item \SourceFile{lib/AST/ProtocolConformanceRef.cpp}
\item \SourceFile{lib/AST/ProtocolConformance.cpp}
\end{itemize}
Other source files:
\begin{itemize}
\item \SourceFile{include/swift/AST/DeclContext.h}
\item \SourceFile{include/swift/AST/Module.h}
\item \SourceFile{lib/AST/ConformanceLookupTable.h}
\item \SourceFile{lib/AST/ConformanceLookupTable.cpp}
\item \SourceFile{lib/AST/Module.cpp}
\end{itemize}
\index{local conformance}
\apiref{IterableDeclContext}{class}
Base class inherited by \texttt{NominalTypeDecl} and \texttt{ExtensionDecl}.
\begin{itemize}
\item \texttt{getLocalConformances()} returns a list of conformances directly declared on this nominal type or extension.
\end{itemize}
\index{nominal type declaration}
\apiref{NominalTypeDecl}{class}
See also Section~\ref{declarationssourceref}.
\begin{itemize}
\item \texttt{getAllConformances()} returns a list of all conformances declared on this nominal type, its extensions, and inherited from its superclass, if any.
\end{itemize}
\index{conformance lookup table}
\apiref{ConformanceLookupTable}{class}
A conformance lookup table for a nominal type. Every \texttt{NominalTypeDecl} has a private instance of this class, but it is not exposed outside of the global conformance lookup implementation.
\index{module declaration}
\index{global conformance lookup}
\apiref{ModuleDecl}{class}
See also Section~\ref{compilation model source reference}.
\begin{itemize}
\item \texttt{lookupConformance()} returns the conformance of a type to a protocol. This is the global conformance lookup operation.
\end{itemize}
\index{abstract conformance}
\index{protocol conformance}
\apiref{ProtocolConformanceRef}{class}
A protocol conformance. Stores a single pointer, and is cheap to pass around by value.
\begin{itemize}
\item \texttt{isInvalid()} answers if this is an invalid conformance reference, meaning the type did not actually conform to the protocol.
\item \texttt{isAbstract()} answers if this is an abstract conformance reference.
\item \texttt{isConcrete()} answers if this is a concrete conformance reference.
\item \texttt{getConcrete()} returns the \texttt{ProtocolConformance} instance if this is a concrete conformance.
\item \texttt{getRequirement()} returns the \texttt{ProtocolDecl} instance if this is an abstract or concrete conformance.
\item \texttt{subst()} returns the protocol conformance obtained by applying a substitution map to this conformance.
\end{itemize}
\index{concrete conformance}
\apiref{ProtocolConformance}{class}
A concrete protocol conformance. This class is the root of a class hierarchy shown in Figure~\ref{conformancehierarchy}. Concrete protocol conformances are allocated in the AST context, and are always passed by pointer.
\begin{figure}\captionabove{The \texttt{ProtocolConformance} class hierarchy}\label{conformancehierarchy} \begin{center} \begin{tikzpicture}[% grow via three points={one child at (0.5,-0.7) and two children at (0.5,-0.7) and (0.5,-1.4)}, edge from parent path={[->] (\tikzparentnode.south) |- (\tikzchildnode.west)}] \node [class] {\texttt{\vphantom{p}ProtocolConformance}} child { node [class] {\texttt{\vphantom{p}RootProtocolConformance}} child { node [class] {\texttt{\vphantom{p}NormalProtocolConformance}}} child { node [class] {\texttt{\vphantom{p}SelfProtocolConformance}}} } child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}InheritedProtocolConformance}}} child { node [class] {\texttt{\vphantom{p}SpecializedProtocolConformance}}}; \end{tikzpicture} \end{center} \end{figure} \index{conforming type} \index{type witness} \index{associated conformance} \begin{itemize} \item \texttt{getType()} returns the conforming type. \item \texttt{getProtocol()} returns the conformed protocol. \item \texttt{getTypeWitness()} returns the type witness for an associated type. \item \texttt{getAssociatedConformance()} returns the associated conformance for a conformance requirement in the protocol's requirement signature. \item \texttt{subst()} returns the protocol conformance obtained by applying a substitution map to this conformance. \end{itemize} \apiref{RootProtocolConformance}{class} Abstract base class for \texttt{NormalProtocolConformance} and \texttt{SelfProtocolConformance}. Inherits from \texttt{ProtocolConformance}. \index{normal conformance} \apiref{NormalProtocolConformance}{class} A normal protocol conformance. Inherits from \texttt{RootProtocolConformance}. \begin{itemize} \item \texttt{getDeclContext()} returns the conforming declaration context, either a nominal type declaration or extension. \item \texttt{getGenericSignature()} returns the generic signature of the conforming context. \item \texttt{finishSignatureConformances()} computes the associated conformances of this conformance. Not intended to be called directly. \end{itemize} \index{conformance substitution map} \index{specialized conformance} \apiref{SpecializedProtocolConformance}{class} A specialized protocol conformance. Inherits from \texttt{ProtocolConformance}. \begin{itemize} \item \texttt{getGenericConformance()} returns the underlying normal conformance. \item \texttt{getSubstitutionMap()} returns the conformance substitution map. \end{itemize} \chapter{Generic Environments}\label{genericenv} \index{generic environment} \index{reduced type} In Chapter~\ref{types}, type parameters and archetypes were introduced as two kinds of ``abstract types.'' So far, we've only talked about type parameters, which appear in the interface types of declarations. Archetypes appear in the types of expressions inferred by the expression type checker, and in the SIL instructions constructed by lowering expressions to SIL. To understand how archetypes are different from type parameters, consider two key properties of type parameters: \begin{enumerate} \item Type parameters only have meaning with respect to their generic signature. For example, generic signature queries (Section~\ref{genericsigqueries}) are called with a generic signature together with a type parameter. \item In a generic signature, two type parameters that are not canonical-equal might still belong to the same equivalence class, and be reduced-equal. 
Type parameters can represent the different ``spellings'' which are equivalent as a result of same-type requirements. This gives the two levels of equality on interface types: canonical equality, and reduced equality with respect to a generic signature.
\end{enumerate}
An archetype represents a reduced type parameter in a specific \emph{generic environment}. Unlike type parameters, archetypes are self-describing, since they point back at their parent generic environment. The underlying type parameter of an archetype is always reduced, so an equivalence class of type parameters is represented by a single archetype in a given generic environment.
\index{contextual type}
Recall that a type containing type parameters is called an interface type. Similarly, a type containing archetypes is called a \emph{contextual type}. A pair of operations defines a mapping between interface types and contextual types:
\begin{itemize}
\item \textbf{Mapping into an environment} transforms an interface type into a contextual type by replacing the interface type's type parameters with archetypes. Any type parameters that are not reduced are replaced by their reduced type first. This mapping is performed with respect to a fixed generic environment.
\item \textbf{Mapping out of an environment} transforms a contextual type into an interface type by replacing each archetype with the reduced type parameter it represents. This operation does not take a generic environment; all archetypes know their interface type.
\end{itemize}
\index{primary generic environment}
\index{opaque generic environment}
\index{opened generic environment}
There are three kinds of generic environment:
\begin{itemize}
\item Every generic signature has exactly one \textbf{primary generic environment}. The archetypes in the primary environment are called \emph{primary archetypes}. Primary archetypes represent the generic parameters of a function inside of a function body, both in AST statement and expression nodes, and in the SIL instructions of a SIL function. Primary generic environments preserve the sugared names of generic parameters for the printed representation of an archetype, so two canonically-equal but not pointer-equal generic signatures will instantiate distinct primary generic environments.
\[\mathboxed{generic signature} \Leftrightarrow \mathboxed{primary environment}\]
\item When a declaration has an opaque return type, an \textbf{opaque generic environment} is created for each unique substitution map of the declaration's generic signature. The archetypes of this environment are used for both the declaration and references to the declaration's opaque result type. These are discussed in Chapter~\ref{opaqueresult}.
\[\mathboxed{opaque type declaration}\times\mathboxed{substitution map} = \mathboxed{opaque environment}\]
\item An \textbf{opened generic environment} is created when an existential value is opened inside an expression. Opened archetypes represent the concrete payload of a value of existential type. A call site where an existential value is opened will instantiate a unique opened generic environment, and the usage of the opened archetypes is scoped to the call's argument expressions. Opened archetypes are discussed in Chapter~\ref{existentialtypes}.
\[\mathboxed{generic signature} \times \mathboxed{existential type} \times \mathboxed{UUID} = \mathboxed{opened environment}\]
\end{itemize}
A generic environment contains a lazily-populated mapping from the reduced type parameters of its generic signature to archetypes.
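The following sketch shows source-level constructs that give rise to each kind of generic environment (these are illustrative declarations only; the last case assumes the implicitly-opened existentials of Swift~5.7):
\begin{Verbatim}
func primary<T: Equatable>(_ t: T) -> Bool {
  // Inside the body, T is represented by a primary archetype.
  return t == t
}

func opaque() -> some Equatable {
  // References to the opaque return type use an opaque archetype.
  return 42
}

func count<S: Sequence>(_ s: S) -> Int {
  var n = 0
  for _ in s { n += 1 }
  return n
}

func opened(_ values: any Sequence) {
  // Passing the existential to count() opens it; within the call
  // expression, the payload type is an opened archetype.
  _ = count(values)
}
\end{Verbatim}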
The archetypes instantiated by a generic environment are not pointer-equal or canonical-equal to the archetypes of any other generic environment.
\paragraph{Motivation}
You might wonder why archetypes exist at all, when at first glance, they appear equivalent to a reduced type parameter together with a generic signature. In the case of primary archetypes at least, the reason is partly historical. However, the additional indirection provided by creating multiple generic environments from a single generic signature allows archetypes to represent abstract types which are not described by the generic parameters that are in the scope of a generic declaration, namely opaque return types and existential types.
\index{local requirements}
\paragraph{Local requirements}
The \emph{local requirements} of an archetype describe the behavior of the archetype's underlying type parameter in the generic signature of the archetype's generic environment. Local requirements are stored inside the archetype. They are derived when the archetype is first constructed within a generic environment using the generic signature queries of Section~\ref{genericsigqueries}:
\begin{itemize}
\item \textbf{Required protocols:} a minimal and canonical list of protocols the archetype is known to conform to, from the \texttt{getRequiredProtocols()} generic signature query.
\item \textbf{Superclass bound:} an optional superclass type that the archetype is known to be a subclass of, computed by mapping the interface type returned by the \texttt{getSuperclassBound()} generic signature query into the generic environment.
\item \textbf{Requires class flag:} a boolean indicating if the archetype is class-constrained, computed from the \texttt{requiresClass()} generic signature query.
\item \textbf{Layout constraint:} an optional layout constraint the archetype is known to satisfy, computed from the \texttt{getLayoutConstraint()} generic signature query.
\end{itemize}
There is no equivalent of the \texttt{getConcreteType()} generic signature query in the world of archetypes. Archetypes represent reduced type parameters, and type parameters fixed to a concrete type are not reduced. If a generic signature fixes a type parameter to a concrete type, mapping the type parameter into an environment will first replace the type parameter with its concrete type, and then recursively map the resulting concrete type into the environment. If the concrete type contains type parameters, they will be replaced with archetypes (or concrete types, if they are themselves fixed to concrete types).
For the same reason, generic signature queries that operate on reduced types do not have equivalents in the world of archetypes. Reduced types are computed as part of mapping an interface type into a generic environment. Since archetypes represent reduced type parameters, the three notions of pointer, canonical and reduced equality collapse into one. Contextual types that contain archetypes may still differ by type sugar in other positions; however, canonical equality is sufficient to determine if two contextual types represent the same reduced type.
\index{global conformance lookup}
\paragraph{Global conformance lookup}
In Section~\ref{conformance lookup}, we introduced global conformance lookup on nominal types. It generalizes to archetypes in a straightforward way:
\begin{enumerate}
\item If the archetype conforms abstractly via a protocol conformance requirement, global conformance lookup returns an abstract conformance.
\item If the archetype conforms concretely via a superclass requirement, global conformance lookup recursively calls itself with the archetype's superclass type and returns an inherited conformance (Section~\ref{inheritedconformance}).
\end{enumerate}
\index{qualified lookup}
\paragraph{Qualified name lookup}
Continuing the trend of operations on concrete types that also support archetypes, an archetype can be used as the base type of a qualified name lookup. Recall the notion of a reachable declaration context from Section~\ref{contextsubstmap}. The reachable declaration contexts of an archetype are the protocols it conforms to, the class declaration of its superclass type, and any protocols the superclass conforms to.
\index{context substitution map}
\paragraph{Context substitution map}
An archetype can serve as the base type when computing a context substitution map for a declaration context. The declaration context can either be a protocol context or a class context. In the case of a protocol context, the archetype can conform abstractly or concretely, as described above; a protocol substitution map is constructed from the archetype and the conformance returned by global conformance lookup. The case where the declaration context is a class is handled by a small addition to Algorithm~\ref{superclassfordecl}. Before proceeding with the main algorithm, we first check if the type \texttt{T} is an archetype, and replace it with the archetype's superclass type. The class context must be the class declaration (or an extension) of some ancestor class of the archetype's superclass requirement.
\begin{figure}\captionabove{A generic signature with multiple generic environments}
\tikzstyle{sig} = [rectangle, draw=black, text centered]
\tikzstyle{env} = [rectangle, draw=black, text centered, minimum width=11.5em]
\tikzstyle{archetype} = [rectangle, draw=black, text centered, minimum width=11em]
\tikzstyle{arrow} = [->,>=stealth]
\begin{center}
\begin{tikzpicture}[node distance=1cm]
\node (primaryEnv) [env] {Primary environment};
\node (openedEnv1) [env, below of=primaryEnv] {Opened environment \#1};
\node (openedEnv2) [env, below of=openedEnv1] {Opened environment \#2};
\node (opaqueEnv1) [env, below of=openedEnv2] {Opaque environment \#1};
\node (opaqueEnv2) [env, below of=opaqueEnv1] {Opaque environment \#2};
\node (moreEnv) [env, below of=opaqueEnv2] {...\vphantom{Primary}};
\node (signature) [sig, left of=openedEnv2, xshift=-11em, yshift=-12pt] {\texttt{<T>}};
\node (primaryArchetype) [archetype, right=of primaryEnv] {Primary archetype \texttt{T}};
\node (openedArchetype1) [archetype, right=of openedEnv1] {Opened archetype \texttt{T} \#1};
\node (openedArchetype2) [archetype, right=of openedEnv2] {Opened archetype \texttt{T} \#2};
\node (opaqueArchetype1) [archetype, right=of opaqueEnv1] {Opaque archetype \texttt{T} \#1};
\node (opaqueArchetype2) [archetype, right=of opaqueEnv2] {Opaque archetype \texttt{T} \#2};
\draw [arrow] (signature.east) -- (primaryEnv.west);
\draw [arrow] (signature.east) -- (openedEnv1.west);
\draw [arrow] (signature.east) -- (openedEnv2.west);
\draw [arrow] (signature.east) -- (opaqueEnv1.west);
\draw [arrow] (signature.east) -- (opaqueEnv2.west);
\draw [arrow] (signature.east) -- (moreEnv.west);
\draw [arrow] (primaryEnv.east) -- (primaryArchetype.west);
\draw [arrow] (openedEnv1.east) -- (openedArchetype1.west);
\draw [arrow] (openedEnv2.east) -- (openedArchetype2.west);
\draw [arrow] (opaqueEnv1.east) -- (opaqueArchetype1.west);
\draw [arrow] (opaqueEnv2.east) --
(opaqueArchetype2.west);
\end{tikzpicture}
\end{center}
\end{figure}
\paragraph{Invariants}
It is unwise to mix interface types and contextual types. Generally, when talking about the external interface of a declaration, you should use interface types, and when talking about types appearing inside the body of a function, you should use contextual types. A pair of recursively-computed properties distinguish interface types from archetypes:
\begin{description}
\item[\texttt{hasTypeParameter()}] answers if the type contains a type parameter.
\item[\texttt{hasArchetype()}] answers if the type contains a primary or opened archetype. Types containing opaque archetypes do not respond with \texttt{true} to this call, for reasons that are explained later.
\end{description}
These predicates should be used in assertions to establish invariants. Generally, the predicate you assert is the negation of the \emph{opposite} predicate. If your function only operates on interface types, you should check for the absence of archetypes; if your function only expects contextual types, you should check for the absence of type parameters. This allows for fully-concrete types, which contain neither type parameters nor archetypes.
Mapping a type into an environment asserts that the input type does not contain archetypes, and similarly mapping a type out of an environment asserts that the input type does not contain type parameters. This means you cannot call these operations ``just in case''; you need to establish that you're dealing with the correct kind of type upfront with an additional check or assertion. Furthermore, mapping a type out of an environment asserts that the type does not contain opened archetypes. Since the type parameter of an opened archetype does not correspond to a type parameter in the declaration's generic signature, mapping an opened archetype out of its environment is not a meaningful operation.
\section{Primary Archetypes}\label{archetypesubst}
\index{primary archetype type}
\index{primary generic environment}
Every generic signature stores its primary generic environment. The archetypes of a primary generic environment are called \emph{primary archetypes}.
\[\mathboxed{reduced type parameter}\times\mathboxed{generic signature}=\mathboxed{primary archetype}\]
\begin{example}\label{archetypeexample}
Consider this generic function:
\begin{Verbatim}
func sum<S: Sequence>(_ seq: S) -> Int
    where S.Element == Int { ... }
\end{Verbatim}
The function's generic signature:
\begin{quote}
\begin{verbatim}
<S where S : Sequence, S.Element == Int>
\end{verbatim}
\end{quote}
We can write four type parameters for this generic signature:
\begin{quote}
\begin{verbatim}
S
S.[Sequence]Element
S.[Sequence]Iterator
S.[Sequence]Iterator.[IteratorProtocol]Element
\end{verbatim}
\end{quote}
The type parameters \texttt{S} and \texttt{S.[Sequence]Iterator} are reduced, so they map to two distinct archetypes \archetype{S} and \archetype{S.[Sequence]Iterator} in the function's primary generic environment. The other two type parameters are not reduced, because they are fixed to the concrete type \texttt{Int}. Mapping them into the environment produces \texttt{Int}.
\end{example}
Contextual types are substitutable in the same way as interface types. Applying a substitution map to a contextual type is defined by first mapping the contextual type out of its environment.
\[ \mathboxed{contextual type} \times \mathboxed{substitution map} = \mathboxed{interface type} \times \mathboxed{substitution map} \] Thus, there is no distinction between substitution maps operating on interface types and contextual types. However, there is a distinction when you look at the replacement types that are \emph{output} by the substitution. We can define an \emph{interface substitution map} as one where the replacement types are interface types, and a \emph{contextual substitution map} as one where the replacement types are contextual types. Applying an interface substitution map to an interface type or contextual type always produces an interface type. Applying a contextual substitution map to an interface type or contextual type always produces a contextual type: \begin{gather*} \mathboxed{interface type} \times \mathboxed{interface substitution map} = \mathboxed{substituted interface type}\\ \mathboxed{contextual type} \times \mathboxed{interface substitution map} = \mathboxed{substituted interface type}\\ \mathboxed{interface type} \times \mathboxed{contextual substitution map} = \mathboxed{substituted contextual type}\\ \mathboxed{contextual type} \times \mathboxed{contextual substitution map} = \mathboxed{substituted contextual type} \end{gather*} We saw in Section~\ref{submapcomposition} that every generic signature has an identity substitution map, and applying the identity substitution map to an interface type leaves the type unchanged: \[\mathboxed{interface type}\times\mathboxed{identity substitution map} = \mathboxed{interface type}\] Every generic environment has a \emph{forwarding substitution map} that replaces each generic parameter with the generic parameter mapped into the environment. The forwarding substitution map plays the role of an identity with contextual types. Applying the forwarding substitution map to a contextual type leaves the type unchanged: \[\mathboxed{contextual type}\times\mathboxed{forwarding substitution map} = \mathboxed{contextual type}\] What happens if you apply the identity substitution map to a \emph{contextual} type? Applying \emph{any} substitution map to a contextual type first maps it out of its environment, producing an interface type, and the identity substitution map leaves all type parameters in this interface type unchanged. Thus, applying the identity substitution map to a contextual type is the same as mapping the contextual type out of its environment: \[\mathboxed{contextual type}\times\mathboxed{identity substitution map} = \mathboxed{interface type}\] There is one final combination. 
Applying the forwarding substitution map to an interface type replaces all type parameters with archetypes, so it is the same operation as mapping the interface type into the environment:
\[\mathboxed{interface type}\times\mathboxed{forwarding substitution map} = \mathboxed{contextual type}\]
The replacement types of a substitution map can be mapped into an environment by applying the forwarding substitution map for the appropriate generic environment on the right:
\begin{multline*}\mathboxed{interface substitution map}\times\mathboxed{forwarding substitution map} \\ = \mathboxed{contextual substitution map} \end{multline*}
Applying an interface substitution map and then mapping the result into an environment has the same effect as applying the corresponding contextual substitution map:
\begin{multline*}
\left(\,\mathboxed{interface type}\times \mathboxed{interface substitution map}\,\right)\times\mathboxed{generic environment}\\
= \mathboxed{interface type}\times \left(\,\mathboxed{interface substitution map} \times\mathboxed{generic environment}\,\right)\\
= \mathboxed{interface type}\times \mathboxed{contextual substitution map}
\end{multline*}
Another way to map the replacement types of a contextual substitution map out of their environment is to apply the identity substitution map on the right. However, this requires finding the output generic signature for the substitution map. Just as contextual types can be mapped out of an environment without providing the environment, substitution maps support a \textbf{map replacement types out of environment} operation.
\paragraph{Archetypes are not ``inherited''}
There's a potential pitfall worth mentioning. Recall that when generic declarations nest, the inner declaration inherits the generic parameters and requirements of the outer declaration, possibly adding new generic parameters or requirements:
\begin{Verbatim}
func myAlgorithm<S: Sequence>(_ seq: S) where S.Element: Comparable {
  func helper<T: Sequence>(_ t: T) where T.Element == S {
    let s1: S = ...
    print(s1)
  }

  let s2: [S] = [seq]
  helper(s2)
}
\end{Verbatim}
The inner \texttt{helper()} function has a distinct generic signature, and therefore a distinct generic environment, from the outer \texttt{myAlgorithm()} function. In particular, the outer function's generic parameter \texttt{S} maps to two \emph{different} archetypes inside the two declarations; say, $\archetype{S}_1$ and $\archetype{S}_2$. The type of the expression \texttt{s1} in \texttt{print(s1)} is $\archetype{S}_1$, and the type of \texttt{s2} in \texttt{helper(s2)} is $\archetype{S}_2$. The call to \texttt{helper()} supplies a substitution map which replaces the generic parameter \texttt{S} with the archetype $\archetype{S}_2$, and \texttt{T} with the contextual type \texttt{Array<}$\archetype{S}_2$\texttt{>}.
The only case where a generic environment is inherited by an inner declaration is if the inner declaration is not ``more generic'' in any way; it declares neither generic parameters nor a \texttt{where} clause. As another example, anonymous closure expressions always inherit the generic environment of the outer declaration, because they cannot be generic except by referencing outer generic parameters. When implementing type checker logic for nested function declarations, take care to map types into the correct generic environment, corresponding to the exact declaration where they will be used.
\section{Source Code Reference}
\apiref{GenericEnvironment}{class}
A generic environment.
Instances are allocated in the AST context, and passed by pointer. \begin{itemize} \item \texttt{getGenericSignature()} returns this generic environment's generic signature. \item \texttt{mapTypeIntoContext()} returns the contextual type obtained by mapping an interface type into this generic environment. \item \texttt{getForwardingSubstitutionMap()} returns a substitution map for mapping each generic parameter to its contextual type---an archetype, or a concrete type if the generic parameter is fixed to a concrete type via a same-type requirement. \end{itemize} \apiref{GenericSignature}{class} See also Section~\ref{genericsigsourceref}. \begin{itemize} \item \texttt{getGenericEnvironment()} returns the primary generic environment associated with this generic signature. \end{itemize} \apiref{TypeBase}{class} See also Section~\ref{typesourceref}. \begin{itemize} \item \texttt{mapTypeOutOfContext()} returns the interface type obtained by mapping this contextual type out of its generic environment. \end{itemize} \apiref{SubstitutionMap}{class} See also Section~\ref{substmapsourcecoderef}. \begin{itemize} \item \texttt{mapReplacementTypesOutOfContext()} returns the substitution map obtained by mapping this substitution map's replacement types and conformances out of their generic environment. \end{itemize} \apiref{ProtocolConformanceRef}{class} See also Section~\ref{conformancesourceref}. \begin{itemize} \item \texttt{mapConformanceOutOfContext()} returns the protocol conformance obtained by mapping this protocol conformance out of its generic environment. \end{itemize} \apiref{DeclContext}{class} See also Section~\ref{declarationssourceref}. \begin{itemize} \item \texttt{getGenericEnvironmentOfContext()} returns the generic environment of the innermost generic declaration containing this declaration context. \item \texttt{mapTypeIntoContext()} Maps an interface type into the primary generic environment for the innermost generic declaration. If at least one outer declaration context is generic, this is equivalent to: \begin{Verbatim} dc->getGenericEnvironmentOfContext()->mapTypeIntoContext(type); \end{Verbatim} For convenience, the \texttt{DeclContext} version of \texttt{mapTypeIntoContext()} also handles the case where no outer declaration is generic. In this case, it returns the input type unchanged, after asserting that it does not contain any type parameters (since type parameters appearing outside of a generic declaration are nonsensical). \end{itemize} \part{Odds and Ends}\label{part odds and ends} \chapter{Type Resolution}\label{typeresolution} \ifWIP Type resolution transforms a type representation read by the parser into a semantic type understood by the type checker, performing name lookups and structural validation along the way. In addition to the written representation itself, the type resolution procedure needs some additional information to interpret the type: \begin{enumerate} \item The declaration context where the type representation appears. Identifiers appearing in the type representation are resolved with a name lookup from this declaration context. (The declaration context alone is actually not enough, because the tree of lexical scopes is more fine-grained than what the declaration context hierarchy encodes; unqualified name lookup also needs a source location, which is stored in the type representation itself.) \item A set of \emph{type resolution options} to specify the semantic behavior of the type's position in the language grammar. 
\item A \emph{type resolution stage}, to select between \emph{structural} and \emph{interface} type resolution. \end{enumerate} The \emph{type resolution options} encode how a type representation can resolve to different types, or be rejected altogether, depending on where it appears. One well-known example is that since Swift 3~\cite{se0103}, a function type representation appearing in the parameter list of a function declaration or another function type resolves into a non-escaping function type by default, unless it was annotated with the \texttt{@escaping} attribute. In other positions, such as the type of a variable declaration or in the return type of a function, a function type is always understood to be \texttt{@escaping}. The \emph{type resolution stage} breaks the inherent circularity between type resolution and generic signature construction. Type resolution must use the generic signature of the current declaration context to resolve member types of generic parameters, and to validate generic arguments supplied to generic types if those arguments involve the generic parameters of the current context. However, generic signature construction itself needs to perform type resolution, resolving types appearing in the inheritance clause of generic parameter declarations and the requirements of a trailing \texttt{where} clause. Requirement inference requires resolving additional types, such as parameter and return types of functions. To break the circularity, structural resolution skips any semantic checks that would require the generic signature of the current declaration context to have already been built. Once the generic signature of the current declaration context has been computed, interface resolution visits all type representations again, performing additional checks. Validation of a generic declaration thus proceeds as follows: \begin{enumerate} \item Generic signature construction uses structural resolution to resolve the types of generic requirements and any types needed for requirement inference. \item Structural resolution forms generic types, checking only that the correct number of generic arguments was provided. Lookups of member types of generic parameters always succeed, producing an \emph{unbound} \texttt{DependentMemberType} which simply stores a base type and identifier. \item The interface resolution stage begins once a generic signature is available. \item In the interface resolution stage, the generic arguments of concrete generic types are validated against the generic signature of the referenced type declaration. Lookups of member types of generic parameters query the current generic signature to determine the protocol conformances of the base type. A member type of a generic parameter will either resolve to a \emph{bound} \texttt{DependentMemberType} which stores the actual associated type declaration, or fail with an error if no protocol provides the named type. \item Finally, the interface type of the declaration is formed from the results of the interface resolution stage. For a function declaration, the interface type is a generic function type consisting of the generic signature, the interface type of each parameter, and the return type. \end{enumerate} We can distinguish unbound and bound member types with notation. An unbound member type can be written as it appears in source, for instance \texttt{S.Element}.
For a bound member type, we can write \texttt{S.[Sequence]Element} to make it explicit that an associated type declaration is referenced and not just an identifier. However, keep in mind this is not valid syntax, but simply notation. \begin{example} The return type of this function declaration contains a member type:
\begin{Verbatim}
func union<S1: Sequence, S2: Sequence>(_: S1, _: S2) -> Set<S1.Element>
    where S1.Element == S2.Element
\end{Verbatim}
Generic signature construction performs structural resolution on various type representations appearing above: \begin{itemize} \item The inheritance clause entry \texttt{Sequence} of \texttt{S1}; \item The inheritance clause entry \texttt{Sequence} of \texttt{S2}; \item The left hand side of the same-type requirement, \texttt{S1.Element}; \item The right hand side of the same-type requirement, \texttt{S2.Element}; \item The types of the function's parameters, \texttt{S1} and \texttt{S2}; \item The return type of the function, \texttt{Set<S1.Element>}. \end{itemize} The type representations \texttt{S1.Element} and \texttt{S2.Element} resolve to unbound member types during structural resolution. In addition to the explicit conformance and same-type requirements, requirement inference also introduces the conformance requirement \texttt{S1.Element:~Hashable} from the return type. Requirement minimization resolves unbound member types to bound member types via its own mechanism. The requirements in the constructed generic signature will contain bound member types: \begin{quote} \begin{verbatim}
<S1, S2 where S1 : Sequence, S2 : Sequence,
 S1.[Sequence]Element : Hashable,
 S1.[Sequence]Element == S2.[Sequence]Element>
\end{verbatim} \end{quote} Once a generic signature is available, the parameter and return types of our function are resolved again, this time with interface resolution. The parameter types remain unchanged, but the return type becomes \texttt{Set<S1.[Sequence]Element>}. The interface type of the function declaration can now be constructed from the generic signature, parameter types and return types. Omitting the generic signature for clarity, the interface type of \texttt{union()} is thus: \begin{quote} \begin{verbatim}
(S1, S2) -> Set<S1.[Sequence]Element>
\end{verbatim} \end{quote} \end{example} \paragraph{Practical considerations} While structural resolution skips many semantic checks, there is some overlap between the work performed by the two stages, and care must be taken not to emit the same diagnostics twice if the given type representation is invalid. A bit of mutable state helps with this. Each type representation has an ``invalid'' flag. After emitting a diagnostic, the invalid flag should be set and an error type returned. Resolving an invalid type representation again will short-circuit and immediately return an error type again. Outside of generic signature construction, the interface type resolution stage should be used by default. Many parts of the compiler, for example type substitution, assert that type parameters are bound. One notable exception is that the generic signature queries of Section~\ref{genericsigqueries} are happy to operate on unbound type parameters. If you somehow encounter an unbound type parameter outside of type resolution, you can convert it to a bound type parameter with the \texttt{getReducedType()} generic signature query.
TODO: \begin{itemize} \item Diagram showing two-stage resolution for function types \end{itemize} \fi \section{Identifier Type Representations}\label{identtyperepr} \ifWIP \index{identifier type representation} Structural types, such as function types and tuples, have their own type representations parsed from special syntax, and type resolution constructs the corresponding semantic types directly. On the other hand, references to type declarations---nominal types, type aliases, generic parameters and associated types---are resolved via name lookup from a very general kind of type representation called an \emph{identifier type representation}. This kind of type representation consists of one or more \emph{components}, separated by dots in the written syntax. Each component stores an identifier, together with an optional list of one or more generic arguments, where each generic argument is again recursively a type representation. The following identifier type representation has three components, two of which have generic arguments: \begin{quote} \begin{verbatim}
Foo<Int>.Bar<(Int) -> ()>.Baz
\end{verbatim} \end{quote} \paragraph{Unqualified lookup} The first component is special. An unqualified name lookup is performed to find a type declaration with the given name, starting from the innermost lexical scope, finally reaching the top level, after which all imported modules are searched. The first component can be a module name, in which case there must be at least two components; modules can only be used as the base of a lookup, and are not first-class values which stand on their own. If the first component names the innermost type declaration or one of its parent types, and no generic arguments are specified, the component resolves to the declared interface type of this type declaration. The identifier \texttt{Self} serves a similar purpose since Swift 5.1~\cite{se0068}. If the innermost type declaration is a struct or enum, \texttt{Self} stands for its declared interface type. If the innermost type declaration is a class, \texttt{Self} stands for the declared interface type of the class, wrapped in a \texttt{DynamicSelfType}. Inside a protocol or protocol extension, \texttt{Self} is just the protocol's implicit generic parameter named \texttt{Self}, not a special case. Unqualified name lookup will find generic parameters if one of the outer declaration contexts has a generic parameter list. In this manner, if the first component names a generic parameter declaration, the type representation will resolve to a generic parameter type (if this is the only component) or a member type thereof (if there is more than one component). Type resolution behaves differently in the \texttt{where} clause of a protocol than in the \texttt{where} clause of any other kind of nominal type. In a protocol \texttt{where} clause, the unqualified name lookup begins inside the protocol, and thus the protocol's associated types can be referenced directly, without the explicit ``\texttt{Self.}'' prefix. In other nominal types, type representations written in the \texttt{where} clause cannot reference member types directly; they must be prefixed with the type name first.
\begin{listing}\captionabove{Unqualified lookup behaviors in type resolution}\label{type resolution unqualified}
\begin{Verbatim}
struct Outer<T> {
  struct Inner {
    // Return type is understood to be `Outer<T>.Inner'
    func f1() -> Self {}

    // Return type is understood to be `Outer<T>.Inner'
    func f2() -> Inner {}

    // Return type is understood to be `Outer<T>'
    func f3() -> Outer {}
  }
}

// This is OK; the subject type resolves to `Self.[P]A'
protocol P where A: Hashable {
  associatedtype A
}

// This is not OK; `A' is not visible from the `where' clause
struct G<T> where A: Hashable {
  typealias A = T
}
\end{Verbatim}
\end{listing}
Listing~\ref{type resolution unqualified} demonstrates some of the above behaviors. \paragraph{Qualified lookup} Each subsequent component names a member type of the previous component. Member types are resolved by performing a qualified name lookup, using the previous component's resolved type as the base type of the lookup. Qualified name lookup looks inside the base type's declaration, any protocols it conforms to, and if the base type is a class, the base type's superclass. Once a \emph{type declaration} has been found, it is resolved to a \emph{type}. In the structural resolution stage, a qualified name lookup cannot be performed if the base type is a type parameter, since without a generic signature, the protocol conformance requirements satisfied by the base type parameter are unknown. Instead, structural resolution always succeeds by constructing an unbound member type from the base type parameter and the current component's identifier. If the base type is a concrete nominal type, qualified name lookup might find an associated type declaration from one of the base type's conformances. Type resolution will never form a member type with a concrete base type. Instead, a conformance lookup is performed, and the corresponding type witness is returned.
\begin{listing}\captionabove{Resolving a reference to an associated type member of a concrete type}\label{associated type of concrete type}
\begin{Verbatim}
protocol Animal {
  associatedtype CommodityType
  func produce() -> CommodityType
}

struct Egg {}

struct Chicken: Animal {
  func produce() -> Egg {...}
}

func cookOmelette(_ egg: Chicken.CommodityType) {}
\end{Verbatim}
\end{listing}
\begin{example} In Listing~\ref{associated type of concrete type}, the type representation \texttt{Chicken.CommodityType} has two components. The second component is resolved by performing a qualified name lookup of the name \texttt{CommodityType} in the base type \texttt{Chicken}. This qualified lookup will find the associated type declaration \texttt{CommodityType} of the \texttt{Animal} protocol, because \texttt{Chicken} conforms to \texttt{Animal}. The type witness of \texttt{CommodityType} in the normal conformance \texttt{Chicken:\ Animal} is the type alias member \texttt{CommodityType} of \texttt{Chicken}, with underlying type \texttt{Egg}. \end{example} Regardless of whether the base type is a type parameter, concrete type or existential, qualified name lookup can also find a type alias member of a protocol. This exciting possibility is discussed further in Section~\ref{protocol type alias}.
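As noted above, qualified name lookup also searches the superclasses of a class type; for instance (a minimal sketch with hypothetical declarations):
\begin{Verbatim}
class Vehicle {
  typealias Fuel = Gasoline
}
class Truck: Vehicle {}
struct Gasoline {}

// Qualified lookup of `Fuel' into `Truck' finds the type alias
// declared in the superclass `Vehicle', so this resolves to
// `Gasoline'.
let tank: Truck.Fuel = Gasoline()
\end{Verbatim}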
\paragraph{Applying generic arguments} If a component provides generic arguments, type resolution will apply the arguments to the component's named type declaration when resolving the type of the component. The component only provides generic arguments for the innermost generic parameters of the named type declaration. The type declaration will also have outer generic parameters if it is nested inside of another generic context. The manner in which these outer generic arguments are derived depends on whether the type declaration was found with unqualified or qualified lookup. If this is the first component, the only possibilities are that the named type declaration does not have any outer generic parameters at all, or that it was found in an outer generic context of the generic context containing the type representation. In this case, the outer generic parameters, if any, are mapped to themselves. Components other than the first recover the outer generic arguments from the base type. If the base type is generic, it means the named type declaration is nested inside of another generic type declaration, and the outer generic arguments are derived from the context substitution map of the base type. The outer generic arguments together with the component's specified generic arguments form a substitution map for the generic signature of the named type declaration. Applying this substitution map to the declared interface type of the named type declaration will produce the final resolved type of the component.
\begin{listing}\captionabove{Applying generic arguments in type resolution}\label{applying generic arguments}
\begin{Verbatim}
struct Outer<T> {
  struct Inner<U> {
    func f1() -> Outer<Int>.Inner<String> {}

    // Return type resolves to `Outer<T>.Inner<U>'
    func f2() -> Inner {}
  }
}
\end{Verbatim}
\end{listing}
\begin{example} The return type of \texttt{f1()} in Listing~\ref{applying generic arguments} is an identifier type representation with two components: \begin{enumerate} \item The first component is resolved by applying the substitution map $\texttt{T}:=\texttt{Int}$ to the declared interface type of \texttt{Outer}, which outputs \texttt{Outer<Int>}. \item The second component is resolved by first building a substitution map for the generic signature of \texttt{Inner}, which is \texttt{<T, U>}. The base type \texttt{Outer<Int>} provides the replacement $\texttt{T}:=\texttt{Int}$, and the component's single generic argument \texttt{String} provides the replacement $\texttt{U}:=\texttt{String}$. The declared interface type of the named type declaration \texttt{Inner} is \texttt{Outer<T>.Inner<U>}. Applying the combined substitution map to this type gives \texttt{Outer<Int>.Inner<String>}. \end{enumerate} \end{example} The type resolution process for an identifier type representation might seem unnecessarily convoluted. When resolving a type representation like \texttt{Outer<Int>.Inner<String>}, we build the type of the first component by applying a substitution map to the declared interface type of the type declaration, \texttt{Outer}. In the next step, we turn this type back into a substitution map, extend the substitution map with a replacement type for \texttt{U}, then apply it to the declared interface type \texttt{Outer<T>.Inner<U>}. It seems like we might be able to get away with performing a chain of name lookups to find the final type declaration, then collect all of the generic arguments and apply them in one shot. Unfortunately, the next example shows why this appealing simplification does not handle the full generality of type resolution.
\begin{listing}\captionabove{The named type declaration of a component can depend on generic arguments previously applied}\label{type resolution with dependent base}
\begin{Verbatim}
struct Paul {
  struct Pony {}
}

struct Maureen {
  struct Pony<T> {}
}

struct Person<T> {
  typealias Rides = T
}

struct Misty {}

typealias A = Person<Paul>.Rides.Pony
typealias B = Person<Maureen>.Rides.Pony<Misty>
\end{Verbatim}
\end{listing}
\begin{example} The two type aliases \texttt{A} and \texttt{B} in Listing~\ref{type resolution with dependent base} demonstrate the phenomenon alluded to above. The type representations of their underlying types look very similar, only differing in the generic arguments applied. However, they resolve to two different nominal types. First, consider how type resolution builds the underlying type of \texttt{A}: \begin{enumerate} \item The first component performs an unqualified name lookup, which finds the declaration of \texttt{Person}. Applying the substitution map $\texttt{T}:=\texttt{Paul}$ to the declared interface type of \texttt{Person} outputs \texttt{Person<Paul>}. \item The second component performs a qualified name lookup into the declaration of \texttt{Person}, which finds the member type alias \texttt{Rides}. Applying the substitution map to the underlying type of \texttt{Person.Rides} gives us \texttt{Paul}. \item The third component performs a qualified name lookup into the declaration of \texttt{Paul}, which finds the non-generic member type declaration \texttt{Paul.Pony}. The declared interface type \texttt{Paul.Pony} becomes the final resolved type. \end{enumerate} Now compare the above with the underlying type of \texttt{B}: \begin{enumerate} \item The first component performs an unqualified name lookup, which finds the declaration of \texttt{Person}. Applying the substitution map $\texttt{T}:=\texttt{Maureen}$ to the declared interface type of \texttt{Person} outputs \texttt{Person<Maureen>}. \item The second component performs a qualified name lookup into the declaration of \texttt{Person}, which again finds the member type alias \texttt{Rides}. Applying the substitution map to the underlying type of \texttt{Person.Rides} gives us \texttt{Maureen}. The base type for the third component is now a completely different nominal type! \item The third component performs a qualified name lookup into the declaration of \texttt{Maureen}, which finds the generic member type declaration \texttt{Maureen.Pony}. Applying the generic argument to this type declaration's declared interface type gives us \texttt{Maureen.Pony<Misty>}. \end{enumerate} Clearly, \texttt{Paul.Pony} and \texttt{Maureen.Pony} are two unrelated nominal type declarations, and one is even generic while the other is not. If you tried to save the generic arguments and apply them in a single shot at the end, you'd quickly realize there is no way to resolve the type representation \texttt{Person.Rides.Pony} to a single type declaration. The complication here is the type alias \texttt{Person.Rides}, whose underlying type is a type parameter. \end{example} \paragraph{Bound components} A minor optimization is worth understanding here, because it slightly complicates the implementation. After type resolution of a component succeeds, the bound (or found, perhaps) type declaration is stored inside the component. If the identifier type representation is resolved again, any bound components will skip the name lookup and proceed to compute the final type from the bound declaration.
The optimization was more profitable in the past, when type resolution actually had \emph{three} stages, with a third stage resolving interface types to archetypes. The third stage was subsumed by the \texttt{mapTypeIntoContext()} operation on generic environments. Parsing textual SIL also ``manually'' binds components to type declarations which name lookup would otherwise not find, in order to parse some of the more esoteric SIL syntax that we're not going to discuss here. \fi \section{Checking Generic Arguments}\label{checking generic arguments} \ifWIP When resolving a generic type, type resolution checks that the provided generic arguments actually satisfy the type declaration's requirements. This checking is done by the interface resolution stage---the structural resolution stage only checks that the number of generic arguments in the component matches the number of generic parameters in the named type declaration. The applied generic arguments are represented by a substitution map, so our problem reduces to asking if a substitution map satisfies the requirements of a generic signature. This question can be answered by substituting requirements. Conformance, superclass and same-type requirements store a pair of types; layout requirements only store a single type. Applying a substitution map to a requirement applies the substitution map to the types stored in the requirement, producing a \emph{substituted requirement} with the same kind but different types: \[\mathboxed{requirement}\times\mathboxed{substitution map} = \mathboxed{substituted requirement}\] It is convenient to only consider substitution maps where the replacement types might contain archetypes, but not type parameters. This is not a real limitation, because if the replacement types contain type parameters, we can first map the replacement types into the generic environment of the substitution map's output generic signature. With such a substitution map, the substituted requirement becomes a statement about concrete types, whose validity is independent of any generic signature; if the statement holds, the substituted requirement is \emph{satisfied}.
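As a small worked instance (using standard library types), take the requirement \texttt{T.Element:\ Hashable} and a substitution map that replaces \texttt{T} with \texttt{Array<Int>}; the substituted requirement is a statement about concrete types: \[\ttbox{T.Element:\ Hashable}\times\SubMapC{\SubType{T}{Array<Int>}}{\SubConf{Array<Int>:\ Sequence}} = \ttbox{Int:\ Hashable}\] The subject type \texttt{T.Element} is substituted through the \texttt{Array<Int>:\ Sequence} conformance, whose type witness for \texttt{Element} is \texttt{Int}; the statement \texttt{Int:\ Hashable} holds, so this substituted requirement is satisfied.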
Checking whether a set of substituted requirements is satisfied by a generic signature has applicability beyond type resolution; in fact, it comes up over and over again throughout the compiler: \begin{enumerate} \item The expression type checker uses similar logic when solving constraints generated from generic requirements. \item Conformance checking ensures that the conforming type and its type witnesses satisfy the protocol's requirement signature (Section~\ref{requirement sig}). \item Requirement inference is in some sense solving the ``opposite'' problem, but the implementation is similar: we want to \emph{add} requirements to our declaration's generic signature from the list of generic arguments to a generic type (Section~\ref{requirementinference}). \item Conditional requirements of a conditional conformance are computed by taking the subset of a constrained extension's generic requirements not satisfied by the generic signature of the extended type (Section~\ref{conditional conformance}). \item Class method override checking checks if the generic signature of the subclass method satisfies the requirements of the superclass method (Section~\ref{overridechecking}). \end{enumerate} \begin{algorithm}[``Requirement is satisfied'' check]\label{reqissatisfied} As input, takes a substituted requirement whose types do not contain type parameters, but may contain archetypes. Returns true if the requirement is satisfied. Consider each kind of requirement: \begin{itemize} \item \textbf{Conformance requirements:} Perform a global conformance lookup with the requirement's subject type and protocol. There are three possible outcomes: \begin{enumerate} \item If the conformance is abstract (the subject type is necessarily an archetype in this case), the requirement is satisfied. \item If the conformance is concrete, it might have conditional requirements (Section~\ref{conditional conformance}). These are checked by recursively applying the algorithm. \item Otherwise, the conformance is invalid and the requirement is unsatisfied. \end{enumerate} \item \textbf{Superclass requirements:} The requirement stores two types, the subject type and the constraint type. There are three possible cases: \begin{enumerate} \item If the subject type is canonically equal to the constraint type, the requirement is satisfied. \item If the subject type does not have a superclass type as defined in Chapter~\ref{classinheritance}, the requirement is unsatisfied. \item In the remaining case, recursively apply the algorithm to a new requirement constructed by replacing the subject type with the subject type's superclass type, while leaving the constraint type unchanged. \end{enumerate} \item \textbf{Layout requirements:} The requirement stores a subject type and a layout constraint. The only kind of layout constraint that can be written in source is an \texttt{AnyObject} constraint. The substituted requirement is satisfied if the subject type is a class type, an archetype satisfying the \texttt{AnyObject} layout constraint, or one of a small number of other types that satisfy it, such as an \texttt{AnyObject} or \texttt{@objc} existential. \item \textbf{Same-type requirements:} The requirement is satisfied if the subject type is canonically equal to the constraint type. \end{itemize} \end{algorithm} \begin{algorithm}[Substitution map requirement check] As input, takes a substitution map where the replacement types do not contain type parameters, and a list of requirements containing type parameters for the substitution map's input generic signature. The original requirements are partitioned into three lists: a \emph{satisfied} list, an \emph{unsatisfied} list, and a \emph{failed} list. \begin{enumerate} \item Initialize the three output lists to be empty. \item If the input list is empty, return. \item Otherwise, remove the next original requirement from the input list and apply the substitution map to get a substituted requirement. \item If the substituted requirement now contains error types, add the original requirement to the failed list. \item Otherwise, check if the substituted requirement is satisfied using Algorithm~\ref{reqissatisfied}, and add the original requirement to the satisfied or unsatisfied list depending on the outcome of this check. \item Go back to Step~2. \end{enumerate} \end{algorithm} Type resolution applies the above algorithm to the generic argument substitution map together with the list of requirements in the named type declaration's generic signature. Any failed requirements are ignored, because a substitution failure indicates that either some other requirement is unsatisfied, or an error was diagnosed elsewhere in the program. Unsatisfied requirements are diagnosed at the source location of the component's generic arguments, with the appropriate error message showing the substituted subject type and requirement kind.
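To illustrate the recursion through conditional requirements in the conformance case of Algorithm~\ref{reqissatisfied}, consider the following sketch (the \texttt{Holder} type is hypothetical):
\begin{Verbatim}
struct Holder<T: Equatable> {}

// Checking `T := Array<Array<Int>>' against `T: Equatable' performs
// a global conformance lookup, finding the standard library's
// conditional conformance `Array: Equatable where Element: Equatable'.
// The conditional requirement is checked recursively: first
// `Array<Int>: Equatable', and then `Int: Equatable'.
typealias Nested = Holder<Array<Array<Int>>>
\end{Verbatim}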
\begin{listing}\captionabove{Satisfied and unsatisfied requirements with concrete types}\label{unsatisfied requirements}
\begin{Verbatim}
class Base<T> {}
class Derived: Base<Int> {}

struct G<T: Sequence, U: Sequence, V: Base<Int>>
    where T.Element == U.Element {}
// (1) all requirements satisfied
typealias A = G<Array<Int>, Set<Int>, Derived>

// (2) `T.Element == U.Element', `V: Base<Int>' unsatisfied
typealias B = G<Array<Int>, Set<String>, Base<String>>

// (3) `T: Sequence' unsatisfied; `T.Element == U.Element' substitution failure
typealias C = G<Float, Set<Int>, Derived>
\end{Verbatim}
\end{listing}
\begin{example} Listing~\ref{unsatisfied requirements} shows three examples of checking generic arguments. The generic signature of \texttt{G} is: \begin{quote} \begin{verbatim}
<T, U, V where T : Sequence, U : Sequence, V : Base<Int>,
 T.[Sequence]Element == U.[Sequence]Element>
\end{verbatim} \end{quote} There are four requirements: \begin{quote} \begin{tabular}{|l|l|l|} \hline Kind&First type&Second type\\ \hline Conformance&\texttt{T}&\texttt{Sequence}\\ Conformance&\texttt{U}&\texttt{Sequence}\\ Superclass&\texttt{V}&\texttt{Base<Int>}\\ Same type&\texttt{T.[Sequence]Element}&\texttt{U.[Sequence]Element}\\ \hline \end{tabular} \end{quote} \paragraph{First type alias} The context substitution map of the underlying type of \texttt{A} on line 7 is: \begin{quote} \SubMapC{ \texttt{T}&:=&\texttt{Array<Int>}\\ \texttt{U}&:=&\texttt{Set<Int>}\\ \texttt{V}&:=&\texttt{Derived} }{ \SubConf{Array<Int>:\ Sequence}\\ \SubConf{Set<Int>:\ Sequence} } \end{quote} We apply this substitution map to each requirement of our generic signature: \begin{quote} \begin{tabular}{|l|l|l|c|} \hline Kind&First type&Second type&Satisfied?\\ \hline Conformance&\texttt{Array<Int>}&\texttt{Sequence}&$\checkmark$\\ Conformance&\texttt{Set<Int>}&\texttt{Sequence}&$\checkmark$\\ Superclass&\texttt{Derived}&\texttt{Base<Int>}&$\checkmark$\\ Same type&\texttt{Int}&\texttt{Int}&$\checkmark$\\ \hline \end{tabular} \end{quote} The substituted requirements can be seen to be satisfied from the rules outlined above. \paragraph{Second type alias} The context substitution map of the underlying type of \texttt{B} on line 10 is: \begin{quote} \SubMapC{ \texttt{T}&:=&\texttt{Array<Int>}\\ \texttt{U}&:=&\texttt{Set<String>}\\ \texttt{V}&:=&\texttt{Base<String>} }{ \SubConf{Array<Int>:\ Sequence}\\ \SubConf{Set<String>:\ Sequence} } \end{quote} We apply this substitution map to each requirement of our generic signature: \begin{quote} \begin{tabular}{|l|l|l|c|} \hline Kind&First type&Second type&Satisfied?\\ \hline Conformance&\texttt{Array<Int>}&\texttt{Sequence}&$\checkmark$\\ Conformance&\texttt{Set<String>}&\texttt{Sequence}&$\checkmark$\\ Superclass&\texttt{Base<String>}&\texttt{Base<Int>}&$\times$\\ Same type&\texttt{Int}&\texttt{String}&$\times$\\ \hline \end{tabular} \end{quote} The superclass requirement is unsatisfied, because \texttt{Base<String>} is not related to \texttt{Base<Int>}. The substituted same-type requirement is unsatisfied, because the two substituted types are not equal.
\paragraph{Third type alias} The context substitution map of the underlying type of \texttt{C} on line 13 is: \begin{quote} \SubMapC{ \texttt{T}&:=&\texttt{Float}\\ \texttt{U}&:=&\texttt{Set<Int>}\\ \texttt{V}&:=&\texttt{Derived} }{ \SubConf{(invalid)}\\ \SubConf{Set<Int>:\ Sequence} } \end{quote} We apply this substitution map to each requirement of our generic signature: \begin{quote} \begin{tabular}{|l|l|l|c|} \hline Kind&First type&Second type&Satisfied?\\ \hline Conformance&\texttt{Float}&\texttt{Sequence}&$\times$\\ Conformance&\texttt{Set<Int>}&\texttt{Sequence}&$\checkmark$\\ Superclass&\texttt{Derived}&\texttt{Base<Int>}&$\checkmark$\\ Same type&\texttt{<>}&\texttt{Int}&$-$\\ \hline \end{tabular} \end{quote} The first conformance requirement is unsatisfied and will be diagnosed. The same-type requirement has a substitution failure, and does not need to be diagnosed. \end{example} In the case where the component's generic arguments reference the generic parameters of the current declaration context, the substitution map's replacement types will contain type parameters. By building a new substitution map where each replacement type is mapped into the current declaration context's generic environment, we get substituted requirements whose types might contain archetypes. Recall from Chapter~\ref{genericenv} that archetypes behave like concrete types in some ways; in particular, the above strategy for checking requirement satisfaction actually works if the requirements contain archetypes as well.
\begin{listing}\captionabove{Satisfied and unsatisfied requirements with archetypes}\label{unsatisfied requirements archetypes}
\begin{Verbatim}
class Base<T> {}
class Derived: Base<Int> {}

struct G<T: Sequence, U: Sequence, V: Base<Int>>
    where T.Element == U.Element {}
struct H<A: Sequence, B: Sequence, C: Base<Int>, D>
    where A.Element == B.Element {
  typealias X = G<A, B, C>  // (1) all requirements satisfied

  // (2) `V: Base<Int>' and `T.Element == U.Element' unsatisfied
  typealias Y = G<A, Array<D>, Base<D>>
}
\end{Verbatim}
\end{listing}
\begin{example} Listing~\ref{unsatisfied requirements archetypes} shows two examples of checking generic arguments where the replacement types contain type parameters. The two type aliases \texttt{X} and \texttt{Y} are members of the generic declaration \texttt{H}, and their underlying types reference the generic parameters of \texttt{H}. Let's write \archetype{A}, \archetype{B}, \archetype{C} and \archetype{D} for the archetypes of \texttt{A}, \texttt{B}, \texttt{C} and \texttt{D} in the generic environment of \texttt{H}. \paragraph{First type alias} The context substitution map for the underlying type of \texttt{X} on line 8 is a substitution map for the generic signature of \texttt{G}, with replacement types in the generic environment of \texttt{H}: \begin{quote} \SubMapC{ \texttt{T}&:=&\archetype{A}\\ \texttt{U}&:=&\archetype{B}\\ \texttt{V}&:=&\archetype{C} }{ \SubConf{\archetype{A}:\ Sequence}\\ \SubConf{\archetype{B}:\ Sequence} } \end{quote} We apply this substitution map to each requirement of our generic signature: \begin{quote} \begin{tabular}{|l|l|l|c|} \hline Kind&First type&Second type&Satisfied?\\ \hline Conformance&\archetype{A}&\texttt{Sequence}&$\checkmark$\\ Conformance&\archetype{B}&\texttt{Sequence}&$\checkmark$\\ Superclass&\archetype{C}&\texttt{Base<Int>}&$\checkmark$\\ Same type&\archetype{A.Element}&\archetype{A.Element}&$\checkmark$\\ \hline \end{tabular} \end{quote} All requirements are satisfied. The archetypes \archetype{A} and \archetype{B} both conform to \texttt{Sequence} via the generic signature of \texttt{H}.
Similarly, the archetype \archetype{C} has the superclass type \texttt{Base<Int>}, and satisfies the superclass requirement by definition. The same-type requirement merits some explanation, because it demonstrates that there are two generic signatures in play here; the generic signature of \texttt{G} describes the requirements to be satisfied, and the generic signature of \texttt{H} describes how interface types appearing in the underlying type of \texttt{X} relate to each other. Both of the type parameters in the same-type requirement of \texttt{G}, \texttt{T.[Sequence]Element} and \texttt{U.[Sequence]Element}, are mapped to \archetype{A.Element} by our substitution map, for the following reason. Our substitution map replaces the generic parameters \texttt{T} and \texttt{U} of the generic signature of \texttt{G} with the archetypes \archetype{A} and \archetype{B} in the generic environment of \texttt{H}. The generic signature of \texttt{H} defines a same-type requirement between \texttt{A.Element} and \texttt{B.Element}, making the former the reduced type of the latter; so both map to the same archetype, \archetype{A.Element}, in the generic environment of \texttt{H}. \paragraph{Second type alias} The context substitution map for the underlying type of \texttt{Y} on line 11 is also a substitution map for the generic signature of \texttt{G}, with replacement types in the generic environment of \texttt{H}: \begin{quote} \SubMapC{ \texttt{T}&:=&\archetype{A}\\ \texttt{U}&:=&\texttt{Array<\archetype{D}>}\\ \texttt{V}&:=&\texttt{Base<\archetype{D}>} }{ \SubConf{\archetype{A}:\ Sequence}\\ \SubConf{Array<\archetype{D}>:\ Sequence} } \end{quote} We apply this substitution map to each requirement of the generic signature of \texttt{G}: \begin{quote} \begin{tabular}{|l|l|l|c|} \hline Kind&First type&Second type&Satisfied?\\ \hline Conformance&\texttt{\archetype{A}}&\texttt{Sequence}&$\checkmark$\\ Conformance&\texttt{Array<\archetype{D}>}&\texttt{Sequence}&$\checkmark$\\ Superclass&\texttt{Base<\archetype{D}>}&\texttt{Base<Int>}&$\times$\\ Same type&\archetype{A.Element}&\archetype{D}&$\times$\\ \hline \end{tabular} \end{quote} The superclass requirement is unsatisfied because \texttt{Base<\archetype{D}>} and \texttt{Base<Int>} are unrelated types. The same-type requirement is unsatisfied because \texttt{\archetype{A.Element}} and \archetype{D} are different types. \end{example} Recall from Section~\ref{trailing where clauses} that non-generic type declarations can have a \texttt{where} clause, if the declaration is nested inside a generic context. When resolving a reference to such a type, requirements must be checked as well. This is done in the same manner as above, except there are no generic arguments to apply, so the substitution map is the context substitution map of the base type with respect to the named type declaration's declaration context.
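\begin{example} The following is a minimal sketch of this situation, with hypothetical declarations: \begin{Verbatim}
struct Outer<T> {
  struct Inner where T: Hashable {}
}

// `T := Int'; the substituted requirement `Int: Hashable' is satisfied
typealias A = Outer<Int>.Inner

// `T := AnyObject'; the substituted requirement `AnyObject: Hashable'
// is unsatisfied, so an error is diagnosed
typealias B = Outer<AnyObject>.Inner
\end{Verbatim} The reference to \texttt{Inner} supplies no generic arguments of its own; the substitution map is the context substitution map of the base type, \texttt{Outer<Int>} or \texttt{Outer<AnyObject>} respectively. \end{example}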
TODO: \begin{itemize} \item unbound generic type \item we don't support generic arguments applied to a member type with a dependent base \end{itemize} \fi \section{Protocol Type Aliases}\label{protocol type alias} \ifWIP TODO: \begin{itemize} \item Protocol substitution map \item Special behavior with structural resolution \end{itemize} A type alias inside a protocol does not impose a requirement on concrete conforming types; it is merely a shorthand for writing out a possibly longer type by hand, much like any other type alias. The underlying type of a protocol type alias can reference \texttt{Self} or the protocol's associated types. The protocol type alias is visible as a member of type parameters conforming to the protocol, as well as concrete conforming types. The type that the lookup is performed on is called the ``base type''. Here is an example of both cases:
\begin{Verbatim}
protocol AnimalFeed {}
struct Grain: AnimalFeed {}
struct Silo<Contents> {}

protocol Animal {
  associatedtype Feed: AnimalFeed
  typealias FeedStorage = Silo<Feed>
}

struct Horse: Animal {
  typealias Feed = Grain
}

func useAlias<T: Sequence>(_: T) where T.Element: Animal {
  let x: T.Element.FeedStorage = ...  // Silo<T.Element.Feed>
  let y: Horse.FeedStorage = ...      // Silo<Grain>
}
\end{Verbatim}
When a protocol type alias is looked up on a base type, occurrences of \texttt{Self} in the underlying type of the type alias are substituted with the base type. \paragraph{Dependent base} If the base type is another type parameter, we say the type alias reference is \emph{dependent}. References to associated types from the underlying type of the type alias remain as type parameters---in the above example, the declaration of \texttt{x} references the protocol type alias with a base type of \texttt{T.Element}, which is known to conform to \texttt{Animal}; substituting \texttt{Self} with \texttt{T.Element} in the underlying type \texttt{Silo<Self.Feed>} produces \texttt{Silo<T.Element.Feed>}. \paragraph{Concrete base} If the base type is a concrete type, associated type references on \texttt{Self} are substituted with the corresponding associated type witnesses from the conformance of the concrete type to the protocol. In the above example, looking up the protocol type alias with a base type of \texttt{Horse} will substitute \texttt{Self.Feed} with \texttt{Grain}, because \texttt{Horse} declares a type alias \texttt{Feed} with an underlying type of \texttt{Grain} to satisfy the \texttt{Feed} associated type requirement of the \texttt{Animal} protocol. (Later on in Section~\ref{buildingsubmap}, you will see that this replacement of the protocol \texttt{Self} type with another type parameter or concrete type is encoded in a context substitution map.) \paragraph{Existential base} Notice that it would not make sense to access the \texttt{FeedStorage} protocol type alias above as a member of the \texttt{Animal} protocol type itself. We cannot substitute \texttt{Self} with \texttt{Animal} because there is no fixed type we can use as the replacement for \texttt{Self.Feed} when \texttt{Self} is \texttt{Animal}; by definition, the \texttt{Feed} associated type depends on the concrete conforming type. If you write \texttt{Animal.FeedStorage}, the compiler will complain. There is one case where a protocol type alias can be referenced with the protocol as the base type, which is when the underlying type of the protocol type alias does not reference \texttt{Self} at all. In this case, the underlying type is just a shortcut for some other type, and no substitution is performed:
\begin{Verbatim}
protocol Animal {
  typealias Age = Int
}

func celebrateBirthday(_ animal: some Animal, age: Animal.Age) { ... }
\end{Verbatim}
Associated type declarations in protocols were formerly written with the \texttt{typealias} keyword. The \texttt{associatedtype} keyword was introduced in Swift 2.2 \cite{se0011}, to open up the possibility of protocol type aliases as a distinct concept from associated types. Protocol type aliases were introduced in Swift 3 \cite{se0092}. \fi \section{Source Code Reference} \ifWIP TODO: \fi \chapter{Building Generic Signatures}\label{building generic signatures} \ifWIP Chapter~\ref{generic declarations} described generic contexts and the varied syntax for declaring generic parameters and requirements.
Then, Chapter~\ref{genericsig} introduced the generic signature, a higher-level semantic object describing all the generic parameters and requirements visible from a generic context. Now is the time to fill in the gaps and uncover how generic signatures are built from syntactic forms. \index{generic signature request} \index{abstract generic signature request} \index{serialization} There are three general mechanisms by which generic signatures get built. The first one is special; the other two both go through the \emph{requirement minimization} algorithm: \begin{enumerate} \item Generic signatures can be built directly from an ordered list of generic parameters and minimal, reduced requirements. This approach is taken when it is already known that the input requirements satisfy the correct invariants (described shortly in Section~\ref{minimal requirements}), for example because they're built from the serialized representation of a previously-constructed generic signature. \item Asking a generic context for its generic signature lazily invokes the \textbf{generic signature request}. This request recursively gets the generic signature of the parent context, resolves any syntactic forms that introduce new generic parameters and requirements, and builds a new generic signature by adding the new generic parameters and requirements to the parent signature. \item In between the first and second mechanisms above, a generic signature can also be built directly from an abstract list of requirements which are not necessarily minimal and reduced. The \textbf{abstract generic signature request} takes a parent generic signature, and a list of new generic parameter types and requirements to add. \end{enumerate} \index{where clause} \index{inheritance clause} \index{generic parameter list} \index{extension declaration} \index{protocol Self type} \paragraph{Generic signature request} This request receives a generic context as input. If the generic context is a protocol declaration or an unconstrained protocol extension, the generic signature is built immediately from the protocol \texttt{Self} type and the conformance requirement \texttt{Self:~P}. A single conformance requirement with a reduced subject type is always minimal and reduced, so there is no need to spin up the requirement minimization machinery to build the generic signature in this case.
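For example, both of the following generic contexts receive the generic signature \texttt{<Self where Self : P>} directly, without requirement minimization (a minimal sketch):
\begin{Verbatim}
protocol P {
  // Generic signature: <Self where Self : P>
  func f()
}

// An unconstrained extension of `P' receives the same signature.
extension P {
  func g() {}
}
\end{Verbatim}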
\index{inferred generic signature request} For all other kinds of generic contexts, the next thing to check is whether the generic context has a generic parameter list or trailing \texttt{where} clause. If neither is present, the generic context inherits its generic signature from its parent, so the parent signature is returned immediately. Otherwise, a list of arguments is prepared to hand off to a lower-level request, the \textbf{inferred generic signature request}: \begin{enumerate} \item The generic signature of the parent context, if any. \item The generic context's generic parameter list, if any. \item The generic context's trailing \texttt{where} clause, if any. \item Any additional requirements to add. \item For requirement inference in functions and subscripts, a list consisting of all parameter types and the return type. \item A flag indicating whether generic parameters can be subject to concrete same-type requirements. \end{enumerate} \paragraph{Inferred generic signature request} The name ``inferred generic signature request'' is a bit of a misnomer. While it performs requirement inference, for the most part this request builds the generic signature from written syntax; the signature itself is not ``inferred'' in any sense. A few words about some of the parameters above. At least one of the first two parameters must be specified; if there is no parent signature and there are no generic parameters to add, the resulting generic signature is necessarily empty, a case the caller should have handled already. The list of additional requirements (parameter 4) is only used for special inference behavior in extensions, described in Section~\ref{constrained extensions}. For all other declarations, the inferred generic signature request builds the list of requirements from syntactic representations, by resolving type representations in inheritance clauses and requirement representations in the \texttt{where} clause. The list of requirement inference types (parameter 5) is described in the next section. The flag (parameter 6) enforces an artificial restriction whereby only extensions can constrain generic parameters to concrete types. While it is generally useful to be able to write: \begin{Verbatim}
extension Array where Element == Int {...}
\end{Verbatim} something like the following is nonsensical and probably indicates a programming mistake, because the generic parameter \texttt{T} may as well be removed entirely, with all references to \texttt{T} replaced with \texttt{Int}: \begin{Verbatim}
func add<T>(_ lhs: T, _ rhs: T) -> T where T == Int {
  return lhs + rhs
}
\end{Verbatim} \index{unbound type parameter} \index{structural type resolution stage} \index{type resolution stage} The inferred generic signature request uses the structural type resolution stage (Chapter~\ref{typeresolution}) to resolve type representations appearing in inheritance clauses and trailing \texttt{where} clauses. This means that, in general, user-written requirements contain unbound type parameters. Requirement minimization reduces unbound type parameters to bound type parameters as part of computing the set of minimal and reduced requirements. \index{conflicting requirement} \index{redundant requirement} The inferred generic signature request also emits certain diagnostics at the source location of the generic context. If some set of input requirements can be proven to contradict each other, no substitution map can satisfy all of the requirements simultaneously, which would prevent the generic declaration from ever being used. Such \emph{conflicting requirements} are detected and diagnosed at the declaration site. The opposite situation is when a requirement can be proven from other requirements. Requirement desugaring and minimization drop redundant requirements because they do not affect the generic signature. Normally redundant requirements are silently ignored, but the \texttt{-Xfrontend -warn-redundant-requirements} flag enables warnings about them. \begin{Verbatim}
protocol SetProtocol {
  associatedtype Element: Hashable
}

struct NotHashable {}

// warning: redundant conformance constraint `T.Element : Hashable'
func f<T: SetProtocol>(_: T) where T.Element: Hashable {}

// error: no type for `T.Element' can satisfy both `T.Element == NotHashable'
// and `T.Element : Hashable'
func g<T: SetProtocol>(_: T) where T.Element == NotHashable {}
\end{Verbatim} \index{abstract generic signature request} \paragraph{Abstract generic signature request} This request takes three inputs: \begin{enumerate} \item The parent generic signature, if any. \item A list of generic parameter types to add, if any.
\item A list of generic requirements to add, if any. \end{enumerate} At least one of the first two parameters must be specified; the caller is expected to handle the case where an empty generic signature would be built by not invoking the request at all. The list of requirements is sometimes the list of requirements from another generic signature, with a substitution map applied to each one. This will come up in Section~\ref{overridechecking} and Section~\ref{witnessthunksignature}. \index{requirement signature} \index{requirement signature request} \index{associated type declaration} \index{protocol declaration} \paragraph{Requirement signatures} Requirement signatures of protocols are either built directly from deserialized representations, or via the \textbf{requirement signature request}. This request is lazily invoked the first time a protocol declaration is asked for its requirement signature; it resolves requirements on the protocol and its associated types and performs requirement minimization. There is no equivalent of the abstract generic signature request for requirement signatures, which only exist in direct correspondence to some protocol. \fi \section{Requirement Inference}\label{requirementinference} \ifWIP Requirement inference is a language feature whereby generic requirements can be omitted entirely if they are implied by a generic type appearing in one of several special positions of a declaration. It is easiest to explain with an example. \begin{example}\label{requirementinferenceexample1} Recall that the standard library \texttt{Set} type declares a single \texttt{Element} generic parameter constrained to \texttt{Hashable}: \begin{Verbatim}
struct Set<Element: Hashable> {...}
\end{Verbatim} When the \texttt{Set} type appears in a function's parameter list, the \texttt{Hashable} requirement is inferred from the generic argument to \texttt{Set}: \begin{Verbatim}
func removeAll<S: Sequence>(_ seq: S, _ elts: Set<S.Element>) {...}

// Equivalent to:
func removeAll<S: Sequence>(_ seq: S, _ elts: Set<S.Element>)
    where S.Element: Hashable {...}
\end{Verbatim} \end{example} Requirement inference has an elegant formulation in terms of applying a substitution map to the requirements of a generic signature. In a sense, the problem being solved here is the opposite of applying generic arguments in type resolution (Section~\ref{checking generic arguments}). There, we determine if a substitution map satisfies the requirements of the referenced type declaration's generic signature, by applying the substitution map to each requirement and evaluating the truth of the substituted requirement. Requirement inference, on the other hand, \emph{adds} the substituted requirements to the new generic signature, in order to \emph{make them true}. In our example, we're building the generic signature of \texttt{removeAll()}, so we look at the type representation \texttt{Set<S.Element>}. After resolving this to a type, we take its context substitution map, which has a single replacement type \texttt{S.Element}. Applying this substitution map to the requirement \texttt{Element:\ Hashable} of \texttt{Set}'s generic signature yields a substituted requirement \texttt{S.Element:\ Hashable}.
This substituted requirement's subject type is a type parameter \texttt{S.Element} of \texttt{removeAll()}, and the requirement ends up in the generic signature of \texttt{removeAll()}: \[\ttbox{Element:\ Hashable}\times\SubMapC{\SubType{Element}{S.Element}}{\SubConf{S.Element:\ Hashable}} = \ttbox{S.Element:\ Hashable}\] The following positions are eligible for requirement inference when the generic signature of a generic context is being built: \begin{enumerate} \item Inheritance clauses of generic parameter declarations. \item The types inside requirement representations in the \texttt{where} clause. \item An additional list of type representations passed in to the inferred generic signature request. For functions and subscripts, this is the list of parameter types together with the return type. For type aliases, this is the underlying type of the type alias. \end{enumerate} We resolve each of the above type representations with the structural type resolution stage (Chapter~\ref{typeresolution}), then visit all generic nominal and generic type alias types appearing within. Upon encountering a recursively-nested generic nominal or type alias type, requirement inference decomposes it into a generic signature and substitution map. The generic signature is the signature of the type declaration. For type alias types, the substitution map from type resolution is preserved in the type alias type itself. For generic nominal types, the context substitution map is used. The substitution map is applied to each requirement in the generic signature, producing a list of \emph{inferred requirements}. In Example~\ref{requirementinferenceexample1}, the substituted requirement's subject type was another type parameter. This is not always the case, as the next example makes clear; this motivates the \emph{requirement desugaring} algorithm, introduced in the next section. \begin{example} When the generic signature of \texttt{Transformer.transform()} is built, requirement inference will consider the function's parameter type representation: \begin{Verbatim}
struct Transform<X: Sequence, Y: Sequence> where X.Element == Y.Element {}

struct Transformer {
  func transform<S: Sequence>(_: Transform<Array<Int>, S>) {}
}
\end{Verbatim} Applying the substitution map $\texttt{X}:=\texttt{Array<Int>}$, $\texttt{Y}:=\texttt{S}$ to the same-type requirement \texttt{X.Element == Y.Element} of \texttt{Transform} yields the substituted requirement \texttt{Int == S.Element}, whose subject type is the concrete type \texttt{Int} rather than a type parameter. \end{example} \begin{example} Requirement inference also looks through generic type alias types: \begin{Verbatim}
typealias HashableArray<Element> = Array<Element> where Element: Hashable

func removeAll<Element>(_: HashableArray<Element>) {}

// Equivalent to:
func removeAll<Element>(_: Array<Element>) where Element: Hashable {}
\end{Verbatim} \end{example} \begin{example} The underlying type of a type alias need not mention the type alias's generic parameters directly at all. Before parameterized protocol types were added to the language, a few people discovered a funny trick that could simulate them in a sense: \begin{Verbatim}
typealias SequenceOf<E, T> = Any where T: Sequence, T.Element == E

func sum<S: SequenceOf<Int, S>>(_: S) {...}
\end{Verbatim} To understand what this does, and how, consider the generic signature of \texttt{SequenceOf}: \begin{quote} \begin{verbatim}
<E, T where T : Sequence, T.[Sequence]Element == E>
\end{verbatim} \end{quote} The \texttt{sum()} function declares a generic parameter \texttt{S} with an inheritance clause containing the constraint type \texttt{SequenceOf<Int, S>}. The canonical type of this type alias is \texttt{Any}, which introduces the trivial requirement \texttt{S:\ Any}.
However, requirement inference also visits the type representation \texttt{SequenceOf<Int, S>}, and applies the type alias type's substitution map to each requirement of the type alias declaration's generic signature: \[ \left\{ \begin{array}{l} \ttbox{T:\ Sequence}\\[\medskipamount] \ttbox{E == T.Element} \end{array} \right\} \times \SubMapC{ \SubType{T}{S}\\ \SubType{E}{Int} }{\multicolumn{3}{|l|}{$\ldots$}} = \left\{ \begin{array}{l} \ttbox{S:\ Sequence}\\[\medskipamount] \ttbox{Int == S.Element} \end{array} \right\} \] So the first declaration below is equivalent to the second: \begin{Verbatim}
func sum<S: SequenceOf<Int, S>>(_: S) {...}
func sum<S>(_: S) where S: Sequence, S.Element == Int {...}
\end{Verbatim} \end{example} \begin{example} It is instructive to consider the behavior of type resolution's generic argument checking, with and without requirement inference. Requirement inference is only performed if the generic context is getting its own generic signature for some other reason, either because it adds generic parameters or has a trailing \texttt{where} clause. In the first two definitions below, requirement inference infers the \texttt{T:\ Hashable} requirement. In the last one, requirement inference does not run, because the function \texttt{example3()} is not generic: \begin{Verbatim}
struct G<T> {
  // infers `T: Hashable'
  func example1<V>(_: V, _: Set<T>) {}

  // infers `T: Hashable'
  func example2<U>(_: Set<T>) where U: Sequence {}

  // nothing inferred; error because `T: Hashable' unsatisfied
  func example3(_: Set<T>) {}
}
\end{Verbatim} In the first function, the inferred generic signature request runs because the function declares a generic parameter list. Requirement inference adds the inferred requirement \texttt{T:\ Hashable}; thus the generic signature is \begin{quote} \begin{verbatim}
<T, V where T : Hashable>
\end{verbatim} \end{quote} After the generic signature is built, type resolution proceeds to the interface type resolution stage and checks the generic arguments of \texttt{Set<T>}. The type is mapped into the function's generic environment, producing a contextual type \texttt{Set<\archetype{T}>} containing the primary archetype \archetype{T}. The substituted requirement \texttt{\archetype{T}:\ Hashable} holds, because the local requirements of \archetype{T} include a conformance to \texttt{Hashable}. In the second example, the generic signature is slightly different, but the generic argument check succeeds for the same reason: \begin{quote} \begin{verbatim}
<T, U where T : Hashable, U : Sequence>
\end{verbatim} \end{quote} In the third example, the declaration does not have a generic parameter list or \texttt{where} clause, so it inherits the generic signature from the parent context, \texttt{<T>}. In this generic environment, the archetype \archetype{T} does not satisfy a conformance requirement to \texttt{Hashable}. An error is diagnosed, pointing at the source location of the type representation \texttt{Set<T>}. \end{example} \paragraph{Requirement signatures} Unlike the inferred generic signature request, the requirement signature request does not perform requirement inference. So the \texttt{Particle:\ Hashable} requirement cannot be omitted below: \begin{Verbatim}
protocol Cloud {
  associatedtype Particle: Hashable
  associatedtype Particles: Sequence where Particles.Element == Set<Particle>
}
\end{Verbatim} Originally the reasoning was that all requirements imposed on the concrete conforming type should be explicitly stated inside the protocol body, for clarity.
This has since been retconned into an appeal for implementation simplicity: a consequence of how protocol requirement signatures are computed is that the set of all protocols referenced from the right hand sides of the conformance requirements inside a protocol must be known very early, before requirement inference has a chance to run. This is incompatible with requirement inference introducing new conformance requirements.
\fi
\section{Desugared Requirements}\label{requirement desugaring}
\ifWIP
After requirement resolution and requirement inference, the inferred generic signature request has all of the information needed to build a new generic signature: the parent generic signature, a list of generic parameters to add, and a list of generic requirements, including inferred requirements. The abstract generic signature request \emph{starts} here; the list of requirements is directly provided by the caller, rather than being constructed from user-written requirement representations and requirement inference.

The full list of generic parameters in the new signature is computed by concatenating the parent signature's generic parameters with the list of generic parameters to add. The full list of requirements is similarly computed from the parent signature's requirements together with the list of requirements to add, but there are several extra steps before we can arrive at the \emph{minimal}, \emph{reduced} list of requirements stored in a generic signature.

Generic signatures play an important role in the Swift binary interface: they determine the calling convention of generic functions, the layout of generic nominal type metadata, the mangling of symbol names, and more. Therefore, the exact form of the requirements in a generic signature is quite important to specify. A formal definition of a generic signature was first written down in \cite{gensig}; the next two sections present a new definition with more rigor and vigor. The first step is \emph{requirement desugaring}, which establishes the invariant defined below.
\index{desugared requirement}
\index{requirement desugaring}
\begin{definition}\label{desugaredrequirementdef}
A \emph{desugared requirement} satisfies two conditions:
\begin{enumerate}
\item For conformance requirements, the constraint type must be a protocol type (and not a protocol composition or parameterized protocol).
\item For all requirement kinds, the first type in the requirement (the subject type) must be a type parameter.
\end{enumerate}
\end{definition}
\index{conformance requirement}
\index{protocol type}
\index{protocol composition type}
\index{parameterized protocol type}
To deal with the first invariant, the complex constraint types that can appear in a user-written requirement need to be broken down into multiple simpler requirements. All three of the following declarations are equivalent, but only the last one has the final desugared form:
\begin{Verbatim}
func persist<D>(data: D) where D: Sequence<String> & Hashable {...}
func persist<D>(data: D) where D: Sequence<String>, D: Hashable {...}
func persist<D>(data: D) where D: Sequence, D.Element == String,
                               D: Hashable {...}
\end{Verbatim}
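A protocol composition can also contain a class type or \texttt{AnyObject}, in which case the corresponding member desugars into a superclass or layout requirement rather than a conformance requirement. A brief sketch, using hypothetical declarations:
\begin{Verbatim}
class Animal {}
protocol Named {}

// `T: Animal & Named' desugars into the superclass requirement
// `T: Animal' and the conformance requirement `T: Named'.
func tag<T>(_: T) where T: Animal & Named {}

// `U: AnyObject & Named' desugars into the layout requirement
// `U: AnyObject' and the conformance requirement `U: Named'.
func pin<U>(_: U) where U: AnyObject & Named {}
\end{Verbatim}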
This can be formalized as an algorithm to recursively visit parameterized protocol types and members of protocol compositions.
\begin{algorithm}[Expanding conformance requirements]
As input, takes a list of requirements. Outputs a new list of equivalent requirements where all conformance requirements have a protocol type on the right hand side.
\begin{enumerate}
\item Initialize the output list to an empty list.
\item Initialize the worklist to contain all requirements from the input list.
\item (Check) If the worklist is empty, return the output list.
\item (Loop) Take the next requirement from the worklist. If it is not a conformance requirement, add it to the output list and go back to Step~3. Otherwise, it is some conformance requirement \texttt{T:~C}.
\item (Base case) If \texttt{C} is a protocol type, add the conformance requirement \texttt{T:~C} to the output list and go back to Step~3.
\item (Composition) If \texttt{C} is a protocol composition type, visit each protocol composition member $\texttt{M}\in\texttt{C}$. If \texttt{M} is a class type or \texttt{AnyObject}, add the superclass or layout requirement \texttt{T:~M} to the output list. Otherwise, \texttt{M} might need to be decomposed further, so add a conformance requirement \texttt{T:~M} to the worklist. Go back to Step~3.
\item (Parameterized) If \texttt{C} is a parameterized protocol type \texttt{P<G1, ..., Gn>} with base type \texttt{P} and generic arguments \texttt{Gi}, decompose the requirement as follows:
\begin{enumerate}
\item The base type \texttt{P} is always a protocol type. Add the conformance requirement \texttt{T:~P} to the output list.
\item For each primary associated type \texttt{Ai} of \texttt{P}, construct a same-type requirement between the corresponding associated type member of \texttt{T} and the generic argument type. Add each requirement \texttt{T.[P]Ai == Gi} to the output list.
\end{enumerate}
Go back to Step~3.
\end{enumerate}
\end{algorithm}
This gives us the first invariant of Definition~\ref{desugaredrequirementdef}. To understand how requirement desugaring establishes the second invariant, we need to consider what it means to build a generic signature where some of the requirements are applied to concrete types.
\index{global conformance lookup}
Recall from Section~\ref{checking generic arguments} that a requirement where all types are fully concrete is a ``statement'' whose truth can be evaluated. A conformance requirement about a concrete type can be checked by performing a global conformance lookup; a same-type requirement between concrete types can be checked by comparing two types for canonical equality, and so on. Requirement desugaring can similarly apply Algorithm~\ref{reqissatisfied} to check if a fully concrete requirement is satisfied. If it is satisfied, the requirement is necessarily redundant, because it does not give us any new information about the generic signature's type parameters. If it is unsatisfied, the requirement is said to \emph{conflict}; an error is diagnosed.

Requirements between concrete types often appear after applying a substitution map to the requirements of a different generic signature. You saw this already with requirement inference, and it will come up again in Section~\ref{overridechecking} and Section~\ref{witnessthunksignature}.
\begin{listing}\captionabove{Trivial and contradictory requirements}\label{trivialandcontradictoryreqs}
\begin{Verbatim}
func trivial<T>(_: T) where Int: Hashable {}
func contradictory<T>(_: T) where Int: Sequence {}
\end{Verbatim}
\end{listing}
\begin{example}
In Listing~\ref{trivialandcontradictoryreqs}, the first declaration's requirement \texttt{Int:\ Hashable} is redundant, because it can be proven from ``first principles.'' The second declaration's requirement \texttt{Int:\ Sequence} is a conflict.
\end{example}
\index{conditional requirements}
\index{conditional conformance}
\index{invalid conformance}
\index{same-type requirement}
If the requirement's types \emph{contain} type parameters but are not themselves type parameters, the requirement might need to be broken down into new requirements whose subject types are type parameters:
\begin{enumerate}
\item A \textbf{conformance requirement} where the subject type is a concrete type can still be evaluated by performing a global conformance lookup, even if the subject type contains type parameters. If the conformance is invalid, the requirement is rejected and diagnosed. If the conformance is unconditional, the conformance requirement similarly becomes redundant. Otherwise, if the conformance is conditional, the conformance requirement is replaced with the list of conditional requirements in the conditional conformance (Section~\ref{conditional conformance}).
\item A \textbf{superclass requirement} between two concrete types is desugared by checking if the subject type inherits from the superclass declaration, and replacing the subject type with the corresponding superclass type if so (Chapter~\ref{classinheritance}).
\item A \textbf{layout requirement} can still be immediately evaluated even if the subject type contains type parameters, because the layout constraint of a concrete type is determined entirely by whether the outermost type is a class type.
\item A \textbf{same-type requirement} is the interesting case covered here.
\end{enumerate}
\index{abstract same-type requirement}
\index{concrete same-type requirement}
After desugaring, there are two varieties of same-type requirements: \emph{abstract} same-type requirements between type parameters, and \emph{concrete} same-type requirements between a type parameter and a concrete type. A same-type requirement where the subject type is concrete but the constraint type is a type parameter does not satisfy the conditions of a desugared requirement; however, since same-type requirements are \emph{symmetric}, it suffices to flip the subject type and constraint type, which produces an equivalent same-type requirement that happens to be desugared:
\begin{quote}
\begin{Verbatim}
func sum<T: Sequence>(_: T) -> Int where T.Element == Int {...}
func sum<T: Sequence>(_: T) -> Int where Int == T.Element {...}
\end{Verbatim}
\end{quote}
Therefore the remaining case is the desugaring of same-type requirements where \emph{both} types are concrete. A same-type requirement is a statement that two types have the same reduced type. If both types are concrete, they must be compatible in certain ways. Computing a reduced type of a concrete type (Algorithm~\ref{reducedtypealgo}) only transforms the leaves that happen to be type parameters, by replacing them with other type parameters or concrete types; the ``shape'' of the concrete type remains the same.

Under our interpretation, a same-type requirement \texttt{Array<Element> == Array<Int>} should be equivalent to \texttt{Element == Int}. To see why, consider what it means to say that \texttt{Array<Element>} and \texttt{Array<Int>} have the same reduced type. The second type is already fully concrete, and cannot be reduced further. The only way to get the second type from the first type is to replace \texttt{Element} with \texttt{Int}. This suggests that the reduced type of the type parameter \texttt{Element} is \texttt{Int}. On the other hand, a requirement like \texttt{Array<Element> == Set<Int>} is nonsensical.
The reduced type of \texttt{Array<Element>} will always be some specialization of \texttt{Array}, which will never equal a specialization of \texttt{Set}. This can be formalized by considering what it means for two types to have matching sub-components.
\begin{definition}
Two types have \emph{matching sub-components} if they have the same kind, the same number of child types, and exactly equal non-type information, such as the declaration of nominal types, the labels of two tuples, the value ownership kinds of function parameters, and so on. This property only considers the root of the tree; \texttt{Array<Int>} and \texttt{Array<Set<Int>>} still have matching sub-components, but the two sub-components \texttt{Int} and \texttt{Set<Int>} do not.
\end{definition}
A same-type requirement between two types with matching sub-components desugars to a list of same-type requirements between the pair-wise component types. These requirements might need further desugaring, recursively.
\begin{algorithm}[Same-type requirement desugaring]
As input, takes an arbitrary same-type requirement. As output, returns three lists of same-type requirements: the \emph{desugared} list, the \emph{redundant} list, and the \emph{conflict} list.
\begin{enumerate}
\item Initialize the desugared list, redundant list and conflict list to empty lists of requirements.
\item Initialize a worklist to contain the input requirement.
\item (Loop) Take the next requirement \texttt{T == U} from the worklist.
\item (Abstract) If \texttt{T} and \texttt{U} are both type parameters, add \texttt{T == U} to the desugared list.
\item (Concrete) If \texttt{T} is a type parameter and \texttt{U} is concrete, add \texttt{T == U} to the desugared list.
\item (Flipped) If \texttt{T} is concrete and \texttt{U} is a type parameter, add \texttt{U == T} (note the flip) to the desugared list.
\item (Redundant) If \texttt{T} and \texttt{U} are both concrete and canonically equal, add \texttt{T == U} to the redundant list.
\item (Recurse) If \texttt{T} and \texttt{U} are both concrete and not canonically equal, but have matching sub-components, let $\texttt{T1}\ldots\texttt{Tn}$ and $\texttt{U1}\ldots\texttt{Un}$ be the child types of \texttt{T} and \texttt{U}. For each $1\le \texttt{i}\le \texttt{n}$, add the same-type requirement \texttt{Ti == Ui} to the worklist.
\item (Conflict) If \texttt{T} and \texttt{U} are both concrete and do not have matching sub-components, add \texttt{T == U} to the conflict list.
\item (Check) If the worklist is empty, return. Otherwise, go back to Step~3.
\end{enumerate}
\end{algorithm}
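To see the algorithm in action, here is a trace on a small, hypothetical input where both sides are specializations of \texttt{Dictionary}:
\begin{Verbatim}
Input:     Dictionary<T, Array<U>> == Dictionary<Int, V>

(Recurse)  matching sub-components; equate children pair-wise:
               T == Int,  Array<U> == V
(Concrete) T is a type parameter:  T == Int       -> desugared list
(Flipped)  V is a type parameter:  V == Array<U>  -> desugared list
\end{Verbatim}
The desugared list is \texttt{T == Int} and \texttt{V == Array<U>}; the redundant and conflict lists remain empty.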
\fi
\section{Minimal Requirements}\label{minimal requirements}
\ifWIP
Requirement desugaring eliminates certain forms of redundancy, but the list of desugared requirements might still contain redundancies or conflicts. Also, the type parameters in a generic signature are expected to be bound type parameters, whereas type parameters in user-written requirements are unbound, because they come from the structural type resolution stage. The \emph{requirement minimization} algorithm transforms a list of desugared requirements into a list of minimal, reduced requirements. The definitions below only show how to \emph{check} that a list of requirements satisfies the necessary invariants, without revealing how requirement minimization arrives at the solution. For that, you will need to wait until Part~\ref{part rqm}.
\begin{definition}\label{generic signature invariants definition}
The requirements of a generic signature are desugared, valid, minimal, and reduced.
\end{definition}
Desugared requirements were defined in the previous section; the remaining concepts are defined below.
\begin{definition}
A requirement is \emph{valid} if the subject type and any type parameters appearing on the right hand side are valid.
\end{definition}
\begin{definition}
A type parameter is \emph{valid} if one of the following holds:
\begin{itemize}
\item The type parameter is a generic parameter type in the generic signature.
\item The type parameter is a dependent member type \texttt{T.[P]A} with base type \texttt{T} and associated type \texttt{A} of protocol \texttt{P}, where the base type \texttt{T} is both recursively valid and conforms to the protocol \texttt{P}; that is, the conformance requirement \texttt{T:\ P} is known to be satisfied via the \texttt{requiresProtocol()} generic signature query.
\end{itemize}
\end{definition}
\begin{definition}
We can attempt to \emph{delete} a requirement by forming a new generic signature from the remaining requirements and checking the invariants of Definition~\ref{generic signature invariants definition}. A requirement is \emph{minimal} if one of the following holds:
\begin{itemize}
\item The requirement cannot be deleted, because at least one of the remaining requirements would become invalid.
\item The requirement can be deleted, but the resulting generic signature does not satisfy the deleted requirement.
\end{itemize}
\end{definition}
\begin{definition}
For requirements other than abstract same-type requirements, the definition of a \emph{reduced} requirement is straightforward:
\begin{itemize}
\item A conformance or layout requirement is reduced if the requirement's subject type is a reduced type parameter.
\item A superclass or concrete same-type requirement is reduced if the subject type is a reduced type parameter, and if any type parameters appearing in the right hand side type are reduced.
\end{itemize}
\end{definition}
In the case of an abstract same-type requirement, \texttt{A == B} defines an equivalence between two type parameters, so by definition at least one of \texttt{A} or \texttt{B} is not reduced, and the above definition would not work. We just need a few more steps.
\begin{definition}
An abstract same-type requirement is \emph{oriented} if the left hand side precedes the right hand side in type parameter order.
\end{definition}
We know that the type parameter on the right hand side of an oriented same-type requirement can always be reduced to the left hand side by the same-type requirement itself, but the key property we want is that \emph{no other} same-type requirement can reduce it.
\begin{definition}
A same-type requirement is \emph{right reduced} if it is oriented, and the right hand side cannot be reduced by any combination of same-type requirements not involving this requirement itself.
\end{definition}
What is the correct condition for the type parameter on the left hand side? Starting from a list of requirements equating each one of \texttt{T.A}, \texttt{T.B}, \texttt{T.C} and \texttt{T.D} with the rest (this is a complete graph of order 4),
\begin{quote}
\begin{verbatim}
T.A == T.B
T.A == T.C
T.A == T.D
T.B == T.C
T.B == T.D
T.C == T.D
\end{verbatim}
\end{quote}
the minimization algorithm is defined to output the ``circuit,''
\begin{quote}
\begin{verbatim}
T.A == T.B
T.B == T.C
T.C == T.D
\end{verbatim}
\end{quote}
and not the ``star,''
\begin{quote}
\begin{verbatim}
T.A == T.B
T.A == T.C
T.A == T.D
\end{verbatim}
\end{quote}
This formalizes as follows.
\begin{definition}\label{left reduced requirement}
A same-type requirement is \emph{left reduced} if two conditions hold:
\begin{enumerate}
\item The left hand side is not equal to the left hand side of any other (abstract or concrete) same-type requirement.
\item The left hand side is either equal to the \emph{right} hand side of some other same-type requirement, or it is reduced.
\end{enumerate}
\end{definition}
Finally, we have our definition of a reduced abstract same-type requirement.
\begin{definition}
An abstract same-type requirement is \emph{reduced} if it is left reduced and right reduced.
\end{definition}
We now have a complete picture of what it means for a set of requirements to be well-formed; all that remains is to sort the requirements in a certain order when constructing the new generic signature.
\begin{definition}\label{requirement order}
Requirements in a generic signature are ordered as follows:
\begin{itemize}
\item Requirements with different subject types are ordered by the type parameter order on their subject types.
\item Requirements with the same subject type and different kinds are ordered by kind:
\begin{enumerate}
\item superclass,
\item layout,
\item conformance,
\item same-type.
\end{enumerate}
\item Conformance requirements with the same subject type must have different right hand sides, which are ordered with the protocol ordering.
\end{itemize}
\end{definition}
\begin{theorem}
Under our validity conditions, the requirement order is a total order.
\end{theorem}
\begin{proof}
Suppose we have two desugared, valid, minimal and reduced requirements that cannot be ordered. This means they have the same subject type and kind, but are not conformance requirements (conformance requirements with the same subject type are ordered by their right hand sides). We can show that each remaining requirement kind leads to a contradiction.

If we have two layout requirements with the same subject type, they must be equal, as the only layout constraint that can be written in the source language is \texttt{AnyObject}. Either duplicate requirement can be deleted, and neither requirement is minimal. This contradicts our assumption that all requirements are minimal.

If we have two same-type requirements and at least one of the two is an abstract same-type requirement, then the fact that it has the same subject type as the other violates Condition~1 of Definition~\ref{left reduced requirement}. This means the abstract same-type requirement is not left reduced, so in particular it is not reduced. This contradicts our assumption that each requirement is reduced.

If we have two concrete same-type requirements, say \texttt{T == C} and \texttt{T == D} where \texttt{C} and \texttt{D} are concrete types, we know \texttt{T}, \texttt{C} and \texttt{D} all have the same reduced type. We also know that \texttt{C} and \texttt{D} are already reduced, because they appear on the right hand side of reduced concrete same-type requirements. This implies that \texttt{C} and \texttt{D} are exactly equal. Again, it follows that we have duplicate requirements, so either requirement can be deleted, and neither requirement is minimal. This contradicts our assumption that all requirements are minimal.

The only remaining case is that both are superclass requirements. Proving this also leads to a contradiction is left as an exercise for the reader.
\end{proof}
This shows it is not possible to have two layout, superclass or concrete same-type requirements with the same subject type. We can prove an even stronger condition.
\begin{theorem}
The subject type of a concrete same-type requirement cannot equal the subject type of \emph{any} other requirement in a generic signature.
\end{theorem}
\begin{proof}
Suppose our generic signature contains a concrete same-type requirement \texttt{T~==~C} and a conformance requirement \texttt{T:~P}. This means the subject type of the conformance requirement, \texttt{T}, can be reduced to \texttt{C}, violating the condition that all requirements are reduced. The proof for the other requirement kinds is similar.
\end{proof}

TODO: examples
\fi
\section{Source Code Reference}\label{buildinggensigsourceref}
\ifWIP
TODO:
\begin{description}
\item[\texttt{GenericSignatureRequest}] The request evaluator request for looking up the generic signature of a declaration. This either returns the parent declaration's generic signature, or kicks off \texttt{InferredGenericSignatureRequest}.
\item[\texttt{InferredGenericSignatureRequest}] The request evaluator request for building a generic signature from requirements written in source. Takes a parent signature, a generic parameter list, a trailing \texttt{where} clause, and a list of types from which to infer additional requirements. All components are optional, but at least one must be supplied.
\item[\texttt{AbstractGenericSignatureRequest}] The request evaluator request for building a generic signature from a set of ``abstract'' requirements, constructed from whole cloth.
\item[\texttt{buildGenericSignature()}] A utility function that is a convenience wrapper around \texttt{AbstractGenericSignatureRequest}.
\end{description}
\fi
\chapter{Extensions}\label{extensions}
\ifWIP
Extensions are a special kind of declaration that adds new members to a nominal type declaration. This nominal type declaration is called the \emph{extended type}. Extensions do not have names and cannot be referenced; only their members are referenced directly. For this reason, \texttt{ExtensionDecl} is a subclass of \texttt{Decl} and not \texttt{ValueDecl}.

Extensions cannot be nested inside other declarations and always appear at the top level of a source file; an extension of a nested type can be declared by referencing the qualified name of the type, like \texttt{extension Outer.Inner}. Extensions of nominal types can declare nested nominal types. Extensions of protocols cannot declare nested nominal types, for the same reasons that protocols themselves cannot declare nested nominal types.
\begin{example}
The following extension adds a method named \texttt{foo()} to \texttt{Outer.Middle}, and declares a type named \texttt{Outer.Middle.Inner}:
\begin{Verbatim}
struct Outer {
  struct Middle {}
}

extension Outer.Middle {
  func foo() {}

  struct Inner {}
}
\end{Verbatim}
\end{example}
\index{self interface type}
\index{declared interface type}
\index{extended type}
The declared interface type of an extension is the declared interface type of the extended type; the self interface type of an extension is the self interface type of the extended type. The two concepts coincide except when the extended type is a protocol, in which case the declared interface type is the protocol type, whereas the self interface type is the \texttt{Self} generic parameter of the protocol.
\index{extension binding}
\paragraph{Extension binding}
The type checker establishes a two-way association between an extension and its extended type. The extension references the extended type, and the extended type's lookup table is updated to incorporate the extension's members.
This process is known as \emph{extension binding}. Name lookups are expected to find members of extensions, so extension binding runs very early in the compilation process, immediately after source files have been parsed and imports have been resolved.
\index{synthesized declaration}
Extension binding itself must perform a name lookup to find the extended type. Ordinary name lookup can trigger generic signature construction and conformance checking; performing these during extension binding would lead to incorrect behavior if those computations themselves expect name lookup to find members of other extensions that haven't been bound yet. This is resolved by using a special form of name lookup that avoids calls into other parts of the type checker. This more limited form of name lookup will not resolve references to type aliases synthesized by associated type inference, or to generic type aliases with a dependent underlying type.
\begin{example}\label{bad extension 1}
An invalid extension of an inferred type alias:
\begin{Verbatim}
protocol Animal {
  associatedtype FeedType
  func eat(_: FeedType)
}

struct Hay {}

struct Horse: Animal {
  func eat(_: Hay) {}
}

// error: extension of type Horse.FeedType must be declared as an extension of Hay
extension Horse.FeedType {...}
\end{Verbatim}
\end{example}
\begin{example}\label{bad extension 2}
An invalid extension of a type alias with a dependent underlying type:
\begin{Verbatim}
typealias G<T: Sequence> = T.Element

// error: extension of type G<Array<Int>> must be declared as an extension of Int
extension G<Array<Int>> {...}
\end{Verbatim}
\end{example}
A further complication is that resolving the extended type has an ordering dependency if the extended type was itself defined inside an extension. The compiler tries to be agnostic to the order of top-level declarations within a file, or the order of files in a module; however, extension binding violated this property until the problem was fixed in Swift 5.0 \cite{sr631}. The correct approach is to allow name lookup of the extended type to fail without emitting diagnostics, and then to iterate the process until a fixed point is reached.
\begin{algorithm}[Extension binding]
Takes a list of all extensions in the current module as input.
\begin{enumerate}
\item Initialize the worklist with the input.
\item Initialize the flag to false.
\item (Resolve) For each extension in the worklist, attempt to resolve the extended type. If lookup succeeds, bind the extension, remove it from the worklist, and set the flag to true. If lookup fails, the extension remains on the worklist.
\item (Retry) If the flag is true, we managed to bind at least one extension, which shrinks the worklist. Go back to Step~2.
\item (Diagnose) If no forward progress was made, we have reached a fixed point. For each extension remaining in the worklist:
\begin{enumerate}
\item Resolve the extended type using ordinary type resolution.
\item If resolution failed, a diagnostic will have been emitted already. Otherwise, resolution succeeded, meaning the extended type was found by ordinary type resolution but not by the extension binding process.
\item If resolution returned a type alias type \texttt{Foo} desugaring to a nominal type \texttt{Bar}, we are in one of the unsupported cases involving a type alias. Emit the ``extension of type \texttt{Foo} must be declared as an extension of \texttt{Bar}'' diagnostic.
\item Otherwise, resolution returned some other type, perhaps a non-nominal type. Emit the ``non-nominal type cannot be extended'' diagnostic.
\end{enumerate}
\end{enumerate}
\end{algorithm}
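The fixed-point iteration can be sketched with a toy model, in which an extension is represented by the dotted name of its extended type, and binding an extension reveals the member type it declares. All of the names and tables below are hypothetical:
\begin{Verbatim}
var visible: Set<String> = ["Outer"]

// Binding the extension of a type reveals the nested type it declares.
let declaredByExtensionOf = [
    "Outer": "Outer.Middle",
    "Outer.Middle": "Outer.Middle.Inner",
]

var worklist: Set<String> =
    ["Outer", "Outer.Middle", "Outer.Middle.Inner", "DoesNotExist"]

while true {
    let bound = worklist.filter { visible.contains($0) }
    if bound.isEmpty { break }   // no forward progress: fixed point reached
    for name in bound {
        worklist.remove(name)    // "bind" the extension
        if let member = declaredByExtensionOf[name] {
            visible.insert(member)
        }
    }
}
// worklist == ["DoesNotExist"]; diagnose it with ordinary type resolution.
\end{Verbatim}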
This algorithm is quadratic in the worst case, but only unrealistic code examples trigger this pathological behavior. The common case of extensions of top-level types, or of nested types declared directly inside their parent type, is handled in the first pass. Extensions of types declared in other extensions are rare, but do come up in practice.
\begin{example}
The following example requires four iterations if the extensions are processed in the written order:
\begin{Verbatim}
struct Outer {}

extension Outer.Middle.Inner {}

extension Outer.Middle {
  struct Inner {}
}

extension Outer {
  struct Middle {}
}

extension DoesNotExist {}
\end{Verbatim}
The first iteration binds the extension of \texttt{Outer}, which adds \texttt{Middle} to the name lookup table. The second iteration binds the extension of \texttt{Outer.Middle}, which adds \texttt{Inner} to the name lookup table. The third iteration binds the extension of \texttt{Outer.Middle.Inner}. After the third iteration, the extension of \texttt{DoesNotExist} is still on the worklist. The fourth iteration attempts to resolve \texttt{DoesNotExist}, which still fails, so it remains on the worklist. We end up in the diagnostic path, which resolves \texttt{DoesNotExist} yet again, this time via ordinary type resolution, which emits the standard diagnostic.
\end{example}
\paragraph{Generic parameter lists}
An extension does not declare an explicit generic parameter list in source; instead, the specified behavior is that the generic parameters of the extended type can also be referenced from inside the extension. While extensions always appear at the top level of a source file, the extended type might be a nested type, with generic parameters at multiple depths. To deal with this, a generic parameter list can point at an outer parameter list. The generic parameter list of an extension is built by cloning the generic parameter list of the extended type and each outer generic context, linking the clones together with the innermost generic parameter list at the head of the list. The innermost generic parameter list becomes the generic parameter list of the extension. Name lookup and generic signature construction traverse the linked list of outer pointers when searching for generic parameter declarations that are in scope.

TODO: draw a figure

The outer pointer is not used for generic parameter lists of generic contexts other than extensions. For generic contexts that are actually nested in source, the declaration tree provides sufficient structure, which does not need to be duplicated via the outer pointer.
\begin{example}
The extension of \texttt{Outer.Inner} has the generic parameter declarations \texttt{T} and \texttt{U} in scope, with canonical types $\ttgp{0}{0}$ and $\ttgp{1}{0}$ respectively:
\begin{Verbatim}
struct Outer<T> {
  struct Inner<U> {
    ...
  }
}

extension Outer.Inner {
  ...
}
\end{Verbatim}
The extension's generic parameter list stores a single generic parameter declaration for \texttt{U}. This generic parameter list points at an outer generic parameter list, which stores the declaration for \texttt{T}.
\end{example}
\fi
\section{Constrained Extensions}\label{constrained extensions}
\ifWIP
The generic signature of an unconstrained extension is the same as the generic signature of the extended type.
An extension can also impose additional requirements on the extended type's generic parameters; the members of a \emph{constrained extension} are only available on specializations of the extended type that satisfy these requirements. There are three ways to declare a constrained extension:
\begin{enumerate}
\item using a \texttt{where} clause,
\item by extending a generic nominal type with generic arguments,
\item by extending a generic type alias, with some restrictions.
\end{enumerate}
Case~1 is the most general form; Case~2 and Case~3 can be expressed by writing the appropriate requirements in a \texttt{where} clause.
\begin{example}
An extension of \texttt{Set} which constrains the \texttt{Element} type to \texttt{Int} can be written as follows:
\begin{Verbatim}
extension Set where Element == Int {...}
\end{Verbatim}
The generic signature of \texttt{Set} is \texttt{<Element where Element:\ Hashable>}. The generic signature of the above extension is built by adding the additional same-type requirement \texttt{Element == Int}. The original requirement \texttt{Element:\ Hashable} becomes redundant, since \texttt{Int} is known to conform to \texttt{Hashable}, so the extension's generic signature becomes
\begin{Verbatim}
<Element where Element == Int>
\end{Verbatim}
\end{example}
\paragraph{Extending a generic nominal type}
The second case is a shorthand for an extension that fully constrains all generic parameters to concrete types. Instead of writing out a series of same-type requirements, the extended type can be written as a generic nominal type with arguments:
\begin{Verbatim}
extension Set<Int> {...}
\end{Verbatim}
The new syntax was introduced in Swift 5.7 \cite{se0361}.

In the early days of Swift, the generics implementation did not support same-type requirements between generic parameters and concrete types at all; only associated types of generic parameters could be made concrete. This meant that the first of the two extensions below was accepted, but perhaps confusingly to users, not the second:
\begin{Verbatim}
// Worked since the dawn of time, because the requirement
// is actually ``Self.Element == Int'':
extension Sequence where Element == Int {...}

// Rejected before Swift 3:
extension Array where Element == Int {...}
\end{Verbatim}
This restriction was lifted after a non-trivial amount of work in Swift 3; the changes introduced many of the concepts described in this book, such as substitution maps and generic environments.
\paragraph{Extending a generic type alias}
If the extended type is a generic type alias, the generic signature of the extension is built from the requirements of the generic type alias. There are several important restrictions:
\begin{enumerate}
\item the underlying type of the generic type alias must be a nominal type,
\item the generic parameters of the type alias must have the same names and order as the generic parameters of the nominal type,
\item the underlying type must apply the generic arguments to the nominal type in the same order.
\end{enumerate}
Such a type alias is essentially equivalent to the nominal type, except that it may introduce additional requirements via a \texttt{where} clause. The type checker calls this a ``pass-through type alias''. An extension of a pass-through type alias behaves exactly like an extension of the underlying nominal type, except the additional requirements become part of the extension's generic signature.
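Whichever of the three forms is used, the members of a constrained extension are available only when the extra requirements are satisfied. A quick sketch (the \texttt{total()} method is hypothetical):
\begin{Verbatim}
extension Set where Element == Int {
  func total() -> Int { return reduce(0, +) }
}

let ints: Set<Int> = [1, 2, 3]
ints.total()        // OK: Element == Int is satisfied

let strings: Set<String> = ["a", "b"]
// strings.total()  // error: Element == Int is not satisfied
\end{Verbatim}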
\begin{example}
Here is an extension of a type alias satisfying the requirements of a pass-through type alias:
\begin{Verbatim}
typealias CodableDictionary<Key, Value> = Dictionary<Key, Value>
    where Key: Codable, Value: Codable

extension CodableDictionary {...}
\end{Verbatim}
When building the generic signature of \texttt{CodableDictionary}, requirement inference introduces the requirement \texttt{Key:\ Hashable} from the underlying type \texttt{Dictionary<Key, Value>}, because the generic signature of \texttt{Dictionary} has a \texttt{Key:\ Hashable} requirement. So the generic signature of the generic type alias, and therefore of the extension, is
\begin{Verbatim}
<Key, Value where Key: Codable, Key: Hashable, Value: Codable>
\end{Verbatim}
\end{example}
\begin{example}
The following are not pass-through type aliases:
\begin{Verbatim}
typealias MyArray<T> = Array<T>                    // wrong parameter name
typealias SillyDictionary<Value, Key>
    = Dictionary<Key, Value>                       // wrong parameter order
typealias StringMap<Value>
    = Dictionary<String, Value>                    // missing Key parameter
\end{Verbatim}
\end{example}
\fi
\section{Conditional Conformances}\label{conditional conformance}
\ifWIP
TODO:
\begin{itemize}
\item infinite recursion - \texttt{https://github.com/apple/swift/issues/49273}
\end{itemize}
A \emph{conditional conformance} is a normal conformance declared on a constrained extension. If a nominal type conforms to a protocol conditionally, a specialization of the extended type only conforms to the protocol if it satisfies the requirements of the constrained extension. Conditional conformances were introduced in Swift~4.2~\cite{se0143}.

For example, many generic containers can implement equality of containers in terms of the equality operation on the element type. However, if equality comparison is otherwise not an intrinsic requirement of the data structure, it would be undesirable to unconditionally restrict the element type to be \texttt{Equatable} just to allow the generic container type to be used as an \texttt{Equatable} value. Instead, we can declare that the generic container is \texttt{Equatable} if its element type happens to be \texttt{Equatable}, by stating the conformance on a constrained extension. The element type of the container itself is thus not required to be \texttt{Equatable}.
\begin{example}
The standard library declares various conditional conformances on \texttt{Array}:
\begin{Verbatim}
struct Array<Element> {...}

extension Array: Equatable where Element: Equatable {...}
extension Array: Hashable where Element: Hashable {...}
extension Array: Encodable where Element: Encodable {...}
extension Array: Decodable where Element: Decodable {...}
\end{Verbatim}
\end{example}
If the conformance context is a constrained extension, the generic signature of the normal conformance will contain additional requirements not present in the generic signature of the conforming type. These requirements are called \emph{conditional requirements} and are stored in the normal conformance.
\begin{example}
The normal conformance of \texttt{Array} to \texttt{Equatable} looks like this:
\begin{description}
\item[Type] \texttt{Array<\ttgp{0}{0}>}
\item[Generic signature] \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable>}
\item[Type witnesses] None.
\item[Associated conformances] None.
\item[Conditional requirements] \phantom{a}
\begin{itemize}
\item \texttt{\ttgp{0}{0}:\ Equatable}
\end{itemize}
\end{description}
\end{example}
\begin{example}
The normal conformance of \texttt{Array} to \texttt{Hashable} looks like this:
\begin{description}
\item[Type] \texttt{Array<\ttgp{0}{0}>}
\item[Generic signature] \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Hashable>}
\item[Type witnesses] None.
\item[Associated conformances] \phantom{a}

\begin{tabular}{|l|l|}
\hline
Conformance requirement&Conformance\\
\hline
\hline
\texttt{Self:\ Equatable}&Normal conformance \texttt{Array<\ttgp{0}{0}>:\ Equatable}\\
\hline
\end{tabular}
\item[Conditional requirements] \phantom{a}
\begin{itemize}
\item \texttt{\ttgp{0}{0}:\ Hashable}
\end{itemize}
\end{description}
Note that since the \texttt{Hashable} protocol inherits from \texttt{Equatable}, the requirement signature of \texttt{Hashable} has a conformance requirement \texttt{Self:\ Equatable}, meaning that every type conforming to \texttt{Hashable} also conforms to \texttt{Equatable}. As a result, the conformance \texttt{Array<\ttgp{0}{0}>:\ Hashable} stores the conformance \texttt{Array<\ttgp{0}{0}>:\ Equatable} as an associated conformance.
\end{example}
The interesting case is when a specialized conformance has conditional requirements. The specialized conformance's substitution map is applied to each conditional requirement; the requirement can then be checked for satisfiability, answering the question of whether this specialization of the generic type conditionally conforms to the protocol.
\begin{example}\label{arrayintequatable}
The specialized conformance of \texttt{Array<Int>} to \texttt{Equatable} looks like this:
\begin{description}
\item[Type] \texttt{Array<Int>}
\item[Substitution map] \phantom{a}

\SubMapC{
\SubType{Element}{Int}
}{
\SubConf{Int:\ Equatable}
}
\item[Type witnesses] None.
\item[Associated conformances] None.
\item[Conditional requirements] \phantom{a}
\begin{itemize}
\item \texttt{Int:\ Equatable}
\end{itemize}
\end{description}
\end{example}
\paragraph{Checking conditional requirements}
A conditional requirement of a specialized conformance is ``fully substituted'' if it does not contain any type parameters. You can call the \texttt{Requirement::isSatisfied()} method to determine if a fully substituted requirement is satisfied.

TODO:
\begin{itemize}
\item Connect this with earlier notions of ``substituted requirements''
\end{itemize}
\begin{example}
The conditional requirement \texttt{Int:\ Equatable} in the specialized conformance from Example~\ref{arrayintequatable} is fully substituted. This requirement holds, because the standard library declares that \texttt{Int} conforms to \texttt{Equatable}. Therefore, \texttt{Array<Int>} conforms to \texttt{Equatable}.
\end{example}
\begin{example}
Looking up the conformance of \texttt{Array<AnyObject>} to \texttt{Equatable} produces a different specialized conformance. Here, the fully-substituted conditional requirement is \texttt{AnyObject:\ Equatable}. This requirement is not satisfied, since \texttt{AnyObject} does not conform to \texttt{Equatable} (or any other protocol). Therefore, \texttt{Array<AnyObject>} is \emph{not} \texttt{Equatable}.
\end{example}
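The effect of these checks is visible in plain Swift. A short sketch, relying only on the standard library's conditional conformance of \texttt{Array} to \texttt{Equatable}:
\begin{Verbatim}
struct NotEquatable {}

let a = [1, 2, 3]
let b = [1, 2, 3]
print(a == b)  // OK: Array<Int> is Equatable because Int is Equatable

let c = [NotEquatable()]
let d = [NotEquatable()]
// print(c == d)  // error: Array<NotEquatable> is not Equatable
\end{Verbatim}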
When looking up the conformance on a type containing type parameters, the conditional requirements in the resulting specialized conformance might not be fully substituted, and the \texttt{isSatisfied()} method cannot be used. Mapping type parameters to archetypes converts these requirements into fully substituted requirements; see Chapter~\ref{genericenv}.
\begin{example}
The specialized conformance of \texttt{Array<Array<\ttgp{0}{0}>>} to \texttt{Equatable} has a conditional requirement \texttt{Array<\ttgp{0}{0}>:\ Equatable}. This requirement is satisfied by another conditional conformance, with conditional requirement \texttt{\ttgp{0}{0}:\ Equatable}. The generic parameter \texttt{\ttgp{0}{0}} is only understood with respect to some generic signature, which is not directly represented in the specialized conformance.

To see the different behaviors, suppose we map the generic parameter \texttt{\ttgp{0}{0}} into the primary generic environment of two generic signatures before looking up the conformance:
\begin{enumerate}
\item \texttt{<\ttgp{0}{0}>},
\item \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Hashable>}.
\end{enumerate}
This produces two distinct specialized conformances, written in terms of two different archetypes instantiated from the same generic parameter \ttgp{0}{0}. The first archetype does not conform to any protocols, whereas the second one conforms to \texttt{Hashable} and \texttt{Equatable}. In the first case, the substituted conditional requirement is not satisfied. In the second case, the conditional requirement is satisfied.
\end{example}
\paragraph{Requirement desugaring}
Recall Section~\ref{requirement desugaring}. For example, \texttt{Array} conforms to \texttt{Equatable} conditionally, via the following conformance in the standard library:
\begin{Verbatim}
extension Array: Equatable where Element: Equatable {...}
\end{Verbatim}
Therefore, the requirement \texttt{Array<Element>:\ Equatable} will desugar to \texttt{Element:\ Equatable}:
\begin{Verbatim}
func allEqual<Element>(_ elts: Array<Element>) -> Bool
    where Array<Element>: Equatable {...}

func allEqual<Element>(_ elts: Array<Element>) -> Bool
    where Element: Equatable {...}
\end{Verbatim}
\begin{example}
The declaration of the \texttt{Dictionary} type has two generic parameters, \texttt{Key} and \texttt{Value}, and a single requirement \texttt{Key:\ Hashable}. Suppose our current declaration has the type \texttt{Dictionary<Array<Element>, Int>} appearing somewhere, perhaps in the parameter list or a \texttt{where} clause:
\begin{Verbatim}
func calculate<Element>(_: Dictionary<Array<Element>, Int>) -> Int {...}
\end{Verbatim}
The type checker performs the substitution \texttt{Key := Array<Element>} on the requirement \texttt{Key:\ Hashable} to get the substituted requirement \texttt{Array<Element>:\ Hashable}, which desugars to \texttt{Element:\ Hashable} via the conditional conformance of \texttt{Array} to \texttt{Hashable}, as explained above.

The original requirement \texttt{Key:\ Hashable} is written in terms of the generic parameters of \texttt{Dictionary}, which is the referenced declaration. After substitution and desugaring, we obtain the requirement \texttt{Element:\ Hashable}, which is written in terms of the generic parameters of the \emph{current} declaration. Therefore, the requirement \texttt{Element:\ Hashable} is inferred, and can be omitted from the declaration of \texttt{calculate()} by the user.
\end{example}
\begin{example}
Requirement inference doesn't always introduce new requirements. The substituted generic requirement might be vacuous; for example, if the written type is \texttt{Set<Int>}, we get \texttt{Int:\ Hashable} after substitution, which does not involve any type parameters; it is a tautologically true statement that can simply be discarded.
\end{example}
\fi
\section{Source Code Reference}
\ifWIP
TODO:

\apiref{TypeChecker}{namespace}
Namespace for type checker functions.
\begin{itemize}
\item \texttt{conformsToProtocol()} returns the conformance of a type to a protocol, checking conditional requirements.
\item \texttt{isPassthroughTypealias()} determines if a generic type alias satisfies the constraints that allow it to be the extended type of an extension.
\end{itemize}
\index{extension declaration}
\apiref{ExtensionDecl}{class}
Extension declarations. Also a \texttt{DeclContext}.
\begin{itemize}
\item \texttt{getExtendedNominal()} returns the extended nominal type declaration.
\item \texttt{getExtendedType()} returns the written extended type, which might be a type alias type or a generic nominal type.
\item \texttt{getDeclaredInterfaceType()} returns the type of an instance of the extended type declaration.
\item \texttt{getSelfInterfaceType()} returns the type of the \texttt{self} value inside the body of this extension. This differs from the declared interface type for protocol extensions, where the declared interface type is a nominal type but the self interface type is the generic parameter \texttt{Self}.
\end{itemize}

\apiref{NormalProtocolConformance}{class}
A class representing a normal protocol conformance. Inherits from \texttt{ProtocolConformance}.
\begin{itemize}
\item \texttt{getConditionalRequirements()} returns an array of conditional requirements, which is empty if the conformance is unconditional.
\end{itemize}

\apiref{SpecializedProtocolConformance}{class}
A class representing a specialized protocol conformance. Inherits from \texttt{ProtocolConformance}.
\begin{itemize}
\item \texttt{getConditionalRequirements()} returns an array of substituted conditional requirements, which is empty if the conformance is unconditional.
\end{itemize}

\apiref{Requirement}{class}
A generic requirement.
\begin{itemize}
\item \texttt{isSatisfied()} returns true if this fully substituted requirement is satisfied.
\item \texttt{canBeSatisfied()} returns true if this requirement can possibly hold after some substitution.
\end{itemize}
\fi
\chapter{Conformance Paths}\label{conformance paths}
\ifWIP
TODO:
\begin{itemize}
\item Diagrams for each example
\item Example where substitution map conformance lookup produces an abstract conformance
\item Use algorithm environment for lookupConformance and subst algorithms
\item How conformance substitution works with abstract conformance
\end{itemize}
We're now going to see how \texttt{Type::subst()} is implemented when the original interface type contains a \texttt{DependentMemberType}. For the most part, you can skip this chapter and just rely on the implementation to do the right thing; but if you're curious to learn how it all works, you can finally see why substitution maps need to store protocol conformances at all.

Say we have a substitution map for a generic signature. In the general case, a (resolved) \texttt{DependentMemberType} looks like this, where \texttt{Base} is some other type parameter, \texttt{Proto} is a protocol and \texttt{Assoc} is an associated type of \texttt{Proto}:
\begin{quote}
\texttt{Base.[Proto]Assoc}
\end{quote}
What is the substituted type of the above original type with respect to our substitution map? We know that the base type conforms to \texttt{Proto}, either directly via a conformance requirement in the generic signature, or indirectly as a consequence of other requirements. If we could somehow recover the conformance \texttt{Base:\ Proto}, then looking up the type witness for \texttt{Assoc} would produce our final substituted type.
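As a source-level illustration of a dependent member type (the function is hypothetical; the comments describe what substitution does at a call site):
\begin{Verbatim}
func firstElement<T: Sequence>(_ s: T) -> T.Element? {
  // Inside this body, `T.Element' is the dependent member type
  // T.[Sequence]Element, and `T.Iterator' is T.[Sequence]Iterator.
  var iterator = s.makeIterator()
  return iterator.next()
}

// At this call, the substitution map is {T := Array<Int>}, together with
// the conformance Array<Int>: Sequence. Substituting T.[Sequence]Element
// looks up the type witness for Element in that conformance, producing
// Int; the call returns Int?.
firstElement([1, 2, 3])
\end{Verbatim}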
\begin{figure}\captionabove{Substituting a \texttt{DependentMemberType}}
\tikzstyle{io} = [rounded corners, draw=black, text centered]
\tikzstyle{process} = [rectangle, draw=black, text centered]
\tikzstyle{arrow} = [->,>=stealth]
\begin{tikzpicture}[node distance=1cm]
\node (dummy) [] {};
\node (DependentMemberType) [io, rectangle split, rectangle split parts=2, left=of dummy, yshift=3em] {Original type \nodepart{second}\texttt{Base.[Proto]Assoc}};
\node (AssociatedTypeDecl) [io, rectangle split, rectangle split parts=2, right=of dummy] {\texttt{AssociatedTypeDecl}\nodepart{second}\texttt{associatedtype Assoc}};
\node (SubstitutionMap) [io, left=of DependentMemberType, yshift=-4em, xshift=4em] {\texttt{SubstitutionMap}};
\node (BaseType) [io, below=of DependentMemberType] {\texttt{Base}};
\node (ProtocolDecl) [io, rectangle split, rectangle split parts=2, below=of AssociatedTypeDecl, xshift=-4em] {\texttt{\vphantom{p}ProtocolDecl}\nodepart{second}\texttt{protocol Proto}};
\node (dummy2) [right of=BaseType] {};
\node (lookupConformance) [process, below=of dummy2,xshift=-2em,yshift=-3em] {\texttt{lookupConformance()}};
\node (Conformance) [io, rectangle split, rectangle split parts=2, below=of lookupConformance] {\vphantom{p}Conformance\nodepart{second}\texttt{Base:\ Proto}};
\node (dummy3) [right of=Conformance] {};
\node (getTypeWitness) [process, below=of AssociatedTypeDecl, yshift=-10em, xshift=2em] {\texttt{getTypeWitness()}};
\node (SubstType) [io, below=of getTypeWitness] {Substituted type};
\draw [arrow] (DependentMemberType) -- (BaseType);
\draw [arrow] (DependentMemberType) -- (AssociatedTypeDecl);
\draw [arrow] ($(AssociatedTypeDecl.south)-(2em,0)$) -- ($(ProtocolDecl.north)+(2em,0)$);
\draw [arrow] (SubstitutionMap) |- (lookupConformance);
\draw [arrow] ($(ProtocolDecl.south)-(2em,0)$) |- (lookupConformance);
\draw [arrow] (BaseType) -- (lookupConformance);
\draw [arrow] (lookupConformance) -- (Conformance);
\draw [arrow] ($(AssociatedTypeDecl.south)+(2em,0)$) -- (getTypeWitness.north);
\draw [arrow] (Conformance.east) -- (getTypeWitness.west);
\draw [arrow] (getTypeWitness) -- (SubstType);
\end{tikzpicture}
\end{figure}
The \texttt{SubstitutionMap::lookupConformance()} method recovers this conformance. Unlike the global conformance lookup operations from Chapter~\ref{conformances}, this takes a type parameter, not a substituted type. There are two cases to consider:
\begin{enumerate}
\item If the generic signature directly states the \texttt{Base:\ Proto} conformance requirement, the conformance will also appear directly in the substitution map.
\item If the conformance of \texttt{Base} to \texttt{Proto} is a consequence of other requirements written in the generic signature, the desired conformance is not going to be found immediately in the substitution map. However, it will appear as the associated conformance of some \emph{other} conformance. We can start from one of the conformances stored in the substitution map, and ``drill down'' a path of associated conformances to find the one we are looking for.
\end{enumerate}
Calling \texttt{GenericSignature::getConformanceAccessPath()} will compute this special kind of path. This method takes two inputs: a type parameter, and a protocol declaration.
\begin{definition}
A conformance path is a \emph{proof} that a given type parameter \texttt{T} of a fixed generic signature conforms to a protocol \texttt{P}.
Given a substitution map for this generic signature, following the conformance path ``digs out'' the conformance \texttt{T:\ P} by starting from one of the conformances stored directly in the substitution map. A path has one or more steps:
\begin{enumerate}
\item The first step is always one of the conformance requirements written in the original generic signature. If the generic signature contains a direct requirement \texttt{T:\ P}, the path ends here, and we have a proof that \texttt{T} conforms to \texttt{P}. Otherwise, the first step is some conformance requirement \texttt{T1:\ P1}, for some other type parameter \texttt{T1} and protocol \texttt{P1}.
\item Each subsequent step is an associated conformance requirement appearing in the requirement signature of the previous step's protocol.
\end{enumerate}
\end{definition}
Let's look at a few examples.
\begin{example}
Take the generic signature \texttt{<T where T:\ Sequence>}, and the following substitution map:
\begin{quote}
\SubMapC{
\SubType{T}{Array<Int>}
}{
\SubConf{Array<Int>:\ Sequence}
}
\end{quote}
Consider the interface type \texttt{T.[Sequence]Element}. The first step is to recover the conformance \texttt{T:\ Sequence} from our substitution map. The conformance requirement appears directly in the generic signature, so the conformance path is trivial:
\begin{quote}
\texttt{(T:\ Sequence)}
\end{quote}
Evaluating this conformance path outputs the stored conformance corresponding to the conformance requirement \texttt{T:\ Sequence}, which is \texttt{Array<Int>:\ Sequence}. Finally, looking up the type witness for \texttt{Element} in this conformance outputs the substituted type \texttt{Int}.
\end{example}
\begin{example}
Now, let's make things slightly more interesting. Take the generic signature \texttt{<T where T:\ Collection>}, and the following substitution map:
\begin{quote}
\SubMapC{
\SubType{T}{Array<Int>}
}{
\SubConf{Array<Int>:\ Collection}
}
\end{quote}
Again, consider the interface type \texttt{T.[Sequence]Element}. As before, the first step is to recover the conformance \texttt{T:\ Sequence} from our substitution map. This time though, the generic signature does not directly state the \texttt{T:\ Sequence} conformance requirement, so we have a non-trivial conformance path:
\begin{quote}
\texttt{(T:\ Collection)(Self:\ Sequence)}
\end{quote}
What does this mean? We know that \texttt{Collection} inherits from \texttt{Sequence}, hence the requirement signature of \texttt{Collection} contains the requirement \texttt{Self:\ Sequence}. This in turn means that the \texttt{Array<Int>:\ Collection} conformance, which is directly stored in our substitution map, stores the \texttt{Array<Int>:\ Sequence} conformance as an associated conformance. We compute the substituted type in three steps:
\begin{enumerate}
\item Search the substitution map to find the stored conformance corresponding to the conformance requirement \texttt{T:\ Collection}.
\item Look up the associated conformance for \texttt{Self:\ Sequence} in the conformance from Step~1.
\item Look up the type witness for \texttt{Element} in the conformance from Step~2.
\end{enumerate}
By following the above steps, \texttt{Type::subst()} will again produce \texttt{Int} as the substituted type.
\end{example}
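The shape of this lookup can be sketched in a few lines of Swift. The model below is drastically simplified and entirely hypothetical: conformances are identified by strings, and the two dictionaries stand in for the compiler's conformance data structures:
\begin{Verbatim}
struct Conformance {
    var type: String                        // e.g. "Array<Int>"
    var proto: String                       // e.g. "Collection"
    var associated: [String: Conformance]   // keyed by associated
                                            // conformance requirement
    var typeWitnesses: [String: String]     // keyed by associated type name
}

// Follow a conformance path: the first step selects a conformance stored
// directly in the substitution map, and each later step drills down
// through an associated conformance.
func evaluate(path: [String],
              storedConformances: [String: Conformance]) -> Conformance? {
    guard let first = path.first,
          var conformance = storedConformances[first] else { return nil }
    for step in path.dropFirst() {
        guard let next = conformance.associated[step] else { return nil }
        conformance = next
    }
    return conformance
}

// For the previous example:
//   evaluate(path: ["T: Collection", "Self: Sequence"], storedConformances: m)
// returns the Array<Int>: Sequence conformance, whose type witness for
// "Element" is "Int".
\end{Verbatim}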
\begin{example}
Here is an example where the conformance path involves an associated conformance for a subject type other than \texttt{Self}. Take the same generic signature and substitution map as before, but consider the original interface type
\begin{quote}
\texttt{T.[Collection]SubSequence.[Sequence]Iterator}.
\end{quote}
The conformance path is:
\begin{quote}
\texttt{(T:\ Collection)(Self.SubSequence:\ Collection)(Self:\ Sequence)}
\end{quote}
We compute the substituted type in four steps:
\begin{enumerate}
\item Search the substitution map to find the stored conformance corresponding to the conformance requirement \texttt{T:\ Collection}.
\item Look up the associated conformance for \texttt{Self.SubSequence:\ Collection} in the conformance from Step~1.
\item Look up the associated conformance for \texttt{Self:\ Sequence} in the conformance from Step~2.
\item Look up the type witness for \texttt{Iterator} in the conformance from Step~3.
\end{enumerate}
By following the above steps, \texttt{Type::subst()} will produce \texttt{ArraySlice<Int>.Iterator} as the substituted type.
\end{example}
\section{Computing Conformance Paths}\label{computing conformance paths}
The problem of computing a conformance path for a type parameter and protocol is best understood by first considering the \emph{inverse} problem. Given a conformance path, we can recover a canonical type parameter and protocol, such that the type parameter conforms to the protocol via the original conformance path. The protocol is always just the protocol appearing on the right hand side of the final step in the conformance path. The canonical type parameter can be recovered by the following algorithm.
\begin{algorithm}[Mapping a conformance path to a type parameter]\label{invertconformancepath}
The input is a conformance path in a fixed generic signature:
\[C := (T_1: P_1)\ldots (T_n: P_n)\]
Recall that the first step, $(T_1: P_1)$, is a conformance requirement in this generic signature. Subsequent steps are associated conformance requirements, such that $(T_i: P_i)$ is an associated conformance requirement in the requirement signature of protocol $P_{i-1}$. Every associated conformance requirement subject type $T_i$ for $i>1$ is a type parameter rooted in the \texttt{Self} generic parameter of protocol $P_{i-1}$.
\begin{enumerate}
\item (Base case.) If $n=1$, that is, if the conformance path is trivial and only consists of a single step, we're done; the type parameter is $T_1$.
\item (Recursive case.) Otherwise, recursively compute the type parameter for the conformance path $(T_1: P_1)\ldots (T_{n-1}: P_{n-1})$, obtained by dropping the last step.
\item Take the subject type $T_n$ of the final associated conformance requirement, and replace the \texttt{Self} generic parameter with the type parameter returned by the recursive call in Step~2. This produces the type parameter for the conformance path $(T_1: P_1)\ldots (T_n: P_n)$.
\item As the final step, canonicalize this type parameter. Now, it will be equal to the original canonical type parameter which was used to look up the conformance path.
\end{enumerate}
\end{algorithm}
\begin{example}
Consider the conformance paths we saw earlier in this chapter:
\begin{verbatim}
(T: Sequence)
(T: Collection)(Self: Sequence)
(T: Collection)(Self.SubSequence: Collection)(Self: Sequence)
\end{verbatim}
The corresponding canonical type parameters are:
\begin{verbatim}
T
T
T.SubSequence
\end{verbatim}
TODO: This sucks! Need better examples.
\end{example}
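Algorithm~\ref{invertconformancepath} is simple enough to sketch directly. In the hypothetical model below, a step is a pair of strings, and the final canonicalization is omitted:
\begin{Verbatim}
import Foundation

typealias Step = (subject: String, proto: String)

func typeParameter(of path: [Step]) -> String {
    // Base case: a trivial path names its subject type directly.
    guard path.count > 1 else { return path[0].subject }

    // Recursive case: compute the type parameter of the path without its
    // final step, then substitute it for Self in the final subject type.
    let prefix = typeParameter(of: Array(path.dropLast()))
    return path[path.count - 1].subject
        .replacingOccurrences(of: "Self", with: prefix)
}

// ("T", "Collection"), ("Self.SubSequence", "Collection"), ("Self", "Sequence")
// evaluates to "T.SubSequence", matching the example above.
\end{Verbatim}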
Then, conformance paths of length $n$ can be constructed recursively:
\begin{enumerate}
\item For a conformance path of length $n-1$, we look up the requirement signature of the protocol named by the $(n-1)$th step. This requirement signature will have $k\ge 0$ conformance requirements.
\item Appending every possible associated conformance requirement to our path will generate $k$ different paths of length $n$.
\item Repeating this process for each conformance path of length $n-1$ will generate all paths of length $n$.
\end{enumerate}
If none of the protocols referenced from the generic signature have recursive conformance requirements, the process will eventually terminate; after some maximum length $n$ is reached, each conformance path will end in a protocol without associated conformance requirements, from which longer paths cannot be constructed. On the other hand, the process will go on forever if at least one of the protocols has a recursive conformance requirement. Even in the non-terminating case though, the key fact is that we generate all paths of length $n-1$ before we move on to length $n$.
\begin{example}\label{collectionconformancepaths}
Consider the generic signature \texttt{<T where T:\ Collection>}. There is a single conformance path of length 1:
\begin{Verbatim}
(T: Collection)
\end{Verbatim}
To generate conformance paths of length 2, we append each associated conformance requirement of the \texttt{Collection} protocol:
\begin{Verbatim}
(T: Collection)(Self: Sequence)
(T: Collection)(Self.Index: Comparable)
(T: Collection)(Self.Indices: Collection)
(T: Collection)(Self.SubSequence: Collection)
\end{Verbatim}
This process can be repeated to generate conformance paths of length 3. Note that the second path of length 2 does not yield any new conformance paths of length 3, because the \texttt{Comparable} protocol has an empty requirement signature.
\begin{Verbatim}
(T: Collection)(Self: Sequence)(Self.Iterator: IteratorProtocol)
(T: Collection)(Self.Indices: Collection)(Self: Sequence)
(T: Collection)(Self.Indices: Collection)(Self.Index: Comparable)
(T: Collection)(Self.Indices: Collection)(Self.Indices: Collection)
(T: Collection)(Self.Indices: Collection)(Self.SubSequence: Collection)
(T: Collection)(Self.SubSequence: Collection)(Self: Sequence)
(T: Collection)(Self.SubSequence: Collection)(Self.Index: Comparable)
(T: Collection)(Self.SubSequence: Collection)(Self.Indices: Collection)
(T: Collection)(Self.SubSequence: Collection)(Self.SubSequence: Collection)
\end{Verbatim}
We can continue this process and generate all conformance paths of length 4. This time around, the paths on lines~1, 3 and 7 above do not generate any new conformance paths, because the \texttt{IteratorProtocol} and \texttt{Comparable} protocols do not have any associated conformance requirements. The other conformance paths generate new conformance paths of length 4. In fact this generic signature generates infinitely many conformance paths of arbitrary length, because the \texttt{Collection} protocol has two recursive conformance requirements:
\begin{verbatim}
Self.Indices: Collection
Self.SubSequence: Collection
\end{verbatim}
\end{example}
We can compute a conformance path by enumerating the (possibly infinite) sequence of conformance paths, computing the canonical type parameter/protocol pair for each one. As soon as we encounter a conformance path for the type parameter/protocol pair we are looking for, we stop the enumeration.
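To make the breadth-first construction concrete, here is a small executable model of the enumeration. Everything in this sketch (\texttt{Proto}, \texttt{Step}, \texttt{enumeratePaths()}, and the string spelling of subject types) is invented for illustration, and does not correspond to actual compiler source code:
\begin{Verbatim}
// A protocol is modeled by its name and its associated
// conformance requirements, each pairing a subject type
// (spelled as a string) with a protocol.
final class Proto {
  let name: String
  var assocConfs: [(subject: String, proto: Proto)] = []
  init(_ name: String) { self.name = name }
}

typealias Step = (subject: String, proto: Proto)
typealias Path = [Step]

// Generate all conformance paths up to maxLength, shortest
// first. The paths of length 1 are the conformance requirements
// of the generic signature, passed in as `roots`.
func enumeratePaths(roots: [Step], maxLength: Int) -> [Path] {
  precondition(maxLength >= 1)
  var current: [Path] = roots.map { [$0] }
  var result = current
  for _ in 1 ..< maxLength {
    var next: [Path] = []
    for path in current {
      // Extend the path with each associated conformance
      // requirement of its final step's protocol.
      for step in path.last!.proto.assocConfs {
        next.append(path + [step])
      }
    }
    result += next
    current = next
  }
  return result
}
\end{Verbatim}
Seeding the model with \texttt{(T:\ Collection)} and the associated conformance requirements of \texttt{Collection} and \texttt{Sequence} reproduces the paths of Example~\ref{collectionconformancepaths}. The sketch does nothing clever about recursive conformance requirements; it simply stops once \texttt{maxLength} is reached, whereas the real computation, described next, runs until it finds the path it is looking for.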
This process is guaranteed to terminate as long as the type parameter actually conforms to the protocol, because the conformance path enumeration is exhaustive. The conformance path that will be found is also guaranteed to be the \emph{shortest} such conformance path with respect to the enumeration order. This algorithm seems extremely inefficient, but we can improve upon it with two simple modifications: \begin{enumerate} \item We can implement the conformance path enumeration as a coroutine, and memoize the results in a side table. We first check if the side table contains an entry for the given type/protocol pair. If not, we resume the enumeration coroutine, which stops when it finds a conformance path for the given type parameter/protocol pair, memoizing any other conformance path it encounters along the way. \item If enumeration encounters a conformance path with a canonical type parameter/protocol pair that already exists in the side table, we do not have to consider any new conformance paths generated from this conformance path, because we will have already found shorter equivalents for those conformance paths as well. \end{enumerate} Case~2 comes up in Example~\ref{collectionconformancepaths}. The \texttt{Collection} protocol defines a same-type requirement \texttt{Self.SubSequence.SubSequence == Self}. When our algorithm enumerates all conformance paths of length 3, one of the paths is the following: \begin{Verbatim} (T: Collection)(Self.SubSequence: Collection)(Self.SubSequence: Collection) \end{Verbatim} The corresponding type parameter \texttt{T.SubSequence.SubSequence} is canonically equal to \texttt{T}, and we've already seen a path for \texttt{T:\ Collection}; it was our initial path of length 1. This means we don't have to explore any paths with the above prefix when enumerating paths of length 4. The sequence of conformance paths is still infinite after this optimization, but now the only paths of arbitrary length arise from recursive applications of the \texttt{Self.Indices:\ Collection} conformance requirement. Putting everything together, we can write down the final algorithm for computing conformance paths. \begin{algorithm}[Conformance path computation] The algorithm operates on a fixed generic signature, and takes a type parameter $T$ and a protocol $P$ as input. The output is a conformance access path for $(T: P)$. This algorithm maintains the following persistent state for each generic signature: \begin{itemize} \item A dictionary where the keys are type parameter/protocol pairs and the values are conformance paths. Initially empty. \item A non-negative integer $N$, representing the longest conformance path stored in the dictionary. Initially 0. \item A growable array $B$ storing all conformance paths of length $N$. Initially empty. \item A growable array $B'$ storing all conformance paths of length $N+1$. Initially empty. \end{itemize} We can assume that $T$ is a canonical type parameter. If $T$ is not canonical, replace $T$ with its canonical form before proceeding. \begin{enumerate} \item (Check) If the dictionary contains an entry for $(T: P)$, return the conformance path associated with this entry and stop. \item (Initialize) If $N=0$, initialize $B$ with the conformance requirements of the generic signature, and set $N:=1$. \item (Record) For each conformance path $c\in B$, compute the canonical type parameter and protocol $(T_c: P_c)$ of $c$ using Algorithm~\ref{invertconformancepath}. 
If the dictionary does not contain an entry with key $(T_c: P_c)$,
\begin{enumerate}
\item add an entry with key $(T_c: P_c)$ and value $c$ to the dictionary,
\item for every associated conformance requirement $(T_c': P_c')$ of protocol $P_c$, append $(T_c': P_c')$ to $c$ to get a new conformance access path $c'$, and insert $c'$ at the end of $B'$.
\end{enumerate}
\item (Refill) Swap the contents of $B$ with $B'$, and clear $B'$. Increment $N$.
\item (Retry) Go back to Step~1.
\end{enumerate}
\end{algorithm}
TODO:
\begin{itemize}
\item Intuition for abstract conformances -- generic function calls another generic function and fulfills a generic requirement with one of its own conformances
\item Abstract conformances, looking up conformance in substitution map
\item The abstract conformance representational problem. This means the type was a type parameter (or later in Chapter~\ref{genericenv}, an archetype).
\end{itemize}
\fi
\section{Recursive Conformances}\label{recursive conformances}
\ifWIP
The intuitive notion of a ``protocol with an associated type that conforms to itself'' leads us to consider the \emph{protocol dependency graph}. We can imagine a directed graph where the vertices are protocols, and edges $\texttt{P}\rightarrow\texttt{Q}$ connect protocol \texttt{P} to protocol \texttt{Q} if \texttt{P} declares a conformance requirement whose right hand side is \texttt{Q}. An edge is a special case of a path through a directed graph, so in this case we know there is a path from \texttt{P} to \texttt{Q}. If there is also a path back from \texttt{Q} to \texttt{P}, possibly involving multiple edges, our conformance requirement is a \emph{recursive conformance requirement}. We saw in Section~\ref{computing conformance paths} how recursive conformance requirements generate infinitely many conformance paths.

Starting from a single root concrete conformance, we can evaluate each of these conformance paths by recursively following paths of associated conformance requirements. The set of all unique conformances reachable by following paths of associated conformances is possibly infinite. Of course, the graph must be represented in memory with finite data structures, so this possibly-infinite structure must admit a finite description. In particular, there is only ever a finite set of normal conformances, so as we traverse the farther reaches of the infinite graph, we will at most encounter those same normal conformances over and over, sometimes dressed up in more and more elaborate substitution maps.

If a conformance requirement is recursive, the associated conformance must model this recursion somehow. The three interesting scenarios are when an associated conformance is abstract, normal, or specialized:
\begin{enumerate}
\item If the associated conformance is abstract, the conformance requirement is fulfilled by a conformance requirement in the concrete type's generic signature. Abstract conformances do not store any further information, so the recursion bottoms out here.
\item If the associated conformance is normal, then this normal conformance will in turn have at least one other recursive associated conformance. This recursion will either eventually hit Case~1, or it will refer back to a normal conformance already seen.
\item A specialized conformance is a normal conformance together with a substitution map, so we're back in Case~2. The substitution map might also store conformances recursively, so a normal conformance might point back to itself indirectly through the substitution map of an associated conformance.
\end{enumerate}
The associated conformances of a normal conformance are stored in an array whose entries correspond to conformance requirements in the protocol's requirement signature. Circular normal conformances are constructed in two stages. For a normal conformance written in source, the conformance comes into existence when the conformance lookup table for the type declaration is being built, making the conformance available to lookup very early. The second stage of initialization looks up each associated conformance of the existing conformance and stores it into the array. The subject type of a conformance requirement in a protocol is a type parameter in the protocol's generic signature; substituting the protocol \texttt{Self} type with the concrete type gives the right type to perform the lookup on. By the time associated conformances are being built, all normal conformances already exist, but some have not yet been fully initialized. In this manner, cycles are introduced.
\begin{listing}\captionabove{A recursive conformance requirement and some recursive associated conformance witnesses}\label{recursive conformance basic}
\begin{Verbatim}
protocol P {
  associatedtype A: P
}

struct X: P {
  typealias A = Y
}

struct Y: P {
  typealias A = Y
}

struct G<A: P>: P {}
\end{Verbatim}
\end{listing}
\begin{example}
Listing~\ref{recursive conformance basic} shows a protocol \texttt{P} with a recursive conformance requirement \texttt{Self.A:\ P}. This requirement generates an infinite sequence of conformance paths in the generic signature \texttt{<Self where Self:\ P>}:
\begin{verbatim}
(Self: P)
(Self: P)(Self.A: P)
(Self: P)(Self.A: P)(Self.A: P)
...
\end{verbatim}
The three types \texttt{X}, \texttt{Y} and \texttt{G} witness this conformance requirement in different ways. In both normal conformances \texttt{X:\ P} and \texttt{Y:\ P}, the associated conformance of \texttt{Self.A:\ P} is the second normal conformance \texttt{Y:\ P}. This means that if you start with either normal conformance and evaluate longer and longer conformance paths composed of \texttt{(Self.A:\ P)}, you will end up at \texttt{Y:\ P} after one step; nothing points back at \texttt{X:\ P}.

TODO: figure

In \texttt{G}, the associated type \texttt{A} is witnessed by the generic parameter named \texttt{A}. To avoid confusion, from now on the generic parameter will be written as its canonical type, \ttgp{0}{0}. This generic parameter fulfills the conformance requirement in the protocol, which directly follows from \texttt{G}'s generic signature, \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ P>}. This makes the associated conformance an abstract conformance. The normal conformance \texttt{G<\ttgp{0}{0}>:\ P} does not contain any circular data structure at all.

The behavior of \texttt{G<\ttgp{0}{0}>:\ P} with respect to our infinite sequence of conformance paths can be understood by starting from a specialized type, like \texttt{G<X>}. The specialized conformance \texttt{G<X>:\ P} stores the normal conformance \texttt{G<\ttgp{0}{0}>:\ P} and the context substitution map for \texttt{G<X>}. Since the generic signature of \texttt{G} has a requirement \texttt{\ttgp{0}{0}:\ P}, the context substitution map stores the conformance \texttt{X:\ P}.

TODO: figure

\begin{enumerate}
\item To compute the associated conformance \texttt{Self.A:\ P} of the specialized conformance \texttt{G<X>:\ P}, we start from the normal conformance.
\item Remember that in the normal conformance, this associated conformance is abstract, and the type of \texttt{Self.A} is \ttgp{0}{0}.
\item To compute the substituted associated conformance from an abstract conformance, we first compute a conformance path in the generic signature of \texttt{G}, then apply it to our substitution map.
\item The original type is \ttgp{0}{0} and the conformance path is the one-step path \texttt{(\ttgp{0}{0}:\ P)}, which recovers the \texttt{X:\ P} normal conformance stored in the substitution map.
\end{enumerate}
With only a few details changed, a similar explanation shows that the associated conformance \texttt{Self.A:\ P} of \texttt{G<G<X>>:\ P} is \texttt{G<X>:\ P}. If you start from \texttt{G<G<X>>:\ P}, you get to \texttt{G<X>:\ P}, then \texttt{X:\ P}, then finally \texttt{Y:\ P}, which points at itself.

The normal conformances of \texttt{X} and \texttt{Y} encode very simple graphs, and while \texttt{G} makes things more interesting with the appearance of specialized conformances, even something like \texttt{G<G<G<G<X>>>>:\ P} generates finitely many unique reachable conformances. Each associated conformance step makes for a simpler conforming type with a smaller substitution map, until the process reaches \texttt{X:\ P} above.

TODO: figure

\begin{listing}\captionabove{A normal conformance with a specialized associated conformance}\label{normal associated specialized}
\begin{Verbatim}
protocol P {
  associatedtype A: P
}

struct Y: P {
  typealias A = Y
}

struct G<T: P>: P {
  typealias A = G<G<T>>
}
\end{Verbatim}
\end{listing}
\end{example}
\begin{example}
Listing~\ref{normal associated specialized} demonstrates that it is possible to encode a conformance with infinitely many reachable conformances. The associated conformance \texttt{Self.A:\ P} of the normal conformance \texttt{G<\ttgp{0}{0}>:\ P} is the specialized conformance \texttt{G<G<\ttgp{0}{0}>>:\ P}. This specialized conformance points back at the original normal conformance, creating a cycle. But there is another cycle. The specialized conformance stores the context substitution map of \texttt{G<G<\ttgp{0}{0}>>}, which replaces \ttgp{0}{0} with \texttt{G<\ttgp{0}{0}>}. The generic signature has a conformance requirement \texttt{\ttgp{0}{0}:\ P}, so this substitution map stores a reference to the normal conformance \texttt{G<\ttgp{0}{0}>:\ P}, too.

TODO: figure

If the associated conformance of \texttt{G<\ttgp{0}{0}>:\ P} is \texttt{G<G<\ttgp{0}{0}>>:\ P}, then the associated conformance of \texttt{G<G<\ttgp{0}{0}>>:\ P} is \texttt{G<G<G<\ttgp{0}{0}>>>:\ P}. Following this path produces an infinite sequence of specialized conformances that all reference the same normal conformance, but equip it with bigger and bigger substitution maps. Representationally, each such specialized conformance of \texttt{G} points back at the specialized conformance with one \emph{fewer} nested application of \texttt{G} via its substitution map.

TODO: figure

While the substitution process can construct arbitrarily large specialized conformances, ultimately only a finite number of conformance paths, all of which have finite length, will be evaluated during the compilation of the program.
\end{example}
\begin{listing}\captionabove{Non-terminating associated conformance}\label{non-terminating associated conformance}
\begin{Verbatim}
protocol P {
  associatedtype A: P
}

struct X: P {
  typealias A = G<X>
}

struct G<T: P>: P {
  typealias A = T.A.A
}

func f() -> G<X>.A {
  // what does this return?
}
\end{Verbatim}
\end{listing}
\begin{example}
The previous example was wild but remained relatively well-behaved; a finite conformance path could always be resolved to an associated conformance, after an arbitrary but ultimately finite amount of work.
Unfortunately, Listing~\ref{non-terminating associated conformance} proves this is not always the case. The setup is another slight variation on the earlier two examples. The associated conformance of the normal conformance \texttt{X:\ P} is the specialized conformance \texttt{G<X>:\ P}. The normal conformance \texttt{G<\ttgp{0}{0}>:\ P} has a type parameter as the type witness for \texttt{Self.A}, so the associated conformance for \texttt{Self.A} is abstract.

TODO: figure

Now, consider what happens if we try to compute the associated conformance of the \texttt{G<X>:\ P} specialized conformance.
\begin{enumerate}
\item The underlying normal conformance is \texttt{G<\ttgp{0}{0}>:\ P}. The specialized conformance's substitution map stores the replacement $\ttgp{0}{0}:=\texttt{X}$ and the conformance \texttt{X:\ P}.
\item The underlying associated conformance is abstract.
\item When the underlying associated conformance is abstract, we need to compute a conformance path and apply it to the substitution map.
\item Substituting $\texttt{Self}:=\texttt{\ttgp{0}{0}}$ in the subject type \texttt{Self.A} and projecting the type witness yields \texttt{\ttgp{0}{0}.A.A}, a type parameter in the generic signature of \texttt{G}.
\item The conformance path for \texttt{\ttgp{0}{0}.A.A:\ P} in the generic signature of \texttt{G} has three steps:
\begin{quote}
\texttt{(\ttgp{0}{0}:\ P)(Self.A:\ P)(Self.A:\ P)}
\end{quote}
\item The first step of the path loads the root conformance \texttt{X:\ P} from the substitution map.
\item The second step loads the \texttt{Self.A:\ P} associated conformance for \texttt{X:\ P}, which is \texttt{G<X>:\ P}.
\item The third step needs to compute the \texttt{Self.A:\ P} associated conformance for \texttt{G<X>:\ P}, but this is precisely where we started in Step~1.
\end{enumerate}
\end{example}
At the time of writing, this problem remains unsolved, and the compiler terminates with a stack overflow during the resolution of the type \texttt{G<X>.A} in the above example. This raises two open questions:
\begin{enumerate}
\item Are all diverging examples sufficiently simple that we could detect and diagnose them during conformance checking without imposing new restrictions on previously-valid programs?
\item If type substitution can actually encode arbitrary computation and not just infinite loops, the first question reduces to the halting problem and becomes impossible. In that scenario, what changes to the generics model would be needed to make type substitution sound and decidable while accepting as many existing programs as possible?
\end{enumerate}
\section{Runtime Representation}
TODO:
\begin{itemize}
\item An example
\item Concrete conformance references witness table directly vs abstract conformance evaluates conformance path
\end{itemize}
Conformance paths are also used in code generation for unspecialized generic functions. Unlike languages such as C++ and Rust, which implement generics by specialization, Swift can compile generic functions separately. In the general case, there is a single unspecialized entry point that receives all type arguments at runtime. The calling convention has the same structure as a substitution map. Just like substitution maps represent a concrete specialization of a generic signature at compile time, the calling convention represents a concrete specialization at runtime.
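Before getting into the details of the calling convention, a hand-written model may help. The sketch below is purely illustrative: the names \texttt{AnimalWitnessTable} and \texttt{feed()} are invented, and a real witness table is a data structure emitted by the compiler, not a Swift struct. Type metadata, which carries information such as size and alignment, is not modeled here at all.
\begin{Verbatim}
// A hand-rolled stand-in for a witness table: one function
// pointer per requirement of a hypothetical one-method protocol.
struct AnimalWitnessTable<T> {
  let eat: (T) -> ()
}

// A stand-in for an unspecialized generic function. The compiler
// passes type metadata for T and a witness table as hidden
// parameters; here, the witness table is an explicit parameter.
func feed<T>(_ animal: T, _ witnessTable: AnimalWitnessTable<T>) {
  witnessTable.eat(animal)
}

struct Horse {
  func eat() { print("yum") }
}

// The caller fulfills the "conformance requirement" by providing
// a table of function pointers for the concrete type.
feed(Horse(), AnimalWitnessTable(eat: { $0.eat() }))
\end{Verbatim}
With this picture in mind, we can describe what is actually passed.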
Two kinds of hidden parameters are passed in addition to the formal parameters declared by the user:
\begin{itemize}
\item For every generic parameter, a pointer to \emph{type metadata} for the concrete type argument.
\item For every conformance requirement in the function's generic signature, a pointer to a \emph{witness table} for the conformance.
\end{itemize}
Type metadata stores the concrete type's size and alignment, together with a table of \emph{value witness functions} for common operations that can be performed on all values, such as copying a value, moving a value, and destroying a value. Similarly, a witness table is the runtime representation of a concrete conformance; the layout of a witness table is derived from the requirement signature of the protocol:
\begin{itemize}
\item For every associated type, an accessor function that returns type metadata for the concrete type witness.
\item For every conformance requirement, an accessor function that returns the witness table for the associated conformance.
\item For every value requirement, a function pointer to the implementation of the witness.
\end{itemize}
The generated code for a generic function uses type metadata and witness tables to abstractly manipulate values of generic type and invoke protocol requirements.

Now consider the problem of looking up the witness table for a type parameter and protocol whose conformance requirement does not directly appear in the function's generic signature. Substitution maps follow a conformance path to perform this lookup at compile time. Similarly, code generation for an unspecialized generic function uses a conformance path to perform the same lookup at run time. A conformance path translates to code as follows:
\begin{itemize}
\item The first step is a conformance requirement in the function's generic signature. Conformance requirements correspond to witness table parameters, which can be referenced directly.
\item Each subsequent step is an associated conformance requirement in a protocol's requirement signature. This corresponds to loading the associated conformance from a witness table. (In reality, this is not a simple load of a pointer. Recursive conformances and library evolution complicate matters, because witness tables for associated conformances are instantiated lazily, and offsets of witness table entries cannot be hard coded in client code if the protocol comes from a different module. So the load of the associated conformance is accomplished by calling an entry point in the Swift runtime, which takes the witness table and an associated conformance descriptor.)
\end{itemize}
There is a lot more to say about the implementation of runtime type information and separately-compiled generic functions. For now, the best resource is a recording of a talk from the LLVM Developers' Meeting \cite{llvmtalk}.
\section{Source Code Reference}
TODO:
\fi
\chapter{Opaque Return Types}\label{opaqueresult}
\ifWIP
TODO:
\begin{itemize}
\item Say a few words about how the underlying type is inferred
\item Joe's thing where an opaque return type of a generic method cannot fulfill an associated type requirement
\end{itemize}
An opaque return type hides a fixed concrete type behind a generic interface. Opaque return types are declared by writing a function, property or subscript whose return type contains the \texttt{some} keyword:
\begin{Verbatim}
func foo() -> some P {...}

var bar: some P {...}

subscript() -> some P {...}
\end{Verbatim}
Opaque return types were first introduced in Swift 5.1 \cite{se0244}.
The feature was generalized to allow occurrences of \texttt{some} structurally nested in other types, as well as multiple occurrences of \texttt{some}, in Swift 5.7 \cite{se0328}. The type that follows \texttt{some} is a constraint type, as defined in Section~\ref{constraints}. The underlying type is inferred from \texttt{return} statements in the function body. There must be at least one return statement; if there is more than one, all must return the same concrete type.

At the implementation level, a declaration has an associated \emph{opaque return type declaration} if \texttt{some} appears at least once in the declaration's return type. An opaque return type declaration stores three pieces of information:
\begin{enumerate}
\item A generic signature describing the \emph{interface} of the opaque return type, called the \emph{opaque interface generic signature}.
\item A generic environment instantiated from the opaque interface generic signature, called the \emph{opaque generic environment}, mapping each generic parameter to its opaque archetype. (Opaque generic environments also store a substitution map, described in the next section.)
\item A substitution map for this generic signature, called the \emph{underlying type substitution map}, mapping each generic parameter to its underlying type. The underlying type substitution map is the \emph{implementation} of the opaque type declaration, and callers from other modules cannot depend on its contents. (This is different from the substitution map stored inside the generic environment.)
\end{enumerate}
The opaque interface generic signature is built from the generic signature of the original function, property or subscript (the owner declaration). Each occurrence of the \texttt{some} keyword introduces a new generic parameter with a single requirement relating the generic parameter to the constraint type. The opaque generic environment describes the type of an opaque return type. When computing the interface type of the owner declaration, type resolution replaces each occurrence of the \texttt{some} keyword in the return type with the corresponding opaque archetype.

The underlying type substitution map is computed by analyzing the concrete types returned by one or more \texttt{return} statements appearing in the owner declaration's body. The underlying type substitution map is only needed when emitting the owner declaration, not when referencing it. It is not computed if the body of the declaration appears in a secondary source file in batch mode (because the body is not parsed or type checked), or if the declaration was parsed from a \texttt{swiftinterface} file (because declaration bodies are not printed in module interfaces).
\begin{example}
Consider the following declaration:
\begin{Verbatim}
struct Farm {
  var horses: [Horse] = []

  var hungryHorses: some Collection<Horse> {
    return horses.lazy.filter(\.isHungry)
  }
}
\end{Verbatim}
The \texttt{hungryHorses} property has an associated opaque return type declaration, because \texttt{some} appears in its return type. The property appears in a non-generic context, so there is no parent generic signature. The return type has a single occurrence of \texttt{some}, so the opaque interface generic signature has a single generic parameter \texttt{\ttgp{0}{0}}. The constraint type is \texttt{Collection<Horse>}, so the sugared generic requirement \texttt{\ttgp{0}{0}:\ Collection<Horse>} desugars to a pair of requirements.
The opaque interface generic signature is
\begin{quote}
\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Collection, \ttgp{0}{0}.[Sequence]Element == Horse>}.
\end{quote}
The return statement's underlying type is \texttt{LazyFilterSequence<[Horse]>}, so the substitution map is
\begin{quote}
\SubMapC{
\SubType{\ttgp{0}{0}}{LazyFilterSequence<[Horse]>}
}{
\SubConf{LazyFilterSequence<[Horse]>:\ Collection}
}
\end{quote}
The opaque generic environment has a single opaque archetype \archetype{\ttgp{0}{0}} corresponding to the opaque interface generic signature's single generic parameter \ttgp{0}{0}. The interface type of the declaration \texttt{hungryHorses} is the opaque archetype \archetype{\ttgp{0}{0}}.
\end{example}
\begin{example}
Consider the following declaration:
\begin{Verbatim}
func makePair<T: Equatable>(first: T, second: T)
    -> (some Collection<T>, some Collection<T>) {
  return ([first], [second])
}
\end{Verbatim}
The generic signature of the \texttt{makePair} function is
\begin{quote}
\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable>}
\end{quote}
The opaque interface generic signature is constructed from this, with an additional generic parameter and requirements added for each of the two occurrences of \texttt{some} in the return type:
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{1}{0}, \ttgp{1}{1} where \ttgp{0}{0}:\ Equatable, \ttgp{1}{0}:\ Collection, \ttgp{1}{1}:\ Collection, \ttgp{1}{0}.[Collection]Element == \ttgp{1}{1}.[Collection]Element, \ttgp{1}{1}.[Collection]Element == \ttgp{0}{0}>}
\end{quote}
The underlying type substitution map sends the outer generic parameter \ttgp{0}{0} to itself, and the two inner generic parameters \ttgp{1}{0}, \ttgp{1}{1} both to \texttt{Array<\ttgp{0}{0}>}.
\begin{quote}
\SubMapC{
\SubType{\ttgp{0}{0}}{\ttgp{0}{0}}\\
\SubType{\ttgp{1}{0}}{Array<\ttgp{0}{0}>}\\
\SubType{\ttgp{1}{1}}{Array<\ttgp{0}{0}>}
}{
\SubConf{\ttgp{0}{0}:\ Equatable}
}
\end{quote}
The opaque generic environment has two opaque archetypes \archetype{\ttgp{1}{0}} and \archetype{\ttgp{1}{1}}. The interface type of the function is
\begin{quote}
\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable> (\ttgp{0}{0}, \ttgp{0}{0}) -> (\archetype{\ttgp{1}{0}}, \archetype{\ttgp{1}{1}})}
\end{quote}
\end{example}
\fi
\section{Opaque Archetypes}\label{opaquearchetype}
\ifWIP
TODO:
\begin{itemize}
\item opaque archetypes: global extent and same underlying concrete type
\item primary archetypes: lexically scoped and bound by caller
\item every type written with primary archetypes has an equivalent interface type representation. opaque archetypes don't correspond to any type parameter in our generic context's signature, etc.
\end{itemize}
Unlike primary archetypes, opaque archetypes appear inside interface types. Also unlike primary archetypes, opaque archetypes do not represent type parameters to be substituted by the caller. They behave differently from primary archetypes in two important respects:
\begin{itemize}
\item The \texttt{TypeBase::hasArchetype()} predicate does not detect their presence, since this predicate is asserted to be false for interface types of declarations. To check if a type contains opaque archetypes, use \texttt{TypeBase::hasOpaqueArchetype()}.
\item The \texttt{Type::subst()} method does not replace opaque archetypes by default. For situations where opaque archetypes need to be replaced, \texttt{subst()} takes an optional set of flags. The \texttt{SubstFlags::SubstituteOpaqueArchetypes} flag can be passed in to enable replacement of opaque archetypes.
Usually the lower level two-callback form of \texttt{subst()} is used with this flag, instead of the variant taking a substitution map. To differentiate between the two behaviors, let's call them ``opaque archetype replacement'' and ``opaque archetype substitution,'' respectively.
\end{itemize}
Opaque archetype substitution is the default and common case, but there is less to say about opaque archetype replacement, so let's discuss it first.
\paragraph{Opaque archetype replacement}
A notable use of the first behavior is an optimization performed during SIL lowering. When a use of an opaque return type appears in the same compilation unit as the definition (the same source file in batch mode, or the same module in whole-module mode), the opaque archetype can be safely replaced with its underlying type. This replacement is performed after type checking; the abstraction boundary between the opaque archetype's interface and implementation still exists as far as the type checker is concerned, but SIL optimizations can generate more efficient code with knowledge of the underlying type. SIL type lowering implements this with appropriately-placed calls to \texttt{Type::subst()} with an instance of the \texttt{ReplaceOpaqueTypesWithUnderlyingTypes} functor as the type replacement callback, and the \texttt{SubstFlags::SubstituteOpaqueArchetypes} flag set.
\paragraph{Opaque archetype substitution}
Opaque archetypes are parameterized by the generic signature of the owner declaration. In the general case, the underlying type of an opaque archetype can depend on the generic parameters of the owner declaration. For this reason, each substitution map of the owner declaration's generic signature must produce a different opaque archetype. A new opaque generic environment is instantiated for each combination of an opaque return type declaration and substitution map; the substitution map is stored in the opaque generic environment:
\[\left(\,\ttbox{OpaqueTypeDecl}\times \ttbox{SubstitutionMap}\,\right) \rightarrow \mathboxed{Opaque \texttt{GenericEnvironment}}\]
\begin{algorithm}[Applying a substitution map to an opaque archetype]\label{opaquearchetypesubst}
As input, takes an opaque archetype $T$ and a substitution map $S$. As output, produces a new type (which is not necessarily an opaque archetype).
\begin{enumerate}
\item Let $G$ be the opaque generic environment of $T$.
\item Compose the original substitution map of $G$ with $S$ to produce the substituted substitution map $S'$.
\item Look up the opaque generic environment for the same opaque return type declaration as $T$ and the substituted substitution map $S'$; call it $G'$.
\item Map the interface type of $T$ into $G'$ to produce the result, $T'$.
\end{enumerate}
\end{algorithm}
TODO: figure with one generic signature, two generic environments, two archetypes. Arrow from generic signatures to generic environments, arrow from generic environment to archetype labeled ``map type into context''. Arrow from one archetype to another labeled ``substitution''.
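Algorithm~\ref{opaquearchetypesubst} can be modeled in a few lines of code. In the sketch below, substitution maps are dictionaries from generic parameter names to type names, and an opaque generic environment is identified by its declaration together with its substitution map; all of the names here (\texttt{OpaqueDecl}, \texttt{OpaqueEnv}, \texttt{compose()}, \texttt{substOpaqueArchetype()}) are invented for illustration:
\begin{Verbatim}
// Stand-ins for the compiler's types, invented for this sketch.
struct OpaqueDecl: Hashable { let name: String }

typealias SubMap = [String: String]  // parameter name -> type name

// Hashable, so that environments can be uniqued in a cache keyed
// by declaration and substitution map, as displayed above.
struct OpaqueEnv: Hashable {
  let decl: OpaqueDecl
  let subs: SubMap
}

// Apply `outer` to each replacement type stored in `inner`. (Real
// composition also recurses into structural types; replacement
// types here are atomic names.)
func compose(_ inner: SubMap, _ outer: SubMap) -> SubMap {
  return inner.mapValues { outer[$0] ?? $0 }
}

// An opaque archetype is modeled as an interface type plus its
// opaque generic environment.
func substOpaqueArchetype(interfaceType: String,
                          env: OpaqueEnv,
                          applying s: SubMap) -> (String, OpaqueEnv) {
  // Step 2: compose the environment's substitution map with S.
  let composed = compose(env.subs, s)
  // Step 3: the environment for the same declaration and the
  // composed substitution map.
  let newEnv = OpaqueEnv(decl: env.decl, subs: composed)
  // Step 4: map the interface type into the new environment.
  return (interfaceType, newEnv)
}
\end{Verbatim}
The essential point is that the composed substitution map becomes part of the identity of the resulting archetype, which is exactly what the following example demonstrates.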
\begin{example}
Consider this definition:
\begin{Verbatim}
func underlyingType<T: Equatable>(_ t: T) -> some Equatable {
  return 3
}
\end{Verbatim}
The original declaration's generic signature has a single generic parameter, and the return type has a single occurrence of \texttt{some}, so the opaque interface generic signature has two generic parameters, both constrained to \texttt{Equatable}:
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{1}{0} where \ttgp{0}{0}:\ Equatable, \ttgp{1}{0}:\ Equatable>}
\end{quote}
The interface type of \texttt{underlyingType()} is a generic function type with an opaque archetype as the return type:
\begin{quote}
\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable> (\ttgp{0}{0}) -> \$\ttgp{1}{0}}
\end{quote}
Consider the following three calls:
\begin{Verbatim}
let x = underlyingType(1)
let y = underlyingType(2)
let z = underlyingType("hello")
\end{Verbatim}
The types of \texttt{x}, \texttt{y} and \texttt{z} are constructed by applying substitution maps to the opaque archetype \texttt{\$\ttgp{1}{0}}. For \texttt{x} and \texttt{y}, the substitution map is the following:
\begin{quote}
\SubMapC{
\SubType{\ttgp{0}{0}}{Int}
}{
\SubConf{Int:\ Equatable}
}
\end{quote}
For \texttt{z}, the substitution map is different:
\begin{quote}
\SubMapC{
\SubType{\ttgp{0}{0}}{String}
}{
\SubConf{String:\ Equatable}
}
\end{quote}
Per Algorithm~\ref{opaquearchetypesubst}, two new opaque generic environments are constructed from the opaque return type declaration of \texttt{underlyingType()} with each of the above two substitution maps. The substituted opaque archetypes are constructed by mapping the interface type \texttt{\ttgp{1}{0}} into each of the two opaque generic environments.

Indeed, even though the generic parameter \texttt{T} and the value \texttt{t} are completely unused in the body of the \texttt{underlyingType()} function, each call of \texttt{underlyingType()} with a different specialization produces a different type. This can be observed by noting the behavior of the \texttt{Equatable} protocol's \texttt{==} operator; it expects both operands to have the same type:
\begin{Verbatim}
let x = underlyingType(1)
let y = underlyingType(2)
print(x == y) // okay

let z = underlyingType("hello")
print(x == z) // type check error
\end{Verbatim}
The expression \texttt{x == y} type checks successfully, because \texttt{x} and \texttt{y} have the same type, an opaque archetype instantiated from the declaration of \texttt{underlyingType()} with the substitution \texttt{T := Int}. On the other hand, the expression \texttt{x == z} fails to type check, because \texttt{x} and \texttt{z} have different types; both originate from \texttt{underlyingType()}, but with different substitutions:
\begin{itemize}
\item the type of \texttt{x} was instantiated with \texttt{T := Int},
\item the type of \texttt{z} was instantiated with \texttt{T := String}.
\end{itemize}
\end{example}
\begin{example}
The above behavior might seem silly, since the underlying type of \texttt{underlyingType()}'s opaque return type is always \texttt{Int}, irrespective of the generic parameter \texttt{T} supplied by the caller.
However, since opaque return types introduce an abstraction boundary, it is in fact a source-compatible and binary-compatible change to redefine \texttt{underlyingType()} as follows:
\begin{Verbatim}
func underlyingType<T: Equatable>(_ t: T) -> some Equatable {
  return t
}
\end{Verbatim}
Now, the underlying type is \texttt{T}; it would certainly not be valid to mix up the results of calling \texttt{underlyingType()} with an \texttt{Int} and with a \texttt{String}.
\end{example}
The \texttt{GenericEnvironment::forOpaqueType()} method creates an opaque generic environment for a given substitution map, should you have occasion to do this yourself outside of the type substitution machinery.

The opaque generic environment's substitution map plays a role beyond its use as a uniquing key for creating new opaque archetypes; it is also applied to the ``outer'' generic parameters of the opaque return type's interface signature when they are mapped into context. This is important when a same-type requirement equates an associated type of an opaque return type with a generic parameter of the owner declaration; the substituted opaque archetype will behave correctly when the associated type is projected.
\begin{example}
In this example, the specialization of the original declaration is ``exposed'' via a same-type requirement on the opaque return type.
\begin{Verbatim}
func sequenceOfOne<Element>(_ elt: Element) -> some Sequence<Element> {
  return [elt]
}

let result = sequenceOfOne(3)
var iterator = result.makeIterator()
let value: Int = iterator.next()!
\end{Verbatim}
Let's walk through the formalities first. The opaque interface generic signature:
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{1}{0} where \ttgp{1}{0}:\ Sequence, \ttgp{1}{0}.Element == \ttgp{0}{0}>}
\end{quote}
The interface type of \texttt{sequenceOfOne()} has an opaque archetype in return position:
\begin{quote}
\texttt{<\ttgp{0}{0}> (\ttgp{0}{0}) -> \archetype{\ttgp{1}{0}}}
\end{quote}
The type of \texttt{result} is the substituted opaque archetype with the substitution map \texttt{Element := Int}, which is the substitution map of the call to \texttt{sequenceOfOne()}. For lack of a better notation, call this archetype \archetype{\ttgp{1}{0}}.

The type of \texttt{iterator} is calculated by applying a substitution replacing the protocol \texttt{Self} type with the type of \texttt{result} to the return type of the \texttt{makeIterator()} requirement of the \texttt{Sequence} protocol. The type of \texttt{result} is the substituted opaque archetype we're calling \archetype{\ttgp{1}{0}} above, so the type of \texttt{iterator} is the substituted opaque archetype \archetype{\ttgp{1}{0}.Iterator} from the same substituted opaque generic environment as \archetype{\ttgp{1}{0}}.

What about \texttt{value}? The \texttt{next()} requirement of the \texttt{IteratorProtocol} protocol returns the \texttt{Self.Element} associated type of \texttt{IteratorProtocol}. We're substituting \texttt{Self} here with the type of \texttt{iterator}, which is \texttt{\ttgp{1}{0}.Iterator}. This means the type of \texttt{value} can be computed by mapping the type parameter \texttt{\ttgp{1}{0}.Iterator.Element} into the substituted opaque generic environment.

The type parameter \texttt{\ttgp{1}{0}.Iterator.Element} is equivalent to \texttt{\ttgp{0}{0}} in the opaque interface generic signature. So mapping \texttt{\ttgp{1}{0}.Iterator.Element} into our substituted opaque generic environment applies the substitution map to the interface type \texttt{\ttgp{0}{0}}. This is just \texttt{Int}.
So the type of \texttt{value} is \texttt{Int}!
\end{example}
\fi
\section{Referencing Opaque Archetypes}\label{reference opaque archetype}
\ifWIP
Opaque return types are different from other type declarations in that the \texttt{some P} syntax serves to both declare an opaque return type, and immediately reference the declared type. It is, however, possible to reference an opaque return type of an existing declaration from a different context. The trick is to use associated type inference to synthesize a type alias whose underlying type is the opaque return type, and then reference this type alias. This can be useful when writing tests that exercise opaque return types in compiler code paths that might not expect them.
\begin{example}
The normal conformance \texttt{ConcreteP:\ P} in Listing~\ref{reference opaque return type} shows how an opaque archetype can witness an associated type requirement. The method \texttt{ConcreteP.f()} witnesses the protocol requirement \texttt{P.f()}. The return type of \texttt{ConcreteP.f()} is a tuple type of two opaque archetypes, and the type witnesses for the \texttt{X} and \texttt{Y} associated types are inferred to be the first and second of these opaque archetypes, respectively. Associated type inference synthesizes two type aliases, \texttt{ConcreteP.X} and \texttt{ConcreteP.Y}, which are then referenced further down in the program:
\begin{enumerate}
\item The global variable \texttt{mince} has an explicit type \texttt{(ConcreteP.X,~ConcreteP.Y)}.
\item The function \texttt{pie()} declares a same-type requirement whose right hand side is the type alias \texttt{ConcreteP.X}.
\end{enumerate}
\begin{listing}\captionabove{Referencing an opaque return type via associated type inference}\label{reference opaque return type}
\begin{Verbatim}
public protocol P {
  associatedtype X: Q
  associatedtype Y: Q

  func f() -> (X, Y)
}

public protocol Q {}

public struct ConcreteP: P {
  public func f() -> (some Q, some Q) {
    return (FirstQ(), SecondQ())
  }
}

public struct FirstQ: Q {}
public struct SecondQ: Q {}

public let mince: (ConcreteP.X, ConcreteP.Y) = ConcreteP().f()

public func pie<S: Sequence>(_: S) where S.Element == ConcreteP.X {}
\end{Verbatim}
\end{listing}
\end{example}
\index{synthesized declaration}
The above trick allows referencing opaque return types, albeit indirectly. Is there a way to write down the underlying type of the type aliases \texttt{ConcreteP.X} and \texttt{ConcreteP.Y}? The answer is yes, but only in module interface files and textual SIL, not source code. Module interface files explicitly spell out all type aliases synthesized by associated type inference, avoiding the need to perform associated type inference when the interface file is compiled in another compilation job. Textual SIL similarly needs to spell out the type of the value produced by each SIL instruction.
\index{symbol mangling}
\index{mangled name}
A direct reference to an opaque return type is expressed in the grammar as a type attribute encoding the mangled name of the owner declaration together with an index:
\begin{quote}
\texttt{@\_opaqueReturnTypeOf("\underline{mangled name}", \underline{index}) \underline{identifier}}
\end{quote}
The mangled name unambiguously identifies the owner declaration. The index identifies a specific opaque archetype among several when the owner declaration's return type contains multiple occurrences of \texttt{some}.
The identifier is ignored; in the Swift language grammar, a type attribute must apply to some underlying type representation, so by convention module interface printing outputs ``\texttt{\_\_}'' as the underlying type representation.
\begin{example}
Listing~\ref{reference opaque return type from interface} shows the generated module interface for Listing~\ref{reference opaque return type}, with some line breaks inserted for readability.
\begin{listing}\captionabove{References to opaque return types in a module interface}\label{reference opaque return type from interface}
\begin{Verbatim}
public protocol P {
  associatedtype X : mince.Q
  associatedtype Y : mince.Q

  func f() -> (Self.X, Self.Y)
}

public protocol Q {
}

public struct ConcreteP : mince.P {
  public func f() -> (some mince.Q, some mince.Q)

  public typealias X =
      @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __
  public typealias Y =
      @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 1) __
}

public struct FirstQ : mince.Q {
}

public struct SecondQ : mince.Q {
}

public let mince:
    (@_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __,
     @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 1) __)

public func pie<S>(_: S) where S : Swift.Sequence,
    S.Element == @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __
\end{Verbatim}
\end{listing}
\end{example}
A direct reference to a substituted opaque archetype is written with a generic argument list following the identifier. The generic arguments correspond to the flattened list of generic parameters in the generic signature of the opaque archetype's owner declaration.
\begin{example}
In Listing~\ref{substituted opaque archetype reference}, the conformance is declared on the \texttt{Derived} class, but the type witness for \texttt{X} is an opaque archetype from a method of \texttt{Outer<T>.Inner<U>}. The superclass type of \texttt{Derived} is \texttt{Outer<Int>.Inner<String>}, so a substitution map is applied to the opaque archetype:
\begin{quote}
\SubMap{
\SubType{T}{Int}\\
\SubType{U}{String}
}
\end{quote}
In the module interface file, this prints as the generic argument list \texttt{<Swift.Int, Swift.String>}, as shown in Listing~\ref{substituted opaque archetype reference interface}.
\end{example}
\begin{listing}\captionabove{Source code with a substituted opaque archetype as a type witness}\label{substituted opaque archetype reference}
\begin{Verbatim}
public protocol P {
  associatedtype X: Q

  func f() -> X
}

public protocol Q {}

public struct ConcreteQ: Q {}

public class Outer<T> {
  public class Inner<U> {
    public func f() -> some Q {
      return ConcreteQ()
    }
  }
}

public class Derived: Outer<Int>.Inner<String>, P {}
\end{Verbatim}
\end{listing}
\begin{listing}\captionabove{Module interface with a substituted opaque archetype as a type witness}\label{substituted opaque archetype reference interface}
\begin{Verbatim}
public protocol P {
  associatedtype X : mince.Q

  func f() -> Self.X
}

public protocol Q {
}

public struct ConcreteQ : mince.Q {
}

public class Outer<T> {
  public class Inner<U> {
    public func f() -> some mince.Q
  }
}

public class Derived : mince.Outer<Swift.Int>.Inner<Swift.String>, mince.P {
  public typealias X =
      @_opaqueReturnTypeOf("$s5mince5OuterC5InnerC1fQryF", 0)
          __<Swift.Int, Swift.String>
}
\end{Verbatim}
\end{listing}
\section{Runtime Representation}
At runtime, an instance of an opaque archetype must be manipulated abstractly, much like a value of generic parameter type. This mechanism allows the underlying type of an opaque return type to change without breaking callers in other modules.
Recall that an opaque type declaration consists of an opaque interface generic signature, and an underlying type substitution map for this generic signature. The opaque interface generic signature is the \emph{interface} of the opaque type declaration. The underlying type substitution map is the \emph{implementation} of the opaque type declaration.

For each of the opaque return type's generic parameters and conformance requirements, the compiler emits an accessor function. Each accessor function returns the corresponding concrete type metadata or witness table from the underlying type substitution map. Opaque archetypes are also parameterized by the owner declaration's generic signature. The generic parameters and conformance requirements of the owner declaration become the input parameters of these accessor functions.

Note the symmetry here between a function's ``input'' generic parameters and conformance requirements, which become input parameters, and the opaque type declaration's ``output'' generic parameters and conformance requirements, which become calls to accessor functions. The caller provides a substitution map for the ``input'' parameters by passing in concrete type metadata and witness tables. The opaque type declaration provides a substitution map for the ``output'' parameters by emitting an accessor function to return the concrete type metadata and witness tables.

TODO: figure

\begin{example}
The following generic function declares an opaque return type:
\begin{Verbatim}
func uniqueElements<E: Hashable>(_ elts: [E]) -> some Sequence {...}
\end{Verbatim}
The calling convention for \texttt{uniqueElements()} receives the type metadata for \texttt{E} together with a witness table for \texttt{E:\ Hashable} as lowered arguments. The return value is an instance of an opaque archetype, and is returned indirectly. In order to allocate a buffer of the correct size to hold the return value prior to making the call, and to manipulate the return value after the call, the caller invokes the opaque type metadata accessor for \texttt{uniqueElements()}. The metadata accessor also takes the type metadata for \texttt{E} together with a witness table for \texttt{E:\ Hashable}, since the underlying type is parameterized by the generic signature of \texttt{uniqueElements()}. Finally, the witness table for the conformance of the underlying type to \texttt{Sequence} is obtained by calling the opaque type witness table accessor for \texttt{uniqueElements()}, which again takes the type metadata for \texttt{E} together with a witness table for \texttt{E:\ Hashable}.
\end{example}
\section{Source Code Reference}
TODO:
\begin{description}
\item[\texttt{TypeBase}] The base class of the Swift type hierarchy.
\begin{itemize}
\item \texttt{hasOpaqueArchetype()} Returns true if the type contains an opaque archetype.
\end{itemize}
\item[\texttt{OpaqueTypeArchetypeType}] The class of opaque archetypes.
\begin{itemize}
\item \texttt{getOpaqueDecl()} Returns the opaque type declaration that owns this archetype.
\item \texttt{getSubstitutions()} Returns substitutions applied to this archetype's generic environment. Initially this is an identity substitution map.
\end{itemize}
\item[\texttt{OpaqueTypeDecl}] An opaque type declaration.
\begin{itemize}
\item \texttt{getNamingDecl()} Returns the original declaration having this opaque return type.
\item \texttt{getOpaqueInterfaceGenericSignature()} Returns the generic signature describing the opaque return types and their requirements.
\item \texttt{getUniqueUnderlyingTypeSubstitutions()} Returns the substitution map describing the underlying types of the opaque archetypes. Will return \texttt{None} if the underlying types have not been computed yet (or if they will never be computed, because the original declaration's body is not available).
\end{itemize}
\item[\texttt{GenericEnvironment}] A mapping from type parameters to archetypes with respect to a generic signature.
\begin{itemize}
\item \texttt{forOpaqueType()} Returns the unique opaque generic environment for an opaque return type declaration and substitution map.
\end{itemize}
\end{description}
\fi
\chapter{Existential Types}\label{existentialtypes}
\ifWIP
As every Swift developer knows, protocols serve a dual purpose in the language: as generic constraints, and as the types of values. The latter feature, formally known as existential types, is the topic of this chapter. An existential type can be thought of as a container for values that satisfy certain requirements. Existential types were borrowed from Objective-C, and have been part of the Swift language since the beginning, in the form of protocol types and protocol compositions.

This feature has an interesting history. The protocols that could be used as types were initially restricted to those without associated types or requirements with \texttt{Self} in non-covariant position (the latter rules out \texttt{Equatable}, for example). This meant that the implementation of existential types was at first rather disjoint from generics. As existential types gained the ability to state more complex constraints over time, the two sides of protocols converged. Protocol compositions were originally written as \texttt{protocol<P,~Q>} for a value of a type conforming to both protocols \texttt{P} and \texttt{Q}. The modern syntax for protocol compositions \texttt{P~\&~Q} was introduced in Swift 3 \cite{se0095}. Protocol compositions with superclass terms were introduced in Swift 4 \cite{se0156}. The spelling \texttt{any P} of an existential type, to distinguish it from \texttt{P} the constraint type, was introduced in Swift 5.6 \cite{se0335}. This was followed in Swift 5.7 by allowing all protocols to be used as existential types \cite{se0309}, introducing implicit opening of existential types \cite{se0352}, and adding constrained existential types \cite{se0353}.

An existential type is written with the \texttt{any} keyword followed by a constraint type, which is a concept previously defined in Section~\ref{constraints}. For aesthetic reasons, the \texttt{any} keyword can be omitted if the constraint type is \texttt{Any} or \texttt{AnyObject}, since \texttt{any~Any} or \texttt{any~AnyObject} looks funny. For backwards compatibility, \texttt{any} can also be omitted if the protocols appearing in the constraint type do not have any associated types or requirements with \texttt{Self} in non-covariant position.
\paragraph{Type representation}
Existential types are instances of \texttt{ExistentialType}, which wraps a constraint type. Even in the cases where \texttt{any} can be omitted, type resolution will wrap the constraint type in \texttt{ExistentialType} when resolving a type in a context where the type of a value is expected. If the constraint type is a protocol composition with a superclass term, or a parameterized protocol type, arbitrary types can appear as structural components of the constraint type. This means that the constraint type of an existential type is subject to substitution by \texttt{Type::subst()}.
For example, the interface types of the properties \texttt{foo} and \texttt{bar} below are existential types containing type parameters:
\begin{Verbatim}
struct S<T> {
  var foo: any Sequence<T>
  var bar: any Equatable & C<T>
}

class C<T> {}
\end{Verbatim}
Existential metatypes, written \texttt{any (P).Type} for some constraint type \texttt{P}, are containers for storing a concrete metatype whose instance type satisfies some requirements. An existential metatype is represented by an instance of \texttt{ExistentialMetatypeType}, which wraps a constraint type similarly to \texttt{ExistentialType}. The metatype of the existential value itself, \texttt{(any P).Type}, is represented as a \texttt{MetatypeType} with an instance type that is an \texttt{ExistentialType}.

The special \texttt{Any} type can store an arbitrary Swift value. This ``absence of constraints'' is represented as an existential type with an empty protocol composition as the constraint type. The \texttt{ASTContext::getAnyExistentialType()} method returns this type. The \texttt{AnyObject} type, which can store an arbitrary reference-counted pointer, is an existential type with a special protocol composition storing a layout constraint as the constraint type. The \texttt{ASTContext::getAnyObjectType()} method returns this type. The \texttt{AnyClass} type in the standard library is a type alias for the existential metatype of \texttt{AnyObject}:
\begin{Verbatim}
typealias AnyClass = AnyObject.Type
\end{Verbatim}
\fi
\section{Opened Existentials}\label{open existential archetypes}
\ifWIP
The \emph{opened existential signature} is a generic signature whose substitutions describe the possible concrete types that can be stored inside an existential type. The opened existential signature takes one of two forms, depending on whether the constraint type contains type parameters or not:
\begin{enumerate}
\item If the constraint type does not contain type parameters, the opened existential signature is a generic signature built from a single generic parameter \texttt{\ttgp{0}{0}} constrained to the constraint type. Note that if the constraint type contains archetypes, they behave essentially like concrete types when they appear inside the opened existential signature. The generic parameter \texttt{\ttgp{0}{0}} is called the \emph{interface type} of the existential.
\item If the constraint type contains type parameters from some parent generic signature, the opened existential signature is built by adding a single generic parameter to the parent generic signature. The new parameter has a depth one higher than the depth of the last generic parameter of the parent generic signature. In this case, the last generic parameter of the opened existential signature is the interface type of the existential. The first case is in fact a special case of the second, if you consider the parent generic signature to be empty.
\end{enumerate}
The \texttt{ASTContext::getOpenedArchetypeSignature()} method takes an existential type and an optional parent generic signature as arguments, and returns the opened existential signature. This is a relatively cheap operation used throughout the compiler; the results are cached.
\begin{example}\label{existentialsigexample}
Here are some examples of constraint types that do not contain type parameters, together with their opened existential signatures.
\begin{enumerate} \item The existential type \texttt{any Equatable} has this existential signature: \begin{quote} \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable>} \end{quote} You may recall this is also the generic signature of the \emph{declaration} of the \texttt{Equatable} protocol. This is true of all existential types of the form \texttt{any P} for a protocol \texttt{P}. \item The existential type \texttt{any Equatable \& Sequence} has this existential signature: \begin{quote} \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable, \ttgp{0}{0}:\ Sequence>} \end{quote} \item Suppose there is a generic class \texttt{SomeClass} with a single unconstrained generic parameter. The existential type \texttt{any Equatable \& SomeClass<Int>} has this existential signature: \begin{quote} \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ SomeClass<Int>, \ttgp{0}{0}:\ Equatable>} \end{quote} \item The existential type \texttt{any Sequence<Int>} has this existential signature: \begin{quote} \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Sequence, \ttgp{0}{0}.[Sequence]Element == Int>} \end{quote} \end{enumerate} \end{example} \begin{example} Consider this example: \begin{Verbatim}
func foo<T, U>(x: any Equatable & SomeClass<T>,
               y: any Sequence<U>) {
  let xx = x
  let yy = y
}

class SomeClass<T> {}
\end{Verbatim} The interface type of \texttt{foo()} involves existential types containing type parameters: \begin{quote} \texttt{<\ttgp{0}{0}, \ttgp{0}{1}> (any Equatable \& SomeClass<\ttgp{0}{0}>, any Sequence<\ttgp{0}{1}>) -> ()} \end{quote} The existential type \texttt{any Equatable \& SomeClass<\ttgp{0}{0}>} has this existential signature: \begin{quote} \texttt{<\ttgp{0}{0}, \ttgp{0}{1}, \ttgp{1}{0} where \ttgp{1}{0}:\ SomeClass<\ttgp{0}{0}>>} \end{quote} The existential type \texttt{any Sequence<\ttgp{0}{1}>} has this existential signature: \begin{quote} \texttt{<\ttgp{0}{0}, \ttgp{0}{1}, \ttgp{1}{0} where \ttgp{0}{1} == \ttgp{1}{0}.[Sequence]Element, \ttgp{1}{0}:\ Sequence>} \end{quote} In both signatures, the interface type of the existential is \texttt{\ttgp{1}{0}}. \end{example} Recall from Chapter~\ref{genericenv} that there are three kinds of generic environments. We've seen primary generic environments, which are associated with generic declarations. We also saw opaque generic environments, which are instantiated from an opaque return type declaration and substitution map, in Section~\ref{opaquearchetype}. Now, it's time to introduce the third kind, the opened generic environment. An opened generic environment is created from an opened existential signature of the first kind (with no parent generic signature). The archetypes of an opened generic environment are \emph{opened archetypes}. When the expression type checker encounters a call expression where an argument of existential type is passed to a parameter of generic parameter type, the existential value is \emph{opened}, projecting the payload value and assigning it a new opened archetype from a fresh opened generic environment. The call expression is rewritten: the entire call is wrapped in an \texttt{OpenExistentialExpr}, which stores two sub-expressions. The first sub-expression is the original call argument, which evaluates to the value of existential type. The payload value and opened archetype are scoped to the second sub-expression, which consumes the payload value. The call argument is replaced with an \texttt{OpaqueValueExpr}, which has the opened archetype type. The opened archetype also becomes the replacement type for the generic parameter in the call's substitution map.
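To make this concrete, here is a hypothetical snippet where implicit opening kicks in; the call to \texttt{feed()} type checks because the opened archetype of \texttt{animal} becomes the replacement type for \texttt{A}: \begin{Verbatim}
protocol Animal {
  func eat()
}

func feed<A: Animal>(_ animal: A) {
  animal.eat()
}

func feedAll(_ animals: [any Animal]) {
  for animal in animals {
    feed(animal)  // `animal' is opened; A := opened archetype
  }
}
\end{Verbatim}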
For example, if \texttt{animal} is a value of type \texttt{any Animal}, the expression \texttt{animal.eat()} calling a protocol method looks like this before opening: \begin{quote} \begin{tikzpicture}[% grow via three points={one child at (0.5,-0.7) and two children at (0.5,-0.7) and (0.5,-1.4)}, edge from parent path={[->] (\tikzparentnode.south) |- (\tikzchildnode.west)}] \node [class] {\texttt{\vphantom{p}CallExpr}} child { node [class] {\texttt{\vphantom{p}SelfApplyExpr}} child { node [class] {\texttt{\vphantom{p}DeclRefExpr:\ Animal.eat()}}} child { node [class] {\texttt{\vphantom{p}DeclRefExpr:\ animal}}} } child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}ArgumentList}}}; \end{tikzpicture} \end{quote} After opening, a new opened generic environment is created for the generic signature \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Animal>}. The entire call is wrapped in an \texttt{OpenExistentialExpr}, the \texttt{self} argument to the call becomes an \texttt{OpaqueValueExpr}, and the reference to the \texttt{animal} variable moves up to the \texttt{OpenExistentialExpr}: \begin{quote} \begin{tikzpicture}[% grow via three points={one child at (0.5,-0.7) and two children at (0.5,-0.7) and (0.5,-1.4)}, edge from parent path={[->] (\tikzparentnode.south) |- (\tikzchildnode.west)}] \node [class] {\texttt{\vphantom{p}OpenExistentialExpr}} child { node [class] {\texttt{\vphantom{p}CallExpr}} child { node [class] {\texttt{\vphantom{p}SelfApplyExpr}} child { node [class] {\texttt{\vphantom{p}DeclRefExpr:\ Animal.eat()}}} child { node [class] {\texttt{\vphantom{p}OpaqueValueExpr}}} } child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}ArgumentList}}} } child [missing] {} child [missing] {} child [missing] {} child [missing] {} child { node [class] {\texttt{\vphantom{p}DeclRefExpr:\ animal}}}; \end{tikzpicture} \end{quote} Not shown in this picture is that the type of the \texttt{OpaqueValueExpr} is an opened archetype type, and the substitution map replacing \ttgp{0}{0} with this opened archetype is stored in the \texttt{DeclRefExpr} for \texttt{Animal.eat()}. An existential value can store different concrete types dynamically, so each call site where an existential value is opened must produce a new opened archetype from a fresh opened generic environment. Opened generic environments are keyed by the opened existential signature together with a unique ID: \[\left(\,\ttbox{GenericSignature}\times \mathboxed{Unique ID}\,\right) \rightarrow \mathboxed{Opened \texttt{GenericEnvironment}}\] The \texttt{GenericEnvironment::forOpenedExistential()} method creates a fresh opened generic environment, should you have occasion to do this yourself outside of the expression type checker. \section{Existential Layouts}\label{existentiallayouts} The compiler selects one of several possible representations for an existential type by analyzing the existential's constraint type. The \texttt{TypeBase::getExistentialLayout()} method returns an instance of \texttt{ExistentialLayout}, which encodes the information used to determine the representation. Here are various methods of \texttt{ExistentialLayout} that are occasionally useful: \begin{description} \item[\texttt{getKind()}] Returns an element of the \texttt{ExistentialLayout::Kind} enum, which is one of \texttt{Class}, \texttt{Error}, or \texttt{Opaque}, corresponding to one of the below representations.
\item[\texttt{requiresClass()}] Returns whether this existential type requires the stored concrete type to be a class, that is, whether it uses the class representation. \item[\texttt{getSuperclass()}] Returns the existential's superclass bound, either explicitly stated in a protocol composition or declared on a protocol. \item[\texttt{getProtocols()}] Returns the existential's protocols. The protocols in this array are minimized with respect to protocol inheritance, and sorted in canonical protocol order (Definition~\ref{linear protocol order}). \item[\texttt{getLayoutConstraint()}] Returns the existential's layout constraint, if there is one. This is the \texttt{AnyObject} layout constraint if the existential can store any Swift or Objective-C class instance. If the superclass bound is further known to be a Swift-native class, this is the stricter \texttt{\_NativeClass} layout constraint. \end{description} Some of the above methods might look familiar from the description of generic signature queries in Section~\ref{genericsigqueries}, or the local requirements of archetypes in Chapter~\ref{genericenv}. Indeed, for the most part, the same information can be recovered by asking questions about the existential's interface type in the opened existential signature, or if you have an opened archetype handy, by calling similar methods on the archetype. There is one important difference though. In a generic signature, the minimization algorithm drops protocol conformance requirements which are satisfied by a superclass bound. This is true with opened existential signatures as well. However, for historical reasons, the same transformation is not applied when computing an existential layout. This means that the list of protocols in \texttt{ExistentialLayout::getProtocols()} may include more protocols than the \texttt{getConformsTo()} query on the opened existential signature. It is the former list of protocols, coming from the \texttt{ExistentialLayout}, that informs the runtime representation of the existential type \texttt{any C \& P}. If ABI stability were not a concern, this would be reworked to match the behavior of requirement minimization. \begin{example} Consider these definitions: \begin{Verbatim}
protocol Q {}
protocol P: Q {}
class C: P {}

let x: any P & Q = ...
let y: any P & C = ...
\end{Verbatim} First, consider \texttt{x}. The existential signature of \texttt{any P \& Q} is \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ P>}; the requirement \texttt{\ttgp{0}{0}:\ Q} is dropped because the protocol \texttt{P} inherits from protocol \texttt{Q}. The \texttt{ExistentialLayout} also only stores the single protocol \texttt{P}. The existential type \texttt{any P \& Q} canonicalizes to \texttt{any P}. Now, look at \texttt{y}. The existential signature of \texttt{any C \& P} is \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ C>}; notice that the conformance requirement \texttt{\ttgp{0}{0}:\ P} is dropped because the class \texttt{C} conforms to \texttt{P}. However, \texttt{any~C~\&~P} and \texttt{C} are still distinct types in the Swift type system, and the runtime representation of \texttt{any C \& P} stores a witness table for the conformance of \texttt{C} to \texttt{P} even though the conformance requirement \texttt{\ttgp{0}{0}:\ P} does not appear in the opened existential signature. This is because the list of protocols in the \texttt{ExistentialLayout} does not consider the conformance of \texttt{C} to \texttt{P}, and retains the protocol \texttt{P}.
\end{example} \paragraph{Opaque representation} This is the most general representation, used when no other specialized representation is applicable. It consists of a three-word value buffer and type metadata for the stored concrete type, followed by zero or more witness tables. If the stored concrete type fits in the three-word buffer and uses default alignment, the value is stored directly in the buffer. Otherwise, the buffer stores a pointer to a copy-on-write buffer, sized to store the concrete type. The list of witness tables has the same length and ordering as the list of protocols returned by \texttt{ExistentialLayout::getProtocols()}. \begin{quote} \begin{tabular}{|l|l|} \hline Word 1&Value buffer\\ Word 2&\\ Word 3&\\ \hline \hline Word 4&Type metadata\\ \hline \hline Word 5&Witness table \#1\\ Word 6&Witness table \#2\\ Word 7&\ldots\\ \hline \end{tabular} \end{quote} \paragraph{Class representation} This representation is used when the stored concrete type is known to be a reference-counted pointer. Instead of a three-word value buffer, only a single pointer is stored, and the type metadata does not need to be separately stored, since it can be recovered from the first word of the heap allocation (the ``isa pointer''). The trailing witness tables are stored as in the opaque representation. \begin{quote} \begin{tabular}{|l|l|} \hline Word 1&Reference-counted pointer\\ \hline \hline Word 2&Witness table \#1\\ Word 3&Witness table \#2\\ Word 4&\ldots\\ \hline \end{tabular} \end{quote} \paragraph{Objective-C representation} A specialized variant of the class representation, used when all protocols named by the constraint type are \texttt{@objc} protocols. In this case, no witness tables are stored, and the existential value is layout-compatible with the corresponding Objective-C protocol type. \begin{quote} \begin{tabular}{|l|l|} \hline Word 1&Reference-counted pointer\\ \hline \end{tabular} \end{quote} \paragraph{Error representation} A special representation only used for types conforming to \texttt{Error}. This representation consists of a single reference-counted pointer. The heap allocation is layout-compatible with the Objective-C \texttt{NSError} class. The concrete value and the witness table for the conformance are stored inside the heap allocation. \begin{quote} \begin{tabular}{|l|l|} \hline Word 1&Reference-counted pointer\\ \hline \end{tabular} \end{quote} \paragraph{Metatype representation} This representation is only used for existential metatypes. It stores a concrete metatype, followed by zero or more witness tables. \begin{quote} \begin{tabular}{|l|l|} \hline Word 1&Type metadata\\ \hline \hline Word 2&Witness table \#1\\ Word 3&Witness table \#2\\ Word 4&\ldots\\ \hline \end{tabular} \end{quote} \section{Generalization Signatures} \index{metatype type} \index{runtime type metadata} Swift metatype values have a notion of equality. While metatypes are not nominal types, and cannot conform to protocols, in particular the \texttt{Equatable} protocol,\footnote{but maybe one day...} the standard library nevertheless defines an overload of the \texttt{==} operator taking a pair of \texttt{Any.Type} values. You might recall from the previous section that \texttt{Any.Type} is an existential metatype with no constraints, so it is represented as a single pointer to runtime type metadata. Equality of metatypes can therefore be implemented as pointer equality. What this means is that runtime type metadata must be unique by construction.
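The following sketch shows the consequence at the source level: both metatype values point at the same unique metadata record for \texttt{Int}, so \texttt{==} can simply compare addresses: \begin{Verbatim}
let t1: Any.Type = Int.self
let t2: Any.Type = type(of: 123)  // also Int.self

print(t1 == t2)  // true: both sides are the same metadata pointer
\end{Verbatim}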
Frozen fixed-size types such as \texttt{Int} have statically-emitted metadata which is referenced directly, so uniqueness is trivial. On the other hand, generic nominal types and structural types such as functions or tuples can be instantiated with arbitrary generic arguments. Since the arguments are recursively guaranteed to be unique, the metadata instantiation function for each kind of type constructor maintains a cache mapping all generic arguments seen so far to instantiated types. Each new instantiation is only constructed once for a given set of generic arguments, guaranteeing uniqueness. \index{symbol mangling} \index{mangled name} \begin{listing}\captionabove{Example demonstrating uniqueness of runtime metadata}\label{metadataunique} \begin{Verbatim}
func concrete() -> Any.Type {
  return (Int, Int).self
}

func generic<T>(_: T.Type) -> Any.Type {
  return (T, T).self
}

print(concrete() == generic(Int.self))  // true
\end{Verbatim} \end{listing} Listing~\ref{metadataunique} constructs the same metatype twice, once in a concrete function and then again in a generic function: \begin{itemize} \item The \texttt{concrete()} function encodes the type \texttt{(Int, Int)} using a compact mangled representation and passes it to a runtime entry point for instantiating metadata from a mangled type string. This entry point ultimately calls the tuple type constructor after demangling the input string. \item The \texttt{generic()} function receives the type metadata for \texttt{Int} as an argument, and directly calls the tuple type constructor to build the type \texttt{(T, T)} with the substitution \texttt{T := Int}. \end{itemize} Both functions return the same value of \texttt{Any.Type}, because the two calls to the tuple type constructor return the same value. In the absence of constrained existential types, the type metadata for an existential type looks like an \texttt{ExistentialLayout}: a minimal, canonical list of zero or more protocols, an optional superclass type, and an optional \texttt{AnyObject} layout constraint. This layout could not encode arbitrary generic requirements, so it was not suitable for constrained existential types. Constrained existential type metadata uses a more general encoding based on the opened existential signature. \begin{listing}\captionabove{An example to motivate generalization signatures}\label{generalizationexample} \begin{Verbatim}
protocol P<X, Y> {
  associatedtype X: Q
  associatedtype Y where X.T == Y
}

protocol Q {
  associatedtype T
}

struct ConcreteQ: Q {
  typealias T = Int
}

func concrete() -> Any.Type {
  return (any P<ConcreteQ, Int>).self
}

func generic<X: Q>(_: X.Type) -> Any.Type where X.T == Int {
  return (any P<X, Int>).self
}

print(concrete() == generic(ConcreteQ.self))
\end{Verbatim} \end{listing} As a first attempt at solving this problem, you might think to use the opened existential signature as the uniquing key for existential type metadata at runtime. Unfortunately, naively encoding the requirements of the opened existential signature does not give you uniqueness, because the opened existential signature also includes all generic parameters and requirements from the parent generic signature. Listing~\ref{generalizationexample} shows a ``concrete vs. generic'' example similar to the above, but with constrained existential types.
The opened existential signature of \texttt{any P<ConcreteQ, Int>} in \texttt{concrete()} is: \begin{quote} \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ P, \ttgp{0}{0}.[P]X == ConcreteQ>} \end{quote} Note that the second same-type requirement \texttt{\ttgp{0}{0}.[P]Y == Int} is not part of the generic signature, because it is implied by the first same-type requirement together with the relationship between \texttt{X} and \texttt{Y} in protocol \texttt{P}. The opened existential signature of \texttt{any P<X, Int>} in \texttt{generic()} is: \begin{quote} \texttt{<\ttgp{0}{0}, \ttgp{1}{0} where \ttgp{0}{0} == \ttgp{1}{0}.[P]X, \ttgp{1}{0}:\ P, \ttgp{0}{0}.[Q]T == Int>} \end{quote} Applying the substitution map \texttt{X := ConcreteQ} to the type \texttt{any~P<X,~Int>} produces the type \texttt{any~P<ConcreteQ,~Int>}. This suggests that calling \texttt{generic()} with \texttt{X~:=~ConcreteQ} should output the same type metadata as a call to \texttt{concrete()}. In the compiler, you can certainly transform the second generic signature into the first as follows. We begin by applying a substitution map to each requirement of the second signature: \begin{quote} \SubMapC{ \SubType{\ttgp{0}{0}}{ConcreteQ}\\ \SubType{\ttgp{1}{0}}{\ttgp{0}{0}} }{ \SubConf{\ttgp{0}{0}:\ P} } \end{quote} This produces a list of substituted generic requirements: \begin{quote} \begin{tabular}{|l|l|} \hline Original requirement&Substituted requirement\\ \hline \texttt{\ttgp{0}{0} == \ttgp{1}{0}.[P]X}&\texttt{ConcreteQ == \ttgp{0}{0}.[P]X}\\ \texttt{\ttgp{1}{0}:\ P}&\texttt{\ttgp{0}{0}:\ P}\\ \texttt{\ttgp{0}{0}.[Q]T == Int}&\texttt{Int == Int}\\ \hline \end{tabular} \end{quote} If we feed these requirements into \texttt{buildGenericSignature()} with the singleton generic parameter list \texttt{\ttgp{0}{0}}, we get back the first signature: \begin{quote} \texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ P, \ttgp{0}{0}.[P]X == ConcreteQ>} \end{quote} This two-step process of applying a substitution map to the requirements of a generic signature, then building a new generic signature from the substituted generic requirements, re-appears several times throughout the compiler. Requirement inference in Section~\ref{requirementinference} used this technique. It will also come up again in Chapters \ref{classinheritance}~and~\ref{valuerequirements}. In this case though, \textsl{it doesn't actually solve our problem!} Whatever transformation we do here needs to happen at runtime, since the implementation of \texttt{generic()} needs to be able to do it for an arbitrary type \texttt{X}. Teaching the runtime to build minimal canonical generic signatures from scratch is not practical, since it would require duplicating a large portion of the compiler there. Instead of using the ``most concrete'' opened existential signature as the uniquing key, the compiler constructs the ``most generic'' signature together with a substitution map. If the replacement types in this substitution map contain type parameters, they are filled in at runtime from the generic context when the existential type metadata is being constructed. The resulting generalization signature and substitution map serve as the uniquing key for the runtime instantiation of existential type metadata. This algorithm is implemented in \texttt{ExistentialGeneralization::get()}. \begin{algorithm}[Existential generalization]\label{existentialgeneralizationalgo} As input, takes the constraint type of an existential type, possibly containing type parameters.
As output, produces a new constraint type, a new generic signature, and a substitution map for this signature. \begin{enumerate} \item Initialize $\texttt{N}:=0$. \item Initialize \texttt{R} to an empty list of requirements. \item Initialize \texttt{S} to an empty list of substitutions. \item Recursively generalize the constraint type by considering each of these five cases: \begin{description} \item [Protocol composition type] Recursively perform Step~4 on each term of the protocol composition. \item [Parameterized protocol type] Visit each argument type in order; for each one: \begin{enumerate} \item replace the argument type with \ttgp{0}{N}, \item add a substitution replacing \ttgp{0}{N} with the original argument type to \texttt{S}, \item increment \texttt{N}. \end{enumerate} Then, build a new parameterized protocol type with the generalized arguments. \item [Generic class type] Visit each argument type in order; for each one: \begin{enumerate} \item replace the argument type with \ttgp{0}{N}, \item add a substitution replacing \ttgp{0}{N} with the original argument type to \texttt{S}, \item increment \texttt{N}. \end{enumerate} Then, build a new generic class type with the generalized arguments. Let \texttt{C} be the context substitution map of the updated generic class type. For each requirement of the generic signature of the class, apply \texttt{C} to the requirement, and add the substituted requirement to \texttt{R}. \item [Protocol type] The type remains unchanged. \item [Class type] The type remains unchanged. \end{description} \item If $\texttt{N}=0$, the type does not have any substitutable arguments, and both \texttt{R} and \texttt{S} should be empty. Return the original constraint type with an empty generic signature and substitution map. \item Otherwise, build a new generic signature with parameters \texttt{\ttgp{0}{0}}\ldots\ttgp{0}{(N-1)} and requirements \texttt{R}. Note that the generalized constraint type is written with respect to this new generic signature. Build a new substitution map from the new generic signature and the list of substitutions \texttt{S}. Return the generalized constraint type, generic signature and substitution map. \end{enumerate} \end{algorithm} Say we have two existential types $T_1$ and $T_2$. Applying generalization to both types produces $(T_1', G_1, S_1)$ and $(T_2', G_2, S_2)$, where the tuple components are the generalized constraint type, generalization signature, and generalization substitution map, respectively. If $T_2$ can be constructed from $T_1$ by applying a substitution map $S$, then we have the following: \begin{enumerate} \item The generalized constraint types and generalization signatures will be equal; that is, $T_1'=T_2'$ and $G_1=G_2$. \item The substitution map $S_2$ can be constructed by applying $S$ to $S_1$. \end{enumerate} These are the necessary invariants that ensure uniqueness of existential type metadata.
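Writing $\times$ for the application of a substitution map, as elsewhere in this book, the two invariants can be summarized as: \[ T_1 \times S = T_2 \implies \left(\; T_1' = T_2', \quad G_1 = G_2, \quad S_1 \times S = S_2 \;\right) \]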
\begin{example} Let's look at Listing~\ref{generalizationexample} again. Starting with \texttt{concrete()}, applying Algorithm~\ref{existentialgeneralizationalgo} to the type \texttt{any~P<ConcreteQ,~Int>} gives the generalized constraint type \texttt{any~P<\ttgp{0}{0},~\ttgp{0}{1}>}, the generalization signature \texttt{<\ttgp{0}{0}, \ttgp{0}{1}>}, and the following substitution map: \begin{quote} \SubMap{ \SubType{\ttgp{0}{0}}{ConcreteQ}\\ \SubType{\ttgp{0}{1}}{Int} } \end{quote} Next up, in \texttt{generic()}, applying the algorithm to the type \texttt{any~P<X,~Int>} gives the same generalized constraint type and signature, but with a different substitution map: \begin{quote} \SubMap{ \SubType{\ttgp{0}{0}}{X}\\ \SubType{\ttgp{0}{1}}{Int} } \end{quote} When \texttt{generic()} is called with the substitution map \texttt{X := ConcreteQ}, the runtime type metadata collected for the uniquing key is the same in both \texttt{concrete()} and \texttt{generic()}, and both calls produce the same runtime type metadata pointer. \end{example} \begin{example} The generalization signature in the previous example does not have any generic requirements. In Listing~\ref{generalizationrequirements}, the existential type is a protocol composition containing a generic class type, which can introduce requirements in the generalization signature. Applying Algorithm~\ref{existentialgeneralizationalgo} to the type \texttt{any~Q<Int>~\&~G<ConcreteP>} produces the generalized constraint type \texttt{any~Q<\ttgp{0}{0}>~\&~G<\ttgp{0}{1}>} and the following generalization signature: \begin{quote} \texttt{<\ttgp{0}{0}, \ttgp{0}{1} where \ttgp{0}{1}:\ P, \ttgp{0}{1}.[P]X == \ttgp{0}{1}.[P]Y>} \end{quote} and substitution map: \begin{quote} \SubMapC{ \SubType{\ttgp{0}{0}}{Int}\\ \SubType{\ttgp{0}{1}}{ConcreteP} }{ \SubConf{ConcreteP:\ P} } \end{quote} \end{example} \begin{listing}\captionabove{Example where the generalization signature has requirements}\label{generalizationrequirements} \begin{Verbatim}
protocol P {
  associatedtype X
  associatedtype Y
}

struct ConcreteP: P {
  typealias X = Int
  typealias Y = Int
}

class G<U: P> where U.X == U.Y {}

protocol Q<T> {
  associatedtype T
}

func concrete() -> Any.Type {
  return (any Q<Int> & G<ConcreteP>).self
}
\end{Verbatim} \end{listing} \fi \section{Self-Conforming Protocols}\label{selfconformingprotocols} \ifWIP A common source of confusion for beginners is that in general, protocols in Swift do not conform to themselves. The layperson's explanation of this is that an existential type is a ``box'' for storing a value with an unknown concrete type. If the box requires that the value's type conforms to a protocol, you can't fit the ``box itself'' inside another box, because it has the wrong shape. This explanation will be made precise in this section. For many purposes, implicit existential opening, introduced in Swift 5.7 \cite{se0352}, offers an elegant way around this problem: \begin{Verbatim}
protocol Animal {...}

func petAnimal<A: Animal>(_ animal: A) {...}

func careForAnimals(_ animals: [any Animal]) {
  for animal in animals {
    petAnimal(animal) // existential opened here in Swift 5.7;
                      // type check error in Swift 5.6.
  }
}
\end{Verbatim} The above code type checks in Swift 5.7 because the replacement type for the generic parameter \texttt{A} of \texttt{petAnimal()} becomes the opened archetype from the payload of \texttt{animal}. The lack of self-conformance can still be observed in Swift 5.7 when a generic parameter type is a structural sub-component of another type: \begin{Verbatim}
func petAnimals<A: Animal>(_ animals: [A]) {...}

func careForAnimals(_ animals: [any Animal]) {
  petAnimals(animals) // type check error.
}
\end{Verbatim} It is not possible to simultaneously open every element of \texttt{animals}, so the call to \texttt{petAnimals()} does not type check with the replacement type \texttt{any Animal} for the generic parameter \texttt{A}. \index{global conformance lookup} \index{self protocol conformance} Now let's make precise the ``in general'' part of ``in general, protocols in Swift do not conform to themselves.'' Some protocols do conform to themselves, and global conformance lookup returns a special \texttt{SelfProtocolConformance} type in this case. The first two special kinds of self-conforming existential types are those that do not have conformance requirements. \paragraph{Any} The \texttt{Any} type is an existential where the constraint type is an empty protocol composition. Constraining a generic parameter to \texttt{Any} has no effect and is equivalent to leaving the generic parameter unconstrained. An unconstrained generic parameter can be substituted with an arbitrary type, including \texttt{Any}. So in this sense, \texttt{Any} ``conforms to itself'': \begin{Verbatim}
func doStuff<T: Any>(_: [T]) {...}  // `T: Any' is pointless

let value: Any = ...
doStuff([value])  // okay
\end{Verbatim} \paragraph{AnyObject} The \texttt{AnyObject} type is an existential where the constraint type requires the stored value to be a single reference-counted pointer. The \texttt{AnyObject} existential does not carry any witness tables, so the existential itself has the same representation as its payload. For this reason, the \texttt{AnyObject} existential type satisfies the \texttt{AnyObject} layout constraint. The calling convention of \texttt{doStuff()} takes the type metadata for \texttt{T}, and an array of reference-counted pointers. Passing the type metadata of \texttt{AnyObject} itself for \texttt{T}, and an array of \texttt{AnyObject} values, works just fine: \begin{Verbatim}
func doStuff<T: AnyObject>(_: [T]) {...}

let value: AnyObject = ...
doStuff([value])  // okay
\end{Verbatim} The next two kinds of self-conforming existentials have protocol conformance requirements, but nevertheless do not carry witness tables. \paragraph{Sendable protocol} The \texttt{Sendable} protocol does not have a witness table or any requirements, so \texttt{Sendable} existentials trivially conform to themselves. \paragraph{Certain @objc protocols} Objective-C protocols do not use witness tables to dispatch method calls, so an existential type where all protocols are \texttt{@objc} has the same representation as \texttt{AnyObject}---a single reference-counted pointer. This allows protocol compositions where all terms are \texttt{@objc} protocols to conform to themselves, as long as each protocol satisfies some additional conditions: \begin{enumerate} \item Each inherited protocol must recursively self-conform. \item The protocol must be an \texttt{@objc} protocol. \item The protocol must not declare any static methods. \item The protocol must not declare any constructors. \end{enumerate}
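Here is a sketch of a working case, using a hypothetical \texttt{@objc} protocol that satisfies all four conditions: \begin{Verbatim}
import Foundation

@objc protocol Tappable {
  func tap()  // an instance method; no static methods or constructors
}

func tapAll<T: Tappable>(_ items: [T]) {
  for item in items { item.tap() }
}

func test(_ items: [any Tappable]) {
  tapAll(items)  // okay: `any Tappable' conforms to `Tappable'
}
\end{Verbatim} The last two conditions are semantic, and not representational.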
If the last condition were not enforced, the code below would be accepted, despite not having a well-defined meaning---the \texttt{init()} requirement would be invoked on the protocol metatype itself, and not on a concrete implementation of the protocol: \begin{Verbatim}
@objc protocol Initable {
  init()
}

func makeNewInstance<I: Initable>(_ type: I.Type) -> I {
  return type.init()
}

makeNewInstance(Initable.self)
\end{Verbatim} \paragraph{Error protocol} The \texttt{Error} existential again uses a special representation where it is made to look like a single reference-counted pointer. The \texttt{Error} protocol dispatches method calls through a witness table, but the witness table for the concrete conformance to \texttt{Error} is stored inside the heap allocation, alongside the concrete value. When a function has a generic parameter constrained to \texttt{Error}, it expects to receive the witness table for the \texttt{Error} conformance as an argument to the call, alongside the type metadata for the generic parameter. The witness table for the concrete conformance is stored inside the \emph{value}, and we don't have a value; if we did, we would have opened the \texttt{Error} existential instead. The solution is that the compiler emits a special \emph{self-conformance} witness table for the \texttt{Error} protocol. At the point where the witness method in the witness table is invoked, a value is available, so the witness method implementations in the self-conformance witness table unwrap the existential and dispatch again, this time through the concrete conformance witness table. The need for this double dispatch becomes apparent if you consider the case where two \emph{different} concrete types conforming to \texttt{Error} are stored in an array of \texttt{any Error}: \begin{Verbatim}
func printErrorDomain<E: Error>(_ errors: [E]) {
  for error in errors {
    print(error._domain)
  }
}

printErrorDomain([MyError() as Error, YourError() as Error])
\end{Verbatim} The \texttt{printErrorDomain()} function receives two lowered parameters in addition to the formal \texttt{errors} parameter: type metadata for \texttt{E}, and a witness table for the \texttt{E:~Error} conformance. The call on line 7 passes the existential type metadata \texttt{any~Error} as the generic parameter \texttt{E}, and the self-conformance witness table for the \texttt{E:~Error} conformance. Inside the body of \texttt{printErrorDomain()}, each call to \texttt{print(error.\_domain)} encounters an existential storing a different concrete type, but the call is made with the same self-conformance witness table. This works, though, because the witness method in the self-conformance witness table loads the concrete witness table from the existential and dispatches to the actual concrete witness method. \index{SILGen} The self-conformance witness table for the \texttt{Error} protocol is emitted when building the standard library in \texttt{SILGenModule::emitSelfConformanceWitnessTable()}. \paragraph{What about other protocols?} In theory, the semantic conditions imposed on self-conforming \texttt{@objc} protocols could be combined with a trick like the self-conformance witness table for \texttt{Error} to allow more protocols to self-conform, perhaps with an opt-in mechanism to avoid the unconditional code size hit from always emitting a self-conformance witness table.
For class existentials, some kind of boxing would be necessary as well, similar to \texttt{Error}, since otherwise a class existential with witness tables does not satisfy the \texttt{AnyObject} layout constraint. This in turn would complicate the implementation of the \texttt{===} pointer identity operator, among other things. This doesn't seem worth the considerable increase in complexity, which is why Swift does not implement general self-conformance for protocols today. \section{Source Code Reference} TODO: \begin{description} \item[\texttt{TypeBase}] The base class of the Swift type hierarchy. \begin{itemize} \item \texttt{isAnyExistentialType()} Returns true if this is an \texttt{ExistentialType} or \texttt{ExistentialMetatypeType}. \end{itemize} \item[\texttt{ExistentialType}] An existential \texttt{any} type. \begin{itemize} \item \texttt{getConstraintType()} Returns the underlying constraint type. \end{itemize} \item[\texttt{ExistentialMetatypeType}] An existential metatype. \begin{itemize} \item \texttt{getConstraintType()} Returns the underlying constraint type. \end{itemize} \item[\texttt{MetatypeType}] A concrete metatype. \begin{itemize} \item \texttt{getInstanceType()} Returns the underlying instance type. \end{itemize} \item[\texttt{ASTContext}] Singleton for global state. \begin{itemize} \item \texttt{getAnyExistentialType()} Returns the existential type for \texttt{Any}. \item \texttt{getAnyObjectType()} Returns the existential type for \texttt{AnyObject}. \item \texttt{getOpenedArchetypeSignature()} Builds an opened existential signature. \end{itemize} \item[\texttt{GenericEnvironment}] A mapping from type parameters to archetypes with respect to a generic signature. \begin{itemize} \item \texttt{forOpenedExistential()} Creates a fresh opened generic environment. \end{itemize} \end{description} \fi \chapter{Class Inheritance}\label{classinheritance} \ifWIP TODO: \begin{itemize} \item various rules around classes conforming to protocols---need to understand restrictions on protocol requirements and witnesses \item silly example: \begin{Verbatim}
class Base<T, U> {
  init(_: T) { print("a") }
  init(_: U) { print("b") }
}

class Derived: Base<Int, String> {}

Derived(123)
\end{Verbatim} \end{itemize} When a subclass inherits from a superclass, there is a subtype relationship between the subclass and the superclass. If neither class is generic, the relationship is straightforward. Here, we have a pair of classes \texttt{C} and \texttt{D}; \texttt{D} inherits from \texttt{C}, so instances of \texttt{D} are also instances of \texttt{C}: \begin{Verbatim}
class C {}
class D: C {}

let instanceOfD: D = D()
let instanceOfC: C = instanceOfD // okay
\end{Verbatim} With generic classes, the situation is more subtle. The subclass \emph{declaration} states a superclass \emph{type}. The superclass type appears in the inheritance clause of the subclass, and can reference the subclass's generic parameters: \begin{Verbatim}
class Base<T> {}
class Derived<T>: Base<Array<T>> {}
\end{Verbatim} Now, the declaration \texttt{Derived<T>} has the generic superclass type \texttt{Base<Array<T>>}. Intuitively, we expect that \texttt{Derived<Int>} is a subtype of \texttt{Base<Array<Int>>}, and \texttt{Derived<String>} is a subtype of \texttt{Base<Array<String>>}, but that \texttt{Derived<Int>} and \texttt{Base<Array<String>>} are unrelated types. To get a complete picture of the subtype relationship, we need to define the concept of the superclass type \emph{of a type}, and not just the superclass type of a declaration.
First of all, what is the superclass type of the declared interface type of a class? The superclass type of the class declaration is an interface type for the class declaration's generic signature, so we say that the superclass type of the declared interface type is just the superclass type of the declaration. In our example, this tells us that \texttt{Derived<T>} is a subtype of \texttt{Base<Array<T>>}, and unrelated to \texttt{Base<T>}. What about the superclass type of an arbitrary specialization of the class? Here, we rely on the property that a specialized type is the result of applying its context substitution map to the declared interface type. If we instead apply the context substitution map to the superclass type of the class declaration, we get the superclass type of our specialized type. This can be shown with a commutative diagram: \begin{quote} \begin{tikzcd}[column sep=3cm,row sep=1cm] \mathboxed{declared interface type} \arrow[d, "\text{superclass type}"{left}] \arrow[r, "\text{substitution}"] &\mathboxed{specialized type} \arrow[d, "\text{superclass type}"] \\ \mathboxed{superclass type of declaration} \arrow[r, "\text{substitution}"]&\mathboxed{superclass type of type} \end{tikzcd} \end{quote} Now that we can compute the superclass type of a type, we can walk up the inheritance hierarchy by iterating the process, to get the superclass type of a superclass type, and so on. \fi \begin{algorithm}[Iterated superclass type]\label{superclassfordecl} As input, takes a class type \texttt{T} and a superclass declaration \texttt{D}. Returns the superclass type of \texttt{T} for \texttt{D}. \begin{enumerate} \item Let \texttt{C} be the class declaration referenced by \texttt{T}. If $\texttt{C}=\texttt{D}$, return \texttt{T}. \item If \texttt{C} does not have a superclass type, fail with an invariant violation; \texttt{D} is not actually a superclass of \texttt{T}. \item Otherwise, apply the context substitution map of \texttt{T} to the superclass type of \texttt{C}. Assign this new type to \texttt{T}, and go back to Step~1. \end{enumerate} \end{algorithm} \ifWIP \begin{listing}\captionabove{Computing superclass types}\label{generic superclass example listing} \begin{Verbatim}
class Base<T> {
  typealias C = () -> T
}

class Middle<T, U>: Base<(T, T)> {}

class Derived: Middle<Int, String> {}

let derived = Derived()
let instanceOfMiddle: Middle<Int, String> = derived // okay
let instanceOfBase: Base<(Int, Int)> = derived // okay
\end{Verbatim} \end{listing} \begin{example}\label{genericsuperclassexample} Listing~\ref{generic superclass example listing} shows a class hierarchy demonstrating these behaviors: \begin{enumerate} \item The superclass type of \texttt{Derived} is \texttt{Middle<Int, String>}. \item The superclass type of \texttt{Middle<T, U>} is \texttt{Base<(T, T)>}. \end{enumerate} The superclass type of the \emph{type} \texttt{Middle<Int, String>} is the superclass type of the declaration \texttt{Middle} with the context substitution map of \texttt{Middle<Int, String>} applied: \[\ttbox{Base<(T, T)>}\times \SubMap{ \SubType{T}{Int}\\ \SubType{U}{String} } = \ttbox{Base<(Int, Int)>} \] This means the superclass type of \texttt{Derived} with respect to \texttt{Base} is \texttt{Base<(Int, Int)>}. What is the type \texttt{Derived.C}? The type alias \texttt{C} is declared in \texttt{Base}. The superclass type of \texttt{Derived} with respect to \texttt{Base} is \texttt{Base<(Int, Int)>}.
We can apply the context substitution map of this superclass type to the declared interface type of \texttt{C}: \[\ttbox{() -> T}\times \SubMap{ \SubType{T}{(Int, Int)} } = \ttbox{() -> (Int, Int)} \] \end{example} We can finally describe the implementation of Case~3 of Definition~\ref{context substitution map for decl context}. The base type here is a class type, and the declaration context is some superclass declaration or an extension thereof. We first apply Algorithm~\ref{superclassfordecl} to the base type and superclass declaration to get the correct superclass type. Then, we compute the context substitution map of this superclass type with respect to our declaration context, which is now either the exact superclass declaration or an extension. Thus we have reduced the problem to Case~1, which we already know how to solve. TODO: example \fi \section{Inherited Conformances}\label{inheritedconformance} \ifWIP TODO: \begin{itemize} \item example where base class is not generic but subclass is---two inherited conformances with same underlying specialized conformance \item behavior of inherited conformances under substitution \item restrictions on class conformances \item longest possible delegation chain is $\textit{inherited} \rightarrow \textit{specialized} \rightarrow \textit{normal}$. \end{itemize} Protocol conformances are inherited from superclass to subclass. At each level of class inheritance, the conformance table of a subclass is initialized with a copy of the conformance table of the superclass, with the superclass substitution map applied to each inherited conformance. This behavior can be broken down into two cases. If a superclass directly conforms to a protocol, the superclass's conformance table will store a normal conformance. The subclass will inherit a specialized conformance built from this normal conformance together with the superclass substitution map. In the general case, the superclass conforms via an inherited conformance from further up the hierarchy, and the subclass conformance is built by composing the substitution map of some other specialized conformance with the superclass substitution map. In this way, an inherited conformance is ultimately derived from the normal conformance of some class further up in the hierarchy, specialized by the substitution map obtained by composing all superclass substitution maps at every level of inheritance, up to the base class declaring the conformance. The conformance lookup table machinery actually introduces an additional level of indirection by wrapping these specialized conformances in a bespoke \emph{inherited conformance} data type. Conformances store their conforming type; the defining invariant is that if a conformance was the result of a lookup, the stored conforming type should equal the original type of the lookup. With class inheritance however, the conforming type of a conformance declared on the superclass is ultimately always some substitution of the type of the superclass. An inherited conformance stores the original subclass type, but otherwise just delegates to an underlying conformance, either normal or specialized. By wrapping inherited conformances in a special type, the compiler is able to keep track of the original type of a conformance lookup.
\begin{example} We can amend Example~\ref{genericsuperclassexample} to add a conformance to the \texttt{Base} class: \begin{Verbatim}
protocol P {
  associatedtype A
}

extension Base: P {
  typealias A = Array<T>
}
\end{Verbatim} The normal conformance \texttt{Base:\ P} stores the type witness for \texttt{A}, which is \texttt{Array<T>}. Looking up the conformance \texttt{Derived:\ P} returns an inherited conformance. The inherited conformance reports its conforming type as \texttt{Derived} instead of \texttt{Base<(Int,~Int)>}, but otherwise delegates all operations to a specialized conformance with the substitution map $\texttt{T}:=\texttt{(Int, Int)}$. The specialized conformance in turn delegates to the normal conformance, but applies the substitution map when looking up type witnesses and associated conformances. Therefore, looking up the type witness for \texttt{A} in the inherited conformance \texttt{Derived:\ P} returns \texttt{Array<(Int,~Int)>}, which is the result of applying our substitution map to the type witness stored in the normal conformance. \end{example} \fi \section{Override Checking}\label{overridechecking} \ifWIP When a subclass overrides a method from a superclass, the type checker must ensure the subclass method is compatible with the superclass method, in order to guarantee that instances of the subclass are dynamically interchangeable with instances of the superclass. If neither the superclass nor the subclass is generic, the compatibility check simply compares the fully concrete parameter and result types of the non-generic declarations. Otherwise, the superclass substitution map plays a critical role yet again, because the compatibility relation must project the superclass method's type into the subclass to meaningfully compare it with the override. \paragraph{Non-generic overrides} The simple case is when the superclass or subclass is generic, but the superclass method does not define generic parameters of its own, either explicitly or via the opaque parameters of Section~\ref{opaque parameters}. Let's call such a method ``non-generic,'' even if the class it appears inside is generic. So a non-generic method has the same generic signature as its parent context, which in our case is a class. In the non-generic case, the superclass substitution map is enough to understand the relation between the interface type of the superclass method and its override. \begin{listing}\captionabove{Some method overrides}\label{method overrides} \begin{Verbatim}
class Outer<T> {
  class Inner<U> {
    func doStuff(_: T, _: U) {}
    func doGeneric<A: Equatable>(_: A) {}
  }
}

class Derived<V>: Outer<Int>.Inner<(V, V)> {
  override func doStuff(_: Int, _: (V, V)) {}
  override func doGeneric<A>(_: A) {}
}
\end{Verbatim} \end{listing} In Listing~\ref{method overrides}, the \texttt{Derived} class overrides the \texttt{doStuff()} method from \texttt{Outer.Inner}. Dropping the first level of function application from the interface type of \texttt{doStuff()} leaves us with \texttt{(T, U) -> ()}, to which we apply the superclass substitution map for \texttt{Derived} to get the final result: \[ \ttbox{(T, U) -> ()} \times \SubMap{ \SubType{T}{Int}\\ \SubType{U}{(V, V)} } = \ttbox{(Int, (V, V)) -> ()} \] This happens to exactly equal the interface type of the subclass method \texttt{doStuff()} in \texttt{Derived}, again not including the self clause. An override with an exact type match is valid.
(In fact, some variance in parameter and return types is permitted as well, but it's not particularly interesting from a generics point of view, so here is the executive summary: an override can narrow the return type, and widen the parameter types. This means it is valid to override a method returning \texttt{Optional<T>} with a method returning \texttt{T}, because a \texttt{T} can always trivially become an \texttt{Optional<T>} via an injection. Similarly, if \texttt{A} is a superclass of \texttt{B}, a method returning \texttt{A} can be overridden to return \texttt{B}, because a \texttt{B} is always an \texttt{A}. A dual set of rules is in play in method parameter position; if the original method takes an \texttt{Int}, the override can accept \texttt{Optional<Int>}, etc.)
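These variance rules can be seen in a small sketch (hypothetical classes, not from the listing above): \begin{Verbatim}
class Fruit {
  func pick() -> Int? { return nil }
  func wash(_: Int) {}
}

class Apple: Fruit {
  override func pick() -> Int { return 0 }  // narrower return type: okay
  override func wash(_: Int?) {}            // wider parameter type: okay
}
\end{Verbatim}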
\paragraph{Generic overrides} In the non-generic case, applying the superclass substitution map directly to the interface type of a superclass method tells us what ``the type of the superclass method should be'' in the subclass, and this happens to work because the superclass method had the same generic signature as the superclass. Once this is no longer required to be so, the problem becomes more complicated, and the details below were not worked out until Swift 5.2 \cite{sr4206}. The generic signature of the superclass (resp. override) method is built by adding any additional generic parameters and requirements to the generic signature of the superclass (resp. subclass) itself. To relate these four generic signatures together, we generalize the superclass substitution map into something called the \emph{attaching map}. Once we can compute an attaching map, applying it to the interface type of the superclass method produces a substituted type which can be compared against the interface type of the override, just as before. However, while this part is still necessary, it is no longer sufficient, since we also need to compare the \emph{generic signatures} of the superclass method and its override for compatibility. Here the attaching map also plays a role. Overrides with a different number of innermost generic parameters are immediately known to be invalid, and no further checking needs to take place. (Interestingly enough, the names of generic parameters do not matter. Generic parameters are uniquely identified by depth and index, not name.) Once it is known that both generic signatures have the same number of innermost parameters, we can define a 1:1 correspondence between the two generic parameter lists which preserves the index but possibly changes the depth. We build the attaching map by ``extending'' the superclass substitution map, adding replacement types for the superclass method's innermost generic parameters, which map to the subclass method's generic parameters via the above correspondence. In addition to new replacement types, the attaching map stores conformances not present in the superclass substitution map, if the superclass method introduces conformance requirements. \begin{algorithm}[Compute attaching map for generic method override]\label{superclass attaching map} As input, takes the superclass method's generic signature \texttt{G}, the superclass declaration \texttt{B}, and some subclass declaration \texttt{D}. Outputs a substitution map for \texttt{G}. \begin{enumerate} \item Initialize \texttt{R} to an empty list of replacement types. \item Initialize \texttt{C} to an empty list of conformances. \item Let $\texttt{G}'$ be the generic signature of \texttt{B}, and let \texttt{T} be the declared interface type of \texttt{D}. \item (Trivial case) If $\texttt{D}=\texttt{B}$, return the identity substitution map for \texttt{G}. \item (Remapping) Let \texttt{S} be the context substitution map of \texttt{T} for the declaration context of \texttt{B}. \item (Replacements) For each generic parameter of \texttt{G}, check if this is a valid generic parameter in $\texttt{G}'$. If so, this is a generic parameter of the superclass, so apply \texttt{S} and record the replacement type in \texttt{R}. Otherwise, this is an innermost generic parameter of the superclass method. Adjust the depth of this parameter by subtracting the generic context depth of \texttt{B} and adding the generic context depth of \texttt{D}, and record a new generic parameter type with the adjusted depth but identical index in \texttt{R}. \item (Conformances) For each conformance requirement \texttt{T:\ P} of \texttt{G}, first check if \texttt{T} is a valid type in $\texttt{G}'$, and if \texttt{T} conforms to \texttt{P} in $\texttt{G}'$. If so, look up the conformance \texttt{T:\ P} in \texttt{S} and record the result in \texttt{C}. Otherwise, this is a new conformance requirement present in \texttt{G} but not $\texttt{G}'$. Record the abstract conformance to \texttt{P} in \texttt{C}. \item (Return) Return the substitution map for \texttt{G} built from \texttt{R} and \texttt{C}. \end{enumerate} \end{algorithm} \begin{example} To continue the \texttt{doGeneric()} example from Listing~\ref{method overrides}, the superclass method defines a generic parameter \texttt{A} at depth 2, but the ``same'' parameter has depth 1 in the subclass method of \texttt{Derived}. For clarity, the attaching map is written with canonical types (otherwise, it would replace \texttt{A} with \texttt{A}, with a different meaning of \texttt{A} on each side): \begin{quote} \SubMapC{ \SubType{\ttgp{0}{0}}{Int}\\ \SubType{\ttgp{1}{0}}{(\ttgp{0}{0}, \ttgp{0}{0})}\\ \SubType{\ttgp{2}{0}}{\ttgp{1}{0}} }{ \SubConf{\ttgp{1}{0}:\ Equatable} } \end{quote} \end{example} \paragraph{The override signature} The attaching map is called such because it allows us to ``glue'' together the generic signature of the superclass method with the generic signature of the subclass type, and build the expected generic signature of the subclass method, also known as the \emph{override signature}. The expected generic signature can then be compared against the actual generic signature of the subclass method. The actual generic signature of the subclass method is constructed from three parts: \begin{enumerate} \item the generic signature of the subclass type, \item the subclass method's innermost generic parameters, \item any additional generic requirements imposed by the subclass method. \end{enumerate} The computation of the expected generic signature is similar, except in place of the third step, we build the additional requirements by applying the attaching map to each requirement of the \emph{superclass} method. \begin{algorithm}[Compute override generic signature] As input, takes the superclass method's generic signature \texttt{G}, the superclass declaration \texttt{B}, and some subclass declaration \texttt{D}. Outputs a new generic signature. \begin{enumerate} \item Initialize \texttt{P} to an empty list of generic parameter types. \item Initialize \texttt{R} to an empty list of generic requirements.
\item Let \texttt{S} be the attaching map for \texttt{G}, \texttt{B} and \texttt{D} computed using Algorithm~\ref{superclass attaching map}. \item (Parent signature) Let $\texttt{G}''$ be the generic signature of \texttt{D}. (In Algorithm~\ref{superclass attaching map}, $\texttt{G}'$ was used for the generic signature of \texttt{B}.) \item (Additional parameters) For each generic parameter of \texttt{G} at the innermost depth, apply \texttt{S} to the generic parameter. By construction, the result is another generic parameter type; record this type in \texttt{P}. \item (Additional requirements) For each requirement of \texttt{G}, apply \texttt{S} to the requirement and record the result in \texttt{R}. \item (Return) Build a minimized generic signature from $\texttt{G}''$, \texttt{P} and \texttt{R}, and return the result. \end{enumerate} \end{algorithm} For the override to satisfy the contract of the superclass method, it should accept any valid set of concrete type arguments also accepted by the superclass method. The override might be more permissive, however. The correct relation is that each generic requirement of the actual override signature must be satisfied by the expected override signature, but not necessarily vice versa. This uses the same mechanism as conditional requirement checking for conditional conformances, described in Section~\ref{conditional conformance}. The requirements of one signature can be mapped to archetypes of the primary generic environment of another signature. This makes the requirement types concrete, which allows the \texttt{isSatisfied()} predicate to be checked against the substituted requirement. \begin{example} In Listing~\ref{method overrides}, the superclass method generic signature is \texttt{<T, U, A where A:\ Equatable>}. The generic parameter \texttt{A} belongs to the method; the other two are from the generic signature of the superclass. The override signature glues together the innermost generic parameters and their requirements from the superclass method with the generic signature of the subclass, which is \texttt{<V>}. This operation produces the signature \texttt{<V, A where A:\ Equatable>}. This is different from the actual override generic signature of \texttt{doGeneric()} in \texttt{Derived}, which is \texttt{<V, A>}. However, the actual signature's requirements are satisfied by the expected signature. \end{example} \section{Designated Initializer Inheritance} TODO: \begin{itemize} \item Initializer or constructor? \item Substitute requirements + build new signature \item The rules \item Worked example \end{itemize} \section{Source Code Reference} TODO: \fi \chapter{Witness Thunks}\label{valuerequirements} \ifWIP When protocol conformances were introduced in Chapter~\ref{conformances}, our main focus was the mapping from associated type requirements to type witnesses, and how conformances participate in type substitution. Now let's look at the other facet of conformances, which is how they map value requirements to value witnesses.\footnote{The term ``value witness'' is overloaded to have two meanings in Swift. The first is a witness to a value requirement in a protocol. The second is an implementation of an intrinsic operation all types support, like copy, move, destroy, etc., appearing in the value witness table of runtime type metadata. Here I'm talking about the first meaning.} Recording a witness for a protocol requirement requires more detail than simply stating the witness. What is the relationship between the generic signature of a protocol requirement and the generic signature of the witness?
Well, ``it's complicated.'' A protocol requirement's generic signature has a \texttt{Self} generic parameter constrained to that protocol. If the witness is a default implementation from a protocol extension, it will have a \texttt{Self} generic parameter, too, but it might conform to a \emph{different} protocol. Or if the witness is a member of the conforming type and the conforming type has generic parameters of its own, it will have its own set of generic parameters, with different requirements. A witness might be ``more generic'' than a protocol requirement, where the requirement is satisfied by a fixed specialization of the witness. Conditional conformance and class inheritance introduce even more possibilities. (There will be examples of all of these different cases at the end of Section~\ref{witnessthunksignature}.)

\index{SILGen}
All of this means that when the compiler generates a witness table to represent a conformance at runtime, the entries in the witness table cannot simply point directly to the witness implementations. The protocol requirement and the witness will have different calling conventions, so SILGen must emit a \emph{witness thunk} to translate the calling convention of the requirement into that of each witness. Conformance checking records a mapping between protocol requirements and witnesses together with the necessary details for witness thunk emission inside each normal conformance. The \texttt{ProtocolConformance::getWitness()} method takes the declaration of a protocol value requirement, and returns an instance of \texttt{Witness}, which stores all of this information, obtainable by calling getter methods:
\begin{description}
\item[\texttt{getDecl()}] The witness declaration itself.
\item[\texttt{getWitnessThunkSignature()}] The \emph{witness thunk generic signature}, which bridges the gap between the protocol requirement's generic signature and the witness generic signature. Adopting this generic signature is what allows the witness thunk to have the correct calling convention that matches the caller's invocation of the protocol requirement, while providing the necessary type parameters and conformances to invoke a member of the concrete conforming type.
\item[\texttt{getSubstitutions()}] The \emph{witness substitution map}. Maps the witness generic signature to the type parameters of the witness thunk generic signature. This is the substitution map at the call of the actual witness from inside the witness thunk.
\item[\texttt{getRequirementToWitnessThunkSubs()}] The \emph{requirement substitution map}. Maps the protocol requirement generic signature to the type parameters of the witness thunk generic signature. This substitution map is used by SILGen to compute the interface type of the witness thunk, by applying it to the interface type of the protocol requirement.
\end{description}
TODO:
\begin{itemize}
\item diagram with the protocol requirement caller, the protocol requirement type, the witness thunk signature/type, and the witness signature/type.
\item more details about how the witness\_method CC recovers self generic parameters in a special way
\end{itemize}
\section{Covariant Self Problem}
In Swift, subclasses inherit protocol conformances from their superclass. If a class conforms to a protocol, a requirement of this protocol can be called on an instance of a subclass.
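A minimal sketch of this inheritance behavior (the declarations here are hypothetical, invented purely for illustration):
\begin{Verbatim}
protocol Named {
  func name() -> String
}

class Base: Named {
  func name() -> String { return "Base" }
}

class Sub: Base {}

let n = Sub().name() // Sub inherits Base's conformance to Named
\end{Verbatim}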
When the protocol requirement is witnessed by a default implementation in a protocol extension, the \texttt{Self} parameter of the protocol extension method is bound to the specific subclass substituted at the call site. The subclass can be observed if, for example, the protocol requirement returns an instance of \texttt{Self}, and the default implementation constructs a new instance via an \texttt{init()} requirement on the protocol. The protocol requirement can be invoked in one of two ways:
\begin{enumerate}
\item Directly on an instance of the class or one of its subclasses. Since the implementation is known to always be the default implementation, the call is statically dispatched to the default implementation without any indirection through the witness thunk.
\item Indirectly via some other generic function with a generic parameter constrained to the protocol. Since the implementation is unknown, the call inside the generic function is dynamically dispatched via the witness thunk stored in the witness table for the conformance. If the generic function in turn is called with an instance of the class or one of its subclasses, the witness thunk stored in the witness table for the conformance will statically dispatch to the default implementation.
\end{enumerate}
The two cases are demonstrated in Listing~\ref{covariantselfexample}. The \texttt{Animal} protocol defines a \texttt{clone()} requirement returning an instance of \texttt{Self}. This requirement has a default implementation which constructs a new instance of \texttt{Self} via the \texttt{init()} requirement on the protocol. The \texttt{Horse} class conforms to \texttt{Animal}, using the default implementation for \texttt{clone()}. The \texttt{Horse} class also has a subclass, \texttt{Pony}. It follows from substitution semantics that both \texttt{newPonyDirect} and \texttt{newPonyIndirect} should have the type \texttt{Pony}:
\begin{itemize}
\item The definition of \texttt{newPonyDirect} calls \texttt{clone()} with the substitution map $\texttt{Self} := \texttt{Pony}$. The original return type of \texttt{clone()} is \texttt{Self}, so the substituted type is \texttt{Pony}.
\item Similarly, the definition of \texttt{newPonyIndirect} calls \texttt{cloneAnimal()} with the substitution map $\texttt{A} := \texttt{Pony}$. The original return type of \texttt{cloneAnimal()} is \texttt{A}, so the substituted type is also \texttt{Pony}.
\end{itemize}
The second call dispatches through the witness thunk, so the witness thunk must also ultimately call the default implementation of \texttt{Animal.clone()} with the substitution map $\texttt{Self} := \texttt{Pony}$. When the conforming type is a struct or an enum, the \texttt{self} parameter of a witness thunk has a concrete type. If the conforming type were a class, though, it would not be correct to use the concrete \texttt{Horse} type, because the witness thunk would then invoke the default implementation with the substitution map $\texttt{Self} := \texttt{Horse}$, and the second call would return an instance of \texttt{Horse} at runtime and not \texttt{Pony}, which would be a type soundness hole.
\begin{listing}\captionabove{Statically and dynamically dispatched calls to a default implementation}\label{covariantselfexample}
\begin{Verbatim}
protocol Animal {
  init()
  func clone() -> Self
}

extension Animal {
  func clone() -> Self { return Self() }
}

class Horse: Animal {
  required init() {}
}

class Pony: Horse {}

func cloneAnimal<A: Animal>(_ animal: A) -> A {
  return animal.clone()
}

let newPonyDirect = Pony().clone()
let newPonyIndirect = cloneAnimal(Pony())
\end{Verbatim}
\end{listing}
This soundness hole was finally discovered and addressed in Swift~4.1 \cite{sr617}. The solution is to model the covariant behavior of \texttt{Self} with a superclass-constrained generic parameter. When the conforming type is a class, witness thunks dispatching to a default implementation have this special generic parameter, in addition to the generic parameters of the class itself (there are none in our example, so the witness thunk just has the single generic parameter for \texttt{Self}). In the next section, the algorithms for building the substitution map and generic signature all take a boolean flag indicating if a covariant \texttt{Self} type should be introduced. The specific conditions under which this flag is set are a bit subtle:
\begin{enumerate}
\item The conforming type must be a non-final class. If the class is final, there is no need to preserve variance since \texttt{Self} is always the exact class type.
\item The witness must be in a protocol extension. If the witness is a method on the class, there is no way to observe the concrete substitution for the protocol \texttt{Self} type, because it is not a generic parameter of the class method.
\item (The hack) The interface type of the protocol requirement must not mention any associated types.
\end{enumerate}
The determination of whether to use a static or covariant \texttt{Self} type for a class conformance is implemented by the type checker function \texttt{swift::matchWitness()}. Indeed, Condition~3 is a hack; it opens up an exception where the soundness hole we worked so hard to close is once again allowed. In an ideal world, Conditions 1~and~2 would be sufficient, but by the time the soundness hole was discovered and closed, existing code had already been written taking advantage of it. The scenario necessitating Condition~3 is when the default implementation appears in a \emph{constrained} protocol extension:
\begin{Verbatim}
protocol P {
  associatedtype T = Self
  func f() -> T
}

extension P where Self.T == Self {
  func f() -> Self { return self }
}

class C: P {}
class D: C {}
\end{Verbatim}
The non-final class \texttt{C} does not declare a type witness for associated type \texttt{T} of protocol~\texttt{P}. The associated type specifies a default, so conformance checking proceeds with the default type witness. The language model is that a conformance is checked once, at the declaration of \texttt{C}, so the default type \texttt{Self} is the ``static'' \texttt{Self} type of the conformance, which is \texttt{C}. Moving on to value requirements, class \texttt{C} does not provide an implementation of the protocol requirement \texttt{f()} either, and the original intent of this code is that the default implementation of \texttt{f()} from the constrained extension of \texttt{P} should be used.
Without Condition~3, the requirement \texttt{Self.T == Self} would not be satisfied when matching the requirement \texttt{f()} with its witness; the left hand side of the requirement, \texttt{C}, is not exactly equal to the right hand side, which is the covariant \texttt{Self} type that is only known to be \emph{some subclass} of \texttt{C}. The conformance would be rejected unless \texttt{C} were declared final. With Condition~3, \texttt{Self.T == Self} is satisfied because the static type \texttt{C} is used in place of \texttt{Self} during witness matching. The compiler therefore continued to accept the above code, because it worked prior to Swift~4.1. Unfortunately, it means that a call to \texttt{D().f()} via the witness thunk will still return an instance of \texttt{C}, and not \texttt{D} as expected. One day, we might remove this exception and close the soundness hole completely, breaking source compatibility for the above example until the developer makes it type safe by declaring \texttt{C} as final. For now, a good guideline to ensure type safety when mixing classes with protocols is \textsl{only final classes should conform to protocols with associated types}.
\section{Witness Thunk Signatures}\label{witnessthunksignature}
Now we turn our attention to the construction of the data recorded in the \texttt{Witness} type. This is done with the aid of the \texttt{RequirementEnvironment} class, which implements the ``builder'' pattern. Building the witness thunk signature is an expensive operation. The algorithms below depend only on the conformance being checked, the generic signature of a protocol requirement, and whether the witness requires the use of a covariant \texttt{Self} type. These three pieces of information can be used as a uniquing key to cache the results of these algorithms. Conformance checking might need to consider a number of protocol requirements, each requirement having multiple candidate witnesses that have to be checked to find the best one. In the common case, many protocol requirements will share a generic signature---for example, any protocol requirement without generic parameters of its own has the simple generic signature \texttt{<Self where Self:\ P>}, where \texttt{P} is the protocol in question. Therefore this caching can eliminate a fair amount of duplicated work.

The \textbf{witness substitution map} is built by the constraint solver when matching the interface type of a witness to the interface type of a requirement. A description of this process is outside of the scope of this manual.

The \textbf{requirement substitution map} is built by mapping the requirement's \texttt{Self} parameter either to the witness thunk's \texttt{Self} parameter (if the witness has a covariant class \texttt{Self} type), or to the concrete conforming type otherwise. All other generic parameters of the requirement map over to generic parameters of the witness thunk, possibly at a different depth. The requirement's \texttt{Self} conformance is always a concrete conformance, even in the covariant \texttt{Self} case, because \texttt{Self} is subject to a superclass requirement in that case. All other conformance requirements of the requirement's generic signature remain abstract.

The \textbf{witness thunk generic signature} is constructed by stitching together the generic signature of the conformance context with the generic signature of the protocol requirement.
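For intuition before the formal algorithms, here is a hypothetical sketch; the protocol \texttt{P}, the struct \texttt{S}, and the method \texttt{f()} are invented for illustration, with the relevant signatures shown in comments:
\begin{Verbatim}
// Requirement generic signature:
//   <Self, A where Self: P, A: Equatable>
protocol P {
  func f<A: Equatable>(_: A)
}

// Conformance context generic signature: <T where T: Hashable>
struct S<T: Hashable>: P {
  func f<A: Equatable>(_: A) {}
}

// Stitched witness thunk generic signature (canonicalized):
//   <tau_0_0, tau_1_0 where tau_0_0: Hashable, tau_1_0: Equatable>
\end{Verbatim}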
\begin{algorithm}[Build the requirement to witness thunk substitution map]
As input, takes a normal conformance~\texttt{N}, the generic signature of a protocol requirement~\texttt{G}, and a flag indicating if the witness has a covariant class \texttt{Self} type,~\texttt{F}. Outputs a substitution map for \texttt{G}.
\begin{enumerate}
\item Initialize \texttt{R} to an empty list of replacement types.
\item Initialize \texttt{C} to an empty list of conformances.
\item (Remapping) First compute the depth at which non-\texttt{Self} generic parameters of \texttt{G} appear in the witness thunk signature. Let $\texttt{G}'$ be the generic signature of \texttt{N}, and let \texttt{D} be one greater than the depth of the last generic parameter of $\texttt{G}'$. If $\texttt{G}'$ has no generic parameters, set $\texttt{D}=0$. If \texttt{F} is set, increment \texttt{D} again.
\item (Self replacement) If \texttt{F} is set, record the replacement $\ttgp{0}{0} := \ttgp{0}{0}$ in \texttt{R}; the requirement's \texttt{Self} parameter maps to the witness thunk's covariant \texttt{Self} parameter, which is also written \ttgp{0}{0}. Otherwise, let \texttt{T} be the type of \texttt{N}, and record the replacement $\ttgp{0}{0} := \texttt{T}$.
\item (Remaining replacements) Any remaining generic parameters of \texttt{G} must have a depth of 1. For each remaining generic parameter \ttgp{1}{i}, record the replacement $\ttgp{1}{i}~:=~\ttgp{D}{i}$.
\item (Self conformance) If \texttt{F} is set, build a substitution map $\texttt{S}$ for $\texttt{G}'$ mapping each generic parameter \ttgp{d}{i} to \ttgp{(d+1)}{i}. Apply this substitution map to \texttt{N} to get a specialized conformance, and record this specialized conformance in \texttt{C}.
\item (Self conformance) Otherwise if \texttt{F} is not set, just record \texttt{N} in \texttt{C}.
\item (Remaining conformances) Any remaining conformance requirements in \texttt{G} have a subject type rooted in a generic parameter at depth~1. For each remaining conformance requirement \texttt{T:~P}, record an abstract conformance to \texttt{P} in \texttt{C}. Abstract conformances do not store a conforming type, but if they did, the same remapping process would be applied here.
\item (Return) Build a substitution map for \texttt{G} from \texttt{R} and \texttt{C}.
\end{enumerate}
\end{algorithm}
\begin{algorithm}[Build the witness thunk generic signature]
As input, takes a normal conformance~\texttt{N}, the generic signature of a protocol requirement~\texttt{G}, and a flag indicating if the witness has a covariant class \texttt{Self} type,~\texttt{F}. Outputs a generic signature.
\begin{enumerate}
\item Initialize \texttt{P} to an empty list of generic parameter types.
\item Initialize \texttt{R} to an empty list of generic requirements.
\item (Remapping) First compute the depth at which non-\texttt{Self} generic parameters of \texttt{G} appear in the witness thunk signature. Let $\texttt{G}'$ be the generic signature of \texttt{N}, and let \texttt{D} be one greater than the depth of the last generic parameter of $\texttt{G}'$. If $\texttt{G}'$ has no generic parameters, set $\texttt{D}=0$. If \texttt{F} is set, increment \texttt{D} again.
\item If \texttt{F} is set, we must first introduce a generic parameter and superclass requirement for the covariant \texttt{Self} type:
\begin{enumerate}
\item (Self parameter) Add the generic parameter \ttgp{0}{0} to \texttt{P}. This generic parameter will represent the covariant \texttt{Self} type.
\item (Remap Self type) Build a substitution map \texttt{S} for $\texttt{G}'$ mapping each generic parameter \ttgp{d}{i} to \ttgp{(d+1)}{i}.
Apply this substitution map to the type of \texttt{N}, and call the result \texttt{T}.
\item (Self requirement) Add a superclass requirement \texttt{\ttgp{0}{0}:\ T} to \texttt{R}.
\item (Context generic parameters) For each generic parameter \ttgp{d}{i} in $\texttt{G}'$, add the generic parameter \ttgp{(d+1)}{i} to \texttt{P}.
\item (Context generic requirements) For each requirement of $\texttt{G}'$, apply \texttt{S} to the requirement and add the substituted requirement to \texttt{R}.
\end{enumerate}
\item If \texttt{F} is not set, the generic parameters and requirements of the conformance context carry over unchanged:
\begin{enumerate}
\item (Context generic parameters) Add all generic parameters of $\texttt{G}'$ to \texttt{P}.
\item (Context generic requirements) Add all generic requirements of $\texttt{G}'$ to \texttt{R}.
\end{enumerate}
\item (Remaining generic parameters) All non-\texttt{Self} generic parameters of \texttt{G} must have a depth of 1. For each remaining generic parameter \ttgp{1}{i}, add \ttgp{D}{i} to \texttt{P}.
\item (Trivial case) If no generic parameters have been added to \texttt{P} so far, the witness thunk generic signature is empty. Return.
\item (Remaining generic requirements) For each generic requirement of \texttt{G}, apply the requirement-to-witness-thunk substitution map (built by the previous algorithm) to the requirement, and add the substituted requirement to \texttt{R}.
\item (Return) Build a minimized generic signature from \texttt{P} and \texttt{R} and return the result.
\end{enumerate}
\end{algorithm}
\vfill
\eject
\begin{example}
If neither the conforming type nor the witness is generic, and there is no covariant \texttt{Self} parameter, the witness thunk signature is trivial.
\begin{Verbatim}
protocol Animal {
  associatedtype CommodityType: Commodity
  func produce() -> CommodityType
}

struct Chicken: Animal {
  func produce() -> Egg {...}
}
\end{Verbatim}
\begin{description}
\item[Witness thunk signature] None.
\item[Witness generic signature] None.
\item[Witness substitution map] None.
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ Animal>}
\end{quote}
\item[Requirement substitution map] The protocol requirement does not have its own generic parameter list, but it still inherits a generic signature from the protocol declaration.
\begin{quote}
\SubMapC{
\SubType{Self}{Chicken}
}{
\SubConf{Chicken:\ Animal}
}
\end{quote}
\end{description}
\end{example}
\vfill
\eject
\begin{example}
Generic conforming type.
\begin{Verbatim}
protocol Habitat {
  associatedtype AnimalType: Animal
  func adopt(_: AnimalType)
}

struct Barn<AnimalType: Animal, StallType>: Habitat {
  func adopt(_: AnimalType) {...}
}
\end{Verbatim}
\begin{description}
\item[Witness thunk signature] \vphantom{a}
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{0}{1} where \ttgp{0}{0}:\ Animal>}
\end{quote}
\item[Witness generic signature] \vphantom{a}
\begin{quote}
\texttt{<AnimalType, StallType where AnimalType:\ Animal>}
\end{quote}
\item[Witness substitution map] This is actually the identity substitution map because each generic parameter is replaced with its canonical form.
\begin{quote}
\SubMapC{
\SubType{AnimalType}{\ttgp{0}{0}}\\
\SubType{StallType}{\ttgp{0}{1}}
}{
\SubConf{AnimalType:\ Animal}
}
\end{quote}
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ Habitat>}
\end{quote}
\item[Requirement substitution map] \phantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{Barn<\ttgp{0}{0}, \ttgp{0}{1}>}
}{
\SubConf{Barn<\ttgp{0}{0}, \ttgp{0}{1}>:\ Habitat}
}
\end{quote}
\end{description}
\end{example}
\vfill
\eject
\begin{example}
Conditional conformance.
\begin{Verbatim}
struct Dictionary<Key: Hashable, Value> {...}

extension Dictionary: Equatable where Value: Equatable {
  static func ==(lhs: Self, rhs: Self) -> Bool {...}
}
\end{Verbatim}
\begin{description}
\item[Witness thunk signature] \vphantom{a}
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{0}{1} where \ttgp{0}{0}:\ Hashable, \ttgp{0}{1}:\ Equatable>}
\end{quote}
\item[Witness generic signature] \vphantom{a}
\begin{quote}
\texttt{<Key, Value where Key:\ Hashable, Value:\ Equatable>}
\end{quote}
\item[Witness substitution map] This is again the identity substitution map because each generic parameter is replaced with its canonical form.
\begin{quote}
\SubMapC{
\SubType{Key}{\ttgp{0}{0}}\\
\SubType{Value}{\ttgp{0}{1}}
}{
\SubConf{Key:\ Hashable}\\
\SubConf{Value:\ Equatable}
}
\end{quote}
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ Equatable>}
\end{quote}
\item[Requirement substitution map] \vphantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>}
}{
\SubConf{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>:\ Equatable}\\
\multicolumn{3}{|l|}{with conditional requirement \texttt{\ttgp{0}{1}:\ Equatable}}
}
\end{quote}
\end{description}
\end{example}
\vfill
\eject
\begin{example}
Witness is in a protocol extension.
\begin{Verbatim}
protocol Shape {
  var children: [any Shape] { get }
}

protocol PrimitiveShape: Shape {}

extension PrimitiveShape {
  var children: [any Shape] { return [] }
}

struct Empty: PrimitiveShape {}
\end{Verbatim}
\begin{description}
\item[Witness thunk signature] None.
\item[Witness generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ PrimitiveShape>}
\end{quote}
\item[Witness substitution map] \vphantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{Empty}
}{
\SubConf{Empty:\ PrimitiveShape}
}
\end{quote}
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ Shape>}
\end{quote}
\item[Requirement substitution map] \phantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{Empty}
}{
\SubConf{Empty:\ Shape}
}
\end{quote}
\end{description}
\end{example}
\vfill
\eject
\begin{example}
Conforming type is a generic class, and the witness is in a protocol extension.
\begin{Verbatim}
protocol Cloneable {
  init(from: Self)
  func clone() -> Self
}

extension Cloneable {
  func clone() -> Self { return Self(from: self) }
}

class Box<Contents>: Cloneable {
  var contents: Contents

  required init(from other: Self) {
    self.contents = other.contents
  }
}
\end{Verbatim}
\begin{description}
\item[Witness thunk signature] \vphantom{a}
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{1}{0} where \ttgp{0}{0}:\ Box<\ttgp{1}{0}>>}
\end{quote}
\item[Witness generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ Cloneable>}
\end{quote}
\item[Witness substitution map] \vphantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{\ttgp{0}{0}}
}{
\SubConf{Box<\ttgp{1}{0}>:\ Cloneable}
}
\end{quote}
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ Cloneable>}
\end{quote}
\item[Requirement substitution map] \phantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{\ttgp{0}{0}}
}{
\SubConf{Box<\ttgp{1}{0}>:\ Cloneable}
}
\end{quote}
\end{description}
\end{example}
\vfill
\eject
\begin{example}
Requirement is generic.
\begin{Verbatim}
protocol Q {}

protocol P {
  func f<A: Q>(_: A)
}

struct Outer<T> {
  struct Inner<U>: P {
    func f<A: Q>(_: A) {}
  }
}
\end{Verbatim}
\begin{description}
\item[Witness thunk signature] \vphantom{a}
\begin{quote}
\texttt{<\ttgp{0}{0}, \ttgp{1}{0}, \ttgp{2}{0} where \ttgp{2}{0}:\ Q>}
\end{quote}
\item[Witness generic signature] \vphantom{a}
\begin{quote}
\texttt{<T, U, A where A:\ Q>}
\end{quote}
\item[Witness substitution map] \vphantom{a}
\begin{quote}
\SubMapC{
\SubType{T}{\ttgp{0}{0}}\\
\SubType{U}{\ttgp{1}{0}}\\
\SubType{A}{\ttgp{2}{0}}
}{
\SubConf{\ttgp{2}{0}:\ Q}
}
\end{quote}
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self, A where Self:\ P, A:\ Q>}
\end{quote}
\item[Requirement substitution map] \phantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{Outer<\ttgp{0}{0}>.Inner<\ttgp{1}{0}>}\\
\SubType{A}{\ttgp{2}{0}}
}{
\SubConf{\ttgp{2}{0}:\ Q}
}
\end{quote}
\end{description}
\end{example}
\vfill
\eject
\begin{example}
Witness is more generic than the requirement.
\begin{Verbatim}
protocol P {
  associatedtype A: Equatable
  associatedtype B: Equatable
  func f(_: A, _: B)
}

struct S<A: Equatable>: P {
  typealias B = Int
  func f<T: Equatable, U: Equatable>(_: T, _: U) {}
}
\end{Verbatim}
The type witness for \texttt{A} is the generic parameter \texttt{A}, and the type witness for \texttt{B} is the concrete type \texttt{Int}. The witness \texttt{S.f()} for \texttt{P.f()} is generic, and can be called with any two types that conform to \texttt{Equatable}. Since the type witnesses for \texttt{A} and \texttt{B} are both \texttt{Equatable}, a fixed specialization of \texttt{S.f()} witnesses \texttt{P.f()}.
\begin{description}
\item[Witness thunk signature] \vphantom{a}
\begin{quote}
\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable>}
\end{quote}
\item[Witness generic signature] \vphantom{a}
\begin{quote}
\texttt{<A, T, U where A:\ Equatable, T:\ Equatable, U:\ Equatable>}
\end{quote}
\item[Witness substitution map] \vphantom{a}
\begin{quote}
\SubMapC{
\SubType{A}{\ttgp{0}{0}}\\
\SubType{T}{\ttgp{0}{0}}\\
\SubType{U}{Int}
}{
\SubConf{\ttgp{0}{0}:\ Equatable}\\
\SubConf{\ttgp{0}{0}:\ Equatable}\\
\SubConf{Int:\ Equatable}
}
\end{quote}
\item[Requirement generic signature] \vphantom{a}
\begin{quote}
\texttt{<Self where Self:\ P>}
\end{quote}
\item[Requirement substitution map] \phantom{a}
\begin{quote}
\SubMapC{
\SubType{Self}{S<\ttgp{0}{0}>}
}{
\SubConf{S<\ttgp{0}{0}>:\ P}
}
\end{quote}
\end{description}
\end{example}
\section{Source Code Reference}
TODO:
\chapter{The \texttt{@\_specialize} Attribute}
TODO:
\begin{itemize}
\item link here from other places that mention getLayoutConstraint(), etc.
\item spelling of each layout requirement
\item intersection relation
\end{itemize}
\section{Source Code Reference}
TODO:
\fi
\part{The Requirement Machine}\label{part rqm}
\ifWIP
\chapter{Introduction}
The Swift 5.6 compiler incorporates a new generic programming implementation, dubbed the ``requirement machine''. The goals were to improve the correctness, performance and maintainability of this aspect of the compiler. Internally, the requirement machine is based around confluent rewrite systems. Chapter~\ref{history} gives the historical context showing how the evolution of the language and implementation motivated work on the requirement machine. Chapters \ref{monoids}, \ref{monoidsasprotocols} and \ref{rewritesystemintro} give an overview of the mathematical theory of confluent rewrite systems that underpins the requirement machine.
Chapter~\ref{protocolsasmonoids} begins to translate this theory into practice, and Chapter~\ref{associatedtypes} builds up a series of progressively more complex examples showing how rewrite systems can be used to reason about type parameters in a generic signature. Chapter~\ref{requirementmachine} builds upon this intuitive understanding to give a formal definition of the requirement machine rewrite system. Chapter~\ref{propertymap} introduces the ``property map,'' which builds upon the rewrite system and answers queries that cannot be resolved with term rewriting alone. The property map also plays a crucial role in the implementation of superclass and concrete type requirements.
\chapter{A Little Bit of History}\label{history}
The original Swift 1.0 language supported all the modern kinds of generic requirements except for layout requirements; those did not exist because $\proto{AnyObject}$ was actually a special protocol with built-in support from the compiler, but it behaved much like the $\proto{AnyObject}$ layout constraint does today. However, Swift 1.0 imposed two major restrictions on the expressivity of protocol definitions:
\begin{itemize}
\item protocol definitions did not allow \texttt{where} clauses,
\item associated types in protocols could not state a conformance requirement which referenced the protocol containing the associated type, either directly or indirectly.
\end{itemize}
Swift 4.0 introduced \texttt{where} clauses on protocols and associated types \cite{se0142}. Swift 4.1 lifted the restriction prohibiting recursive protocol conformances \cite{se0157}. Both features are desirable, as the modern $\proto{Collection}$ protocol demonstrates in the definition of its $\namesym{SubSequence}$ associated type:
\begin{Verbatim}
protocol Collection : Sequence {
  associatedtype SubSequence : Collection
    where SubSequence.Element == Element,
          SubSequence.SubSequence == SubSequence
}
\end{Verbatim}
Intuitively, these requirements have the following interpretation:
\begin{itemize}
\item Slicing a collection can return another, possibly different type of collection, but the slice must have the same element type as the original.
\item If you slice a slice, you get the same type, since it would not be desirable if slices stacked recursively.
\end{itemize}
\index{recursive conformance requirement}
\index{conformance requirement!recursive|see{recursive conformance requirement}}
A requirement like $\namesym{SubSequence}\colon\proto{Collection}$ is called a \emph{recursive conformance requirement}, because it appears inside the definition of the $\proto{Collection}$ protocol itself. In the $\proto{Collection}$ protocol, the recursion via $\namesym{SubSequence}$ only goes one level deep, because of the second same-type requirement which ``ties it off''. That is, if $\genericparam{T}$ is constrained to $\proto{Collection}$, $\genericparam{T}.\namesym{SubSequence}$ is a distinct type parameter conforming to $\proto{Collection}$, but $\genericparam{T}.\namesym{SubSequence}.\namesym{SubSequence}$ is equivalent to $\genericparam{T}.\namesym{SubSequence}$. However, it is also permissible to use an unconstrained recursive conformance requirement to define an infinite sequence of type parameters.
The SwiftUI $\proto{View}$ protocol is one example:
\begin{Verbatim}
protocol View {
  associatedtype Body : View
  var body: Body { get }
}
\end{Verbatim}
If $\genericparam{V}$ is constrained to conform to $\proto{View}$, there is an infinite sequence of unique type parameters rooted at $\genericparam{V}$:
\begin{align*}
&\genericparam{V}\\
&\genericparam{V}.\namesym{Body}\\
&\genericparam{V}.\namesym{Body}.\namesym{Body}\\
&\genericparam{V}.\namesym{Body}.\namesym{Body}.\namesym{Body}\\
&\cdots
\end{align*}
In contrast, in the absence of recursive protocol conformances, a generic signature can only induce a \emph{finite} set of distinct type parameters. In Swift 3.1 and older, the compiler component for reasoning about type parameters had a simple design:
\begin{algorithm}[Old \texttt{ArchetypeBuilder} algorithm]\label{archetypebuilder}
The inputs are a list of generic parameters and generic requirements. The output is a directed graph.
\index{equivalence class}
A path beginning at a root node corresponds to a valid type parameter. Multiple type parameters that are equivalent via same-type requirements are different paths that reach the same node. A node corresponds to an equivalence class of type parameters. Nodes store a list of conformance, superclass and concrete type requirements that apply to each type parameter in the equivalence class. The algorithm proceeds in two phases:
\begin{enumerate}
\item (Expand) Begin by building a ``forest'' of type parameters, with generic parameters at the roots. Each generic parameter node starts out without children.
\begin{enumerate}
\item For every top-level requirement, find the subject generic parameter, record the requirement in the generic parameter's node, and if it is a conformance requirement, add new children corresponding to each associated type of the protocol.
\item Recursively record and expand requirements on any associated type nodes introduced above.
\end{enumerate}
\item (Union-find) Then, process top-level same-type requirements:
\begin{enumerate}
\item First, resolve the left- and right-hand sides, and merge the two nodes into an equivalence class.
\item Find any pairs of child nodes in the two merged nodes that have the same name, and recursively merge those child nodes as well.
\end{enumerate}
\end{enumerate}
\end{algorithm}
% add a graph
A side effect of both the recursive expansion and union-find steps is the gathering of a list of requirements in each equivalence class. These gathered requirements were used to answer queries such as ``does this type parameter conform to a protocol''. This algorithm survived the introduction of protocol \texttt{where} clauses in Swift 4.0 with some relatively minor changes; namely, the processing of same-type requirements became somewhat more complex, since they could be introduced at any level in the graph. When recursive conformances were introduced in Swift 4.1, the \texttt{ArchetypeBuilder} underwent a major overhaul, and was renamed to \texttt{GenericSignatureBuilder}. Since the equivalence class graph was no longer necessarily finite, the biggest change was the move to a lazy evaluation approach---traversing a hitherto-unvisited part of the equivalence class graph would now lazily expand conformance requirements as needed. Unfortunately the limitations of this lazy expansion approach soon made themselves apparent.
The equivalence class graph could be mutated both as a consequence of adding the initial set of requirements in a generic signature, and also by lazy expansion performed while answering queries. The highly mutable nature of the implementation made it difficult to understand and debug. It also became a performance problem, because ``expanding too much'' had to be balanced against ``not expanding enough.'' For example, any generic signature referencing one of the more complicated protocol towers, such as $\proto{RangeReplaceableCollection}$ or $\proto{FixedWidthInteger}$ from the standard library, would re-build the entire sub-graph of all nested associated types of each protocol from scratch. On the other hand, skipping expansion of recursive nested types could lead to same-type requirements being missed, which would result in the incorrect formation of multiple distinct equivalence classes that should actually be a single class.

I later realized the lazy expansion strategy suffers from an even more fundamental problem; as you will see in Chapter~\ref{monoidsasprotocols}, the full generality of the generics system makes generic signature queries undecidable. The design of the \texttt{GenericSignatureBuilder} was not sufficiently principled to determine if the input was too complex for its internal model, either crashing or silently producing incorrect results if this occurred. The development of the requirement machine was motivated by the desire to find an immutable, closed-form representation of an entire, potentially infinite, type parameter graph. While the undecidability of the problem means this is not possible in the general case, I believe the formulation in terms of a confluent rewrite system should handle any reasonable generic signatures that appear in practice.
\chapter{Monoids}\label{monoids}
Over the next two chapters, I'm going to take a rather circuitous route through the field of abstract algebra. It's possible to define finitely-presented monoids without talking about quotients of free monoids. For that matter, rewrite systems can be introduced without using the word ``monoid'' at all. But the various definitions you'll see along the way will come in handy later, and you can always skim this section and come back to it if needed.
\index{monoid}
\index{binary operation}
\index{identity element}
\index{associative operation}
\begin{definition}
A \emph{monoid} $(M,\, \otimes,\, \varepsilon)$ is a set $M$ together with a binary operation $\otimes$ and a unique\footnote{Just for fun, you can try showing that the uniqueness of the identity doesn't have to be stated explicitly as an axiom; that is, if $\varepsilon$ and $\epsilon$ both satisfy the conditions of an identity element, it necessarily follows that $\varepsilon=\epsilon$.} identity element $\varepsilon$, where the binary operation satisfies the following pair of axioms:
\begin{itemize}
% FIXME closure
\item (Associativity) For $x, y, z \in M$, $x\otimes(y\otimes z)=(x\otimes y)\otimes z$.
\item (Identity) For $x\in M$, $x\otimes \varepsilon=\varepsilon\otimes x=x$.
\end{itemize}
\end{definition}
% FIXME mention 'group'
Often it's convenient to omit the binary operation symbol when it's juxtaposed between two single-letter variable names, writing, for example, $xy$ in place of $x\otimes y$.
I will also use exponent notation $a^n$ to denote the product of $n$ copies of $a$: \[a^n=\underbrace{a\otimes\cdots\otimes a}_{\textrm{$n$ times}}\] \begin{example} Once you know what to look for, you'll start seeing monoids everywhere. Some common examples of monoids: \begin{enumerate} \index{natural numbers} \item Natural numbers under addition: $(\mathbb{N},\, +,\, 0)$. \item Integers modulo 4 under multiplication: $(\mathbb{Z}/4\mathbb{Z},\, \times,\, 1)$. \index{strings} \item Strings over the alphabet $\{a,b,c\}$ with string concatenation as the binary operation, with the empty string as the identity element: $(\{a, b, c\}^*,\, \otimes,\, \varepsilon)$. \item All functions $S\rightarrow S$ for some set $S$, with function composition as the binary operation, and the identity function $\mathrm{id}\colon S\rightarrow S$ as the identity element: $(S^S,\, \circ,\, \mathrm{id})$. \end{enumerate} \end{example} \index{commutativity} The last two examples show that the binary operation need not be commutative; that is, in general $x\otimes y \neq y\otimes x$. Examples 1 and 3 are two instances of a particularly important class of monoid. \index{free monoid} \begin{definition} A \emph{free monoid} over some set $A$, denoted $A^*$, is the set of all strings with elements from $A$, together with string concatenation as the binary operation, and the empty string as the identity. Since the empty string is, well, empty, I'm going to write it as $\varepsilon$ (hopefully $\varepsilon$ itself is not an element of $A$, though). The set $A$ is the \emph{alphabet} of $A^*$; another bit of jargon you will see is that $A$ is the \emph{generating set} of $A^*$. \end{definition} The third example is of course the free monoid over the 3-symbol alphabet $\{a,\,b,\,c\}$. Two elements of this monoid are $abab$ and $cbca$; their product is the concatenation: \[abab\otimes cbca=ababcbca.\] What about the first example of the natural numbers under addition? You should be able to convince yourself that this is the free monoid generated by the singleton set $\{1\}$, or using the \emph{Kleene star} notation, $\{1\}^*$. The identity element, or ``empty string of 1's,'' is denoted $0$. This follows from the fact that any natural number can be written as a (possibly empty) sum of $1$'s; e.g., $3=1+1+1$. \index{finitely-generated free monoid} The set $A$ generating a free monoid may be finite or infinite. If the generating set $A$ is finite, you can say that $A^*$ is \emph{finitely generated}. The set of \emph{elements} of $A^*$ on the other hand is never finite unless $A$ is empty, in which case $A^*$ is the trivial monoid consisting of a single identity element, $\{\varepsilon\}$. \section{Finitely-Presented Monoids} Next, let's generalize finitely-generated free monoids to get what is called a \emph{finitely-presented monoid}. Informally, you can think of a finitely-presented monoid as also being a set of strings over an alphabet under concatenation, except that there are one or more ``equations'' for rewriting substrings, with each element possibly having multiple different equivalent spellings. First, I'm going to formalize what is meant by ``equations.'' \index{relation} \begin{definition}\label{relationdef} If $S$ is some set, then $R$ is a \emph{relation} over $S$ if $R\subseteq S\times S$. That is, a relation is a set of ordered pairs with elements from $S$. For some $s, t\in S$, $R$ \emph{relates} $s$ and $t$, or more concisely $s\mathrel{R}t$, if $(s, t)\in R$. 
An alternative definition is that a relation is a function $R\colon S\times S \rightarrow \{0, 1\}$, where $s\mathrel{R}t$ if $R(s, t)=1$. This definition should feel familiar to programmers since most languages model relations as operators or functions returning a boolean value. Some relations have special properties:
\begin{itemize}
\index{reflexive relation}
\item The relation is \emph{reflexive} if for all $s\in S$, $s\mathrel{R}s$.
\index{transitive relation}
\item The relation is \emph{transitive} if for all $s, t, u\in S$, $s\mathrel{R}t$ and $t\mathrel{R}u$ implies that $s\mathrel{R}u$.
\index{symmetric relation}
\item The relation is \emph{symmetric} if for all $s, t\in S$, $s\mathrel{R}t$ if and only if $t\mathrel{R}s$.
\index{equivalence relation}
\index{equivalence class}
\item The relation is an \emph{equivalence relation} if it is reflexive, transitive, and symmetric.
\item If $R$ is an equivalence relation over $S$ and $s\in S$, then the \emph{equivalence class} of $s$ is the set of all elements $t\in S$ such that $s\mathrel{R}t$. Every element of $S$ belongs to exactly one equivalence class.
\end{itemize}
\end{definition}
Some example relations and the properties they satisfy:
\begin{itemize}
\item The ``less than or equal to'' relation on natural numbers is reflexive, because $n\le n$ for all $n\in\mathbb{N}$. However, ``strictly less than'' is not reflexive, because $n \nless n$.
\item Both the ``less than or equal to'' and ``strictly less than'' relations on natural numbers are transitive; if $a\le b\le c$, then $a\le c$.
\item Neither ``less than or equal to'' nor ``strictly less than'' is symmetric; $4 < 5$ certainly does not imply $5 < 4$.
\index{power set}
\item Given some set $S$, the relation $R$ over the power set\footnote{The set of all subsets of $S$, sometimes denoted $2^S$ or $\mathcal{P}(S)$.} of $S$ defined by ``two sets have a non-empty intersection'' is symmetric and reflexive (except at the empty set, which does not intersect itself), but not transitive. To see why, let $S=\{x, y, z\}$. The sets $\{x\}$ and $\{x, y\}$ are related by $R$; so are the sets $\{x, y\}$ and $\{y\}$. However, $\{x\}$ and $\{y\}$ are not related since they do not intersect.
\item The relation over $\mathbb{N}$ given by $x\mathrel{R}y$ iff $x\equiv y\pmod 5$ is an equivalence relation.\footnote{You can try showing that this is true. Hint: observe that $x\mathrel{R}y$ iff $x-y=5n$ for some $n\in\mathbb{Z}$, and then prove that $R$ is symmetric, reflexive, and transitive.}
\end{itemize}
Now, let's focus on relations over monoids specifically, rather than arbitrary sets.
\index{translation-invariant relation}
\begin{definition}\label{transinv}
A relation $R$ over the free monoid $A^*$ is \emph{translation-invariant} if for all $x$, $y$, $s$, $t\in A^*$,
\[x\mathrel{R}y\qquad\hbox{implies}\qquad (s\otimes x\otimes t)\mathrel{R}(s\otimes y\otimes t).\]
\end{definition}
Translation invariance generalizes an intuitive geometric concept---if you take two points on the real number line $x,y\in\mathbb{R}$, then moving both points by an equal amount preserves inequalities; for example, $x<y$ implies that $x+z<y+z$ for any $z\in\mathbb{R}$.

Listing \ref{freetwoproto} shows a Swift protocol representation of the free monoid over two generators, together with a pair of functions that perform some compile-time monoid arithmetic.
\begin{listing}\captionabove{Free monoid with two generators}\label{freetwoproto}
\begin{Verbatim}
protocol P {
  associatedtype A : P
  associatedtype B : P

  var a: A { get }
  var b: B { get }
  var id: Self { get }
}

func multiplyByABA<X: P>(_ x: X) -> X.A.B.A {
  return x.a.b.a
}

func multiplyByBB<X: P>(_ x: X) -> X.B.B {
  return x.b.b
}
\end{Verbatim}
\end{listing}
Consider the generic signature $\gensig{\genericparam{T}}{\genericparam{T}\colon\proto{P}}$, with $\proto{P}$ as shown in Listing \ref{freetwoproto}. The two associated types $\namesym{A}$ and $\namesym{B}$ recursively conform to $\proto{P}$, which generates infinitely many type parameters.
These type parameters all begin with $\genericparam{T}$ followed by an arbitrary sequence of $\namesym{A}$'s and $\namesym{B}$'s:
\begin{quote}
\noindent$\genericparam{T}$\\
$\genericparam{T}.\namesym{A}$\\
$\genericparam{T}.\namesym{B}$\\
$\genericparam{T}.\namesym{A}.\namesym{A}$\\
$\genericparam{T}.\namesym{A}.\namesym{B}$\\
$\genericparam{T}.\namesym{B}.\namesym{A}$\\
$\genericparam{T}.\namesym{B}.\namesym{B}$\\
$\ldots$
\end{quote}
You might (correctly) guess that this definition of $\proto{P}$ is in fact a representation of the free monoid over two generators $\{a, b\}$ in the Swift language. Compositions of the property accessors \texttt{.a}, \texttt{.b} and \texttt{.id} are actually performing the monoid operation $\otimes$ \emph{at compile time}, with type parameters as monoid elements. Listing \ref{freetwoproto} also shows a pair of function definitions,
\begin{itemize}
\item \texttt{multiplyByBB(\_:)}, and
\item \texttt{multiplyByABA(\_:)}.
\end{itemize}
These functions implement ``multiplication'' by $bb$ and $aba$, respectively. Say that \texttt{t} is a value of type $\genericparam{T}$, where $\genericparam{T}$ conforms to $\proto{P}$. If you first apply \texttt{multiplyByABA(\_:)} to \texttt{t}, and then apply \texttt{multiplyByBB(\_:)} to the result, you will have ``multiplied'' the type $\genericparam{T}$ by $ababb$ on the right:
\begin{itemize}
\item First, substituting $\genericparam{X}:=\genericparam{T}$ into the type of \texttt{multiplyByABA(\_:)} gives
\[\genericparam{T}.\namesym{A}.\namesym{B}.\namesym{A}.\]
\item Then, substituting $\genericparam{X} := \genericparam{T}.\namesym{A}.\namesym{B}.\namesym{A}$ into the type of \texttt{multiplyByBB(\_:)} gives the final result
\[\genericparam{T}.\namesym{A}.\namesym{B}.\namesym{A}.\namesym{B}.\namesym{B}.\]
\end{itemize}
In a free monoid, each term denotes a unique element; in the world of Swift that means each path of $\namesym{A}$'s and $\namesym{B}$'s is a unique type parameter. This can be formalized as follows:
\begin{algorithm}[Constructing a protocol from a free monoid]\label{freemonoidproto}
Let $A^*$ be the free monoid over the alphabet $\{a_1,a_2,\ldots,a_n\}$. A protocol $\proto{P}$ can be constructed from $A^*$ as follows:
\begin{enumerate}
\item First, begin with an empty protocol definition:
\begin{Verbatim}
protocol P {}
\end{Verbatim}
\item Now, for each element $a_i\in A$, declare an associated type conforming to $\proto{P}$ within the protocol's body:
\begin{Verbatim}
associatedtype Ai : P
\end{Verbatim}
\end{enumerate}
\end{algorithm}
\index{lifting map}
\index{lowering map}
\begin{definition}[Lowering and lifting maps]\label{liftingloweringmaps}
Let $\proto{P}$ be a protocol constructed from a free monoid $A^*$ by the above algorithm, and write $\mathsf{Type}$ for the set of all type parameters of $\proto{P}$. The elements of $\mathsf{Type}$ all begin with the protocol $\genericparam{Self}$ type, followed by zero or more associated type names, joined with ``\texttt{.}''. Define a pair of maps, called the \emph{lifting map} and the \emph{lowering map}.
The lowering map sends type parameters to terms, and the lifting map sends terms to type parameters:
\begin{align*}
\Lambda_{\proto{P}}&\colon \mathsf{Type}\rightarrow A^*\\
\Lambda^{-1}_{\proto{P}}&\colon A^*\rightarrow\mathsf{Type}
\end{align*}
\index{protocol $\genericparam{Self}$ type}
\begin{itemize}
\item The lowering map $\Lambda_{\proto{P}}$ drops the $\genericparam{Self}$ parameter, and maps each associated type name $\namesym{Ai}$ to the corresponding element $a_i\in A$, concatenating all elements to form the final result.
\item The lifting map $\Lambda^{-1}_{\proto{P}}$ operates in reverse; given an arbitrary term in $A^*$, it replaces each element $a_i\in A$ with the associated type name $\namesym{Ai}$, joins the associated type names with ``\texttt{.}'' to form Swift syntax for a nested type, and finally prepends the protocol $\genericparam{Self}$ type to the result.
\end{itemize}
Note that applying the lifting map to the identity element $\varepsilon\in A^*$ produces the protocol $\genericparam{Self}$ type.
\end{definition}
\begin{lemma}
The lowering and lifting maps have a couple of interesting properties:
\begin{itemize}
\item They are inverses of each other; that is, for all $x\in A^*$ and $T\in\mathsf{Type}$,
\begin{align*}
\Lambda_{\proto{P}}(\Lambda_{\proto{P}}^{-1}(x))&=x,\\
\Lambda_{\proto{P}}^{-1}(\Lambda_{\proto{P}}(T))&=T.
\end{align*}
\item If $T$, $U\in\mathsf{Type}$, define $T[\genericparam{Self}:=U]$ to be the type parameter obtained by substituting the protocol $\genericparam{Self}$ type of $T$ with $U$. This type satisfies the following identity:
\[\Lambda_{\proto{P}}(T[\genericparam{Self}:=U]) = \Lambda_{\proto{P}}(U)\otimes \Lambda_{\proto{P}}(T).\]
That is, the lowering and lifting maps are compatible with the monoid operation in $A^*$.
\end{itemize}
\end{lemma}
The construction performed by Algorithm~\ref{freemonoidproto} can be generalized to finitely-presented monoids. The overall idea is the same as with free monoids, except for the addition of relations, which become same-type requirements in the Swift world. Listing \ref{bicyclicproto} shows a Swift protocol representation of the bicyclic monoid from Example \ref{bicyclic}, together with a \texttt{multiplyByBA(\_:)} function that performs some compile-time monoid arithmetic.
\begin{listing}\captionabove{Bicyclic monoid}\label{bicyclicproto}
\begin{Verbatim}
protocol Bicyclic {
  associatedtype A : Bicyclic
  associatedtype B : Bicyclic
    where A.B == Self

  var a: A { get }
  var b: B { get }
  var id: Self { get }
}

func multiplyByBA<X: Bicyclic>(_ x: X) -> X.B.A {
  return x.b.a
}
\end{Verbatim}
\end{listing}
Unlike our free monoid, the bicyclic monoid's presentation has a relation, so some elements can be spelled in multiple ways; for example, $aabba=a$. What does this identity mean in Swift? Well, an equivalence of monoid elements becomes an equivalence of type parameters. You can write down some type parameters in the signature $\gensig{\genericparam{T}}{\genericparam{T}\colon\proto{Bicyclic}}$, and then pass a pair of values to the \texttt{same(\_:\_:)} function, which will only type check if both values have equivalent types. In Listing \ref{bicycliccheck}, the first call to \texttt{same(\_:\_:)} will type check, since $aabba=a$. The second call will not type check, since $ab\ne ba$.
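To spell out the first equivalence: the same-type requirement $\namesym{A}.\namesym{B}==\genericparam{Self}$ lowers to the relation $ab=\varepsilon$, which can be applied twice to cancel adjacent $ab$ pairs:
\[a\underline{ab}ba\rightarrow \underline{ab}a\rightarrow a.\]
The second call fails because no such cancellation applies to $ba$; the element $ba$ is distinct from $\varepsilon=ab$, so the corresponding type parameters are not equivalent.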
\begin{listing}\captionabove{Checking equivalences in the bicyclic monoid}\label{bicycliccheck}
\begin{Verbatim}
func same<X>(_: X, _: X) {}

func bicyclicTest<X: Bicyclic>(_ x: X) {
  let s: X.A.A.B.B.A = x.a.a.b.b.a
  let t: X.A = x.a
  same(s, t) // this is OK

  let u: X.A.B = x.a.b
  let v: X.B.A = x.b.a
  same(u, v) // type check failure
}
\end{Verbatim}
\end{listing}
\index{monoid}
You can construct similar code examples for any finitely-presented monoid; there is nothing special about the bicyclic monoid here.
\begin{algorithm}[Constructing a protocol from a finitely-presented monoid]\label{protocolmonoidalgo}
Let $\langle A;R\rangle$ be a finitely-presented monoid. A protocol $\proto{P}$ can be constructed from $\langle A;R\rangle$ as follows:
\begin{enumerate}
\item First, build $\proto{P}$ from the free monoid $A^*$ using Algorithm~\ref{freemonoidproto}.
\item Second, introduce an empty \texttt{where} clause in the declaration of $\proto{P}$.
\item Finally, for every equation $s=t$ of $R$, add a same-type requirement to this \texttt{where} clause, using the lifting map to obtain a pair of type parameters from $s$ and $t$:
\[\Lambda_{\proto{P}}^{-1}(s)==\Lambda_{\proto{P}}^{-1}(t)\]
\end{enumerate}
\end{algorithm}
\begin{theorem}\label{protocolmonoidthm}
Let $\langle A;R\rangle$ be a finitely-presented monoid. The protocol~\proto{P} constructed by Algorithm~\ref{protocolmonoidalgo} has the property that if $x$, $y \in A^*$ are equal as elements of $\langle A;R\rangle$, then applying the \texttt{areSameTypeParametersInContext()} generic signature query to the type parameters $\Lambda_{\proto{P}}^{-1}(x)$ and $\Lambda_{\proto{P}}^{-1}(y)$ should return true. The other direction also holds, showing that type parameter equality is exactly the monoid congruence $\sim_R$ on $A^*$.
\end{theorem}
\begin{proof}
If $x$, $y\in A^*$ are equal as elements of $\langle A;R\rangle$, it means that they satisfy $x\sim_R y$, where $\sim_R$ is the monoid congruence generated by $R$. This means that $y$ can be obtained from $x$ by applying a series of equations from $R$, replacing subterms at different positions.
\index{derivation path}
This can be formalized by writing down a \emph{derivation path}, which is a sequence of pairs $(s_i \Rightarrow t_i, k_i)$ where $s_i=t_i$ or $t_i=s_i$ is an equation of $R$ (depending on the direction in which the equation is applied), and $k_i\in\mathbb{N}$ indicates the position in the intermediate term at which $s_i$ is replaced with $t_i$. Derivation paths satisfy a validity property. Let $x_i\in A^*$ be the $i$th intermediate term, obtained by applying the first $i$ components of the derivation path to the initial term $x$. Notice that $x_0=x$ since no components have been applied yet, and if $n$ is the total number of components, then $x_n=y$ is the final term. Also, if the $i$th derivation path component is $(s_i\Rightarrow t_i, k_i)$, the subterm of $x_i$ beginning at position $k_i$ is equal to $s_i$, and the subterm of $x_{i+1}$ beginning at position $k_i$ is equal to $t_i$. Each associated type of $\proto{P}$ conforms to $\proto{P}$ itself, which implies that every nested type parameter also conforms to $\proto{P}$. So the $k_i$-length prefix of $\Lambda_{\proto{P}}^{-1}(x_i)$ also conforms to $\proto{P}$. By construction, one of the equations $s_i=t_i$ or $t_i=s_i$ corresponds to a same-type requirement of $\proto{P}$.
By the validity property, this same-type requirement can be applied to the type parameter $\Lambda_{\proto{P}}^{-1}(x_i)$ at position $k_i$ to obtain $\Lambda_{\proto{P}}^{-1}(x_{i+1})$. The derivation path that witnesses an equivalence between $x$ and $y$ via the intermediate terms $x_i$ can be viewed as a proof of the equivalence of the type parameters $\Lambda_{\proto{P}}^{-1}(x)$ and $\Lambda_{\proto{P}}^{-1}(y)$ via a series of same-type requirements applied to the intermediate type parameters $\Lambda_{\proto{P}}^{-1}(x_i)$. The other direction can be shown to hold via a similar argument.
\end{proof}
\section{Examples}
This section re-states examples of finitely-presented monoids from the previous chapter as Swift protocol definitions using Algorithm~\ref{protocolmonoidalgo}. Feel free to skip ahead if you're not interested.
\index{integers modulo $n$}
\begin{example}
% FIXME: re-state monoid presentation here
The monoid of integers modulo 5 under addition:
\begin{Verbatim}
protocol Z5 {
  associatedtype A : Z5
    where A.A.A.A.A == Self
}
\end{Verbatim}
\end{example}
\index{free commutative monoid}
\begin{example}
The free commutative monoid with two generators:
\begin{Verbatim}
protocol F2 {
  associatedtype A : F2
  associatedtype B : F2
    where A.B == B.A
}
\end{Verbatim}
\end{example}
\index{group of integers}
\begin{example}
The group of integers under addition:
\begin{Verbatim}
protocol Z {
  associatedtype A : Z
    where A.B == Self
  associatedtype B : Z
    where B.A == Self
}
\end{Verbatim}
\end{example}
\index{infinite dihedral group}
\begin{example}
The infinite dihedral group:
\begin{Verbatim}
protocol DInf {
  associatedtype A : DInf
    where A.A == Self
  associatedtype B : DInf
    where B.B == Self
}
\end{Verbatim}
\end{example}
\index{binary icosahedral group}
\begin{example}
The binary icosahedral group:
\begin{Verbatim}
protocol TwoI {
  associatedtype S : TwoI
    where S.S.S == Self
  associatedtype T : TwoI
    where T.T.T.T.T == Self,
          S.T.S.T == Self
}
\end{Verbatim}
\end{example}
\section{Undecidability}
Algorithm~\ref{protocolmonoidalgo} allows you to write down a ``well-formed'' protocol definition isomorphic to an arbitrary finitely-presented monoid, and Theorem~\ref{protocolmonoidthm} shows this construction can express computations in the monoid at compile-time. Note that I was very careful with the use of ``should'' in the statement of Theorem~\ref{protocolmonoidthm}. This is because it describes the operation of a ``platonic ideal'' Swift compiler. As it turns out, this is unimplementable in the real world, because Swift generics as specified are a little bit \emph{too} expressive.
\index{decidability}
\index{word problem}
The \emph{word problem on finitely-presented monoids} asks if two strings in the free monoid $A^*$ are equivalent as elements of a finitely-presented monoid $\langle A; R\rangle$. All examples of monoids I've shown so far have decidable word problems. However, finitely-presented monoids with undecidable word problems do exist, meaning there is no computable function that solves the word problem in the general case.
\begin{theorem}[From \cite{undecidablegroup}]\label{undecidablemonoid}
The monoid presented by the following set of generators and relations has an undecidable word problem:
\[\langle a, b, c, d, e;\;ac=ca;\;bc=cb;\;bd=db;\;ce=eca;\;de=edb;\;cca=ccae\rangle\]
\end{theorem}
Applying Algorithm~\ref{protocolmonoidalgo} to the above presentation produces the Swift program in Listing \ref{undecidableproto}.
The requirement machine must be able to solve the word problem in any protocol definition it accepts; therefore, it must reject this protocol definition as invalid. The best we can do is carve out a useful sub-class of protocols where the word problem is decidable, and reject all other protocol definitions. This is the focus of the next chapter.
\begin{listing}\captionabove{Protocol with undecidable word problem}\label{undecidableproto}
\begin{Verbatim}
protocol Impossible {
  associatedtype A : Impossible
  associatedtype B : Impossible
  associatedtype C : Impossible
  associatedtype D : Impossible
  associatedtype E : Impossible
    where A.C == C.A, A.D == D.A,
          B.C == C.B, B.D == D.B,
          C.E == E.C.A, D.E == E.D.B,
          C.C.A == C.C.A.E
}
\end{Verbatim}
\end{listing}
\chapter{Rewrite Systems}\label{rewritesystemintro}
This section presents an informal introduction to the theory of \emph{rewrite systems}. A very thorough treatment of this subject can be found in \cite{andallthat}; that book covers rewrite systems that manipulate more general ``tree-like'' algebraic terms, of which strings are just a special case. The requirement machine only needs \emph{string} rewriting, which greatly simplifies the theory, so I will re-state some of the key ideas in a self-contained manner below. To motivate some formal definitions, let's look at another finitely-presented monoid:
\[\langle a,b,c;\; cc=c,\; a=ba,\; ca=a\rangle\]
The intuitive mental model is that these equations are bi-directional; the equation $cc=c$ could just as easily have been written as $c=cc$. The bi-directional nature of these equations will become apparent in the proof that $acca=bcaa$. First, let's list and number the relations:
\begin{align}
cc&\Longleftrightarrow c\tag{1}\\
a&\Longleftrightarrow ba\tag{2}\\
ca&\Longleftrightarrow a\tag{3}
\end{align}
Starting with the term $acca$, you can replace the $cc$ with $c$ by applying equation (1) in the $\Rightarrow$ direction, which leaves $aca$. Then you can continue applying equations as follows:
\begin{align}
a\underline{cc}a&\rightarrow a\underline{c}a\tag{Eq 1, $\Rightarrow$}\\
\underline{a}ca&\rightarrow \underline{ba}ca\tag{Eq 2, $\Rightarrow$}\\
b\underline{a}ca&\rightarrow b\underline{ca}ca\tag{Eq 3, $\Leftarrow$}\\
bca\underline{ca}&\rightarrow bca\underline{a}\tag{Eq 3, $\Rightarrow$}
\end{align}
It so happens that the monoid presented above has a decidable word problem. Despite that, from looking at the presentation of the monoid, it is not immediately apparent that $acca=bcaa$, and proving this fact required applying equations in both directions, making the intermediate terms ``larger'' and ``smaller'' at different steps. This doesn't seem to produce a viable evaluation strategy. So clearly some additional structure is needed, even for this simple example.
\index{rewrite rules}
\index{irreducible term}
\index{reducing a term}
Instead of looking for a way to transform one term into another by applying equations in both directions, you can ``orient'' the relations, writing the larger side on the left and only ever rewriting the left-hand side to the right-hand side. This turns the equations into unidirectional rewrite rules:
\begin{align}
cc&\Longrightarrow c\tag{1}\\
ba&\Longrightarrow a\tag{2}\\
ca&\Longrightarrow a\tag{3}
\end{align}
This guarantees that at each step, the original term can only become shorter.
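To make this concrete, here is a minimal Swift sketch (hypothetical helper code, not part of the compiler) that mechanically applies the three rules above, always replacing a left-hand side with the corresponding smaller right-hand side:
\begin{Verbatim}
import Foundation

struct Rule {
  let lhs: String
  let rhs: String
}

// The three oriented rules: cc => c, ba => a, ca => a.
let rules = [
  Rule(lhs: "cc", rhs: "c"),
  Rule(lhs: "ba", rhs: "a"),
  Rule(lhs: "ca", rhs: "a"),
]

// Repeatedly applies the first matching rule until no
// left-hand side occurs as a subterm of the term.
func reduce(_ term: String) -> String {
  var t = term
  var changed = true
  while changed {
    changed = false
    for rule in rules {
      if let range = t.range(of: rule.lhs) {
        t.replaceSubrange(range, with: rule.rhs)
        changed = true
        break
      }
    }
  }
  return t
}

print(reduce("acca")) // "aa"
print(reduce("bcaa")) // "aa"
\end{Verbatim}
Both terms from the earlier proof end up at the same shorter term, which is the strategy formalized next.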
If a term can be transformed into another term by applying zero or more unidirectional rewrite rules, the original term is said to \emph{reduce} to the other term. A term which cannot be reduced further is said to be \emph{irreducible}. Now you can reformulate the word problem slightly. Instead of attempting to transform an arbitrary term into another, you reduce both terms as much as possible. If both terms have the same irreducible form, they must be equivalent. Let's attempt this new strategy with our original inputs, $acca$ and $bcaa$. First, $acca$ reduces to $aa$:
\begin{align}
a\underline{cc}a&\rightarrow a\underline{c}a\tag{Rule 1}\\
a\underline{ca}&\rightarrow a\underline{a}\tag{Rule 3}
\end{align}
At this point, the term $aa$ is irreducible. Now, $bcaa$ also reduces to $aa$:
\begin{align}
b\underline{ca}a&\rightarrow b\underline{a}a\tag{Rule 3}\\
\underline{ba}a&\rightarrow \underline{a}a\tag{Rule 2}
\end{align}
This shows that both $acca$ and $bcaa$ reduce to $aa$; therefore, $acca=bcaa$. In fact, this strategy completely solves the word problem, in this specific monoid at least. It won't work in many other interesting cases, as you will see below, but it forms the basis for what comes next.

Now, I will formalize what is meant by rewrite rules producing a ``shorter'' term at every step. The following definitions are standard.
\index{partial order}
\index{linear order}
\begin{definition}\label{partialorderdef} A \emph{partial order} on a set $S$ is a relation $<$ satisfying two properties:
\begin{itemize}
\item (Transitivity) If $x<y$ and $y<z$, then $x<z$.
\item (Irreflexivity) There is no $x\in S$ with $x<x$.
\end{itemize}
A \emph{linear order} is a partial order where any two elements are comparable; that is, for all $x$, $y\in S$, exactly one of $x<y$, $x=y$ or $y<x$ holds.
\end{definition}
\index{well-founded order}
\begin{definition} A partial order on $S$ is \emph{well-founded} if there is no infinite descending chain of elements $x_i\in S$: \[x_1>x_2>x_3>\ldots>x_n>\ldots\]
\end{definition}
Using a well-founded order guarantees that applying a reduction relation until fixed point will always terminate, since a non-terminating reduction sequence would witness an infinite descending chain, contradicting the assumption of well-foundedness. \index{translation-invariant relation} A partial order used for reduction must also be translation-invariant (Definition~\ref{transinv}). Translation invariance means that if you have a rule like $ca\Rightarrow a$, then not only is it true that $ca>a$, but also replacing $ca$ with $a$ anywhere in the \emph{middle} of a term produces a smaller term; for example $bcab$ can be reduced to $bab$, because $ca>a$ implies that $bcab>bab$.
\begin{definition}A \emph{reduction order} on the free monoid $A^*$ is a well-founded and translation-invariant partial order.
\end{definition}
Next, I will define the specific reduction order used from here on out. This generalizes the canonical type order from Section~\ref{canonicaltypes}.
\index{shortlex order}
\index{translation-invariant relation}
\index{linear order}
\index{well-founded order}
\begin{definition}(Shortlex order)\label{shortlex} Suppose $A^*$ is a free monoid where the generating set $A$ is equipped with a partial order $<$. This partial order can be extended to the \emph{shortlex order} on $A^*$, also written as $<$. For $x$, $y\in A^*$, $x<y$ if either:
\begin{itemize}
\item $x$ is shorter than $y$; or,
\item $x$ and $y$ have the same length, and at the first position where the two terms differ, the generator appearing in $x$ is smaller than the generator appearing in $y$.
\end{itemize}
\end{definition}
If the order on the generating set $A$ is linear and well-founded, the shortlex order on $A^*$ is linear, well-founded and translation-invariant; that is, it is a reduction order. Note that comparing lengths first is essential. The plain lexicographic order is \emph{not} well-founded; for example, if $a<b$, it admits an infinite descending chain: \[b>ab>aab>aaab>aaaab>\ldots\]
Finally, I can formalize the notion of ``reducing'' terms by making them ``smaller''.
\index{relation}
\index{reduction relation}
\begin{definition}[Reduction relation]\label{transinvdef} If $A^*$ is equipped with a reduction order $<$, then a relation $\rightarrow$ is a \emph{reduction relation} with respect to $<$ if $x\rightarrow y$ implies that $y<x$; in other words, if $\rightarrow$ is contained in $>$, the inverse relation of $<$.
\end{definition}
\index{rewrite system} As you saw in the previous example, a reduction relation for a finitely-presented monoid can be constructed by orienting the equations from the presentation with respect to some reduction order, a process which converts the equations into rewrite rules. Such a set of rewrite rules is called a \emph{rewrite system}. There is a simple algorithm for reducing a term into an irreducible form:
\begin{algorithm}[Reducing a term]\label{reducingaterm} Let $t$ be a term in some rewrite system $R$.
\begin{enumerate}
\item Initialize a boolean flag to false.
\item If there is some rewrite rule $x\Rightarrow y$ in $R$ such that $t$ contains $x$ as a subterm,
\begin{itemize}
\item write $t=u\otimes x\otimes v$ for some prefix $u$ and suffix $v$,
\item set $t$ to $u\otimes y\otimes v$, replacing the occurrence of $x$ with $y$,
\item set the flag to true.
\end{itemize}
\item If the flag is now true, go back to Step 1.
\item Otherwise, the algorithm returns with the final value of $t$.
\end{enumerate}
\end{algorithm}

\section{Confluence and Completion}\label{confluenceandcompletion}

Applying Algorithm \ref{reducingaterm} to the oriented relations of a finitely-presented monoid lets us solve the word problem, at least in one case. Of course this can't solve the word problem in all cases, since the word problem is undecidable. So where does this approach go wrong? The basic problem is right there in the name---we're using a reduction \emph{relation}, not a ``reduction function,'' so for a given term $x\in A^*$, there might be two (or more) distinct terms $y$, $z\in A^*$ such that both $x\rightarrow y$ and $x\rightarrow z$ can apply. This corresponds to non-determinism in Step~2 of Algorithm~\ref{reducingaterm}, where a choice has to be made between multiple rules which could all apply at a single step. To see this phenomenon in action, consider the following finitely-presented monoid: \[\langle a, b, c, d;\; ab=a,\; bc=b,\; d=b\rangle\] I'm going to use the order $a<b<c<d$ on the generators, extended to the shortlex order on terms (Definition~\ref{shortlex}). Orienting the three equations with respect to this order produces the rewrite rules:
\begin{align}
ab&\Longrightarrow a\tag{1}\\
bc&\Longrightarrow b\tag{2}\\
d&\Longrightarrow b\tag{3}
\end{align}
Now consider the term $abc$. Rule 1 applies to the prefix $ab$, reducing the term to $ac$, which is irreducible. But Rule 2 also applies, to the suffix $bc$, reducing the term to $ab$; a second application of Rule 1 then leaves the irreducible term $a$. So depending on the order in which the rules are applied, $abc$ reduces to two \emph{distinct} irreducible terms, $ac$ and $a$. These terms are equal as elements of the monoid, but reduction alone cannot establish this fact.

\index{confluence}
\index{canonical form of a term}
A rewrite system where this cannot happen---where every term reduces to a unique irreducible form, no matter the order in which the rewrite rules are applied---is said to be \emph{confluent}. In a confluent rewrite system, the unique irreducible form of a term $t$ is called its \emph{canonical form}, written ${t}\,{\downarrow}$. A confluent rewrite system completely solves the word problem, because two terms are equal in the monoid if and only if they have the same canonical form.

\index{overlapping rules}
\index{critical pair}
Confluence fails when the left-hand sides of two rules can simultaneously apply to overlapping parts of a term, as Rule 1 and Rule 2 do inside $abc$. In general, two rewrite rules $x_1\Rightarrow y_1$ and $x_2\Rightarrow y_2$ (not necessarily distinct) \emph{overlap} in one of two ways:
\begin{enumerate}
\item An overlap of the first kind: $x_1$ contains $x_2$ as a subterm, so that $x_1=u\otimes x_2\otimes w$ for some possibly-empty terms $u$ and $w$.
\item An overlap of the second kind: a non-empty suffix of $x_1$ is also a non-empty prefix of $x_2$, so that $x_1=u\otimes v$ and $x_2=v\otimes w$ for some non-empty term $v$.
\end{enumerate}
In both cases, the \emph{overlapped term} $u\otimes v\otimes w$ (where $v=x_2$ in an overlap of the first kind) can be reduced in two different ways, one with each rule. The \emph{Knuth-Bendix completion procedure} attempts to transform a rewrite system into a confluent one by turning each such ambiguity into a new rewrite rule.
\index{Knuth-Bendix algorithm}
\begin{algorithm}[Knuth-Bendix]\label{knuthbendix} Takes as input a rewrite system over $A^*$, oriented with respect to a reduction order $<$.
\begin{enumerate}
\item Check every pair of rules (including each rule paired with itself) for overlaps, and record the overlapped terms.
\item For each overlapped term $uvw$ discovered in Step 1,
\begin{enumerate}
\item Reduce $uvw$ in the two possible ways, producing a pair of terms $(t_1, t_2)$.
\item Reduce $t_1$ and $t_2$ to their canonical forms ${t}_1{\downarrow}$ and ${t}_2{\downarrow}$ with Algorithm~\ref{reducingaterm}. The pair $({t}_1{\downarrow}, {t}_2{\downarrow})$ is called a \emph{critical pair}.
\item If ${t}_1{\downarrow}={t}_2{\downarrow}$, the critical pair is \emph{trivial}, and nothing needs to be done. Otherwise, resolve the critical pair by adding a new rewrite rule, oriented with respect to the reduction order: \[{t}_1{\downarrow}\Rightarrow {t}_2{\downarrow}\qquad\hbox{(if ${t}_1{\downarrow}>{t}_2{\downarrow}$),}\] or \[{t}_2{\downarrow}\Rightarrow {t}_1{\downarrow}\qquad\hbox{(if ${t}_2{\downarrow}>{t}_1{\downarrow}$).}\] The process for resolving critical pairs is summarized in Figure \ref{criticalfig}.
\end{enumerate}
\item If the above loop added new rules, go back to Step 1 to check if any of the new rules overlap with existing rules. Otherwise, all critical pairs have been resolved and the completion procedure has produced a confluent rewrite system.
\item There is a final simplification step. For each rule $x\Rightarrow y$,
\begin{enumerate}
\item If $x$ can be reduced by some other rule $x'\Rightarrow y'$, meaning $x=ux'w$ for some $u$, $w\in A^*$, delete $x\Rightarrow y$. This deletion is valid; since the rewrite system is now confluent, rewrite rules can be applied in any order, meaning $x'\Rightarrow y'$ can always be applied before $x\Rightarrow y$, so there is never any reason to apply $x\Rightarrow y$.
\item Otherwise, reduce $y$ to canonical form ${y}\,{\downarrow}$, and replace the rule $x\Rightarrow y$ with $x\Rightarrow {y}\,{\downarrow}$.
\end{enumerate}
\end{enumerate}
\end{algorithm}
\begin{figure}\captionabove{Resolving critical pairs in Algorithm \ref{knuthbendix}}\label{criticalfig}
\begin{center}
\begin{tikzcd}
&uvw \arrow[ld, bend right] \arrow[rd, bend left] \\
t_1\arrow[d]&&t_2\arrow[d]\\
{t}_1{\downarrow}\arrow[rr, leftrightarrow, dashed]&&{t}_2{\downarrow}
\end{tikzcd}
\end{center}
\end{figure}
\index{convergent presentation} If the Knuth-Bendix completion procedure terminates after a finite number of steps, the presentation is said to be \emph{convergent}. If the presentation is not convergent, the algorithm will continue adding new rewrite rules forever, as longer and longer overlapped terms are discovered in Step 1. In practice, you want an algorithm that will either succeed or fail, instead of one that ``succeeds'' after a possibly-infinite number of steps. This can be handled by limiting the maximum number of iterations or the maximum length of the left-hand side of a rewrite rule. If either limit is exceeded, the rewrite system is rejected. Previously I showed you a couple of finitely-presented monoids and made some hand-wavy claims about the resulting rewrite systems. By applying the Knuth-Bendix algorithm, we can verify that those claims were correct.
\begin{example}[Trivial case]\label{trivialex} In this example, I claimed that orienting the equations to form rewrite rules and applying them in any order is sufficient to solve the word problem: \[\langle a,b,c;\; cc=c,\; a=ba,\; ca=a\rangle\] To see why, you can check for overlapping rules. There is a single overlapping pair of rules, $cc\Rightarrow c$ and $ca\Rightarrow a$. The overlapped term is $cca$. Reducing this term with both rules produces the pair $(ca,ca)$. Reducing both sides yields the critical pair $(a, a)$. This critical pair is trivial, so the Knuth-Bendix algorithm terminates successfully without adding any new rules; the rewrite system is already confluent. Figure \ref{trivialfig} summarizes this.\footnote{The idea of representing critical pairs as diagrams comes from \cite{guiraud:hal-00818253}.}
\begin{figure}\captionabove{Trivial critical pair in Example \ref{trivialex}}\label{trivialfig}
\begin{center}
\begin{tikzcd}
&cca \arrow[ld, bend right] \arrow[rd, bend left] \\
ca\arrow[rr, equal]&&ca
\end{tikzcd}
\end{center}
\end{figure}
\end{example}
\begin{example}[Adding a single rule]\label{singleruleex} In this example, I claimed that adding the single rule $ac\Rightarrow a$ ensures the resulting rewrite system is confluent: \[\langle a, b, c, d;\; ab=a,\; bc=b,\; d=b\rangle\] Once again, you can check for overlapping rules. There is a single overlapping pair of rules, $ab\Rightarrow a$ and $bc\Rightarrow b$. The overlapped term is $abc$. Reducing this term with both rules produces the pair $(ac,ab)$. While $ac$ is irreducible, you can further reduce $ab$ to $a$. This yields the critical pair $(ac,a)$, which is resolved by adding a new rule $ac\Rightarrow a$. A second iteration of the Knuth-Bendix algorithm does not discover any new critical pairs, so the algorithm terminates successfully. Once again, this can be summarized in a diagram, shown in Figure \ref{singlerulefig}.
\begin{figure}\captionabove{Critical pair in Example \ref{singleruleex}}\label{singlerulefig}
\begin{center}
\begin{tikzcd}
&abc \arrow[ld, bend right] \arrow[rd, bend left] \\
ac\arrow[d, equal]&&ab \arrow[d]\\
ac\arrow[rr, dashed]&&a
\end{tikzcd}
\end{center}
\end{figure}
\end{example}
\index{convergent presentation} Now I will show you a finitely-presented monoid whose presentation is not convergent.
\begin{example}[Infinite case]\label{infiniteex} Consider the following finitely-presented monoid $M$: \[\langle a, b;\; aba=ab\rangle\] The rule $aba\Rightarrow ab$ overlaps with itself. The overlapped term is $ababa$. There are two ways to reduce this term using our rule, which yields the pair $(abba, abab)$. The second term in the pair, $abab$, can be reduced with a further application of our original rule, producing the critical pair $(abba, abb)$. Resolving this critical pair adds a new rewrite rule $abba\Rightarrow abb$. A new rule was added, so the algorithm runs again. This time, we have an overlap between the new rule $abba\Rightarrow abb$ and the original rule $aba\Rightarrow ab$. The overlapped term is $abbaba$. Reducing this term with both rules produces the pair $(abbba, abbab)$. The second term in the pair, $abbab$, can be reduced with a further application of $abba\Rightarrow abb$, yielding the critical pair $(abbba, abbb)$. Resolving this critical pair adds a new rewrite rule $abbba\Rightarrow abbb$. This process continues forever, adding an infinite series of rewrite rules of the form \[ab^na\Rightarrow ab^n\] Figure \ref{infinitefig} shows these ``runaway'' critical pairs in diagram form.
\begin{figure}\captionabove{Infinitely many critical pairs in Example \ref{infiniteex}}\label{infinitefig}
\begin{center}
\begin{tikzcd}
&ababa \arrow[ld, bend right] \arrow[rd, bend left] \\
abba\arrow[d, equal]&&abab \arrow[d]\\
abba\arrow[rr, dashed]&&abb
\end{tikzcd}
\begin{tikzcd}
&abbaba \arrow[ld, bend right] \arrow[rd, bend left] \\
abbba\arrow[d, equal]&&abbab \arrow[d]\\
abbba\arrow[rr, dashed]&&abbb
\end{tikzcd}
\begin{tikzcd}
&ab^{n-1}aba \arrow[ld, bend right] \arrow[rd, bend left] \\
ab^na\arrow[d, equal]&&ab^{n-1}ab \arrow[d]\\
ab^na\arrow[rr, dashed]&&ab^n
\end{tikzcd}
\end{center}
\end{figure}
\end{example}
The interesting thing about Example \ref{infiniteex} is that the word problem in this monoid is still decidable, just not via this particular application of the Knuth-Bendix algorithm. Indeed, applying the Knuth-Bendix algorithm to a different presentation of the same monoid can still produce a confluent rewrite system.
\begin{example}[A different presentation]\label{diffpresex} Consider the following equivalent presentation of the above monoid; call it $M'$: \[\langle t, u, v;\; uv=t,\; tu=t\rangle\] \index{isomorphism} First of all, I should prove that $M$ and $M'$ are isomorphic, by exhibiting an isomorphism $\varphi\colon~M'\rightarrow M$:
\begin{align*}
t&\leftrightarrow ab\\
u&\leftrightarrow a\\
v&\leftrightarrow b
\end{align*}
To convince yourself that this is an isomorphism, apply $\varphi$ to both sides of the relations in the presentation of $M'$:
\begin{itemize}
\item Applying $\varphi$ to $uv=t$ gives $ab=ab$, which is trivial.
\item Applying $\varphi$ to $tu=t$ gives $aba=ab$, which is the defining relation of $M$.
\end{itemize}
Going in the other direction, there is only the single relation in the presentation of $M$ to check:
\begin{itemize}
\item $\varphi^{-1}$ applied to $aba=ab$ becomes $tu=t$, which is one of the defining relations of $M'$.
\end{itemize}
Now, if you run the Knuth-Bendix algorithm on $M'$, you will see that $tu\Rightarrow t$ overlaps with $uv\Rightarrow t$. The overlapped term is $tuv$. Reducing this term with both rules produces the critical pair $(tv, tt)$. Orienting this pair produces a new rewrite rule $tv\Rightarrow tt$. This is shown in Figure \ref{diffpresfig}.
\begin{figure}\captionabove{Critical pair in Example \ref{diffpresex}}\label{diffpresfig}
\begin{center}
\begin{tikzcd}
&tuv \arrow[ld, bend right] \arrow[rd, bend left] \\
tv\arrow[rr, leftarrow, dashed]&&tt
\end{tikzcd}
\end{center}
\end{figure}
A second iteration of the Knuth-Bendix algorithm does not discover any new critical pairs, so the algorithm terminates successfully. You will encounter this example again in Chapter \ref{associatedtypes}.
\end{example}
You might ask: can \emph{any} finitely-presented monoid with a decidable word problem be presented as a confluent rewrite system, just maybe with a different set of generators and relations? Unfortunately, the answer is ``no,'' meaning there are ``bespoke'' monoids where the word problem is decidable, just not via the Knuth-Bendix algorithm.
\begin{theorem}[From \cite{SQUIER1994271}] The following finitely-presented monoid has a decidable word problem, but cannot be presented as a confluent rewrite system: \[\langle a, b, t, x, y;\; ab=\varepsilon,\; xa=atx,\; xt=tx,\; xb=bx,\; xy=\varepsilon\rangle\] \end{theorem}
This result, together with Theorem \ref{undecidablemonoid}, means that both inclusions below are proper:
\[
\begin{array}{c}
\hbox{Monoids with a convergent presentation} \\
\subsetneq \\
\hbox{Monoids with a decidable word problem} \\
\subsetneq \\
\hbox{Finitely-presented monoids}
\end{array}
\]

\chapter{Protocols are Monoids}\label{protocolsasmonoids}

To recap the most important results from the two previous chapters:
\begin{itemize}
\item Algorithm \ref{protocolmonoidalgo} shows how to construct a well-formed protocol definition from a finitely-presented monoid,
\item Theorem \ref{protocolmonoidthm} shows that generic signature queries on this protocol can express the word problem,
\item Theorem \ref{undecidablemonoid} shows that the word problem on finitely-presented monoids is in general undecidable,
\item Algorithm \ref{knuthbendix} shows how to build a confluent rewrite system that solves the word problem on a finitely-presented monoid with a convergent presentation.
\end{itemize}
The ultimate goal here is to solve generic signature queries using a confluent rewrite system, but first, a more general method for constructing a finitely-presented monoid from a set of Swift protocol definitions is needed. While Theorem \ref{protocolmonoidthm} defined an isomorphism between finitely-presented monoids and a restricted subset of Swift protocols, it doesn't immediately generalize beyond protocols that satisfy some rather stringent restrictions:
\begin{itemize}
\item every associated type must conform to the same protocol,
\item conformance requirements to other protocols are not allowed,
\item the only kind of generic requirement allowed in a \texttt{where} clause is a same-type requirement between type parameters.
\end{itemize}
\index{conformance requirement}
\index{same-type requirement}
This chapter sketches an overview of the more general construction by building the rewrite system from a stripped-down set of standard library protocols, shown in Listing \ref{protocolrewritesystemex}.
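To see how little it takes to escape the restricted subset, consider a hypothetical pair of protocols (the names here are invented purely for illustration). The associated type $\namesym{Feed}$ conforms to a protocol other than the one that declares it, violating the first two restrictions above at once:
\begin{Verbatim}
protocol Animal {
  associatedtype Feed : Crop
}

protocol Crop {
}
\end{Verbatim}
Even an example this simple requires the more general construction developed below.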
I will call the formulation presented in this chapter ``the requirement machine with name and protocol symbols,'' to distinguish it from the real formulation, introduced in the next chapter. \index{associated type} \index{name symbol} \index{protocol symbol} \index{protocol Self type} First of all, the alphabet of this rewrite system will include the names of the associated types: $\namesym{Element}$, $\namesym{Iterator}$, and $\namesym{SubSequence}$. Since there are multiple protocols in play, the alphabet also needs to be extended with additional symbols that represent protocol names. These new protocol symbols are distinct from name symbols, so I'm going to write them differently:
\begin{itemize}
\item $\namesym{Horse}$ is a name symbol;
\item $\protosym{Horse}$ is a protocol symbol.
\end{itemize}
The four protocol symbols that will be used in this example are $\protosym{IteratorProtocol}$, $\protosym{Sequence}$, $\protosym{Collection}$, and $\protosym{OptionSet}$. Equipped with these, the type lowering map from Definition~\ref{liftingloweringmaps} can be generalized to produce terms that are ``rooted'' in a protocol symbol.
\begin{listing}\captionabove{Example protocols for building a rewrite system}\label{protocolrewritesystemex}
\begin{Verbatim}
protocol IteratorProtocol {
  associatedtype Element
}

protocol Sequence {
  associatedtype Element where Iterator.Element == Element
  associatedtype Iterator : IteratorProtocol
}

protocol Collection : Sequence {
  associatedtype SubSequence : Collection
    where SubSequence.SubSequence == SubSequence,
          SubSequence.Element == Element
}

protocol OptionSet : Collection where Element == Self {
}
\end{Verbatim}
\end{listing}
\index{type lowering map}
\begin{definition}\label{typelowering1} For each protocol $\proto{P}$, define the \emph{type lowering map} $\Lambda_{\proto{P}}:\mathsf{Type}\rightarrow\mathsf{Term}$ as follows:
\begin{itemize}
\item The protocol $\genericparam{Self}$ type appearing at the root of a type parameter maps to the protocol symbol $\protosym{P}$.
\item Each subsequent associated type name $\namesym{T}$ maps to a name symbol $\namesym{T}$.
\end{itemize}
This definition will be refined further in Chapter~\ref{requirementmachine}, but it is good enough for now.
\end{definition}
With this new formulation, when a type parameter like $\genericparam{Self}.\namesym{Iterator}.\namesym{Element}$ appears in the requirement signature of $\proto{Sequence}$, the lowered term is now ``qualified'' with the protocol whence it came: \[\protosym{Sequence}.\namesym{Iterator}.\namesym{Element}\] \index{requirement lowering map} The final step is to encode conformance requirements and same-type requirements as rewrite rules using a requirement lowering map.
\begin{definition}\label{reqlowering1} The \emph{requirement lowering map} $\Lambda_{\proto{P}}\colon\namesym{Requirement}\rightarrow\namesym{Rule}$ takes as input a generic requirement in the protocol $\proto{P}$, and produces a rewrite rule, using the type lowering map $\Lambda_{\proto{P}}\colon\namesym{Type}\rightarrow\namesym{Term}$ to lower types to terms:
\begin{itemize}
\item \textbf{Protocol conformance requirements} $\namesym{T}\colon\proto{Q}$ lower to a rule eliminating the protocol symbol $\protosym{Q}$ from the end of the lowered term for $\namesym{T}$: \[\Lambda_{\proto{P}}(\namesym{T}).\protosym{Q} \Rightarrow \Lambda_{\proto{P}}(\namesym{T})\]
\item \textbf{Same-type requirements} $\namesym{T}==\namesym{U}$ lower to an equivalence of terms.
Assuming that $\Lambda_{\proto{P}}(\namesym{T}) > \Lambda_{\proto{P}}(\namesym{U})$ in the reduction order on terms (if not, flip the terms around): \[\Lambda_{\proto{P}}(\namesym{T}) \Rightarrow \Lambda_{\proto{P}}(\namesym{U})\]
\end{itemize}
This definition does not support layout, superclass or concrete type requirements. Those will be addressed in Chapter~\ref{requirementmachine}.
\end{definition}
Applying the requirement lowering map to the generic requirements in our example produces eight rules: four from same-type requirements, and four from conformance requirements:
\begin{align}
\protosym{Sequence}.\namesym{Iterator}.\namesym{Element} &\Rightarrow \protosym{Sequence}.\namesym{Element}\tag{1}\\
\protosym{Collection}.\namesym{SubSequence}.\namesym{SubSequence} &\Rightarrow \protosym{Collection}.\namesym{SubSequence}\tag{2}\\
\protosym{Collection}.\namesym{SubSequence}.\namesym{Element} &\Rightarrow \protosym{Collection}.\namesym{Element}\tag{3}\\
\protosym{OptionSet}.\namesym{Element} &\Rightarrow \protosym{OptionSet}\tag{4}\\
\protosym{Sequence}.\namesym{Iterator}.\protosym{IteratorProtocol} &\Rightarrow \protosym{Sequence}.\namesym{Iterator}\tag{5}\\
\protosym{Collection}.\protosym{Sequence} &\Rightarrow \protosym{Collection}\tag{6}\\
\protosym{Collection}.\namesym{SubSequence}.\protosym{Collection} &\Rightarrow \protosym{Collection}.\namesym{SubSequence}\tag{7}\\
\protosym{OptionSet}.\protosym{Collection} &\Rightarrow \protosym{OptionSet}\tag{8}
\end{align}
(Protocol inheritance is just a conformance requirement on $\genericparam{Self}$; this explains why Rule 6 and Rule 8 look the way they do.) Intuitively, a protocol symbol at the \emph{beginning} of a term means ``this rule applies to type parameters that conform to this protocol''; a protocol symbol at the \emph{end} of a term means ``if you can construct this term, you \emph{know} it conforms to this protocol.'' \index{confluence} \index{Knuth-Bendix algorithm} There's one more thing: this rewrite system is not confluent! For example, Rule~6 and Rule~1 overlap on the following term: \[\protosym{Collection}.\protosym{Sequence}.\namesym{Iterator}.\namesym{Element}\] \index{canonical form of a term} Thankfully, the Knuth-Bendix algorithm finishes successfully after three rounds, albeit adding a very large number of new rules, as you will see shortly. Nevertheless, this construction is good enough to solve a couple of generic signature queries, at least for type parameters from generic signatures of the form $\gensig{\genericparam{Self}}{\genericparam{Self}\colon\proto{P}}$. Here are the two queries and their implementations:
\begin{itemize}
\item \texttt{areSameTypeParametersInContext(T, U)} answers true if the terms $\Lambda_{\proto{P}}(T)$ and $\Lambda_{\proto{P}}(U)$ both reduce to the same canonical form.
\item \texttt{requiresProtocol(T, Q)} answers true if the terms $\Lambda_{\proto{P}}(T)$ and $\Lambda_{\proto{P}}(T).\protosym{Q}$ both reduce to the same canonical form.
\end{itemize}
\begin{example} You can show that $\genericparam{Self}.\namesym{SubSequence}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element}$ is equivalent to $\genericparam{Self}$ in the $\proto{OptionSet}$ protocol:
\begin{align}
\protosym{OptionSet}.\namesym{SubSequence}.&\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element}\nonumber\\
&\rightarrow\protosym{OptionSet}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element}\tag{Rule 11}\\
&\rightarrow\protosym{OptionSet}\tag{Rule 20}
\end{align}
Rule 11 was added by resolving the overlap between Rule~8 and Rule~2. Rule 20 was added by resolving the overlap between Rule~8 and Rule~15, which was added when resolving the overlap between Rule~10 and Rule~1, and finally, Rule~10 was added by resolving the overlap between Rule~7 and Rule~6.
\end{example}
\begin{example} The $\genericparam{Self}.\namesym{SubSequence}.\namesym{SubSequence}$ type parameter in the $\proto{Collection}$ protocol conforms to $\proto{Sequence}$:
\begin{align}
\protosym{Collection}.\namesym{SubSequence}.&\namesym{SubSequence}.\protosym{Sequence}\nonumber\\
&\rightarrow\protosym{Collection}.\namesym{SubSequence}.\protosym{Sequence}\tag{Rule 2}\\
&\rightarrow\protosym{Collection}.\namesym{SubSequence}\tag{Rule 10}
\end{align}
\end{example}
\begin{listing}\captionabove{Rewrite system for $\proto{IteratorProtocol}$, $\proto{Sequence}$, $\proto{Collection}$ and $\proto{OptionSet}$}\label{rewritesystemcompleted}
\begin{itemize}
\item The initial set of rules obtained by lowering protocol requirement signatures:
\begin{align}
\protosym{S}.\namesym{Iterator}.\namesym{Element}&\Rightarrow\protosym{S}.\namesym{Element}\tag{1}\\
\protosym{C}.\namesym{SubSequence}.\namesym{SubSequence}&\Rightarrow\protosym{C}.\namesym{SubSequence}\tag{2}\\
\protosym{C}.\namesym{SubSequence}.\namesym{Element}&\Rightarrow\protosym{C}.\namesym{Element}\tag{3}\\
\protosym{O}.\namesym{Element}&\Rightarrow\protosym{O}\tag{4}\\
\protosym{S}.\namesym{Iterator}.\protosym{I}&\Rightarrow\protosym{S}.\namesym{Iterator}\tag{5}\\
\protosym{C}.\protosym{S}&\Rightarrow\protosym{C}\tag{6}\\
\protosym{C}.\namesym{SubSequence}.\protosym{C}&\Rightarrow\protosym{C}.\namesym{SubSequence}\tag{7}\\
\protosym{O}.\protosym{C}&\Rightarrow\protosym{O}\tag{8}
\end{align}
\item New rules added by the first round of the completion procedure:
\begin{align}
\protosym{C}.\namesym{Iterator}.\namesym{Element}&\Rightarrow\protosym{C}.\namesym{Element}\tag{9}\\
\protosym{C}.\namesym{SubSequence}.\protosym{S}&\Rightarrow\protosym{C}.\namesym{SubSequence}\tag{10}\\
\protosym{O}.\namesym{SubSequence}.\namesym{SubSequence}&\Rightarrow\protosym{O}.\namesym{SubSequence}\tag{11}\\
\protosym{O}.\namesym{SubSequence}.\namesym{Element}&\Rightarrow\protosym{O}\tag{12}\\
\protosym{O}.\protosym{S}&\Rightarrow\protosym{O}\tag{13}\\
\protosym{O}.\namesym{SubSequence}.\protosym{C}&\Rightarrow\protosym{O}.\namesym{SubSequence}\tag{14}
\end{align}
\item New rules added by the second round of the completion procedure:
\begin{align}
\protosym{C}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element}&\Rightarrow\protosym{C}.\namesym{Element}\tag{15}\\
\protosym{C}.\namesym{SubSequence}.\namesym{Iterator}.\protosym{I}&\Rightarrow\protosym{C}.\namesym{SubSequence}.\namesym{Iterator}\tag{16}\\
\protosym{O}.\namesym{Iterator}.\namesym{Element}&\Rightarrow\protosym{O}\tag{17}\\
\protosym{O}.\namesym{Iterator}.\protosym{I}&\Rightarrow\protosym{O}.\namesym{Iterator}\tag{18}\\
\protosym{O}.\namesym{SubSequence}.\protosym{S}&\Rightarrow\protosym{O}.\namesym{SubSequence}\tag{19} \end{align} \item New rules added by the third and final round of the completion procedure: \begin{align} \protosym{O}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element}&\Rightarrow\protosym{O}\tag{20}\\ \protosym{O}.\namesym{SubSequence}.\namesym{Iterator}.\protosym{I}&\Rightarrow\protosym{O}.\namesym{SubSequence}.\namesym{Iterator}\tag{21} \end{align} \end{itemize} \end{listing} \begin{figure}\captionabove{Non-trivial critical pairs resolved on the first iteration of the Knuth-Bendix algorithm.}\label{rewritesystemfig1} \begingroup \tiny \begin{center} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{C}.\protosym{S}.\namesym{Iterator}.\namesym{Element} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{C}.\namesym{Iterator}.\namesym{Element} \arrow[d, equal] && \protosym{C}.\protosym{S}.\namesym{Element} \arrow[d] \\ \protosym{C}.\namesym{Iterator}.\namesym{Element} \arrow[rr, dashed] && \protosym{C}.\namesym{Element} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{C}.\namesym{SubSequence}.\protosym{C}.\protosym{S} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{C}.\namesym{SubSequence}.\protosym{S} \arrow[d, equal] && \protosym{C}.\namesym{SubSequence}.\protosym{C} \arrow[d] \\ \protosym{C}.\namesym{SubSequence}.\protosym{S} \arrow[rr, dashed] && \protosym{C}.\namesym{SubSequence} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{O}.\protosym{C}.\namesym{SubSequence}.\namesym{SubSequence} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{O}.\namesym{SubSequence}.\namesym{SubSequence} \arrow[d, equal] && \protosym{O}.\protosym{C}.\namesym{SubSequence} \arrow[d] \\ \protosym{O}.\namesym{SubSequence}.\namesym{SubSequence} \arrow[rr, dashed] && \protosym{O}.\namesym{SubSequence} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{O}.\protosym{C}.\namesym{SubSequence}.\namesym{Element} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{O}.\namesym{SubSequence}.\namesym{Element} \arrow[d, equal] && \protosym{O}.\protosym{C}.\namesym{Element} \arrow[d] \\ \protosym{O}.\namesym{SubSequence}.\namesym{Element} \arrow[rr, dashed] && \protosym{O} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{O}.\protosym{C}.\protosym{S} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{O}.\protosym{S} \arrow[d, equal] && \protosym{O}.\protosym{C} \arrow[d] \\ \protosym{O}.\protosym{S} \arrow[rr, dashed] && \protosym{O} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{O}.\protosym{C}.\namesym{SubSequence}.\protosym{C} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{O}.\namesym{SubSequence}.\protosym{C} \arrow[d, equal] && \protosym{O}.\protosym{C}.\namesym{SubSequence} \arrow[d] \\ \protosym{O}.\namesym{SubSequence}.\protosym{C} \arrow[rr, dashed] && \protosym{O}.\namesym{SubSequence} \end{tikzcd} \end{center} \endgroup \end{figure} \begin{figure}\captionabove{Non-trivial critical pairs resolved on the second iteration of the Knuth-Bendix algorithm.}\label{rewritesystemfig2} \begin{center} \begingroup \tiny \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{C}.\namesym{SubSequence}.\protosym{C}.\namesym{Iterator}.\namesym{Element} }\arrow[ld, yshift=-3pt, 
shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{C}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element} \arrow[d, equal] && \protosym{C}.\namesym{SubSequence}.\namesym{Element} \arrow[d] \\
\protosym{C}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element} \arrow[rr, dashed] && \protosym{C}.\namesym{Element}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{C}.\namesym{SubSequence}.\protosym{S}.\namesym{Iterator}.\protosym{I} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{C}.\namesym{SubSequence}.\namesym{Iterator}.\protosym{I} \arrow[rr, dashed] && \protosym{C}.\namesym{SubSequence}.\namesym{Iterator}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{O}.\protosym{S}.\namesym{Iterator}.\namesym{Element} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{O}.\namesym{Iterator}.\namesym{Element} \arrow[d, equal] && \protosym{O}.\protosym{S}.\namesym{Element} \arrow[d] \\
\protosym{O}.\namesym{Iterator}.\namesym{Element} \arrow[rr, dashed] && \protosym{O}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{O}.\protosym{S}.\namesym{Iterator}.\protosym{I} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{O}.\namesym{Iterator}.\protosym{I} \arrow[d, equal] && \protosym{O}.\protosym{S}.\namesym{Iterator} \arrow[d] \\
\protosym{O}.\namesym{Iterator}.\protosym{I} \arrow[rr, dashed] && \protosym{O}.\namesym{Iterator}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{O}.\namesym{SubSequence}.\protosym{C}.\protosym{S} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{O}.\namesym{SubSequence}.\protosym{S}\arrow[rr, dashed] && \protosym{O}.\namesym{SubSequence}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{O}.\namesym{SubSequence}.\protosym{C}.\namesym{SubSequence}.\protosym{S} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{O}.\namesym{SubSequence}.\namesym{SubSequence}.\protosym{S} \arrow[d] && \protosym{O}.\namesym{SubSequence}.\protosym{C}.\namesym{SubSequence} \arrow[d] \\
\protosym{O}.\namesym{SubSequence}.\protosym{S} \arrow[rr, dashed] && \protosym{O}.\namesym{SubSequence}
\end{tikzcd}
\endgroup
\end{center}
\end{figure}
\begin{figure}\captionabove{Non-trivial critical pairs resolved on the third iteration of the Knuth-Bendix algorithm.}\label{rewritesystemfig3}
\begin{center}
\begingroup
\tiny
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{O}.\namesym{SubSequence}.\protosym{S}.\namesym{Iterator}.\namesym{Element} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{O}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element} \arrow[d, equal] && \protosym{O}.\namesym{SubSequence}.\protosym{S}.\namesym{Element} \arrow[d] \\
\protosym{O}.\namesym{SubSequence}.\namesym{Iterator}.\namesym{Element} \arrow[rr, dashed] && \protosym{O}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{O}.\namesym{SubSequence}.\protosym{S}.\namesym{Iterator}.\protosym{I} }\arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{O}.\namesym{SubSequence}.\namesym{Iterator}.\protosym{I} \arrow[rr, dashed]&& \protosym{O}.\namesym{SubSequence}.\namesym{Iterator}
\end{tikzcd}
\endgroup
\end{center}
\end{figure}
Listing \ref{rewritesystemcompleted} shows the full list of rules in the confluent
rewrite system output by the Knuth-Bendix algorithm. Figures \ref{rewritesystemfig1}, \ref{rewritesystemfig2}, and \ref{rewritesystemfig3} show the non-trivial critical pairs resolved on each iteration, using the diagram notation first introduced in Section \ref{confluenceandcompletion}. To get some of the subsequent listings and diagrams to fit, I abbreviated the protocol symbols, showing only the first letter of each protocol's name---think of it as a particularly silly example of a rewrite system if you want:
\begin{itemize}
\item $\protosym{IteratorProtocol}$ becomes $\protosym{I}$,
\item $\protosym{Sequence}$ becomes $\protosym{S}$,
\item $\protosym{Collection}$ becomes $\protosym{C}$,
\item $\protosym{OptionSet}$ becomes $\protosym{O}$.
\end{itemize}
The toy requirement machine with name and protocol symbols is somewhat limited in what it can do:
\begin{enumerate}
\item As you will see in the next chapter, the most serious issue is that this rewrite system cannot cope with recursive conformance requirements. This makes it no more expressive than the ancient \texttt{ArchetypeBuilder}, described in Algorithm~\ref{archetypebuilder}.
\item Reducing terms to canonical form lets you determine if two type parameters are equivalent, but it's insufficient for \texttt{getCanonicalTypeInContext()}. The latter is expected to produce ``resolved'' \texttt{DependentMemberType}s, where the type is equipped with a pointer to an \texttt{AssociatedTypeDecl}; so the canonical form of $\genericparam{Self}.\namesym{SubSequence}$ actually needs to point at the declaration of $\namesym{SubSequence}$ in the $\proto{Collection}$ protocol, rather than the ``unresolved'' form which consists of a bare identifier. The issue is that this association between associated types and protocols is ``erased'' in this formulation, and there is no way to define a \emph{lifting map} taking terms to types, to serve as the inverse of the type lowering map.
\item This rewrite system can only reason about type parameters in the trivial protocol generic signature $\gensig{\genericparam{Self}}{\genericparam{Self}\colon\proto{P}}$ for some protocol $\proto{P}$. This restricts it to answering queries about type parameters written inside protocols, and not top-level generic signatures attached to generic functions and types, which can have multiple generic parameters and requirements.
\item While the \texttt{requiresProtocol()} generic signature query can be made to work, there doesn't seem to be any easy way to implement \texttt{getRequiredProtocols()}, which returns \emph{all} protocols that a type parameter must conform to.
\item Layout, superclass and concrete type requirements are not supported.
\end{enumerate}
The first two problems are closely intertwined, and the full solution is the subject of the next chapter. Problem 3 has a straightforward solution, described in Section \ref{genericparamsym}; it requires adding more symbols to the alphabet. Problem 4 is really not a shortcoming of the rewrite system itself, but rather something that requires building some machinery on top; that is the topic of Chapter \ref{propertymap}.

\chapter{Associated Types}\label{associatedtypes}

\index{recursive conformance requirement}
\begin{listing}\captionabove{The SwiftUI $\proto{View}$ protocol.}\label{viewproto}
\begin{Verbatim}
protocol View {
  associatedtype Body : View
}
\end{Verbatim}
\end{listing}
To motivate the introduction of the next concept, consider the SwiftUI $\proto{View}$ protocol shown in Listing~\ref{viewproto}.
The protocol's requirement signature contains a recursive conformance requirement, $\genericparam{Self}.\namesym{Body}\colon \proto{View}$. It turns out the rewrite system constructed in the requirement machine with name and protocol symbols is not convergent! First, let's take a closer look at where exactly things go wrong. The initial rewrite system consists of a single rule: \[\protosym{View}.\namesym{Body}.\protosym{View}\Rightarrow\protosym{View}.\namesym{Body}\] The rule's left-hand side has a prefix $\protosym{View}$ equal to its own suffix, so the rule overlaps with itself with an overlap of the second kind. The overlapped term that can be reduced in two different ways is: \[\protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body}.\protosym{View}\] Applying the rule both ways and reducing the result produces the new rule: \[\protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View}\Rightarrow\protosym{View}.\namesym{Body}.\namesym{Body}\] The new rule, in turn, overlaps with the first rule, and the process continues forever (or until your algorithm's maximum iteration count is reached, or \emph{in extremis}, when your computer runs out of memory). Figure \ref{swiftuirunaway} shows what this infinite sequence of critical pairs looks like. \begin{figure}\captionabove{Infinitely many critical pairs while completing $\proto{View}$ protocol}\label{swiftuirunaway} \begin{center} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body}.\protosym{View} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View} \arrow[d, equal]&& \protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body} \arrow[d]\\ \protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View} \arrow[rr, dashed]&& \protosym{View}.\namesym{Body}.\namesym{Body} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View}.\namesym{Body}.\protosym{View} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{View}.\namesym{Body}.\namesym{Body}.\namesym{Body}.\protosym{View} \arrow[d, equal]&& \protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View}.\namesym{Body} \arrow[d]\\ \protosym{View}.\namesym{Body}.\namesym{Body}.\namesym{Body}.\protosym{View} \arrow[rr, dashed]&& \protosym{View}.\namesym{Body}.\namesym{Body}.\namesym{Body} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{View}.\namesym{Body}^{n-1}.\protosym{View}.\namesym{Body}.\protosym{View} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{View}.\namesym{Body}^{n-1}.\namesym{Body}.\protosym{View} \arrow[d, equal]&& \protosym{View}.\namesym{Body}^{n-1}.\protosym{View}.\namesym{Body} \arrow[d]\\ \protosym{View}.\namesym{Body}^n.\protosym{View} \arrow[rr, dashed]&& \protosym{View}.\namesym{Body}^n \end{tikzcd} \end{center} \end{figure} \index{convergent presentation} In fact, this is exactly the same setup as monoid $M$ in Example \ref{infiniteex} from earlier, only $a$ is $\protosym{View}$, and $b$ is $\namesym{Body}$: \[\langle a, b;\;aba=ab \rangle\] While that presentation stumps the Knuth-Bendix algorithm, Example \ref{diffpresex} gave an isomorphic monoid $M'$ with a different presentation which worked just fine: \[\langle t, u, v;\;uv=t,\;tu=t\rangle\] This seems awfully convenient, almost as if I introduced these examples with the full intention of revisiting 
them later. Let's take a closer look at the isomorphism $\varphi\colon M'\rightarrow M$ exhibited in Example \ref{diffpresex}:
\begin{align*}
t&\leftrightarrow ab\\
u&\leftrightarrow a\\
v&\leftrightarrow b
\end{align*}
This means that adding a new \emph{generator} to $M$ made the presentation convergent. What does this generator represent in the world of Swift? Well, $u\in M'$ is $a\in M$, which is $\protosym{View}$ in Swift; and $v\in M'$ is $b\in M$, which is $\namesym{Body}$. Therefore $t\in M'$ is $ab\in M$, which is $\protosym{View}.\namesym{Body}$. You may guess that $t$ could be a new kind of symbol, perhaps representing a ``bound'' associated type inside a specific protocol. \index{associated type symbol} The crux of the issue is that name symbols like $\namesym{Body}$ don't carry any information, and little can be said about them unless they're prefixed with some other term that is known to conform to a protocol. Thus, you cannot simply add a rewrite rule to say that $\namesym{Body}$ conforms to $\protosym{View}$: \[\namesym{Body}.\protosym{View}\Rightarrow\namesym{Body}\] A rule like this would apply to all associated types named $\namesym{Body}$, ever, \emph{in all protocols}, which is wrong. The best we could do until now is to introduce a rule for each valid prefix term that conforms to $\proto{View}$, of which there are infinitely many here: \[\protosym{View}.\underbrace{\namesym{Body}\ldots\namesym{Body}}_{\textrm{$n$ times}}.\protosym{View}\] If there were instead a symbol representing the unique associated type $\namesym{Body}$ defined in protocol $\proto{View}$, you could introduce a single rewrite rule modeling the conformance requirement on that associated type, for a ``prefix'' of any length. This is exactly how it works. An \emph{associated type symbol} is uniquely identified by the combination of a protocol name and an associated type name; it is written like so, where $\proto{P}$ is the protocol and $\namesym{A}$ is an identifier: \[\assocsym{P}{A}\] The entire symbol is enclosed in square brackets [ and ] to remind you that it is one symbol, and not two; that is, $\assocsym{P}{A}$ is a term of length 1. While I still haven't formally defined the reduction order here, it is also important that associated type symbols precede name symbols in the order. You will see why shortly. To be of any use, rules involving associated type symbols must be introduced when the rewrite system is built. Since rewrite system construction is starting to get more complex, I'm going to encapsulate it in the \emph{protocol lowering map}.
\begin{definition}[Protocol lowering map]\label{protoloweringmap} The map $\Lambda\colon\namesym{Proto}\rightarrow\namesym{Rule}^*$ takes a protocol and outputs a list of zero or more rewrite rules. This list contains two kinds of rules:
\begin{enumerate}
\item Every associated type $\namesym{A}$ of $\proto{P}$ adds an \emph{introduction rule}: \[\protosym{P}.\namesym{A}\Rightarrow\assocsym{P}{A}.\]
\item Every generic requirement of $\proto{P}$ adds a rewrite rule using the requirement lowering map $\Lambda_{\proto{P}}:\namesym{Requirement}\rightarrow\namesym{Rule}$ from Definition~\ref{reqlowering1}.
\end{enumerate}
This map is further amended in Definition~\ref{protoloweringmap2} in the next section.
\end{definition}
With this amended rewrite system construction process, the initial rewrite system for the $\proto{View}$ protocol now has two rules, the first one describing the associated type itself, and the second one describing the protocol conformance requirement on the associated type:
\begin{align}
\protosym{View}.\namesym{Body}&\Rightarrow\assocsym{View}{Body}\tag{1}\\
\protosym{View}.\namesym{Body}.\protosym{View}&\Rightarrow\protosym{View}.\namesym{Body}\tag{2}
\end{align}
Rule 1 overlaps with Rule 2 on this term: \[\protosym{View}.\namesym{Body}.\protosym{View}.\] Resolving this first critical pair introduces a new rewrite rule:
\begin{align}
\assocsym{View}{Body}.\protosym{View}&\Rightarrow\assocsym{View}{Body}\tag{3}
\end{align}
Next, swapping things around, Rule 2 overlaps with Rule 1 on this term: \[\protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body}.\] Resolving this second critical pair also introduces a new rewrite rule:
\begin{align}
\assocsym{View}{Body}.\namesym{Body}&\Rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}\tag{4}
\end{align}
(Incidentally, this is why it is important that $\assocsym{View}{Body}<\namesym{Body}$. If the above rule were oriented in the other direction, completion would run off into the weeds again.) Finally, Rule 2 overlaps with itself on this term: \[\protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body}.\protosym{View}.\] This is the same overlapped term that caused trouble before, and once again this overlap produces the same critical pair: \[(\protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View}, \protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body})\] However, everything gets better from here. The reduced form of the left-hand side is different:
\begin{align}
\protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View} &\rightarrow\assocsym{View}{Body}.\namesym{Body}.\protosym{View}\tag{Rule 1}\\
&\rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}.\protosym{View}\tag{Rule 4}\\
&\rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}\tag{Rule 3}
\end{align}
And the best part is, the right-hand side reduces to the same term:
\begin{align}
\protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body} &\rightarrow\protosym{View}.\namesym{Body}.\namesym{Body}\tag{Rule 2}\\
&\rightarrow\assocsym{View}{Body}.\namesym{Body}\tag{Rule 1}\\
&\rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}\tag{Rule 4}
\end{align}
\begin{listing}\captionabove{Rewrite system of $\proto{View}$ protocol after completion}\label{swiftuiviewcompleterules}
\begin{align}
\protosym{View}.\namesym{Body}&\Rightarrow\assocsym{View}{Body}\tag{1}\\
\protosym{View}.\namesym{Body}.\protosym{View}&\Rightarrow\protosym{View}.\namesym{Body}\tag{\textbf{Deleted}}\\
\assocsym{View}{Body}.\protosym{View}&\Rightarrow\assocsym{View}{Body}\tag{3}\\
\assocsym{View}{Body}.\namesym{Body}&\Rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}\tag{4}
\end{align}
\end{listing}
How exciting---the third critical pair can be discarded, and no more overlaps remain. Figure \ref{swiftuiassocfig} presents this process in diagram form, and Listing \ref{swiftuiviewcompleterules} shows the final list of rules. Note that the left-hand side of Rule 2 contains the left-hand side of Rule 1, so the post-processing step of the Knuth-Bendix algorithm deletes Rule 2.
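To connect this back to the language: the completed rewrite system is what lets the compiler reason about arbitrarily deep chains of $\namesym{Body}$. As a small illustration (a hypothetical function, but one that compiles against the $\proto{View}$ protocol exactly as declared in Listing~\ref{viewproto}), Rule 3 is what makes the nested member type below well-formed:
\begin{Verbatim}
func unwrapTwice<V: View>(_: V.Type) -> V.Body.Body.Type {
  // V.Body conforms to View (Rule 3), so it has a Body of
  // its own, and V.Body.Body is a valid type parameter.
  return V.Body.Body.self
}
\end{Verbatim}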
\begin{figure}\captionabove{Successful completion of $\proto{View}$ protocol with an associated type symbol}\label{swiftuiassocfig} \begin{center} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{View}.\namesym{Body}.\protosym{View} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \assocsym{View}{Body}.\protosym{View} \arrow[d, equal]&& \protosym{View}.\namesym{Body} \arrow[d]\\ \assocsym{View}{Body}.\protosym{View} \arrow[rr, dashed]&& \assocsym{View}{Body} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{View}.\namesym{Body}.\namesym{Body} \arrow[d]&& \protosym{View}.\namesym{Body}.\assocsym{View}{Body} \arrow[d]\\ \assocsym{View}{Body}.\namesym{Body} \arrow[rr, dashed]&& \assocsym{View}{Body}.\assocsym{View}{Body} \end{tikzcd} \vspace{10mm} \begin{tikzcd} &\mathmakebox[0pt][c]{ \protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body}.\protosym{View} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\ \protosym{View}.\namesym{Body}.\namesym{Body}.\protosym{View} \arrow[d]&& \protosym{View}.\namesym{Body}.\protosym{View}.\namesym{Body} \arrow[d]\\ \assocsym{View}{Body}.\assocsym{View}{Body} \arrow[rr, dashed, equal]&& \assocsym{View}{Body}.\assocsym{View}{Body} \end{tikzcd} \end{center} \end{figure} I'm going to call this ``the requirement machine with name, protocol and associated type symbols.'' Since the rewrite system generated by the $\proto{View}$ protocol now has a confluent completion, the addition of associated type symbols gives you a strictly more powerful formalism. One interesting phenomenon is when terms containing name symbols reduce to associated type symbols: \begin{align} \protosym{View}.\namesym{Body}.\namesym{Body}.&\namesym{Body}\nonumber\\ &\rightarrow\assocsym{View}{Body}.\namesym{Body}.\namesym{Body}\tag{Rule 1}\\ &\rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}.\namesym{Body}\tag{Rule 4}\\ &\rightarrow\assocsym{View}{Body}.\assocsym{View}{Body}.\assocsym{View}{Body}\tag{Rule 4} \end{align} \section{Inherited Associated Types}\label{inheritedassoctypes} \index{recursive conformance requirement} \index{same-type requirement} Now that recursive protocol conformances are handled correctly, I can finally show you the full definition of the $\proto{Collection}$ protocol. The previous toy $\proto{Collection}$ protocol did have one recursive associated type, $\namesym{SubSequence}$, but it was ``tied off'' with the same-type requirement: \[\genericparam{Self}.\namesym{SubSequence}.\namesym{SubSequence}==\genericparam{Self}.\namesym{SubSequence}\] The real protocol also includes an $\namesym{Indices}$ associated type conforming to $\proto{Collection}$, but the recursion is not constrained in any way, so $\genericparam{Self}.\namesym{Indices}$, $\genericparam{Self}.\namesym{Indices}.\namesym{Indices}$, and so on are distinct type parameters, just like the $\namesym{Body}$ associated type in the $\proto{View}$ protocol. Listing \ref{fullcollectionproto} shows the full definition of the protocol. The rewrite system built from this protocol's requirements is convergent with the latest formulation incorporating associated type symbols. 
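As a quick sanity check, both of these nested type parameters can actually be spelled in source. The following hypothetical function compiles against the \emph{real} standard library $\proto{Collection}$ protocol, which also declares an \texttt{indices} property of type $\namesym{Indices}$:
\begin{Verbatim}
func nestedIndices<C: Collection>(_ c: C) -> C.Indices.Indices {
  // Indices conforms to Collection, so it has an Indices type
  // of its own; nothing constrains C.Indices.Indices to equal
  // C.Indices, so this is a genuinely distinct type parameter.
  return c.indices.indices
}
\end{Verbatim}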
\begin{listing}\captionabove{The $\proto{Collection}$ and $\proto{BidirectionalCollection}$ protocols}\label{fullcollectionproto}
\begin{Verbatim}
protocol Collection : Sequence {
  associatedtype Index : Comparable

  associatedtype SubSequence : Collection
    where SubSequence.Index == Index,
          SubSequence.Element == Element,
          SubSequence.SubSequence == SubSequence

  associatedtype Indices : Collection
    where Indices.Element == Index,
          Indices.Index == Index,
          Indices.SubSequence == Indices
}

protocol BidirectionalCollection : Collection
    where SubSequence : BidirectionalCollection,
          Indices : BidirectionalCollection {
}
\end{Verbatim}
\end{listing}
The next problem comes up when you try to build a confluent rewrite system for $\proto{BidirectionalCollection}$. The rewrite system for $\proto{BidirectionalCollection}$ inherits all of the rewrite rules from $\proto{Collection}$, $\proto{Sequence}$ and $\proto{IteratorProtocol}$, and also adds three more rules, corresponding to these requirements:
\begin{itemize}
\item $\genericparam{Self}\colon\proto{Collection}$
\item $\genericparam{Self}.\namesym{SubSequence}\colon\proto{BidirectionalCollection}$
\item $\genericparam{Self}.\namesym{Indices}\colon\proto{BidirectionalCollection}$
\end{itemize}
I'm not going to talk about the new $\namesym{SubSequence}$ requirement; again, the recursion via $\namesym{SubSequence}$ is ``tied off,'' so it doesn't do anything interesting. Also once again, to keep long lines in check, I will abbreviate protocol names, writing $\proto{C}$ in place of $\proto{Collection}$ and $\proto{BC}$ in place of $\proto{BidirectionalCollection}$. The initial rewrite system is now rather large, so I'm only going to show the relevant rewrite rules below:
\begin{align}
&\cdots\nonumber\\
\protosym{C}.\namesym{Indices}&\Rightarrow\assocsym{C}{Indices}\tag{1}\\
\assocsym{C}{Indices}.\protosym{C}&\Rightarrow\assocsym{C}{Indices}\tag{2}\\
&\cdots\nonumber\\
\protosym{BC}.\protosym{C}&\Rightarrow\protosym{BC}\tag{3}\\
\protosym{BC}.\namesym{Indices}.\protosym{BC}&\Rightarrow\protosym{BC}.\namesym{Indices}\tag{4}
\end{align}
Let's take a closer look at a handful of critical pairs introduced by the new rules. First of all, Rule~3 overlaps with Rule~1 on the term $\protosym{BC}.\protosym{C}.\namesym{Indices}$. Resolving this first critical pair introduces a rewrite rule:
\begin{align}
\protosym{BC}.\namesym{Indices}&\Rightarrow\protosym{BC}.\assocsym{C}{Indices}\tag{5}
\end{align}
Next, Rule 5 overlaps with Rule 4 on the term $\protosym{BC}.\namesym{Indices}.\protosym{BC}$. Resolving this critical pair introduces a rewrite rule:
\begin{align}
\protosym{BC}.\assocsym{C}{Indices}.\protosym{BC}&\Rightarrow\protosym{BC}.\assocsym{C}{Indices}\tag{6}
\end{align}
The new Rule~6 overlaps with itself, and resolving this critical pair produces yet another rule:
\begin{align*}
\protosym{BC}.\assocsym{C}{Indices}.\assocsym{C}{Indices}.\protosym{BC}&\Rightarrow\protosym{BC}.\assocsym{C}{Indices}.\assocsym{C}{Indices}
\end{align*}
Oops! The annoying problem with the recursive associated type is back. The new rule also overlaps with Rule~6, and this continues forever, producing ever-longer rules of the form:
\begin{align*}
\protosym{BC}.\underbrace{\assocsym{C}{Indices}\ldots\assocsym{C}{Indices}}_{\text{$n$ times}}.\protosym{BC} &\Rightarrow\protosym{BC}.\underbrace{\assocsym{C}{Indices}\ldots\assocsym{C}{Indices}}_{\text{$n$ times}}
\end{align*}
Introducing associated type symbols was supposed to have solved this problem already! What went wrong?
The issue is that, yet again, you want to be able to impose a conformance requirement on an arbitrary path of $\assocsym{C}{Indices}$ symbols that is ``rooted'' at $\protosym{BC}$. The conformance requirement doesn't apply to \emph{all} instances of $\assocsym{C}{Indices}$, only those that come from a $\protosym{BC}$. The solution is to generalize associated type symbols to encode associated type inheritance. That is, rewrite rules should be able to talk about the symbol $\assocsym{P}{A}$ if $\proto{P}$ \emph{inherits} some other protocol $\proto{Q}$, and $\proto{Q}$ \emph{defines} $\namesym{A}$. Then, $\proto{P}$ can impose additional requirements on $\assocsym{P}{A}$ without blowing up the completion procedure. For this reason, associated type symbols can't be represented by a pointer to an \texttt{AssociatedTypeDecl} in the actual implementation, because inherited associated type symbols correspond to ``virtual'' associated types which have no corresponding declaration node in the AST. Instead, associated type symbols store a pointer to a \texttt{ProtocolDecl} together with an \texttt{Identifier}. These new inherited associated type symbols are introduced by extending the protocol lowering map from Definition~\ref{protoloweringmap}.
\begin{definition}[Protocol lowering map with inheritance]\label{protoloweringmap2} The map $\Lambda\colon\namesym{Proto}\rightarrow\namesym{Rule}^*$ takes a protocol and outputs a list of zero or more rewrite rules. This list contains three kinds of rules:
\begin{enumerate}
\item Every associated type $\namesym{A}$ of $\proto{P}$ adds an \emph{introduction rule}: \[\protosym{P}.\namesym{A}\Rightarrow\assocsym{P}{A}.\]
\item Every associated type $\namesym{A}$ of $\proto{Q}$, where $\proto{P}$ inherits from $\proto{Q}$, possibly transitively via some intermediate protocol, adds an \emph{inheritance rule}: \[\protosym{P}.\assocsym{Q}{A}\Rightarrow\assocsym{P}{A}.\]
\item Every generic requirement of $\proto{P}$ adds a rewrite rule using the requirement lowering map $\Lambda_{\proto{P}}:\namesym{Requirement}\rightarrow\namesym{Rule}$ from Definition~\ref{reqlowering1}.
\end{enumerate}
\end{definition}
Coming back to the $\proto{BidirectionalCollection}$ example, we're going to start over with the following initial rules, where the first four are the same as before, but Rule 5 is new, coming from Step~2 of the amended protocol lowering map:
\begin{align}
&\cdots\nonumber\\
\protosym{C}.\namesym{Indices}&\Rightarrow\assocsym{C}{Indices}\tag{1}\\
\assocsym{C}{Indices}.\protosym{C}&\Rightarrow\assocsym{C}{Indices}\tag{2}\\
&\cdots\nonumber\\
\protosym{BC}.\protosym{C}&\Rightarrow\protosym{BC}\tag{3}\\
\protosym{BC}.\namesym{Indices}.\protosym{BC}&\Rightarrow\protosym{BC}.\namesym{Indices}\tag{4}\\
\protosym{BC}.\assocsym{C}{Indices}&\Rightarrow\assocsym{BC}{Indices}\tag{5}
\end{align}
Recall that Rule 3 overlaps with Rule 1 on the term $\protosym{BC}.\protosym{C}.\namesym{Indices}$. Resolving this critical pair introduces a slightly different rewrite rule this time, though; look at the right-hand side:
\begin{align}
\protosym{BC}.\namesym{Indices}&\Rightarrow\assocsym{BC}{Indices}\tag{6}
\end{align}
Rule 4 overlaps with Rule 6 on the term $\protosym{BC}.\namesym{Indices}.\protosym{BC}$. Resolving the critical pair introduces this rewrite rule:
\begin{align}
\assocsym{BC}{Indices}.\protosym{BC}&\Rightarrow\assocsym{BC}{Indices}\tag{7}
\end{align}
Rule 7 overlaps with Rule 6 on the term $\assocsym{BC}{Indices}.\protosym{BC}.\namesym{Indices}$.
Resolving the critical pair introduces this rewrite rule:
\begin{align}
\assocsym{BC}{Indices}.\namesym{Indices}&\Rightarrow\assocsym{BC}{Indices}.\assocsym{BC}{Indices}\tag{8}
\end{align}
At this point, you might guess that we've closed off the runaway recursion. For instance, look at the overlap of Rule 4 with itself on the term $\protosym{BC}.\namesym{Indices}.\protosym{BC}.\namesym{Indices}.\protosym{BC}$. The critical pair from reducing the term both ways is
\[(\protosym{BC}.\namesym{Indices}.\namesym{Indices}.\protosym{BC}, \protosym{BC}.\namesym{Indices}.\protosym{BC}.\namesym{Indices})\]
The left-hand side reduces as follows:
\begin{align*}
\protosym{BC}.\namesym{Indices}.\namesym{Indices}.\protosym{BC}&\rightarrow \assocsym{BC}{Indices}.\namesym{Indices}.\protosym{BC}\tag{Rule 6}\\
&\rightarrow \assocsym{BC}{Indices}.\assocsym{BC}{Indices}.\protosym{BC}\tag{Rule 8}\\
&\rightarrow \assocsym{BC}{Indices}.\assocsym{BC}{Indices}\tag{Rule 7}
\end{align*}
The right-hand side similarly reduces to the same term:
\begin{align*}
\protosym{BC}.\namesym{Indices}.\protosym{BC}.\namesym{Indices}&\rightarrow \assocsym{BC}{Indices}.\protosym{BC}.\namesym{Indices}\tag{Rule 6}\\
&\rightarrow \assocsym{BC}{Indices}.\namesym{Indices}\tag{Rule 7}\\
&\rightarrow \assocsym{BC}{Indices}.\assocsym{BC}{Indices}\tag{Rule 8}
\end{align*}
This critical pair is now trivial and can be discarded. We've successfully neutralized a formidable adversary---the recursive inherited associated type. Figure~\ref{bidirectionalfig} shows the resolution of the above critical pairs in diagram form.

Before moving on to the next topic, there is one final thing worth mentioning. While it's not apparent from looking at this example, for the inherited associated type trick to work, the reduction order must be defined so that if a protocol $\proto{P}$ inherits from $\proto{Q}$, then $\protosym{P}<\protosym{Q}$, as well as $\assocsym{P}{A}<\assocsym{Q}{A}$ for all associated types $\namesym{A}$ of $\proto{Q}$. That is, protocols with a ``deeper'' inheritance graph order lower. In this section, I've been implicitly using a lexicographic order on protocol names, and everything worked ``by accident,'' because $\proto{BidirectionalCollection}<\proto{Collection}$ anyway; however, this would not have been the case if $\proto{BidirectionalCollection}$ were named differently, for example $\proto{TwoWayCollection}$. The formal definition of the reduction order in Definition \ref{protocolorder} will take protocol inheritance into account.
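For instance, if the same protocol were instead declared under a hypothetical name that sorts after $\proto{Collection}$ lexicographically, the name-based order alone would orient the rules the wrong way:
\begin{Verbatim}
protocol TwoWayCollection : Collection
    where SubSequence : TwoWayCollection,
          Indices : TwoWayCollection {}
\end{Verbatim}
With the inheritance-aware order, $\protosym{TwoWayCollection}<\protosym{Collection}$ still holds, regardless of the spelling of the protocol names.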
\begin{figure}\captionabove{A few critical pairs resolved while completing the $\proto{BidirectionalCollection}$ protocol with an inherited associated type symbol}\label{bidirectionalfig}
\begin{center}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{BC}.\protosym{C}.\namesym{Indices} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{BC}.\namesym{Indices} \arrow[d, equal]&& \protosym{BC}.\assocsym{C}{Indices} \arrow[d]\\
\protosym{BC}.\namesym{Indices} \arrow[rr, dashed]&& \assocsym{BC}{Indices}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{BC}.\namesym{Indices}.\protosym{BC} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{BC}.\namesym{Indices} \arrow[d]&& \assocsym{BC}{Indices}.\protosym{BC} \arrow[d, equal]\\
\assocsym{BC}{Indices} \arrow[rr, dashed, leftarrow]&& \assocsym{BC}{Indices}.\protosym{BC}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \assocsym{BC}{Indices}.\protosym{BC}.\namesym{Indices} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\assocsym{BC}{Indices}.\namesym{Indices} \arrow[d, equal]&& \assocsym{BC}{Indices}.\assocsym{BC}{Indices} \arrow[d, equal]\\
\assocsym{BC}{Indices}.\namesym{Indices} \arrow[rr, dashed]&& \assocsym{BC}{Indices}.\assocsym{BC}{Indices}
\end{tikzcd}
\vspace{10mm}
\begin{tikzcd}
&\mathmakebox[0pt][c]{ \protosym{BC}.\namesym{Indices}.\protosym{BC}.\namesym{Indices}.\protosym{BC} } \arrow[ld, yshift=-3pt, shorten=6pt] \arrow[rd, yshift=-3pt, shorten=6pt] \\
\protosym{BC}.\namesym{Indices}.\namesym{Indices}.\protosym{BC} \arrow[d]&& \protosym{BC}.\namesym{Indices}.\protosym{BC}.\namesym{Indices} \arrow[d]\\
\assocsym{BC}{Indices}.\assocsym{BC}{Indices} \arrow[rr, dashed, equal]&& \assocsym{BC}{Indices}.\assocsym{BC}{Indices}
\end{tikzcd}
\end{center}
\end{figure}

\section{Merged Associated Types}\label{mergedassoctypes}
The previous section showed an example of what can be called ``vertical composition,'' where the $\proto{BidirectionalCollection}$ protocol imposed an additional protocol conformance requirement on the $\namesym{Indices}$ associated type it inherits from $\proto{Collection}$. The recursive conformance on the $\namesym{Indices}$ associated type caused some trouble, which was resolved with the introduction of inherited associated type symbols.

Now consider ``horizontal composition,'' where a type parameter conforms to two unrelated protocols, and both protocols define nested associated types with the same name. In this case, the requirements on both nested types are ``merged'' to form a single type parameter. Listing~\ref{horizontalcomp} shows a contrived example, which once again prominently features recursive protocol conformances.

\begin{listing}\captionabove{Example of horizontal composition}\label{horizontalcomp}
\begin{Verbatim}
protocol P1 {
  associatedtype A : P1
}

protocol P2 {
  associatedtype A : P2
}

protocol P3 {
  associatedtype T : P1, P2
}
\end{Verbatim}
\end{listing}

Note that $\genericparam{Self}.\namesym{T}$ conforms to both $\proto{P1}$ and $\proto{P2}$. Furthermore, $\proto{P1}$ and $\proto{P2}$ both define an associated type named $\namesym{A}$. The associated type in $\proto{P1}$ conforms to $\proto{P1}$, and the associated type in $\proto{P2}$ conforms to $\proto{P2}$.
This means that the associated type $\namesym{T}$ of $\proto{P3}$ defines an infinite sequence of nested type parameters, all of which conform to \emph{both} $\proto{P1}$ and $\proto{P2}$:
\begin{align*}
&\genericparam{Self}.\namesym{T}\\
&\genericparam{Self}.\namesym{T}.\namesym{A}\\
&\genericparam{Self}.\namesym{T}.\namesym{A}.\namesym{A}\\
&\cdots
\end{align*}
Once again, this causes trouble with the rewrite system, and fixing it requires one final generalization of the concept of associated type symbols. Let's start by listing the initial rewrite rules for the above three protocols. First, $\proto{P1}$ defines an associated type introduction rule, and a conformance rule for this associated type:
\begin{align}
\protosym{P1}.\namesym{A}&\Rightarrow\assocsym{P1}{A}\tag{1}\\
\assocsym{P1}{A}.\protosym{P1}&\Rightarrow\assocsym{P1}{A}\tag{2}
\end{align}
Similarly for $\proto{P2}$:
\begin{align}
\protosym{P2}.\namesym{A}&\Rightarrow\assocsym{P2}{A}\tag{3}\\
\assocsym{P2}{A}.\protosym{P2}&\Rightarrow\assocsym{P2}{A}\tag{4}
\end{align}
Finally, $\proto{P3}$ defines an associated type conforming to both $\proto{P1}$ and $\proto{P2}$:
\begin{align}
\protosym{P3}.\namesym{T}&\Rightarrow\assocsym{P3}{T}\tag{5}\\
\assocsym{P3}{T}.\protosym{P1}&\Rightarrow\assocsym{P3}{T}\tag{6}\\
\assocsym{P3}{T}.\protosym{P2}&\Rightarrow\assocsym{P3}{T}\tag{7}
\end{align}
The completion procedure looks for overlaps in the above set of rules. The first critical pairs to be resolved are the overlap of Rule~2 with Rule~1 on $\assocsym{P1}{A}.\protosym{P1}.\namesym{A}$, and Rule~4 with Rule~3 on $\assocsym{P2}{A}.\protosym{P2}.\namesym{A}$:
\begin{align}
\assocsym{P1}{A}.\namesym{A}&\Rightarrow\assocsym{P1}{A}.\assocsym{P1}{A}\tag{8}\\
\assocsym{P2}{A}.\namesym{A}&\Rightarrow\assocsym{P2}{A}.\assocsym{P2}{A}\tag{9}
\end{align}
Next, let's look at overlaps arising from the rules introduced by protocol $\proto{P3}$. Rule~6 overlaps with Rule~1 on the term $\assocsym{P3}{T}.\protosym{P1}.\namesym{A}$. Resolving this critical pair introduces the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\namesym{A}&\Rightarrow\assocsym{P3}{T}.\assocsym{P1}{A}\tag{10}
\end{align}
Similarly, Rule~7 overlaps with Rule~3 on the term $\assocsym{P3}{T}.\protosym{P2}.\namesym{A}$. Resolving this critical pair introduces the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\namesym{A}&\Rightarrow\assocsym{P3}{T}.\assocsym{P2}{A}\tag{11}
\end{align}
These two new rules, Rule~10 and Rule~11, have identical left-hand sides, so they overlap on the term $\assocsym{P3}{T}.\namesym{A}$. Resolving this critical pair introduces the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\assocsym{P2}{A}&\Rightarrow\assocsym{P3}{T}.\assocsym{P1}{A}\tag{12}
\end{align}
The new Rule~12 overlaps with Rule~4 on the term $\assocsym{P3}{T}.\assocsym{P2}{A}.\protosym{P2}$. Resolving this critical pair introduces the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\assocsym{P1}{A}.\protosym{P2}&\Rightarrow\assocsym{P3}{T}.\assocsym{P1}{A}\tag{13}
\end{align}
The new Rule~12 also overlaps with Rule~9 on the term $\assocsym{P3}{T}.\assocsym{P2}{A}.\namesym{A}$. Let's look at this one more carefully. Call the overlapped term $t$.
Reducing $t$ with each one of Rule~12 and Rule~9 produces the following pair of terms:
\begin{align*}
t_0&=\assocsym{P3}{T}.\assocsym{P1}{A}.\underline{\namesym{A}}\\
t_1&=\assocsym{P3}{T}.\assocsym{P2}{A}.\underline{\assocsym{P2}{A}}
\end{align*}
The term $t_0$ can be further reduced by an application of Rule~8:
\begin{align*}
t_0=&\assocsym{P3}{T}.\assocsym{P1}{A}.\underline{\namesym{A}}\\
\rightarrow&\assocsym{P3}{T}.\assocsym{P1}{A}.\underline{\assocsym{P1}{A}}
\end{align*}
At this point, $t_0$ is now irreducible. The term $t_1$ can be further reduced by another application of Rule~12:
\begin{align*}
t_1=&\underline{\assocsym{P3}{T}.\assocsym{P2}{A}}.\assocsym{P2}{A}\\
\rightarrow&\underline{\assocsym{P3}{T}.\assocsym{P1}{A}}.\assocsym{P2}{A}
\end{align*}
If you orient this critical pair, you get the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P2}{A}&\Rightarrow \assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P1}{A}\tag{14}
\end{align}
Now, you're surely experiencing d\'ej\`a vu, because the completion procedure is about to wander off into the weeds with an infinite sequence of critical pairs. Rule~14 overlaps with Rule~4 on the term $\assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P2}{A}.\protosym{P2}$. Resolving this critical pair introduces the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P1}{A}.\protosym{P2}&\Rightarrow \assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P1}{A}\tag{15}
\end{align}
Similarly, Rule~14 overlaps with Rule~9 on the term $\assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P2}{A}.\namesym{A}$. Resolving this critical pair introduces the rewrite rule:
\begin{align}
\assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P1}{A}.\assocsym{P2}{A}&\Rightarrow \assocsym{P3}{T}.\assocsym{P1}{A}.\assocsym{P1}{A}.\assocsym{P1}{A}\tag{16}
\end{align}
This process will continue forever, introducing an infinite sequence of rewrite rules, for all $n\in\mathbb{N}$:
\begin{align*}
\assocsym{P3}{T}.\underbrace{\assocsym{P1}{A}\ldots\assocsym{P1}{A}}_{\text{$n$ times}}.\protosym{P2}&\Rightarrow \assocsym{P3}{T}.\underbrace{\assocsym{P1}{A}\ldots\assocsym{P1}{A}}_{\text{$n$ times}}\\
\assocsym{P3}{T}.\underbrace{\assocsym{P1}{A}\ldots\assocsym{P1}{A}}_{\text{$n$ times}}.\assocsym{P2}{A}&\Rightarrow \assocsym{P3}{T}.\underbrace{\assocsym{P1}{A}\ldots\assocsym{P1}{A}}_{\text{$n$ times}}.\assocsym{P1}{A}
\end{align*}
What happened here is that the rewrite system wants to normalize a mix of $\assocsym{P1}{A}$ and $\assocsym{P2}{A}$ symbols following $\assocsym{P3}{T}$ into a same-length sequence of $\assocsym{P1}{A}$ alone. Additionally, each one of these type parameters needs to conform to $\proto{P2}$ as well, even though $\assocsym{P1}{A}$ alone does \emph{not} conform to $\proto{P2}$. Unfortunately, this cannot be expressed with a convergent rewrite system over the existing alphabet. Recall the two other examples of this phenomenon shown in this chapter:
\begin{enumerate}
\item Introducing associated type symbols made recursive protocol conformance requirements convergent.
\item Introducing inherited associated type symbols made recursive protocol conformance requirements on inherited associated types convergent.
\end{enumerate}

\chapter{The Requirement Machine}\label{requirementmachine}

Chapter \ref{rewritesystemintro} introduced rewrite systems, and Chapter~\ref{protocolsasmonoids} and Chapter~\ref{associatedtypes} worked through a series of examples to show how a rewrite system can be constructed to answer a couple of simple generic signature queries. In this chapter, I will define the Swift generics rewrite system in its entirety.

First, let's define the alphabet of symbols used by the requirement machine. The symbols are categorized into seven kinds. You've already seen name symbols, protocol symbols and associated type symbols. The last four symbol kinds are new.
\index{symbol}
\index{term}
\begin{definition}[Symbols]\label{symboldef}
The requirement machine operates on the below alphabet:
\begin{itemize}
\index{protocol symbol}
\item \textbf{Protocol symbols}: $\protosym{P}$ where $\proto{P}$ is a Swift protocol.
\index{associated type symbol}
\item \textbf{Associated type symbols}: $\assocsym{P}{T}$ where $\proto{P}$ is a protocol and $\namesym{T}$ is an associated type name. The protocol must directly define or inherit an associated type named $\namesym{T}$.
\index{name symbol}
\item \textbf{Name symbols}: $\namesym{T}$ for any valid Swift identifier.
\index{generic parameter symbol}
\item \textbf{Generic parameter symbols}: $\genericsym{d}{i}$ where $d$, $i\geq 0$ are the depth and index of the generic parameter, respectively.
\index{layout symbol}
\item \textbf{Layout symbols}: $\layoutsym{L}$ where $\namesym{L}$ is a Swift layout constraint.
\index{type substitution}
\index{superclass symbol}
\item \textbf{Superclass symbols}: $\supersym{\namesym{T};\;\sigma_0,\ldots,\sigma_n}$ where $\namesym{T}$ is a Swift type, and the $\{\sigma_i\}_{0\le i \le n}$ are a (possibly empty) ordered list of terms, referred to as \emph{substitutions}.
\index{concrete type symbol}
\item \textbf{Concrete type symbols}: $\concretesym{\namesym{T};\;\sigma_0,\ldots,\sigma_n}$ where $\namesym{T}$ and the $\sigma_i$ are as above.
\end{itemize}
\end{definition}
Generic parameter symbols are the subject of Section \ref{genericparamsym}. Layout, superclass and concrete type symbols are described in excruciating detail in Section~\ref{concretetypes}.
\index{reduction order}
First though, I will define the reduction order on symbols.
\index{linear order}
\index{reduction protocol order}
\begin{definition}[Reduction protocol order]\label{protocolorder}
First, for each protocol $\proto{P}$, define the \emph{depth} of $\proto{P}$ as one greater than the maximum depth of each protocol $\proto{Q}_i$ inherited by $\proto{P}$:
\[\gpdepth(\proto{P}) = 1 + \max(\gpdepth(\proto{Q}_i))\quad\hbox{where}\quad \proto{Q}_i \in \hbox{protocols inherited by $\proto{P}$}\]
If $\proto{P}$ does not inherit from any protocols, then $\max(\varnothing)=0$, and $\gpdepth(\proto{P})=1$. Now, given two protocols $\proto{P}$ and $\proto{Q}$, $\proto{P}<\proto{Q}$ in the reduction protocol order if:
\begin{itemize}
\item $\gpdepth(\proto{P}) > \gpdepth(\proto{Q})$, or
\item $\gpdepth(\proto{P}) = \gpdepth(\proto{Q})$, and $\proto{P}$ precedes $\proto{Q}$ in the canonical protocol order from Definition~\ref{canonicalprotocol}.
\end{itemize}
\end{definition}

\begin{listing}\captionabove{The standard library's Collection protocol tower}\label{collectiontower}
\begin{Verbatim}
protocol Sequence {}
protocol Collection : Sequence {}
protocol BidirectionalCollection : Collection {}
protocol MutableCollection : Collection {}
protocol RangeReplaceableCollection : Collection {}
protocol RandomAccessCollection : BidirectionalCollection {}
\end{Verbatim}
\end{listing}

\begin{example}
Consider the collection protocol tower from the standard library, shown in Listing \ref{collectiontower}. The depth of each protocol is as follows:
\begin{itemize}
\item $\proto{Sequence}$ has depth 1.
\item $\proto{Collection}$ has depth 2.
\item $\proto{BidirectionalCollection}$, $\proto{MutableCollection}$, and $\proto{RangeReplaceableCollection}$ all have depth 3.
\item $\proto{RandomAccessCollection}$ has depth 4.
\end{itemize}
Here is the linear order among these protocols:
\begin{align*}\proto{RandomAccessCollection}<\proto{BidirectionalCollection}&<\proto{MutableCollection}\\
<\proto{RangeReplaceableCollection}&<\proto{Collection}<\proto{Sequence}
\end{align*}
You can see that protocols deeper in the inheritance graph precede other protocols. As you may recall, the inherited associated type trick of Section \ref{inheritedassoctypes} relies on this.
\end{example}

\index{partial order}
\begin{definition}[Reduction order on symbols]\label{symbolorder}
Say the two symbols are $\alpha$ and $\beta$.
\begin{figure}\captionabove{symbol kind order}\label{kindorder}
\[
\begin{array}{c}
\text{Protocol symbol}\\
<\\
\text{Associated type symbol}\\
<\\
\text{Name symbol}\\
<\\
\text{Generic parameter symbol}\\
<\\
\text{Layout symbol}\\
<\\
\text{Superclass symbol}\\
<\\
\text{Concrete type symbol}
\end{array}
\]
\end{figure}
If $\alpha$ and $\beta$ have different kinds, then $\alpha<\beta$ if the kind of $\alpha$ precedes the kind of $\beta$ in Figure~\ref{kindorder}. Note that this is the same kind order as in Definition \ref{symboldef}. If $\alpha$ and $\beta$ have the same kind, then they are compared as follows:
\begin{itemize}
\item \textbf{Protocol symbols:} Let $\alpha=\protosym{P}$ and $\beta=\protosym{Q}$. Then $\alpha<\beta$ if $\proto{P}<\proto{Q}$ in the reduction protocol order from Definition \ref{protocolorder}.
\item \textbf{Associated type symbols:} Let $\alpha=\assocsym{P}{T}$, $\beta=\assocsym{Q}{U}$.
\begin{itemize}
\item If the identifier $\namesym{T}$ precedes $\namesym{U}$ in lexicographic order, then $\alpha < \beta$.
\item If $\namesym{T}=\namesym{U}$ and $\proto{P}<\proto{Q}$ in the reduction protocol order, then $\alpha < \beta$.
\end{itemize}
\item \textbf{Name symbols:} $\alpha<\beta$ if the identifier of $\alpha$ precedes the identifier of $\beta$ lexicographically.
\item \textbf{Generic parameter symbols:} Let $\alpha=\genericsym{d}{i}$, $\beta=\genericsym{d'}{i'}$. Then $\alpha<\beta$ if either $d<d'$, or $d=d'$ and $i<i'$.
\item \textbf{Layout, superclass and concrete type symbols:} two distinct symbols of these kinds are incomparable with each other.
\end{itemize}
\end{definition}

\index{shortlex order}
\begin{definition}[Reduction order on terms]
The reduction order on symbols extends to a \emph{shortlex} order on terms. If $x$ and $y$ are terms, then $x<y$ if $x$ is shorter than $y$; if $x$ and $y$ have the same length, then $x<y$ if, at the first position where the two terms differ, the symbol of $x$ precedes the symbol of $y$ in the reduction order on symbols. Rewrite rules are always oriented so that the right-hand side is smaller than the left-hand side in this order.
\end{definition}

\begin{algorithm}[Type parameter lowering for protocols]\label{lowertypeinproto}
The lowering map $\Lambda_{\proto{P}}\colon\namesym{Type}\rightarrow\namesym{Term}$ takes as input a type parameter $X$ written in the requirement signature of a protocol $\proto{P}$:
\[X:=\genericparam{Self}.X_1.X_2\ldots X_n\]
This algorithm constructs a new term $Y:=\Lambda_{\proto{P}}(X)$ as follows:
\begin{itemize}
\item If $X$ is the protocol $\genericparam{Self}$ type itself, then $Y$ is the single protocol symbol $\protosym{P}$.
\item Otherwise, every element of $X$ after $\genericparam{Self}$ is an associated type. If the $i$th element is an associated type $\namesym{A}_i$ defined in a protocol $\proto{P}_i$, then set $Y_i:=\assocsym{P_i}{A_i}$; the term is ``rooted'' at the first associated type symbol, without a leading $\protosym{P}$.
\end{itemize}
\end{algorithm}

\begin{algorithm}[Protocol requirement lowering]\label{lowerreqinproto}
The requirement lowering map $\Lambda_{\proto{P}}\colon\namesym{Requirement}\rightarrow\namesym{Rule}$ takes a generic requirement of a protocol $\proto{P}$ and outputs a rewrite rule:
\begin{itemize}
\item A conformance requirement $\namesym{T}\colon\proto{Q}$ becomes a rule that eliminates the protocol symbol $\protosym{Q}$ from the end of the lowered term:
\[\Lambda_{\proto{P}}(\namesym{T}).\protosym{Q}\Rightarrow\Lambda_{\proto{P}}(\namesym{T})\]
\item A layout requirement $\namesym{T}\colon\namesym{L}$ similarly becomes a rule that eliminates the layout symbol $\layoutsym{L}$:
\[\Lambda_{\proto{P}}(\namesym{T}).\layoutsym{L}\Rightarrow\Lambda_{\proto{P}}(\namesym{T})\]
\item A superclass requirement, or a same-type requirement between a type parameter and a concrete type, becomes a rule that eliminates the corresponding superclass or concrete type symbol, constructed as described in Algorithm~\ref{concretesymbolcons}.
\item A same-type requirement $\namesym{T}==\namesym{U}$ between two type parameters becomes a rule relating the two lowered terms; assume that $\Lambda_{\proto{P}}(\namesym{T}) > \Lambda_{\proto{P}}(\namesym{U})$ in the shortlex order on terms (if not, flip the terms around):
\[\Lambda_{\proto{P}}(\namesym{T}) \Rightarrow \Lambda_{\proto{P}}(\namesym{U})\]
\end{itemize}
% FIXME: Link to property map section here and mention implied AnyObject rule.
\end{algorithm}
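\begin{example}
To see both lowering maps in action, consider the $\proto{Collection}$ protocol from Listing~\ref{fullcollectionproto}. The type parameter $\genericparam{Self}.\namesym{Indices}.\namesym{Index}$ lowers to the term
\[\Lambda_{\proto{Collection}}(\genericparam{Self}.\namesym{Indices}.\namesym{Index})=\assocsym{Collection}{Indices}.\assocsym{Collection}{Index},\]
and the conformance requirement $\genericparam{Self}.\namesym{Index}\colon\proto{Comparable}$ lowers to the rewrite rule
\[\assocsym{Collection}{Index}.\protosym{Comparable}\Rightarrow\assocsym{Collection}{Index}.\]
\end{example}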
\section{Generic Parameters}\label{genericparamsym}
\index{generic parameter symbol}
So far, I've only shown you how to build rewrite rules from requirements in the requirement signature of some protocol $\proto{P}$. When lowering a type parameter, the protocol's $\genericparam{Self}$ type lowers to the protocol symbol $\protosym{P}$. Once such a rewrite system is built, queries can be performed against the protocol generic signature $\gensig{\genericparam{Self}}{\genericparam{Self}\colon\proto{P}}$.

When lowering the parameters and requirements of an arbitrary generic signature, generic parameter types instead become generic parameter symbols. Generic parameter symbols should only ever appear as the initial symbol in a term. While in the abstract, the rewrite system would have no trouble with terms where generic parameter symbols appear elsewhere, such terms don't actually make sense semantically, since they do not correspond to valid Swift type parameter types. The lowering of type parameters in a generic signature is similar to Algorithm \ref{lowertypeinproto}. The first associated type no longer plays a special role, since the term is always ``rooted'' at a generic parameter symbol.
\index{type parameter}
\index{generic requirement}
\index{generic signature}
\begin{algorithm}[Type parameter lowering for generic signatures]\label{lowertypeinsig}
The lowering map $\Lambda\colon\namesym{Type}\rightarrow\namesym{Term}$ takes a type parameter $X$ as input:
\[X:=\genericsym{d}{i}.X_1.X_2\ldots X_n\]
This algorithm constructs a new term $Y:=\Lambda(X)$ from the type parameter $X$ as follows:
\begin{itemize}
\item The first element of $X$ is a generic parameter type at depth $d$ and index $i$, so set $Y_0:=\genericsym{d}{i}$.
\item All subsequent elements are associated types. If the $i$th element is an associated type $\namesym{A}_i$ defined in a protocol $\proto{P}_i$, then set $Y_i:=\assocsym{P_i}{A_i}$.
\end{itemize}
\end{algorithm}
\begin{algorithm}[Generic requirement lowering]
The generic signature requirement lowering map $\Lambda\colon \namesym{Requirement}\rightarrow\namesym{Rule}^+$ is virtually identical to the protocol requirement lowering map in Algorithm \ref{lowerreqinproto}. The only difference is that types are lowered to terms via $\Lambda\colon\namesym{Type}\rightarrow\namesym{Term}$ defined above, in place of $\Lambda_{\proto{P}}\colon\namesym{Type}\rightarrow\namesym{Term}$ from Algorithm~\ref{lowertypeinproto}.
\end{algorithm}
\begin{example}
Consider the following generic signature:
\[\gensig{\genericsym{0}{0},\genericsym{0}{1}}{\genericsym{0}{0}\colon\proto{Collection},\;\genericsym{0}{1}\colon\proto{Collection},\;\genericsym{0}{0}.\namesym{Element}==\genericsym{0}{1}.\namesym{Element}}\]
The signature's requirements lower to the following rewrite rules:
\begin{align}
\genericsym{0}{0}.\protosym{Collection}&\Rightarrow\genericsym{0}{0}\tag{1}\\
\genericsym{0}{1}.\protosym{Collection}&\Rightarrow\genericsym{0}{1}\tag{2}\\
\genericsym{0}{1}.\namesym{Element}&\Rightarrow\genericsym{0}{0}.\namesym{Element}\tag{3}
\end{align}
Rule 1 and Rule 2 are lowered conformance requirements of the form $\namesym{T}.\protosym{P}\Rightarrow\namesym{T}$ just like before, and Rule 3 is the lowered same-type requirement. This rewrite system will also need to include the requirements of the $\proto{Collection}$ protocol, as well as $\proto{Sequence}$ and $\proto{IteratorProtocol}$, which are referenced from the requirement signatures of $\proto{Collection}$ and $\proto{Sequence}$.
\end{example}
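In Swift source, this generic signature might arise from a declaration like the following sketch, where $\genericsym{0}{0}$ stands for \texttt{C1} and $\genericsym{0}{1}$ for \texttt{C2} (the function and parameter names are hypothetical):
\begin{Verbatim}
func sameElements<C1 : Collection, C2 : Collection>(_ a: C1, _ b: C2)
    where C1.Element == C2.Element {}
\end{Verbatim}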
\begin{definition}
The \emph{protocol dependencies} of a generic signature (or protocol requirement signature) are all the protocols that appear on the right-hand side of its conformance requirements. The \emph{complete} protocol dependencies are the transitive closure of the protocol dependencies.
\end{definition}
While generic parameters are uniquely identified by their depth and index within a \emph{single} generic signature, they are not unique \emph{between} generic signatures, so each generic signature needs its own requirement machine. The process of constructing a requirement machine from a generic signature can be formalized as follows.
\begin{algorithm}[Requirement machine construction]\label{rqmalgo}
Let $G$ be the input generic signature $\gensig{\genericsym{0}{0},\;\ldots,\;\genericsym{m}{n}}{R_0,\;\ldots,\;R_i}$. The algorithm outputs a confluent rewrite system, or fails with a timeout.
\begin{enumerate}
\item Let $S$ be an empty rewrite system.
\item Let $W$ be an empty stack of protocols.
\item Let $V$ be an empty set of protocols.
\item For each requirement $R$ of $G$:
\begin{enumerate}
\item Lower $R$ to a rewrite rule $\Lambda(R)$, and add the new rule to $S$.
\item If $R$ is a conformance requirement $\namesym{T}\colon\proto{P}$ and $\proto{P}\notin V$, push $\proto{P}$ onto $W$, and insert $\proto{P}$ into $V$.
\end{enumerate}
\item While $W$ is not empty,
\begin{enumerate}
\item Pop the next protocol $\proto{P}$ from $W$.
\item Add rewrite rules corresponding to the associated types and requirements of $\proto{P}$ using the protocol lowering map from Definition~\ref{protoloweringmap2}.
\item For each conformance requirement $\namesym{T}\colon\proto{Q}$ in the requirement signature of $\proto{P}$,
\begin{enumerate}
\item If $\proto{Q}\notin V$, push $\proto{Q}$ onto $W$, and insert $\proto{Q}$ into $V$.
\end{enumerate}
\end{enumerate}
\item Run the Knuth-Bendix completion procedure on $S$ (Algorithm~\ref{knuthbendix}).
\item If the completion succeeds within the configured iteration and depth limits, return $S$; otherwise, diagnose an error.
\end{enumerate}
\end{algorithm}
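To make the worklist traversal in Steps 2--5 of Algorithm~\ref{rqmalgo} concrete, here is a minimal sketch in Swift; the \texttt{Proto} type and its \texttt{dependencies} property are hypothetical stand-ins for the compiler's own (C++) data structures:
\begin{Verbatim}
struct Proto: Hashable {
    let name: String
    // Protocols on the right-hand side of conformance requirements
    // in this protocol's requirement signature.
    let dependencies: [Proto]

    static func == (lhs: Proto, rhs: Proto) -> Bool { lhs.name == rhs.name }
    func hash(into hasher: inout Hasher) { hasher.combine(name) }
}

// Compute the complete protocol dependencies, starting from the
// protocols named by the conformance requirements of a generic
// signature.
func protocolClosure(startingFrom roots: [Proto]) -> Set<Proto> {
    var visited = Set<Proto>()
    var worklist = roots
    while let proto = worklist.popLast() {
        // Skip protocols whose rewrite rules were already added.
        guard visited.insert(proto).inserted else { continue }
        worklist.append(contentsOf: proto.dependencies)
    }
    return visited
}
\end{Verbatim}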
\section{Concrete Types}\label{concretetypes}
\index{concrete type symbol}
\index{superclass symbol}
\index{layout symbol}
The time has come to reveal the mystery of how layout, superclass and concrete type requirements work. Algorithm~\ref{lowerreqinproto} showed that just like a conformance requirement $\namesym{T}\colon\proto{P}$ becomes a rewrite rule $\namesym{T}.\protosym{P}\Rightarrow\namesym{T}$, a layout requirement $\namesym{T}\colon\namesym{L}$ becomes a rewrite rule $\namesym{T}.\layoutsym{L}\Rightarrow\namesym{T}$. The situation with superclass and concrete type requirements is analogous, except that superclass and concrete type symbols are constructed in a more elaborate manner, described in Algorithm~\ref{concretesymbolcons}.

In fact, this phenomenon where rules eliminate a symbol from the end of a term can be formalized. Figure \ref{symbolclass} classifies the alphabet into the \emph{property-like} and \emph{type-like} symbols (protocol symbols straddle both classifications, because they can also arise as the protocol $\genericparam{Self}$ type). This notion of property-like symbols also generalizes to property-like rules.
\begin{definition}
A rewrite rule is \emph{property-like} if it is of the form $\namesym{T}.\Pi\Rightarrow\namesym{T}$, where $\Pi$ is a property-like symbol.
\end{definition}
\index{property-like!symbol}
\index{type-like!symbol}
\begin{figure}\captionabove{symbol kind classification}\label{symbolclass}
\begin{center}
\begin{tabular}{|l|l|}
\hline
\multirow{3}{14em}{Property-like}&layout\\
&superclass\\
&concrete type\\
\hline
\multirow{3}{14em}{Type-like}&associated type\\
&name\\
&generic parameter\\
\hline
\multirow{1}{14em}{Both property and type-like}&protocol\\
\hline
\end{tabular}
\end{center}
\end{figure}
\index{type substitution}
Recall that superclass and concrete type symbols store a Swift type together with a list of substitutions. What do these represent exactly, and why is it not enough to store a Swift type alone? Well, consider this generic signature:
\begin{align*}
\gensig{\genericsym{0}{0},\;\genericsym{0}{1}}
{
&\genericsym{0}{0}\colon\proto{Sequence},\\
&\genericsym{0}{1}\colon\proto{Sequence},\\
&\genericsym{0}{0}.\namesym{Element}==
\namesym{Array}\langle\genericsym{0}{1}.\namesym{Element}\rangle
}
\end{align*}
The right-hand side of the concrete type requirement contains the type parameter $\genericsym{0}{1}.\namesym{Element}$. This type parameter lowers to the term $\genericsym{0}{1}.\assocsym{Sequence}{Element}$. It would be nice if the Swift type could directly contain this term, but a \texttt{BoundGenericType} like $\namesym{Array}\langle\bullet\rangle$ only contains other types, not terms or any other arbitrary objects.

In theory, it would be possible to duplicate the entire Swift type hierarchy in the world of concrete type symbols, but it would not be very practical. The parallel hierarchy would be quite large, with its own versions of metatypes, function types, tuple types, and so on. This would also be a maintenance burden going forward, since any addition to a Swift type representation, for example adding a new attribute to function types, would have to be mirrored in the world of concrete type terms. Another option would be to introduce a special kind of placeholder type in the Swift AST, which can store a term, but this would also have undesirable ripple effects throughout the codebase.

There is a simple solution. Concrete type symbols store the child terms off to the side in a list of substitutions. The substitution terms are referenced from within the Swift type using ``phantom'' generic parameters, disregarding the depth and using the index to refer to an element of the substitution list.
\begin{algorithm}[Concrete type symbol construction]\label{concretesymbolcons}
Takes as input a Swift type $X$ containing arbitrary type parameters, and as output returns a new type where the type parameters have been replaced with generic parameters indexing into a substitution list, together with the substitution list itself. This algorithm can build a symbol for use in a rule constructed from a requirement in a protocol $\proto{P}$, or a requirement in the generic signature of a function or type. The only difference is whether types are lowered via $\Lambda_{\proto{P}}$ (Algorithm~\ref{lowertypeinproto}) or $\Lambda$ (Algorithm~\ref{lowertypeinsig}).
\begin{enumerate}
\item Initialize $S$ with an empty list of terms.
\item For each position $\pi$ where $X|_{\pi}$ is a type parameter,
\begin{enumerate}
\item Get the type parameter $T$ stored at $X|_{\pi}$.
\item Replace $X|_{\pi}$ with a new generic parameter type $\genericsym{0}{j}$, where $j$ is the number of elements in $S$ so far.
\item Append the term $\Lambda(T)$ (or $\Lambda_{\proto{P}}(T)$) to $S$.
\end{enumerate}
\item Build the concrete type symbol $\concretesym{X;S}$ with the modified Swift type $X$ and substitutions $S$.
\end{enumerate}
\end{algorithm}
The same algorithm also constructs superclass symbols, if you replace $\concretesym{X;S}$ with $\supersym{X;S}$ in Step~3.
\begin{example}
The type $\namesym{Array}\langle\genericsym{0}{1}.\namesym{Element}\rangle$, when written in a generic signature where $\genericsym{0}{1}\colon\proto{Sequence}$, becomes the following concrete type symbol:
\[\concretesym{\namesym{Array}\langle\genericsym{0}{0}\rangle;\;\sigma_0:=\genericsym{0}{1}.\assocsym{Sequence}{Element}}.\]
The generic parameter $\genericsym{0}{0}$ appearing within $\namesym{Array}\langle\genericsym{0}{0}\rangle$ is not a real generic parameter from the current generic context; instead, it's just an index into the substitution list, here referring to the first element, $\genericsym{0}{1}.\namesym{Element}$. To aid with readability, let's write a ``phantom'' generic parameter $\genericsym{0}{i}$ as $\sigma_i$. Now, the above symbol looks a little neater:
\[\concretesym{\namesym{Array}\langle\sigma_0\rangle;\;\sigma_0:=\genericsym{0}{1}.\namesym{Element}}.\]
\end{example}
\begin{example}
The function type $(\namesym{Array}\langle\genericsym{1}{0}\rangle,\; \namesym{Int})\rightarrow \genericsym{1}{1}.\namesym{Element}$, when written in a generic signature where $\genericsym{1}{1}\colon\proto{Sequence}$, maps to the following concrete type symbol:
\begin{align*}
\concretesym{&(\namesym{Array}\langle\sigma_0\rangle,\; \namesym{Int})\rightarrow \sigma_1;\\
&\sigma_0:=\genericsym{1}{0},\\
&\sigma_1:=\genericsym{1}{1}.\assocsym{Sequence}{Element}}.
\end{align*}
\end{example}
\begin{example}
The tuple type $(\genericparam{Self}.\namesym{Magnitude},\; \genericparam{Self}.\namesym{Words})$, when written in a protocol $\proto{P}$ that defines associated types $\namesym{Magnitude}$ and $\namesym{Words}$, maps to the following concrete type symbol:
\begin{align*}
\concretesym{&(\sigma_0,\;\sigma_1);\\
&\sigma_0:=\assocsym{P}{Magnitude},\\
&\sigma_1:=\assocsym{P}{Words}}.
\end{align*}
\end{example}
Note that the Swift type in a superclass or concrete type symbol cannot itself be a type parameter. That is, the following is never valid:
\[\concretesym{\sigma_0;\; \sigma_0:=\hbox{some term}}\]
A same-type requirement between a type parameter and another type parameter is always represented as an equivalence of terms; no concrete type symbols are involved.
\index{partial order}
One thing to note is that the reduction order on symbols (Definition \ref{symbolorder}) is a partial order, as layout, superclass and concrete type symbols are incomparable amongst themselves. Ordinarily, this would imply that the Knuth-Bendix completion procedure (Algorithm~\ref{knuthbendix}) can fail in a new way: when resolving a critical pair $(t_0, t_1)$ to a rewrite rule, it might be the case that there is no way to orient the rule; that is, neither $t_0<t_1$ nor $t_1<t_0$. (In practice, this never comes up. Suppose a rewrite rule $x\Rightarrow y$ overlaps with a property-like rule $x'\Rightarrow y'$ on the term $t=u.x'$, where $x=uv$ for some non-empty $v$; what are $t_0$ and $t_1$? Since $x'\Rightarrow y'$ is a property-like rule, $x'$ is equal to $y'$ with a concrete type symbol appended; or in other words, writing $y'=vw$, we have $t_1=u.y'=uvw$. But $x=uv$, so $t_1=uvw$ reduces to $yw$, while $t_0$ is just $yw$ with the concrete type symbol appended back on. So indeed, the above critical pair either becomes trivial if $t_0$ can be reduced by some other rule, or it introduces the rewrite rule $t_0\Rightarrow yw$, which is always correctly oriented, because $yw$ is a proper prefix of $t_0$.)
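As a rough model of Algorithm~\ref{concretesymbolcons} and of this ``re-rooting'' adjustment, here is a sketch in Swift; \texttt{SwiftType}, \texttt{Term} and \texttt{ConcreteTypeSymbol} are deliberately simplified stand-ins, not the compiler's actual (C++) representations:
\begin{Verbatim}
typealias Term = [String]  // a term, as a sequence of printed symbols

indirect enum SwiftType {
    case nominal(name: String, arguments: [SwiftType])
    case typeParameter(Term)          // a lowered type parameter
    case phantomParameter(index: Int) // sigma_i, an index into the list
}

struct ConcreteTypeSymbol {
    var type: SwiftType
    var substitutions: [Term]
}

// Replace each type parameter in the input type with a phantom
// parameter sigma_j, collecting the lowered terms off to the side.
func makeConcreteTypeSymbol(_ type: SwiftType) -> ConcreteTypeSymbol {
    var substitutions: [Term] = []
    func visit(_ t: SwiftType) -> SwiftType {
        switch t {
        case .typeParameter(let term):
            substitutions.append(term)
            return .phantomParameter(index: substitutions.count - 1)
        case .nominal(let name, let arguments):
            return .nominal(name: name, arguments: arguments.map(visit))
        case .phantomParameter:
            return t
        }
    }
    let replaced = visit(type)
    return ConcreteTypeSymbol(type: replaced, substitutions: substitutions)
}

// The concrete type adjustment: "re-root" the symbol by prepending
// a prefix term to every substitution.
func adjust(_ symbol: ConcreteTypeSymbol,
            byPrepending prefix: Term) -> ConcreteTypeSymbol {
    var adjusted = symbol
    adjusted.substitutions = adjusted.substitutions.map { prefix + $0 }
    return adjusted
}
\end{Verbatim}
The same shape works for superclass symbols; only the symbol kind differs.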
\begin{example}
Consider the generic signature of class $\namesym{C}$ from Listing~\ref{overlapconcreteex}:
\begin{listing}\captionabove{Example with overlap involving concrete type term}\label{overlapconcreteex}
\begin{Verbatim}
struct G<T> {}

protocol S {
  associatedtype E
}

protocol P {
  associatedtype T
  associatedtype U where U == G<V>
  associatedtype V
}

class C<X> where X : S, X.E : P, X.E.U == X.E.T {}
\end{Verbatim}
\end{listing}
\begin{align*}
\gensig{\genericsym{0}{0}}{&\genericsym{0}{0}\colon\proto{S},\\
&\genericsym{0}{0}.\namesym{E}\colon\proto{P},\\
&\genericsym{0}{0}.\namesym{E}.\namesym{U}==\genericsym{0}{0}.\namesym{E}.\namesym{T}}
\end{align*}
The relevant subset of this generic signature's rewrite rules:
\begin{align}
%&\protosym{P}.\namesym{T}&\Rightarrow\;&\assocsym{P}{T}\\
%&\protosym{P}.\namesym{U}&\Rightarrow\;&\assocsym{P}{U}\\
%&\protosym{P}.\namesym{V}&\Rightarrow\;&\assocsym{P}{V}\\
&\assocsym{P}{U}.\concretesym{\namesym{G}\langle\sigma_0\rangle;\;\sigma_0:=\assocsym{P}{V}}&\Rightarrow\;&\assocsym{P}{U}\tag{Rule 1}\\
&\genericsym{0}{0}.\protosym{S}&\Rightarrow\;&\genericsym{0}{0}\tag{Rule 2}\\
&\genericsym{0}{0}.\assocsym{S}{E}.\protosym{P}&\Rightarrow\;&\genericsym{0}{0}.\assocsym{S}{E}\tag{Rule 3}\\
&\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{U}&\Rightarrow\;&\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}\tag{Rule 4}
\end{align}
Observe that Rule 4 overlaps with Rule 1. The prefix $\genericsym{0}{0}.\assocsym{S}{E}$ must be prepended to the substitution $\sigma_0$ in the concrete type symbol when computing the critical pair:
\begin{align*}
t_0&=\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}.\concretesym{\namesym{G}\langle\sigma_0\rangle;\;\sigma_0:=\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{V}}\\
t_1&=\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{U}
\end{align*}
Now, $t_0$ cannot be reduced further, whereas Rule 4 reduces $t_1$ to $\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}$. This means that resolving the critical pair introduces the new rewrite rule:
\[\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}.\concretesym{\namesym{G}\langle\sigma_0\rangle;\;\sigma_0:=\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{V}}\Rightarrow\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}.
\]
Intuitively, the completion process began with the fact that
\[\assocsym{P}{U}==\namesym{G}\langle\assocsym{P}{V}\rangle,\]
and derived that
\[\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}==\namesym{G}\langle\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{V}\rangle.\]
Adjusting the concrete type symbol by prepending the prefix $\genericsym{0}{0}.\assocsym{S}{E}$ to the substitution $\sigma_0$ appearing in the left-hand side of Rule 1 ``re-rooted'' the concrete type, giving the correct result shown above. Without the adjustment, we would instead have derived the fact
\[\genericsym{0}{0}.\assocsym{S}{E}.\assocsym{P}{T}==\namesym{G}\langle\assocsym{P}{V}\rangle,\]
which does not make sense.
\end{example}
The concrete type adjustment actually comes up again in the next chapter, during property map construction (Algorithm~\ref{propmapconsalgo}) and lookup (Algorithm~\ref{propmaplookupalgo}).
\fi

\chapter{Property Map}\label{propertymap}

\ifWIP
If we already have some way to compute reduced type parameters, we can define what it means to compute a reduced type for an arbitrary type containing type parameters, as follows.
\begin{algorithm}[Computing a reduced type]\label{reducedtypealgo}
As input, takes a type \texttt{T}.
Outputs the reduced type of \texttt{T}. \begin{enumerate} \item If \texttt{T} does not contain any type parameters, it is already reduced. Return \texttt{T}. \item If \texttt{T} is a type parameter but fixed to a concrete type, replace \texttt{T} with its concrete type and continue to Step~3. \item If \texttt{T} is not a type parameter, transform \texttt{T} by recursively replacing any type parameters appearing in \texttt{T} with their reduced types, and return the transformed type. \item The final possibility is that \texttt{T} is a type parameter, not fixed to a concrete type. The reduced type of \texttt{T} is the smallest type parameter in its equivalence class. Return this type parameter. \end{enumerate} \end{algorithm} Until now, you've seen how to solve the \texttt{requiresProtocol()} generic signature query. If $T$ is a type term, then the type parameter represented by $T$ conforms to a protocol $\proto{P}$ if $T$ and $T.\protosym{P}$ both reduce to the same canonical form ${T}{\downarrow}$. The next step is to solve more general queries, such as \texttt{getRequiredProtocols()}. Here, you want to find \emph{all} protocol symbols $\protosym{P}$ such that $T.\protosym{P}$ and $T$ reduce to some ${T}{\downarrow}$. \index{layout requirement} \index{conformance requirement} \index{concrete type requirement} \index{property-like symbol} One potential implementation would use exhaustive enumeration. A rewrite system's rules only mention a finite set of protocol symbols, so it would be enough to suffix a type term with every known protocol symbol and attempt to reduce the result. While this shows that the query is implementable, it is an unsatisfying solution. The approach I'm going to outline below is more efficient, and also more generally useful with generic signature queries involving layout, superclass and concrete type requirements as well. \begin{definition} If $T$ and $U$ are terms and there is some term $Z$ such that $T\rightarrow Z$ and $U\rightarrow Z$, then $T$ and $U$ are said to be \emph{joinable}, and this is written as $T\downarrow U$. \end{definition} \begin{definition} If $\Pi$ is a property-like symbol and $T$ is a term, then $T$ \emph{satisfies} $\Pi$ if $T.\Pi\downarrow T$. The set of properties satisfied by $T$ is defined as the set of all symbols $\Pi$ such that $T.\Pi\downarrow T$. \end{definition} \begin{theorem}\label{propertymaptheorem} If $T$ is a type term with canonical form ${T}{\downarrow}$, $\Pi$ is a property-like symbol, and $T$ satisfies $\Pi$, then ${T}{\downarrow}.\Pi\rightarrow{T}{\downarrow}$. Furthermore, this reduction sequence consists of a single rule $V.\Pi\Rightarrow V$, for some non-empty suffix $V$ of ${T}{\downarrow}$. \end{theorem} \begin{proof} Since $T$ can be reduced to ${T}{\downarrow}$, the same reduction sequence when applied to $T.\Pi$ will produce ${T}{\downarrow}.\Pi$. This means that $T.\Pi$ can be reduced to both ${T}{\downarrow}$ (by assumption), and ${T}{\downarrow}.\Pi$. By confluence, ${T}{\downarrow}.\Pi$ must reduce to ${T}{\downarrow}$. Since ${T}{\downarrow}$ is canonical, ${T}{\downarrow}.\Pi$ cannot be reduced further except with a rewrite rule of the form $V.\Pi\Rightarrow V'$, where ${T}{\downarrow}=UV$, for terms $U$, $V$ and $V'$. It remains to show that $V=V'$. (TODO: This needs an additional assumption about conformance-valid rules.) 
\end{proof}
By Theorem~\ref{propertymaptheorem}, the properties satisfied by a type term can be discovered by considering all non-empty suffixes of ${T}{\downarrow}$, and collecting rewrite rules of the form $V.\Pi\Rightarrow V$ where $\Pi$ is some property-like symbol.
\begin{listing}\captionabove{Motivating example for property map}\label{propmaplisting1}
\begin{Verbatim}
protocol P1 {}
protocol P2 {}

protocol P3 {
  associatedtype T : P1
  associatedtype U : P2
}

protocol P4 {
  associatedtype A : P3 where A.T == A.U
  associatedtype B : P3
}
\end{Verbatim}
\end{listing}
\begin{example}\label{propmapexample1}
Consider the protocol definitions in Listing~\ref{propmaplisting1}. These definitions are used in a couple of examples below, so let's look at the constructed rewrite system first. Protocols $\proto{P1}$ and $\proto{P2}$ do not define any associated types or requirements, so they do not contribute any initial rewrite rules. Protocol $\proto{P3}$ has two associated types $\namesym{T}$ and $\namesym{U}$ conforming to $\proto{P1}$ and $\proto{P2}$ respectively, so a pair of rules introduces the associated types, and another pair imposes the conformance requirements:
\begin{align}
\protosym{P3}.\namesym{T}&\Rightarrow\assocsym{P3}{T}\tag{1}\\
\protosym{P3}.\namesym{U}&\Rightarrow\assocsym{P3}{U}\tag{2}\\
\assocsym{P3}{T}.\protosym{P1}&\Rightarrow\assocsym{P3}{T}\tag{3}\\
\assocsym{P3}{U}.\protosym{P2}&\Rightarrow\assocsym{P3}{U}\tag{4}
\end{align}
Protocol $\proto{P4}$ adds five additional rules. A pair of rules introduces the associated types $\namesym{A}$ and $\namesym{B}$. Next, both associated types conform to $\proto{P3}$, and $\namesym{A}$ has a same-type requirement between its nested types $\namesym{T}$ and $\namesym{U}$:
\begin{align}
\protosym{P4}.\namesym{A}&\Rightarrow\assocsym{P4}{A}\tag{5}\\
\protosym{P4}.\namesym{B}&\Rightarrow\assocsym{P4}{B}\tag{6}\\
\assocsym{P4}{A}.\protosym{P3}&\Rightarrow\assocsym{P4}{A}\tag{7}\\
\assocsym{P4}{B}.\protosym{P3}&\Rightarrow\assocsym{P4}{B}\tag{8}\\
\assocsym{P4}{A}.\assocsym{P3}{U}&\Rightarrow\assocsym{P4}{A}.\assocsym{P3}{T}\tag{9}
\end{align}
When applied to the above initial rewrite system, the Knuth-Bendix algorithm adds a handful of new rules to resolve critical pairs. First, there are four overlaps between the conformance requirements of $\proto{P4}$ and the associated type introduction rules of $\proto{P3}$:
\begin{align}
\assocsym{P4}{A}.\namesym{T}&\Rightarrow\assocsym{P4}{A}.\assocsym{P3}{T}\tag{10}\\
\assocsym{P4}{A}.\namesym{U}&\Rightarrow\assocsym{P4}{A}.\assocsym{P3}{T}\tag{11}\\
\assocsym{P4}{B}.\namesym{T}&\Rightarrow\assocsym{P4}{B}.\assocsym{P3}{T}\tag{12}\\
\assocsym{P4}{B}.\namesym{U}&\Rightarrow\assocsym{P4}{B}.\assocsym{P3}{U}\tag{13}
\end{align}
Finally, there is an overlap between Rule~9 and Rule~4:
\begin{align}
\assocsym{P4}{A}.\assocsym{P3}{T}.\protosym{P2}&\Rightarrow\assocsym{P4}{A}.\assocsym{P3}{T}\tag{14}
\end{align}
Consider the type parameter $\genericparam{Self}.\namesym{A}.\namesym{U}$ in the generic signature of $\proto{P4}$. This type parameter is equivalent to $\genericparam{Self}.\namesym{A}.\namesym{T}$ via the same-type requirement in $\proto{P4}$. The associated type $\namesym{T}$ of $\proto{P3}$ conforms to $\proto{P1}$, and $\namesym{U}$ conforms to $\proto{P2}$. This means that $\genericparam{Self}.\namesym{A}.\namesym{U}$ conforms to \emph{both} $\proto{P1}$ and $\proto{P2}$. Let's see how this fact can be derived from the rewrite system.
Applying $\Lambda_{\proto{P4}}$ to $\genericparam{Self}.\namesym{A}.\namesym{U}$ produces the type term $\assocsym{P4}{A}.\assocsym{P3}{U}$. This type term can be reduced to the canonical term $\assocsym{P4}{A}.\assocsym{P3}{T}$ with a single application of Rule~9. By the result in Theorem~\ref{propertymaptheorem}, it suffices to look at rules of the form $V.\Pi\Rightarrow V$, where $V$ is some suffix of $\assocsym{P4}{A}.\assocsym{P3}{T}$. There are two such rules:
\begin{enumerate}
\item Rule~3, which says that $\assocsym{P3}{T}$ conforms to $\proto{P1}$.
\item Rule~14, which says that $\assocsym{P4}{A}.\assocsym{P3}{T}$ conforms to $\proto{P2}$.
\end{enumerate}
This shows that the set of properties satisfied by the type parameter $\genericparam{Self}.\namesym{A}.\namesym{U}$ is exactly $\{\protosym{P1},\protosym{P2}\}$.
\end{example}

The above example might suggest that looking up the set of properties satisfied by a type parameter requires iterating over the list of rewrite rules, but in reality, it is possible to construct a multi-map of pairs $(V, \Pi)$ once, after the completion procedure ends. As you saw in the example, a type term can satisfy multiple properties via different suffixes. For the material presented in Section~\ref{moreconcretetypes}, it is convenient to avoid having to take the union of sets in the lookup path. For this reason, the construction algorithm explicitly ``inherits'' the symbols associated with a term $V$ when adding an entry for a term $UV$ that has $V$ as a suffix. As a result, the lookup algorithm only has to find the longest suffix that appears in the multi-map to collect all properties satisfied by a term. The multi-map construction and lookup can be formalized in a pair of algorithms.
\begin{algorithm}[Property map construction]\label{propmapconsalgo}
This algorithm runs after the completion procedure has constructed a confluent rewrite system with simplified right-hand sides. As output, it produces a multi-map mapping terms to sets of symbols.
\begin{enumerate}
\item Initialize $S$ to the list of all rewrite rules of the form $V.\Pi\Rightarrow V$.
\item Initialize $P$ to a multi-map mapping terms to sets of symbols, initially empty.
\item Sort $S$ in ascending order by the lengths of the rewrite rules' left-hand sides. The relative order of rules whose left-hand sides have the same length is irrelevant.
\item For each rule $V.\Pi\Rightarrow V\in S$,
\begin{enumerate}
\item If $V\notin P$, initialize $P[V]$ first as follows. If $P$ contains some $V''$ where $V=V'V''$, copy the symbols from $P[V'']$ to $P[V]$. When copying superclass or concrete type symbols, the substitution terms inside the symbol must be adjusted by prepending $V'$.

This is the point where the algorithm relies on the sorting of rules done in Step~3. Since $|V''|<|V|$, all rules of the form $V''.\Pi\Rightarrow V''$ have already been visited by the time the algorithm can encounter a rule involving $V$. In fact, since the map is constructed in bottom-up order, it suffices to only check the \emph{longest} suffix $V''$ of $V$ such that $V''\in P$.
\item Insert $\Pi$ in $P[V]$.
\end{enumerate}
\end{enumerate}
\end{algorithm}
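Here is a hedged sketch of this construction in Swift, together with the longest-suffix lookup formalized in the next algorithm. Terms are modeled as arrays of symbol strings, and the adjustment of superclass and concrete type substitutions is elided:
\begin{Verbatim}
typealias Term = [String]

// A property-like rule V.Pi => V, split into its subject term V
// and its trailing property-like symbol Pi.
struct PropertyRule {
    var subject: Term
    var property: String
}

func buildPropertyMap(_ rules: [PropertyRule]) -> [Term: Set<String>] {
    var map: [Term: Set<String>] = [:]
    // Visit shorter subject terms first, so that an entry for a
    // suffix always exists before the terms that inherit from it.
    for rule in rules.sorted(by: { $0.subject.count < $1.subject.count }) {
        if map[rule.subject] == nil {
            // Initialize from the longest proper suffix already present.
            var suffix = Array(rule.subject.dropFirst())
            while !suffix.isEmpty {
                if let inherited = map[suffix] {
                    map[rule.subject] = inherited
                    break
                }
                suffix.removeFirst()
            }
        }
        map[rule.subject, default: []].insert(rule.property)
    }
    return map
}

// Lookup: the entry for the longest suffix of a canonical term
// already records every property satisfied by the term.
func properties(of term: Term, in map: [Term: Set<String>]) -> Set<String> {
    for start in term.indices {
        if let entry = map[Array(term[start...])] {
            return entry
        }
    }
    return []
}
\end{Verbatim}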
Once the property map has been built, lookup is very simple.
\begin{algorithm}[Property map lookup]\label{propmaplookupalgo}
Given a type parameter $T$ and a property map $P$, this algorithm outputs the set of properties satisfied by $T$.
\begin{enumerate}
\item First, lower $T$ to a type term $\Lambda(T)$, and reduce this term to canonical form $\Lambda(T){\downarrow}$.
\item If no suffix of $\Lambda(T){\downarrow}$ appears in $P$, return the empty set.
\item Otherwise, let $\Lambda(T){\downarrow}:=UV$, where $V$ is the longest suffix of $\Lambda(T){\downarrow}$ appearing in $P$.
\item Let $S:=P[V]$, the set of property symbols associated with $V$ in $P$.
\item For each superclass or concrete type symbol $\Pi\in S$, prepend $U$ to every substitution term inside the symbol. Return the adjusted set $S$.
\end{enumerate}
\end{algorithm}
Notice how in both algorithms, superclass and concrete type symbols are adjusted by prepending a prefix to each substitution. This is the same operation as described in Section~\ref{concretetypeadjust}.
\begin{example}\label{propmapexample2}
Recall Example~\ref{propmapexample1}, where a rewrite system was constructed from Listing~\ref{propmaplisting1}. The complete rewrite system contains five rewrite rules of the form $V.\Pi\Rightarrow V$:
\begin{enumerate}
\item Rule~3 and Rule~4, which state that the associated types $\namesym{T}$ and $\namesym{U}$ of $\proto{P3}$ conform to $\proto{P1}$ and $\proto{P2}$, respectively.
\item Rule~7 and Rule~8, which state that the associated types $\namesym{A}$ and $\namesym{B}$ of $\proto{P4}$ both conform to $\proto{P3}$.
\item Rule~14, which states that the nested type $\genericparam{Self}.\namesym{A}.\namesym{T}$ of $\proto{P4}$ also conforms to $\proto{P2}$.
\end{enumerate}
The property map constructed by Algorithm~\ref{propmapconsalgo} from the above rules is shown in Table~\ref{propmapexample2table}.
\end{example}
\begin{table}\captionabove{Property map constructed from Example~\ref{propmapexample2}}\label{propmapexample2table}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Key&Values\\
\hline
\hline
$\assocsym{P3}{T}$&$\protosym{P1}$\\
$\assocsym{P3}{U}$&$\protosym{P2}$\\
$\assocsym{P4}{A}$&$\protosym{P3}$\\
$\assocsym{P4}{B}$&$\protosym{P3}$\\
$\assocsym{P4}{A}.\assocsym{P3}{T}$&$\protosym{P1}$, $\protosym{P2}$\\
\hline
\end{tabular}
\end{center}
\end{table}
\begin{example}\label{propmapexample3}
The second example explores layout, superclass and concrete type requirements.
Consider the protocol definitions in Listing~\ref{propmaplisting} together with the generic signature:
\[\gensig{\genericsym{0}{0}}{\genericsym{0}{0}\colon\proto{P}, \genericsym{0}{0}.\namesym{B}\colon\proto{Q}}\]
The three protocols $\proto{R}$, $\proto{Q}$ and $\proto{P}$ together with the generic signature generate the following initial rewrite rules:
\begin{align*}
\protosym{Q}.\protosym{R}&\Rightarrow\protosym{Q}\tag{1}\\
\protosym{P}.\namesym{A}&\Rightarrow\assocsym{P}{A}\tag{2}\\
\protosym{P}.\namesym{B}&\Rightarrow\assocsym{P}{B}\tag{3}\\
\protosym{P}.\namesym{C}&\Rightarrow\assocsym{P}{C}\tag{4}\\
\assocsym{P}{A}.\layoutsym{AnyObject}&\Rightarrow\assocsym{P}{A}\tag{5}\\
\assocsym{P}{B}.\supersym{\namesym{Cache}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}&\Rightarrow\assocsym{P}{B}\tag{6}\\
\assocsym{P}{B}.\layoutsym{\_NativeClass}&\Rightarrow\assocsym{P}{B}\tag{7}\\
\assocsym{P}{C}.\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}&\Rightarrow\assocsym{P}{C}\tag{8}\\
\genericsym{0}{0}.\protosym{P}&\Rightarrow\genericsym{0}{0}\tag{9}\\
\genericsym{0}{0}.\assocsym{P}{B}.\protosym{Q}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{B}\tag{10}
\end{align*}
The Knuth-Bendix algorithm adds the following rules to make the rewrite system confluent:
\begin{align*}
\genericsym{0}{0}.\namesym{A}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{A}\tag{11}\\
\genericsym{0}{0}.\namesym{B}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{B}\tag{12}\\
\genericsym{0}{0}.\namesym{C}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{C}\tag{13}\\
\genericsym{0}{0}.\assocsym{P}{B}.\protosym{R}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{B}\tag{14}
\end{align*}
\begin{listing}\captionabove{Protocol with concrete type requirements}\label{propmaplisting}
\begin{Verbatim}
class Cache<T> {}

protocol R {}
protocol Q : R {}

protocol P {
  associatedtype A : AnyObject
  associatedtype B : Cache<A>
  associatedtype C where C == Array<A>
}
\end{Verbatim}
\end{listing}
The following rewrite rules take the form $V.\Pi\Rightarrow V$, where $\Pi$ is a property-like symbol:
\begin{enumerate}
\item Rule~1, which states that protocol $\proto{Q}$ inherits from $\proto{R}$.
\item Rule~5, which states that the associated type $\namesym{A}$ in protocol $\proto{P}$ satisfies the $\namesym{AnyObject}$ layout constraint.
\item Rule~6, which states that the associated type $\namesym{B}$ in protocol $\proto{P}$ must inherit from $\namesym{Cache}\langle\namesym{A}\rangle$.
\item Rule~7, which states that the associated type $\namesym{B}$ in protocol $\proto{P}$ also satisfies the $\namesym{\_NativeClass}$ layout constraint.
\item Rule~8, which states that the associated type $\namesym{C}$ in protocol $\proto{P}$ is fixed to the concrete type $\namesym{Array}\langle\namesym{A}\rangle$.
\item Rule~9, which states that the generic parameter $\genericsym{0}{0}$ conforms to $\proto{P}$.
\item Rule~10, which states that the type parameter $\genericsym{0}{0}.\namesym{B}$ conforms to $\proto{Q}$.
\item Rule~14, which states that the type parameter $\genericsym{0}{0}.\namesym{B}$ conforms to $\proto{R}$. This final rule was added by the completion procedure to resolve the overlap of Rule~10 and Rule~1 on the term $\genericsym{0}{0}.\assocsym{P}{B}.\protosym{Q}.\protosym{R}$.
\end{enumerate}
When constructing the property map, sorting the rules by the length of their left-hand sides guarantees that Rule~6 and Rule~7 are processed before Rule~10 and Rule~14.
This is important because the subject type of Rule~6 and Rule~7 ($\assocsym{P}{B}$) is a suffix of the subject type of Rule~10 and Rule~14 ($\genericsym{0}{0}.\assocsym{P}{B}$), which means that the property map entry for $\genericsym{0}{0}.\assocsym{P}{B}$ inherits the superclass and layout requirements recorded by Rule~6 and Rule~7. Furthermore, the substitution $\sigma_0:=\assocsym{P}{A}$ in the superclass requirement is adjusted by prepending the prefix $\genericsym{0}{0}$. The property map constructed by Algorithm~\ref{propmapconsalgo} from the above rules is shown in Table~\ref{propmapexample3table}. In the next section, you will see how this example property map can solve generic signature queries.
\begin{table}\captionabove{Property map constructed from Example~\ref{propmapexample3}}\label{propmapexample3table}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Key&Values\\
\hline
\hline
$\protosym{Q}$&$\protosym{R}$\\
$\assocsym{P}{A}$&$\layoutsym{AnyObject}$\\
$\assocsym{P}{B}$&$\supersym{\namesym{Cache}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}$, $\layoutsym{\_NativeClass}$\\
$\assocsym{P}{C}$&$\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}$\\
$\genericsym{0}{0}$&$\protosym{P}$\\
$\genericsym{0}{0}.\assocsym{P}{B}$&$\protosym{Q}$, $\protosym{R}$, $\supersym{\namesym{Cache}\langle\sigma_0\rangle;\,\sigma_0:=\genericsym{0}{0}.\assocsym{P}{A}}$, $\layoutsym{\_NativeClass}$\\
\hline
\end{tabular}
\end{center}
\end{table}
\end{example}

\section{Generic Signature Queries}\label{implqueries}
Recall the categorization of generic signature queries into predicates, properties and canonical type queries previously shown in Section~\ref{intqueries}. The predicates can be implemented in a straightforward manner using the property map. Each predicate takes a subject type parameter $T$. Generic signature queries are always posed relative to a generic signature, and not a protocol requirement signature, hence the type parameter $T$ is lowered with the generic signature type lowering map $\Lambda\colon\namesym{Type}\rightarrow\namesym{Term}$ (Algorithm~\ref{lowertypeinsig}) and not a protocol type lowering map $\Lambda_{\proto{P}}\colon\namesym{Type}\rightarrow\namesym{Term}$ for some protocol $\proto{P}$ (Algorithm~\ref{lowertypeinproto}). The first step is to look up the set of properties satisfied by $T$ using Algorithm~\ref{propmaplookupalgo}. Then, each predicate can be tested as follows:
\begin{description}
\item[\texttt{requiresProtocol()}] A type parameter $T$ conforms to a protocol $\proto{P}$ if the property map entry for some suffix of $T$ stores the protocol symbol $\protosym{P}$.
\index{layout constraints}
\index{join of layout constraints}
\item[\texttt{requiresClass()}] A type parameter $T$ is represented as a retainable pointer if the property map entry for some suffix of $T$ stores a layout symbol $L$ such that $L\leq\layoutsym{AnyObject}$.

The order relation comes into play because there exist layout symbols which further refine $\layoutsym{AnyObject}$, for example $\layoutsym{\_NativeClass}$, so it is not enough to check for a layout symbol exactly equal to $\layoutsym{AnyObject}$. Given two layout symbols $A$ and $B$, $A\wedge B$ is the most general symbol that satisfies both $A$ and $B$. The two elements are ordered $A\leq B$ if $A=A\wedge B$.
\item[\texttt{isConcreteType()}] A type parameter $T$ is fixed to a concrete type if the property map entry for some suffix of $T$ stores a concrete type symbol.
\end{description}
Layout symbols store a layout constraint as an instance of the \texttt{LayoutConstraint} class. The join operation used in the implementation of the \texttt{requiresClass()} query is defined in the \texttt{merge()} method on \texttt{LayoutConstraint}.

You've already seen the \texttt{requiresProtocol()} query in Chapter~\ref{protocolsasmonoids}, where it was shown that it can be implemented by checking if $\Lambda(T).\protosym{P}\downarrow\Lambda(T)$. The property map implementation is perhaps slightly more efficient, since it only simplifies a single term and not two. The \texttt{requiresClass()} and \texttt{isConcreteType()} queries, on the other hand, are new, and demonstrate the power of the property map. With the rewrite system alone, they cannot be implemented except by exhaustive enumeration over all known layout and concrete type symbols.

All of the subsequent examples reference the protocol definitions from Example~\ref{propmapexample3}, and the resulting property map shown in Table~\ref{propmapexample3table}.
\begin{example}
Consider the canonical type term $\genericsym{0}{0}.\assocsym{P}{B}$. This type parameter conforms to $\proto{Q}$ via a requirement stated in the generic signature, and also to $\proto{R}$, because $\proto{Q}$ inherits from $\proto{R}$. The \texttt{requiresProtocol()} query will confirm these facts, because the property map entry for $\genericsym{0}{0}.\assocsym{P}{B}$ contains the protocol symbols $\protosym{Q}$ and $\protosym{R}$:
\begin{enumerate}
\item The conformance to $\proto{Q}$ is witnessed by the rewrite rule $\genericsym{0}{0}.\assocsym{P}{B}.\protosym{Q}\Rightarrow \genericsym{0}{0}.\assocsym{P}{B}$, which is Rule~10 in Example~\ref{propmapexample3}. This is the initial rule generated by the conformance requirement.
\item The conformance to $\proto{R}$ is witnessed by the rewrite rule $\genericsym{0}{0}.\assocsym{P}{B}.\protosym{R}\Rightarrow \genericsym{0}{0}.\assocsym{P}{B}$, which is Rule~14 in Example~\ref{propmapexample3}. This rule was added by the completion procedure to resolve the overlap between Rule~10 above, which states that $\genericsym{0}{0}.\assocsym{P}{B}$ conforms to $\proto{Q}$, and Rule~1, which states that anything conforming to $\proto{Q}$ also conforms to $\proto{R}$.
\end{enumerate}
\end{example}
\begin{example}
This example shows the \texttt{requiresClass()} query on two different type terms. First, consider the canonical type term $\genericsym{0}{0}.\assocsym{P}{A}$. The query returns true, because the longest suffix with an entry in the property map is $\assocsym{P}{A}$, which stores a single symbol, $\layoutsym{AnyObject}$. The corresponding rewrite rule is $\assocsym{P}{A}.\layoutsym{AnyObject}\Rightarrow\assocsym{P}{A}$, or Rule~5 in Example~\ref{propmapexample3}. This is the initial rule generated by the $\namesym{A}\colon\namesym{AnyObject}$ layout requirement in protocol $\proto{P}$.

Now, consider the canonical type term $\genericsym{0}{0}.\assocsym{P}{B}$. The query also returns true. Here, the longest suffix is the entire type term, because the property map stores an entry for $\genericsym{0}{0}.\assocsym{P}{B}$ with layout symbol $\layoutsym{\_NativeClass}$. This symbol satisfies
\[\layoutsym{\_NativeClass}\leq\layoutsym{AnyObject},\]
because
\[\layoutsym{\_NativeClass}\wedge \layoutsym{AnyObject}=\layoutsym{\_NativeClass}.\]
\end{example}
\begin{example}
The final predicate is the \texttt{isConcreteType()} query. Consider the canonical type term $\genericsym{0}{0}.\assocsym{P}{C}$.
\begin{example}
The final predicate is the \texttt{isConcreteType()} query. Consider the canonical type term $\genericsym{0}{0}.\assocsym{P}{C}$. The longest suffix that appears in the property map is $\assocsym{P}{C}$. This entry stores the concrete type symbol $\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}$, and so the query returns true.
\end{example}

Next, I will describe the generic signature queries that return properties of type parameters, but this requires building up a little more machinery first. The first step is to define the invariants satisfied by the list of protocols returned by the \texttt{getRequiredProtocols()} query.
\begin{definition}\label{minimalproto}
A list of protocols $\{\proto{P}_i\}$ is \emph{minimal} if no protocol inherits from any other protocol in the list; that is, there do not exist $i, j\in\mathbb{N}$ such that $i\neq j$ and $\proto{P}_i$ inherits from $\proto{P}_j$. The list is \emph{canonical} if it is sorted in canonical protocol order. A minimal canonical list of protocols can be constructed from an arbitrary list of protocols $P=\{\proto{P}_1,\ldots,\proto{P}_n\}$ via the following algorithm:
\begin{enumerate}
\item Let $G=(V, E)$ be the directed acyclic graph\footnote{Invalid code seen by the type checker can have circular protocol inheritance. The ``request evaluator'' framework in the compiler handles cycle breaking in a principled manner, so the requirement machine does not have to deal with this explicitly.} where $V$ is the set of all protocols, and an edge in $E$ connects $\proto{P}\in V$ to $\proto{Q}\in V$ if $\proto{P}$ inherits from $\proto{Q}$.
\item Construct the subgraph $H\subseteq G$ generated by $P$.
\item Compute the set of root nodes of $H$ (that is, the nodes with no incoming edges, or zero in-degree) to obtain the minimal set of protocols of $P$.
\item Sort the elements of this set using the canonical protocol order (Definition~\ref{canonicalprotocol}) to obtain the final minimal canonical list of protocols from $P$.
\end{enumerate}
\end{definition}
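The graph construction above amounts to a direct check: a protocol is redundant exactly when some other protocol in the list inherits from it. The following is a minimal self-contained sketch, where \texttt{inherits(p, q)} is an assumed callback answering transitive protocol inheritance, and sorting by name stands in for the canonical protocol order (Definition~\ref{canonicalprotocol}).
\begin{Verbatim}
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Build a minimal canonical list: drop every protocol that some other
// protocol in the list (transitively) inherits from, then sort.
std::vector<std::string> minimalCanonical(
    const std::vector<std::string> &protos,
    const std::function<bool(const std::string &,
                             const std::string &)> &inherits) {
  std::vector<std::string> result;
  for (const auto &p : protos) {
    bool redundant = false;
    for (const auto &q : protos)
      if (q != p && inherits(q, p))
        redundant = true;  // q inherits from p, so p is implied by q
    if (!redundant)
      result.push_back(p);
  }
  // Sorting by name stands in for the canonical protocol order.
  std::sort(result.begin(), result.end());
  return result;
}
\end{Verbatim}
In the \texttt{getRequiredProtocols()} example below, this minimization reduces $\{\proto{Q},\proto{R}\}$ to $\{\proto{Q}\}$, since $\proto{Q}$ inherits from $\proto{R}$.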
The second step is to define a mapping from type terms to Swift type parameters, for use by the \texttt{getSuperclassBound()} and \texttt{getConcreteType()} queries when mapping substitutions back to Swift types.
\begin{algorithm}
The type lifting map $\Lambda^{-1}\colon\namesym{Term}\rightarrow\namesym{Type}$ takes as input a type term $T$ and maps it back to a Swift type parameter. This is the inverse of the type lowering map $\Lambda\colon\namesym{Type}\rightarrow\namesym{Term}$ (Definition~\ref{lowertypeinsig}).
\begin{enumerate}
\item Initialize $S$ to an empty type parameter.
\item The first symbol of $T$ must be a generic parameter symbol $\tau_{d,i}$, which is mapped to a \texttt{GenericTypeParamType} with depth $d$ and index $i$. Set $S$ to this type.
\item Any subsequent symbol in $T$ must be some associated type symbol $[\proto{P}_1\cap\ldots\cap\proto{P}_n\colon\namesym{A}]$. This symbol is mapped to a \texttt{DependentMemberType} whose base type is the previous value of $S$, and the associated type declaration is found as follows:
\begin{enumerate}
\item For each $\proto{P}_i$, either $\proto{P}_i$ directly defines an associated type named $\namesym{A}$, or $\namesym{A}$ was declared in some protocol $\proto{Q}$ such that $\proto{P}_i$ inherits from $\proto{Q}$. In both cases, collect all associated type declarations in a list.
\item If any associated type found above is a non-root associated type declaration, replace it with its anchor (Definition~\ref{rootassoctypedef}).
\item Pick the associated type declaration from the above set that is minimal with respect to the associated type order (Definition~\ref{canonicaltypeorder}).
\end{enumerate}
\end{enumerate}
\end{algorithm}
The third and final step before the queries themselves can be presented is the algorithm for mapping a superclass or concrete type symbol back to a Swift type. This algorithm uses the above type lifting map on type parameters appearing in substitutions.
\begin{algorithm}[Constructing a concrete type from a symbol]\label{symboltotype}
As input, this algorithm takes a superclass symbol $\supersym{\namesym{T};\,\sigma_0,\ldots,\sigma_n}$ or concrete type symbol $\concretesym{\namesym{T};\,\sigma_0,\ldots,\sigma_n}$. This is the inverse of Algorithm~\ref{concretesymbolcons}.
\begin{enumerate}
\item Let $\pi_0,\ldots,\pi_n$ be the set of positions such that $\namesym{T}|_{\pi_i}$ is a \texttt{GenericTypeParamType} with index $i$.
\item For each $i$, replace $\namesym{T}|_{\pi_i}$ with $\Lambda^{-1}(\sigma_i)$, the type parameter obtained by applying the lifting map to $\sigma_i$.
\item Return the final value of the type $\namesym{T}$ after performing all substitutions above.
\end{enumerate}
\end{algorithm}
Now, the time has finally come to describe the implementation of the four property queries.
\begin{description}
\item[\texttt{getRequiredProtocols()}] The list of protocol requirements satisfied by a type parameter $T$ is recorded in the form of protocol symbols in the property map. This list is transformed into a minimal canonical list of protocols using Definition~\ref{minimalproto}.
\index{layout constraints}
\index{join of layout constraints}
\item[\texttt{getLayoutConstraint()}] A type parameter $T$ might be subject to multiple layout constraints, in which case the property map entry will store a list of layout constraints $L_1,\ldots,L_n$. This query needs to compute their join, which is the largest layout constraint that simultaneously satisfies all of them:
\[L_1\wedge\cdots\wedge L_n.\]
Some layout constraints are disjoint on concrete types, meaning their join is the uninhabited ``bottom'' layout constraint, which precedes all other layout constraints in the partial order. In this case, the original generic signature is said to have conflicting requirements. While such a signature does not violate the requirement machine's invariants, it cannot be satisfied by any valid set of concrete substitutions. Detecting and diagnosing conflicting requirements is discussed later.
\item[\texttt{getSuperclassBound()}] If the type parameter $T$ does not satisfy any superclass requirement, this query returns the empty type. Otherwise, $T$ can be written as $T=UV$, where $V$ is the longest suffix of $T$ present in the property map. Let $\supersym{\namesym{C};\,\sigma_0,\ldots,\sigma_n}$ be a superclass symbol stored in the property map entry for $V$. The first step is to adjust the symbol by prepending $U$ to each substitution $\sigma_i$, to produce the superclass symbol
\[\supersym{\namesym{C};\,U\sigma_0,\ldots,U\sigma_n}.\]
Then, Algorithm~\ref{symboltotype} can be applied to convert the symbol to a Swift type. A sketch of the whole procedure appears after this list.
\item[\texttt{getConcreteType()}] This query is almost identical to \texttt{getSuperclassBound()}; you can replace ``superclass symbol'' with ``concrete type symbol'' above.
\end{description}
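Here is a minimal sketch of \texttt{getSuperclassBound()} in terms of the toy model from the earlier sketches. All of the helper names (\texttt{splitLongestSuffix()}, \texttt{findSuperclassSymbol()}, \texttt{symbolToType()}) are illustrative stand-ins for the machinery described above, not the actual compiler API, and their prototypes are left undefined.
\begin{Verbatim}
#include <optional>
#include <string>
#include <vector>

using Term = std::vector<std::string>;

// Toy model of a superclass (or concrete type) symbol: a class name
// together with a list of substitution terms.
struct Symbol {
  std::string className;
  std::vector<Term> substitutions;
};

// Stand-in for the Swift type representation.
struct Type {};

// Assumed helpers: split T = UV where V is the longest suffix with a
// property map entry; find a superclass symbol in that entry; and
// convert a symbol to a Swift type (Algorithm symboltotype).
bool splitLongestSuffix(const Term &term, Term &prefix, Term &suffix);
std::optional<Symbol> findSuperclassSymbol(const Term &suffix);
Type symbolToType(const Symbol &symbol);

std::optional<Type> getSuperclassBound(const Term &term) {
  Term prefix, suffix;
  if (!splitLongestSuffix(term, prefix, suffix))
    return std::nullopt;  // no property map entry at all

  auto symbol = findSuperclassSymbol(suffix);
  if (!symbol)
    return std::nullopt;  // no superclass requirement on T

  // Adjust the symbol by prepending the prefix U to each
  // substitution term.
  for (auto &subst : symbol->substitutions)
    subst.insert(subst.begin(), prefix.begin(), prefix.end());

  // Convert the adjusted symbol back to a Swift type.
  return symbolToType(*symbol);
}
\end{Verbatim}
In the examples below, the prefix is empty whenever the property map entry is associated with the entire term, in which case the adjustment step has no effect.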
Note how the \texttt{getLayoutConstraint()} query deals with a multiplicity of layout symbols by computing their join, whereas the \texttt{getSuperclassBound()} and \texttt{getConcreteType()} queries just arbitrarily pick one superclass or concrete type symbol. Indeed, in Section~\ref{moreconcretetypes}, you will see that picking one symbol is not always sufficient; a complete implementation must perform joins on superclass and concrete type symbols as well. Furthermore, a situation analogous to the uninhabited layout constraint can arise, where a type parameter is subject to conflicting superclass or concrete type requirements. For now, though, the current formulation is sufficient.

Now, let's look at some examples of the four property queries. Once again, these examples use the property map shown in Table~\ref{propmapexample3table}.
\begin{example}
Consider the computation of the \texttt{getRequiredProtocols()} query on the canonical type term $\genericsym{0}{0}.\assocsym{P}{B}$. The property map stores the protocol symbols $\{\protosym{Q},\protosym{R}\}$, but $\proto{Q}$ inherits from $\proto{R}$, so the minimal canonical list of protocols is just $\{\proto{Q}\}$.
\end{example}
\begin{example}
Consider the computation of the \texttt{getSuperclassBound()} query on the canonical type term $\genericsym{0}{0}.\assocsym{P}{B}$. The superclass symbol $\supersym{\namesym{Cache}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}$ does not need to be adjusted by prepending a prefix to each substitution term, because the property map entry is associated with the entire term $\genericsym{0}{0}.\assocsym{P}{B}$. Applying Algorithm~\ref{symboltotype} to the superclass symbol produces the Swift type:
\[\namesym{Cache}\langle\genericsym{0}{0}.\namesym{A}\rangle.\]
\end{example}
\begin{example}
Consider the computation of the \texttt{getConcreteType()} query on the canonical type term $\genericsym{0}{0}.\assocsym{P}{C}$. Here, the property map entry is associated with the suffix $\assocsym{P}{C}$, which means an adjustment must be performed on the concrete type symbol $\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{A}}$. The adjusted symbol is
\[\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\genericsym{0}{0}.\assocsym{P}{A}}.\]
Applying Algorithm~\ref{symboltotype} to the adjusted concrete type symbol produces the Swift type:
\[\namesym{Array}\langle\genericsym{0}{0}.\namesym{A}\rangle.\]
\end{example}

\section{Canonical Types}

\index{canonical anchor}
\index{concrete type requirement}
The canonical type queries pull everything together; a sketch of all three appears after the list below.
\begin{description}
\item[\texttt{areSameTypeParametersInContext()}] Two type parameters $T$ and $U$ are equivalent if $\Lambda(T)\downarrow\Lambda(U)$, which is true if and only if $\Lambda(T){\downarrow}=\Lambda(U){\downarrow}$. Note that this query doesn't do anything useful if either $T$ or $U$ is fixed to a concrete type. This is also the one and only generic signature query that is solved with the rewrite system alone, and not the property map; it is presented here for completeness.
\item[\texttt{isCanonicalTypeInContext()}] This query performs a series of checks on a type $T$; if any of them fail, then $T$ is not canonical and false is returned.
There are two cases to consider: $T$ is either a type parameter, or a concrete type (which might contain type parameters in nested positions):
\begin{enumerate}
\item If $T$ is a type parameter, then $T$ is a canonical type if it is both a canonical anchor and not fixed to a concrete type.
\begin{enumerate}
\item Peculiarities of inherited and merged associated types mean that a type $T$ can be a canonical anchor at the \emph{type} level, even if $\Lambda(T)$ is not a canonical \emph{term}. However, there is a weaker condition that relates the two notions of canonical-ness: $T$ is a canonical anchor if and only if applying the type lowering map to $T$, reducing the result, and then finally applying the type lifting map produces $T$:
\[\Lambda^{-1}(\Lambda(T){\downarrow})=T.\]
\item Once a type parameter $T$ is known to be a canonical anchor, checking that the \texttt{isConcreteType()} query returns false is enough to determine that it is a canonical type parameter.
\end{enumerate}
\item Otherwise, $T$ is a concrete type. Let $\pi_0,\ldots,\pi_n$ be the set of positions of $T$ such that $T|_{\pi_i}$ is a type parameter. Then $T$ is canonical if and only if all projections $T|_{\pi_i}$ are canonical type parameters.
\end{enumerate}
\item[\texttt{getCanonicalTypeInContext()}] Once again, $T$ is either a type parameter, or a concrete type. The type parameter case is described first, and the concrete type case is implemented recursively by considering all nested positions that contain type parameters.
\begin{enumerate}
\item If $T$ is a type parameter, the \texttt{isConcreteType()} query will determine if $T$ is fixed to a concrete type or not.
\begin{enumerate}
\item If $T$ is fixed to some concrete type $T'$, the canonical type of $T$ is equal to the canonical type of $T'$. This can be computed by recursively calling \texttt{getCanonicalTypeInContext()} on the result of \texttt{getConcreteType()}.
\item Otherwise, $T$ is not fixed to a concrete type, which means that the canonical type of $T$ is the canonical anchor of $T$. Let $\Lambda(T)$ be the type term corresponding to $T$, and let $\Lambda(T){\downarrow}$ be the canonical form of the term $\Lambda(T)$. The canonical anchor of $T$ is $\Lambda^{-1}(\Lambda(T){\downarrow})$.
\end{enumerate}
\item Otherwise, $T$ is a concrete type. Let $\pi_0,\ldots,\pi_n$ be the set of positions of $T$ such that $T|_{\pi_i}$ is a type parameter. The canonical type of $T$ is the type obtained by substituting the type parameter at each position $\pi_i$ with the result of a recursive call to \texttt{getCanonicalTypeInContext()} on $T|_{\pi_i}$.
\end{enumerate}
\end{description}
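All three canonical type queries can be summarized in code. The sketch below is a toy model: \texttt{lower()}, \texttt{reduce()} and \texttt{lift()} stand in for $\Lambda$, term reduction and $\Lambda^{-1}$ respectively, while the remaining helpers stand in for the queries and the type traversal described above. None of these names are the actual compiler API, and the prototypes are left undefined.
\begin{Verbatim}
#include <functional>

// Toy stand-ins for the term and type representations.
struct Term {};
struct Type {};
bool operator==(const Term &, const Term &);
bool operator==(const Type &, const Type &);

// Assumed helpers standing in for the machinery described above.
Term lower(const Type &);            // the type lowering map
Term reduce(const Term &);           // reduction to canonical form
Type lift(const Term &);             // the type lifting map
bool isTypeParameter(const Type &);
bool isConcreteType(const Type &);   // the isConcreteType() query
Type getConcreteType(const Type &);  // the getConcreteType() query
// Rebuild a type, transforming each nested type parameter.
Type mapTypeParameters(const Type &,
                       const std::function<Type(const Type &)> &);

bool areSameTypeParametersInContext(const Type &t, const Type &u) {
  // Two type parameters are equivalent exactly when their lowered
  // terms have the same canonical form.
  return reduce(lower(t)) == reduce(lower(u));
}

bool isCanonicalTypeInContext(const Type &type) {
  if (isTypeParameter(type)) {
    // A canonical anchor survives the lower-reduce-lift round trip;
    // it must also not be fixed to a concrete type.
    bool isAnchor = lift(reduce(lower(type))) == type;
    return isAnchor && !isConcreteType(type);
  }
  // A concrete type is canonical exactly when every type parameter
  // in nested position is canonical.
  bool canonical = true;
  mapTypeParameters(type, [&](const Type &nested) {
    canonical = canonical && isCanonicalTypeInContext(nested);
    return nested;
  });
  return canonical;
}

Type getCanonicalTypeInContext(const Type &type) {
  if (isTypeParameter(type)) {
    if (isConcreteType(type))
      // Fixed to a concrete type; canonicalize that type instead.
      return getCanonicalTypeInContext(getConcreteType(type));
    // Otherwise, the canonical type is the canonical anchor.
    return lift(reduce(lower(type)));
  }
  // Canonicalize type parameters in nested positions recursively.
  return mapTypeParameters(type, [](const Type &nested) {
    return getCanonicalTypeInContext(nested);
  });
}
\end{Verbatim}
The next example shows why the anchor check must round-trip through the type lifting map, rather than comparing $\Lambda(T)$ against its canonical form directly.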
\begin{example}
This example shows how protocol inheritance leads to a situation where a canonical anchor $T$ lowers to a non-canonical term $\Lambda(T)$. Consider the generic signature $\gensig{\genericsym{0}{0}}{\genericsym{0}{0}\colon\proto{P}}$ with the protocol definitions below:
\begin{Verbatim}
protocol Q {
  associatedtype A
}

protocol P : Q {}
\end{Verbatim}
The rewrite system has two associated type introduction rules, one for the declaration of $\namesym{A}$ in $\proto{Q}$, and another for the inherited type $\namesym{A}$ in $\proto{P}$:
\begin{align}
\protosym{Q}.\namesym{A}&\Rightarrow\assocsym{Q}{A}\tag{1}\\
\protosym{P}.\assocsym{Q}{A}&\Rightarrow \assocsym{P}{A}\tag{2}
\end{align}
The protocol inheritance relationship also introduces a rewrite rule:
\begin{align}
\protosym{P}.\protosym{Q}&\Rightarrow\protosym{P}\tag{3}
\end{align}
Finally, the conformance requirement in the generic signature adds the rewrite rule:
\begin{align}
\genericsym{0}{0}.\protosym{P}&\Rightarrow\genericsym{0}{0}\tag{4}
\end{align}
Resolving critical pairs adds a few additional rules:
\begin{align}
\protosym{P}.\namesym{A}&\Rightarrow\assocsym{P}{A}\tag{5}\\
\genericsym{0}{0}.\protosym{Q}&\Rightarrow\genericsym{0}{0}\tag{6}\\
\genericsym{0}{0}.\assocsym{Q}{A}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{A}\tag{7}\\
\genericsym{0}{0}.\namesym{A}&\Rightarrow\genericsym{0}{0}.\assocsym{P}{A}\tag{8}
\end{align}
Now consider the type parameter $T:=\genericsym{0}{0}.\namesym{A}$. This type parameter is a canonical anchor by Definition~\ref{canonicalanchor}. Since Swift type parameters always point to an actual associated type declaration, the type term $\Lambda(T)$ is $\genericsym{0}{0}.\assocsym{Q}{A}$, and not $\genericsym{0}{0}.\assocsym{P}{A}$. However, $\genericsym{0}{0}.\assocsym{Q}{A}$ is not canonical as a term; it reduces to $\genericsym{0}{0}.\assocsym{P}{A}$ via Rule~7. Therefore, $T$ is a canonical anchor and yet $\Lambda(T)$ is not a canonical term.

Essentially, the term $\genericsym{0}{0}.\assocsym{P}{A}$ is ``more canonical'' than any term that can be output by $\Lambda\colon\namesym{Type}\rightarrow\namesym{Term}$. Protocol $\proto{P}$ does not actually define an associated type named $\namesym{A}$, therefore $\Lambda$ can only construct terms containing the symbol $\assocsym{Q}{A}$, and yet $\assocsym{P}{A}<\assocsym{Q}{A}$. The key invariant here, though, is that $\Lambda^{-1}(\genericsym{0}{0}.\assocsym{Q}{A})=\Lambda^{-1}(\genericsym{0}{0}.\assocsym{P}{A})=T$, or in other words:
\[\Lambda^{-1}(\Lambda(T){\downarrow})=T.\]
A similar situation arises with merged associated type symbols, which are also smaller than any ``real'' associated type symbol output by $\Lambda$. Once again, you can have a canonical type parameter $T$ whose lowered type term $\Lambda(T)$ is not canonical, but just as before, $\Lambda^{-1}$ will map both $\Lambda(T)$ and its canonical form $\Lambda(T){\downarrow}$ back to $T$, because the only possible reduction path from $\Lambda(T)$ to $\Lambda(T){\downarrow}$ introduces merged associated type symbols, and the type lifting map is insensitive to that difference.
\end{example}
\begin{example}
\label{concretecanonicalpropertymapex}
The next example demonstrates canonical type computation in the presence of concrete types. Table~\ref{concretecanonicalpropertymap} shows the property map built from the generic signature:
\[\gensig{\genericsym{0}{0}}{\genericsym{0}{0}\colon\proto{P},\,\genericsym{0}{0}.\namesym{B}==\namesym{Int}},\]
together with the below protocol definition:
\begin{Verbatim}
protocol P {
  associatedtype A where A == Array<B>
  associatedtype B
}
\end{Verbatim}
\begin{table}\captionabove{Property map from Example~\ref{concretecanonicalpropertymapex}}\label{concretecanonicalpropertymap}
\begin{center}
\begin{tabular}{|l|l|}
\hline
Key&Values\\
\hline
\hline
$\assocsym{P}{A}$&$\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{B}}$\\
$\genericsym{0}{0}$&$\protosym{P}$\\
$\genericsym{0}{0}.\assocsym{P}{B}$&$\concretesym{\namesym{Int}}$\\
\hline
\end{tabular}
\end{center}
\end{table}
Consider the type parameter $T:=\genericsym{0}{0}.\namesym{A}$. This type parameter is a canonical anchor because $\Lambda(T)=\genericsym{0}{0}.\assocsym{P}{A}$ is a canonical term; however, $T$ is still not a canonical type, because it is fixed to a concrete type. Therefore, \texttt{isCanonicalTypeInContext()} returns false on $T$.

The \texttt{getConcreteType()} query on $T$ finds that the longest suffix of $\Lambda(T)$ with a property map entry is $\assocsym{P}{A}$, and the corresponding prefix is $\genericsym{0}{0}$. This property map entry stores the concrete type symbol
\[\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\assocsym{P}{B}}.\]
Prepending $\genericsym{0}{0}$ to the substitution term $\sigma_0$ produces the adjusted concrete type symbol:
\[\concretesym{\namesym{Array}\langle\sigma_0\rangle;\,\sigma_0:=\genericsym{0}{0}.\assocsym{P}{B}}.\]
Converting this symbol to a Swift type yields $\namesym{Array}\langle\genericsym{0}{0}.\namesym{B}\rangle$. However, this is still not a canonical type, because the type parameter $\genericsym{0}{0}.\namesym{B}$ appearing in nested position is not canonical. A recursive application of \texttt{getCanonicalTypeInContext()} on the type parameter $\genericsym{0}{0}.\namesym{B}$ returns $\namesym{Int}$. Therefore, the original call to \texttt{getCanonicalTypeInContext()} on $T$ returns
\[\namesym{Array}\langle\namesym{Int}\rangle.\]
\end{example}
\fi

\begingroup
\raggedright
\bibliographystyle{IEEEtran}
\bibliography{generics}
\endgroup

\printindex

\end{document}