.. @raise litre.TestsAreMissing
.. _ABI:

The Swift ABI
=============

Hard Constraints on Resilience
------------------------------

The root of a class hierarchy must remain stable, on pain of
invalidating the metaclass hierarchy.  Note that a Swift class without an
explicit base class is implicitly rooted in the SwiftObject
Objective-C class.

Fragile Struct Layout
---------------------

Structs are currently laid out with their fields in declaration order, using
the size and alignment conventions of LLVM on the target platform.

::

  struct S { var x:Int; var y:Double } // => LLVM { i64, double }
  struct S2 { var x:Char; var s:S }    // => LLVM { i21, { i64, double } }
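The following is a minimal sketch of declaration-order layout, assuming the usual rules for a non-packed LLVM struct (each field starts at the next multiple of its alignment; the total size is padded to the largest alignment). The sizes and alignments are assumptions for a typical 64-bit target, including the guess that an ``i21`` occupies an ``i32``-sized, ``i32``-aligned slot; the helper names are hypothetical.

```python
def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def struct_layout(fields):
    """Lay out (size, align) fields in declaration order.

    Returns (offsets, total_size, alignment): each field starts at the next
    multiple of its alignment, and the total size is padded to the largest
    field alignment.
    """
    offsets = []
    offset = 0
    max_align = 1
    for size, align in fields:
        offset = align_up(offset, align)
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    return offsets, align_up(offset, max_align), max_align


# Assumed 64-bit target; (allocation size, ABI alignment) in bytes.
I64 = (8, 8)
DOUBLE = (8, 8)
I21 = (4, 4)  # assumption: an i21 occupies an i32-sized, i32-aligned slot

s_offsets, s_size, s_align = struct_layout([I64, DOUBLE])                # struct S
s2_offsets, s2_size, s2_align = struct_layout([I21, (s_size, s_align)])  # struct S2
print(s_offsets, s_size)    # [0, 8] 16
print(s2_offsets, s2_size)  # [0, 8] 24
```

Because fields are never reordered, ``S2`` pays four bytes of padding after its ``i21`` field to reach the 8-byte alignment of the nested ``S``.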

Class Layout
------------

TODO

Fragile Enum Layout
--------------------

In laying out enum types, the ABI attempts to avoid requiring additional
storage for the tag that identifies the enum case. The ABI chooses one of five
strategies based on the layout of the enum:
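As a rough sketch, the choice among the five strategies depends only on how many cases there are and how many of them carry data. The function name and the convention of passing ``None`` for a no-data case are hypothetical, chosen for illustration.

```python
def enum_strategy(cases):
    """Pick the layout strategy for an enum.

    cases: one entry per case in declaration order; None for a no-data
    case, otherwise any description of the case's data type.
    """
    if not cases:
        return "empty"
    if len(cases) == 1:
        return "single-case"  # with or without a data type
    payload_cases = [c for c in cases if c is not None]
    if not payload_cases:
        return "c-like"
    if len(payload_cases) == 1:
        return "single-payload"
    return "multi-payload"


print(enum_strategy([None, "Char", None]))  # single-payload
```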

Empty Enums
````````````

In the degenerate case of an enum with no cases, the enum is an empty type.

::

  enum Empty {} // => empty type

Single-Case Enums
``````````````````

In the degenerate case of an enum with a single case, no discriminator is
needed, and the enum type has the exact same layout as its case's data type,
or is empty if the case has no data type.

::

  enum EmptyCase { case X }             // => empty type
  enum DataCase { case Y(Int, Double) } // => LLVM { i64, double }

C-Like Enums
````````````

If none of the cases has a data type (a "C-like" enum), then the enum
is laid out as an integer tag with the minimal number of bits to contain
all of the cases. The machine-level layout of the type then follows LLVM's
data layout rules for integer types on the target platform. The cases are
assigned tag values in declaration order.

::

  enum EnumLike2 { // => LLVM i1
    case A          // => i1 0
    case B          // => i1 1
  }

  enum EnumLike8 { // => LLVM i3
    case A          // => i3 0
    case B          // => i3 1
    case C          // => i3 2
    case D          // etc.
    case E
    case F
    case G
    case H
  }
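The tag width for a C-like enum is the smallest bit count that can number every case, which is the bit length of the largest tag value. This is a small sketch with hypothetical helper names; it reproduces the ``i1`` and ``i3`` widths shown above.

```python
def c_like_tag_bits(num_cases: int) -> int:
    """Minimal integer width able to hold tags 0 .. num_cases - 1."""
    if num_cases < 2:
        raise ValueError("enums with fewer than two cases are empty or single-case")
    return (num_cases - 1).bit_length()


def c_like_tags(case_names):
    """Tags are assigned 0, 1, 2, ... in declaration order."""
    return {name: tag for tag, name in enumerate(case_names)}


print(c_like_tag_bits(2), c_like_tag_bits(8))  # 1 3
print(c_like_tags(["A", "B", "C", "D"]))       # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

A ninth case would push the tag to ``i4``, since tag value 8 needs four bits.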

Single-Payload Enums
`````````````````````

If an enum has a single case with a data type and one or more no-data cases
(a "single-payload" enum), then the data case is represented using the data
type's binary representation, with an added tag bit set to zero if a tag bit
is necessary. If the data type's binary representation
has *extra inhabitants*, that is, bit patterns with the size and alignment of
the type but which do not form valid values of that type, they are used to
represent the no-data cases, with extra inhabitants in order of ascending
numeric value matching no-data cases in declaration order. The only extra
inhabitants currently considered are those that use *spare bits*
(see `Multi-Payload Enums`_) of an integer type, such as the top 11 bits of
the 32-bit storage occupied by an ``i21``. The enum value is then represented
as an integer with the storage size in bits of the data type.

::

  enum CharOrSectionMarker { => LLVM i32
    case Paragraph            => i32 0x0020_0000
    case Char(Char)           => i32 (zext i21 %Char to i32)
    case Chapter              => i32 0x0020_0001
  }

  CharOrSectionMarker.Char('\x00') => i32 0x0000_0000
  CharOrSectionMarker.Char('\u10FFFF') => i32 0x0010_FFFF
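The encoding above can be sketched as follows, assuming (as the example shows) that the first extra inhabitant is the lowest pattern with a spare bit set, ``0x0020_0000``. Code points are passed as plain integers, and the function names are hypothetical.

```python
CHAR_BITS = 21                            # Char is an i21 in the examples above
FIRST_EXTRA_INHABITANT = 1 << CHAR_BITS   # 0x0020_0000, lowest spare-bit pattern
NO_DATA_CASES = ["Paragraph", "Chapter"]  # declaration order, skipping Char


def encode_marker(case, char=None):
    """Encode a CharOrSectionMarker value as its i32 bit pattern."""
    if case == "Char":
        if not 0 <= char <= 0x10FFFF:
            raise ValueError("not a valid Char")
        return char  # zext i21 -> i32 leaves the numeric value unchanged
    # No-data cases take extra inhabitants in ascending order.
    return FIRST_EXTRA_INHABITANT + NO_DATA_CASES.index(case)


def decode_marker(bits):
    """Invert encode_marker: valid Char values never set a spare bit."""
    if bits < FIRST_EXTRA_INHABITANT:
        return ("Char", bits)
    return (NO_DATA_CASES[bits - FIRST_EXTRA_INHABITANT], None)
```

No extra storage is needed: the enum stays exactly as large as the ``i32`` slot holding a ``Char``.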

If the data type has no extra inhabitants, or there are not enough extra
inhabitants to represent all of the no-data cases, then a tag bit is added
to the enum's representation. The tag bit is set for the no-data cases, which
are then assigned values in the data area of the enum in declaration order.

::

  enum IntOrInfinity { => LLVM { i64, i1 }
    case NegInfinity    => { i64, i1 } {    0, 1 }
    case Int(Int)       => { i64, i1 } { %Int, 0 }
    case PosInfinity    => { i64, i1 } {    1, 1 }
  }

  IntOrInfinity.Int(    0) => { i64, i1 } {     0, 0 }
  IntOrInfinity.Int(20721) => { i64, i1 } { 20721, 0 }
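Since ``i64`` has no spare bits, the sketch below pairs the data word with a separate tag bit, matching the ``{ i64, i1 }`` values above. Representing the pair as a Python tuple and masking to 64 bits to mimic two's complement are illustrative choices; the names are hypothetical.

```python
MASK64 = (1 << 64) - 1
NO_DATA_CASES = ["NegInfinity", "PosInfinity"]  # declaration order


def encode_int_or_infinity(case, value=None):
    """Return the (i64 data, i1 tag) pair for an IntOrInfinity value.

    The tag is 0 for the Int case and 1 for every no-data case; the
    no-data cases are then numbered in the data word in declaration order.
    """
    if case == "Int":
        return (value & MASK64, 0)  # two's-complement bit pattern of the Int
    return (NO_DATA_CASES.index(case), 1)
```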

Multi-Payload Enums
````````````````````

If an enum has more than one case with a data type, then a tag is necessary to
discriminate the data types. The ABI will first try to find common
*spare bits*, that is, bits in the data types' binary representations which are
either fixed-zero or ignored by valid values of all of the data types. The tag
will be scattered into these spare bits as much as possible. Currently only
spare bits of primitive integer types, such as the high 11 bits of the 32-bit
storage of an ``i21``, are considered. The enum data is represented as an
integer with the storage size in bits of the largest data type.

::

  enum TerminalChar {   => LLVM i32
    case Plain(Char)     => i32     (zext i21 %Plain     to i32)
    case Bold(Char)      => i32 (or (zext i21 %Bold      to i32), 0x0020_0000)
    case Underline(Char) => i32 (or (zext i21 %Underline to i32), 0x0040_0000)
    case Blink(Char)     => i32 (or (zext i21 %Blink     to i32), 0x0060_0000)
    case Empty           => i32 0x0080_0000
    case Cursor          => i32 0x0080_0001
  }
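Here is a sketch of the ``TerminalChar`` encoding, assuming the tag occupies the spare bits starting at bit 21 and that the no-data cases share the tag after the last data case, as the ``0x0080_0000`` and ``0x0080_0001`` values indicate. Helper names are hypothetical.

```python
SPARE_SHIFT = 21  # payload is an i21; bits 21..31 of the i32 are spare
PAYLOAD_MASK = (1 << SPARE_SHIFT) - 1
DATA_CASES = ["Plain", "Bold", "Underline", "Blink"]  # tags 0-3, declaration order
NO_DATA_CASES = ["Empty", "Cursor"]                   # share the next tag (4)


def encode_terminal_char(case, char=None):
    """Return the i32 bit pattern for a TerminalChar value."""
    if case in DATA_CASES:
        return (DATA_CASES.index(case) << SPARE_SHIFT) | char
    common_tag = len(DATA_CASES)
    # No-data cases are numbered in the payload bits under the common tag.
    return (common_tag << SPARE_SHIFT) | NO_DATA_CASES.index(case)


def decode_terminal_char(bits):
    """Split the i32 into tag (spare bits) and payload, then pick the case."""
    tag, payload = bits >> SPARE_SHIFT, bits & PAYLOAD_MASK
    if tag < len(DATA_CASES):
        return (DATA_CASES[tag], payload)
    return (NO_DATA_CASES[payload], None)
```

Decoding never needs to look past the tag bits to tell the data cases apart, which is what makes spare-bit tags cheap.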

If there are not enough spare bits to contain the tag, then additional bits are
added to the representation to contain the tag. Tag values are
assigned to data cases in declaration order. If there are no-data cases, they
are collected under a common tag, and assigned values in the data area of the
enum in declaration order.

::

  class Bignum {}

  enum IntDoubleOrBignum { => LLVM { i64, i2 }
    case Int(Int)           => { i64, i2 } {           %Int,            0 }
    case Double(Double)     => { i64, i2 } { (bitcast  %Double to i64), 1 }
    case Bignum(Bignum)     => { i64, i2 } { (ptrtoint %Bignum to i64), 2 }
  }
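The sketch below counts the tag bits a multi-payload enum needs and encodes ``IntDoubleOrBignum`` as a ``(data, tag)`` pair. It rests on a simplifying assumption that the text above does not spell out: the tag goes entirely into the common spare bits when it fits there, and entirely into added bits otherwise. The ``Bignum`` case takes an integer standing in for the object's address, which is hypothetical; all names are illustrative.

```python
import struct

MASK64 = (1 << 64) - 1


def tag_layout(num_data_cases, num_no_data_cases, common_spare_bits):
    """Return (tag_bits, added_bits) for a multi-payload enum.

    Each data case gets its own tag; all no-data cases share one more.
    Assumption: the tag lives wholly in spare bits if it fits, else wholly
    in added bits.
    """
    num_tags = num_data_cases + (1 if num_no_data_cases else 0)
    tag_bits = (num_tags - 1).bit_length()
    added_bits = 0 if tag_bits <= common_spare_bits else tag_bits
    return tag_bits, added_bits


def encode_int_double_or_bignum(case, value):
    """Return the (i64 data, i2 tag) pair; tags follow declaration order."""
    if case == "Int":
        return (value & MASK64, 0)
    if case == "Double":
        # bitcast double to i64: reinterpret the IEEE 754 bits as an integer
        return (struct.unpack("<Q", struct.pack("<d", value))[0], 1)
    if case == "Bignum":
        # value stands in for the object's address (ptrtoint); hypothetical
        return (value & MASK64, 2)
    raise ValueError(case)


print(tag_layout(3, 0, 0))   # (2, 2): IntDoubleOrBignum adds an i2
print(tag_layout(4, 2, 11))  # (3, 0): TerminalChar fits in its spare bits
```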

Mangling
--------
::

  mangled-name ::= '_T' global

All Swift-mangled names begin with this prefix.

::

  global ::= 't' type                    // standalone type (for DWARF)
  global ::= 'M' directness type         // type metadata
  global ::= 'MP' directness type        // type metadata pattern
  global ::= 'Mm' type                   // class metaclass
  global ::= 'nk_' entity                // protocol witness
  global ::= 'w' value-witness-kind type // value witness
  global ::= 'WV' type                   // value witness table
  global ::= 'Wo' entity                 // witness table offset
  global ::= 'Wv' directness entity      // field offset
  global ::= 'WP' protocol-conformance   // protocol witness table
  global ::= 'WZ' protocol-conformance   // lazy protocol witness table accessor
  global ::= 'Wz' protocol-conformance   // lazy protocol witness table template
  global ::= 'WD' protocol-conformance   // dependent proto witness table generator
  global ::= 'Wd' protocol-conformance   // dependent proto witness table template
  global ::= local-marker? entity        // some identifiable thing
  global ::= 'To' global                 // swift-as-ObjC thunk
  global ::= 'Tb' type                   // swift-to-ObjC block converter
  entity ::= context 'D'                 // deallocating destructor
  entity ::= context 'd'                 // non-deallocating destructor
  entity ::= context 'C' type            // allocating constructor
  entity ::= context 'c' type            // non-allocating constructor
  entity ::= declaration 'g'             // getter
  entity ::= declaration 's'             // setter
  entity ::= declaration 'a'             // addressor
  entity ::= declaration                 // other declaration
  declaration ::= declaration-name type
  declaration-name ::= context identifier
  local-marker ::= 'L'

Entity manglings all start with a nominal-type-kind or protocol context
([COPV]), an identifier ([0-9oX]), or a substitution ([S]).  Global manglings
start with any of those or [LMTWntw].

::

  directness ::= 'd'                         // direct
  directness ::= 'i'                         // indirect

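A direct symbol resolves directly to the address of an object.  An
indirect symbol resolves to the address of a pointer to the object.
They are given distinct manglings so that mistaking one kind of symbol
for the other fails immediately, typically as an unresolved symbol at
link time, instead of silently loading the wrong address.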

The terminology is slightly overloaded when discussing offsets.  A
direct offset resolves to a variable holding the true offset.  An
indirect offset resolves to a variable holding an offset to be applied
to type metadata to get the address of the true offset.  (Offset
variables are required when the object being accessed lies within a
resilient structure.  When the layout of the object may depend on
generic arguments, these offsets must be kept in metadata.  Indirect
field offsets are therefore required when accessing fields in generic
types where the metadata itself has unknown layout.)
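To make the two meanings of "offset" concrete, here is a toy model in which memory is a dictionary from addresses to words. The addresses and helper names are made up for illustration; only the two lookup chains follow the description above.

```python
# Hypothetical flat memory: address -> stored word.
memory = {}


def field_address_direct(obj, offset_var):
    """Direct offset: the variable itself holds the true field offset."""
    return obj + memory[offset_var]


def field_address_indirect(obj, metadata, offset_var):
    """Indirect offset: the variable holds an offset into the type metadata,
    and the word stored there is the true field offset."""
    true_offset = memory[metadata + memory[offset_var]]
    return obj + true_offset


# Example with made-up addresses.
memory[0x100] = 16        # direct offset variable: field at +16
memory[0x200] = 24        # indirect offset variable: look at metadata + 24
memory[0x1000 + 24] = 32  # metadata slot holding the true offset
print(hex(field_address_direct(0x5000, 0x100)))            # 0x5010
print(hex(field_address_indirect(0x5000, 0x1000, 0x200)))  # 0x5020
```

The extra hop through metadata is what lets one compiled accessor work for every instantiation of a generic type whose field offsets differ.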

::

  context ::= module
  context ::= function
  context ::= nominal-type
  context ::= protocol-context
  module ::= substitution                    // other substitution
  module ::= identifier                      // module name
  module ::= known-module                    // abbreviation
  function ::= entity

  type ::= 'A' natural type                  // fixed-size array
  type ::= 'Bf' natural '_'                  // Builtin.Float<n>
  type ::= 'Bi' natural '_'                  // Builtin.Int<n>
  type ::= 'BO'                              // Builtin.ObjCPointer
  type ::= 'Bo'                              // Builtin.ObjectPointer
  type ::= 'Bp'                              // Builtin.RawPointer
  type ::= 'Bv' natural type                 // Builtin.Vec<n>x<type>
  type ::= nominal-type
  type ::= associated-type
  type ::= 'b' type type                     // objc block function type
  type ::= 'F' type type                     // function type
  type ::= 'f' type type                     // uncurried function type
  type ::= 'G' type <type>+ '_'              // generic type application
  type ::= 'M' type                          // metatype
  type ::= 'P' protocol-list '_'             // protocol type
  type ::= archetype
  type ::= 'R' type                          // byref
  type ::= 'T' tuple-element* '_'            // tuple
  type ::= 't' tuple-element* '_'            // variadic tuple
  type ::= 'U' generics '_' type             // generic type
  type ::= 'Xo' type                         // [unowned] type
  type ::= 'Xw' type                         // [weak] type
  nominal-type ::= known-nominal-type
  nominal-type ::= substitution
  nominal-type ::= nominal-type-kind declaration-name
  nominal-type-kind ::= 'C'                  // class
  nominal-type-kind ::= 'O'                  // enum
  nominal-type-kind ::= 'V'                  // struct
  archetype ::= 'Q' index                    // archetype with depth=0
  archetype ::= 'Qd' index index             // archetype with depth=M+1
  archetype ::= associated-type
  associated-type ::= substitution
  associated-type ::= 'Q' protocol-context     // self type of protocol
  associated-type ::= 'Q' archetype identifier // associated type
  protocol-context ::= 'P' protocol
  tuple-element ::= identifier? type

<type> never begins or ends with a number.
<type> never begins with an underscore.

Note that protocols mangle differently as types and as contexts. A protocol
context always consists of a single protocol name and so mangles without a
trailing underscore. A protocol type can have zero, one, or many protocol bounds
which are juxtaposed and terminated with a trailing underscore.

::

  generics ::= generic-parameter+
  generic-parameter ::= protocol-list '_'
  protocol-list ::= protocol*
  protocol ::= substitution
  protocol ::= declaration-name

<protocol-list> is unambiguous because protocols are always top-level,
so the structure is quite simple.

::

  protocol-conformance ::= type protocol module

<protocol-conformance> refers to a type's conformance to a protocol. The named
module is the one containing the extension or type declaration that declared
the conformance.

::

  value-witness-kind ::= 'al'                // allocateBuffer
  value-witness-kind ::= 'ca'                // assignWithCopy
  value-witness-kind ::= 'ta'                // assignWithTake
  value-witness-kind ::= 'de'                // deallocateBuffer
  value-witness-kind ::= 'xx'                // destroy
  value-witness-kind ::= 'XX'                // destroyBuffer
  value-witness-kind ::= 'CP'                // initializeBufferWithCopyOfBuffer
  value-witness-kind ::= 'Cp'                // initializeBufferWithCopy
  value-witness-kind ::= 'cp'                // initializeWithCopy
  value-witness-kind ::= 'Tk'                // initializeBufferWithTake
  value-witness-kind ::= 'tk'                // initializeWithTake
  value-witness-kind ::= 'pr'                // projectBuffer
  value-witness-kind ::= 'ty'                // typeof
  value-witness-kind ::= 'xs'                // storeExtraInhabitant
  value-witness-kind ::= 'xg'                // getExtraInhabitantIndex
  value-witness-kind ::= 'ug'                // getEnumTag
  value-witness-kind ::= 'up'                // inplaceProjectEnumData

<value-witness-kind> differentiates the kinds of function value
witnesses for a type.

::

  identifier ::= natural identifier-start-char identifier-char*
  identifier ::= 'o' operator-fixity natural operator-char+

  operator-fixity ::= 'p'                    // prefix operator
  operator-fixity ::= 'P'                    // postfix operator
  operator-fixity ::= 'i'                    // infix operator

  operator-char ::= 'a'                      // & 'and'
  operator-char ::= 'c'                      // @ 'commercial at'
  operator-char ::= 'd'                      // / 'divide'
  operator-char ::= 'e'                      // = 'equals'
  operator-char ::= 'g'                      // > 'greater'
  operator-char ::= 'l'                      // < 'less'
  operator-char ::= 'm'                      // * 'multiply'
  operator-char ::= 'n'                      // ! 'not'
  operator-char ::= 'o'                      // | 'or'
  operator-char ::= 'p'                      // + 'plus'
  operator-char ::= 'r'                      // % 'remainder'
  operator-char ::= 's'                      // - 'subtract'
  operator-char ::= 't'                      // ~ 'tilde'
  operator-char ::= 'x'                      // ^ 'xor'
  operator-char ::= 'z'                      // . 'zperiod'

<identifier> is run-length encoded: the natural indicates how many
characters follow.  Operator characters are mapped to letter characters as
given. In neither case can an identifier start with a digit, so
there's no ambiguity with the run-length.
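A minimal sketch of the two ASCII ``identifier`` productions, using the operator-character table above. The helper names are hypothetical; the checks reproduce pieces of the ``zim.zang.zung`` mangling used elsewhere in this document.

```python
OPERATOR_LETTERS = {
    "&": "a", "@": "c", "/": "d", "=": "e", ">": "g", "<": "l",
    "*": "m", "!": "n", "|": "o", "+": "p", "%": "r", "-": "s",
    "~": "t", "^": "x", ".": "z",
}
FIXITY = {"prefix": "p", "postfix": "P", "infix": "i"}


def mangle_identifier(name: str) -> str:
    """identifier ::= natural identifier-start-char identifier-char*"""
    if not name or not name.isascii() or name[0].isdigit():
        raise ValueError("expected a non-empty ASCII identifier not starting with a digit")
    return f"{len(name)}{name}"


def mangle_operator(op: str, fixity: str) -> str:
    """identifier ::= 'o' operator-fixity natural operator-char+"""
    letters = "".join(OPERATOR_LETTERS[c] for c in op)
    return f"o{FIXITY[fixity]}{len(letters)}{letters}"


print("CC" + "".join(mangle_identifier(n) for n in ["zim", "zang", "zung"]))
# CC3zim4zang4zung
print(mangle_operator("==", "infix"))  # oi2ee
```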

::

  identifier ::= 'X' natural identifier-start-char identifier-char*
  identifier ::= 'X' 'o' operator-fixity natural identifier-char*

Identifiers that contain non-ASCII characters are encoded using the Punycode
algorithm specified in RFC 3492, with the modifications that ``_`` is used
as the encoding delimiter, and uppercase letters A through J are used in place
of digits 0 through 9 in the encoding character set. The mangling then
consists of an ``X`` followed by the run length of the encoded string and the
encoded string itself. For example, the identifier ``vergüenza`` is mangled
to ``X12vergenza_JFa``. (The encoding in standard Punycode would be
``vergenza-95a``.)

Operators that contain non-ASCII characters are mangled by first mapping the
ASCII operator characters to letters as for pure ASCII operator names, then
Punycode-encoding the substituted string. The mangling then consists of
``Xo`` followed by the fixity, run length of the encoded string, and the encoded
string itself. For example, the infix operator ``«+»`` is mangled to
``Xoi7p_qcaDc`` (``p_qcaDc`` being the encoding of the substituted
string ``«p»``).
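The modified Punycode can be sketched on top of Python's built-in ``punycode`` codec, which implements standard RFC 3492. The post-processing swaps the delimiter for ``_`` and maps digits to ``A``-``J`` only in the encoded deltas after the delimiter, on the assumption that copied basic code points are left untouched. Helper names are hypothetical; the two examples above serve as checks.

```python
OPERATOR_LETTERS = {
    "&": "a", "@": "c", "/": "d", "=": "e", ">": "g", "<": "l",
    "*": "m", "!": "n", "|": "o", "+": "p", "%": "r", "-": "s",
    "~": "t", "^": "x", ".": "z",
}
FIXITY = {"prefix": "p", "postfix": "P", "infix": "i"}
DIGIT_TO_LETTER = str.maketrans("0123456789", "ABCDEFGHIJ")


def swift_punycode(text):
    """RFC 3492 Punycode with '_' as delimiter and A-J in place of 0-9."""
    encoded = text.encode("punycode").decode("ascii")
    basic, delim, deltas = encoded.rpartition("-")
    # Only the encoded deltas use the digit alphabet; basic chars are copied.
    return basic + ("_" if delim else "") + deltas.translate(DIGIT_TO_LETTER)


def mangle_unicode_identifier(name):
    encoded = swift_punycode(name)
    return f"X{len(encoded)}{encoded}"


def mangle_unicode_operator(op, fixity):
    # Map ASCII operator characters to letters first; keep non-ASCII as-is.
    substituted = "".join(OPERATOR_LETTERS.get(c, c) for c in op)
    encoded = swift_punycode(substituted)
    return f"Xo{FIXITY[fixity]}{len(encoded)}{encoded}"


print(mangle_unicode_identifier("vergüenza"))     # X12vergenza_JFa
print(mangle_unicode_operator("«+»", "infix"))    # Xoi7p_qcaDc
```

Replacing the delimiter and digits keeps every character of the result legal in a symbol name and prevents a digit from being misread as a run length.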

::

  substitution ::= 'S' index

<substitution> is a back-reference to a previously mangled entity. The mangling
algorithm maintains a mapping of entities to substitution indices as it runs.
When an entity that can be represented by a substitution (a module, nominal
type, or protocol) is mangled, a substitution is first looked for in the
substitution map, and if it is present, the entity is mangled using the
associated substitution index. Otherwise, the entity is mangled normally, and
it is then added to the substitution map and associated with the next
available substitution index.

For example, in mangling a function type
``(zim.zang.zung, zim.zang.zung, zim.zippity) -> zim.zang.zoo`` (with module
``zim`` and class ``zim.zang``),
the recurring contexts ``zim``, ``zim.zang``, and ``zim.zang.zung``
will be mangled using substitutions after being mangled
for the first time. The first argument type will mangle in long form,
``CC3zim4zang4zung``, and in doing so, ``zim`` will acquire substitution ``S_``,
``zim.zang`` will acquire substitution ``S0_``, and ``zim.zang.zung`` will
acquire ``S1_``. The second argument is the same as the first and will mangle
using its substitution, ``S1_``, which stands for the whole nominal type. The
third argument type will mangle using the substitution for ``zim``,
``CS_7zippity``. (It also acquires substitution ``S2_``, which would be used
if it were mangled again.) The result type will mangle using the substitution
for ``zim.zang``, ``CS0_3zoo`` (and acquire substitution ``S3_``). The full
function type thus mangles as ``fTCC3zim4zang4zungS1_CS_7zippity_CS0_3zoo``.
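A minimal sketch of the substitution map, applied to the walkthrough above. It makes a simplifying assumption: a one-element path is a module and any longer path is a class nested in the context named by the rest of the path. The ``Mangler`` class is hypothetical, and it emits the leading ``fT`` exactly as in the example.

```python
def encode_index(n):
    """index ::= '_' (0) | natural '_' (N+1)"""
    return "_" if n == 0 else f"{n - 1}_"


class Mangler:
    """Hypothetical mangler for module-qualified classes, with substitutions."""

    def __init__(self):
        self.substitutions = {}  # path tuple -> substitution index
        self.out = []

    def mangle_context(self, path):
        if path in self.substitutions:
            # nominal-type ::= substitution replaces the whole type, kind included
            self.out.append("S" + encode_index(self.substitutions[path]))
            return
        if len(path) == 1:
            self.out.append(f"{len(path[0])}{path[0]}")  # module identifier
        else:
            self.out.append("C")            # nominal-type-kind: class
            self.mangle_context(path[:-1])  # enclosing context, registered first
            self.out.append(f"{len(path[-1])}{path[-1]}")
        self.substitutions[path] = len(self.substitutions)

    def mangle_function_type(self, params, result):
        self.out.append("fT")  # function type marker as in the example, then tuple
        for p in params:
            self.mangle_context(p)
        self.out.append("_")   # close the argument tuple
        self.mangle_context(result)
        return "".join(self.out)


zung, zippity, zoo = ("zim", "zang", "zung"), ("zim", "zippity"), ("zim", "zang", "zoo")
print(Mangler().mangle_function_type([zung, zung, zippity], zoo))
# fTCC3zim4zang4zungS1_CS_7zippity_CS0_3zoo
```

Registering a context only after its enclosing contexts is what gives ``zim`` index 0 and ``zim.zang`` index 1.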

::

  known-module ::= 'So'                      // Objective-C
  known-module ::= 'Ss'                      // swift
  known-nominal-type ::= 'Sa'                // swift.Slice
  known-nominal-type ::= 'Sb'                // swift.Bool
  known-nominal-type ::= 'Sc'                // swift.Char
  known-nominal-type ::= 'Sd'                // swift.Float64
  known-nominal-type ::= 'Sf'                // swift.Float32
  known-nominal-type ::= 'Si'                // swift.Int64
  known-nominal-type ::= 'Sq'                // swift.Optional
  known-nominal-type ::= 'SS'                // swift.String
  known-nominal-type ::= 'Su'                // swift.UInt64

<known-module> and <known-nominal-type> are built-in substitutions for
certain common entities.  Like any other substitution, they all start
with 'S'.

The Objective-C module is used as the context for mangling Objective-C
classes as <type>s.

::

  index ::= '_'                              // 0
  index ::= natural '_'                      // N+1
  natural ::= [0-9]+

<index> is a production for encoding numbers in contexts that can't
end in a digit; it's optimized for encoding smaller numbers.
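A short sketch of encoding and parsing an ``<index>``: zero costs a single ``_``, and every other value is written one less than itself, so the trailing underscore always terminates the number. The function names are hypothetical.

```python
def encode_index(n: int) -> str:
    """index ::= '_' (0) | natural '_' (N+1)"""
    if n < 0:
        raise ValueError("index must be non-negative")
    return "_" if n == 0 else f"{n - 1}_"


def decode_index(s: str, pos: int = 0):
    """Parse an index starting at s[pos]; return (value, position after it)."""
    end = pos
    while end < len(s) and s[end] in "0123456789":
        end += 1
    if end >= len(s) or s[end] != "_":
        raise ValueError("index must end with '_'")
    value = 0 if end == pos else int(s[pos:end]) + 1
    return value, end + 1


print(encode_index(0), encode_index(1), encode_index(12))  # _ 0_ 11_
print(decode_index("S1_", 1))                              # (2, 3)
```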