swift-mirror/docs/ABI.rst
Joe Groff e109124186 Replace 'union' keyword with 'enum'.
This only touches the compiler and tests. Doc updates to follow.

Swift SVN r8478
2013-09-20 01:33:14 +00:00

.. @raise litre.TestsAreMissing
.. _ABI:

The Swift ABI
=============

Hard Constraints on Resilience
------------------------------

The root of a class hierarchy must remain stable, on pain of
invalidating the metaclass hierarchy. Note that a Swift class without an
explicit base class is implicitly rooted in the SwiftObject
Objective-C class.

Fragile Struct Layout
---------------------

Structs are currently laid out in declared field order, which then follows
the size and alignment conventions of LLVM on the target platform.

::

  struct S { var x:Int; var y:Double } // => LLVM { i64, double }
  struct S2 { var x:Char; var s:S }    // => LLVM { i21, { i64, double } }

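As a rough cross-check of the first example, the sketch below models
``{ i64, double }`` with Python's ``ctypes`` module, which follows the
platform C layout rules. This is only an analogy: it assumes that the C and
LLVM conventions agree for these two field types, and the class name ``S``
merely mirrors the struct above.

::

  import ctypes

  class S(ctypes.Structure):
      # Fields in declaration order, mirroring LLVM { i64, double }.
      _fields_ = [("x", ctypes.c_int64), ("y", ctypes.c_double)]

  # Declaration order is preserved: x comes first and y follows it.
  print(S.x.offset, S.y.offset, ctypes.sizeof(S))
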
Class Layout
------------

TODO

Fragile Enum Layout
-------------------

In laying out enum types, the ABI attempts to avoid requiring additional
storage to store the tag for the enum case. The ABI chooses one of five
strategies based on the layout of the enum:

Empty Enums
```````````

In the degenerate case of an enum with no cases, the enum is an empty type.

::

  enum Empty {} // => empty type

Single-Case Enums
`````````````````

In the degenerate case of an enum with a single case, there is no
discriminator needed, and the enum type has the exact same layout as its
case's data type, or is empty if the case has no data type.

::

  enum EmptyCase { case X }             // => empty type
  enum DataCase { case Y(Int, Double) } // => LLVM { i64, double }

C-Like Enums
````````````

If none of the cases has a data type (a "C-like" enum), then the enum
is laid out as an integer tag with the minimal number of bits to contain
all of the cases. The machine-level layout of the type then follows LLVM's
data layout rules for integer types on the target platform. The cases are
assigned tag values in declaration order.

::

  enum EnumLike2 { // => LLVM i1
    case A         // => i1 0
    case B         // => i1 1
  }

  enum EnumLike8 { // => LLVM i3
    case A         // => i3 0
    case B         // => i3 1
    case C         // => i3 2
    case D         // etc.
    case E
    case F
    case G
    case H
  }

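The tag-width rule can be illustrated with a short Python sketch. The
function names ``tag_bits`` and ``tag_values`` are hypothetical helpers for
this illustration, not part of any Swift tooling.

::

  def tag_bits(case_count):
      """Smallest number of bits that can hold case_count distinct tags."""
      if case_count < 1:
          raise ValueError("a C-like enum needs at least one case")
      return (case_count - 1).bit_length()

  def tag_values(case_names):
      """Map each case to its tag, assigned in declaration order."""
      return {name: tag for tag, name in enumerate(case_names)}

  print(tag_bits(2), tag_bits(8))
  print(tag_values(["A", "B", "C"]))

Under this rule two cases need one bit and eight cases need three, matching
``EnumLike2`` and ``EnumLike8``.
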
Single-Payload Enums
````````````````````

If an enum has a single case with a data type and one or more no-data cases
(a "single-payload" enum), then the case with data type is represented using
the data type's binary representation, with added zero bits for tag if
necessary. If the data type's binary representation
has *extra inhabitants*, that is, bit patterns with the size and alignment of
the type but which do not form valid values of that type, they are used to
represent the no-data cases, with extra inhabitants in order of ascending
numeric value matching no-data cases in declaration order. The only
currently considered extra inhabitants are those that use *spare bits*
(see `Multi-Payload Enums`_) of an integer type, such as the top 11 bits of
an ``i21``. The enum value is then represented as an integer with the storage
size in bits of the data type.

::

  enum CharOrSectionMarker { => LLVM i32
    case Paragraph           => i32 0x0020_0000
    case Char(Char)          => i32 (zext i21 %Char to i32)
    case Chapter             => i32 0x0020_0001
  }

  CharOrSectionMarker.Char('\x00')     => i32 0x0000_0000
  CharOrSectionMarker.Char('\u10FFFF') => i32 0x0010_FFFF

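The assignment of extra inhabitants can be sketched as follows. The constants
and function names are hypothetical; the sketch assumes a 21-bit payload
stored in 32 bits, as in the ``CharOrSectionMarker`` example, and treats
every 32-bit pattern above the valid 21-bit range as an extra inhabitant.

::

  CHAR_BITS = 21      # valid Char values fit in an i21
  STORAGE_BITS = 32   # storage size of the payload in bits

  def extra_inhabitants(count):
      """First `count` bit patterns above the valid i21 range, ascending."""
      first = 1 << CHAR_BITS
      available = (1 << STORAGE_BITS) - first
      if count > available:
          raise ValueError("not enough extra inhabitants")
      return [first + i for i in range(count)]

  def layout_single_payload(no_data_cases):
      """Assign extra inhabitants to no-data cases in declaration order."""
      return dict(zip(no_data_cases, extra_inhabitants(len(no_data_cases))))

  markers = layout_single_payload(["Paragraph", "Chapter"])
  print({name: hex(bits) for name, bits in markers.items()})
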
If the data type has no extra inhabitants, or there are not enough extra
inhabitants to represent all of the no-data cases, then a tag bit is added
to the enum's representation. The tag bit is set for the no-data cases, which
are then assigned values in the data area of the enum in declaration order.

::

  enum IntOrInfinity { => LLVM { i64, i1 }
    case NegInfinity   => { i64, i1 } {    0, 1 }
    case Int(Int)      => { i64, i1 } { %Int, 0 }
    case PosInfinity   => { i64, i1 } {    1, 1 }
  }

  IntOrInfinity.Int(    0) => { i64, i1 } {     0, 0 }
  IntOrInfinity.Int(20721) => { i64, i1 } { 20721, 0 }

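A minimal Python sketch of the tag-bit fallback, using the ``IntOrInfinity``
example. The function name and the pair-of-integers representation are
assumptions made for illustration only.

::

  NO_DATA_CASES = ["NegInfinity", "PosInfinity"]  # declaration order

  def encode_int_or_infinity(case, value=0):
      """Return the (i64 data, i1 tag) pair for one IntOrInfinity value."""
      if case == "Int":
          return (value, 0)  # payload case: tag bit clear
      # No-data case: tag bit set, data area numbers the case.
      return (NO_DATA_CASES.index(case), 1)

  print(encode_int_or_infinity("Int", 20721))
  print(encode_int_or_infinity("PosInfinity"))
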
Multi-Payload Enums
```````````````````

If an enum has more than one case with data type, then a tag is necessary to
discriminate the data types. The ABI will first try to find common
*spare bits*, that is, bits in the data types' binary representations which are
either fixed-zero or ignored by valid values of all of the data types. The tag
will be scattered into these spare bits as much as possible. Currently only
spare bits of primitive integer types, such as the high bits of an ``i21``
type, are considered. The enum data is represented as an integer with the
storage size in bits of the largest data type.

::

  enum TerminalChar {    => LLVM i32
    case Plain(Char)     => i32     (zext i21 %Plain     to i32)
    case Bold(Char)      => i32 (or (zext i21 %Bold      to i32), 0x0020_0000)
    case Underline(Char) => i32 (or (zext i21 %Underline to i32), 0x0040_0000)
    case Blink(Char)     => i32 (or (zext i21 %Blink     to i32), 0x0060_0000)
    case Empty           => i32 0x0080_0000
    case Cursor          => i32 0x0080_0001
  }

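The ``TerminalChar`` encoding can be reproduced with the sketch below. It
assumes the tag occupies the bits directly above the 21-bit payload and that
the no-data cases share the tag following the last payload case, which is
what the values in the example suggest; the names are hypothetical.

::

  PAYLOAD_BITS = 21  # a Char occupies the low 21 bits of the i32
  PAYLOAD_CASES = ["Plain", "Bold", "Underline", "Blink"]
  NO_DATA_CASES = ["Empty", "Cursor"]

  def encode_terminal_char(case, code_point=0):
      """Return the i32 bit pattern for one TerminalChar value."""
      if case in PAYLOAD_CASES:
          # Payload cases: the tag is the declaration index, placed in
          # the spare bits above the payload.
          tag = PAYLOAD_CASES.index(case)
          return (tag << PAYLOAD_BITS) | code_point
      # No-data cases share one tag and are numbered in the data area.
      tag = len(PAYLOAD_CASES)
      return (tag << PAYLOAD_BITS) | NO_DATA_CASES.index(case)

  print(hex(encode_terminal_char("Bold", ord("A"))))
  print(hex(encode_terminal_char("Cursor")))
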
If there are not enough spare bits to contain the tag, then additional bits are
added to the representation to contain the tag. Tag values are
assigned to data cases in declaration order. If there are no-data cases, they
are collected under a common tag, and assigned values in the data area of the
enum in declaration order.

::

  class Bignum {}

  enum IntDoubleOrBignum { => LLVM { i64, i2 }
    case Int(Int)          => { i64, i2 } {           %Int,            0 }
    case Double(Double)    => { i64, i2 } { (bitcast  %Double to i64), 1 }
    case Bignum(Bignum)    => { i64, i2 } { (ptrtoint %Bignum to i64), 2 }
  }

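A hedged Python sketch of the ``IntDoubleOrBignum`` layout follows. The
function name is hypothetical, and for the ``Bignum`` case the payload is a
made-up integer standing in for an object address.

::

  import struct

  TAGS = {"Int": 0, "Double": 1, "Bignum": 2}  # declaration order

  def encode_int_double_or_bignum(case, payload):
      """Return the (i64 data, i2 tag) pair for one enum value."""
      if case == "Double":
          # Reinterpret the double's bits as a 64-bit integer (a bitcast).
          (bits,) = struct.unpack("<Q", struct.pack("<d", payload))
      else:
          # Int payloads are stored as-is; for Bignum the payload stands
          # in for the object's address (ptrtoint).
          bits = payload & 0xFFFFFFFFFFFFFFFF
      return (bits, TAGS[case])

  print(encode_int_double_or_bignum("Double", 1.0))
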
Mangling
--------

::

  mangled-name ::= '_T' global

All Swift-mangled names begin with this prefix.

::

  global ::= 't' type                    // standalone type (for DWARF)
  global ::= 'M' directness type         // type metadata
  global ::= 'MP' directness type        // type metadata pattern
  global ::= 'Mm' type                   // class metaclass
  global ::= 'nk_' entity                // protocol witness
  global ::= 'w' value-witness-kind type // value witness
  global ::= 'WV' type                   // value witness table
  global ::= 'Wo' entity                 // witness table offset
  global ::= 'Wv' directness entity      // field offset
  global ::= 'WP' protocol-conformance   // protocol witness table
  global ::= 'WZ' protocol-conformance   // lazy protocol witness table accessor
  global ::= 'Wz' protocol-conformance   // lazy protocol witness table template
  global ::= 'WD' protocol-conformance   // dependent proto witness table generator
  global ::= 'Wd' protocol-conformance   // dependent proto witness table template
  global ::= local-marker? entity        // some identifiable thing
  global ::= 'To' global                 // swift-as-ObjC thunk
  global ::= 'Tb' type                   // swift-to-ObjC block converter
  entity ::= context 'D'                 // deallocating destructor
  entity ::= context 'd'                 // non-deallocating destructor
  entity ::= context 'C' type            // allocating constructor
  entity ::= context 'c' type            // non-allocating constructor
  entity ::= declaration 'g'             // getter
  entity ::= declaration 's'             // setter
  entity ::= declaration 'a'             // addressor
  entity ::= declaration                 // other declaration
  declaration ::= declaration-name type
  declaration-name ::= context identifier
  local-marker ::= 'L'

Entity manglings all start with a nominal-type-kind ([COPV]), an
identifier ([0-9oX]), or a substitution ([S]). Global manglings start
with any of those or [MTWw].

::

  directness ::= 'd' // direct
  directness ::= 'i' // indirect

A direct symbol resolves directly to the address of an object. An
indirect symbol resolves to the address of a pointer to the object.
They are distinct manglings to make a certain class of bugs
immediately obvious.

The terminology is slightly overloaded when discussing offsets. A
direct offset resolves to a variable holding the true offset. An
indirect offset resolves to a variable holding an offset to be applied
to type metadata to get the address of the true offset. (Offset
variables are required when the object being accessed lies within a
resilient structure. When the layout of the object may depend on
generic arguments, these offsets must be kept in metadata. Indirect
field offsets are therefore required when accessing fields in generic
types where the metadata itself has unknown layout.)

::

  context ::= module
  context ::= function
  context ::= nominal-type
  context ::= protocol-context
  module ::= substitution                      // other substitution
  module ::= identifier                        // module name
  module ::= known-module                      // abbreviation
  function ::= entity
  type ::= 'A' natural type                    // fixed-size array
  type ::= 'Bf' natural '_'                    // Builtin.Float<n>
  type ::= 'Bi' natural '_'                    // Builtin.Int<n>
  type ::= 'BO'                                // Builtin.ObjCPointer
  type ::= 'Bo'                                // Builtin.ObjectPointer
  type ::= 'Bp'                                // Builtin.RawPointer
  type ::= 'Bv' natural type                   // Builtin.Vec<n>x<type>
  type ::= nominal-type
  type ::= associated-type
  type ::= 'b' type type                       // objc block function type
  type ::= 'F' type type                       // function type
  type ::= 'f' type type                       // uncurried function type
  type ::= 'G' type <type>+ '_'                // generic type application
  type ::= 'M' type                            // metatype
  type ::= 'P' protocol-list '_'               // protocol type
  type ::= archetype
  type ::= 'R' type                            // byref
  type ::= 'T' tuple-element* '_'              // tuple
  type ::= 't' tuple-element* '_'              // variadic tuple
  type ::= 'U' generics '_' type               // generic type
  type ::= 'Xo' type                           // [unowned] type
  type ::= 'Xw' type                           // [weak] type
  nominal-type ::= known-nominal-type
  nominal-type ::= substitution
  nominal-type ::= nominal-type-kind declaration-name
  nominal-type-kind ::= 'C'                    // class
  nominal-type-kind ::= 'O'                    // enum
  nominal-type-kind ::= 'V'                    // struct
  archetype ::= 'Q' index                      // archetype with depth=0
  archetype ::= 'Qd' index index               // archetype with depth=M+1
  archetype ::= associated-type
  associated-type ::= substitution
  associated-type ::= 'Q' protocol-context     // self type of protocol
  associated-type ::= 'Q' archetype identifier // associated type
  protocol-context ::= 'P' protocol
  tuple-element ::= identifier? type

<type> never begins or ends with a number.

<type> never begins with an underscore.

Note that protocols mangle differently as types and as contexts. A protocol
context always consists of a single protocol name and so mangles without a
trailing underscore. A protocol type can have zero, one, or many protocol bounds
which are juxtaposed and terminated with a trailing underscore.

::

  generics ::= generic-parameter+
  generic-parameter ::= protocol-list '_'
  protocol-list ::= protocol*
  protocol ::= substitution
  protocol ::= declaration-name

<protocol-list> is unambiguous because protocols are always top-level,
so the structure is quite simple.

::

  protocol-conformance ::= type protocol module

<protocol-conformance> refers to a type's conformance to a protocol. The named
module is the one containing the extension or type declaration that declared
the conformance.

::

  value-witness-kind ::= 'al' // allocateBuffer
  value-witness-kind ::= 'ca' // assignWithCopy
  value-witness-kind ::= 'ta' // assignWithTake
  value-witness-kind ::= 'de' // deallocateBuffer
  value-witness-kind ::= 'xx' // destroy
  value-witness-kind ::= 'XX' // destroyBuffer
  value-witness-kind ::= 'CP' // initializeBufferWithCopyOfBuffer
  value-witness-kind ::= 'Cp' // initializeBufferWithCopy
  value-witness-kind ::= 'cp' // initializeWithCopy
  value-witness-kind ::= 'Tk' // initializeBufferWithTake
  value-witness-kind ::= 'tk' // initializeWithTake
  value-witness-kind ::= 'pr' // projectBuffer
  value-witness-kind ::= 'ty' // typeof
  value-witness-kind ::= 'xs' // storeExtraInhabitant
  value-witness-kind ::= 'xg' // getExtraInhabitantIndex
  value-witness-kind ::= 'ug' // getEnumTag
  value-witness-kind ::= 'up' // inplaceProjectEnumData

<value-witness-kind> differentiates the kinds of function value
witnesses for a type.

::

  identifier ::= natural identifier-start-char identifier-char*
  identifier ::= 'o' operator-fixity natural operator-char+

  operator-fixity ::= 'p' // prefix operator
  operator-fixity ::= 'P' // postfix operator
  operator-fixity ::= 'i' // infix operator

  operator-char ::= 'a'   // & 'and'
  operator-char ::= 'c'   // @ 'commercial at'
  operator-char ::= 'd'   // / 'divide'
  operator-char ::= 'e'   // = 'equals'
  operator-char ::= 'g'   // > 'greater'
  operator-char ::= 'l'   // < 'less'
  operator-char ::= 'm'   // * 'multiply'
  operator-char ::= 'n'   // ! 'not'
  operator-char ::= 'o'   // | 'or'
  operator-char ::= 'p'   // + 'plus'
  operator-char ::= 'r'   // % 'remainder'
  operator-char ::= 's'   // - 'subtract'
  operator-char ::= 't'   // ~ 'tilde'
  operator-char ::= 'x'   // ^ 'xor'
  operator-char ::= 'z'   // . 'zperiod'

<identifier> is run-length encoded: the natural indicates how many
characters follow. Operator characters are mapped to letter characters as
given. In neither case can an identifier start with a digit, so
there's no ambiguity with the run-length.

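The two ASCII identifier productions can be sketched in Python as follows.
The function and table names are hypothetical; the character table simply
transcribes the ``operator-char`` and ``operator-fixity`` productions.

::

  OPERATOR_CHARS = {
      "&": "a", "@": "c", "/": "d", "=": "e", ">": "g", "<": "l",
      "*": "m", "!": "n", "|": "o", "+": "p", "%": "r", "-": "s",
      "~": "t", "^": "x", ".": "z",
  }
  FIXITIES = {"prefix": "p", "postfix": "P", "infix": "i"}

  def mangle_identifier(name):
      """Run-length encode a plain ASCII identifier."""
      return f"{len(name)}{name}"

  def mangle_operator(op, fixity):
      """Mangle an ASCII operator as 'o', fixity, length, then letters."""
      letters = "".join(OPERATOR_CHARS[ch] for ch in op)
      return f"o{FIXITIES[fixity]}{len(letters)}{letters}"

  print(mangle_identifier("zippity"))
  print(mangle_operator("==", "infix"))
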
::

  identifier ::= 'X' natural identifier-start-char identifier-char*
  identifier ::= 'X' 'o' operator-fixity natural identifier-char*

Identifiers that contain non-ASCII characters are encoded using the Punycode
algorithm specified in RFC 3492, with the modifications that ``_`` is used
as the encoding delimiter, and uppercase letters A through J are used in place
of digits 0 through 9 in the encoding character set. The mangling then
consists of an ``X`` followed by the run length of the encoded string and the
encoded string itself. For example, the identifier ``vergüenza`` is mangled
to ``X12vergenza_JFa``. (The encoding in standard Punycode would be
``vergenza-95a``.)

Operators that contain non-ASCII characters are mangled by first mapping the
ASCII operator characters to letters as for pure ASCII operator names, then
Punycode-encoding the substituted string. The mangling then consists of
``Xo`` followed by the fixity, run length of the encoded string, and the encoded
string itself. For example, the infix operator ``«+»`` is mangled to
``Xoi7p_qcaDc`` (``p_qcaDc`` being the encoding of the substituted
string ``«p»``).

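Both examples can be reproduced with Python's built-in ``punycode`` codec
plus the two modifications described above. This is a sketch, not the
compiler's implementation: the function names are hypothetical, and it
assumes the digit-to-letter substitution applies only to the encoded part
after the delimiter.

::

  DIGITS_TO_LETTERS = str.maketrans("0123456789", "ABCDEFGHIJ")
  OPERATOR_CHARS = {
      "&": "a", "@": "c", "/": "d", "=": "e", ">": "g", "<": "l",
      "*": "m", "!": "n", "|": "o", "+": "p", "%": "r", "-": "s",
      "~": "t", "^": "x", ".": "z",
  }

  def swift_punycode(text):
      """Standard Punycode with '_' as delimiter and A-J as digits."""
      standard = text.encode("punycode").decode("ascii")
      # The last '-' (if any) separates the basic and encoded parts.
      basic, delimiter, extended = standard.rpartition("-")
      encoded = extended.translate(DIGITS_TO_LETTERS)
      return basic + ("_" if delimiter else "") + encoded

  def mangle_unicode_identifier(name):
      encoded = swift_punycode(name)
      return f"X{len(encoded)}{encoded}"

  def mangle_unicode_operator(op, fixity_letter):
      # Map ASCII operator characters to letters before encoding.
      substituted = "".join(OPERATOR_CHARS.get(ch, ch) for ch in op)
      encoded = swift_punycode(substituted)
      return f"Xo{fixity_letter}{len(encoded)}{encoded}"

  print(mangle_unicode_identifier("vergüenza"))
  print(mangle_unicode_operator("«+»", "i"))
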
::

  substitution ::= 'S' index

<substitution> is a back-reference to a previously mangled entity. The mangling
algorithm maintains a mapping of entities to substitution indices as it runs.
When an entity that can be represented by a substitution (a module, nominal
type, or protocol) is mangled, a substitution is first looked for in the
substitution map, and if it is present, the entity is mangled using the
associated substitution index. Otherwise, the entity is mangled normally, and
it is then added to the substitution map and associated with the next
available substitution index.

For example, in mangling a function type
``(zim.zang.zung, zim.zang.zung, zim.zippity) -> zim.zang.zoo`` (with module
``zim`` and class ``zim.zang``),
the recurring contexts ``zim``, ``zim.zang``, and ``zim.zang.zung``
will be mangled using substitutions after being mangled
for the first time. The first argument type will mangle in long form,
``CC3zim4zang4zung``, and in doing so, ``zim`` will acquire substitution ``S_``,
``zim.zang`` will acquire substitution ``S0_``, and ``zim.zang.zung`` will
acquire ``S1_``. The second argument is the same as the first and will mangle
using its substitution, ``S1_``. The
third argument type will mangle using the substitution for ``zim``,
``CS_7zippity``. (It also acquires substitution ``S2_`` which would be used
if it mangled again.) The result type will mangle using the substitution for
``zim.zang``, ``CS0_3zoo`` (and acquire substitution ``S3_``). The full
function type thus mangles as ``fTCC3zim4zang4zungS1_CS_7zippity_CS0_3zoo``.

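The substitution bookkeeping can be sketched in Python. The ``Mangler``
class, its method name, and the representation of an entity as a tuple of
names are all hypothetical. The sketch treats every nested entity as a class
(``C``) and applies the grammar literally, so a repeated nominal type becomes
a bare substitution and every identifier keeps its run-length prefix.

::

  def encode_index(value):
      # <index>: '_' encodes 0, and natural '_' encodes N+1.
      return "_" if value == 0 else f"{value - 1}_"

  class Mangler:
      def __init__(self):
          # Maps an entity path, e.g. ("zim", "zang"), to its index.
          self.substitutions = {}

      def mangle(self, path):
          # Reuse a substitution if this entity was mangled before.
          if path in self.substitutions:
              return "S" + encode_index(self.substitutions[path])
          name = f"{len(path[-1])}{path[-1]}"
          if len(path) == 1:
              mangled = name  # a module is a bare identifier
          else:
              mangled = "C" + self.mangle(path[:-1]) + name
          # Register the entity only after it has been mangled in full.
          self.substitutions[path] = len(self.substitutions)
          return mangled

  mangler = Mangler()
  types = [("zim", "zang", "zung"), ("zim", "zang", "zung"),
           ("zim", "zippity"), ("zim", "zang", "zoo")]
  print([mangler.mangle(t) for t in types])

In this sketch ``zim``, ``zim.zang``, ``zim.zang.zung``, ``zim.zippity``, and
``zim.zang.zoo`` end up with indices 0 through 4, that is ``S_``, ``S0_``,
``S1_``, ``S2_``, and ``S3_``.
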
::

  known-module ::= 'So'       // Objective-C
  known-module ::= 'Ss'       // swift
  known-nominal-type ::= 'Sa' // swift.Slice
  known-nominal-type ::= 'Sb' // swift.Bool
  known-nominal-type ::= 'Sc' // swift.Char
  known-nominal-type ::= 'Sd' // swift.Float64
  known-nominal-type ::= 'Sf' // swift.Float32
  known-nominal-type ::= 'Si' // swift.Int64
  known-nominal-type ::= 'Sq' // swift.Optional
  known-nominal-type ::= 'SS' // swift.String
  known-nominal-type ::= 'Su' // swift.UInt64

<known-module> and <known-nominal-type> are built-in substitutions for
certain common entities. Like any other substitution, they all start
with 'S'.

The Objective-C module is used as the context for mangling Objective-C
classes as <type>s.

::

  index ::= '_'         // 0
  index ::= natural '_' // N+1
  natural ::= [0-9]+

<index> is a production for encoding numbers in contexts that can't
end in a digit; it's optimized for encoding smaller numbers.

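A small Python sketch of the <index> encoding and its inverse. The function
names are hypothetical.

::

  def encode_index(value):
      """Encode a non-negative integer as an <index>."""
      if value < 0:
          raise ValueError("index must be non-negative")
      # 0 is a bare underscore; N+1 is the natural N plus an underscore.
      return "_" if value == 0 else f"{value - 1}_"

  def decode_index(text):
      """Inverse of encode_index."""
      if not text.endswith("_"):
          raise ValueError("an index always ends in an underscore")
      digits = text[:-1]
      return 0 if digits == "" else int(digits) + 1

  print([encode_index(n) for n in range(4)])

Because every encoding ends in ``_``, an index can be followed directly by a
run-length-encoded identifier without ambiguity.
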