.. mirror of https://github.com/apple/swift.git, synced 2025-12-21 12:14:44 +01:00
.. @raise litre.TestsAreMissing
.. _ABI:

The Swift ABI
=============

Hard Constraints on Resilience
------------------------------

The root of a class hierarchy must remain stable, on pain of
invalidating the metaclass hierarchy. Note that a Swift class without an
explicit base class is implicitly rooted in the SwiftObject
Objective-C class.

Fragile Struct Layout
---------------------

Structs are currently laid out in declared field order, with each field
following the size and alignment conventions of LLVM on the target platform.

::

  struct S { var x:Int; var y:Double } // => LLVM { i64, double }
  struct S2 { var x:Char; var s:S }    // => LLVM { i21, { i64, double } }

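The declared-order rule can be observed from Swift itself. The following is
a minimal sketch, assuming a 64-bit target and a current Swift toolchain
(which may differ in detail from the early ABI described here). ``S`` mirrors
the struct above; ``MemoryLayout`` is standard library API::

  struct S { var x: Int; var y: Double }

  // On a 64-bit target both fields are 8 bytes wide and 8-byte aligned,
  // so declared-order layout puts x at offset 0 and y at offset 8.
  let size = MemoryLayout<S>.size
  let alignment = MemoryLayout<S>.alignment
  let xOffset = MemoryLayout<S>.offset(of: \S.x)
  let yOffset = MemoryLayout<S>.offset(of: \S.y)
  print(size, alignment, xOffset ?? -1, yOffset ?? -1)
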
Class Layout
------------

TODO

Fragile Enum Layout
-------------------

In laying out enum types, the ABI attempts to avoid requiring additional
storage to store the tag for the enum case. The ABI chooses one of five
strategies based on the layout of the enum:

Empty Enums
```````````

In the degenerate case of an enum with no cases, the enum is an empty type.

::

  enum Empty {} // => empty type

Single-Case Enums
`````````````````

In the degenerate case of an enum with a single case, there is no
discriminator needed, and the enum type has the exact same layout as its
case's data type, or is empty if the case has no data type.

::

  enum EmptyCase { case X }             // => empty type
  enum DataCase { case Y(Int, Double) } // => LLVM { i64, double }

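As a rough check of the two degenerate cases, the sizes can be queried with
``MemoryLayout``. This sketch assumes a current Swift toolchain, which is not
guaranteed to match this document in every respect; the enum names follow the
examples above, with case names lowercased to modern style::

  enum Empty {}
  enum EmptyCase { case x }
  enum DataCase { case y(Int, Double) }

  // Neither a tag nor a payload needs storage in the first two enums.
  let emptySize = MemoryLayout<Empty>.size
  let emptyCaseSize = MemoryLayout<EmptyCase>.size

  // A single data case reuses the layout of its payload.
  let dataCaseSize = MemoryLayout<DataCase>.size
  let payloadSize = MemoryLayout<(Int, Double)>.size
  print(emptySize, emptyCaseSize, dataCaseSize == payloadSize)
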
C-Like Enums
````````````

If none of the cases has a data type (a "C-like" enum), then the enum
is laid out as an integer tag with the minimal number of bits to contain
all of the cases. The machine-level layout of the type then follows LLVM's
data layout rules for integer types on the target platform. The cases are
assigned tag values in declaration order.

::

  enum EnumLike2 { // => LLVM i1
    case A         // => i1 0
    case B         // => i1 1
  }

  enum EnumLike8 { // => LLVM i3
    case A         // => i3 0
    case B         // => i3 1
    case C         // => i3 2
    case D         // etc.
    case E
    case F
    case G
    case H
  }

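The "minimal number of bits" is the ceiling of the base-2 logarithm of the
case count. A small sketch of that arithmetic follows; the function name
``tagBitCount(forCaseCount:)`` is hypothetical and not part of any Swift
API::

  /// Smallest number of bits that can hold `caseCount` distinct tag values.
  func tagBitCount(forCaseCount caseCount: Int) -> Int {
      precondition(caseCount >= 0, "case count must be non-negative")
      // Empty and single-case enums need no tag at all.
      if caseCount <= 1 { return 0 }
      // Bit width of the largest tag value, caseCount - 1.
      return Int.bitWidth - (caseCount - 1).leadingZeroBitCount
  }

  print(tagBitCount(forCaseCount: 2)) // EnumLike2: 1 bit, an LLVM i1
  print(tagBitCount(forCaseCount: 8)) // EnumLike8: 3 bits, an LLVM i3
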
Single-Payload Enums
````````````````````

If an enum has a single case with a data type and one or more no-data cases
(a "single-payload" enum), then the case with data type is represented using
the data type's binary representation, with added zero bits for tag if
necessary. If the data type's binary representation
has *extra inhabitants*, that is, bit patterns with the size and alignment of
the type but which do not form valid values of that type, they are used to
represent the no-data cases, with extra inhabitants in order of ascending
numeric value matching no-data cases in declaration order. Currently, the only
extra inhabitants considered are those that use *spare bits*
(see `Multi-Payload Enums`_) of an integer type, such as the 11 bits above
the 21 value bits when an ``i21`` occupies 32 bits of storage. The enum value
is then represented as an integer with the storage size in bits of the data
type.

::

  enum CharOrSectionMarker { => LLVM i32
    case Paragraph            => i32 0x0020_0000
    case Char(Char)           => i32 (zext i21 %Char to i32)
    case Chapter              => i32 0x0020_0001
  }

  CharOrSectionMarker.Char('\x00')     => i32 0x0000_0000
  CharOrSectionMarker.Char('\u10FFFF') => i32 0x0010_FFFF

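The ``CharOrSectionMarker`` encoding can be modeled with plain integer
arithmetic. This is only an illustrative sketch of the rule as stated above,
not the compiler's implementation; all names in it are hypothetical::

  /// Bits occupied by a valid payload (an i21 in the example).
  let payloadBitWidth: UInt32 = 21
  /// Smallest bit pattern that is not a valid i21: 0x0020_0000.
  let firstExtraInhabitant: UInt32 = 1 << payloadBitWidth

  /// The payload case is stored as the zero-extended payload.
  func encodePayload(_ value: UInt32) -> UInt32 {
      precondition(value < firstExtraInhabitant, "payload must fit in 21 bits")
      return value
  }

  /// The n-th no-data case, in declaration order, takes the n-th extra
  /// inhabitant in ascending numeric order.
  func encodeNoDataCase(index: UInt32) -> UInt32 {
      return firstExtraInhabitant + index
  }

  /// A stored value below the first extra inhabitant is a payload.
  func isPayload(_ stored: UInt32) -> Bool {
      return stored < firstExtraInhabitant
  }

  print(String(encodeNoDataCase(index: 0), radix: 16)) // Paragraph
  print(String(encodeNoDataCase(index: 1), radix: 16)) // Chapter
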
If the data type has no extra inhabitants, or there are not enough extra
inhabitants to represent all of the no-data cases, then a tag bit is added
to the enum's representation. The tag bit is set for the no-data cases, which
are then assigned values in the data area of the enum in declaration order.

::

  enum IntOrInfinity { => LLVM { i64, i1 }
    case NegInfinity    => { i64, i1 } {    0, 1 }
    case Int(Int)       => { i64, i1 } { %Int, 0 }
    case PosInfinity    => { i64, i1 } {    1, 1 }
  }

  IntOrInfinity.Int(    0) => { i64, i1 } {     0, 0 }
  IntOrInfinity.Int(20721) => { i64, i1 } { 20721, 0 }

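The tag-bit fallback can be sketched the same way. The struct below is a
hypothetical stand-in for the ``{ i64, i1 }`` pair and is meant only to make
the rule concrete::

  /// Model of the `{ i64, i1 }` representation: a data area plus a tag bit.
  struct TaggedInt: Equatable {
      var data: UInt64
      var tag: Bool
  }

  /// Payload case: tag bit clear, data area holds the payload bits.
  func encodeInt(_ value: Int64) -> TaggedInt {
      return TaggedInt(data: UInt64(bitPattern: value), tag: false)
  }

  /// No-data cases: tag bit set, data area numbers the cases in
  /// declaration order.
  func encodeNoDataCase(index: UInt64) -> TaggedInt {
      return TaggedInt(data: index, tag: true)
  }

  print(encodeNoDataCase(index: 0)) // NegInfinity
  print(encodeNoDataCase(index: 1)) // PosInfinity
  print(encodeInt(20721))
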
Multi-Payload Enums
```````````````````

If an enum has more than one case with data type, then a tag is necessary to
discriminate the data types. The ABI will first try to find common
*spare bits*, that is, bits in the data types' binary representations which are
either fixed-zero or ignored by valid values of all of the data types. The tag
will be scattered into these spare bits as much as possible. Currently only
spare bits of primitive integer types, such as the high bits of an ``i21``
type, are considered. The enum data is represented as an integer with the
storage size in bits of the largest data type.

::

  enum TerminalChar {    => LLVM i32
    case Plain(Char)      => i32 (zext i21 %Plain to i32)
    case Bold(Char)       => i32 (or (zext i21 %Bold to i32), 0x0020_0000)
    case Underline(Char)  => i32 (or (zext i21 %Underline to i32), 0x0040_0000)
    case Blink(Char)      => i32 (or (zext i21 %Blink to i32), 0x0060_0000)
    case Empty            => i32 0x0080_0000
    case Cursor           => i32 0x0080_0001
  }

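The ``TerminalChar`` values above follow from placing the tag directly above
the 21 payload bits. The sketch below reproduces that arithmetic under the
assumption that the tag occupies contiguous spare bits starting at bit 21;
the function names are hypothetical::

  /// Valid Char payloads use bits 0..<21; higher bits are spare.
  let payloadBitWidth: UInt32 = 21
  let payloadLimit: UInt32 = 1 << payloadBitWidth

  /// Payload cases: the tag (declaration order) goes into the spare bits.
  func encode(payloadCase tag: UInt32, payload: UInt32) -> UInt32 {
      precondition(payload < payloadLimit, "payload must fit in 21 bits")
      return payload | (tag << payloadBitWidth)
  }

  /// No-data cases share the first tag after the payload cases and are
  /// numbered in the data area in declaration order.
  func encode(noDataCase index: UInt32, payloadCaseCount: UInt32) -> UInt32 {
      precondition(index < payloadLimit, "index must fit in the data area")
      return (payloadCaseCount << payloadBitWidth) | index
  }

  // Bold("A"): tag 1 above the scalar value 0x41.
  print(String(encode(payloadCase: 1, payload: 0x41), radix: 16))
  // Cursor: second no-data case after four payload cases.
  print(String(encode(noDataCase: 1, payloadCaseCount: 4), radix: 16))
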
If there are not enough spare bits to contain the tag, then additional bits are
added to the representation to contain the tag. Tag values are
assigned to data cases in declaration order. If there are no-data cases, they
are collected under a common tag, and assigned values in the data area of the
enum in declaration order.

::

  class Bignum {}

  enum IntDoubleOrBignum { => LLVM { i64, i2 }
    case Int(Int)           => { i64, i2 } { %Int, 0 }
    case Double(Double)     => { i64, i2 } { (bitcast %Double to i64), 1 }
    case Bignum(Bignum)     => { i64, i2 } { (ptrtoint %Bignum to i64), 2 }
  }

Mangling
--------

::

  mangled-name ::= '_T' global

All Swift-mangled names begin with this prefix.

::

  global ::= 't' type                    // standalone type (for DWARF)
  global ::= 'M' directness type         // type metadata
  global ::= 'MP' directness type        // type metadata pattern
  global ::= 'Mm' type                   // class metaclass
  global ::= 'nk_' entity                // protocol witness
  global ::= 'w' value-witness-kind type // value witness
  global ::= 'WV' type                   // value witness table
  global ::= 'Wo' entity                 // witness table offset
  global ::= 'Wv' directness entity      // field offset
  global ::= 'WP' protocol-conformance   // protocol witness table
  global ::= 'WZ' protocol-conformance   // lazy protocol witness table accessor
  global ::= 'Wz' protocol-conformance   // lazy protocol witness table template
  global ::= 'WD' protocol-conformance   // dependent proto witness table generator
  global ::= 'Wd' protocol-conformance   // dependent proto witness table template
  global ::= local-marker? entity        // some identifiable thing
  global ::= 'To' global                 // swift-as-ObjC thunk
  global ::= 'Tb' type                   // swift-to-ObjC block converter
  entity ::= context 'D'                 // deallocating destructor
  entity ::= context 'd'                 // non-deallocating destructor
  entity ::= context 'C' type            // allocating constructor
  entity ::= context 'c' type            // non-allocating constructor
  entity ::= declaration 'g'             // getter
  entity ::= declaration 's'             // setter
  entity ::= declaration 'a'             // addressor
  entity ::= declaration                 // other declaration
  declaration ::= declaration-name type
  declaration-name ::= context identifier
  local-marker ::= 'L'

Entity manglings all start with a nominal-type-kind ([COPV]), an
identifier ([0-9oX]), or a substitution ([S]). Global manglings start
with any of those or [LMTWntw].

::

  directness ::= 'd'                         // direct
  directness ::= 'i'                         // indirect

A direct symbol resolves directly to the address of an object. An
indirect symbol resolves to the address of a pointer to the object.
They are distinct manglings to make a certain class of bugs
immediately obvious.

The terminology is slightly overloaded when discussing offsets. A
direct offset resolves to a variable holding the true offset. An
indirect offset resolves to a variable holding an offset to be applied
to type metadata to get the address of the true offset. (Offset
variables are required when the object being accessed lies within a
resilient structure. When the layout of the object may depend on
generic arguments, these offsets must be kept in metadata. Indirect
field offsets are therefore required when accessing fields in generic
types where the metadata itself has unknown layout.)

::

  context ::= module
  context ::= function
  context ::= nominal-type
  context ::= protocol-context
  module ::= substitution                    // other substitution
  module ::= identifier                      // module name
  module ::= known-module                    // abbreviation
  function ::= entity

  type ::= 'A' natural type                  // fixed-size array
  type ::= 'Bf' natural '_'                  // Builtin.Float<n>
  type ::= 'Bi' natural '_'                  // Builtin.Int<n>
  type ::= 'BO'                              // Builtin.ObjCPointer
  type ::= 'Bo'                              // Builtin.ObjectPointer
  type ::= 'Bp'                              // Builtin.RawPointer
  type ::= 'Bv' natural type                 // Builtin.Vec<n>x<type>
  type ::= nominal-type
  type ::= associated-type
  type ::= 'b' type type                     // objc block function type
  type ::= 'F' type type                     // function type
  type ::= 'f' type type                     // uncurried function type
  type ::= 'G' type <type>+ '_'              // generic type application
  type ::= 'M' type                          // metatype
  type ::= 'P' protocol-list '_'             // protocol type
  type ::= archetype
  type ::= 'R' type                          // byref
  type ::= 'T' tuple-element* '_'            // tuple
  type ::= 't' tuple-element* '_'            // variadic tuple
  type ::= 'U' generics '_' type             // generic type
  type ::= 'Xo' type                         // [unowned] type
  type ::= 'Xw' type                         // [weak] type
  nominal-type ::= known-nominal-type
  nominal-type ::= substitution
  nominal-type ::= nominal-type-kind declaration-name
  nominal-type-kind ::= 'C'                  // class
  nominal-type-kind ::= 'O'                  // enum
  nominal-type-kind ::= 'V'                  // struct
  archetype ::= 'Q' index                    // archetype with depth=0
  archetype ::= 'Qd' index index             // archetype with depth=M+1
  archetype ::= associated-type
  associated-type ::= substitution
  associated-type ::= 'Q' protocol-context   // self type of protocol
  associated-type ::= 'Q' archetype identifier // associated type
  protocol-context ::= 'P' protocol
  tuple-element ::= identifier? type

<type> never begins or ends with a number.
<type> never begins with an underscore.

Note that protocols mangle differently as types and as contexts. A protocol
context always consists of a single protocol name and so mangles without a
trailing underscore. A protocol type can have zero, one, or many protocol bounds
which are juxtaposed and terminated with a trailing underscore.

::

  generics ::= generic-parameter+
  generic-parameter ::= protocol-list '_'
  protocol-list ::= protocol*
  protocol ::= substitution
  protocol ::= declaration-name

<protocol-list> is unambiguous because protocols are always top-level,
so the structure is quite simple.

::

  protocol-conformance ::= type protocol module

<protocol-conformance> refers to a type's conformance to a protocol. The named
module is the one containing the extension or type declaration that declared
the conformance.

::

  value-witness-kind ::= 'al'                // allocateBuffer
  value-witness-kind ::= 'ca'                // assignWithCopy
  value-witness-kind ::= 'ta'                // assignWithTake
  value-witness-kind ::= 'de'                // deallocateBuffer
  value-witness-kind ::= 'xx'                // destroy
  value-witness-kind ::= 'XX'                // destroyBuffer
  value-witness-kind ::= 'CP'                // initializeBufferWithCopyOfBuffer
  value-witness-kind ::= 'Cp'                // initializeBufferWithCopy
  value-witness-kind ::= 'cp'                // initializeWithCopy
  value-witness-kind ::= 'Tk'                // initializeBufferWithTake
  value-witness-kind ::= 'tk'                // initializeWithTake
  value-witness-kind ::= 'pr'                // projectBuffer
  value-witness-kind ::= 'ty'                // typeof
  value-witness-kind ::= 'xs'                // storeExtraInhabitant
  value-witness-kind ::= 'xg'                // getExtraInhabitantIndex
  value-witness-kind ::= 'ug'                // getEnumTag
  value-witness-kind ::= 'up'                // inplaceProjectEnumData

<value-witness-kind> differentiates the kinds of function value
witnesses for a type.

::

  identifier ::= natural identifier-start-char identifier-char*
  identifier ::= 'o' operator-fixity natural operator-char+

  operator-fixity ::= 'p'                    // prefix operator
  operator-fixity ::= 'P'                    // postfix operator
  operator-fixity ::= 'i'                    // infix operator

  operator-char ::= 'a'                      // & 'and'
  operator-char ::= 'c'                      // @ 'commercial at'
  operator-char ::= 'd'                      // / 'divide'
  operator-char ::= 'e'                      // = 'equals'
  operator-char ::= 'g'                      // > 'greater'
  operator-char ::= 'l'                      // < 'less'
  operator-char ::= 'm'                      // * 'multiply'
  operator-char ::= 'n'                      // ! 'not'
  operator-char ::= 'o'                      // | 'or'
  operator-char ::= 'p'                      // + 'plus'
  operator-char ::= 'r'                      // % 'remainder'
  operator-char ::= 's'                      // - 'subtract'
  operator-char ::= 't'                      // ~ 'tilde'
  operator-char ::= 'x'                      // ^ 'xor'
  operator-char ::= 'z'                      // . 'zperiod'

<identifier> is run-length encoded: the natural indicates how many
characters follow. Operator characters are mapped to letter characters as
given. In neither case can an identifier start with a digit, so
there's no ambiguity with the run-length.

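The two ASCII identifier productions are simple enough to sketch directly.
The code below is an illustration of the grammar above, not the compiler's
mangler; ``mangleIdentifier``, ``mangleOperator``, and ``Fixity`` are
hypothetical names::

  /// Letter codes for ASCII operator characters, per the operator-char table.
  let operatorCodes: [Character: Character] = [
      "&": "a", "@": "c", "/": "d", "=": "e", ">": "g", "<": "l", "*": "m",
      "!": "n", "|": "o", "+": "p", "%": "r", "-": "s", "~": "t", "^": "x",
      ".": "z",
  ]

  enum Fixity: Character {
      case prefix = "p"
      case postfix = "P"
      case infix = "i"
  }

  /// identifier ::= natural identifier-start-char identifier-char*
  func mangleIdentifier(_ name: String) -> String {
      precondition(!name.isEmpty, "identifier must not be empty")
      precondition(name.allSatisfy({ $0.isASCII }), "ASCII names only")
      return "\(name.count)\(name)"
  }

  /// identifier ::= 'o' operator-fixity natural operator-char+
  /// Returns nil if the operator uses a character outside the table.
  func mangleOperator(_ op: String, fixity: Fixity) -> String? {
      var encoded = ""
      for character in op {
          guard let code = operatorCodes[character] else { return nil }
          encoded.append(code)
      }
      guard !encoded.isEmpty else { return nil }
      return "o\(fixity.rawValue)\(encoded.count)\(encoded)"
  }

  print(mangleIdentifier("zim"))                           // 3zim
  print(mangleOperator("==", fixity: .infix) ?? "invalid") // oi2ee
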
::

  identifier ::= 'X' natural identifier-start-char identifier-char*
  identifier ::= 'X' 'o' operator-fixity natural identifier-char*

Identifiers that contain non-ASCII characters are encoded using the Punycode
algorithm specified in RFC 3492, with the modifications that ``_`` is used
as the encoding delimiter, and uppercase letters A through J are used in place
of digits 0 through 9 in the encoding character set. The mangling then
consists of an ``X`` followed by the run length of the encoded string and the
encoded string itself. For example, the identifier ``vergüenza`` is mangled
to ``X12vergenza_JFa``. (The encoding in standard Punycode would be
``vergenza-95a``.)

Operators that contain non-ASCII characters are mangled by first mapping the
ASCII operator characters to letters as for pure ASCII operator names, then
Punycode-encoding the substituted string. The mangling then consists of
``Xo`` followed by the fixity, run length of the encoded string, and the encoded
string itself. For example, the infix operator ``«+»`` is mangled to
``Xoi7p_qcaDc`` (``p_qcaDc`` being the encoding of the substituted
string ``«p»``).

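The modified Punycode encoding can be sketched as a direct transcription of
the RFC 3492 encoder with the two changes described above (``_`` as
delimiter, ``A`` through ``J`` for the digit values 26 through 35). This is
an illustrative sketch with hypothetical function names, not the compiler's
implementation, and it omits the overflow checks the RFC calls for::

  /// Digit alphabet: a-z for 0...25, then A-J in place of 0-9.
  let punycodeDigits = Array("abcdefghijklmnopqrstuvwxyzABCDEFGHIJ")

  /// Bias adaptation from RFC 3492 (base 36, tmin 1, tmax 26, skew 38,
  /// damp 700).
  func adapt(delta: Int, pointCount: Int, isFirst: Bool) -> Int {
      var delta = isFirst ? delta / 700 : delta / 2
      delta += delta / pointCount
      var k = 0
      while delta > ((36 - 1) * 26) / 2 {
          delta /= 36 - 1
          k += 36
      }
      return k + (36 * delta) / (delta + 38)
  }

  func encodeModifiedPunycode(_ input: String) -> String {
      let scalars = input.unicodeScalars.map { Int($0.value) }
      // Basic (ASCII) code points are copied through unchanged.
      var output = scalars.filter { $0 < 128 }
          .map { Character(UnicodeScalar(UInt8($0))) }
      let basicCount = output.count
      var handled = basicCount
      if basicCount > 0 { output.append("_") }
      var n = 128, delta = 0, bias = 72
      while handled < scalars.count {
          // Next code point to insert: the smallest one not yet handled.
          let m = scalars.filter { $0 >= n }.min()!
          delta += (m - n) * (handled + 1)
          n = m
          for c in scalars {
              if c < n { delta += 1 }
              if c == n {
                  // Emit delta as a generalized variable-length integer.
                  var q = delta
                  var k = 36
                  while true {
                      let t = k <= bias ? 1 : (k >= bias + 26 ? 26 : k - bias)
                      if q < t { break }
                      output.append(punycodeDigits[t + (q - t) % (36 - t)])
                      q = (q - t) / (36 - t)
                      k += 36
                  }
                  output.append(punycodeDigits[q])
                  bias = adapt(delta: delta, pointCount: handled + 1,
                               isFirst: handled == basicCount)
                  delta = 0
                  handled += 1
              }
          }
          delta += 1
          n += 1
      }
      return String(output)
  }

  /// 'X' followed by the run length and the encoded string.
  func mangleNonASCIIIdentifier(_ name: String) -> String {
      let encoded = encodeModifiedPunycode(name)
      return "X\(encoded.count)\(encoded)"
  }

  print(mangleNonASCIIIdentifier("vergüenza")) // X12vergenza_JFa
  print(encodeModifiedPunycode("«p»"))         // p_qcaDc
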
::

  substitution ::= 'S' index

<substitution> is a back-reference to a previously mangled entity. The mangling
algorithm maintains a mapping of entities to substitution indices as it runs.
When an entity that can be represented by a substitution (a module, nominal
type, or protocol) is mangled, a substitution is first looked for in the
substitution map, and if it is present, the entity is mangled using the
associated substitution index. Otherwise, the entity is mangled normally, and
it is then added to the substitution map and associated with the next
available substitution index.

For example, in mangling a function type
``(zim.zang.zung, zim.zang.zung, zim.zippity) -> zim.zang.zoo`` (with module
``zim`` and class ``zim.zang``),
the recurring contexts ``zim``, ``zim.zang``, and ``zim.zang.zung``
will be mangled using substitutions after being mangled
for the first time. The first argument type will mangle in long form,
``CC3zim4zang4zung``, and in doing so, ``zim`` will acquire substitution ``S_``,
``zim.zang`` will acquire substitution ``S0_``, and ``zim.zang.zung`` will
acquire ``S1_``. The second argument is the same as the first and will mangle
using its substitution, ``S1_``. The
third argument type will mangle using the substitution for ``zim``,
``CS_7zippity``. (It also acquires substitution ``S2_`` which would be used
if it mangled again.) The result type will mangle using the substitution for
``zim.zang``, ``CS0_3zoo`` (and acquire substitution ``S3_``). The full
function type thus mangles as ``fTCC3zim4zang4zungS1_CS_7zippity_CS0_3zoo``.

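The substitution bookkeeping can be sketched as a small recursive mangler.
The sketch reads the grammar in this section literally: a nominal type that
has already been mangled is emitted as a bare substitution, and every
identifier carries its run length. ``Entity`` and ``Mangler`` are
hypothetical types that model only modules and classes::

  /// Substitutable entities needed for the example: modules and classes.
  indirect enum Entity: Hashable {
      case module(String)
      case classDecl(context: Entity, name: String)
  }

  struct Mangler {
      /// Entities seen so far, mapped to their substitution numbers.
      var substitutions: [Entity: Int] = [:]

      /// index ::= '_' (0) | natural '_' (N+1)
      func index(_ n: Int) -> String {
          return n == 0 ? "_" : "\(n - 1)_"
      }

      mutating func mangle(_ entity: Entity) -> String {
          // Use a back-reference if the entity was mangled before.
          if let known = substitutions[entity] {
              return "S" + index(known)
          }
          let mangled: String
          switch entity {
          case .module(let name):
              mangled = "\(name.count)\(name)"
          case .classDecl(let context, let name):
              mangled = "C" + mangle(context) + "\(name.count)\(name)"
          }
          // Register only after mangling, so inner contexts get lower indices.
          let next = substitutions.count
          substitutions[entity] = next
          return mangled
      }

      /// type ::= 'f' type type, with the arguments as a tuple 'T' ... '_'.
      mutating func mangleUncurriedFunction(arguments: [Entity],
                                            result: Entity) -> String {
          var out = "fT"
          for argument in arguments { out += mangle(argument) }
          out += "_"
          return out + mangle(result)
      }
  }

  let zim = Entity.module("zim")
  let zang = Entity.classDecl(context: zim, name: "zang")
  let zung = Entity.classDecl(context: zang, name: "zung")
  let zippity = Entity.classDecl(context: zim, name: "zippity")
  let zoo = Entity.classDecl(context: zang, name: "zoo")

  var mangler = Mangler()
  let mangledFunction = mangler.mangleUncurriedFunction(
      arguments: [zung, zung, zippity], result: zoo)
  print(mangledFunction)
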
::

  known-module ::= 'So'                      // Objective-C
  known-module ::= 'Ss'                      // swift
  known-nominal-type ::= 'Sa'                // swift.Slice
  known-nominal-type ::= 'Sb'                // swift.Bool
  known-nominal-type ::= 'Sc'                // swift.Char
  known-nominal-type ::= 'Sd'                // swift.Float64
  known-nominal-type ::= 'Sf'                // swift.Float32
  known-nominal-type ::= 'Si'                // swift.Int64
  known-nominal-type ::= 'Sq'                // swift.Optional
  known-nominal-type ::= 'SS'                // swift.String
  known-nominal-type ::= 'Su'                // swift.UInt64

<known-module> and <known-nominal-type> are built-in substitutions for
certain common entities. Like any other substitution, they all start
with 'S'.

The Objective-C module is used as the context for mangling Objective-C
classes as <type>s.

::

  index ::= '_'                              // 0
  index ::= natural '_'                      // N+1
  natural ::= [0-9]+

<index> is a production for encoding numbers in contexts that can't
end in a digit; it's optimized for encoding smaller numbers.

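Decoding an <index> is the inverse of the two productions above. A minimal
sketch, using a hypothetical ``parseIndex`` helper that returns the decoded
value together with the unconsumed remainder of the mangled string::

  /// Decode an <index> at the start of `text`.
  /// Returns nil if `text` does not begin with a well-formed index.
  func parseIndex(_ text: Substring) -> (value: Int, rest: Substring)? {
      let digits = text.prefix(while: { $0.isASCII && $0.isNumber })
      let afterDigits = text.dropFirst(digits.count)
      guard afterDigits.first == "_" else { return nil }
      let rest = afterDigits.dropFirst()
      // A bare '_' encodes 0.
      if digits.isEmpty { return (0, rest) }
      // natural '_' encodes N+1.
      guard let n = Int(digits) else { return nil }
      return (n + 1, rest)
  }

  // The substitution "S1_" carries the index "1_", that is, entry 2.
  if let parsed = parseIndex("1_7zippity") {
      print(parsed.value, parsed.rest)
  }
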