swift-mirror/docs/ABI/CallingConvention.rst

:orphan:

.. _CallingConvention:

The Swift Calling Convention
****************************

.. contents::

This whitepaper discusses the Swift calling convention, at least as we
want it to be.

It's a basic assumption in this paper that Swift shouldn't make an
implicit promise to exactly match the default platform calling
convention.  That is, if a C or Objective-C programmer manages to derive the
address of a Swift function, we don't have to promise that an obvious
translation of the type of that function will be correctly callable
from C.  For example, this wouldn't be guaranteed to work::

  // In Swift:
  func foo(_ x: Int, y: Double) -> MyClass { ... }

  // In Objective-C:
  extern id _TF4main3fooFTSiSd_CS_7MyClass(intptr_t x, double y);

We do sometimes need to be able to match C conventions, both to use
them and to generate implementations of them, but that level of
compatibility should be opt-in and site-specific.  If Swift would
benefit from internally using a better convention than C/Objective-C uses,
and switching to that convention doesn't damage the dynamic abilities
of our target platforms (debugging, dtrace, stack traces, unwinding,
etc.), there should be nothing preventing us from doing so.  (If we
did want to guarantee compatibility on this level, this paper would be
a lot shorter!)

Function call rules in high-level languages have three major
components, each operating on a different abstraction level:

* the high-level semantics of the call (pass-by-reference
  vs. pass-by-value),

* the ownership and validity conventions about argument and result
  values ("+0" vs. "+1", etc.), and

* the "physical" representation conventions of how values are actually
  communicated between functions (in registers, on the stack, etc.).

We'll tackle each of these in turn, then conclude with a detailed
discussion of function signature lowering.

High-level semantic conventions
===============================

The major division in argument passing conventions between languages
is between pass-by-reference and pass-by-value languages.  It's a
distinction that only really makes sense in languages with the concept
of an l-value, but Swift does, so it's pertinent.

In general, the terms "pass-by-X" and "call-by-X" are used
interchangeably.  It's unfortunate, because these conventions are
argument specific, and functions can be passed multiple arguments
that are each handled in a different way.  As such, we'll prefer
"pass-by-X" for consistency and to emphasize that these conventions
are argument-specific.

Pass-by-reference
-----------------

In pass-by-reference (also called pass-by-name or pass-by-address), if
`A` is an l-value expression, `foo(A)` is passed some sort of opaque
reference through which the original l-value can be modified.  If `A`
is not an l-value, the language may prohibit this, or (if
pass-by-reference is the default convention) it may pass a temporary
variable containing the result of `A`.

Don't confuse pass-by-reference with the concept of a *reference
type*.  A reference type is a type whose value is a reference to a
different object; for example, a pointer type in C, or a class type in
Java or Swift.  A variable of reference type can be passed by value
(copying the reference itself) or by reference (passing the variable
itself, allowing it to be changed to refer to a different object).
Note that references in C++ are a generalization of pass-by-reference,
not really a reference type; in C++, a variable of reference type
behaves completely unlike any other variable in the language.

Also, don't confuse pass-by-reference with the physical convention of
passing an argument value indirectly.  In pass-by-reference, what's
logically being passed is a reference to a tangible, user-accessible
object; changes to the original object will be visible in the
reference, and changes to the reference will be reflected in the
original object.  In an indirect physical convention, the argument is
still logically an independent value, no longer associated with the
original object (if there was one).

If every object in the language is stored in addressable memory,
pass-by-reference can be easily implemented by simply passing the
address of the object.  If an l-value can have more structure than
just a single, independently-addressable object, more information may
be required from the caller.  For example, an array argument in
FORTRAN can be a row or column vector from a matrix, and so arrays are
generally passed as both an address and a stride.  C and C++ do have
unaddressable l-values because of bitfields, but they forbid passing
bitfields by reference (in C++) or taking their address (in either
language), which greatly simplifies pointer and reference types in
those languages.

FORTRAN is the last remaining example of a language that defaults to
pass-by-reference.  Early FORTRAN implementations famously passed
constants by passing the address of mutable global memory initialized
to the constant; if the callee modified its parameter (illegal under
the standard, but...), it literally changed the constant for future
uses.  FORTRAN now allows procedures to explicitly take arguments by
value and explicitly declare that arguments must be l-values.

However, many languages do allow parameters to be explicitly marked as
pass-by-reference.  As mentioned for C++, sometimes only certain kinds
of l-values are allowed.

Swift allows parameters to be marked as pass-by-reference with
`inout`.  Arbitrary l-values can be passed.  The Swift convention is
to always pass an address; if the parameter is not addressable, it
must be materialized into a temporary and then written back.  See the
accessors proposal for more details about the high-level semantics of
`inout` arguments.

Pass-by-value
-------------

In pass-by-value, if `A` is an l-value expression, `foo(A)` copies the
current value there.  Any modifications `foo` makes to its parameter
are made to this copy, not to the original l-value.

Most modern languages are pass-by-value, with specific functions able
to opt in to pass-by-reference semantics.  This is exactly what Swift
does.

There's not much room for variation in the high-level semantics of
passing arguments by value; all the variation is in the ownership and
physical conventions.

Ownership transfer conventions
==============================

Arguments and results that require cleanup, like an Objective-C object
reference or a non-POD C++ object, raise two questions about
responsibility: who is responsible for cleaning it up, and when?

These questions arise even when the cleanup is explicit in code.  C's
`strdup` function returns newly-allocated memory which the caller is
responsible for freeing, but `strtok` does not.  Objective-C has
standard naming conventions that describe which functions return
objects that the caller is responsible for releasing, and outside of
ARC these must be followed manually.  Of course, conventions designed
to be implemented by programmers are often designed around the
simplicity of that implementation, rather than necessarily being more
efficient.

Pass-by-reference arguments
---------------------------

Pass-by-reference arguments generally don't involve a *transfer* of
ownership.  It's assumed that the caller will ensure that the referent
is valid at the time of the call, and that the callee will ensure that
the referent is still valid at the time of return.

FORTRAN does actually allow parameters to be tagged as out-parameters,
where the caller doesn't guarantee the validity of the argument before
the call.  Objective-C has something similar, where an indirect method
argument can be marked `out`; ARC takes advantage of this with
autoreleasing parameters to avoid a copy into the writeback temporary.
Neither of these are something we semantically care about supporting
in Swift.

There is one other theoretically interesting convention question here:
the argument has to be valid before the call and after the call, but
does it have to valid during the call?  Swift's answer to this is
generally "yes".  Swift does have `inout` aliasing rules that allow a
certain amount of optimization, but the compiler is forbidden from
exploiting these rules in any way that could cause memory corruption
(at least in the absence of race conditions).  So Swift has to ensure
that an `inout` argument is valid whenever it does something
(including calling an opaque function) that could potentially access
the original l-value.

If Swift allowed local variables to be captured through `inout`
parameters, and therefore needed to pass an implicit owner parameter
along with an address, this owner parameter would behave like a
pass-by-value argument and could use any of the conventions listed
below.  However, the optimal convention for this is obvious: it should
be `guaranteed`, since captures are very unlikely and callers are
almost always expected to use the value of an `inout` variable
afterwards.

Pass-by-value arguments
-----------------------

All conventions for this have performance trade-offs.

We're only going to discuss *static* conventions, where the transfer
is picked at compile time.  It's possible to have a *dynamic*
convention, where the caller passes a flag indicating whether it's
okay to directly take responsibility for the value, and the callee can
(conceptually) return a flag indicating whether it actually did take
responsibility for it.  If copying is extremely expensive, that can be
worthwhile; otherwise, the code cost may overwhelm any other benefits.

This discussion will ignore one particular impact of these conventions
on code size.  If a function has many callers, conventions that
require more code in the caller are worse, all else aside.  If a
single call site has many possible targets, conventions that require
more code in the callee are worse, all else aside.  It's not really
reasonable to decide this in advance for unknown code; we could maybe
make rules about code calling system APIs, except that system APIs are
by definition locked down, and we can't change them.  It's a
reasonable thing to consider changing with PGO, though.

Responsibility
~~~~~~~~~~~~~~

A common refrain in this performance analysis will be whether a
function has responsibility for a value.  A function has to get a
value from *somewhere*:

* A caller is usually responsible for the return values it receives:
  the callee generated the value and the caller is responsible for
  destroying it.  Any other convention has to rely on heavily
  restricting what kind of value can be returned.  (If you're thinking
  about Objective-C autoreleased results, just accept this for now;
  we'll talk about that later.)

* A function isn't necessarily responsible for a value it loads from
  memory.  Ignoring race conditions, the function may be able to
  immediately use the value without taking any specific action to keep
  it valid.

* A callee may or may not be responsible for a value passed as a
  parameter, depending on the convention it was passed with.

* A function might come from a source that doesn't necessarily make
  the function responsible, but if the function takes an action which
  invalidates the source before using the value, the function has to
  take action to keep the value valid.  At that point, the function
  has responsibility for the value despite its original source.

  For example, a function `foo()` might load a reference `r` from a
  global variable `x`, call an unknown function `bar()`, and then use
  `r` in some way.  If `bar()` can't possibly overwrite `x`, `foo()`
  doesn't have to do anything to keep `r` alive across the call;
  otherwise it does (e.g. by retaining it in a refcounted
  environment).  This is a situation where humans are often much
  smarter than compilers.  Of course, it's also a situation where
  humans are sometimes insufficiently conservative.

A function may also require responsibility for a value as part of its
operation:

* Since a variable is always responsible for the current value it
  stores, a function which stores a value into memory must first gain
  responsibility for that value.

* A callee normally transfers responsibility for its return value to
  its caller; therefore it must gain responsibility for its return
  value before returning it.

* A caller may need to gain responsibility for a value before passing
  it as an argument, depending on the parameter's ownership-transfer
  convention.

Known conventions
~~~~~~~~~~~~~~~~~

There are three static parameter conventions for ownership worth
considering here:

* The caller may transfer responsibility for the value to the callee.
  In SIL, we call this an **owned** parameter.

  This is optimal if the caller has responsibility for the value and
  doesn't need it after the call.  This is an extremely common
  situation; for example, it comes up whenever a call result is
  immediately used as an argument.  By giving the callee responsibility
  for the value, this convention allows the callee to use the value at
  a later point without taking any extra action to keep it alive.

  The flip side is that this convention requires a lot of extra work
  when a single value is used multiple times in the caller.  For
  example, a value passed in every iteration of a loop will need to be
  copied/retained/whatever each time.

* The caller may provide the value without any responsibility on
  either side.  In SIL, we call this an **unowned** parameter.  The
  value is guaranteed to be valid at the moment of the call, and in
  the absence of race conditions, that guarantee can be assumed to
  continue unless the callee does something that might invalidate it.
  As discussed above, humans are often much smarter than computers
  about knowing when that's possible.

  This is optimal if the caller can acquire the value without
  responsibility and the callee doesn't require responsibility of it.
  In very simple code --- e.g., loading values from an array and
  passing them to a comparator function which just reads a few fields
  from each and returns --- this can be extremely efficient.

  Unfortunately, this convention is completely undermined if either
  side has to do anything that forces it to take action to keep the
  value alive.  Also, if that happens on the caller side, the
  convention can keep values alive longer than is necessary.  It's
  very easy for both sides of the convention to end up doing extra
  work because of this.

* The caller may assert responsibility for the value.  In SIL, we call
  this a **guaranteed** parameter.  The callee can rely on the value
  staying valid for the duration of the call.

  This is optimal if the caller needs to use the value after the call
  and either has responsibility for it or has a guarantee like this
  for it.  Therefore, this convention is particularly nice when a
  value is likely to be forwarded by value a great deal.

  However, this convention does generally keep values alive longer
  than is necessary, since the outermost function which passed it as
  an argument will generally be forced to hold a reference for the
  duration.  By the same mechanism, in refcounted systems, this
  convention tends to cause values to have multiple retains active at
  once; for example, if a copy-on-write array is created in one
  function, passed to another, stored in a mutable variable, and then
  modified, the callee will see a reference count of 2 and be forced
  to do a structural copy.  This can occur even if the caller
  literally constructed the array for the sole and immediate purpose
  of passing it to the callee.

Analysis
~~~~~~~~

Objective-C generally uses the unowned convention for object-pointer
parameters.  It is possible to mark a parameter as being consumed,
which is basically the owned convention.  As a special case, in ARC we
assume that callers are responsible for keeping `self` values alive
(including in blocks), which is effectively the `guaranteed`
convention.

`unowned` causes a lot of problems without really solving any, in my
experience looking at ARC-generated code and optimizer output.  A
human can take advantage of it, but the compiler is so frequently
blocked.  There are many common idioms (like chains of functions that
just add default arguments at each step) have really awful performance
because the compiler is adding retains and releases at every single
level.  It's just not a good convention to adopt by default.  However,
we might want to consider allowing specific function parameters to opt
into it; sort comparators are a particularly interesting candidate
for this.  `unowned` is very similar to C++'s `const &` for things
like that.

`guaranteed` is good for some things, but it causes a lot of silly
code bloat when values are really only used in one place, which is
quite common.  The liveness / refcounting issues are also pretty
problematic.  But there is one example that's very nice for
`guaranteed`: `self`.  It's quite common for clients of a type to call
multiple methods on a single value, or for methods to dispatch to
multiple other methods, which are exactly the situations where
`guaranteed` excels.  And it's relatively uncommon (but not
unimaginable) for a non-mutating method on a copy-on-write struct to
suddenly store `self` aside and start mutating that copy.

`owned` is a good default for other parameters.  It has some minor
performance disadvantages (unnecessary retains if you have an
unoptimizable call in a loop) and some minor code size benefits (in
common straight-line code), but frankly, both of those points pale in
importance to the ability to transfer copy-on-write structures around
without spuriously increasing reference counts.  It doesn't take too
many unnecessary structural copies before any amount of
reference-counting traffic (especially the Swift-native
reference-counting used in copy-on-write structures) is basically
irrelevant in comparison.

Result values
-------------

There's no major semantic split in result conventions like that
between pass-by-reference and pass-by-value.  In most languages, a
function has to return a value (or nothing).  There are languages like
C++ where functions can return references, but that's inherently
limited, because the reference has to refer to something that exists
outside the function.  If Swift ever adds a similar language
mechanism, it'll have to be memory-safe and extremely opaque, and
it'll be easy to just think of that as a kind of weird value result.
So we'll just consider value results here.

Value results raise some of the same ownership-transfer questions as
value arguments.  There's one major limitation: just like a
by-reference result, an actual `unowned` convention is inherently
limited, because something else other than the result value must be
keeping it valid.  So that's off the table for Swift.

What Objective-C does is something more dynamic.  Most APIs in
Objective-C give you a very ephemeral guarantee about the validity of
the result: it's valid now, but you shouldn't count on it being valid
indefinitely later.  This might be because the result is actually
owned by some other object somewhere, or it might be because the
result has been placed in the autorelease pool, a thread-local data
structure which will (when explicitly drained by something up the call
chain) eventually release that's been put into it.  This autorelease
pool can be a major source of spurious memory growth, and in classic
manual reference-counting it was important to drain it fairly
frequently.  ARC's response to this convention was to add an
optimization which attempts to prevent things from ending up in the
autorelease pool; the net effect of this optimization is that ARC ends
up with an owned reference regardless of whether the value was
autoreleased.  So in effect, from ARC's perspective, these APIs still
return an owned reference, mediated through some extra runtime calls
to undo the damage of the convention.

So there's really no compelling alternative to an owned return
convention as the default in Swift.

Physical conventions
====================

The lowest abstraction level for a calling convention is the actual
"physical" rules for the call:

* where the caller should place argument values in registers and
  memory before the call,

* how the callee should pass back the return values in registers
  and/or memory after the call, and

* what invariants hold about registers and memory over the call.

In theory, all of these could be changed in the Swift ABI.  In
practice, it's best to avoid changes to the invariant rules, because
those rules could complicate Swift-to-C interoperation:

* Assuming a higher stack alignment would require dynamic realignment
  whenever Swift code is called from C.

* Assuming a different set of callee-saved registers would require
  additional saves and restores when either Swift code calls C or is
  called from C, depending on the exact change.  That would then
  inhibit some kinds of tail call.

So we will limit ourselves to considering the rules for allocating
parameters and results to registers.  Our platform C ABIs are usually
quite good at this, and it's fair to ask why Swift shouldn't just use
C's rules.  There are three general answers:

* Platform C ABIs are specified in terms of the C type system, and the
  Swift type system allows things to be expressed which don't have
  direct analogues in C (for example, enums with payloads).

* The layout of structures in Swift does not necessarily match their
  layout in C, which means that the C rules don't necessarily cover
  all the cases in Swift.

* Swift places a larger emphasis on first-class structs than C does.
  C ABIs often fail to allocate even small structs to registers, or
  use inefficient registers for them, and we would like to be somewhat
  more aggressive than that.

Accordingly, the Swift ABI is defined largely in terms of lowering: a
Swift function signature is translated to a C function signature with
all the aggregate arguments and results eliminated (possibly by
deciding to pass them indirectly).  This lowering will be described in
detail in the final section of this whitepaper.

However, there are some specific circumstances where we'd like to
deviate from the platform ABI:

Aggregate results
-----------------

As mentioned above, Swift puts a lot of focus on first-class value
types.  As part of this, it's very valuable to be able to return
common value types fully in registers instead of indirectly.  The
magic number here is three: it's very common for copy-on-write value
types to want about three pointers' worth of data, because that's just
enough for some sort of owner pointer plus a begin/end pair.

Unfortunately, many common C ABIs fall slightly short of that.  Even
those ABIs that do allow small structs to be returned in registers
tend to only allow two pointers' worth.  So in general, Swift would
benefit from a very slightly-tweaked calling convention that allocates
one or two more registers to the result.

Implicit parameters
-------------------

There are several language features in Swift which require implicit
parameters:

Closures
~~~~~~~~

Swift's function types are "thick" by default, meaning that a function
value carries an optional context object which is implicitly passed to
the function when it is called.  This context object is
reference-counted, and it should be passed `guaranteed` for
straightforward reasons:

* It's not uncommon for closures to be called many times, in which
  case an `owned` convention would be unnecessarily expensive.

* While it's easy to imagine a closure which would want to take
  responsibility for its captured values, giving it responsibility for
  a retain of the context object doesn't generally allow that.  The
  closure would only be able to take ownership of the captured values
  if it had responsibility for a *unique* reference to the context.
  So the closure would have to be written to do different things based
  on the uniqueness of the reference, and it would have to be able to
  tear down and deallocate the context object after stealing values
  from it.  The optimization just isn't worth it.

* It's usually straightforward for the caller to guarantee the
  validity of the context reference; worst case, a single extra
  Swift-native retain/release is pretty cheap.  Meanwhile, not having
  that guarantee would force many closure functions to retain their
  contexts, since many closures do multiple things with values from
  the context object.  So `unowned` would not be a good convention.

Many functions don't actually need a context, however; they are
naturally "thin".  It would be best if it were possible to construct a
thick function directly from a thin function without having to
introduce a thunk just to move parameters around the missing context
parameter.  In the worst case, a thunk would actually require the
allocation of a context object just to store the original function
pointer; but that's only necessary when converting from a completely
opaque function value.  When the source function is known statically,
which is far more likely, the thunk can just be a global function
which immediately calls the target with the correctly shuffled
arguments.  Still, it'd be better to be able to avoid creating such
thunks entirely.

In order to reliably avoid creating thunks, it must be possible for
code invoking an opaque thick function to pass the context pointer in
a way that can be safely and implicitly ignored if the function
happens to actually be thin.  There are two ways to achieve this:

* The context can be passed as the final parameter.  In most C calling
  conventions, extra arguments can be safely ignored; this is because
  most C calling conventions support variadic arguments, and such
  conventions inherently can't rely on the callee knowing the extent
  of the arguments.

  However, this is sub-optimal because the context is often used
  repeatedly in a closure, especially at the beginning, and putting it
  at the end of the argument list makes it more likely to be passed on
  the stack.

* The context can be passed in a register outside of the normal
  argument sequence.  Some ABIs actually even reserve a register for
  this purpose; for example, on x86-64 it's `%r10`.  Neither of the
  ARM ABIs do, however.

Having an out-of-band register would be the best solution.

(Surprisingly, the ownership transfer convention for the context
doesn't actually matter here.  You might think that an `owned`
convention would be prohibited, since the callee would fail to release
the context and would therefore leak it.  However, a thin function
should always have a `nil` context, so this would be harmless.)

Either solution works acceptably with curried partial application,
since the inner parameters can be left in place while transforming the
context into the outer parameters.  However, an `owned` convention
would either prevent the uncurrying forwarder from tail-calling the
main function or force all the arguments to be spilled.  Neither is
really acceptable; one more argument against an `owned` convention.
(This is another example where `guaranteed` works quite nicely, since
the guarantees are straightforward to extend to the main function.)

`self`
~~~~~~

Methods (both static and instance) require a `self` parameter.  In all
of these cases, it's reasonable to expect that `self` will used
frequently, so it's best to pass it in a register.  Also, many methods
call other methods on the same object, so it's also best if the
register storing `self` is stable across different method signatures.

In static methods on value types, `self` doesn't require any dynamic
information: there's only one value of the metatype, and there's
usually no point in passing it.

In static methods on class types, `self` is a reference to the class
metadata, a single pointer.  This is necessary because it could
actually be the class object of a subclass.

In instance methods on class types, `self` is a reference to the
instance, again a single pointer.

In mutating instance methods on value types, `self` is the address of
an object.

In non-mutating instance methods on value types, `self` is a value; it
may require multiple registers, or none, or it may need to be passed
indirectly.

All of these cases except mutating instance methods on value types can
be partially applied to create a function closure whose type is the
formal type of the method.  That is, if class `A` has a method
declared `func foo(_ x: Int) -> Double`, then `A.foo` yields a function
of type `(Int) -> Double`.  Assuming that we continue to feel that
this is a useful language feature, it's worth considered how we could
support it efficiently.  The expenses associated with a partial
application are (1) the allocation of a context object and (2) needing
to introduce a thunk to forward to the original function.  All else
aside, we can avoid the allocation if the representation of `self` is
compatible with the representation of a context object reference; this
is essentially true only if `self` is a class instance using Swift
reference counting.  Avoiding the thunk is possible only if we
successfully avoided the allocation (since otherwise a thunk is
required in order to extract the correct `self` value from the
allocated context object) and `self` is passed in exactly the same
manner as a closure context would be.

It's unclear whether making this more efficient would really be
worthwhile on its own, but if we do support an out-of-band context
parameter, taking advantage of it for methods is essentially trivial.

Error handling
--------------

The calling convention implications of Swift's error handling design
aren't yet settled.  It may involve extra parameters; it may involve
extra return values.  Considerations:

* Callers will generally need to immediately check for an error.
  Being able to quickly check a register would be extremely
  convenient.

* If the error is returned as a component of the result value, it
  shouldn't be physically combined with the normal result.  If the
  normal result is returned in registers, it would be unfortunate to
  have to do complicated logic to test for error.  If the normal
  result is returned indirectly, contorting the indirect result with
  the error would likely prevent the caller from evaluating the call
  in-place.

* It would be very convenient to be able to trivially turn a function
  which can't produce an error into a function which can.  This is an
  operation that we expect higher-order code to have do frequently, if
  it isn't completely inlined away.  For example::

    // foo() expects its argument to follow the conventions of a
    // function that's capable of throwing.
    func foo(_ fn: () throws -> ()) throwsIf(fn)

    // Here we're passing foo() a function that can't throw; this is
    // allowed by the subtyping rules of the language.  We'd like to be
    // able to do this without having to introduce a thunk that maps
    // between the conventions.
    func bar(_ fn: () -> ()) {
      foo(fn)
    }

We'll consider two ways to satisfy this.

The first is to pass a pointer argument that doesn't interfere with
the normal argument sequence.  The caller would initialize the memory
to a zero value.  If the callee is a throwing function, it would be
expected to write the error value into this argument; otherwise, it
would naturally ignore it.  Of course, the caller then has to load
from memory to see whether there's an error.  This would also either
consume yet another register not in the normal argument sequence or
have to be placed at the end of the argument list, making it more
likely to be passed on the stack.

The second is basically the same idea, but using a register that's
otherwise callee-save.  The caller would initialize the register to a
zero value.  A throwing function would write the error into it; a
non-throwing function would consider it callee-save and naturally
preserve it.  It would then be extremely easy to check it for an
error.  Of course, this would take away a callee-save register in the
caller when calling throwing functions.  Also, if the caller itself
isn't throwing, it would have to save and restore that register.

Both solutions would allow tail calls, and the zero store could be
eliminated for direct calls to known functions that can throw.  The
second is the clearly superior solution, but definitely requires more
work in the backend.

Default argument generators
---------------------------

By default, Swift is resilient about default arguments and treats them
as essentially one part of the implementation of the function.  This
means that, in general, a caller using a default argument must call a
function to emit the argument, instead of simply inlining that
emission directly into the call.

These default argument generation functions are unlike any other
because they have very precise information about how their result will
be used: it will be placed into a specific position in specific
argument list.  The only reason the caller would ever want to do
anything else with the result is if it needs to spill the value before
emitting the call.

Therefore, in principle, it would be really nice if it were possible
to tell these functions to return in a very specific way, e.g. to
return two values in the second and third argument registers, or to
return a value at a specific location relative to the stack pointer
(although this might be excessively constraining; it would be
reasonable to simply opt into an indirect return instead).  The
function should also preserve earlier argument registers (although
this could be tricky if the default argument generator is in a generic
context and therefore needs to be passed type-argument information).

This enhancement is very easy to postpone because it doesn't affect
any basic language mechanics.  The generators are always called
directly, and they're inherently attached to a declaration, so it's
quite easy to take any particular generator and compatibly enhance it
with a better convention.

ARM32
-----

Most of the platforms we support have pretty good C calling
conventions.  The exceptions are i386 (for the iOS simulator) and
ARM32 (for iOS).  We really, really don't care about i386, but iOS on
ARM32 is still an important platform.  Switching to a better physical
calling convention (only for calls from Swift to Swift, of course)
would be a major improvement.

It would be great if this were as simple as flipping a switch, but
unfortunately the obvious convention to switch to (AAPCS-VFP) has a
slightly different set of callee-save registers: iOS treats `r9` as a
scratch register.  So we'd really want a variant of AAPCS-VFP that did
the same.  We'd also need to make sure that SJ/LJ exceptions weren't
disturbed by this calling convention; we aren't really *supporting*
exception propagation through Swift frames, but completely breaking
propagation would be unfortunate, and we may need to be able to
*catch* exceptions.

So this would also require some amount of additional support from the
backend.

Function signature lowering
===========================

Function signatures in Swift are lowered in two phases.

Semantic lowering
-----------------

The first phase is a high-level semantic lowering, which does a number
of things:

* It determines a high-level calling convention: specifically, whether
  the function must match the C calling convention or the Swift
  calling convention.

* It decides the types of the parameters:

  * Functions exported for the purposes of C or Objective-C may need
    to use bridged types rather than Swift's native types.  For
    example, a function that formally returns Swift's `String` type
    may be bridged to return an `NSString` reference instead.

  * Functions which are values, not simply immediately called, may
    need their types lowered to follow to match a specific generic
    abstraction pattern.  This applies to functions that are
    parameters or results of the outer function signature.

* It identifies specific arguments and results which *must* be passed
  indirectly:

  * Some types are inherently address-only:

    * The address of a weak reference must be registered with the
      runtime at all times; therefore, any `struct` with a weak field
      must always be passed indirectly.

    * An existential type (if not class-bounded) may contain an
      inherently address-only value, or its layout may be sensitive to
      its current address.

    * A value type containing an inherently address-only type as a
      field or case payload becomes itself inherently address-only.

  * Some types must be treated as address-only because their layout is
    not known statically:

    * The layout of a resilient value type may change in a later
      release; the type may even become inherently address-only by
      adding a weak reference.

    * In a generic context, the layout of a type may be dependent on a
      type parameter.  The type parameter might even be inherently
      address-only at runtime.

    * A value type containing a type whose layout isn't known
      statically itself generally will not have a layout that can be
      known statically.

  * Other types must be passed or returned indirectly because the
    function type uses an abstraction pattern that requires it.  For
    example, a generic `map` function expects a function that takes a
    `T` and returns a `U`; the generic implementation of `map` will
    expect these values to be passed indirectly because their layout
    isn't statically known.  Therefore, the signature of a function
    intended to be passed as this argument must pass them indirectly,
    even if they are actually known statically to be non-address-only
    types like (e.g.) `Int` and `Float`.

* It expands tuples in the parameter and result types.  This is done
  at this level both because it is affected by abstraction patterns
  and because different tuple elements may use different ownership
  conventions.  (This is most likely for imported APIs, where it's the
  tuple elements that correspond to specific C or Objective-C parameters.)

  This completely eliminates top-level tuple types from the function
  signature except when they are a target of abstraction and thus are
  passed indirectly.  (A function with type `(Float, Int) -> Float`
  can be abstracted as `(T) -> U`, where `T == (Float, Int)`.)

* It determines ownership conventions for all parameters and results.

After this phase, a function type consists of an abstract calling
convention, a list of parameters, and a list of results.  A parameter
is a type, a flag for indirectness, and an ownership convention.  A
result is a type, a flag for indirectness, and an ownership
convention.  (Results need ownership conventions only for non-Swift
calling conventions.)  Types will not be tuples unless they are
indirect.

Semantic lowering may also need to mark certain parameters and results
as special, for the purposes of the special-case physical treatments
of `self`, closure contexts, and error results.

Physical lowering
-----------------

The second phase of lowering translates a function type produced by
semantic lowering into a C function signature.  If the function
involves a parameter or result with special physical treatment,
physical lowering initially ignores this value, then adds in the
special treatment as agreed upon with the backend.

General expansion algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Central to the operation of the physical-lowering algorithm is the
**generic expansion algorithm**.  This algorithm turns any
non-address-only Swift type in a sequence of zero or more **legal
type**, where a legal type is either:

* an integer type, with a power-of-two size no larger than the maximum
  integer size supported by C on the target,

* a floating-point type supported by the target, or

* a vector type supported by the target.

Obviously, this is target-specific.  The target also specifies a
maximum voluntary integer size.  The legal type sequence only contains
vector types or integer types larger than the maximum voluntary size
when the type was explicit in the input.

Pointers are represented as integers in the legal type sequence.  We
assume there's never a reason to differentiate them in the ABI as long
as the effect of address spaces on pointer size is taken into account.
If that's not true, this algorithm should be adjusted.

The result of the algorithm also associates each legal type with an
offset.  This information is sufficient to reconstruct an object in
memory from a series of values and vice-versa.

The algorithm proceeds in two steps.

Typed layouts
^^^^^^^^^^^^^

First, the type is recursively analyzed to produce a **typed layout**.
A typed layout associates ranges of bytes with either (1) a legal type
(whose storage size must match the size of the associated byte
range), (2) the special type **opaque**, or (3) the special type
**empty**.  Adjacent ranges mapped to **opaque** or **empty** can be
combined.

For most of the types in Swift, this process is obvious: they either
correspond to an obvious legal type (e.g. thick metatypes are
pointer-sized integers), or to an obvious sequence of scalars
(e.g. class existentials are a sequence of pointer-sized integers).
Only a few cases remain:

* Integer types that are not legal types should be mapped as opaque.

* Vector types that are not legal types should be broken into smaller
  vectors, if their size is an even multiple of a legal vector type,
  or else broken into their components.  (This rule may need some
  tinkering.)

* Tuples and structs are mapped by merging the typed layouts of the
  fields, as padded out to the extents of the aggregate with
  empty-mapped ranges.  Note that, if fields do not overlap, this is
  equivalent to concatenating the typed layouts of the fields, in
  address order, mapping internal padding to empty.  Bit-fields should
  map the bits they occupy to opaque.

  For example, given the following struct type::

    struct FlaggedPair {
      var flag: Bool
      var pair: (MyClass, Float)
    }

  If Swift performs naive, C-like layout of this structure, and this
  is a 64-bit platform, typed layout is mapped as follows::

    FlaggedPair.flag := [0: i1,                        ]
    FlaggedPair.pair := [       8-15: i64, 16-19: float]
    FlaggedPair      := [0: i1, 8-15: i64, 16-19: float]

  If Swift instead allocates `flag` into the spare (little-endian) low
  bits of `pair.0`, the typed layout map would be::

    FlaggedPair.flag := [0: i1                   ]
    FlaggedPair.pair := [0-7: i64,    8-11: float]
    FlaggedPair      := [0-7: opaque, 8-11: float]

* Unions (imported from C) are mapped by merging the typed layouts of
  the fields, as padded out to the extents of the aggregate with
  empty-mapped ranges.  This will often result in a fully-opaque
  mapping.

* Enums are mapped by merging the typed layouts of the cases, as
  padded out to the extents of the aggregate with empty-mapped ranges.
  A case's typed layout consists of the typed layout of the case's
  directly-stored payload (if any), merged with the typed layout for
  its discriminator.  We assume that checking for a discriminator
  involves a series of comparisons of bits extracted from
  non-overlapping ranges of the value; the typed layout of a
  discriminator maps all these bits to opaque and the rest to empty.

  For example, given the following enum type::

    enum Sum {
      case Yes(MyClass)
      case No(Float)
      case Maybe
    }

  If Swift, in its infinite wisdom, decided to lay this out
  sequentially, and to use invalid pointer values the class to
  indicate that the other cases are present, the layout would look as
  follows::

    Sum.Yes.payload        := [0-7: i64                ]
    Sum.Yes.discriminator  := [0-7: opaque             ]
    Sum.Yes                := [0-7: opaque             ]
    Sum.No.payload         := [             8-11: float]
    Sum.No.discriminator   := [0-7: opaque             ]
    Sum.No                 := [0-7: opaque, 8-11: float]
    Sum.Maybe              := [0-7: opaque             ]
    Sum                    := [0-7: opaque, 8-11: float]

  If Swift instead chose to just use a discriminator byte, the layout
  would look as follows::

    Sum.Yes.payload        := [0-7: i64             ]
    Sum.Yes.discriminator  := [            8: opaque]
    Sum.Yes                := [0-7: i64,   8: opaque]
    Sum.No.payload         := [0-3: float           ]
    Sum.No.discriminator   := [            8: opaque]
    Sum.No                 := [0-3: float, 8: opaque]
    Sum.Maybe              := [            8: opaque]
    Sum                    := [0-8: opaque          ]

  If Swift chose to use spare low (little-endian) bits in the class
  pointer, and to offset the float to make this possible, the layout
  would look as follows::

    Sum.Yes.payload        := [0-7: i64             ]
    Sum.Yes.discriminator  := [0: opaque            ]
    Sum.Yes                := [0-7: opaque          ]
    Sum.No.payload         := [           4-7: float]
    Sum.No.discriminator   := [0: opaque            ]
    Sum.No                 := [0: opaque, 4-7: float]
    Sum.Maybe              := [0: opaque            ]
    Sum                    := [0-7: opaque          ]

The merge algorithm for typed layouts is as follows.  Consider two
typed layouts `L` and `R`.  A range from `L` is said to *conflict*
with a range from `R` if they intersect and they are mapped as
different non-empty types.  If two ranges conflict, and either range
is mapped to a vector, replace it with mapped ranges for the vector
elements.  If two ranges conflict, and neither range is mapped to a
vector, map them both to opaque, combining them with adjacent opaque
ranges as necessary.  If a range is mapped to a non-empty type, and
the bytes in the range are all mapped as empty in the other map, add
that range-mapping to the other map.  `L` and `R` should now match
perfectly; this is the result of the merge.  Note that this algorithm
is both associative and commutative.

Forming a legal type sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the typed layout is constructed, it can be turned into a legal
type sequence.

Note that this transformation is sensitive to the offsets of ranges in
the complete type.  It's possible that the simplifications described
here could be integrated directly into the construction of the typed
layout without changing the results, but that's not yet proven.

In all of these examples, the maximum voluntary integer size is 4
(`i32`) unless otherwise specified.

If any range is mapped as a non-empty, non-opaque type, but its start
offset is not a multiple of its natural alignment, remap it as opaque.
For these purposes, the natural alignment of an integer type is the
minimum of its size and the maximum voluntary integer size; the
natural alignment of any other type is its C ABI type.  Combine
adjacent opaque ranges.

For example::

  [1-2: i16, 4: i8, 6-7: i16]  ==>  [1-2: opaque, 4: i8, 6-7: i16]

If any range is mapped as an integer type that is not larger than the
maximum voluntary size, remap it as opaque.  Combine adjacent opaque
ranges.

For example::

  [1-2: opaque, 4: i8, 6-7: i16]  ==>  [1-2: opaque, 4: opaque, 6-7: opaque]
  [0-3: i32, 4-11: i64, 12-13: i16]  ==>  [0-3: opaque, 4-11: i64, 12-13: opaque]

An *aligned storage unit* is an N-byte-aligned range of N bytes, where
N is a power of 2 no greater than the maximum voluntary integer size.
A *maximal* aligned storage unit has a size equal to the maximum
voluntary integer size.

Note that any remaining ranges mapped as integers must fully occupy
multiple maximal aligned storage units.

Split all opaque ranges at the boundaries of maximal aligned storage
units.  From this point on, never combine adjacent opaque ranges
across these boundaries.

For example::

  [1-6: opaque]  ==> [1-3: opaque, 4-6: opaque]

Within each maximal aligned storage unit, find the smallest aligned
storage unit which contains all the opaque ranges.  Replace the first
opaque range in the maximal aligned storage unit with a mapping from
that aligned storage unit to an integer of the aligned storage unit's
size.  Remove any other opaque ranges in the maximal aligned storage
unit.  Note that this can create overlapping ranges in some cases.
For the purposes of this calculation, the last maximal aligned
storage unit should be considered "full", as if the type had an
infinite amount of empty tail-padding.

For example::

  [1-2: opaque]  ==>  [0-3: i32]
  [0-1: opaque]  ==>  [0-1: i16]
  [0: opaque, 2: opaque]  ==>  [0-3: i32]
  [0-9: fp80, 10: opaque]  ==>  [0-9: fp80, 10: i8]

  // If maximum voluntary size is 8 (i64):
  [0-9: fp80, 11: opaque, 13: opaque]  ==>  [0-9: fp80, 8-15: i64]

(This assumes that `fp80` is a legal type for illustrative purposes.
It would probably be a better policy for the actual x86-64 target to
consider it illegal and treat it as opaque from the start, at least
when lowering for the Swift calling convention; for C, it is important
to produce an `fp80` mapping for ABI interoperation with C functions
that take or return `long double` by value.)

The final legal type sequence is the sequence of types for the
non-empty ranges in the map.  The associated offset for each type is
the offset of the start of the corresponding range.

Only the final step can introduce overlapping ranges, and this is only
possible if there's a non-integer legal type which:

* has a natural alignment less than half of the size of the maximum
  voluntary integer size or

* has a store size is not a multiple of half the size of the maximum
  voluntary integer size.

On our supported platforms, these conditions are only true on x86-64,
and only of `long double`.

Deconstruction and Reconstruction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given the address of an object and a legal type sequence for its type,
it's straightforward to load a valid sequence or store the sequence
back into memory.  For the most part, it's sufficient to simply load
or store each value at its appropriate offset.  There are two
subtleties:

* If the legal type sequence had any overlapping ranges, the integer
  values should be stored first to prevent overwriting parts of the
  other values they overlap.

* Care must be taken with the final values in the sequence; integer
  values may extend slightly beyond the ordinary storage size of the
  argument type.  This is usually easy to compensate for.

The value sequence essentially has the same semantics that the value
in memory would have: any bits that aren't part of the actual
representation of the original type have a completely unspecified
value.

Forming a C function signature
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As mentioned before, in principle the process of physical lowering
turns a semantically-lowered Swift function type (in implementation
terms, a SILFunctionType) into a C function signature, which can then
be lowered according to the usual rules for the ABI.  This is, in
fact, what we do when trying to match a C calling convention.
However, for the native Swift calling convention, because we actively
want to use more aggressive rules for results, we instead build an
LLVM function type directly.  We first construct a direct result type
that we're certain the backend knows how to interpret according to our
more aggressive desired rules, and then we use the expansion algorithm
to construct a parameter sequence consisting solely of types with
obvious ABI lowering that the backend can reliably handle.  This
bypasses the need to consult Clang for our own native calling
convention.

We have this generic expansion algorithm, but it's important to
understand that the physical lowering process does not just naively
use the results of this algorithm.  The expansion algorithm will
happily expand an arbitrary structure; if that structure is very
large, the algorithm might turn it into hundreds of values.  It would
be foolish to pass it as an argument that way; it would use up all the
argument registers and basically turn into a very inefficient memcpy,
and if the caller wanted it all in one place, they'd have to very
painstakingly reassemble.  It's much better to pass large structures
indirectly.  And with result values, we really just don't have a
choice; there's only so many registers you can use before you have to
give up and return indirectly.  Therefore, even in the Swift native
convention, the expansion algorithm is basically used as a first pass.
A second pass then decides whether the expanded sequence is actually
reasonable to pass directly.

Recall that one aspect of the semantically-lowered Swift function type
is whether we should be matching the C calling convention or not.  The
following algorithm here assumes that the importer and semantic
lowering have conspired in a very particular way to make that
possible.  Specifically, we assume is that an imported C function
type, lowered semantically by Swift, will follow some simple
structural rules:

* If there was a by-value `struct` or `union` parameter or result in
  the imported C type, it will correspond to a by-value direct
  parameter or return type in Swift, and the Swift type will be a
  nominal type whose declaration links back to the original C
  declaration.

* Any other parameter or result will be transformed by the importer
  and semantic lowering to a type that the generic expansion algorithm
  will expand to a single legal type whose representation is
  ABI-compatible with the original parameter.  For example, an
  imported pointer type will eventually expand to an integer of
  pointer size.

* There will be at most one result in the lowered Swift type, and it
  will be direct.

Given this, we go about lowering the function type as follows.  Recall
that, when matching the C calling convention, we're building a C
function type; but that when matching the Swift native calling
convention, we're building an LLVM function type directly.

Results
^^^^^^^

The first step is to consider the results of the function.

There's a different set of rules here when we're matching the C
calling convention.  If there's a single direct result type, and it's
a nominal type imported from Clang, then the result type of the C
function type is that imported Clang type.  Otherwise, concatenate the
legal type sequences from the direct results.  If this yields an empty
sequence, the result type is `void`.  If it yields a single legal
type, the result type is the corresponding Clang type.  No other could
actually have come from an imported C declaration, so we don't have
any real compatibility requirements; for the convenience of
interoperation, this is handled by constructing a new C struct which
contains the corresponding Clang types for the legal type sequence as
its fields.

Otherwise, we are matching the Swift calling convention.  Concatenate
the legal type sequences from all the direct results.  If
target-specific logic decides that this is an acceptable collection to
return directly, construct the appropriate IR result type to convince
the backend to handle it.  Otherwise, use the `void` IR result type
and return the "direct" results indirectly by passing the address of a
tuple combining the original direct results (*not* the types from the
legal type sequence).

Finally, any indirect results from the semantically-lowered function
type are simply added as pointer parameters.

Parameters
^^^^^^^^^^

After all the results are collected, it's time to collect the
parameters.  This is done one at the time, from left to right, adding
parameters to our physically-lowered type.

If semantic lowering has decided that we have to pass the parameter
indirectly, we simply add a pointer to the type.  This covers both
mandatory-indirect pass-by-value parameters and pass-by-reference
parameters.  The latter can arise even in C and Objective-C.

Otherwise, the rules are somewhat different if we're matching the C
calling convention.  If the parameter is a nominal type imported from
Clang, then we just add the imported Clang type to the Clang function
type as a parameter.  Otherwise, we derive the legal type sequence for
the parameter type.  Again, we should only have compatibility
requirements if the legal type sequence has a single element, but for
the convenience of interoperation, we collect the corresponding Clang
types for all of the elements of the sequence.

Finally, if we're matching the Swift calling convention, derive the
legal type sequence.  If the result appears to be a reasonably small
and efficient set of parameters, add their corresponding IR types to
the function type we're building; otherwise, ignore the legal type
sequence and pass the address of the original type indirectly.

Considerations for whether a legal type sequence is reasonable to pass
directly:

* There probably ought to be a maximum size.  Unless it's a single
  256-bit vector, it's hard to imagine wanting to pass more than, say,
  32 bytes of data as individual values.  The callee may decide that
  it needs to reconstruct the value for some reason, and the larger
  the type gets, the more expensive this is.  It may also be
  reasonable for this cap to be lower on 32-bit targets, but that
  might be dealt with better by the next restriction.

* There should also be a cap on the number of values.  A 32-byte limit
  might be reasonable for passing 4 doubles.  It's probably not
  reasonable for passing 8 pointers.  That many values will exhaust
  all the parameter registers for just a single value.  4 is probably
  a reasonable cap here.

* There's no reason to require the data to be homogeneous.  If a
  struct contains three floats and a pointer, why force it to be
  passed in memory?

When all of the parameters have been processed in this manner,
the function type is complete.