:orphan:

Optimizing Accessors in Swift
=============================

Definitions
-----------

An abstract storage declaration is a language construct that declares
a means of accessing some sort of abstract entity. I'll just say
"storage" hereafter.

Swift provides three storage declarations:

* a single named entity, called a *variable* and declared with ``var``
* a single named entity which can never be reassigned, called a *constant* and declared with ``let``
* a compound unnamed entity accessed with an index, called a *subscript* and declared with ``subscript``

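As a concrete illustration (the type and member names here are invented
for this example), all three kinds of declaration can appear in a single
type::

  struct Temperature {
    var celsius: Double = 0      // variable: readable and reassignable
    let unitName = "Celsius"     // constant: can never be reassigned

    // subscript: a compound unnamed entity accessed with an index
    subscript(offsetIndex: Int) -> Double {
      return celsius + Double(offsetIndex)
    }
  }
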
These features are similar to those in other languages. Swift notably
lacks compound named entities, such as C#'s indexed properties; the
design team intentionally chose to favor the use of named single
entities of subscriptable type.

It's useful to lump the two kinds of single named entities together;
I'll just call them both "variables" hereafter.

Subscripts must always be instance type members. When a variable is
a type member, it's called a *property*.

When I say that these entities are *abstract*, I mean that they're not
directly tied to any particular implementation. All of them may be
backed directly by memory storing values of the entity's type, or they
may simply provide a way of invoking arbitrary code to access a
logical memory, or they may be a little of both.

Full-value accesses
-------------------

All accesses to storage, no matter the implementation, can be performed
with two primitive operations:

* a full-value load, which creates a new copy of the current value
* a full-value store, which overwrites the current value with a
  different, independent value

A function which implements a full-value load is called a *getter*;
a full-value store, a *setter*.

An operation which calls for a full-value load into a temporary, then
a modification of the temporary, then a full-value store of the
temporary into the original entity, is called a *write-back*.

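To make the terminology concrete, here is a write-back written out by
hand against a hypothetical ``Rect`` type whose ``origin`` property is
implemented with a getter and setter (all names here are invented for
this example)::

  struct Point { var x = 0.0, y = 0.0 }

  struct Rect {
    var storage = [0.0, 0.0]
    var origin: Point {            // computed: only full-value accesses
      get { return Point(x: storage[0], y: storage[1]) }
      set { storage = [newValue.x, newValue.y] }
    }
  }

  var rect = Rect()
  // "rect.origin.x += 10" must be performed as a write-back:
  var temporary = rect.origin      // full-value load via the getter
  temporary.x += 10                // modify the temporary
  rect.origin = temporary          // full-value store via the setter
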
Implementing accesses with full-value accesses introduces two
problems: subobject clobbering and performance.

Subobject clobbering
~~~~~~~~~~~~~~~~~~~~

Subobject clobbering is a semantic issue. It occurs when there are
two changes to an entity in flight at once, as in::

  swap(&point.x, &point.y)

(By "at once", I mean synchronously. Unlike Java, Swift is willing to
say that an *asynchronous* simultaneous access (mutating or not) to an
entity that's being modified is completely undefined behavior, with
any sort of failure permitted, up to and including memory corruption
and crashes. As Swift transitions towards being a systems language,
it can selectively choose to define some of this behavior,
e.g. allowing racing accesses to different elements of an array.)

Subobject clobbering is a problem with user expectation. Users are
unlikely to write code that intentionally modifies two obviously
overlapping entities, but they might very reasonably write code that
modifies two "separate" sub-entities. For example, they might write
code to swap two properties of a struct, or two elements of an array.
The natural expansion of these subobject accesses using whole-object
accesses generates code that "loses" the changes made to one of the
objects::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1.y = y
  point = point1
  point0.x = x
  point = point0

Note that ``point.y`` is left unchanged.

Local analysis
^^^^^^^^^^^^^^

There are two straightforward solutions to subobject clobbering, both
reliant on doing local analysis to recognize two accesses which
obviously alias (like the two references to ``point`` in the above
example). Once you've done this, you can either:

* Emit an error, outlawing such code. This is what Swift currently
  does (but only when the aliasing access must be implemented with
  full-value loads and stores).
* Use a single temporary, potentially changing the semantics.

It's impossible to guarantee the absence of subobject clobbering with
this analysis without extremely heavy-handed language changes.
Fortunately, subobject clobbering is "only" a semantics issue, not a
memory-safety issue, at least as long as it's implemented with
full-value accesses.

Reprojection
^^^^^^^^^^^^

A more general solution is to re-project the modified subobject from
scratch before writing it back. That is, you first acquire an initial
value like normal for the subobject you wish to modify, then modify
that temporary copy in-place. But instead of then recursively
consuming all the intermediate temporaries when writing them back, you
drop them all and recompute the current value, then write the modified
subobject to it, then write back all the way up. That is::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1 = point // reload point1
  point1.y = y
  point = point1
  point0 = point // reload point0
  point0.x = x
  point = point0

In this example, I've applied the solution consistently to all
accesses, which protects against unseen modifications (e.g. during the
call to ``swap``) at the cost of performing two extra full-value
loads.

You can heuristically lower this by combining it with a simple local
analysis and only re-projecting when writing back to other l-values
besides the last. In other words, generate code that will work as
long as the entity is not modified behind abstraction barriers or
through unexpected aliases::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1.y = y // do not reload point1
  point = point1
  point0 = point // reload point0
  point0.x = x
  point = point0

Note that, in either solution, you've introduced extra full-value
loads. This may be quite expensive, and it's not guaranteed to be
semantically equivalent.

Performance
~~~~~~~~~~~

There are three major reasons why full-value accesses are inefficient.

Unnecessary subobject accesses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first is that they may load or store more than is necessary.

As an obvious example, imagine a variable of type ``(Int, Int)``; even
if my code only accesses the first element of the tuple, full-value
accesses force me to read or write the second element as well. That
means that, even if I'm purely overwriting the first element, I
actually have to perform a full-value load first so that I know what
value to use for the second element when performing the full-value
store.

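A small sketch of that expansion (the accessor functions here are
invented for the example)::

  var storedPair = (0, 0)
  func loadPair() -> (Int, Int) { return storedPair }
  func storePair(_ newValue: (Int, Int)) { storedPair = newValue }

  var pair: (Int, Int) {
    get { return loadPair() }
    set { storePair(newValue) }
  }

  // Overwriting only pair.0 still expands to a pair of full-value accesses:
  var temporary = loadPair()   // reads the second element even though
  temporary.0 = 42             // it is about to be written back unchanged
  storePair(temporary)
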
Additionally, while unnecessarily loading the second element of an
``(Int, Int)`` pair might seem trivial, consider that the tuple could
actually have twenty elements, or that the second element might be
non-trivial to copy (e.g. if it's a retainable pointer).

Abstraction barriers
^^^^^^^^^^^^^^^^^^^^

A full-value load or store which you can completely reason about is one
thing, but if it has to be performed as a call, it can be a major
performance drag.

For one, calls do carry a significant amount of low-level overhead.

For another, optimizers must be extremely conservative about what a
call might do. A retainable pointer might have to be retained and
later released purely to protect against the possibility that a getter
might, somehow, cause the pointer to otherwise be deallocated.

Furthermore, the conventions of the call might restrict performance.
One way or another, a getter for a retainable pointer generally
returns at +1, meaning that as part of the return, it is retained,
forcing the caller to later release. If the access were instead
direct to memory, this retain might be avoidable, depending on what
the caller does with the pointer.

Copy-on-write
^^^^^^^^^^^^^

These problems are compounded by copy-on-write (COW) types. In Swift,
a copy-on-write value embeds an object reference. Copying the value
has low immediate cost, because it simply retains the existing
reference. However, modifying a value requires the reference to be
made unique, generally by copying the data held by the value into a
fresh object. I'll call this operation a *structural copy* in an
effort to avoid the more treacherous term "deep copy".

COW types are problematic with full-value accesses for several reasons.

First, COW types are often used to implement aggregates and thus often
have several distinguishable subobjects which users are likely to
think of as independent. This heightens the dangers of subobject
clobbering.

Second, a full-value load of a COW type implies making the object
reference non-unique. Changing the value at this point will force a
structural copy. This means that modifying a temporary copy has
dramatically worse performance compared to modifying the original
entity in-place. For example::

  window.name += " (closing)"

If ``&window.name`` can be passed directly to the operator, and the
string buffer is uniquely referenced by that string, then this
operation may be as cheap as copying a few characters into the tail of
the buffer. But if this must be done with a write-back, then the
temporary will never have a unique reference, and there will always
be an unneeded structural copy.

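For reference, the usual copy-on-write pattern looks roughly like the
following sketch; ``isKnownUniquelyReferenced`` is the current
standard-library spelling of the uniqueness check, and the buffer type
is invented for the example::

  final class Buffer {
    var elements: [Int] = []
  }

  struct COWValue {
    private var buffer = Buffer()

    mutating func append(_ element: Int) {
      // If anyone else holds the buffer, a structural copy is required
      // before mutating in place.
      if !isKnownUniquelyReferenced(&buffer) {
        let fresh = Buffer()
        fresh.elements = buffer.elements
        buffer = fresh
      }
      buffer.elements.append(element)
    }
  }
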
Conservative access patterns
----------------------------

When you know how storage is implemented, it's straightforward to
generate an optimal access to it. There are several major reasons why
you might not know how a storage declaration is implemented, though:

* It might be an abstract declaration, not a concrete declaration.
  Currently this means a protocol member, but Swift may someday add
  abstract class members.

* It might be a non-final class member, where the implementation you
  can see is potentially overridable by a subclass.

* It might be a resilient declaration, where you know only that the
  entity exists and know nothing statically about its implementation.

In all of these cases, you must generate code that will handle the
worst possible case, which is that the entity is implemented with a
getter and a setter. Therefore, the conservative access pattern
includes opaque getter and setter functions.

However, for all the reasons discussed above, using unnecessary
full-value accesses can be terrible for performance. It's really bad
if a little conservatism --- e.g. because Swift failed to devirtualize
a property access --- causes asymptotic inefficiencies. Therefore,
Swift's native conservative access pattern also includes a third
accessor which permits direct access to storage when possible. This
accessor is called ``materializeForSet``.

``materializeForSet`` receives an extra argument, which is an
uninitialized buffer of the value type, and it returns a pointer and a
flag. When it can provide direct access to storage for the entity, it
constructs a pointer to the storage and returns false. When it can't,
it performs a full-value load into the buffer and returns true. The
caller performs the modification in-place on the returned pointer and
then, if the flag is true, passes the value to the setter.

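That contract can be modeled in ordinary Swift with a hypothetical
stand-in; this only illustrates the calling convention for a trivial
``Int`` property, not the real entry point, which is generated at the
SIL level::

  // Fill `scratch` if necessary; return the address to mutate and
  // whether the caller must write back through the setter.
  typealias MaterializeForSet =
    (_ scratch: UnsafeMutablePointer<Int>) -> (UnsafeMutablePointer<Int>, Bool)

  func mutate(with materialize: MaterializeForSet,
              setter: (Int) -> Void,
              mutation: (UnsafeMutablePointer<Int>) -> Void) {
    let scratch = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    defer { scratch.deallocate() }
    let (address, mustWriteBack) = materialize(scratch)
    mutation(address)              // modify in place
    if mustWriteBack {             // storage was opaque: call the setter
      setter(address.pointee)
    }
  }
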
The overall effect is to enable direct storage access as a dynamic
optimization when it's impossible as a static optimization.

For now, ``materializeForSet`` is always automatically generated based
on whether the entity is implemented with a computed setter. It is
possible to imagine data structures that would benefit from having
this lifted to a user-definable feature; for example, a data structure
which sometimes holds its elements in memory but sometimes does not.

``materializeForSet`` can provide direct access whenever an address
for the storage can be derived. This includes when the storage is
implemented with a ``mutableAddress`` accessor, as covered below.
Observing accessors currently prevent ``materializeForSet`` from
offering direct access; that's fixable for ``didSet`` using a slightly
different code pattern, but ``willSet`` is an inherent obstacle.

Independent of any of the other optimizations discussed in this
whitepaper, ``materializeForSet`` had the potential to immediately
optimize the extremely important case of mutations to COW values in
un-devirtualized class properties, with fairly minimal risk.
Therefore, ``materializeForSet`` was implemented first, and it shipped
in Xcode 6.1.

Direct access at computed addresses
-----------------------------------

What entities can be directly accessed in memory? Non-computed
variables make up an extremely important set of cases; Swift has
enough built-in knowledge to know that it can provide direct access to
them. But there are a number of other important cases where the
address of an entity is not built-in to the compiler, but where direct
access is nonetheless possible. For example, elements of a simple
array always have independent storage in memory. Most benchmarks on
arrays would profit from being able to modify array elements in-place.

There's a long chain of proposals in this area, many of which are
refinements of previous proposals. None of these proposals has yet
shipped in Xcode.

Addressors
~~~~~~~~~~

For something like a simple array (or any similar structure, like a
deque) which is always backed by a buffer, it makes sense for the
implementor to simply define accessors which return the address of
the element. Such accessors are called *addressors*, and there are
two: ``address`` and ``mutableAddress``.

The conservative access pattern can be generated very easily from
this: the getter calls ``address`` and loads from it, the setter calls
``mutableAddress`` and stores to it, and ``materializeForSet``
provides direct access to the address returned from
``mutableAddress``.

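As an illustration, an addressor-based subscript on a simple
buffer-backed type might look like this; the surface syntax shown here
is only a sketch of the proposal, not a finalized spelling::

  struct SimpleBuffer<Element> {
    var storage: UnsafeMutablePointer<Element>

    subscript(index: Int) -> Element {
      // The compiler derives the getter, setter, and materializeForSet
      // from these two accessors.
      address {
        return UnsafePointer(storage + index)
      }
      mutableAddress {
        return storage + index
      }
    }
  }
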
If the entity has type ``T``, then ``address`` returns an
``UnsafePointer<T>`` and ``mutableAddress`` returns an
``UnsafeMutablePointer<T>``. This means that the formal type of the
entity must exactly match the formal type of the storage. Thus, the
standard subscript on ``Dictionary<K,V>`` cannot be implemented using
addressors, because the formal type of the entity is ``V?``, but the
backing storage holds a ``V``. (And this is in keeping with user
expectations about the data structure: assigning ``nil`` at a key is
supposed to erase any existing entry there, not create a new entry to
hold ``nil``.)

This simple addressor proposal was the first prong of our efforts to
optimize array element access. Unfortunately, while it is useful for
several other types (such as ``ContiguousArray`` and
``UnsafeMutablePointer``), it is not flexible enough for the ``Array``
type.

Mixed addressors
~~~~~~~~~~~~~~~~

Swift's chief ``Array`` type is only a simple array when it is not
interacting with Objective-C. Type bridging requires ``Array`` to be
able to store an immutable ``NSArray`` instance, and the ``NSArray``
interface does not expose the details of how it stores elements. An
``NSArray`` is even permitted to dynamically generate its values in
its ``objectAtIndex:`` method. And it would be absurd for ``Array``
to perform a structural copy during a load just to make non-mutating
accesses more efficient! So the load access pattern for ``Array``'s
subscript declaration must use a getter.

Fortunately, this requirement does not preclude using an addressor for
mutating accesses. Mutations to ``Array`` always transition the array
to a unique contiguous buffer representation as their first step.
This means that the subscript operator can sensibly return an address
when it's used for the purposes of mutation: in other words, exactly
when ``mutableAddress`` would be invoked.

Therefore, the second prong of our efforts to optimize array element
access was to allow entities to be implemented with the combination of
a ``get`` accessor and a ``mutableAddress`` accessor. This is
straightforward in the user model, where it simply means lifting a
restriction. It's more complex behind the scenes because it broke
what was previously a clean conceptual division between "physical" and
"logical" l-values.

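Continuing the illustrative syntax from the earlier sketch, a mixed
implementation pairs an ordinary getter with a mutable addressor; the
helper methods used here are hypothetical::

  struct BridgedArray<Element> {
    var nativeStorage: UnsafeMutablePointer<Element>

    subscript(index: Int) -> Element {
      get {
        // Loads may have to go through a bridged NSArray.
        return loadElement(at: index)
      }
      mutableAddress {
        // Mutations first force a unique, contiguous native buffer,
        // so they can hand back a stable address.
        makeNativeBufferUnique()
        return nativeStorage + index
      }
    }
  }
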
Mixed addressors have now been adopted by ``Array`` to great success.
As expected, they substantially improved performance mutating COW
array elements. But they also fix an important instance of subobject
clobbering, because modifications to different subobjects (notably,
different elements of the same array) can occur simultaneously by
simply projecting out their addresses in the unique buffer. For
example, this means that it's possible to simply swap two elements
of an array directly::

  swap(&array[i], &array[j])

  // Expanded:
  array.transitionToUniquelyReferenced()
  let address_i = array.buffer.storage + i
  array.transitionToUniquelyReferenced()
  let address_j = array.buffer.storage + j
  swap(address_i, address_j)

Mixed addressors weren't completely implemented until very close to
the Xcode 6.1 deadline, and they changed code-generation patterns
enough to break a number of important array-specific optimizations.
Therefore, the team sensibly decided that they were too risky for that
release, and that there wasn't enough benefit from other applications
to justify including any of the addressor work.

In a way, that was a fortunate decision, because the naive version of
addressors implemented so far in Swift creates a safety hole which
would otherwise have been exposed to users.

Memory unsafety of addressors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The semantics and memory safety of operations on COW types rely on a
pair of simple rules:

* A non-mutating operation must own a reference to the buffer for
  the full course of the read.

* A mutating operation must own a unique reference to the buffer
  for the full course of the mutation.

Both rules tend to be naturally satisfied by the way that operations
are organized into methods. A value must own a reference to its
buffer at the moment that a method is invoked on it. A mutating
operation immediately transitions the buffer to a unique reference,
performing a structural copy if necessary. This reference will remain
valid for the rest of the method as long as the method is *atomic*: as
long as it does not synchronously invoke arbitrary user code.

(This is a single-threaded notion of atomicity. A second thread which
modifies the value simultaneously can clearly invalidate the
assumption. But that would necessarily be a data race, and the
language design team is willing to say that such races have fully
undefined behavior, and arbitrary consequences like memory corruption
and crashes are acceptable in their wake.)

However, addressors are not atomic in this way: they return an address
to the caller, which may then interleave arbitrary code before
completing the operation. This can present the opportunity for
corruption if the interleaved code modifies the original value.
Consider the following code::

  func operate(value: inout Int, count: Int) { ... }

  var array: [Int] = [1,2,3,4]
  operate(&array[0], { array = []; return 0 }())

The dynamic sequence of operations performed here will expand like so::

  var array: [Int] = [1,2,3,4]
  let address = array.subscript.mutableAddress(0)
  array = []
  operate(address, 0)

The assignment to ``array`` within the closure will release the buffer
containing ``address``, thus passing ``operate`` a dangling pointer.

Nor can this be fixed with a purely local analysis; consider::

  class C { var array: [Int] = [] }
  let global_C = C()

  func assign(value: inout Int) {
    global_C.array = []
    value = 0
  }

  assign(&global_C.array[0])

Fixing the memory safety hole
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Conceptually, the correct fix is to guarantee that the rules are
satisfied by ensuring that the buffer is retained for the duration of
the operation. Any interleaving modifications will then see a
non-uniquely-referenced buffer and perform a structural copy::

  // Project the array element.
  let address = array.subscript.mutableAddress(0)

  // Remember the new buffer value and keep it retained.
  let newArrayBuffer = array.buffer
  retain(newArrayBuffer)

  // Reassign the variable.
  release(array.buffer)
  array.buffer = ...

  // Perform the mutation. These changes will be silently lost, but
  // they at least won't be using deallocated memory.
  operate(address, 0)

  // Release the "new" buffer.
  release(newArrayBuffer)

Note that this still leaves a semantic hole if the original value is
copied in interleaving code before the modification, because the
subsequent modification will be reflected in the copy::

  // Project the array element.
  let address = array.subscript.mutableAddress(0)

  // Remember the new buffer value and keep it retained.
  let newArrayBuffer = array.buffer
  retain(newArrayBuffer)

  // Copy the value. Note that arrayCopy uses the same buffer that
  // 'address' points into.
  let arrayCopy = array
  retain(arrayCopy.buffer)

  // Perform the mutation.
  operate(address, 0)

  // Release the "new" buffer.
  release(newArrayBuffer)

This might be unexpected behavior, but the language team is willing to
accept unexpected behavior for this code. What's non-negotiable is
breaking memory safety.

Unfortunately, applying this fix naively reintroduces the problem of
subobject clobbering: since a modification of one subobject
immediately retains a buffer that's global to the entire value, an
interleaved modification of a different subobject will see a
non-unique buffer reference and therefore perform a structural copy.
The modifications to the first subobject will therefore be silently
lost.

Unlike the interleaving copy case, this is seen as unacceptable.
Notably, it breaks swapping two array elements::

  // Original:
  swap(&array[i], &array[j])

  // Expanded:

  // Project array[i].
  array.transitionToUniquelyReferenced()
  let address_i = array.buffer.storage + i
  let newArrayBuffer_i = array.buffer
  retain(newArrayBuffer_i)

  // Project array[j]. Note that this transition is guaranteed
  // to have to do a structural copy.
  array.transitionToUniquelyReferenced()
  let address_j = array.buffer.storage + j
  let newArrayBuffer_j = array.buffer
  retain(newArrayBuffer_j)

  // Perform the mutations.
  swap(address_i, address_j)

  // Balance out the retains.
  release(newArrayBuffer_j)
  release(newArrayBuffer_i)

get- and setForMutation
~~~~~~~~~~~~~~~~~~~~~~~

Some collections need finer-grained control over the entire mutation
process. For instance, to support divide-and-conquer algorithms using
slices, sliceable collections must "pin" and "unpin" their buffers
while a slice is being mutated to grant permission for the slice
to mutate the collection in-place while sharing ownership. This
flexibility can be exposed by a pair of accessors that are called
before and after a mutation. The "get" stage produces both the
value to mutate and a state value (whose type must be declared) to
forward to the "set" stage. A pinning accessor can then look something
like this::

  extension Array {
    subscript(range: Range<Int>) -> Slice<Element> {
      // `getForMutation` must declare its return value, a pair of both
      // the value to mutate and a state value that is passed to
      // `setForMutation`.
      getForMutation() -> (Slice<Element>, PinToken) {
        let slice = _makeSlice(range)
        let pinToken = _pin()
        return (slice, pinToken)
      }

      // `setForMutation` receives two arguments--the result of the
      // mutation to write back, and the state value returned by
      // `getForMutation`.
      setForMutation(slice, pinToken) {
        _unpin(pinToken)
        _writeSlice(slice, backToRange: range)
      }
    }
  }

``getForMutation`` and ``setForMutation`` must appear as a pair;
neither one is valid on its own. A ``get`` and ``set`` accessor
should also still be provided for simple read and write operations.
When the compiler can see that storage is implemented in
terms of ``getForMutation`` and ``setForMutation``, it lowers a mutable
projection using those accessors as follows::

  // A mutation like this (assume `reverse` is a mutating method):
  array[0...99].reverse()
  // Decomposes to:
  let index = 0...99
  (var slice, let state) = array.`subscript.getForMutation`(index)
  slice.reverse()
  array.`subscript.setForMutation`(index, slice, state)

To support the conservative access pattern,
a ``materializeForSet`` accessor can be generated from ``getForMutation``
and ``setForMutation`` in an obvious fashion: perform ``getForMutation``
and store the state result in its scratch space, and return a
callback that loads the state and hands it off to ``setForMutation``.

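A sketch of that derivation for the pinning subscript above, in the
same illustrative pseudocode (the generated accessor and the shape of
its scratch buffer are simplified here)::

  // Generated from getForMutation/setForMutation; not user-written.
  materializeForSet(index, scratchBuffer) {
    let (slice, pinToken) = `subscript.getForMutation`(index)
    scratchBuffer.storeValue(slice)       // the value the caller mutates
    scratchBuffer.storeState(pinToken)    // forwarded to the callback
    return (scratchBuffer.valueAddress, callback: { buffer in
      `subscript.setForMutation`(index, buffer.loadValue(), buffer.loadState())
    })
  }
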
The beacon of hope for a user-friendly future: Inversion of control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Addressors and ``{get,set}ForMutation`` expose important optimizations
to the standard library, but are undeniably fiddly and unsafe constructs
to expose to users. A more natural model would be to
recognize that a compound mutation is a composition of nested scopes, and
express it in the language that way. A strawman model might look something
like this::

  var foo: T {
    get { return getValue() }
    set { setValue(newValue) }

    // Perform a full in-out mutation. The `next` continuation is of
    // type `(inout T) -> ()` and must be called exactly once
    // with the value to hand off to the nested mutation operation.
    mutate(next) {
      var value = getValue()
      next(&value)
      setValue(value)
    }
  }

This presents a natural model for expressing the lifetime extension concerns
of addressors, and the state maintenance necessary for pinning ``getForMutation``
accessors::

  // An addressing mutator
  mutate(next) {
    withUnsafePointer(&resource) {
      next(&$0.memory)
    }
  }

  // A pinning mutator
  mutate(next) {
    var slice = makeSlice()
    let token = pin()
    next(&slice)
    unpin(token)
    writeBackSlice(slice)
  }

For various semantic and implementation efficiency reasons, we don't want to
literally implement every access as a nesting of closures like this. Doing so
would allow for semantic surprises (a mutate() operation never invoking its
continuation, or doing so multiple times, would be disastrous), and would
interfere with the ability for ``inout`` and ``mutating`` functions to throw or
otherwise nonlocally exit. However, we could present this model using
*inversion of control*, similar to Python generators or async-await.
A ``mutate`` operation could ``yield`` the ``inout`` reference to its inner value,
and the compiler could enforce that a ``yield`` occurs exactly once on every
control flow path::

  // An addressing, yielding mutator
  mutate {
    withUnsafePointer(&resource) {
      yield &$0.memory
    }
  }

  // A pinning mutator
  mutate {
    var slice = makeSlice()
    let token = pin()
    yield &slice
    unpin(token)
    writeBackSlice(slice)
  }

This obviously requires more implementation infrastructure than we currently
have, and raises language and library design issues (in particular,
lifetime-extending combinators like ``withUnsafePointer`` would need either
a ``reyields`` kind of decoration, or to become macros), but represents a
promising path toward exposing the full power of the accessor model to
users in an elegant way.

Acceptability
-------------

This whitepaper has mentioned several times that the language team is
prepared to accept such-and-such behavior but not prepared to accept
some other kind of behavior. Clearly, there is a policy at work.
What is it?

General philosophy
~~~~~~~~~~~~~~~~~~

For any given language problem, a perfect solution would be one which:

* guarantees that all operations complete without crashing or
  corrupting the program state,

* guarantees that all operations produce results according to
  consistent, reliable, and intuitive rules,

* does not limit or require complex interactions with the remainder
  of the language, and

* imposes no performance cost.

These goals are, however, not all simultaneously achievable, and
different languages reach different balances. Swift's particular
philosophy is as follows:

* The language should be as dynamically safe as possible.
  Straightforward uses of ordinary language features may cause
  dynamic failure, but they should never corrupt the program state.
  Any unsafe language or library features (other than simply calling
  into C code) should be explicitly labeled as unsafe.

  A dynamic failure should mean that the program reliably halts,
  ideally with a message clearly describing the source of the
  failure. In the future, the language may allow for emergency
  recovery from such failures.

* The language should sit on top of C, relying only on a relatively
  unobtrusive runtime. Accordingly, the language's interactions
  with C-based technologies should be efficient and obvious.

* The language should allow a static compiler to produce efficient
  code without dynamic instrumentation. Accordingly, static
  analysis should only be blocked by incomplete information when
  the code uses an obviously abstract language feature (such as
  calling a class method or an unknown function), and the language
  should provide tools to allow programmers to limit such cases.

  (Dynamic instrumentation can, of course, still help, but it
  shouldn't be required for excellent performance.)

General solutions
~~~~~~~~~~~~~~~~~

A language generally has six tools for dealing with code it considers
undesirable. Some of this terminology is taken from existing
standards, others not.

* The language may nonetheless take steps to ensure that the code
  executes with a reliable result. Such code is said to have
  *guaranteed behavior*.

* The language may report the code as erroneous before it executes.
  Such code is said to be *ill formed*.

* The language may reliably report the code as having performed an
  illegal operation when it executes. Such code is said to be
  *asserting* or *aborting*.

* The language may allow the code to produce an arbitrary-but-sound
  result. Such code is said to have *unspecified behavior* or to
  have produced an *unspecified value*.

* The language may allow the code to produce an unsound result which
  will result in another of these behaviors, but only if used.
  Such code is said to have produced a *trap value*.

* The language may declare the code to be completely outside of the
  guarantees of the language. Such code is said to have
  *undefined behavior*.

In keeping with its design philosophy, Swift has generally limited
itself to the first four solutions, with two significant exceptions.

The first exception is that Swift provides several explicitly unsafe
language and library features, such as ``UnsafePointer<T>`` and
``unowned(unsafe)``. The use of these features is generally subject
to undefined behavior rules.

The second exception is that Swift does not make any guarantees about
programs in the presence of race conditions. It is extremely
difficult to make even weak statements about the behavior of a program
with a race condition without either:

* heavily restricting shared mutable state on a language level, which
  would require invasive changes to how the language interacts with C;

* forcing implicit synchronization when making any change to
  potentially shared memory, which would cripple performance and
  greatly complicate library implementation; or

* using a garbage collector to manage all accessible memory, which
  would impose a very large burden on almost all of Swift's language
  goals.

Therefore, Swift does surrender safety in the presence of races.

Acceptability conditions for storage accesses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Storage access involves a tension between four goals:

* Preserving all changes when making simultaneous modifications to
  distinct subobjects; in other words, avoiding subobject clobbering

* Performing a predictable and intuitive sequence of operations when
  modifying storage that's implemented with a getter and setter

* Avoiding unnecessary copies of a value during a modification,
  especially when this forces a structural copy of a COW value

* Avoiding memory safety holes when accessing storage that's been
  implemented with memory.

Reprojection_ is good at preserving changes, but it introduces extra
copies, and it's less intuitive about how many times getters and
setters will be called. There may be a place for it anyway, if we're
willing to accept the extra conceptual complexity for computed
storage, but it's not a reasonable primary basis for optimizing the
performance of storage backed by memory.

Solutions permitting in-place modification are more efficient, but
they do have the inherent disadvantage of having to vend the address
of a value before arbitrary interleaving code. Even if the address
remains valid, and the solution to that avoids subobject clobbering,
there's an unavoidable issue that the write can be lost because the
address became dissociated from the storage. For example, if your
code passes ``&array[i]`` to a function, you might plausibly argue
that changes to that argument should show up in the ``i``\ th element of
``array`` even if you completely reassign ``array``. Reprojection_
could make this work, but in-place solutions cannot efficiently do so.
So, for any in-place solution to be acceptable, there does need to be
some rule specifying when it's okay to "lose track" of a change.

Furthermore, the basic behavior of COW means that it's possible to
copy an array with an element under modification and end up sharing
the same buffer, so that the modification will be reflected in a value
that was technically copied beforehand. For example::

  var array = [1,2,3]
  var oldArray : [Int] = []

  // This function copies array before modifying it, but because that
  // copy is of a value undergoing modification, the copy will use
  // the same buffer and therefore observe updates to the element.
  func foo(element: inout Int) {
    oldArray = array
    element = 4
  }

  // Therefore, oldArray[2] will be 4 after this call.
  foo(&array[2])

Nor can this be fixed by temporarily moving the modified array aside,
because that would prevent simultaneous modifications to different
elements (and, in fact, likely cause them to assert). So the rule
will also have to allow this.

However, both of these possibilities already come up in the design of
both the library and the optimizer. The optimizer makes a number of
assumptions about aliasing; for example, the general rule is that
storage bound to an ``inout`` parameter cannot be accessed through
other paths, and while the optimizer is not permitted to compromise
memory safety, it is permitted to introduce exactly this kind of
unexpected behavior, where aliasing accesses may or may not see the
storage as a consistent entity.

Formal accesses
^^^^^^^^^^^^^^^

That rule leads to an interesting generalization. Every modification
of storage occurs during a *formal access* (FA) to that storage. An
FA is also associated with zero or more *designated storage names*
(DSNs), which are ``inout`` arguments in particular execution records.
An FA arises from an l-value expression, and its duration and DSN set
depend on how the l-value is used:

* An l-value which is simply loaded from creates an instantaneous FA
  at the time of the load. The DSN set is empty.

  Example::

    foo(array)
    // instantaneous FA reading array

* An l-value which is assigned to with ``=`` creates an
  instantaneous FA at the time of the primitive assignment. The DSN
  set is empty.

  Example::

    array = []
    // instantaneous FA assigning array

  Note that the primitive assignment strictly follows the evaluation
  of both the l-value and r-value expressions of the assignment.
  For example, the following code::

    // object is a variable of class type
    object.array = object.array + [1,2,3]

  produces this sequence of formal accesses::

    // instantaneous FA reading object (in the left-hand side)
    // instantaneous FA reading object (in the right-hand side)
    // instantaneous FA reading object.array (in the right-hand side)
    // evaluation of [1,2,3]
    // evaluation of +
    // instantaneous FA assigning object.array

* An l-value which is passed as an ``inout`` argument to a call
  creates an FA beginning immediately before the call and ending
  immediately after the call. (This includes calls where an
  argument is implicitly passed ``inout``, such as to mutating
  methods or user-defined assignment operators such as ``+=`` or
  ``++``.) The DSN set contains the ``inout`` argument within the
  call.

  Example::

    func swap<T>(lhs: inout T, rhs: inout T) {}

    // leftObject and rightObject are variables of class type
    swap(&leftObject.array, &rightObject.array)

  This results in the following sequence of formal accesses::

    // instantaneous FA reading leftObject
    // instantaneous FA reading rightObject
    // begin FA for inout argument leftObject.array (DSN={lhs})
    // begin FA for inout argument rightObject.array (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array
    // end FA for inout argument leftObject.array

* An l-value which is used as the base of a member storage access
  begins an FA whose duration is the same as the duration of the FA
  for the subobject l-value. The DSN set is empty.

  Example::

    swap(&leftObject.array[i], &rightObject.array[j])

  This results in the following sequence of formal accesses::

    // instantaneous FA reading leftObject
    // instantaneous FA reading i
    // instantaneous FA reading rightObject
    // instantaneous FA reading j
    // begin FA for subobject base leftObject.array (DSN={})
    // begin FA for inout argument leftObject.array[i] (DSN={lhs})
    // begin FA for subobject base rightObject.array (DSN={})
    // begin FA for inout argument rightObject.array[j] (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array[j]
    // end FA for subobject base rightObject.array
    // end FA for inout argument leftObject.array[i]
    // end FA for subobject base leftObject.array

* An l-value which is the base of an ! operator begins an FA whose
  duration is the same as the duration of the FA for the resulting
  l-value. The DSN set is empty.

  Example::

    // left is a variable of type T
    // right is a variable of type T?
    swap(&left, &right!)

  This results in the following sequence of formal accesses::

    // begin FA for inout argument left (DSN={lhs})
    // begin FA for ! operand right (DSN={})
    // begin FA for inout argument right! (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument right!
    // end FA for ! operand right
    // end FA for inout argument left

* An l-value which is the base of a ? operator begins an FA whose
  duration begins during the formal evaluation of the l-value
  and ends either immediately (if the operand was nil) or at the
  end of the duration of the FA for the resulting l-value.
  In either case, the DSN set is empty.

  Example::

    // left is a variable of optional struct type
    // right is a variable of type Int
    left?.member += right

  This results in the following sequence of formal accesses, assuming
  that ``left`` contains a value::

    // begin FA for ? operand left (DSN={})
    // instantaneous FA reading right (DSN={})
    // begin FA for inout argument left?.member (DSN={lhs})
    // evaluation of +=
    // end FA for inout argument left?.member
    // end FA for ? operand left

The FAs for all ``inout`` arguments to a call begin simultaneously at
a point strictly following the evaluation of all the argument
expressions. For example, in the call ``foo(&array, array)``, the
evaluation of the second argument produces a defined value, because
the FA for the first argument does not begin until after all the
arguments are formally evaluated. No code should actually be emitted
during the formal evaluation of ``&array``, but for an expression like
``someClassReference.someArray[i]``, the class r-value and index
expressions would be fully evaluated at that time, and then the
l-value would be kept abstract until the FA begins. Note that this
requires changes in SILGen's current code generation patterns.

The FA rule for the chaining operator ``?`` is an exception to the
otherwise-simple intuition that formal accesses begin immediately
before the modification begins. This is necessary because the
evaluation rules for ``?`` may cause arbitrary computation to be
short-circuited, and therefore the operand must be accessed during the
formal evaluation of the l-value. There were three options here:

* Abandon short-circuiting for assignments to optional l-values. This
  is a very high price; short-circuiting fits into user intuitions
  about the behavior of the chaining operator, and it can actually be
  quite awkward to replicate with explicit accesses.

* Short-circuit using an instantaneous formal access, then start a
  separate formal access before the actual modification. In other
  words, evaluation of ``X += Y`` would proceed by first determining
  whether ``X`` exists (capturing the results of any r-value
  components), discarding any projection information derived from
  that, evaluating ``Y``, reprojecting ``X`` again (using the saved
  r-value components and checking again for whether the l-value
  exists), and finally calling the ``+=`` operator.

  If ``X`` involves any sort of computed storage, the steps required
  to evaluate this might be... counter-intuitive.

* Allow the formal access to begin during the formal evaluation of the
  l-value. This means that code like the following will have
  unspecified behavior::

    array[i]?.member = deriveNewValueFrom(array)

In the end, I've gone with the third option. The intuitive
explanation is that ``array`` has to be accessed early in order to
continue the evaluation of the l-value. I think that's
comprehensible to users, even if it's not immediately obvious.

Disjoint and non-disjoint formal accesses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm almost ready to state the core rule about formal accesses, but
first I need to build up a few more definitions.

An *abstract storage location* (ASL) is:

* a global variable declaration;

* an ``inout`` parameter declaration, along with a reference
  to a specific execution record for that function;

* a local variable declaration, along with a reference to a
  specific execution record for that declaration statement;

* a static/class property declaration, along with a type having
  that property;

* a struct/enum instance property declaration, along with an
  ASL for the base;

* a struct/enum subscript declaration, along with a concrete index
  value and an ASL for the base;

* a class instance property declaration, along with an instance of
  that class; or

* a class instance subscript declaration, along with a concrete
  index value and an instance of that class.

Two abstract storage locations may be said to *overlap*. Overlap
corresponds to the imprecise intuition that a modification of one
location directly alters the value of another location. Overlap
is an "open" property of the language: every new declaration may
introduce its own overlap behavior. However, the language and
library make certain assertions about the overlap of some locations:

* An ``inout`` parameter declaration overlaps exactly the set
  of ASLs overlapped by the ASL which was passed as an argument.

* If two ASLs are both implemented with memory, then they overlap
  only if they have the same kind in the above list and the
  corresponding data match:

  * execution records must represent the same execution
  * types must be the same
  * class instances must be the same
  * ASLs must overlap

* For the purposes of the above rule, the subscript of a standard
  library array type is implemented with memory, and the two
  indexes match if they have the same integer value.

* For the purposes of the above rule, the subscript of a standard
  library dictionary type is implemented with memory, and the two
  indexes match if they compare equal with ``==``.

Because this definition is open, it is impossible to completely
statically or dynamically decide it. However, it would still be
possible to write a dynamic analysis which decided it for common
location kinds. Such a tool would be useful as part of, say, an
ASan-like dynamic tool to diagnose violations of the
unspecified-behavior rule below.

The overlap rule is vague about computed storage partly because
computed storage can have non-obvious aliasing behavior and partly
because the subobject clobbering caused by the full-value accesses
required by computed storage can introduce unexpected results that can
be reasonably glossed as unspecified values.

This notion of abstract storage location overlap can be applied to
formal accesses as well. Two FAs ``x`` and ``y`` are said to be
*disjoint* if:

* they refer to non-overlapping abstract storage locations or

* they are the base FAs of two disjoint member storage accesses
  ``x.a`` and ``y.b``.

Given these definitions, the core unspecified-behavior rule is:

  If two non-disjoint FAs have intersecting durations, and neither FA
  is derived from a DSN for the other, then the program has
  unspecified behavior in the following way: if the second FA is a
  load, it yields an unspecified value; otherwise, both FAs store an
  unspecified value in the storage.

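As a concrete illustration (the names here are invented), the read of
``numbers`` inside the function below is a second, non-disjoint FA
whose duration intersects the FA for the ``inout`` argument, and it is
not derived from that FA's DSN, so it yields an unspecified value::

  var numbers = [1, 2, 3]

  func accumulate(into element: inout Int) {
    // This load overlaps the in-progress FA for numbers[0].
    element += numbers.reduce(0, +)
  }

  accumulate(into: &numbers[0])
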
Note that you cannot have two loads with intersecting durations,
because the FAs for loads are instantaneous.

Non-overlapping subobject accesses make the base accesses disjoint
because otherwise code like ``swap(&a[0], &a[1])`` would have
unspecified behavior, because the two base FAs are to clearly
overlapping locations and have intersecting durations.

Note that the optimizer's aliasing rule falls out from this rule. If
storage has been bound as an ``inout`` argument, accesses to it
through any path not derived from the ``inout`` argument will start a
new FA for overlapping storage, the duration of which will necessarily
intersect the duration of the FA through which the ``inout``
argument was bound, causing unspecified behavior. If the ``inout``
argument is forwarded to another call, that will start a new FA which
is validly based on a DSN of the first; but an attempt to modify the
storage through the first ``inout`` argument while the second call is
active will create a third FA not based on the DSN from the second
``inout`` call, causing a conflict there. Therefore a function may
assume that it can see all accesses to the storage bound to an
``inout`` argument.

If you didn't catch all that...
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
That may have been a somewhat intense description, so here's a simple
|
|
summary of the rule being proposed.
|
|
|
|
If storage is passed to an ``inout`` argument, then any other
|
|
simultaneous attempt to read or write to that storage, including to
|
|
the storage containing it, will have unspecified behavior. Reads
|
|
from it may see partially-updated values, or even values which will
|
|
change as modifications are made to the original storage; and writes
|
|
may be clobbered or simply disappear.
|
|
|
|
But this only applies during the call with the ``inout`` argument: the
|
|
evaluation of other arguments to the call will not be interfered with,
|
|
and as soon as the call ends, all these modifications will resolve
|
|
back to a quiescent state.
|
|
|
|
And this unspecified behavior has limits. The storage may end up with
|
|
an unexpected value, with only a subset of the writes made to it, and
|
|
copies from it may unexpectedly reflect modifications made after they
|
|
were copied. However, the program will otherwise remain in a
|
|
consistent and uncorrupted state. This means that execution will be
|
|
able to continue apace as long as these unexpected values don't trip
|
|
up some higher-level invariant.
|
|
|
|
Tracking formal accesses
|
|
------------------------
|
|
|
|
Okay, now that I've analyzed this to death, it's time to make a
|
|
concrete proposal about the implementation.
|
|
|
|
As discussed above, the safety hole with addressors can be fixed by
|
|
always retaining the buffer which keeps the address valid. Assuming
|
|
that other uses of the buffer follow the general copy-on-write
|
|
pattern, this retain will prevent structural changes to the buffer
|
|
while the address is in use.
|
|
|
|
But, as I also discussed above, this introduces two problems:
|
|
|
|
Copies during modification
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Copying a COW aggregate value always shares the same buffer that was
|
|
stored there at the time of the copy; there is no uniqueness check
|
|
done as part of the copy. Changes to subobjects will then be
|
|
instantly reflected in the "copy" as they are made to the original.
|
|
The structure of the copy will stay the same, but the values of
|
|
its subobjects will appear to spontaneously change.
|
|
|
|
I want to say that this behavior is acceptable according to the
|
|
formal-access rule I laid out above. How does that reasoning work?
|
|
|
|
First, I need to establish what kind of behavior is at work here. It
|
|
clearly isn't guaranteed behavior: copies of COW values are normally
|
|
expected to be independent. The code wasn't rejected by the compiler,
|
|
nor did it dynamically assert; it simply seems to misbehave. But
|
|
there are limits to the misbehavior:
|
|
|
|
* By general COW rules, there's no way to change the structure of an
|
|
existing buffer unless the retain count is 1. For the purposes of
|
|
this analysis, that means that, as long as the retain count is
|
|
above 1, there's no way to invalidate the address returned by the
|
|
addressor.
|
|
|
|
* The buffer will be retained for as long as the returned address
|
|
is being modified. This retain is independent of any storage
|
|
which might hold the aggregate value (and thus also retain the buffer).
|
|
|
|
* Because of this retain, the only way for the retain count to drop
|
|
to 1 is for no storage to continue to refer to the buffer.
|
|
|
|
* But if no storage refers to the buffer, there is no way to
|
|
initiate an operation which would change the buffer structure.
|
|
|
|
Thus the address will remain valid, and there's no danger of memory
|
|
corruption. The only thing is that the program no longer makes useful
|
|
guarantees about the value of the copied aggregate. In other words,
|
|
the copy yielded an unspecified value.
|
|
|
|
The formal-access rule allows loads from storage to yield an
|
|
unspecified value if there's another formal access to that storage in
|
|
play and the load is (1) not from an l-value derived from a name in
|
|
the other FA's DSN set and (2) not from a non-overlapping subobject.
|
|
Are these conditions true?
|
|
|
|
Recall that an addressor is invoked for an l-value of the form::
|
|
|
|
base.pointee
|
|
|
|
or::
|
|
|
|
base[i]
|
|
|
|
Both cases involve a formal access to the storage ``base`` as the base
|
|
of a subobject formal access. This kind of formal access always has
|
|
an empty DSN set, regardless of how the subobject is used. A COW
|
|
mutable addressor will always ensure that the buffer is uniquely
|
|
referenced before returning, so the only way that a value containing
|
|
that buffer can be copied is if the load is a non-subobject access to
|
|
``base``. Therefore, there are two simultaneous formal accesses to
|
|
the same storage, and the load is not from an l-value derived from the
|
|
modification's DSN set (which is empty), nor is it for a
|
|
non-overlapping subobject. So the formal-access rule applies, and
|
|
an unspecified value is an acceptable result.
|
|
|
|
The implementation requirement here, then, is simply that the
|
|
addressor must be called, and the buffer retained, within the duration
|
|
of the formal access. In other words, the addressor must only
|
|
be called immediately prior to the call, rather than at the time
|
|
of the formal evaluation of the l-value expression.
|
|
|
|
What would happen if there *were* a simultaneous load from a
|
|
non-overlapping subobject? Accessing the subobject might cause a
|
|
brief copy of ``base``, but only for the duration of copying the
|
|
subobject. If the subobject does not overlap the subobject which was
|
|
projected out for the addressor, then this is harmless, because the
|
|
addressor will not allow modifications to those subobjects; there
|
|
might be other simultaneous formal accesses which do conflict, but
|
|
these two do not. If the subobject does overlap, then a recursive
|
|
analysis must be applied; but note that the exception to the
|
|
formal-access rule will only apply if non-overlapping subobjects were
|
|
projected out from *both* formal accesses. Otherwise, it will be
|
|
acceptable for the access to the overlapping subobject to yield an
|
|
unspecified value.
|
|
|
|
Avoiding subobject clobbering during parallel modification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The other problem is that the retain will prevent simultaneous
modifications from applying to the same buffer: because the retain
leaves the buffer non-uniquely referenced, the second change will
cause a structural copy, and the first address will end up modifying a
buffer which is no longer referenced. In other words, the program
will observe subobject clobbering. A similar analysis to the one from
the last section suggests that this can be described as unspecified
behavior.

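The sort of code in question is a parallel modification of two
distinct elements through their addressors, for example (a sketch of
the hazard, not of what the eventual design permits)::

  var values = [10, 20, 30]

  func addInto(_ a: inout Int, _ b: inout Int) {
    a += b   // `a` was projected out by the first addressor call
  }

  // Under a retain-only scheme, the second addressor call would see a
  // retain count of 2, perform a structural copy, and hand out an
  // address into the copy; the write through `a` would then land in a
  // buffer that `values` no longer references.
  addInto(&values[0], &values[2])
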
Unfortunately, this unspecified behavior is unwanted: it violates the
guarantees of the formal-access rule as I laid it out above, because
it occurs even if you have formal accesses to two non-overlapping
subobjects. So something does need to be done here.

One simple answer is to dynamically track whether a COW buffer is
currently undergoing a non-structural mutation. I'll call this *NSM
tracking*, and I'll call buffers which are undergoing non-structural
mutations *NSM-active*.

The general rules of COW say that mutating operations must ensure that
their buffer is uniquely referenced before performing the
modification. NSM tracking works by having non-structural mutations
perform a weaker check: the buffer must be either uniquely referenced
or be NSM-active. If the non-structural mutation allows arbitrary
code to run between the start of the mutation and the end --- as an
addressor does --- it must both retain the buffer and flag it as
NSM-active for the entire duration.

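As a rough illustration of the two checks, here is a minimal COW
aggregate written in ordinary Swift. The ``isNSMActive`` property
stands in for the proposed NSM bit, which would really be packed into
the buffer's reference count, and all of the names are invented for
the sketch::

  final class ElementBuffer {
    var elements: [Int]
    var isNSMActive = false   // stand-in for the proposed NSM bit
    init(_ elements: [Int]) { self.elements = elements }
  }

  struct COWAggregate {
    var buffer = ElementBuffer([])

    // Structural mutations require a uniquely-referenced buffer.
    mutating func makeUniqueForStructuralMutation() {
      if !isKnownUniquelyReferenced(&buffer) {
        buffer = ElementBuffer(buffer.elements)
      }
    }

    // Non-structural mutations accept "uniquely referenced OR NSM-active".
    mutating func makeUniqueForNonStructuralMutation() {
      if !isKnownUniquelyReferenced(&buffer) && !buffer.isNSMActive {
        buffer = ElementBuffer(buffer.elements)
      }
    }
  }
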
Because the retain still occurs, and because any *structural* changes
to the buffer that might invalidate the addresses of subobjects are
still blocked by that retain, all of the earlier analysis about the
memory safety of simultaneous accesses still applies. The only change
is that simultaneous non-structural modifications, as would be created
by simultaneous formal accesses to subobjects, will now be able to
occur on a single buffer.

A set of simultaneous formal accesses on a single thread follows a
natural stack protocol, or can be made to do so with straightforward
SILGen and SIL optimizer consideration. Therefore, the runtime can
track whether a buffer is NSM-active on a thread using a single bit,
which nested modifications can be told not to clear. Call this the
*NSM bit*. Ignoring multithreading considerations for a moment, since
the NSM bit is only ever set at the same time as a retain and only
ever cleared at the same time as a release, it makes sense to pack
this into the strong reference count. There is no need to support
this operation on non-Swift objects. The runtime should provide three
new functions:

* A function to test whether an object is either uniquely referenced
  or NSM-active. Call this ``swift_isUniquelyReferencedForNSM``.

* A function to perform the above test and, if the test passes and
  the NSM bit is not set, atomically retain the object and set
  the NSM bit. It should return both the result of the test and an
  object to later set as NSM-inactive. That object will be nil if
  the test failed or the NSM bit was already set. Call this
  ``swift_tryRetainForNSM``.

* A function to atomically clear the NSM bit and release the object.
  Call this ``swift_releaseForNSM``.

These operations should also be reflected in SIL.

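To make the intended semantics concrete, here is a rough behavioral
model of the three entry points in plain Swift. It is only a sketch:
the real functions would atomically test and update state packed into
the strong reference count, and ``ModelBuffer`` and its
``nsmBitIsSet`` property are invented for the example::

  final class ModelBuffer {
    var nsmBitIsSet = false
  }

  // Models swift_isUniquelyReferencedForNSM: unique OR NSM-active.
  func isUniquelyReferencedForNSM(_ buffer: inout ModelBuffer) -> Bool {
    return isKnownUniquelyReferenced(&buffer) || buffer.nsmBitIsSet
  }

  // Models swift_tryRetainForNSM: reports whether the test passed and
  // returns the owner to hand to releaseForNSM later; the owner is nil
  // if the test failed or the bit was already set.
  func tryRetainForNSM(_ buffer: inout ModelBuffer)
      -> (passed: Bool, owner: ModelBuffer?) {
    guard isUniquelyReferencedForNSM(&buffer) else { return (false, nil) }
    guard !buffer.nsmBitIsSet else { return (true, nil) }
    buffer.nsmBitIsSet = true
    return (true, buffer)   // holding the returned reference models the retain
  }

  // Models swift_releaseForNSM: clear the bit; letting go of the owner
  // reference models the release.
  func releaseForNSM(_ owner: ModelBuffer?) {
    owner?.nsmBitIsSet = false
  }
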
Concurrent modifications and the non-structural modification bit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What about concurrency? Two concurrent non-structural modifications
could race to set the NSM bit, and then the winning thread could clear
it before the other thread's modification is complete. This could
cause memory-unsafe behavior, since the losing thread would be
modifying the object through an address while not retaining the value.

The major question here is whether this is a significant objection.
It's accepted that race conditions have undefined behavior. Is such
code inherently racy?

The answer appears to be "no": it is possible to write code which
concurrently writes to existing non-overlapping elements of a COW
aggregate without causing races. But such code is extremely fraught,
and it is equally fraught regardless of whether NSM-activeness is
tracked with a single bit or a wider count. Consider:

* If the shared aggregate value is ever non-uniquely referenced, two
  threads concurrently modifying it will race to unique the aggregate.
  This unavoidably has undefined behavior, because uniquing requires
  the previous buffer to eventually be released, and a race may cause
  an over-release.

* Assume that it's possible to guarantee that the aggregate value's
  buffer is uniquely referenced before any threads concurrently
  access it. Now, all of the threads are performing different
  concurrent accesses.

  * If any of the accesses is a structural modification, there will
    be a race to re-unique the buffer.

  * If all of the accesses are non-structural modifications, then
    there will be no races as long as the retain-and-set and
    release-and-clear operations are atomic: when starting any
    particular operation, the buffer will always either be uniquely
    referenced or have the bit set.

  * If any of the accesses is a read, and that read does not occur
    during a non-structural modification, then the buffer may
    briefly become non-uniquely referenced and there will be a
    race from concurrent modifications to re-unique it.

  * If any of the accesses is a read, and that read occurs during a
    non-structural modification, and the optimizer does not re-order
    the read's retain/release around the retainForNSM/releaseForNSM
    operations, then it matters how NSM-activeness is tracked.

    If there is complete tracking (i.e. a count, not just a single
    bit), the retain for the read will only occur while the buffer
    is flagged as NSM-active, and so it will have no effect.

    If there is incomplete tracking (i.e. just a single NSM bit),
    then there is a potential for undefined behavior. Suppose two
    threads race to set the NSM bit. The loser then initiates a
    read and retains the buffer. Before the loser releases the
    buffer, the winner clears the NSM bit. Now another thread might
    see that the buffer is non-uniquely referenced and not
    NSM-active, and so it will attempt to unique the buffer.

    It is probably unreasonable to require the optimizer to never
    reorder ordinary retains and releases past retainForNSM and
    releaseForNSM operations.

More importantly, the use case here (many threads concurrently
accessing different elements of a shared data structure) is inherently
a poor fit for a COW data structure. Even if the library were able to
make enough guarantees to ensure that, with the right pattern of
accesses, there would never be a structural copy of the aggregate, it
would still be extremely inefficient, because all of the threads would
be competing for atomic access to the strong reference count.

In short, I think it's reasonable for the library to say that programs
which want to do this should always use a type with reference
semantics. Therefore, it's reasonable to ignore concurrent accesses
when deciding how to best track whether an aggregate is undergoing
non-structural modification. This removes the only objection I
can see to tracking this with a single NSM bit.

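For illustration only, the recommendation amounts to using a
reference-semantics type for this kind of concurrent, element-wise
mutation. A deliberately minimal sketch (not a proposed library
addition) might look like this::

  // A class has reference semantics: sharing it between threads never
  // triggers COW uniqueness checks, and writes to distinct elements do
  // not touch the reference count of any buffer.
  final class SharedCounters {
    private let storage: UnsafeMutablePointer<Int>
    let count: Int

    init(count: Int) {
      self.count = count
      storage = UnsafeMutablePointer<Int>.allocate(capacity: count)
      storage.initialize(repeating: 0, count: count)
    }

    deinit {
      storage.deinitialize(count: count)
      storage.deallocate()
    }

    subscript(i: Int) -> Int {
      get { return storage[i] }
      set { storage[i] = newValue }
    }
  }
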
Code generation patterns
~~~~~~~~~~~~~~~~~~~~~~~~

The signatures and access patterns for addressors will need to change
in order to ensure memory safety.

``mutableAddress`` currently returns an ``UnsafeMutablePointer``; it
will need to return ``(Builtin.NativeObject?, UnsafeMutablePointer)``.
The owner pointer must be a native object; we cannot efficiently
support either uniqueness checking or the NSM bit on non-Swift
objects. SILGen will mark that the address depends on the owner
reference and push a cleanup to ``releaseForNSM`` it.

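In source-level terms, the access pattern being described looks
roughly like the following model. Everything here is invented for the
sketch: the real owner would be a ``Builtin.NativeObject?`` rather
than a class reference, and the check and cleanup would use the
runtime entry points described earlier::

  final class ModelOwner {
    let base: UnsafeMutablePointer<Int>
    let count: Int
    var isNSMActive = false

    init(count: Int) {
      self.count = count
      base = UnsafeMutablePointer<Int>.allocate(capacity: count)
      base.initialize(repeating: 0, count: count)
    }
    deinit {
      base.deinitialize(count: count)
      base.deallocate()
    }
  }

  struct ModelArray {
    var owner = ModelOwner(count: 4)

    // Plays the role of the new `mutableAddress`: an owner plus an address.
    // (The "unique or NSM-active" check from the earlier sketch is omitted.)
    mutating func mutableElementAddress(at index: Int)
        -> (owner: ModelOwner?, address: UnsafeMutablePointer<Int>) {
      owner.isNSMActive = true
      return (owner, owner.base + index)
    }
  }

  // Models the releaseForNSM cleanup that SILGen pushes for the owner.
  func endMutation(_ owner: ModelOwner?) {
    owner?.isNSMActive = false
  }

  var array = ModelArray()
  let (owner, address) = array.mutableElementAddress(at: 2)
  address.pointee += 1   // the formal access, through the returned address
  endMutation(owner)     // the cleanup runs only after the access ends
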
``address`` currently returns an ``UnsafePointer``; it will need to
return ``(Builtin.NativeObject?, UnsafePointer)``. I do not currently
see a reason to allow non-Swift owners, but the model doesn't depend
on that. SILGen will mark that the address depends on the owner
reference and push a cleanup to ``release`` it.

In order to support ultimately calling an addressor in the
conservative access path, ``materializeForSet`` must also return an
owner reference. Since ``materializeForSet`` calls ``mutableAddress``
in this case, SILGen will follow that pattern for calls. SILGen will
also assume that the need to perform a ``releaseForNSM`` is exclusive
with the need to call the setter.

Mutating operations on COW types will now have two different paths for
making a buffer mutable and unique: one for structural mutations and
another for non-structural mutations. I expect that this will require
separate semantics annotations, and the optimizer will have to
recognize both.

``releaseForNSM`` operations will not be reorderable unless the
optimizer can prove that the objects are distinct.

Summary of proposal and plan
----------------------------

Let me summarize what I'm proposing:

* Swift's core approach to optimizing accesses should be based
  around providing direct access to memory, either statically or
  dynamically. In other words, Swift should adopt addressors on
  core data structures as much as possible.

* Swift should fix the current memory hole with addressors by
  retaining for the duration of the access and, for modifications,
  flagging the buffer as NSM-active. The implementation plan
  follows:

  * The runtime implements the NSM bit and its entry points.

  * SIL provides operations for manipulating and querying the NSM
    bit. IRGen implements these operations using the runtime
    functions. Builtins are exposed.

  * The standard library changes data structures to do different
    uniquing for structural and non-structural modifications. This
    patch is not yet committed.

  * The optimizer reacts to the above. When both are settled, they
    can be committed.

  * SILGen changes the emission patterns for l-values so that
    addresses and writebacks are live only during the formal
    access.

  * Sema changes the signature of ``address``, ``mutableAddress``,
    and ``materializeForSet`` to return an optional owner reference.
    Sema changes ``materializeForSet`` synthesis to return the
    owner correctly. SILGen implements the desired code patterns.

    The standard library changes its addressor implementations
    to continue to compile, but for staging purposes, it only uses
    nil owners.

  * The standard library changes addressor implementations to
    use meaningful owners. This patch is not yet committed.

  * The optimizer reacts to the above. When both are settled, they
    can be committed.