:orphan:

Optimizing Accessors in Swift
=============================

Definitions
-----------

An abstract storage declaration is a language construct that declares
a means of accessing some sort of abstract entity. I'll just say
"storage" hereafter.

Swift provides three storage declarations:

* a single named entity, called a *variable* and declared with ``var``
* a single named entity which can never be reassigned, called a *constant* and declared with ``let``
* a compound unnamed entity accessed with an index, called a *subscript* and declared with ``subscript``

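As a concrete illustration (the type and member names here are invented
for this example), all three kinds of declaration can appear in a single
type::

  struct Temperature {
    var celsius: Double = 0      // variable: readable and reassignable
    let unitName = "Celsius"     // constant: can never be reassigned

    // subscript: a compound unnamed entity accessed with an index
    subscript(offsetIndex: Int) -> Double {
      return celsius + Double(offsetIndex)
    }
  }
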
These features are similar to those in other languages. Swift notably
lacks compound named entities, such as C#'s indexed properties; the
design team intentionally chose to favor the use of named single
entities of subscriptable type.

It's useful to lump the two kinds of single named entities together;
I'll just call them both "variables" hereafter.

Subscripts must always be instance type members. When a variable is
a type member, it's called a *property*.

When I say that these entities are *abstract*, I mean that they're not
directly tied to any particular implementation. All of them may be
backed directly by memory storing values of the entity's type, or they
may simply provide a way of invoking arbitrary code to access a
logical memory, or they may be a little of both.

Full-value accesses
-------------------

All accesses to storage, no matter the implementation, can be performed
with two primitive operations:

* a full-value load, which creates a new copy of the current value
* a full-value store, which overwrites the current value with a
  different, independent value

A function which implements a full-value load is called a *getter*;
a full-value store, a *setter*.

An operation which calls for a full-value load into a temporary, then
a modification of the temporary, then a full-value store of the
temporary into the original entity, is called a *write-back*.

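To make the terminology concrete, here is a write-back written out by
hand against a hypothetical ``Rect`` type whose ``origin`` property is
implemented with a getter and setter (all names here are invented for
this example)::

  struct Point { var x = 0.0, y = 0.0 }

  struct Rect {
    var storage = [0.0, 0.0]
    var origin: Point {            // computed: only full-value accesses
      get { return Point(x: storage[0], y: storage[1]) }
      set { storage = [newValue.x, newValue.y] }
    }
  }

  var rect = Rect()
  // "rect.origin.x += 10" must be performed as a write-back:
  var temporary = rect.origin      // full-value load via the getter
  temporary.x += 10                // modify the temporary
  rect.origin = temporary          // full-value store via the setter
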
Implementing accesses with full-value accesses introduces two
problems: subobject clobbering and performance.

Subobject clobbering
~~~~~~~~~~~~~~~~~~~~

Subobject clobbering is a semantic issue. It occurs when there are
two changes to an entity in flight at once, as in::

  swap(&point.x, &point.y)

(By "at once", I mean synchronously. Unlike Java, Swift is willing to
say that an *asynchronous* simultaneous access (mutating or not) to an
entity that's being modified is completely undefined behavior, with
any sort of failure permitted, up to and including memory corruption
and crashes. As Swift transitions towards being a systems language,
it can selectively choose to define some of this behavior,
e.g. allowing racing accesses to different elements of an array.)

Subobject clobbering is a problem with user expectation. Users are
unlikely to write code that intentionally modifies two obviously
overlapping entities, but they might very reasonably write code that
modifies two "separate" sub-entities. For example, they might write
code to swap two properties of a struct, or two elements of an array.
The natural expansion of these subobject accesses using whole-object
accesses generates code that "loses" the changes made to one of the
objects::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1.y = y
  point = point1
  point0.x = x
  point = point0

Note that ``point.y`` is left unchanged.

Local analysis
^^^^^^^^^^^^^^

There are two straightforward solutions to subobject clobbering, both
reliant on doing local analysis to recognize two accesses which
obviously alias (like the two references to ``point`` in the above
example). Once you've done this, you can either:

* Emit an error, outlawing such code. This is what Swift currently
  does (but only when the aliasing access must be implemented with
  full-value loads and stores).
* Use a single temporary, potentially changing the semantics.

It's impossible to guarantee the absence of subobject clobbering with
this analysis without extremely heavy-handed language changes.
Fortunately, subobject clobbering is "only" a semantics issue, not a
memory-safety issue, at least as long as it's implemented with
full-value accesses.

Reprojection
^^^^^^^^^^^^

A more general solution is to re-project the modified subobject from
scratch before writing it back. That is, you first acquire an initial
value like normal for the subobject you wish to modify, then modify
that temporary copy in-place. But instead of then recursively
consuming all the intermediate temporaries when writing them back, you
drop them all and recompute the current value, then write the modified
subobject to it, then write back all the way up. That is::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1 = point // reload point1
  point1.y = y
  point = point1
  point0 = point // reload point0
  point0.x = x
  point = point0

In this example, I've applied the solution consistently to all
accesses, which protects against unseen modifications (e.g. during the
call to ``swap``) at the cost of performing two extra full-value
loads.

You can heuristically lower this by combining it with a simple local
analysis and only re-projecting when writing back to other l-values
besides the last. In other words, generate code that will work as
long as the entity is not modified behind abstraction barriers or
through unexpected aliases::

  var point0 = point
  var x = point0.x
  var point1 = point
  var y = point1.y
  swap(&x, &y)
  point1.y = y // do not reload point1
  point = point1
  point0 = point // reload point0
  point0.x = x
  point = point0

Note that, in either solution, you've introduced extra full-value
loads. This may be quite expensive, and it's not guaranteed to be
semantically equivalent.

Performance
~~~~~~~~~~~

There are three major reasons why full-value accesses are inefficient.

Unnecessary subobject accesses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first is that they may load or store more than is necessary.

As an obvious example, imagine a variable of type ``(Int, Int)``; even
if my code only accesses the first element of the tuple, full-value
accesses force me to read or write the second element as well. That
means that, even if I'm purely overwriting the first element, I
actually have to perform a full-value load first so that I know what
value to use for the second element when performing the full-value
store.

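A small sketch of that expansion (the accessor functions here are
invented for the example)::

  var storedPair = (0, 0)
  func loadPair() -> (Int, Int) { return storedPair }
  func storePair(_ newValue: (Int, Int)) { storedPair = newValue }

  var pair: (Int, Int) {
    get { return loadPair() }
    set { storePair(newValue) }
  }

  // Overwriting only pair.0 still expands to a pair of full-value accesses:
  var temporary = loadPair()   // reads the second element even though
  temporary.0 = 42             // it is about to be written back unchanged
  storePair(temporary)
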
Additionally, while unnecessarily loading the second element of an
``(Int, Int)`` pair might seem trivial, consider that the tuple could
actually have twenty elements, or that the second element might be
non-trivial to copy (e.g. if it's a retainable pointer).

Abstraction barriers
^^^^^^^^^^^^^^^^^^^^

A full-value load or store which you can completely reason about is one
thing, but if it has to be performed as a call, it can be a major
performance drag.

For one, calls do carry a significant amount of low-level overhead.

For another, optimizers must be extremely conservative about what a
call might do. A retainable pointer might have to be retained and
later released purely to protect against the possibility that a getter
might, somehow, cause the pointer to otherwise be deallocated.

Furthermore, the conventions of the call might restrict performance.
One way or another, a getter for a retainable pointer generally
returns at +1, meaning that as part of the return, it is retained,
forcing the caller to later release. If the access were instead
direct to memory, this retain might be avoidable, depending on what
the caller does with the pointer.

Copy-on-write
^^^^^^^^^^^^^

These problems are compounded by copy-on-write (COW) types. In Swift,
a copy-on-write value embeds an object reference. Copying the value
has low immediate cost, because it simply retains the existing
reference. However, modifying a value requires the reference to be
made unique, generally by copying the data held by the value into a
fresh object. I'll call this operation a *structural copy* in an
effort to avoid the more treacherous term "deep copy".

COW types are problematic with full-value accesses for several reasons.

First, COW types are often used to implement aggregates and thus often
have several distinguishable subobjects which users are likely to
think of as independent. This heightens the dangers of subobject
clobbering.

Second, a full-value load of a COW type implies making the object
reference non-unique. Changing the value at this point will force a
structural copy. This means that modifying a temporary copy has
dramatically worse performance compared to modifying the original
entity in-place. For example::

  window.name += " (closing)"

If ``&window.name`` can be passed directly to the operator, and the
string buffer is uniquely referenced by that string, then this
operation may be as cheap as copying a few characters into the tail of
the buffer. But if this must be done with a write-back, then the
temporary will never have a unique reference, and there will always
be an unneeded structural copy.

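For reference, the usual copy-on-write pattern looks roughly like the
following sketch; ``isKnownUniquelyReferenced`` is the current
standard-library spelling of the uniqueness check, and the buffer type
is invented for the example::

  final class Buffer {
    var elements: [Int] = []
  }

  struct COWValue {
    private var buffer = Buffer()

    mutating func append(_ element: Int) {
      // If anyone else holds the buffer, a structural copy is required
      // before mutating in place.
      if !isKnownUniquelyReferenced(&buffer) {
        let fresh = Buffer()
        fresh.elements = buffer.elements
        buffer = fresh
      }
      buffer.elements.append(element)
    }
  }
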
Conservative access patterns
----------------------------

When you know how storage is implemented, it's straightforward to
generate an optimal access to it. There are several major reasons why
you might not know how a storage declaration is implemented, though:

* It might be an abstract declaration, not a concrete declaration.
  Currently this means a protocol member, but Swift may someday add
  abstract class members.

* It might be a non-final class member, where the implementation you
  can see is potentially overridable by a subclass.

* It might be a resilient declaration, where you know only that the
  entity exists and know nothing statically about its implementation.

In all of these cases, you must generate code that will handle the
worst possible case, which is that the entity is implemented with a
getter and a setter. Therefore, the conservative access pattern
includes opaque getter and setter functions.

However, for all the reasons discussed above, using unnecessary
full-value accesses can be terrible for performance. It's really bad
if a little conservatism --- e.g. because Swift failed to devirtualize
a property access --- causes asymptotic inefficiencies. Therefore,
Swift's native conservative access pattern also includes a third
accessor which permits direct access to storage when possible. This
accessor is called ``materializeForSet``.

``materializeForSet`` receives an extra argument, which is an
uninitialized buffer of the value type, and it returns a pointer and a
flag. When it can provide direct access to storage for the entity, it
constructs a pointer to the storage and returns false. When it can't,
it performs a full-value load into the buffer and returns true. The
caller performs the modification in-place on the returned pointer and
then, if the flag is true, passes the value to the setter.

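That contract can be modeled in ordinary Swift with a hypothetical
stand-in; this only illustrates the calling convention for a trivial
``Int`` property, not the real entry point, which is generated at the
SIL level::

  // Fill `scratch` if necessary; return the address to mutate and
  // whether the caller must write back through the setter.
  typealias MaterializeForSet =
    (_ scratch: UnsafeMutablePointer<Int>) -> (UnsafeMutablePointer<Int>, Bool)

  func mutate(with materialize: MaterializeForSet,
              setter: (Int) -> Void,
              mutation: (UnsafeMutablePointer<Int>) -> Void) {
    let scratch = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    defer { scratch.deallocate() }
    let (address, mustWriteBack) = materialize(scratch)
    mutation(address)              // modify in place
    if mustWriteBack {             // storage was opaque: call the setter
      setter(address.pointee)
    }
  }
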
The overall effect is to enable direct storage access as a dynamic
optimization when it's impossible as a static optimization.

For now, ``materializeForSet`` is always automatically generated based
on whether the entity is implemented with a computed setter. It is
possible to imagine data structures that would benefit from having
this lifted to a user-definable feature; for example, a data structure
which sometimes holds its elements in memory but sometimes does not.

``materializeForSet`` can provide direct access whenever an address
for the storage can be derived. This includes when the storage is
implemented with a ``mutableAddress`` accessor, as covered below.
Observing accessors currently prevent ``materializeForSet`` from
offering direct access; that's fixable for ``didSet`` using a slightly
different code pattern, but ``willSet`` is an inherent obstacle.

Independent of any of the other optimizations discussed in this
whitepaper, ``materializeForSet`` had the potential to immediately
optimize the extremely important case of mutations to COW values in
un-devirtualized class properties, with fairly minimal risk.
Therefore, ``materializeForSet`` was implemented first, and it shipped
in Xcode 6.1.

Direct access at computed addresses
-----------------------------------

What entities can be directly accessed in memory? Non-computed
variables make up an extremely important set of cases; Swift has
enough built-in knowledge to know that it can provide direct access to
them. But there are a number of other important cases where the
address of an entity is not built-in to the compiler, but where direct
access is nonetheless possible. For example, elements of a simple
array always have independent storage in memory. Most benchmarks on
arrays would profit from being able to modify array elements in-place.

There's a long chain of proposals in this area, many of which are
refinements of previous proposals. None of these proposals has yet
shipped in Xcode.

Addressors
~~~~~~~~~~

For something like a simple array (or any similar structure, like a
deque) which is always backed by a buffer, it makes sense for the
implementor to simply define accessors which return the address of
the element. Such accessors are called *addressors*, and there are
two: ``address`` and ``mutableAddress``.

The conservative access pattern can be generated very easily from
this: the getter calls ``address`` and loads from it, the setter calls
``mutableAddress`` and stores to it, and ``materializeForSet``
provides direct access to the address returned from
``mutableAddress``.

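As an illustration, an addressor-based subscript on a simple
buffer-backed type might look like this; the surface syntax shown here
is only a sketch of the proposal, not a finalized spelling::

  struct SimpleBuffer<Element> {
    var storage: UnsafeMutablePointer<Element>

    subscript(index: Int) -> Element {
      // The compiler derives the getter, setter, and materializeForSet
      // from these two accessors.
      address {
        return UnsafePointer(storage + index)
      }
      mutableAddress {
        return storage + index
      }
    }
  }
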
If the entity has type ``T``, then ``address`` returns an
``UnsafePointer<T>`` and ``mutableAddress`` returns an
``UnsafeMutablePointer<T>``. This means that the formal type of the
entity must exactly match the formal type of the storage. Thus, the
standard subscript on ``Dictionary<K,V>`` cannot be implemented using
addressors, because the formal type of the entity is ``V?``, but the
backing storage holds a ``V``. (And this is in keeping with user
expectations about the data structure: assigning ``nil`` at a key is
supposed to erase any existing entry there, not create a new entry to
hold ``nil``.)

This simple addressor proposal was the first prong of our efforts to
optimize array element access. Unfortunately, while it is useful for
several other types (such as ``ContiguousArray`` and
``UnsafeMutablePointer``), it is not flexible enough for the ``Array``
type.

Mixed addressors
~~~~~~~~~~~~~~~~

Swift's chief ``Array`` type is only a simple array when it is not
interacting with Objective-C. Type bridging requires ``Array`` to be
able to store an immutable ``NSArray`` instance, and the ``NSArray``
interface does not expose the details of how it stores elements. An
``NSArray`` is even permitted to dynamically generate its values in
its ``objectAtIndex:`` method. And it would be absurd for ``Array``
to perform a structural copy during a load just to make non-mutating
accesses more efficient! So the load access pattern for ``Array``'s
subscript declaration must use a getter.

Fortunately, this requirement does not preclude using an addressor for
mutating accesses. Mutations to ``Array`` always transition the array
to a unique contiguous buffer representation as their first step.
This means that the subscript operator can sensibly return an address
when it's used for the purposes of mutation: in other words, exactly
when ``mutableAddress`` would be invoked.

Therefore, the second prong of our efforts to optimize array element
access was to allow entities to be implemented with the combination of
a ``get`` accessor and a ``mutableAddress`` accessor. This is
straightforward in the user model, where it simply means lifting a
restriction. It's more complex behind the scenes because it broke
what was previously a clean conceptual division between "physical" and
"logical" l-values.

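Continuing the illustrative syntax from the earlier sketch, a mixed
implementation pairs an ordinary getter with a mutable addressor; the
helper methods used here are hypothetical::

  struct BridgedArray<Element> {
    var nativeStorage: UnsafeMutablePointer<Element>

    subscript(index: Int) -> Element {
      get {
        // Loads may have to go through a bridged NSArray.
        return loadElement(at: index)
      }
      mutableAddress {
        // Mutations first force a unique, contiguous native buffer,
        // so they can hand back a stable address.
        makeNativeBufferUnique()
        return nativeStorage + index
      }
    }
  }
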
Mixed addressors have now been adopted by ``Array`` to great success.
As expected, they substantially improved performance mutating COW
array elements. But they also fix an important instance of subobject
clobbering, because modifications to different subobjects (notably,
different elements of the same array) can occur simultaneously by
simply projecting out their addresses in the unique buffer. For
example, this means that it's possible to simply swap two elements
of an array directly::

  swap(&array[i], &array[j])

  // Expanded:
  array.transitionToUniquelyReferenced()
  let address_i = array.buffer.storage + i
  array.transitionToUniquelyReferenced()
  let address_j = array.buffer.storage + j
  swap(address_i, address_j)

Mixed addressors weren't completely implemented until very close to
the Xcode 6.1 deadline, and they changed code-generation patterns
enough to break a number of important array-specific optimizations.
Therefore, the team sensibly decided that they were too risky for that
release, and that there wasn't enough benefit from other applications
to justify including any of the addressor work.

In a way, that was a fortunate decision, because the naive version of
addressors implemented so far in Swift creates a safety hole which
would otherwise have been exposed to users.

Memory unsafety of addressors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The semantics and memory safety of operations on COW types rely on a
pair of simple rules:

* A non-mutating operation must own a reference to the buffer for
  the full course of the read.

* A mutating operation must own a unique reference to the buffer
  for the full course of the mutation.

Both rules tend to be naturally satisfied by the way that operations
are organized into methods. A value must own a reference to its
buffer at the moment that a method is invoked on it. A mutating
operation immediately transitions the buffer to a unique reference,
performing a structural copy if necessary. This reference will remain
valid for the rest of the method as long as the method is *atomic*: as
long as it does not synchronously invoke arbitrary user code.

(This is a single-threaded notion of atomicity. A second thread which
modifies the value simultaneously can clearly invalidate the
assumption. But that would necessarily be a data race, and the
language design team is willing to say that such races have fully
undefined behavior, and arbitrary consequences like memory corruption
and crashes are acceptable in their wake.)

However, addressors are not atomic in this way: they return an address
to the caller, which may then interleave arbitrary code before
completing the operation. This can present the opportunity for
corruption if the interleaved code modifies the original value.
Consider the following code::

  func operate(value: inout Int, count: Int) { ... }

  var array: [Int] = [1,2,3,4]
  operate(&array[0], { array = []; return 0 }())

The dynamic sequence of operations performed here will expand like so::

  var array: [Int] = [1,2,3,4]
  let address = array.subscript.mutableAddress(0)
  array = []
  operate(address, 0)

The assignment to ``array`` within the closure will release the buffer
containing ``address``, thus passing ``operate`` a dangling pointer.

Nor can this be fixed with a purely local analysis; consider::

  class C { var array: [Int] = [] }
  let global_C = C()

  func assign(value: inout Int) {
    global_C.array = []
    value = 0
  }

  assign(&global_C.array[0])

Fixing the memory safety hole
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Conceptually, the correct fix is to guarantee that the rules are
satisfied by ensuring that the buffer is retained for the duration of
the operation. Any interleaving modifications will then see a
non-uniquely-referenced buffer and perform a structural copy::

  // Project the array element.
  let address = array.subscript.mutableAddress(0)

  // Remember the new buffer value and keep it retained.
  let newArrayBuffer = array.buffer
  retain(newArrayBuffer)

  // Reassign the variable.
  release(array.buffer)
  array.buffer = ...

  // Perform the mutation. These changes will be silently lost, but
  // they at least won't be using deallocated memory.
  operate(address, 0)

  // Release the "new" buffer.
  release(newArrayBuffer)

Note that this still leaves a semantic hole if the original value is
copied in interleaving code before the modification, because the
subsequent modification will be reflected in the copy::

  // Project the array element.
  let address = array.subscript.mutableAddress(0)

  // Remember the new buffer value and keep it retained.
  let newArrayBuffer = array.buffer
  retain(newArrayBuffer)

  // Copy the value. Note that arrayCopy uses the same buffer that
  // 'address' points into.
  let arrayCopy = array
  retain(arrayCopy.buffer)

  // Perform the mutation.
  operate(address, 0)

  // Release the "new" buffer.
  release(newArrayBuffer)

This might be unexpected behavior, but the language team is willing to
accept unexpected behavior for this code. What's non-negotiable is
breaking memory safety.

Unfortunately, applying this fix naively reintroduces the problem of
subobject clobbering: since a modification of one subobject
immediately retains a buffer that's global to the entire value, an
interleaved modification of a different subobject will see a
non-unique buffer reference and therefore perform a structural copy.
The modifications to the first subobject will therefore be silently
lost.

Unlike the interleaving copy case, this is seen as unacceptable.
Notably, it breaks swapping two array elements::

  // Original:
  swap(&array[i], &array[j])

  // Expanded:

  // Project array[i].
  array.transitionToUniquelyReferenced()
  let address_i = array.buffer.storage + i
  let newArrayBuffer_i = array.buffer
  retain(newArrayBuffer_i)

  // Project array[j]. Note that this transition is guaranteed
  // to have to do a structural copy.
  array.transitionToUniquelyReferenced()
  let address_j = array.buffer.storage + j
  let newArrayBuffer_j = array.buffer
  retain(newArrayBuffer_j)

  // Perform the mutations.
  swap(address_i, address_j)

  // Balance out the retains.
  release(newArrayBuffer_j)
  release(newArrayBuffer_i)

get- and setForMutation
~~~~~~~~~~~~~~~~~~~~~~~

Some collections need finer-grained control over the entire mutation
process. For instance, to support divide-and-conquer algorithms using
slices, sliceable collections must "pin" and "unpin" their buffers
while a slice is being mutated to grant permission for the slice
to mutate the collection in-place while sharing ownership. This
flexibility can be exposed by a pair of accessors that are called
before and after a mutation. The "get" stage produces both the
value to mutate and a state value (whose type must be declared) to
forward to the "set" stage. A pinning accessor can then look something
like this::

  extension Array {
    subscript(range: Range<Int>) -> Slice<Element> {
      // `getForMutation` must declare its return value, a pair of both
      // the value to mutate and a state value that is passed to
      // `setForMutation`.
      getForMutation() -> (Slice<Element>, PinToken) {
        let slice = _makeSlice(range)
        let pinToken = _pin()
        return (slice, pinToken)
      }

      // `setForMutation` receives two arguments--the result of the
      // mutation to write back, and the state value returned by
      // `getForMutation`.
      setForMutation(slice, pinToken) {
        _unpin(pinToken)
        _writeSlice(slice, backToRange: range)
      }
    }
  }

``getForMutation`` and ``setForMutation`` must appear as a pair;
neither one is valid on its own. A ``get`` and ``set`` accessor
should also still be provided for simple read and write operations.
When the compiler can see that storage is implemented in
terms of ``getForMutation`` and ``setForMutation``, it lowers a mutable
projection using those accessors as follows::

  // A mutation like this (assume `reverse` is a mutating method):
  array[0...99].reverse()
  // Decomposes to:
  let index = 0...99
  (var slice, let state) = array.`subscript.getForMutation`(index)
  slice.reverse()
  array.`subscript.setForMutation`(index, slice, state)

To support the conservative access pattern,
a ``materializeForSet`` accessor can be generated from ``getForMutation``
and ``setForMutation`` in an obvious fashion: perform ``getForMutation``
and store the state result in its scratch space, and return a
callback that loads the state and hands it off to ``setForMutation``.

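A sketch of that derivation for the pinning subscript above, in the
same illustrative pseudocode (the generated accessor and the shape of
its scratch buffer are simplified here)::

  // Generated from getForMutation/setForMutation; not user-written.
  materializeForSet(index, scratchBuffer) {
    let (slice, pinToken) = `subscript.getForMutation`(index)
    scratchBuffer.storeValue(slice)       // the value the caller mutates
    scratchBuffer.storeState(pinToken)    // forwarded to the callback
    return (scratchBuffer.valueAddress, callback: { buffer in
      `subscript.setForMutation`(index, buffer.loadValue(), buffer.loadState())
    })
  }
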
The beacon of hope for a user-friendly future: Inversion of control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Addressors and ``{get,set}ForMutation`` expose important optimizations
to the standard library, but are undeniably fiddly and unsafe constructs
to expose to users. A more natural model would be to
recognize that a compound mutation is a composition of nested scopes, and
express it in the language that way. A strawman model might look something
like this::

  var foo: T {
    get { return getValue() }
    set { setValue(newValue) }

    // Perform a full in-out mutation. The `next` continuation is of
    // type `(inout T) -> ()` and must be called exactly once
    // with the value to hand off to the nested mutation operation.
    mutate(next) {
      var value = getValue()
      next(&value)
      setValue(value)
    }
  }

This presents a natural model for expressing the lifetime extension concerns
of addressors, and the state maintenance necessary for pinning ``getForMutation``
accessors::

  // An addressing mutator
  mutate(next) {
    withUnsafePointer(&resource) {
      next(&$0.memory)
    }
  }

  // A pinning mutator
  mutate(next) {
    var slice = makeSlice()
    let token = pin()
    next(&slice)
    unpin(token)
    writeBackSlice(slice)
  }

For various semantic and implementation efficiency reasons, we don't want to
literally implement every access as a nesting of closures like this. Doing so
would allow for semantic surprises (a mutate() operation never invoking its
continuation, or doing so multiple times, would be disastrous), and would
interfere with the ability for ``inout`` and ``mutating`` functions to throw or
otherwise nonlocally exit. However, we could present this model using
*inversion of control*, similar to Python generators or async-await.
A ``mutate`` operation could ``yield`` the ``inout`` reference to its inner value,
and the compiler could enforce that a ``yield`` occurs exactly once on every
control flow path::

  // An addressing, yielding mutator
  mutate {
    withUnsafePointer(&resource) {
      yield &$0.memory
    }
  }

  // A pinning mutator
  mutate {
    var slice = makeSlice()
    let token = pin()
    yield &slice
    unpin(token)
    writeBackSlice(slice)
  }

This obviously requires more implementation infrastructure than we currently
have, and raises language and library design issues (in particular,
lifetime-extending combinators like ``withUnsafePointer`` would need either
a ``reyields`` kind of decoration, or to become macros), but represents a
promising path toward exposing the full power of the accessor model to
users in an elegant way.

Acceptability
-------------

This whitepaper has mentioned several times that the language team is
prepared to accept such-and-such behavior but not prepared to accept
some other kind of behavior. Clearly, there is a policy at work.
What is it?

General philosophy
~~~~~~~~~~~~~~~~~~

For any given language problem, a perfect solution would be one which:

* guarantees that all operations complete without crashing or
  corrupting the program state,

* guarantees that all operations produce results according to
  consistent, reliable, and intuitive rules,

* does not limit or require complex interactions with the remainder
  of the language, and

* imposes no performance cost.

These goals are, however, not all simultaneously achievable, and
different languages reach different balances. Swift's particular
philosophy is as follows:

* The language should be as dynamically safe as possible.
  Straightforward uses of ordinary language features may cause
  dynamic failure, but they should never corrupt the program state.
  Any unsafe language or library features (other than simply calling
  into C code) should be explicitly labeled as unsafe.

  A dynamic failure should mean that the program reliably halts,
  ideally with a message clearly describing the source of the
  failure. In the future, the language may allow for emergency
  recovery from such failures.

* The language should sit on top of C, relying only on a relatively
  unobtrusive runtime. Accordingly, the language's interactions
  with C-based technologies should be efficient and obvious.

* The language should allow a static compiler to produce efficient
  code without dynamic instrumentation. Accordingly, static
  analysis should only be blocked by incomplete information when
  the code uses an obviously abstract language feature (such as
  calling a class method or an unknown function), and the language
  should provide tools to allow programmers to limit such cases.

  (Dynamic instrumentation can, of course, still help, but it
  shouldn't be required for excellent performance.)

General solutions
~~~~~~~~~~~~~~~~~

A language generally has six tools for dealing with code it considers
undesirable. Some of this terminology is taken from existing
standards, others not.

* The language may nonetheless take steps to ensure that the code
  executes with a reliable result. Such code is said to have
  *guaranteed behavior*.

* The language may report the code as erroneous before it executes.
  Such code is said to be *ill formed*.

* The language may reliably report the code as having performed an
  illegal operation when it executes. Such code is said to be
  *asserting* or *aborting*.

* The language may allow the code to produce an arbitrary-but-sound
  result. Such code is said to have *unspecified behavior* or to
  have produced an *unspecified value*.

* The language may allow the code to produce an unsound result which
  will result in another of these behaviors, but only if used.
  Such code is said to have produced a *trap value*.

* The language may declare the code to be completely outside of the
  guarantees of the language. Such code is said to have
  *undefined behavior*.

In keeping with its design philosophy, Swift has generally limited
itself to the first four solutions, with two significant exceptions.

The first exception is that Swift provides several explicitly unsafe
language and library features, such as ``UnsafePointer<T>`` and
``unowned(unsafe)``. The use of these features is generally subject
to undefined behavior rules.

The second exception is that Swift does not make any guarantees about
programs in the presence of race conditions. It is extremely
difficult to make even weak statements about the behavior of a program
with a race condition without either:

* heavily restricting shared mutable state on a language level, which
  would require invasive changes to how the language interacts with C;

* forcing implicit synchronization when making any change to
  potentially shared memory, which would cripple performance and
  greatly complicate library implementation; or

* using a garbage collector to manage all accessible memory, which
  would impose a very large burden on almost all of Swift's language
  goals.

Therefore, Swift does surrender safety in the presence of races.

Acceptability conditions for storage accesses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Storage access involves a tension between four goals:

* Preserving all changes when making simultaneous modifications to
  distinct subobjects; in other words, avoiding subobject clobbering

* Performing a predictable and intuitive sequence of operations when
  modifying storage that's implemented with a getter and setter

* Avoiding unnecessary copies of a value during a modification,
  especially when this forces a structural copy of a COW value

* Avoiding memory safety holes when accessing storage that's been
  implemented with memory.

Reprojection_ is good at preserving changes, but it introduces extra
copies, and it's less intuitive about how many times getters and
setters will be called. There may be a place for it anyway, if we're
willing to accept the extra conceptual complexity for computed
storage, but it's not a reasonable primary basis for optimizing the
performance of storage backed by memory.

Solutions permitting in-place modification are more efficient, but
they do have the inherent disadvantage of having to vend the address
of a value before arbitrary interleaving code. Even if the address
remains valid, and the solution to that avoids subobject clobbering,
there's an unavoidable issue that the write can be lost because the
address became dissociated from the storage. For example, if your
code passes ``&array[i]`` to a function, you might plausibly argue
that changes to that argument should show up in the ``i``\ th element of
``array`` even if you completely reassign ``array``. Reprojection_
could make this work, but in-place solutions cannot efficiently do so.
So, for any in-place solution to be acceptable, there does need to be
some rule specifying when it's okay to "lose track" of a change.

Furthermore, the basic behavior of COW means that it's possible to
copy an array with an element under modification and end up sharing
the same buffer, so that the modification will be reflected in a value
that was technically copied beforehand. For example::

  var array = [1,2,3]
  var oldArray : [Int] = []

  // This function copies array before modifying it, but because that
  // copy is of a value undergoing modification, the copy will use
  // the same buffer and therefore observe updates to the element.
  func foo(element: inout Int) {
    oldArray = array
    element = 4
  }

  // Therefore, oldArray[2] will be 4 after this call.
  foo(&array[2])

Nor can this be fixed by temporarily moving the modified array aside,
because that would prevent simultaneous modifications to different
elements (and, in fact, likely cause them to assert). So the rule
will also have to allow this.

However, both of these possibilities already come up in the design of
both the library and the optimizer. The optimizer makes a number of
assumptions about aliasing; for example, the general rule is that
storage bound to an ``inout`` parameter cannot be accessed through
other paths, and while the optimizer is not permitted to compromise
memory safety, it is permitted to introduce exactly this kind of
unexpected behavior, where aliasing accesses may or may not see the
storage as a consistent entity.

Formal accesses
^^^^^^^^^^^^^^^

That rule leads to an interesting generalization. Every modification
of storage occurs during a *formal access* (FA) to that storage. An
FA is also associated with zero or more *designated storage names*
(DSNs), which are ``inout`` arguments in particular execution records.
An FA arises from an l-value expression, and its duration and DSN set
depend on how the l-value is used:

* An l-value which is simply loaded from creates an instantaneous FA
  at the time of the load. The DSN set is empty.

  Example::

    foo(array)
    // instantaneous FA reading array

* An l-value which is assigned to with ``=`` creates an
  instantaneous FA at the time of the primitive assignment. The DSN
  set is empty.

  Example::

    array = []
    // instantaneous FA assigning array

  Note that the primitive assignment strictly follows the evaluation
  of both the l-value and r-value expressions of the assignment.
  For example, the following code::

    // object is a variable of class type
    object.array = object.array + [1,2,3]

  produces this sequence of formal accesses::

    // instantaneous FA reading object (in the left-hand side)
    // instantaneous FA reading object (in the right-hand side)
    // instantaneous FA reading object.array (in the right-hand side)
    // evaluation of [1,2,3]
    // evaluation of +
    // instantaneous FA assigning object.array

* An l-value which is passed as an ``inout`` argument to a call
  creates an FA beginning immediately before the call and ending
  immediately after the call. (This includes calls where an
  argument is implicitly passed ``inout``, such as to mutating
  methods or user-defined assignment operators such as ``+=`` or
  ``++``.) The DSN set contains the ``inout`` argument within the
  call.

  Example::

    func swap<T>(lhs: inout T, rhs: inout T) {}

    // leftObject and rightObject are variables of class type
    swap(&leftObject.array, &rightObject.array)

  This results in the following sequence of formal accesses::

    // instantaneous FA reading leftObject
    // instantaneous FA reading rightObject
    // begin FA for inout argument leftObject.array (DSN={lhs})
    // begin FA for inout argument rightObject.array (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array
    // end FA for inout argument leftObject.array

* An l-value which is used as the base of a member storage access
  begins an FA whose duration is the same as the duration of the FA
  for the subobject l-value. The DSN set is empty.

  Example::

    swap(&leftObject.array[i], &rightObject.array[j])

  This results in the following sequence of formal accesses::

    // instantaneous FA reading leftObject
    // instantaneous FA reading i
    // instantaneous FA reading rightObject
    // instantaneous FA reading j
    // begin FA for subobject base leftObject.array (DSN={})
    // begin FA for inout argument leftObject.array[i] (DSN={lhs})
    // begin FA for subobject base rightObject.array (DSN={})
    // begin FA for inout argument rightObject.array[j] (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument rightObject.array[j]
    // end FA for subobject base rightObject.array
    // end FA for inout argument leftObject.array[i]
    // end FA for subobject base leftObject.array

* An l-value which is the base of an ! operator begins an FA whose
  duration is the same as the duration of the FA for the resulting
  l-value. The DSN set is empty.

  Example::

    // left is a variable of type T
    // right is a variable of type T?
    swap(&left, &right!)

  This results in the following sequence of formal accesses::

    // begin FA for inout argument left (DSN={lhs})
    // begin FA for ! operand right (DSN={})
    // begin FA for inout argument right! (DSN={rhs})
    // evaluation of swap
    // end FA for inout argument right!
    // end FA for ! operand right
    // end FA for inout argument left

* An l-value which is the base of a ? operator begins an FA whose
  duration begins during the formal evaluation of the l-value
  and ends either immediately (if the operand was nil) or at the
  end of the duration of the FA for the resulting l-value.
  In either case, the DSN set is empty.

  Example::

    // left is a variable of optional struct type
    // right is a variable of type Int
    left?.member += right

  This results in the following sequence of formal accesses, assuming
  that ``left`` contains a value::

    // begin FA for ? operand left (DSN={})
    // instantaneous FA reading right (DSN={})
    // begin FA for inout argument left?.member (DSN={lhs})
    // evaluation of +=
    // end FA for inout argument left?.member
    // end FA for ? operand left

The FAs for all ``inout`` arguments to a call begin simultaneously at
a point strictly following the evaluation of all the argument
expressions. For example, in the call ``foo(&array, array)``, the
evaluation of the second argument produces a defined value, because
the FA for the first argument does not begin until after all the
arguments are formally evaluated. No code should actually be emitted
during the formal evaluation of ``&array``, but for an expression like
``someClassReference.someArray[i]``, the class r-value and index
expressions would be fully evaluated at that time, and then the
l-value would be kept abstract until the FA begins. Note that this
requires changes in SILGen's current code generation patterns.

The FA rule for the chaining operator ``?`` is an exception to the
otherwise-simple intuition that formal accesses begin immediately
before the modification begins. This is necessary because the
evaluation rules for ``?`` may cause arbitrary computation to be
short-circuited, and therefore the operand must be accessed during the
formal evaluation of the l-value. There were three options here:

* Abandon short-circuiting for assignments to optional l-values. This
  is a very high price; short-circuiting fits into user intuitions
  about the behavior of the chaining operator, and it can actually be
  quite awkward to replicate with explicit accesses.

* Short-circuit using an instantaneous formal access, then start a
  separate formal access before the actual modification. In other
  words, evaluation of ``X += Y`` would proceed by first determining
  whether ``X`` exists (capturing the results of any r-value
  components), discarding any projection information derived from
  that, evaluating ``Y``, reprojecting ``X`` again (using the saved
  r-value components and checking again for whether the l-value
  exists), and finally calling the ``+=`` operator.

  If ``X`` involves any sort of computed storage, the steps required
  to evaluate this might be... counter-intuitive.

* Allow the formal access to begin during the formal evaluation of the
  l-value. This means that code like the following will have
  unspecified behavior::

    array[i]?.member = deriveNewValueFrom(array)

In the end, I've gone with the third option. The intuitive
explanation is that ``array`` has to be accessed early in order to
continue the evaluation of the l-value. I think that's
comprehensible to users, even if it's not immediately obvious.

Disjoint and non-disjoint formal accesses
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm almost ready to state the core rule about formal accesses, but
first I need to build up a few more definitions.

An *abstract storage location* (ASL) is:

* a global variable declaration;

* an ``inout`` parameter declaration, along with a reference
  to a specific execution record for that function;

* a local variable declaration, along with a reference to a
  specific execution record for that declaration statement;

* a static/class property declaration, along with a type having
  that property;

* a struct/enum instance property declaration, along with an
  ASL for the base;

* a struct/enum subscript declaration, along with a concrete index
  value and an ASL for the base;

* a class instance property declaration, along with an instance of
  that class; or

* a class instance subscript declaration, along with a concrete
  index value and an instance of that class.

Two abstract storage locations may be said to *overlap*. Overlap
corresponds to the imprecise intuition that a modification of one
location directly alters the value of another location. Overlap
is an "open" property of the language: every new declaration may
introduce its own overlap behavior. However, the language and
library make certain assertions about the overlap of some locations:

* An ``inout`` parameter declaration overlaps exactly the set
  of ASLs overlapped by the ASL which was passed as an argument.

* If two ASLs are both implemented with memory, then they overlap
  only if they have the same kind in the above list and the
  corresponding data match:

  * execution records must represent the same execution
  * types must be the same
  * class instances must be the same
  * ASLs must overlap

* For the purposes of the above rule, the subscript of a standard
  library array type is implemented with memory, and the two
  indexes match if they have the same integer value.

* For the purposes of the above rule, the subscript of a standard
  library dictionary type is implemented with memory, and the two
  indexes match if they compare equal with ``==``.

Because this definition is open, it is impossible to completely
statically or dynamically decide it. However, it would still be
possible to write a dynamic analysis which decided it for common
location kinds. Such a tool would be useful as part of, say, an
ASan-like dynamic tool to diagnose violations of the
unspecified-behavior rule below.

The overlap rule is vague about computed storage partly because
computed storage can have non-obvious aliasing behavior and partly
because the subobject clobbering caused by the full-value accesses
required by computed storage can introduce unexpected results that can
be reasonably glossed as unspecified values.

This notion of abstract storage location overlap can be applied to
formal accesses as well. Two FAs ``x`` and ``y`` are said to be
*disjoint* if:

* they refer to non-overlapping abstract storage locations or

* they are the base FAs of two disjoint member storage accesses
  ``x.a`` and ``y.b``.

Given these definitions, the core unspecified-behavior rule is:

  If two non-disjoint FAs have intersecting durations, and neither FA
  is derived from a DSN for the other, then the program has
  unspecified behavior in the following way: if the second FA is a
  load, it yields an unspecified value; otherwise, both FAs store an
  unspecified value in the storage.

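As a concrete illustration (the names here are invented), the read of
``numbers`` inside the function below is a second, non-disjoint FA
whose duration intersects the FA for the ``inout`` argument, and it is
not derived from that FA's DSN, so it yields an unspecified value::

  var numbers = [1, 2, 3]

  func accumulate(into element: inout Int) {
    // This load overlaps the in-progress FA for numbers[0].
    element += numbers.reduce(0, +)
  }

  accumulate(into: &numbers[0])
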
Note that you cannot have two loads with intersecting durations,
because the FAs for loads are instantaneous.

Non-overlapping subobject accesses make the base accesses disjoint
because otherwise code like ``swap(&a[0], &a[1])`` would have
unspecified behavior, because the two base FAs are to clearly
overlapping locations and have intersecting durations.

Note that the optimizer's aliasing rule falls out from this rule. If
storage has been bound as an ``inout`` argument, accesses to it
through any path not derived from the ``inout`` argument will start a
new FA for overlapping storage, the duration of which will necessarily
intersect the duration of the FA through which the ``inout``
argument was bound, causing unspecified behavior. If the ``inout``
argument is forwarded to another call, that will start a new FA which
is validly based on a DSN of the first; but an attempt to modify the
storage through the first ``inout`` argument while the second call is
active will create a third FA not based on the DSN from the second
``inout`` call, causing a conflict there. Therefore a function may
assume that it can see all accesses to the storage bound to an
``inout`` argument.

If you didn't catch all that...
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
That may have been a somewhat intense description, so here's a simple
|
|
summary of the rule being proposed.
|
|
|
|
If storage is passed to an ``inout`` argument, then any other
|
|
simultaneous attempt to read or write to that storage, including to
|
|
the storage containing it, will have unspecified behavior. Reads
|
|
from it may see partially-updated values, or even values which will
|
|
change as modifications are made to the original storage; and writes
|
|
may be clobbered or simply disappear.
|
|
|
|
But this only applies during the call with the ``inout`` argument: the
|
|
evaluation of other arguments to the call will not be interfered with,
|
|
and as soon as the call ends, all these modifications will resolve
|
|
back to a quiescent state.
|
|
|
|
And this unspecified behavior has limits. The storage may end up with
|
|
an unexpected value, with only a subset of the writes made to it, and
|
|
copies from it may unexpectedly reflect modifications made after they
|
|
were copied. However, the program will otherwise remain in a
|
|
consistent and uncorrupted state. This means that execution will be
|
|
able to continue apace as long as these unexpected values don't trip
|
|
up some higher-level invariant.
|
|
|
|
Tracking formal accesses
|
|
------------------------
|
|
|
|
Okay, now that I've analyzed this to death, it's time to make a
|
|
concrete proposal about the implementation.
|
|
|
|
As discussed above, the safety hole with addressors can be fixed by
|
|
always retaining the buffer which keeps the address valid. Assuming
|
|
that other uses of the buffer follow the general copy-on-write
|
|
pattern, this retain will prevent structural changes to the buffer
|
|
while the address is in use.
|
|
|
|
But, as I also discussed above, this introduces two problems:
|
|
|
|
Copies during modification
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Copying a COW aggregate value always shares the same buffer that was
|
|
stored there at the time of the copy; there is no uniqueness check
|
|
done as part of the copy. Changes to subobjects will then be
|
|
instantly reflected in the "copy" as they are made to the original.
|
|
The structure of the copy will stay the same, but the values of
|
|
its subobjects will appear to spontaneously change.
|
|
|
|
I want to say that this behavior is acceptable according to the
|
|
formal-access rule I laid out above. How does that reasoning work?
|
|
|
|
First, I need to establish what kind of behavior is at work here. It
|
|
clearly isn't guaranteed behavior: copies of COW values are normally
|
|
expected to be independent. The code wasn't rejected by the compiler,
|
|
nor did it dynamically assert; it simply seems to misbehave. But
|
|
there are limits to the misbehavior:
|
|
|
|
* By general COW rules, there's no way to change the structure of an
|
|
existing buffer unless the retain count is 1. For the purposes of
|
|
this analysis, that means that, as long as the retain count is
|
|
above 1, there's no way to invalidate the address returned by the
|
|
addressor.
|
|
|
|
* The buffer will be retained for as long as the returned address
|
|
is being modified. This retain is independent of any storage
|
|
which might hold the aggregate value (and thus also retain the buffer).
|
|
|
|
* Because of this retain, the only way for the retain count to drop
|
|
to 1 is for no storage to continue to refer to the buffer.
|
|
|
|
* But if no storage refers to the buffer, there is no way to
|
|
initiate an operation which would change the buffer structure.
|
|
|
|
Thus the address will remain valid, and there's no danger of memory
|
|
corruption. The only thing is that the program no longer makes useful
|
|
guarantees about the value of the copied aggregate. In other words,
|
|
the copy yielded an unspecified value.
|
|
|
|
The formal-access rule allows loads from storage to yield an
|
|
unspecified value if there's another formal access to that storage in
|
|
play and the load is (1) not from an l-value derived from a name in
|
|
the other FA's DSN set and (2) not from a non-overlapping subobject.
|
|
Are these conditions true?
|
|
|
|
Recall that an addressor is invoked for an l-value of the form::
|
|
|
|
base.pointee
|
|
|
|
or::
|
|
|
|
base[i]
|
|
|
|
Both cases involve a formal access to the storage ``base`` as the base
|
|
of a subobject formal access. This kind of formal access always has
|
|
an empty DSN set, regardless of how the subobject is used. A COW
|
|
mutable addressor will always ensure that the buffer is uniquely
|
|
referenced before returning, so the only way that a value containing
|
|
that buffer can be copied is if the load is a non-subobject access to
|
|
``base``. Therefore, there are two simultaneous formal accesses to
|
|
the same storage, and the load is not from an l-value derived from the
|
|
modification's DSN set (which is empty), nor is it for a
|
|
non-overlapping subobject. So the formal-access rule applies, and
|
|
an unspecified value is an acceptable result.
|
|
|
|
The implementation requirement here, then, is simply that the
|
|
addressor must be called, and the buffer retained, within the duration
|
|
of the formal access. In other words, the addressor must only
|
|
be called immediately prior to the call, rather than at the time
|
|
of the formal evaluation of the l-value expression.
|
|
|
|
What would happen if there *were* a simultaneous load from a
|
|
non-overlapping subobject? Accessing the subobject might cause a
|
|
brief copy of ``base``, but only for the duration of copying the
|
|
subobject. If the subobject does not overlap the subobject which was
|
|
projected out for the addressor, then this is harmless, because the
|
|
addressor will not allow modifications to those subobjects; there
|
|
might be other simultaneous formal accesses which do conflict, but
|
|
these two do not. If the subobject does overlap, then a recursive
|
|
analysis must be applied; but note that the exception to the
|
|
formal-access rule will only apply if non-overlapping subobjects were
|
|
projected out from *both* formal accesses. Otherwise, it will be
|
|
acceptable for the access to the overlapping subobject to yield an
|
|
unspecified value.
|
|
|
|
Avoiding subobject clobbering during parallel modification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The other problem is that the retain will prevent simultaneous
modifications from applying to the same buffer: because the retain
leaves the buffer non-uniquely referenced, the second change will
cause a structural copy, and the first address will end up modifying a
buffer which is no longer referenced. In other words, the program
will observe subobject clobbering. A similar analysis to the one from
the last section suggests that this can be described as unspecified
behavior.

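The sort of code in question is a parallel modification of two
distinct elements through their addressors, for example (a sketch of
the hazard, not of what the eventual design permits)::

  var values = [10, 20, 30]

  func addInto(_ a: inout Int, _ b: inout Int) {
    a += b   // `a` was projected out by the first addressor call
  }

  // Under a retain-only scheme, the second addressor call would see a
  // retain count of 2, perform a structural copy, and hand out an
  // address into the copy; the write through `a` would then land in a
  // buffer that `values` no longer references.
  addInto(&values[0], &values[2])
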
Unfortunately, this unspecified behavior is unwanted: it violates the
guarantees of the formal-access rule as I laid it out above, because
it occurs even if you have formal accesses to two non-overlapping
subobjects. So something does need to be done here.

One simple answer is to dynamically track whether a COW buffer is
currently undergoing a non-structural mutation. I'll call this *NSM
tracking*, and I'll call buffers which are undergoing non-structural
mutations *NSM-active*.

The general rules of COW say that mutating operations must ensure that
their buffer is uniquely referenced before performing the
modification. NSM tracking works by having non-structural mutations
perform a weaker check: the buffer must be either uniquely referenced
or be NSM-active. If the non-structural mutation allows arbitrary
code to run between the start of the mutation and the end --- as an
addressor does --- it must both retain the buffer and flag it as
NSM-active for the entire duration.

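As a rough illustration of the two checks, here is a minimal COW
aggregate written in ordinary Swift. The ``isNSMActive`` property
stands in for the proposed NSM bit, which would really be packed into
the buffer's reference count, and all of the names are invented for
the sketch::

  final class ElementBuffer {
    var elements: [Int]
    var isNSMActive = false   // stand-in for the proposed NSM bit
    init(_ elements: [Int]) { self.elements = elements }
  }

  struct COWAggregate {
    var buffer = ElementBuffer([])

    // Structural mutations require a uniquely-referenced buffer.
    mutating func makeUniqueForStructuralMutation() {
      if !isKnownUniquelyReferenced(&buffer) {
        buffer = ElementBuffer(buffer.elements)
      }
    }

    // Non-structural mutations accept "uniquely referenced OR NSM-active".
    mutating func makeUniqueForNonStructuralMutation() {
      if !isKnownUniquelyReferenced(&buffer) && !buffer.isNSMActive {
        buffer = ElementBuffer(buffer.elements)
      }
    }
  }
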
Because the retain still occurs, and because any *structural* changes
to the buffer that might invalidate the addresses of subobjects are
still blocked by that retain, all of the earlier analysis about the
memory safety of simultaneous accesses still applies. The only change
is that simultaneous non-structural modifications, as would be created
by simultaneous formal accesses to subobjects, will now be able to
occur on a single buffer.

A set of simultaneous formal accesses on a single thread follows a
natural stack protocol, or can be made to do so with straightforward
SILGen and SIL optimizer consideration. Therefore, the runtime can
track whether a buffer is NSM-active on a thread using a single bit,
which nested modifications can be told not to clear. Call this the
*NSM bit*. Ignoring multithreading considerations for a moment, since
the NSM bit is only ever set at the same time as a retain and only
ever cleared at the same time as a release, it makes sense to pack
this into the strong reference count. There is no need to support
this operation on non-Swift objects. The runtime should provide three
new functions:

* A function to test whether an object is either uniquely referenced
  or NSM-active. Call this ``swift_isUniquelyReferencedForNSM``.

* A function to perform the above test and, if the test passes and
  the NSM bit is not set, atomically retain the object and set
  the NSM bit. It should return both the result of the test and an
  object to later set as NSM-inactive. That object will be nil if
  the test failed or the NSM bit was already set. Call this
  ``swift_tryRetainForNSM``.

* A function to atomically clear the NSM bit and release the object.
  Call this ``swift_releaseForNSM``.

These operations should also be reflected in SIL.

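To make the intended semantics concrete, here is a rough behavioral
model of the three entry points in plain Swift. It is only a sketch:
the real functions would atomically test and update state packed into
the strong reference count, and ``ModelBuffer`` and its
``nsmBitIsSet`` property are invented for the example::

  final class ModelBuffer {
    var nsmBitIsSet = false
  }

  // Models swift_isUniquelyReferencedForNSM: unique OR NSM-active.
  func isUniquelyReferencedForNSM(_ buffer: inout ModelBuffer) -> Bool {
    return isKnownUniquelyReferenced(&buffer) || buffer.nsmBitIsSet
  }

  // Models swift_tryRetainForNSM: reports whether the test passed and
  // returns the owner to hand to releaseForNSM later; the owner is nil
  // if the test failed or the bit was already set.
  func tryRetainForNSM(_ buffer: inout ModelBuffer)
      -> (passed: Bool, owner: ModelBuffer?) {
    guard isUniquelyReferencedForNSM(&buffer) else { return (false, nil) }
    guard !buffer.nsmBitIsSet else { return (true, nil) }
    buffer.nsmBitIsSet = true
    return (true, buffer)   // holding the returned reference models the retain
  }

  // Models swift_releaseForNSM: clear the bit; letting go of the owner
  // reference models the release.
  func releaseForNSM(_ owner: ModelBuffer?) {
    owner?.nsmBitIsSet = false
  }
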
Concurrent modifications and the non-structural modification bit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What about concurrency? Two concurrent non-structural modifications
could race to set the NSM bit, and then the winning thread could clear
it before the other thread's modification is complete. This could
cause memory-unsafe behavior, since the losing thread would be
modifying the object through an address while not retaining the value.

The major question here is whether this is a significant objection.
It's accepted that race conditions have undefined behavior. Is such
code inherently racy?

The answer appears to be "no": it is possible to write code which
concurrently writes to existing non-overlapping elements of a COW
aggregate without causing races. But such code is extremely fraught,
and it is equally fraught regardless of whether NSM-activeness is
tracked with a single bit or a wider count. Consider:

* If the shared aggregate value is ever non-uniquely referenced, two
  threads concurrently modifying it will race to unique the aggregate.
  This unavoidably has undefined behavior, because uniquing requires
  the previous buffer to eventually be released, and a race may cause
  an over-release.

* Assume that it's possible to guarantee that the aggregate value's
  buffer is uniquely referenced before any threads concurrently
  access it. Now, all of the threads are performing different
  concurrent accesses.

  * If any of the accesses is a structural modification, there will
    be a race to re-unique the buffer.

  * If all of the accesses are non-structural modifications, then
    there will be no races as long as the retain-and-set and
    release-and-clear operations are atomic: when starting any
    particular operation, the buffer will always either be uniquely
    referenced or have the bit set.

  * If any of the accesses is a read, and that read does not occur
    during a non-structural modification, then the buffer may
    briefly become non-uniquely referenced and there will be a
    race from concurrent modifications to re-unique it.

  * If any of the accesses is a read, and that read occurs during a
    non-structural modification, and the optimizer does not re-order
    the read's retain/release around the retainForNSM/releaseForNSM
    operations, then it matters how NSM-activeness is tracked.

    If there is complete tracking (i.e. a count, not just a single
    bit), the retain for the read will only occur while the buffer
    is flagged as NSM-active, and so it will have no effect.

    If there is incomplete tracking (i.e. just a single NSM bit),
    then there is a potential for undefined behavior. Suppose two
    threads race to set the NSM bit. The loser then initiates a
    read and retains the buffer. Before the loser releases the
    buffer, the winner clears the NSM bit. Now another thread might
    see that the buffer is non-uniquely referenced and not
    NSM-active, and so it will attempt to unique the buffer.

    It is probably unreasonable to require the optimizer to never
    reorder ordinary retains and releases past retainForNSM and
    releaseForNSM operations.

More importantly, the use case here (many threads concurrently
accessing different elements of a shared data structure) is inherently
a poor fit for a COW data structure. Even if the library were able to
make enough guarantees to ensure that, with the right pattern of
accesses, there would never be a structural copy of the aggregate, it
would still be extremely inefficient, because all of the threads would
be competing for atomic access to the strong reference count.

In short, I think it's reasonable for the library to say that programs
which want to do this should always use a type with reference
semantics. Therefore, it's reasonable to ignore concurrent accesses
when deciding how to best track whether an aggregate is undergoing
non-structural modification. This removes the only objection I
can see to tracking this with a single NSM bit.

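For illustration only, the recommendation amounts to using a
reference-semantics type for this kind of concurrent, element-wise
mutation. A deliberately minimal sketch (not a proposed library
addition) might look like this::

  // A class has reference semantics: sharing it between threads never
  // triggers COW uniqueness checks, and writes to distinct elements do
  // not touch the reference count of any buffer.
  final class SharedCounters {
    private let storage: UnsafeMutablePointer<Int>
    let count: Int

    init(count: Int) {
      self.count = count
      storage = UnsafeMutablePointer<Int>.allocate(capacity: count)
      storage.initialize(repeating: 0, count: count)
    }

    deinit {
      storage.deinitialize(count: count)
      storage.deallocate()
    }

    subscript(i: Int) -> Int {
      get { return storage[i] }
      set { storage[i] = newValue }
    }
  }
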
Code generation patterns
~~~~~~~~~~~~~~~~~~~~~~~~

The signatures and access patterns for addressors will need to change
in order to ensure memory safety.

``mutableAddress`` currently returns an ``UnsafeMutablePointer``; it
will need to return ``(Builtin.NativeObject?, UnsafeMutablePointer)``.
The owner pointer must be a native object; we cannot efficiently
support either uniqueness checking or the NSM bit on non-Swift
objects. SILGen will mark that the address depends on the owner
reference and push a cleanup to ``releaseForNSM`` it.

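In source-level terms, the access pattern being described looks
roughly like the following model. Everything here is invented for the
sketch: the real owner would be a ``Builtin.NativeObject?`` rather
than a class reference, and the check and cleanup would use the
runtime entry points described earlier::

  final class ModelOwner {
    let base: UnsafeMutablePointer<Int>
    let count: Int
    var isNSMActive = false

    init(count: Int) {
      self.count = count
      base = UnsafeMutablePointer<Int>.allocate(capacity: count)
      base.initialize(repeating: 0, count: count)
    }
    deinit {
      base.deinitialize(count: count)
      base.deallocate()
    }
  }

  struct ModelArray {
    var owner = ModelOwner(count: 4)

    // Plays the role of the new `mutableAddress`: an owner plus an address.
    // (The "unique or NSM-active" check from the earlier sketch is omitted.)
    mutating func mutableElementAddress(at index: Int)
        -> (owner: ModelOwner?, address: UnsafeMutablePointer<Int>) {
      owner.isNSMActive = true
      return (owner, owner.base + index)
    }
  }

  // Models the releaseForNSM cleanup that SILGen pushes for the owner.
  func endMutation(_ owner: ModelOwner?) {
    owner?.isNSMActive = false
  }

  var array = ModelArray()
  let (owner, address) = array.mutableElementAddress(at: 2)
  address.pointee += 1   // the formal access, through the returned address
  endMutation(owner)     // the cleanup runs only after the access ends
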
``address`` currently returns an ``UnsafePointer``; it will need to
return ``(Builtin.NativeObject?, UnsafePointer)``. I do not currently
see a reason to allow non-Swift owners, but the model doesn't depend
on that. SILGen will mark that the address depends on the owner
reference and push a cleanup to ``release`` it.

In order to support ultimately calling an addressor in the
conservative access path, ``materializeForSet`` must also return an
owner reference. Since ``materializeForSet`` calls ``mutableAddress``
in this case, SILGen will follow that pattern for calls. SILGen will
also assume that the need to perform a ``releaseForNSM`` is exclusive
with the need to call the setter.

Mutating operations on COW types will now have two different paths for
making a buffer mutable and unique: one for structural mutations and
another for non-structural mutations. I expect that this will require
separate semantics annotations, and the optimizer will have to
recognize both.

``releaseForNSM`` operations will not be reorderable unless the
optimizer can prove that the objects are distinct.

Summary of proposal and plan
----------------------------

Let me summarize what I'm proposing:

* Swift's core approach to optimizing accesses should be based
  around providing direct access to memory, either statically or
  dynamically. In other words, Swift should adopt addressors on
  core data structures as much as possible.

* Swift should fix the current memory hole with addressors by
  retaining for the duration of the access and, for modifications,
  flagging the buffer as NSM-active. The implementation plan
  follows:

  * The runtime implements the NSM bit and its entry points.

  * SIL provides operations for manipulating and querying the NSM
    bit. IRGen implements these operations using the runtime
    functions. Builtins are exposed.

  * The standard library changes data structures to do different
    uniquing for structural and non-structural modifications. This
    patch is not yet committed.

  * The optimizer reacts to the above. When both are settled, they
    can be committed.

  * SILGen changes the emission patterns for l-values so that
    addresses and writebacks are live only during the formal
    access.

  * Sema changes the signature of ``address``, ``mutableAddress``,
    and ``materializeForSet`` to return an optional owner reference.
    Sema changes ``materializeForSet`` synthesis to return the
    owner correctly. SILGen implements the desired code patterns.

    The standard library changes its addressor implementations
    to continue to compile, but for staging purposes, it only uses
    nil owners.

  * The standard library changes addressor implementations to
    use meaningful owners. This patch is not yet committed.

  * The optimizer reacts to the above. When both are settled, they
    can be committed.