:orphan:

.. highlight:: sil

==========================
ARC Optimization for Swift
==========================

.. contents::

.. admonition:: TODO

  This is an evolving document on ARC optimization in the Swift
  compiler. Please extend it.

Terms
=====

Some terms are used often in this document and must be defined. They may have
more general definitions elsewhere, but we define them here with enough
information for our purposes:

1. Reference type: a retainable pointer, not an aggregate that may contain a
   reference counted value.
2. Trivial type: a type for which a ``retain_value`` on a value of the type is
   a no-op.

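To make the distinction concrete, here is a hedged SIL sketch (the type names
``Pair`` and ``Cat`` are hypothetical, not defined in this document)::

  // %p is a struct of trivial fields: retain_value is a no-op and the
  // optimizer is free to delete it.
  retain_value %p : $Pair

  // %c is of class type: retain_value increments a strong reference count and
  // is equivalent to strong_retain.
  retain_value %c : $Cat
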
Reference Counting Instructions
===============================

- strong_retain
- strong_release
- strong_retain_unowned
- unowned_retain
- unowned_release
- load_weak
- store_weak
- fix_lifetime
- mark_dependence
- is_unique
- is_unique_or_pinned
- copy_block

Memory Behavior of ARC Operations
=================================

At SIL level, reference counting and reference checking instructions are
attributed with MayHaveSideEffects to prevent arbitrary passes from reordering
them.

At IR level, retains are marked NoModRef with respect to load and store
instructions so they don't pessimize memory dependence. (Note that retains are
still considered to write to memory with respect to other calls because
getModRefBehavior is not overridden.) Releases cannot be marked NoModRef
because they can have arbitrary side effects. is_unique calls cannot be marked
NoModRef because they cannot be reordered with other operations that may modify
the reference count.

.. admonition:: TODO

  Marking runtime calls with NoModRef in LLVM is misleading (they write
  memory), inconsistent (getModRefBehavior returns Unknown), and fragile
  (e.g. if we inline ARC operations at IR level). To be robust and allow
  stronger optimization, TBAA tags should be used to indicate functions that
  only access object metadata. This would also enable more LLVM level
  optimization in the presence of is_unique checks, which currently appear to
  arbitrarily write memory.

ARC and Copying
===============

TODO: Talk about how "ARC" and copying fit together. In particular, retaining
and releasing is really "copying" and "destroying" a pointer reference; since
the value pointed to does not change, no bits need to be changed.

Talk about how this fits into @owned and @guaranteed parameters.

RC Identity
===========

A core concept in Swift ARC optimization is ``Reference Count Identity`` (RC
Identity) and RC Identity preserving instructions. In this section, we:

1. Define concepts related to RC identity.
2. Contrast RC identity analysis with alias analysis.
3. Discuss instructions/properties that cause certain instructions which "seem"
   to be RC identical to not be so.

Definitions
-----------

Let ``I`` be a SIL instruction with n operands and m results. We say that ``I``
is an (i, j) RC Identity preserving instruction if performing a
``retain_value`` on the ith SSA argument immediately before ``I`` is executed
is equivalent to performing a ``retain_value`` on the jth SSA result of ``I``
immediately following the execution of ``I``. For example, if::

  retain_value %x
  %y = unary_instruction %x

is equivalent to::

  %y = unary_instruction %x
  retain_value %y

then we say that unary_instruction is a (0,0) RC Identity preserving
instruction. In the case of a unary instruction, we omit (0,0) and just say
that the instruction is RC Identity preserving.

TODO: This section defines RC identity only for loadable types. We also need to
define it for instructions on addresses and instructions that mix addresses and
values. It should be pretty straightforward to do this.

Given two SSA values ``%a``, ``%b``, we define ``%a`` as immediately RC
identical to ``%b`` (written ``%a ~rci %b``) if there exists an instruction
``I`` such that:

- ``%a`` is the jth result of ``I``.
- ``%b`` is the ith argument of ``I``.
- ``I`` is (i, j) RC identity preserving.

Due to the nature of SSA form, we cannot even speak of symmetry or
reflexivity. But we do get transitivity! If ``%b ~rci %a`` and ``%c ~rci %b``,
then by these two assumptions, given::

  retain_value %a
  %b = unary_instruction %a
  %c = unary_instruction %b

we can perform the following code motion::

  %b = unary_instruction %a
  %c = unary_instruction %b
  retain_value %c

which is our desired result. But we would really like this relation to be
reflexive and symmetric as well. To get around this issue, we define the
equivalence relation RC identity as follows: we say that ``%a ~rc %b`` if any
of the following hold:

1. ``%a == %b``.
2. ``%a ~rci %b`` or ``%b ~rci %a``.
3. There exists a finite sequence of ``n`` SSA values ``{%a[i]}`` such that:

   a. ``%a ~rci %a[0]``
   b. ``%a[i] ~rci %a[i+1]`` for all ``i < n``.
   c. ``%a[n] ~rci %b``.

These equivalence classes, consisting of chains of RC identical values, are
computed via the SILAnalysis called ``RC Identity Analysis``. By performing ARC
optimization on RC identical operations, our optimizations are able to operate
on the level of granularity that we actually care about, ignoring superficial
changes in SSA form that still yield manipulations of the same reference count.

.. admonition:: NOTE

  RCIdentityAnalysis is a flow insensitive analysis. Dataflow that needs to
  be flow sensitive must handle phi nodes in the dataflow itself.

Contrasts with Alias Analysis
-----------------------------

A common question is what the difference is between RC Identity analysis and
alias analysis. While alias analysis attempts to determine whether two memory
locations are the same, RC identity analysis attempts to determine whether
reference counting operations on different values manipulate the same
reference count.

Some interesting examples of where RC identity differs from alias analysis are:

- A ``struct`` instruction is an RC identity preserving operation if the
  ``struct`` literal only has one non-trivial operand. This means, for
  instance, that any struct with one reference counted field used as an owning
  pointer is RC identical with its owning pointer (a useful property for
  Arrays).

- An ``enum`` instruction is always RC identical with the given payload.

- A ``tuple`` instruction is an RC identity preserving operation if the
  ``tuple`` literal has one non-trivial operand.

- ``init_class_existential`` is an RC identity preserving operation since
  performing a retain_value on a class existential is equivalent to performing
  a retain_value on the class itself.

The corresponding value projection operations have analogous properties.

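To illustrate the ``struct`` case, here is a hedged SIL sketch (the type
``ArrayBuf`` and its fields are hypothetical)::

  // %storage is the only non-trivial operand of the struct, so %a is RC
  // identical to %storage even though the two values do not alias.
  %a = struct $ArrayBuf (%storage : $Builtin.NativeObject, %count : $Builtin.Int64)
  retain_value %a : $ArrayBuf      // equivalent to strong_retain %storage
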
.. admonition:: NOTE

  An important consequence of RC Identity is that value types with only one
  RCIdentity are a simple case for ARC optimization to handle. The ARC
  optimizer relies on other optimizations like SROA, Function Signature Opts,
  and SimplifyCFG (for block arguments) to try and eliminate cases where value
  types have multiple reference counted subtypes. If one has a struct type
  with multiple reference counted subfields, wrapping the struct in a COW
  data structure (for instance storing the struct in an array of one element)
  will reduce the reference count overhead.

What is ``retain_value`` and why is it important?
--------------------------------------------------

Notice in the section above how we defined RC identity using the SIL
``retain_value`` instruction. ``retain_value`` and ``release_value`` are the
catch-all "please retain" and "please release this value" instructions at the
SIL level. The following table is a quick summary of what ``retain_value``
(``release_value``) does when applied to various types of objects:

+-----------+--------------+-------------------------------------------------------------------------------------+
| Ownership | Type         | Effect                                                                              |
+===========+==============+=====================================================================================+
| Strong    | Class        | Increment strong ref count of class                                                 |
+-----------+--------------+-------------------------------------------------------------------------------------+
| Any       | Struct/Tuple | retain_value each field                                                             |
+-----------+--------------+-------------------------------------------------------------------------------------+
| Any       | Enum         | switch on the enum and apply retain_value to the enum case's payload (if it exists) |
+-----------+--------------+-------------------------------------------------------------------------------------+
| Unowned   | Class        | Increment the unowned ref count of class                                            |
+-----------+--------------+-------------------------------------------------------------------------------------+

.. admonition:: Notice

  The effect of retain_value on aggregate value types like structs, tuples, and
  enums is defined recursively via retain_value on their payloads/fields. This
  is why operations like ``struct_extract`` do not always propagate RC
  identity.

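For example, the following hedged sketch (the struct ``TwoRefs`` and its fields
are hypothetical) shows why ``struct_extract`` does not propagate RC identity
when the struct has more than one non-trivial field::

  %x = struct_extract %s : $TwoRefs, #TwoRefs.first
  retain_value %s : $TwoRefs       // retains both #first and #second
  strong_retain %x : $SomeClass    // retains only #first, so %x is not RC
                                   // identical to %s
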
Conversions
-----------

Conversions are a common operation that propagates RC identity, but not all
conversions have this property. In this section, we attempt to explain why. The
rule is that a conversion that preserves RC identity must have the following
properties:

1. Both its operand and its result must be non-trivial values with the same
   ownership semantics (i.e. unowned, strong, weak). This means that the
   following conversions do not propagate RC identity:

   - address_to_pointer
   - pointer_to_address
   - unchecked_trivial_bitcast
   - ref_to_raw_pointer
   - raw_pointer_to_ref
   - ref_to_unowned
   - unowned_to_ref
   - ref_to_unmanaged
   - unmanaged_to_ref

   The reason why we want the ownership semantics to be the same is that
   whenever there is a change in ownership semantics, we want the programmer to
   explicitly reason about that change.

2. The instruction must not introduce type aliasing. This disqualifies such
   casts as:

   - unchecked_addr_cast
   - unchecked_bitwise_cast

In sum, this means that the interesting instructions are conversions that
preserve types and preserve non-trivialness.

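A hedged sketch of the distinction (the class types ``Dog`` and ``Animal`` are
hypothetical)::

  // upcast keeps the same (strong) ownership and introduces no type aliasing,
  // so %1 is RC identical to %0.
  %1 = upcast %0 : $Dog to $Animal

  // ref_to_unowned changes the ownership semantics from strong to unowned, so
  // it does not propagate RC identity.
  %2 = ref_to_unowned %0 : $Dog to $@sil_unowned Dog
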
ARC and Enums
-------------

Enum types provide interesting challenges for ARC optimization because if even
one case of an enum is non-trivial, the aggregate type must be treated as
non-trivial in all situations. An important consideration here is that when
performing ARC optimization on specific cases, one has to be very careful to
only ignore reference count operations on values that can be proven to be that
specific case.

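For example, in the following hedged sketch (the enum ``MaybeRef``, its cases,
and the class ``SomeClass`` are hypothetical), a release may only be ignored on
the path where the value is provably the trivial case::

  switch_enum %e : $MaybeRef, case #MaybeRef.ref!enumelt: bb1, case #MaybeRef.empty!enumelt: bb2

  bb1(%payload : $SomeClass):
    // %e is provably the non-trivial case here, so this release actually
    // releases %payload and must be respected.
    release_value %e : $MaybeRef
    br bb3

  bb2:
    // %e is provably the trivial case here, so this release is a no-op and
    // may be ignored.
    release_value %e : $MaybeRef
    br bb3
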
.. admonition:: TODO

  This section needs to be filled out more.

Copy-On-Write Considerations
============================

The copy-on-write capabilities of some data structures, such as Array and Set,
are efficiently implemented via Builtin.isUnique calls which lower directly to
is_unique instructions in SIL.

The is_unique instruction takes the address of a reference, and although it
does not actually change the reference, the reference must appear mutable to
the optimizer. This forces the optimizer to preserve a retain distinct from
what's required to maintain lifetime for any of the reference's source-level
copies, because the called function is allowed to replace the reference,
thereby releasing the referent. Consider the following sequence of rules:

(1) An operation taking the address of a variable is allowed to replace the
    reference held by that variable. The fact that is_unique will not actually
    replace it is opaque to the optimizer.

(2) If the refcount is 1 when the reference is replaced, the referent is
    deallocated.

(3) A different source-level variable pointing at the same referent must not
    be changed/invalidated by such a call.

(4) If such a variable exists, the compiler must guarantee the refcount is > 1
    going into the call.

With the is_unique instruction, the variable whose reference is being checked
for uniqueness appears mutable at the level of an individual SIL
instruction. After IRGen, is_unique instructions are expanded into runtime
calls that no longer take the address of the variable. Consequently,
LLVM-level ARC optimization must be more conservative. It must not remove
retain/release pairs of this form::

  retain X
  retain X
  _swift_isUniquelyReferenced(X)
  release X
  release X

To prevent removal of the apparently redundant inner retain/release pair, the
LLVM ARC optimizer should model _swift_isUniquelyReferenced as a function that
may release X, use X, and exit the program (the subsequent release instruction
does not prove safety).

.. _arcopts.is_unique:

is_unique instruction
---------------------

As explained above, the SIL-level is_unique instruction enforces the semantics
of uniqueness checks in the presence of ARC optimization. The kind of reference
count checking that is_unique performs depends on the argument type:

- Native object types are directly checked by reading the strong
  reference count:
  (Builtin.NativeObject, known native class reference)

- Objective-C object types require an additional check that the
  dynamic object type uses native Swift reference counting:
  (Builtin.UnknownObject, unknown class reference, class existential)

- Bridged object types allow the dynamic object type check to be
  bypassed based on the pointer encoding:
  (Builtin.BridgeObject)

Any of the above types may also be wrapped in an optional. If the static
argument type is optional, then a null check is also performed.

Thus, is_unique only returns true for non-null, native Swift object references
with a strong reference count of one.

is_unique_or_pinned has the same semantics as is_unique except that it also
returns true if the object is marked pinned (by strong_pin) regardless of the
reference count. This allows for simultaneous non-structural modification of
multiple subobjects.

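A hedged sketch of how such a check typically appears in SIL (the buffer type
``MyStorage`` and the block names are hypothetical)::

  // %0 is the *address* of the reference being checked (e.g. an inout
  // variable), not the reference itself.
  %1 = is_unique %0 : $*MyStorage
  cond_br %1, bb_unique, bb_shared

  bb_unique:
    // The strong reference count is one: mutate the buffer in place.
    br bb_cont

  bb_shared:
    // The buffer is shared: copy it before mutating.
    br bb_cont
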
Builtin.isUnique
----------------

Builtin.isUnique and Builtin.isUniqueOrPinned give the standard library access
to optimization-safe uniqueness checking. Because the type of reference check
is derived from the builtin argument's static type, the most efficient check is
automatically generated. However, in some cases, the standard library can
dynamically determine that it has a native reference even though the static
type is a bridge or unknown object. Unsafe variants of the builtin are
available to allow the additional pointer bit mask and dynamic class lookup to
be bypassed in these cases:

- isUnique_native : <T> (inout T[?]) -> Int1
- isUniqueOrPinned_native : <T> (inout T[?]) -> Int1

These builtins perform an implicit cast to NativeObject before checking
uniqueness. There's no way at SIL level to cast the address of a reference, so
we need to encapsulate this operation as part of the builtin.

Semantic Tags
=============

ARC takes advantage of certain semantic tags. This section documents these
tags and their meanings.

arc.programtermination_point
----------------------------

If this semantic tag is applied to a function, then we know that:

- The function does not touch any reference counted objects.
- After the function is executed, all reference counted objects are leaked
  (most likely in preparation for program termination).

This allows one, when performing ARC code motion, to ignore blocks that contain
an apply of such a function, as long as the block does not have any other
instructions with side effects.

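A hedged sketch (the function name ``@reportFatalError`` and the surrounding
blocks are hypothetical)::

  bb1:
    // Aside from this apply of a function annotated with
    // arc.programtermination_point, the block has no side-effecting
    // instructions, so ARC code motion may ignore it (e.g. it need not sink
    // a retain into it).
    %f = function_ref @reportFatalError : $@convention(thin) () -> ()
    apply %f() : $@convention(thin) () -> ()
    unreachable
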
Unreachable Code and Lifetimes
==============================

The general case of unreachable code has further interesting properties with
respect to lifetime balancing. Namely, an unreachable or a no-return function
signifies a scope that has been split: the lifetimes of objects that are alive
in that scope may never end. This means that:

1. We cannot ignore all unreachable-terminated blocks for ARC purposes. For
   instance, if we sink a retain past a br into a block that is not an
   arc.programtermination_point block, we must sink the retain into the block.

2. If we are able to infer that an object's lifetime would never have ended due
   to the unreachable/no-return function, then we do not need to end the
   lifetime of the object early. An example of a situation where this can
   happen is closure specialization. In closure specialization, we clone a
   caller that takes in a closure, creating a copy of the caller specialized
   for the specific closure. This allows the closure to be eliminated in the
   specialized function and other optimizations to come into play. Since the
   lifetime of the original closure extended past any assertions in the
   original function, we do not need to insert releases at such locations to
   maintain program behavior.

ARC Sequence Optimization
=========================

TODO: Fill this in.

ARC Loop Hoisting
=================

Abstract
--------

This section describes the ARCLoopHoisting algorithm that hoists retains and
releases out of loops. This is a high level description that justifies the
correctness of the algorithm and describes its design. In the following
discussion we talk about the algorithm conceptually, show its safety, and
discuss the considerations necessary for good performance.

.. admonition:: NOTE

  In the following, when we refer to "hoisting", we are not just talking about
  upward code motion of retains, but also downward code motion of releases.

Loop Canonicalization
---------------------

In the following we assume that all loops are canonicalized such that:

1. The loop has a pre-header.
2. The loop has one backedge.
3. All exiting edges have a unique exit block.

Motivation
----------

Consider the following simple loop::

  bb0:
    br bb1

  bb1:
    retain %x             (1)
    apply %f(%x)
    apply %f(%x)
    release %x            (2)
    cond_br ..., bb1, bb2

  bb2:
    return ...

When is it safe to hoist (1), (2) out of the loop? Imagine that we know the
trip count of the loop is 3 and completely unroll the loop so the whole
function is one basic block. In such a case, the function looks as follows::

  bb0:
    # Loop Iteration 0
    retain %x             (3)
    apply %f(%x)
    apply %f(%x)
    release %x            (4)

    # Loop Iteration 1
    retain %x             (5)
    apply %f(%x)
    apply %f(%x)
    release %x            (6)

    # Loop Iteration 2
    retain %x             (7)
    apply %f(%x)
    apply %f(%x)
    release %x

    return ...

Notice how (4) can be paired with (5) and (6) can be paired with (7). Assume
that we eliminate those. Then the function looks as follows::

  bb0:
    # Loop Iteration 0
    retain %x
    apply %f(%x)
    apply %f(%x)

    # Loop Iteration 1
    apply %f(%x)
    apply %f(%x)

    # Loop Iteration 2
    apply %f(%x)
    apply %f(%x)
    release %x

    return ...

We can then re-roll the loop, yielding the following loop::

  bb0:
    retain %x             (8)
    br bb1

  bb1:
    apply %f(%x)
    apply %f(%x)
    cond_br ..., bb1, bb2

  bb2:
    release %x            (9)
    return ...

Notice that this transformation is equivalent to just hoisting (1) and (2) out
of the loop in the original example. This form of hoisting is what is termed
"ARCLoopHoisting". What is key to notice is that even though we are performing
"hoisting", we are actually pairing releases from one iteration with retains in
the next iteration and then eliminating the pairs. This realization will guide
our further analysis.

Correctness
-----------

In this simple loop case, the proof of correctness is very simple to see
conceptually. But in the more general case, when is it safe to perform this
optimization? We must consider the following areas of concern:

1. Are the retains/releases upon the same reference count? This can be
   determined conservatively by using RCIdentityAnalysis.

2. Can we move retains and releases in the unrolled case as we have specified?
   This is simple since it is always safe to move a retain earlier and a
   release later in the dynamic execution of a program. Doing so can only
   extend the life of a variable, which is legal and generally profitable in
   terms of enabling this optimization (see the sketch after this list).

3. How do we pair all necessary retains/releases to ensure we do not unbalance
   retain/release counts in the loop? Consider a set of retains and a set of
   releases that we wish to hoist out of a loop. We can only hoist the retain
   and release sets out of the loop if all paths in the given loop region from
   the entrance to the backedge have exactly one retain or release from this
   set.

4. Any early exits that we must move a retain or a release past must be
   compensated appropriately. This will be discussed in the next section.

Assuming that our optimization does all of these things, we should be able to
hoist safely.

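Here is a hedged sketch of point (2) above (``%g`` is a hypothetical second
callee): moving a retain earlier and a release later only extends the object's
lifetime, which is always semantically safe::

  # before
  apply %f(%x)
  retain %x
  apply %g(%x)
  release %x

  # after hoisting the retain up and sinking the release down: %x simply lives
  # longer
  retain %x
  apply %f(%x)
  apply %g(%x)
  release %x
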
Compensating Early Exits for Lost Dynamic Reference Counts
----------------------------------------------------------

Let's say that we have the following loop-canonicalized SIL::

  bb0(%0 : $Builtin.NativeObject):
    br bb1

  bb1:
    strong_retain %0 : $Builtin.NativeObject
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject
    cond_br ..., bb2, bb3

  bb2:
    cond_br ..., bb1, bb4

  bb3:
    br bb5

  bb4:
    br bb5

  bb5:
    return ...

Can we hoist the retain/release pair here? Let's assume the loop runs for 3
iterations and we completely unroll it. Then we have::

  bb0:
    strong_retain %0 : $Builtin.NativeObject     (1)
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (2)
    cond_br ..., bb1, bb4

  bb1: // preds: bb0
    strong_retain %0 : $Builtin.NativeObject     (3)
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (4)
    cond_br ..., bb2, bb4

  bb2: // preds: bb1
    strong_retain %0 : $Builtin.NativeObject     (5)
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (6)
    cond_br ..., bb3, bb4

  bb3: // preds: bb2
    br bb5

  bb4: // preds: bb0, bb1, bb2
    br bb5

  bb5: // preds: bb3, bb4
    return ...

We want to be able to pair and eliminate (2)/(3) and (4)/(5). In order to do
that, we need to move (2) from bb0 into bb1 and (4) from bb1 into bb2. In order
to do this, we must also place a release in bb4, since otherwise we would lose
a dynamic release along the early-exit paths into bb4. We also sink (6) so that
we do not end up with an extra release along those paths. This then gives us::

  bb0:
    strong_retain %0 : $Builtin.NativeObject     (1)
    br bb1

  bb1:
    apply %f(%0)
    apply %f(%0)
    cond_br ..., bb2, bb3

  bb2:
    cond_br ..., bb1, bb4

  bb3:
    strong_release %0 : $Builtin.NativeObject    (6*)
    br bb5

  bb4:
    strong_release %0 : $Builtin.NativeObject    (7*)
    br bb5

  bb5: // preds: bb3, bb4
    return ...

An easy inductive proof follows.

What if we have the opposite problem, that of moving a retain past an early
exit? Consider the following::

  bb0(%0 : $Builtin.NativeObject):
    br bb1

  bb1:
    cond_br ..., bb2, bb3

  bb2:
    strong_retain %0 : $Builtin.NativeObject
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject
    cond_br ..., bb1, bb4

  bb3:
    br bb5

  bb4:
    br bb5

  bb5:
    return ...

Let's unroll this loop::

  bb0(%0 : $Builtin.NativeObject):
    br bb1

  # Iteration 1
  bb1: // preds: bb0
    cond_br ..., bb2, bb8

  bb2: // preds: bb1
    strong_retain %0 : $Builtin.NativeObject     (1)
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (2)
    br bb3

  # Iteration 2
  bb3: // preds: bb2
    cond_br ..., bb4, bb8

  bb4: // preds: bb3
    strong_retain %0 : $Builtin.NativeObject     (3)
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (4)
    br bb5

  # Iteration 3
  bb5: // preds: bb4
    cond_br ..., bb6, bb8

  bb6: // preds: bb5
    strong_retain %0 : $Builtin.NativeObject     (5)
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (6)
    cond_br ..., bb7, bb8

  bb7: // preds: bb6
    br bb9

  bb8: // preds: bb1, bb3, bb5, bb6
    br bb9

  bb9:
    return ...

First, we want to move each retain into the previous iteration. This means that
we have to move a retain over the cond_br in bb1, bb3, and bb5. If we were to
do that, then bb8 would see an extra dynamic retain along those paths. In order
to fix that issue, we need to balance that retain by putting a release in bb8.
But we cannot put a release in bb8 without considering the terminator of bb6,
since bb6 is also a predecessor of bb8. Luckily, we have (6). Notice that bb7
has a single predecessor, bb6, so we can safely move one release along that
path as well. Thus we perform that code motion, yielding the following::

  bb0(%0 : $Builtin.NativeObject):
    br bb1

  # Iteration 1
  bb1: // preds: bb0
    strong_retain %0 : $Builtin.NativeObject     (1)
    cond_br ..., bb2, bb8

  bb2: // preds: bb1
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (2)
    br bb3

  # Iteration 2
  bb3: // preds: bb2
    strong_retain %0 : $Builtin.NativeObject     (3)
    cond_br ..., bb4, bb8

  bb4: // preds: bb3
    apply %f(%0)
    apply %f(%0)
    strong_release %0 : $Builtin.NativeObject    (4)
    br bb5

  # Iteration 3
  bb5: // preds: bb4
    strong_retain %0 : $Builtin.NativeObject     (5)
    cond_br ..., bb6, bb8

  bb6: // preds: bb5
    apply %f(%0)
    apply %f(%0)
    cond_br ..., bb7, bb8

  bb7: // preds: bb6
    strong_release %0 : $Builtin.NativeObject    (7*)
    br bb9

  bb8: // preds: bb1, bb3, bb5, bb6
    strong_release %0 : $Builtin.NativeObject    (8*)
    br bb9

  bb9:
    return ...

Then we move (1), (3), (5) into the single predecessor of their parent block
and eliminate (3), (5) through a pairing with (2), (4) respectively. This then
yields::

  bb0(%0 : $Builtin.NativeObject):
    strong_retain %0 : $Builtin.NativeObject     (1)
    br bb1

  # Iteration 1
  bb1: // preds: bb0
    cond_br ..., bb2, bb8

  bb2: // preds: bb1
    apply %f(%0)
    apply %f(%0)
    br bb3

  # Iteration 2
  bb3: // preds: bb2
    cond_br ..., bb4, bb8

  bb4: // preds: bb3
    apply %f(%0)
    apply %f(%0)
    br bb5

  # Iteration 3
  bb5: // preds: bb4
    cond_br ..., bb6, bb8

  bb6: // preds: bb5
    apply %f(%0)
    apply %f(%0)
    cond_br ..., bb7, bb8

  bb7: // preds: bb6
    strong_release %0 : $Builtin.NativeObject    (7*)
    br bb9

  bb8: // preds: bb1, bb3, bb5, bb6
    strong_release %0 : $Builtin.NativeObject    (8*)
    br bb9

  bb9:
    return ...

Then we finish by re-rolling the loop::

  bb0(%0 : $Builtin.NativeObject):
    strong_retain %0 : $Builtin.NativeObject     (1)
    br bb1

  bb1: // preds: bb0, bb2
    cond_br ..., bb2, bb8

  bb2: // preds: bb1
    apply %f(%0)
    apply %f(%0)
    cond_br ..., bb1, bb7

  bb7: // preds: bb2
    strong_release %0 : $Builtin.NativeObject    (7*)
    br bb9

  bb8: // preds: bb1
    strong_release %0 : $Builtin.NativeObject    (8*)
    br bb9

  bb9:
    return ...

Uniqueness Check Complications
------------------------------

A final concern that we must consider is whether we introduce extra
copy-on-write copies through our optimization. To see this, consider the
following simple SIL sequence::

  bb0(%0 : $Builtin.NativeObject):
    // refcount(%0) == n
    is_unique %0 : $Builtin.NativeObject
    // refcount(%0) == n
    strong_retain %0 : $Builtin.NativeObject
    // refcount(%0) == n+1

If n is not 1, then trivially is_unique will return false. So assume that n is
1 for our purposes, so that no copy occurs here. Thus we have::

  bb0(%0 : $Builtin.NativeObject):
    // refcount(%0) == 1
    is_unique %0 : $Builtin.NativeObject
    // refcount(%0) == 1
    strong_retain %0 : $Builtin.NativeObject
    // refcount(%0) == 2

Now imagine that we move the strong_retain before the is_unique. Then we have::

  bb0(%0 : $Builtin.NativeObject):
    // refcount(%0) == 1
    strong_retain %0 : $Builtin.NativeObject
    // refcount(%0) == 2
    is_unique %0 : $Builtin.NativeObject

Thus is_unique is guaranteed to return false, introducing a copy that was not
needed. We wish to avoid that if at all possible.

Deinit Model
============

The semantics around deinits in Swift are a common area of confusion. This
section is not attempting to state where the deinit model may be in the
future; it just documents where things are today in the hopes of improving
clarity.

The following characteristics of deinits are important to the optimizer:

1. deinits run on the same thread and are not asynchronous, unlike Java
   finalizers.
2. deinits are not sequenced with regards to each other or to code in normal
   control flow.
3. If the optimizer takes advantage of the lack of sequencing, it must do so in
   a way that preserves memory safety.

Consider the following pseudo-Swift example::

  class D {}
  class D1 : D {}
  class D2 : D {}

  var GLOBAL_D : D = D1()

  class C { deinit { GLOBAL_D = D2() } }

  func main() {
    let c = C()
    let d = GLOBAL_D
    useC(c)
    useD(d)
  }

  main()

Assume that useC does not directly touch an instance of class D in any way
except via the deinit of C.

Since memory operations in normal control flow are not sequenced with respect
to deinits, there are two correct programs here that the optimizer can produce:
the original, and the one where useC(c) and the read of GLOBAL_D are swapped,
i.e.::

  func main() {
    let c = C()
    useC(c)
    let d = GLOBAL_D
    useD(d)
  }

In the first program, d would be an instance of class D1. In the second, it
would be an instance of class D2. Notice how in both programs, though, no
deinitialized object is accessed. On the other hand, imagine if we had split
main like so::

  func main() {
    let c = C()
    let d = unsafe_unowned_load(GLOBAL_D)
    useC(c)
    let owned_d = retain(d)
    useD(owned_d)
  }

In this case, we would be passing a deallocated instance of class D1 to useD,
which would be undefined behavior. An optimization that produced such code
would be a miscompile.