This involved fixing a few issues, exposed by trying to use the load_borrow code
with load [copy], that were caught in our tests by the ownership
verifier. Specifically:
1. In getUnderlyingBorrowIntroducingValues we were first checking whether a user
was a guaranteed forwarding instruction and only then checking whether our value
had None ownership and should be skipped. This ordering could yield the wrong
borrow introducers if that trivial value came from a different guaranteed value.
2. I found via a test case in predictable_memaccess_opts that we need to insert
compensating destroys for phi nodes only after we process all of the incoming
values. This is needed because multiple phi nodes can use the same incoming
value. In such a case, we need to treat all of the copy_values inserted for each
phi node as users of that incoming value to prevent us from inserting too many
destroy_values. Consider:
```
bb0:
%0 = copy_value
cond_br ..., bb1, bb2
bb1:
br bb3(%0)
bb2:
br bb4(%0)
```
If we processed the phi args in bb3 and bb4 separately, we would insert two
extra destroys in bb1 and bb2.
The implementation works by processing each phi and, for each incoming value of
the phi, appending the pair (incoming value, copy we made for the value) to an
array. We then stable-sort the array by incoming value only. This lets a second
loop process ranges of copies that share the same underlying incoming value,
taking advantage of the fact that all of the copies for a given incoming value
are contiguous in said array. We still lifetime extend to the load/insert the
destroy_value for the actual phi (not its incoming values) in that first loop.
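To make the bookkeeping concrete, here is a minimal C++ sketch of the
sort-then-scan pattern, using stand-in types rather than the real
SILValue/SILInstruction classes (all names here are illustrative only):
```
#include <algorithm>
#include <vector>

// Stand-ins for SIL entities; the real pass tracks SILValues and the
// copy_value instructions it inserted.
using Value = int;
struct PhiCopy {
  Value incoming; // the phi's incoming value
  Value copy;     // the copy made for one phi use of that value
};

void processPhiCopies(std::vector<PhiCopy> &copies) {
  // Stable-sort by incoming value only, so all copies of the same
  // incoming value become contiguous while keeping insertion order.
  std::stable_sort(copies.begin(), copies.end(),
                   [](const PhiCopy &l, const PhiCopy &r) {
                     return l.incoming < r.incoming;
                   });

  // Walk each contiguous range sharing an incoming value and treat
  // all copies in the range as users of that one value, so exactly
  // one set of compensating destroys is inserted per incoming value.
  for (auto start = copies.begin(); start != copies.end();) {
    auto end = std::find_if(start, copies.end(), [&](const PhiCopy &p) {
      return p.incoming != start->incoming;
    });
    // ... insert destroys for start->incoming, viewing [start, end)
    // as its users ...
    start = end;
  }
}
```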
3. I tightened up the invariant in the AvailableValueAggregator so that it
always returns newly copied values unless we are performing a take. This
involved changing the code in handlePrimitiveValues to always insert copies when
ownership is enabled.
4. I tightened up the identification of intermediate phi nodes: instead of just
checking whether the phi node has a single terminator user, we now check that
its single terminator user has a successor containing an inserted phi node that
we know about.
This entailed untangling a few issues. I have purposefully fixed this only for
load_borrow for now, to keep the change in tree smaller (the reason for my
splitting the code paths in a previous group of commits). I am going to follow
up with a load [copy] version of the patch; I think the code should be the
same. The interesting part is that all of these conditions are caught by the
ownership verifier, suggesting to me that the code in the stdlib/overlays is
simple enough that we didn't hit these code patterns.
Anyways, the main change here is that we now eliminate control equivalence
issues arising from @owned values that are not control equivalent being
forwarded into aggregates and phis. This would cause our outer value to be
destroyed early, since we would be destroying it once for every iteration of a
loop.
The way this is done (noting that this is only for borrows today):
* The AvailableValueAggregator always copies values at available value points to
lifetime extend the values to the load block. In the load block, the
aggregator takes the incoming values and borrows them to form tuples and
structs as needed. This new guaranteed aggregate is then copied. If we do not
need to form an aggregate, we just copy. Since the copy is in our load block,
we know that we can use it for our load_borrow. Importantly, note that we no
longer allow these aggregates to forward owned ownership, preventing the
control equivalence problem above.
* Since the aggregator may use the SSA updater if it finds multiple available
values, we also need to make sure that any phis are strongly control
equivalent on all of their incoming values. So we insert copies along those
edges and then lifetime extend the phi as appropriate.
* Since, in all of the above, the copy_values inserted by the available value
aggregator are never consumed, we can just add destroy_values in the
appropriate places and everything works.
The reason why this is true is that we insert new copy_values for each
available value, implying that we can never have such a singular value. I also
added two test cases showing that the case where we do have a singular value of
trivial type works.
The key thing to note here is that when we are processing a take, we validate
that we have a fully available value at all destroy points. So we should never
hit this code path, which is correct. This assert just makes the assumption
clearer in the small when reading the code locally in handlePrimitiveValue.
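A hedged sketch of what such a local assert might look like (the function and
parameter names are placeholders, not the actual ones in handlePrimitiveValue):
```
#include <cassert>

void handlePrimitiveValueLike(bool isTake) {
  // When performing a take, the pass has already validated that every
  // destroy point sees a fully available value, so this path is
  // unreachable for takes; the assert documents that assumption locally.
  assert(!isTake && "takes always have fully available values at destroys");
  // ... copy-based handling for the non-take case ...
}
```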
Previously, we specified only via a boolean value whether or not we were
performing a take. Now we have an enum, which is more specific and
descriptive. It will also let me fix an issue that occurs with Borrow/Copy
incrementally, as separate patches.
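As a rough sketch of the shape of this change (the enum name and cases below
are placeholders, not necessarily the actual ones in the patch):
```
// Before: a bare boolean, easy to misread at call sites.
// void aggregateValues(bool isTake);

// After: a dedicated enum that can also distinguish the Borrow and
// Copy cases that follow-up patches need to treat differently.
enum class ExpectedOwnership { Take, Borrow, Copy };

void aggregateValues(ExpectedOwnership ownership) {
  switch (ownership) {
  case ExpectedOwnership::Take:   /* ... */ break;
  case ExpectedOwnership::Borrow: /* ... */ break;
  case ExpectedOwnership::Copy:   /* ... */ break;
  }
}
```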
This is a pure refactor so I can make some subsequent transformations easier to
do.
I also did a little bit of debridement where there were opportunities to
flatten control flow by eliminating direct checks for one class or the other.
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
This commit only changes how BranchPropagatedUser is constructed and does not
change the internal representation. This is a result of my noticing that
BranchPropagatedUser could also use an operand internally to represent its
state. To simplify how I am making the change, I am splitting the change into
two PRs that should be easy to validate:
1. A commit that maps how the various users of BranchPropagatedUser have been
constructing BPUs to a single routine that takes an Operand. This leaves
BranchPropagatedUser's internal state alone, as well as its user, the Linear
Lifetime Checker.
2. A second commit that changes the internal bits of the BranchPropagatedUser to
store an Operand instead of a PointerUnion.
This will allow me to use the first commit to validate the second.
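A minimal C++ sketch of this two-step pattern, with simplified stand-in types
(not the real BranchPropagatedUser or Operand classes):
```
// Stand-ins: an operand knows the instruction that uses it.
struct Instruction;
struct Operand {
  Instruction *user;
  Instruction *getUser() const { return user; }
};

struct BranchPropagatedUser {
  // Step 1 (this commit): funnel every construction site through one
  // routine taking an Operand, leaving the internal state untouched.
  static BranchPropagatedUser create(const Operand &op) {
    return BranchPropagatedUser{op.getUser()};
  }

  // Step 2 (follow-up commit): only this representation changes, to
  // store the Operand itself instead of a PointerUnion.
  Instruction *inst;
};
```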
This is just a simple refactoring commit in preparation for hiding more of the
details of the linear lifetime checker. This is NFC, just moving around code.
This mostly requires changing various entry points to pass around a
TypeConverter instead of a SILModule. I've left behind entry points
that take a SILModule for a few methods like SILType::subst() to
avoid creating even more churn.
An example would be a case where we dynamically control whether the load [take]
occurs, i.e. the load [take] happens under conditional control flow.
This was exposed by the throwing_initializer and failable_initializer
interpreter tests in the test of PolarBear(n:before:during).
I am doing this to eliminate some differences in codegen before/after ownership
serialization. It just means that fewer of the tests need to be touched when I
flip the switch.
Specifically, today this change allows us to handle certain cases where there is
a dead allocation being used to pass around a value at +1 by performing a load
[take] and then storing a value back into the memory. The general format is an
allocation that only has stores, load [take], and destroy_addr users. Consider
the following SIL:
```
store %x to [init] %mem (0)
%xhat = load [take] %mem (1)
%xhat_cast = apply %f(%xhat) (2)
store %xhat_cast to [init] %mem (3)
destroy_addr %mem
```
Notice how assuming that we can get rid of the store, we can perform the
following store -> load forwarding:
```
%xhat_cast = apply %f(%x) (2)
store %xhat_cast to [init] %mem (3)
destroy_addr %mem
```
In contrast, notice how we get an ownership violation (a double consume of %x
by (0) and (2)) if we cannot get rid of the store:
```
store %x to [init] %mem (0)
%xhat_cast = apply %f(%x) (2)
store %xhat_cast to [init] %mem (3)
destroy_addr %mem
```
This is in fact the same condition as for promoting a destroy_addr, since a
destroy_addr is a load [take] + destroy_value. So I was able to generalize the
destroy_addr code to handle this case.
This is a large patch; I couldn't split it up further while still
keeping things working. There are four things being changed at
once here:
- Places that call SILType::isAddressOnly()/isLoadable() now call
the SILFunction overload and not the SILModule one.
- SILFunction's overloads of getTypeLowering() and getLoweredType()
now pass the function's resilience expansion down, instead of
hardcoding ResilienceExpansion::Minimal.
- Various other places with '// FIXME: Expansion' now use a better
resilience expansion.
- A few tests were updated to reflect SILGen's improved code
generation, and some new tests are added to cover more code paths
that previously were uncovered and only manifested themselves as
standard library build failures while I was working on this change.
The ownership kind is Any for trivial types, or Owned otherwise, but
whether a type is trivial or not will soon depend on the resilience
expansion.
This means that a SILModule now uniques two SILUndefs per type instead
of one, and serialization uses two distinct sentinel IDs for this
purpose as well.
For now, the resilience expansion is not actually used here, so this
change is NFC, other than changing the module format.
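A toy sketch of the uniquing change, with placeholder types standing in for the
real SILModule machinery:
```
#include <map>
#include <memory>
#include <utility>

enum class OwnershipKind { Any, Owned };
struct Type {
  int id;
  bool operator<(Type other) const { return id < other.id; }
};
struct Undef {
  Type type;
  OwnershipKind ownership;
};

struct Module {
  // Previously keyed by type alone; now each (type, ownership kind)
  // pair is uniqued separately, so a module can hold up to two
  // undefs per type.
  std::map<std::pair<Type, OwnershipKind>, std::unique_ptr<Undef>> undefs;

  Undef *getUndef(Type type, OwnershipKind kind) {
    auto &slot = undefs[{type, kind}];
    if (!slot)
      slot = std::make_unique<Undef>(Undef{type, kind});
    return slot.get();
  }
};
```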
This reduces the diff between -Onone output when stripping before/after
serialization.
We support load_borrow by translating it to the load [copy] case. Specifically,
for +1, we normally perform the following transform.
```
store %1 to [init] %0
...
%2 = load [copy] %0
...
use(%2)
...
destroy_value %2

=>

%1a = copy_value %1
store %1 to [init] %0
...
use(%1a)
...
destroy_value %1a
```
We analogously can optimize load_borrow by replacing the load with a
begin_borrow:
```
store %1 to [init] %0
...
%2 = load_borrow %0
...
use(%2)
...
end_borrow %2

=>

%1a = copy_value %1
store %1 to [init] %0
...
%2 = begin_borrow %1a
...
use(%2)
...
end_borrow %2
destroy_value %1a
```
A store from outside a loop being used by a load_borrow inside the loop is
handled by a similar transformation to the +1 version, except that we use a
begin_borrow inside the loop instead of a copy_value (making it even more
efficient).
Specifically:
```
bb0:
br bb1
bb1:
cond_br ..., bb2, bb3
bb2:
br bb1
bb3:
return
```
What would happen in this case is that we wouldn't revisit bb1 after we found
the double consume, and so would never grab the leak (we would early exit as
well). So we would not insert a destroy for the out-of-loop value, causing a
leak from the perspective of the ownership checker. This was due to us early
exiting and also due to us not revisiting bb1 after we went around the backedge
from bb2 -> bb1. Now, when we catch a double consume in a block we have already
visited, we instead check whether any of that block's successors have not been
visited yet; any unvisited successors are added to the list of blocks we still
need to visit.
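A minimal sketch of that worklist fix, using simplified block/CFG types in
place of the checker's real state (names here are illustrative only):
```
#include <set>
#include <vector>

struct Block {
  std::vector<Block *> successors;
};

// When a double consume is found in a block we have already visited,
// do not early-exit past its unvisited successors: re-enqueue them so
// that out-of-loop values still get their compensating destroys.
void handleDoubleConsume(Block *block, const std::set<Block *> &visited,
                         std::vector<Block *> &worklist) {
  for (Block *succ : block->successors)
    if (!visited.count(succ))
      worklist.push_back(succ);
}
```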
I am starting to use the linear lifetime checker in an optimizer role where it
no longer asserts but instead tells the optimizer pass what is needed to cause
the lifetime to be linear. To do so I need to be able to return richer
information to the caller such as whether or not a leak, double consume, or
use-after-free occurs.
We already perform the SSA updater optimization when we have an available value
in the same block. Previously, if we did not have such an available value, we
would insert the phis even when all of the available values were the same. The
small change here just fixes that issue.
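A hedged sketch of the check (a stand-in value type; the real pass compares
SILValues before falling back to the SSA updater):
```
#include <vector>

using Value = int;

// If every available value is the same value, use it directly; only
// fall back to the SSA updater (which may insert phis) when the
// available values actually differ.
bool allAvailableValuesEqual(const std::vector<Value> &values) {
  if (values.empty())
    return true;
  for (Value v : values)
    if (v != values.front())
      return false;
  return true;
}
```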
I also translated predictable_deadalloc_elim.sil into an ownership test and
added more tests that double checks the ownership specific functionality. This
change should be NFC without ownership.
This is NFC. I am going to use this check in another part of the code to verify
that we can split up destroy_addr without needing to destructure available
values.
With ownership, PMO needs to insert copies before the stores that provide an
available value. Usually, we are only using the available value along a single
path. This change ensures that if, in that situation, we have "leaking paths"
besides that single path, PMO inserts compensating destroys.
NOTE: Without ownership enabled this is NFC.
When ownership is disabled this is NFC.
NOTE: In terms of testing, I am trying to get everything into place in
preparation for landing a complete ownership translation of the pmo tests. I
need to get enough in place for the full test translation to work. In the
meantime, I am going to start adding memaccess specific stuff in its own
ownership file.
The reason this is true is that an assign is an instruction in PMO parlance that
must destroy the value stored into memory at that location previously. So PMO
would need to be taught to ensure that said destroy is promoted. Consider the
following example:
```
%0 = alloc_stack $Foo
store %1 to [init] %0 : $*Foo
store %2 to [assign] %0 : $*Foo
destroy_addr %0 : $*Foo
dealloc_stack %0 : $*Foo
```
If PMO were to try to eliminate the alloc_stack as PMO is written today, one
would have:
```
destroy_value %2 : $Foo
```
That is clearly wrong.
PMO uses InitOrAssign for trivially typed things and Init/Assign for
non-trivial things, so I think this was an oversight from a long time ago. There
is actually no /real/ effect on the code today, since after exploding the
copy_addr, the store will still be used to produce the right available value,
and since for stores, init/assign/initorassign all result in allocations being
removed. Once I change assign to not allow allocation removal (the proper way to
model this), though, certain trivial allocations would no longer be removed
without this change, harming perf as seen via the benchmarking run on the bots
in #21918.
This models the semantics we are trying to have here more closely without
changing current output. Specifically, an assign in this case is supposed to
mean that the underlying value is overwritten and destroyed (via a ref count
decrement). This is obviously true with the non-init copy_addr form, i.e.:
```
%0 = alloc_stack $Builtin.NativeObject
%1 = alloc_stack $Builtin.NativeObject
...
copy_addr [take] %0 to %1 : $*Builtin.NativeObject
...
```
Notice how the instruction is actively going to destroy whatever is in
%1. Let's consider the same SIL after exploding the copy_addr.
```
%0 = alloc_stack $Builtin.NativeObject
%1 = alloc_stack $Builtin.NativeObject
...
%2 = load %0 : $*Builtin.NativeObject
%3 = load %1 : $*Builtin.NativeObject (1)
store %2 to %1 : $*Builtin.NativeObject
destroy_value %3 : $Builtin.NativeObject (2)
...
```
In this case, the store is actually acting like an initialization since the
destructive part of the assign is performed by (1), (2). So considering the
store to be an assign is incorrect since a store that is an assign /should/
destroy the underlying value itself.
In terms of the actual effect on the result of the pass today, Initialization
and Assign stores are treated the same when it comes to getting available values
from stores and (for stores) destroying allocations. So this should be an NFC
change, but it gets us closer to the model where assigns are only used for
things that store into memory and destroy the underlying value directly.
I am removing these for the following reasons:
* PMO does not have any tests for these code paths. (1).
* PMO does not try to promote these loads (it explicitly pattern matches load,
copy_addr) or get available values from these (it explicitly pattern matches
store or explodes a copy_addr to get the copy_addr's stores). This means that
removing this code will not affect our constant propagation diagnostics. So,
removing this untested code path at worst could cause us to no longer
eliminate some dead objects that we otherwise would be able to eliminate at
-Onone (low-priority). (2).
----
(1). I believe that the lack of PMO tests is due to this being a vestigial
remnant of DI code in PMO. My suspicion arises since:
* The code was added when the two passes were both sharing the same use
collector and auxiliary data structures. Since then I have changed DI/PMO
to each have their own copies.
* DI has a bunch of tests that verify behavior around these instructions.
(2). I expect the number of actually removed allocations that are no longer
removed should be small since we do not promote loads from such allocations
and PMO will not eliminate an allocation that has any loads.
PartialStore is a PMOUseKind that is a vestigial remnant of Definite Init in the
PMO source. This can be seen by noting that in Definite Init, PartialStore is
how Definite Init diagnoses partially initialized values and errors. In contrast
in PMO the semantics of PartialStore are:
1. It can only be produced if we have a raw store use or a copy_addr.
2. We allow for the use to provide an available value just like if it was an
assign or an init.
3. We ignore it for the purposes of removing store only allocations since by
itself without ownership, stores (and stores from exploded copy_addr) do not
affect ownership in any way.
Rather than keeping this around, in this commit I remove it, since it doesn't
provide any additional value over [init] or [assign]. Functionally there should
be no change.
I discovered while updating PMO for ownership that for ~5 years there has been a
bug where we were treating copy_addr of trivial values like an "Assign" (in PMO
terminology) of a non-trivial value and thus stopping allocation
elimination. When I fixed this I discovered that this caused us to no longer
emit diagnostics in a predictable way. Specifically, consider the following
swift snippet:
```
var _: UInt = (-1) >> 0
```
Today, we emit a diagnostic that -1 cannot be put into a UInt. This occurs
because, even though the underlying allocation is only stored into, the
copy_addr assign keeps it alive, causing the diagnostics pass to see the
conversion. With my fix, though, we see that we are only storing into the
allocation, so the allocation is eliminated before the constant propagation
diagnostic pass runs, and the diagnostic is no longer emitted.
We should truly not be performing this type of DCE before we emit such
diagnostics. So in this commit, I split the pass into two parts:
1. A load promotion pass that performs the SSA formation needed for SSA based
diagnostics to actually work.
2. An allocation elimination pass that runs /after/ SSA based diagnostics.
This should be NFC since the constant propagation SSA based diagnostics do not
create memory operations so the output should be the same.
The main contribution of this commit is that I refactored out a helper class
for managing the data used when promoting destroy_addr. This enabled me to add
a getData method on the helper class that simplifies iterating over a
destroy_addr and its available values.
TLDR: This does not eliminate the struct/tuple flat namespace from Predictable
Mem Opts. Just the tuple specific flat namespace code from PMOMemoryUseCollector
that we were computing and then throwing away. I explain below in more detail.
First note that this is cruft from when def-init and pmo were one pass. What we
were doing here was maintaining a flattened tuple namespace while collecting
uses in PMOMemoryUseCollector. We never actually used it for anything, since we
recomputed this information, including information about structs, in PMO itself!
So this information was truly completely dead.
This commit removes that and related logic, and from a maintenance standpoint
makes PMOMemoryUseCollector a simple visitor that doesn't have any real special
logic in it beyond the tuple scalarization.
Specifically, we are putting dealloc_stack, destroy_box into the Releases array
in PMOMemoryUseCollector only to ignore them in the only place that we use the
Releases array in PredictableMemOpts.
These are vestigial remnants of the code from when it needed to support DI and
PredMemOpts. Since both of the passes have been split, these are now dead in
PMO. So eliminate them.
This is in preparation for verifying that when ownership verification is enabled
that only enums and trivial values can have any ownership. I am doing this in
preparation for eliminating ValueOwnershipKind::Trivial.
rdar://46294760