The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
This commit only changes how BranchPropagatedUser is constructed and does not
change the internal representation. This is a result of my noticing that
BranchPropagatedUser could also use an operand internally to represent its
state. To simplify how I am making the change, I am splitting the change into
two PRs that should be easy to validate:
1. A commit that maps how the various users of BranchPropagatedUser have been
constructing BPUs to a single routine that takes an Operand. This leaves
BranchPropagatedUser's internal state alone, as well as its user, the Linear
Lifetime Checker.
2. A second commit that changes the internal bits of the BranchPropagatedUser to
store an Operand instead of a PointerUnion.
This will allow me to use the first commit to validate the second.
This is just a simple refactoring commit in preparation for hiding more of the
details of the linear lifetime checker. This is NFC, just moving around code.
This mostly requires changing various entry points to pass around a
TypeConverter instead of a SILModule. I've left behind entry points
that take a SILModule for a few methods like SILType::subst() to
avoid creating even more churn.
An example would be a case where we dynamically control whether the load [take]
occurs, i.e. the load [take] happens under conditional control flow.
This was exposed by the throwing_initializer and failable_initializer
interpreter tests, in the PolarBear(n:before:during) test.
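As a minimal sketch of the shape of SIL involved (all values, types, and blocks
here are hypothetical, not taken from the actual tests):
```
store %x to [init] %mem : $*Foo
cond_br %cond, bb1, bb2

bb1:                         // take path: the load [take] only happens here
  %v = load [take] %mem : $*Foo
  destroy_value %v : $Foo
  br bb3

bb2:                         // no-take path: memory still owns the value
  destroy_addr %mem : $*Foo
  br bb3
```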
I am doing this to eliminate some differences in codegen before/after
serialization ownership. It just means fewer of the tests need to be touched when
I flip the switch.
Specifically, today this change allows us to handle certain cases where there is
a dead allocation being used to pass around a value at +1 by performing a load
[take] and then storing a value back into the memory. The general format is an
allocation that only has stores, load [take], and destroy_addr users. Consider
the following SIL:
```
store %x to [init] %mem (0)
%xhat = load [take] %mem (1)
%xhat_cast = apply %f(%xhat) (2)
store %xhat_cast to [init] %mem (3)
destroy_addr %mem
```
Notice how, assuming that we can get rid of the store, we can perform the
following store -> load forwarding:
```
%xhat_cast = apply %f(%x) (2)
store %xhat_cast to [init] %mem (3)
destroy_addr %mem
```
In contrast, notice how we get an ownership violation (a double consume of %x by
(0) and (2)) if we cannot get rid of the store:
```
store %x to [init] %mem (0)
%xhat_cast = apply %f(%x) (2)
store %xhat_cast to [init] %mem (3)
destroy_addr %mem
```
This is in fact the same condition as for promoting a destroy_addr, since a
destroy_addr is equivalent to a load [take] + destroy_value. So I was able to
generalize the destroy_addr code to handle this case.
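For reference, the equivalence relied on here is the standard expansion of a
destroy_addr of a loadable type:
```
destroy_addr %mem
=>
%v = load [take] %mem
destroy_value %v
```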
This is a large patch; I couldn't split it up further while still
keeping things working. There are four things being changed at
once here:
- Places that call SILType::isAddressOnly()/isLoadable() now call
the SILFunction overload and not the SILModule one.
- SILFunction's overloads of getTypeLowering() and getLoweredType()
now pass the function's resilience expansion down, instead of
hardcoding ResilienceExpansion::Minimal.
- Various other places with '// FIXME: Expansion' now use a better
resilience expansion.
- A few tests were updated to reflect SILGen's improved code
  generation, and some new tests were added to cover code paths
  that were previously uncovered and only manifested as standard
  library build failures while I was working on this change.
The ownership kind is Any for trivial types, or Owned otherwise, but
whether a type is trivial or not will soon depend on the resilience
expansion.
This means that a SILModule now uniques two SILUndefs per type instead
of one, and serialization uses two distinct sentinel IDs for this
purpose as well.
For now, the resilience expansion is not actually used here, so this
change is NFC, other than changing the module format.
This reduces the diff between -Onone output when stripping before vs. after
serialization.
We support load_borrow by translating it to the load [copy] case. Specifically,
for +1, we normally perform the following transform:
```
store %1 to [init] %0
...
%2 = load [copy] %0
...
use(%2)
...
destroy_value %2
=>
%1a = copy_value %1
store %1 to [init] %0
...
use(%1a)
...
destroy_value %1a
```
We can analogously optimize load_borrow by replacing the load with a
begin_borrow:
```
store %1 to [init] %0
...
%2 = load_borrow %0
...
use(%2)
...
end_borrow %2
=>
%1a = copy_value %1
store %1 to [init] %0
...
%2 = begin_borrow %1a
...
use(%2)
...
end_borrow %2
destroy_value %1a
```
A store outside a loop whose value is used by a load_borrow inside the loop is
handled by a transformation similar to the +1 version above, except that we use
a begin_borrow inside the loop instead of a copy_value (making it even more
efficient). Specifically:
```
bb0:
  br bb1

bb1:
  cond_br ..., bb2, bb3

bb2:
  br bb1

bb3:
  return
```
What would happen in this case is that we wouldn't revisit bb1 after we found
the double consume, even though revisiting it is what would catch the leak (and
we would early-exit as well). So we would not insert a destroy for the
out-of-loop value, causing a leak from the perspective of the ownership
checker. This was due to us early-exiting and also due to us not revisiting bb1
after we went around the backedge from bb2 -> bb1. Now, when we catch a double
consume in a block we have already visited, we instead check whether any of
that block's successors have not yet been visited; if so, we add them to the
list of blocks we need to visit.
I am starting to use the linear lifetime checker in an optimizer role where it
no longer asserts but instead tells the optimizer pass what is needed to cause
the lifetime to be linear. To do so I need to be able to return richer
information to the caller such as whether or not a leak, double consume, or
use-after-free occurs.
We already do the SSA updater optimization if we have an available value in the
same block. If we do not have such an available value, we were inserting phis
even when all of the available values were the same. The small change here just
fixes that issue.
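A minimal sketch of the affected case (hypothetical SIL): the available value
arrives from multiple predecessors, but it is the same value along every path,
so no phi is needed:
```
bb1:
  store %x to [init] %mem : $*Foo
  br bb3

bb2:
  store %x to [init] %mem : $*Foo
  br bb3

bb3:
  // The available value is %x on both incoming edges, so the
  // promoted load can just use %x; no phi argument on bb3 is needed.
  %v = load [copy] %mem : $*Foo
```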
I also translated predictable_deadalloc_elim.sil into an ownership test and
added more tests that double-check the ownership specific functionality. This
change should be NFC without ownership.
This is NFC. I am going to use this check in another part of the code to verify
that we can split up destroy_addr without needing to destructure available
values.
With ownership PMO needs to insert copies before the stores that provide an
available value. Usually, we are only using the available value along a single
path. This change ensures that if, in that situation, there are "leaking paths"
besides that single path, PMO inserts compensating destroys.
NOTE: Without ownership enabled this is NFC.
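A minimal sketch of that situation (hypothetical SIL), showing the compensating
destroy on the leaking path:
```
  %x_copy = copy_value %x : $Foo     // copy inserted before the store
  store %x to [init] %mem : $*Foo
  cond_br %cond, bb1, bb2

bb1:                                 // the single path that uses the available value
  use(%x_copy)                       // consuming use of the available value
  br bb3

bb2:                                 // "leaking path": compensating destroy inserted
  destroy_value %x_copy : $Foo
  br bb3
```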
NOTE: In terms of testing, I am trying to get everything into place in
preparation for landing a complete ownership translation of the pmo tests. I
need to get enough in place for the full test translation to work. In the
meantime, I am going to start adding memaccess-specific stuff in its own ownership
file.
The reason this is true is that an assign, in PMO parlance, is an instruction
that must destroy the value previously stored into memory at that location. So PMO
would need to be taught to ensure that said destroy is promoted. Consider the
following example:
```
%0 = alloc_stack $Foo
store %1 to [init] %0 : $*Foo
store %2 to [assign] %0 : $*Foo
destroy_addr %0 : $*Foo
dealloc_stack %0 : $*Foo
```
If PMO were to try to eliminate the alloc_stack as PMO is written today, one
would have:
```
destroy_value %2 : $Foo
```
That is clearly wrong: %1 is leaked, since the destroy implied by the [assign]
has been dropped.
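A correct elimination would also have to materialize the destroy implied by the
[assign], roughly (sketch):
```
destroy_value %1 : $Foo   // the [assign]'s implicit destroy of the old value
destroy_value %2 : $Foo   // from the destroy_addr
```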
PMO uses InitOrAssign for trivially typed things and Init/Assign for non-trivial
things, so I think this was an oversight from a long time ago. There is actually
no /real/ effect on the code today since after exploding the copy_addr, the
store will still be used to produce the right available value and since for
stores, init/assign/initorassign all result in allocations being removed. Once I
change assign to not allow for allocation removal (the proper way to model
this), though, certain trivial allocations would no longer be removed without
this change, harming perf as seen via the benchmarking run on the bots in #21918.
This models the semantics we are trying to have here more closely without
changing current output. Specifically, an assign in this case is supposed to
mean that the underlying value is overwritten and destroyed (via a ref count
decrement). This is obviously true with the non-init copy_addr form, i.e.:
```
%0 = alloc_stack $Builtin.NativeObject
%1 = alloc_stack $Builtin.NativeObject
...
copy_addr [take] %0 to %1 : $*Builtin.NativeObject
...
```
Notice how the instruction is actively going to destroy whatever is in %1. Let's
consider the same SIL after exploding the copy_addr:
```
%0 = alloc_stack $Builtin.NativeObject
%1 = alloc_stack $Builtin.NativeObject
...
%2 = load %0 : $*Builtin.NativeObject
%3 = load %1 : $*Builtin.NativeObject (1)
store %2 to %1 : $*Builtin.NativeObject
destroy_value %3 : $Builtin.NativeObject (2)
...
```
In this case, the store is actually acting like an initialization since the
destructive part of the assign is performed by (1), (2). So considering the
store to be an assign is incorrect since a store that is an assign /should/
destroy the underlying value itself.
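For contrast, a store that truly is an assign performs that destruction itself;
in ownership SIL, a store [assign] is equivalent to:
```
store %2 to [assign] %1
=>
%old = load [take] %1
store %2 to [init] %1
destroy_value %old
```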
In terms of the actual effect on the result of the pass today, Initialization
and Assign stores are treated the same when it comes to getting available values
from stores and (for stores) destroying allocations. So this should be an NFC
change, but gets us closer to the model where assigns are only used for things
that store into memory and destroy the underlying value directly.
I am removing these for the following reasons:
* PMO does not have any tests for these code paths. (1).
* PMO does not try to promote these loads (it explicitly pattern matches load,
copy_addr) or get available values from these (it explicitly pattern matches
store or explodes a copy_addr to get the copy_addr's stores). This means that
removing this code will not affect our constant propagation diagnostics. So,
removing this untested code path at worst could cause us to no longer
eliminate some dead objects that we otherwise would be able to eliminate at
-Onone (low-priority). (2).
----
(1). I believe that the lack of PMO tests is due to this being a vestigial
remnant of DI code in PMO. My suspicion arises since:
* The code was added when the two passes were both sharing the same use
collector and auxiliary data structures. Since then I have changed DI/PMO
to each have their own copies.
* DI has a bunch of tests that verify behavior around these instructions.
(2). I expect the number of actually removed allocations that are no longer
removed should be small since we do not promote loads from such allocations
and PMO will not eliminate an allocation that has any loads.
PartialStore is a PMOUseKind that is a vestigial remnant of Definite Init in the
PMO source. This can be seen by noting that in Definite Init, PartialStore is
how Definite Init diagnoses partially initialized values and errors. In contrast
in PMO the semantics of PartialStore are:
1. It can only be produced if we have a raw store use or a copy_addr (see the
   sketch after this list).
2. We allow for the use to provide an available value just like if it was an
assign or an init.
3. We ignore it for the purposes of removing store only allocations since by
itself without ownership, stores (and stores from exploded copy_addr) do not
affect ownership in any way.
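As a hedged sketch (hypothetical SIL) of a use that gets classified as a
PartialStore: a store that initializes only part of the tracked allocation:
```
%0 = alloc_stack $(Foo, Foo)
%1 = tuple_element_addr %0 : $*(Foo, Foo), 0
store %x to %1 : $*Foo      // covers only element 0 of %0 => PartialStore
```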
Rather than keeping this around, in this commit I remove it, since it doesn't
provide any additional value over [init] or [assign]. Functionally there should be
no change.
I discovered while updating PMO for ownership that for ~5 years there has been a
bug where we were treating copy_addr of trivial values like an "Assign" (in PMO
terminology) of a non-trivial value and thus stopping allocation
elimination. When I fixed this I discovered that this caused us to no longer
emit diagnostics in a predictable way. Specifically, consider the following
Swift snippet:
```
var _: UInt = (-1) >> 0
```
Today, we emit a diagnostic that -1 cannot be put into a UInt. This occurs
because, even though the underlying allocation is only stored into, the
copy_addr assign keeps it alive, causing the diagnostics pass to see the
conversion. With my fix though, we see that we are only storing into the
allocation, so the allocation is eliminated before the constant propagation
diagnostic pass runs, and the diagnostic is no longer emitted.
We should truly not be performing this type of DCE before we emit such
diagnostics. So in this commit, I split the pass into two parts:
1. A load promotion pass that performs the SSA formation needed for SSA based
diagnostics to actually work.
2. An allocation elimination pass that runs /after/ SSA based diagnostics.
This should be NFC since the constant propagation SSA based diagnostics do not
create memory operations so the output should be the same.
The main contribution of this commit is that I refactored out a helper class for
managing the data used when promoting destroy_addr. This enabled me to add a
getData method on the helper class to simplify iterating over a destroy_addr
and its available values.
TLDR: This does not eliminate the struct/tuple flat namespace from Predictable
Mem Opts. Just the tuple specific flat namespace code from PMOMemoryUseCollector
that we were computing and then throwing away. I explain below in more detail.
First note that this is cruft from when def-init and pmo were one pass. What we
were doing here was maintaing a flattened tuple namespace while we were
collecting uses in PMOMemoryUseCollector. We never actually used them for
anything since we recomputed this information including information about
structs in PMO itself! So this information was truly completely dead.
This commit removes that and related logic and from a maintenance standpoint
makes PMOMemoryUseCollector a simple visitor that doesn't have any real special
logic in it beyond the tuple scalarization.
Specifically, we are putting dealloc_stack, destroy_box into the Releases array
in PMOMemoryUseCollector only to ignore them in the only place that we use the
Releases array in PredictableMemOpts.
These are vestigial remnants of the code when it needed to support DI and
PredMemOpts. Since both of the passes have been split, these are now dead in
PMO. So eliminate them.
This is in preparation for verifying that when ownership verification is enabled
that only enums and trivial values can have any ownership. I am doing this in
preparation for eliminating ValueOwnershipKind::Trivial.
rdar://46294760
This makes the use of the used attribute more portable.
`__attribute__((__used__))` is not accepted by Visual Studio, but LLVM
conveniently provides the `LLVM_ATTRIBUTE_USED` macro to enable this.
DI and predictable mem opts have been split for a long time and their
subroutines aren't going to be joined in the future... so replace the DI
prefixes in the pred-mem-opts impl with PMO and rename DIMemoryUseCollector =>
PMOUseCollector.
Been sitting on this for a long time... just happy to get it in.
Until the beginning of the ownership transition, DI and predictable mem opts
used the same memory use collector. I split them partially since I need to turn
on ownership for predictable mem opts at some point, but also b/c there was a huge
amount of special code that would only trigger if it was used by DI or used by
predictable mem opts. After I did the copy some of the asserts that were needed
for DI remained in the predictable mem opts code. When pred-memopts was only run
in the mandatory pipeline, keeping these assertions was ok, but pred-memopts was
recently added to the perf pipeline meaning that it may see code that breaks
these DI invariants (and thus hit this assertion).
We should remove this limitation on predictable-memopts but that would require
some scheduled time to read the code (more than I have to fix this bug = p). So
instead I changed the code to just bail in these cases.
rdar://40032102
This mandatory pass inserts struct_extract operations before earlier
stores to the aggregate but didn't set the debug location of those new
instructions to the location of the store, which caused unexpected
stepping behavior in the debugger.
Fixes a regression from e74367f2b3.
<rdar://problem/35459092>