This reduces the diff between -Onone output when stripping before serialization
and when stripping after serialization.
We support load_borrow by translating it to the load [copy] case. Specifically,
for +1, we normally perform the following transform.
store %1 to [init] %0
...
%2 = load [copy] %0
...
use(%2)
...
destroy_value %2
=>
%1a = copy_value %1
store %1 to [init] %0
...
use(%1a)
...
destroy_value %1a
We can analogously optimize load_borrow by replacing the load with a
begin_borrow:
store %1 to [init] %0
...
%2 = load_borrow %0
...
use(%2)
...
end_borrow %2
=>
%1a = copy_value %1
store %1 to [init] %0
...
%2 = begin_borrow %1a
...
use(%2)
...
end_borrow %2
destroy_value %1a
Handling a store from outside a loop that is used by a load_borrow inside the
loop is a similar transformation to the +1 version above, except that we use a
begin_borrow inside the loop instead of a copy_value (making it even more
efficient).
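A rough sketch of that loop case, with hypothetical block names (the copy is
created once outside the loop, and only a borrow scope remains inside it):
%1a = copy_value %1
store %1 to [init] %0
br loop
loop:
%2 = begin_borrow %1a
use(%2)
end_borrow %2
cond_br ..., loop, exit
exit:
destroy_value %1a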
Besides fixing the compiler crash, this change also improves the stack-nesting correction mechanism in the inliners:
* Instead of trying to correct the nesting after each inlining of a callee, correct the nesting once when inlining is finished for a caller function.
This fixes a potential compile-time problem, because StackNesting iterates over the whole function.
In the worst case this can lead to quadratic behavior when many begin_apply instructions with overlapping stack locations are inlined (a hypothetical sketch follows this list).
* Because we are doing it only once per caller, we can remove the complex logic for checking whether it is necessary.
We can just do it unconditionally whenever any coroutine gets inlined.
The inliners iterate over all instructions of a function anyway, so this does not increase the computational complexity (StackNesting is roughly linear in the number of instructions).
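For illustration, a hypothetical sketch (made-up names and types) of the kind
of nesting violation that can arise after inlining a coroutine:
(%v, %tok) = begin_apply %coro() : $@yield_once @convention(thin) () -> @yields @inout T
%s = alloc_stack $T
end_apply %tok          // closes the coroutine's stack location first...
dealloc_stack %s : $*T  // ...so the inner alloc_stack outlives it: not properly nested
=>
(%v, %tok) = begin_apply %coro() : $@yield_once @convention(thin) () -> @yields @inout T
%s = alloc_stack $T
dealloc_stack %s : $*T  // StackNesting restores LIFO order of stack locations
end_apply %tok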
rdar://problem/47615442
This disables a bunch of passes when ownership is enabled. This will allow me to
keep transparent functions in ossa and have them skip most of the performance
pipeline without being touched by passes that have not been updated for
ownership. This is important so that, in -Onone code, we can import transparent
functions and inline them into other ossa functions (you can't inline from
ossa => non-ossa).
Specifically:
bb0:
br bb1
bb1:
cond_br ..., bb2, bb3
bb2:
br bb1
bb3:
return
What would happen in this case is that we wouldn't revisit bb1 after we found
the double consume, so we would miss the leak (and would exit early as
well). So we would not insert a destroy for the out-of-loop value, causing a
leak from the perspective of the ownership checker. This was due to us exiting
early and also due to us not revisiting bb1 after we went around the backedge
from bb2 -> bb1. Now what we do instead is: if we catch a double consume in a
block we have already visited, we check whether any of that block's successors
have not been visited yet. If so, we add those successors to the list of
blocks we still need to visit.
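In the optimizer role, once bb3 is added back to the visit list, the leak can
be reported there and a compensating destroy inserted, roughly (with a
hypothetical value %v):
bb3:
destroy_value %v   // compensating destroy for the value live around the loop
return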
I am starting to use the linear lifetime checker in an optimizer role where it
no longer asserts but instead tells the optimizer pass what is needed to cause
the lifetime to be linear. To do so I need to be able to return richer
information to the caller such as whether or not a leak, double consume, or
use-after-free occurs.
We already do the SSA updater optimization if we have an available value in the
same block. If we do not have such an available value, we insert phis even when
all of the available values are the same. The small change here fixes that
issue; see the sketch below.
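A hypothetical sketch (made-up types), where both predecessors would have fed
the same available value %v into a phi:
bb1:
br bb3(%v : $T)
bb2:
br bb3(%v : $T)
bb3(%phi : $T):
use(%phi)
=>
bb3:
use(%v)            // no phi needed; all available values were identical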
I also translated predictable_deadalloc_elim.sil into an ownership test and
added more tests that double checks the ownership specific functionality. This
change should be NFC without ownership.
This is NFC. I am going to use this check in another part of the code to verify
that we can split up destroy_addr without needing to destructure available
values.
With ownership, PMO needs to insert copies before the stores that provide an
available value. Usually, we only use the available value along a single
path. This change ensures that if, in that situation, we have "leaking paths"
besides that single path, PMO inserts compensating destroys on them.
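A hypothetical sketch of that situation (%c is the copy inserted before the
store that provides the available value; bb2 is a leaking path):
%c = copy_value %v
store %v to [init] %addr
cond_br ..., bb1, bb2
bb1:               // the single path that actually uses the available value
use(%c)
destroy_value %c
br bb3
bb2:               // "leaking path": PMO now inserts a compensating destroy
destroy_value %c
br bb3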
NOTE: Without ownership enabled this is NFC.
When ownership is disabled this is NFC.
NOTE: In terms of testing, I am trying to get everything into place in
preparation for landing a complete ownership translation of the pmo tests. I
need to get enough in place for the full test translation to work. In the
meantime, I am going to start adding memaccess-specific stuff in its own ownership
file.
The reason this is true is that, in PMO parlance, an assign is an instruction
that must destroy the value previously stored into memory at that location. So
PMO would need to be taught to ensure that said destroy is promoted as
well. Consider the following example:
%0 = alloc_stack $Foo
store %1 to [init] %0 : $*Foo
store %2 to [assign] %0 : $*Foo
destroy_addr %0 : $*Foo
dealloc_stack %0 : $*Foo
If PMO were to try to eliminate the alloc_stack as PMO is written today, one
would be left with only:
destroy_value %2 : $Foo
That is clearly wrong: %1, which the [assign] implicitly destroys, is leaked.
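For the elimination to be valid, PMO would also have to promote the [assign]'s
implicit destroy, yielding something like (a hypothetical sketch):
destroy_value %1 : $Foo   // from promoting the [assign]'s implicit destroy of the old value
destroy_value %2 : $Foo   // from promoting the destroy_addr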
PMO uses InitOrAssign for trivially typed things and Init/Assign for non-trivial
things, so I think this was an oversight from a long time ago. There is actually
no /real/ effect on the code today, since after exploding the copy_addr, the
store will still be used to produce the right available value, and since for
stores, init/assign/initorassign all result in allocations being removed. Once
I change assign to not allow for allocation removal (the proper way to model
this), though, certain trivial allocations would no longer be removed without
this change, harming perf as seen via the benchmarking run on the bots in #21918.
This models the semantics we are trying to have here more closely without
changing current output. Specifically, an assign in this case is supposed to
mean that the underlying value is overwritten and destroyed (via a ref count
decrement). This is obviously true with the non-init copy_addr form, i.e.:
%0 = alloc_stack $Builtin.NativeObject
%1 = alloc_stack $Builtin.NativeObject
...
copy_addr [take] %0 to %1 : $*Builtin.NativeObject
...
Notice how the instruction is actively going to destroy whatever is in %1. Let's
consider the same SIL after exploding the copy_addr.
%0 = alloc_stack $Builtin.NativeObject
%1 = alloc_stack $Builtin.NativeObject
...
%2 = load %0 : $*Builtin.NativeObject
%3 = load %1 : $*Builtin.NativeObject (1)
store %2 to %1 : $*Builtin.NativeObject
destroy_value %3 : $Builtin.NativeObject (2)
...
In this case, the store is actually acting like an initialization since the
destructive part of the assign is performed by (1), (2). So considering the
store to be an assign is incorrect since a store that is an assign /should/
destroy the underlying value itself.
In terms of the actual effect on the result of the pass today, Initialization
and Assign stores are treated the same when it comes to getting available values
from stores and (for stores) destroying allocations. So this should be an NFC
change, but it gets us closer to the model where assigns are only used for things
that store into memory and destroy the underlying value directly.
I am removing these for the following reasons:
* PMO does not have any tests for these code paths. (1).
* PMO does not try to promote these loads (it explicitly pattern matches load,
copy_addr) or get available values from these (it explicitly pattern matches
store or explodes a copy_addr to get the copy_addr's stores). This means that
removing this code will not affect our constant propagation diagnostics. So,
removing this untested code path at worst could cause us to no longer
eliminate some dead objects that we otherwise would be able to eliminate at
-Onone (low-priority). (2).
----
(1). I believe that the lack of PMO tests is due to this being a vestigial
remnant of DI code in PMO. My suspicion arises since:
* The code was added when the two passes were both sharing the same use
collector and auxiliary data structures. Since then I have changed DI/PMO
to each have their own copies.
* DI has a bunch of tests that verify behavior around these instructions.
(2). I expect the number of allocations that are no longer removed to be small,
since we do not promote loads from such allocations and PMO will not eliminate
an allocation that has any loads.
PartialStore is a PMOUseKind that is a vestigial remnant of Definite Init in the
PMO source. This can be seen by noting that in Definite Init, PartialStore is
how Definite Init diagnoses partially initialized values and errors. In contrast,
in PMO the semantics of PartialStore are:
1. It can only be produced if we have a raw store use or a copy_addr.
2. We allow for the use to provide an available value just like if it was an
assign or an init.
3. We ignore it for the purposes of removing store-only allocations, since by
itself, without ownership, stores (and stores from an exploded copy_addr) do
not affect ownership in any way.
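For example, in a hypothetical sketch, a store that initializes only one
element of a tuple allocation is what gets classified as a PartialStore:
%0 = alloc_stack $(Builtin.NativeObject, Builtin.NativeObject)
%1 = tuple_element_addr %0 : $*(Builtin.NativeObject, Builtin.NativeObject), 0
store %x to %1 : $*Builtin.NativeObject   // touches only element 0 => PartialStore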
Rather than keeping this around, in this commit I remove it, since it doesn't
provide any additional value over [init] or [assign]. Functionally there should
be no change.
I discovered while updating PMO for ownership that for ~5 years there has been a
bug where we were treating copy_addr of trivial values like an "Assign" (in PMO
terminology) of a non-trivial value and thus stopping allocation
elimination. When I fixed this I discovered that this caused us to no longer
emit diagnostics in a predictable way. Specifically, consider the following
swift snippet:
var _: UInt = (-1) >> 0
Today, we emit a diagnostic that -1 cannot be put into a UInt. This occurs
since even though the underlying allocation is only stored into, the copy_addr
assign keeps it alive, causing the diagnostics pass to see the conversion. With
my fix though, we see that we are only storing into the allocation, causing the
allocation to be eliminated before the constant propagation diagnostic pass
runs, causing the diagnostic to no longer be emitted.
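A hypothetical sketch of the pattern: the allocation is only ever stored into,
but the trivial copy_addr into it was classified as a non-trivial "Assign",
which kept the allocation (and the conversion feeding it) alive for the
diagnostics pass:
%0 = alloc_stack $UInt
store %x to %0 : $*UInt
copy_addr %y to %0 : $*UInt   // trivial, but treated like a non-trivial "Assign",
                              // so the store-only allocation was not eliminated
dealloc_stack %0 : $*UInt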
We should truly not be performing this type of DCE before we emit such
diagnostics. So in this commit, I split the pass into two parts:
1. A load promotion pass that performs the SSA formation needed for SSA based
diagnostics to actually work.
2. An allocation elimination pass that runs /after/ SSA based diagnostics.
This should be NFC since the constant propagation SSA based diagnostics do not
create memory operations so the output should be the same.
The main contribution of this commit is that I refactored out a helper class for
managing the data used when promoting destroy_addr. This enabled me to add a
getData method on the helper class to simplify iterating over a destroy_addr and
its available values.
This will cause this code to automagically work correctly in ossa code, since
instead of emitting tuple_extracts we will emit a destructure operation, without
any code change (see the sketch below). Since this entrypoint will emit
tuple_extracts in non-ossa code, this is an NFC patch.
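Roughly, for a hypothetical tuple type $(A, B):
%1 = tuple_extract %t : $(A, B), 0        // non-ossa
%2 = tuple_extract %t : $(A, B), 1
=>
(%1, %2) = destructure_tuple %t : $(A, B) // [ossa]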
This is technically an NFC commit. The changes are:
1. When we scalarize loads/stores, we need to not just use unqualified
loads/stores. Instead we need to use the createTrivial{Load,Store}Or APIs. In
OSSA mode, this will propagate through the original ownership qualifier if the
sub-type is non-trivial, but will change the qualifier to trivial if the
sub-type is trivial (a sketch follows this list). Today when the pass runs
without ownership nothing is changed, since I am passing in the "supports
unqualified" flag to the createTrivial{Load,Store}Or API so that we just create
an unqualified memop if we are passed an unqualified memop. Once we fully move
pmo to ownership, this flag will be removed and we will assert.
2. The container walker is taught about copy_value, destroy_value. Specifically,
we teach the walker how to recursively look through copy_values during the walk
and to treat a destroy_value of the box like a strong_release,
release_value. Since destroy_value, copy_value only exist in [ossa] today, this
is also NFC.
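As for point 1, a hypothetical sketch (made-up element types) of the qualifier
propagation when scalarizing a load in OSSA:
%2 = load [take] %0 : $*(Builtin.NativeObject, Builtin.Int64)
=>
%a = tuple_element_addr %0 : $*(Builtin.NativeObject, Builtin.Int64), 0
%v0 = load [take] %a : $*Builtin.NativeObject   // non-trivial sub-type keeps the qualifier
%b = tuple_element_addr %0 : $*(Builtin.NativeObject, Builtin.Int64), 1
%v1 = load [trivial] %b : $*Builtin.Int64       // trivial sub-type becomes [trivial]
%2 = tuple (%v0 : $Builtin.NativeObject, %v1 : $Builtin.Int64)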
This undoes some of Joe's work in 8665342 to add a guarantee: if an
@objc convenience initializer only calls other @objc initializers that
eventually call a designated initializer, it won't result in an extra
allocation. While Objective-C /allows/ returning a different object
from an initializer than the allocation you were given, doing so
doesn't play well with some very hairy implementation details of
compiled nib files (or NSCoding archives with cyclic references in
general).
This guarantee only applies to
(1) calling `self.init`
(2) where the delegated-to initializer is @objc
because convenience initializers must do dynamic dispatch when they
delegate, and Swift only stores allocating entry points for
initializers in a class's vtable. To dynamically find an initializing
entry point, ObjC dispatch must be used instead.
(It's worth noting that this patch does NOT check that the calling
initializer is a convenience initializer when deciding whether to use
ObjC dispatch for `self.init`. If we ever add peer delegation to
designated initializers, which is totally a valid feature, that should
use static dispatch and therefore should not go through objc_msgSend.)
This change doesn't /always/ result in fewer allocations; if the
delegated-to initializer ends up returning a different object after
all, the original allocation was wasted. Objective-C has the same
problem (one of the reasons why factory methods exist for things like
NSNumber and NSArray).
We do still get most of the benefits of Joe's original change. In
particular, vtables only ever contain allocating initializer entry
points, never the initializing ones, and never /both/ (which was a
thing that could happen with 'required' before).
rdar://problem/46823518
In Swift 5 mode, 'self' in a convenience init has a DynamicSelfType, so
the value_metatype instruction returns a DynamicSelfType metatype.
However, the metatype argument to the constructor is a plain class metatype
because it's an interface type, so we have to "open" it by bitcasting it
to a DynamicSelfType.
Fixes <https://bugs.swift.org/browse/SR-9430>, <rdar://problem/46982573>.
Since:
1. We only handle alloc_stack, alloc_box in predictable memopts.
2. alloc_stack cannot be released.
We know that the release collecting in collectFrom can just be done in
collectContainerUses() [which only processes alloc_box].
This also let me simplify some code and add a defensive check in case, for some
reason, we are passed a release_value on the box. NOTE: I verified that
previously this did not result in a bug, since we would consider the
release_value to be an escape of the underlying value even though we didn't
handle it in collectFrom. But the proper way to handle release_value is like
strong_release, so I added code to do that as well; a sketch follows.
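A minimal sketch of that handling (hypothetical boxed type):
%box = alloc_box ${ var Builtin.NativeObject }
...
release_value %box : ${ var Builtin.NativeObject }   // now handled like strong_release %box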
TLDR: This does not eliminate the struct/tuple flat namespace from Predictable
Mem Opts. Just the tuple specific flat namespace code from PMOMemoryUseCollector
that we were computing and then throwing away. I explain below in more detail.
First note that this is cruft from when def-init and pmo were one pass. What we
were doing here was maintaining a flattened tuple namespace while we were
collecting uses in PMOMemoryUseCollector. We never actually used it for
anything, since we recomputed this information (including information about
structs) in PMO itself! So this information was truly completely dead.
This commit removes that and related logic, and from a maintenance standpoint
makes PMOMemoryUseCollector a simple visitor that doesn't have any real special
logic in it beyond tuple scalarization.