We do allow for limited def-use traversal to do things like find debug_value.
But nothing like a full data flow pass. As a proof of concept I just emit
opt-remarks when we see retains, releases. Eventually I would like to also print
out which lvalue the retain is from but that is more than I want to do with this
proof of concept.
This is a really simple pass that isn't going to be touched for a long time
after I am done fixing the pass for ownership. So it makes sense to clean it up
now. I am doing this as a separate commit before updating for ownership.
The idea is that this will let me remove these assertions that were in place to
make sure we were really conservative around specializing ownership code. For me
to remove that I need to be able to actually test out this code (since I think
there are some code paths where this will trigger in other parts of the compiler
now).
So to work out the kinks, I added a flag that allows for the generic specializer
to process ownership code and translated most of the .sil test cases/fixed any
bugs that I found. This hopefully will expose anything that is missing.
NOTE: I have not enabled the generic specializer running in ownership in the
pipeline. This is just a step in that direction by adding tests/etc.
CopyForwarding attempts to enforce "normal" SIL address patterns using
asserts. This isn't a good strategy because it results in strange
crash diagnostics in release builds. Eventually, we should replace
this logic with a SIL address lifetime utility based on OSSA form and
enforced in the verifier.
Loosen one of these restrictions where we assume that a value
initialized with "copy_addr [initialization]" will be properly
destroyed. This assumption is violated when lowering
.int_fma_FPIEEE32, which knows that the type is trivial, so avoids
deinitialization.
The original SIL looks like this:
copy_addr %src to [initialization] %dest : $*Float
%fma = builtin "int_fma_FPIEEE32"(% : $Builtin.FPIEEE32, % : $Builtin.FPIEEE32, % : $Builtin.FPIEEE32) : $Builtin.FPIEEE32
%result = struct $Float (%fma : $Builtin.FPIEEE32)
store %result to %dest : $*Float
Fixes rdar://64671864.
* Don't always give shared linkage to spl functions
private functions on specialization were being given shared linkage.
Use swift::getSpecializeLinkage to correctly get the linkage for the
specialized function based on the linkage of the original function.
* Extend AllocBoxToStack to handle apply
AllocBoxToStack analyzes the uses of boxes and promotes them to stack if
it is safe to do so. Currently the analysis is limited to only a few known
users including partial_apply.
With this change, the pass also analyzes apply users, where the callee
is a local private function.
The analysis is recursive and bound by a threshold.
Fixes rdar://59070139
Optimizes copies from a temporary (an "l-value") to a destination.
%temp = alloc_stack $Ty
instructions_which_store_to %temp
copy_addr [take] %temp to %destination
dealloc_stack %temp
is optimized to
destroy_addr %destination
instructions_which_store_to %destination
The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.
This pass also performs a small peephole optimization which simplifies copy_addr - destroy sequences.
copy_addr %source to %destination
destroy_addr %source
is replace with
copy_addr [take] %source to %destination
The pass is already not being run during normal compilation scenarios today
since it bails on OSSA except in certain bit-rot situations where a test wasn't
updated and so was inadvertently invoking the pass. I discovered these while
originally just trying to eliminate the pass from the diagnostic pipeline. The
reason why I am doing this in one larger change is that I found there were a
bunch of sil tests inadvertently relying on guaranteed arc opts to eliminate
copy traffic. So, if I just removed this and did this in two steps, I would
basically be unoptimizing then re-optimizing the tests.
Some notes:
1. The new guaranteed arc opts is based off of SemanticARCOpts and runs only on
ossa. Specifically, in this new pass, we just perform simple
canonicalizations that do not involve any significant analysis. Some
examples: a copy_value all of whose uses are destroys. This will do what the
original pass did and more without more compile time. I did a conservative
first approximation, but we can probably tune this a bit.
2. the reason why I am doing this now is that I was trying to eliminate the
enable-ownership-stripping-after-serialization flag and discovered that the
test opaque_value_mandatory implicitly depends on this since sil-opt by
default was the only place left in the compiler with that option set to false
by default. So I am eliminating that dependency before I land the larger
change.
Private and internal classes shouldn't have ABI constraints on their concrete vtable layout, so if methods
don't have overrides in practice, we can elide their vtable entries.
We were not using the primary benefits of an intrusive list, namely the
ability to insert or remove from the middle of the list, so let's switch
to a plain vector. This also avoids linked-list pointer chasing.
Use the new builtins for COW representation in Array, ContiguousArray and ArraySlice.
The basic idea is to strictly separate code which mutates an array buffer from code which reads from an array.
The concept is explained in more detail in docs/SIL.rst, section "Copy-on-Write Representation".
The main change is to use beginCOWMutation() instead of isUniquelyReferenced() and insert endCOWMutation() at the end of all mutating functions. Also, reading from the array buffer must be done differently, depending on if the buffer is in a mutable or immutable state.
All the required invariants are enforced by runtime checks - but only in an assert-build of the library: a bit in the buffer object side-table indicates if the buffer is mutable or not.
Along with the library changes, also two optimizations needed to be updated: COWArrayOpt and ObjectOutliner.
Start with some high-level checks of whether a declaration is formally final, or has no visible overrides
in the domain of the program visible to the SIL module.
We can eventually adopt more of the logic from the Devirtualizer pass to tell whether call sites are
"effectively final", and maybe make the Devirtualizer consume that information applied by this pass.
Just like br inst, structs are phis in the ownership graph that one can induce
on top of the def-use graph. In this commit, I basically fill in the relevant
blanks in the ADT for such phis for struct so that the optimization for branches
is generalized onto structs.
<rdar://problem/63950481>
This is letting me refactor the implementation of every place that I work with
branches as "ownership phi operands" to instead work with
OwnershipPhiOperand. In a future commit, I am going to use this to add support
for optimizing structs and tuples that have multiple owned incoming values.
<rdar://problem/63950481>
Used to "finalize" an array literal. It's not used, yet. So this is NFC.
Also handle the "array.finalize_intrinsic" function in various array specific optimizations.
Constant folds the uniqueness result of begin_cow_mutation instructions, if it can be proved that the buffer argument is uniquely referenced.
For example:
%buffer = end_cow_mutation %mutable_buffer
// ...
// %buffer does not escape here
// ...
(%is_unique, %mutable_buffer2) = begin_cow_mutation %buffer
cond_br %is_unique, ...
is replaced with
%buffer = end_cow_mutation [keep_unique] %mutable_buffer
// ...
(%not_used, %mutable_buffer2) = begin_cow_mutation %buffer
%true = integer_literal 1
cond_br %true, ...
Note that the keep_unique flag is set on the end_cow_mutation because the code now relies on that the buffer is really uniquely referenced.
The optimization can also handle def-use chains between end_cow_mutation and begin_cow_mutation which involve phi-arguments.
An additional peephole optimization is performed: if the begin_cow_mutation is the only use of the end_cow_mutation, the whole pair of instructions is eliminated.
If only a single field of a struct phi-argument is used, replace the argument by the field value.
br bb(%str)
bb(%phi):
%f = struct_extract %phi, #Field // the only use of %phi
use %f
is replaced with
%f = struct_extract %str, #Field
br bb(%f)
bb(%phi):
use %phi
This also works if the phi-argument is in a def-use cycle.
The new PhiExpansionPass is in the same file as the RedundantPhiEliminationPass. Therefore I renamed the source file to PhiArgumentOptimizations.cpp
This simplifies the handling of the subdirectories in the SIL and
SILOptimizer paths. Create individual libraries as object libraries
which allows the analysis of the source changes to be limited in scope.
Because these are object libraries, this has 0 overhead compared to the
previous implementation. However, string operations over the filenames
are avoided. The cost for this is that any new sub-library needs to be
added into the list rather than added with the special local function.