The major important thing here is that by using copy_unowned_value we can
guarantee that the non-ownership SIL ARC optimizer will treat the release
associated with the strong_retain_unowned as on a distinc rc-identity from its
argument. As an example of this problem consider the following SILGen like
output:
----
%1 = copy_value %0 : $Builtin.NativeObject
%2 = ref_to_unowned %1
%3 = copy_unowned_value %2
destroy_value %1
...
destroy_value %3
----
In this case, we are converting a strong reference to an unowned value and then
lifetime extending the value past the original value. After eliminating
ownership this lowers to:
----
strong_retain %0 : $Builtin.NativeObject
%1 = ref_to_unowned %0
strong_retain_unowned %1
strong_release %0
...
strong_release %0
----
From an RC identity perspective, we have now blurred the lines in between %3 and
%1 in the previous example. This can then result in the following miscompile:
----
%1 = ref_to_unowned %0
strong_retain_unowned %1
...
strong_release %0
----
In this case, it is possible that we created a lifetime gap that will then cause
strong_retain_unowned to assert. By not lowering copy_unowned_value throughout
the SIL pipeline, we instead get this after lowering:
----
strong_retain %0 : $Builtin.NativeObject
%1 = ref_to_unowned %0
%2 = copy_unowned_value %1
strong_release %0
...
strong_release %2
----
And we do not miscompile since we preserved the high level rc identity
pairing.
There shouldn't be any performance impact since we do not really optimize
strong_retain_unowned at the SIL level. I went through all of the places that
strong_retain_unowned was referenced and added appropriate handling for
copy_unowned_value.
rdar://41328987
**NOTE** I am going to remove strong_retain_unowned in a forthcoming commit. I
just want something more minimal for cherry-picking purposes.
introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
Replace `NameOfType foo = dyn_cast<NameOfType>(bar)` with DRY version `auto foo = dyn_cast<NameOfType>(bar)`.
The DRY auto version is by far the dominant form already used in the repo, so this PR merely brings the exceptional cases (redundant repetition form) in line with the dominant form (auto form).
See the [C++ Core Guidelines](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es11-use-auto-to-avoid-redundant-repetition-of-type-names) for a general discussion on why to use `auto` to avoid redundant repetition of type names.
At some point, pass definitions were heavily macro-ized. Pass
descriptive names were added in two places. This is not only redundant
but a source of confusion. You could waste a lot of time grepping for
the wrong string. I removed all the getName() overrides which, at
around 90 passes, was a fairly significant amount of code bloat.
Any pass that we want to be able to invoke by name from a tool
(sil-opt) or pipeline plan *should* have unique type name, enum value,
commend-line string, and name string. I removed a comment about the
various inliner passes that contradicted that.
Side note: We should be consistent with the policy that a pass is
identified by its type. We have a couple passes, LICM and CSE, which
currently violate that convention.
This was already done for getSuccessorBlocks() to distinguish getting successor
blocks from getting the full list of SILSuccessors via getSuccessors(). This
commit just makes all of the successor/predecessor code follow that naming
convention.
Some examples:
getSingleSuccessor() => getSingleSuccessorBlock().
isSuccessor() => isSuccessorBlock().
getPreds() => getPredecessorBlocks().
Really, IMO, we should consider renaming SILSuccessor to a more verbose name so
that it is clear that it is more of an internal detail of SILBasicBlock's
implementation rather than something that one should consider as apart of one's
mental model of the IR when one really wants to be thinking about predecessor
and successor blocks. But that is not what this commit is trying to change, it
is just trying to eliminate a bit of technical debt by making the naming
conventions here consistent.
Before this commit all code relating to handling arguments in SILBasicBlock had
somewhere in the name BB. This is redundant given that the class's name is
already SILBasicBlock. This commit drops those names.
Some examples:
getBBArg() => getArgument()
BBArgList => ArgumentList
bbarg_begin() => args_begin()
Today, loads and stores are treated as having @unowned(unsafe) ownership
semantics. This leaves the user to specify ownership changes on the loaded or
stored value independently of the load/store by inserting ARC operations. With
the change to Semantic SIL, this will no longer be true. Instead loads, stores
have ownership semantics that one must reason about such as copy, take, and
trivial.
This change moves us closer to that world by eliminating the default
OwnershipQualification argument from create{Load,Store}. This means that the
compiler developer cannot ignore reasoning about the ownership semantics of the
memory operation that they are creating.
Operationally, this is a NFC change since I have just gone through the compiler
and updated all places where we create loads, stores to pass in the former
default argument ({Load,Store}OwnershipQualifier::Unqualified), to
SILBuilder::create{Load,Store}(...). For now, one can just do that in situations
where one needs to create loads/stores, but over time, I am going to tighten the
semantics up via the verifier.
rdar://28685236
LSValue::reduce reduces a set of LSValues (mapped to a set of LSLocations) to
a single LSValue.
It can then be used as the forwarding value for the location.
Previously, we expand into intermediate nodes and leaf nodes and then go bottom
up, trying to create a single LSValue out of the given LSValues.
Instead, we now use a recursion to go top down. This simplifies the code. And this
is fine as we do not expect to run into type tree that are too deep.
Existing test cases ensure correctness.
For forwarding on allocstacks, we can invalidate the forwable bit when we
hit the deallocate stack.
This helps compilation time as we do not need to propagate these bits down
to subsequent basic blocks.
For a release on a guaranteed function paramater, we know right away
that its not the final release and therefore does not call deinit.
Therefore we know it does not read or write memory other than the reference
count.
This reduces the compilation time of dead store and redundant load elim. As
we need to go over alias analysis to make sure tracked locations do not alias
with it.
This patch also implements some of the missing functions used by RLE and DSE in new projection
that exist in the old projection.
New projection provides better memory usage, eventually we will phase out the old projection code.
New projection is now copyable, i.e. we have a proper constructor for it. This helps make the code
more readable.
We do see a bit increase in compilation time in compiling stdlib -O, this is a result of the way
we now get types of a projection path, but I expect this to go down (away) with further improvement
on how memory locations are constructed and cached with later patches.
=== With the OLD Projection. ===
Total amount of memory allocated.
--------------------------------
Bytes Used Count Symbol Name
13032.01 MB 50.6% 2158819 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2879.70 MB 11.1% 3076018 (anonymous namespace)::ARCSequenceOpts::run()
2663.68 MB 10.3% 1375465 (anonymous namespace)::RedundantLoadElimination::run()
1534.35 MB 5.9% 5067928 (anonymous namespace)::SimplifyCFGPass::run()
1278.09 MB 4.9% 576714 (anonymous namespace)::SILCombine::run()
1052.68 MB 4.0% 935809 (anonymous namespace)::DeadStoreElimination::run()
771.75 MB 2.9% 1677391 (anonymous namespace)::SILCSE::run()
715.07 MB 2.7% 4198193 (anonymous namespace)::GenericSpecializer::run()
434.87 MB 1.6% 652701 (anonymous namespace)::SILSROA::run()
402.99 MB 1.5% 658563 (anonymous namespace)::SILCodeMotion::run()
341.13 MB 1.3% 962459 (anonymous namespace)::DCE::run()
279.48 MB 1.0% 415031 (anonymous namespace)::StackPromotion::run()
Compilation time breakdown.
--------------------------
Running Time Self (ms) Symbol Name
25716.0ms 35.8% 0.0 swift::runSILOptimizationPasses(swift::SILModule&)
25513.0ms 35.5% 0.0 swift::SILPassManager::runOneIteration()
20666.0ms 28.8% 24.0 swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19664.0ms 27.4% 77.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3272.0ms 4.5% 12.0 (anonymous namespace)::SimplifyCFGPass::run()
3266.0ms 4.5% 7.0 (anonymous namespace)::ARCSequenceOpts::run()
2608.0ms 3.6% 5.0 (anonymous namespace)::SILCombine::run()
2089.0ms 2.9% 104.0 (anonymous namespace)::SILCSE::run()
1929.0ms 2.7% 47.0 (anonymous namespace)::RedundantLoadElimination::run()
1280.0ms 1.7% 14.0 (anonymous namespace)::GenericSpecializer::run()
1010.0ms 1.4% 45.0 (anonymous namespace)::DeadStoreElimination::run()
966.0ms 1.3% 191.0 (anonymous namespace)::DCE::run()
496.0ms 0.6% 6.0 (anonymous namespace)::SILCodeMotion::run()
=== With the NEW Projection. ===
Total amount of memory allocated.
--------------------------------
Bytes Used Count Symbol Name
11876.64 MB 48.4% 22112349 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2887.22 MB 11.8% 3079485 (anonymous namespace)::ARCSequenceOpts::run()
1820.89 MB 7.4% 1877674 (anonymous namespace)::RedundantLoadElimination::run()
1533.16 MB 6.2% 5073310 (anonymous namespace)::SimplifyCFGPass::run()
1282.86 MB 5.2% 577024 (anonymous namespace)::SILCombine::run()
772.21 MB 3.1% 1679154 (anonymous namespace)::SILCSE::run()
721.69 MB 2.9% 936958 (anonymous namespace)::DeadStoreElimination::run()
715.08 MB 2.9% 4196263 (anonymous namespace)::GenericSpecializer::run()
Compilation time breakdown.
--------------------------
Running Time Self (ms) Symbol Name
25137.0ms 37.3% 0.0 swift::runSILOptimizationPasses(swift::SILModule&)
24939.0ms 37.0% 0.0 swift::SILPassManager::runOneIteration()
20226.0ms 30.0% 29.0 swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19241.0ms 28.5% 83.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3214.0ms 4.7% 10.0 (anonymous namespace)::SimplifyCFGPass::run()
3005.0ms 4.4% 14.0 (anonymous namespace)::ARCSequenceOpts::run()
2438.0ms 3.6% 7.0 (anonymous namespace)::SILCombine::run()
2217.0ms 3.2% 54.0 (anonymous namespace)::RedundantLoadElimination::run()
2212.0ms 3.2% 131.0 (anonymous namespace)::SILCSE::run()
1195.0ms 1.7% 11.0 (anonymous namespace)::GenericSpecializer::run()
1168.0ms 1.7% 39.0 (anonymous namespace)::DeadStoreElimination::run()
853.0ms 1.2% 150.0 (anonymous namespace)::DCE::run()
499.0ms 0.7% 7.0 (anonymous namespace)::SILCodeMotion::run()
This change is needed for the next update to ToT LLVM. It can be put
into place now without breaking anything so I am committing it now.
The churn upstream on ilist_node is neccessary to remove undefined
behavior. Rather than updating the different ilist_node patches for the
hacky change required to not use iterators, just use iterators and keep
everything as ilist_nodes. Upstream they want to eventually do this, so
it makes sense for us to just do it now.
Please do not introduce new invocations of
ilist_node::get{Next,Prev}Node() into the tree.
Now we have 3 cases.
1. OptimizeNone (for functions with too many basicblocks and too many locations). Simply return.
2. Pessimisitc single iteration data flow (for functions with many basic blocks and many locations).
3. Optimistic multiple iteration data flow (for functions with some basic blocks and some locations
and require iterative data flow).
With this change stdlib and stdlibunittest has some changes in dead store(DS)
eliminated.
stdlib: 202 -> 203 DS.
stdlibunittest: 42 - 39 DS.
Compilation time improvement: with this change on a RELEASE+ASSERT compiler for stdlibunittest.
Running Time Self (ms) Symbol Name
5525.0ms 5.3% 25.0 (anonymous namespace)::ARCSequenceOpts::run()
3500.0ms 3.4% 25.0 (anonymous namespace)::RedundantLoadElimination::run()
3050.0ms 2.9% 25.0 (anonymous namespace)::SILCombine::run()
2700.0ms 2.6% 0.0 (anonymous namespace)::SimplifyCFGPass::run()
2100.0ms 2.0% 75.0 (anonymous namespace)::SILCSE::run()
1450.0ms 1.4% 0.0 (anonymous namespace)::DeadStoreElimination::run()
750.0ms 0.7% 75.0 (anonymous namespace)::DCE::run()
Compilation time improvement: with this change on a DEBUG compiler for stdlibunittest.
Running Time Self (ms) Symbol Name
42300.0ms 4.9% 50.0 (anonymous namespace)::ARCSequenceOpts::run()
35875.0ms 4.1% 0.0 (anonymous namespace)::RedundantLoadElimination::run()
30475.0ms 3.5% 0.0 (anonymous namespace)::SILCombine::run()
19675.0ms 2.3% 0.0 (anonymous namespace)::SILCSE::run()
18150.0ms 2.1% 25.0 (anonymous namespace)::SimplifyCFGPass::run()
12475.0ms 1.4% 0.0 (anonymous namespace)::DeadStoreElimination::run()
5775.0ms 0.6% 0.0 (anonymous namespace)::DCE::run()
I do not see a compilation time change in stdlib.
Existing tests ensure correctness.