introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
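As a small sketch (the class type $C here is hypothetical), the stricter parser now rejects binding a result name to a zero-result instruction:

```sil
// Rejected: 'strong_retain' produces no result, so there is nothing to bind.
%1 = strong_retain %0 : $C

// Accepted:
strong_retain %0 : $C
```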
Applying nontrivial generic arguments to a nontrivial SIL layout requires lowered SILType substitution, which requires a SILModule. NFC yet, just an API change.
The new instructions are ref_tail_addr and tail_addr, plus a new attribute, [tail_elems], for alloc_ref.
For details see docs/SIL.rst.
As these new instructions are not generated yet, this is an NFC.
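A sketch of how the new instructions might appear together (the ArrayStorage class and the Int element type are hypothetical; see docs/SIL.rst for the authoritative syntax):

```sil
// Allocate an object with %0 Int tail elements placed after it in memory.
%1 = alloc_ref [tail_elems $Int * %0 : $Builtin.Word] $ArrayStorage

// Address of the first tail element.
%2 = ref_tail_addr %1 : $ArrayStorage, $Int

// Address %0 elements past %2.
%3 = tail_addr %2 : $*Int, %0 : $Builtin.Word, $Int
```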
Several functionalities have been added to FSO over time and the logic has become
muddled.
We were always looking at a static image of the SIL and trying to reason about what kinds
of function-signature-related optimizations we could do.
This can easily lead to muddled logic, e.g. having to consider two different function
signature optimizations together instead of independently.
Split the single function that did all the different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function, and eventually we collapse all the intermediate thunks into a single thunk.
With this change, it will be easier to implement function signature optimizations, as
we can now do them independently.
Small modifications to the test cases.
This splits the function signature module pass into two function passes.
By doing so, we can rewrite callers to use the FSO-optimized
function prior to attempting inlining, while still doing a substantial
amount of optimization on the current function before attempting
FSO on that function.
It also helps us move to a model in which a module pass is not used unless
necessary.
I see neither a regression nor an improvement on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
a separate analysis pass.
This pass is run on every function, and the optimized signature is returned through
getArgDescList and getResultDescList.
The next step is to split the cloning and call-site rewriting into their own function passes.
rdar://24730896
This change includes an option for how IsLive is defined/computed. The ProjectionTree
can now choose to ignore epilogue releases and mark a node as dead if its only non-debug
user is an epilogue release.
It can also, as before, mark a node as alive even if its only user is an epilogue release.
Imagine a case where one passes in an array and never accesses its owner
except to release it. In such a case, we *do* want to be able to eliminate
that argument even though there is a release in the function epilogue.
This will help get rid of the retain/release pair at the call site, i.e.
the guaranteed parameter is eliminated.
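A sketch of the pattern (types hypothetical): the owner argument's only non-debug use is the epilogue release, so the ProjectionTree can now treat it as dead, FSO can drop the argument, and the caller's matching retain/release pair can then go away:

```sil
sil @takeOwner : $@convention(thin) (@owned Owner) -> () {
bb0(%0 : $Owner):
  // ... %0 is never accessed in the body ...
  strong_release %0 : $Owner   // epilogue release: the only use of %0
  %r = tuple ()
  return %r : $()
}
```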
rdar://21114206
LSValue::reduce reduces a set of LSValues (mapped to a set of LSLocations) to
a single LSValue.
It can then be used as the forwarding value for the location.
Previously, we expanded into intermediate nodes and leaf nodes and then went bottom-up,
trying to create a single LSValue out of the given LSValues.
Instead, we now use recursion to go top-down. This simplifies the code, and it
is fine because we do not expect to run into type trees that are too deep.
Existing test cases ensure correctness.
When we have all the epilogue releases, make sure they cover all the non-trivial
parts of the base. Otherwise, treat it as if we have found no releases for the base.
Currently, this is an NFC other than the epilogue dumper. I will wire it up with
function signature optimization in the next commit.
This is part of rdar://22380547
So instead of only being able to match %1 with 'release %1' in (1), we
can also match %1 with ('release %2' and 'release %3', i.e. an exploded release_value)
in (2).
(1)
foo(%1)
strong_release %1
(2)
foo(%1)
%2 = struct_extract %1, field_a
%3 = struct_extract %1, field_b
strong_release %2
strong_release %3
This will allow function signature optimization to better move the release instructions to
the callers.
Currently, this is an NFC other than testing using the epilogue match dumper.
Previously, we exploded an argument down to its most-derived fields, i.e. the fields that
can no longer be exploded further. And in the spliced (newly created) function,
we formed aggregates if necessary.
Changing this to explode only to the deepest level actually accessed enables us to
create projection tree nodes only for the fields whose level is accessed, instead of
all fields on all levels.
Note: this also changes the definition of a leaf node. A leaf node now means a node
which does not have children in the current explosion (it could, however, have children
if exploded further).
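For illustration (the types here are hypothetical): if a function accesses only %0.i as a whole and %0.z, the projection tree no longer creates nodes for Inner.x and Inner.y, and the Inner node is a leaf of the current explosion even though it could be exploded further:

```sil
struct Inner { var x: Int; var y: Int }
struct Outer { var i: Inner; var z: Int }

// Deepest accesses: Outer.i (as a whole) and Outer.z.
%1 = struct_extract %0 : $Outer, #Outer.i   // leaf at this explosion depth
%2 = struct_extract %0 : $Outer, #Outer.z
```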
I am refining the old projection tree first, before (mostly) copying it to create the
new projection tree.
function signature opt.
Instead of replacing %1 with undef in 'debug_value %1', we form an aggregate,
taking the alive parts of %1 and filling the dead parts with undef.
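A sketch (the Pair type and its field layout are hypothetical): if only the first field of %1 is kept alive by FSO, we rebuild an aggregate for the debug value instead of losing it entirely:

```sil
// Before: debug_value %1 : $Pair, with %1 partially dead after FSO.
// After: keep the alive field, fill the dead field with undef.
%2 = struct $Pair (%first : $Int, undef : $Int)
debug_value %2 : $Pair
```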
rdar://23727705
to disambiguate index_addr instructions with the same base but different indices.
But the indices here have to be constant. This is a limitation/design choice
made in the projection code.
In order to handle non-constant indices, we would need an analysis to compute the index
difference.
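A sketch (hypothetical values): two index_addr projections off the same base can be told apart only when their indices are integer literals; with a non-constant index, the projections cannot be disambiguated:

```sil
%c1 = integer_literal $Builtin.Word, 1
%c2 = integer_literal $Builtin.Word, 2
%a1 = index_addr %base : $*Int, %c1 : $Builtin.Word  // provably distinct from %a2
%a2 = index_addr %base : $*Int, %c2 : $Builtin.Word
%a3 = index_addr %base : $*Int, %n : $Builtin.Word   // %n non-constant: cannot disambiguate
```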
rdar://22484392
This patch also implements, in the new projection, some missing functions used by RLE and DSE
that exist in the old projection.
The new projection provides better memory usage; eventually we will phase out the old projection code.
The new projection is now copyable, i.e. we have a proper copy constructor for it. This helps make the code
more readable.
We do see a slight increase in compilation time when compiling the stdlib with -O; this is a result of the way
we now get the types of a projection path, but I expect this to go down (or away) with further improvements
to how memory locations are constructed and cached in later patches.
=== With the OLD Projection. ===
Total amount of memory allocated.
--------------------------------
Bytes Used Count Symbol Name
13032.01 MB 50.6% 2158819 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2879.70 MB 11.1% 3076018 (anonymous namespace)::ARCSequenceOpts::run()
2663.68 MB 10.3% 1375465 (anonymous namespace)::RedundantLoadElimination::run()
1534.35 MB 5.9% 5067928 (anonymous namespace)::SimplifyCFGPass::run()
1278.09 MB 4.9% 576714 (anonymous namespace)::SILCombine::run()
1052.68 MB 4.0% 935809 (anonymous namespace)::DeadStoreElimination::run()
771.75 MB 2.9% 1677391 (anonymous namespace)::SILCSE::run()
715.07 MB 2.7% 4198193 (anonymous namespace)::GenericSpecializer::run()
434.87 MB 1.6% 652701 (anonymous namespace)::SILSROA::run()
402.99 MB 1.5% 658563 (anonymous namespace)::SILCodeMotion::run()
341.13 MB 1.3% 962459 (anonymous namespace)::DCE::run()
279.48 MB 1.0% 415031 (anonymous namespace)::StackPromotion::run()
Compilation time breakdown.
--------------------------
Running Time Self (ms) Symbol Name
25716.0ms 35.8% 0.0 swift::runSILOptimizationPasses(swift::SILModule&)
25513.0ms 35.5% 0.0 swift::SILPassManager::runOneIteration()
20666.0ms 28.8% 24.0 swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19664.0ms 27.4% 77.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3272.0ms 4.5% 12.0 (anonymous namespace)::SimplifyCFGPass::run()
3266.0ms 4.5% 7.0 (anonymous namespace)::ARCSequenceOpts::run()
2608.0ms 3.6% 5.0 (anonymous namespace)::SILCombine::run()
2089.0ms 2.9% 104.0 (anonymous namespace)::SILCSE::run()
1929.0ms 2.7% 47.0 (anonymous namespace)::RedundantLoadElimination::run()
1280.0ms 1.7% 14.0 (anonymous namespace)::GenericSpecializer::run()
1010.0ms 1.4% 45.0 (anonymous namespace)::DeadStoreElimination::run()
966.0ms 1.3% 191.0 (anonymous namespace)::DCE::run()
496.0ms 0.6% 6.0 (anonymous namespace)::SILCodeMotion::run()
=== With the NEW Projection. ===
Total amount of memory allocated.
--------------------------------
Bytes Used Count Symbol Name
11876.64 MB 48.4% 22112349 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2887.22 MB 11.8% 3079485 (anonymous namespace)::ARCSequenceOpts::run()
1820.89 MB 7.4% 1877674 (anonymous namespace)::RedundantLoadElimination::run()
1533.16 MB 6.2% 5073310 (anonymous namespace)::SimplifyCFGPass::run()
1282.86 MB 5.2% 577024 (anonymous namespace)::SILCombine::run()
772.21 MB 3.1% 1679154 (anonymous namespace)::SILCSE::run()
721.69 MB 2.9% 936958 (anonymous namespace)::DeadStoreElimination::run()
715.08 MB 2.9% 4196263 (anonymous namespace)::GenericSpecializer::run()
Compilation time breakdown.
--------------------------
Running Time Self (ms) Symbol Name
25137.0ms 37.3% 0.0 swift::runSILOptimizationPasses(swift::SILModule&)
24939.0ms 37.0% 0.0 swift::SILPassManager::runOneIteration()
20226.0ms 30.0% 29.0 swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19241.0ms 28.5% 83.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3214.0ms 4.7% 10.0 (anonymous namespace)::SimplifyCFGPass::run()
3005.0ms 4.4% 14.0 (anonymous namespace)::ARCSequenceOpts::run()
2438.0ms 3.6% 7.0 (anonymous namespace)::SILCombine::run()
2217.0ms 3.2% 54.0 (anonymous namespace)::RedundantLoadElimination::run()
2212.0ms 3.2% 131.0 (anonymous namespace)::SILCSE::run()
1195.0ms 1.7% 11.0 (anonymous namespace)::GenericSpecializer::run()
1168.0ms 1.7% 39.0 (anonymous namespace)::DeadStoreElimination::run()
853.0ms 1.2% 150.0 (anonymous namespace)::DCE::run()
499.0ms 0.7% 7.0 (anonymous namespace)::SILCodeMotion::run()