The new rule is that an argument will be exploded if one of the
following sets of conditions holds (see the sketch after the list):
(1) (a) Specializing the function will result in a thunk. That is, the
        thunk that is generated cannot be inlined everywhere.
    (b) The argument has dead non-trivial leaves.
    (c) The argument has fewer than three live leaves.
(2) (a) Specializing the function will not result in a thunk. That is,
        the thunk that is generated will be inlined everywhere and
        eliminated as dead code.
    (b) The argument has dead potentially trivial leaves.
    (c) The argument has fewer than six live leaves.
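As a rough source-level sketch of the kind of argument the rule targets (the types and names below are hypothetical, not from the patch):

```swift
final class Buffer {}              // a non-trivial (reference-counted) type

struct Payload {
    var buffer: Buffer             // dead non-trivial leaf: never used below
    var count: Int                 // live trivial leaf
    var flags: UInt8               // dead trivial leaf
}

func sum(_ p: Payload, _ n: Int) -> Int {
    // Only `p.count` is live, so the optimizer could explode `p` and pass
    // just the `count` leaf, dropping the dead `buffer` and `flags` leaves.
    return p.count + n
}

print(sum(Payload(buffer: Buffer(), count: 40, flags: 0), 2))  // 42
```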
This change is based heavily on @gottesm's
https://github.com/apple/swift/pull/16756 .
rdar://problem/39957093
Signature optimization is slightly different from (most) other thunk-producing
optimizations, in that it takes an existing function and turns it into a thunk,
rather than creating a new thunk that calls an existing function. These symbols
can be public, etc., and so need to be handled a bit differently from other
kinds of thunks.
A function is pure if it has no side effects.
If there is a call to a pure function with constant arguments, it always makes sense to inline it, because we know that the whole computation will be constant folded.
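For example (a hypothetical Swift snippet, not taken from the patch):

```swift
// A pure function: no side effects, the result depends only on the arguments.
func power(_ base: Int, _ exponent: Int) -> Int {
    var result = 1
    for _ in 0..<exponent { result *= base }
    return result
}

// With constant arguments, inlining exposes the whole loop to constant
// folding, so the call can be reduced to the literal 1024 at compile time.
let bufferSize = power(2, 10)
print(bufferSize)  // 1024
```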
Specializations are implementation details, and thus shouldn't be
public, even if they are specializing a public function. Without this
downgrade, the ABI of a module would depend on arbitrary internal code
(which could change inlining decisions etc.), as well as on swiftc's optimizer.
In particular, support the following optimizations:
- owned-to-guaranteed
- dead argument elimination
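As a rough source-level illustration of the two optimizations listed above (hypothetical Swift, not from the patch):

```swift
final class Logger {
    func log(_ s: String) { print(s) }
}

// `unused` is never referenced, so dead argument elimination can drop it from
// the specialized signature.  `logger` is only borrowed (never consumed or
// stored), so a parameter passed @owned could be converted to @guaranteed,
// removing a retain/release pair at each call site.
func report(_ logger: Logger, _ unused: [Int], _ message: String) {
    logger.log(message)
}

report(Logger(), [], "done")
```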
Argument explosion is disabled for generics at the moment, as it usually leads to slower code.
Also, add a third [serializable] state for functions whose bodies we
*can* serialize, but only do so if they're referenced from another
serialized function.
This will be used for bodies synthesized for imported definitions,
such as init(rawValue:), etc., and various thunks, but for now this
change is NFC.
This can happen when the argument type is an enum, one of the enum payloads has multiple non-trivial fields, and only one of those values is released before the return.
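A hypothetical Swift example of such an argument (the names are illustrative only):

```swift
final class Header {}
final class Body {}

enum Message {
    case empty
    case full(Header, Body)      // payload with multiple non-trivial fields
}

func headerOnly(_ m: Message) -> Header? {
    // Only the Header value escapes; the Body value is released before the
    // return, so the epilogue releases cover only part of the payload.
    if case let .full(header, _) = m {
        return header
    }
    return nil
}

print(headerOnly(.full(Header(), Body())) != nil)  // true
```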
Several functionalities have been added to FSO over time and the logic has become
muddled.
We were always looking at a static image of the SIL and trying to reason about what kinds of
function-signature-related optimizations we could do.
This can easily lead to muddled logic, e.g. we need to consider two different function
signature optimizations together instead of independently.
Split the single function that performs all the different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis we produce
a new function, and eventually we collapse all intermediate thunks into a single thunk.
With this change it will be easier to implement function signature optimizations, because
we can now do them independently.
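Roughly, the chain of intermediate thunks collapses as in this hypothetical sketch (the naming scheme is illustrative, not the real mangling):

```swift
// Final result after all the small transformations have run.
func work_optimized(_ x: Int) -> Int { x + 1 }

// Intermediate thunk produced by one transformation...
@inline(__always) func work_step1(_ x: Int) -> Int { work_optimized(x) }

// ...and the original symbol, now a thunk as well.  Collapsing the chain
// leaves a single thunk that calls work_optimized directly.
@inline(__always) func work(_ x: Int) -> Int { work_step1(x) }

print(work(41))  // 42
```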
Small modifications to the test cases.
If we cannot find the epilogue releases for all the fields with
reference semantics, but did find them for some fields, explode the argument.
I do not see a performance improvement with this change.
rdar://25451364
Change the optimizer to only make specializations [fragile] if both the
original callee is [fragile] *and* the caller is [fragile].
Otherwise, the specialized callee might be [fragile] even if it is never
called from a [fragile] function, which inhibits the optimizer from
devirtualizing calls inside the specialization.
This opens up some missed optimization opportunities in the performance
inliner and devirtualization, which currently reject fragile->non-fragile
references:
TEST | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
--- | --- | --- | --- | ---
DictionaryRemoveOfObjects | 38391 | 35859 | -6.6% | **1.07x**
Hanoi | 5853 | 5288 | -9.7% | **1.11x**
Phonebook | 18287 | 14988 | -18.0% | **1.22x**
SetExclusiveOr_OfObjects | 20001 | 15906 | -20.5% | **1.26x**
SetUnion_OfObjects | 16490 | 12370 | -25.0% | **1.33x**
Previously, passes other than performance inlining and devirtualization
of class methods were not checking invariants on [fragile] functions
at all, which was incorrect; as part of the work on building the
standard library with -enable-resilience, I added these checks, which
regressed performance with resilience disabled. This patch makes up for
those regressions.
Furthermore, once SIL type lowering is aware of resilience, this will
allow the stack promotion pass to make further optimizations after
specializing [fragile] callees.
It now detects more opportunities for inlining, like some patterns with RC instructions or loads/stores from/to stack locations in the caller.
On the other hand, a new shortest-path analysis limits inlining to those cases where it really gives a benefit.
As the inlining decision now depends on many parameters, the test-threshold option is removed because it does not make much sense anymore.
Instead, the inliner test files are modified to model the "real" instruction costs.
Eventually, we decided to do this:
1. Have function signature opts (what used to be called the cloner) create
   the optimized function.
2. Mark the thunk as always_inline.
3. Rely on the inliner to inline the thunk, to get the benefit of calling the
   optimized function directly.
This forces the callsites to be rewritten by the inliner.
Otherwise, we have the issue that the thunk changes between the time it is created and
the time it is re-read to figure out what we have done to the original function.
This results in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized function.
Inlining the thunk makes the callsite call the optimized function for free, i.e.
without any extra rewriting.
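A rough Swift-level analogue of the resulting thunk (the real thunk is generated in SIL; the names and attributes here are only illustrative):

```swift
final class Box { var value = 0 }

// The optimized function that FSO would produce, e.g. after converting an
// @owned parameter to @guaranteed or dropping dead arguments.
func readValue_specialized(_ box: Box) -> Int {
    return box.value
}

// The original symbol becomes a trivial forwarding thunk.  Marking it
// always-inline (approximated here with @inline(__always)) lets the normal
// inliner replace every call of readValue with a direct call of
// readValue_specialized, with no separate call-site rewriting step.
@inline(__always)
func readValue(_ box: Box) -> Int {
    return readValue_specialized(box)
}

print(readValue(Box()))  // 0
```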
I did not measure any regression with this change.
This splits the function signature module pass into two function passes.
By doing so, it allows us to rewrite callsites to use the FSO-optimized
function prior to attempting inlining, while still allowing a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
It also helps us move to a model in which a module pass is NOT used unless
necessary.
I see neither a regression nor an improvement on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
This change includes an option for how IsLive is defined/computed. The ProjectionTree
can now choose to ignore epilogue releases and mark a node as dead if its only non-debug
user is an epilogue release.
It can also, as before, mark a node as alive even if its only user is an epilogue release.
Imagine a case where one passes in an array and never accesses its owner
except to release it. In such a case, we *do* want to be able to eliminate
that argument even though there is a release in the function epilogue.
This helps get rid of the retain and release pair at the callsite, i.e.
the guaranteed parameter is eliminated.
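A hypothetical source-level picture of that situation (not from the patch):

```swift
// The caller hands `data` to the callee, but the callee never reads it;
// its only "use" is the implicit release when the parameter dies.
func countItems(_ data: [Int], _ limit: Int) -> Int {
    return limit
}

// Treating that epilogue release as a dead use lets FSO remove the argument
// entirely, and with it the retain/release traffic around the call.
let n = countItems(Array(repeating: 0, count: 1_000), 10)
print(n)  // 10
```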
rdar://21114206
This enables function signature optimization to handle a case of self-recursion.
With this change we convert 11 @owned return values to "not owned", while
we convert 179 @owned parameters to @guaranteed.
rdar://24022375
If a value is returned as @owned, we can move the epilogue retain
to the caller and convert the return value to @unowned. This gives the
ARC optimizer more freedom to optimize the retain away on the caller's
side.
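Roughly, the affected pattern looks like this at the source level (the convention change itself happens in SIL and cannot be spelled in Swift; this is only a sketch):

```swift
final class Node { var next: Node? }

// A function like this may end with an epilogue retain of the returned value
// so that it can be returned @owned.  Moving that retain to each caller lets
// the ARC optimizer pair it with a nearby release and delete both.
func head(_ n: Node) -> Node {
    return n
}

let node = Node()
let same = head(node)
print(same === node)  // true
```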
It appears that epilogue retains are harder to find than epilogue
releases. Most of the time they are not in the return block.
(1) Sometimes they are in predecessor blocks.
(2) Sometimes they come from a call which returns an @owned value.
    This should be improved if we fix (1) and go bottom-up.
(3) We do not handle exploded retain_value.
Currently, this catches a small number of opportunities.
We probably need to improve the epilogue retain matcher if we are to handle
more cases.
This is part of rdar://24022375.
We also need some refactoring in the pass, e.g. breaking functions into smaller
functions. I will do this in a subsequent commit.
We get some improvements in the number of parameters converted from owned to
guaranteed on the stdlib.
before
======
103 sil-function-signature-opts - Total owned args -> guaranteed args
after
======
118 sil-function-signature-opts - Total owned args -> guaranteed args
I see the following improvements by running benchmarks with and without this
change. Only differences >= 1.05x are shown:
TEST | OLD | NEW | DELTA | DELTA (%) | SPEEDUP
--- | --- | --- | --- | --- | ---
ErrorHandling | 8154 | 7497 | -657 | -8.1% | 1.09x
LinkedList | 9973 | 9529 | -444 | -4.5% | 1.05x
ObjectAllocation | 239 | 222 | -17 | -7.1% | 1.08x
RC4 | 23167 | 21993 | -1174 | -5.1% | 1.05x (!)
This is part of rdar://22380547