This updates the performance inliner to iterate on inlining in cases
where devirtualization or specialization after the first pass of
inlining exposes new opportunities for inlining. Similarly, in some cases
inlining exposes new opportunities for devirtualization, e.g. when we
inline an initializer and can now see an alloc_ref that allows us to
devirtualize some class_methods.
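
A minimal Swift sketch of the initializer case, assuming hypothetical
types Shape and Square and a hypothetical factory makeShape (none of
these come from the patch). Whether the optimizer actually performs
these steps depends on its heuristics.

```swift
// Hypothetical types, for illustration only.
class Shape {
    func area() -> Int { return 0 }
}

class Square: Shape {
    let side: Int
    init(side: Int) { self.side = side }
    override func area() -> Int { return side * side }
}

// Before inlining, a caller only sees a value of static type Shape,
// so area() is dispatched dynamically.
func makeShape(side: Int) -> Shape {
    return Square(side: side)
}

func totalArea() -> Int {
    let s = makeShape(side: 3)
    // If makeShape and Square.init are inlined here, the allocation of a
    // Square becomes visible, and this call could be bound to Square.area.
    return s.area()
}
```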
The implementation currently has some inefficiencies which increase the
swift compilation time for the stdlib by around 3% (this is swift-time
only, no LLVM time, so overall time does not grow by this much).
Unfortunately the (unchanged) current implementation of the core
inlining trades off improved estimates of code growth for increased
compile time, and that plays a part in why compile time increases as
much as it does. Despite this, I have some ideas on how to win some of
that time back in future patches.
Performance differences are mixed, and this will likely require some
further inliner tuning to reduce or remove some of the losses seen here
at -O. I will open radars for the losses.
Wins:
DeltaBlue 10.2%
EditDistance 13.8%
SwiftStructuresInsertionSort 32.6%
SwiftStructuresStack 34.9%
Losses:
PopFrontArrayGeneric -12.7%
PrimeNum -19.0%
RC4 -30.7%
Sim2DArray -14.6%
There were a handful of wins and losses at Onone and Ounchecked as
well. I'll review the perf testing output and open radars accordingly.
The new test case shows an example of the power of the closer
integration here. We are able to completely devirtualize and inline a
series of class_method applies (10 deep in this case, but in theory
substantially deeper) in a single pass of the inliner, whereas before we
could only do a single level per pass of inlining & devirtualization.
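
A rough Swift sketch of the kind of chain meant here, assuming
hypothetical classes C1 to C3 and three levels instead of ten. This is
not the test case added by the commit.

```swift
// Hypothetical classes; each method allocates the next object in the
// chain and calls a method on it.
class C3 { func f() -> Int { return 1 } }
class C2 { func f() -> Int { return C3().f() + 1 } }
class C1 { func f() -> Int { return C2().f() + 1 } }

func entry() -> Int {
    // Devirtualizing and inlining C1.f exposes the call to C2.f, which in
    // turn exposes the call to C3.f once it is inlined.
    return C1().f()
}
```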
Swift SVN r27561
This commit adds a flag to disable optimizations on specific functions. The
primary motivation of this patch is to allow optimizer developers to reduce
test cases by disabling optimization of parts of the code without having to
recompile the compiler or inspect SIL. The annotations "inline(never)"
and "optimize.never" can go a long way.
The second motivation for this patch is to allow our internal adopters to work
around compiler bugs.
rdar://19745484
Usage:
@semantics("optimize.never")
public func miscompile() { ... }
Swift SVN r27475
During inlining we'll now attempt to first devirtualize and specialize
within the function that we're going to inline into. If we're successful
devirtualizing and inlining, we'll attempt to inline into the newly
exposed callees first, before inlining into the function we began with.
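
The ordering described above can be sketched with a toy model. The
ToyFunction type and inliningOrder function are hypothetical and only
model the visiting order, not the real inliner.

```swift
// Toy model: a function is a name plus the callees that become direct
// calls once devirtualization and specialization succeed.
struct ToyFunction {
    let name: String
    let newlyExposedCallees: [String]
}

// Returns the order in which functions are inlined into.
func inliningOrder(startingAt root: String,
                   in module: [String: ToyFunction]) -> [String] {
    var order: [String] = []
    var visited: Set<String> = []

    func visit(_ name: String) {
        guard let fn = module[name], !visited.contains(name) else { return }
        visited.insert(name)
        // Newly exposed callees are handled first...
        for callee in fn.newlyExposedCallees { visit(callee) }
        // ...and only then the function we began with.
        order.append(name)
    }

    visit(root)
    return order
}

let toyModule: [String: ToyFunction] = [
    "a": ToyFunction(name: "a", newlyExposedCallees: ["b"]),
    "b": ToyFunction(name: "b", newlyExposedCallees: ["c"]),
    "c": ToyFunction(name: "c", newlyExposedCallees: []),
]
let toyOrder = inliningOrder(startingAt: "a", in: toyModule)
```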
This does not remove any existing passes of devirtualization or
specialization yet, partially because we don't completely handle all
cases that they handle at this point (e.g. specializing partial
applies).
We do end up specializing deeper into the call graph with this approach
than we did prior to this commit.
I will have some follow-on changes that integrate things further,
allowing us to devirtualize in more cases after inlining into a given
function.
I will also add some directed tests in a future commit.
I tested the stdlib build and this made no difference in build
times. Perhaps after removing other existing phases we'll recapture some
build time.
I'm not seeing reproducible performance differences with this change,
which is not a big surprise at this point. This sets us up for being
able to improve the compilation pipeline in a future release.
Swift SVN r27327
Previous attempts to update the callgraph explicitly after calls to
linkFunction() weren't completely effective because we can deserialize
deeply and introduce multiple new function bodies in the process.
This gets us a bit closer, but only adds new call graph nodes. It does
not currently add edges for everything that gets deserialized (and this
is not fatal, so it is a step forward).
Swift SVN r27120
We claim to maintain the call graph in these passes, so we really need
to add nodes for new functions we pull in.
Also, link in functions when building the call graph, and only allow
functions with bodies to be added to the call graph.
This makes the call graph more consistent.
At some point we need to revisit our linking story because we've got
code spread out over several phases now where it might make sense to do
a single up-front linking pass that potentially pulls in
never-referenced functions (e.g. pull in all foo() that could be reached
in a given class hierarchy up front, even if in reality only C.foo() is
ever called).
Swift SVN r27096
We already invalidate all the analyses for each function we inline into,
so this shouldn't be necessary.
We should also be able to remove the invalidation of dominators for the
same reason, but I am getting one test failure when I do that so it
needs further investigation.
Swift SVN r26939
Before this commit, passes that were attempting to maintain the call
graph would actually build it if it wasn't already valid, just for the
sake of maintaining it.
Now we only maintain it if we already had a valid call graph built.
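
As a toy illustration of the new behavior, assuming a hypothetical
ToyCallGraphAnalysis class (not the real analysis interface):

```swift
// Toy stand-in for a cached call graph; nil means no valid graph exists.
final class ToyCallGraphAnalysis {
    private(set) var nodes: Set<String>? = nil

    func build(functions: [String]) { nodes = Set(functions) }
    func invalidate() { nodes = nil }

    // Called by a pass that introduces a new function. The graph is only
    // updated if it is already valid; it is never built as a side effect.
    func noteNewFunction(_ name: String) {
        guard nodes != nil else { return }
        nodes?.insert(name)
    }
}

let toyGraph = ToyCallGraphAnalysis()
toyGraph.noteNewFunction("f")          // no valid graph: nothing happens
let stillUnbuilt = (toyGraph.nodes == nil)
toyGraph.build(functions: ["g"])
toyGraph.noteNewFunction("f")          // valid graph: node is added
```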
Swift SVN r26873
Attempt to devirtualize any apply that we come across in the performance
inliner prior to attempting to inline.
This is the first step of getting the inliner/specializer/devirtualizer
working together so that we can converge on high quality code with less
work.
This is not meant to directly improve performance; it is a step towards
converging on high quality code with fewer passes. Because it alters
what gets inlined when, it nonetheless had a (mostly) positive effect
on performance.
These are some of the larger deltas I see, where the percentage is
percentage speed-up, and negative percentages indicate a slow-down.
-O:
---
BenchLangCallingCFunction 16.7%
CaptureProp 17.1%
Sim2DArray 22.0%
-Ounchecked
-----------
BenchLangCallingCFunction -11.2%
QuickSort 39.4%
SwiftStructuresBubbleSort -26.7%
Swift SVN r26728
This commit splits DominanceAnalysis into two analyses (Dom and PDom) that
can be cached and invalidated independently of one another using the common
FunctionAnalysisBase interface.
Swift SVN r26643
The old invalidation lattice was incorrect because changes to control flow could cause changes to the
call graph, so we've decided to change the way passes invalidate analyses. In the new scheme, the lattice
is replaced with a list of traits that passes preserve or invalidate. The current traits are Calls and Branches.
Now, passes report which traits they preserve, which is the opposite of the previous implementation where
passes needed to report what they invalidate.
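
A small Swift sketch of the trait scheme, assuming hypothetical
PreservedTraits and ToyAnalysis types. The real interface lives in the
compiler's C++ sources and may look different.

```swift
// The two traits named above, modelled as an option set.
struct PreservedTraits: OptionSet {
    let rawValue: UInt8
    static let calls = PreservedTraits(rawValue: 1 << 0)
    static let branches = PreservedTraits(rawValue: 1 << 1)
    static let nothing: PreservedTraits = []
    static let everything: PreservedTraits = [.calls, .branches]
}

// A toy analysis that stays valid only while the traits it depends on
// are preserved.
struct ToyAnalysis {
    let name: String
    let dependsOn: PreservedTraits
    var isValid: Bool

    init(name: String, dependsOn: PreservedTraits) {
        self.name = name
        self.dependsOn = dependsOn
        self.isValid = true
    }

    // A pass reports what it preserved, not what it invalidated.
    mutating func notify(preserved: PreservedTraits) {
        if !preserved.isSuperset(of: dependsOn) { isValid = false }
    }
}

var toyCallGraph = ToyAnalysis(name: "CallGraph", dependsOn: .calls)
var toyDominance = ToyAnalysis(name: "Dominance", dependsOn: .branches)

// A pass that changed calls but left the control flow alone.
toyCallGraph.notify(preserved: .branches)
toyDominance.notify(preserved: .branches)
```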
Note: I tried to limit the changes in this commit to mechanical changes to ease the review. I will clean up some
of the code in a following commit.
Swift SVN r26449
We used to do this because the mandatory inliner couldn't deal with
generics, and we were marking some things in the stdlib as @transparent
for performance reasons.
In comparing performance before/after this change, I saw noise at -Onone
and -O, and a couple differences at -Ounchecked that could be real (but
are on benchmarks that tend to be very noisy so it's hard to tell for
certain).
This change is important because I am going to commit another change
that marks protocol witness thunks as @transparent in the lead-up to
making the mandatory inliner devirtualize. I don't want that change to
generate a bunch of performance diffs and/or size diffs, which might
happen if we were to force inline *all* of those protocol witness
thunks (as opposed to the ones that will eventually be inlined by the
mandatory inliner because we're able to devirtualize the calls).
Swift SVN r26386
Make the clients remove the apply, which paves the way for the clients
to potentially update the call graph when inlining is successful.
Swift SVN r26075
Gives the following code size improvements (positive % means size reduction):
PerfTests_O 7.9%
PerfTests_Ounchecked 1.0%
PerfTests_Onone 0.4%
libswiftCore.dylib -0.1%
Performance is approximately the same. There are only a few changes above 10%, and these seem to be noise.
Swift SVN r25485
rdar://problem/19701613
Code size reductions (negative means less code size):
bin/PerfTests_O: -3.7%
bin/PerfTests_Ounchecked: -1.9%
bin/PerfTests_Onone: +0.2%
stdlib/core/macosx/Swift.o: -2.2%
The -2.2% in Swift.o consists of about +5% in specializations and -11% in protocol witnesses in the dylib.
(-> still room for improvement regarding specializations)
Note that completely disabling inlining into thunks (even small functions) would increase the code size.
There is little change in performance, a few + and - within 10%.
Beyond this there is (+ means faster):
Phonebook@O: +26%
ImageProc@Ounchecked: +14%
StringWalk@Ounchecked: -16%
Swift SVN r25001
Main changes:
*) Instruction costs are not counted for blocks which are dead after inlining
*) Terminator instructions which become constant after inlining increase the threshold
*) Calls inside loops increase the threshold
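
The three rules above can be sketched as a toy cost model. ToyBlock,
ToyHeuristic and every number are made up for illustration, and reading
"calls inside loops" as "the call site is inside a loop" is an assumption.

```swift
// One basic block of a callee, as the toy heuristic sees it.
struct ToyBlock {
    let instructionCost: Int
    let deadAfterInlining: Bool
    let terminatorBecomesConstant: Bool
}

struct ToyHeuristic {
    // All values are invented; the real parameters were tuned by hand.
    var baseThreshold = 50
    var constantTerminatorBonus = 10
    var loopCallBonus = 5

    func shouldInline(_ blocks: [ToyBlock], callSiteInLoop: Bool) -> Bool {
        var cost = 0
        var threshold = baseThreshold
        for block in blocks {
            // Rule 1: blocks that are dead after inlining cost nothing.
            if block.deadAfterInlining { continue }
            cost += block.instructionCost
            // Rule 2: a terminator that becomes constant raises the threshold.
            if block.terminatorBecomesConstant {
                threshold += constantTerminatorBonus
            }
        }
        // Rule 3: a call inside a loop raises the threshold.
        if callSiteInLoop { threshold += loopCallBonus }
        return cost <= threshold
    }
}

let toyHeuristic = ToyHeuristic()
let prunedCallee = [
    ToyBlock(instructionCost: 40, deadAfterInlining: false,
             terminatorBecomesConstant: true),
    ToyBlock(instructionCost: 100, deadAfterInlining: true,
             terminatorBecomesConstant: false),
]
let borderlineCallee = [
    ToyBlock(instructionCost: 55, deadAfterInlining: false,
             terminatorBecomesConstant: false),
]
```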
In theory this should be a step towards making the performance not so dependent on the inlining heuristic.
But I must admit that I still did some fine tuning of all the parameters to get the best results.
Improvements in the benchmarks:
-O:
Chars: +11%
CommonMarkRender: +11%
DollarReduce: +22%
ForLoops: +22%
Forest: +10%
HeapSort: +36%
ImageProc: +14%
StrCat: +14%
StrComplexWalk: +70%
StrToInt: +11%
StringWalk: +99%
-Ounchecked:
Ary: +40%
Ary2: +30%
EditDistance: +22%
Forest: +18%
HeapSort: +50%
Histogram: +11%
StrCat: +12%
StrComplexWalk: +63%
StrSplitter: +11%
StrToInt: +17%
StringWalk: +75%
Regressions (I will file radars for them):
-Ounchecked:
PolymorphicCalls: -21%
QuickSort: -22%
Rectangles: -12%
Code size of the PerfTests_O decreased by 8%
Code size of the PerfTests_Ounchecked increased by 1%
Swift SVN r24801
1. Eliminate unused variable warnings.
2. Change field names to match capitalization of the rest of the field names in the file.
3. Change method names to match rest of the file.
4. Change the get/set methods for a field to match the field type.
Swift SVN r24501
This lets the inliner better check if a closure is passed to the callee.
It fixes a problem where copy forwarding generates a pattern that the ConstantTracker could not analyze:
<rdar://problem/19426897> [Inliner] Fail to fully optimize RangeAssignment (especially with copy forwarding)
RangeAssignment is now ~6x faster with -O.
Some other improvements: PrimeNum@O: +50%, ImageProc@Ounchecked: +15%, QuickSort@Ounchecked: +20%
There is one degradation in -O and -Ounchecked, which I still have to check: CommonMarkRender: -15%
Swift SVN r24414
When the new option -sil-inline-test-threshold=<n> is set, the inliner uses a simplified model
for instruction costs. This helps to test the inline heuristic.
Swift SVN r24178
This change adds a general method to see if inlining would enable constant propagation or
inlining of a closure.
For this it does not matter if a constant/function_ref/etc. is passed directly or within
a struct.
This first version only handles closures which are passed to an apply in the callee.
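
A short Swift sketch of the two shapes mentioned above, assuming a
hypothetical Handler struct and hypothetical callees. In both cases the
closure is applied in the callee, which is the pattern this first
version handles.

```swift
// Hypothetical wrapper: the closure travels inside a struct.
struct Handler {
    let transform: (Int) -> Int
}

// The closure is passed directly and applied in the callee.
func applyDirect(_ f: (Int) -> Int, to x: Int) -> Int {
    return f(x)
}

// The closure is passed within a struct and applied in the callee.
func applyWrapped(_ h: Handler, to x: Int) -> Int {
    return h.transform(x)
}

func closureCallers() -> (Int, Int) {
    // Inlining either callee here would expose a call to a known closure.
    let a = applyDirect({ $0 + 1 }, to: 41)
    let b = applyWrapped(Handler(transform: { $0 * 2 }), to: 21)
    return (a, b)
}
```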
It fixes the performance problem of RangeAssignment (rdar://problem/19252374) and shows
minor improvements in some other benchmarks, e.g. CommonMarkRender.
The impact on code size is negligible (< 1%).
Swift SVN r24109
Now the SILLinkage for functions and global variables follows the Swift visibility (private, internal or public).
In addition, whether a function or global variable is considered fragile is kept in a separate flag at the SIL level.
Previously the linkage was used for this (e.g. no inlining of less visible functions into more visible functions), but it had no effect
because everything was public anyway.
For now this isFragile flag is set for public transparent functions, and for everything if a module is compiled with -sil-serialize-all,
i.e. for the stdlib.
For details see <rdar://problem/18201785> Set SILLinkage correctly and better handling of fragile functions.
The benefits of this change are:
*) Enables elimination of unused private and internal functions
*) It should be possible now to use private in the stdlib
*) The symbol linkage is as one would expect (previously almost all symbols were public).
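
For illustration, a few hypothetical declarations at the three
visibility levels. The notes in the comments restate the description
above and were not checked against compiler output.

```swift
// private: visible only in this file.
private func scale(_ x: Int) -> Int { return x * 2 }

// private and never called: a candidate for elimination.
private func unusedHelper() -> Int { return 0 }

// internal (the default): visible only inside this module.
func offset(_ x: Int) -> Int { return scale(x) + 1 }

// public: visible to other modules.
public func api(_ x: Int) -> Int { return offset(x) }
```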
More details:
Specializations from fragile functions (e.g. from the stdlib) now get linkonce_odr,default
linkage instead of linkonce_odr,hidden, i.e. they have public visibility.
The reason is: if such a function is called from another fragile function (in the same module),
then it has to be visible from a third module, in case the fragile caller is inlined but not
the specialized function.
I had to update lots of test files, because many CHECK-LABEL lines include the linkage, which has changed.
The -sil-serialize-all option is now handled at SILGen and not at the Serializer.
This means that test files in sil format which are compiled with -sil-serialize-all
must have the [fragile] attribute set for all functions and globals.
The -disable-access-control option doesn't help anymore if the accessed module is not compiled
with -sil-serialize-all, because the linker will complain about unresolved symbols.
A final note: I tried to consider all the implications of this change, but it's not a low-risk change.
If you have any comments, please let me know.
Swift SVN r22215
This will let the performance inliner inline a function even if the costs are too high.
This attribute is only a hint to the inliner.
If the inliner has other good reasons not to inline a function,
it will ignore this attribute, for example if it is a recursive function (which is
currently not supported by the inliner).
Note that setting the inline threshold to 0 disables performance inlining entirely, and in
this case @inline(__always) also has no effect.
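
A minimal usage sketch, assuming a hypothetical helper clampToByte:

```swift
// The attribute is a hint: the inliner may still decline to inline.
@inline(__always)
func clampToByte(_ v: Int) -> Int {
    return min(255, max(0, v))
}

func brighten(_ pixel: Int) -> Int {
    // A call site where the hint applies when performance inlining runs.
    return clampToByte(pixel + 40)
}
```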
Swift SVN r21452