There are ~100 significant benchmark regressions (of ~350) with -O
-enforce-exclusivity=checked.
This optimization roughly cuts the overhead in half for almost all of those
regressions. These are the top 30 improvements with the optimization enabled.
XorShift....................................................2.83x
ReversedArray...............................................2.76x
RangeIterationSigned........................................2.67x
ExclusivityGlobal...........................................2.57x
Random......................................................2.44x
ReversedDictionary..........................................2.41x
GeekbenchGEMM...............................................2.35x
ArrayInClass................................................2.31x
StringWalk..................................................2.29x
Ary.........................................................2.25x
Ary3........................................................2.25x
Ary2........................................................2.21x
MultiFileTogether...........................................2.17x
MultiFileSeparate...........................................2.17x
RecursiveOwnedParameter.....................................2.14x
LevenshteinDistance.........................................2.04x
HashTest....................................................1.97x
Voronoi.....................................................1.94x
NopDeinit...................................................1.92x
Life........................................................1.89x
Richards....................................................1.84x
Rectangles..................................................1.74x
MatMul......................................................1.71x
LinkedList..................................................1.51x
GeekbenchFFT................................................1.47x
Xcbuild_OutputByteStreamPerfTests...........................1.39x
ObjectAllocation............................................1.33x
MapReduceLazyCollection.....................................1.30x
Prims.......................................................1.28x
CharIndexing_tweet_unicodeScalars_Backwards.................1.28x
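To illustrate the kind of code these benchmarks stress, here is a hedged
sketch (all names hypothetical) of a pattern where dynamic exclusivity checks
dominate: each formal access to class storage under
-enforce-exclusivity=checked begins and ends a runtime check, and merging or
hoisting those checks is what roughly halves the overhead.

// Hypothetical micro-benchmark in the spirit of ArrayInClass: every
// subscript on `data` through the class reference is a dynamic access
// under -enforce-exclusivity=checked.
final class Container {
    var data = [Int](repeating: 1, count: 10_000)

    func sum() -> Int {
        var total = 0
        for i in 0..<data.count {  // each iteration performs checked accesses
            total += data[i]
        }
        return total
    }
}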
I am going to be adding logic here so that apple/swift#1550 can be completed.
The rename makes sense given the precedent of LLVM's CodeGenPrepare, and also
because I am going to expand what the pass does beyond just "cleaning up". It
is really a grab-bag pass for simple transformations that we do not want to
pollute IRGen's logic with.
https://github.com/apple/swift/pull/15502
rdar://39335800
As a first step to getting mandatory inlining out of the business
of 'linking' (walking the function graph and deserializing all
referenced functions), add a new optimizer pass which links
everything in the mandatory pipeline.
For now this is mostly NFC, except that it regresses an optimization
I made recently, by eagerly linking in the bodies of methods of
deserialized vtables. This will be addressed in upcoming patches.
Add a mandatory ClosureLifetimeFixup pass to fix up closure lifetimes.
SILGen will now unconditionally emit
%cvt = convert_escape_to_noescape [guaranteed] %op
instructions. The mandatory ClosureLifetimeFixup pass ensures that %op's
lifetime spans %cvt's uses.
The code in DefiniteInitialization that handled a subset of cases is
removed.
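For illustration, a hedged Swift-level sketch (names hypothetical) of a source
pattern that produces this SIL: a closure value passed to a non-escaping
parameter is converted with convert_escape_to_noescape, and the fixup pass
makes sure the original closure outlives the converted value's uses.

// Hypothetical example: `closure` plays the role of %op and the
// implicit conversion at the call site the role of %cvt.
func withValue(_ body: () -> Int) -> Int {
    return body()
}

func caller(_ x: Int) -> Int {
    let closure = { x * 2 }    // escaping closure value (%op)
    return withValue(closure)  // converted to noescape at the call (%cvt)
}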
Add a new warning that detects when a function will call itself
recursively on all code paths. Invoking such a function will, at best,
cause unbounded stack growth, and may invoke undefined behavior in the
worst case.
The detection is implemented as a DFS that searches for a reachable exit
path in a given SILFunction.
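A hedged sketch of a function the warning would flag (hypothetical code):
every path through the body recurses, so the DFS finds no reachable exit.

// Hypothetical example: both branches call f, so no code path can
// return without recursing and the stack grows without bound.
func f(_ n: Int) -> Int {
    if n > 0 {
        return f(n - 1)
    }
    return f(n + 1)  // the fall-through path also recurses
}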
We run GlobalOpt multiple times in the pass pipeline, but in some cases object
outlining shouldn't be done too early. Doing it in a separate pass makes it
possible to run it independently of GlobalOpt.
Also: add an additional DeadObjectElimination pass in the low-level pipeline,
because redundant load elimination (which runs before it) can turn an object
into a dead object.
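A hedged sketch of an outlining candidate (hypothetical code): a constant
array literal that never escapes or mutates can be promoted from a per-call
allocation to a statically initialized global object.

// Hypothetical example: `names` is a candidate for object outlining,
// since the literal is constant and only read.
func digitName(_ i: Int) -> String {
    let names = ["zero", "one", "two", "three"]
    return names[i & 3]
}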
Recent changes that eliminated the -sil-serialize-all mode, together with the addition of this check to IRGen, allow us to get rid of the ExternalFunctionDefinitionsElimination and ExternalDefsToDecls passes, which are no longer needed.
Originally, I was going to update DI for ownership and fix the
multiple-projectbox DI bug at the same time. After messing with it a bit, that
proved to be a difficult change to land due to its size. Instead, I am going to
do the ownership update first and then loop back around to fix the DI bug.
rdar://31521023
This is very beneficial because many class_method/witness_method instructions may get concrete types after specialization. Doing it this way improves compilation times.
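As a hedged illustration (hypothetical code): once a generic function is
specialized for a concrete class, a previously opaque method call can be
devirtualized into a direct call.

// Hypothetical example: specializing render for Circle turns the
// class_method dispatch of draw() into a direct call.
class Shape { func draw() {} }
final class Circle: Shape { override func draw() {} }

func render<T: Shape>(_ shape: T) {
    shape.draw()  // class_method before specialization
}

render(Circle())  // specialization makes the receiver type concrete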
This is a separate optimization that detects short-lived temporaries that can be eliminated.
This is necessary now that SILGen no longer performs basic RValue forwarding in some cases.
SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
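A hedged sketch of the kind of temporary involved (hypothetical code): passing
an address-only value materializes it into a stack temporary via copy_addr,
and the pass lets the callee read the original instead.

// Hypothetical example: `value` is address-only (an existential), so
// the call argument is copied into a short-lived alloc_stack that the
// optimization can eliminate.
protocol Measurable { var length: Int { get } }

func measure(_ m: Measurable) -> Int {
    return m.length
}

func test(_ value: Measurable) -> Int {
    return measure(value)
}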
This patch recovers the performance regressions that would otherwise be introduced by the yet-to-be-merged PR #9145, which introduces custom, more efficient array iterators.
The crucial part of this patch is running loop unrolling also during the mid-level optimization phase, because it may catch more loops with constant trip counts. To make trip counts constant, an additional run of constant propagation is helpful.
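A hedged sketch of a loop this catches (hypothetical code): once constant
propagation folds the bound to a literal, the trip count is constant and the
mid-level unroller can expand the loop completely.

// Hypothetical example: after constant propagation, n == 4 and the
// loop can be fully unrolled.
func sumFirstFour(_ a: [Int]) -> Int {
    let n = 4
    var s = 0
    for i in 0..<n {
        s += a[i]
    }
    return s
}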
A pass has an ID (a C++ identifier), a Tag (a shell identifier), and a Name (a
human-readable identifier).
Command-line options that identify passes should obviously be compatible with
the pass's command-line identifier. This is also what the user is used to
typing for the -debug-only option.
In raw SIL, access markers are unconditionally retained. In canonical SIL,
markers are still removed prior to optimization.
A new flag, -sil-optimized-access-markers, allows testing access markers in
optimized builds, but it is not yet fully supported.
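For illustration, a hedged sketch (hypothetical code) of what the markers
bracket: each formal access below is delimited by begin_access/end_access in
raw SIL, and the new flag keeps those markers alive through the optimizer.

// Hypothetical example: the inout access to the global is a [modify]
// access bracketed by access markers in raw SIL.
var counter = 0

func bump(_ x: inout Int) {
    x += 1  // [modify] access to x
}

bump(&counter)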
This consists of just removing the support in the OME for ensuring that all
uses of a box go through the project_box whose user is a mark_uninitialized.
Since the MarkUninitializedFixup pass now lowers mark_uninitialized onto the
relevant project_box, I can just rip the hacks out of the OME!
rdar://31521023
I put in a simple fixup pass (MarkUninitializedFixup) for staging purposes. I
don't expect it to be in tree for long. I just did not feel comfortable fixing
up all of the passes up to DI in one commit.
rdar://31521023
At some point, pass definitions were heavily macro-ized. Pass
descriptive names were added in two places. This is not only redundant
but a source of confusion. You could waste a lot of time grepping for
the wrong string. I removed all the getName() overrides which, at
around 90 passes, was a fairly significant amount of code bloat.
Any pass that we want to be able to invoke by name from a tool
(sil-opt) or a pipeline plan *should* have a unique type name, enum value,
command-line string, and name string. I removed a comment about the
various inliner passes that contradicted that.
Side note: We should be consistent with the policy that a pass is
identified by its type. We have a couple passes, LICM and CSE, which
currently violate that convention.
I am going to run it very early and use it to ensure that extra copies due to my
refactoring of SILGenPattern do not cause COW copies to be introduced.
For now, it performs one very simple optimization: it eliminates a copy_value
of a guaranteed parameter whose only user is a destroy_value.
It is now disabled behind a flag.
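A hedged sketch of the pattern this targets (hypothetical code): matching on a
guaranteed parameter can emit a copy_value whose only user is a destroy_value,
and the pass deletes the pair so no COW copy of the payload is introduced.

// Hypothetical example: naive SILGen for the switch copies `w` even
// though the match only borrows it; the copy_value/destroy_value pair
// is removable.
enum Wrapper {
    case payload([Int])
}

func count(_ w: Wrapper) -> Int {
    switch w {  // w is a guaranteed parameter
    case .payload(let xs):
        return xs.count
    }
}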
Replace Array's += operator with individual append calls if the argument is an array literal.
For example:
arr += [1, 2, 3]
is replaced by:
arr.append(1)
arr.append(2)
arr.append(3)
This gives considerable speedups, up to 10x in our micro-benchmarks which test this pattern.
This is based on the work of @ben-ng, who implemented the first version of this optimization (thanks!).
This is important in case the inliner restarts the pass pipeline.
In a subsequent invocation, the inliner should receive a cleaned-up function so that it can make better estimates for further inlining.
As compensation, reduce the caller-block limit of the inliner.
Also add an overall block limit which is taken into account even for always-inline functions.