I discovered while updating PMO for ownership that for ~5 years there has been a
bug where we were treating copy_addr of trivial values like an "Assign" (in PMO
terminology) of a non-trivial value and thus stopping allocation
elimination. When I fixed this I discovered that this caused us to no longer
emit diagnostics in a predictable way. Specifically, consider the following
swift snippet:
var _: UInt = (-1) >> 0
Today, we emit a diagnostic that -1 can not be put into a UInt. This occurs
since even though the underlying allocation is only stored into, the copy_addr
assign keeps it alive, causing the diagnostics pass to see the conversion. With
my fix though, we see that we are only storing into the allocation, causing the
allocation to be eliminated before the constant propagation diagnostic pass
runs, causing the diagnostic to no longer be emitted.
We should truly not be performing this type of DCE before we emit such
diagnostics. So in this commit, I split the pass into two parts:
1. A load promotion pass that performs the SSA formation needed for SSA based
diagnostics to actually work.
2. An allocation elimination passes that run /after/ SSA based diagnostics.
This should be NFC since the constant propagation SSA based diagnostics do not
create memory operations so the output should be the same.
NOTE: This is not in the mandatory passes (which run before this). This will
enable me to strip out ownership after we serialize without touching frontend
code. It also makes Onone and O use the same code paths for serialization
instead of one happening in the driver (Onone today) and the other in a SIL pass
(-O, -Osize).
The reason that I updated the sil-func-extractor test is that I found a bug in
how we emit sib files, namely if you try to emit a sib file to stdout, the
llvm-bcanalyzer flags it as malformed. If I output the .sib into a file rather
than trying to use stdout, everything works.
The idea is that eventually down the line we will split this pass into a
mandatory/performance parts once ownership SSA is eliminated after the
guaranteed passes run. For now though, just run this when optimization is
enabled.
This pass only runs on the stdlib/overlays so far so there shouldn't have been
any slow downs in non-stdlib -Onone builds.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
To be clear this only runs on the stdlibs/overlays since it is gated behind a
flag that is set in our cmake.
I can not move it past closure lifetime fixup since the transformation as
written does not express all of the necessary ownership constraints explicitly
in the IR by using addresses.
General case:
begin_access A
...
strong_release / release_value / destroy
end_access
The release instruction can be sunk below the end_access instruction,
This extends the lifetime of the released value, but, might allow us to
Mark the access scope as no nested conflict.
General case:
—
begin_access A (may or may not have no_nested_conflict)
load/store
end_access
apply // may have a scoped access that conflicts with A
begin_access A [no_nested_conflict]
load/store
end_access A
—
The second access scope does not need to be emitted.
NOTE: KeyPath access must be identified at the top-level, non-inlinable stdlib entry point.
As such, The sodlib entry pointed is annotated by a new @_semantics that is equivalent to inline(never)
Introduce an "early redundant load elimination", which does not optimize loads from arrays.
Later array optimizations, like ABCOpt, get confused if an array load in a loop is converted to a pattern with a phi argument.
This problem was introduced with accessors.
rdar://problem/44184763
Major refactoring + tuning of LICM. Includes:
Support for hosting more array semantic calls
Remove restrictions for sinking instructions
Add support for hoisting and sinking instruction pairs (begin and end accesses)
Testing with Exclusivity enabled on a couple of benchmarks shows:
ReversedArray 7x improvement
StringWalk 2.6x improvement
I am doing this so I can start writing DI tests without this lowering occuring.
There never was a real reason for this code to be in DI beyond convenience. Now
it just makes writing tests more difficult. To prevent any test delta, I changed
all current DI tests to run this pass after DI.
There are ~100 significant benchmark regressions (of ~350) with -O
-enforce-exclusivity=checked.
This optimization roughly cuts the overhead in half for almost all of those
regressions. These are the top 30 improvements with the optimization enabled.
XorShift....................................................2.83x
ReversedArray...............................................2.76x
RangeIterationSigned........................................2.67x
ExclusivityGlobal...........................................2.57x
Random......................................................2.44x
ReversedDictionary..........................................2.41x
GeekbenchGEMM...............................................2.35x
ArrayInClass................................................2.31x
StringWalk..................................................2.29x
Ary.........................................................2.25x
Ary3........................................................2.25x
Ary2........................................................2.21x
MultiFileTogether...........................................2.17x
MultiFileSeparate...........................................2.17x
RecursiveOwnedParameter.....................................2.14x
LevenshteinDistance.........................................2.04x
HashTest....................................................1.97x
Voronoi.....................................................1.94x
NopDeinit...................................................1.92x
Life........................................................1.89x
Richards....................................................1.84x
Rectangles..................................................1.74x
MatMul......................................................1.71x
LinkedList..................................................1.51x
GeekbenchFFT................................................1.47x
Xcbuild_OutputByteStreamPerfTests...........................1.39x
ObjectAllocation............................................1.33x
MapReduceLazyCollection.....................................1.30x
Prims.......................................................1.28x
CharIndexing_tweet_unicodeScalars_Backwards.................1.28x
I am going to be adding logic here to enable apple/swift#1550 to be completed.
The rename makes sense due to precedent from LLVM's codegen prepare and also
since I am going to be expanding what the pass is doing beyond just "cleaning
up". It is really a grab bag pass for performing simple transformations that we
do not want to pollute IRGen's logic with.
https://github.com/apple/swift/pull/15502
rdar://39335800
As a first step to getting mandatory inlining out of the business
of 'linking' (walking the function graph and deserializing all
referenced functions), add a new optimizer pass which links
everything in the mandatory pipeline.
For now this is mostly NFC, except it regresses an optimization
I made recently by linking in bodies of methods of deserialized
vtables eagerly. This will be addressed in upcoming patches.
As a first step to getting mandatory inlining out of the business
of 'linking' (walking the function graph and deserializing all
referenced functions), add a new optimizer pass which links
everything in the mandatory pipeline.
For now this is mostly NFC, except it regresses an optimization
I made recently by linking in bodies of methods of deserialized
vtables eagerly. This will be addressed in upcoming patches.
closure lifetimes.
SILGen will now unconditionally emit
%cvt = convert_escape_to_noescape [guaranteed] %op
instructions. The mandatory ClosureLifetimeFixup pass ensures that %op's
lifetime spans %cvt's uses.
The code in DefiniteInitialization that handled a subset of cases is
removed.
Add a new warning that detects when a function will call itself
recursively on all code paths. Attempts to invoke functions like this
may cause unbounded stack growth at least or undefined behavior in the
worst cases.
The detection code is implemented as DFS for a reachable exit path in
a given SILFunction.
Also: add an additional DeadObjectElimination pass in the low level pipeline because
redundant load elimination (which runs before) can turn an object into a dead object.
We run GlobalOpt multiple times in the pass pipeline but in some cases object outlining shouldn't be done too early.
Having it done in a separate pass enables to run it independently from GlobalOpt.
Also: add an additional DeadObjectElimination pass in the low level pipeline because
redundant load elimination (which runs before) can turn an object into a dead object.
We run GlobalOpt multiple times in the pass pipeline but in some cases object outlining shouldn't be done too early.
Having it done in a separate pass enables to run it independently from GlobalOpt.
Recent changes that eliminated the -sil-serialize-all mode and adding this check to IRGen allow us to get rid of ExternalFunctionDefinitionsElimination and ExternalDefsToDecls passes, which are not needed anymore.
Originally, I was going to update DI at the same time for ownership and to fix
the multiple-projectbox di bug. After a bit messing with it, it is a difficult
change to land due to the size. Instead, I am going to do the ownership update
first and then loop back around and then fix the DI bug.
rdar://31521023
This is very beneficial because many class_method/witness_method instructions may get concrete types after specialization. Doing it this way improves compilation times.
This is a separate optimization that detects short-lived temporaries that can be eliminated.
This is necessary now that SILGen no longer performs basic RValue forwarding in some cases.
SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
This patch is supposed to recover the performance regressions that would be introduced by yet to be merged PR #9145, which introduces custom, more efficient array iterators.
The crucial part of this patch is running loop unrolling also during the mid-level optimizations phase, because it may catch more loops with constant trip counts. To make trip counts constant, an additional run of constant propagation is helpful.
A pass has an ID (C++ identifier), Tag (shell identifier),
and Name (human identifier).
Command line options that identify passes should obviously be compatibile with
with the pass' command line identifier. This is also what the user is used to
typing for the -debug-only option.