This is important because, to be more aggressive, we ignore incoming values
that are no-payload enums. As far as we are concerned they do not matter, since
retains and releases on those values are no-ops.
Swift SVN r21932
The cache is needed to ensure we do not run into compile time problems once we
start looking through Phi Nodes.
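A minimal sketch of why the cache matters, assuming a toy IR in which a value either forwards one operand unchanged or is a root. The names `Value` and `RCIdentityCache` are hypothetical and are not the analysis's real interface; phi handling itself is not modeled, only the memoization that keeps repeated queries cheap.

```cpp
#include <unordered_map>

// Hypothetical toy value: if `forwarded` is non-null, this value is an
// RC-identity-preserving op over that operand; otherwise it is a root.
struct Value {
  int id;
  const Value *forwarded = nullptr;
};

// Memoizes the root of each value so a chain is walked at most once.
class RCIdentityCache {
  std::unordered_map<const Value *, const Value *> cache;

public:
  int lookups = 0; // number of uncached steps, for illustration only

  const Value *getRoot(const Value *v) {
    auto it = cache.find(v);
    if (it != cache.end())
      return it->second; // already computed, no walk needed
    ++lookups;
    const Value *root = v->forwarded ? getRoot(v->forwarded) : v;
    cache[v] = root;
    return root;
  }
};
```

With a chain c -> b -> a, the first `getRoot(&c)` walks three values; any later query on a, b, or c is answered from the cache.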
The analysis is currently disabled and just returns
SILValue::stripRCIdentityPreservingOps. I am going to thread it through the rest
of the passes that use that call. Then I am going to hide
stripRCIdentityPreservingArgs. Finally post OzU, I am going to enable the pass.
rdar://18300069
Swift SVN r21891
PerfTests -----
Before
Totals,54,93821,93821,93821,0,0
Totals,54,86755,86755,86755,0,0
After
Totals,54,93610,93610,93610,0,0
Totals,54,85780,85780,85780,0,0
We may be able to tune BoostFactor for closure in PerformanceInliner.
Swift SVN r21312
Fixes
<rdar://problem/16755460> Specialize functions that are partially applied to constant values
This seems to have little effect on our current benchmark suite except
for the one I wrote specifically for this optimization. We get a 4x
speedup on calling reduce to sum a range.
Swift SVN r21248
View the sil cfg of a function at the end of the compilation pipeline with
swiftc -O -Xllvm -sil-view-cfg -Xllvm -view-cfg-only-for-function=foobar
Swift SVN r21238
We can now hoist the check in:
  for (i = start; i != end; ++i)
    a[i] = ...
or
  for i in start ..< end
    a[i] = ...
We will also hoist invariant checks as in
  k =
  for i in start ..< end
    a[k] = ...
We will also hoist the overflow check for "++i" out of the loop.
The only thing blocking vectorization of memset loops is the fact that we are
overflow checking the type size multiplication of array accesses. "a[i]" is
translated to "a + sizeof(T) * i" and this multiplication is still overflow
checked.
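As a rough illustration of the hoisting described above, here is a hand-written C++ analogue (not compiler output; the function names are made up for this sketch). The first loop checks the index on every iteration, while the second performs one check before the loop that covers every index in [start, end). In Swift a failed check traps, so it is not observable that the hoisted form fails before any element is written.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One bounds check per iteration.
void fillChecked(std::vector<int> &a, std::size_t start, std::size_t end, int v) {
  for (std::size_t i = start; i != end; ++i) {
    if (i >= a.size())
      throw std::out_of_range("index out of range");
    a[i] = v;
  }
}

// A single check hoisted in front of the loop; the body is check-free.
void fillHoisted(std::vector<int> &a, std::size_t start, std::size_t end, int v) {
  if (start > end || end > a.size())
    throw std::out_of_range("index out of range");
  for (std::size_t i = start; i != end; ++i)
    a[i] = v;
}
```

The check-free body is the form a vectorizer can turn into a memset-style loop.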
We can remove bounds checks in PrimeNum, XorLoop, Hash, MemSet, NBody, Walsh.
Memset , 2371.00 , 1180.00 , 1179.00 , 99.9%
XorLoop , 1403.00 , 1255.00 , 149.00 , 11.9%
rdar://14757945
Swift SVN r21182
simplifyInstruction consciously does not create constants (AFAIK) so we need to
run instruction combine after simplify-cfg to enable more cfg simplification
exposed by jump threading. We can revisit this decision in a follow-up commit if
necessary (I believe it to be useful for simplifyInstruction to be able to
create constants and for simplifycfg to use simplifyInstruction on the branch
condition).
O:
benchmark , baserun0 , optrun0 , delta, speedup
Fibonacci , 1473.00 , 1317.00 , 75.00 , 5.8%
Histogram , 407.00 , 390.00 , 23.00 , 6.1%
InsertionSort , 1273.00 , 1200.00 , 79.00 , 6.7%
Life , 74.00 , 69.00 , 4.00 , 5.8%
NestedLoop , 937.00 , 883.00 , 56.00 , 6.4%
R17315246 , 8.00 , 801.00 , 793.00 , -99.0%
SelectionSort , 1150.00 , 921.00 , 226.00 , 24.5%
Ounchecked:
Histogram , 394.00 , 342.00 , 51.00 , 15.0%
InsertionSort , 1122.00 , 1024.00 , 85.00 , 8.3%
Life , 57.00 , 44.00 , 9.00 , 20.9%
SelectionSort , 1312.00 , 1060.00 , 246.00 , 23.3%
The R17315246 regression is somewhat bad. We depend on a loop form such that
LLVM transforms the loop into an inner loop that just iterates from x to y and
is unrolled (the good version). The slow version has a loop with cond_fail
control flow and is not unrolled. The loop does nothing more than count up.
I have not been able to narrow this down further.
rdar://16821595
Swift SVN r21124
Revert "For debugging purposes allow passes to stop any more passes from running by calling PassManager::stopRunning()."
This reverts commit r20604.
This reverts commit r20606.
This was some debugging code that snuck in.
Swift SVN r20615
Implements redundant bounds check elimination for basic blocks and along the
dominator tree of loops.
No induction variable based hoisting yet.
O3:
NBody , 473.00 , 122.00 , 294.2%
QuickSort , 477.00 , 310.00 , 53.9%
RC4 , 1022.00 , 736.00 , 38.6%
Walsh , 1781.00 , 1142.00 , 55.5%
No effect on Ofast.
Disabled for now.
Swift SVN r20199
We want this to get tested.
O3:
Ackermann 8.0%
GlobalClass 30.8%
R17315246 -50.2%
Phonebook 7.4%
Ofast:
RC4 -9.7%
The R17315246 regression is because LLVM seems to be unable to 'unswitch' the loop
that makes up this benchmark after rotation. The only explanation I have atm is
that after rotation the first exit is a cond_fail.
I looked at RC4's profile and did not see anything suspicious. I was chasing a
10% regression yesterday in Phonebook (today I seem to see about a 7%
improvement). So I am not sure how 'stable' our benchmarks are wrt cache
effects (we are calling into runtimes and whatnot).
Swift SVN r20098
In the current setup analysis information is not reused by new pass managers.
There is no point in having different pass managers. Instead, we can just remove
transformations, reset the internal state of the pass manager, and add new
transformation passes. Analysis information can be reused.
Reuse one pass manager in the pass pipeline so that we don't have to
unnecessarily recompute analysis information.
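The idea can be sketched with a toy pass manager. This is only an illustration under assumed names (`ToyPassManager`, `clearTransformations`, and the single integer "analysis" are all hypothetical, not the compiler's API): transformations are dropped between pipeline stages, but the cached analysis result survives until something invalidates it.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy pass manager, for illustration only.
class ToyPassManager {
  std::vector<std::function<void(ToyPassManager &)>> transforms;
  bool analysisValid = false;
  int analysisResult = 0;

public:
  int analysisComputations = 0; // how often the analysis was recomputed

  // The analysis is computed lazily and kept until invalidated.
  int getAnalysis() {
    if (!analysisValid) {
      ++analysisComputations;
      analysisResult = 42; // stand-in for an expensive computation
      analysisValid = true;
    }
    return analysisResult;
  }
  void invalidateAnalysis() { analysisValid = false; }

  void add(std::function<void(ToyPassManager &)> t) {
    transforms.push_back(std::move(t));
  }
  void run() {
    for (auto &t : transforms)
      t(*this);
  }
  // Drops the transformations but keeps cached analysis information.
  void clearTransformations() { transforms.clear(); }
};
```

Running one stage, calling `clearTransformations()`, and running a second stage computes the analysis once; two separate managers would compute it twice.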
Swift SVN r19917
This ensures that if we have a bunch of passes in a row which modify the CFG, we
do not continually rebuild the post order. At the same time it preserves the
property that multiple passes which do not touch the CFG share the same post
order and reverse post order rather than recomputing them.
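One way to get this behavior is lazy invalidation: passes that change the CFG only mark the cached order stale, and the order is rebuilt on the next query. The sketch below assumes a toy CFG given as successor lists with block 0 as the entry; `PostOrderCache` is a hypothetical name, not the analysis's real class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy cache: successors[b] lists the successors of block b.
class PostOrderCache {
  const std::vector<std::vector<int>> &successors;
  std::vector<int> postOrder;
  bool valid = false;

  void visit(int b, std::vector<bool> &seen) {
    seen[static_cast<std::size_t>(b)] = true;
    for (int s : successors[static_cast<std::size_t>(b)])
      if (!seen[static_cast<std::size_t>(s)])
        visit(s, seen);
    postOrder.push_back(b); // a block is emitted after all its successors
  }

public:
  int recomputations = 0; // for illustration only

  explicit PostOrderCache(const std::vector<std::vector<int>> &succ)
      : successors(succ) {}

  // Called by passes that modify the CFG; only marks the cache stale.
  void invalidate() { valid = false; }

  // Rebuilds only if some pass invalidated the cache since the last query.
  const std::vector<int> &get() {
    if (!valid) {
      ++recomputations;
      postOrder.clear();
      std::vector<bool> seen(successors.size(), false);
      if (!successors.empty())
        visit(0, seen);
      valid = true;
    }
    return postOrder;
  }

  std::vector<int> getReverse() {
    const std::vector<int> &po = get();
    return std::vector<int>(po.rbegin(), po.rend());
  }
};
```

Several CFG-modifying passes in a row cost a single rebuild at the next query, and passes that leave the CFG alone keep sharing the same cached order.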
rdar://17654239
Swift SVN r19913
The induction variable analysis derives from the SCC visitor CRTP-style
and uses it to drive analysis to find the IVs of a function.
The current definition of induction variable is very weak, but enough to
use for very basic bounds-check elimination.
This is not quite ready for real use. There is an assert that I've
commented out that is firing but should not be, and that will require
some more investigation.
Swift SVN r19845
The main purpose of this pass is to hoist invariant loads out of loops. This
will enable llvm to vectorize loops with array accesses in Ofast once we hoist
the makeUnique functions.
Disabled for now.
rdar://17142604
Swift SVN r19713
In the high-level iterations we don't inline functions with special semantics, to allow high-level optimizations.
In this change we are moving from three SSA iterations to two high-level and two low-level iterations of the SSA optimization pipeline.
This change reduces the SmallPT benchmark execution time by 50% and changes the overall testsuite score by 9%.
Swift SVN r19581
Inlining exposes more opportunities for CFG simplifications, and this
could be beneficial before ARC opts.
Because we create inline "caches" fairly late we also need this in order
to clean up redundant checked_cast_br instructions that are exposed as a
result of inlining since we only run the SSA passes once after the
inline cache pass.
The change to actually optimize the checked_cast_br is forthcoming.
Swift SVN r19557
This makes it possible, via -print-stats, to quickly find out the final
count of various forms of instructions. My intention is to use this to
count retains and releases.
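Conceptually the counting is a histogram over instruction kinds. A minimal sketch, assuming instructions are represented only by their kind name (the function `countInstructions` is hypothetical and unrelated to how the statistics are actually collected):

```cpp
#include <map>
#include <string>
#include <vector>

// Returns how many times each instruction kind occurs in the list.
std::map<std::string, int> countInstructions(const std::vector<std::string> &insts) {
  std::map<std::string, int> counts;
  for (const std::string &kind : insts)
    ++counts[kind];
  return counts;
}
```

Comparing the counts for `strong_retain` and `strong_release` before and after a change gives a quick measure of how many reference counting operations survive optimization.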
Swift SVN r18946
Keep in mind that there is still more work to be done in the optimizer related
to loops, partial merging, etc. But this is the most basic multiple basic block
optimizer that has the features we want.
Swift SVN r18707
The deserializer holds a reference to the deserialized SILFunction, which
prevents Dead Function Elimination from erasing them.
We have a tradeoff on how often we should clean up the unused deserialized
SILFunctions. If we clean up at every optimization iteration, we may
end up deserializing the same SILFunction multiple times. For now, we clean
up only after we are done with the optimization iteration.
rdar://17046033
Swift SVN r18697
Enhances DCE to make unreachable those regions of code that have no
effect.
This allows loops like:
for i in 0..n {
  // do nothing
}
to be eliminated by first running DCE to make the loop unreachable, and
then CFG simplification to actually delete the blocks that make up the
loop (assuming we're talking -Ofast and cond_fails have been removed).
What's especially nice is that this can make unreachable several levels
of dead code, including deleting the code that produces the values used
to conditionally branch to other dead code, all in a single pass rather
than needing to iterate between DCE and CFG simplification to achieve
the same effect. For example, this:
func f(b: Bool, c: Bool, d: Bool) {
  if (b && c) {
    // nothing useful here
    if (c && d) {
      // nothing useful here
      if (b && d) {
        // nothing useful here
      }
    }
  }
}
is effectively reduced to:
func f(b: Bool, c: Bool, d: Bool) {
  goto end // pretend for a second we have goto
  if (b && c) {
    // nothing useful here
    if (c && d) {
      // nothing useful here
      if (b && d) {
        // nothing useful here
      }
    }
  }
end:
}
after a single pass, after which unreachable code elimination reduces
this to:
func f(b: Bool, c: Bool, d: Bool) {
}
Swift SVN r18664
In a loop like this:
  var j = 2
  for var i = 0; i < 100; ++i {
    j += 3
  }
it will completely eliminate j.
It does not yet support rewriting conditional branches as unconditional
branches in the cases where only empty blocks are control dependent on
an edge. Once this support is added, it will also completely eliminate
the loop itself.
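The effect on j can be illustrated with a mark-and-sweep style liveness computation. This is a sketch under simplifying assumptions, not the pass itself: instructions are indexed entries with operand lists, and anything with side effects (here, the conditional branch) is treated as a root. `Inst` and `markLive` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct Inst {
  bool hasSideEffects;               // e.g. a store, call, or branch
  std::vector<std::size_t> operands; // indices of the instructions it uses
};

// Returns live[i] == true if instruction i must be kept.
std::vector<bool> markLive(const std::vector<Inst> &insts) {
  std::vector<bool> live(insts.size(), false);
  std::vector<std::size_t> worklist;
  // Roots: instructions with side effects are always live.
  for (std::size_t i = 0; i < insts.size(); ++i) {
    if (insts[i].hasSideEffects) {
      live[i] = true;
      worklist.push_back(i);
    }
  }
  // Propagate liveness backwards through operands.
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    for (std::size_t op : insts[i].operands) {
      if (!live[op]) {
        live[op] = true;
        worklist.push_back(op);
      }
    }
  }
  return live;
}
```

Modeling the loop as i's loop-carried value, the compare, the increment, the branch, and j's loop-carried value plus its add, the branch keeps everything related to i alive. The two j instructions only use each other, so neither is ever marked and both can be deleted.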
Swift SVN r18615
Despite my comment in r17554, the pass was still disabled because of a
potential regression. After enough inlining, _cocoaStringSubscript
addressor is hoisted outside the string comparison, which is of course
a loop.
The regression is hard to measure, ~0.5%, so it's been decided that
we're going to live with it, rather than doing something nasty like
recognizing certain variable names. The post-WWDC fix:
<rdar://problem/16836228> Add an @cold attribute to identify unlikely
code paths.
Swift SVN r17609
And especially don't do that for String subscript. The plan now is to just hoist global initializers out of loops.
And we are faster than ObjC on the StringSort benchmark.
Swift SVN r17240
Currently, this pass simply hoists calls to addressor functions up to
the function entry point. This solves most of the performance problem.
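A hand-written analogue of the transformation, assuming a lazily initialized global behind a hypothetical `tableAddressor` function. A plain flag stands in for the once-style initialization, so this sketch is single-threaded only.

```cpp
// Hypothetical lazily initialized global and its addressor.
static bool tableInitialized = false;
static int table[4];
static int addressorCalls = 0; // for illustration only

static int *tableAddressor() {
  ++addressorCalls;
  if (!tableInitialized) { // stand-in for a once-style initialization check
    for (int i = 0; i < 4; ++i)
      table[i] = i * 10;
    tableInitialized = true;
  }
  return table;
}

// Before: the addressor (and its initialization check) runs on every iteration.
int sumInLoop() {
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += tableAddressor()[i];
  return sum;
}

// After: the addressor call is hoisted to the function entry.
int sumHoisted() {
  int *t = tableAddressor();
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += t[i];
  return sum;
}
```

Both functions compute the same sum, but the hoisted form calls the addressor once per function call instead of once per iteration.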
Fixes <rdar://problem/16500879> Need to hoist @swift_once outside of loops.
Swift SVN r16684
This commit also enables constant propagation in the performance
pipeline.
Since we are close to WWDC, this commit purposefully minimally touches
the pass (despite my hands wanting to refactor it so badly) just enough so
that we get the desired result with minimal in tree turmoil.
rdar://16604715
Swift SVN r16388