... with the test 1_stdlib/Bit.swift disabled for iOS.
Most likely the problem with 1_stdlib/Bit.swift (armv7 only) is merely uncovered by this change rather than caused by it.
Unfortunately I have no way to debug the problem on a device, so I filed rdar://problem/20521110
Swift SVN r27274
This prevents an unoptimized imported function from being linked in place of the optimized version from the stdlib.
rdar://problem/20485253
It gives considerable performance improvements for some benchmarks with -Onone. E.g.
PopFrontUnsafePointer: +281%
ArrayOfPOD: +92%
StrComplexWalk: +91%
ArrayOfGenericPOD: +61%
Several others are within the range of +10% to +30%.
For the implementation I added runSILPassesForOnone() in Passes.cpp.
Here we can add other optimizations for -Onone in the future.
Swift SVN r27206
To set the PassKind automatically, I needed to refactor some of the pass manager code and the pass definitions.
The main changes are:
1) SILPassManager now has an add-function for each pass: PM.add(createP()) -> PM.addP()
2) I removed the ARGS argument in Passes.def, which we didn't use anyway.
Swift SVN r26756
This flag enables one to specify a json file that expresses a specific
pipeline in the following format:
[
[
"$PASS_MANAGER_ID",
"run_n_times"|"run_to_fixed_point",
$NUM_ITERATIONS,
"$PASS1", "$PASS2", ...
],
...
]
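For example, a pipeline file matching this format could look as follows (the pass-manager IDs and pass names here are purely illustrative, not taken from the actual pass list):

[
  [
    "HighLevelSimplification",
    "run_to_fixed_point",
    10,
    "simplify-cfg", "sil-combine"
  ],
  [
    "LatePasses",
    "run_n_times",
    1,
    "dead-function-elimination"
  ]
]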
This will make it easier to experiment with different pass pipelines by
allowing:
1. Automatic generation of pass pipelines without needing to recompile
the compiler itself.
2. Simple scripting of pass pipelines via the json meta language.
3. Enabling the easy expression and reproducibility of a specific
pipeline ordering via radar.
In the next commit I will provide a python library for generating these
json files, with a few types of pipeline generators already included.
Swift SVN r24055
We know that a native Swift array that does not need an element type check is
not going to change into an NSArray, or into an array that needs an element type
check. This allows us to specialize array code.
The array semantic calls 'array.props.isCocoa/needsElementTypeCheck' return
said array properties for a read.
func f(a : A[AClass]) {
  for i in 0..a.count {
    let b = a.props.isCocoa()
    .. += _getElement(a, i, b)
  }
}
==>
func f(a : A[AClass]) {
  let b2 = a.props.isCocoa()
  if (!b2) {
    for i in 0..a.count {
      .. += _getElement(a, i, false)
    }
  } else {
    for i in 0..a.count {
      let b = a.props.isCocoa()
      .. += _getElement(a, i, b)
    }
  }
}
The stdlib will be changed to use array.props calls in a future commit.
rdar://17955309
Swift SVN r23689
This prevents the deserializer(s) from keeping references to deserialized functions for the duration of all optimization passes
(which especially matters for dead function elimination).
I have seen no negative effect on compile time. It seems to be rare for a function to be
deserialized twice because the cache is not kept alive between linking passes.
I also simplified the final dead function elimination by just using the regular dead function elimination pass.
Swift SVN r22837
Also, only hoist releases into switch regions late in the pipeline.
I made retain sinking more aggressive so that we can move more retains into (and
then hopefully out of) switch regions.
Hoisting releases up into switch regions blocks moving the retains out of the
switch region. To remove the retain/release pairs we would have to iterate between
code motion and global-arc-opts. Instead, just don't move releases into switch
regions until late in the pipeline (the late iteration of the SSAPasses).
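Retains and releases are implicit in Swift source, so the following is only a conceptual sketch of what sinking a retain into a switch region means (the types and functions are made up):

class Thing {}
enum Direction { case left, right }

func use(_ t: Thing) {}

func process(_ x: Thing, _ e: Direction) {
  // A retain of `x` emitted before the switch can be sunk into both arms,
  // where it may then cancel against a release inside each arm.
  switch e {
  case .left:
    use(x)
  case .right:
    use(x)
  }
  // Hoisting releases up into the arms too early would block moving those
  // retains back out of the switch region, hence the late-pipeline ordering.
}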
This fixes the remaining regressions from the array changes in RC4 and gives
some nice further gains.
-O results, speedup (SU) = min before / min after; the first MIN..MEDIAN group is before, the second is after:
TEST           SMPL  MIN   MAX    MEAN   SD    MEDIAN   MIN   MAX   MEAN  SD    MEDIAN  SU
Forest         10    4985  5456   5319   156   5376     4423  4914  4700  169   4789    1.12
ImageProc      10    7852  7873   7858   6     7857     8260  8272  8264  3     8263    0.95
InsertionSort  10    6098  6109   6104   3     6104     6720  6736  6726  4     6727    0.90
PrimeNum       10    9202  18296  12514  2690  12368    4098  7552  5953  1164  6319    2.24
Prims          10    2058  3486   2787   513   3025     1877  2429  2049  196   2007    1.09
QuickSort      10    6101  6116   6107   5     6107     6380  6415  6396  11    6398    0.95
RC4            10    8639  8684   8655   14    8653     7821  7926  7876  32    7880    1.10
Richards       10    2243  2254   2249   3     2250     1729  1740  1736  3     1738    1.29
StrToInt       10    4298  4317   4309   6     4310     4090  4107  4097  6     4098    1.05
StringWalk     10    6478  6511   6486   10    6482     5784  5797  5790  4     5790    1.11
Walsh          10    5971  7374   6831   515   7103     4780  4803  4789  7     4788    1.24
InsertionSort: Looking at the profile of insertion sort, the regression is not
due to increased retain/release traffic. In fact, it is hard to tell where
those 10% come from.
QuickSort: The quicksort function is identical and takes 95% of the time. It is
not clear where the 5% are coming from; it is not extra retain/release traffic
AFAICT.
ImageProc: Same story as InsertionSort.
rdar://17653593
Swift SVN r22694
Fixes
<rdar://problem/16755460> Specialize functions that are partially applied to constant values
This seems to have little effect on our current benchmark suite except
for the one I wrote specifically for this optimization. We get 4x
speedup on calling reduce to sum a range.
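For reference, that benchmark boils down to something of this shape (written here in current method syntax; the original used the free reduce function), where the summing closure is a constant argument whose specialization lets it be inlined into reduce's loop:

func sumRange(_ n: Int) -> Int {
  // The closure passed to reduce is a compile-time-known constant, so a
  // specialized copy of reduce can be created with the closure inlined.
  return (0..<n).reduce(0) { $0 + $1 }
}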
Swift SVN r21248
If we have a function A that calls a function B with a closure C
func A {
  B(...) {
    // closure C
  }
}
and the inliner decides to not inline B into A, it may be beneficial to generate
a specialized version of B so that we have
func A {
  B_spec_with_C(..., arguments_to_closure_C)
}
SILCombine will optimize the apply of the partial_apply once both are in B_spec_with_C.
The inliner can then inline the closure into B_spec_with_C.
For profitability, we check the relative size of the callee B and the closure C.
We also check the hotness of the call site to B in A and of the call sites to the
closure inside B. For now, if the closure is called inside a loop, we consider it
profitable.
I will add this to the pass manager in a follow-up patch.
rdar://16569736
Swift SVN r21216
Implements redundant bounds check elimination for basic blocks and along the
dominator tree of loops.
No induction variable based hoisting yet.
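As a sketch of the kind of redundancy this catches within a single block (my own example, not from the patch), two accesses to the same index only need one bounds check:

func sumTwice(_ a: [Int], _ i: Int) -> Int {
  // Both subscripts are guarded by the same bounds check on `i`; after the
  // first access the second check is redundant and can be removed.
  return a[i] + a[i]
}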
O3:
NBody , 473.00 , 122.00 , 294.2%
QuickSort , 477.00 , 310.00 , 53.9%
RC4 , 1022.00 , 736.00 , 38.6%
Walsh , 1781.00 , 1142.00 , 55.5%
No effect on Ofast.
Disabled for now.
Swift SVN r20199
This is not run by default unless one passes in the flag -Xllvm -enable-global-load-store-opts.
Also, in order to make sure that dead store elimination is still correct in the
presence of multiple basic blocks, we use the post-dominator tree to determine whether
the dead store is post-dominated by the store that is causing it to be dead.
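A minimal sketch of the multi-block case this check is about (the class and the function are made up): the first store is dead only because every path from it reaches the second store, i.e. it is post-dominated by it:

final class Counter {
  var value = 0
}

func reset(_ c: Counter, _ flag: Bool) {
  c.value = 1          // dead: both branches below rejoin at the final store, so
  var adjustment = 0   // it is post-dominated by `c.value = 2 + adjustment` and
  if flag {            // there is no intervening read of c.value
    adjustment = 1
  }
  c.value = 2 + adjustment
}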
With this pass enabled, we see a 3.5% decrease in overall time in the precommit
bench and the following tests increase in speed by > 5%:
2Sum: 8.9%
Rectangles: 7.35%
Ackermann: 6.43%
StringBuilder: 6.16%
EditDistance: 5.71%
StringWalk: 5.58%
That means that 30% of our benchmarks increased in speed by > 5%. Many of the
other benchmarks increased in speed significantly but not as dramatically.
The only benchmark that regressed is SmallPt, which I am looking into.
rdar://17680758
Swift SVN r20009
The induction variable analysis derives from the SCC visitor (CRTP-style)
and uses it to drive the analysis that finds the IVs of a function.
The current definition of induction variable is very weak, but enough to
use for very basic bounds-check elimination.
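For illustration (my own example, not from the patch), even the weak definition recognizes a simple counter like this one, which is all that basic bounds-check elimination needs:

func zero(_ a: inout [Int]) {
  // `i` is a simple induction variable: it starts at 0 and increases by 1 per
  // iteration, bounded by a.count, so a[i] is always in bounds.
  for i in 0..<a.count {
    a[i] = 0
  }
}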
This is not quite ready for real use. There is an assert that I've
commented out that is firing but should not be, and that will require
some more investigation.
Swift SVN r19845
The main purpose of this pass is to hoist invariant loads out of loops. This
will enable llvm to vectorize loops with array accesses in Ofast once we hoist
the makeUnique functions.
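A sketch of the kind of load this targets (the types are made up): the load of scale.factor does not change inside the loop, so it can be hoisted above it:

final class Scale {
  var factor = 2
}

func apply(_ a: inout [Int], _ scale: Scale) {
  // scale.factor is loop-invariant; hoisting its load out of the loop leaves a
  // simple arithmetic loop body that LLVM has a chance to vectorize.
  for i in 0..<a.count {
    a[i] *= scale.factor
  }
}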
Disabled for now.
rdar://17142604
Swift SVN r19713
The way this pass works is very similar to generic specialization except
that it turns the old function into a thunk that calls the newly created
function that has had the dead arguments removed.
This ensures that any place in the code where we were unable to see that
the old function was being called still works. It also limits the code size
increase due to code duplication, since the marshalling code in the thunk
should be very small.
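Conceptually it looks like the following (written as Swift source for illustration; the transformation actually happens on SIL, and the names are made up):

// Before: `unused` is a dead argument.
//
//   func combine(_ x: Int, _ unused: String) -> Int { return x + 1 }
//
// After: the work moves to a specialized function without the dead argument,
// and the original becomes a thin thunk, so callers we cannot see still work.

func combine_specialized(_ x: Int) -> Int {
  return x + 1
}

func combine(_ x: Int, _ unused: String) -> Int {
  return combine_specialized(x)   // small marshalling thunk
}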
This is just the first part of a larger body of work that optimizes
function signatures. The plan is to include transforming loadable
pointer args to pass by value and to convert @owned arguments to
@guaranteed arguments.
<rdar://problem/17319928>
Swift SVN r18970
Dynamic languages are able to implement inline caches for virtual calls, but Swift is statically compiled, so we have to guess the types at compile time. The early binding pass guesses that types at the bottom of the class hierarchy are not subclassed and emits direct calls to their methods. It converts class_method calls into the following code:
if (Instance is of type Foo) {
  Foo::ping()
} else {
  Instance->ping();
}
The check whether an instance is of a specific type is inexpensive; it is simply a load+icmp sequence.
Swift SVN r18860
The deserializer holds a reference to each deserialized SILFunction, which
prevents Dead Function Elimination from erasing them.
We have a tradeoff on how often we should clean up the unused deserialized
SILFunctions. If we clean up at every optimization iteration, we may
end up deserializing the same SILFunction multiple times. For now, we clean
up only after we are done with the optimization iteration.
rdar://17046033
Swift SVN r18697
In a loop like this:
var j = 2
for var i = 0; i < 100; ++i {
  j += 3
}
it will completely eliminate j.
It does not yet support rewriting conditional branches as unconditional
branches in the cases where only empty blocks are control dependent on
an edge. Once this support is added, it will also completely eliminate
the loop itself.
Swift SVN r18615
Currently, this pass simply hoists calls to addressor functions up to
the function entry point. This solves most of the performance problem.
Fixes <rdar://problem/16500879> Need to hoist @swift_once outside of loops.
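For illustration (the global and the function are made up), accessing a global inside a loop goes through its addressor, which performs the swift_once-guarded lazy initialization check on every iteration until that call is hoisted to the function entry:

let lookupTable: [Int] = (0..<256).map { $0 * $0 }   // lazily initialized global

func total(_ indices: [Int]) -> Int {
  var sum = 0
  for i in indices {
    // Each access to lookupTable goes through its addressor; hoisting that call
    // to the function entry removes the repeated swift_once check from the loop.
    sum += lookupTable[i]
  }
  return sum
}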
Swift SVN r16684
This commit also enables constant propagation in the performance
pipeline.
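As a small illustration (my own example), this is the kind of folding that now also runs in the performance pipeline:

func area() -> Int {
  let width = 6
  let height = 7
  // Constant propagation folds width * height to 42 at compile time, and the
  // associated arithmetic overflow check disappears with it.
  return width * height
}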
Since we are close to WWDC, this commit purposely touches the pass only
minimally (even though my hands wanted to refactor it so badly), just enough so
that we get the desired result with minimal in-tree turmoil.
rdar://16604715
Swift SVN r16388
Fix a phase ordering problem: SILGen of a call to a noreturn function doesn't drop an unreachable after the call,
and doing so is problematic for various reasons (all expressions would have to handle their insertion point
vaporizing, and would have to emit unreachable-code diagnostics). Instead, run a simple pass that folds
noreturn calls and diagnoses unreachable code, and do it before DI. This prevents DI from seeing false
paths and rejecting code that only looks invalid.
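A sketch of the situation (using today's Never spelling rather than the @noreturn attribute of the time): the folding pass turns everything after fail() into unreachable and diagnoses it, before DI ever sees that path:

func fail(_ message: String) -> Never {
  fatalError(message)
}

func first(_ values: [Int]) -> Int {
  if values.isEmpty {
    fail("empty input")
    // Anything placed here is unreachable; the folding pass terminates the block
    // and diagnoses such code, so DI never has to reason about this false path.
  }
  return values[0]
}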
Swift SVN r14711
PassManager.
I think this is much cleaner and more flexible. The various pass
builders have no business marshalling these things around, and they
shouldn't be bound to the pass C'tor. In the future we will be able to
override and dynamically modify pass configuration this way.
Swift SVN r13626