In theory we should be able to eliminate more loads if we run this after
the mem2reg that is after inlining. We aren't really relying heavily on
having promoted values like this prior to inlining.
Again, I see no significant performance delta, but this seems like the
best place to put this pass if we're only running it once per run of the
SSA passes.
Doing this earlier means that optimizations that are looking at SIL
values (rather than memory) have more opportunities earlier.
Minimal impact at the moment, but this may allow for removing some later
passes that are repeated.
Re-apply b00dcbe with a small test update, and a small change in pass
ordering.
I measure around a 10% reduction in compile times of release no-assert
builds of the stdlib and StdlibUnitTest.
For release + debug-swift builds, I see 20% reduction in stdlib compile
time.
My latest measurements show a few regressions at -O:
  Calculator
  NSError
  SetIsSubsetOf
  Sim2DArray
There is a small (0.1%) reduction in the libswiftCore.dylib size.
Being able to remove these is a consequence of the reordering that
happened in e50daa6.
I measure around a 10% reduction in compile times of release no-assert
builds of the stdlib and StdlibUnitTest.
For release + debug-swift builds, I see 20% reduction in stdlib compile
time.
I saw no reproducible regressions in the benchmarks, and a few
improvements.
There is a small (0.1%) reduction in the libswiftCore.dylib size.
Being able to remove these is a consequence of the reordering that
happened in e50daa6.
The end goal here is to end up with a good pass ordering that will allow
us to only run one set of these passes, rather than running them
twice. This is a start in that direction.
No real impact measured on compile times as of this change. On
benchmarks I see a mix of regressions and improvements.
-O improvements:
  Calculator         -17.6%  1.21x
  Chars              -54.4%  2.19x
  PolymorphicCalls   -14.7%  1.17x
  SetIsSubsetOf      -14.1%  1.16x
  Sim2DArray         -14.1%  1.16x
  StrToInt           -30.4%  1.44x
-O regressions:
  CaptureProp        +32.9%  0.75x
  DictionarySwap     +36.0%  0.74x
  XorLoop            +39.8%  0.72x
-Ounchecked improvements:
  Chars              -58.0%  2.38x
-Ounchecked regressions:
  CaptureProp        +33.3%  0.75x
-Onone improvements:
  StrToInt           -14.9%  1.18x
  StringWalk         -47.6%  1.91x
  StringWithCString  -17.2%  1.21x
  (many more smaller improvements)
-Onone regressions:
  Calculator         +21.5%  0.82x
  OpenClose          +10.1%  0.91x
This replaces a very similar list of passes, added in a similar order,
by simply re-using the ordering from AddSSAPasses. Beyond the particular
inliner pass (which is maintained with this change), nothing about the
previous order was really specific to low-level code.
I measure a 1% increase in compile time of the stdlib, no perf
regressions (at -O), and a few decent improvements:
  #   TEST           OLD   NEW   DELTA  DELTA%  SPEEDUP
  19  CaptureProp    5233  4129  -1104  -21.1%  1.27x
  30  ErrorHandling  3053  2678   -375  -12.3%  1.14x
  65  Sim2DArray      610   518    -92  -15.1%  1.18x
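
The numbers in these rows are consistent with reading the columns as
benchmark index, name, old time, new time, delta, percent change, and
speedup. That reading is an assumption inferred from the arithmetic, not
something the message states. A minimal sketch of the relationship, with
a hypothetical BenchmarkRow type:

```swift
// Hypothetical helper type; the column interpretation is an assumption
// inferred from the figures in the commit message.
struct BenchmarkRow {
    let name: String
    let oldTime: Int
    let newTime: Int

    // Negative delta means the new build is faster.
    var delta: Int { newTime - oldTime }

    // Percent change relative to the old time, rounded to one decimal.
    var percentChange: Double {
        let raw = Double(delta) / Double(oldTime) * 100
        return (raw * 10).rounded() / 10
    }

    // Speedup is old / new, rounded to two decimals.
    var speedup: Double {
        let raw = Double(oldTime) / Double(newTime)
        return (raw * 100).rounded() / 100
    }
}

let captureProp = BenchmarkRow(name: "CaptureProp", oldTime: 5233, newTime: 4129)
print(captureProp.delta, captureProp.percentChange, captureProp.speedup)
```

For the CaptureProp row this reproduces the -1104, -21.1% and 1.27x
figures quoted above.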
I expect to be able to get back the 1% compile-time hit (and probably
more) with future changes.
Now that we process functions in bottom-up order in the pass manager and
have a mechanism to restart the pass pipeline on the current
function (or on a newly created callee function), we can split these
passes back out from the inliner and end up with the same benefits we
had from initially integrating them. We get the further benefit of fully
optimizing newly created callee functions before continuing with the
function that resulted in the creation of those callee
functions (e.g. as a result of a specialization pass running).
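
As a rough illustration of the scheduling described above, here is a toy
model in Swift. It is only a sketch: ToyFunction, ToyPassManager and
their members are hypothetical names, not the compiler's pass manager
API, and the "pipeline" does nothing but record the order of events.

```swift
// Toy model only. These types are hypothetical and do not mirror the
// real pass manager.
struct ToyFunction {
    let name: String
    var callees: [String]
    // Name of a callee that a pass (say, a specializer) will create
    // while optimizing this function, if any.
    var createsCallee: String? = nil
}

final class ToyPassManager {
    private(set) var functions: [String: ToyFunction]
    private(set) var log: [String] = []
    private var visited: Set<String> = []

    init(_ fns: [ToyFunction]) {
        functions = Dictionary(uniqueKeysWithValues: fns.map { ($0.name, $0) })
    }

    // Bottom-up: optimize all callees before the function itself.
    func optimize(_ name: String) {
        guard !visited.contains(name), let fn = functions[name] else { return }
        visited.insert(name)  // marked early so recursive cycles terminate
        for callee in fn.callees { optimize(callee) }
        runPipeline(on: name)
    }

    private func runPipeline(on name: String) {
        guard var fn = functions[name] else { return }
        if let newCallee = fn.createsCallee {
            // A pass created a new callee: register it, fully optimize
            // it first, then restart the pipeline on the current function.
            fn.createsCallee = nil
            fn.callees.append(newCallee)
            functions[name] = fn
            functions[newCallee] = ToyFunction(name: newCallee, callees: [])
            log.append("\(name): created \(newCallee)")
            optimize(newCallee)
            log.append("\(name): restarted")
        }
        log.append("\(name): optimized")
    }
}

let pm = ToyPassManager([
    ToyFunction(name: "main", callees: ["helper"], createsCallee: "helper_specialized"),
    ToyFunction(name: "helper", callees: []),
])
pm.optimize("main")
for entry in pm.log { print(entry) }
```

In this model the newly created callee is finished before the caller
resumes, which is the property the stand-alone passes rely on.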
This reverts commit 0515889cf0.
I made a mistake and did not catch this regression when I measured the change on
my local machine. The regression was detected by our automatic performance
tests. Thank you @slavapestov for identifying the commit.
Remove one of the invocations of the ARC optimizer. I did not measure any
regressions on the performance test suite (using -O), but I did see a
reduction in compile time on rdar://24350646.
On the whole it looks like this currently benefits performance.
As with the devirtualization pass, once the updated inliner is
committed, the position of this pass in the pipeline will change.
It looks like this has minimal performance impact either way. Once the
changes to make the inliner a function pass are committed, the position
of this in the pipeline will change.
They aren't needed at the moment, and running the specialization pass
early might have resulted in some performance regressions.
We can add these back in (and in the appropriate place in the pipeline)
when the changes to unbundle this functionality from the inliner go in.
Add back a stand-alone devirtualizer pass, running prior to generic
specialization. As with the stand-alone generic specializer pass, this
may add functions to the pass manager's work list.
This is another step in unbundling these passes from the performance
inliner.
Begin unbundling devirtualization, specialization, and inlining by
recreating the stand-alone generic specializer pass.
I've added a use of the pass to the pipeline, but this is almost
certainly not the final location where it will run. It's primarily
there to ensure this code gets exercised.
Since this is running prior to inlining, it changes the order that some
functions are specialized in, which means differences in the order of
output of one of the tests (one which similarly changed when
devirtualization, specialization, and inlining were bundled together).
This enables array value propagation in array literal loops like:
  for e in [2,3,4] {
    r += e
  }
This allows us to completely get rid of the array.
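
To make the intended effect concrete, here is a small source-level
sketch. The second function is only a conceptual picture of what the
loop can reduce to once the literal's elements are propagated; it is an
illustration with hypothetical function names, not actual compiler
output.

```swift
// The pattern from the message: iterating over an array literal.
func sumOfLiteral() -> Int {
    var r = 0
    for e in [2, 3, 4] {
        r += e
    }
    return r
}

// Conceptual equivalent with the element values propagated and no
// array allocated. This is an illustration, not generated code.
func sumPropagated() -> Int {
    var r = 0
    r += 2
    r += 3
    r += 4
    return r
}

print(sumOfLiteral() == sumPropagated())
```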
rdar://19958821
SR-203
This reverts commit 82ff59c0b9.
Original commit message:
This allows us to compile the function:
  func valueArray() -> Int {
    var a = [1,2,3]
    var r = a[0] + a[1] + a[2]
    return r
  }
down to just a return of the value 6. It should also eventually allow us
to remove the overhead of vararg calls.
rdar://19958821
(libraries now)
It has been generally agreed that we need to do this reorg, and now
seems like the perfect time. Some major pass reorganization is in the
works.
This does not have to be the final word on the matter. The consensus
among those working on the code is that it's much better than what we
had and a better starting point for future bike shedding.
Note that the previous organization was designed to allow separate
analysis and optimization libraries. It turns out this is an
artificial distinction and not an important goal.