11193 Commits

Author SHA1 Message Date
Adrian Prantl
b6a7d6906a Debug Info: Fix the lowering of the SILDebugScope tree to the LLVM
inlined-at chain.

The previous implementation was only correct for cases where the inliner
inlined bottom-up in the call graph, which happened to cover the majority
of all cases.

rdar://problem/24462475
2016-02-05 13:53:34 -08:00
Max Moiseev
61c837209b Merge remote-tracking branch 'origin/master' into swift-3-api-guidelines 2016-02-04 16:13:39 -08:00
Erik Eckstein
529b386701 SILCombine: allow promotion of init_existentials over control flow + a small bug fix
This allows devirtualization of witness method calls if the initialization of the existential is not in the same basic block.
This change also fixes a bug where promotion is done even if the stack is overwritten after initialization, although I'm not sure this kind of code is ever generated.
2016-02-04 10:59:56 -08:00
Xin Tong
ae86ef2b72 Implement more conservative debugging value support on function arguments in
function signature opt.

Instead of replacing %1 with UNDEF in debugvalueinst %1, we form an aggregate,
taking the live part of %1 and filling the dead part with undef.

rdar://23727705
2016-02-04 10:50:26 -08:00
Mark Lacey
82fd057eaf Remove devirtualization and specialization from the inliner.
Now that we process functions in bottom-up order in the pass manager and
have a mechanism to restart the pass pipeline on the current
function (or on a newly created callee function), we can split these
passes back out from the inliner and end up with the same benefits we
had from initially integrating them. We get the further benefit of fully
optimizing newly created callee functions before continuing with the
function that resulted in the creation of those callee
functions (e.g. as a result of a specialization pass running).
2016-02-04 08:52:01 -08:00
Arnold Schwaighofer
755bfe3185 Revert "Split the method isProfitableToInline."
This reverts commit 670f193e4a.

It has broken a build bot.
2016-02-03 18:10:17 -08:00
Nadav Rotem
670f193e4a Split the method isProfitableToInline.
This commit refactors the parts of isProfitableToInline that compute the cost
and benefit of inlining into a separate function. NFC.
2016-02-03 16:24:06 -08:00
Adrian Prantl
0854b3ce6d SILDebugScope: Add accessors for the parent SIL functions and use them in
assertions. (NFC)
2016-02-03 14:48:06 -08:00
Jordan Rose
923b9a6201 Don't emit any check for #available of another platform.
Previously we treated the * platform as checking for the minimum
deployment target, but that's definitely unnecessary.

There is a bit of a hack here to avoid diagnosing the 'else' branch as
unreachable: if a constant true/false came from #available, ignore it.
2016-02-03 14:27:13 -08:00
Nadav Rotem
6d92a15b29 Sink the loop depth calculation into the IF that uses it. NFC. 2016-02-03 14:11:33 -08:00
Nadav Rotem
1c29dc49ae Remove whitespace. NFC.
I am doing this commit to test the internal infrastructure.
2016-02-03 13:44:50 -08:00
Mark Lacey
a97164656f Disallow inlining of self-recursive functions into other functions.
This effectively returns us to the code from dc65f70.

With the recent pass manager changes, combined with upcoming inliner
changes, we can potentially run the inliner more than we currently
do. Allowing self-recursive functions to be inlined, and running the
inliner more often, can result in a lot of code bloat, which increases
binary sizes and compile times. Even with a relatively small value (10)
for the number of times we allow a function to run through the pass
pipeline, we end up with a significant increase in the stdlib and
stdlib unit test build times.

This results in some performance regressions, but I think the trade-off
here is reasonable.
2016-02-02 20:19:39 -08:00
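The rule described in the commit above can be sketched as a small predicate. This is only an illustration under stated assumptions: `FunctionInfo`, `isSelfRecursive`, and `mayInlineInto` are hypothetical names, not the Swift inliner's API, and only direct self-calls are modeled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical summary of a function: its name and the names of the
// functions it calls directly. Not a type from the Swift sources.
struct FunctionInfo {
  std::string name;
  std::vector<std::string> directCallees;
};

// A function is treated as self-recursive if it calls itself directly.
bool isSelfRecursive(const FunctionInfo &f) {
  return std::find(f.directCallees.begin(), f.directCallees.end(), f.name) !=
         f.directCallees.end();
}

// Models only the rule from the commit message: a self-recursive callee is
// never inlined into a *different* function. Every other case returns true
// here, standing in for "defer to the remaining inlining heuristics".
bool mayInlineInto(const FunctionInfo &caller, const FunctionInfo &callee) {
  if (caller.name != callee.name && isSelfRecursive(callee))
    return false;
  return true;
}
```

With this check, a caller that invokes a recursive `fact` keeps the call instead of absorbing a copy of the recursive body, which is the code-size growth the commit message is concerned about.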
Erik Eckstein
8520120121 SimplifyCFG: don't recalculate the dominator tree for each jump threaded checked_cast_br instruction.
This is done by splitting the transformation into an analysis phase and a transformation phase (which does not use the dominator tree anymore).
The dominator tree is recalculated once after the whole function is processed.

This change eventually solves the compile time problem of rdar://problem/24410167.
2016-02-02 17:46:32 -08:00
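The analysis/transformation split described in the commit above might look roughly like the following toy model. Everything here is an assumption made for illustration: `Cfg`, `Retarget`, `analyzeThreading`, `applyThreading`, and `threadAll` are invented names, and the "dominator tree" is reduced to a counter so the number of recomputations is visible.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Edge { int from; int to; };

// Toy CFG (hypothetical): a list of edges plus a counter standing in for
// the cost of rebuilding the dominator tree.
struct Cfg {
  std::vector<Edge> edges;
  std::size_t domTreeRecomputations = 0;
  void recomputeDominatorTree() { ++domTreeRecomputations; }
};

// One planned edit: make edge `edgeIndex` branch to `newTarget`.
struct Retarget { std::size_t edgeIndex; int newTarget; };

// Analysis phase: read-only, records every edge that enters `through`.
std::vector<Retarget> analyzeThreading(const Cfg &cfg, int through,
                                       int newTarget) {
  std::vector<Retarget> plan;
  for (std::size_t i = 0; i < cfg.edges.size(); ++i)
    if (cfg.edges[i].to == through)
      plan.push_back({i, newTarget});
  return plan;
}

// Transformation phase: applies the plan without consulting dominance.
void applyThreading(Cfg &cfg, const std::vector<Retarget> &plan) {
  for (const Retarget &r : plan)
    cfg.edges[r.edgeIndex].to = r.newTarget;
}

// Handles all candidates, then recomputes the dominator tree once.
void threadAll(Cfg &cfg, const std::vector<std::pair<int, int>> &candidates) {
  std::vector<Retarget> plan;
  for (const auto &c : candidates) {
    std::vector<Retarget> part = analyzeThreading(cfg, c.first, c.second);
    plan.insert(plan.end(), part.begin(), part.end());
  }
  applyThreading(cfg, plan);
  if (!plan.empty())
    cfg.recomputeDominatorTree();
}
```

The point of the shape is that the recomputation count no longer scales with the number of threaded branches: however many candidates are processed, the counter increases by at most one per function.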
Erik Eckstein
6e00d8a9e1 add asserts in replaceBranchTarget() and use casts instead of dyn_casts 2016-02-02 17:46:32 -08:00
Mark Lacey
378e94b901 Formatting changes on recently added lines. 2016-02-01 21:50:01 -08:00
Arnold Schwaighofer
ac423ebe97 A generic class can inherit from objc and so the devirtualizer needs to emit a default case
rdar://23228386
2016-02-01 20:33:40 -08:00
Arnold Schwaighofer
f6866b4ae7 Perform a dynamic method call if a class has objc ancestry in speculative devirt as fallback.
If a class has @objc ancestry, it can be dynamically overridden, and
therefore we don't know the default case even if we see the full class
hierarchy.

rdar://23228386
2016-02-01 18:16:37 -08:00
Mark Lacey
beb0f7dc2f Update pass manager execution strategy for function passes.
Allow function passes to:

1. Add new functions, to be optimized before continuing with the current
   function.
2. Restart the pipeline on the current function after the current pass
   completes.

This makes it possible to fully optimize callees that are the result of
specialization prior to generating interprocedural information or making
inlining choices about these callees.

It also allows us to solve a phase-ordering issue we have with generic
specialization, devirtualization, and inlining, by rescheduling the
current function after changes happen in one of these passes as opposed
to running all of these as part of the inlining pass as happens today.

Currently this is NFC since we have no passes that use this
functionality.
2016-02-01 16:47:26 -08:00
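The execution strategy described in the commit above can be sketched with an explicit work stack. This is a simplified model, not the SILPassManager code: `Function`, `PassResult`, `Pass`, `WorkItem`, and `runFunctionPasses` as written here are hypothetical, and the `maxRestarts` cap is an assumption added so a pass that always asks for a restart cannot loop forever.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these types come from the Swift sources.
struct Function {
  std::string name;
  int restarts = 0; // how often the pipeline was restarted on this function
};

enum class Action { Continue, RestartPipeline };

struct PassResult {
  Action action = Action::Continue;
  std::vector<Function *> newCallees; // functions the pass just created
};

using Pass = std::function<PassResult(Function &)>;

struct WorkItem {
  Function *fn;
  std::size_t nextPass; // index of the next pass to run on fn
};

// Runs `passes` over `bottomUp` (callees first) and returns the names of
// the functions in the order in which they finished the pipeline.
std::vector<std::string> runFunctionPasses(
    const std::vector<Function *> &bottomUp, const std::vector<Pass> &passes,
    int maxRestarts) {
  assert(maxRestarts >= 0);
  std::vector<std::string> finished;
  std::vector<WorkItem> stack;
  // Push in reverse so the first bottom-up function is on top.
  for (auto it = bottomUp.rbegin(); it != bottomUp.rend(); ++it)
    stack.push_back({*it, 0});

  while (!stack.empty()) {
    WorkItem item = stack.back(); // copy: pushes below may reallocate
    if (item.nextPass == passes.size()) {
      finished.push_back(item.fn->name);
      stack.pop_back();
      continue;
    }
    PassResult result = passes[item.nextPass](*item.fn);
    std::size_t next = item.nextPass + 1;
    // (2) Restart the pipeline after the current pass completes.
    if (result.action == Action::RestartPipeline &&
        item.fn->restarts < maxRestarts) {
      ++item.fn->restarts;
      next = 0;
    }
    stack.back().nextPass = next;
    // (1) New functions go on top, so they are fully optimized before the
    // current function continues.
    for (Function *callee : result.newCallees)
      stack.push_back({callee, 0});
  }
  return finished;
}
```

In this model, a specialization pass that creates a new callee while running on `caller` causes the callee to finish the whole pipeline first; `caller` then resumes at its next pass, or from the start if a restart was requested.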
Dmitri Gribenko
1f6fe29e49 Merge pull request #1155 from practicalswift/typo-fixes-20160201
[gardening] Fix typos: "specalized" → "specialized", "uniqueing" → "uniquing"
2016-02-01 14:57:18 -08:00
practicalswift
397bda1624 [gardening] Fix recently introduced typo: "uniqueing" → "uniquing" 2016-02-01 23:07:39 +01:00
Erik Eckstein
3c6c48c4bf SimplifyCFG: simplify the switch_enum -> select_enum conversion.
The main intention for this change is to eliminate the use of the post/dominator trees in this transformation.
These were re-calculated on every conversion, which caused long compile times for functions with a lot of switch_enum instructions: rdar://problem/24410167

Besides that, the code for collecting the target-block's predecessors is now simpler. It's not necessary to handle arbitrary control flow paths because jump threading is simplifying the CFG anyway.

Now SimplifyCFG does not use the PostDominanceAnalysis anymore.
2016-02-01 13:32:55 -08:00
Xin Tong
f73626eb28 Remove 4/5 runs of dead store elimination. I did not measure a real performance difference on
my local machine.

rdar://24392141

This is going to cut compilation time spent in dead store elim by 5X

The last iteration of dead store elimination, run just before the last iteration of arc-sequence-opt,
allows us to catch some opportunities that passes like Mem2Reg cannot eliminate. It also allows
more code motion freedom.

Stdlib -O after removing 4/5 dead stores.
=========================================

Running Time	Self (ms)		Symbol Name
22082.0ms   37.1%	0.0	 	    swift::runSILOptimizationPasses(swift::SILModule&)
21905.0ms   36.8%	0.0	 	     swift::SILPassManager::runOneIteration()
17616.0ms   29.6%	35.0	 	      swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
16667.0ms   28.0%	55.0	 	       swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3063.0ms    5.1%	7.0	 	        (anonymous namespace)::SimplifyCFGPass::run()
2936.0ms    4.9%	20.0	 	        (anonymous namespace)::ARCSequenceOpts::run()
2343.0ms    3.9%	3.0	 	        (anonymous namespace)::SILCombine::run()
1900.0ms    3.1%	110.0	 	        (anonymous namespace)::SILCSE::run()
1642.0ms    2.7%	43.0	 	        (anonymous namespace)::RedundantLoadElimination::run()
1113.0ms    1.8%	6.0	 	        (anonymous namespace)::GenericSpecializer::run()
788.0ms    1.3%	120.0	 	        (anonymous namespace)::DCE::run()
495.0ms    0.8%	3.0	 	        (anonymous namespace)::SILCodeMotion::run()
304.0ms    0.5%	1.0	 	        (anonymous namespace)::StackPromotion::run()
292.0ms    0.4%	1.0	 	        (anonymous namespace)::ConstantPropagation::run()
269.0ms    0.4%	5.0	 	        (anonymous namespace)::ABCOpt::run()
236.0ms    0.3%	35.0	 	        (anonymous namespace)::SILSROA::run()
192.0ms    0.3%	2.0	 	        (anonymous namespace)::SILMem2Reg::run()
146.0ms    0.2%	65.0	 	        (anonymous namespace)::SILLowerAggregate::run()
132.0ms    0.2%	5.0	 	        (anonymous namespace)::LICM::run()
132.0ms    0.2%	7.0	 	        (anonymous namespace)::DeadStoreElimination::run()
96.0ms    0.1%	65.0	 	        (anonymous namespace)::Devirtualizer::run()
67.0ms    0.1%	59.0	 	        (anonymous namespace)::DeadObjectElimination::run()
62.0ms    0.1%	44.0	 	        (anonymous namespace)::RemovePinInsts::run()

StdlibUnitTest -O after removing 4/5 dead stores.
=================================================

Running Time	Self (ms)		Symbol Name
6958.0ms   26.9%	0.0	 	    swift::runSILOptimizationPasses(swift::SILModule&)
6923.0ms   26.8%	0.0	 	     swift::SILPassManager::runOneIteration()
5638.0ms   21.8%	5.0	 	      swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
5363.0ms   20.7%	8.0	 	       swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
1535.0ms    5.9%	1.0	 	        (anonymous namespace)::ARCSequenceOpts::run()
789.0ms    3.0%	2.0	 	        (anonymous namespace)::SimplifyCFGPass::run()
704.0ms    2.7%	0.0	 	        (anonymous namespace)::SILCombine::run()
615.0ms    2.3%	36.0	 	        (anonymous namespace)::SILCSE::run()
506.0ms    1.9%	14.0	 	        (anonymous namespace)::RedundantLoadElimination::run()
224.0ms    0.8%	44.0	 	        (anonymous namespace)::DCE::run()
150.0ms    0.5%	1.0	 	        (anonymous namespace)::SILCodeMotion::run()
113.0ms    0.4%	1.0	 	        (anonymous namespace)::StackPromotion::run()
98.0ms    0.3%	4.0	 	        (anonymous namespace)::DeadStoreElimination::run()
80.0ms    0.3%	3.0	 	        (anonymous namespace)::ABCOpt::run()
74.0ms    0.2%	5.0	 	        (anonymous namespace)::LICM::run()
2016-02-01 12:52:36 -08:00
Slava Pestov
587a11ebb5 Merge pull request #1144 from Saisi/niggling_typos
Fixed more niggling typos
2016-01-29 23:55:21 -08:00
Mark Lacey
4f65f6dc8f Split out the analysis code for function signature optimization.
The split here is rough and will be improved before I create an actual
SILAnalysis out of the analysis piece.
2016-01-29 21:44:54 -08:00
saisi
7f1da6adcc Fixed more niggling typos 2016-01-29 23:52:24 -05:00
Chris Lattner
061c7eb475 Merge pull request #1142 from Saisi/niggling_typos
Fixed niggling typos
2016-01-29 20:19:54 -08:00
saisi
535d400dc6 Fixed niggling typos 2016-01-29 23:16:25 -05:00
Chris Lattner
6810116ff6 Merge pull request #1141 from Saisi/niggling-typos
fixed niggling typos
2016-01-29 20:02:50 -08:00
saisi
08abcd92c3 fixed niggling typos 2016-01-29 22:22:12 -05:00
Adrian Prantl
75fc840126 Merge the parent scope and function fields of SILDebugScope into a
PointerUnion.

This saves 8 bytes per SILDebugScope.

rdar://problem/22706994
2016-01-29 17:21:26 -08:00
Nadav Rotem
52ea0c6c48 Revert "Remove one invocation of the ARC optimizer."
This reverts commit 0515889cf0.

I made a mistake and did not catch this regression when I measured the change on
my local machine. The regression was detected by our automatic performance
tests. Thank you @slavapestov for identifying the commit.
2016-01-28 21:03:10 -08:00
Erik Eckstein
fd6ded4efd Inliner: introduce a new limit for the number of caller blocks.
This avoids too much inlining into a caller function (so far we only had limits based on the callee).
rdar://problem/23228386
2016-01-28 16:21:03 -08:00
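A caller-side limit of the kind the commit above describes could be modeled as follows. This is a guess at the general shape only: `inlineWithinCallerLimit` is a hypothetical helper, and both the greedy in-order policy and the idea that inlining adds the callee's block count to the caller are simplifying assumptions, not the inliner's real cost model.

```cpp
#include <cstddef>
#include <vector>

// Returns how many callees (taken in order) are inlined before the caller
// would grow past `callerBlockLimit` basic blocks. Hypothetical sketch.
std::size_t inlineWithinCallerLimit(
    std::size_t callerBlocks, const std::vector<std::size_t> &calleeBlocks,
    std::size_t callerBlockLimit) {
  std::size_t inlined = 0;
  for (std::size_t blocks : calleeBlocks) {
    // Stop as soon as the next callee would push the caller over the limit.
    if (callerBlocks + blocks > callerBlockLimit)
      break;
    callerBlocks += blocks; // inlining copies the callee's blocks in
    ++inlined;
  }
  return inlined;
}
```

A callee-based limit alone accepts each small callee individually, so a caller with many call sites can still grow without bound; tracking the caller's own size closes that gap.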
Mark Lacey
a7cd2d215a Remove unused getArgList().
There is a method getArgDescList() that is used throughout the code instead.
2016-01-27 15:27:42 -08:00
Xin Tong
5c96bc4945 RLE marks the LiveOut of unreachable block as 0. This is done to simplify the SSAupdate etc., i.e.
we do not need to place a bogus value in the unreachable blocks in case a SILArgument needs to be
constructed for this block's successors.

This relies on simplifycfg or other passes to clean up the CFG before RLE is run.

The isReachable logic is incorrect. This makes RLE too conservative in some cases and incorrect in
others.

This fixes the ASAN build break caused by commit 925eb2e0d9.

I see more redundant loads elim'ed, but I do not see a performance difference with this change.
2016-01-27 14:58:28 -08:00
Mark Lacey
3723b4946b Convert a member variable to a local variable. 2016-01-27 14:21:18 -08:00
Mark Lacey
dd1801898a Remove unused local variable. 2016-01-27 14:21:18 -08:00
Xin Tong
ff8c9e4a6f Fix a logic error in SimplifyCFG. isReachable is only used as part of NDEBUG. 2016-01-27 14:06:53 -08:00
Mark Lacey
033934deeb Rename FunctionAnalyzer to SignatureOptimizer.
This is a first step towards moving the analysis portion of function
signature optimization into an actual SILAnalysis. I'll split this class
into two pieces (one for the analysis it does, and one for the rewrites
it does) next.
2016-01-27 11:05:56 -08:00
Erik Eckstein
905fd37b98 COWArrayOpt: ensure that we can hoist all address projections that we stripped.
This change is needed because we now consider index_addr as address projection in Projection.h.
2016-01-27 09:04:57 -08:00
practicalswift
75bec87b5a [gardening] Fix recently introduced typo: optimsitic → optimistic 2016-01-27 11:59:07 +01:00
Mark Lacey
c78cbc3587 Fix typo: not -> no 2016-01-26 23:02:53 -08:00
Xin Tong
dd8244f1a7 Set the kill bit for the store at the end of the basic block where the stored location is de-allocated.
rdar://24354423
2016-01-26 20:05:46 -08:00
Xin Tong
9c3cdcc00e Replace some DenseMap with SmallDenseMap. Many, if not most, functions do not have more than 64 Locations,
which is the default for DenseMap.
2016-01-26 19:45:01 -08:00
Xin Tong
925eb2e0d9 Correct an inefficiency in initial state of the data flow in RLE 2016-01-26 19:45:01 -08:00
Xin Tong
b3d0d815fc Change SmallDenseMap initial size from 4 to 16. It seems this gives a bit better compilation time
as we do not resize the densemap as much. NFC.
2016-01-26 19:45:01 -08:00
Nadav Rotem
0515889cf0 Remove one invocation of the ARC optimizer.
Removing one of the invocations of the ARC optimizer. I did not measure any
regressions on the performance test suite (using -O), but I did see a
reduction in compile time on rdar://24350646.
2016-01-26 15:51:25 -08:00
Erik Eckstein
8af1372ff3 remove unused variable 2016-01-26 09:37:08 -08:00
Erik Eckstein
9f83c43a02 SIL: remove unused functions from SILValue 2016-01-26 09:37:08 -08:00
Xin Tong
5034e5ba72 Add in some throttle logic for RLE. This is mostly intended for functions that are way too large to process.
I do not see a compilation time difference in stdlib -O, nor any change in the number of redundant loads eliminated.

I am looking more at compilation time and precision in stdlibunittest.

=== Before Throttle Logic ===

compilation time stdlibunit -O:
Running Time    Self (ms)               Symbol Name
27016.0ms   26.4%       0.0                 swift::runSILOptimizationPasses(swift::SILModule&)
26885.0ms   26.2%       0.0                  swift::SILPassManager::runOneIteration()
22355.0ms   21.8%       15.0                  swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
21416.0ms   20.9%       42.0                   swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
5662.0ms    5.5%        10.0                    (anonymous namespace)::ARCSequenceOpts::run()
3916.0ms    3.8%        58.0                    (anonymous namespace)::RedundantLoadElimination::run()
2707.0ms    2.6%        3.0                     (anonymous namespace)::SILCombine::run()
2248.0ms    2.1%        5.0                     (anonymous namespace)::SimplifyCFGPass::run()
1974.0ms    1.9%        121.0                   (anonymous namespace)::SILCSE::run()
1592.0ms    1.5%        30.0                    (anonymous namespace)::DeadStoreElimination::run()
746.0ms     0.7%        170.0                   (anonymous namespace)::DCE::run()

=== After Throttle Logic ===

compilation time stdlibunit -O:
Running Time    Self (ms)               Symbol Name
25735.0ms   25.4%       0.0                 swift::runSILOptimizationPasses(swift::SILModule&)
25611.0ms   25.3%       0.0                  swift::SILPassManager::runOneIteration()
21260.0ms   21.0%       21.0                  swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
20340.0ms   20.1%       43.0                   swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
5319.0ms    5.2%        8.0                     (anonymous namespace)::ARCSequenceOpts::run()
3265.0ms    3.2%        58.0                    (anonymous namespace)::RedundantLoadElimination::run()
2661.0ms    2.6%        1.0                     (anonymous namespace)::SILCombine::run()
2185.0ms    2.1%        5.0                     (anonymous namespace)::SimplifyCFGPass::run()
1847.0ms    1.8%        105.0                   (anonymous namespace)::SILCSE::run()
1499.0ms    1.4%        21.0                    (anonymous namespace)::DeadStoreElimination::run()
708.0ms     0.7%        150.0                   (anonymous namespace)::DCE::run()
498.0ms     0.4%        7.0                     (anonymous namespace)::SILCodeMotion::run()
370.0ms     0.3%        0.0                     (anonymous namespace)::StackPromotion::run()
2016-01-25 20:10:53 -08:00
Xin Tong
f5bd3eab49 Optimize compilation time for RLE and DSE with respect
to the new projection path. We do not need to trace from the accessed field
to the base object when we've done it before in enumerateLSLOcations.

Stdlib -O

=== Before ===

Running Time        Self (ms)           Symbol Name
25137.0ms   37.3%   0.0         swift::runSILOptimizationPasses(swift::SILModule&)
24939.0ms   37.0%   0.0         swift::SILPassManager::runOneIteration()
20226.0ms   30.0%   29.0        swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19241.0ms   28.5%   83.0        swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3214.0ms    4.7%    10.0        (anonymous namespace)::SimplifyCFGPass::run()
3005.0ms    4.4%    14.0        (anonymous namespace)::ARCSequenceOpts::run()
2438.0ms    3.6%    7.0         (anonymous namespace)::SILCombine::run()
2217.0ms    3.2%    54.0        (anonymous namespace)::RedundantLoadElimination::run()
2212.0ms    3.2%    131.0       (anonymous namespace)::SILCSE::run()
1195.0ms    1.7%    11.0        (anonymous namespace)::GenericSpecializer::run()
1168.0ms    1.7%    39.0        (anonymous namespace)::DeadStoreElimination::run()
853.0ms     1.2%    150.0       (anonymous namespace)::DCE::run()
499.0ms     0.7%    7.0         (anonymous namespace)::SILCodeMotion::run()

=== After ===

Running Time    Self (ms)               Symbol Name
22955.0ms   38.2%       0.0       swift::runSILOptimizationPasses(swift::SILModule&)
22777.0ms   37.9%       0.0       swift::SILPassManager::runOneIteration()
18447.0ms   30.7%       30.0      swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
17510.0ms   29.1%       67.0      swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2944.0ms    4.9%        5.0       (anonymous namespace)::SimplifyCFGPass::run()
2884.0ms    4.8%        12.0      (anonymous namespace)::ARCSequenceOpts::run()
2277.0ms    3.7%        1.0       (anonymous namespace)::SILCombine::run()
1951.0ms    3.2%        117.0     (anonymous namespace)::SILCSE::run()
1803.0ms    3.0%        54.0      (anonymous namespace)::RedundantLoadElimination::run()
1096.0ms    1.8%        10.0      (anonymous namespace)::GenericSpecializer::run()
911.0ms     1.5%        53.0      (anonymous namespace)::DeadStoreElimination::run()
795.0ms     1.3%        135.0     (anonymous namespace)::DCE::run()
453.0ms     0.7%        9.0       (anonymous namespace)::SILCodeMotion::run()
2016-01-25 20:10:04 -08:00