Commit Graph

2592 Commits

Author SHA1 Message Date
eeckstein
718e9e6f72 Merge pull request #1462 from eeckstein/specializer_new_attempt
Specializer new attempt
2016-02-26 17:02:29 -08:00
Xin Tong
19c528e59d Make more passes respect no.optimize 2016-02-26 16:03:17 -08:00
Erik Eckstein
3a83cee006 Reinstate "GenericSpecializer: When specializing a generic function, convert indirect parameters/result to direct parameters/result.""
This reinstates commit 4187959e66.

The exposed crash in the ClosureSpecializer is fixed.
2016-02-26 14:05:48 -08:00
Erik Eckstein
0c2ca94ef7 Rewrite the ValueLifetimeAnalysis.
It fixes a problem with lifetime regions having "exit-edges". This crashed the ClosureSpecializer.
2016-02-26 14:05:48 -08:00
Erik Eckstein
f70b53b015 Revert "Reinstate "GenericSpecializer: When specializing a generic function, convert indirect parameters/result to direct parameters/result."""
This reverts commit c556d5cd39.

Hitting a new assert.
2016-02-25 09:50:11 -08:00
Erik Eckstein
c556d5cd39 Reinstate "GenericSpecializer: When specializing a generic function, convert indirect parameters/result to direct parameters/result.""
This reinstates commit 4187959e66.

After Xin's recent fix in ARC (6a9a430f68) the crash on i386 should be resolved.
2016-02-25 08:48:15 -08:00
Mark Lacey
0a893c1f88 Fix typo in comment. 2016-02-24 14:17:32 -08:00
Michael Gottesman
a5be2fff01 [sil] Use FullApplySite instead of ApplyInst in SILInstruction::getMemoryBehavior().
We were giving special handling to ApplyInst when we were attempting to use
getMemoryBehavior(). This commit changes the special handling to work on all
full apply sites instead of just AI. Additionally, we look through partial
applies and thin to thick functions.

I also added a dumper called BasicInstructionPropertyDumper that just dumps the
results of SILInstruction::get{Memory,Releasing}Behavior() for all instructions
in order to verify this behavior.
2016-02-23 15:00:43 -08:00
Erik Eckstein
5b4c73ed3b Revert "GenericSpecializer: When specializing a generic function, convert indirect parameters/result to direct parameters/result."
This reverts commit 4187959e66.

There is a crash in StdlibUnittests on i386 (Release-Assert build)
2016-02-23 08:29:41 -08:00
Erik Eckstein
4187959e66 GenericSpecializer: When specializing a generic function, convert indirect parameters/result to direct parameters/result.
With this re-abstraction a specialized function has the same calling convention as if it would have been written with the specialized types in the first place.
In general this results in less alloc_stacks and load/stores.
It also can eliminate some re-abstraction thunks, e.g. if a generic closure is used in a non-generic context.
It some (hopefully rare) cases it may require to add re-abstraction thunks.

In case a function has multiple indirect results, only the first is converted to a direct result. This is an open TODO.
2016-02-22 13:58:10 -08:00
Erik Eckstein
666e20381f Refactor some ArraySemanticsCall code. NFC. 2016-02-22 13:58:10 -08:00
Mark Lacey
bc36b2a601 Improve handling of unreachable blocks in mem2reg.
We were handling regular uses, but not handling promotions in things
like debug_value_addr.

This was exposed by some pass ordering changes I have in an upcoming
commit.
2016-02-22 12:11:00 -08:00
Xin Tong
a48584ccbc Create a fast path for not-final release instruction.
For a release on a guaranteed function paramater, we know right away
that its not the final release and therefore does not call deinit.

Therefore we know it does not read or write memory other than the reference
count.

This reduces the compilation time of dead store and redundant load elim. As
we need to go over alias analysis to make sure tracked locations do not alias
with it.
2016-02-20 22:00:36 -08:00
Nadav Rotem
b4d836880f [Doc] Rename a function and change 'auto' to an explicit type as suggested by @slavapestov in code review. 2016-02-19 21:54:07 -08:00
Xin Tong
95f3280461 Remove a double negative. NFC 2016-02-19 20:45:23 -08:00
Xin Tong
e42bd372eb Skip processing block without loads.
After collected enough information in the first iteration of the
data flow. We do not do second iteration (last iteration) for blocks
without loads as we will not forward any load there.

This improves compilation time of redundant load elimination.
2016-02-19 20:45:23 -08:00
Nadav Rotem
2d20eb6c54 Update the pass to use the destructured result types that John introduced a few days ago. 2016-02-19 16:48:29 -08:00
Nadav Rotem
30927d3459 Implement CSE of the trio open_ext + witness_method + apply.
When we emit calls to existential methods silgen produces a sequence of the
three instructions below:

open_existential_addr %0 : $*Pingable to $*@opened("1E467EB8-...") Pingable
witness_method $@opened("1E467EB8-...") Pingable, #Pingable.ping!1
apply %3<@opened("1E467EB8-...") Pingable>(%2)

This commit adds a new CSE-like pass that finds sequences of calls to protocol
methods and reuses the first two instructions open_existential_addr and
witness_method. The optimization finds arguments that must not alias and may not
escape and combines all of the existential method calls to use the same method
lookup. The optimization handles control flow by finding the top dominating
open_existential instruction, and uses that instruction.

related to rdar://22704464.
2016-02-19 16:48:29 -08:00
John McCall
e249fd680e Destructure result types in SIL function types.
Similarly to how we've always handled parameter types, we
now recursively expand tuples in result types and separately
determine a result convention for each result.

The most important code-generation change here is that
indirect results are now returned separately from each
other and from any direct results.  It is generally far
better, when receiving an indirect result, to receive it
as an independent result; the caller is much more likely
to be able to directly receive the result in the address
they want to initialize, rather than having to receive it
in temporary memory and then copy parts of it into the
target.

The most important conceptual change here that clients and
producers of SIL must be aware of is the new distinction
between a SILFunctionType's *parameters* and its *argument
list*.  The former is just the formal parameters, derived
purely from the parameter types of the original function;
indirect results are no longer in this list.  The latter
includes the indirect result arguments; as always, all
the indirect results strictly precede the parameters.
Apply instructions and entry block arguments follow the
argument list, not the parameter list.

A relatively minor change is that there can now be multiple
direct results, each with its own result convention.
This is a minor change because I've chosen to leave
return instructions as taking a single operand and
apply instructions as producing a single result; when
the type describes multiple results, they are implicitly
bound up in a tuple.  It might make sense to split these
up and allow e.g. return instructions to take a list
of operands; however, it's not clear what to do on the
caller side, and this would be a major change that can
be separated out from this already over-large patch.

Unsurprisingly, the most invasive changes here are in
SILGen; this requires substantial reworking of both call
emission and reabstraction.  It also proved important
to switch several SILGen operations over to work with
RValue instead of ManagedValue, since otherwise they
would be forced to spuriously "implode" buffers.
2016-02-18 01:26:28 -08:00
Arnold Schwaighofer
31e01a5dd9 CopyForwarding: More places to check whether we have a function arg 2016-02-17 15:08:43 -08:00
Arnold Schwaighofer
2f81e4eaf8 CopyForwarding: We need to check whether an argument is a function argument before checking its convention 2016-02-17 14:38:44 -08:00
Xin Tong
e2c0990851 Rename hasNoUsesExceptDebug to onlyHaveDebugUses. The double negation logic is
harder to understand. NFC.
2016-02-10 14:46:09 -08:00
Xin Tong
84a6ff1d98 And lastly rename NewProjection to Projection. This is a NFC. rdar://24520269 2016-02-09 22:20:10 -08:00
Xin Tong
6fb741bbca Migrate dead object elimination to new projection. This should be a NFC 2016-02-09 22:20:09 -08:00
Xin Tong
0258e8e816 Migrate to use new projection for SimplifyCFG. This should be a NFC.
This is part of rdar://24520269
2016-02-06 08:27:05 -08:00
Xin Tong
338a2d0af5 Migrate to use new projection for SILmem2reg. This should be a NFC.
This is part of rdar://24520269
2016-02-05 23:16:35 -08:00
Xin Tong
8318b83016 Remove unnecessary header. NFC 2016-02-05 22:49:24 -08:00
Mark Lacey
82fd057eaf Remove devirtualization and specialization from the inliner.
Now that we process functions in bottom-up order in the pass manager and
have a mechanism to restart the pass pipeline on the current
function (or on a newly created callee function), we can split these
passes back out from the inliner and end up with the same benefits we
had from initially integrating them. We get the further benefit of fully
optimizing newly created callee functions before continuing with the
function that resulted in the creation of those callee
functions (e.g. as a result of a specialization pass running).
2016-02-04 08:52:01 -08:00
Adrian Prantl
0854b3ce6d SILDebugScope: Add accessors for the parent SIL functions and use them in
assertions. (NFC)
2016-02-03 14:48:06 -08:00
Erik Eckstein
8520120121 SimplifyCFG: don't recalculate the dominator tree for each jump threaded checked_cast_br instruction.
This is done by splitting the transformation into an analysis phase and a transformation phase (which does not use the dominator tree anymore).
The domintator tree is recalucated once after the whole function is processed.

This change eventually solves the compile time problem of rdar://problem/24410167.
2016-02-02 17:46:32 -08:00
Arnold Schwaighofer
ac423ebe97 A generic class can inherit from objc and so the devirtualizer needs to emit a default case
rdar://23228386
2016-02-01 20:33:40 -08:00
Arnold Schwaighofer
f6866b4ae7 Perform a dynamic method call if a class has objc ancestry in speculative devirt as fallback.
If a class has an @objc ancestry this class can be dynamically overridden and
therefore we don't know the default case even if we see the full class
hierarchy.

rdar://23228386
2016-02-01 18:16:37 -08:00
Dmitri Gribenko
1f6fe29e49 Merge pull request #1155 from practicalswift/typo-fixes-20160201
[gardening] Fix typos: "specalized" → "specialized", "uniqueing" → "uniquing"
2016-02-01 14:57:18 -08:00
practicalswift
397bda1624 [gardening] Fix recently introduced typo: "uniqueing" → "uniquing" 2016-02-01 23:07:39 +01:00
Erik Eckstein
3c6c48c4bf SimplifyCFG: simplify the switch_enum -> select_enum conversion.
The main intention for this change is to eliminate the use of the post/dominator trees in this transformation.
These were re-calculated on every conversion which caused long compile times for functions with lot of switch_enum instructions: rdar://problem/24410167

Beside that, the code for collecting the target-block's predecessors is now simpler. It's not necessary to handle arbitrary control flow pathes because jump threading is simplifying the CFG anyway.

Now SimplifyCFG does not use the PostDominanceAnalysis anymore.
2016-02-01 13:32:55 -08:00
Slava Pestov
587a11ebb5 Merge pull request #1144 from Saisi/niggling_typos
Fixed more niggling typos
2016-01-29 23:55:21 -08:00
saisi
7f1da6adcc Fixed more niggling typos 2016-01-29 23:52:24 -05:00
Adrian Prantl
75fc840126 Merge the parent scope and function fields of SILDebugScope into a
PointerUnion.

This saves 8 bytes per SILDebugScope.

rdar://problem/22706994
2016-01-29 17:21:26 -08:00
Xin Tong
5c96bc4945 RLE marks the LiveOut of unreachable block as 0. This is done to simplfy the SSAupdate etc, i.e.
we do not need to place bogus value in the unreachable blocks in case a SILArgument needs to be
constrcuted for this  block's successors.

This relies on simplifycfg or other passes to clean up the CFG before RLE is ran.

isReachable logic is incorrect. This make RLE too conservative in some cases and incorrect in
others .

This fixed ASAN build break caused by commit 925eb2e0d9

I see more redundant loads elim'ed, but I do not see a performance difference with this change.
2016-01-27 14:58:28 -08:00
Xin Tong
ff8c9e4a6f Fix a logic error in SimplifyCFG. isReachable is only used as part of NDEBUG. 2016-01-27 14:06:53 -08:00
practicalswift
75bec87b5a [gardening] Fix recently introduced typo: optimsitic → optimistic 2016-01-27 11:59:07 +01:00
Xin Tong
dd8244f1a7 Set the kill bit for the store at the end of the basic block where the stored location is de-allocated.
rdar://24354423
2016-01-26 20:05:46 -08:00
Xin Tong
9c3cdcc00e Replace some DenseMap with SmallDenseMap. Many, if not most functions do not have more than 64 Locations.
which is the default for DenseMap.
2016-01-26 19:45:01 -08:00
Xin Tong
925eb2e0d9 Correct an inefficiency in initial state of the data flow in RLE 2016-01-26 19:45:01 -08:00
Xin Tong
b3d0d815fc Change SmallDenseMap initial size from 4 to 16. It seems this gives a bit better compilation time
as we do not resize the densemap as much. NFC.
2016-01-26 19:45:01 -08:00
Erik Eckstein
9f83c43a02 SIL: remove unused functions from SILValue 2016-01-26 09:37:08 -08:00
Xin Tong
5034e5ba72 Add in some throttle logic for RLE. This is mostly intended for functions that are way too large to process.
I do not see compilation time difference in stdlib -O nor any change in # of redundant loads eliminated.

I am more looking at compilation time and precision in stdlibunittest.

=== Before Throttle Logic ===

compilation time stdlibunit -O:
Running Time    Self (ms)               Symbol Name
27016.0ms   26.4%       0.0                 swift::runSILOptimizationPasses(swift::SILModule&)
26885.0ms   26.2%       0.0                  swift::SILPassManager::runOneIteration()
22355.0ms   21.8%       15.0                  swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
21416.0ms   20.9%       42.0                   swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
5662.0ms    5.5%        10.0                    (anonymous namespace)::ARCSequenceOpts::run()
3916.0ms    3.8%        58.0                    (anonymous namespace)::RedundantLoadElimination::run()
2707.0ms    2.6%        3.0                     (anonymous namespace)::SILCombine::run()
2248.0ms    2.1%        5.0                     (anonymous namespace)::SimplifyCFGPass::run()
1974.0ms    1.9%        121.0                   (anonymous namespace)::SILCSE::run()
1592.0ms    1.5%        30.0                    (anonymous namespace)::DeadStoreElimination::run()
746.0ms    0.7% 170.0                   (anonymous namespace)::DCE::run()

=== After Throttle Logic ===

compilation time stdlibunit -O:
Running Time    Self (ms)               Symbol Name
25735.0ms   25.4%       0.0                 swift::runSILOptimizationPasses(swift::SILModule&)
25611.0ms   25.3%       0.0                  swift::SILPassManager::runOneIteration()
21260.0ms   21.0%       21.0                  swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
20340.0ms   20.1%       43.0                   swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
5319.0ms    5.2%        8.0                     (anonymous namespace)::ARCSequenceOpts::run()
3265.0ms    3.2%        58.0                    (anonymous namespace)::RedundantLoadElimination::run()
2661.0ms    2.6%        1.0                     (anonymous namespace)::SILCombine::run()
2185.0ms    2.1%        5.0                     (anonymous namespace)::SimplifyCFGPass::run()
1847.0ms    1.8%        105.0                   (anonymous namespace)::SILCSE::run()
1499.0ms    1.4%        21.0                    (anonymous namespace)::DeadStoreElimination::run()
708.0ms    0.7% 150.0                   (anonymous namespace)::DCE::run()
498.0ms    0.4% 7.0                     (anonymous namespace)::SILCodeMotion::run()
370.0ms    0.3% 0.0                     (anonymous namespace)::StackPromotion::run()
2016-01-25 20:10:53 -08:00
Xin Tong
f5bd3eab49 Optimize compilation time for RLE and DSE with respective
to the new projection path. We do not need to trace from the accessed field
to the base object when we've done it before in enumerateLSLOcations

Stdlib -O

=== Before ===

Running Time        Self (ms)           Symbol Name
25137.0ms   37.3%   0.0         swift::runSILOptimizationPasses(swift::SILModule&)
24939.0ms   37.0%   0.0         swift::SILPassManager::runOneIteration()
20226.0ms   30.0%   29.0        swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19241.0ms   28.5%   83.0        swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3214.0ms    4.7%    10.0        (anonymous namespace)::SimplifyCFGPass::run()
3005.0ms    4.4%    14.0        (anonymous namespace)::ARCSequenceOpts::run()
2438.0ms    3.6%    7.0         (anonymous namespace)::SILCombine::run()
2217.0ms    3.2%    54.0        (anonymous namespace)::RedundantLoadElimination::run()
2212.0ms    3.2%    131.0       (anonymous namespace)::SILCSE::run()
1195.0ms    1.7%    11.0        (anonymous namespace)::GenericSpecializer::run()
1168.0ms    1.7%    39.0        (anonymous namespace)::DeadStoreElimination::run()
853.0ms    1.2%     150.0               (anonymous namespace)::DCE::run()
499.0ms    0.7%     7.0                 (anonymous namespace)::SILCodeMotion::run()

=== After ===

Running Time    Self (ms)               Symbol Name
22955.0ms   38.2%       0.0       swift::runSILOptimizationPasses(swift::SILModule&)
22777.0ms   37.9%       0.0       swift::SILPassManager::runOneIteration()
18447.0ms   30.7%       30.0      swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
17510.0ms   29.1%       67.0      swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2944.0ms    4.9%        5.0       (anonymous namespace)::SimplifyCFGPass::run()
2884.0ms    4.8%        12.0      (anonymous namespace)::ARCSequenceOpts::run()
2277.0ms    3.7%        1.0       (anonymous namespace)::SILCombine::run()
1951.0ms    3.2%        117.0     (anonymous namespace)::SILCSE::run()
1803.0ms    3.0%        54.0      (anonymous namespace)::RedundantLoadElimination::run()
1096.0ms    1.8%        10.0      (anonymous namespace)::GenericSpecializer::run()
911.0ms    1.5% 53.0              (anonymous namespace)::DeadStoreElimination::run()
795.0ms    1.3% 135.0             (anonymous namespace)::DCE::run()
453.0ms    0.7% 9.0               (anonymous namespace)::SILCodeMotion::run()
2016-01-25 20:10:04 -08:00
Xin Tong
4084c2383f Refactor redundant load elimination. NFC 2016-01-25 20:09:17 -08:00
Xin Tong
546471ac4d Port dead store elimination and redundant load elimination to use the new projection.
This patch also implements some of the missing functions used by RLE and DSE in new projection
that exist in the old projection.

New projection provides better memory usage, eventually we will phase out the old projection code.

New projection is now copyable, i.e. we have a proper constructor for it.  This helps make the code
more readable.

We do see a bit increase in compilation time in compiling stdlib -O, this is a result of the way
we now get types of a projection path, but I expect this to go down (away) with further improvement
on how memory locations are constructed and cached with later patches.

=== With the OLD Projection. ===

Total amount of memory allocated.
--------------------------------
Bytes Used	Count		   Symbol Name
13032.01 MB      50.6%	2158819    swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2879.70 MB      11.1%	3076018    (anonymous namespace)::ARCSequenceOpts::run()
2663.68 MB      10.3%	1375465	   (anonymous namespace)::RedundantLoadElimination::run()
1534.35 MB       5.9%	5067928	   (anonymous namespace)::SimplifyCFGPass::run()
1278.09 MB       4.9%	576714	   (anonymous namespace)::SILCombine::run()
1052.68 MB       4.0%	935809	   (anonymous namespace)::DeadStoreElimination::run()
 771.75 MB       2.9%	1677391	   (anonymous namespace)::SILCSE::run()
 715.07 MB       2.7%	4198193	   (anonymous namespace)::GenericSpecializer::run()
 434.87 MB       1.6%	652701	   (anonymous namespace)::SILSROA::run()
 402.99 MB       1.5%	658563	   (anonymous namespace)::SILCodeMotion::run()
 341.13 MB       1.3%	962459	   (anonymous namespace)::DCE::run()
 279.48 MB       1.0%	415031	   (anonymous namespace)::StackPromotion::run()

Compilation time breakdown.
--------------------------
Running Time	Self (ms)	    Symbol Name
25716.0ms   35.8%	0.0	    swift::runSILOptimizationPasses(swift::SILModule&)
25513.0ms   35.5%	0.0	    swift::SILPassManager::runOneIteration()
20666.0ms   28.8%	24.0	    swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19664.0ms   27.4%	77.0	    swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3272.0ms    4.5%	12.0	    (anonymous namespace)::SimplifyCFGPass::run()
3266.0ms    4.5%	7.0	    (anonymous namespace)::ARCSequenceOpts::run()
2608.0ms    3.6%	5.0	    (anonymous namespace)::SILCombine::run()
2089.0ms    2.9%	104.0	    (anonymous namespace)::SILCSE::run()
1929.0ms    2.7%	47.0	    (anonymous namespace)::RedundantLoadElimination::run()
1280.0ms    1.7%	14.0	    (anonymous namespace)::GenericSpecializer::run()
1010.0ms    1.4%	45.0	    (anonymous namespace)::DeadStoreElimination::run()
966.0ms    1.3%	191.0	 	    (anonymous namespace)::DCE::run()
496.0ms    0.6%	6.0	 	    (anonymous namespace)::SILCodeMotion::run()

=== With the NEW Projection. ===

Total amount of memory allocated.
--------------------------------
Bytes Used	Count		    Symbol Name
11876.64 MB      48.4%	22112349    swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2887.22 MB      11.8%	3079485	    (anonymous namespace)::ARCSequenceOpts::run()
1820.89 MB       7.4%	1877674	    (anonymous namespace)::RedundantLoadElimination::run()
1533.16 MB       6.2%	5073310	    (anonymous namespace)::SimplifyCFGPass::run()
1282.86 MB       5.2%	577024	    (anonymous namespace)::SILCombine::run()
 772.21 MB       3.1%	1679154	    (anonymous namespace)::SILCSE::run()
 721.69 MB       2.9%	936958	    (anonymous namespace)::DeadStoreElimination::run()
 715.08 MB       2.9%	4196263	    (anonymous namespace)::GenericSpecializer::run()

Compilation time breakdown.
--------------------------
Running Time	Self (ms)	    Symbol Name
25137.0ms   37.3%	0.0	    swift::runSILOptimizationPasses(swift::SILModule&)
24939.0ms   37.0%	0.0	    swift::SILPassManager::runOneIteration()
20226.0ms   30.0%	29.0	    swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19241.0ms   28.5%	83.0	    swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3214.0ms    4.7%	10.0	    (anonymous namespace)::SimplifyCFGPass::run()
3005.0ms    4.4%	14.0	    (anonymous namespace)::ARCSequenceOpts::run()
2438.0ms    3.6%	7.0	    (anonymous namespace)::SILCombine::run()
2217.0ms    3.2%	54.0	    (anonymous namespace)::RedundantLoadElimination::run()
2212.0ms    3.2%	131.0	    (anonymous namespace)::SILCSE::run()
1195.0ms    1.7%	11.0	    (anonymous namespace)::GenericSpecializer::run()
1168.0ms    1.7%	39.0	    (anonymous namespace)::DeadStoreElimination::run()
853.0ms    1.2%	150.0	 	    (anonymous namespace)::DCE::run()
499.0ms    0.7%	7.0	 	    (anonymous namespace)::SILCodeMotion::run()
2016-01-25 20:08:29 -08:00