Commit Graph

2601 Commits

Author SHA1 Message Date
Xin Tong
310f48eab0 Improve dead store elimination compilation time.
If we know a function is not a one iteration function which means its
its BBWriteSetIn and BBWriteSetOut have been computed and converged,
and a basic block does not even have StoreInsts, there is no point
in processing every instruction in the last iteration of the data flow
again as no store will be eliminated.

We can simply skip the basic block and rely on the converged BBWriteSetIn
to process its predecessors.

Compilation time improvement: 1.7% to 1.5% of overall compilation time.
on stdlib -O. This represents a 4.0% of all SILOptimzations(37.2%)

Existing tests ensure correctness.
2016-01-01 10:16:42 -08:00
Zach Panzarino
e3a4147ac9 Update copyright date 2015-12-31 23:28:40 +00:00
practicalswift
2e995a8ba4 Fix recently introduced typos. 2015-12-31 15:28:54 +01:00
Xin Tong
a35eabd6f7 Instead of enumerating all the LSValues before RLE is ran on the function. we
enumerate them lazily.

This leads to compilation time improvement, as some of the LSValues previously
enumerated do not be created in this approach.

i.e. we enumerate LSValues created by loads previously, but the LoadInsts could be
target for RLE. In such case, the enumerated LSValues are not used.

Compilation time improvement: 1775ms to 1721ms (2.7% to 2.6% of the entire
compilation time for stdlib -O).

Existing tests ensure correctness.

Note: we still enumerate locations, as we need to know how many locations there are
in the function to resize the bitvector appropriately before the data flow runs.
2015-12-30 14:35:59 -08:00
Xin Tong
0c162cd3d1 Refactor in redundant load elimination. NFC.
1. Update some comments.
2. Rename a few functions, e.g. runIterativeDF -> runIterativeRLE, getLSValueBit -> getValueBit.
3. Remove unused headers.
4. Remove no-longer used function, mergePredecessorStates and mergePredecessorState.
5. A few other small NFCs.
2015-12-29 15:35:49 -08:00
practicalswift
6dac32e416 Fix typos in code merged today 2015-12-29 23:19:48 +01:00
Xin Tong
be4bf816bd Update some comments and rename runIterativeDF -> runIterativeDSE,
mergeSuccessorStates -> mergeSuccessorLiveIns in DSE. Also fix some includes.
2015-12-29 10:13:02 -08:00
practicalswift
fb314c37aa Fix typos introduced yesterday :-) 2015-12-29 13:16:02 +01:00
Aaron Raimist
b196e79b88 SimplifyCFG: Fix typo, Fist to First 2015-12-28 23:04:24 -06:00
Xin Tong
cc1dda478e Move to a genset and killset for the dataflow in redundant load elimination.
Previously we process every instruction every time the data flow re-iterates.
This is very inefficient.

In addition to moving to genset and killset, we also group function into
OneIterationFunction which we know that the data flow would converge in 1 iteration
and functions that requre the iterative data flow, mostly due to backedges in loops.
we process them differently.

I observed that there are ~93% of the functions that require just a single iteration
to perform the RLE.

But the other 7% accounts for 2321 (out of 6318) of the redundant loads we eliminated.

This change reduces RLE compilation from 4.1% to 2.7% of the entire compilation time
(frontend+OPT+LLVM) on stdlib with -O. This represents 6.9% of the time spent
in SILOptimizations (38.8%).

~2 weeks ago, RLE was taking 1.9% of the entire compilation time. It rose to 4.1%
mostly due to that we are now eliminating many more redundant loads (mostly thank
to Erik's integragtion of escape analysis in alias analysis). i.e. 3945 redundant
loads elimnated before Erik's change to 6318 redundant loads eliminated now.
2015-12-28 15:04:58 -08:00
Xin Tong
7c43b47bd0 revert some WIP code 2015-12-28 14:41:16 -08:00
Xin Tong
0c05064429 Speculative disable escape analysis in local variable DSE. There is test case failure
on the real device, but not on simulators. The failure could be a result of this change.
2015-12-28 14:39:54 -08:00
practicalswift
149b50d901 Fix typos in code (non-comment/documentation typos). 2015-12-28 11:42:15 +01:00
Chris Lattner
487b0d1365 Merge pull request #796 from practicalswift/fix-comment-typos
Fix typos in code comments
2015-12-27 20:46:46 -08:00
Xin Tong
7e49985b00 Turn a function into early exit sytle. NFC 2015-12-27 17:35:24 -08:00
practicalswift
fd70b26033 Fix typos in comments. 2015-12-28 02:15:34 +01:00
Xin Tong
656894567f Some of the functions do not really need the iterative data flow in DSE. i.e. for function
of which a single post-order would be enough for DSE. In this case, we do not really
need to compute the genset and killset (which is a costly operation).

On stdlib, i see 93% of the functions are "OneIterationFunction".

With this change, i see the compilation time of DSE drops from 2.0% to 1.7% of the entire compilation.
This represents 4.3% of all the time spent in SILOptimizations (39.5%).
2015-12-26 20:33:35 -08:00
Chris Lattner
bee0d955ff Merge pull request #784 from practicalswift/a-vs-an-again
Fix typos: "a" vs. "an"
2015-12-26 17:45:36 -08:00
Xin Tong
173fc871ff Use a SmallBitVector instead of BitVector in DSE. I observed that most functions
do not have over 64 locations which makes SmallBitVector a more suitable choice
than BitVector.

I see a ~10% drop in compilation time in DSE. i.e. 1430ms to 1270ms (2.2% to 2.0% of
overall compilation time).
2015-12-26 12:30:49 -08:00
Xin Tong
8be2c51875 Merge 2 steps in data flow in DSE into 1. The more we iterate over the instructions in the function,
the more compilation time DSE take.

I see a ~10% drop in DSE compilation time, from 2.4% to 2.2%.

(Last time (~1 week ago) i checked DSE was taking 1.4% of the compilation time.  Now its taking 2.2%.
I will look into where the increase come from later).
2015-12-26 11:06:52 -08:00
practicalswift
fa0b339a21 Fix typos. 2015-12-26 17:51:59 +01:00
practicalswift
85e2e6eb9a Fix a vs. an 2015-12-26 14:40:16 +01:00
practicalswift
db13bcb22e Fix typos. 2015-12-26 14:11:42 +01:00
Nadav Rotem
07d4558c1c [Mangler] Change the Swift mangler into a symbol builder.
This commit changes the Swift mangler from a utility that writes tokens into a
stream into a name-builder that has two phases: "building a name", and "ready".
This clear separation is needed for the implementation of the compression layer.

Users of the mangler can continue to build the name using the mangleXXX methods,
but to access the results the users of the mangler need to call the finalize()
method. This method can write the result into a stream, like before, or return
an std::string.
2015-12-25 21:40:25 -08:00
practicalswift
22e10737e2 Fix typos 2015-12-26 01:19:40 +01:00
Xin Tong
ed7ef800ab More refactoring in DSE. Split 1 more function and update some comments. 2015-12-24 16:11:19 -08:00
Xin Tong
30ed2f15aa Move local variable store checks into helper function. NFC 2015-12-24 15:26:37 -08:00
Xin Tong
1dd13b5d92 Optimize how basic blocks are processed in DSE. I do not see a real compilation time improvement 2015-12-24 15:10:39 -08:00
Xin Tong
6a942d2d20 Split bigger functions into multiple smaller functions in DSE. NFC 2015-12-24 14:58:44 -08:00
Xin Tong
32dc903339 Enable local variable dead store elimination
If a variable can not escape the function, we mark the store to it as dead before
the function exits.

It was disabled due to some TBAA and side-effect analysis changes.

We were removing 8 dead stores on the stdlib. With this change, we are now removing
203 dead stores.

I only see noise-level performance difference on PerfTestSuite.

I do not see real increase on compilation time either.
2015-12-24 14:18:36 -08:00
practicalswift
4ff12be924 Fix typos. 2015-12-24 14:26:57 +01:00
Erik Eckstein
dd204e68ef DSE: Use escape analysis for checking if memory locations are dead at the end of a function.
Currently NFC as local DSE is still disabled.
2015-12-23 16:43:11 -08:00
practicalswift
81e7439a9a Fix typos. 2015-12-23 11:16:34 +01:00
Mark Lacey
99d17a47f9 Change the invalidation kind for the generic specializer.
It adds functions, so technically it needs to invalidate more than just
the function body.
2015-12-22 23:19:44 -08:00
Mark Lacey
75f2de5c79 Use a more appropriate invalidation kind in the devirtualizer.
We're not touching branches, so we do not need to invalidate those.
2015-12-22 23:18:38 -08:00
Andrew Trick
2da0f601d8 Explicitly restrict NRVO optimization to "out" args.
Don't allow this optimization to kick in for "inout" args.
The optimization may expose local writes to any aliases of the argument.
I can't prove that is memory safe.

Erik pointed out this case.
2015-12-22 21:03:03 -08:00
Andrew Trick
000e630b2f Teach CopyForwarding to handle existential initialization
...in addition to enum initialization, as Slava requested.

|.benchmark...|.bestbase.|.bestopt.|.delta.|.%delta.|.speedup.|
|.ArrayAppend.|.....2514.|....2382.|..-132.|..-5.3%.|...1.06x.|
|.NSXMLParser.|....12076.|...11223.|..-853.|..-7.1%.|...1.08x.|
|.NopDeinit...|....54961.|...50619.|.-4342.|..-7.9%.|...1.09x.|
|.StringWalk..|....20503.|...16119.|.-4384.|.-21.4%.|...1.27x.|
2015-12-22 21:03:03 -08:00
Nadav Rotem
a981c80571 [Mangler] Move the SILMangler out of the AST Mangler namespace. 2015-12-22 17:19:40 -08:00
Slava Pestov
36ddea64ae Merge pull request #729 from ken0nek/fix-can-not
Convert [Cc]an not -> [Cc]annot
2015-12-22 16:06:20 -08:00
practicalswift
6e3b700b44 Fix typos. 2015-12-23 00:31:13 +01:00
Andrew Trick
0c7ee1f283 Teach CopyForwarding to handle enum initialization sequences.
This requires a bit of code motion.

e.g.

1. %Tmp = alloc_stack
2. copy_addr %InArg to [initialization] %Tmp
3. DataAddr = init_enum_data_addr %OutArg
4. copy_addr %Tmp#1 to [initialization] %DataAddr

becomes

1. %Tmp = alloc_stack
4. DataAddr = init_enum_data_addr %OutArg
2. copy_addr %InArg to [initialization] %DataAddr

Fixes at least one regression resulting from '++' removal.
See rdar://23874709 [perf] -Onone Execution Time regression of up-to 19%

-Onone results
|.benchmark............|.bestbase.|.bestopt.|..delta.|.%delta.|speedup.|
|.StringWalk...........|....33570.|...20967.|.-12603.|.-37.5%.|..1.60x.|
|.OpenClose............|......446.|.....376.|....-70.|.-15.7%.|..1.19x.|
|.SmallPT..............|....98959.|...83964.|.-14995.|.-15.2%.|..1.18x.|
|.StrToInt.............|....17550.|...16377.|..-1173.|..-6.7%.|..1.07x.|
|.BenchLangCallingCFunc|......453.|.....428.|....-25.|..-5.5%.|..1.06x.|
|.CaptureProp..........|....50758.|...48156.|..-2602.|..-5.1%.|..1.05x.|
|.ProtocolDispatch.....|.....5276.|....5017.|...-259.|..-4.9%.|..1.05x.|
|.Join.................|.....1433.|....1372.|....-61.|..-4.3%.|..1.04x.|
2015-12-22 14:35:52 -08:00
Andrew Trick
9a85a0254f Make CopyForwarding more conservative with destination addresses.
AFAICT, this does not fix any existing bug, but eliminates unverified
assumptions about well-formed SIL, which could be broken by future
optimization.

Forward: The optimization will replace all in-scope uses of the
destination address with the source. With this change we will be sure
not eliminate writes into a destination address unless the destination
is an AllocStackInst. This hasn't been a problem in practice because
the optimization requires an in-scope deinit of the destination
address, which can't happen on typical address projections.

Backward: The optimization will replace in-scope uses of the source
with the destination. With this change we will be sure not to write
into the destination location prior to the copy unless the destination
is an AllocStackInst. This hasn't been a problem in practice because
the optimization requires the copy to be an initialization of the
address, which can't happen on typical address projections.

This change prevents both optimizations without an obvious guarantee
that any dependency on the destination address will manifest as a
SIL-level dependence on the address producer. For example,
init_enum_data_addr would not qualify because it simply projects an
address within a value that may have other dependencies.
2015-12-22 14:35:52 -08:00
ken0nek
fcd8fcee91 Convert [Cc]an not -> [Cc]annot 2015-12-23 00:55:48 +09:00
Arsen Gasparyan
be738abb7c Fix else code style 2015-12-22 11:13:26 +03:00
Mark Lacey
70938b1aee Add a stand-alone devirtualizer pass.
Add back a stand-alone devirtualizer pass, running prior to generic
specialization. As with the stand-alone generic specializer pass, this
may add functions to the pass manager's work list.

This is another step in unbundling these passes from the performance
inliner.
2015-12-21 23:42:37 -08:00
Mark Lacey
4bb33dc3fd Remove some inadvertantly committed code.
Remove some things that were supposed to have been removed prior to the
original commit, and fix a typo in the DEBUG_TYPE string.
2015-12-21 23:42:37 -08:00
Michael Gottesman
976d39fa08 Add 3 cases that I missed to make a switch truly exhaustive. 2015-12-21 19:06:25 -06:00
Michael Gottesman
3eca15623b Change 6 non-exhaustive switches on ValueKind to be exhaustive switches on TermKind. NFC.
This exposed the first interesting bug found by using TermKind, in DCE we were
not properly handling switch_enum_addr and checked_cast_addr_br.

SR-335
rdar://23980060
2015-12-21 17:12:06 -06:00
Xin Tong
ee1396aaa6 Refactor RLE into stages.
After the refactoring, RLE runs in the following phases:

Phase 1. we use an iterative data flow to compute whether there is an
available value at a given point, we do not yet care about what the value
is.

Phase 2. we compute the real forwardable value at a given point.

Phase 3. we setup the SILValues for the redundant load elimination.

Phase 4. we perform the redundant load elimination.

Previously we were computing available bit as well as what the available
value is every iteration of the data flow.

I do not see a compilation time improvement though, but this helps to move
to a genset and killset later as we only need to expand Phase 1 into a few smaller
phases to compute genset & killset first and then iterate until convergence for
the data flow.

I verified that we are performing same # of RLE on stdlib before the change.

Existing test ensure correctness.
2015-12-21 14:56:56 -08:00
practicalswift
36d7072013 Remove immediately adjacent repeated words ("the the", "for for", "an an", etc.). 2015-12-21 22:16:04 +01:00