SILValue.h/.cpp just define the SIL base classes. Referring to specific instructions there is a (small) layering violation.
Also I want to keep SILValue small so that it is really just a type alias of ValueBase*.
NFC.
As there are no instructions left that produce multiple result values, this is NFC with respect to the generated SIL and generated code.
Although this commit is large, most changes are straightforward adaptations to the changes in the ValueBase and SILValue classes.
This change is needed for the next update to ToT LLVM. It can be put
into place now without breaking anything so I am committing it now.
The churn upstream on ilist_node is necessary to remove undefined
behavior. Rather than updating the various ilist_node patches with the
hacky change required to avoid using iterators, just use iterators and keep
everything as ilist_nodes. Upstream wants to do this eventually, so
it makes sense for us to just do it now.
Please do not introduce new invocations of
ilist_node::get{Next,Prev}Node() into the tree.
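The iterator-based pattern can be sketched with a plain std::list (a hedged analogy only; LLVM's intrusive ilist has its own API, but the shape of the migration is the same): walk the list with iterators rather than chasing getNextNode()-style raw node pointers in a loop.

```cpp
#include <cassert>
#include <list>

// Sketch: visit every element from a given node to the end by advancing
// an iterator, instead of repeatedly calling a getNextNode()-style
// accessor on the node itself.
int sumFrom(std::list<int> &l, std::list<int>::iterator it) {
  int sum = 0;
  for (std::list<int>::iterator e = l.end(); it != e; ++it)
    sum += *it; // the iterator stays valid as long as this node isn't erased
  return sum;
}
```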
And use project_box to get to the address value.
SILGen now generates a project_box for each alloc_box.
And IRGen reuses the address value from the alloc_box if the operand of project_box is an alloc_box.
This keeps the generated code the same as before.
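As a hedged illustration of the new pattern (the SIL below is a sketch, not exact syntax for any particular compiler version):

```
// Previously, alloc_box produced two values: the box (%1#0) and the
// address of its payload (%1#1). Now the address is obtained through an
// explicit project_box, and IRGen reuses the alloc_box's address value
// when the project_box operand is an alloc_box:
%box  = alloc_box $Int
%addr = project_box %box
store %value to %addr : $*Int
```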
Other than that most changes of this (quite large) commit are straightforward.
This came up with other changes I am making to the optimizer
pipeline. We shouldn't assert if we have an alloc_stack/dealloc_stack
where the only other use of the alloc_stack is a debug_value_addr.
It's easy to avoid this by removing allocations that don't have real
uses prior to attempting to handle the ones that do have real uses (as
opposed to the other way around).
In all of the cases where this is used, we already immediately hit an
unreachable if we find a TermKind::Invalid. So simplify the code and move the check
into the conversion switch itself.
Having a separate address and container value returned from alloc_stack is not really needed in SIL.
Even if they differ we have both addresses available during IRGen, because a dealloc_stack is always dominated by the corresponding alloc_stack in the same function.
Although this commit is quite large, most changes are trivial. The largest non-trivial change is in IRGenSIL.
This commit is NFC with respect to the generated code. Even the generated SIL is the same (except for the removed #0, #1 and @local_storage).
Correct format:
```
//===--- Name of file - Description ----------------------------*- Lang -*-===//
```
Notes:
* Comment line should be exactly 80 chars.
* Padding: Pad with dashes after "Description" to reach 80 chars.
* "Name of file", "Description" and "Lang" are all optional.
* In case of missing "Lang": drop the "-*-" markers.
* In case there is not enough space: drop one, two or three dashes before "Name of file".
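For example (illustrative file names and descriptions; each line is exactly 80 chars, the second shows the missing-"Lang" form):

```
//===--- SILValue.h - Value base class for SIL ------------------*- C++ -*-===//
//===--- SILValue.cpp - Value base class for SIL --------------------------===//
```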
Now we have 3 cases.
1. OptimizeNone (for functions with too many basic blocks and too many locations). Simply return.
2. Pessimistic single-iteration data flow (for functions with many basic blocks and many locations).
3. Optimistic multiple-iteration data flow (for functions with few enough basic blocks and locations
that iterative data flow is affordable).
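The three-way choice above can be sketched as follows (a minimal C++ sketch; the threshold values and names here are invented for illustration, and the cost metric follows the worst case described later: roughly #basic blocks x #locations):

```cpp
#include <cassert>

enum class DSEMode { OptimizeNone, PessimisticDF, OptimisticDF };

// Hypothetical thresholds, for illustration only.
constexpr unsigned TooManyProduct = 10000; // give up entirely
constexpr unsigned ManyProduct = 1000;     // fall back to one-pass data flow

// Cost is roughly #basic blocks x #locations.
DSEMode selectMode(unsigned numBBs, unsigned numLocations) {
  unsigned cost = numBBs * numLocations;
  if (cost > TooManyProduct)
    return DSEMode::OptimizeNone;  // case 1: simply return
  if (cost > ManyProduct)
    return DSEMode::PessimisticDF; // case 2: single-iteration data flow
  return DSEMode::OptimisticDF;    // case 3: iterative data flow
}
```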
With this change stdlib and stdlibunittest have some changes in dead stores (DS)
eliminated.
stdlib: 202 -> 203 DS.
stdlibunittest: 42 -> 39 DS.
Compilation time improvement with this change, on a RELEASE+ASSERT compiler, for stdlibunittest:
Running Time Self (ms) Symbol Name
5525.0ms 5.3% 25.0 (anonymous namespace)::ARCSequenceOpts::run()
3500.0ms 3.4% 25.0 (anonymous namespace)::RedundantLoadElimination::run()
3050.0ms 2.9% 25.0 (anonymous namespace)::SILCombine::run()
2700.0ms 2.6% 0.0 (anonymous namespace)::SimplifyCFGPass::run()
2100.0ms 2.0% 75.0 (anonymous namespace)::SILCSE::run()
1450.0ms 1.4% 0.0 (anonymous namespace)::DeadStoreElimination::run()
750.0ms 0.7% 75.0 (anonymous namespace)::DCE::run()
Compilation time improvement with this change, on a DEBUG compiler, for stdlibunittest:
Running Time Self (ms) Symbol Name
42300.0ms 4.9% 50.0 (anonymous namespace)::ARCSequenceOpts::run()
35875.0ms 4.1% 0.0 (anonymous namespace)::RedundantLoadElimination::run()
30475.0ms 3.5% 0.0 (anonymous namespace)::SILCombine::run()
19675.0ms 2.3% 0.0 (anonymous namespace)::SILCSE::run()
18150.0ms 2.1% 25.0 (anonymous namespace)::SimplifyCFGPass::run()
12475.0ms 1.4% 0.0 (anonymous namespace)::DeadStoreElimination::run()
5775.0ms 0.6% 0.0 (anonymous namespace)::DCE::run()
I do not see a compilation time change in stdlib.
Existing tests ensure correctness.
Some functions are too large for dead store elimination to handle. In the worst case, the number of
memory behavior or alias queries we need to do is roughly linear in
the number of basic blocks times the number of locations.
Put in a heuristic to trade off accuracy for compilation time.
NOTE: we are not disabling DSE for these offending functions. Instead we are running
a one-iteration pessimistic data flow, as opposed to the multiple-iteration optimistic
data flow we ran previously.
With this change, I see compilation time on StdlibUnitTest drop significantly:
a 50%+ drop in time spent in DSE in StdlibUnitTest with a release compiler.
I will update with more Instruments data post-commit once I get close to my desktop.
I see a slight drop in the # of dead stores (DS) eliminated in stdlib and StdlibUnitTest.
stdlib: 203 DS -> 202 DS. (RLE is affected slightly as well: 6313 -> 6295 RL.)
stdlibunittest: 43 DS -> 42 DS. (RLE is not affected.)
We are passing all existing dead store tests.
In StdlibUnitTest, there is this function that has too many (2450) LSLocations
and the data flow in DSE takes too long to converge.
StdlibUnittest.TestSuite.(addForwardRangeReplaceableCollectionTests <A, B where A: Swift.RangeReplaceableCollectionType, B: Swift.RangeReplaceableCollectionType, A.SubSequence: Swift.CollectionType, B.Generator.Element: Swift.Equatable, A.SubSequence == A.SubSequence.SubSequence, A.Generator.Element == A.SubSequence.Generator.Element> (Swift.String, makeCollection : ([A.Generator.Element]) -> A, wrapValue : (StdlibUnittest.OpaqueValue<Swift.Int>) -> A.Generator.Element, extractValue : (A.Generator.Element) -> StdlibUnittest.OpaqueValue<Swift.Int>, makeCollectionOfEquatable : ([B.Generator.Element]) -> B, wrapValueIntoEquatable : (StdlibUnittest.MinimalEquatableValue) -> B.Generator.Element, extractValueFromEquatable : (B.Generator.Element) -> StdlibUnittest.MinimalEquatableValue, checksAdded : StdlibUnittest.Box<Swift.Set<Swift.String>>, resiliencyChecks : StdlibUnittest.CollectionMisuseResiliencyChecks, outOfBoundsIndexOffset : Swift.Int) -> ()).(closure #18)
This function alone takes ~20% of the total amount of time spent in DSE in StdlibUnitTest.
And DSE does not eliminate any dead store in the function either. I added this threshold
to abort on functions that have too many LSLocations.
I see no difference in the # of dead stores eliminated in the stdlib.
This is something that we have wanted for a long time and will enable us to
remove some hacks from the compiler (i.e. how we determine in the ARC optimizer
that we have a "fatalError"-like function) and also express new things like
"noarc".
If we know a function is not a one-iteration function, which means
its BBWriteSetIn and BBWriteSetOut have been computed and have converged,
and a basic block does not contain any StoreInsts, there is no point
in processing every instruction in that block again in the last iteration of
the data flow, as no store will be eliminated.
We can simply skip the basic block and rely on the converged BBWriteSetIn
to process its predecessors.
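The skip described above can be sketched like this (a hedged sketch; the types and names are invented, whereas the real pass works on SILBasicBlock and converged bit vectors):

```cpp
#include <cassert>
#include <vector>

struct Block {
  bool hasStores;       // does the block contain any StoreInst?
  bool processed = false;
};

// In the final iteration, once the write sets have converged, only
// blocks that actually contain stores need their instructions
// re-scanned; the others simply keep their converged BBWriteSetIn,
// which predecessors can still consult.
int finalPass(std::vector<Block> &blocks) {
  int scanned = 0;
  for (auto &b : blocks) {
    if (!b.hasStores)
      continue;           // skip: no store here can be eliminated
    b.processed = true;   // per-instruction processing would happen here
    ++scanned;
  }
  return scanned;
}
```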
Compilation time improvement: 1.7% to 1.5% of overall compilation time
on stdlib -O. This represents 4.0% of the time spent in all SIL optimizations (37.2% overall).
Existing tests ensure correctness.
Enumerate LSValues lazily.
This leads to a compilation time improvement, as some of the LSValues previously
enumerated need not be created in this approach.
E.g. we previously enumerated LSValues created by loads, but those LoadInsts could be
targets for RLE; in that case, the enumerated LSValues are never used.
Compilation time improvement: 1775ms to 1721ms (2.7% to 2.6% of the entire
compilation time for stdlib -O).
Existing tests ensure correctness.
Note: we still enumerate locations, as we need to know how many locations there are
in the function to resize the bitvector appropriately before the data flow runs.
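A minimal sketch of the lazy scheme (invented names; the real pass keys LSValues off SIL values): keep an index map and only materialize a value's bit the first time a query asks for it. Locations, by contrast, are still enumerated upfront so the bit vectors can be sized.

```cpp
#include <cassert>
#include <map>
#include <string>

// Lazily assign bit positions to values: a value that is never queried
// (e.g. a load that RLE itself will remove) never gets an entry.
class ValueIndex {
  std::map<std::string, unsigned> Index;

public:
  unsigned getValueBit(const std::string &V) {
    auto It = Index.find(V);
    if (It != Index.end())
      return It->second;
    unsigned Bit = Index.size(); // materialize on first use
    Index[V] = Bit;
    return Bit;
  }
  size_t size() const { return Index.size(); }
};
```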
1. Update some comments.
2. Rename a few functions, e.g. runIterativeDF -> runIterativeRLE, getLSValueBit -> getValueBit.
3. Remove unused headers.
4. Remove the no-longer-used functions mergePredecessorStates and mergePredecessorState.
5. A few other small NFCs.