1. Update some comments.
2. Rename a few functions, e.g. runIterativeDF -> runIterativeRLE, getLSValueBit -> getValueBit.
3. Remove unused headers.
4. Remove the no-longer-used functions mergePredecessorStates and mergePredecessorState.
5. A few other small NFCs.
Previously we processed every instruction every time the data flow re-iterated.
This is very inefficient.
In addition to moving to a genset and killset, we also group functions into
OneIterationFunctions, for which we know the data flow converges in a single
iteration, and functions that require the iterative data flow, mostly due to
backedges in loops, and we process the two groups differently.
I observed that ~93% of the functions require just a single iteration
to perform the RLE.
But the other 7% account for 2321 (out of 6318) of the redundant loads we eliminated.
This change reduces RLE compilation time from 4.1% to 2.7% of the entire compilation
(frontend+OPT+LLVM) on stdlib with -O. This represents 6.9% of the time spent
in SILOptimizations (38.8%).
~2 weeks ago, RLE was taking 1.9% of the entire compilation time. It rose to 4.1%
mostly because we are now eliminating many more redundant loads (mostly thanks
to Erik's integration of escape analysis into alias analysis), i.e. 3945 redundant
loads eliminated before Erik's change vs. 6318 redundant loads eliminated now.
For functions without backedges, a single post-order traversal would be enough for DSE. In this case, we do not really
need to compute the genset and killset (which is a costly operation).
On stdlib, I see that 93% of the functions are "OneIterationFunctions".
With this change, I see the compilation time of DSE drop from 2.0% to 1.7% of the entire compilation.
This represents 4.3% of all the time spent in SILOptimizations (39.5%).
Most functions do not have over 64 locations, which makes SmallBitVector a more suitable choice
than BitVector.
I see a ~10% drop in DSE compilation time, i.e. from 1430ms to 1270ms (2.2% to 2.0% of
overall compilation time).
The more locations we track, the more compilation time DSE takes.
I see a ~10% drop in DSE compilation time, from 2.4% to 2.2%.
(Last time I checked, ~1 week ago, DSE was taking 1.4% of the compilation time; now it's taking 2.2%.
I will look into where the increase comes from later.)
This commit changes the Swift mangler from a utility that writes tokens into a
stream to a name-builder that has two phases: "building a name" and "ready".
This clear separation is needed for the implementation of the compression layer.
Users of the mangler can continue to build the name using the mangleXXX methods,
but to access the results the users of the mangler need to call the finalize()
method. This method can write the result into a stream, like before, or return
a std::string.
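A minimal sketch of the two-phase shape described above (the class and method names are illustrative, not the real swift::Mangler API):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch of the two-phase name-builder: mangleXXX calls append to
// an internal buffer while the builder is in the "building" phase; finalize()
// moves it to the "ready" phase and hands out the result, either as a
// std::string or written to a stream. A compression layer could post-process
// the buffer inside finalize() before anything is handed out.
class NameBuilder {
  std::string buffer;
  bool ready = false;

public:
  void mangleIdentifier(const std::string &s) {
    assert(!ready && "cannot keep building after finalize()");
    buffer += std::to_string(s.size()) + s;  // length-prefixed, mangling-style
  }
  std::string finalize() {
    ready = true;
    return buffer;
  }
  void finalize(std::ostream &os) { os << finalize(); }
};
```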
If a variable cannot escape the function, we mark the store to it as dead before
the function exits.
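For illustration, here is an analogous dead store in plain C++ (the commit itself operates on SIL; this is only a sketch of the pattern):

```cpp
#include <cassert>

// `tmp` is a local whose address never escapes sumToN, so the final store to
// it is dead: no one can observe it before the function exits, and DSE may
// delete it without changing behavior.
int sumToN(int n) {
  int tmp = 0;  // local, address never escapes
  for (int i = 1; i <= n; ++i)
    tmp += i;
  int result = tmp;
  tmp = 0;      // dead store: tmp is never read again before returning
  return result;
}
```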
It was disabled due to some TBAA and side-effect analysis changes.
We were removing 8 dead stores on the stdlib. With this change, we are now removing
203 dead stores.
I only see noise-level performance differences on the PerfTestSuite.
I do not see a real increase in compilation time either.
If we use a shared ValueEnumerator, imagine the case where one of the AACache or MBCache
is cleared and we clear the ValueEnumerator along with it.
This could give rise to collisions (false positives) in the not-yet-cleared cache!
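The hazard can be sketched as follows (this `ValueEnumerator` and the two cache maps are hypothetical simplifications, not the actual analysis classes):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical enumerator: hands out dense ids for values, reused as cache keys.
struct ValueEnumerator {
  std::map<std::string, unsigned> ids;
  unsigned next = 0;
  unsigned getId(const std::string &v) {
    auto it = ids.find(v);
    if (it != ids.end()) return it->second;
    return ids[v] = next++;
  }
  void clear() { ids.clear(); next = 0; }
};

// Returns true if a stale entry in the not-yet-cleared cache collides with a
// freshly enumerated value -- the false positive described above.
bool staleIdCollision() {
  ValueEnumerator shared;
  std::map<unsigned, bool> aaCache, mbCache;

  mbCache[shared.getId("%x")] = true;  // MB cache keys a fact about %x by id 0

  aaCache.clear();  // the AA cache is invalidated ...
  shared.clear();   // ... and (incorrectly) resets the *shared* enumerator

  // A different value now receives the recycled id 0, so the stale MB-cache
  // entry for %x silently answers for %y.
  return mbCache.count(shared.getId("%y")) > 0;
}
```

Giving each cache its own enumerator (or never resetting the shared one) avoids the recycled ids.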
Add an assertion to make sure we are not accessing the buffer before the output is ready.
The Mangler is going to be buffered (for compression), and accessing the underlying buffer is a bug.
Don't allow this optimization to kick in for "inout" args.
The optimization may expose local writes to any aliases of the argument.
I can't prove that it is memory safe.
Erik pointed out this case.