Revert "Make AADumper and MemoryBehaviorDumper function passes. They do not need to be module passes."
This reverts commit a503269e2d.
This reverts commit 375f525c51.
Turns out we /do/ want these two passes to be module passes so that their output
is independent of how the pass manager schedules function passes.
I tried to just fix the issue in MemBehaviorDumper/AADumper without reverting,
but somehow this caused their tests to start failing?!
I will try separating them again in a subsequent commit.
This makes it easy to see which individual retains/releases are partial/known
safe so one can reason about why the final set is partial/known safe (or not).
functions for dead store elimination to handle. In the worst case, The number of
memory behavior or alias queries we need to do is roughly linear to
the # BBs x(times) # of locations.
Put in some heuristic to trade off accuracy for compilation time.
NOTE: we are not disabling DSE for these offending functions. instead we are running
a one iteration pessimistic data flow as supposed to the multiple iteration optimistic
data flow we've done previously.
With this change. I see compilation time on StdlibUnitTest drops significantly.
50%+ drop in time spent in DSE in StdlibUnit with a release compiler.
I will update more Instruments data post-commit once i get close to my desktop.
I see a slight drop in # of dead stores (DS) elimination in stdlib and stdlibUnit test.
stdlib: 203 DS -> 202 DS. (RLE is affected slightly as well. 6313 -> 6295 RL).
stdlibunittest : 43 DS -> 42. (RLE is not affected).
We are passing all existing dead store tests.
In StdlibUnitTest, there is this function that has too many (2450) LSLocations
and the data flow in DSE takes too long to converge.
StdlibUnittest.TestSuite.(addForwardRangeReplaceableCollectionTests <A, B where A: Swift.RangeReplaceableCollectionType, B: Swift.RangeReplaceableCollectionType, A.SubSequence: Swift.CollectionType, B.Generator.Element: Swift.Equatable, A.SubSequence == A.SubSequence.SubSequence, A.Generator.Element == A.SubSequence.Generator.Element> (Swift.String, makeCollection : ([A.Generator.Element]) -> A, wrapValue : (StdlibUnittest.OpaqueValue<Swift.Int>) -> A.Generator.Element, extractValue : (A.Generator.Element) -> StdlibUnittest.OpaqueValue<Swift.Int>, makeCollectionOfEquatable : ([B.Generator.Element]) -> B, wrapValueIntoEquatable : (StdlibUnittest.MinimalEquatableValue) -> B.Generator.Element, extractValueFromEquatable : (B.Generator.Element) -> StdlibUnittest.MinimalEquatableValue, checksAdded : StdlibUnittest.Box<Swift.Set<Swift.String>>, resiliencyChecks : StdlibUnittest.CollectionMisuseResiliencyChecks, outOfBoundsIndexOffset : Swift.Int) -> ()).(closure #18)
This function alone takes ~20% of the total amount of time spent in DSE in StdlibUnitTest.
And DSE does not eliminate any dead store in the function either. I added this threshold
to abort on functions that have too many LSLocations.
I see no difference in # of dead store eliminated in the Stdlib.
If this semantic tag is applied to a function, then we know that:
- The function does not touch any reference counted objects.
- After the function is executed, all reference counted objects are leaked
(most likely in preparation for program termination).
This allows one, when performing ARC code motion, to ignore blocks that contain
an apply to this function as long as the block does not have any other side
effect having instructions.
I have wanted to do this for a while but was stymied by lacking the ability to
apply multiple @_semantics attributes. This is now committed to trunk so I added
this attribute instead of pattern matching against fatalError (since there could
be other functions with this property).
rdar://19592537
This is something that we have wanted for a long time and will enable us to
remove some hacks from the compiler (i.e. how we determine in the ARC optimizer
that we have "fatalError" like function) and also express new things like
"noarc".
If we know a function is not a one iteration function which means its
its BBWriteSetIn and BBWriteSetOut have been computed and converged,
and a basic block does not even have StoreInsts, there is no point
in processing every instruction in the last iteration of the data flow
again as no store will be eliminated.
We can simply skip the basic block and rely on the converged BBWriteSetIn
to process its predecessors.
Compilation time improvement: 1.7% to 1.5% of overall compilation time.
on stdlib -O. This represents a 4.0% of all SILOptimzations(37.2%)
Existing tests ensure correctness.
enumerate them lazily.
This leads to compilation time improvement, as some of the LSValues previously
enumerated do not be created in this approach.
i.e. we enumerate LSValues created by loads previously, but the LoadInsts could be
target for RLE. In such case, the enumerated LSValues are not used.
Compilation time improvement: 1775ms to 1721ms (2.7% to 2.6% of the entire
compilation time for stdlib -O).
Existing tests ensure correctness.
Note: we still enumerate locations, as we need to know how many locations there are
in the function to resize the bitvector appropriately before the data flow runs.
NewProjection is a re-architecting of Projection that supports all of
the same functionality as Projection but in 1/3 of the original size (in
the common case). It is able to accomplish this by removing the base
type out of NewProjection itself and into users such as
NewProjectionPath. Thus NewProjection is now strictly an index from some
parent type rather than being a parent type and an index.
NewProjectionPath also has all of the same functionality as
ProjectionPath, but due to NewProjection being smaller than Projection
is smaller than ProjectionPath.
Used together NewProjection/NewProjectionPath yields the same output as
Projection/ProjectionPath when evaluating the LSLocation dumping tests.
Additionally, NewProjection is more flexible than Projection and will
for free give us the ability to perform AA on index_addr/index_raw_addr
as well as be able to integrate casts into the projection paradigm.
rdar://22484381
1. Update some comments.
2. Rename a few functions, e.g. runIterativeDF -> runIterativeRLE, getLSValueBit -> getValueBit.
3. Remove unused headers.
4. Remove no-longer used function, mergePredecessorStates and mergePredecessorState.
5. A few other small NFCs.
Previously we process every instruction every time the data flow re-iterates.
This is very inefficient.
In addition to moving to genset and killset, we also group function into
OneIterationFunction which we know that the data flow would converge in 1 iteration
and functions that requre the iterative data flow, mostly due to backedges in loops.
we process them differently.
I observed that there are ~93% of the functions that require just a single iteration
to perform the RLE.
But the other 7% accounts for 2321 (out of 6318) of the redundant loads we eliminated.
This change reduces RLE compilation from 4.1% to 2.7% of the entire compilation time
(frontend+OPT+LLVM) on stdlib with -O. This represents 6.9% of the time spent
in SILOptimizations (38.8%).
~2 weeks ago, RLE was taking 1.9% of the entire compilation time. It rose to 4.1%
mostly due to that we are now eliminating many more redundant loads (mostly thank
to Erik's integragtion of escape analysis in alias analysis). i.e. 3945 redundant
loads elimnated before Erik's change to 6318 redundant loads eliminated now.
of which a single post-order would be enough for DSE. In this case, we do not really
need to compute the genset and killset (which is a costly operation).
On stdlib, i see 93% of the functions are "OneIterationFunction".
With this change, i see the compilation time of DSE drops from 2.0% to 1.7% of the entire compilation.
This represents 4.3% of all the time spent in SILOptimizations (39.5%).
do not have over 64 locations which makes SmallBitVector a more suitable choice
than BitVector.
I see a ~10% drop in compilation time in DSE. i.e. 1430ms to 1270ms (2.2% to 2.0% of
overall compilation time).
the more compilation time DSE take.
I see a ~10% drop in DSE compilation time, from 2.4% to 2.2%.
(Last time (~1 week ago) i checked DSE was taking 1.4% of the compilation time. Now its taking 2.2%.
I will look into where the increase come from later).
This commit changes the Swift mangler from a utility that writes tokens into a
stream into a name-builder that has two phases: "building a name", and "ready".
This clear separation is needed for the implementation of the compression layer.
Users of the mangler can continue to build the name using the mangleXXX methods,
but to access the results the users of the mangler need to call the finalize()
method. This method can write the result into a stream, like before, or return
an std::string.
If a variable can not escape the function, we mark the store to it as dead before
the function exits.
It was disabled due to some TBAA and side-effect analysis changes.
We were removing 8 dead stores on the stdlib. With this change, we are now removing
203 dead stores.
I only see noise-level performance difference on PerfTestSuite.
I do not see real increase on compilation time either.