swift-mirror/lib/SILOptimizer/Analysis/EpilogueARCAnalysis.cpp
Michael Gottesman b1debfc401 [epilogue-arc-analysis] Be more efficient with memory usage.
This patch fixes a number of issues:

The analysis was using EpilogueARCContext as a temporary when computing. This is
a performance problem since EpilogueARCContext contains all of the memory used
by the analysis, so essentially we were mallocing a large amount of memory every
time we missed the analysis cache. This patch changes the pass to instead keep a
single EpilogueARCContext whose internal state is cleared between invocations.
Since the data structures used after this patch (see below) do not release their
memory when cleared, this should cause far less memory churn.
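
The following is a minimal hedged C++ sketch of this reuse pattern; the class
and member names are illustrative stand-ins, not the actual EpilogueARCAnalysis
interface.

  #include <vector>

  struct SILFunction {};                        // stand-in for the real SIL type
  struct BlockState { bool Visited = false; };  // illustrative per-block state

  // Sketch of a context whose storage survives across invocations.
  class ReusableContext {
    std::vector<BlockState> BlockStates;  // all per-function analysis memory
  public:
    void initialize(unsigned NumBlocks) { BlockStates.resize(NumBlocks); }
    // clear() keeps the vector's capacity, so the next invocation reuses the
    // storage that was already allocated instead of mallocing it again.
    void clearState() { BlockStates.clear(); }
  };

  // Sketch of an analysis that owns one context and reuses it on cache misses.
  class AnalysisSketch {
    ReusableContext Ctx;
  public:
    void recompute(SILFunction *F, unsigned NumBlocks) {
      (void)F;                 // the real analysis would walk F here
      Ctx.initialize(NumBlocks);
      // ... run the dataflow using Ctx ...
      Ctx.clearState();        // drop per-function data, keep the capacity
    }
  };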

The analysis was managing its per-block state by allocating the individual
block state structs from a BumpPtrAllocator and then mapping each SILBasicBlock
to its state struct with a DenseMap, both stored in EpilogueARCContext. Ignoring
that we were mallocing this memory every time we computed rather than reusing
global state, this significantly pessimizes performance on small functions: the
BumpPtrAllocator initially heap-allocates a full page and the DenseMap initially
mallocs a 64 entry hash table. Thus for a one-block function we would be
allocating a large amount of memory that is simply unneeded.
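
For contrast, here is a hedged sketch of the storage scheme being replaced,
using llvm::BumpPtrAllocator and llvm::DenseMap as described above; the
SILBasicBlock and block state types are stubbed out for illustration.

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/Support/Allocator.h"
  #include <new>

  struct SILBasicBlock {};                                // stand-in stub
  struct EpilogueARCBlockState { bool Visited = false; }; // illustrative state

  struct OldStyleContext {
    // The allocator's first slab is a full page and the DenseMap starts with
    // a 64 entry table, so even a one-block function pays for both on its
    // first allocation.
    llvm::BumpPtrAllocator Allocator;
    llvm::DenseMap<SILBasicBlock *, EpilogueARCBlockState *> BlockStates;

    EpilogueARCBlockState *getState(SILBasicBlock *BB) {
      auto *&Entry = BlockStates[BB];
      if (!Entry)
        Entry = new (Allocator.Allocate<EpilogueARCBlockState>())
            EpilogueARCBlockState();
      return Entry;
    }
  };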

Instead, this patch changes the analysis to use a std::vector in combination
with PostOrderFunctionInfo to manage the per-block state. PostOrderFunctionInfo
already contains a map from a SILBasicBlock to its post order number, and we
visit the CFG in post order when allocating memory for each block, so we know
that each block's state is stored in the vector at vector[post order number].
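
A hedged sketch of the new indexing scheme follows; PostOrderNumbering is a
stand-in for the relevant part of PostOrderFunctionInfo, and the names are
illustrative rather than the actual API.

  #include <cassert>
  #include <unordered_map>
  #include <vector>

  struct SILBasicBlock {};                                // stand-in stub
  struct EpilogueARCBlockState { bool Visited = false; }; // illustrative state

  // Stand-in for PostOrderFunctionInfo: it already owns the block -> post
  // order number mapping, so the analysis needs no DenseMap of its own.
  struct PostOrderNumbering {
    std::unordered_map<const SILBasicBlock *, unsigned> Numbers;
    unsigned getNumber(const SILBasicBlock *BB) const { return Numbers.at(BB); }
  };

  struct NewStyleContext {
    const PostOrderNumbering *PO = nullptr;
    std::vector<EpilogueARCBlockState> BlockStates;

    // Blocks are visited in post order when state is allocated, so block B's
    // state always lives at BlockStates[PO->getNumber(B)].
    void initialize(const PostOrderNumbering *Numbering, unsigned NumBlocks) {
      PO = Numbering;
      BlockStates.resize(NumBlocks);
    }

    EpilogueARCBlockState &getState(SILBasicBlock *BB) {
      unsigned N = PO->getNumber(BB);
      assert(N < BlockStates.size() && "block has no post order number?");
      return BlockStates[N];
    }
  };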

This has a number of nice effects:

1. By eliminating the need for the DenseMap, we significantly reduce the memory
overhead in large test cases (by 24 bytes per basic block, assuming 8 byte
pointers).
2. We will use far less memory when applying this analysis to small functions.

rdar://33841629
2017-08-11 18:18:39 -07:00
