This speeds up compilation and reduces memory consumption for test
cases with large CFGs. The specific test case that prompted this fix
was a large function with many dictionary assignments:
public func func_0(dictIn : [String : MyClass]) -> [String : MyClass] {
    var dictOut : [String : MyClass] = [:]
    dictOut["key5000"] = dictIn["key500"]
    dictOut["key5010"] = dictIn["key501"]
    dictOut["key5020"] = dictIn["key502"]
    dictOut["key5030"] = dictIn["key503"]
    dictOut["key5040"] = dictIn["key504"]
    ...
}
This continued for 10k - 20k values.
This commit reduces the compile time by 2.5x and reduces the amount of
memory allocated by ARC by 2.6x (the memory allocation number includes
memory that is subsequently freed).
rdar://24350646
Move the top level driver of the pairing analysis into ARCSequenceOpts
and have ARCSequenceOpts use ARCMatchingSetBuilder directly.
This patch is the first in a series of patches that improve ARC compile
time by ensuring that ARC visits the full CFG at most once.
Previously, when ARC was split into an analysis and a pass, the split in
the codebase occurred at the boundary between ARCSequenceOpts and
ARCPairingAnalysis. I used a callback to allow ARCSequenceOpts to inject
code into ARCPairingAnalysis.
Now that the analysis has been moved together with the pass, this
callback unnecessarily complicates the code. More importantly, it
creates obstacles to reducing compile time by visiting the CFG only
once.
Specifically, we need to visit the full CFG once to gather interesting
instructions; when performing the actual dataflow analysis, we then
only visit those interesting instructions. This creates a problem:
retains/releases can have dependencies on each other, which means I
need to be able to update where the various "interesting instructions"
are located after ARC moves them. The "interesting instruction"
information is stored at the pairing analysis level, but the
moving/removal of instructions is injected via the callback.
By moving the top level driver part of ARCPairingAnalysis into
ARCSequenceOpts, we simplify the code by eliminating the dependency
injection callback and also make it easier to manage the cached CFG
state in the face of the ARC optimizer moving/removing retains/releases.
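As a rough Swift sketch of the cached-state idea (the real
implementation is C++, and every name below is invented for
illustration), the driver can now notify the cache directly when it
moves or deletes a retain/release:

// Hypothetical sketch of the cached "interesting instruction" state.
struct InstID: Hashable {
    let value: Int
}

final class InterestingInstCache {
    // Maps each interesting retain/release to its position in the
    // instruction order gathered during the single CFG walk.
    private var position: [InstID: Int] = [:]

    func record(_ inst: InstID, at index: Int) {
        position[inst] = index
    }

    // Called when the optimizer moves a retain/release, so the
    // dataflow sees up-to-date positions without re-walking the CFG.
    func didMove(_ inst: InstID, to newIndex: Int) {
        position[inst] = newIndex
    }

    // Called when the optimizer deletes an instruction it paired.
    func didErase(_ inst: InstID) {
        position.removeValue(forKey: inst)
    }
}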
So instead of only being able to match %1 with the release of %1 as in
(1), we can also match %1 with (release %2, release %3), i.e. an
exploded release_value, as in (2).
(1)
  foo(%1)
  strong_release %1

(2)
  foo(%1)
  %2 = struct_extract %1, field_a
  %3 = struct_extract %1, field_b
  strong_release %2
  strong_release %3
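For context, here is a hedged source-level sketch of the kind of value
involved; the types are invented for illustration:

// Hypothetical types; any struct whose stored properties are class
// references behaves this way at the SIL level.
final class A {}
final class B {}

struct Pair {
    var field_a: A
    var field_b: B
}

// Releasing a Pair value (form (1)) is equivalent to releasing each of
// its extracted reference fields (form (2), the exploded release).
func takePair(_ p: Pair) {
    // p's references are released when p goes out of scope, either as
    // one release of the whole struct or one per field.
}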
This will allow function signature optimization to better move the
release instructions into the callers.
Currently, this is NFC other than tests that use the epilogue match
dumper.
There's a group of methods in `DeclContext` with names that start with
*is*, such as `isClassOrClassExtensionContext()`. These names suggest a
boolean return value, while the methods actually return a type
declaration. This patch replaces the *is* prefix with *getAs*
(e.g. `getAsClassOrClassExtensionContext()`) to better reflect their
interface.
We don't use this information after this point, and we are already
invalidating it (along with everything else for this function body) at
the end of the pass, so invalidating it here is not necessary.
NFC.
Prior to splitting devirtualization out of the inliner, we needed this
to ensure that we would not go into an infinite loop in certain cases.
Now that these are split out and we no longer iterate over new
opportunities within the inliner, we only need to check for
self-recursive functions.
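For illustration, a trivially self-recursive function of the kind this
check guards against:

// Inlining countdown() into its own body could be repeated without
// bound, so the inliner must detect and skip this case.
func countdown(_ n: Int) -> Int {
    if n <= 0 { return 0 }
    return countdown(n - 1)
}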
NFC.
Fix the construction of the inlined-at chain.
The previous implementation was only correct for cases where the
inliner inlined bottom-up in the call graph, which happened to cover
the majority of cases.
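A hedged Swift-level illustration of the invariant (function names
invented):

// After both calls are inlined, instructions originating in inner()
// should carry the debug-location chain inner -> middle -> outer,
// regardless of the order in which the inliner visited the call sites.
@inline(__always) func inner() -> Int { return 1 }
@inline(__always) func middle() -> Int { return inner() + 1 }
func outer() -> Int { return middle() + 1 }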
rdar://problem/24462475
This allows devirtualization of witness method calls when the
initialization of the existential is not in the same basic block as the
call.
This change also fixes a bug where promotion was performed even if the
stack location is overwritten after initialization, although I'm not
sure whether this kind of code is ever generated.
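A hedged sketch of the kind of code this now handles (names invented
for illustration):

protocol Greeter {
    func greet()
}

struct EnglishGreeter: Greeter {
    func greet() { print("hello") }
}

func run(_ flag: Bool) {
    // The existential is initialized in the entry block...
    let g: Greeter = EnglishGreeter()
    if flag {
        // ...while the witness method call sits in a different basic
        // block, which previously blocked devirtualization.
        g.greet()
    }
}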
Improve debug value handling in function signature opt.
Instead of replacing %1 with undef in the debug_value instruction for
%1, we form an aggregate, taking the alive part of %1 and filling the
dead part with undef.
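A hedged source-level sketch of when this applies (types invented for
illustration):

// Only `used` is live inside process(), so function signature opt can
// drop the dead part of the argument. The debug value for p is then
// rebuilt as an aggregate of the live field plus undef for the dead
// field.
final class Big {}

struct Payload {
    var used: Int
    var unused: Big   // dead in process(); dropped by the optimization
}

func process(_ p: Payload) -> Int {
    return p.used
}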
rdar://23727705
Now that we process functions in bottom-up order in the pass manager and
have a mechanism to restart the pass pipeline on the current
function (or on a newly created callee function), we can split these
passes back out from the inliner and end up with the same benefits we
had from initially integrating them. We get the further benefit of fully
optimizing newly created callee functions before continuing with the
function that resulted in the creation of those callee
functions (e.g. as a result of a specialization pass running).
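A hedged Swift sketch of the restart mechanism described above; all
names are invented, and this is not the actual pass manager:

final class Function {
    let name: String
    init(_ name: String) { self.name = name }
}

enum PassResult {
    case finished           // go on to the next pass
    case restart(Function)  // rerun the pipeline on the current
                            // function or on a newly created callee
}

func runPipeline(bottomUpOrder: [Function],
                 passes: [(Function) -> PassResult]) {
    // Callers sit at the front so that popLast() yields callees first.
    var worklist = Array(bottomUpOrder.reversed())
    while let f = worklist.popLast() {
        var passIndex = 0
        while passIndex < passes.count {
            switch passes[passIndex](f) {
            case .finished:
                passIndex += 1
            case .restart(let g) where g === f:
                // Restart the pipeline on the current function.
                passIndex = 0
            case .restart(let g):
                // A new callee was created: fully optimize it first,
                // then come back and restart the pipeline on f.
                worklist.append(f)
                worklist.append(g)
                passIndex = passes.count
            }
        }
    }
}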
Previously we treated the * platform in #available as checking for the
minimum deployment target, but that check is unnecessary.
There is a bit of a hack here to avoid diagnosing the 'else' branch as
unreachable: if a constant true/false came from #available, ignore it
when checking for unreachability. This effectively returns us to the
code from dc65f70.
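A hedged illustration of the behavior (the platform and version are
arbitrary):

func doWork() {
    if #available(iOS 9.0, *) {
        // May use newer APIs here.
    } else {
        // Even when the compiler folds the #available check to a
        // constant, this branch must not be diagnosed as unreachable.
    }
}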
With the recent pass manager changes, combined with upcoming inliner
changes, we can potentially run the inliner more often than we
currently do. Allowing self-recursive functions to be inlined, and running the
inliner more often, can result in a lot of code bloat, which increases
binary sizes and compile times. Even with a relatively small value (10)
for the number of times we allow a function to run through the pass
pipeline, we end up with a significant increase in the stdlib and
stdlib unit test build times.
This results in some performance regressions, but I think the trade-off
here is reasonable.