This consists of 3 parts:
1) Extend CallerAnalysis to also provide information if a function is partially applied
2) A new DeadArgSignatureOpt pass, similar to FunctionSignatureOpts, which just specializes for dead arguments of partially applied functions.
3) Let CapturePropagation eliminate such partial_apply instructions and replace them with a thin_to_thick conversion of the specialized functions.
This optimzation improves benchmarks where static struct or class functions are passed as a closure (e.g. -20% for SortStrings).
Such functions have a additional metatype parameter. We used to create a partial_apply in this case, which allocates a context, etc.
But this is not necessary as the metatype parameter is not used in most cases.
rdar://problem/27513085
Each time we delete the pass manager, we delete the analyses it
holds. In this analysis we were holding onto memory that wasn't getting
released when the analysis was deleted.
Fixes rdar://problem/26245872.
Several functionalities have been added to FSO over time and the logic has become
muddled.
We were always looking at a static image of the SIL and try to reason about what kind of
function signature related optimizations we can do.
This can easily lead to muddled logic. e.g. we need to consider 2 different function
signature optimizations together instead of independently.
Split 1 single function to do all sorts of different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function and eventually we collapse all intermediate thunks to in a single thunk.
With this change, it will be easier to implement function signature optimization as now
we can do them independently now.
Small modifications to the test cases.
Shaves about 19% of the time from the construction of these sets. The
SmallVector size was chosen to minimize the number of dynamic
allocations we end up doing while building the stdlib. This should be a
reasonable size for most projects, too. It's a bit wasteful in space,
but the total amount of allocated space here is pretty small to begin
with.
If we can not find the epilogue releases for all the fields with
reference sematics, but we found for some fields. Explode the argument.
I do not see a performance improvement with this change
rdar://25451364
We now consider effect of deinit in addition to the released value.
rdar://25362826
This is the only 10%+ regression i measured on my machine. no performance improvement.
Sim2DArray | 326 | 366 | +12.3% | **0.89x**
This split the function signature module pass into 2 functin passes.
By doing so, this allows us to rewrite to using the FSO-optimized
function prior to attempting inlining, but allow us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
And also helps us to move to a model which module pass is NOT used unless
necesary.
I do not see regression nor improvement for on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account of information in the projection tree.
We really only need the analysis to tell whether a function has caller
inside the module or not. We do not need to know the callsites.
Remove them for now to make the analysis more memory efficient.
Add a note to indicate it can be extended.
RecomputeFunctionList should really be a SmallVector instead of a
DenseSet. A DenseSet gives rise to a nondeterminstic way of iterating over
all functions.
Add an invalidateAnalysisForDeadFunction API. This API calls the invalidateAnalysis
by default unless overriden by analysis pass themselves. This API passes the extra
information that this function is dead and going to be removed from the module.
CallerAnalysis overrides this API and only invalidate caller/callee relations but
does not push this into the recompute list.
We also considered the possibility of keeping a computed list, instead of recompute
list but that would introduce a O(n^2) complexity as every time we try to complete
the computed list, we need to walk over all the functions that currently exist in the
module to make sure the computed list is complete.
I feel eventually we can do a handleDeleteNotification for function deletion and we
wont need the API added in this change.
Address the comments from 0acc0a8464
I still have not made up my mind how to handle deleted functions.
CallerAnalysis is not hooked up to anything yet.
The analysis can tell all the callsites which calls a function in the module.
The analysis is computed and kept up-to-date lazily.
At the core of it, it keeps a list of functions that need to be recomputed for
the Caller/Callee relation to be precise and on every query, the analysis makes
sure to recompute them and clear the list before any query.
This is NFC right now. I am going to wire it up to function signature analysis
eventually.
a separate analysis pass.
This pass is run on every function and the optimized signature is return'ed through the
getArgDescList and getResultDescList.
Next step is to split to cloning and callsite rewriting into their own function passes.
rdar://24730896
"
analysis pass.
This pass is run on every function and the optimized signature is return'ed through the
getArgDescList and getResultDescList.
Next step is to split to cloning and callsite rewriting into their own function passes.
rdar://24730896
We already computed this information so this is just storing information
we were already computing.
One thing to note is that in code with canonicalized loops, we will
always only have one backedge. But we would like loop region to be
correct even in the case of non-canonicalized code so we support having
multiple back edges. But since the common case is 1 backedge, we
optimize for that case.
This commit contains updated tests and also updates to the loop region graph
viewer so that it draws backedges as green arrows from the loop to its backedge
subregions. The test updates were done by examining each test case by hand.
This is safe because the closure is not allowed to capture the array according
to the documentation of 'withUnsafeMutableBuffer' and the current implementation
makes sure that any such capture would observe an empty array by swapping self
with an empty array.
Users will get "almost guaranteed" stack promotion for small arrays by writing
something like:
func testStackAllocation(p: Proto) {
var a = [p, p, p]
a.withUnsafeMutableBufferPointer {
let array = $0
work(array)
}
}
It is "almost guaranteed" because we need to statically be able to tell the size
required for the array (no unspecialized generics) and the total buffer size
must not exceed 1K.