Change the optimizer to only make specializations [fragile] if both the
original callee is [fragile] *and* the caller is [fragile].
Otherwise, the specialized callee might be [fragile] even if it is never
called from a [fragile] function, which inhibits the optimizer from
devirtualizing calls inside the specialization.
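A minimal sketch of the rule, with hypothetical names (the real check lives in the generic specializer):

```swift
// Hypothetical model; `isFragile` stands in for the SIL [fragile] flag.
struct FunctionInfo {
  var isFragile: Bool
}

// The specialization is [fragile] only when *both* sides are; otherwise it
// stays non-fragile so calls inside it can still be devirtualized.
func specializationIsFragile(callee: FunctionInfo, caller: FunctionInfo) -> Bool {
  return callee.isFragile && caller.isFragile
}
```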
This unlocks some optimization opportunities previously missed by the
performance inliner and devirtualization, which currently reject
fragile->non-fragile references:
TEST | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
--- | --- | --- | --- | ---
DictionaryRemoveOfObjects | 38391 | 35859 | -6.6% | **1.07x**
Hanoi | 5853 | 5288 | -9.7% | **1.11x**
Phonebook | 18287 | 14988 | -18.0% | **1.22x**
SetExclusiveOr_OfObjects | 20001 | 15906 | -20.5% | **1.26x**
SetUnion_OfObjects | 16490 | 12370 | -25.0% | **1.33x**
Previously, passes other than performance inlining and devirtualization
of class methods were not checking invariants on [fragile] functions
at all, which was incorrect; as part of the work on building the
standard library with -enable-resilience, I added these checks, which
regressed performance with resilience disabled. This patch makes up for
those regressions.
Furthermore, once SIL type lowering is aware of resilience, this will
allow the stack promotion pass to make further optimizations after
specializing [fragile] callees.
This broke the test suite under optimizations with a SIL verifier error: "stack dealloc does
not match most recent stack alloc".
This reverts commit 7a2ca23bc2, reversing
changes made to 4c55e8d7a7.
Unreachable blocks prevented stack promotion in some cases.
Now we use our own post-dominator tree, which ignores unreachable blocks, instead of the standard post-dominator tree provided by the PostDominanceAnalysis.
Unreachable blocks (more precisely: unreachable sub-graphs) are of no interest because we don't have to insert the dealloc instructions in unreachable blocks anyway.
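A rough sketch of the idea on a toy CFG (the names here are illustrative, not the actual analysis API): first restrict attention to the blocks reachable from the entry, then build post-dominance over that subset only.

```swift
// A toy CFG: blocks are indices; `succs` lists each block's successors.
struct CFG {
  var succs: [[Int]]
  let entry = 0
}

// Collect the blocks reachable from the entry. Unreachable sub-graphs are
// skipped, since no dealloc instructions ever need to be inserted there.
func reachableBlocks(_ cfg: CFG) -> Set<Int> {
  var seen: Set<Int> = [cfg.entry]
  var worklist = [cfg.entry]
  while let block = worklist.popLast() {
    for succ in cfg.succs[block] where seen.insert(succ).inserted {
      worklist.append(succ)
    }
  }
  return seen
}
// The custom post-dominator tree is then built over reachableBlocks(cfg)
// only, instead of over all blocks as the standard analysis does.
```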
It is a hint to the optimizer that the code where this builtin is called is on the fast path.
Specifically, the inliner takes it into account and increases the assumed benefit of the code where the builtin is located.
Compared to the fastPath/slowPath builtins, this builtin can be placed in plain linear code and doesn't need to be used in a condition.
Compared to the @inline(__always) attribute, this builtin also has an effect on the calling function. Assume that
foo() calls bar(), that bar() contains onFastPath,
and that both foo and bar are small functions. Then if bar gets inlined into foo, the builtin also increases the chances that foo gets inlined.
This would not be the case if @inline(__always) were used just for bar.
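As a sketch, this is roughly how the builtin gets used through the standard library's underscored `_onFastPath()` wrapper around Builtin.onFastPath(); the cache and helpers below are made up for illustration:

```swift
// Illustrative only: the cache and the helper are hypothetical.
var cache: [Int: Int] = [:]

func expensiveCompute(_ key: Int) -> Int {
  return key &* key  // stand-in for real work
}

func value(for key: Int) -> Int {
  if let v = cache[key] {
    _onFastPath()    // hint: the cache hit is the hot path; the inliner
                     // raises the assumed benefit of this code, in this
                     // function and, after inlining, in its callers too
    return v
  }
  let v = expensiveCompute(key)
  cache[key] = v
  return v
}
```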
Do not specialize an apply/partial_apply that we've already added to the
set of dead instructions. Doing so can result in creating a new
instruction which we will leave around, and which will have a type
mismatch in its parameter list.
Fixes rdar://problem/25447450.
We ended up adding the same instruction twice to a SmallVector of
instructions to be deleted. To avoid this, we'll track these
to-be-deleted instructions in a SmallSetVector instead.
We were also failing to add an instruction that we can delete to the set
of instructions to be deleted, so I fixed that as well.
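In Swift terms, the fix is roughly the following (a sketch; the real change swaps llvm::SmallVector for llvm::SmallSetVector in C++):

```swift
// An insertion-ordered set ignores duplicate inserts, so the same
// instruction can't be queued for deletion twice.
struct OrderedSet<Element: Hashable> {
  private var seen = Set<Element>()
  private(set) var elements: [Element] = []

  mutating func insert(_ element: Element) {
    if seen.insert(element).inserted {
      elements.append(element)
    }
  }
}
```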
I've added a test case, but it's currently disabled because fixing this
turned up another issue in the same code which I still need to take a
look at.
Fixes rdar://problem/25369617.
It now detects more opportunities for inlining, such as certain patterns involving RC instructions or loads/stores from/to stack locations in the caller.
On the other hand, a new shortest-path analysis limits inlining to those cases where it really gives a benefit.
As the inlining decision now depends on many parameters, the test-threshold option is removed because it does not make much sense anymore.
Instead, the inliner test files are modified to model the "real" instruction costs.
We can remove the retain/release pair preceding the builtins based on the
knowledge that the lifetime of the reference is guaranteed by someone hanging on
to the reference elsewhere.
Eventually, we decided to do the following (sketched after the list):
1. Have the function signature opts use the cloner to create
the optimized function.
2. Mark the thunk as always_inline.
3. Rely on the inliner to inline the thunk to get the benefit of calling the optimized
function directly.
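A rough Swift-level picture of the scheme (the names are made up; the actual transformation happens on SIL):

```swift
final class Box { var value: Int = 0 }

// Original function (conceptually):
//   func bar(_ box: Box) -> Int { return box.value }

// 1. The cloner creates the optimized function with the thinner signature.
func bar_optimized(_ value: Int) -> Int {
  return value
}

// 2. The original becomes a thunk with the old signature, marked for
//    mandatory inlining; it carries exactly the information needed to set
//    up the call to the optimized function.
@inline(__always)
func bar(_ box: Box) -> Int {
  return bar_optimized(box.value)
}

// 3. Once the inliner inlines the thunk, every caller ends up calling
//    bar_optimized directly, with no explicit call-site rewriting.
```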
We decided to use the inliner to rewrite the callers' call sites; marking
the thunk always_inline forces those call sites to be rewritten by the inliner.
Eventually I will turn FunctionSignatureAnalysis into a utility,
as its data should only be used and kept in the cloner pass.
Previously, we had the issue that the thunk could change between the time it was created and
the time it was reread to figure out what we had done to the original function.
This resulted in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized function.
Inlining the thunk makes the call site call the optimized function for free, i.e.
without any rewriting.
I did not measure any regression with this change.
This splits the function signature module pass into 2 function passes.
By doing so, this allows us to rewrite callers to use the FSO-optimized
function prior to attempting inlining, while still allowing us to do a
substantial amount of optimization on the current function before
attempting to do FSO on that function.
It also helps us move to a model in which a module pass is NOT used unless
necessary.
I see neither a regression nor an improvement on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
This occurred if a stack-promoted object with a devirtualized final release was not actually allocated on the stack.
Now the ReleaseDevirtualizer models the procedure of a final release more accurately:
it inserts a set_deallocating instruction and calls the deallocator (instead of just the deinit).
This change also includes two peephole optimizations in IRGen and LLVMStackPromotion which get rid of
unused runtime calls in case the stack-promoted object really is allocated on the stack.
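For context, a made-up example of the pattern involved: a non-escaping class instance that stack promotion and the ReleaseDevirtualizer cooperate on.

```swift
final class Counter {
  var count = 0
}

func sum(_ values: [Int]) -> Int {
  let c = Counter()          // candidate for stack promotion
  for v in values { c.count += v }
  return c.count             // final release of `c`: the ReleaseDevirtualizer
                             // replaces it with set_deallocating plus a
                             // direct call to Counter's deallocator
}
```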
This fixes rdar://problem/25068118
In many places, we're interested in whether a type with archetypes *might be* a superclass of another type with the right bindings, particularly in the optimizer. Provide a separate Type::isBindableToSuperclassOf method that performs this check. Use it in the devirtualizer to fix rdar://problem/24993618. Using it might unblock other places where the optimizer is conservative, but we can fix those separately.
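A small Swift illustration of the distinction (the types here are hypothetical):

```swift
class Base<T> {}
class Derived: Base<Int> {}

// With U an unbound archetype, "is Base<U> a superclass of Derived?" can
// only be answered "maybe": it is, but solely under the binding U == Int.
// That "maybe" is what Type::isBindableToSuperclassOf reports, letting the
// devirtualizer proceed where an exact superclass query would give up.
func takesBase<U>(_ x: Base<U>) {}
```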
We were creating new uses of an argument just prior to erasing it from
the block argument list.
We need to replace references to that value in the side structure we
generate with references to the new value that we're replacing it with.
Fixes SR-884 / rdar://problem/25008398.