If we can not find the epilogue releases for all the fields with
reference sematics, but we found for some fields. Explode the argument.
I do not see a performance improvement with this change
rdar://25451364
1) handle cases where the tuple_extract is not in the same basic block as the init call. This is not a correctness issue, but might miss some opportunities.
2) bail if there are multiple tuple_extract. This is a correctness issue, but a theoretical one. I don't think that in reality we will ever get multiple tuple_extracts out of SILGen.
We were waiting to delete old apply / try_apply instructions until after
fully specializing all the apply / try_apply in the function. This is
problematic when we have a recursive call and specializing the function
that we're currently processing, since we end up cloning the function
with the old apply / try_apply present.
Rather than doing this, clean up the old apply / try_apply immediately
after processing each one.
Resolves SR-1114 / rdar://problem/25455308.
Change the optimizer to only make specializations [fragile] if both the
original callee is [fragile] *and* the caller is [fragile].
Otherwise, the specialized callee might be [fragile] even if it is never
called from a [fragile] function, which inhibits the optimizer from
devirtualizing calls inside the specialization.
This opens up some missed optimization opportunities in the performance
inliner and devirtualization, which currently reject fragile->non-fragile
references:
TEST | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
--- | --- | --- | --- | ---
DictionaryRemoveOfObjects | 38391 | 35859 | -6.6% | **1.07x**
Hanoi | 5853 | 5288 | -9.7% | **1.11x**
Phonebook | 18287 | 14988 | -18.0% | **1.22x**
SetExclusiveOr_OfObjects | 20001 | 15906 | -20.5% | **1.26x**
SetUnion_OfObjects | 16490 | 12370 | -25.0% | **1.33x**
Right now, passes other than performance inlining and devirtualization
of class methods are not checking invariants on [fragile] functions
at all, which was incorrect; as part of the work on building the
standard library with -enable-resilience, I added these checks, which
regressed performance with resilience disabled. This patch makes up for
these regressions.
Furthermore, once SIL type lowering is aware of resilience, this will
allow the stack promotion pass to make further optimizations after
specializing [fragile] callees.
This broke the test suite under optimizations with a SIL verifier error: "stack dealloc does
not match most recent stack alloc".
This reverts commit 7a2ca23bc2, reversing
changes made to 4c55e8d7a7.
Unreachable blocks prevented stack promotion in some cases.
Now we use our own post-dominator tree which ignores unreachable blocks instead of the standard post-dominator tree provided by the PostDominanceAnalysis.
Unreachable blocks (better: unreachable sub-graphs) are of no interrest because we don't have to insert the dealloc instructions in unreachable blocks anyway.
It is a hint to the optimizer that the code, where this builtin is called, is on the fast path.
Specifically, the inliner takes it into account and increases the assumed benefit for code where the builtin is located.
Compared to the fastPath/slowPath builtins, this builtin can be placed into plain linear code and doesn't need to be used in conditions.
Compared to the @inline(__always) attribute, this builtin has also an effect on the caller function. Let's assume
foo() calls bar() contains onFastPath
and both foo and bar are small functions. Then if bar gets inlined into foo, the builtin also increases the chances that foo gets inlined.
This would not be the case if @inline(__always) is used just for bar.
Do not specialize an apply/partial_apply that we've already added to the
set of dead instructions. Doing so can result in creating a new
instruction which we will leave around, and which will have a type
mismatch in its parameter list.
Fixes rdar://problem/25447450.
We ended up adding the same instruction twice to a SmallVector of
instructions to be deleted. To avoid this, we'll track these
to-be-deleted instructions in a SmallSetVector instead.
We were also failing to add an instruction that we can delete to the set
of instructions to be deleted, so I fixed that as well.
I've added a test case, but it's currently disabled because fixing this
turned up another issue in the same code which I still need to take a
look at.
Fixes rdar://problem/25369617.
It now detects more opportunities for inlining, like some patters with RC instructions or loads/stores from/to stack locations in the caller.
On the other hand a new shortest path analysis limits inlining to those cases where it really gives a benefit.
As the inlining decision now depends on many parameters, the test-threshhold option is removed because it doe not make much sense anymore.
Instead the inliner test files are modified to model the "real" instruction costs.
We can remove the retain/release pair preceeding the builtins based on the
knowledge that the lifetime of the reference is guaranteed by someone hanging on
to the reference elsewhere.