Figure out the enum case if an enum value is dominated by a switch_enum and use this info for a subsequent switch_enum to be simplified.
This change partly recovers the ClosedRange-regression.
rdar://problem/25933907
- Don't crash if a class_method instruction could not be devirtualized.
- Improve devirtualization of methods with generic parameters and using dependent types.
- Fix a bug in isBindableToSuperclassOf, uncovered while fixing the original bug reported in SR-1206.
This bug could lead in certain cases to invocations of a wrong method from the base class, instead
of using a method from a derived class.
rdar://25891588 and SR-1206
Several functionalities have been added to FSO over time and the logic has become
muddled.
We were always looking at a static image of the SIL and try to reason about what kind of
function signature related optimizations we can do.
This can easily lead to muddled logic. e.g. we need to consider 2 different function
signature optimizations together instead of independently.
Split 1 single function to do all sorts of different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function and eventually we collapse all intermediate thunks to in a single thunk.
With this change, it will be easier to implement function signature optimization as now
we can do them independently now.
Minimal modifications to the test cases.
This made call sites confusing to read because it doesn't actually
check if the function already exists.
Also fix some minor formatting issues. This came up while I was working
on a fix for a bug that turned out to not be a bug.
If we can not find the epilogue releases for all the fields with
reference sematics, but we found for some fields. Explode the argument.
I do not see a performance improvement with this change
rdar://25451364
1) handle cases where the tuple_extract is not in the same basic block as the init call. This is not a correctness issue, but might miss some opportunities.
2) bail if there are multiple tuple_extract. This is a correctness issue, but a theoretical one. I don't think that in reality we will ever get multiple tuple_extracts out of SILGen.
We were waiting to delete old apply / try_apply instructions until after
fully specializing all the apply / try_apply in the function. This is
problematic when we have a recursive call and specializing the function
that we're currently processing, since we end up cloning the function
with the old apply / try_apply present.
Rather than doing this, clean up the old apply / try_apply immediately
after processing each one.
Resolves SR-1114 / rdar://problem/25455308.
Change the optimizer to only make specializations [fragile] if both the
original callee is [fragile] *and* the caller is [fragile].
Otherwise, the specialized callee might be [fragile] even if it is never
called from a [fragile] function, which inhibits the optimizer from
devirtualizing calls inside the specialization.
This opens up some missed optimization opportunities in the performance
inliner and devirtualization, which currently reject fragile->non-fragile
references:
TEST | OLD_MIN | NEW_MIN | DELTA (%) | SPEEDUP
--- | --- | --- | --- | ---
DictionaryRemoveOfObjects | 38391 | 35859 | -6.6% | **1.07x**
Hanoi | 5853 | 5288 | -9.7% | **1.11x**
Phonebook | 18287 | 14988 | -18.0% | **1.22x**
SetExclusiveOr_OfObjects | 20001 | 15906 | -20.5% | **1.26x**
SetUnion_OfObjects | 16490 | 12370 | -25.0% | **1.33x**
Right now, passes other than performance inlining and devirtualization
of class methods are not checking invariants on [fragile] functions
at all, which was incorrect; as part of the work on building the
standard library with -enable-resilience, I added these checks, which
regressed performance with resilience disabled. This patch makes up for
these regressions.
Furthermore, once SIL type lowering is aware of resilience, this will
allow the stack promotion pass to make further optimizations after
specializing [fragile] callees.
This broke the test suite under optimizations with a SIL verifier error: "stack dealloc does
not match most recent stack alloc".
This reverts commit 7a2ca23bc2, reversing
changes made to 4c55e8d7a7.
Unreachable blocks prevented stack promotion in some cases.
Now we use our own post-dominator tree which ignores unreachable blocks instead of the standard post-dominator tree provided by the PostDominanceAnalysis.
Unreachable blocks (better: unreachable sub-graphs) are of no interrest because we don't have to insert the dealloc instructions in unreachable blocks anyway.
It is a hint to the optimizer that the code, where this builtin is called, is on the fast path.
Specifically, the inliner takes it into account and increases the assumed benefit for code where the builtin is located.
Compared to the fastPath/slowPath builtins, this builtin can be placed into plain linear code and doesn't need to be used in conditions.
Compared to the @inline(__always) attribute, this builtin has also an effect on the caller function. Let's assume
foo() calls bar() contains onFastPath
and both foo and bar are small functions. Then if bar gets inlined into foo, the builtin also increases the chances that foo gets inlined.
This would not be the case if @inline(__always) is used just for bar.
Do not specialize an apply/partial_apply that we've already added to the
set of dead instructions. Doing so can result in creating a new
instruction which we will leave around, and which will have a type
mismatch in its parameter list.
Fixes rdar://problem/25447450.