When devirtualizing witness method and class method calls, we
transform apply instructions that operate on the result of a SIL
witness_method or class_method instruction into direct calls of
a function_ref.
The generic signature of the dynamic call site might not match
the generic signature of the static thunk, so the substitution
list from the dynamic apply instruction cannot be used directly;
instead, we must transform it to a substitution list suitable
for the static thunk.
- With witness methods, the method is called using the protocol
requirement's signature, <Self : P, ...>; however, the witness
thunk has a generic signature derived from the concrete witness.
For example, the requirement might have the signature
<Self : P, T>, while the concrete witness thunk has the
signature <X, Y> because the concrete conforming type is
G<X, Y>.
At the call site, we substitute Self := G<X', Y'>; however, to
be able to call the witness thunk directly, we need to form the
substitutions X := X' and Y := Y'.
- A similar situation occurs with class methods when the
dynamically-dispatched call is performed against a derived
class, but devirtualization actually finds the method on a
base class of the derived class.
The base class may have a different number of generic
parameters than the derived class, either because the
derived class makes some generic parameters of the base
class concrete, or because the derived class introduces
new generic parameters of its own.
In both cases, we need to consider the generic signature of the
dynamic call site (the protocol requirement or the derived
class method) as well as the generic signature of the static
thunk, and carefully remap the substitutions from one form
into another.
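As a hedged Swift sketch of the witness-method case (the type and
function names here are hypothetical): at the dynamic call site below,
Self := G<Int, String> under the requirement's <Self : P> signature, and
devirtualization must remap this to X := Int, Y := String to call G's
witness thunk directly.

protocol P {
    func requirement()
}

// The conforming type G<X, Y> supplies a witness thunk whose generic
// signature is <X, Y>, derived from the concrete witness.
struct G<X, Y>: P {
    func requirement() {}
}

// Dynamic call site: dispatches through the protocol requirement with
// the substitution Self := G<Int, String>.
func callThroughRequirement<T: P>(_ value: T) {
    value.requirement()
}

callThroughRequirement(G<Int, String>())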
Previously the optimizer would implicitly rely on substitutions
being in AllArchetypes order, in particular on the assumption that
concatenating outer substitutions with inner substitutions makes sense.
This assumption is about to go away, so this patch refactors
the optimizer to use some new abstractions for remapping
substitution lists.
This fixes a problem that could convert the owned self argument of a
deallocating deinitializer into a guaranteed argument.
rdar://problem/28096460
1. Make sure to abort the data flow as soon as we know we can't find the
epilogue retain/release.
2. Ignore retains in the throw block, because we neither use the result nor
insert a retain for it in the throw block on the caller side. This is really
a bug; we have a test case for it in functionsigopts.sil. It will be tested
once this new epilogue retain matcher is wired up.
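A hedged Swift illustration of the second point (hypothetical names): on
the throw path the caller never sees a result, so no retain is inserted
for it there.

enum ParseError: Error { case bad }
final class Result {}

func makeResult(_ ok: Bool) throws -> Result {
    guard ok else { throw ParseError.bad }
    return Result()
}

do {
    // Normal path: the caller receives (and eventually releases) the result.
    let r = try makeResult(true)
    _ = r
} catch {
    // Throw path: there is no result value, so no retain is inserted for it
    // on the caller side.
}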
This adds the typedef and switches uses of NodeType * to NodeRef. This is in
preparation for the eventual NodeRef-ization of the GraphTraits in LLVM. NFC.
This consists of 3 parts:
1) Extend CallerAnalysis to also provide information about whether a function is partially applied
2) A new DeadArgSignatureOpt pass, similar to FunctionSignatureOpts, which just specializes for dead arguments of partially applied functions.
3) Let CapturePropagation eliminate such partial_apply instructions and replace them with a thin_to_thick conversion of the specialized functions.
This optimization improves benchmarks where static struct or class functions
are passed as a closure (e.g. -20% for SortStrings).
Such functions have an additional metatype parameter. We used to create a
partial_apply in this case, which allocates a context, etc.
But this is not necessary, as the metatype parameter is unused in most cases.
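A hedged Swift example of the pattern this targets (modeled on the
SortStrings-style benchmarks mentioned above; the names are hypothetical):

struct StringSorter {
    // An unapplied static method reference carries an implicit metatype
    // parameter, which is usually dead.
    static func lessThan(_ a: String, _ b: String) -> Bool {
        return a < b
    }
}

// Previously this created a partial_apply over the metatype, allocating a
// context; with the dead metatype argument specialized away and capture
// propagation applied, a thin_to_thick conversion suffices.
let sorted = ["banana", "apple"].sorted(by: StringSorter.lessThan)
print(sorted)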
rdar://problem/27513085
Strict aliasing only applies to memory operations that use strict
addresses. The optimizer needs to be aware of this flag. Uses of raw
addresses should not have their address substituted with a strict
address.
Also add Builtin.LoadRaw which will be used by raw pointer loads.
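A hedged Swift-level illustration (the change itself concerns SIL address
kinds and the new builtin): loads through a raw pointer are exactly the
accesses that must not be rewritten to use strict addresses.

var bits: UInt32 = 0xDEADBEEF
withUnsafeBytes(of: &bits) { raw in
    // A raw (non-strict) load of a different type from the same memory:
    // strict-aliasing assumptions must not apply here, and the optimizer
    // must not substitute a strict address for the raw one.
    let low = raw.load(as: UInt16.self)
    print(String(low, radix: 16))
}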
Several functionalities have been added to FSO over time and the logic has
become muddled.
We were always looking at a static image of the SIL and trying to reason
about what kinds of function signature related optimizations we could do.
This can easily lead to muddled logic, e.g. when we need to consider 2
different function signature optimizations together instead of
independently.
Split the single function that did all sorts of different analyses in FSO
into several small transformations, each of which does a specific job. After
every analysis we produce a new function, and eventually we collapse all
intermediate thunks into a single thunk.
With this change it will be easier to implement function signature
optimizations, as we can now do them independently.
Small modifications to the test cases.
Shaves about 19% of the time from the construction of these sets. The
SmallVector size was chosen to minimize the number of dynamic
allocations we end up doing while building the stdlib. This should be a
reasonable size for most projects, too. It's a bit wasteful in space,
but the total amount of allocated space here is pretty small to begin
with.
If we cannot find the epilogue releases for all of the fields with
reference semantics, but we did find them for some fields, explode the
argument anyway.
I do not see a performance improvement with this change.
rdar://25451364
We now consider the effect of the deinit in addition to the released value.
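A hedged Swift illustration of the extra effect being modeled (hypothetical
types):

final class Child {}

final class Wrapper {
    // Wrapper's deinit releases `child`; this deinit effect is now
    // considered in addition to the released Wrapper value itself.
    let child = Child()
}

var w: Wrapper? = Wrapper()
w = nil // releases the Wrapper and, through its deinit, the Child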
rdar://25362826
This is the only 10%+ regression I measured on my machine; otherwise there is
no performance improvement.
Name       | Old | New | Delta  | Speedup
Sim2DArray | 326 | 366 | +12.3% | **0.89x**
This forces the call sites to be rewritten by the inliner.
We have the issue that the thunk changes between the time it is created and
the time it is reread to figure out what we have done to the original
function.
This results in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the
information on how to set up the call to the optimized functions.
Inlining the thunk makes the call site call the optimized function for free,
i.e. without any rewriting.
I did not measure any regression with this change.
Use it for hashing and comparison.
During String's hashValue and comparison functions we create a
_NSContiguousString instance to call Foundation's hash/compare function. This
is expensive because we have to allocate and deallocate a short-lived object
on the heap (and deallocation for Swift objects is expensive). Instead, help
the optimizer to allocate this object on the stack.
Introduces two functions on the internal _NSContiguousString,
_unsafeWithNotEscapedSelfPointer and _unsafeWithNotEscapedSelfPointerPair,
which pass the _NSContiguousString instance as an opaque pointer to their
closure argument. Use of these functions asserts that the closure will not
escape objects transitively reachable from the opaque pointer.
We then use those functions to call into the runtime, which calls the
Foundation functions on the passed strings. The optimizer can promote the
strings to the stack because of the assertion this API makes.
let lhsStr = _NSContiguousString(self._core) // will be promoted to the stack.
let rhsStr = _NSContiguousString(rhs._core) // will be promoted to the stack.
let res = lhsStr._unsafeWithNotEscapedSelfPointerPair(rhsStr) {
  return _stdlib_compareNSStringDeterministicUnicodeCollationPointer($0, $1)
}
Tested by existing String tests.
We should see some nice performance improvements for string comparison and
dictionary benchmarks.
Here is what I measured at -O on my machine
Name                  Speedup
Dictionary            2.00x
Dictionary2           1.45x
Dictionary2OfObjects  1.20x
Dictionary3           1.50x
Dictionary3OfObjects  1.45x
DictionaryOfObjects   1.40x
SuperChars            1.60x
rdar://22173647
This splits the function signature module pass into 2 function passes.
By doing so, we can rewrite call sites to use the FSO-optimized
function prior to attempting inlining, while still doing a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
It also helps us move to a model in which a module pass is NOT used unless
necessary.
I see neither a regression nor an improvement on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
We really only need the analysis to tell whether a function has callers
inside the module or not. We do not need to know the call sites.
Remove them for now to make the analysis more memory efficient.
Add a note to indicate it can be extended.
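A hedged sketch (hypothetical names; the real analysis is part of the
optimizer) of the reduced per-function state, a single flag instead of a
list of call sites:

struct CallerInfo {
    // True if some function in the module calls this function; the
    // individual call sites are deliberately not recorded.
    var hasCaller = false
}

var callerInfo: [String: CallerInfo] = [:]
callerInfo["callee", default: CallerInfo()].hasCaller = true
print(callerInfo["callee"]!.hasCaller) // true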
Address the comments from 0acc0a8464
I still have not made up my mind how to handle deleted functions.
CallerAnalysis is not hooked up to anything yet.