It is a hint to the optimizer that the code where this builtin is called is on the fast path.
Specifically, the inliner takes it into account and increases the assumed benefit for code where the builtin is located.
Compared to the fastPath/slowPath builtins, this builtin can be placed into plain linear code and doesn't need to be used in conditions.
Compared to the @inline(__always) attribute, this builtin also has an effect on the caller function. Let's assume
foo() calls bar(), which contains onFastPath,
and both foo and bar are small functions. Then, if bar gets inlined into foo, the builtin also increases the chances that foo gets inlined.
This would not be the case if @inline(__always) were used just for bar.
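As an illustration, here is a minimal sketch (assuming the standard library's _onFastPath() wrapper around the builtin; the functions foo and bar are made up):

func bar(_ values: inout [Int], _ v: Int) {
  _onFastPath()        // hint: code reaching this point is on the fast path
  values.append(v)
}

func foo(_ values: inout [Int], _ input: [Int]) {
  for v in input {
    bar(&values, v)    // once bar is inlined here, the hint also raises the
  }                    // estimated benefit of inlining foo into its callers
}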
Do not specialize an apply/partial_apply that we've already added to the
set of dead instructions. Doing so can result in creating a new
instruction which we will leave around, and which will have a type
mismatch in its parameter list.
Fixes rdar://problem/25447450.
We ended up adding the same instruction twice to a SmallVector of
instructions to be deleted. To avoid this, we'll track these
to-be-deleted instructions in a SmallSetVector instead.
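For readers unfamiliar with SmallSetVector: it is essentially an insertion-ordered set, so inserting the same element twice is a no-op. A minimal Swift analogue (illustrative only, not the LLVM type) could look like this:

struct OrderedSet<Element: Hashable> {
  private var seen: Set<Element> = []
  private(set) var ordered: [Element] = []

  // Returns false if the element was already present, so an instruction
  // can only end up in the deletion list once.
  @discardableResult
  mutating func insert(_ element: Element) -> Bool {
    guard seen.insert(element).inserted else { return false }
    ordered.append(element)
    return true
  }
}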
We were also failing to add an instruction that we can delete to the set
of instructions to be deleted, so I fixed that as well.
I've added a test case, but it's currently disabled because fixing this
turned up another issue in the same code which I still need to take a
look at.
Fixes rdar://problem/25369617.
It now detects more opportunities for inlining, like some patterns with RC instructions or loads/stores from/to stack locations in the caller.
On the other hand a new shortest path analysis limits inlining to those cases where it really gives a benefit.
As the inlining decision now depends on many parameters, the test-threshold option is removed because it does not make much sense anymore.
Instead the inliner test files are modified to model the "real" instruction costs.
We can remove the retain/release pair preceding the builtins based on the
knowledge that the lifetime of the reference is guaranteed by someone hanging on
to the reference elsewhere.
Based on the review of my previous commit, I did some refactorings.
The code is now simpler and handles more cases. More tests were added.
The overall idea of this rewrite is that the pass tries to check whether it can
see all possible writes (i.e. initializations) into a given let property. Only if it can be proven
that the pass sees all possible writes, and all those initializations produce
the same constant, statically known value, does the pass propagate this constant value
into uses of the property.
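For example (a hypothetical type; the pass itself works on SIL):

final class FixedBuffer {
  let capacity: Int
  var storage: [Int]

  init() {
    capacity = 16
    storage = []
  }

  init(prefilled value: Int) {
    capacity = 16        // same constant in every initializer the pass can see
    storage = Array(repeating: value, count: 16)
  }
}

func hasRoom(_ b: FixedBuffer) -> Bool {
  // If the pass proves that 16 is the only possible value of `capacity`,
  // this read can be folded to the constant.
  return b.storage.count < b.capacity
}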
SR-1026 and rdar://25303106
Eventually, we decided to do this:
1. Have the function signature opts pass (which used to be called the cloner) create
the optimized function.
2. Mark the thunk as always_inline.
3. Rely on the inliner to inline the thunk to get the benefit of calling the optimized
function directly.
We decided to use the inliner to rewrite the caller's callsites.
And eventually I will turn FunctionSignatureAnalysis into a utility,
as its data should only be used and kept in the cloner pass.
This forces the callsites to be rewritten by the inliner.
Otherwise, we have the issue that the thunk changes between the time it is created and
the time it is reread to figure out what we have done to the original function.
This results in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized function.
Inlining the thunk makes the callsite call the optimized function for free, i.e.
without any rewriting.
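A source-level sketch of the resulting shape (the pass works on SIL; the names are hypothetical and @inline(__always) stands in for the SIL-level always_inline marking):

// Optimized copy: the dead `unused` parameter has been dropped.
func process_optimized(_ data: [Int]) -> Int {
  return data.reduce(0, +)
}

// Thunk with the original signature. Because it is always inlined, every
// caller ends up calling process_optimized directly, without the pass
// having to rewrite the call sites itself.
@inline(__always)
func process(_ data: [Int], _ unused: String) -> Int {
  return process_optimized(data)
}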
I did not measure any regression with this change.
Use it for hashing and comparison.
In String's hashValue and comparison functions we create a
_NSContiguousString instance to call Foundation's hash/compare function. This is
expensive because we have to allocate and deallocate a short-lived object on the
heap (and deallocation for Swift objects is expensive). Instead, help the
optimizer to allocate this object on the stack.
Introduces two functions on the internal _NSContiguousString:
_unsafeWithNotEscapedSelfPointer and _unsafeWithNotEscapedSelfPointerPair that
pass the _NSContiguousString instance as an opaque pointer to their closure
argument. Usage of these functions asserts that the closure will not escape
objects transitively reachable from the opaque pointer.
We then use those functions to call into the runtime to call Foundation
functions on the passed strings. The optimizer can promote the strings to the
stack because of the assertion this API makes.
let lhsStr = _NSContiguousString(self._core) // will be promoted to the stack.
let rhsStr = _NSContiguousString(rhs._core)  // will be promoted to the stack.
let res = lhsStr._unsafeWithNotEscapedSelfPointerPair(rhsStr) {
  return _stdlib_compareNSStringDeterministicUnicodeCollationPointer($0, $1)
}
Tested by existing String tests.
We should see some nice performance improvements for string comparison and
dictionary benchmarks.
Here is what I measured at -O on my machine
Name                  Speedup
Dictionary            2.00x
Dictionary2           1.45x
Dictionary2OfObjects  1.20x
Dictionary3           1.50x
Dictionary3OfObjects  1.45x
DictionaryOfObjects   1.40x
SuperChars            1.60x
rdar://22173647
The optimization pass was inspecting only init methods to determine if a given let property is defined
in the same way by all initializers. But this is not enough in certain cases, e.g. when some of the
initializers were inlined into the application code and the body of the inlined SIL function representing
such an initializer was removed afterwards by the dead function elimination pass. In such situations,
the Let Properties Optimization pass was assuming that there is only one initializer and considered the
constant let property value defined there as the only possible value of this let property. Therefore it
propagated it into let-property uses, which resulted in incorrect code.
The right thing to do is to analyze all assignments to a given let property, whether they are inside initializer
SIL functions or not. This makes sure that all possible values of a let property are analyzed and compared.
The propagation of a constant let property value can only happen if all the possible values found are the same.
Fixes SR-1026 and rdar://25303106
This was mistakenly reverted in an attempt to fix buildbots.
Unfortunately it's now smashed into one commit.
---
Introduce @_specialize(<type list>) internal attribute.
This attribute can be attached to generic functions. The attribute's
arguments must be a list of concrete types to be substituted in the
function's generic signature. Any number of specializations may be
associated with a generic function.
This attribute provides a hint to the compiler. At -O, the compiler
will generate the specified specializations and emit calls to the
specialized code in the original generic function guarded by type
checks.
The current attribute is designed to be an internal tool for
performance experimentation. It does not affect the language or
API. This work may be extended in the future to add user-visible
attributes that do provide API guarantees and/or direct dispatch to
specialized code.
This attribute works on any generic function: a freestanding function
with generic type parameters, a nongeneric method declared in a
generic class, a generic method in a nongeneric class or a generic
method in a generic class. A function's generic signature is a
concatenation of the generic context and the function's own generic
type parameters.
e.g.
struct S<T> {
  var x: T

  @_specialize(Int, Float)
  mutating func exchangeSecond<U>(u: U, _ t: T) -> (U, T) {
    x = t
    return (u, x)
  }
}
// Substitutes: <T, U> with <Int, Float> producing:
// S<Int>::exchangeSecond<Float>(u: Float, t: Int) -> (Float, Int)
---
[SILOptimizer] Introduce an eager-specializer pass.
This pass finds generic functions with @_specialize attributes and
generates specialized code for the attribute's concrete types. It
inserts type checks and guarded dispatch at the beginning of the
generic function for each specialization. Since we don't currently
expose this attribute as API and don't specialize vtables and witness
tables yet, the only way to reach the specialized code is by calling
the generic function which performs the guarded dispatch.
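Conceptually, the guarded dispatch looks roughly like the following source-level sketch (the names are hypothetical and the real transformation happens in SIL):

func square<T: Numeric>(_ x: T) -> T {
  return x * x
}

// With @_specialize(Int), the generic entry point behaves as if it were:
func square_guarded<T: Numeric>(_ x: T) -> T {
  if T.self == Int.self {
    // Type check succeeded: dispatch to the specialized body.
    return square_specializedForInt(x as! Int) as! T
  }
  // Otherwise fall back to the original generic body.
  return x * x
}

func square_specializedForInt(_ x: Int) -> Int {
  return x * x
}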
In the future, we can build on this work in several ways:
- cross module dispatch directly to specialized code
- dynamic dispatch directly to specialized code
- automated specialization based on less specific hints
- partial specialization
- and so on...
I reorganized and refactored the optimizer's generic utilities to
support direct function specialization as opposed to apply
specialization.