Eventually, we decided to do the following:
1. Have the function signature optimization pass (what used to be called the cloner) create
the optimized function.
2. Mark the thunk as always_inline.
3. Rely on the inliner to inline the thunk, so callers get the benefit of calling the optimized
function directly.
We decided to use the inliner to rewrite the callers' callsites.
Eventually I will turn FunctionSignatureAnalysis into a utility,
as its data should only be used and kept in the cloner pass.
This forces the callsites to be rewritten by the inliner.
Otherwise we have the issue that the thunk changes from the time it is created to
the time it is reread to figure out what we have done to the original function.
This results in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized function.
Inlining the thunk makes the callsite call the optimized function for free, i.e.
without any rewriting.
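As a rough Swift-level illustration of this scheme (the real transformation operates on SIL,
and the names below are made up), the thunk keeps the original signature and simply forwards
to the function with the optimized signature:

struct Pair { var a: Int; var b: Int }

// The cloner produces a function with the optimized (here: exploded) signature.
func sumOptimized(_ a: Int, _ b: Int) -> Int {
  return a + b
}

// The thunk keeps the original signature and is marked always-inline, so the
// inliner turns every caller into a direct call of sumOptimized.
@inline(__always)
func sum(_ p: Pair) -> Int {
  return sumOptimized(p.a, p.b)
}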
I did not measure any regression with this change.
Use it for hashing and comparison.
In String's hashValue and comparison functions we create a
_NSContiguousString instance to call Foundation's hash/compare functions. This is
expensive because we have to allocate and deallocate a short-lived object on the
heap (and deallocation for Swift objects is expensive). Instead, help the
optimizer allocate this object on the stack.
This introduces two functions on the internal _NSContiguousString,
_unsafeWithNotEscapedSelfPointer and _unsafeWithNotEscapedSelfPointerPair, which
pass the _NSContiguousString instance as an opaque pointer to their closure
argument. Using these functions asserts that the closure will not escape
objects transitively reachable from the opaque pointer.
We then use these functions to call into the runtime, which calls the Foundation
functions on the passed strings. The optimizer can promote the strings to the
stack because of the assertion this API makes.
let lhsStr = _NSContiguousString(self._core) // will be promoted to the stack.
let rhsStr = _NSContiguousString(rhs._core)  // will be promoted to the stack.
let res = lhsStr._unsafeWithNotEscapedSelfPointerPair(rhsStr) {
  return _stdlib_compareNSStringDeterministicUnicodeCollationPointer($0, $1)
}
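For reference, here is a small, self-contained sketch of the underlying pattern rather than
the actual stdlib code; the class and method names are made up. The object hands an opaque
pointer to itself to a non-escaping closure, so the optimizer can prove the object does not
escape and promote it to the stack:

final class ContiguousBox {
  var value: Int
  init(_ value: Int) { self.value = value }

  // Hypothetical analogue of _unsafeWithNotEscapedSelfPointer: `body` must not
  // escape anything reachable from the pointer it receives.
  @inline(__always)
  func withOpaqueSelfPointer<Result>(_ body: (OpaquePointer) -> Result) -> Result {
    let result = body(OpaquePointer(Unmanaged.passUnretained(self).toOpaque()))
    withExtendedLifetime(self) {}  // keep self alive until after the call
    return result
  }
}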
Tested by existing String tests.
We should see some nice performance improvements for string comparison and
dictionary benchmarks.
Here is what I measured at -O on my machine
Name                  Speedup
Dictionary            2.00x
Dictionary2           1.45x
Dictionary2OfObjects  1.20x
Dictionary3           1.50x
Dictionary3OfObjects  1.45x
DictionaryOfObjects   1.40x
SuperChars            1.60x
rdar://22173647
The optimization pass was inspecting only init methods to determine if a given let property is defined
in the same way by all initializers. But this is not enough in certain cases, e.g. when some of the
initializers were inlined into the application code and the body of the inlined SIL function representing
such an initializer was removed afterwards by the dead function elimination pass. In such situations,
the Let Properties Optimization pass was assuming that there is only one initializer and considered the
constant let property value defined there as the only possible value of this let property. Therefore it
propagated this value into let-property uses, which resulted in incorrect code.
The right thing to do is to analyze all assignments to a given let property, whether they are inside initializer
SIL functions or not. This makes sure that all possible values of a let property are analyzed and compared.
The propagation of a constant let property value can only happen if all of the possible values found are the same.
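As a hypothetical illustration of the failure mode (not the original reproducer): if
init(flexible:) is inlined into client code and its SIL body is later removed by dead
function elimination, the old analysis would only see init() and wrongly fold every use
of limit to 10.

struct Config {
  let limit: Int
  init() { limit = 10 }
  init(flexible: Int) { limit = flexible }  // may be inlined and then dropped
}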
Fixes SR-1026 and rdar://25303106
This was mistakenly reverted in an attempt to fix buildbots.
Unfortunately it's now smashed into one commit.
---
Introduce @_specialize(<type list>) internal attribute.
This attribute can be attached to generic functions. The attribute's
arguments must be a list of concrete types to be substituted in the
function's generic signature. Any number of specializations may be
associated with a generic function.
This attribute provides a hint to the compiler. At -O, the compiler
will generate the specified specializations and emit calls to the
specialized code in the original generic function guarded by type
checks.
The current attribute is designed to be an internal tool for
performance experimentation. It does not affect the language or
API. This work may be extended in the future to add user-visible
attributes that do provide API guarantees and/or direct dispatch to
specialized code.
This attribute works on any generic function: a freestanding function
with generic type parameters, a nongeneric method declared in a
generic class, a generic method in a nongeneric class or a generic
method in a generic class. A function's generic signature is a
concatenation of the generic context and the function's own generic
type parameters.
e.g.
struct S<T> {
  var x: T

  @_specialize(Int, Float)
  mutating func exchangeSecond<U>(u: U, _ t: T) -> (U, T) {
    x = t
    return (u, x)
  }
}
// Substitutes: <T, U> with <Int, Float> producing:
// S<Int>::exchangeSecond<Float>(u: Float, t: Int) -> (Float, Int)
---
[SILOptimizer] Introduce an eager-specializer pass.
This pass finds generic functions with @_specialized attributes and
generates specialized code for the attribute's concrete types. It
inserts type checks and guarded dispatch at the beginning of the
generic function for each specialization. Since we don't currently
expose this attribute as API and don't specialize vtables and witness
tables yet, the only way to reach the specialized code is by calling
the generic function which performs the guarded dispatch.
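Conceptually, the guarded dispatch inserted at the top of the generic function looks
roughly like the following Swift sketch (the real transformation is performed on SIL,
and the function names are made up):

// Specialization generated for T == Int.
func identityForInt(_ x: Int) -> Int {
  return x
}

func identity<T>(_ x: T) -> T {
  // Guarded dispatch: if the type matches a specialization,
  // call the specialized code directly.
  if T.self == Int.self {
    return identityForInt(x as! Int) as! T
  }
  // Otherwise fall through to the original generic body.
  return x
}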
In the future, we can build on this work in several ways:
- cross module dispatch directly to specialized code
- dynamic dispatch directly to specialized code
- automated specialization based on less specific hints
- partial specialization
- and so on...
I reorganized and refactored the optimizer's generic utilities to
support direct function specialization as opposed to apply
specialization.
This splits the function signature module pass into two function passes.
Doing so allows us to rewrite callers to use the FSO-optimized
function prior to attempting inlining, while still allowing a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
It also helps us move to a model in which a module pass is NOT used unless
necessary.
I do not see a regression or an improvement on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
Temporarily reverting @_specialize because stdlib unit tests are
failing on an internal branch during deserialization.
This reverts commit e2c43cfe14, reversing
changes made to 9078011f93.
This change follows up on an idea from Michael (thanks!).
It enables debugging and profiling on the SIL level, which is useful for compiler debugging.
There is a new frontend option -gsil which lets the compiler write a SIL file and generate debug info for it.
For details see docs/DebuggingTheCompiler.rst and the comments in SILDebugInfoGenerator.cpp.
This pass finds generic functions with @_specialized attributes and
generates specialized code for the attribute's concrete types. It
inserts type checks and guarded dispatch at the beginning of the
generic function for each specialization. Since we don't currently
expose this attribute as API and don't specialize vtables and witness
tables yet, the only way to reach the specialized code is by calling
the generic function which performs the guarded dispatch.
In the future, we can build on this work in several ways:
- cross module dispatch directly to specialized code
- dynamic dispatch directly to specialized code
- automated specialization based on less specific hints
- partial specialization
- and so on...
I reorganized and refactored the optimizer's generic utilities to
support direct function specialization as opposed to apply
specialization.
We really only need the analysis to tell whether a function has a caller
inside the module or not. We do not need to know the callsites.
Remove them for now to make the analysis more memory efficient.
Add a note to indicate it can be extended.
Add an invalidateAnalysisForDeadFunction API. This API calls invalidateAnalysis
by default unless overridden by the analysis passes themselves. It passes the extra
information that the function is dead and is going to be removed from the module.
CallerAnalysis overrides this API and only invalidates the caller/callee relations, but
does not push the function onto the recompute list.
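A rough Swift model of this hook (the actual implementation is C++ in the SIL optimizer;
the protocol and member names are assumptions). CallerAnalysis would provide its own
invalidateAnalysisForDeadFunction that drops the caller/callee relations without touching
the recompute list:

protocol SILAnalysis {
  func invalidateAnalysis(_ functionName: String)
  // Extra information: the function is dead and about to be removed.
  func invalidateAnalysisForDeadFunction(_ functionName: String)
}

extension SILAnalysis {
  // Default: a dead function is treated like any other invalidated function.
  func invalidateAnalysisForDeadFunction(_ functionName: String) {
    invalidateAnalysis(functionName)
  }
}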
We also considered the possibility of keeping a computed list instead of a recompute
list, but that would introduce O(n^2) complexity: every time we try to complete
the computed list, we would need to walk over all the functions that currently exist in the
module to make sure the computed list is complete.
I feel that eventually we can do a handleDeleteNotification for function deletion, and we
won't need the API added in this change.
Address the comments from 0acc0a8464
I still have not made up my mind how to handle deleted functions.
CallerAnalysis is not hooked up to anything yet.
The analysis can tell all the callsites which call a given function in the module.
The analysis is computed and kept up to date lazily.
At its core, it keeps a list of functions that need to be recomputed for
the caller/callee relation to be precise; on every query, the analysis makes
sure to recompute them and clear the list before answering.
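A small Swift sketch of the lazy recompute-on-query idea (the real analysis is written in
C++; the type, member names, and callee-lookup closure below are assumptions):

final class CallerInfo {
  private var callerMap: [String: Set<String>] = [:]  // callee -> callers
  private var recomputeList: Set<String> = []
  private let calleesOf: (String) -> [String]         // functions called by a function

  init(calleesOf: @escaping (String) -> [String]) {
    self.calleesOf = calleesOf
  }

  // Changing a function only marks it dirty; nothing is recomputed eagerly.
  func notifyFunctionChanged(_ name: String) {
    recomputeList.insert(name)
  }

  // Every query first brings the dirty functions up to date.
  func callers(of function: String) -> Set<String> {
    processRecomputeList()
    return callerMap[function] ?? []
  }

  private func processRecomputeList() {
    for caller in recomputeList {
      for callee in calleesOf(caller) {
        callerMap[callee, default: []].insert(caller)
      }
    }
    recomputeList.removeAll()
  }
}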
This is NFC right now. I am going to wire it up to function signature analysis
eventually.
Split function signature analysis into a separate analysis pass.
This pass is run on every function, and the optimized signature is returned through
getArgDescList and getResultDescList.
The next step is to split the cloning and callsite rewriting into their own function passes.
rdar://24730896
"
This occurred if a stack-promoted object with a devirtualized final release was not actually allocated on the stack.
Now the ReleaseDevirtualizer models the procedure of a final release more accurately:
it inserts a set_deallocating instruction and calls the deallocator (instead of just the deinit).
This change also includes two peephole optimizations in IRGen and LLVMStackPromotion which get rid of
unused runtime calls in case the stack-promoted object is really allocated on the stack.
This fixes rdar://problem/25068118
We already computed this information, so this change just stores information
we were already computing.
One thing to note is that in code with canonicalized loops, we will
only ever have one backedge. But we would like loop regions to be
correct even in the case of non-canonicalized code, so we support having
multiple backedges. Since the common case is one backedge, we
optimize for that case.
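A hedged sketch of a representation that keeps the single-backedge case cheap while still
supporting several backedges (the actual C++ data structure may differ):

enum BackedgeStorage {
  case none
  case one(Int)      // common case: index of the single backedge subregion
  case many([Int])   // rare: non-canonicalized loops with several backedges

  mutating func insert(_ subregion: Int) {
    switch self {
    case .none:
      self = .one(subregion)
    case .one(let existing) where existing != subregion:
      self = .many([existing, subregion])
    case .many(var all) where !all.contains(subregion):
      all.append(subregion)
      self = .many(all)
    default:
      break  // backedge already recorded
    }
  }
}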
This commit contains updated tests and also updates to the loop region graph
viewer so that it draws backedges as green arrows from the loop to its backedge
subregions. The test updates were done by examining each test case by hand.