At some point, pass definitions were heavily macro-ized. Pass
descriptive names were added in two places. This is not only redundant
but a source of confusion. You could waste a lot of time grepping for
the wrong string. I removed all the getName() overrides which, at
around 90 passes, was a fairly significant amount of code bloat.
Any pass that we want to be able to invoke by name from a tool
(sil-opt) or pipeline plan *should* have a unique type name, enum value,
command-line string, and name string. I removed a comment about the
various inliner passes that contradicted that.
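As an illustration of the uniqueness rule above, here is a minimal sketch of a check over a pass table. It is a model only: `PassDescriptor` and `duplicateIdentifiers` are hypothetical names, not the compiler's actual macro or API.

```swift
// Hypothetical model of one row in a pass table; not the compiler's actual macro.
struct PassDescriptor {
    let typeName: String           // e.g. the pass's class name
    let enumValue: Int             // value of the pass-kind enum
    let commandLineString: String  // string accepted by a tool such as sil-opt
    let name: String               // descriptive name
}

/// Returns one "column: value" entry for every identifier that repeats.
func duplicateIdentifiers(in passes: [PassDescriptor]) -> [String] {
    var duplicates: [String] = []

    // Scans a single column and records values that were already seen.
    func check<Key: Hashable>(_ column: String, _ key: (PassDescriptor) -> Key) {
        var seen = Set<Key>()
        for pass in passes where !seen.insert(key(pass)).inserted {
            duplicates.append("\(column): \(key(pass))")
        }
    }

    check("typeName") { $0.typeName }
    check("enumValue") { $0.enumValue }
    check("commandLineString") { $0.commandLineString }
    check("name") { $0.name }
    return duplicates
}
```

An empty result means every pass can be found by grepping for any one of its four identifiers.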
Side note: We should be consistent with the policy that a pass is
identified by its type. We have a couple passes, LICM and CSE, which
currently violate that convention.
We also either remove or make private the addPass* functions on SILPassManager,
so the only way to execute passes via SILPassManager is by creating a
SILPassPipelinePlan. Beyond adding uniformity, this ensures that we always
call resetAndRemoveTransformations properly after a pipeline is run.
This commit adds the functionality, but does not change SILPassManager to use
it. The reason why I am doing this is so I can implement sil-opt pass bisecting
functionality in python using a tool that dumps the current pass pipelines
out. This will ensure that even in the face of changes to the pass pipelines,
everything should just work.
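A rough sketch of the bisecting idea, written in Swift here rather than python. It assumes the dumped pipeline is an ordered list of pass names, that the empty prefix is good, and that failure is monotonic in the prefix length. `firstFailingPass` and the `fails` callback are hypothetical; a real tool would run sil-opt on each prefix.

```swift
/// Binary-searches for the pass whose addition first makes the pipeline fail.
/// Returns nil if the pipeline is empty or the full pipeline does not fail.
func firstFailingPass(in pipeline: [String],
                      fails: (ArraySlice<String>) -> Bool) -> String? {
    guard !pipeline.isEmpty, fails(pipeline[...]) else { return nil }
    var low = 0                 // longest prefix length known to pass
    var high = pipeline.count   // shortest prefix length known to fail
    while high - low > 1 {
        let mid = (low + high) / 2
        if fails(pipeline.prefix(mid)) {
            high = mid
        } else {
            low = mid
        }
    }
    return pipeline[high - 1]
}
```

Because the search only needs the dumped list, changes to the pipeline itself do not require changes to the tool.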
This is a simple refactoring to make it really easy for me to rip out the pass
pipeline code into a real pass pipeline class that can be
serialized/deserialized. By serializing/deserializing the pass-pipeline
directly, it becomes very easy to write a bug-point like tool in python on top.
Additionally, it allows users who want to manipulate the pipeline by hand to be
able to easily dump out the normal pass pipeline without any work.
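To make the serialize/deserialize idea concrete, here is a minimal round-trip sketch. The text format (a `[stage]` header followed by one pass name per line) and the names `PipelineStage`, `dumpPlan` and `parsePlan` are assumptions for illustration, not the format the compiler uses.

```swift
// Hypothetical in-memory form of a pipeline: named stages, each an ordered pass list.
struct PipelineStage: Equatable {
    var name: String
    var passes: [String]
}

/// Writes each stage as a "[name]" header followed by one pass per line.
func dumpPlan(_ plan: [PipelineStage]) -> String {
    var lines: [String] = []
    for stage in plan {
        lines.append("[\(stage.name)]")
        lines.append(contentsOf: stage.passes)
    }
    return lines.joined(separator: "\n")
}

/// Reads the format produced by dumpPlan; lines before the first header are ignored.
func parsePlan(_ text: String) -> [PipelineStage] {
    var plan: [PipelineStage] = []
    for line in text.split(separator: "\n") {
        if line.first == "[" && line.last == "]" {
            let name = String(line.dropFirst().dropLast())
            plan.append(PipelineStage(name: name, passes: []))
        } else if !plan.isEmpty {
            plan[plan.count - 1].passes.append(String(line))
        }
    }
    return plan
}
```

A user who wants to hand-edit the pipeline could dump it, reorder or delete lines, and feed the result back in.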
This is a hidden option. It should be used like: -assume-single-threaded
When this option is provided, the compiler assumes that the code will be executed in single-threaded mode. It then performs certain optimizations that can benefit from this assumption, e.g. it marks all reference counting instructions in the user code being compiled as non-atomic.
Oftentimes SILGen wants to hold onto values that have been copied. This causes
an issue when, due to Cleanups firing, SILBuilder inserts destroys and destroys
the copy that produced the value that SILGen held onto. This will then cause
SILGen to emit incorrect code.
There really is no reason to introduce such complexity into SILBuilder when a
small simple guaranteed pass can perform the same work. Thus the introduction of
this pass.
In a later commit, I am going to eliminate the SILBuilder entry points.
rdar://28685236
This is an NFC change, since verification will still be behind the flag. But this
will allow me to move copy_value, destroy_value in front of the
EnableSILOwnership flag and verify via SILGen that we are always using those
instructions.
rdar://28851920
Previously I was going to just set a flag and run the verifier once with that
flag enabled. Then I realized that given that the OwnershipModelEliminator is a
function pass, I really need to put the state on whether or not ownership is
enabled on functions. Now this commit refactors the verifier to use the state on
the function when determining if it should allow for ownership qualified
instructions or not in a specific function.
rdar://28685236
rdar://problem/28434323
SILGen has no reason to insert shadow copies for inout parameters any more. They cannot be captured. We still emit these copies. Sometimes deshadowing removes them, but sometimes it does not.
In this PR we just avoid emitting the copies and remove the deshadowing pass.
This PR cherry-picked some of @dduan's work and built on top of it.
This consists of 3 parts:
1) Extend CallerAnalysis to also provide information if a function is partially applied
2) A new DeadArgSignatureOpt pass, similar to FunctionSignatureOpts, which just specializes for dead arguments of partially applied functions.
3) Let CapturePropagation eliminate such partial_apply instructions and replace them with a thin_to_thick conversion of the specialized functions.
This optimization improves benchmarks where static struct or class functions are passed as a closure (e.g. -20% for SortStrings).
Such functions have an additional metatype parameter. We used to create a partial_apply in this case, which allocates a context, etc.
But this is not necessary, as the metatype parameter is not used in most cases.
rdar://problem/27513085
For details see the comment in ConditionForwarding.cpp.
This optimization pass helps to optimize loops iterating over closed ranges, e.g. for i in 0...n { }
I'm measuring around a 1% reduction in compile time for the stdlib, with
a handful of improvements on the benchmarks when compiled at -O, and one
small regression on one benchmark.
Without this we can end up not inlining in some trivial cases.
For example, the ClosureSpecializer may generate a function_ref - convert_function - apply sequence.
This must be cleaned up by SILCombine before we can inline the function.
rdar://problem/22309472
We can remove the retain/release pair preceding the builtins based on the
knowledge that the lifetime of the reference is guaranteed by someone hanging on
to the reference elsewhere.
Eventually, we decided to do this:
1. Have the function signature opts (formerly called the cloner) create
the optimized function.
2. Mark the thunk as always_inline.
3. Rely on the inliner to inline the thunk to get the benefit of calling the optimized
function directly.
This forces the callsites to be rewritten by the inliner.
We have the issue that the thunk changes between the time it is created and
the time it is reread to figure out what we have done to the original function.
This results in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized function.
Inlining the thunk makes the callsite call the optimized function for free, i.e.
without any rewriting.
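A source-level analogy of the thunk scheme, as a sketch only. The real transformation happens on SIL, and the function names below are hypothetical. The original signature survives as an always-inline thunk that forwards to the optimized function, so inlining the thunk leaves a direct call to the optimized function at each callsite.

```swift
// Optimized function: the dead third parameter has been removed.
func scaledOptimized(_ value: Int, _ factor: Int) -> Int {
    return value * factor
}

// Thunk with the original signature. It only encodes how to set up the call
// to the optimized function, and is marked to always be inlined.
@inline(__always)
func scaled(_ value: Int, _ factor: Int, _ unusedLabel: String) -> Int {
    return scaledOptimized(value, factor)
}

// After the thunk is inlined, this callsite is effectively
// scaledOptimized(6, 7), without a separate callsite-rewriting step.
let result = scaled(6, 7, "ignored")
```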
I did not measure any regression with this change.
This was mistakenly reverted in an attempt to fix buildbots.
Unfortunately it's now smashed into one commit.
---
Introduce @_specialize(<type list>) internal attribute.
This attribute can be attached to generic functions. The attribute's
arguments must be a list of concrete types to be substituted in the
function's generic signature. Any number of specializations may be
associated with a generic function.
This attribute provides a hint to the compiler. At -O, the compiler
will generate the specified specializations and emit calls to the
specialized code in the original generic function guarded by type
checks.
The current attribute is designed to be an internal tool for
performance experimentation. It does not affect the language or
API. This work may be extended in the future to add user-visible
attributes that do provide API guarantees and/or direct dispatch to
specialized code.
This attribute works on any generic function: a freestanding function
with generic type parameters, a nongeneric method declared in a
generic class, a generic method in a nongeneric class or a generic
method in a generic class. A function's generic signature is a
concatenation of the generic context and the function's own generic
type parameters.
e.g.
    struct S<T> {
      var x: T
      @_specialize(Int, Float)
      mutating func exchangeSecond<U>(u: U, _ t: T) -> (U, T) {
        x = t
        return (u, x)
      }
    }
    // Substitutes: <T, U> with <Int, Float> producing:
    // S<Int>::exchangeSecond<Float>(u: Float, t: Int) -> (Float, Int)
---
[SILOptimizer] Introduce an eager-specializer pass.
This pass finds generic functions with @_specialize attributes and
generates specialized code for the attribute's concrete types. It
inserts type checks and guarded dispatch at the beginning of the
generic function for each specialization. Since we don't currently
expose this attribute as API and don't specialize vtables and witness
tables yet, the only way to reach the specialized code is by calling
the generic function which performs the guarded dispatch.
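The guarded dispatch can be pictured at the source level roughly as follows. This is a hand-written analogy, assuming a single specialization for Int. The pass itself works on SIL, and `totalSpecializedForInt` is a hypothetical stand-in for the generated specialized code.

```swift
// Hypothetical stand-in for the code the pass would generate for T == Int.
func totalSpecializedForInt(_ values: [Int]) -> Int {
    var total = 0
    for value in values { total += value }
    return total
}

func total<T: Numeric>(_ values: [T]) -> T {
    // Type check at the beginning of the generic function: if the dynamic
    // type matches a requested specialization, dispatch to it.
    if T.self == Int.self {
        return totalSpecializedForInt(values as! [Int]) as! T
    }
    // Otherwise fall through to the unspecialized generic body.
    var result: T = 0
    for value in values { result += value }
    return result
}
```

Callers still call `total`; the specialized code is reachable only through the check at the top.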
In the future, we can build on this work in several ways:
- cross module dispatch directly to specialized code
- dynamic dispatch directly to specialized code
- automated specialization based on less specific hints
- partial specialization
- and so on...
I reorganized and refactored the optimizer's generic utilities to
support direct function specialization as opposed to apply
specialization.
This splits the function signature module pass into 2 function passes.
Doing so allows us to rewrite to use the FSO-optimized
function prior to attempting inlining, while still allowing us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
This also helps us move to a model in which module passes are NOT used unless
necessary.
I do not see any regression or improvement on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
Temporarily reverting @_specialize because stdlib unit tests are
failing on an internal branch during deserialization.
This reverts commit e2c43cfe14, reversing
changes made to 9078011f93.
This change follows up on an idea from Michael (thanks!).
It enables debugging and profiling on SIL level, which is useful for compiler debugging.
There is a new frontend option -gsil which lets the compiler write a SIL file and generate debug info for it.
For details see docs/DebuggingTheCompiler.rst and the comments in SILDebugInfoGenerator.cpp.
This commit moves the SILLinker pass out of AddSSAPasses, so that we run
more function passes on each function before moving up to its callers.
Now the only remaining module passes in AddSSAPasses are GlobalOpt and
LetPropertiesOpt, which run only when we call AddSSAPasses for the
MidLevel optimizations.
This commit also adds the high level loop opt passes onto the same pass
run. As a result of this and moving SILLinker out of AddSSAPasses, we
now run far more passes together on a given function before moving up
the call graph to the callers.
The net result is that I am now seeing approximately a 2% reduction in
stdlib compile times, with only a single significant performance
regression (there are some other minor improvements and regressions, and
some major improvements with -Ounchecked).
The 2% reduction appears to come largely from the mechanism in the pass
manager that skips running passes if we've not made any changes to a
function since the last time the pass was run.
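A toy model of that skipping mechanism, to show why it saves work. This is a sketch under assumptions, not the pass manager's actual bookkeeping: `MiniPassManager`, the version counter, and the `[Int]` stand-in for a function body are all hypothetical.

```swift
struct MiniPassManager {
    private(set) var version = 0              // bumped whenever a pass changes the function
    private var cleanAt: [String: Int] = [:]  // pass name -> version at which it found nothing to do
    private(set) var executed: [String] = []  // passes that actually ran

    init() {}

    /// Runs `transform` unless the function is unchanged since the last time
    /// this pass ran without making changes. `transform` returns true if it
    /// modified the body.
    mutating func run(_ name: String, on body: inout [Int],
                      _ transform: (inout [Int]) -> Bool) {
        if cleanAt[name] == version { return }  // skip: nothing new to look at
        executed.append(name)
        if transform(&body) {
            version += 1
        } else {
            cleanAt[name] = version
        }
    }
}
```

When many passes run together on one function, repeated invocations of a pass that already reached a fixed point cost only a dictionary lookup in this model.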
In theory we should be able to eliminate more loads if we run this after
the mem2reg that is after inlining. We aren't really relying heavily on
having promoted values like this prior to inlining.
Again, I see no significant performance delta, but this seems like the
best place to put this pass if we're only running it once per run of the
SSA passes.
Doing this earlier means that optimizations that are looking at SIL
values (rather than memory) have more opportunities earlier.
Minimal impact at the moment, but this may allow for removing some later
passes that are repeated.
Re-apply b00dcbe with a small test update, and a small change in pass
ordering.
I measure around a 10% reduction in compile times of release no-assert
builds of the stdlib and StdlibUnitTest.
For release + debug-swift builds, I see 20% reduction in stdlib compile
time.
My latest measurements show a few regressions at -O:
Calculator
NSError
SetIsSubsetOf
Sim2DArray
There is a small (0.1%) reduction in the libswiftCore.dylib size.
Being able to remove these is a consequence of the reordering that
happened in e50daa6.
I measure around a 10% reduction in compile times of release no-assert
builds of the stdlib and StdlibUnitTest.
For release + debug-swift builds, I see 20% reduction in stdlib compile
time.
I saw no reproducible regressions in the benchmarks, and a few
improvements.
There is a small (0.1%) reduction in the libswiftCore.dylib size.
Being able to remove these is a consequence of the reordering that
happened in e50daa6.
The end goal here is to end up with a good pass ordering that will allow
us to only run one set of these passes, rather than running them
twice. This is a start in that direction.
No real impact measured on compile times as of this change. On
benchmarks I see a mix of regressions and improvements.
-O improvements:
Calculator -17.6% 1.21x
Chars -54.4% 2.19x
PolymorphicCalls -14.7% 1.17x
SetIsSubsetOf -14.1% 1.16x
Sim2DArray -14.1% 1.16x
StrToInt -30.4% 1.44x
-O regressions:
CaptureProp +32.9% 0.75x
DictionarySwap +36.0% 0.74x
XorLoop +39.8% 0.72x
-Ounchecked improvements:
Chars -58.0% 2.38x
-Ounchecked regressions:
CaptureProp +33.3% 0.75x
-Onone improvements:
StrToInt -14.9% 1.18x
StringWalk -47.6% 1.91x
StringWithCString -17.2% 1.21x
(many more smaller improvements)
-Onone regressions:
Calculator +21.5% 0.82x
OpenClose +10.1% 0.91x
This eliminates a pretty similar list of passes, added in a similar order,
in favor of just re-using the ordering from AddSSAPasses. Beyond the particular
inliner pass (which is maintained with this change), there was nothing
really specific to low-level code in the order that was present before.
I measure a 1% increase in compile time of the stdlib, no perf
regressions (at -O), and a few decent improvements:
 #  TEST            OLD   NEW  DELTA  %DELTA  SPEEDUP
19  CaptureProp    5233  4129  -1104  -21.1%    1.27x
30  ErrorHandling  3053  2678   -375  -12.3%    1.14x
65  Sim2DArray      610   518    -92  -15.1%    1.18x
I expect to be able to get back the 1% compile-time hit (and probably
more) with future changes.
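For reading the benchmark rows above: the numbers are consistent with the columns being old time, new time, delta, percent change, and speedup. That reading is an assumption, but the arithmetic supports it, as this small sketch shows (`compare` is a hypothetical helper).

```swift
/// Derives the delta, percent change and speedup from an old and a new measurement.
func compare(old: Double, new: Double) -> (delta: Double, percent: Double, speedup: Double) {
    let delta = new - old
    return (delta, delta / old * 100, old / new)
}

// CaptureProp row: 5233 -> 4129
let captureProp = compare(old: 5233, new: 4129)
// delta is -1104, percent rounds to -21.1, speedup rounds to 1.27
```

A negative percent change is an improvement, and the speedup is simply old divided by new.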
Now that we process functions in bottom-up order in the pass manager and
have a mechanism to restart the pass pipeline on the current
function (or on a newly created callee function), we can split these
passes back out from the inliner and end up with the same benefits we
had from initially integrating them. We get the further benefit of fully
optimizing newly created callee functions before continuing with the
function that resulted in the creation of those callee
functions (e.g. as a result of a specialization pass running).
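The ordering described above can be sketched with a small recursive model. It assumes that optimizing a function may create new callees and that no cycles arise. `optimize` and the `creating` table are hypothetical and only record the order of events.

```swift
/// Logs the order in which functions are processed when a pass running on
/// `function` creates new callees that must be fully optimized first.
func optimize(_ function: String,
              creating newCallees: [String: [String]],
              log: inout [String]) {
    log.append("start \(function)")
    for callee in newCallees[function, default: []] {
        // A pass just created `callee`: run the whole pipeline on it now.
        optimize(callee, creating: newCallees, log: &log)
        // Then pick the pipeline back up on the current function.
        log.append("resume \(function)")
    }
    log.append("done \(function)")
}
```

The caller resumes only after the new callee is done, so later passes on the caller see an already optimized callee.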