This attribute allows to define a pre-specialized entry point of a
generic function in a library.
The following definition provides a pre-specialized entry point for
`genericFunc(_:)` for the parameter type `Int` that clients of the
library can call.
```
@_specialize(exported: true, where T == Int)
public func genericFunc<T>(_ t: T) { ... }
```
Pre-specializations of internal `@inlinable` functions are allowed.
```
@usableFromInline
internal struct GenericThing<T> {
@_specialize(exported: true, where T == Int)
@inlinable
internal func genericMethod(_ t: T) {
}
}
```
There is syntax to pre-specialize a method from a different module.
```
import ModuleDefiningGenericFunc
@_specialize(exported: true, target: genericFunc(_:), where T == Double)
func prespecialize_genericFunc(_ t: T) { fatalError("dont call") }
```
Specially marked extensions allow for pre-specialization of internal
methods accross module boundries (respecting `@inlinable` and
`@usableFromInline`).
```
import ModuleDefiningGenericThing
public struct Something {}
@_specializeExtension
extension GenericThing {
@_specialize(exported: true, target: genericMethod(_:), where T == Something)
func prespecialize_genericMethod(_ t: T) { fatalError("dont call") }
}
```
rdar://64993425
This can compensate the performance regression of the more conservative handling of function calls in TempRValueOpt (see previous commit).
The pass runs after the inlining passes and can therefore optimize in some cases where it's not possible before inlining.
Specifically the option: -sil-stop-optzns-before-lowering-ownership. This makes
it possible to write end-to-end tests on OSSA passes. Before one would have to
pattern match after ownership was lowered, losing the ability to do finegrained
FileCheck pattern matching on ossa itself.
Needed to make sure that global initializers are not optimized in mid-level SIL while other functions are still in high-level SIL.
Having the StringOptimization not in high-level SIL was just a mistake in my earlier PR.
Optimizes String operations with constant operands.
Specifically:
* Replaces x.append(y) with x = y if x is empty.
* Removes x.append("")
* Replaces x.append(y) with x = x + y if x and y are constant strings.
* Replaces _typeName(T.self) with a constant string if T is statically known.
With this optimization it's possible to constant fold string interpolations, like "the \(Int.self) type" -> "the Int type"
This new pass runs on high-level SIL, where semantic calls are still in place.
rdar://problem/65642843
In order to test this, I implemented a small source loc inference routine for
instructions without valid SILLocations. This is an optional nob that the
opt-remark writer can optionally enable on a per remark basis. The current
behaviors are just forward/backward scans in the same basic block. If we scan
forwards, if we find a valid SourceLoc, we just use ethat. If we are scanning
backwards, instead we grab the SourceRange and if it is valid use the end source
range of the given instruction. This seems to give a good result for retain
(forward scan) and release (backward scan).
The specific reason that I did that is that my test case for this are
retain/release operations. Often times these operations due to code-motion are
moved around (and rightly to prevent user confusion) given by optimizations auto
generated SIL locations. Since that is the test case I am using, to test this I
needed said engine.
This just moves it past the SIL linker (which since the stdlib doesn't link
anything will not change anything and past TempRValueOpt which is already
updated for OSSA.
I am going to be moving back ownership lowering first in the stdlib so that we
can bring up the optimizer on ownership without needing to deal with
serialization issues (the stdlib doesn't deserialize SIL from any other
modules).
This patch just begins the mechanical process with a nice commit message. Should
be NFC.
* Include small non-generic functions for serializaion
* serialize initializer of global variables: so that global let variables can be constant propagated across modules
rdar://problem/60696510
Optimizes copies from a temporary (an "l-value") to a destination.
%temp = alloc_stack $Ty
instructions_which_store_to %temp
copy_addr [take] %temp to %destination
dealloc_stack %temp
is optimized to
destroy_addr %destination
instructions_which_store_to %destination
The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.
This pass also performs a small peephole optimization which simplifies copy_addr - destroy sequences.
copy_addr %source to %destination
destroy_addr %source
is replace with
copy_addr [take] %source to %destination
The pass is already not being run during normal compilation scenarios today
since it bails on OSSA except in certain bit-rot situations where a test wasn't
updated and so was inadvertently invoking the pass. I discovered these while
originally just trying to eliminate the pass from the diagnostic pipeline. The
reason why I am doing this in one larger change is that I found there were a
bunch of sil tests inadvertently relying on guaranteed arc opts to eliminate
copy traffic. So, if I just removed this and did this in two steps, I would
basically be unoptimizing then re-optimizing the tests.
Some notes:
1. The new guaranteed arc opts is based off of SemanticARCOpts and runs only on
ossa. Specifically, in this new pass, we just perform simple
canonicalizations that do not involve any significant analysis. Some
examples: a copy_value all of whose uses are destroys. This will do what the
original pass did and more without more compile time. I did a conservative
first approximation, but we can probably tune this a bit.
2. the reason why I am doing this now is that I was trying to eliminate the
enable-ownership-stripping-after-serialization flag and discovered that the
test opaque_value_mandatory implicitly depends on this since sil-opt by
default was the only place left in the compiler with that option set to false
by default. So I am eliminating that dependency before I land the larger
change.
Sometimes stack promotion can catch cases only at a late stage of the pipeline, after FunctionSignatureOpts.
https://bugs.swift.org/browse/SR-12773
rdar://problem/63068408
Constant folds the uniqueness result of begin_cow_mutation instructions, if it can be proved that the buffer argument is uniquely referenced.
For example:
%buffer = end_cow_mutation %mutable_buffer
// ...
// %buffer does not escape here
// ...
(%is_unique, %mutable_buffer2) = begin_cow_mutation %buffer
cond_br %is_unique, ...
is replaced with
%buffer = end_cow_mutation [keep_unique] %mutable_buffer
// ...
(%not_used, %mutable_buffer2) = begin_cow_mutation %buffer
%true = integer_literal 1
cond_br %true, ...
Note that the keep_unique flag is set on the end_cow_mutation because the code now relies on that the buffer is really uniquely referenced.
The optimization can also handle def-use chains between end_cow_mutation and begin_cow_mutation which involve phi-arguments.
An additional peephole optimization is performed: if the begin_cow_mutation is the only use of the end_cow_mutation, the whole pair of instructions is eliminated.
If only a single field of a struct phi-argument is used, replace the argument by the field value.
br bb(%str)
bb(%phi):
%f = struct_extract %phi, #Field // the only use of %phi
use %f
is replaced with
%f = struct_extract %str, #Field
br bb(%f)
bb(%phi):
use %phi
This also works if the phi-argument is in a def-use cycle.
The new PhiExpansionPass is in the same file as the RedundantPhiEliminationPass. Therefore I renamed the source file to PhiArgumentOptimizations.cpp
This simplifies the handling of the subdirectories in the SIL and
SILOptimizer paths. Create individual libraries as object libraries
which allows the analysis of the source changes to be limited in scope.
Because these are object libraries, this has 0 overhead compared to the
previous implementation. However, string operations over the filenames
are avoided. The cost for this is that any new sub-library needs to be
added into the list rather than added with the special local function.
The PassManager should transform all functions in bottom up order.
This is necessary because when optimizations like inlining looks at the
callee function bodies to compute profitability, the callee functions
should have already undergone optimizations to get better profitability
estimates.
The PassManager builds its function worklist based on bottom up order
on initialization. However, newly created SILFunctions due to
specialization etc, are simply appended to the function worklist. This
can cause us to make bad inlining decisions due to inaccurate
profitability estimates. This change now updates the function worklist such
that, all the callees of the newly added SILFunction are proccessed
before it by the PassManager.
Fixes rdar://52202680
* Fix the mid-level pass pipeline.
Module passes need to be in a separate pipeline, otherwise the
pipeline restart mechanism will be broken.
This makes GlobalOpt and serialization run earlier in the
pipeline. There's no explicit reason for them to be run later, in the
middle of a function pass pipeline.
Also, pipeline boundaries, like serialization and module passes should
be explicit at the the top level function that creates the pass
pipelines.
* SILOptimizer: Add enforcement of function-pass pipelines.
Don't allow module passes to be inserted within a function pass
pipeline. This silently breaks the function pipeline both interfering
with analysis and the normal pipeline restart mechanism.
* Add misssing pass in addFunctionPasses
Co-authored-by: Andrew Trick <atrick@apple.com>
-sil-verify-all flag will verify analyses before and after a pass to
confirm correct invalidations. But if an analysis was never
constructed or invalidated as per current pass order,
it may never detect insufficient invalidations.
-sil-verify-force-analysis will force construct an analysis so that we
can better check for insufficient invalidations.
It is also terribly slow compared to -sil-verify-all.
Don't create a separate pass manager for those passes, just let them run at the beginning of the performance pipeline.
Regarding generated code this is a NFC.
This change fixes a problem with pass-bisecting (for debugging). Having two instances of the pass manager can cause troubles with bisecting, because -sil-opt-pass-count affects both pass managers at the same time.
Add DifferentiabilityWitnessDevirtualizer: an optimization pass that
devirtualizes `differentiability_witness_function` instructions into
`function_ref` instructions.
Co-authored-by: Dan Zheng <danielzheng@google.com>
`VJPEmitter` is a cloner that emits VJP functions. It implements reverse-mode
automatic differentiation, along with `PullbackEmitter`.
`VJPEmitter` clones an original function, replacing function applications with
VJP function applications. In VJP functions, each basic block takes a pullback
struct (containing callee pullbacks) and produces a predecessor enum: these data
structures are consumed by pullback functions.
The differentiation transform does the following:
- Canonicalizes differentiability witnesses by filling in missing derivative
function entries.
- Canonicalizes `differentiable_function` instructions by filling in missing
derivative function operands.
- If necessary, performs automatic differentiation: generating derivative
functions for original functions.
- When encountering non-differentiability code, produces a diagnostic and
errors out.
Partially resolves TF-1211: add the main canonicalization loop.
To incrementally stage changes, derivative functions are currently created
with empty bodies that fatal error with a nice message.
Derivative emitters will be upstreamed separately.
A request is intended to be a pure function of its inputs. That function could, in theory, fail. In practice, there were basically no requests taking advantage of this ability - the few that were using it to explicitly detect cycles can just return reasonable defaults instead of forwarding the error on up the stack.
This is because cycles are checked by *the Evaluator*, and are unwound by the Evaluator.
Therefore, restore the idea that the evaluate functions are themselves pure, but keep the idea that *evaluation* of those requests may fail. This model enables the best of both worlds: we not only keep the evaluator flexible enough to handle future use cases like cancellation and diagnostic invalidation, but also request-based dependencies using the values computed at the evaluation points. These aforementioned use cases would use the llvm::Expected interface and the regular evaluation-point interface respectively.
Introduce evaluator::SideEffect, the type of a request that performs
some operation solely to execute its side effects. Thankfully, there are
precious few requests that need to use this type in practice, but it's
good to call them out explicitly so we can get around to making them
behave much more functionally in the future.