Changes in this CR add part of the, Swift based, Autodiff specific
closure specialization optimization pass. The pass does not modify any
code nor does it even exist in any of the optimization pipelines. The
rationale for pushing this partially complete optimization pass upstream
is to keep up with the breaking changes in the underlying Swift based
compiler infrastructure.
Add a new mandatory BooleanLiteralFolding pass which constant folds conditional branches with boolean literals as operands.
```
%1 = integer_literal -1
%2 = apply %bool_init(%1) // Bool.init(_builtinBooleanLiteral:)
%3 = struct_extract %2, #Bool._value
cond_br %3, bb1, bb2
```
->
```
...
br bb1
```
This pass is intended to run before DefiniteInitialization, where mandatory inlining and constant folding didn't run, yet (which would perform this kind of optimization).
This optimization is required to let DefiniteInitialization handle boolean literals correctly.
For example in infinite loops:
```
init() {
while true { // DI need to know that there is no loop exit from this while-statement
if some_condition {
member_field = init_value
break
}
}
}
```
By default it lowers the builtin to an `alloc_vector` with a paired `dealloc_stack`.
If the builtin appears in the initializer of a global variable and the vector elements are initialized,
a statically initialized global is created where the initializer is a `vector` instruction.
In regular swift this is a nice optimization. In embedded swift it's a requirement, because the compiler needs to be able to specialize generic deinits of non-copyable types.
The new de-virtualization utilities are called from two places:
* from the new DeinitDevirtualizer pass. It replaces the old MoveOnlyDeinitDevirtualization, which is very basic and does not fulfill the needs for embedded swift.
* from MandatoryPerformanceOptimizations for embedded swift
For chains of async functions where suspensions can be statically
proven to never be required, this pass removes all suspensions and
turns the functions into synchronous functions.
For example, this function does not actually require any suspensions,
once the correct executor is acquired upon initial entry:
```
func fib(_ n: Int) async -> Int {
if n <= 1 { return n }
return await fib(n-1) + fib(n-2)
}
```
So we can turn the above into this for better performance:
```
func fib() async -> Int {
return fib_sync()
}
func fib_sync(_ n: Int) -> Int {
if n <= 1 { return n }
return fib(n-1) + fib(n-2)
}
```
while rewriting callers of `fib` to use the `sync` entry-point
when we can prove that it will be invoked on a compatible executor.
This pass is currently experimental and under development. Thus, it
is disabled by default and you must use
`-enable-experimental-async-demotion` to try it.
It lowers let property accesses of classes.
Lowering consists of two tasks:
* In class initializers, insert `end_init_let_ref` instructions at places where all let-fields are initialized.
This strictly separates the life-range of the class into a region where let fields are still written during
initialization and a region where let fields are truly immutable.
* Add the `[immutable]` flag to all `ref_element_addr` instructions (for let-fields) which are in the "immutable"
region. This includes the region after an inserted `end_init_let_ref` in an class initializer, but also all
let-field accesses in other functions than the initializer and the destructor.
This pass should run after DefiniteInitialization but before RawSILInstLowering (because it relies on `mark_uninitialized` still present in the class initializer).
Note that it's not mandatory to run this pass. If it doesn't run, SIL is still correct.
Simplified example (after lowering):
bb0(%0 : @owned C): // = self of the class initializer
%1 = mark_uninitialized %0
%2 = ref_element_addr %1, #C.l // a let-field
store %init_value to %2
%3 = end_init_let_ref %1 // inserted by lowering
%4 = ref_element_addr [immutable] %3, #C.l // set to immutable by lowering
%5 = load %4
The new implementation has several benefits compared to the old C++ implementation:
* It is significantly simpler. It optimizes each load separately instead of all at once with bit-field based dataflow.
* It's using alias analysis more accurately which enables more loads to be optimized
* It avoids inserting additional copies in OSSA
The algorithm is a data flow analysis which starts at the original load and searches for preceding stores or loads by following the control flow in backward direction.
The preceding stores and loads provide the "available values" with which the original load can be replaced.
The old C++ pass didn't catch a few cases.
Also:
* The new pass is significantly simpler: it doesn't perform dataflow for _all_ memory locations at once using bitfields, but handles each store separately. (In both implementations there is a complexity limit in place to avoid quadratic complexity)
* The new pass works with OSSA
It sets the `[bare]` attribute for `alloc_ref` and `global_value` instructions if their header (reference count and metatype) is not used throughout the lifetime of the object.
This allows to run the NamedReturnValueOptimization only late in the pipeline.
The optimization shouldn't be done before serialization, because it might prevent predictable memory optimizations in the caller after inlining.
It converts a lazily initialized global to a statically initialized global variable.
When this pass runs on a global initializer `[global_init_once_fn]` it tries to create a static initializer for the initialized global.
```
sil [global_init_once_fn] @globalinit {
alloc_global @the_global
%a = global_addr @the_global
%i = some_const_initializer_insts
store %i to %a
}
```
The pass creates a static initializer for the global:
```
sil_global @the_global = {
%initval = some_const_initializer_insts
}
```
and removes the allocation and store instructions from the initializer function:
```
sil [global_init_once_fn] @globalinit {
%a = global_addr @the_global
%i = some_const_initializer_insts
}
```
The initializer then becomes a side-effect free function which let's the builtin-simplification remove the `builtin "once"` which calls the initializer.
If a `debug_step` has the same debug location as a previous or succeeding instruction it is removed.
It's just important that there is at least one instruction for a certain debug location so that single stepping on that location will work.
Computes the side effects for a function, which consists of argument- and global effects.
This is similar to the ComputeEscapeEffects pass, just for side-effects.
Removes redundant ObjectiveC <-> Swift bridging calls.
Basically, if a value is bridged from ObjectiveC to Swift an then back to ObjectiveC again, then just re-use the original ObjectiveC value.
Also in this commit: add an additional DCE pass before ownership elimination. It can cleanup dead code which is left behind by the ObjCBridgingOptimization.
rdar://89987440
The ComputeEffects pass derives escape information for function arguments and adds those effects in the function.
This needs a lot of changes in check-lines in the tests, because the effects are printed in SIL
The ComputeEffects pass derives escape information for function arguments and adds those effects in the function.
This needs a lot of changes in check-lines in the tests, because the effects are printed in SIL
The `run-unit-tests` is a "pseudo" pass which is invoked from sil-opt and runs all the unit tests, implemented in Swift.
This is done from the `swift-unit-tests.sil` lit test.