Fixes rdar://113339972: DeadStoreElimination causes uninitialized closure context
Before this fix, the recently enabled function side effect implementation
would return no side effects for a partial apply that is not applied
in the same function. This resulted in DeadStoreElimination
incorrectly eliminating the initialization of the closure context.
The fix is to model the effects of capturing the arguments for the
closure context. The effects of running the closure body will be
considered later, at the point that the closure is
applied. Running the closure does, however, depend on the captured
values being valid. If the value being captured is addressable,
then we need to model the effect of reading from that memory. In
this case, the capture reads from a local stack slot:
```
%stack = alloc_stack $Klass
store %ref to %stack : $*Klass
%closure = partial_apply [callee_guaranteed] %f(%stack)
  : $@convention(thin) (@in_guaranteed Klass) -> ()
```
Later, when the closure is applied, we won't have any reference back
to the original stack slot. The application may not even happen in a caller.
Note that, even if the closure will be applied in the current
function, the side effects of the application are insufficient to
cover the side effects of the capture. For example, the closure
body itself may not read from an argument, but the context must
still be valid in case the closure is copied or the capture itself was
not a bitwise-move.
As an optimization, we ignore the effect of captures for on-stack
partial applies. Such captures are always either a bitwise-move
or, more commonly, capture the source value by address. In these
cases, the side effects of applying the closure are sufficient to
cover the effects of the captures. And, if an on-stack closure is
not invoked in the current function (or passed to a callee) then
it will never be invoked, so the captures never have effects.
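For illustration, a minimal sketch of the on-stack variant of the example above (pseudo-SIL; `%f` and `Klass` as before):
```
%stack = alloc_stack $Klass
store %ref to %stack : $*Klass
// The capture is by address; no separate capture effects are modeled.
%closure = partial_apply [callee_guaranteed] [on_stack] %f(%stack)
  : $@convention(thin) (@in_guaranteed Klass) -> ()
// The apply's side effects cover the effects of the capture.
%r = apply %closure() : $@noescape @callee_guaranteed () -> ()
```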
Support statically initialized globals which contain pointers to other globals. For example:
```
var p = Point(x: 10, y: 20)
let o = UnsafePointer(&p)
```
Also support outlined arrays with pointers to other globals. For example:
```
var g1 = 1
var g2 = 2
func f() -> [UnsafePointer<Int>] {
return [UnsafePointer(&g1), UnsafePointer(&g2)]
}
```
Sometimes structs are not stored in one piece, but as individual elements. Merge such individual stores into a single store of the whole struct.
This enables generating a statically initialized global.
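A sketch of the transformation in SIL, assuming a struct `Point` with two stored `Int` fields (names are illustrative):
```
// Before: the struct is stored element by element.
%xa = struct_element_addr %addr : $*Point, #Point.x
store %x to %xa : $*Int
%ya = struct_element_addr %addr : $*Point, #Point.y
store %y to %ya : $*Int

// After: a single store of the whole struct.
%p = struct $Point (%x : $Int, %y : $Int)
store %p to %addr : $*Point
```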
The new implementation has several benefits compared to the old C++ implementation:
* It is significantly simpler. It optimizes each load separately instead of all at once with bit-field based dataflow.
* It uses alias analysis more accurately, which enables more loads to be optimized.
* It avoids inserting additional copies in OSSA.
The algorithm is a data flow analysis which starts at the original load and searches for preceding stores or loads by following the control flow backward.
The preceding stores and loads provide the "available values" with which the original load can be replaced.
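For example, in this sketch the store provides the available value for the load, so the load can be replaced by `%v`:
```
store %v to %addr : $*Klass
// ... no instructions that may write to %addr ...
%x = load %addr : $*Klass   // redundant: all uses of %x can use %v instead
```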
Move the store-splitting logic into a method on `StoreInst` to make it available in other optimizations as well.
Also, fix a few problems:
* Use destructure instructions when in OSSA.
* Don't split the store if its nominal type has unreferenceable storage.
* Rename it to `trySplit` because splitting is not guaranteed to succeed.
Also, add the counterpart for load instructions: `LoadInst.trySplit()`
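A sketch of what splitting might produce in OSSA, assuming a struct `Point` with two trivial `Int` fields (illustrative, not the exact output):
```
// Before:
store %p to [trivial] %addr : $*Point

// After trySplit: destructure the value and store the elements individually.
(%x, %y) = destructure_struct %p : $Point
%xa = struct_element_addr %addr : $*Point, #Point.x
store %x to [trivial] %xa : $*Int
%ya = struct_element_addr %addr : $*Point, #Point.y
store %y to [trivial] %ya : $*Int
```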
`ownership` is a bad name in `LoadInst`, because it hides `Value.ownership`.
Therefore rename it to `loadOwnership`.
Do the same for `ownership` in `StoreInst` to be consistent.
Before this change, if a global variable was required to be statically initialized (e.g. due to the `@_section` attribute), its type could not be a struct; only a scalar type worked. This change improves on that by teaching the MandatoryPerformanceOptimizations pass to inline struct initializer calls into the initializers of globals, as long as they are simple enough that we can be sure not to trigger recursive/infinite inlining.
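A sketch of the pattern in pseudo-SIL (names are illustrative): before inlining, the initializer call prevents creating a static initializer for the global.
```
sil [global_init_once_fn] @globalinit {
  alloc_global @the_global
  %a = global_addr @the_global : $*Point
  %x = integer_literal $Builtin.Int64, 10
  %y = integer_literal $Builtin.Int64, 20
  %f = function_ref @Point.init
  %p = apply %f(%x, %y)     // inlined by MandatoryPerformanceOptimizations
  store %p to %a : $*Point
}
```
After inlining, the stored value is built from constant-foldable instructions and the global can be statically initialized.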
The old C++ pass didn't catch a few cases.
Also:
* The new pass is significantly simpler: it doesn't perform dataflow for _all_ memory locations at once using bitfields, but handles each store separately. (In both implementations there is a complexity limit in place to avoid quadratic complexity)
* The new pass works with OSSA
For `alloc_ref [bare] [stack]` and `global_value [bare]`, omit the object header initialization.
The `bare` flag means that the object header is not used.
This was already done with a peephole optimization inside IRGen for `global_value`; now IRGen relies on the SIL `bare` flag instead.
It sets the `[bare]` attribute for `alloc_ref` and `global_value` instructions if their header (reference count and metatype) is not used throughout the lifetime of the object.
The `bare` attribute indicates that the object header is not used throughout the lifetime of the value.
This means that no reference counting operations are performed on the object and its metadata is not used.
The header of bare objects doesn't need to be initialized.
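For example (a sketch):
```
// No header (reference count and metadata) initialization is needed:
%o = alloc_ref [bare] [stack] $Klass
%g = global_value [bare] @static_object : $Klass
```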
* add the StaticInitCloner utility
* remove bridging of `copyStaticInitializer` and `createStaticInitializer`
* add `Context.mangleOutlinedVariable` and `Context.createGlobalVariable`
* add new create-functions for instructions
* allow the Builder to build static initializer instructions for global variables
* some refactoring to simplify the implementation
This allows running the NamedReturnValueOptimization only late in the pipeline.
The optimization shouldn't be done before serialization, because it might prevent predictable memory optimizations in the caller after inlining.
It converts a lazily initialized global to a statically initialized global variable.
When this pass runs on a global initializer (a function with the `[global_init_once_fn]` attribute), it tries to create a static initializer for the initialized global.
```
sil [global_init_once_fn] @globalinit {
alloc_global @the_global
%a = global_addr @the_global
%i = some_const_initializer_insts
store %i to %a
}
```
The pass creates a static initializer for the global:
```
sil_global @the_global = {
%initval = some_const_initializer_insts
}
```
and removes the allocation and store instructions from the initializer function:
```
sil [global_init_once_fn] @globalinit {
%a = global_addr @the_global
%i = some_const_initializer_insts
}
```
The initializer then becomes a side-effect-free function, which lets the builtin simplification remove the `builtin "once"` that calls the initializer.
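Sketch of such a guarded call site in pseudo-SIL:
```
%t = global_addr @globalinit_token : $*Builtin.Word
%p = address_to_pointer %t : $*Builtin.Word to $Builtin.RawPointer
%f = function_ref @globalinit : $@convention(c) () -> ()
// Removed by the builtin simplification once @globalinit is side-effect free:
builtin "once"(%p : $Builtin.RawPointer, %f : $@convention(c) () -> ()) : $()
```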
* Disallow stores on the return -> argument path. When walking up in the EscapeUtils it's allowed to follow stores, so stores on this path would not be handled correctly.
* Also make sure that there is a return -> argument path at all.
Fixes a wrong address-escaping effect when the called function copies an indirect argument into a newly created object.
rdar://105133434
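A sketch of the mis-modeled pattern (illustrative names): the callee copies the value of the indirect argument into a new object, but the argument's address itself does not escape.
```
sil @callee : $@convention(thin) (@in_guaranteed Klass) -> @owned Container {
bb0(%arg : $*Klass):
  %c = alloc_ref $Container
  %fld = ref_element_addr %c : $Container, #Container.field
  copy_addr %arg to [initialization] %fld : $*Klass  // copies the value, not the address
  return %c : $Container
}
```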
If a `debug_step` instruction has the same debug location as a preceding or succeeding instruction, it is removed.
It's just important that there is at least one instruction for a certain debug location so that single-stepping on that location will work.
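For example (illustrative locations): the `debug_step` below is redundant because the succeeding instruction carries the same location.
```
debug_step, loc "a.swift":10:3            // removed
%v = load %a : $*Int, loc "a.swift":10:3  // same location: stepping still stops here
```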
* split the `PassContext` into multiple protocols and structs: `Context`, `MutatingContext`, `FunctionPassContext` and `SimplifyContext`
* change how instruction passes work: implement the `simplify` function in conformance to `SILCombineSimplifyable`
* add a mechanism to add a callback for inserted instructions
A `destroy_addr` also involves a read from the address: it's equivalent to `%x = load [take]` followed by `destroy_value %x`.
It's also a write, because the stored value is no longer available after the destroy.
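In other words, its memory effects can be modeled as (sketch):
```
// destroy_addr %a : $*Klass  is equivalent to:
%x = load [take] %a : $*Klass   // the read; leaves %a uninitialized (the write)
destroy_value %x : $Klass
```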
Fixes a compiler crash in SILMem2Reg.
rdar://103879105