The InstructionDeleter can remove instructions, including their destroys, and then insert compensating destroys at a new location.
This is effectively destroy-hoisting, which doesn't respect deinit-barriers; therefore it's not done for lexical lifetimes.
However, since https://github.com/swiftlang/swift/pull/85334, the optimizer should treat _all_ lifetimes as fixed and not only lexical lifetimes.
This change adds an `assumeFixedLifetimes` flag to InstructionDeleter, which is on by default.
Only mandatory passes (like OSLogOptimization) should turn this off.
This change is needed in Embedded Swift because the `witness_method` convention requires passing the witness table to the callee.
However, the witness table is not necessarily available.
A witness table is only generated if an existential value of a protocol is created.
This is a rare situation: only witness thunks have the `witness_method` convention, and those thunks are created as "transparent" functions, which means they are always inlined (after de-virtualization of a witness method call).
However, inlining (even of transparent functions) can fail for various reasons.
This change adds a new EmbeddedWitnessCallSpecialization pass:
If a function with `witness_method` convention is directly called, the function is specialized by changing the convention to `method` and the call is replaced by a call to the specialized function:
```
%1 = function_ref @callee : $@convention(witness_method: P) (@guaranteed C) -> ()
%2 = apply %1(%0) : $@convention(witness_method: P) (@guaranteed C) -> ()
...
sil [ossa] @callee : $@convention(witness_method: P) (@guaranteed C) -> () {
...
}
```
->
```
%1 = function_ref @$e6calleeTfr9 : $@convention(method) (@guaranteed C) -> ()
%2 = apply %1(%0) : $@convention(method) (@guaranteed C) -> ()
...
// specialized callee
sil shared [ossa] @$e6calleeTfr9 : $@convention(method) (@guaranteed C) -> () {
...
}
```
Fixes a compiler crash
rdar://165184147
This peephole optimization didn't consider that an alloc_stack of an enum can be overwritten by another value.
The fix is to remove this peephole optimization altogether, because it is already covered by `optimizeEnum` in alloc_stack simplification.
Fixes a miscompile
https://github.com/swiftlang/swift/issues/85687
rdar://165374568
It eliminates dead access scopes if they don't conflict with other scopes.
Removes:
```
%2 = begin_access [modify] [dynamic] %1
... // no uses of %2
end_access %2
```
However, dead _conflicting_ access scopes are not removed.
If a conflicting scope becomes dead because an optimization removed a load, for example, it is still important to get an access violation at runtime.
Even the value propagated by a redundant load from a conflicting scope is undefined.
```
%2 = begin_access [modify] [dynamic] %1
store %x to %2
%3 = begin_access [read] [dynamic] %1 // conflicting with %2!
%y = load %3
end_access %3
end_access %2
use(%y)
```
After redundant-load-elimination:
```
%2 = begin_access [modify] [dynamic] %1
store %x to %2
%3 = begin_access [read] [dynamic] %1 // now dead, but still conflicting with %2
end_access %3
end_access %2
use(%x) // propagated from the store, but undefined here!
```
In this case the scope `%3` is not removed because it's important to get an access violation error at runtime before the undefined value `%x` is used.
This pass considers potentially conflicting access scopes in called functions.
But it cannot consider potentially conflicting accesses in callers.
That's not a problem, because optimizations like redundant-load-elimination can only do such transformations if the outer access scope is within the function, e.g.
```
bb0(%0 : $*T): // an inout from a conflicting scope in the caller
store %x to %0
%1 = ... // some address that may alias %0
%3 = begin_access [read] [dynamic] %1
%y = load %3 // cannot be propagated because it cannot be proved that %1 is the same address as %0
end_access %3
```
All those checks are only done for dynamic access scopes, because they matter for runtime exclusivity checking.
Dead static scopes are removed unconditionally.
This code was not consulting SILFunctionConventions, so it was
passing values indirectly even when they are not indirect under sil-opaque-values.
This also fixes callers to ensure they pass arguments and
results correctly in both modes.
We are creating (and relying on) a contract between the AST and SIL: a SILDeclRef
should accurately describe the method/accessor that a class_method comes from. By
doing this we eliminate pattern matching on the AST, which ties this code too
tightly to the AST and makes it brittle in the face of AST changes. This also
fixes an issue where we were not handling setters correctly.
I am doing this now since it is natural to fix it alongside the
ref_element_addr issue in the previous commit, since they are effectively doing
the same thing.
rdar://153207557
This adds initial support for differentiation of functions that may produce an `Error` result.
Essentially, we wrap the pullback into an `Optional` and emit a diamond-shaped control-flow pattern depending on whether the pullback value is available. VJP emission was modified to accommodate this. In addition, some tricks are required because a `try_apply` result is not available in the instruction's parent block; it is only available in the normal successor block.
As a result we can now:
- differentiate an active `try_apply` result, as produced by `do ... try ... catch` constructs (see the sketch below)
- differentiate a `try_apply` whose error result is unreachable (usually from `try!` and similar source constructs)
- support (some) throwing functions with the builtin differentiation operators; a stdlib change will follow, though we cannot support typed throws here (yet)
- correctly propagate error types during currying around differentiable functions, as well as during type-checking of the `@derivative(of:)` attribute, so we can register custom derivatives for functions producing an error result
- add a custom derivative for the `Optional` `??` operator (support is not yet complete: we cannot differentiate through autoclosures, so `x ?? y` works only if `y` is not active, e.g. a constant value)
Some fixes here and there
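A hedged sketch of the kind of code this enables; `mayFail`, the error type, and the numbers are illustrative assumptions, not from the change itself:
```
import _Differentiation

// Hypothetical throwing callee; its result is active in `f`.
func mayFail(_ x: Double) throws -> Double {
  guard x.isFinite else { throw CancellationError() }
  return x * x
}

@differentiable(reverse)
func f(_ x: Double) -> Double {
  do {
    return try mayFail(x)  // active `try_apply` result
  } catch {
    return 0               // error path joins the diamond with a nil pullback
  }
}

// gradient(at: 3, of: f) == 6
```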
It hoists `destroy_value` instructions for non-lexical values.
```
%1 = some_ownedValue
...
last_use(%1)
... // other instructions
destroy_value %1
```
->
```
%1 = some_ownedValue
...
last_use(%1)
destroy_value %1 // <- moved after the last use
... // other instructions
```
In contrast to non-mandatory optimization passes, this pass hoists destroys over deinit-barriers (and it is the only pass that does so).
This ensures consistent behavior between -Onone and optimized builds.
The previous algorithm was doing an iterative forward data flow analysis
followed by a reverse data flow analysis. I suspect the history here is that
it was a reverse analysis, and that didn't really work for infinite loops,
and so complexity accumulated.
The new algorithm is quite straightforward and relies on the allocations
being properly jointly post-dominated, just not nested. We simply walk
forward through the blocks in consistent-with-dominance order, maintaining
the stack of active allocations and deferring deallocations that are
improperly nested until we deallocate the allocations above them. The only
real subtlety is that we have to delay walking into dead-end regions until
we've seen all of the edges into them, so that we can know whether we have
a coherent stack state in them. If the state is incoherent, we need to
remove any deallocations of previous allocations because we cannot talk
correctly about what's on top of the stack.
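For orientation, a heavily simplified sketch of the core walk; `Allocation`, `Instruction`, and `Block` are hypothetical stand-ins for the pass's actual data structures, and the dead-end-region handling described above is omitted:
```
// Hypothetical stand-ins for SIL blocks and stack allocations.
struct Allocation: Hashable { let id: Int }
enum Instruction { case alloc(Allocation), dealloc(Allocation), other }
struct Block { var instructions: [Instruction] }

// Walk blocks in consistent-with-dominance order, keeping the stack of
// active allocations and deferring improperly nested deallocations.
func fixNesting(_ blocks: [Block]) {
  var active: [Allocation] = []       // open allocations, innermost last
  var deferred: Set<Allocation> = []  // deallocations seen before their turn

  for block in blocks {
    for inst in block.instructions {
      switch inst {
      case .alloc(let a):
        active.append(a)
      case .dealloc(let a):
        guard a == active.last else {
          deferred.insert(a)          // improperly nested: defer it
          continue
        }
        active.removeLast()
        // Pop any allocations whose deferred deallocations are now on
        // top of the stack; this is where their deallocs get emitted.
        while let top = active.last, deferred.remove(top) != nil {
          active.removeLast()
        }
      case .other:
        break
      }
    }
  }
}
```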
The reason I'm doing this, besides it just being a simpler and hopefully
faster algorithm, is that modeling some of the uses of the async stack
allocator properly requires builtins that cannot just be semantically
reordered. That should be somewhat easier to handle with the new approach,
although really (1) we should not have runtime functions that need this and
(2) we're going to need a conservatively-correct solution that's different
from this anyway because hoisting allocations is *also* limited in its own
way.
I've attached a rather pedantic proof of the correctness of the algorithm.
The thing that concerns me most about the rewritten pass is that it isn't
actually validating joint post-dominance on input, so if you give it bad
input, it might be a little mystifying to debug the verifier failures.
This pass removes `copy_addr` instructions.
However, it has some problems which cause compiler crashes.
It's not worth fixing these bugs because
1. Most copy_addrs can be eliminated by TempRValueElimination and TempLValueElimination.
2. Once we have opaque values, we don't need copy_addr elimination anyway.
rdar://162212460
The caller is allowed to assume that 'inout sending' parameters are not in
the same region on return, so they can safely be sent to different isolation domains.
To enforce that, we have to ensure on return that the two are /actually/ not in
the same region.
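A hedged source-level sketch of the rule (the type and function are illustrative):
```
final class Box { var value = 0 }

// On return, `a` and `b` must be in disjoint regions, so the caller can
// safely send each of them to a different isolation domain afterwards.
func disentangle(_ a: inout sending Box, _ b: inout sending Box) {
  a.value += 1
  // a = b   // would leave both parameters in the same region; now diagnosed
}
```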
rdar://138519484
This pass has been disabled for a very long time (because it's terrible for code size).
It does not work with OSSA, so it cannot be re-enabled (as is) once we have OSSA throughout the pipeline.
So it's time to remove it completely.
Besides supporting OSSA, this change significantly simplifies the pass.
The main change is that instead of starting at a closure (e.g. a `partial_apply`) and finding all call sites, we now start at a call site and look for closures among its arguments. This makes a lot of things much simpler; e.g. far fewer intermediate data structures are required to track state.
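For illustration, a hedged source-level example of the pattern the pass handles (names are made up):
```
func callee(_ f: () -> Int) -> Int {
  f() + 1
}

func caller(_ x: Int) -> Int {
  // The new pass starts at this call site, finds the closure passed for
  // `f`, and specializes `callee` to take `x` directly, eliminating the
  // closure context.
  callee { x }
}
```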
I needed to remove the 3 unit tests because the things they were testing no longer exist. However, the pass is exercised by many SIL tests (and I added quite a few), which should give good test coverage.
The old ClosureSpecializer pass is still kept in place because at that point in the pipeline we don't have OSSA yet; once we do, we can replace the old pass with the new one.
The autodiff closure specializer, however, already runs in the OSSA pipeline, where the new changes take effect.
The intent of `@inline(always)` is to act as an optimization control:
the user can rely on inlining to happen, or the compiler will emit an error
message.
Because function values can be dynamic (closures, protocol/class lookup),
this guarantee can only be upheld for direct function references.
In cases where the optimizer can resolve dynamic function values, the
attribute shall be respected.
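A hedged sketch of what the guarantee covers (the functions are illustrative):
```
@inline(always)
func increment(_ x: Int) -> Int { x &+ 1 }

func callDirect(_ x: Int) -> Int {
  increment(x)   // direct reference: inlining is guaranteed, or an error is emitted
}

func callDynamic(_ f: (Int) -> Int, _ x: Int) -> Int {
  f(x)           // dynamic function value: no guarantee, unless the optimizer
                 // can resolve `f` to a direct reference
}
```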
rdar://148608854
(old name: CapturePropagation)
The pass is now rewritten in Swift, which makes the code smaller and simpler.
Compared to the old pass it has two improvements:
* It can constant-propagate whole structs (and not only builtin literals). This is important for propagating "real" Swift constants, which have a struct type such as `Int`.
* It constant-propagates keypaths even if there are other non-constant closure captures which are not propagated. This is something the old pass didn't do (see the sketch below).
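A hedged source-level illustration of both improvements (the types and functions are made up):
```
struct Point { var x: Double, y: Double }

func makeAdder() -> (Int) -> Int {
  let n = 10        // a "real" Swift constant with the struct type `Int`
  return { $0 + n } // (1) `n` can now be propagated into a specialized closure
}

func makeProjector(offset: Double) -> (Point) -> Double {
  let kp = \Point.x
  // (2) the constant keypath `kp` is propagated even though `offset`
  // is a non-constant capture that stays in the closure context.
  return { $0[keyPath: kp] + offset }
}
```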
rdar://151185177
* move some Cloner utilities from ContextCommon.swift directly into Cloner.swift
* add a `cloneRecursively` overload which doesn't require the `customGetCloned` closure argument
* some small cleanups
So far, constant-propagated arguments could only be builtin literals.
Now we support arbitrary structs (with constant arguments), e.g. `Int`.
This requires a small addition to the mangling scheme for function specializations.
Also, the demangling tree now looks a bit different to support a "tree" of structs and literals.
I am going to be adding support to passes for emitting IsolationHistory behind a
flag. As part of this, we need to store the state of the partition that created
the error when the error is emitted. A partition stores heap memory, so it makes
sense to make these types noncopyable, so we just move the heap memory
rather than copying it all over the place.
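A minimal sketch of the idea, assuming a hypothetical `Partition` type rather than the actual compiler type:
```
struct Partition: ~Copyable {
  private let storage: UnsafeMutableBufferPointer<Int>

  init(capacity: Int) {
    storage = .allocate(capacity: capacity)
    storage.initialize(repeating: 0)
  }

  deinit { storage.deallocate() }
}

// `Partition` values are moved, never copied, so the heap memory always
// has exactly one owner and is transferred instead of duplicated:
var a = Partition(capacity: 8)
let b = consume a   // move: no copy of the underlying buffer
```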