This pass replaces `alloc_box` with `alloc_stack` if the box is not escaping.
The original implementation had some limitations. It could not handle cases of local functions which are called multiple times or even recursively, e.g.
```
public func foo() -> Int {
var i = 1
func localFunction() { i += 1 }
localFunction()
localFunction()
return i
}
```
The new implementation (done in Swift) fixes this problem with a new algorithm.
It's not only more powerful, but also simpler: the new pass has less than half lines of code than the old pass.
The pass is invoked in the mandatory pipeline and later in the optimizer pipeline.
The new implementation provides a module-pass for the mandatory pipeline (whereas the "regular" pass is a function pass).
This is required because the mandatory pass needs to remove originals of specialized closures, which cannot be done from a function-pass.
In the old implementation this was done with a hack by adding a semantic attribute and deleting the function later in the pipeline.
I still kept the sources of the old pass for being able to bootstrap the compiler without a host compiler.
rdar://142756547
This PR implements first set of changes required to support autodiff for coroutines. It mostly targeted to `_modify` accessors in standard library (and beyond), but overall implementation is quite generic.
There are some specifics of implementation and known limitations:
- Only `@yield_once` coroutines are naturally supported
- VJP is a coroutine itself: it yields the results *and* returns a pullback closure as a normal return. This allows us to capture values produced in resume part of a coroutine (this is required for defers and other cleanups / commits)
- Pullback is a coroutine, we assume that coroutine cannot abort and therefore we execute the original coroutine in reverse from return via yield and then back to the entry
- It seems there is no semantically sane way to support `_read` coroutines (as we will need to "accept" adjoints via yields), therefore only coroutines with inout yields are supported (`_modify` accessors). Pullbacks of such coroutines take adjoint buffer as input argument, yield this buffer (to accumulate adjoint values in the caller) and finally return the adjoints indirectly.
- Coroutines (as opposed to normal functions) are not first-class values: there is no AST type for them, one cannot e.g. store them into tuples, etc. So, everywhere where AST type is required, we have to hack around.
- As there is no AST type for coroutines, there is no way one could register custom derivative for coroutines. So far only compiler-produced derivatives are supported
- There are lots of common things wrt normal function apply's, but still there are subtle but important differences. I tried to organize the code to enable code reuse, still it was not always possible, so some code duplication could be seen
- The order of how pullback closures are produced in VJP is a bit different: for normal apply's VJP produces both value and pullback closure via a single nested VJP apply. This is not so anymore with coroutine VJP's: yielded values are produced at `begin_apply` site and pullback closure is available only from `end_apply`, so we need to track the order in which pullbacks are produced (and arrange consumption of the values accordingly – effectively delay them)
- On the way some complementary changes were required in e.g. mangler / demangler
This patch covers the generation of derivatives up to SIL level, however, it is not enough as codegen of `partial_apply` of a coroutine is completely broken. The fix for this will be submitted separately as it is not directly autodiff-related.
---------
Co-authored-by: Andrew Savonichev <andrew.savonichev@gmail.com>
Co-authored-by: Richard Wei <rxwei@apple.com>
Previously, the lexical attribute on allock_stack instructions was used.
This doesn't work for values without lexical lifetimes which are
consumed, e.g. stdlib CoW types. Here, the new var_decl attribute on
alloc_stack is keyed off of instead. This flag encodes exactly that a
value corresponds to a source-level VarDecl, which is the condition
under which checking needs to run.
Add a new mandatory BooleanLiteralFolding pass which constant folds conditional branches with boolean literals as operands.
```
%1 = integer_literal -1
%2 = apply %bool_init(%1) // Bool.init(_builtinBooleanLiteral:)
%3 = struct_extract %2, #Bool._value
cond_br %3, bb1, bb2
```
->
```
...
br bb1
```
This pass is intended to run before DefiniteInitialization, where mandatory inlining and constant folding didn't run, yet (which would perform this kind of optimization).
This optimization is required to let DefiniteInitialization handle boolean literals correctly.
For example in infinite loops:
```
init() {
while true { // DI need to know that there is no loop exit from this while-statement
if some_condition {
member_field = init_value
break
}
}
}
```
Introduce the notion of "semantic result parameter". Handle differentiation of inouts via semantic result parameter abstraction. Do not consider non-wrt semantic result parameters as semantic results
Fixes#67174
It is necessary for opaque values where for casts that will newly start
out as checked_cast_brs and be lowered to checked_cast_addr_brs, since
the latter has the source formal type, IRGen relies on being able to
access it, and there's no way in general to obtain the source formal
type from the source lowered type.
Types that have "value semantics" should not have lexical lifetimes.
Value types are not expected to have custom deinits. Are not expected to
expose unsafe interior pointers. And cannot have weak references because
they are structs. Therefore, deinitialization barriers are irrelevant.
rdar://107076869
Fix the common error of using underscores instead of dashes.
In the rebranch this is an error (lit got more picky), but it also makes sense to fix the tests in the main branch
Closure literals are sometimes type-checked as one type then immediately converted to another
type in the AST. One particular case of this is when a closure body never throws, but the closure
is used as an argument to a function that takes a parameter that `throws`. Emitting this naively,
by emitting the closure as its original type, then converting to throws, can be expensive for
async closures, since that takes a reabstraction thunk. Even for non-async functions, we still want
to get the benefit of reabstraction optimization for the closure literal through the conversion too.
So if the function conversion just add `throws`, emit the closure as throwing, and pass down the
context abstraction pattern when emitting the closure as well.
Compiler:
- Add `Forward` and `Reverse` to `DifferentiabilityKind`.
- Expand `DifferentiabilityMask` in `ExtInfo` to 3 bits so that it now holds all 4 cases of `DifferentiabilityKind`.
- Parse `@differentiable(reverse)` and `@differentiable(_forward)` declaration attributes and type attributes.
- Emit a warning for `@differentiable` without `reverse`.
- Emit an error for `@differentiable(_forward)`.
- Rename `@differentiable(linear)` to `@differentiable(_linear)`.
- Make `@differentiable(reverse)` type lowering go through today's `@differentiable` code path. We will specialize it to reverse-mode in a follow-up patch.
ABI:
- Add `Forward` and `Reverse` to `FunctionMetadataDifferentiabilityKind`.
- Extend `TargetFunctionTypeFlags` by 1 bit to store the highest bit of differentiability kind (linear). Note that there is a 2-bit gap in `DifferentiabilityMask` which is reserved for `AsyncMask` and `ConcurrentMask`; `AsyncMask` is ABI-stable so we cannot change that.
_Differentiation module:
- Replace all occurrences of `@differentiable` with `@differentiable(reverse)`.
- Delete `_transpose(of:)`.
Resolves rdar://69980056.
I am going to be using in inst-simplify/sil-combine/canonicalize instruction a
RAUW everything against everything API (*). This creates some extra ARC
traffic/borrows. It is going to be useful to have some simple peepholes that
gets rid of some of the extraneous traffic.
(*) Noting that we are not going to support replacing non-trivial
OwnershipKind::None values with non-trivial OwnershipKind::* values. This is a
corner case that only comes up with non-trivial enums that have a non-payloaded
or trivial enum case. It is much more complex to implement that transform, but
it is an edge case, so we are just not going to support those for now.
----
I also eliminated the dependence of SILGenCleanup on Swift/SwiftShims. This
speeds up iterating on the test case with a debug compiler since we don't need
those modules.
Add differentiation support for non-active `try_apply` SIL instructions.
Notable pullback generation changes:
* Original basic blocks are now visited in a different order:
* starting from the original basic block, all its predecessors
* are visited in a breadth-first search order. This ensures that
* all successors of any block are visited before the block itself.
Resolves TF-433.
Pullback generation now supports `switch_enum` and `switch_enum_addr`
instructions for `Optional`-typed operands.
Currently, the logic is special-cased to `Optional`, but may be generalized in
the future to support enums following general rules.
* [SILGenFunction] Don't create redundant nested debug scopes
Instead of emitting:
```
sil_scope 4 { loc "main.swift":6:19 parent 3 }
sil_scope 5 { loc "main.swift":7:3 parent 4 }
sil_scope 6 { loc "main.swift":7:3 parent 5 }
sil_scope 7 { loc "main.swift":7:3 parent 5 }
sil_scope 8 { loc "main.swift":9:5 parent 4 }
```
Emit:
```
sil_scope 4 { loc "main.swift":6:19 parent 3 }
sil_scope 5 { loc "main.swift":7:3 parent 4 }
sil_scope 6 { loc "main.swift":9:5 parent 5 }
```
* [IRGenSIL] Diagnose conflicting shadow copies
If we attempt to store a value with the wrong type into a slot reserved
for a shadow copy, diagnose what went wrong.
* [SILGenPattern] Defer debug description of case variables
Create unique nested debug scopes for a switch, each of its case labels,
and each of its case bodies. This looks like:
```
switch ... { // Enter scope 1.
case ... : // Enter scope 2, nested within scope 1.
<body-1> // Enter scope 3, nested within scope 2.
case ... : // Enter scope 4, nested within scope 1.
<body-2> // Enter scope 5, nested within scope 4.
}
```
Use the new scope structure to defer emitting debug descriptions of case
bindings. Specifically, defer the work until we can nest the scope for a
case body under the scope for a pattern match.
This fixes SR-7973, a problem where it was impossible to inspect a case
binding in lldb when stopped at a case with multiple items.
Previously, we would emit the debug descriptions too early (in the
pattern match), leading to duplicate/conflicting descriptions. The only
reason that the ambiguous description was allowed to compile was because
the debug scopes were nested incorrectly.
rdar://41048339
* Update tests
For COW support in SIL it's required to "finalize" array literals.
_finalizeUninitializedArray is a compiler known stdlib function which is called after all elements of an array literal are stored.
This runtime function marks the array literal as finished.
%uninitialized_result_tuple = apply %_allocateUninitializedArray(%count)
%mutable_array = tuple_extract %uninitialized_result_tuple, 0
%elem_base_address = tuple_extract %uninitialized_result_tuple, 1
...
store %elem_0 to %elem_addr_0
store %elem_1 to %elem_addr_1
...
%final_array = apply %_finalizeUninitializedArray(%mutable_array)
In this commit _finalizeUninitializedArray is still a no-op because the COW support is not used in the Array implementation yet.
`DifferentiableFunctionInst` now stores result indices.
`SILAutoDiffIndices` now stores result indices instead of a source index.
`@differentiable` SIL function types may now have multiple differentiability
result indices and `@noDerivative` resutls.
`@differentiable` AST function types do not have `@noDerivative` results (yet),
so this functionality is not exposed to users.
Resolves TF-689 and TF-1256.
Infrastructural support for TF-983: supporting differentiation of `apply`
instructions with multiple active semantic results.
Support differentiation of `is` and `as?` operators.
These operators lower to branching cast SIL instructions, requiring control
flow differentiation support.
Resolves SR-12898.