This PR implements first set of changes required to support autodiff for coroutines. It mostly targeted to `_modify` accessors in standard library (and beyond), but overall implementation is quite generic.
There are some specifics of implementation and known limitations:
- Only `@yield_once` coroutines are naturally supported
- VJP is a coroutine itself: it yields the results *and* returns a pullback closure as a normal return. This allows us to capture values produced in resume part of a coroutine (this is required for defers and other cleanups / commits)
- Pullback is a coroutine, we assume that coroutine cannot abort and therefore we execute the original coroutine in reverse from return via yield and then back to the entry
- It seems there is no semantically sane way to support `_read` coroutines (as we will need to "accept" adjoints via yields), therefore only coroutines with inout yields are supported (`_modify` accessors). Pullbacks of such coroutines take adjoint buffer as input argument, yield this buffer (to accumulate adjoint values in the caller) and finally return the adjoints indirectly.
- Coroutines (as opposed to normal functions) are not first-class values: there is no AST type for them, one cannot e.g. store them into tuples, etc. So, everywhere where AST type is required, we have to hack around.
- As there is no AST type for coroutines, there is no way one could register custom derivative for coroutines. So far only compiler-produced derivatives are supported
- There are lots of common things wrt normal function apply's, but still there are subtle but important differences. I tried to organize the code to enable code reuse, still it was not always possible, so some code duplication could be seen
- The order of how pullback closures are produced in VJP is a bit different: for normal apply's VJP produces both value and pullback closure via a single nested VJP apply. This is not so anymore with coroutine VJP's: yielded values are produced at `begin_apply` site and pullback closure is available only from `end_apply`, so we need to track the order in which pullbacks are produced (and arrange consumption of the values accordingly – effectively delay them)
- On the way some complementary changes were required in e.g. mangler / demangler
This patch covers the generation of derivatives up to SIL level, however, it is not enough as codegen of `partial_apply` of a coroutine is completely broken. The fix for this will be submitted separately as it is not directly autodiff-related.
---------
Co-authored-by: Andrew Savonichev <andrew.savonichev@gmail.com>
Co-authored-by: Richard Wei <rxwei@apple.com>
Linear maps are captured in vjp routine via callee-guaranteed partial apply and are passed as @owned references to the enclosing pullback that finally consumes them. Necessary retains are inserted by a partial apply forwarder.
However, this is not the case when the function being differentiated contains loops as heap-allocated context is used and bare pointer is captured by the pullback partial apply. As a result, partial apply forwarder does not retain the linear maps that are owned by a heap-allocated context, however, they are still treated as @owned references and therefore are released in the pullback after the first call. As a result, subsequent pullback calls release linear maps and we'd end with possible use-after-free.
Ensure we retain values when we load values from the context.
Reproducible only when:
* Loops (so, heap-allocated context)
* Pullbacks of thick functions (so context is non-zero)
* Multiple pullback calls
* Some cleanup while there
Fixes#64257
The changes are intentionally were made close to the original implementation w/o possible simplifications to ease the review
Fixes#63207, supersedes #63379 (and fixes#63234)
We had two notions of canonical types, one is the structural property
where it doesn't contain sugared types, the other one where it does
not contain reducible type parameters with respect to a generic
signature.
Rename the second one to a 'reduced type'.
In derivatives of loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime which contains a bump-pointer allocator.
When a function contains a differentiated loop, the closure context is a `Builtin.NativeObject`, which contains a `swift::AutoDiffLinearMapContext` and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously directly partial-applied into the pullback). In branching trace enums, the payloads of previously indirect cases will be allocated by `swift::AutoDiffLinearMapContext::allocate` and stored as a `Builtin.RawPointer`.
Move differentiation-related SILOptimizer files to
{include/swift,lib}/SILOptimizer/Differentiation/.
This reduces directory nesting and gathers files together.