Previously, AutoDiff closure specialization pass was triggered only on
VJPs containing single basic block. However, the pass logic allows
running on arbitrary VJPs. This PR enables the pass for all VJPs
unconditionally. So, if the pullback corresponding to multiple-BB VJP
accepts some closures directly as arguments, these closures might become
specialized by the pass. Closures passed via payload of branch tracing
enum are not specialized - this is subject for future changes.
The PR contains several commits.
1. The thing named "call site" in the code is partial_apply of pullback
corresponding to the VJP. This might appear only once, so we drop
support for multiple "call sites".
2. Enhance existing SILOptimizer tests for the pass.
3. Add validation-tests for single basic block case.
4. The change itself - delete check against single basic block.
5. Add validation-tests for multiple basic block case.
6. Add SILOptimizer tests for multiple basic block case.
This pass replaces `alloc_box` with `alloc_stack` if the box is not escaping.
The original implementation had some limitations. It could not handle cases of local functions which are called multiple times or even recursively, e.g.
```
public func foo() -> Int {
var i = 1
func localFunction() { i += 1 }
localFunction()
localFunction()
return i
}
```
The new implementation (done in Swift) fixes this problem with a new algorithm.
It's not only more powerful, but also simpler: the new pass has less than half lines of code than the old pass.
The pass is invoked in the mandatory pipeline and later in the optimizer pipeline.
The new implementation provides a module-pass for the mandatory pipeline (whereas the "regular" pass is a function pass).
This is required because the mandatory pass needs to remove originals of specialized closures, which cannot be done from a function-pass.
In the old implementation this was done with a hack by adding a semantic attribute and deleting the function later in the pipeline.
I still kept the sources of the old pass for being able to bootstrap the compiler without a host compiler.
rdar://142756547
Introduce a new pass MandatoryTempRValueElimination, which works as the original TempRValueElimination, except that it does not remove any alloc_stack instruction which are associated with source variables.
Running this pass at Onone helps to reduce copies of large structs, e.g. InlineArrays or structs containing InlineArrays.
Copying large structs can be a performance problem, even at Onone.
rdar://151629149
* `sitofp` signed integer to floating point
* `rint` round floating point to integral
* `bitcast` between integer and floating point
Constant folding `bitcast`s also made it necessary to rewrite constant folding for Nan and inf values, because the old code explicitly checked for `bitcast` intrinsics.
Relying on constant folded `bitcast`s makes the new version much simpler.
It is important to constant fold these intrinsics already in SIL because it enables other optimizations.
If the default argument generator (and, consequently, the function taking this default argument) has public visibility, it's OK to have a closure (which always has private visibility) as the default value of the argument.
Inside fragile functions, we expect function derivatives to be public, which could be achieved by either explicitly marking the functions as differentiable or having a public explicit derivative defined for them. This is obviously not
possible for single and double curry thunks which are a special case of `AutoClosureExpr`.
Instead of looking at the thunk itself, we unwrap it and look at the function being wrapped. While the thunk itself and its differentiability witness will not have public visibility, it's not an issue for the case where the function being wrapped (and its witness) have public visibility.
Fixes#54819Fixes#75776
Since `move_value` is a destroying operation, the adjoint of `y = move_value x` should be `adj[x] += adj[y]; adj[y] = 0` instead of just `adj[x] += adj[y]`.
Type annotations for instruction operands are omitted, e.g.
```
%3 = struct $S(%1, %2)
```
Operand types are redundant anyway and were only used for sanity checking in the SIL parser.
But: operand types _are_ printed if the definition of the operand value was not printed yet.
This happens:
* if the block with the definition appears after the block where the operand's instruction is located
* if a block or instruction is printed in isolation, e.g. in a debugger
The old behavior can be restored with `-Xllvm -sil-print-types`.
This option is added to many existing test files which check for operand types in their check-lines.
In #58965, lookup for custom derivatives in non-primary source files was
introduced. It required triggering delayed members parsing of nominal types in
a file if the file was compiled with differential programming enabled.
This patch introduces `CustomDerivativesRequest` to address the issue.
We only parse delayed members if tokens `@` and `derivative` appear
together inside skipped nominal type body (similar to how member operators
are handled).
Resolves#60102
The inlining benefit for VJPs and pullbacks seems to be a bit different on
macos and linux. This difference seems to be arising due to certain
function calls such as, `sin`, `cos` etc. always being inlined on macos
but not on linux.
Until I figure out a better way, I'm modifying these tests to avoid matching
the exact value of the inlining benefit, which causes them to fail on
Linux.
Changes in this CR add part of the, Swift based, Autodiff specific
closure specialization optimization pass. The pass does not modify any
code nor does it even exist in any of the optimization pipelines. The
rationale for pushing this partially complete optimization pass upstream
is to keep up with the breaking changes in the underlying Swift based
compiler infrastructure.
This PR implements first set of changes required to support autodiff for coroutines. It mostly targeted to `_modify` accessors in standard library (and beyond), but overall implementation is quite generic.
There are some specifics of implementation and known limitations:
- Only `@yield_once` coroutines are naturally supported
- VJP is a coroutine itself: it yields the results *and* returns a pullback closure as a normal return. This allows us to capture values produced in resume part of a coroutine (this is required for defers and other cleanups / commits)
- Pullback is a coroutine, we assume that coroutine cannot abort and therefore we execute the original coroutine in reverse from return via yield and then back to the entry
- It seems there is no semantically sane way to support `_read` coroutines (as we will need to "accept" adjoints via yields), therefore only coroutines with inout yields are supported (`_modify` accessors). Pullbacks of such coroutines take adjoint buffer as input argument, yield this buffer (to accumulate adjoint values in the caller) and finally return the adjoints indirectly.
- Coroutines (as opposed to normal functions) are not first-class values: there is no AST type for them, one cannot e.g. store them into tuples, etc. So, everywhere where AST type is required, we have to hack around.
- As there is no AST type for coroutines, there is no way one could register custom derivative for coroutines. So far only compiler-produced derivatives are supported
- There are lots of common things wrt normal function apply's, but still there are subtle but important differences. I tried to organize the code to enable code reuse, still it was not always possible, so some code duplication could be seen
- The order of how pullback closures are produced in VJP is a bit different: for normal apply's VJP produces both value and pullback closure via a single nested VJP apply. This is not so anymore with coroutine VJP's: yielded values are produced at `begin_apply` site and pullback closure is available only from `end_apply`, so we need to track the order in which pullbacks are produced (and arrange consumption of the values accordingly – effectively delay them)
- On the way some complementary changes were required in e.g. mangler / demangler
This patch covers the generation of derivatives up to SIL level, however, it is not enough as codegen of `partial_apply` of a coroutine is completely broken. The fix for this will be submitted separately as it is not directly autodiff-related.
---------
Co-authored-by: Andrew Savonichev <andrew.savonichev@gmail.com>
Co-authored-by: Richard Wei <rxwei@apple.com>
Previously, the lexical attribute on allock_stack instructions was used.
This doesn't work for values without lexical lifetimes which are
consumed, e.g. stdlib CoW types. Here, the new var_decl attribute on
alloc_stack is keyed off of instead. This flag encodes exactly that a
value corresponds to a source-level VarDecl, which is the condition
under which checking needs to run.
Add a new mandatory BooleanLiteralFolding pass which constant folds conditional branches with boolean literals as operands.
```
%1 = integer_literal -1
%2 = apply %bool_init(%1) // Bool.init(_builtinBooleanLiteral:)
%3 = struct_extract %2, #Bool._value
cond_br %3, bb1, bb2
```
->
```
...
br bb1
```
This pass is intended to run before DefiniteInitialization, where mandatory inlining and constant folding didn't run, yet (which would perform this kind of optimization).
This optimization is required to let DefiniteInitialization handle boolean literals correctly.
For example in infinite loops:
```
init() {
while true { // DI need to know that there is no loop exit from this while-statement
if some_condition {
member_field = init_value
break
}
}
}
```
For linear maps containing control-flow, closures (representing the pullbacks
of intermediate values) may be passed as arguments, however, they may be
hidden behind a branch-tracing enum (tracing execution flow of the original
function).
Such linear maps did not use to get inlining benefits as the compiler
could not see that the intermediate pullback closures were actually part
of the input.
This change modifies the inliner logic to correctly award inlining
benefits to linear maps containing control-flow, by checking if a
"callee" in the linear map actually traces back to an input closure that
was received as part of a branch-tracing enum input argument.
Fixes#68945
Optional's `init_enum_data_addr` and `inject_enum_addr` instructions are generated in presence of non-loadable Optional values. The compiler used to treat these instructions as inactive, and this resulted in silent run-time
issues described in #64223.
The patch marks `init_enum_data_addr` as "active" if its Optional operand is also active, and in PullbackCloner we differentiate through it and the related `inject_enum_addr`.
However, we only determine this relation in simple cases when both instructions are in the same block. There is no def-use relation between them (both take the same Optional operand), so if there is more than one set of instructions
operating on the same Optional, or there is some control flow, we currently bail out.
In PullbackCloner, we walk over instructions in reverse order and start from `inject_enum_addr` and its `Optional<Wrapped>.TangentVector` operand. Assuming that is is already initialized, we emit an `unchecked_take_enum_data_addr` and set it as the adjoint buffer of `init_enum_data_addr`. The Optional value is
invalidated, and we have to destroy the enum data address later when we reach `init_enum_data_addr`.
We need a lowered type for branch trace enum in order to compute linear map tuple type. However, the lowering of branch trace enum type depends on the types of its elements (the payloads are linear map tuples of predecessor BB).
As lowered types are cached, we cannot populate branch trace enum entries in the end as we did before: we already used wrong lowered types for linear map tuples.
Traverse basic blocks in reverse post-order traverse order building linear map tuples and branch tracing enums in one go, ensuring that we've done with predecessor BBs before processing the BB itself.