This PR implements first set of changes required to support autodiff for coroutines. It mostly targeted to `_modify` accessors in standard library (and beyond), but overall implementation is quite generic.
There are some specifics of implementation and known limitations:
- Only `@yield_once` coroutines are naturally supported
- VJP is a coroutine itself: it yields the results *and* returns a pullback closure as a normal return. This allows us to capture values produced in resume part of a coroutine (this is required for defers and other cleanups / commits)
- Pullback is a coroutine, we assume that coroutine cannot abort and therefore we execute the original coroutine in reverse from return via yield and then back to the entry
- It seems there is no semantically sane way to support `_read` coroutines (as we will need to "accept" adjoints via yields), therefore only coroutines with inout yields are supported (`_modify` accessors). Pullbacks of such coroutines take adjoint buffer as input argument, yield this buffer (to accumulate adjoint values in the caller) and finally return the adjoints indirectly.
- Coroutines (as opposed to normal functions) are not first-class values: there is no AST type for them, one cannot e.g. store them into tuples, etc. So, everywhere where AST type is required, we have to hack around.
- As there is no AST type for coroutines, there is no way one could register custom derivative for coroutines. So far only compiler-produced derivatives are supported
- There are lots of common things wrt normal function apply's, but still there are subtle but important differences. I tried to organize the code to enable code reuse, still it was not always possible, so some code duplication could be seen
- The order of how pullback closures are produced in VJP is a bit different: for normal apply's VJP produces both value and pullback closure via a single nested VJP apply. This is not so anymore with coroutine VJP's: yielded values are produced at `begin_apply` site and pullback closure is available only from `end_apply`, so we need to track the order in which pullbacks are produced (and arrange consumption of the values accordingly – effectively delay them)
- On the way some complementary changes were required in e.g. mangler / demangler
This patch covers the generation of derivatives up to SIL level, however, it is not enough as codegen of `partial_apply` of a coroutine is completely broken. The fix for this will be submitted separately as it is not directly autodiff-related.
---------
Co-authored-by: Andrew Savonichev <andrew.savonichev@gmail.com>
Co-authored-by: Richard Wei <rxwei@apple.com>
This should be NFC since the only case where I used this was with self... and I
found another way of doing that using the API I added in the previous commit.
I fixed a bunch of small issues around here that resulted in a bunch of radars
being fixed. Specifically:
1. I made it so that we treat function_refs that are from an actor isolated
function as actor isolated instead of sendable.
2. I made it so that autoclosures which return global actor isolated functions
are treated as producing a global actor isolated function.
3. I made it so that we properly handle SILGen code patterns produced by
Sendable GlobalActor isolated things.
rdar://125452372
rdar://121954871
rdar://121955895
rdar://122692698
ActorIsolation already has a "I have no value case": unspecified. Lets just use
that.
Just a mistake I made that I am trying to fix before anything further depends on
this code.
To squelch errors, we need access to functionality not available in the
unittests. The unittests do not require this functionality anyways, so just
disable squelching during the unittests.
To be more specific this means that either:
1. The use is actually isolated to the same actor. This could mean that the
use is global actor isolated to the same function.
2. The use is nonisolated but is executing within a function that is globally
isolated to the same isolation domain.
rdar://123474616
This issue can come up when a value is initially statically disconnected, but
after we performed dataflow, we discovered that it was actually actor isolated
at the transfer point, implying that we are not actually transferring.
Example:
```swift
@MainActor func testGlobalAndGlobalIsolatedPartialApplyMatch2() {
var ns = (NonSendableKlass(), NonSendableKlass())
// Regions: (ns.0, ns.1), {(mainActorIsolatedGlobal), @MainActor}
ns.0 = mainActorIsolatedGlobal
// Regions: {(ns.0, ns.1, mainActorIsolatedGlobal), @MainActor}
// This is not a transfer since ns is already main actor isolated.
let _ = { @MainActor in
print(ns)
}
useValue(ns)
}
```
To do this, I also added to SILFunction an actor isolation that SILGen puts on
the SILFunction during pre function visitation. We don't print it or serialize
it for now.
rdar://123474616
Just making PartitionUtils.h a little easier to walk through by moving more of
the impl into the .cpp file. This reduces the header from ~1500 lines to ~950
lines which is more managable. This is especially important since I am going
to be adding IsolationHistory to the header file which will expand it even
further.
As an example of the change:
- // expected-note @-1 {{'x' is transferred from nonisolated caller to main actor-isolated callee. Later uses in caller could race with potential uses in callee}}
+ // expected-note @-1 {{transferring disconnected 'x' to main actor-isolated callee could cause races in between callee main actor-isolated and local nonisolated uses}}
Part of the reason I am doing this is that I am going to be ensuring that we
handle a bunch more cases and I wanted to fix this diagnostic before I added
more incaranations of it to the tests.
I am making this specific API since I am going to make it so that
SILIsolationInfo::get(SILInstruction *) can infer isolation info from self even
from functions that are not apply isolation crossing points. For example, in the
following, we need to understand that test is main actor isolated and we
shouldn't emit an error.
```swift
@MainActor func test(_ x: NonSendable) {}
@OtherActor func doSomething() {
let x = NonSendable()
Task.init { @MainActor in print(x) }
test(x)
}
```
Long term I would like to get region analysis and transfer non sendable out of
the business of directly interpreting the AST... but if we have to do it now, I
would rather us do it through a helper struct. At least the helper struct can be
modified later to work with additional SIL concurrency support when it is added.
This is in preparation for beginning to track an IsolationRegionInfo in
TransferringOperand.
Just splitting this into a separate commit to make it easier to read.
I need to start tracking the dynamic IsolationRegionInfo for the transferring
operand so I can ignore uses that are part of the same
IsolationRegionInfo. IsolationRegionInfo doesn't fit into a pointer, so just to
keep things the same, I am going to just allocate it.
This is an initial staging commit that tests out the bump ptr allocating without
expanding the type yet.
* Let the customBits and lastInitializedBitfieldID share a single uint64_t. This increases the number of available bits in SILNode and Operand from 8 to 20. Also, it simplifies the Operand class because no PointerIntPairs are used anymore to store the operand pointer fields.
* Instead make the "deleted" flag a separate bool field in SILNode (instead of encoding it with the sign of lastInitializedBitfieldID). Another simplification
* Enable important invariant checks also in release builds by using `require` instead of `assert`. Not catching such errors in release builds would be a disaster.
* Let the Swift optimization passes use all the available bits and not only a fixed amount of 8 (SILNode) and 16 (SILBasicBlock).
Enable KeyPath/AnyKeyPath/PartialKeyPath/WritableKeyPath in Embedded Swift, but
for compile-time use only:
- Add keypath optimizations into the mandatory optimizations pipeline
- Allow keypath optimizations to look through begin_borrow, to make them work
even in OSSA.
- If a use of a KeyPath doesn't optimize away, diagnose in PerformanceDiagnostics
- Make UnsafePointer.pointer(to:) transparent to allow the keypath optimization
to happen in the callers of UnsafePointer.pointer(to:).
I also eliminated the very basic "why is this task isolated" part of the warning
in favor of the larger, better, region history impl that I am going to land
soon. The diagnostic wasn't that good in the first place and also was fiddly. So
I removed it for now.
rdar://124960994
I am going to need this so that I can use it when evaluating partition
ops. Specifically, so I can when I find two values that do not merge correctly,
emit an error.
We often look at the SIL output of -sil-print-function and may want to debug a specific pass
after looking at the output.
-sil-break-before-pass-count=<pass_number> will allow to automatically break in the debugger
after <pass_count> of passes are run.
Example:
From -sil-print-function dump:
"SIL function after #6680, stage MidLevel,Function, pass 38: RedundantLoadElimination"
-Xllvm -sil-break-before-pass-count=6680 will break before running this pass in the debugger
Now that we actually know the region that non transferrable things belong to, we
can use this information to give a better diagnostic here.
A really nice effect of this is that we now emit that actor isolated parameters
are actually actor isolated instead of task isolated.
In a subsequent commit, this is going to let me begin handling parameters with
actor regions in a nice way (and standardize all of the errors).
This is meant to be a refactoring commit that uses the current tests in tree to
make sure I did it correctly, so no tests need to be updated.
To keep this as an NFC commit, I only modeled initially actor isolated using
this. I am going to make it so that we properly treat global actor isolated
values as actor isolated/etc in a subsequent commit.
InstructionRange uses 3 sets. Some algorithms need two active ranges at the top-level. Additionally, utilities often have few levels of nesting that each require a block set.
When we run RegionAnalysis, since it uses RPO order, we do not visit dead
blocks. This can create a problem when we emit diagnostics since we may merge in
a value into the region that was never actually defined. In this patch, if we
actually visit the block while performing dataflow, I mark a bit in its state
saying that it was live. Then when we emit diagnostics, I do not visit blocks
that were not marked live.
rdar://124042351
[region-isolation] Refactor out the stubify dead function if no longer used functionality from move only checker into its own pass and put it before region based isolation.
I am doing this since region based isolation hit the same issue that the move
checker did. So it makes sense to refactor the functionality into its own pass
and move it into a helper pass that runs before both.
It is very conservative and only stubifies functions that the specialization
passes explicitly mark as this being ok to be done to.
A destroy_value of Optional.none can be deleted.
A move-only struct with deinit has a non trivial SILType but OwnershipKind::None,
such values cannot be deleted.