This is to deal with the fact that LLVM's coroutine can't handle instructions
with side-effects well that are inserted before the coro.begin.
rdar://81113950
Change the code generation patterns for `async let` bindings to use an ABI based on the following
functions:
- `swift_asyncLet_begin`, which starts an `async let` child task, but which additionally
now associates the `async let` with a caller-owned buffer to receive the result of the task.
This is intended to allow the task to emplace its result in caller-owned memory, allowing the
child task to be deallocated after completion without invalidating the result buffer.
- `swift_asyncLet_get[_throwing]`, which replaces `swift_asyncLet_wait[_throwing]`. Instead of
returning a copy of the value, this entry point concerns itself with populating the local buffer.
If the buffer hasn't been populated, then it awaits completion of the task and emplaces the
result in the buffer; otherwise, it simply returns. The caller can then read the result out of
its owned memory. These entry points are intended to be used before every read from the
`async let` binding, after which point the local buffer is guaranteed to contain an initialized
value.
- `swift_asyncLet_finish`, which replaces `swift_asyncLet_end`. Unlike `_end`, this variant
is async and will suspend the parent task after cancelling the child to ensure it finishes
before cleaning up. The local buffer will also be deinitialized if necessary. This is intended
to be used on exit from an `async let` scope, to handle cleaning up the local buffer if necessary
as well as cancelling, awaiting, and deallocating the child task.
- `swift_asyncLet_consume[_throwing]`, which combines `get` and `finish`. This will await completion
of the task, leaving the result value in the result buffer (or propagating the error, if it
throws), while destroying and deallocating the child task. This is intended as an optimization
for reading `async let` variables that are read exactly once by their parent task.
To avoid an epoch break with existing swiftinterfaces and ABI clients, the old builtins and entry
points are kept intact for now, but SILGen now only generates code using the new interface.
This new interface fixes several issues with the old async let codegen, including use-after-free
crashes if the `async let` was never awaited, and the inability to read from an `async let` variable
more than once.
rdar://77855176
This patch is primarily doing two things:
- Teach IRGen for some of the instructions to generate debug info based
on SILDebugVariable rather than AST node, which is not always
available when reading from SIL.
- Generate corresponding `llvm.dbg.*` intrinsics and `!DIExpression`
from the newly added SIL DIExpression.
Currently IRGen requires the AST node of a variable declaration to
generate debug location. However, this will fail if the input is SIL due
to the lack of AST reconstruction. Plus, it's unnecessary since we can
just use the `SILLocation` attached on `debug_value` (and its friends) SIL
instruction to generate the correct LLVM debug metadata.
Resolves SR-14868
This patch removes a heuristic to promote all debug intrinsics pointing into
allocas to llvm.dbg.declare() intrinsics and instead more accurate classifies
variables in async contexts by adding the missing cases alloc_box and
alloc_stack cases.
rdar://78977132
Changes the task, taskGroup, asyncLet wait funtion call ABIs.
To reduce code size pass the context parameters and resumption function
as arguments to the wait function.
This means that the suspend point does not need to store parent context
and resumption to the suspend point's context.
```
void swift_task_future_wait_throwing(
OpaqueValue * result,
SWIFT_ASYNC_CONTEXT AsyncContext *callerContext,
AsyncTask *task,
ThrowingTaskFutureWaitContinuationFunction *resume,
AsyncContext *callContext);
```
The runtime passes the caller context to the resume entry point saving
the load of the parent context in the resumption function.
This patch adds a `Metadata *` field to `GroupImpl`. The await entry
pointer no longer pass the metadata pointer and there is a path through
the runtime where the task future is no longer available.
Through various means, it is possible for a synchronous actor-isolated
function to escape to another concurrency domain and be called from
outside the actor. The problem existed previously, but has become far
easier to trigger now that `@escaping` closures and local functions
can be actor-isolated.
Introduce runtime detection of such data races, where a synchronous
actor-isolated function ends up being called from the wrong executor.
Do this by emitting an executor check in actor-isolated synchronous
functions, where we query the executor in thread-local storage and
ensure that it is what we expect. If it isn't, the runtime complains.
The runtime's complaints can be controlled with the environment
variable `SWIFT_UNEXPECTED_EXECUTOR_LOG_LEVEL`:
0 - disable checking
1 - warn when a data race is detected
2 - error and abort when a data race is detected
At an implementation level, this introduces a new concurrency runtime
entry point `_checkExpectedExecutor` that checks the given executor
(on which the function should always have been called) against the
executor on which is called (which is in thread-local storage). There
is a special carve-out here for `@MainActor` code, where we check
against the OS's notion of "main thread" as well, so that `@MainActor`
code can be called via (e.g.) the Dispatch library's
`DispatchQueue.main.async`.
The new SIL instruction `extract_executor` performs the lowering of an
actor down to its executor, which is implicit in the `hop_to_executor`
instruction. Extend the LowerHopToExecutor pass to perform said
lowering.
Previously, the partial application of a generic method at the receiver
would result in the emission of a "simple" partial apply where the
receiver was treated as the context and the original function was
treated as the function. That was problematic because such partial
applications must capture the polymorphic arguments supplied at
partial_apply time. Here, a full partial application forwarder is
emitted for this case.
rdar://76479222
As part of bringup, specifically in order to support storing the size of
the async context as the first entry in the thick context, the
optimization that allows the partial application of a single refcounted
object to avoid the allocation of a thick context was disabled.
Now that we have async function pointers for partial application
forwarders, that rationale is moot, so, here, the optimization is
restored.
rdar://76372871
The address of the function to be called when generating code to invoke
the function associated with FunctionPointer which is produced via
direct reference is by definition statically known; it is neither necessary
nor desireable to load this address out of the AsyncFunctionPointer
corresponding to the function.
Here, that spurious additional work is skipped. The approach is to add
a second value to the FunctionPointer struct. For FunctionPointers
whose kind is AsyncFunctionPointer, this value is either null or else
the address of the corresponding function.
rdar://71376092
Previously, AsyncFunctionPointer constants were signed as code. That
was incorrect considering that these constants are in fact data. Here,
that is fixed.
rdar://76118522
The comment in LowerHopToActor explains the design here.
We want SILGen to emit hops to actors, ignoring executors,
because it's easier to fully optimize in a world where deriving
an executor is a non-trivial operation. But we also want something
prior to IRGen to lower the executor derivation because there are
useful static optimizations we can do, such as doing the derivation
exactly once on a dominance path and strength-reducing the derivation
(e.g. exploiting static knowledge that an actor is a default actor).
There are probably phase-ordering problems with doing this so late,
but hopefully they're restricted to situations like actors that
share an executor. We'll want to optimize that eventually, but
in the meantime, this unblocks the executor work.
The immediate desire is to minimize the set of ABI dependencies
on the layout of an ExecutorRef. In addition to that, however,
I wanted to generally reduce the code size impact of an unsafe
continuation since it now requires accessing thread-local state,
and I wanted resumption to not have to create unnecessary type
metadata for the value type just to do the initialization.
Therefore, I've introduced a swift_continuation_init function
which handles the default initialization of a continuation
and returns a reference to the current task. I've also moved
the initialization of the normal continuation result into the
caller (out of the runtime), and I've moved the resumption-side
cmpxchg into the runtime (and prior to the task being enqueued).
Throwing functions pass the error result in `swiftself` to the resume
partial function.
Therefore, `() async -> ()` to `() async throws -> ()` is not ABI compatible.
TODO: go through remaining failing IRGen async tests and replace the
illegal convert_functions.
Most of the async runtime functions have been changed to not
expect the task and executor to be passed in. When knowing the
task and executor is necessary, there are runtime functions
available to recover them.
The biggest change I had to make to a runtime function signature
was to swift_task_switch, which has been altered to expect to be
passed the context and resumption function instead of requiring
the caller to park the task. This has the pleasant consequence
of allowing the implementation to very quickly turn around when
it recognizes that the current executor is satisfactory. It does
mean that on arm64e we have to sign the continuation function
pointer as an argument and then potentially resign it when
assigning into the task's resume slot.
rdar://70546948
Previously, thick async functions were represented sometimes as a pair
of (AsyncFunctionPointer, nullptr)--when the thick function was produced
via a thin_to_thick_function, e.g.--and sometimes as a pair of
(FunctionPointer, ThickContext)--when the thick function was produced by
a partial_apply--with the size stored in the slot of the ThickContext.
That optimized for the wrong case: partial applies of dynamic async
functions; in that case, there is no appropriate AsyncFunctionPointer to
form when lowering the partial_apply instruction. The far more common
case is to know exactly which function is being partially applied. In
that case, we can form the appropriate AsyncFunctionPointer.
Furthermore, the previous representation made calling a thick function
more complex: it was always necessary to check whether the context was
in fact null and then proceed along two different paths depending.
Here, that behavior is corrected by creating a thunk in a mandatory
IRGen SIL pass in the case that the function that is being partially
applied is dynamic. That new thunk is then partially applied in place
of the original partial_apply of the dynamic function.
We don't need to initialize the global object if it's never used for something which can access the object header.
This let's e.g. lookup-tables to be compiled with zero Array-overhead.
rdar://59874359
OwnershipEliminator lowers destroy_value [poison] to debug_value
[poison].
IRGen overwrites all references in shadow copies with a sentinel value
in place of debug_value [poison].
Part 2/2: rdar://75012368 (-Onone compiler support for early object
deinitialization with sentinel dead references)
The `coro.end.async` intrinsic allow specifying a function that is to be
tail-called as the last thing before returning.
LLVM lowering will inline the `must-tail-call` function argument to
`coro.end.async`. This `must-tail-call` function can contain a
`musttail` call.
```
define @my_must_tail_call_func(void (*)(i64) %fnptr, i64 %args) {
musttail call void %fnptr(i64 %args)
ret void
}
define @async_func() {
...
coro.end.async(..., @my_must_tail_call_func, %return_continuation, i64 %args)
unreachable
}
```
r11 is the linker scratch register, so we need to make sure that
we don't end up in the linker when making cross-library direct
async calls. Since there are currently only a few special
symbols that we do that for, we can achieve this by disabling
lazy binding for those symbols specifically.
First, just call an async -> T function instead of forcing the caller
to piece together which case we're in and perform its own copy. This
ensures that the task is actually kept alive properly.
Second, now that we no longer implicitly depend on the waiting tasks
being run synchronously, go ahead and schedule them to run on the
global executor.
This solves some problems which were blocking the work on TLS-ifying
the task/executor state.