We mostly get away without this because we're fairly disciplined
about using constant memory orderings, and apparently that's
usually good enough to get inline accesses and avoid needing to
link libatomic. However, we have a few places with the task status
atomic that use a non-constant load ordering with load and
compare_exchange_weak, and my recent change to make that atomic
a double-word was apparently sufficient on some (but not all)
Linux distributions to get the compiler to call the runtime
function. Regardless, we shouldn't be playing around in the
margins here: Linux requires us to link libatomic, so we should.
See rdar://79378762, SR-14802, SR-14841, SR-14875.
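The pattern at issue is an atomic load and `compare_exchange_weak` whose memory ordering is a runtime value rather than a compile-time constant. The sketch below is a hypothetical illustration in C++, not the runtime's code. `Status` and `setFlag` are invented names, and the status is kept at 8 bytes so it stays lock-free and links without libatomic. With a 16-byte ("double-word") type, the same source may instead lower to `__atomic_*` helper calls, which libatomic provides.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 8-byte status: two 32-bit fields packed together, standing in
// for a wider task status word. At this size, mainstream compilers inline
// the atomic operations; a 16-byte version may become libatomic calls.
struct Status {
  uint32_t flags;
  uint32_t recordCount;
};

// `loadOrder` is only known at run time. A compiler may handle this by
// switching on the value or by treating it as seq_cst. For wider types,
// calling a runtime helper is another possible lowering.
// Precondition: loadOrder is relaxed or acquire, because it is also used as
// the CAS failure ordering.
bool setFlag(std::atomic<Status> &status, uint32_t flag,
             std::memory_order loadOrder) {
  Status old = status.load(loadOrder);
  Status updated;
  do {
    if (old.flags & flag) return false;  // flag already set; nothing to do
    updated = old;
    updated.flags |= flag;
  } while (!status.compare_exchange_weak(old, updated,
                                         std::memory_order_acq_rel,
                                         loadOrder));
  return true;
}
```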
This doesn't resolve all hangs, such as those occurring
due to explicit usage of priorities and certain other
situations where priorities seem to be causing issues
(rdar://79823345), but it does resolve some cases.
The original async ABI made callees deallocate the context,
which allows tail calls (at the async-function level) but
interferes with callers' ability to optimize callee frame
allocation. The purpose of this bit was to allow callers
to do that optimization, but we've since just made callers
responsible for deallocating the context, which is overall
just a lot simpler. So this has been dead for quite some
time.
Tracking this as a single bit is actually largely uninteresting
to the runtime. To handle priority escalation properly, we really
need to track this at a finer grain of detail: recording that the
task is running on a specific thread, enqueued on a specific actor,
and so on. But starting by tracking a single bit is important for
two reasons:
- First, it's more realistic about the performance overheads of
tasks: we're going to be doing this tracking eventually, and
the cost of that tracking will be dominated by the atomic
access, so doing that access now sets the baseline about right.
- Second, it ensures that we've actually got runtime involvement
in all the right places to do this tracking.
Apropos of the latter: there was no runtime involvement in
awaiting a continuation, which is a point at which the task
potentially transitions from running to suspended. We must do
the tracking as part of this transition, rather than recognizing
in the run-loops that a task is still active and treating it as
having suspended, because the latter point potentially races with
the resumption of the task. To make this possible, I've had to introduce
a runtime function, swift_continuation_await, that performs the await
rather than inlining the atomic operation on the continuation.
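The await/resume race described above can be sketched as a small state machine. The sketch assumes a hypothetical three-state continuation word (Pending, Awaited, Resumed) and a separate "running" flag that stands in for the task status bit. None of these names or encodings come from the runtime. The point it shows is that the awaiter marks the task suspended *before* it publishes the Awaited state. Once a resumer observes Awaited, it may reschedule the task immediately.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical continuation states; not the runtime's actual encoding.
enum class ContinuationState : uint8_t { Pending, Awaited, Resumed };

struct Continuation {
  std::atomic<ContinuationState> state{ContinuationState::Pending};
  std::atomic<bool> taskIsRunning{true};  // stand-in for the task's status bit
  int scheduledCount = 0;                  // times the resumer rescheduled the task
};

// Awaiting side. Returns true if the task suspends, or false if the result is
// already available and the task keeps running synchronously.
bool continuationAwait(Continuation &c) {
  // Record the suspension first; the release CAS below publishes it, so a
  // resumer that sees Awaited also sees the task marked as not running.
  c.taskIsRunning.store(false, std::memory_order_relaxed);
  ContinuationState expected = ContinuationState::Pending;
  if (c.state.compare_exchange_strong(expected, ContinuationState::Awaited,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire))
    return true;  // suspended; the resumer will reschedule the task
  // Already resumed: undo the transition and continue running.
  c.taskIsRunning.store(true, std::memory_order_relaxed);
  return false;
}

// Resuming side.
void continuationResume(Continuation &c) {
  ContinuationState old =
      c.state.exchange(ContinuationState::Resumed, std::memory_order_acq_rel);
  if (old == ContinuationState::Awaited) {
    c.taskIsRunning.store(true, std::memory_order_relaxed);
    ++c.scheduledCount;  // stands in for enqueueing the task on an executor
  }
}
```

If the resumer wins the race, the awaiter's CAS fails and the task never actually suspends. This is why the transition belongs in the await operation itself and not in the run loop.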
As part of doing this work, I've also fixed a bug where we failed
to load-acquire in swift_task_escalate before walking the task
status records to invoke escalation actions.
I've also fixed several places where the handling of task statuses
may have accidentally allowed the task to revert to uncancelled.
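The "revert to uncancelled" class of bug typically looks like this: code loads the status word, computes a new value, and then does a plain store. If another thread sets the cancelled bit between the load and the store, the store erases it. The sketch below is hypothetical; the bit layout and `TaskStatus` type are assumptions, not the runtime's. It shows the safe alternative: a single read-modify-write that touches only the bits it means to change, plus an acquire load for readers that go on to inspect data published by other threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical bit layout for a task status word.
constexpr uint32_t IsCancelled = 1u << 0;
constexpr uint32_t IsRunning   = 1u << 1;

struct TaskStatus {
  std::atomic<uint32_t> bits{0};

  // Clearing only IsRunning with fetch_and cannot lose a concurrently set
  // IsCancelled, unlike a load followed by store(old & ~IsRunning).
  void markSuspended() { bits.fetch_and(~IsRunning, std::memory_order_release); }
  void markRunning()   { bits.fetch_or(IsRunning, std::memory_order_acquire); }
  void cancel()        { bits.fetch_or(IsCancelled, std::memory_order_acq_rel); }

  // Acquire so that anything published before the status change (such as
  // status records a reader will walk next) is visible after this load.
  bool isCancelled() const {
    return (bits.load(std::memory_order_acquire) & IsCancelled) != 0;
  }
};
```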
The self object isn't actually a Swift object, so we can neither
do class dispatch on it nor retain it with swift_retain.
Some of the credit goes to Mike Ash on this one. All the
blame is mine, of course.
This builtin never occurs in @inlinable code. But apparently we still
need to add a language feature for every builtin. Presumably this is to allow
older compilers to reparse the library source (though I don't know why
that would ever happen!)
Fixes rdar://80525569 (error: module 'Builtin' has no member named 'hopToActor')
The prior implementation of `Task.sleep()` effectively had two
different atomic words to capture the state, which could lead to cases
where cancelling before a sleep operation started would fail to
throw `CancellationError`. Reimplement the logic for the cancellable
sleep with a more traditional lock-free approach by
packing all of the state information into a single word, where we
always load, figure out what to do, then compare-and-swap.
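Here is a minimal C++ sketch of the "load, decide, compare-and-swap" scheme over one word. Its encoding is an assumption for illustration only: 0 means not started, 1 cancelled, 2 finished, and any other value is the address of a waiting (hypothetical) `Waiter`. This is not the actual `Task.sleep()` implementation. It shows why a single word fixes the bug. A cancel that lands before the sleep starts leaves `Cancelled` in the word, and `begin` is guaranteed to observe it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-word encoding of all sleep state.
constexpr uintptr_t NotStarted = 0, Cancelled = 1, Finished = 2;

// Hypothetical continuation; 8-byte alignment keeps its address clear of the
// small tag values above.
struct alignas(8) Waiter { bool resumed = false; bool threw = false; };

struct SleepState {
  std::atomic<uintptr_t> word{NotStarted};

  // Start sleeping. Returns false if cancellation already happened, in which
  // case the caller should throw CancellationError right away.
  bool begin(Waiter *w) {
    uintptr_t old = word.load(std::memory_order_acquire);
    while (true) {
      if (old == Cancelled) return false;
      assert(old == NotStarted && "a sleep can only start once");
      if (word.compare_exchange_weak(old, reinterpret_cast<uintptr_t>(w),
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire))
        return true;
      // CAS failed: `old` now holds the current value; decide again.
    }
  }

  void cancel() {
    uintptr_t old = word.load(std::memory_order_acquire);
    while (true) {
      if (old == Cancelled || old == Finished) return;  // nothing left to do
      if (word.compare_exchange_weak(old, Cancelled, std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
        if (old != NotStarted) {  // a sleeper was waiting: resume with an error
          auto *w = reinterpret_cast<Waiter *>(old);
          w->resumed = true;
          w->threw = true;
        }
        return;
      }
    }
  }

  void timerFired() {
    uintptr_t old = word.load(std::memory_order_acquire);
    while (true) {
      if (old == NotStarted || old == Cancelled || old == Finished) return;
      if (word.compare_exchange_weak(old, Finished, std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
        reinterpret_cast<Waiter *>(old)->resumed = true;  // normal wakeup
        return;
      }
    }
  }
};
```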
Do this as a staged change to the ABI, introducing an underscored
`@usableFromInline` implementation to the ABI that we can rely on
later, and an `@_alwaysEmitIntoClient` version we can inline now.
The symbol is swift_async_extendedFramePointerFlags. Since the value doesn't need to be dynamically computed, we save a level of indirection by emitting a fake global variable whose address is the value we want, similar to objc_absolute_packed_isa_class_mask.
This bit is mixed into the frame pointer address stored on the stack to signal that a frame is an async frame. The compiler can emit code that ORs in the address of this symbol to apply the appropriate flag when it doesn't know the flag statically.
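The tagging itself amounts to an OR on the saved frame pointer. In the sketch below, a plain constant stands in for the value the compiler would obtain by taking the address of `swift_async_extendedFramePointerFlags`. Choosing bit 60 is an assumption for illustration; the real value is whatever the runtime defines. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for (uint64_t)&swift_async_extendedFramePointerFlags.
// Bit 60 is assumed here purely for illustration.
constexpr uint64_t ExtendedFramePointerFlags = uint64_t(1) << 60;

// Mark a saved frame pointer as belonging to an async frame.
uint64_t tagAsyncFrame(uint64_t savedFramePointer, uint64_t flags) {
  return savedFramePointer | flags;
}

// A backtracer can test the bit, then mask it off to recover the address.
bool isAsyncFrame(uint64_t storedFramePointer, uint64_t flags) {
  return (storedFramePointer & flags) != 0;
}

uint64_t untagFramePointer(uint64_t storedFramePointer, uint64_t flags) {
  return storedFramePointer & ~flags;
}
```

Because the symbol's address *is* the flag value, code that doesn't know the flag statically can load it with an ordinary address materialization, with no load through a pointer. This is the level of indirection the commit avoids.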
rdar://80277146
* Synchronize both versions of actor_counters.swift test
* Synchronize on Job address
Make sure to synchronize on Job address (AsyncTasks are Jobs, but not
all Jobs are AsyncTasks).
* Add fprintf debug output for TSan acquire/release
* Add tsan_release edge on task creation
Without this, we get false data races between a task's creation and
its immediate scheduling on a different thread.
False positive for `Sanitizers/tsan/actor_counters.swift` test:
```
WARNING: ThreadSanitizer: data race (pid=81452)
Read of size 8 at 0x7b2000000560 by thread T5:
#0 Counter.next() <null>:2 (a.out:x86_64+0x1000047f8)
#1 (1) suspend resume partial function for worker(identity:counters:numIterations:) <null>:2 (a.out:x86_64+0x100005961)
#2 swift::runJobInEstablishedExecutorContext(swift::Job*) <null>:2 (libswift_Concurrency.dylib:x86_64+0x280ef)
Previous write of size 8 at 0x7b2000000560 by main thread:
#0 Counter.init(maxCount:) <null>:2 (a.out:x86_64+0x1000046af)
#1 Counter.__allocating_init(maxCount:) <null>:2 (a.out:x86_64+0x100004619)
#2 runTest(numCounters:numWorkers:numIterations:) <null>:2 (a.out:x86_64+0x100006d2e)
#3 swift::runJobInEstablishedExecutorContext(swift::Job*) <null>:2 (libswift_Concurrency.dylib:x86_64+0x280ef)
#4 main <null>:2 (a.out:x86_64+0x10000a175)
```
New edge with this change:
```
[4357150208] allocate task 0x7b3800000000, parent = 0x0
[4357150208] creating task 0x7b3800000000 with parent 0x0
[4357150208] tsan_release on 0x7b3800000000 <<< new release edge
[139088221442048] tsan_acquire on 0x7b3800000000
[139088221442048] trying to switch from executor 0x0 to 0x7ff85e2d9a00
[139088221442048] switch failed, task 0x7b3800000000 enqueued on executor 0x7ff85e2d9a00
[139088221442048] enqueue job 0x7b3800000000 on executor 0x7ff85e2d9a00
[139088221442048] tsan_release on 0x7b3800000000
[139088221442048] tsan_release on 0x7b3800000000
[4357150208] tsan_acquire on 0x7b3800000000
counters: 1, workers: 1, iterations: 1
[4357150208] allocate task 0x7b3c00000000, parent = 0x0
[4357150208] creating task 0x7b3c00000000 with parent 0x0
[4357150208] tsan_release on 0x7b3c00000000 <<< new release edge
[139088221442048] tsan_acquire on 0x7b3c00000000
[4357150208] task 0x7b3800000000 waiting on task 0x7b3c00000000, going to sleep
[4357150208] tsan_release on 0x7b3800000000
[4357150208] tsan_release on 0x7b3800000000
[139088221442048] getting current executor 0x0
[139088221442048] tsan_release on 0x7b3c00000000
...
```
rdar://78932849
* Add static_cast<Job *>()
* Move TSan release edge to swift_task_enqueueGlobal()
Move the TSan release edge from `swift_task_create_commonImpl()` to
`swift_task_enqueueGlobalImpl()`. Task creation itself is not an event
that needs synchronization; what matters is that task creation "happens
before" execution of that task on another thread.
This edge is usually added when the task is scheduled via
`swift_task_enqueue()` (which then usually calls
`swift_task_enqueueGlobal()`). However, not all task scheduling goes
through the `swift_task_enqueue()` funnel as some places call the more
specific `swift_task_enqueueGlobal()` directly. So let's annotate this
function (duplicate edges aren't harmful) to ensure we cover all
scheduling events, including newly created tasks (our original problem
here).
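The placement argument can be sketched as follows. The release annotation goes in the lowest-level global enqueue function, so every path that schedules a job passes through it. `annotateRelease`/`annotateAcquire` are hypothetical hooks. Under TSan they would forward to `__tsan_release`/`__tsan_acquire` from `<sanitizer/tsan_interface.h>`. Here they only log the edge so the ordering can be checked without the sanitizer. `GlobalExecutor` is likewise invented, and its mutex already orders the threads; the hooks only show where the edge belongs.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct Job { int id; };

std::mutex logMutex;
std::vector<std::string> edgeLog;

// Hypothetical annotation hooks keyed on the Job address.
void annotateRelease(Job *) { std::lock_guard<std::mutex> g(logMutex); edgeLog.push_back("release"); }
void annotateAcquire(Job *) { std::lock_guard<std::mutex> g(logMutex); edgeLog.push_back("acquire"); }

// Hypothetical global executor: a queue drained by a worker thread.
class GlobalExecutor {
  std::mutex m;
  std::condition_variable cv;
  std::deque<Job *> queue;

 public:
  // Annotating here, rather than only in a higher-level enqueue funnel,
  // also covers callers that enqueue globally directly (e.g. new tasks).
  void enqueueGlobal(Job *job) {
    annotateRelease(job);
    { std::lock_guard<std::mutex> g(m); queue.push_back(job); }
    cv.notify_one();
  }

  // Runs one job on a separate thread and returns its id.
  int runOneOnWorker() {
    int ranId = -1;
    std::thread worker([&] {
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk, [&] { return !queue.empty(); });
      Job *job = queue.front();
      queue.pop_front();
      lk.unlock();
      annotateAcquire(job);  // pairs with the release in enqueueGlobal
      ranId = job->id;
    });
    worker.join();
    return ranId;
  }
};
```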
rdar://78932849
Co-authored-by: Julian Lettner <julian.lettner@apple.com>
This is to work around a bug in LLVM's codegen when emitting the
callee-pop stack adjustment on a regular return from a swiftasync
function (as opposed to a tail call).
Without the workaround, we fail to emit the callee-pop stack adjustment
before the return, leading to a misaligned stack on return.
```
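pop {r7, pc}    @ loads pc, so this returns to the caller
add sp, #16     @ callee-pop adjustment lands after the return and never runs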
```
Workaround for rdar://79726989
The `swift_task_create` entry point is our general runtime ABI for
launching tasks. Make the various Swift APIs sitting on top of it
always-emit-into-client to take them out of the ABI. This reduces the
number of ABI entry points and allows us to make more ABI-compatible
changes to the Swift side.
We're not actually performing the adjustments at the moment due to an
unrelated bug, and will want to perform them within
`swift_task_create_common` based on inheritContext and the given
priority.
Rather than using group task options constructed from the Swift parts
of the _Concurrency library and passed through `createAsyncTask`'s
options, introduce a separate builtin that always takes a group. Move
the responsibility for creating the options structure into IRGen, so
we don't need to expose the TaskGroupTaskOptionRecord type in Swift.
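Moving record construction into IRGen works because the runtime only needs to find a group record somewhere in a linked chain of option records. The sketch below is a simplified, hypothetical mirror of such a chain. The field layout, kinds, and the `findGroup` helper are assumptions and may not match the runtime's definitions. In the test, the stack-allocated `TaskGroupTaskOptionRecord` plays the role of the record the compiler would emit at the call site.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical option-record chain; the real layout may differ.
enum class TaskOptionRecordKind : uint8_t { Executor, TaskGroup };

struct TaskGroup { int id; };

struct TaskOptionRecord {
  TaskOptionRecordKind kind;
  TaskOptionRecord *parent;  // next record in the chain, or nullptr
};

struct TaskGroupTaskOptionRecord : TaskOptionRecord {
  TaskGroup *group;
  TaskGroupTaskOptionRecord(TaskGroup *g, TaskOptionRecord *next)
      : TaskOptionRecord{TaskOptionRecordKind::TaskGroup, next}, group(g) {}
};

// Runtime side: walk the chain looking for a group record.
TaskGroup *findGroup(TaskOptionRecord *options) {
  for (TaskOptionRecord *r = options; r != nullptr; r = r->parent)
    if (r->kind == TaskOptionRecordKind::TaskGroup)
      return static_cast<TaskGroupTaskOptionRecord *>(r)->group;
  return nullptr;
}
```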