This patch has two desirable effects for the price of one.
1. An uncaught error thrown from main will now explode
2. Move us off of using runAsyncAndBlock
The issue with runAsyncAndBlock is that it blocks the main thread
outright. UI and the main actor need to run on the main thread or bad
things happen, so blocking the main thread results in a bad day for
them.
Instead, we're using CFRunLoopRun to run the core-foundation run loop on
the main thread, or, dispatch_main if CFRunLoopRun isn't available.
The issue with just using dispatch_main is that it doesn't actually
guarantee that it will run the tasks on the main thread either, just
that it clears the main queue. We don't want to require everything that
uses concurrency to have to include CoreFoundation either, but dispatch
is already required, which supplies dispatch_main, which just empties
out the main queue.
When reallocating storage, the storing the pointer to the new storage had insufficiently strong ordering. This could cause writers to check the reader count before storing the new storage pointer. If a reader then came in between that load and store, it would end up using the old storage pointer, while the writer could end up freeing it.
Also adjust the Concurrent.cpp tests to test with a variety of reader and writer counts. Counterintuitively, when freeing garbage is gated on there being zero readers, having a single reader will shake out problems that having lots of readers will not. We were testing with lots of readers in order to stress the code as much as possible, but this resulted in it being extremely rare for writers to ever see zero active readers.
rdar://69798617
Mark the relevant snapshot methods as ref-qualified (adding a & after the parameter list) so the compiler will enforce this for us and forbid calling them on temporaries.
rdar://72997638
To help catch runtime issues adopting `withUnsafeContinuation`, such as callback-based APIs that misleadingly
invoke their callback multiple times and/or not at all, provide a couple of classes that can take ownership of
a fresh `UnsafeContinuation` or `UnsafeThrowingContinuation`, and log attempts to resume the continuation multiple times
or discard the object without ever resuming the continuation.
StableAddressConcurrentReadableHashMap::getOrInsert had a race condition in the first lookup, where the snapshot was destroyed before the pointer was extracted from the returned wrapper. Fix this by creating the snapshot outside the if so that it stays alive.
rdar://problem/71932487
move comments to the wired up continuations
remove duplicated continuations; leep the wired up ones
before moving to C++ for queue impl
trying to next wait via channel_poll
submitting works; need to impl next()
It would be more abstractly correct if this got DI support so
that we destroy the member if the constructor terminates
abnormally, but we can get to that later.
We expect to iterate on this quite a bit, both publicly
and internally, but this is a fine starting-point.
I've renamed runAsync to runAsyncAndBlock to underline
very clearly what it does and why it's not long for this
world. I've also had to give it a radically different
implementation in an effort to make it continue to work
given an actor implementation that is no longer just
running all work synchronously.
The major remaining bit of actor-scheduling work is to
make swift_task_enqueue actually do something sensible
based on the executor it's been given; currently it's
expecting a flag that IRGen simply doesn't know to set.
Credit for the cmake fix here goes to Saleem Abdulrasool.
The substantive fix is embarrassing; I didn't pay close attention
to the intrinsic's argument order and just assumed that the first
argument for the replacement value was the low half (the part
you'd find at index 0 if it were an array), but in fact it's the
high half (the part you'd find at index 1).
I also change the code to be much more reinterpret_casty, which
isolates the type-punning mostly "within" the intrinsic, and
which seems to match how other code uses it.
Use native thread-locals when available and simulated
thread-locals when not. The simulation layer uses
pthread_getspecific.
Using TLS is significantly more annoying this way, but I kindof
like it because it reinforces that TLS accesses aren't as cheap
as they look.
In derivatives of loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime which contains a bump-pointer allocator.
When a function contains a differentiated loop, the closure context is a `Builtin.NativeObject`, which contains a `swift::AutoDiffLinearMapContext` and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously directly partial-applied into the pullback). In branching trace enums, the payloads of previously indirect cases will be allocated by `swift::AutoDiffLinearMapContext::allocate` and stored as a `Builtin.RawPointer`.
Switch the contract between the runtime operation `swift_future_task_wait`
and Task.Handle.get() pver to an asynchronous call, so that the
compiler will set up the resumption frame for us. This allows us to
correctly wait on futures.
Update our "basic" future test to perform both normal returns and
throwing returns from a future, either having to wait on the queue or
coming by afterward.
Thick async functions store their async context size in the closure
context. Only if the closure context is nil can we assume the
partial_apply_forwarder function to be the address of an async function
pointer struct value.
Introduce `FutureAsyncContext` to line up with the async context formed
by IR generation for the type `<T> () async throws -> T`. When allocating
a future task, set up the context with the address of the future's storage
for the successful result and null out the error result, so the caller
will directly fill in the result. This eliminates a bunch of extra
complexity and a copy.
Use a single atomic for the wait queue that combines the status with
the first task in the queue. Address race conditions in waiting and
completing the future.
Thanks to John for setting the direction here for me.
Extend AsyncTask and the concurrency runtime with basic support for
task futures. AsyncTasks with futures contain a future fragment with
information about the type produced by the future, and where the
future will put the result value or the thrown error in the initial
context.
We still don't have the ability to schedule the waiting tasks on an
executor when the future completes, so this isn't useful for anything
just test, and we can only test limited code paths.
`Builtin.createAsyncTask` takes flags, an optional parent task, and an
async/throwing function to execute, and passes it along to the
`swift_task_create_f` entry point to create a new (potentially child)
task, returning the new task and its initial context.
os_unfair_lock is much smaller than pthread_mutex_t (4 bytes versus 64) and a bit faster.
However, it doesn't support condition variables. Most of our uses of Mutex don't use condition variables, but a few do. Introduce ConditionMutex and StaticConditionMutex, which allow condition variables and continue to use pthread_mutex_t.
On all other platforms, we continue to use the same backing mutex type for both Mutex and ConditionMutex.
rdar://problem/45412121
Implement a new builtin, `cancelAsyncTask()`, to cancel the given
asynchronous task. This lowers down to a call into the runtime
operation `swift_task_cancel()`.
Use this builtin to implement Task.Handle.cancel().
* [Runtime] Switch MetadataCache to ConcurrentReadableHashMap.
Use StableAddressConcurrentReadableHashMap. MetadataCacheEntry's methods for awaiting a particular state assume a stable address, where it will repeatedly examine `this` in a loop while waiting on a condition variable, so we give it a stable address to accommodate that. Some of these caches may be able to tolerate unstable addresses if this code is changed to perform the necessary table lookup each time through the loop instead. Some of them store metadata inline and we assume metadata never moves, so they'll have to stay this way.
* Have StableAddressConcurrentReadableHashMap remember the last found entry and check that before doing a more expensive lookup.
* Make a SmallMutex type that stores the mutex data out of line, and use it to get LockingConcurrentMapStorage to fit into the available space on 32-bit.
rdar://problem/70220660
Add a new entry point for getting generic metadata which adds the
canonical metadata records attached to the nominal type descriptor to
the metadata cache.
Change the implementation of the primary entry-point
swift_getGenericMetadata to stop looking through canonical
prespecialized records.
Change the implementation of swift_getCanonicalSpecializedMetadata to
use the caching token attached to the nominal type descriptor to add
canonical prespecialized metadata records to the metadata cache only
once rather than using the cache variables to limit the number of times
the attempt was made.