move comments to the wired up continuations
remove duplicated continuations; keep the wired up ones
before moving to C++ for queue impl
trying to make next() wait via channel_poll
submitting works; need to implement next()
It would be more abstractly correct if this got DI support so
that we destroy the member if the constructor terminates
abnormally, but we can get to that later.
We expect to iterate on this quite a bit, both publicly
and internally, but this is a fine starting-point.
I've renamed runAsync to runAsyncAndBlock to underline
very clearly what it does and why it's not long for this
world. I've also had to give it a radically different
implementation in an effort to make it continue to work
given an actor implementation that is no longer just
running all work synchronously.
The major remaining bit of actor-scheduling work is to
make swift_task_enqueue actually do something sensible
based on the executor it's been given; currently it's
expecting a flag that IRGen simply doesn't know to set.
Credit for the cmake fix here goes to Saleem Abdulrasool.
The substantive fix is embarrassing; I didn't pay close attention
to the intrinsic's argument order and just assumed that the first
argument for the replacement value was the low half (the part
you'd find at index 0 if it were an array), but in fact it's the
high half (the part you'd find at index 1).
I also changed the code to be much more reinterpret_casty, which
isolates the type-punning mostly "within" the intrinsic, and
which seems to match how other code uses it.
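To make the argument-order trap concrete, here is a hypothetical
illustration (not the code from this change) using _mm_set_epi64x, a real
intrinsic with the same convention: its first argument is the high half
(index 1), and the type-punning sits right at the intrinsic boundary.

    #include <immintrin.h>
    #include <cassert>
    #include <cstdint>

    int main() {
      // Wrong mental model: "the first argument lands at index 0".
      // Correct model: the first argument (e1) is the HIGH half, index 1.
      __m128i v = _mm_set_epi64x(/*e1, high*/ 0x1111, /*e0, low*/ 0x2222);

      // Keep the punning "within" the intrinsic's boundary.
      const uint64_t *halves = reinterpret_cast<const uint64_t *>(&v);
      assert(halves[0] == 0x2222); // index 0 holds the second argument
      assert(halves[1] == 0x1111); // index 1 holds the first argument
      return 0;
    }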
Use native thread-locals when available and simulated
thread-locals when not. The simulation layer uses
pthread_getspecific.
Using TLS is significantly more annoying this way, but I kind of
like it because it reinforces that TLS accesses aren't as cheap
as they look.
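A minimal sketch of the simulated path, assuming pointer-sized values (the
wrapper name is illustrative, not the runtime's actual API):

    #include <pthread.h>

    // Every get/set is a real function call through the pthreads TLS API,
    // unlike native thread_local accesses, which compile to cheap loads.
    class SimulatedThreadLocal {
      pthread_key_t key;
    public:
      SimulatedThreadLocal() { (void)pthread_key_create(&key, nullptr); }
      ~SimulatedThreadLocal() { (void)pthread_key_delete(&key); }
      void *get() const { return pthread_getspecific(key); }
      void set(void *value) { (void)pthread_setspecific(key, value); }
    };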
In derivatives of loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime which contains a bump-pointer allocator.
When a function contains a differentiated loop, the closure context is a `Builtin.NativeObject`, which contains a `swift::AutoDiffLinearMapContext` and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously directly partial-applied into the pullback). In branching trace enums, the payloads of previously indirect cases will be allocated by `swift::AutoDiffLinearMapContext::allocate` and stored as a `Builtin.RawPointer`.
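A minimal sketch of the bump-pointer idea (the real
swift::AutoDiffLinearMapContext lives in the runtime; everything below is
an illustration, not its actual declaration):

    #include <algorithm>
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // Payloads are carved out of large slabs instead of one heap box
    // each; everything is freed at once when the context is destroyed.
    class BumpPointerAllocator {
      static constexpr size_t SlabSize = 4096;
      std::vector<void *> slabs;
      char *cursor = nullptr;
      size_t remaining = 0;

    public:
      void *allocate(size_t size) {
        // Round up so the cursor stays maximally aligned.
        size = (size + alignof(std::max_align_t) - 1) &
               ~(alignof(std::max_align_t) - 1);
        if (size > remaining) {
          size_t slabSize = std::max(SlabSize, size);
          cursor = static_cast<char *>(std::malloc(slabSize));
          slabs.push_back(cursor);
          remaining = slabSize;
        }
        void *result = cursor;
        cursor += size;
        remaining -= size;
        return result;
      }
      ~BumpPointerAllocator() {
        for (void *slab : slabs)
          std::free(slab);
      }
    };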
Switch the contract between the runtime operation `swift_task_future_wait`
and Task.Handle.get() over to an asynchronous call, so that the
compiler will set up the resumption frame for us. This allows us to
correctly wait on futures.
Update our "basic" future test to perform both normal returns and
throwing returns from a future, either having to wait on the queue or
coming by afterward.
Thick async functions store their async context size in the closure
context. Only if the closure context is nil can we assume the
partial_apply_forwarder function to be the address of an async function
pointer struct value.
Introduce `FutureAsyncContext` to line up with the async context formed
by IR generation for the type `<T> () async throws -> T`. When allocating
a future task, set up the context with the address of the future's storage
for the successful result and null out the error result, so the caller
will directly fill in the result. This eliminates a bunch of extra
complexity and a copy.
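A sketch of the layout being described (the stub types and field names are
assumptions for illustration, not the runtime's exact declarations):

    // Stand-ins for runtime types, just to make the sketch self-contained.
    struct AsyncContext {};
    struct SwiftError;
    struct OpaqueValue;

    struct FutureAsyncContext : AsyncContext {
      SwiftError *errorResult = nullptr;     // nulled at task creation
      OpaqueValue *indirectResult = nullptr; // aimed at the future's storage
    };

Because the result is written straight through indirectResult, no separate
copy into the future's storage is needed afterward.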
Use a single atomic for the wait queue that combines the status with
the first task in the queue. Address race conditions in waiting and
completing the future.
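A sketch of the single-atomic idea, packing the status into the low bits of
the head-of-queue pointer (the bit assignments here are assumptions):

    #include <atomic>
    #include <cstdint>

    struct AsyncTask; // opaque here

    enum : uintptr_t { StatusMask = 0x3 }; // low bits of an aligned pointer

    struct WaitQueueWord {
      std::atomic<uintptr_t> bits{0};

      uintptr_t status() const {
        return bits.load(std::memory_order_acquire) & StatusMask;
      }
      AsyncTask *firstWaiter() const {
        return reinterpret_cast<AsyncTask *>(
            bits.load(std::memory_order_acquire) & ~uintptr_t(StatusMask));
      }
      // A waiter enqueues itself with a CAS so it cannot race past a
      // concurrent completion, which publishes the final status instead.
      bool tryEnqueue(AsyncTask *task, uintptr_t &expected) {
        uintptr_t desired =
            reinterpret_cast<uintptr_t>(task) | (expected & StatusMask);
        return bits.compare_exchange_weak(expected, desired,
                                          std::memory_order_release,
                                          std::memory_order_acquire);
      }
    };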
Thanks to John for setting the direction here for me.
Extend AsyncTask and the concurrency runtime with basic support for
task futures. AsyncTasks with futures contain a future fragment with
information about the type produced by the future, and where the
future will put the result value or the thrown error in the initial
context.
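A sketch of what such a fragment might carry (names and layout are
assumptions based on the description above):

    #include <atomic>

    struct Metadata;   // runtime type metadata, opaque here
    struct SwiftError;
    struct OpaqueValue;
    struct AsyncTask;

    struct FutureFragment {
      std::atomic<AsyncTask *> waitQueue{nullptr}; // tasks awaiting the result
      const Metadata *resultType;  // type produced by the future
      SwiftError *error = nullptr; // filled in if the future throws
      // The result value's storage is tail-allocated after the fragment,
      // sized and aligned for resultType; the initial context points here.
      OpaqueValue *getResultStorage() {
        return reinterpret_cast<OpaqueValue *>(this + 1);
      }
    };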
We still don't have the ability to schedule the waiting tasks on an
executor when the future completes, so this isn't useful for anything
except testing, and we can only test limited code paths.
`Builtin.createAsyncTask` takes flags, an optional parent task, and an
async/throwing function to execute, and passes it along to the
`swift_task_create_f` entry point to create a new (potentially child)
task, returning the new task and its initial context.
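The shape of the call the builtin lowers to, sketched as a declaration (the
exact signature is an assumption for illustration):

    #include <cstddef>

    struct AsyncTask;
    struct AsyncContext;
    struct JobFlags { size_t bits; };               // task/child/future bits
    using ThinAsyncFunction = void(AsyncContext *); // illustrative

    struct AsyncTaskAndContext {
      AsyncTask *task;              // the new (potentially child) task
      AsyncContext *initialContext; // its initial async context
    };

    AsyncTaskAndContext swift_task_create_f(JobFlags flags, AsyncTask *parent,
                                            ThinAsyncFunction *function,
                                            size_t initialContextSize);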
os_unfair_lock is much smaller than pthread_mutex_t (4 bytes versus 64) and a bit faster.
However, it doesn't support condition variables. Most of our uses of Mutex don't use condition variables, but a few do. Introduce ConditionMutex and StaticConditionMutex, which allow condition variables and continue to use pthread_mutex_t.
On all other platforms, we continue to use the same backing mutex type for both Mutex and ConditionMutex.
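A sketch of the split on Apple platforms (the API shape is illustrative):

    #include <os/lock.h>
    #include <pthread.h>

    // 4-byte lock; no condition-variable support.
    class Mutex {
      os_unfair_lock lock_ = OS_UNFAIR_LOCK_INIT;
    public:
      void lock() { os_unfair_lock_lock(&lock_); }
      void unlock() { os_unfair_lock_unlock(&lock_); }
    };

    // Keeps pthread_mutex_t because pthread_cond_wait requires one.
    class ConditionMutex {
      pthread_mutex_t mutex_ = PTHREAD_MUTEX_INITIALIZER;
      pthread_cond_t cond_ = PTHREAD_COND_INITIALIZER;
    public:
      void lock() { pthread_mutex_lock(&mutex_); }
      void unlock() { pthread_mutex_unlock(&mutex_); }
      void wait() { pthread_cond_wait(&cond_, &mutex_); }
      void signal() { pthread_cond_signal(&cond_); }
      void broadcast() { pthread_cond_broadcast(&cond_); }
    };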
rdar://problem/45412121
Implement a new builtin, `cancelAsyncTask()`, to cancel the given
asynchronous task. This lowers down to a call into the runtime
operation `swift_task_cancel()`.
Use this builtin to implement Task.Handle.cancel().
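A sketch of the status transition swift_task_cancel performs (the bit
layout is an assumption):

    #include <atomic>
    #include <cstdint>

    struct ActiveTaskStatusBits {
      static constexpr uintptr_t IsCancelled = 0x1;
      std::atomic<uintptr_t> bits{0};

      // Returns true only for the call that actually performed the
      // transition, so cancellation work runs exactly once.
      bool tryCancel() {
        uintptr_t old = bits.fetch_or(IsCancelled, std::memory_order_acq_rel);
        return !(old & IsCancelled);
      }
      bool isCancelled() const {
        return bits.load(std::memory_order_acquire) & IsCancelled;
      }
    };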
* [Runtime] Switch MetadataCache to ConcurrentReadableHashMap.
Use StableAddressConcurrentReadableHashMap. MetadataCacheEntry's methods for awaiting a particular state assume a stable address, where it will repeatedly examine `this` in a loop while waiting on a condition variable, so we give it a stable address to accommodate that. Some of these caches may be able to tolerate unstable addresses if this code is changed to perform the necessary table lookup each time through the loop instead. Some of them store metadata inline and we assume metadata never moves, so they'll have to stay this way.
* Have StableAddressConcurrentReadableHashMap remember the last found entry and check that before doing a more expensive lookup.
* Make a SmallMutex type that stores the mutex data out of line, and use it to get LockingConcurrentMapStorage to fit into the available space on 32-bit.
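The last-found-entry check, sketched generically (names are illustrative):

    #include <atomic>

    // Check a racy-but-safe cache of the previous hit before paying for a
    // full table lookup; entries have stable addresses, so a stale pointer
    // is never dangling, merely a miss.
    template <class EntryTy, class KeyTy, class MapTy>
    EntryTy *findWithLastEntryCache(std::atomic<EntryTy *> &lastFound,
                                    MapTy &map, const KeyTy &key) {
      if (EntryTy *last = lastFound.load(std::memory_order_acquire))
        if (last->matches(key))
          return last;
      EntryTy *entry = map.find(key); // the more expensive lookup
      if (entry)
        lastFound.store(entry, std::memory_order_release);
      return entry;
    }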
rdar://problem/70220660
Add a new entry point for getting generic metadata which adds the
canonical metadata records attached to the nominal type descriptor to
the metadata cache.
Change the implementation of the primary entry point
swift_getGenericMetadata to stop looking through canonical
prespecialized records.
Change the implementation of swift_getCanonicalSpecializedMetadata to
use the caching token attached to the nominal type descriptor to add
canonical prespecialized metadata records to the metadata cache only
once rather than using the cache variables to limit the number of times
the attempt was made.
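The only-once behavior, sketched with std::call_once standing in for the
descriptor's caching token (one token per descriptor is an assumption):

    #include <mutex>

    struct DescriptorCachingToken {
      std::once_flag flag;
    };

    void addPrespecializedRecordsOnce(DescriptorCachingToken &token) {
      std::call_once(token.flag, [] {
        // Add the canonical prespecialized metadata records attached to
        // the nominal type descriptor to the metadata cache.
      });
    }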
There are things about this that I'm far from sold on. In
particular, I'm concerned that in order to implement escalation
correctly, we're going to have to add a status record for the
fact that the task is being executed, which means we're going
to have to potentially wait to acquire the status lock; overall,
that means making an extra runtime function call and doing some
atomics whenever we resume or suspend a task, which is an
uncomfortable amount of overhead.
The testing here is pretty grossly inadequate, but I wanted to
lay down the groundwork.
This gives us faster lookups and a small advantage in memory usage. Most of these maps need stable addresses for their entries, so we add a level of indirection to ConcurrentReadableHashMap for these cases to accommodate that. This costs some extra memory, but it's still a net win.
A new StableAddressConcurrentReadableHashMap type handles this indirection and adds a convenience getOrInsert to take advantage of it.
ConcurrentReadableHashMap is tweaked to avoid any global constructors or destructors when using it as a global variable.
ForeignWitnessTables does not need stable addresses and it now uses ConcurrentReadableHashMap directly.
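The indirection, sketched (names are assumptions): the table's elements may
move when it resizes, but each element is just a pointer to a separately
allocated entry whose address never changes.

    #include <memory>
    #include <utility>

    // Table element: moves freely with the table; the pointee stays put.
    template <class EntryTy>
    struct StableAddressElement {
      std::unique_ptr<EntryTy> entry;

      template <class... Args>
      explicit StableAddressElement(Args &&...args)
          : entry(std::make_unique<EntryTy>(std::forward<Args>(args)...)) {}
    };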
rdar://problem/70056398