- stop storing the parent task in the TaskGroup at the .swift level
- make sure that swift_taskGroup_isCancelled is implied by the parent
task being cancelled
- make the TaskGroup structs frozen
- make the withTaskGroup functions inlinable
- remove swift_taskGroup_create
- teach IRGen to allocate memory for the task group
- don't deallocate the task group in swift_taskGroup_destroy
To achieve the allocation change, introduce paired create/destroy builtins.
Furthermore, remove the _swiftRetain and _swiftRelease functions and
several calls to them. Replace them with uses of the appropriate builtins.
I should probably change the builtins to return retained, since they're
working with a managed type, but I'll do that in a separate commit.
The immediate desire is to minimize the set of ABI dependencies
on the layout of an ExecutorRef. In addition to that, however,
I wanted to generally reduce the code size impact of an unsafe
continuation since it now requires accessing thread-local state,
and I wanted resumption to not have to create unnecessary type
metadata for the value type just to do the initialization.
Therefore, I've introduced a swift_continuation_init function
which handles the default initialization of a continuation
and returns a reference to the current task. I've also moved
the initialization of the normal continuation result into the
caller (out of the runtime), and I've moved the resumption-side
cmpxchg into the runtime (and prior to the task being enqueued).
Take the existing CompatibilityOverride mechanism and generalize it so it can be used in both the runtime and Concurrency libraries. The mechanism is preprocessor-heavy, so this requires some tricks. Use the SWIFT_TARGET_LIBRARY_NAME define to distinguish the libraries, and use a different .def file and mach-o section name accordingly.
We want the global/main executor functions to be a little more flexible. Instead of using the override mechanism, we expose function pointers that can be set by the compatibility library, or by any other code that wants to use a custom implementation.
rdar://73726764
Throwing functions pass the error result in `swiftself` to the resume
partial function.
Therefore, `() async -> ()` to `() async throws -> ()` is not ABI compatible.
TODO: go through remaining failing IRGen async tests and replace the
illegal convert_functions.
Most of the async runtime functions have been changed to not
expect the task and executor to be passed in. When knowing the
task and executor is necessary, there are runtime functions
available to recover them.
The biggest change I had to make to a runtime function signature
was to swift_task_switch, which has been altered to expect to be
passed the context and resumption function instead of requiring
the caller to park the task. This has the pleasant consequence
of allowing the implementation to very quickly turn around when
it recognizes that the current executor is satisfactory. It does
mean that on arm64e we have to sign the continuation function
pointer as an argument and then potentially resign it when
assigning into the task's resume slot.
rdar://70546948
This is conditional on UseAsyncLowering and in the future should also be
conditional on `clangTargetInfo.isSwiftAsyncCCSupported()` once that
support is merged.
Update tests to work either with swiftcc or swifttailcc.
We want to use the 'just built' compiler for the runtime tests because
of the recently added `swiftasync` parameter support.
But ASAN won't link if we use the 'just built' compiler because libgtest
and LLVMSupport are compiled by the host compiler.
Disable the runtime test when running under ASAN.
rdar://73664504
* Adds support for generating code that uses swiftasync parameter lowering.
* Currently only arm64's llvm lowering supports the swift_async_context_addr intrinsic.
* Add arm64e pointer signing of updated swift_async_context_addr.
This commit needs the PR llvm-project#2291.
* [runtime] unittests should use just-built compiler if the runtime did
This will start to matter with the introduction of usage of swiftasync parameters which only very recent compilers support.
rdar://71499498
When reallocating storage, the storing the pointer to the new storage had insufficiently strong ordering. This could cause writers to check the reader count before storing the new storage pointer. If a reader then came in between that load and store, it would end up using the old storage pointer, while the writer could end up freeing it.
Also adjust the Concurrent.cpp tests to test with a variety of reader and writer counts. Counterintuitively, when freeing garbage is gated on there being zero readers, having a single reader will shake out problems that having lots of readers will not. We were testing with lots of readers in order to stress the code as much as possible, but this resulted in it being extremely rare for writers to ever see zero active readers.
rdar://69798617
A StackAllocator performs fast allocation and deallocation of memory by implementing a bump-pointer allocation strategy.
In contrast to a pure bump-pointer allocator, it's possible to free memory.
Allocations and deallocations must follow a strict stack discipline.
In general, slabs which become unused are _not_ freed, but reused for subsequent allocations.
The first slab can be placed into pre-allocated memory.
Introduce `FutureAsyncContext` to line up with the async context formed
by IR generation for the type `<T> () async throws -> T`. When allocating
a future task, set up the context with the address of the future's storage
for the successful result and null out the error result, so the caller
will directly fill in the result. This eliminates a bunch of extra
complexity and a copy.
Extend AsyncTask and the concurrency runtime with basic support for
task futures. AsyncTasks with futures contain a future fragment with
information about the type produced by the future, and where the
future will put the result value or the thrown error in the initial
context.
We still don't have the ability to schedule the waiting tasks on an
executor when the future completes, so this isn't useful for anything
just test, and we can only test limited code paths.
os_unfair_lock is much smaller than pthread_mutex_t (4 bytes versus 64) and a bit faster.
However, it doesn't support condition variables. Most of our uses of Mutex don't use condition variables, but a few do. Introduce ConditionMutex and StaticConditionMutex, which allow condition variables and continue to use pthread_mutex_t.
On all other platforms, we continue to use the same backing mutex type for both Mutex and ConditionMutex.
rdar://problem/45412121
* [Runtime] Switch MetadataCache to ConcurrentReadableHashMap.
Use StableAddressConcurrentReadableHashMap. MetadataCacheEntry's methods for awaiting a particular state assume a stable address, where it will repeatedly examine `this` in a loop while waiting on a condition variable, so we give it a stable address to accommodate that. Some of these caches may be able to tolerate unstable addresses if this code is changed to perform the necessary table lookup each time through the loop instead. Some of them store metadata inline and we assume metadata never moves, so they'll have to stay this way.
* Have StableAddressConcurrentReadableHashMap remember the last found entry and check that before doing a more expensive lookup.
* Make a SmallMutex type that stores the mutex data out of line, and use it to get LockingConcurrentMapStorage to fit into the available space on 32-bit.
rdar://problem/70220660
There are things about this that I'm far from sold on. In
particular, I'm concerned that in order to implement escalation
correctly, we're going to have to add a status record for the
fact that the task is being executed, which means we're going
to have to potentially wait to acquire the status lock; overall,
that means making an extra runtime function call and doing some
atomics whenever we resume or suspend a task, which is an
uncomfortable amount of overhead.
The testing here is pretty grossly inadequate, but I wanted to
lay down the groundwork here.
This gives us faster lookups and a small advantage in memory usage. Most of these maps need stable addresses for their entries, so we add a level of indirection to ConcurrentReadableHashMap for these cases to accommodate that. This costs some extra memory, but it's still a net win.
A new StableAddressConcurrentReadableHashMap type handles this indirection and adds a convenience getOrInsert to take advantage of it.
ConcurrentReadableHashMap is tweaked to avoid any global constructors or destructors when using it as a global variable.
ForeignWitnessTables does not need stable addresses and it now uses ConcurrentReadableHashMap directly.
rdar://problem/70056398
- Remove references to swiftImageInspectionShared on Linux and dont have
split libraries between static/non-static linked executables.
- -static-executable now links correctly Linux.
Note: swift::lookupSymbol() will not work on statically linked
executables but this can be worked around by using the
https://github.com/swift-server/swift-backtrace package.