Introduce `FutureAsyncContext` to line up with the async context formed
by IR generation for the type `<T> () async throws -> T`. When allocating
a future task, set up the context with the address of the future's storage
for the successful result and null out the error result, so the callee
will directly fill in the result. This eliminates a bunch of extra
complexity and a copy.
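Roughly the shape this implies (a sketch; field names are assumptions, not the exact runtime declarations):

```cpp
// Sketch only: stand-ins for the real runtime types.
struct OpaqueValue;
struct SwiftError;
struct AsyncContext { /* resumption function, parent context, flags... */ };

// The future task's initial context lines up with what IR generation
// expects for `<T> () async throws -> T`: an error slot plus an indirect
// result pointer aimed at the future's own storage.
struct FutureAsyncContext : AsyncContext {
  SwiftError *errorResult = nullptr; // nulled out at allocation time
  OpaqueValue *indirectResult;       // address of the future's storage
};
```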
Extend AsyncTask and the concurrency runtime with basic support for
task futures. AsyncTasks with futures contain a future fragment with
information about the type produced by the future, and where the
future will put the result value or the thrown error in the initial
context.
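Schematically, the fragment might look like this (a sketch under assumed names, not the actual layout):

```cpp
#include <atomic>

struct OpaqueValue;
struct SwiftError;
struct Metadata;
struct AsyncTask;

// Trails the AsyncTask allocation when the task carries a future.
struct FutureFragment {
  std::atomic<AsyncTask *> waitQueue{nullptr}; // tasks awaiting the result
  const Metadata *resultType;                  // type the future produces
  SwiftError *error = nullptr;                 // thrown error, if any
  // The result value lives after the fragment, laid out according to
  // resultType's value witnesses; the initial context points at it.
  OpaqueValue *getResultStorage();
};
```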
We still don't have the ability to schedule the waiting tasks on an
executor when the future completes, so this isn't useful for anything
but testing, and we can only exercise limited code paths.
os_unfair_lock is much smaller than pthread_mutex_t (4 bytes versus 64) and a bit faster.
However, it doesn't support condition variables. Most of our uses of Mutex don't use condition variables, but a few do. Introduce ConditionMutex and StaticConditionMutex, which allow condition variables and continue to use pthread_mutex_t.
On all other platforms, we continue to use the same backing mutex type for both Mutex and ConditionMutex.
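A minimal sketch of the platform selection, assuming a typedef-based scheme:

```cpp
#include <pthread.h>
#if defined(__APPLE__)
#include <os/lock.h>
// Mutex: 4 bytes and a bit faster, but unusable with condition variables.
using MutexHandle = os_unfair_lock;
// ConditionMutex: keeps pthread_mutex_t so pthread_cond_t still works.
using ConditionMutexHandle = pthread_mutex_t;
#else
// Elsewhere, both flavors share the same backing type.
using MutexHandle = pthread_mutex_t;
using ConditionMutexHandle = pthread_mutex_t;
#endif
```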
rdar://problem/45412121
* [Runtime] Switch MetadataCache to ConcurrentReadableHashMap.
Use StableAddressConcurrentReadableHashMap. MetadataCacheEntry's methods for awaiting a particular state assume a stable address, where it will repeatedly examine `this` in a loop while waiting on a condition variable, so we give it a stable address to accommodate that. Some of these caches may be able to tolerate unstable addresses if this code is changed to perform the necessary table lookup each time through the loop instead. Some of them store metadata inline and we assume metadata never moves, so they'll have to stay this way.
* Have StableAddressConcurrentReadableHashMap remember the last found entry and check that before doing a more expensive lookup; see the sketch after this list.
* Make a SmallMutex type that stores the mutex data out of line, and use it to get LockingConcurrentMapStorage to fit into the available space on 32-bit.
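A sketch of that last-found fast path (simplified; `matchesKey` and the slow path are assumed names):

```cpp
#include <atomic>

template <class ElemTy>
struct StableAddressMapSketch {
  std::atomic<ElemTy *> LastFound{nullptr};

  template <class KeyTy>
  ElemTy *getOrInsert(const KeyTy &key) {
    // Entries have stable addresses, so a stale pointer is still safe to
    // examine; if the key matches, skip the full hash-table lookup.
    if (auto *last = LastFound.load(std::memory_order_acquire))
      if (last->matchesKey(key))
        return last;
    ElemTy *entry = lookupOrInsertSlow(key); // full table operation
    LastFound.store(entry, std::memory_order_release);
    return entry;
  }

  template <class KeyTy>
  ElemTy *lookupOrInsertSlow(const KeyTy &key);
};
```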
rdar://problem/70220660
There are things about this that I'm far from sold on. In
particular, I'm concerned that in order to implement escalation
correctly, we're going to have to add a status record for the
fact that the task is being executed, which means we're going
to have to potentially wait to acquire the status lock; overall,
that means making an extra runtime function call and doing some
atomics whenever we resume or suspend a task, which is an
uncomfortable amount of overhead.
The testing here is pretty grossly inadequate, but I wanted to
lay down the groundwork.
This gives us faster lookups and a small advantage in memory usage. Most of these maps need stable addresses for their entries, so we add a level of indirection to ConcurrentReadableHashMap for these cases to accommodate that. This costs some extra memory, but it's still a net win.
A new StableAddressConcurrentReadableHashMap type handles this indirection and adds a convenience getOrInsert to take advantage of it.
ConcurrentReadableHashMap is tweaked to avoid any global constructors or destructors when using it as a global variable.
ForeignWitnessTables does not need stable addresses and it now uses ConcurrentReadableHashMap directly.
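The indirection might be pictured like this; this is a hedged sketch, not the real template, and `ElementWrapperSketch` is an invented name:

```cpp
#include <utility>

// The table's elements are thin wrappers around out-of-line allocations,
// so a table resize moves only the wrappers; the entries themselves, and
// any metadata stored inline in them, never move.
template <class ElemTy>
struct ElementWrapperSketch {
  ElemTy *Ptr; // allocated once, never freed; its address is stable

  template <class KeyTy, class... ArgTys>
  static ElementWrapperSketch create(const KeyTy &key, ArgTys &&...args) {
    return {new ElemTy(key, std::forward<ArgTys>(args)...)};
  }
};
```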
rdar://problem/70056398
- Remove references to swiftImageInspectionShared on Linux and don't have
  split libraries between static and non-static linked executables.
- -static-executable now links correctly on Linux.
Note: swift::lookupSymbol() will not work on statically linked
executables, but this can be worked around by using the
https://github.com/swift-server/swift-backtrace package.
ConcurrentReadableHashMap is lock-free for readers, with writers using a lock to
ensure mutual exclusion amongst each other. The intent is to eventually replace
all uses of ConcurrentMap with ConcurrentReadableHashMap.
ConcurrentReadableHashMap provides relatively quick lookups by using a hash
table. Readers perform an atomic increment/decrement in order to inform writers
that there are active readers. The design attempts to minimize wasted memory by
storing the actual elements out-of-line, and having the table store indices into
a separate array of elements.
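A minimal sketch of the reader side, assuming a `ReaderCount` atomic and invented helper names:

```cpp
#include <atomic>
#include <cstdint>

// Simplified reader protocol: a reader announces itself with an atomic
// increment, probes the table of indices and then the element array, and
// decrements when done. Writers never free memory readers may still see.
struct HashMapSketch {
  std::atomic<uint32_t> ReaderCount{0};

  template <class ElemTy, class KeyTy>
  void withElement(const KeyTy &key, void (*body)(const ElemTy *)) {
    ReaderCount.fetch_add(1, std::memory_order_acquire); // lock-free
    // Hash probe into the index table, then read the out-of-line element.
    const ElemTy *elem = lookupInTable<ElemTy>(key);
    body(elem); // element stays valid while our reader count is held
    ReaderCount.fetch_sub(1, std::memory_order_release);
  }

  template <class ElemTy, class KeyTy>
  const ElemTy *lookupInTable(const KeyTy &key);
};
```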
The protocol conformance cache now uses ConcurrentReadableHashMap, which
provides faster lookups and less memory use than the previous ConcurrentMap
implementation. The previous implementation cached
ProtocolConformanceDescriptors and extracted the WitnessTable after the cache
lookup. The new implementation directly caches the WitnessTable, removing an
extra step (potentially a quite slow one) from the fast path.
The previous implementation used a generational scheme to detect when negative
cache entries became obsolete due to new dynamic libraries being loaded, and
update them in place. The new implementation just clears the entire cache when
libraries are loaded, greatly simplifying the code and saving the memory needed
to track the current generation in each negative cache entry. This means we need
to re-cache all requested conformances after loading a dynamic library, but
loading libraries at runtime is rare and slow anyway.
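Schematically, and with invented names, the simplification amounts to:

```cpp
struct ConformanceCacheSketch {
  void clear(); // drops positive and negative entries alike
};
ConformanceCacheSketch ConformanceCache;

// Called from the image-load callback: loading libraries at runtime is
// rare and slow anyway, so wholesale invalidation is acceptable.
void onDynamicLibraryLoaded() {
  ConformanceCache.clear(); // everything re-caches lazily on next lookup
}
```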
rdar://problem/67268325
Since libDemangling is included in the Swift standard library,
ODR violations can occur on platforms that allow statically
linking stdlib if Swift code is linked with other compiler
libraries that also transitively pull in libDemangling, and if
the stdlib version and compiler version do not match exactly
(even down to commit drift between releases). This lets the
runtime conditionally segregate its copies of the libDemangling
symbols from those in the compiler using an inline namespace
without affecting usage throughout the source.
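A hypothetical sketch of the mechanism (the macro and namespace names here are illustrative):

```cpp
// When building the runtime's copy, wrap the demangler in an inline
// namespace. Qualified names in source are unchanged, but the linkage
// names differ, so the runtime's symbols cannot collide with a compiler
// library's copy of the same code.
#if defined(SWIFT_RUNTIME_BUILD)
#define DEMANGLE_NAMESPACE_BEGIN inline namespace __runtime {
#define DEMANGLE_NAMESPACE_END }
#else
#define DEMANGLE_NAMESPACE_BEGIN
#define DEMANGLE_NAMESPACE_END
#endif

namespace swift {
namespace Demangle {
DEMANGLE_NAMESPACE_BEGIN

class Node { /* ... */ }; // usable as swift::Demangle::Node either way

DEMANGLE_NAMESPACE_END
} // namespace Demangle
} // namespace swift
```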
`swiftDemangling` was built three times:
1. swiftc
2. swiftRuntime
3. swiftReflection
Fold the last two instances into a single build, sharing the objects
across both the target libraries. This ensures that `swiftDemangling`
is built with the same compiler as the target libraries and that the
target library build remains self-contained.
Rather than build multiple copies of LLVMSupport (4x!), build it once and
merge it into the various targets. Ideally this would not need to be named
explicitly everywhere, but that requires using `add_library`
rather than `add_swift_target_library`.
This adds a new copy of LLVMSupport into the runtime. This is the final
step before changing the inline namespace for the runtime support. This
will allow us to avoid the ODR violations from the header definitions of
LLVMSupport.
LLVMSupport forked at: 22492eead218ec91d349c8c50439880fbeacf2b7
Changes made to LLVMSupport from that revision:
- process.inc forward declares `_beginthreadex` because of compilation issues
  caused by custom flag handling.
- API changes required that we alter the `Deallocate` routine to account
  for the alignment.
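The altered routine presumably has roughly this shape (illustrative only):

```cpp
#include <cstddef>
#include <cstdlib>

// The newer allocator interface threads the alignment through Deallocate,
// so the forked copy must accept the extra parameter even where the
// underlying deallocator ignores it.
void Deallocate(const void *Ptr, size_t /*Size*/, size_t /*Alignment*/) {
  std::free(const_cast<void *>(Ptr));
}
```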
This is a temporary state, meant to simplify the process. We do not use
the entire LLVMSupport library and there is no value in keeping the
entire library. Subsequent commits will prune the library to the needs
of the runtime.
SWIFT_HOST_VARIANT is lowercase. SWIFT_HOST_VARIANT_SDK is uppercase.
Compare against the right one.
This tries to fix #29805, which didn't fix the CI for Windows VS2017.
The long-running runtime tests depend on the Objective-C runtime on
Darwin, so link it directly rather than relying on various uses of
__builtin_available to link CoreFoundation for us.
This reverts commit efaf1fbefa.
Add a much more palatable workaround for the unit tests. Rather than
adding the dllimport for the symbols, locally define the required
symbols. This list is sufficient to restore the ability to build tests
for Windows.
The restrictions on std::atomic make it fragile to have a vector of them, and this failed to compile on Linux in CI.
While I'm in there, clean up a couple of tests that repeated a raw `10` for the thread count.
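For illustration, this is the fragility in question:

```cpp
#include <atomic>
#include <vector>

void sketch(unsigned threadCount) {
  // std::atomic is neither copyable nor movable, so vector operations
  // that relocate elements do not compile:
  //   std::vector<std::atomic<unsigned>> bad;
  //   bad.push_back(1);  // error: deleted copy/move constructor
  //
  // Sizing the vector once at construction and never growing it works,
  // but it is easy to break, which is what bit the Linux CI build.
  std::vector<std::atomic<unsigned>> results(threadCount);
  results[0].store(1, std::memory_order_relaxed);
}
```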
rdar://problem/49709062
Many of these tests would fail if one of the test threads didn't begin execution within 100 milliseconds. Such a thread would evaluate `while (!done)` for the first time after `done` was already set and never execute the loop body. That does happen from time to time, and the result would be a spurious failure. Change the loops to `do {} while (!done)` to ensure they always execute at least one iteration.
Many tests also had the test threads concurrently write results to a std::vector, which appeared to be causing some failures in my local testing when I had extra sleeps inserted to simulate pathological scheduling. Change the results vectors to be vectors of std::atomic to ensure that this concurrent writing is safe.
These changes shouldn't affect its ability to test the functionality it intends to test.
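Schematically (the worker bodies are placeholders):

```cpp
#include <atomic>

std::atomic<bool> done{false};

// Before: a thread scheduled after `done` flips to true skips the body
// entirely and records no results, causing a spurious failure.
void workerBefore() {
  while (!done.load()) {
    // exercise the code under test, record a result
  }
}

// After: always run at least one iteration, regardless of scheduling.
void workerAfter() {
  do {
    // exercise the code under test, record a result
  } while (!done.load());
}
```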
rdar://problem/49386389
Unfortunately, ASAN breaks with the just-built compiler. The runtime
and the runtime tests should really use the same compiler. As a
workaround, if the host compiler is clang, just use that for the time
being. This should fix the build on the ASAN bots.
The runtime is meant to be built with the just-built clang (as it
absolutely requires clang), since the runtime is a target library. The host
tools can be built with the host compiler. Swap out the compiler for
the unittests as we do for the runtime itself.
Because we do not have proper libraries in our system, we cannot attach
interface link libraries, especially in light of the
incorporate_object_library implementation. Explicitly add the
dependency on DbgHelp for the unit tests.
Note that I've called out a couple of suspicious places where we
are requesting abstract metadata for superclasses but probably
need to be requesting something more complete.
This is essentially a long-belated follow-up to Arnold's #12606.
The key observation here is that the enum-tag-single-payload witnesses
are strictly more powerful than the XI witnesses: you can simulate
the XI witnesses by using an extra case count that's <= the XI count.
Of course the result is less efficient than the XI witnesses, but
that's less important than overall code size, and we can work on
fast-paths for that.
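Concretely, the simulation looks roughly like this (signatures simplified, names assumed):

```cpp
struct OpaqueValue;
struct Metadata;

// Assumed stand-in for the enum-tag-single-payload witness: returns 0 if
// *value holds a valid payload, or 1...numEmptyCases identifying which
// empty case is stored.
unsigned getEnumTagSinglePayload(const OpaqueValue *value,
                                 unsigned numEmptyCases,
                                 const Metadata *payloadType);

// The old XI witness, expressed in terms of the new one by treating the
// first `xiCount` extra inhabitants as empty cases (any count <= the XI
// count works, just less efficiently).
int getExtraInhabitantIndex(const OpaqueValue *value, unsigned xiCount,
                            const Metadata *type) {
  unsigned tag = getEnumTagSinglePayload(value, xiCount, type);
  return tag == 0 ? -1            // a valid value, not an extra inhabitant
                  : int(tag) - 1; // extra inhabitant index
}
```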
The extra inhabitant count is stored in a 32-bit field (always present)
following the ValueWitnessFlags, which now occupy a fixed 32 bits.
This inflates non-XI VWTs on 32-bit targets by a word, but the net effect
on XI VWTs is to shrink them by two words, which is likely to be the
more important change. Also, being able to access the XI count directly
should be a nice win.
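An illustrative layout, not the exact runtime definitions:

```cpp
#include <cstddef>
#include <cstdint>

struct ValueWitnessFlagsSketch {
  uint32_t Data; // now a fixed 32 bits on all targets
};

struct TypeLayoutSketch {
  size_t size;
  size_t stride;
  ValueWitnessFlagsSketch flags;
  uint32_t extraInhabitantCount; // always present, directly readable;
                                 // 0 for non-XI types
};
```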
Currently ignored, but this will allow future compilers to pass down source location information for cast
failure runtime errors without backward deployment constraints.