In a discussion with @rjmccall, we agreed that these should really be
UNKNOWN_MEMEFFECTS so we are conservative.
Just slicing this off from a larger patch stream.
Add SWIFT_DEBUG_ENABLE_TASK_SLAB_ALLOCATOR, which is on by default. When turned off, async stack allocations call through to malloc/free. This allows memory debugging tools to be used on async stack allocations.
In embedded mode, some things call retain/release using the C++ declarations, but the implementations are in Swift. The Swift implementations don't use preservemost, so the C++ declarations must not declare preservemost in that context.
rdar://163940783
This is necessary because we need to model its stack-allocation
behavior, although I'm not yet doing that in this patch because
StackNesting first needs to be taught to not try to move the
deallocation.
I'm not convinced that `async let` *should* be doing a stack allocation,
but it undoubtedly *is* doing a stack allocation, and until we have an
alternative to that, we will need to model it properly.
functions.
These were introduced in an early draft implementation of async let, but
never used by a released compiler. They are not used as symbols by any
app binaries. There's no reason to keep carrying them.
While I'm at it, dramatically improve the documentation of the remaining
async let API functions.
This is currently disabled by default. Building the client library can be enabled with the CMake option SWIFT_BUILD_CLIENT_RETAIN_RELEASE, and using the library can be enabled with the flags -Xfrontend -enable-client-retain-release.
To improve retain/release performance, we build a static library containing optimized implementations of the fast paths of swift_retain, swift_release, and the corresponding bridgeObject functions. This avoids going through a stub to make a cross-library call.
IRGen gains awareness of these new functions and emits calls to them when the functionality is enabled and the target supports them. Two options are added to force use of them on or off: -enable-client-retain-release and -disable-client-retain-release. When enabled, the compiler auto-links the static library containing the implementations.
The new calls also use LLVM's preserve_most calling convention. Since retain/release doesn't need a large number of scratch registers, this is mostly harmless for the implementation, while allowing callers to improve code size and performance by spilling fewer registers around refcounting calls. (Experiments with an even more aggressive calling convention preserving x2 and up showed an insignificant savings in code size, so preserve_most seems to be a good middle ground.)
Since the implementations are embedded into client binaries, any change in the runtime's refcounting implementation needs to stay compatible with this new fast path implementation. This is ensured by having the implementation use a runtime-provided mask to check whether it can proceed into its fast path. The mask is provided as the address of the absolute symbol _swift_retainRelease_slowpath_mask_v1. If that mask ANDed with the object's current refcount field is non-zero, then we take the slow path. A future runtime that changes the refcounting implementation can adjust this mask to match, or set the mask to all 1s to disable the old embedded fast path entirely (as long as the new representation never uses 0 as a valid refcount field value).
As part of this work, the overall approach for bridgeObjectRetain is changed slightly. Previously, it would mask off the spare bits from the native pointer and then call through to swift_retain. This either lost the spare bits in the return value (when tail calling swift_retain) which is problematic since it's supposed to return its parameter, or it required pushing a stack frame which is inefficient. Now, swift_retain takes on the responsibility of masking off spare bits from the parameter and preserving them in the return value. This is a trivial addition to the fast path (just a quick mask and an extra register for saving the original value) and makes bridgeObjectRetain quite a bit more efficient when implemented correctly to return the exact value it was passed.
The runtime's implementations of swift_retain/release are now also marked as preserve_most so that they can be tail called from the client library. preserve_most is compatible with callers expecting the standard calling convention so this doesn't break any existing clients. Some ugly tricks were needed to prevent the compiler from creating unnecessary stack frames with the new calling convention. Avert your eyes.
To allow back deployment, the runtime now has aliases for these functions called swift_retain_preservemost and swift_release_preservemost. The client library brings weak definitions of these functions that save the extra registers and call through to swift_retain/release. This allows them to work correctly on older runtimes, with a small performance penalty, while still running at full speed on runtimes that have the new preservemost symbols.
Although this is only supported on Darwin at the moment, it shouldn't be too much work to adapt it to other ARM64 targets. We need to ensure the assembly plays nice with the other platforms' assemblers, and make sure the implementation is correct for the non-ObjC-interop case.
rdar://122595871
We scan the target's initial allocation pool, and all 16kB heap allocations. We check each pointer-aligned offset within those areas, and try to read it as Swift metadata and get a name from it. If that fails, quietly move on. It's very unlikely for some random memory to look enough like Swift metadata for this to produce a name, so this works very well to print the generic metadata instantiated in the remote process without requiring `SWIFT_DEBUG_ENABLE_METADATA_ALLOCATION_ITERATION`.
rdar://161120936
swift_coroFrameAlloc was introduced in the Swift 6.2 runtime. Give it
the appropriate availability in IRGen, so that it gets weak
availability when needed (per the deployment target). Then, only
create the stub function for calling into swift_coroFrameAlloc or
malloc (when the former isn't available) when we're back-deploying to
a runtime prior to Swift 6.2. This is a small code size/performance
win when allocating coroutine frames on Swift 6.2-or-newer platforms.
This has a side effect of fixing a bug in Embedded Swift, where the
swift_coroFrameAlloc was getting unconditionally set to have weak
external linkage despite behind defined in the same LLVM module
(because it comes from the standard library).
Fixes rdar://149695139 / issue #80947.
This change adds a new type of cache (cache by type descriptor) to the protocol conformance lookup system. This optimization is beneficial for generic types, where the
same conformance can be reused across different instantiations of the generic type.
Key changes:
- Add a `GetOrInsertManyScope` class to `ConcurrentReadableHashMap` for performing
multiple insertions under a single lock
- Add type descriptor-based caching for protocol conformances
- Add environment variables for controlling and debugging the conformance cache
- Add tests to verify the behavior of the conformance cache
- Fix for https://github.com/swiftlang/swift/issues/82889
The implementation is controlled by the `SWIFT_DEBUG_ENABLE_CACHE_PROTOCOL_CONFORMANCES_BY_TYPE_DESCRIPTOR`
environment variable, which is enabled by default.
This reapplies https://github.com/swiftlang/swift/pull/82818 after it's been reverted in https://github.com/swiftlang/swift/pull/83770.
This change adds a new type of cache (cache by type descriptor) to the protocol conformance lookup system. This optimization is beneficial for generic types, where the
same conformance can be reused across different instantiations of the generic type.
Key changes:
- Add a `GetOrInsertManyScope` class to `ConcurrentReadableHashMap` for performing
multiple insertions under a single lock
- Add type descriptor-based caching for protocol conformances
- Add environment variables for controlling and debugging the conformance cache
- Add tests to verify the behavior of the conformance cache
- Fix for https://github.com/swiftlang/swift/issues/82889
The implementation is controlled by the `SWIFT_DEBUG_ENABLE_CACHE_PROTOCOL_CONFORMANCES_BY_TYPE_DESCRIPTOR`
environment variable, which is enabled by default.
When targeting a platform that predates the introduction of isolated
deinit, make a narrow exception that allows main-actor-isolated deinit
to work through a special, inlineable entrypoint that is
back-deployed. This implementation
1. Calls into the real implementation when available, otherwise
2. Checks if we're on the main thread, destroying immediately when
we are, otherwise
3. Creates a new task on the main actor to handle destruction.
This implementation is less efficient than the implementation in the
runtime, but allows us to back-deploy this functionality as far back
as concurrency goes.
Fixes rdar://151029118.
The implementation of the ObjC -retain method saved a local variable and returned that after calling swift_retain, which forced it to create a stack frame. swift_retain returns the object being retained, so we can take advantage of that to have the compiler emit a tail call to it instead.
Without this, llvm would sometimes wrongly assume there's no indirect
accesses and the optimizations can lead to a runtime crash, by
optimizing away initializing options properly.
Resolves rdar://152548190
This only modifies the runtime function `swift_task_enqueueGlobalWithDeadline` to take new clock primitive to
interoperate with existing dispatch wall clock values.
This changes the isIsolatingCurrentContext function to return `Bool?`
and removes all the witness table trickery we did previously to detect
if it was implemented or not. This comes at a cost of trying to invoke
it always, before `checkIsolated`, but it makes for an simpler
implementation and more checkable even by third party Swift code which
may want to ask this question.
Along with the `withSerialExecutor` function, this now enables us to
check the isolation at runtime when we have an `any Actor` e.g. from
`#isolation`.
Updates SE-0471 according to
https://forums.swift.org/t/se-0471-improved-custom-serialexecutor-isolation-checking-for-concurrency-runtime/78834/
review discussions
This memory is part of the conformance cache concurrent hash map, so
when we clear the conformance cache, record each of the allocated
pointers within the concurrent map's free list. This way, it'll be
freed with the rest of the concurrent map when it's safe to do so.
Previously there was still a sneaky hop which caused ordering issues.
This introduced a specific test startSynchronously_order which checks
that the task enqueues indeed are "immediate" and cleans up how we
handle this.
This also prepares for the being discussed in SE review direction of
this API that it SHOULD be ALLOWED to actually hop and NOT be
synchronous at all IF the isolation is specified on the closure and is
DIFFERENT than the callers dynamic isolation.
This effectively implements "synchronously run right now if dynamically
on the exact isolation as requested by the closure; otherwise enqueue
the task as usual".
resolves rdar://149284186
cc @drexin
Function types aren't always trivially copyable, e.g. with address-discriminated signed pointers on ARM64e. Introduce a function_cast helper and use that instead.
* [SUA][IRGen] Add stub for swift_coroFrameAlloc that weakly links against the runtime function
This commit modifies IRGen to emit a stub function `__swift_coroFrameAllocStub` instead of the
newly introduced swift-rt function `swift_coroFrameAlloc`. The stub checks whether the runtime has the symbol
`swift_coroFrameAlloc` and dispatches to it if it exists, uses `malloc` otherwise. This ensures the
ability to back deploy the feature to older OS targets.
rdar://145239850