Tweaked the comment in `Runtime/Config.h`.
Fixed a couple of incorrect ARM64 instruction mnemonics. This still needs
testing on ARM64 Windows.
Fixed an out-of-date comment in `swift-backtrace`.
Use a macro in `Backtrace.cpp` to guarantee we don't overrun the buffer,
and in the process simplify the code slightly.
rdar://101623384
Right now, the backtracer won't work on 32-bit Windows because we
aren't properly dealing with the `stdcall` calling convention on
the Win32 API calls.
rdar://101623384
This doesn't have a working symbolicator yet, but it does build and
it can obtain a basic backtrace.
It also doesn't include a working `swift-backtrace` program yet.
rdar://101623384
We should expose the demangle functionality; It's been widely used by
calling into internal _swift_demangle and instead we should offer a real
API. It's also already used in the Runtime module already when forming
backtraces.
[Previous
discussions](https://forums.swift.org/t/demangle-function/25416/15)
happened between 2019 and 2024, and just never materialized in a
complete implementation and proposal.
Right now, even more tools are in need of this API, as we are building
continious profiling solutions etc, so it is a good time to revisit this
topic.
This PR is roughly based off @Azoy's
https://github.com/swiftlang/swift/pull/25314/files#diff-fd379a721cc9a1c9ef6486eae713e945da842b42170d4d069029a57334371eba
from 2019, however it brings it over to the new Runtime module which is
a great place to put this functionality - even Backtrace had to recently
reinvent calling the demangling infra in this module.
Pending SE review, [proposal
here](https://github.com/swiftlang/swift-evolution/pull/2989).
cc @azoy @al45tair
---------
Co-authored-by: Alastair Houghton <alastair@alastairs-place.net>
In a discussion with @rjmccall, we agreed that these should really be
UNKNOWN_MEMEFFECTS so we are conservative.
Just slicing this off from a larger patch stream.
Add SWIFT_DEBUG_ENABLE_TASK_SLAB_ALLOCATOR, which is on by default. When turned off, async stack allocations call through to malloc/free. This allows memory debugging tools to be used on async stack allocations.
In embedded mode, some things call retain/release using the C++ declarations, but the implementations are in Swift. The Swift implementations don't use preservemost, so the C++ declarations must not declare preservemost in that context.
rdar://163940783
This is necessary because we need to model its stack-allocation
behavior, although I'm not yet doing that in this patch because
StackNesting first needs to be taught to not try to move the
deallocation.
I'm not convinced that `async let` *should* be doing a stack allocation,
but it undoubtedly *is* doing a stack allocation, and until we have an
alternative to that, we will need to model it properly.
functions.
These were introduced in an early draft implementation of async let, but
never used by a released compiler. They are not used as symbols by any
app binaries. There's no reason to keep carrying them.
While I'm at it, dramatically improve the documentation of the remaining
async let API functions.
This is currently disabled by default. Building the client library can be enabled with the CMake option SWIFT_BUILD_CLIENT_RETAIN_RELEASE, and using the library can be enabled with the flags -Xfrontend -enable-client-retain-release.
To improve retain/release performance, we build a static library containing optimized implementations of the fast paths of swift_retain, swift_release, and the corresponding bridgeObject functions. This avoids going through a stub to make a cross-library call.
IRGen gains awareness of these new functions and emits calls to them when the functionality is enabled and the target supports them. Two options are added to force use of them on or off: -enable-client-retain-release and -disable-client-retain-release. When enabled, the compiler auto-links the static library containing the implementations.
The new calls also use LLVM's preserve_most calling convention. Since retain/release doesn't need a large number of scratch registers, this is mostly harmless for the implementation, while allowing callers to improve code size and performance by spilling fewer registers around refcounting calls. (Experiments with an even more aggressive calling convention preserving x2 and up showed an insignificant savings in code size, so preserve_most seems to be a good middle ground.)
Since the implementations are embedded into client binaries, any change in the runtime's refcounting implementation needs to stay compatible with this new fast path implementation. This is ensured by having the implementation use a runtime-provided mask to check whether it can proceed into its fast path. The mask is provided as the address of the absolute symbol _swift_retainRelease_slowpath_mask_v1. If that mask ANDed with the object's current refcount field is non-zero, then we take the slow path. A future runtime that changes the refcounting implementation can adjust this mask to match, or set the mask to all 1s to disable the old embedded fast path entirely (as long as the new representation never uses 0 as a valid refcount field value).
As part of this work, the overall approach for bridgeObjectRetain is changed slightly. Previously, it would mask off the spare bits from the native pointer and then call through to swift_retain. This either lost the spare bits in the return value (when tail calling swift_retain) which is problematic since it's supposed to return its parameter, or it required pushing a stack frame which is inefficient. Now, swift_retain takes on the responsibility of masking off spare bits from the parameter and preserving them in the return value. This is a trivial addition to the fast path (just a quick mask and an extra register for saving the original value) and makes bridgeObjectRetain quite a bit more efficient when implemented correctly to return the exact value it was passed.
The runtime's implementations of swift_retain/release are now also marked as preserve_most so that they can be tail called from the client library. preserve_most is compatible with callers expecting the standard calling convention so this doesn't break any existing clients. Some ugly tricks were needed to prevent the compiler from creating unnecessary stack frames with the new calling convention. Avert your eyes.
To allow back deployment, the runtime now has aliases for these functions called swift_retain_preservemost and swift_release_preservemost. The client library brings weak definitions of these functions that save the extra registers and call through to swift_retain/release. This allows them to work correctly on older runtimes, with a small performance penalty, while still running at full speed on runtimes that have the new preservemost symbols.
Although this is only supported on Darwin at the moment, it shouldn't be too much work to adapt it to other ARM64 targets. We need to ensure the assembly plays nice with the other platforms' assemblers, and make sure the implementation is correct for the non-ObjC-interop case.
rdar://122595871
We scan the target's initial allocation pool, and all 16kB heap allocations. We check each pointer-aligned offset within those areas, and try to read it as Swift metadata and get a name from it. If that fails, quietly move on. It's very unlikely for some random memory to look enough like Swift metadata for this to produce a name, so this works very well to print the generic metadata instantiated in the remote process without requiring `SWIFT_DEBUG_ENABLE_METADATA_ALLOCATION_ITERATION`.
rdar://161120936
swift_coroFrameAlloc was introduced in the Swift 6.2 runtime. Give it
the appropriate availability in IRGen, so that it gets weak
availability when needed (per the deployment target). Then, only
create the stub function for calling into swift_coroFrameAlloc or
malloc (when the former isn't available) when we're back-deploying to
a runtime prior to Swift 6.2. This is a small code size/performance
win when allocating coroutine frames on Swift 6.2-or-newer platforms.
This has a side effect of fixing a bug in Embedded Swift, where the
swift_coroFrameAlloc was getting unconditionally set to have weak
external linkage despite behind defined in the same LLVM module
(because it comes from the standard library).
Fixes rdar://149695139 / issue #80947.
This change adds a new type of cache (cache by type descriptor) to the protocol conformance lookup system. This optimization is beneficial for generic types, where the
same conformance can be reused across different instantiations of the generic type.
Key changes:
- Add a `GetOrInsertManyScope` class to `ConcurrentReadableHashMap` for performing
multiple insertions under a single lock
- Add type descriptor-based caching for protocol conformances
- Add environment variables for controlling and debugging the conformance cache
- Add tests to verify the behavior of the conformance cache
- Fix for https://github.com/swiftlang/swift/issues/82889
The implementation is controlled by the `SWIFT_DEBUG_ENABLE_CACHE_PROTOCOL_CONFORMANCES_BY_TYPE_DESCRIPTOR`
environment variable, which is enabled by default.
This reapplies https://github.com/swiftlang/swift/pull/82818 after it's been reverted in https://github.com/swiftlang/swift/pull/83770.
This change adds a new type of cache (cache by type descriptor) to the protocol conformance lookup system. This optimization is beneficial for generic types, where the
same conformance can be reused across different instantiations of the generic type.
Key changes:
- Add a `GetOrInsertManyScope` class to `ConcurrentReadableHashMap` for performing
multiple insertions under a single lock
- Add type descriptor-based caching for protocol conformances
- Add environment variables for controlling and debugging the conformance cache
- Add tests to verify the behavior of the conformance cache
- Fix for https://github.com/swiftlang/swift/issues/82889
The implementation is controlled by the `SWIFT_DEBUG_ENABLE_CACHE_PROTOCOL_CONFORMANCES_BY_TYPE_DESCRIPTOR`
environment variable, which is enabled by default.
When targeting a platform that predates the introduction of isolated
deinit, make a narrow exception that allows main-actor-isolated deinit
to work through a special, inlineable entrypoint that is
back-deployed. This implementation
1. Calls into the real implementation when available, otherwise
2. Checks if we're on the main thread, destroying immediately when
we are, otherwise
3. Creates a new task on the main actor to handle destruction.
This implementation is less efficient than the implementation in the
runtime, but allows us to back-deploy this functionality as far back
as concurrency goes.
Fixes rdar://151029118.
The implementation of the ObjC -retain method saved a local variable and returned that after calling swift_retain, which forced it to create a stack frame. swift_retain returns the object being retained, so we can take advantage of that to have the compiler emit a tail call to it instead.
Without this, llvm would sometimes wrongly assume there's no indirect
accesses and the optimizations can lead to a runtime crash, by
optimizing away initializing options properly.
Resolves rdar://152548190
This only modifies the runtime function `swift_task_enqueueGlobalWithDeadline` to take new clock primitive to
interoperate with existing dispatch wall clock values.