This is currently disabled by default. Building the client library can be enabled with the CMake option SWIFT_BUILD_CLIENT_RETAIN_RELEASE, and using the library can be enabled with the flags -Xfrontend -enable-client-retain-release.
To improve retain/release performance, we build a static library containing optimized implementations of the fast paths of swift_retain, swift_release, and the corresponding bridgeObject functions. This avoids going through a stub to make a cross-library call.
IRGen gains awareness of these new functions and emits calls to them when the functionality is enabled and the target supports them. Two options are added to force use of them on or off: -enable-client-retain-release and -disable-client-retain-release. When enabled, the compiler auto-links the static library containing the implementations.
The new calls also use LLVM's preserve_most calling convention. Since retain/release doesn't need a large number of scratch registers, this is mostly harmless for the implementation, while allowing callers to improve code size and performance by spilling fewer registers around refcounting calls. (Experiments with an even more aggressive calling convention preserving x2 and up showed an insignificant savings in code size, so preserve_most seems to be a good middle ground.)
Since the implementations are embedded into client binaries, any change in the runtime's refcounting implementation needs to stay compatible with this new fast path implementation. This is ensured by having the implementation use a runtime-provided mask to check whether it can proceed into its fast path. The mask is provided as the address of the absolute symbol _swift_retainRelease_slowpath_mask_v1. If that mask ANDed with the object's current refcount field is non-zero, then we take the slow path. A future runtime that changes the refcounting implementation can adjust this mask to match, or set the mask to all 1s to disable the old embedded fast path entirely (as long as the new representation never uses 0 as a valid refcount field value).
As part of this work, the overall approach for bridgeObjectRetain is changed slightly. Previously, it would mask off the spare bits from the native pointer and then call through to swift_retain. This either lost the spare bits in the return value (when tail calling swift_retain) which is problematic since it's supposed to return its parameter, or it required pushing a stack frame which is inefficient. Now, swift_retain takes on the responsibility of masking off spare bits from the parameter and preserving them in the return value. This is a trivial addition to the fast path (just a quick mask and an extra register for saving the original value) and makes bridgeObjectRetain quite a bit more efficient when implemented correctly to return the exact value it was passed.
The runtime's implementations of swift_retain/release are now also marked as preserve_most so that they can be tail called from the client library. preserve_most is compatible with callers expecting the standard calling convention so this doesn't break any existing clients. Some ugly tricks were needed to prevent the compiler from creating unnecessary stack frames with the new calling convention. Avert your eyes.
To allow back deployment, the runtime now has aliases for these functions called swift_retain_preservemost and swift_release_preservemost. The client library brings weak definitions of these functions that save the extra registers and call through to swift_retain/release. This allows them to work correctly on older runtimes, with a small performance penalty, while still running at full speed on runtimes that have the new preservemost symbols.
Although this is only supported on Darwin at the moment, it shouldn't be too much work to adapt it to other ARM64 targets. We need to ensure the assembly plays nice with the other platforms' assemblers, and make sure the implementation is correct for the non-ObjC-interop case.
rdar://122595871
This new option allows the Driver to pass the path to a compilation
job's own binary swiftmodule artifact to the frontend. The compiler
then stores this path in the debug info, to allow clients like LLDB to
unambiguously know which binary Swift module belongs to which compile
unit.
rdar://163302154
To enable ABIs which store extra info in the frame, add two new slots to
the coroutine allocator function table. For example, a frame could have
a header containing a context pointer at a negative offset from the
address returned from `swift_coro_alloc_frame`. The frame deallocation
function would then know to deallocate more space correspondingly.
Allocator structs are passed in to new ABI yield-once coroutines and
contain pointers to functions to de/allocate memory. Here, those
pointers are signed.
This PR introduces three new instrumentation flags and plumbs them
through to IRGen:
1. `-ir-profile-generate` - enable IR-level instrumentation.
2. `-cs-profile-generate` - enable context-sensitive IR-level
instrumentation.
3. `-ir-profile-use` - IR-level PGO input profdata file to enable
profile-guided optimization (both IRPGO and CSIRPGO)
**Context:**
https://forums.swift.org/t/ir-level-pgo-instrumentation-in-swift/82123
**Swift-driver PR:** https://github.com/swiftlang/swift-driver/pull/1992
**Tests and validation:**
This PR includes ir level verification tests, also checks few edge-cases
when `-ir-profile-use` supplied profile is either missing or is an
invalid IR profile.
However, for argument validation, linking, and generating IR profiles
that can later be consumed by -cs-profile-generate, I’ll need
corresponding swift-driver changes. Those changes are being tracked in
https://github.com/swiftlang/swift-driver/pull/1992
With some changes that I am making, we will no longer need this flag at the SIL
level, so we can just make it an IRGen flag (which it really should have been
in the first place).
The LeastValidPointerValue is hard-coded in the runtime.
Therefore this option is only available in embedded swift - which doesn't have a runtime.
rdar://151755654
* [SUA][IRGen] Add stub for swift_coroFrameAlloc that weakly links against the runtime function
This commit modifies IRGen to emit a stub function `__swift_coroFrameAllocStub` instead of the
newly introduced swift-rt function `swift_coroFrameAlloc`. The stub checks whether the runtime has the symbol
`swift_coroFrameAlloc` and dispatches to it if it exists, uses `malloc` otherwise. This ensures the
ability to back deploy the feature to older OS targets.
rdar://145239850
`-Xfrontend -enable-cond-fail-message-annotation`
LLVM IR produced by the Swift compiler will add the `annotation`
metadata attribute to the branch instruction generated for cond_fail
builtins.
When it's available, use an open-coded allocator function that returns
an alloca without popping if the allocator is nullptr and otherwise
calls swift_coro_alloc. When it's not available, use the malloc
allocator in the synchronous context.
This allows external tools to locate the metadata pointer without needing to call the accessor function.
This is only useful for non-generic types, so we borrow the HasCanonicalMetadataPrespecializations flag to indicate the presence of this pointer on non-generic types, and it continues to indicate the presence of prespecializations for generic types.
Only emit this pointer for internal/private types with no runtime initialization. Public type metadata can be found with the symbol, and it's not useful for types that require runtime initialization.
This patch adds support for emitting the flag
llvm::DINode::FlagAllCallsDescribed when generating LLVM IR from the
Swift compiler to get call-site information for swift source code.
When TMO is enabled, change IRGen to pass the newly introduced runtime function `swift_coroFrameAlloc` (and pass an additional argument — the hash value) instead of `malloc` when it inserts calls to `coro_id_retcon_once`.
The hashValue is computed based on the current function name (computed in `getDiscriminatorForString`)
rdar://141235957
This achieves the same as clang's `-fdebug-info-for-profiling`, which
emits DWARF discriminators to aid in narrowing-down which basic block
corresponds to a particular instruction address. This is particularly
useful for sampling-based profiling.
rdar://135443278
Add a setting to IRGenOptions and key off of it to emit yield_once_2
coroutines using either (1) the same code-path as yield_once coroutines
or (2) a new, not-yet implemented code-path.
Add flags to set the value in both directions. During bringup, by
default, use the existing caller-allocated ABI.