This is currently disabled by default. Building the client library can be enabled with the CMake option SWIFT_BUILD_CLIENT_RETAIN_RELEASE, and using the library can be enabled with the flags -Xfrontend -enable-client-retain-release.
To improve retain/release performance, we build a static library containing optimized implementations of the fast paths of swift_retain, swift_release, and the corresponding bridgeObject functions. This avoids going through a stub to make a cross-library call.
IRGen gains awareness of these new functions and emits calls to them when the functionality is enabled and the target supports them. Two options are added to force use of them on or off: -enable-client-retain-release and -disable-client-retain-release. When enabled, the compiler auto-links the static library containing the implementations.
The new calls also use LLVM's preserve_most calling convention. Since retain/release doesn't need a large number of scratch registers, this is mostly harmless for the implementation, while allowing callers to improve code size and performance by spilling fewer registers around refcounting calls. (Experiments with an even more aggressive calling convention preserving x2 and up showed an insignificant savings in code size, so preserve_most seems to be a good middle ground.)
Since the implementations are embedded into client binaries, any change in the runtime's refcounting implementation needs to stay compatible with this new fast path implementation. This is ensured by having the implementation use a runtime-provided mask to check whether it can proceed into its fast path. The mask is provided as the address of the absolute symbol _swift_retainRelease_slowpath_mask_v1. If that mask ANDed with the object's current refcount field is non-zero, then we take the slow path. A future runtime that changes the refcounting implementation can adjust this mask to match, or set the mask to all 1s to disable the old embedded fast path entirely (as long as the new representation never uses 0 as a valid refcount field value).
As part of this work, the overall approach for bridgeObjectRetain is changed slightly. Previously, it would mask off the spare bits from the native pointer and then call through to swift_retain. This either lost the spare bits in the return value (when tail calling swift_retain) which is problematic since it's supposed to return its parameter, or it required pushing a stack frame which is inefficient. Now, swift_retain takes on the responsibility of masking off spare bits from the parameter and preserving them in the return value. This is a trivial addition to the fast path (just a quick mask and an extra register for saving the original value) and makes bridgeObjectRetain quite a bit more efficient when implemented correctly to return the exact value it was passed.
The runtime's implementations of swift_retain/release are now also marked as preserve_most so that they can be tail called from the client library. preserve_most is compatible with callers expecting the standard calling convention so this doesn't break any existing clients. Some ugly tricks were needed to prevent the compiler from creating unnecessary stack frames with the new calling convention. Avert your eyes.
To allow back deployment, the runtime now has aliases for these functions called swift_retain_preservemost and swift_release_preservemost. The client library brings weak definitions of these functions that save the extra registers and call through to swift_retain/release. This allows them to work correctly on older runtimes, with a small performance penalty, while still running at full speed on runtimes that have the new preservemost symbols.
Although this is only supported on Darwin at the moment, it shouldn't be too much work to adapt it to other ARM64 targets. We need to ensure the assembly plays nice with the other platforms' assemblers, and make sure the implementation is correct for the non-ObjC-interop case.
rdar://122595871
Pass the frame from the outer yield_once_2 coroutine into the
swift_coro_de/alloc functions and from there into all four de/allocation
witnesses. This enables ABIs which key off of information stored in the
frame.
To enable ABIs which store extra info in the frame, add two new slots to
the coroutine allocator function table. For example, a frame could have
a header containing a context pointer at a negative offset from the
address returned from `swift_coro_alloc_frame`. The frame deallocation
function would then know to deallocate more space correspondingly.
Doing so enables allocators to contain additional context for use by
allocation functions. Because the allocator is already passed to
_swift_coro_alloc, on the fast path (no allocator, popless) no
allocation function is used, and the allocator is passed in the
swiftcoro register, this is cheap.
This is currently not wired up to anything. I am going to wire it up in
subsequent commits.
The reason why we are introducing this new Builtin type is to represent that we
are going to start stealing bits from the protocol witness table pointer of the
Optional<any Actor> that this type is bitwise compatible with. The type will
ensure that this value is only used in places where we know that it will be
properly masked out giving us certainty that this value will not be used in any
manner without it first being bit cleared and transformed back to Optional<any
Actor>.
swift_coroFrameAlloc was introduced in the Swift 6.2 runtime. Give it
the appropriate availability in IRGen, so that it gets weak
availability when needed (per the deployment target). Then, only
create the stub function for calling into swift_coroFrameAlloc or
malloc (when the former isn't available) when we're back-deploying to
a runtime prior to Swift 6.2. This is a small code size/performance
win when allocating coroutine frames on Swift 6.2-or-newer platforms.
This has a side effect of fixing a bug in Embedded Swift, where the
swift_coroFrameAlloc was getting unconditionally set to have weak
external linkage despite behind defined in the same LLVM module
(because it comes from the standard library).
Fixes rdar://149695139 / issue #80947.
Delay the emission of SIL global variables that aren't externally
visible until they are actually used. This is the same lazy emission
approach that we use for a number of other entities, such as SIL
functions.
Part of rdar://158363967.
When targeting a platform that predates the introduction of isolated
deinit, make a narrow exception that allows main-actor-isolated deinit
to work through a special, inlineable entrypoint that is
back-deployed. This implementation
1. Calls into the real implementation when available, otherwise
2. Checks if we're on the main thread, destroying immediately when
we are, otherwise
3. Creates a new task on the main actor to handle destruction.
This implementation is less efficient than the implementation in the
runtime, but allows us to back-deploy this functionality as far back
as concurrency goes.
Fixes rdar://151029118.
If `LinkEntity::isTypeKind()` is true, `IRGenModule::getAddrOfLLVMVariable` assumes that we can safely call
`LinkEntity::getType()`, which does `reinterpret_cast` of `LinkEntity::Pointer` to `TypeBase *`. However, for SIL
differentiability witness, the pointer has `SILDifferentiabilityWitness *` type, which is not derived from `TypeBase`. So, such a cast is not allowed.
Just as with `ProtocolWitnessTableLazyAccessFunction` and `ProtocolWitnessTableLazyCacheVariable` link entity kinds (which are also type kinds), we should use `SecondaryPointer` instead of `Pointer` for storing payload here, while setting `Pointer` to `nullptr`.
Calling setupLLVMOptimizationRemarks overwrites the MainRemarkStreamer
in the LLVM context. This prevents LLVM from serializing the remark meta
information for the already emitted SIL remarks into the object file.
Without the meta information bitstream remarks don't work correctly.
Instead, emit SIL remarks and LLVM remarks to the same RemarkSerializer,
and keep the file stream alive until after CodeGen.
When creating a block, ensure that we correctly associate the DLL
Storage on the `_NSConcreteStackBlock` root object declaration based
upon whether we are generating code with `-static-libclosure` being
passed to the clang importer or not.
Expose the createStructType helper to clients of IRGenModule which want
to define types which won't ever be used elsewhere. This is just a
convenience--such clients could already have directly used the API on
llvm::Module directly.
Remove a static variable that caches pointer value from ASTContext that
will become invalid if the same process is re-used for compilation using
a second ASTContext. This configuration is used under
`-enable-deterministic-check` option for output detertiminism checking.
rdar://147438789
When it's available, use an open-coded allocator function that returns
an alloca without popping if the allocator is nullptr and otherwise
calls swift_coro_alloc. When it's not available, use the malloc
allocator in the synchronous context.
Correct the IRGen for the standard library. The thinko here assumed that the else case would be evaluated for the standard library build. We ended up incorrectly handling the well-known VWTs from the runtime when building the standard library.
This adjusts the runtime function declaration handling to track the
owning module for the well known functions. This allows us to ensure
that we are able to properly identify if the symbol should be imported
or not when building the shared libraries. This will require a
subsequent tweak to allow for checking for static library linkage to
ensure that we do not mark the symbol as DLLImport when doing static
linking.
The well known builtin and structural types are strongly defined in the
runtime which is compacted into the standard library. Given that the VWT
is defined in the runtime, it is not visible to the Swift compilation
process and as we do not provide a Swift definition, we would previously
compute the linkage as being module external (`dllimport` for shared
library builds). This formed incorrect references to these variables and
would require thunking to adjust the references.
One special case that we add here is the "any function" type
representation (`@escaping () -> ()`) as we do use the VWT for this type
in the standard library but do not consider it part of the well known
builtin or structural type enumeration.
These errors were previously being swallowed by the build system and
thus escaped from being fixed when the other cases of incorrect DLL
storage were.
Put AvailabilityRange into its own header with very few dependencies so that it
can be included freely in other headers that need to use it as a complete type.
NFC.
Remove `IRGenModule::useDllStorage()` as there is a standalone version
that is available and the necessary information is public from the
`IRGenModule` type. Additionally, avoid the wrapped `isStandardLibrary`
preferring to use the same method off of the public accessors. This
works towards removing some of the standard library special casing so
that it is possible to support both static and dynamic standard
libraries on Windows.