This is currently disabled by default. Building the client library can be enabled with the CMake option SWIFT_BUILD_CLIENT_RETAIN_RELEASE, and using the library can be enabled with the flags -Xfrontend -enable-client-retain-release.
To improve retain/release performance, we build a static library containing optimized implementations of the fast paths of swift_retain, swift_release, and the corresponding bridgeObject functions. This avoids going through a stub to make a cross-library call.
IRGen gains awareness of these new functions and emits calls to them when the functionality is enabled and the target supports them. Two options are added to force use of them on or off: -enable-client-retain-release and -disable-client-retain-release. When enabled, the compiler auto-links the static library containing the implementations.
The new calls also use LLVM's preserve_most calling convention. Since retain/release doesn't need a large number of scratch registers, this is mostly harmless for the implementation, while allowing callers to improve code size and performance by spilling fewer registers around refcounting calls. (Experiments with an even more aggressive calling convention preserving x2 and up showed an insignificant savings in code size, so preserve_most seems to be a good middle ground.)
Since the implementations are embedded into client binaries, any change in the runtime's refcounting implementation needs to stay compatible with this new fast path implementation. This is ensured by having the implementation use a runtime-provided mask to check whether it can proceed into its fast path. The mask is provided as the address of the absolute symbol _swift_retainRelease_slowpath_mask_v1. If that mask ANDed with the object's current refcount field is non-zero, then we take the slow path. A future runtime that changes the refcounting implementation can adjust this mask to match, or set the mask to all 1s to disable the old embedded fast path entirely (as long as the new representation never uses 0 as a valid refcount field value).
As part of this work, the overall approach for bridgeObjectRetain is changed slightly. Previously, it would mask off the spare bits from the native pointer and then call through to swift_retain. This either lost the spare bits in the return value (when tail calling swift_retain) which is problematic since it's supposed to return its parameter, or it required pushing a stack frame which is inefficient. Now, swift_retain takes on the responsibility of masking off spare bits from the parameter and preserving them in the return value. This is a trivial addition to the fast path (just a quick mask and an extra register for saving the original value) and makes bridgeObjectRetain quite a bit more efficient when implemented correctly to return the exact value it was passed.
The runtime's implementations of swift_retain/release are now also marked as preserve_most so that they can be tail called from the client library. preserve_most is compatible with callers expecting the standard calling convention so this doesn't break any existing clients. Some ugly tricks were needed to prevent the compiler from creating unnecessary stack frames with the new calling convention. Avert your eyes.
To allow back deployment, the runtime now has aliases for these functions called swift_retain_preservemost and swift_release_preservemost. The client library brings weak definitions of these functions that save the extra registers and call through to swift_retain/release. This allows them to work correctly on older runtimes, with a small performance penalty, while still running at full speed on runtimes that have the new preservemost symbols.
Although this is only supported on Darwin at the moment, it shouldn't be too much work to adapt it to other ARM64 targets. We need to ensure the assembly plays nice with the other platforms' assemblers, and make sure the implementation is correct for the non-ObjC-interop case.
rdar://122595871
This adjusts the runtime function declaration handling to track the
owning module for the well known functions. This allows us to ensure
that we are able to properly identify if the symbol should be imported
or not when building the shared libraries. This will require a
subsequent tweak to allow for checking for static library linkage to
ensure that we do not mark the symbol as DLLImport when doing static
linking.
Adds sections `__TEXT,__swift_as_entry`, and `__TEXT,__swift_as_ret` that
contain relative pointers to async functlets modelling async function entries,
and function returns, respectively.
Emission of the sections can be trigger with the frontend option
`-Xfrontend -enable-async-frame-push-pop-metadata`.
This is done by:
* IRGen adding a `async_entry` function attribute to async functions.
* LLVM's coroutine splitting identifying continuation funclets that
model the return from an async function call by adding the function
attribute `async_ret`. (see #llvm-project/pull/9204)
* An LLVM pass that keys off these two function attribute and emits the
metadata into the above mention sections.
rdar://134460666
Although I don't plan to bring over new assertions wholesale
into the current qualification branch, it's entirely possible
that various minor changes in main will use the new assertions;
having this basic support in the release branch will simplify that.
(This is why I'm adding the includes as a separate pass from
rewriting the individual assertions)
This would be done automatically if the module was passed into
`Function::Create`, but we're quite specific about its insert location,
so just set it manually instead.
LLVM is presumably moving towards `std::string_view` -
`StringRef::startswith` is deprecated on tip. `SmallString::startswith`
was just renamed there (maybe with some small deprecation inbetween, but
if so, we've missed it).
The `SmallString::startswith` references were moved to
`.str().starts_with()`, rather than adding the `starts_with` on
`stable/20230725` as we only had a few of them. Open to switching that
over if anyone feels strongly though.
`ReadOnly, ArgMemOnly` previously meant `can only read argument memory`.
But with the rebranch changes, this became `(can only read *all* memory)
and (can read or write argument memory)`. Use `ArgMemReadOnly` for this
instead.
To expand on this, these attributes (prior to memory effects) used to be
split into two. There was the *kind* of access, eg.
```
readnone
readonly
writeonly
```
and the accessed *location*, eg.
```
argmemonly
inaccessiblememonly
inaccessiblemem_or_argmemonly
```
So `RuntimeFunctions.def` would use `ReadOnly, ArgMemOnly` to mean `can
only read argument memory`.
In the previous rebranch commits, this was changed such that `ReadOnly`
mapped to `MemoryEffectsBase::readOnly()` and `ArgMemOnly` to
`MemoryEffectsBase::argMemOnly()`.
And there lies the issue -
- `MemoryEffectsBase::readOnly()` == `MemoryEffectsBase(Ref)` ie. all
locations can only read
- `MemoryEffectsBase::argMemOnly()` == `MemoryEffectsBase(ArgMem,
ModRef)`, ie. can only access argument memory
But then OR'ing those together this would become:
```
ArgMem: ModRef, InaccessibleMem: Ref, Other: Ref
```
rather than the previously intended:
```
ArgMem: Ref, InaccessibleMem: NoModRef, Other: NoModRef
```
LLVM deprecated, renamed, and removed a bunch of APIs. This patch
contains a lot of the changes needed to deal with that.
The SetVector type changed the template parameters.
APInt updated multiple names, countPopulation became popcount,
getAllOnesValue became getAllOnes, getNullValue became getZero, etc...
Clang type nullability check stopped taking a clang AST context.
The LLVM IRGen Function type stopped exposing basic block list directly,
but gained enough API surface that the translation isn't too bad.
(GenControl.cpp, LLVMMergeFunctions.cpp)
llvm::Optional had a transform function. That was being used in a couple
of places, so I've added a new implementation under STLExtras that
transforms valid optionals, otherwise it returns nullopt.
`getValue` -> `value`
`getValueOr` -> `value_or`
`hasValue` -> `has_value`
`map` -> `transform`
The old API will be deprecated in the rebranch.
To avoid merge conflicts, use the new API already in the main branch.
rdar://102362022
In preparation for moving to llvm's opaque pointer representation
replace getPointerElementType and CreateCall/CreateLoad/Store uses that
dependent on the address operand's pointer element type.
This means an `Address` carries the element type and we use
`FunctionPointer` in more places or read the function type off the
`llvm::Function`.
This has a few nice benefits:
1. The splitting happens after LLVM optimizations have run. This ensures that
LLVM will not join these blocks no matter what! The author of this commit has
found that in certain cases LLVM does this even at -Onone. By running this late,
we get the benefit we are looking for: working around the bad SelectionDAG
behavior.
2. This block splitting is just a workaround for the above mentioned unfortunate
SelectionDAG behavior. By doing this when we remove the workaround, we will not
have to update SIL level tests... instead we will just remove a small LLVM pass.
Some additional notes:
1. Only moved values will ever have llvm.dbg.addr emitted today, so we do not
have to worry about this impacting the rest of the language.
2. The pass's behavior is tested at the IR level by move_function_dbginfo.swift.
- 3e1c787b3160bed4146d3b2b5f922aeed3caafd7 `arg_operands` was replaced with `args`.
- 80ea2bb57450a65cc724565ecfc9971ad93a3f15 `get*Attributes` was replaced with `get*Attrs`
In low level LLVMARCOptimizer, during canonicalization we don't rauw the result of RT_Retain with its arg similarly to RT_ObjCRetain and RT_BridgeRetain.
And during performLocalReleaseMotion, we assert that we have canonicalized RT_Retain.
In a release compiler, if we optimize such an RT_Retain with a RT_Release, then this can result in a compiler crash
Similarly not rauw'ing, can cause a crash due to performLocalRetainMotion
Fixes rdar://79238115
Otherwise we set it on all targets/languages in a subdirectory (I forgot if it
propagates up). Regardless, this type of viral stuff is something we want to
move away from since it creates a code that is a "forall" piece of code rather
than a piece of code that only effects a single target.
I also conditionalized the actual definitions being added on the compiled file's
language being C,CXX,OBJC,OBJCXX since as we add Swift sources to the host side
of the compiler, we will not want these flags to propagate to Swift sources.
It breaks withCheckedContinuation.
I have not gotten to the bottom of why. But for now disable the pass for
all async functions until we can fix it.
rdar://77166575