* Move differentiability kinds from target function type metadata to trailing objects so that we don't exhaust all remaining bits of function type metadata.
* Differentiability kind is now stored in a tail-allocated word when the function type flags indicate that the function is differentiable, located immediately after the normal function type metadata's contents (with proper alignment in between); see the sketch after the grammar below.
* Add new runtime function `swift_getFunctionTypeMetadataDifferentiable` which handles differentiable function types.
* Fix the mangling of the different differentiability kinds in function types. Mangle them like `ConcurrentFunctionType` so that we can drop the special cases for escaping functions.
```
function-signature ::= params-type params-type async? sendable? throws? differentiable? // results and parameters
...
differentiable ::= 'jf' // @differentiable(_forward) on function type
differentiable ::= 'jr' // @differentiable(reverse) on function type
differentiable ::= 'jd' // @differentiable on function type
differentiable ::= 'jl' // @differentiable(_linear) on function type
```
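For concreteness, here is a minimal sketch of the trailing-word layout, with simplified stand-ins for the runtime's real types; the flag packing and bit positions below are assumptions for illustration, not the actual ABI:
```
#include <cstddef>
#include <cstdint>

struct Metadata { /* kind word, etc. */ };

// Values are illustrative; the real ABI defines these elsewhere.
enum class FunctionMetadataDifferentiabilityKind : uintptr_t {
  NonDifferentiable, Forward, Reverse, Normal, Linear,
};

struct FunctionTypeMetadata {
  uintptr_t Flags;          // packs the parameter count and, in this
                            // sketch, a "differentiable" bit in bit 0
  const Metadata *ResultType;
  // NumParameters parameter metadata pointers follow the struct.

  bool isDifferentiable() const { return Flags & 1; }
  size_t getNumParameters() const { return Flags >> 1; }

  // The differentiability kind lives in a pointer-aligned word right
  // after the parameter list, present only when isDifferentiable().
  FunctionMetadataDifferentiabilityKind getDifferentiabilityKind() const {
    if (!isDifferentiable())
      return FunctionMetadataDifferentiabilityKind::NonDifferentiable;
    auto params = reinterpret_cast<const Metadata *const *>(this + 1);
    auto kind =
        reinterpret_cast<const uintptr_t *>(params + getNumParameters());
    return FunctionMetadataDifferentiabilityKind(*kind);
  }
};

// swift_getFunctionTypeMetadataDifferentiable presumably mirrors
// swift_getFunctionTypeMetadata with the kind as an extra parameter.
```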
Resolves rdar://75240064.
The immediate desire is to minimize the set of ABI dependencies
on the layout of an ExecutorRef. In addition to that, however,
I wanted to generally reduce the code size impact of an unsafe
continuation since it now requires accessing thread-local state,
and I wanted resumption to not have to create unnecessary type
metadata for the value type just to do the initialization.
Therefore, I've introduced a swift_continuation_init function
which handles the default initialization of a continuation
and returns a reference to the current task. I've also moved
the initialization of the normal continuation result into the
caller (out of the runtime), and I've moved the resumption-side
cmpxchg into the runtime (and prior to the task being enqueued).
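A sketch of the shape this gives the continuation protocol, using simplified stand-in types; the status values and the enqueue step are modeled on the description above, not copied from the runtime:
```
#include <atomic>

struct AsyncTask;
struct AsyncContinuationFlags { unsigned bits; }; // simplified

enum class ContinuationStatus { Pending, Awaited, Resumed };

struct ContinuationAsyncContext {
  std::atomic<ContinuationStatus> AwaitSynchronization{
      ContinuationStatus::Pending};
  // ... normal async-context fields, error slot, result pointer ...
};

// New entry point: default-initializes the continuation context and
// hands back the current task, so the caller needs neither the
// context layout nor value-type metadata to get started.
AsyncTask *swift_continuation_init(ContinuationAsyncContext *context,
                                   AsyncContinuationFlags flags);

// Resumption side (now in the runtime, before the task is enqueued):
// flip Pending -> Resumed; if the awaiter already parked (Awaited),
// the resumer is responsible for scheduling the task.
inline void resumeSketch(ContinuationAsyncContext *ctx, AsyncTask *task) {
  auto expected = ContinuationStatus::Pending;
  if (!ctx->AwaitSynchronization.compare_exchange_strong(
          expected, ContinuationStatus::Resumed,
          std::memory_order_release, std::memory_order_acquire)) {
    // expected == Awaited: enqueue `task` on its executor
    // (hypothetical helper; details elided).
  }
}
```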
Tasks shouldn't normally hog the actor context indefinitely after making a call that's bound to
that actor, since that prevents the actor from potentially taking on other jobs it needs to
be able to address. Set up SILGen so that it saves the current executor (using a new runtime
entry point) and hops back to it after every actor call, not only ones where the caller context
is also actor-bound.
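Conceptually (this is not actual SILGen output), each call to an actor-bound function is now bracketed like this; the save uses what I take to be the new entry point, swift_task_getCurrentExecutor:
```
// Simplified two-word executor handle standing in for the runtime's
// ExecutorRef:
struct ExecutorRef { void *identity; unsigned long implementation; };

// The new runtime entry point for saving the caller's executor:
ExecutorRef swift_task_getCurrentExecutor();

// What SILGen emits around a call to an actor-bound function,
// as pseudocode:
//   ExecutorRef saved = swift_task_getCurrentExecutor();
//   // hop to the callee's actor and perform the call...
//   // ...then hop back so the actor can take other jobs:
//   hop_to_executor(saved);
```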
The added executor hopping also exposed a bug in the runtime's DefaultActor
job processing: if an actor job returned to the processing loop having already
yielded the thread back to a generic executor, we would still attempt to make
the actor give up the thread again, corrupting its state.
rdar://71905765
Most of the async runtime functions have been changed so that they no longer
expect the task and executor to be passed in. When the task or executor is
needed, runtime functions are available to recover them (sketched below).
The biggest change I had to make to a runtime function signature
was to swift_task_switch, which has been altered to expect to be
passed the context and resumption function instead of requiring
the caller to park the task. This has the pleasant consequence
of allowing the implementation to very quickly turn around when
it recognizes that the current executor is satisfactory. It does
mean that on arm64e we have to sign the continuation function
pointer as an argument and then potentially resign it when
assigning into the task's resume slot.
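The revised signature, roughly; the real declaration carries the swiftasync calling convention and the arm64e pointer-auth annotations, which are omitted here:
```
#include <cstdint>

struct AsyncTask;
struct AsyncContext;
struct ExecutorRef { void *identity; uintptr_t implementation; }; // simplified
using TaskContinuationFunction = void(AsyncContext *);            // simplified

// The caller passes its continuation instead of parking the task
// itself; when the current executor already satisfies `newExecutor`,
// the runtime can tail-call `resumeFunction` immediately.
void swift_task_switch(AsyncContext *resumeContext,
                       TaskContinuationFunction *resumeFunction,
                       ExecutorRef newExecutor);

// When the task or executor is needed, they can be recovered:
AsyncTask *swift_task_getCurrent();
ExecutorRef swift_task_getCurrentExecutor();
```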
rdar://70546948
In their previous form, the non-`_f` variants of these entry points were unused, and IRGen
lowered the `createAsyncTask` builtins to use the `_f` variants with a large amount of caller-side
codegen to manually unpack closure values. Amid all this, it also failed to make anyone responsible
for releasing the closure context after the task completed, causing every task creation to leak.
Redo the `swift_task_create_*` entry points to accept the two words of an async closure value
directly, and unpack the closure to get its invocation entry point and initial context size
inside the runtime. (Also get rid of the non-future `swift_task_create` variant, since it's unused
and it's subtly different in a lot of hairy ways from the future forms. Better to add it later
when it's needed than to have a broken unexercised version now.)
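Approximately what the reworked entry point looks like; the flag and result types are simplified stand-ins, and the exact parameter list is an assumption based on the description above:
```
#include <cstdint>

struct AsyncTask; struct AsyncContext; struct HeapObject; struct Metadata;
struct JobFlags { uint32_t bits; };                                  // simplified
struct AsyncTaskAndContext { AsyncTask *task; AsyncContext *context; };

// The two words of the async closure value (invocation function and
// context) are passed directly; the runtime reads the entry point's
// initial context size itself and takes responsibility for releasing
// `closureContext` when the task completes.
AsyncTaskAndContext swift_task_create_future(
    JobFlags flags, const Metadata *futureResultType,
    void *closureEntryPoint, HeapObject *closureContext);
```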
I'm about to replace these with new variations that correctly handle spawning a task with
a closure as its initial entry point, without leaking or requiring large amounts of caller-side
code generation.
This is conditional on UseAsyncLowering and in the future should also be
conditional on `clangTargetInfo.isSwiftAsyncCCSupported()` once that
support is merged.
Update tests to work either with swiftcc or swifttailcc.
The Swift compiler incorrectly sets the LLVM "ReadNone" attribute when
declaring swift_getObjCClassFromObject and swift_projectBox. This
means that the LLVM ARC optimizer will hoist ARC "release" operations
above these runtime calls. Since the implementation of the calls reads
the object, this causes a use-after-free crash.
This problem is easy to reproduce with
swift_getObjCClassFromObject. It was only exposed recently because the
SIL optimizer now shrinks object lifetimes, making it easier for LLVM
optimizations to kick in. It is difficult to expose the problem with
swift_projectBox, but the bug/fix is still obvious.
Fixes rdar://73820091: use-after-free application crash.
It would be more abstractly correct if this got DI support so
that we destroy the member if the constructor terminates
abnormally, but we can get to that later.
In derivatives of loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime which contains a bump-pointer allocator.
When a function contains a differentiated loop, the closure context is a `Builtin.NativeObject`, which contains a `swift::AutoDiffLinearMapContext` and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously directly partial-applied into the pullback). In branching trace enums, the payloads of previously indirect cases will be allocated by `swift::AutoDiffLinearMapContext::allocate` and stored as a `Builtin.RawPointer`.
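A minimal sketch of the bump-pointer arena idea, with simplified types standing in for `swift::AutoDiffLinearMapContext` (the real context is a heap object with tail-allocated storage for the top-level linear map struct; the slab scheme below is illustrative only):
```
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class LinearMapArenaSketch {
  static constexpr size_t SlabSize = 4096;
  std::vector<std::unique_ptr<char[]>> slabs;
  char *cur = nullptr;
  char *end = nullptr;

public:
  // Bump-allocate `size` bytes at `align`; payloads of indirect
  // branching-trace enum cases are carved out of slabs instead of
  // being individually boxed.
  void *allocate(size_t size, size_t align) {
    auto p = (reinterpret_cast<uintptr_t>(cur) + align - 1) & ~(align - 1);
    if (cur == nullptr || p + size > reinterpret_cast<uintptr_t>(end)) {
      size_t slabSize = std::max(SlabSize, size + align);
      slabs.push_back(std::make_unique<char[]>(slabSize));
      cur = slabs.back().get();
      end = cur + slabSize;
      p = (reinterpret_cast<uintptr_t>(cur) + align - 1) & ~(align - 1);
    }
    cur = reinterpret_cast<char *>(p + size);
    return reinterpret_cast<void *>(p);
  }
  // Everything is freed at once when the pullback context is destroyed.
};
```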
Thick async functions store their async context size in the closure
context. Only if the closure context is nil can we assume that the
partial_apply_forwarder function is the address of an async function
pointer struct value.
`Builtin.createAsyncTask` takes flags, an optional parent task, and an
async/throwing function to execute, and passes them along to the
`swift_task_create_f` entry point to create a new (potentially child)
task, returning the new task and its initial context.
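The `_f` entry point the builtin lowers to, approximately; types are simplified stand-ins and the exact shape should be treated as an assumption:
```
#include <cstddef>

struct AsyncTask; struct AsyncContext;
struct JobFlags { unsigned bits; };                               // simplified
struct AsyncTaskAndContext { AsyncTask *task; AsyncContext *context; };
using TaskContinuationFunction = void(AsyncContext *);            // simplified

// Creates a new (potentially child) task that runs `function` in an
// initial context of `initialContextSize` bytes.
AsyncTaskAndContext swift_task_create_f(
    JobFlags flags, AsyncTask *parent,
    TaskContinuationFunction *function, size_t initialContextSize);
```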
Implement a new builtin, `cancelAsyncTask()`, to cancel the given
asynchronous task. This lowers down to a call into the runtime
operation `swift_task_cancel()`.
Use this builtin to implement Task.Handle.cancel().
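The runtime operation this lowers to, as described above:
```
struct AsyncTask;

// Cancels the given task. (The real declaration uses the Swift
// calling convention.)
void swift_task_cancel(AsyncTask *task);
```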
Add a new entry point for getting generic metadata which adds the
canonical metadata records attached to the nominal type descriptor to
the metadata cache.
Change the implementation of the primary entry-point
swift_getGenericMetadata to stop looking through canonical
prespecialized records.
Change the implementation of swift_getCanonicalSpecializedMetadata to
use the caching token attached to the nominal type descriptor to add
canonical prespecialized metadata records to the metadata cache only
once rather than using the cache variables to limit the number of times
the attempt was made.
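The new entry point isn't named above, so treat this declaration as a hypothetical sketch with simplified stand-in types:
```
#include <cstddef>

struct Metadata; struct TypeContextDescriptor;
struct MetadataRequest { size_t bits; };                          // simplified
struct MetadataResponse { const Metadata *Value; size_t State; }; // simplified
using swift_once_t = long;                                        // placeholder

// Same shape as swift_getGenericMetadata, plus a once-token so the
// canonical prespecialized records attached to the nominal type
// descriptor are registered with the metadata cache exactly once.
MetadataResponse swift_getCanonicalPrespecializedGenericMetadata(
    MetadataRequest request, const void *const *arguments,
    const TypeContextDescriptor *description, swift_once_t *token);
```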
The new function swift_getCanonicalSpecializedMetadata takes a metadata
request, a prespecialized non-canonical metadata, and a cache as its
arguments. The idea of the function is either to bless the provided
prespecialized metadata as canonical if there is not currently a
canonical metadata record for the type it describes or else to return
the actual canonical metadata.
When the function is called, the metadata cache is checked for a preexisting
entry for this metadata. If none is found, the passed-in prespecialized
metadata is added to the cache. Otherwise, the metadata record found in the
cache is returned.
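Putting that together, the function's shape is roughly as follows (simplified stand-in types):
```
#include <cstddef>

struct Metadata;
struct MetadataRequest { size_t bits; };                          // simplified
struct MetadataResponse { const Metadata *Value; size_t State; }; // simplified

// Either blesses `candidate` as the canonical metadata for its type,
// adding it to the metadata cache, or returns the canonical record
// that was already cached.
MetadataResponse
swift_getCanonicalSpecializedMetadata(MetadataRequest request,
                                      const Metadata *candidate,
                                      const Metadata **cache);
```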
rdar://problem/56995359
The new function swift_compareProtocolConformanceDescriptors calls
through to preexisting code in MetadataCacheKey, which has been
extracted from MetadataCacheKey::compareWitnessTables into a new
public static function,
MetadataCacheKey::compareProtocolConformanceDescriptors.
The new function's availability is "future" for now.
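Presumed declaration, based on the description above:
```
struct ProtocolConformanceDescriptor;

// Returns true when the two descriptors describe the same conformance.
bool swift_compareProtocolConformanceDescriptors(
    const ProtocolConformanceDescriptor *lhs,
    const ProtocolConformanceDescriptor *rhs);
```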
The new function `swift_compareTypeContextDescriptors` is equivalent to
a call through to swift::equalContexts. The implementation is the same
as that of swift::equalContexts with the following removals:
- Handling of context descriptors whose kind lies outside the range
ContextDescriptorKind::Type_First...ContextDescriptorKind::Type_Last.
Because the arguments are both TypeContextDescriptors, the kinds are
known to fall within that range.
- Casting to TypeContextDescriptor. The arguments are already of that
type.
For now, the new function has "future" availability.
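Presumed declaration; semantically a specialization of swift::equalContexts to type context descriptors:
```
struct TypeContextDescriptor;

// Returns true when the two descriptors describe the same type.
bool swift_compareTypeContextDescriptors(const TypeContextDescriptor *lhs,
                                         const TypeContextDescriptor *rhs);
```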
Use getTypeByMangledName when abstract metadata state is requested.
This can significantly reduce the code size of apps constructing deeply
nested types with conditional conformances.
Requires a new runtime.
rdar://57157619
When we generate code that asks for complete metadata for a fully concrete specific type that
doesn't have trivial metadata access, like `(Int, String)` or `[String: [Any]]`,
generate a cache variable that points to a mangled name, and use a common accessor function
that turns that cache variable into a pointer to the instantiated metadata. This saves a bunch
of code size, and should have minimal runtime impact, since the demangling of any string only
has to happen once.
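A simplified model of the emitted accessor; the real helper and its cache encoding differ in detail, and the negative-value tagging below is only an assumption for illustration:
```
#include <atomic>
#include <cstdint>

struct Metadata;

// Hypothetical stand-in for the runtime call that demangles the name
// the cache word refers to and instantiates complete metadata for it.
const Metadata *demangleAndInstantiate(intptr_t taggedNameRef);

// One cache word per use site: it starts out as a tagged reference to
// the mangled name and is overwritten with the metadata pointer the
// first time the accessor runs.
static std::atomic<intptr_t> cacheWord; // initialized to the tagged name ref

const Metadata *getConcreteTypeMetadata() {
  intptr_t v = cacheWord.load(std::memory_order_acquire);
  if (v < 0) { // negative means "still a mangled-name reference" here
    const Metadata *md = demangleAndInstantiate(v);
    cacheWord.store(reinterpret_cast<intptr_t>(md),
                    std::memory_order_release);
    return md;
  }
  return reinterpret_cast<const Metadata *>(v);
}
```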
This mostly just works, though it exposed a couple of issues:
- Mangling a type ref including objc protocols didn't cause the objc protocol record to get
instantiated. Fixed as part of this patch.
- The runtime type demangler doesn't correctly handle retroactive conformances. If there are
multiple retroactive conformances in a process at runtime, then even though the mangled string
refers to a specific conformance, the runtime still just picks one without listening to the
mangler. This is left to fix later, rdar://problem/53828345.
There is some more follow-up work that we can do to further improve the gains:
- We could improve the runtime-provided entry points, adding versions that don't require size
to be cached, and which can handle arbitrary metadata requests. This would allow for mangled
names to also be used for incomplete metadata accesses and improve code size of some generic
type accessors. However, we'd only be able to take advantage of the new entry points in
OSes that ship a new runtime.
- We could choose to always symbolic reference all type references, which would generally reduce
the size of mangled strings, as well as make runtime demangling more efficient, since it wouldn't
need to hit the runtime caches. This would however require that we be able to handle symbolic
references across files in the MetadataReader in order to avoid regressing remote mirror
functionality.
Add the dynamic-replacement runtime functions to the Compatibility50 archive.
The recent change to how we do dynamic replacements added two new runtime
functions. This patch adds those functions to the Compatibility50 static
archive.
This will allow backward deployment to a Swift 5.0 runtime.
Patch by Erik Eckstein, with a modification to call the standard
library's implementation (marked as weak) when it is available.
This ensures we can change the implementation in the future and are not
ABI-locked.
rdar://problem/51601233
Instead of a thunk, insert the dispatch into the original function.
If the original function should be executed, the prolog just jumps to the "real" code in the function. Otherwise, the replacement function is called.
There is one little complication here: when the replacement function calls the original function, the original function should not dispatch to the replacement again.
To pass this information, we use a flag in thread-local storage.
Setting and reading the flag is done in two new runtime functions, sketched below.
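The two runtime functions aren't named above, so here is a deliberately simplified model of the scheme with hypothetical names:
```
// Hypothetical names; the real runtime functions also traffic in
// function pointers rather than a bare flag.
static thread_local bool skipReplacementDispatch = false;

// The replacement sets this before calling back into the original so
// that the original's prolog doesn't re-dispatch to the replacement.
void setSkipReplacementDispatch(bool value) {
  skipReplacementDispatch = value;
}

bool shouldSkipReplacementDispatch() { return skipReplacementDispatch; }

// Prolog inserted into the original function, as pseudocode:
//   if (a replacement is installed && !shouldSkipReplacementDispatch())
//     tail-call the replacement;
//   // otherwise fall through ("jump") to the real body.
```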
rdar://problem/51043781
When backward deploying to an OS that may not have these entry points, weak-link them so that they
can be used conditionally in availability contexts that check for them.
rdar://problem/50731151
This is essentially a long-belated follow-up to Arnold's #12606.
The key observation here is that the enum-tag-single-payload witnesses
are strictly more powerful than the XI witnesses: you can simulate
the XI witnesses by using an extra case count that's <= the XI count.
Of course the result is less efficient than the XI witnesses, but
that's less important than overall code size, and we can work on
fast-paths for that.
The extra inhabitant count is stored in a 32-bit field (always present)
following the ValueWitnessFlags, which now occupy a fixed 32 bits.
This inflates non-XI VWTs on 32-bit targets by a word, but the net effect
on XI VWTs is to shrink them by two words, which is likely to be the
more important change. Also, being able to access the XI count directly
should be a nice win.
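The tail of the value witness table after this change, approximately (the function-pointer witnesses that precede these fields are elided):
```
#include <cstddef>
#include <cstdint>

struct ValueWitnessTableTail {
  size_t size;
  size_t stride;
  uint32_t flags;                 // ValueWitnessFlags, now a fixed 32 bits
  uint32_t extraInhabitantCount;  // always present, even when zero
};

// Simulating an XI witness with the single-payload witnesses: ask for
// an extra case count that's <= extraInhabitantCount and let the
// enum-tag answer double as the extra-inhabitant index.
```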
Currently ignored, but this will allow future compilers to pass down source location information for cast
failure runtime errors without backward deployment constraints.
Introduce a new runtime entry point, swift_getAssociatedConformanceWitness(),
which extracts an associated conformance witness from a witness table.
Teach IRGen to use this entry point rather than loading the witness
from the witness table and calling it directly.
There’s no advantage to doing this now, but it is staging for changing the
representation of associated conformances in witness tables.
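The entry point's declaration, approximately (types simplified):
```
struct Metadata; struct WitnessTable; struct ProtocolRequirement;

// Extracts (instantiating on demand if necessary) the witness table
// for an associated conformance requirement of `witnessTable`.
const WitnessTable *swift_getAssociatedConformanceWitness(
    WitnessTable *witnessTable, const Metadata *conformingType,
    const Metadata *assocType, const ProtocolRequirement *reqBase,
    const ProtocolRequirement *assocConformance);
```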
Runtime functions need to use the Swift calling convention for any function
returning MetadataResponse, so that we get the two values returned in separate
registers.
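For example, with simplified stand-in types (the runtime spells the convention with its SWIFT_CC(swift) macro):
```
#include <cstddef>

struct Metadata;
struct MetadataRequest { size_t bits; };                          // simplified
struct MetadataResponse { const Metadata *Value; size_t State; }; // simplified

// With the Swift convention, Value and State come back in separate
// registers rather than being returned indirectly through memory.
MetadataResponse swift_checkMetadataState(MetadataRequest request,
                                          const Metadata *type);
```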
Fixes rdar://problem/45042971 and rdar://problem/45851050.
This runtime function doesn’t always perform instantiation; it’s how we
get a witness table given a conformance, type, and set of instantiation
arguments. Name it accordingly.
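Under its new name (swift_getWitnessTable, if I recall the rename correctly), the declaration reads as a straightforward "get":
```
struct Metadata; struct WitnessTable; struct ProtocolConformanceDescriptor;

const WitnessTable *
swift_getWitnessTable(const ProtocolConformanceDescriptor *conformance,
                      const Metadata *type,
                      const void *const *instantiationArgs);
```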
Witness table accessors return a witness table for a given type's
conformance to a protocol. They are called directly from IRGen
(when we need the witness table instance) and from runtime conformance
checking (swift_conformsToProtocol digs the access function out of the
protocol conformance record). They have two interesting functions:
1) For witness tables requiring instantiation, they call
swift_instantiateWitnessTable directly.
2) For synthesized witness tables that might not be unique, they call
swift_getForeignWitnessTable.
Extend swift_instantiateWitnessTable() to handle both runtime
uniquing (for #2) and witness tables that don't have
a "generic table", i.e., don't need any actual instantiation. Use it
as the universal entry point for "get a witness table given a specific
conformance descriptor and type", eliminating witness table accessors
entirely.
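Rough shape of the universal entry point's dispatch after this change; the descriptor accessors and helpers here are hypothetical stand-ins for the real API:
```
struct Metadata; struct WitnessTable; struct GenericWitnessTable;

struct ProtocolConformanceDescriptor {
  // Hypothetical accessors standing in for the real descriptor API:
  const GenericWitnessTable *getGenericTable() const;
  bool isSynthesizedNonUnique() const;
  const WitnessTable *getPattern() const;
};

// Hypothetical helpers for the two interesting cases:
const WitnessTable *instantiateFromGenericTable(
    const ProtocolConformanceDescriptor *, const Metadata *,
    const void *const *);
const WitnessTable *uniqueForeignWitnessTable(
    const ProtocolConformanceDescriptor *, const Metadata *);

const WitnessTable *swift_instantiateWitnessTable(
    const ProtocolConformanceDescriptor *conformance, const Metadata *type,
    const void *const *instantiationArgs) {
  if (conformance->getGenericTable())        // needs real instantiation
    return instantiateFromGenericTable(conformance, type, instantiationArgs);
  if (conformance->isSynthesizedNonUnique()) // needs runtime uniquing
    return uniqueForeignWitnessTable(conformance, type);
  return conformance->getPattern();          // already a complete table
}
```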
Make a few related simplifications:
* Drop the "pattern" from the generic witness table. Instead, store
the pattern in the main part of the conformance descriptor, always.
* Drop the "conformance kind" from the protocol conformance
descriptor, since it was only there to distinguish between witness
table (pattern) vs. witness table accessor.
* Internalize swift_getForeignWitnessTable(); IRGen no longer needs to
call it.
Reduces the code size of the standard library (+assertions build) by
~149k.
Addresses rdar://problem/45489388.
Collapse the generic witness table, which was used only as a uniquing
data structure during witness table instantiation, into the protocol
conformance record. This colocates all of the constant protocol conformance
metadata and makes it possible for us to recover the generic witness table
from the conformance descriptor (including looking at the pattern itself).
Rename swift_getGenericWitnessTable() to swift_instantiateWitnessTable()
to make it clearer what its purpose is, and take the conformance descriptor
directly.
Have clients pass the requirement base descriptor to
swift_getAssociatedTypeWitness(), so that the witness index is just one
subtraction away, avoiding several dependent loads (witness table ->
conformance descriptor -> protocol descriptor -> requirement offset)
in the hot path.
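The signature after this change, approximately (stand-in types); note that the witness index now falls out of simple pointer arithmetic:
```
#include <cstddef>

struct Metadata; struct WitnessTable; struct ProtocolRequirement;
struct MetadataRequest { size_t bits; };                          // simplified
struct MetadataResponse { const Metadata *Value; size_t State; }; // simplified

// The witness index is now just (assocType - reqBase): a single
// subtraction instead of the witness table -> conformance descriptor
// -> protocol descriptor chain of loads.
MetadataResponse swift_getAssociatedTypeWitness(
    MetadataRequest request, WitnessTable *wtable,
    const Metadata *conformingType, const ProtocolRequirement *reqBase,
    const ProtocolRequirement *assocType);
```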