The majority of support comes in the form of emitting partial
application forwarders for partial applications of async functions.
Such a partial application forwarder must take an async context which
has been partially populated at the apply site. It is responsible for
populating it "the rest of the way". To do so, like sync partial
application forwarders, it takes a second argument, its context, from
which it pulls the additional arguments which were captured at
partial_apply time.
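At the source level, the situation looks roughly like this (a hypothetical
Swift example; the names are invented):
```swift
func greet(_ name: String, _ punctuation: String) async -> String {
    "Hello, \(name)\(punctuation)"
}

// Capturing `punctuation` produces a partial_apply in SIL. The forwarder
// emitted for it pulls the captured "!" out of its context and finishes
// populating the async context before calling `greet`.
let greetLoudly: (String) async -> String = { await greet($0, "!") }
```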
The size of the async context that is passed to the forwarder, however,
can't be known at the apply site by simply looking at the signature of
the function to be applied (not even by looking at the size associated
with the function in the special async function pointer constant which
will soon be emitted). The reason is that an unknown (at the apply
site) number of additional arguments will be filled in by the partial
apply forwarder (and, in the case of repeated partial applications,
filled in incrementally at each level). To enable this, there will
always be a heap object for thick async functions.
These heap objects will always store the size of the async context to be
allocated as their first element. (Note that it may be possible to
apply the same optimization that was applied for thick sync functions
where a single refcounted object could be used as the context; doing so,
however, must be made to interact properly with the async context size
stored in the heap object.)
To continue to allow promoting thin async functions to thick async
functions without incurring a thunk, at the apply site, a null-check
will be performed on the context pointer. If it is null, then the async
context size will be determined based on the signature. (When async
function pointers become pointers to a constant containing an i32 size
and a relative address of the underlying function, the size will be read
from that constant.) When it is not null, the size will be pulled from
the first field of the context (which will in that case be cast to
<{%swift.refcounted, i32}>).
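Sketched in Swift with invented names (the actual logic is emitted as LLVM
IR; the two-word refcounted header below is an assumption), the apply-site
computation is:
```swift
// Hypothetical sketch of the apply-site size computation.
func asyncContextSize(thickContext: UnsafeRawPointer?,
                      sizeFromSignature: Int32) -> Int32 {
    guard let context = thickContext else {
        // Null context: a thin function promoted to thick. The size comes
        // from the signature (later, from the i32 in the async function
        // pointer constant).
        return sizeFromSignature
    }
    // Non-null context: read the i32 that follows the %swift.refcounted
    // header, i.e. treat the object as <{ %swift.refcounted, i32 }>.
    let headerSize = 2 * MemoryLayout<UInt>.stride  // assumed header size
    return context.load(fromByteOffset: headerSize, as: Int32.self)
}
```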
To facilitate sharing code and preserving the original structure of
emitPartialApplicationForwarder (which weighed in at roughly 700 lines
prior to this change), a new small class hierarchy, descending from
PartialApplicationForwarderEmission, has been added, with subclasses for
the sync and async case. The shuffling of arguments into and out of the
final explosion that was being performed in the synchronous case has
been preserved there, though the arguments are added and removed through
a number of methods on the superclass with more descriptive names. That
was necessary to enable the async class to handle these different
flavors of parameters correctly.
To get some initial test coverage, the preexisting
IRGen/partial_apply.sil and IRGen/partial_apply_forwarder.sil tests have
been duplicated into the async folder. The test cases within these
files which happened to have been crashing have each been extracted into
their own runnable tests that verify both that the compiler does not
crash and that the partial application forwarder behaves correctly.
The FileChecks in these tests are extremely minimal, providing only
enough information to be sure that arguments are in fact squeezed into
an async context.
Add AccessedStorage::compute and computeInScope to mirror AccessPath.
Allow recovering the begin_access for Nested storage.
Add AccessedStorage.visitRoots().
Things that have come up recently but are somewhat blocked on this:
- Moving AccessMarkerElimination down in the pipeline
- SemanticARCOpts correctness and improvements
- AliasAnalysis improvements
- LICM performance regressions
- RLE/DSE improvements
Begin to formalize the model for valid memory access in SIL. Ignoring
ownership, every access is a def-use chain in three parts:
object root -> formal access base -> memory operation address
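For example, for a store like the one below, the three parts map onto SIL
as the comments indicate (an illustrative sketch):
```swift
final class Node {
    var point = Point()
}
struct Point {
    var x = 0.0, y = 0.0
}

func setX(_ node: Node) {
    // object root:              the class reference `node`
    // formal access base:       ref_element_addr %node, #Node.point
    // memory operation address: struct_element_addr %base, #Point.x
    node.point.x = 1.0
}
```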
AccessPath abstracts over this path and standardizes the identity of a
memory access throughout the optimizer. This abstraction is the basis
for a new AccessPathVerification.
With that verification, we now have all the properties we need for the
type of analysis required for exclusivity enforcement, but now
generalized for any memory analysis. This is suitable for an extremely
lightweight analysis with no side data structures. We currently have a
massive amount of ad-hoc memory analysis throughout SIL, which is
incredibly unmaintainable, bug-prone, and not performance-robust. We
can begin taking advantage of this verifiably complete model to solve
that problem.
The properties this gives us are:
Access analysis must be complete over memory operations: every memory
operation needs a recognizable valid access. An access can be
unidentified only to the extent that it is rooted in some non-address
type and we can prove that it is at least *not* part of an access to a
nominal class or global property. Pointer provenance is also required
for future IRGen-level bitfield optimizations.
Access analysis must be complete over address users: for an identified
object root all memory accesses including subobjects must be
discoverable.
Access analysis must be symmetric: use-def and def-use analysis must
be consistent.
AccessPath is merely a wrapper around the existing accessed-storage
utilities and IndexTrieNode. Existing passes already very successfully
use this approach, but in an ad-hoc way. With a general utility we
can:
- update passes to use this approach to identify memory access,
reducing the space and time complexity of those algorithms.
- implement an inexpensive on-the-fly, debug mode address lifetime analysis
- implement a lightweight debug mode alias analysis
- ultimately improve the power, efficiency, and maintainability of
full alias analysis
- make our type-based alias analysis sensitive to the access path
Previously, EmitPolymorphicParameters dealt directly with an Explosion
from which it pulled values. In one place, there was a conditional
check for async which handled some cases. There was, however, another
place where the polymorphic parameter was pulled directly from the
explosion. That missed case resulted in attempting to pull a
polymorphic parameter directly from an Explosion which contains only a
%swift.context* per the async calling convention.
Here, those parameters are now pulled from an EntryPointArgumentEmission,
subclasses of which are able to provide the relevant definition of what
pulling a parameter means.
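The shape of that abstraction, sketched in Swift (the real hierarchy is
C++; every member below is invented for illustration):
```swift
// Each entry-point flavor defines what "pulling a parameter" means.
protocol EntryPointArgumentEmission {
    mutating func getNextPolymorphicParameter() -> String  // stand-in for llvm::Value*
}

struct SyncEntryPointArgumentEmission: EntryPointArgumentEmission {
    var explosion: [String]
    // Sync: parameters come directly out of the explosion.
    mutating func getNextPolymorphicParameter() -> String {
        explosion.removeFirst()
    }
}

struct AsyncEntryPointArgumentEmission: EntryPointArgumentEmission {
    var contextFields: [String]
    // Async: the explosion carries only %swift.context*, so parameters
    // are loaded from fields of the async context instead.
    mutating func getNextPolymorphicParameter() -> String {
        contextFields.removeFirst()
    }
}
```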
rdar://problem/70144083
Previously, the AsyncContextLayout did not make space for the trailing
witness fields (self metadata and self witness table) and the
AsyncNativeCCEntryPointArgumentEmission could consequently not vend
these fields. Here, the fields are added to the layout.
Previously the methods for getting the index into the layout were public
and were being used to directly access the underlying buffer. Here,
that abstraction leakage is fixed and field access is forced to go
through the appropriate methods.
Here, the following is implemented:
- Construction of the SwiftContext struct with the fields needed for calling
functions.
- Allocating and deallocating these Swift contexts via runtime calls
before calling async functions and after returning from them.
- Storing arguments (including bindings and the self parameter but not
including protocol fields for witness methods) and returns (both
direct and indirect).
- Calling async functions.
Additional things that still need to be done:
- protocol extension methods
- protocol witness methods
- storing yields
- partial applies
`get_async_continuation[_addr]` begins a suspend operation by accessing the continuation value that can resume
the task, which can then be used in a callback or event handler before executing `await_async_continuation` to
suspend the task.
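At the language level, `withUnsafeContinuation` is the kind of construct
that lowers through this pair of instructions (a sketch; lowering details
elided):
```swift
import Dispatch

func nextValue(on queue: DispatchQueue) async -> Int {
    await withUnsafeContinuation { continuation in
        // `continuation` corresponds to the value produced by
        // get_async_continuation; resuming it ends the suspension that
        // await_async_continuation began.
        queue.async {
            continuation.resume(returning: 42)
        }
    }
}
```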
When we try to emit debug info from C++ types, we crash because we can't deserialize them. This is a temporary fix to circumvent the crash before a real fix can be created.
Today unchecked_bitwise_cast returns a value with ObjCUnowned ownership. This is
important to do since the instruction can truncate memory, meaning we want to
treat its result as a new object that must be copied before use.
This means that in OSSA we do not have a purely forwarding, unchecked,
layout-compatibility-assuming cast; that role is filled by
unchecked_value_cast.
The ``base_addr_for_offset`` instruction creates a base address for offset calculations.
The result can be used by address projections, like ``struct_element_addr``, which themselves return the offset of the projected fields.
IR generation simply creates a null pointer for ``base_addr_for_offset``.
* [SILGenFunction] Don't create redundant nested debug scopes
Instead of emitting:
```
sil_scope 4 { loc "main.swift":6:19 parent 3 }
sil_scope 5 { loc "main.swift":7:3 parent 4 }
sil_scope 6 { loc "main.swift":7:3 parent 5 }
sil_scope 7 { loc "main.swift":7:3 parent 5 }
sil_scope 8 { loc "main.swift":9:5 parent 4 }
```
Emit:
```
sil_scope 4 { loc "main.swift":6:19 parent 3 }
sil_scope 5 { loc "main.swift":7:3 parent 4 }
sil_scope 6 { loc "main.swift":9:5 parent 5 }
```
* [IRGenSIL] Diagnose conflicting shadow copies
If we attempt to store a value with the wrong type into a slot reserved
for a shadow copy, diagnose what went wrong.
* [SILGenPattern] Defer debug description of case variables
Create unique nested debug scopes for a switch, each of its case labels,
and each of its case bodies. This looks like:
```
switch ... { // Enter scope 1.
case ... : // Enter scope 2, nested within scope 1.
<body-1> // Enter scope 3, nested within scope 2.
case ... : // Enter scope 4, nested within scope 1.
<body-2> // Enter scope 5, nested within scope 4.
}
```
Use the new scope structure to defer emitting debug descriptions of case
bindings. Specifically, defer the work until we can nest the scope for a
case body under the scope for a pattern match.
This fixes SR-7973, a problem where it was impossible to inspect a case
binding in lldb when stopped at a case with multiple items.
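A minimal reproduction of the kind of code affected (hypothetical example):
```swift
enum Gesture {
    case tap(Int)
    case swipe(Int)
}

func handle(_ gesture: Gesture) {
    switch gesture {
    // One case label with multiple items, both binding `position`.
    case .tap(let position), .swipe(let position):
        // Previously `position` could not be inspected in lldb here; its
        // debug description was emitted during the pattern match instead
        // of in the case-body scope.
        print(position)
    }
}
```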
Previously, we would emit the debug descriptions too early (in the
pattern match), leading to duplicate/conflicting descriptions. The only
reason the ambiguous description was allowed to compile was that the
debug scopes were nested incorrectly.
rdar://41048339
* Update tests
`DifferentiableFunctionInst` now stores result indices.
`SILAutoDiffIndices` now stores result indices instead of a source index.
`@differentiable` SIL function types may now have multiple differentiability
result indices and `@noDerivative` results.
`@differentiable` AST function types do not have `@noDerivative` results (yet),
so this functionality is not exposed to users.
Resolves TF-689 and TF-1256.
Infrastructural support for TF-983: supporting differentiation of `apply`
instructions with multiple active semantic results.
1) Shrink from 4 bytes to 1 byte.
2) Assert that alignments are power-of-two at run time.
3) Assert that alignments are power-of-two at compile time when possible (see the sketch after this list).
4) Eliminate "isZero"/"no alignment" from the type. Use llvm::Optional.
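The power-of-two invariant behind (2) and (3) is the usual single-bit
check; a minimal Swift sketch:
```swift
// A valid alignment has exactly one bit set: nonzero, and clearing the
// lowest set bit (x & (x - 1)) leaves zero.
func isPowerOfTwo(_ x: UInt32) -> Bool {
    x != 0 && x & (x - 1) == 0
}

assert(isPowerOfTwo(16))
assert(!isPowerOfTwo(12))
assert(!isPowerOfTwo(0))
```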
* a new [immutable] attribute on ref_element_addr and ref_tail_addr
* new instructions: begin_cow_mutation and end_cow_mutation
These new instructions are intended to be used for the stdlib's COW containers, e.g. Array.
They allow more aggressive optimizations, especially for Array.
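The library-level pattern these instructions are meant to model is the
familiar uniqueness check before mutation (a hedged sketch, not the
stdlib's actual code):
```swift
final class Storage {
    var elements: [Int] = []  // stand-in for tail-allocated storage
}

struct COWContainer {
    private var storage = Storage()

    mutating func append(_ x: Int) {
        // begin_cow_mutation performs (roughly) this uniqueness check and
        // yields a buffer that is known to be mutable.
        if !isKnownUniquelyReferenced(&storage) {
            let fresh = Storage()
            fresh.elements = storage.elements
            storage = fresh
        }
        storage.elements.append(x)
        // end_cow_mutation then marks the buffer immutable again, which is
        // what enables the more aggressive optimizations.
    }
}
```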
Methods and closures use the same convention for the self/context argument, so when a
partial_apply applies a single argument to a method implementation, the closure can be formed
by simply tupling the implementation and self argument together. This doesn't yet cover a lot
of situations the compiler generates, but it prepares IRGen for an IRGen SIL pass that can
take the heavy lifting of partial_apply lowering off of IRGen by reducing more complex closures
into these "simple" cases, eventually reducing the number of partial_apply forwarding thunks and
extra closure allocations generated.
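An unapplied method reference is the canonical instance of this
single-argument case (illustrative Swift):
```swift
final class Counter {
    var value = 0
    func increment() { value += 1 }
}

let counter = Counter()
// Forming this closure partially applies `increment` to `counter`. Since
// methods and closures share the self/context convention, the closure can
// be just the (implementation, counter) pair: no forwarding thunk and no
// extra allocation.
let bump: () -> Void = counter.increment
bump()
```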
This became necessary after recent function type changes that keep
substituted generic function types abstract even after substitution to
correctly handle automatic opaque result type substitution.
Instead of performing the opaque result type substitution as part of
substituting the generic args, the underlying type will now be reified as
part of looking at the parameter/return types, which happens as part of
the function convention APIs.
rdar://62560867
Add `linear_function` and `linear_function_extract` instructions.
`linear_function` creates a `@differentiable(linear)` function-typed value from
an original function operand and a transpose function operand (optional).
`linear_function_extract` extracts either the original or transpose function
value from a `@differentiable(linear)` function.
Resolves TF-1142 and TF-1143.
Add `differentiable_function` and `differentiable_function_extract`
instructions.
`differentiable_function` creates a `@differentiable` function-typed
value from an original function operand and derivative function operands
(optional).
`differentiable_function_extract` extracts either the original or
derivative function value from a `@differentiable` function.
The differentiation transform canonicalizes `differentiable_function`
instructions, filling in derivative function operands if missing.
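At the source level, these values arise from uses of `@differentiable`
functions; a minimal example using the experimental `_Differentiation`
module (modern spelling, which postdates this change; requires a toolchain
with differentiable programming enabled):
```swift
import _Differentiation

@differentiable(reverse)
func square(_ x: Double) -> Double { x * x }

// Asking for a gradient builds a `differentiable_function` value in SIL;
// the transform fills in the derivative operands, and
// `differentiable_function_extract` pulls out the function `gradient` needs.
let g = gradient(at: 3.0, of: square)  // 6.0
```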
Resolves TF-1139 and TF-1140.
I enabled this recently by piggybacking it in the mega-commit:
commit badc5658bb
Author: Andrew Trick <atrick@apple.com>
Date: Mon Mar 2 16:23:53 2020
Fix SIL MemBehavior queries with access markers.
It miscompiles SwiftNIO's NIOPerformanceTester benchmark, causing it to
segfault immediately.
We may consider debugging this to reenable it in the future...
<rdar://60068448> Teach IRGen to emit !invariant metadata for 'let' variables
Fixes <rdar://60038603> Swift CI: SwiftNIO_macOS NIOHTTP2_macOS fails
This is in preparation for other bug fixes.
Clarify the SIL utilities that return canonical address values for
formal access given the address used by some memory operation:
- stripAccessMarkers
- getAddressAccess
- getAccessedAddress
These are closely related to the code in MemAccessUtils.
Make sure passes use these utilities consistently so that
optimizations aren't defeated by normal variations in SIL patterns.
Create an isLetAddress() utility alongside these basic utilities to
make sure it is used consistently with the address corresponding to
formal access. When this query is used inconsistently, it defeats
optimization. It can also cause correctness bugs because some
optimizations assume that 'let' initialization is only performed on a
unique address value.
Functional changes to Memory Behavior:
- An instruction with side effects now conservatively still has side
effects even when the queried value is a 'let'. Let values are
certainly sensitive to side effects, such as the parent object being
deallocated.
- Return the correct MemBehavior for begin/end_access markers.
The `differentiability_witness_function` instruction looks up a
differentiability witness function (JVP, VJP, or transpose) for a referenced
function via SIL differentiability witnesses.
Add round-trip parsing/serialization and IRGen tests.
Notes:
- Differentiability witnesses for linear functions require more support.
`differentiability_witness_function [transpose]` instructions do not yet
have IRGen.
- Nothing currently generates `differentiability_witness_function` instructions.
The differentiation transform does, but it hasn't been upstreamed yet.
Resolves TF-1141.
Before comparing the potential sugared type for equality, it needs to be
mapped into the context to resolve generic type parameters to primary
archetypes.
<rdar://problem/59238327>
The 'fake use' asm gadget does not always keep variables alive. E.g., in
rdar://57754659, LLVM was unable to preserve two local variables despite
the use of these gadgets. These variables were backed by a LoadInst and
an ExtractValueInst respectively. Instead of emitting shadow copies for
just those kinds of instructions, use shadow copies exclusively. This
may cause more variables to appear in the debugger window before they
are initialized, but should result in fewer variables being dropped.
rdar://57754659
In 1b2842bf902a8b52acbef2425120533b63be5ae3 this API was changed.
It seems the right thing to do is to turn this into the new alignment
type by doing MaybeAlign(alignment), which is what this patch does.