Add AccessedStorage::compute and computeInScope to mirror AccessPath.
Allow recovering the begin_access for Nested storage.
Add AccessedStorage.visitRoots().
Things that have come up recently but are somewhat blocked on this:
- Moving AccessMarkerElimination down in the pipeline
- SemanticARCOpts correctness and improvements
- AliasAnalysis improvements
- LICM performance regressions
- RLE/DSE improvements
Begin to formalize the model for valid memory access in SIL. Ignoring
ownership, every access is a def-use chain in three parts:
object root -> formal access base -> memory operation address
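For example (a minimal SIL sketch; the class, field, and function names are hypothetical):

```
class C { var prop: Int64 }

sil @f : $@convention(thin) (@guaranteed C) -> Int64 {
bb0(%0 : $C):                             // object root
  %1 = ref_element_addr %0 : $C, #C.prop  // formal access base
  %2 = begin_access [read] [static] %1 : $*Int64
  %3 = load %2 : $*Int64                  // memory operation address
  end_access %2 : $*Int64
  return %3 : $Int64
}
```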
AccessPath abstracts over this path and standardizes the identity of a
memory access throughout the optimizer. This abstraction is the basis
for a new AccessPathVerification.
With that verification, we now have all the properties we need for the
kind of analysis required for exclusivity enforcement, but now
generalized for any memory analysis. This is suitable for an extremely
lightweight analysis with no side data structures. We currently have a
massive amount of ad-hoc memory analysis throughout SIL, which is
incredibly unmaintainable, bug-prone, and not performance-robust. We
can begin taking advantage of this verifiably complete model to solve
that problem.
The properties this gives us are:
Access analysis must be complete over memory operations: every memory
operation needs a recognizable valid access. An access can be
unidentified only to the extent that it is rooted in some non-address
type and we can prove that it is at least *not* part of an access to a
nominal class or global property. Pointer provenance is also required
for future IRGen-level bitfield optimizations.
Access analysis must be complete over address users: for an identified
object root, all memory accesses, including subobjects, must be
discoverable.
Access analysis must be symmetric: use-def and def-use analysis must
be consistent.
AccessPath is merely a wrapper around the existing accessed-storage
utilities and IndexTrieNode. Existing passes already use this approach
very successfully, but in an ad-hoc way. With a general utility we
can:
- update passes to use this approach to identify memory access,
reducing the space and time complexity of those algorithms.
- implement an inexpensive on-the-fly, debug mode address lifetime analysis
- implement a lightweight debug mode alias analysis
- ultimately improve the power, efficiency, and maintainability of
full alias analysis
- make our type-based alias analysis sensitive to the access path
...and avoid reallocation.
This is immediately necessary for LICM, in addition to its current
uses. I suspect this could be used by many passes that work with
addresses. RLE/DSE should absolutely migrate to it.
Even values of trivial type can contain a Builtin.RawPointer, which can be used to read from and write to memory.
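For example (sketch; the wrapper type is hypothetical):

```
struct UnsafeWrapper {          // trivial type
  var p: Builtin.RawPointer
}
...
%1 = struct_extract %0 : $UnsafeWrapper, #UnsafeWrapper.p
%2 = pointer_to_address %1 : $Builtin.RawPointer to [strict] $*Int64
%3 = load %2 : $*Int64          // reads memory through a trivial value
```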
To compensate for the removed check, enable the escape-analysis check in MemBehavior (as it was before).
This fixes a recently introduced miscompile.
rdar://problem/70220876
1. Do a better alias analysis for "function-local" objects, like alloc_stack and inout parameters
2. Fully support try_apply and begin/end/abort_apply
So far, we relied entirely on escape analysis. But escape analysis has some shortcomings with SIL address types.
Therefore, handle two common cases, alloc_stack and inout parameters, with alias analysis.
This gives better results.
The biggest change here is to do a quick check whether the address escapes via an address_to_pointer instruction.
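For example (sketch):

```
%1 = alloc_stack $Int64
%2 = address_to_pointer %1 : $*Int64 to $Builtin.RawPointer
// %1 escapes via %2, so be conservative. Without such a use, only the
// visible address projections of %1 can access its memory.
```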
RCIdentityAnalysis must not look through casts from a trivial type, like a metatype, to something retainable, like an AnyObject.
On some platforms such casts dynamically allocate a ref-counted box for the metatype.
Now, if the RCRoot of such an AnyObject were a trivial value, ARC optimizations would get confused and might eliminate a retain of such an object completely.
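For example (sketch; the exact cast instruction depends on the platform and cast kind):

```
%0 = metatype $@thick Int.Type   // trivial value
%1 = unconditional_checked_cast %0 : $@thick Int.Type to $AnyObject
strong_retain %1 : $AnyObject    // %1 is not rc-identical to the trivial %0
```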
rdar://problem/69900051
A key concept in late ARC optimization is "RC Identity". In short, a result of
an instruction is rc-identical to an operand of the instruction if one can
safely replace a retain (release) on the result after the instruction with one
on the operand before it without changing the program semantics. This creates a
simple model where one can work on equivalence classes of rc-identical values
(using a dominating definition generally as the representative) and thus
optimize and pair retains and releases.
When preparing for late ARC optimization, the optimizer will normalize aggregate
ARC operations (retain_value, release_value) into singular strong_retain,
strong_release operations on leaf types of the aggregate that are
non-trivial. As an example, a retain_value on a KlassPair would be canonicalized
into two strong_retains, one for the lhs and one for the rhs. When this is done,
the optimizer generally just creates a new struct_extract at the point where the
retain is. In such a case, we may find that the debug_value for the underlying
type is actually on a reformed aggregate whose underlying parts we are
retaining:
```
bb0(%0 : $Builtin.NativeObject):
strong_retain %0 : $Builtin.NativeObject
%1 = struct $Array(%0 : $Builtin.NativeObject, ...)
debug_value %1 : $Array, ...
```
By looking through RC identical uses, we can handle a large subset of these
cases without much effort: ones where there is a single owning pointer, like Array.
To handle more complex cases we would have to calculate an inverse access path needed to get
back to our value and somehow deal with all of the complexity therein (I am sure
we can do it I just haven't thought through all of the details).
The only interesting behavior that this results in is that when we emit
diagnostics, we just use the name of the debug_value found on the rc-identical
transitive use, without a projection path. This is because the source location
associated with that debug_value belongs to a separate value that is
rc-identical to the actual value that we visited during our opt-remark
traversal up the def-use
graph. Consider the following example below, noting the comments that show in
the SIL itself what I attempted to explain above.
```
struct KlassPair {
  var lhs: Klass
  var rhs: Klass
}

struct StateWithOwningPointer {
  var state: TrivialState
  var owningPtr: Klass
}
sil @theFunction : $@convention(thin) () -> () {
bb0:
%0 = apply %getKlassPair() : $@convention(thin) () -> @owned KlassPair
// This debug_value's name can be combined...
debug_value %0 : $KlassPair, name "myPair"
// ... with the access path from the struct_extract here...
%1 = struct_extract %0 : $KlassPair, #KlassPair.lhs
// ... to emit a nice diagnostic that 'myPair.lhs' is being retained.
strong_retain %1 : $Klass
// In contrast, in the case below we rely on looking through rc-identity uses
// to find the debug_value. In this case, the source info associated with the
// debug_value (%3) is no longer associated with the underlying access path we
// have been tracking upwards (%1a is in our access path list). Instead, we
// know that the debug_value is rc-identical to whatever value we were
// originally tracking up (%1a), and thus the correct identifier to use is the
// direct name of the identifier alone (without an access path), since that
// source identifier must be some value in the source that by itself is
// rc-identical to whatever is being manipulated. If we were to emit the
// access path here for an rc-identical use, we would get
// "myAdditionalState.owningPtr", which is misleading since
// ArrayWrapperWithMoreState does not have a field named 'owningPtr'. The
// bare name is still accurate, because rc-identity means a retain_value on
// the value carrying the debug_value is equivalent to one on the access-path
// value we found by walking up the def-use graph from our strong_retain's
// operand.
%0a = apply %getStateWithOwningPointer() : $@convention(thin) () -> @owned StateWithOwningPointer
%1a = struct_extract %0a : $StateWithOwningPointer, #StateWithOwningPointer.owningPtr
strong_retain %1a : $Klass
%2 = struct $Array(%1a : $Klass, ...)
%3 = struct $ArrayWrapperWithMoreState(%2 : $Array, %moreState : $MoreState)
debug_value %3 : $ArrayWrapperWithMoreState, name "myAdditionalState"
}
```
Functions that do not have a return, and instead end with 'unreachable'
due to NoReturnFolding, will not have a ReturnNode in the connection
graph.
A caller of such a no-return function then may not have a CGNode
corresponding to the call's result.
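For example (sketch; the callee name is hypothetical):

```
%f = function_ref @noReturnFn : $@convention(thin) () -> Int64
%r = apply %f() : $@convention(thin) () -> Int64 // callee has no ReturnNode, so %r may have no CGNode
unreachable
```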
Fix the ConnectionGraph's verifier so we don't assert in such cases.
* Remove NewInsts from ARCSequenceOpts
* Remove more instances of InsertPts
* Address comments from #33504
* Make bottom up loop traversal simpler. Use better APIs
* Update LoopRegion printer with more info
Distinguish ref_tail_addr storage from the other storage classes.
We didn't have this originally because we don't expect a begin_access
to directly operate on tail storage. It could occur after inlining, at
least with static access markers. More importantly, it helps distinguish
regular formal accesses from other unidentified accesses, so we probably
should have always had this.
At any rate, it's particularly important when AccessedStorage is
generalized to arbitrary memory access.
The immediate motivation is to add an AccessPath utility, which will
need to distinguish tail storage.
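For example, after inlining one might see (sketch; the storage class name is hypothetical):

```
%1 = ref_tail_addr %0 : $ArrayStorage, $Int64    // tail storage
%2 = begin_access [modify] [static] %1 : $*Int64 // now classified as distinct Tail storage
```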
In the process, rewrite AccessedStorage::isDistinct. This could have a
large positive impact on exclusivity performance.
For use outside access enforcement passes.
Add isUniquelyIdentifiedAfterEnforcement.
Rename functions for clarity and generality.
Rename isUniquelyIdentifiedOrClass to isFormalAccessBase.
Rename findAccessedStorage to identifyFormalAccess.
Rename findAccessedStorageNonNested to findAccessedStorage.
Part of generalizing the utility for use outside the access
enforcement passes.
Currently, load [take] is the only such case in OSSA, which needed to be changed.
(copy_addr is not handled in MemBehavior at all yet.)
Even if the memory is physically not modified, conceptually it is "destroyed" when the value is taken.
Optimizations like TempRValueOpt rely on this behavior when they check for may-writes.
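For example (sketch):

```
%1 = load [take] %0 : $*Klass
// The take leaves the memory at %0 uninitialized, so for MemBehavior it
// must count as a write, even though no bytes are physically modified.
```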
This fixes a MemoryLifetime failure in TempRValueOpt.
We were not using the primary benefits of an intrusive list, namely the
ability to insert or remove from the middle of the list, so let's switch
to a plain vector. This also avoids linked-list pointer chasing.
This fixes a correctness issue.
The begin_cow_mutation instruction has dependencies on instructions that retain its buffer operand.
This prevents optimizations from moving begin_cow_mutation instructions across such retain instructions.
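For example (sketch; the buffer type is hypothetical):

```
strong_retain %0 : $MyBuffer                  // makes the buffer non-unique
(%u, %b) = begin_cow_mutation %0 : $MyBuffer  // must not be moved above the retain
```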
`DifferentiableFunctionInst` now stores result indices.
`SILAutoDiffIndices` now stores result indices instead of a source index.
`@differentiable` SIL function types may now have multiple differentiability
result indices and `@noDerivative` results.
`@differentiable` AST function types do not have `@noDerivative` results (yet),
so this functionality is not exposed to users.
Resolves TF-689 and TF-1256.
Infrastructural support for TF-983: supporting differentiation of `apply`
instructions with multiple active semantic results.
Support differentiation of `is` and `as?` operators.
These operators lower to branching cast SIL instructions, requiring control
flow differentiation support.
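For example, a conditional cast like `x as? SubClass` lowers roughly to (sketch; the class names are hypothetical):

```
checked_cast_br %0 : $BaseClass to $SubClass, bb1, bb2
bb1(%1 : $SubClass): // cast succeeded; the active value flows into this block
  ...
bb2:                 // cast failed
  ...
```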
Resolves SR-12898.
Used to "finalize" an array literal. It's not used, yet. So this is NFC.
Also handle the "array.finalize_intrinsic" function in various array specific optimizations.
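A call of the intrinsic would look roughly like this (sketch; the function name here is hypothetical, the semantics attribute is what matters):

```
%f = function_ref @arrayFinalize : $@convention(thin) (@owned Array<Int>) -> @owned Array<Int> // "array.finalize_intrinsic"
%a = apply %f(%literal) : $@convention(thin) (@owned Array<Int>) -> @owned Array<Int>
// after finalization, %a is treated as the finished, immutable literal value
```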
If the only use of an upcast, unchecked_ref_cast or end_cow_mutation is a destroy/release, just destroy the operand and remove the cast/end_cow_mutation.
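Before and after (sketch):

```
// Before:
%1 = upcast %0 : $Derived to $Base
strong_release %1 : $Base
// After:
strong_release %0 : $Derived
```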
This simplifies the handling of the subdirectories in the SIL and
SILOptimizer paths. Create individual libraries as object libraries
which allows the analysis of the source changes to be limited in scope.
Because these are object libraries, this has zero overhead compared to the
previous implementation, and string operations over the filenames
are avoided. The cost for this is that any new sub-library needs to be
added into the list rather than added with the special local function.
* a new [immutable] attribute on ref_element_addr and ref_tail_addr
* new instructions: begin_cow_mutation and end_cow_mutation
These new instructions are intended to be used for the stdlib's COW containers, e.g. Array.
They allow more aggressive optimizations, especially for Array.
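The intended usage pattern looks roughly like this (sketch; the buffer type is hypothetical):

```
(%isUnique, %buf) = begin_cow_mutation %arrayBuf : $ArrayBuffer
cond_br %isUnique, bb1, bb2   // bb1: mutate %buf in place; bb2: copy the buffer first
...
%immutableBuf = end_cow_mutation %buf : $ArrayBuffer
```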
The PassManager should transform all functions in bottom up order.
This is necessary because when optimizations like inlining look at the
callee function bodies to compute profitability, the callee functions
should have already undergone optimizations to get better profitability
estimates.
The PassManager builds its function worklist based on bottom up order
on initialization. However, newly created SILFunctions due to
specialization etc, are simply appended to the function worklist. This
can cause us to make bad inlining decisions due to inaccurate
profitability estimates. This change now updates the function worklist such
that, all the callees of the newly added SILFunction are processed
before it by the PassManager.
Fixes rdar://52202680
This became necessary after recent function type changes that keep
substituted generic function types abstract even after substitution to
correctly handle automatic opaque result type substitution.
Instead of performing the opaque result type substitution as part of
substituting the generic args the underlying type will now be reified as
part of looking at the parameter/return types, which happens as part of
the function convention APIs.
rdar://62560867
Move differentiation-related SILOptimizer files to
{include/swift,lib}/SILOptimizer/Differentiation/.
This reduces directory nesting and gathers files together.
Potentially source breaking: SR-11700 Diagnose exclusivity violations
with Dictionary.subscript._modify:
Exclusivity violations within code that computes the `default`
argument during Dictionary access are now diagnosed.
```swift
struct Container {
  static let defaultKey = 0
  var dictionary = [defaultKey: 0]

  mutating func incrementValue(at key: Int) {
    dictionary[key, default: dictionary[Container.defaultKey]!] += 1
  }
}
error: overlapping accesses to 'self.dictionary', but modification requires exclusive access; consider copying to a local variable
dictionary[key, default: dictionary[Container.defaultKey]!] += 1
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: conflicting access is here
dictionary[key, default: dictionary[Container.defaultKey]!] += 1
~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
```
This reworks the logic so that four problems end up being fixed.
Three are related to coroutines:
(1) DiagnoseStaticExclusivity must consider begin_apply as a user of accessed variables. This was an undefined behavior hole in the diagnostics.
(2) AccessSummaryAnalysis should consider begin_apply as a user of accessed arguments. This does not show up in practice because coroutines don't capture things.
(3) AccessSummaryAnalysis must consider begin_apply a valid user of
noescape closures.
And fixes one problem related to resilience:
(4) AccessSummaryAnalysis must conservatively consider arguments to external functions.
Fixes <rdar://problem/56378713> Investigate why AccessSummaryAnalysis is crashing
MSVC does not realize that the switch is exhaustive and requires that
the path is explicitly marked as unreachable. This silences the C4715
warning ("not all control paths return a value").
Add a private scratch context to the ASTContext and allow IntrinsicInfo sole access to it, so it can allocate attributes into it. This removes the final dependency on the global context.
For functions which result in more than 10,000 nodes, just bail and don't compute the connection graph.
The node merging algorithm is quadratic and can result in significant compile times for very large functions.
rdar://problem/56268570