Andy created the new API some time ago but didn't go through and update
the old occurrences. I did that in this PR and then deprecated the old API.
The tree is clean, so I could just remove it, but I decided to be nicer to
downstream people by deprecating it first.
This seems to compile correctly in release builds, but it goes through an
llvm_unreachable path, so it really isn't safe to leave unfixed.
When the accessPath has an offset, propagate it through the recursive
calls. This bug was likely introduced when the offset was moved outside
of the sequence of path indices: the code rebuilds a path from the
indices without adding back the offset.
LICM asserts during projectLoadValue when it needs to rematerialize a
loaded value within the loop using projections and the loop-invariant
address is an index_addr.
Basically:
%a = index_addr %4 : $*Wrapper, %1 : $Builtin.Word
store %_ to %a : $*Wrapper
br loop
loop:
%f = struct_element_addr %a
%v = load %f : $Value
%s = struct $Wrapper (%v : $Value)
store %s to %a : $*Wrapper
The store inside the loop is deleted, and the load is hoisted out of
the loop, but it now loads $Wrapper instead of $Value.
Fixes rdar://92191909 (LICM assertion: isSubObjectProjection(), MemAccessUtils.h, line 1069)
A guaranteed value produced by a begin_borrow can't be both used as
an operand of an ownership-forwarding instruction and also reborrowed
by being used as a phi argument.
Avoid that by stopping rotation when encountering a header block
containing an ownership-forwarding instruction whose forwarded ownership
kind is guaranteed; such a rotation may result in using both the
original guaranteed value and the resulting guaranteed value as phi
arguments.
AccessPathWithBase::compute can return a valid access path with an unidentified base.
In such cases we cannot LICM stores, because there is no base address to check for loop invariance.
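For illustration, a minimal sketch of the guard this implies for store hoisting, assuming the MemAccessUtils C++ API (variable names are placeholders, not the exact LICM code):
AccessPathWithBase apb = AccessPathWithBase::compute(storeAddr);
if (apb.accessPath.isValid() && !apb.base) {
  // Valid access path, but the base is unidentified: there is no base
  // address whose loop invariance we could check, so give up on
  // hoisting/sinking this store.
  return false;
}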
Fix innumerable latent bugs with iterator invalidation and callback invocation.
Removes dead code earlier and chips away at all the redundant copies the compiler generates.
Instead of caching alias results globally for the module, make AliasAnalysis a FunctionAnalysisBase which caches the alias results per function.
Why?
* Until now the result caches could only grow; they were reset only when they reached a certain size, which was not ideal. Now they are invalidated whenever the function changes.
* It was not possible to actually invalidate an alias analysis result. This is required, for example, in TempRValueOpt and TempLValueOpt (so far it was done manually with invalidateInstruction).
* Type-based alias analysis results were also cached for the whole module, even though they actually depend on the function, because they depend on the function's resilience expansion. This was a potential bug.
I also added a new PassManager API to directly get a function-based analysis:
getAnalysis(SILFunction *f)
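A rough sketch of how a function pass now obtains its per-function alias analysis (assuming the usual SILFunctionTransform boilerplate; not a verbatim excerpt):
void run() override {
  SILFunction *f = getFunction();
  // AliasAnalysis is now a FunctionAnalysisBase: results are cached per
  // function and invalidated whenever the function changes.
  AliasAnalysis *AA = getPassManager()->getAnalysis<AliasAnalysis>(f);
  // ... use AA->alias(...), AA->mayWriteToMemory(...), etc. as before ...
}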
The second change of this commit is the removal of the instruction-index indirection for the cache keys. The cache keys now use instruction pointers directly instead of instruction indices, which reduces the number of hash table lookups per cache lookup from 3 to 1.
This indirection was needed to avoid dangling instruction pointers in the cache keys. It is no longer necessary because of the new delayed instruction deletion mechanism.
This is a quick fix for a stack overflow that can occur with very large functions.
TODO: Ideally this algorithm would be implemented as an iterative worklist algorithm.
rdar://77563057
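The TODO refers to the standard recursion-to-worklist rewrite; its general shape looks roughly like this (a generic sketch, not the actual pass code):
llvm::SmallVector<SILValue, 32> worklist;
llvm::DenseSet<SILValue> visited;
worklist.push_back(root);
while (!worklist.empty()) {
  SILValue value = worklist.pop_back_val();
  if (!visited.insert(value).second)
    continue; // already processed
  // Process 'value' here and push its operands/uses onto the worklist
  // instead of recursing, so the C++ stack depth stays bounded.
}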
Generalize the AccessUseDefChainCloner in MemAccessUtils. It was
always meant to work this way; it just needed a client.
Add a new API AccessUseDefChainCloner::canCloneUseDefChain().
Add a bailout for begin_borrow and mark_dependence. Those
projections may appear on an access path, but they can't be
individually cloned without compensating.
Delete InteriorPointerAddressRebaseUseDefChainCloner.
Add a check in OwnershipRAUWHelper for canCloneUseDefChain.
Add test cases for begin_borrow and mark_dependence.
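A hedged sketch of how a client such as OwnershipRAUWHelper uses the new check (the exact spelling of the call is assumed from the description above):
// begin_borrow and mark_dependence may legitimately appear on an access
// path, but cloning them individually would require compensation code,
// so refuse to rewrite in that case.
if (!canCloneUseDefChain(newBaseAddress))
  return false;
// ... otherwise it is safe to clone the use-def chain onto the new base.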
Saves a bunch of compile time when compiling the stdlib. Specifically, when
compiling the iOS 32 bit stdlib + sil-verify-all, this shaves off ~20% of the
compile time.
In OSSA, we do not allow address phis, but in certain cases the logic of
LoopRotate really wants them. To work around this, this PR adds a
post-pass to loop rotate that fixes up any address phis by inserting
address <-> raw pointer adapters and changing the address phi to instead
be of raw pointer type.
Additional handling of copy_value/destroy_value/load[copy]/
begin_borrow/end_borrow is needed to support OSSA.
TODO: Support handling of 2d array in the pass for OSSA.
Currently, hoisting of loads and borrows is not supported in OSSA.
This pass generated incorrect borrow scopes:
%stack = alloc_stack
%borrow = begin_borrow %element
store_borrow %borrow to %stack
end_borrow %borrow
try_apply %f(%stack) normal bb1, error bb2
...
destroy_value %element
This was not showing up as a miscompile before because:
- an array held an extra copy of the unrolled elements; that array is
now being optimized away completely.
- CopyPropagation now canonicalizes OSSA lifetimes independent of
unrelated program side effects.
So, since there is no explicit relationship between %borrow and the
OSSA value in %stack, we end up with:
%stack = alloc_stack
%borrow = begin_borrow %element
store_borrow %borrow to %stack
end_borrow %borrow
destroy_value %element
try_apply %f(%stack) normal bb1, error bb2
Fixes rdar://72904101 ([CanonicalOSSA] Fix ForEachLoopUnroll use-after-free miscompile.)
For combined load-store hoisting, split loads that contain the
loop-stored value into a single load from the same address as the
loop-stores, and a set of loads disjoint from the loop-stores. The
single load will be hoisted while sinking the stores to the same
address. The disjoint loads will be hoisted normally in a subsequent
iteration on the same loop.
loop:
load %outer
store %inner1
exit:
Will be split into
loop:
load %inner1
load %inner2
store %inner1
exit:
Then, combined load/store hoisting will produce:
load %inner1
loop:
load %inner2
exit:
store %inner1
The LICM algorithm was not robust with respect to address projections
because it identified a projected address by its SILValue. This should
never be done! Use AccessPath instead.
Fixes regressions caused by rdar://66791257 (Print statement provokes
"Can't unsafeBitCast between types of different sizes" when
optimizations enabled)
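Illustratively, identifying a projected address by its access path rather than by its SILValue looks like this (a sketch against the MemAccessUtils API, not the exact LICM code):
// Two distinct SILValues (e.g. duplicated struct_element_addr
// instructions) can project the same memory location. Comparing their
// computed access paths identifies the location; comparing the
// SILValues does not.
AccessPath path1 = AccessPath::compute(addr1);
AccessPath path2 = AccessPath::compute(addr2);
if (path1.isValid() && path1 == path2) {
  // same formal memory location
}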
Pass the DomTree into SILCloner so that edge splitting
properly creates new domtree nodes.
The confusing edge splitting code that was in ArrayPropertyOpt is no
longer needed.
We could clean up ArrayPropertyOpt even more by moving fixDomTree
into SILCloner, but there's already too much going on in this patch.
Add a diagnostic to check for improperly nested '@_semantics' functions.
Add a missing @_semantics("array.init") in ArraySlice found by the
diagnostic.
Distinguish between array.init and array.init.empty.
Categorize the types of semantic functions by how they affect the
inliner and pass pipeline, and centralize this logic in
PerformanceInlinerUtils. The ultimate goal is to prevent inlining of
"Fundamental" @_semantics calls and @_effects calls until the late
pipeline where we can safely discard semantics. However, that requires
significant pipeline changes.
In the meantime, this change prevents the situation from getting worse
and makes the intention clear. However, it has no significant effect
on the pass pipeline and inliner.
Add AccessedStorage::compute and computeInScope to mirror AccessPath.
Allow recovering the begin_access for Nested storage.
Add AccessedStorage::visitRoots().
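A small usage sketch of the new entry points (variable names are placeholders):
// Identify the storage ultimately accessed through 'address'.
AccessedStorage storage = AccessedStorage::compute(address);
// computeInScope stops at the enclosing access scope instead of looking
// through it, so Nested storage can hand back its begin_access.
AccessedStorage scopedStorage = AccessedStorage::computeInScope(address);
if (scopedStorage.getKind() == AccessedStorage::Nested) {
  // ... recover and inspect the begin_access marker here ...
}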
Things that have come up recently but are somewhat blocked on this:
- Moving AccessMarkerElimination down in the pipeline
- SemanticARCOpts correctness and improvements
- AliasAnalysis improvements
- LICM performance regressions
- RLE/DSE improvements
Begin to formalize the model for valid memory access in SIL. Ignoring
ownership, every access is a def-use chain in three parts:
object root -> formal access base -> memory operation address
AccessPath abstracts over this path and standardizes the identity of a
memory access throughout the optimizer. This abstraction is the basis
for a new AccessPathVerification.
With that verification, we now have all the properties we need for the
type of analysis required for exclusivity enforcement, but now
generalized for any memory analysis. This is suitable for an extremely
lightweight analysis with no side data structures. We currently have a
massive amount of ad-hoc memory analysis throughout SIL, which is
incredibly unmaintainable, bug-prone, and not performance-robust. We
can begin taking advantage of this verifiably complete model to solve
that problem.
The properties this gives us are:
Access analysis must be complete over memory operations: every memory
operation needs a recognizable valid access. An access can be
unidentified only to the extent that it is rooted in some non-address
type and we can prove that it is at least *not* part of an access to a
nominal class or global property. Pointer provenance is also required
for future IRGen-level bitfield optimizations.
Access analysis must be complete over address users: for an identified
object root all memory accesses including subobjects must be
discoverable.
Access analysis must be symmetric: use-def and def-use analysis must
be consistent.
AccessPath is merely a wrapper around the existing accessed-storage
utilities and IndexTrieNode. Existing passes already use this approach
very successfully, but in an ad-hoc way. With a general utility we can
(see the sketch after the list below):
- update passes to use this approach to identify memory access,
reducing the space and time complexity of those algorithms.
- implement an inexpensive on-the-fly, debug mode address lifetime analysis
- implement a lightweight debug mode alias analysis
- ultimately improve the power, efficiency, and maintainability of
full alias analysis
- make our type-based alias analysis sensitive to the access path
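To make that concrete, here is a hedged sketch of a disjointness query built on the utility (method spellings follow MemAccessUtils; the real alias interface is richer than this):
// Returns true when the store and the load provably access disjoint
// subobjects of some identified storage.
static bool accessesAreDisjoint(StoreInst *store, LoadInst *load) {
  AccessPath storePath = AccessPath::compute(store->getDest());
  AccessPath loadPath = AccessPath::compute(load->getOperand());
  if (!storePath.isValid() || !loadPath.isValid())
    return false; // unknown access: stay conservative
  return !storePath.mayOverlap(loadPath);
}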
Change ProjectionIndex for ref_tail_addr to std::numeric_limits<int>::max().
This is necessary to disambiguate the tail elements from
ref_element_addr field zero.
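As a tiny sketch of the encoding (the constant name here is illustrative):
#include <limits>
// Sentinel path index for a class's tail-allocated elements. Using
// INT_MAX guarantees ref_tail_addr can never be confused with
// ref_element_addr field 0 (or any other real field index).
constexpr int TailIndex = std::numeric_limits<int>::max();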