The main changes are:
*) Rewrite everything in Swift. Previously, only parts of the memory-behavior analysis were implemented in Swift. Now everything is done in Swift and lives in `AliasAnalysis.swift`. This is a big code simplification.
*) Support many more instructions in the memory-behavior analysis - especially OSSA instructions like `begin_borrow`, `end_borrow`, `store_borrow` and `load_borrow`. The computation of `end_borrow` effects is now much more precise. Also, `partial_apply` is now handled more precisely.
*) Simplify and reduce type-based alias analysis (TBAA). The complexity of the old TBAA dates from the days when the language and SIL didn't have strict aliasing and exclusivity rules (e.g. for `inout` arguments). Now TBAA is only needed for code that uses unsafe pointers, and the new TBAA handles exactly this - and nothing more. Note that TBAA for classes is already done in `AccessBase.isDistinct`.
*) Handle aliasing in `begin_access [modify]` scopes. We already supported truly immutable scopes like `begin_access [read]` or `ref_element_addr [immutable]`. For `begin_access [modify]` we know that there are no other reads or writes of the access address within the scope (see the sketch after this list).
*) Don't cache memory-behavior results. It turned out that the hit/miss ratio was quite bad (~1:7), and a cache lookup took about as long as recomputing the memory behavior.
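The `begin_access [modify]` rule can be stated as a tiny decision procedure. Below is a minimal C++ sketch of the idea, purely illustrative - the real implementation lives in `AliasAnalysis.swift`, and all names here are made up:

```cpp
#include <cassert>

enum class MemBehavior { None, MayRead, MayWrite, MayReadWrite };

// Exclusivity guarantees that, while a `begin_access [modify]` scope is
// open, every read or write of the accessed address goes through the
// access itself. So an instruction inside the scope that is not derived
// from the access has no behavior on the accessed address.
MemBehavior behaviorOnAccessedAddress(bool instDerivedFromAccess) {
  return instDerivedFromAccess ? MemBehavior::MayReadWrite
                               : MemBehavior::None;
}

int main() {
  // A store through an unrelated pointer cannot touch the accessed
  // address while the [modify] scope is open.
  assert(behaviorOnAccessedAddress(/*instDerivedFromAccess=*/false) ==
         MemBehavior::None);
  return 0;
}
```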
Although I don't plan to bring over new assertions wholesale
into the current qualification branch, it's entirely possible
that various minor changes in main will use the new assertions;
having this basic support in the release branch will simplify that.
(This is why I'm adding the includes as a separate pass from
rewriting the individual assertions.)
This allows one to place types like `PointerIntPair`, and value types that wrap a pointer, into the pointer set.
I am making this change before I use this data structure in TransferNonSendable, using ARC to validate that I haven't broken anything.
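For illustration, here is a minimal C++ sketch of the enabling pattern, assuming the set keys its elements through llvm::PointerLikeTypeTraits (SmallPtrSet stands in for the actual pointer set, and Payload/Wrapper are hypothetical):

```cpp
#include "llvm/ADT/PointerIntPair.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Support/PointerLikeTypeTraits.h"

struct Payload { int value = 0; }; // hypothetical pointee

// A value type that wraps a pointer.
class Wrapper {
  Payload *ptr;

public:
  explicit Wrapper(Payload *p) : ptr(p) {}
  Payload *get() const { return ptr; }
};

// Teaching LLVM that Wrapper is pointer-like lets pointer sets store it.
namespace llvm {
template <> struct PointerLikeTypeTraits<Wrapper> {
  static void *getAsVoidPointer(Wrapper w) { return w.get(); }
  static Wrapper getFromVoidPointer(void *p) {
    return Wrapper(static_cast<Payload *>(p));
  }
  static constexpr int NumLowBitsAvailable =
      PointerLikeTypeTraits<Payload *>::NumLowBitsAvailable;
};
} // namespace llvm

int main() {
  Payload a, b;

  llvm::SmallPtrSet<Wrapper, 4> wrappers;
  bool inserted = wrappers.insert(Wrapper(&a)).second;      // true
  bool insertedAgain = wrappers.insert(Wrapper(&a)).second; // false: duplicate

  // PointerIntPair ships with PointerLikeTypeTraits, so it works as well.
  llvm::SmallPtrSet<llvm::PointerIntPair<Payload *, 1>, 4> pairs;
  pairs.insert({&b, 1});

  return inserted && !insertedAgain ? 0 : 1;
}
```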
This removes the ambiguity when casting from a SingleValueInstruction to SILNode, which makes the code simpler. E.g. the `isRepresentativeSILNode` logic is not needed anymore.
Also, it reduces the size of the most used instruction class - SingleValueInstruction - by one pointer.
Conceptually, SILInstruction is still a SILNode. But implementation-wise, SILNode is no longer a base class of SILInstruction.
Only the two sub-classes of SILInstruction - SingleValueInstruction and NonSingleValueInstruction - inherit from SILNode. SingleValueInstruction's SILNode is embedded in its ValueBase, and its relative offset in the class is the same as in NonSingleValueInstruction (see SILNodeOffsetChecker).
This makes it possible to cast from a SILInstruction to a SILNode without knowing which SILInstruction sub-class it is.
Casting to SILNode cannot be done implicitly, only with an LLVM `cast` or with `SILInstruction::asSILNode()` - but this is a rare case anyway.
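A much-simplified C++ sketch of the layout trick (the real classes carry far more state; this only mirrors the inheritance shape described above):

```cpp
#include <cassert>

// Much-simplified stand-ins for the real classes.
struct SILNode { int kind = 0; };
struct ValueBase : SILNode { /* value-specific state */ };

struct SILInstruction { // no longer derives from SILNode
  int opcode = 0;
  SILNode *asSILNode(); // defined below
};

// Both sub-classes place their SILNode sub-object right after the
// SILInstruction sub-object, i.e. at the same relative offset.
struct NonSingleValueInstruction : SILInstruction, SILNode {};
struct SingleValueInstruction : SILInstruction, ValueBase {};

SILNode *SILInstruction::asSILNode() {
  // Because the offset is identical in both sub-classes, treating `this`
  // as a NonSingleValueInstruction yields the correct SILNode pointer even
  // for a SingleValueInstruction. (The real compiler verifies this offset
  // invariant with SILNodeOffsetChecker.)
  return static_cast<NonSingleValueInstruction *>(this);
}

int main() {
  NonSingleValueInstruction nsvi;
  SingleValueInstruction svi;

  // Check the offset invariant that asSILNode() relies on.
  auto offsetOfNode = [](SILInstruction *i, SILNode *n) {
    return reinterpret_cast<char *>(n) - reinterpret_cast<char *>(i);
  };
  assert(offsetOfNode(&nsvi, static_cast<SILNode *>(&nsvi)) ==
         offsetOfNode(&svi, static_cast<ValueBase *>(&svi)));

  // The cast works without knowing the concrete sub-class.
  assert(svi.asSILNode() == static_cast<ValueBase *>(&svi));
  return 0;
}
```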
ARCSequenceOpts with loop support works on regions inside out.
While processing an outer region, it uses the summary of the inner loop region to compute the RefCountState transitions used for the CodeMotionSafe optimization.
We do not compute the loop summary by iterating over the loop twice or iterating until a fixed point is reached.
Instead, the loop summary is just a list of refcount-affecting instructions, and BottomUpRefCountState::updateForDifferentLoopInst and TopDownRefCountState::updateForDifferentLoopInst are used to compute the RefCountState transition when processing the inner loop region.
These functions are very conservative, so the flow-sensitive order of the summarized instructions does not matter.
This PR adds a verifying assert to confirm this.
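Since the summary carries no ordering, the update functions can only ever move the tracked state toward the most conservative answer. Here is a minimal C++ sketch of that idea - the types are illustrative stand-ins, not the actual RefCountState API:

```cpp
#include <vector>

// Illustrative stand-ins for the summarized ARC effect of an instruction.
enum class ArcEffect { IncrementRef, DecrementRef, UseRef, None };

struct SummarizedInst {
  ArcEffect effect;
  bool mayTouchRoot; // may this instruction affect the tracked RC root?
};

struct RefCountStateSketch {
  bool tracking = false;
  bool knownSafe = false;

  // Counterpart of updateForDifferentLoopInst: because the summary is
  // unordered, any potentially relevant effect clears the optimistic
  // state instead of advancing a flow-sensitive sequence.
  void updateForDifferentLoopInst(const SummarizedInst &inst) {
    if (!tracking || !inst.mayTouchRoot || inst.effect == ArcEffect::None)
      return;
    knownSafe = false; // give up the optimistic fact
    tracking = false;  // and stop tracking across the loop entirely
  }
};

int main() {
  RefCountStateSketch state{/*tracking=*/true, /*knownSafe=*/true};
  std::vector<SummarizedInst> loopSummary = {
      {ArcEffect::UseRef, /*mayTouchRoot=*/true}};
  for (const auto &inst : loopSummary)
    state.updateForDifferentLoopInst(inst);
  return state.knownSafe ? 1 : 0; // conservatively cleared -> 0
}
```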
While doing the bottom-up dataflow, if we encounter an unmatched retain instruction that could pair with an already-visited 'KnownSafe' release instruction, we turn off KnownSafety if the two RCIdentities mayAlias.
This is done in BottomUpRefCountState::checkAndResetKnownSafety.
To determine whether a retain is unmatched, we look at IncToDecStateMap. If a retain was matched during the bottom-up dataflow, it is always found in IncToDecStateMap, with the matched release's BottomUpRefCountState as its value.
Similarly, during the top-down dataflow, if we encounter an unmatched release instruction that could pair with an already-visited 'KnownSafe' retain instruction, we turn off KnownSafety if the two RCIdentities mayAlias.
This is done in TopDownRefCountState::checkAndResetKnownSafety.
To determine whether a release is unmatched, we look at DecToIncStateMap. If a release was matched during the top-down dataflow, it is always found in DecToIncStateMap, with the matched retain's TopDownRefCountState as its value.
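A minimal C++ sketch of the bottom-up direction - the types are simplified stand-ins, and the real logic lives in BottomUpRefCountState::checkAndResetKnownSafety:

```cpp
#include <unordered_map>

// Simplified stand-ins for the real types.
struct InstStub {};
using RCIdentity = const void *;

// Placeholder alias query; the real code asks AliasAnalysis.
bool mayAlias(RCIdentity a, RCIdentity b) { return a == b; }

struct BottomUpStateStub {
  RCIdentity rcRoot = nullptr;
  bool knownSafe = false;
};

int main() {
  // IncToDecStateMap: retains matched so far, each mapped to the matched
  // release's bottom-up state.
  std::unordered_map<InstStub *, BottomUpStateStub> incToDecStateMap;

  InstStub retain;
  int object = 0;
  RCIdentity retainRoot = &object;

  // State of an already-visited 'KnownSafe' release.
  BottomUpStateStub visitedRelease{&object, /*knownSafe=*/true};

  // checkAndResetKnownSafety: a retain absent from the map is unmatched;
  // if it may alias the release's RC identity, KnownSafe must be cleared.
  if (!incToDecStateMap.count(&retain) &&
      mayAlias(retainRoot, visitedRelease.rcRoot))
    visitedRelease.knownSafe = false;

  return visitedRelease.knownSafe ? 1 : 0; // cleared -> 0
}
```

The top-down direction is symmetric, with DecToIncStateMap and the matched retain's TopDownRefCountState.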
For ARCLoopOpts, during the bottom-up and top-down traversals of a region with a nested loop, we find out whether a retain/release in the loop summary was matched by looking at the persistent RefCountInstToMatched map.
This map is populated, while the nested loop region is processed, from the IncToDecStateMap/DecToIncStateMap, which are thrown away after the loop region is processed.
This fixes the bugs in ARCSequenceOpts both without and with loop support.
* Remove NewInsts from ARCSequenceOpts
* Remove more instances of InsertPts
* Address comments from #33504
* Make bottom-up loop traversal simpler; use better APIs
* Update LoopRegion printer with more info
Historically, TermInsts were handled specially when visiting blocks in bottom-up order. TermInsts were not visited while traversing a block; they were visited while traversing successors, and the most conservative terminator among all predecessors would affect the refcount state in the dataflow.
This was needed because ARCSequenceOpts also computed 'insertion points' for the purposes of ARC code motion. ARC code motion was later removed from ARCSequenceOpts, but this code remained unchanged.
With this change, ARC-significant terminators are handled like all other instructions while processing basic blocks bottom-up.
Also, updateForSameLoopInst, updateForDifferentLoopInst, and updateForPredTerminators all serve a similar purpose with subtle differences; this change removes some of the resulting code duplication.
Remove code duplication of isARCSignificantTerminator; create ARCSequenceOptUtils for common utilities used in ARCSequenceOpts.
@guaranteed args are treated conservatively in ARCSequenceOpts: a function call with a @guaranteed arg is assumed to decrement the refcount of that arg. With this change, we use swift::mayDecrementRefCount, which uses EscapeAnalysis to determine whether we need to transition conservatively at the instruction being visited.
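A minimal C++ sketch of the new decision - the types are hypothetical stand-ins, and the real predicate is swift::mayDecrementRefCount backed by EscapeAnalysis:

```cpp
// Hypothetical simplified stand-ins for the real types.
struct ApplyStub { bool hasGuaranteedArg = false; };
struct ValueStub {};

// Stand-in for the escape-analysis query: can the callee reach, and thus
// potentially release, the reference behind this argument?
bool calleeMayReachAndRelease(const ApplyStub &, const ValueStub &) {
  return true; // conservative default for this sketch
}

// Old behavior: any apply with a @guaranteed argument was assumed to
// decrement its refcount. New behavior: transition conservatively only
// when escape analysis says the callee may actually release it.
bool mustTransitionConservatively(const ApplyStub &apply,
                                  const ValueStub &guaranteedArg) {
  if (!apply.hasGuaranteedArg)
    return false;
  return calleeMayReachAndRelease(apply, guaranteedArg);
}

int main() { return mustTransitionConservatively({true}, {}) ? 0 : 1; }
```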
For a long time, we have:
1. Created methods on SILArgument that only work on either function arguments or
block arguments.
2. Created code paths in the compiler that only allow for "function"
SILArguments or "block" SILArguments.
This commit refactors SILArgument into two subclasses, SILPHIArgument and
SILFunctionArgument, separates the function and block APIs onto the subclasses
(leaving the common APIs on SILArgument). It also goes through and changes all
places in the compiler that conditionalize on one of the forms of SILArgument to
just use the relevant subclass. This is made easier by the relevant APIs no longer being on SILArgument. If you take a quick look through, you will see that the API now expresses much more of its intent.
The reason why I am performing this refactoring now is that SILFunctionArguments have a ValueOwnershipKind defined by the given function's signature. Block arguments (SILPHIArguments), on the other hand, have a stored ValueOwnershipKind. Rather than store a ValueOwnershipKind in both cases, leaving a dead variable in the function case, I decided to just bite the bullet and fix this.
rdar://29671437
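A schematic C++ sketch of the split - much simplified, with illustrative ownership kinds and conventions:

```cpp
// Illustrative stand-ins; the real enums are much richer.
enum class ValueOwnershipKind { Owned, Guaranteed };
enum class ParamConvention { OwnedParam, GuaranteedParam };

// Common API stays on the base class.
class SILArgument {
public:
  virtual ~SILArgument() = default;
  virtual ValueOwnershipKind getOwnershipKind() const = 0;
};

// Block arguments store their ownership kind directly.
class SILPHIArgument : public SILArgument {
  ValueOwnershipKind kind;

public:
  explicit SILPHIArgument(ValueOwnershipKind k) : kind(k) {}
  ValueOwnershipKind getOwnershipKind() const override { return kind; }
};

// Function arguments derive their ownership kind from the function
// signature, so no field is stored - and no dead variable exists.
class SILFunctionArgument : public SILArgument {
  ParamConvention convention; // comes from the signature

public:
  explicit SILFunctionArgument(ParamConvention c) : convention(c) {}
  ValueOwnershipKind getOwnershipKind() const override {
    return convention == ParamConvention::OwnedParam
               ? ValueOwnershipKind::Owned
               : ValueOwnershipKind::Guaranteed;
  }
};
```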
As promised, we separate the duty of moving retain/release pairs from the task of removing them. The task of moving retains and releases now lives in Retain Release Code Motion, committed in 51b1c0bc68.
We were using stripCasts in some places and getRCIdentityRoot in others.
stripCasts is not identical to getRCIdentityRoot; in particular, it does not look through struct_extract, tuple_extract, or unchecked_enum_data.
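A small C++ sketch of the difference, using hypothetical node types (the real helpers walk SIL values):

```cpp
// Hypothetical simplified value nodes.
enum class Kind { CastLike, StructExtract, TupleExtract, UncheckedEnumData, Other };

struct Node {
  Kind kind = Kind::Other;
  Node *operand = nullptr; // the value being cast or projected, if any
};

// A cast-stripper only walks through cast-like nodes...
Node *stripCastsSketch(Node *n) {
  while (n->kind == Kind::CastLike)
    n = n->operand;
  return n;
}

// ...while the RC-identity walk also looks through aggregate projections:
// in the cases RC identity handles, the projected value shares the
// refcount of the aggregate it was extracted from.
Node *getRCIdentityRootSketch(Node *n) {
  while (n->kind == Kind::CastLike || n->kind == Kind::StructExtract ||
         n->kind == Kind::TupleExtract || n->kind == Kind::UncheckedEnumData)
    n = n->operand;
  return n;
}

int main() {
  Node root;
  Node extract{Kind::StructExtract, &root};
  Node cast{Kind::CastLike, &extract};
  // stripCasts stops at the struct_extract; RC identity reaches the root.
  return (stripCastsSketch(&cast) == &extract &&
          getRCIdentityRootSketch(&cast) == &root) ? 0 : 1;
}
```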
Created struct and tuple test cases to make sure things are optimized as they should be.
We already had a test case for unchecked_enum_data.
Tested via static assert.
There is no reason for these data structures not to have these properties.
Adding these properties improves the compile-time efficiency of ARC by allowing cheaper copying and zero-cost destruction.
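A minimal sketch of the kind of property being relied on here - StateSketch is an illustrative stand-in for ARC's dataflow state records:

```cpp
#include <type_traits>

// Illustrative stand-in for a small dataflow state record that gets
// copied around a lot during the ARC dataflow.
struct StateSketch {
  unsigned flags = 0;
  void *rcRoot = nullptr; // non-owning, so no destructor is needed
};

// Trivially copyable => copies are memcpy; trivially destructible =>
// containers of these can be discarded with zero per-element work.
static_assert(std::is_trivially_copyable<StateSketch>::value,
              "cheap copying");
static_assert(std::is_trivially_destructible<StateSketch>::value,
              "zero-cost destruction");

int main() { return 0; }
```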
This speeds up compilation and reduces memory consumption for test cases with large CFGs. The specific test case that spawned this fix was a large function with many dictionary assignments:
```swift
public func func_0(dictIn : [String : MyClass]) -> [String : MyClass] {
  var dictOut : [String : MyClass] = [:]
  dictOut["key5000"] = dictIn["key500"]
  dictOut["key5010"] = dictIn["key501"]
  dictOut["key5020"] = dictIn["key502"]
  dictOut["key5030"] = dictIn["key503"]
  dictOut["key5040"] = dictIn["key504"]
  ...
}
```
This continued for 10k - 20k values.
This commit reduces the compile time by 2.5x and reduces the amount of
memory allocated by ARC by 2.6x (the memory allocation number includes
memory that is subsequently freed).
rdar://24350646
Previously, we relied on a quirk in the ARC optimizer so that we only needed to visit terminators top-down. This simplified the dataflow. Sadly, try_apply changes this, since it is a terminator that is itself a call producing a value, and relying on the old assumption breaks program correctness.
Now, during the bottom-up traversal, after visiting all instructions of a block B, we visit B's predecessors to see whether any of them has a terminator that is a use or a decrement. We then take the most conservative result among all of those terminators and advance the sequence accordingly.
I do not think that we can have multiple such predecessors today, since none of the interesting terminators can have critical edges to its successors; thus, if our block is a successor of such a block, it cannot have any other predecessors. This is mainly future-proofing in case that becomes possible later.
rdar://23853221
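A minimal C++ sketch of the predecessor-terminator merge described above - all types are illustrative stand-ins, not the real ARCSequenceOpts API:

```cpp
#include <algorithm>
#include <vector>

// Illustrative effect lattice, ordered from least to most conservative.
enum class TermEffect { None = 0, MayUse = 1, MayDecrement = 2 };

struct TerminatorStub {
  bool isTryApply = false;
  bool usesRCRoot = false;
};

// Placeholder classification of a predecessor's terminator with respect
// to the reference-counted root being tracked.
TermEffect classifyTerminator(const TerminatorStub &t) {
  if (t.isTryApply)
    return TermEffect::MayDecrement; // a call: may release the root
  if (t.usesRCRoot)
    return TermEffect::MayUse;
  return TermEffect::None;
}

// After visiting all instructions of block B bottom-up, merge in the most
// conservative effect among the terminators of B's predecessors.
TermEffect mergePredTerminators(const std::vector<TerminatorStub> &preds) {
  TermEffect worst = TermEffect::None;
  for (const TerminatorStub &t : preds)
    worst = std::max(worst, classifyTerminator(t));
  return worst;
}

int main() {
  std::vector<TerminatorStub> preds = {{/*isTryApply=*/true, false},
                                       {false, /*usesRCRoot=*/true}};
  return mergePredTerminators(preds) == TermEffect::MayDecrement ? 0 : 1;
}
```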
SR-102
(libraries now)
It has been generally agreed that we need to do this reorg, and now
seems like the perfect time. Some major pass reorganization is in the
works.
This does not have to be the final word on the matter. The consensus among those working on the code is that it's much better than what we had and a better starting point for future bikeshedding.
Note that the previous organization was designed to allow separate
analysis and optimization libraries. It turns out this is an
artificial distinction and not an important goal.