The JumpThreadingCost map in SimplifyCFG is used to prevent infinite jump threading loops.
There was a missing update of the cost for blocks which are cloned:
jump threading was prevented from infinitely cloning the original block, but not from re-cloning the cloned block.
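A minimal sketch of the missing bookkeeping (names here are illustrative, not the actual SimplifyCFG code):

#include "llvm/ADT/DenseMap.h"

class SILBasicBlock; // from swift/SIL

// When jump threading clones a block, charge the clone with the same
// accumulated cost as the original; otherwise the clone starts at zero
// and can itself be re-cloned forever.
static void recordCloneCost(
    llvm::DenseMap<SILBasicBlock *, int> &jumpThreadingCost,
    SILBasicBlock *origBlock, SILBasicBlock *clonedBlock, int cost) {
  int newCost = jumpThreadingCost[origBlock] + cost;
  jumpThreadingCost[origBlock] = newCost;
  jumpThreadingCost[clonedBlock] = newCost; // the previously missing update
}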
A test case was already added in 8948f7565a
rdar://73357726, [SR-14068]
rdar://73644659 (Add a safeguard to SimplifyCFG tryJumpThreading to avoid infinite loop peeling)
A case of infinite loop peeling was exposed recently:
([SR-14068]: Compiling with optimisation runs indefinitely for grpc-swift)
It was trivially fixed here:
---
commit 8948f7565a (HEAD -> fix-simplifycfg-tramp, public/fix-simplifycfg-tramp)
Author: Andrew Trick <atrick@apple.com>
Date: Tue Jan 26 17:02:37 2021
Fix a SimplifyCFG typo that leads to unbounded optimization
---
However, that fix isn't a strong guarantee against this behavior. The
obvious complete fix is that jump-threading should not affect loop
structure. But changing that requires a performance investigation. In
the meantime this change introduces a simple mechanism that guarantees
that a loop header is not repeatedly cloned.
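A sketch of such a safeguard (illustrative, not the literal implementation):

#include "llvm/ADT/SmallPtrSet.h"

class SILBasicBlock; // from swift/SIL

// Remember which loop headers were already cloned by jump threading in
// this pass invocation and refuse to clone any of them a second time.
struct LoopHeaderCloneGuard {
  llvm::SmallPtrSet<SILBasicBlock *, 8> clonedLoopHeaders;

  bool mayCloneHeader(SILBasicBlock *header) const {
    return !clonedLoopHeaders.count(header);
  }
  void noteClonedHeader(SILBasicBlock *header) {
    clonedLoopHeaders.insert(header);
  }
};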
This safeguard is worthwhile because jump-threading across loop
boundaries is kicking in more frequently now that critical edges are
being split within SimplifyCFG.
Note that it is both necessary and desirable to split critical edges
between transformations so that SIL remains in a valid state. That
allows other code in SimplifyCFG to call arbitrary SIL utilities,
allows verifying SimplifyCFG by running verification between
transformations, and simplifies the patterns that SimplifyCFG itself
needs to consider.
Fixes rdar://73357726 ([SR-14068]: Compiling with optimisation runs
indefinitely for grpc-swift)
The root cause of this problem is that SimplifyCFG::tryJumpThreading
jump threads into loops, effectively peeling loops. This is not the
right way to implement loop peeling. That belongs in a loop
optimization pass. There is simply no sane way to control jump
threading if it is allowed across loop boundaries, both from the
standpoint of requiring optimizations to terminate and from the
standpoint of reducing senseless code bloat.
SimplifyCFG does have a mechanism to avoid jump-threading into loops in
most cases. That mechanism would actually prevent the infinite loop
peeling in this particular case if it were implemented correctly. But
the original implementation circa 2014 appears to have a typo.
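To illustrate the shape of the bug (a hypothetical reconstruction, not the literal diff): a loop-header guard that tests the wrong block never fires, so nothing stops threading into the header.

class SILBasicBlock;

// Stand-in for the loop analysis the pass consults.
struct LoopInfoStub {
  bool isLoopHeader(SILBasicBlock *bb) const;
};

// The guard must test the *target* of the thread; testing the block
// being simplified instead means the guard never fires.
static bool mayJumpThread(const LoopInfoStub &loopInfo,
                          SILBasicBlock *srcBB, SILBasicBlock *destBB) {
  // Buggy shape: if (loopInfo.isLoopHeader(srcBB)) return false;
  if (loopInfo.isLoopHeader(destBB))
    return false;
  return true;
}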
This commit fixes that obvious bug. I do not think it is sufficient
to ensure we never see the bad behavior. I will file separate bugs for
the broader issue.
This bad behavior was exposed incidentally by splitting critical
edges. Without edge splitting, SimplifyCFG::simplifyBlocks only
performs "jump threading" once, creating a critical edge to the loop
header. Because simplifyBlocks works under the assumption that there
are no critical edges, it never attempts to perform jump threading
again. In other words, the presence of the critical edge "breaks" the
optimization, preventing it from continuing as intended.
With edge splitting, the simplifyBlocks worklist performs "jump
threading" followed by "jump to trampoline" removal, which creates a
new loop-back edge to the original loop header. This is fine. However,
simplifyBlocks iteratively attempts all optimizations until it reaches
a fixed point, and it does not stop at loop headers! So, splitting the critical
edge causes simplifyBlocks to work as intended, which leads to
infinite loop peeling. The end result is an infinite sequence of
nested loops. Each peeled iteration is actually within the parent
loop.
This removes the ambiguity when casting from a SingleValueInstruction to SILNode, which makes the code simpler. E.g. the "isRepresentativeSILNode" logic is not needed anymore.
Also, it reduces the size of the most used instruction class - SingleValueInstruction - by one pointer.
Conceptually, SILInstruction is still a SILNode. But implementation-wise SILNode is not a base class of SILInstruction anymore.
Only the two sub-classes of SILInstruction - SingleValueInstruction and NonSingleValueInstruction - inherit from SILNode. SingleValueInstruction's SILNode is embedded into a ValueBase and its relative offset in the class is the same as in NonSingleValueInstruction (see SILNodeOffsetChecker).
This makes it possible to cast from a SILInstruction to a SILNode without knowing which SILInstruction sub-class it is.
Casting to SILNode cannot be done implicitly, but only with an LLVM `cast` or with SILInstruction::asSILNode(). But this is a rare case anyway.
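The resulting layout, sketched in simplified C++ (member details omitted; see the real declarations in SILNode.h and SILInstruction.h):

// Simplified sketch of the new hierarchy.
class SILNode { /* ... */ };

class SILInstruction { /* no longer derives from SILNode */ };

class ValueBase : public SILNode { /* ... */ };

// Both sub-classes embed a SILNode at the same relative offset
// (checked by SILNodeOffsetChecker), so a SILInstruction can be cast
// to SILNode without knowing its concrete sub-class.
class NonSingleValueInstruction : public SILInstruction, public SILNode {};
class SingleValueInstruction : public SILInstruction, public ValueBase {};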
Add support for interleaved borrow scopes:
%b1 = begin_borrow %a
%c = copy %b1
%b2 = begin_borrow %b1
end_borrow %b1
use %c
end_borrow %b2
Will be transformed to:
%c = copy %a
%b1 = begin_borrow %a
%b2 = begin_borrow %b1
end_borrow %b1
use %c
end_borrow %b2
This was the original intention but the implementation was incomplete.
This option can be enabled as soon as we need it for performance.
The intention is also to handle multi-block borrows, but that hasn't
been implemented.
I am trying to be more careful about this rather than letting someone make this
mistake in the future. I also added some comments and a DeadEndBlocks instance
to TempRValueElimination so that it gets full simplifications in OSSA.
In OSSA, we enforce that addresses from interior pointer instructions are scoped
within a borrow scope. This means that it is invalid to use such an address
outside of its parent borrow scope, and as a result one cannot just RAUW an
address value with a dominating address value, since the latter may be invalid
at the point of the former. I foresee that I am going to have to solve this problem, so I
decided to write this API to handle the vast majority of cases.
The way this API works is that it (a condensed sketch appears after the list):
1. Computes an access path with base for the new value. If we do not have a base
value and a valid access path with root, we bail.
2. Then we check if our base value is the result of an interior pointer
instruction. If it isn't, we are immediately done and can RAUW without further
delay.
3. If we do have an interior pointer instruction, we see if the immediate
guaranteed value we projected from has a single borrow introducer value. If not,
we bail. I think this is reasonable since with time, all guaranteed values will
always only have a single borrow introducing value (once struct, tuple,
destructure_struct, destructure_tuple become reborrows).
4. Then we gather up all inner uses of our access path. If for some reason that
fails, we bail.
5. Then we see if all of those uses are within our borrow scope. If so, we can
RAUW without any further worry.
6. Otherwise, we perform a copy+borrow of our interior pointer's operand value
at the interior pointer, create a copy of the interior pointer instruction upon
this new borrow and then RAUW oldValue with that instead. By construction all
uses of oldValue will be within this new interior pointer scope.
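A condensed outline of these steps in pseudo-C++; every helper name below is a hypothetical stand-in, and the real utility has more cases to consider:

// Illustrative outline only; all helpers are hypothetical stand-ins.
bool tryRAUWWithBorrowFixup(SILValue oldValue, SILValue newValue) {
  // 1. Compute an access path with base for the new value; bail on failure.
  auto pathWithBase = computeAccessPathWithBase(newValue);
  if (!pathWithBase.base || !pathWithBase.path.isValid())
    return false;
  // 2. If the base is not an interior pointer result, RAUW immediately.
  auto interiorPtr = asInteriorPointer(pathWithBase.base);
  if (!interiorPtr)
    return replaceAllUses(oldValue, newValue);
  // 3. Require a single borrow introducer for the projected-from value.
  auto introducer = getSingleBorrowIntroducer(interiorPtr.operand);
  if (!introducer)
    return false;
  // 4+5. Gather all inner uses of the access path; if they all lie
  // within the borrow scope, RAUW without further worry.
  if (allUsesWithinBorrowScope(pathWithBase.path, introducer))
    return replaceAllUses(oldValue, newValue);
  // 6. Otherwise copy+borrow the interior pointer's operand, re-create
  // the projection on the new borrow, and RAUW with that instead.
  SILValue newProjection = copyBorrowAndCloneProjection(interiorPtr);
  return replaceAllUses(oldValue, newProjection);
}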
The reason why I am doing this is that I am building up more utilities based on
passing around this context struct, and not all of them want it for RAUWing
purposes. So it makes sense to have a helper (OwnershipRAUWHelper) that composes
with that state.
Just a refactor, should be NFC.
Currently all of these places in the code base perform simplifyInstruction and
then a replaceAllSimplifiedUsesAndErase(...). This is a bad pattern (see the
sketch after this list) since:
1. simplifyInstruction assumes its result will be passed to
replaceAllSimplifiedUsesAndErase. So by leaving these as separate steps, we
allow users to needlessly make this mistake.
2. In a subsequent commit I am going to implement a utility that
lifetime-extends interior pointer bases when replacing an address with an
interior-pointer-derived address. To do this efficiently, I want to reuse state
computed during simplifyInstruction in the actual RAUW, which is difficult if
the two operations are split, short of extending the API. So by removing the
split, I can enable the transform and eliminate mistakes at the same time.
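A sketch of what a combined entry point could look like (the exact name is whatever the new utility exposes; simplifyInstruction and replaceAllSimplifiedUsesAndErase are the utilities named above):

// Combined entry point that makes the two-step pattern impossible to
// misuse at call sites.
static bool trySimplifyAndReplace(SILInstruction *inst) {
  // simplifyInstruction returns a replacement value, or null if the
  // instruction cannot be simplified.
  SILValue simplified = simplifyInstruction(inst);
  if (!simplified)
    return false;
  // The simplified value is consumed right here, so callers can no
  // longer forget to hand it to the replacement utility.
  replaceAllSimplifiedUsesAndErase(inst, simplified);
  return true;
}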
Access scopes for enforcing exclusivity are currently the only
exception to our ability to canonicalize OSSA lifetime purely based on
the SSA value's known uses. This is because access scopes have
semantics relative to object deinitializers.
In general, deinitializers are asynchronous with respect to code that
is unrelated to the object's uses. Ignoring exclusivity, the optimizer
may always destroy objects as early as it wants, as long as the object
won't be used again. The optimizer may also extend the lifetime
(although in the future this lifetime extension should be limited by
"synchronization points").
The optimizer's freedom is however limited by exclusivity
enforcement. Optimization may never introduce new exclusivity
violations. Destroying an object within an access scope is an
exclusivity violation if the deinitializer accesses the same variable.
To handle this, OSSA canonicalization must detect access scopes that
overlap with the end of the pruned extended lifetime. Essentially:
%def
begin_access // access scope unrelated to def
use %def // pruned liveness ends here
end_access
destroy %def
Support for access scopes composes cleanly with the existing algorithm
without adding significant cost in the usual case. Overlapping access
scopes are unusual. A single CFG walk within the original extended
lifetime is normally sufficient. Only the blocks that are not already
LiveOut in the pruned liveness need to be visited. During this walk,
local overlapping accesses are detected by scanning for end_access
instructions after the last use point. Global overlapping accesses are
detected by checking NonLocalAccessBlockAnalysis. This avoids scanning
instructions in the common case. NonLocalAccessBlockAnalysis is a
trivial analysis that caches the rare occurrence of nonlocal access
scopes. The analysis itself is a single linear scan over the
instruction stream. This analysis can be preserved across most
transformations and I expect it to be used to speed up other
optimizations related to access markers.
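A sketch of the per-block overlap check described above (simplified; the signatures are hypothetical):

#include <iterator>

// Consult the analysis for access scopes that span blocks, then scan
// only the instructions after the last use point for a local end_access.
static bool hasOverlappingAccess(
    SILBasicBlock *bb, SILInstruction *lastUse,
    const NonLocalAccessBlocks &nonLocalAccessBlocks) {
  // Global case: some access scope in this block ends in another block.
  if (nonLocalAccessBlocks.contains(bb))
    return true;
  // Local case: scan the tail of the block after the last use point.
  for (auto it = std::next(lastUse->getIterator()); it != bb->end(); ++it)
    if (isa<EndAccessInst>(&*it))
      return true;
  return false;
}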
When an overlapping access is detected, pruned liveness is simply
extended to include the end_access as a new use point. Extending the
lifetime is iterative, but with each iteration, blocks that are now
marked LiveOut no longer need to be visited. Furthermore, interleaved
access scopes are not expected to happen in practice.