This directly adds support to BasicBlockCloner for updating OSSA.
It also adds a much more general-purpose GuaranteedPhiBorrowFixup
utility which can be used for more complicated SSA updates, in which
multiple phis need to be created. More generally, it handles adding
nested borrow scopes for guaranteed phis, even when such a phi is used by
other guaranteed phis.
Refactor SILGen's ApplyOptions into an OptionSet, add a
DoesNotAwait flag to go with DoesNotThrow, and sink it
all down into SILInstruction.h.
Then, replace the isNonThrowing() flag in ApplyInst and
BeginApplyInst with getApplyOptions(), and plumb it
through to TryApplyInst as well.
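Roughly, the shape of the refactor looks like the following stand-alone
sketch. The flag names come from this change, but the option-set machinery
is simplified and the accessor shown in the trailing comment is only an
assumption about how call sites would read it:

  #include <cstdint>

  enum class ApplyFlags : uint8_t {
    DoesNotThrow = 0x1,   // the call is known not to throw
    DoesNotAwait = 0x2,   // a sync call to a reasync function
  };

  class ApplyOptions {
    uint8_t storage = 0;
  public:
    ApplyOptions() = default;
    ApplyOptions(ApplyFlags flag) : storage(uint8_t(flag)) {}
    ApplyOptions &operator|=(ApplyFlags flag) {
      storage |= uint8_t(flag);
      return *this;
    }
    bool contains(ApplyFlags flag) const { return storage & uint8_t(flag); }
  };

  // An apply site then exposes getApplyOptions() instead of a lone
  // isNonThrowing() bit, e.g. (hypothetical accessor):
  //   bool isNonThrowing() const {
  //     return getApplyOptions().contains(ApplyFlags::DoesNotThrow);
  //   }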
Set the flag when SILGen emits a sync call to a reasync
function.
When set, this disables the SIL verifier check against
calling async functions from sync functions.
Finally, this allows us to add end-to-end tests for
rdar://problem/71098795.
Enable most simplify-cfg optimizations as long as the block arguments
have trivial types. Enable most simplify-cfg unit test cases.
This massively reduces the size of the CFG during OSSA passes.
Test cases that aren't supported in OSSA yet have been moved to a
separate test file for disabled OSSA tests.
Full simplify-cfg support is currently blocked on OSSA utilities which
I haven't checked in yet.
The JumpThreadingCost map in SimplifyCFG is used to prevent infinite jump threading loops.
There was a missing update of the cost for blocks which are cloned:
Jump threading was prevented from infinitely cloning the original block, but not from re-cloning the cloned block.
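In other words, the clone has to inherit the cost already charged to the
original. A minimal sketch of that accounting, with stand-in types in place
of SimplifyCFG's real state:

  #include <unordered_map>

  struct Block;  // stand-in for SILBasicBlock

  std::unordered_map<Block *, int> JumpThreadingCost;  // per-block budget used
  constexpr int ThreadingBudget = 8;                   // assumed limit

  bool mayThreadThrough(Block *block, int cost) {
    return JumpThreadingCost[block] + cost <= ThreadingBudget;
  }

  // The missing piece: when a block is cloned, charge the clone with the
  // original's accumulated cost too. Otherwise the clone starts with a fresh
  // budget and can itself be re-cloned forever.
  void recordCloned(Block *original, Block *clone, int cost) {
    int charged = JumpThreadingCost[original] + cost;
    JumpThreadingCost[original] = charged;
    JumpThreadingCost[clone] = charged;
  }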
A test case is already added in 8948f7565a
rdar://73357726, [SR-14068]
rdar://73644659 (Add a safeguard to SimplifyCFG tryJumpThreading to avoid infinite loop peeling)
A case of infinite loop peeling was exposed recently:
([SR-14068]: Compiling with optimisation runs indefinitely for grpc-swift)
It was trivially fixed here:
---
commit 8948f7565a (HEAD -> fix-simplifycfg-tramp, public/fix-simplifycfg-tramp)
Author: Andrew Trick <atrick@apple.com>
Date: Tue Jan 26 17:02:37 2021
Fix a SimplifyCFG typo that leads to unbounded optimization
---
However, that fix isn't a strong guarantee against this behavior. The
obvious complete fix is that jump-threading should not affect loop
structure. But changing that requires a performance investigation. In
the meantime this change introduces a simple mechanism that guarantees
that a loop header is not repeatedly cloned.
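The safeguard amounts to remembering which loop headers have already been
cloned and refusing to clone them again; a sketch with stand-in types:

  #include <set>

  struct Block;  // stand-in for SILBasicBlock

  // Loop headers already cloned by jump threading during this run of the pass.
  std::set<const Block *> ClonedLoopHeaders;

  bool mayCloneLoopHeader(const Block *header) {
    // Threading into an already-cloned header would peel the loop once more.
    return ClonedLoopHeaders.count(header) == 0;
  }

  void noteLoopHeaderCloned(const Block *header, const Block *clone) {
    ClonedLoopHeaders.insert(header);
    // The clone takes over as a header of the peeled loop; refusing to clone
    // it as well is what bounds the peeling.
    ClonedLoopHeaders.insert(clone);
  }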
This safeguard is worthwhile because jump-threading across loop
boundaries is kicking in more frequently now that critical edges are
being split within SimplifyCFG.
Note that it is both necessary and desirable to split critical edges
between transformations so that SIL remains in a valid state. That
allows other code in SimplifyCFG to call arbitrary SIL utilities,
allows verifying SimplifyCFG by running verification between
transformations, and simplifies the patterns that SimplifyCFG itself
needs to consider.
Fixes rdar://73357726 ([SR-14068]: Compiling with optimisation runs
indefinitely for grpc-swift)
The root cause of this problem is that SimplifyCFG::tryJumpThreading
jump threads into loops, effectively peeling loops. This is not the
right way to implement loop peeling. That belongs in a loop
optimization pass. There is simply no sane way to control jump
threading if it is allowed across loop boundaries, both from the
standpoint of requiring optimizations to terminate and from the
standpoint of reducing senseless code bloat.
SimplifyCFG does have a mechanism to avoid jump-threading into loops in
most cases. That mechanism would actually prevent the infinite loop
peeling in this particular case if it were implemented correctly. But
the original implementation circa 2014 appears to have a typo.
This commit fixes that obvious bug. I do not think it's sufficient
to ensure we never see the bad behavior. I will file separate bugs for
the broader issue.
This bad behavior was exposed incidentally by splitting critical
edges. Without edge splitting, SimplifyCFG::simplifyBlocks only
performs "jump threading" once, creating a critical edge to the loop
header. Because simplifyBlocks works under the assumption that there
are no critical edges, it never attempts to perform jump threading
again. In other words, the presence of the critical edge "breaks" the
optimization, preventing it from continuing as intended.
With edge splitting, the simplifyBlocks worklist performs "jump
threading" followed by "jump to trampoline" removal, which creates a
new loop-back edge to the original loop header. This is fine. However,
simplifyBlocks iteratively attempts all optimizations up to a fixed
point and it does not stop at loop headers! So, splitting the critical
edge causes simplifyBlocks to work as intended, which leads to
infinite loop peeling. The end result is an infinite sequence of
nested loops. Each peeled iteration is actually within the parent
loop.
Currently all of these places in the code base perform simplifyInstruction and
then a replaceAllSimplifiedUsesAndErase(...). This is a bad pattern since:
1. simplifyInstruction assumes its result will be passed to
replaceAllSimplifiedUsesAndErase. So by leaving these as two separate steps, we
leave room for users to needlessly make this mistake.
2. I am going to implement in a subsequent commit a utility that lifetime
extends interior pointer bases when replacing an address with an interior
pointer derived address. To do this efficiently, I want to reuse state I
compute during simplifyInstruction during the actual RAUW, which is difficult
to do without extending the API if the two operations stay split. So by
combining them, I can enable that transform and eliminate mistakes at the same
time.
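The combined utility is, in spirit, a single entry point like the sketch
below; the stand-in types and the exact name are illustrative only, and the
real utility also threads through callbacks and ownership fixups.

  #include <vector>

  struct Value {};
  struct Inst : Value {
    Value *simplifiedForm = nullptr;   // set when a simpler value exists
    std::vector<Value **> uses;        // operands referring to this instruction
    Value *simplify() { return simplifiedForm; }
    void replaceAllUsesWith(Value *newValue) {
      for (Value **use : uses)
        *use = newValue;
    }
    void eraseFromParent() { /* unlink from the block in the real thing */ }
  };

  // One call performs both halves of the old pattern, so callers can no longer
  // run the simplification and then skip or misuse the replacement, and state
  // computed while simplifying stays available to the RAUW step.
  Value *simplifyAndReplaceAllUsesAndErase(Inst *inst) {
    Value *simplified = inst->simplify();
    if (!simplified)
      return nullptr;
    inst->replaceAllUsesWith(simplified);
    inst->eraseFromParent();
    return simplified;
  }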
Specifically before this PR, if a caller did not customize a specific callback
of InstModCallbacks, we would store a static default std::function into
InstModCallbacks. This means that we would always have an indirect jump. That is
unfortunate since this code is often called in loops.
In this PR, I eliminate this problem as follows:
1. I made all of the actual callback std::functions in InstModCallbacks private
and gave them a "Func" postfix (e.g. deleteInst -> deleteInstFunc).
2. I created public methods with the old callback names that actually call the
callbacks. This ensures that as long as we are not escaping callbacks from
InstModCallbacks, this PR does not require any source changes, since we are
only changing a call of a std::function field into a call to a method.
3. I changed all of the places that were escaping InstModCallbacks' individual
callbacks to instead take a whole InstModCallbacks. We shouldn't be escaping
the callbacks anyway.
4. I changed the default value of each callback in InstModCallbacks to be
nullptr and changed the public helper methods to check whether a callback is
null. If the callback is not null, it is called; otherwise the helper falls
back to an inline default implementation of the operation.
Altogether this means that the cost of a plain InstModCallbacks is reduced,
and one only pays the indirect-call price as one customizes it further, which
scales better.
P.S. As a little extra, I added a madeChange field to InstModCallbacks. Now
that we have the helpers calling the callbacks, I can easily insert
instrumentation like this, allowing users to pass in an InstModCallbacks and
see whether anything was RAUWed without needing to specify a callback.
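Condensed into a stand-alone sketch, the resulting shape is roughly the
following; only one hook is shown, and the setter name beyond the fields
mentioned above is an assumption.

  #include <functional>
  #include <utility>

  struct Inst {                        // stand-in for SILInstruction
    void eraseFromParent() {}
  };

  class InstModCallbacks {
    // Stays null unless a client customizes it, so the default configuration
    // never pays for an indirect call.
    std::function<void(Inst *)> deleteInstFunc;

  public:
    bool madeChange = false;           // set by the public helpers

    InstModCallbacks &onDelete(std::function<void(Inst *)> fn) {
      deleteInstFunc = std::move(fn);  // hypothetical setter
      return *this;
    }

    // Public helper with the old callback name: use the custom callback if
    // present, otherwise fall back to an inline default implementation.
    void deleteInst(Inst *inst) {
      madeChange = true;
      if (deleteInstFunc)
        deleteInstFunc(inst);
      else
        inst->eraseFromParent();
    }
  };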
This makes it easier to understand conceptually why a ValueOwnershipKind with
Any ownership is invalid and also allows me to explicitly document the lattice
that relates ownership constraints and value ownership kinds.
I need SwitchEnum{,Addr}Inst to have different base classes
(TermInst, OwnershipForwardingTermInst). To do this I need to add a template
parameter to SwitchEnumInstBase so I can switch out that BaseTy. Sadly, since
we are using
SwitchEnumInstBase as an ADT type as well as an actual base type for
Instructions, this is impossible to do without introducing a template in a ton
of places.
Rather than doing that, I changed the code that was using SwitchEnumInstBase as
an ADT to instead use a proper ADT, SwitchEnumBranch. I am happy to change the
name as people see fit (maybe SwitchEnumTerm?).
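For reference, the base-class parameterization looks roughly like this sketch
(the real SwitchEnumInstBase template carries much more than a bare BaseTy
parameter):

  struct TermInst {};                              // stand-in hierarchies
  struct OwnershipForwardingTermInst : TermInst {};

  // Shared switch_enum logic, parameterized over its base class so the two
  // concrete instructions can live in different hierarchies.
  template <typename BaseTy>
  class SwitchEnumInstBase : public BaseTy {
    // ... shared case and operand handling ...
  };

  class SwitchEnumInst final
      : public SwitchEnumInstBase<OwnershipForwardingTermInst> {};
  class SwitchEnumAddrInst final
      : public SwitchEnumInstBase<TermInst> {};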
In a blatant abuse of OO style, the logic for manipulating the CFG was
encapsulated inside a descriptor of the CFG edge, making it impossible
to work with.
Avoid introducing new critical edges. Passes will end up resplitting
them, forcing SimplifyCFG to continually rerun. Also, we want to allow
SIL utilities to assume no critical edges, and avoid the need for
several passes to internally split edges and modify the CFG for no
reason.
Also, handle arbitrary block arguments which may be trivial and
unrelated to the real optimizations that trampoline removal exposes,
such as "unwrapping" enumeration-type arguments.
The previous code was an example of how to write an unstable
optimization. It could be defeated by other code in the function that
isn't directly related to the SSA graph being optimized. In general,
when an optimization can be defeated by unrelated code in the
function, that leads to instability which can be very hard to track
down (I spent multiple full days on this one). In this case, we have
enum-type branch args which need to be simplified by unwrapping
them. But the existence of a trivial and entirely unrelated block
argument would defeat the optimization.
I think unconditional branches should be free, period. They will
mostly be removed during LLVM code gen. However, fixing this requires
significant adjustments to inlining heuristics to avoid microbenchmark
regressions at -Osize. So, instead I am just making this less
sensitive to critical edges for the sake of pipeline stability.
When SimplifyCFG (temporarily) produces an unreachable CFG cycle, some other transformations in SimplifyCFG didn't deal with this situation correctly.
Unfortunately I couldn't create a SIL test case for this bug, so I just added a Swift test case.
https://bugs.swift.org/browse/SR-13650
rdar://problem/69942431
`get_async_continuation[_addr]` begins a suspend operation by accessing the continuation value that can resume
the task, which can then be used in a callback or event handler before executing `await_async_continuation` to
suspend the task.
If the branch-block injects a certain enum case and the destination switches on that enum, it's worth jump threading. E.g.
  inject_enum_addr %enum : $*Optional<T>, #Optional.some
  ... // no memory writes here
  br DestBB
DestBB:
  ... // no memory writes here
  switch_enum_addr %enum : $*Optional<T>, case #Optional.some ...
This enables removing all of the Optional-related code in a loop that iterates over an array of address-only elements, e.g.
func test<T>(_ items: [T]) {
  for i in items {
    print(i)
  }
}
This became necessary after recent function type changes that keep
substituted generic function types abstract even after substitution to
correctly handle automatic opaque result type substitution.
Instead of performing the opaque result type substitution as part of
substituting the generic args, the underlying type will now be reified as
part of looking at the parameter/return types, which happens as part of
the function convention APIs.
rdar://62560867
* Update Devirtualizer's analysis invalidation
castValueToABICompatibleType can change the CFG; the Devirtualizer uses this API but didn't check whether it modified the CFG.
When merging many blocks into a single block (in the wrong order), instructions get moved over and over again.
This is quadratic and can result in very long compile times for large functions.
To fix this, always move the instructions of the smaller block into the larger block.
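As a sketch of the idea, using std::list as a stand-in for a block's
instruction list (in SIL each moved instruction pays its own cost, which is
what the element-wise splice below mirrors):

  #include <iterator>
  #include <list>

  struct Inst { int id; };
  using Block = std::list<Inst>;   // stand-in for a block's instruction list

  // Merge `succ` into its single predecessor `pred`; the merged order is
  // pred's instructions followed by succ's. Always move the contents of the
  // smaller block, so repeated merges don't shuffle the same instructions
  // again and again (which is what made the old order quadratic).
  Block mergeBlocks(Block pred, Block succ) {
    if (pred.size() >= succ.size()) {
      while (!succ.empty())                 // move the smaller successor
        pred.splice(pred.end(), succ, succ.begin());
      return pred;
    }
    while (!pred.empty())                   // move the smaller predecessor
      succ.splice(succ.begin(), pred, std::prev(pred.end()));
    return succ;
  }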
rdar://problem/56268570
Fixes a potential real bug in the case that SinkAddressProjections moves
projections without notifying SimplifyCFG of the change. This could
fail to update Analyses (probably won't break anything in practice).
Introduce SILInstruction::isPure. Among other things, this can tell
you if it's safe to duplicate instructions at their
uses. SinkAddressProjections should check this before sinking uses. I
couldn't find a way to expose this as a real bug, but it is a
theoretical bug.
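Conceptually, the check looks something like the sketch below; the query
names are stand-ins, and the precise definition of purity used by the real
isPure is an assumption here.

  struct Inst {                      // stand-in instruction queries
    bool readsOrWritesMemory = false;
    bool hasSideEffects = false;
    bool isTerminator = false;
    bool isAllocationOrDeallocation = false;
  };

  // A "pure" instruction depends only on its operands: it doesn't touch
  // memory, has no side effects, and isn't control flow or a (de)allocation,
  // so duplicating it at each of its uses cannot change program behavior.
  bool isPure(const Inst &inst) {
    return !inst.readsOrWritesMemory && !inst.hasSideEffects &&
           !inst.isTerminator && !inst.isAllocationOrDeallocation;
  }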
Add the SinkAddressProjections functionality to the BasicBlockCloner
utility. Enable address projection sinking for all BasicBlockCloner
clients (the four different kinds of jump-threading that use it). This
brings the compiler much closer to banning all address phis.
The "bugs" were originally introduced a week ago here:
commit f22371bf0b (fork/fix-address-phi, fix-address-phi)
Author: Andrew Trick <atrick@apple.com>
Date: Tue Sep 17 16:45:51 2019
Add SIL SinkAddressProjections utility to avoid address phis.
Enable this utility during jump-threading in SimplifyCFG.
Ultimately, the SIL verifier should prevent all address-phis and we'll
need to use this utility in a few more places.
Fixes <rdar://problem/55320867> SIL verification failed: Unknown
formal access pattern: storage
In SILInstruction::isTriviallyDuplicatable():
- Make deallocating instructions trivially duplicatable. They are by
any useful definition--duplicating an instruction does not imply
reordering it. Tail duplication was already treating deallocations
as duplicatable, but doing it inconsistently. Sometimes it checks
isTriviallyDuplicatable, and sometimes it doesn't, which appears to
have been an accident. Disallowing duplication of deallocations will
cause severe performance regressions. Instead, consistently allow
them to be duplicated, making tail duplication more powerful, which
could expose other bugs.
- Do not duplicate on-stack AllocRefInst (without special
consideration). This is a correctness fix that apparently was never
exposed.
Fix SILLoop::canDuplicate():
- Handle isDeallocatingStack. It's not clear how we were avoiding an
assertion before when a stack allocatable reference was confined to
a loop--probably just by luck.
- Handle begin/end_access inside a loop. This is extremely important
and probably prevented many loop optimizations from working with
exclusivity.
Update LoopRotate canDuplicateOrMoveToPreheader(). This is NFC.
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
Previously, we were not properly handling blocks that we could visit multiple
times. In this commit, I added a SmallPtrSet to ensure that we handle all of the
same cases that we handled previously.
The key reason that we want to follow this approach rather than something else
is that the previous algorithm deliberately allowed for side-entrances from other
checks, since oftentimes when we have multiple checks, all of the .none branches
funnel together into a single ultimate block.
This can be seen in the need for this code to support the test two_chained_calls
in simplify_switch_enum_objc.sil.
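The guard amounts to a visited set on the walk; sketched here with standard
containers in place of SmallPtrSet and the real block type:

  #include <unordered_set>
  #include <vector>

  struct Block {
    std::vector<Block *> preds;      // stand-in for a block's predecessors
  };

  void walkPredecessors(Block *start) {
    std::unordered_set<Block *> visited;   // SmallPtrSet in the real code
    std::vector<Block *> worklist{start};
    while (!worklist.empty()) {
      Block *block = worklist.back();
      worklist.pop_back();
      // A block can be reached along several paths (e.g. when many .none
      // branches funnel into one block); only process it once.
      if (!visited.insert(block).second)
        continue;
      // ... per-block handling goes here ...
      for (Block *pred : block->preds)
        worklist.push_back(pred);
    }
  }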
rdar://55861081
Enable this utility during jump-threading in SimplifyCFG.
Ultimately, the SIL verifier should prevent all address-phis and we'll
need to use this utility in a few more places.
Fixes <rdar://problem/55320867> SIL verification failed: Unknown
formal access pattern: storage