This makes it easier to understand conceptually why a ValueOwnershipKind with
Any ownership is invalid and also allowed me to explicitly document the lattice
that relates ownership constraints/value ownership kinds.
I have a need to have SwitchEnum{,Addr}Inst have different base classes
(TermInst, OwnershipForwardingTermInst). To do this I need to add a template to
SwitchEnumInstBase so I can switch that BaseTy. Sadly since we are using
SwitchEnumInstBase as an ADT type as well as an actual base type for
Instructions, this is impossible to do without introducing a template in a ton
of places.
Rather than doing that, I changed the code that was using SwitchEnumInstBase as
an ADT to instead use a proper ADT SwitchEnumBranch. I am happy to change the
name as possible see fit (maybe SwitchEnumTerm?).
In a blatant abuse of OO style, the logic for manipulating the CFG was
encapsulated inside a descriptor of the CFG edge, making it impossible
to work with.
Avoid introducing new critical edges. Passes will end up resplitting
them, forcing SimplifyCFG to continually rerun. Also, we want to allow
SIL utilities to assume no critical edges, and avoid the need for
several passes to internally split edges and modify the CFG for no
reason.
Also, handle arbitrary block arguments which may be trivial and
unrelated to the real optimizations that trampoline removal exposes,
such as "unwrapping" enumeration-type arguments.
The previous code was an example of how to write an unstable
optimization. It could be defeated by other code in the function that
isn't directly related to the SSA graph being optimized. In general,
when an optimization can be defeated by unrelated code in the
function, that leads to instability which can be very hard to track
down (I spent multiple full days on this one). In this case, we have
enum-type branch args which need to be simplified by unwrapping
them. But, the existence of a trivial and entirely unrelated block
argument would defeat the optimization.
I think unconditional branches should be free, period. They will
mostly be removed during LLVM code gen. However, fixing this requires
signficant adjustments to inlining heuristics to avoid microbenchmark
regressions at -Osize. So, instead I am just making this less
sensitive to critical edges for the sake of pipeline stability.
When SimplifyCFG (temporarily) produces an unreachable CFG cycle, some other transformations in SimplifyCFG didn't deal with this situation correctly.
Unfortunately I couldn't create a SIL test case for this bug, so I just added a swift test case.
https://bugs.swift.org/browse/SR-13650
rdar://problem/69942431
`get_async_continuation[_addr]` begins a suspend operation by accessing the continuation value that can resume
the task, which can then be used in a callback or event handler before executing `await_async_continuation` to
suspend the task.
If the branch-block injects a certain enum case and the destination switches on that enum, it's worth jump threading. E.g.
inject_enum_addr %enum : $*Optional<T>, #Optional.some
... // no memory writes here
br DestBB
DestBB:
... // no memory writes here
switch_enum_addr %enum : $*Optional<T>, case #Optional.some ...
This enables removing all code with optionals in a loop, which iterates over an array of address-only elements, e.g.
func test<T>(_ items: [T]) {
for i in items {
print(i)
}
}
This became necessary after recent function type changes that keep
substituted generic function types abstract even after substitution to
correctly handle automatic opaque result type substitution.
Instead of performing the opaque result type substitution as part of
substituting the generic args the underlying type will now be reified as
part of looking at the parameter/return types which happens as part of
the function convention apis.
rdar://62560867
* Update Devirtualizer's analysis invalidation
castValueToABICompatibleType can change CFG, Devirtualizer uses this api but doesn't check if it modified the cfg
When merging many blocks to a single block (in the wrong order), instructions are getting moved over and over again.
This is quadratic and can result in very long compile times for large functions.
To fix this, always move the instruction to smaller block to the larger block.
rdar://problem/56268570
Fixes a potential real bug in the case that SinkAddressProjections moves
projections without notifying SimplifyCFG of the change. This could
fail to update Analyses (probably won't break anything in practice).
Introduce SILInstruction::isPure. Among other things, this can tell
you if it's safe to duplicate instructions at their
uses. SinkAddressProjections should check this before sinking uses. I
couldn't find a way to expose this as a real bug, but it is a
theoretical bug.
Add the SinkAddressProjections functionality to the BasicBlockCloner
utility. Enable address projection sinking for all BasicBlockCloner
clients (the four different kinds of jump-threading that use it). This
brings the compiler much closer to banning all address phis.
The "bugs" were originally introduced a week ago here:
commit f22371bf0b (fork/fix-address-phi, fix-address-phi)
Author: Andrew Trick <atrick@apple.com>
Date: Tue Sep 17 16:45:51 2019
Add SIL SinkAddressProjections utility to avoid address phis.
Enable this utility during jump-threading in SimplifyCFG.
Ultimately, the SIL verifier should prevent all address-phis and we'll
need to use this utility in a few more places.
Fixes <rdar://problem/55320867> SIL verification failed: Unknown
formal access pattern: storage
In SILInstruction::isTriviallyDuplicatable():
- Make deallocating instructions trivially duplicatable. They are by
any useful definition--duplicating an instruction does not imply
reordering it. Tail duplication was already treating deallocations
as duplicatable, but doing it inconsistently. Sometimes it checks
isTriviallyDuplicatable, and sometimes it doesn't, which appears to
have been an accident. Disallowing duplication of deallocations will
cause severe performance regressions. Instead, consistently allow
them to be duplicated, making tail duplication more powerful, which
could expose other bugs.
- Do not duplicate on-stack AllocRefInst (without special
consideration). This is a correctness fix that apparently was never
exposed.
Fix SILLoop::canDuplicate():
- Handle isDeallocatingStack. It's not clear how we were avoiding an
assertion before when a stack allocatable reference was confined to
a loop--probably just by luck.
- Handle begin/end_access inside a loop. This is extremely important
and probably prevented many loop optimizations from working with
exclusivity.
Update LoopRotate canDuplicateOrMoveToPreheader(). This is NFC.
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilies out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
Previously, we were not handling properly blocks that we could visit multiple
times. In this commit, I added a SmallPtrSet to ensure that we handle all of the
same cases that we handled previously.
The key reason that we want to follow this approach rather than something else
is that the previous algorithm on purpose allowed for side-entrances from other
checks since often times when we have multiple checks, all of the .none branches
funnel together into a single ultimate block.
This can be seen by the need of this code to support the test two_chained_calls
in simplify_switch_enum_objc.sil.
rdar://55861081
Enable this utility during jump-threading in SimplifyCFG.
Ultimately, the SIL verifier should prevent all address-phis and we'll
need to use this utility in a few more places.
Fixes <rdar://problem/55320867> SIL verification failed: Unknown
formal access pattern: storage
Change the way how to limit jump threading. Instead of counting the
number of jump threading optimizations on a block, count the number of
copied instructions due to jump threading.
rdar://problem/51416939
Patch by Erik Eckstein!
Disable constant folding and jump threading for functions with > 10000 blocks.
Those optimizations are not strictly linear with the number of blocks and cause compile time issues if the function is really huge.
SR-10209
rdar://problem/49522869
The ownership kind is Any for trivial types, or Owned otherwise, but
whether a type is trivial or not will soon depend on the resilience
expansion.
This means that a SILModule now uniques two SILUndefs per type instead
of one, and serialization uses two distinct sentinel IDs for this
purpose as well.
For now, the resilience expansion is not actually used here, so this
change is NFC, other than changing the module format.
In the statement
optional1?.objc_setter = optional2?.objc_getter?.objc_getter
we can eliminate all optional switches expect for the first switch on
optional1. We must only execute the setter if optional1 has some value.
We can simplify the following switch_enum with a branch as long all
sideffecting instructions in someBB are objc_method calls on the
optional payload or on another objc_method call that transitively uses
the payload.
switch_enum %optionalValue, case #Optional.some!enumelt.1: someBB,
case #Optional.none: noneBB
someBB(%optionalPayload):
%1 = objc_method %optionalPayload
%2 = apply %1(..., %optionalPayload) // self position
br mergeBB(%2)
noneBB:
%4 = enum #Optional.none
br mergeBB(%4)
rdar://48007302
NOTE: I changed all places that the CastOptimizer is created to just pass in
nullptr for now so this is NFC.
----
Right now the interface of the CastOptimizer is muddled and confused. Sometimes
it is returning a value that should be used by the caller, other times it is
returning an instruction that is meant to be reprocessed by the caller.
This series of patches is attempting to clean this up by switching to the
following model:
1. If we are optimizing a cast of a value, we return a SILValue. If the cast
fails, we return an empty SILValue().
2. If we are optimizing a cast of an address, we return a boolean value to show
success/failure and require the user to use the SILBuilderContext to get the
cast if they need to.
This disables a bunch of passes when ownership is enabled. This will allow me to
keep transparent functions in ossa and skip most of the performance pipeline without
being touched by passes that have not been updated for ownership.
This is important so that we can in -Onone code import transparent functions and
inline them into other ossa functions (you can't inline from ossa => non-ossa).
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
This is in preparation for verifying that when ownership verification is enabled
that only enums and trivial values can have any ownership. I am doing this in
preparation for eliminating ValueOwnershipKind::Trivial.
rdar://46294760
Rewrite the SILCLoners used in SimplifyCFG. For convenience, there is
now simply a BasicBlockCloner and a SILFunctionCloner. It's pretty
obvious what they do and almost impossible to use incorrectly.
This is worthwhile on its own just to make the usage clear, but the
real reason is that after this cleanup, it will be possible to remove
many extraneous calls to global critical edge splitting related to
cloning.
Mostly functionally neutral:
- may fix latent bugs.
- may reduce useless basic blocks after inlining.
This rewrite encapsulates the cloner's internal state, providing a
clean API for the CRTP subclasses. The subclasses are rewritten to use
the exposed API and extension points. This makes it much easier to
understand, work with, and extend SIL cloners, which are central to
many optimization passes. Basic SIL invariants are now clearly
expressed and enforced. There is no longer a intricate dance between
multiple levels of subclasses operating on underlying low-level data
structures. All of the logic needed to keep the original SIL in a
consistent state is contained within the SILCloner itself. Subclasses
only need to be responsible for their own modifications.
The immediate motiviation is to make CFG updates self-contained so
that SIL remains in a valid state. This will allow the removal of
critical edge splitting hacks and will allow general SIL utilities to
take advantage of the fact that we don't allow critical edges.
This rewrite establishes a simple principal that should be followed
everywhere: aside from the primitive mutation APIs on SIL data types,
each SIL utility is responsibile for leaving SIL in a valid state and
the logic for doing so should exist in one central location.
This includes, for example:
- Generating a valid CFG, splitting edges if needed.
- Returning a valid instruction iterator if any instructions are removed.
- Updating dominance.
- Updating SSA (block arguments).
(Dominance info and SSA properties are fundamental to SIL verification).
LoopInfo is also somewhat fundamental to SIL, and should generally be
updated, but it isn't required.
This also fixes some latent bugs related to iterator invalidation in
recursivelyDeleteTriviallyDeadInstructions and SILInliner. Note that
the SILModule deletion callback should be avoided. It can be useful as
a simple cache invalidation mechanism, but it is otherwise bug prone,
too limited to be very useful, and basically bad design. Utilities
that mutate should return a valid instruction iterator and provide
their own deletion callbacks.