We used to emit this:
checked_cast_br [exact] %0 : $Foo to $Bar, bb1, bb2

bb1(%1 : $Bar):
  %2 = unchecked_ref_cast %0 : $Foo to $Bar
  apply ...(..., %2)
  br ...

bb2:
  ...
This is not ownership SIL-safe, because we're re-using the original
operand, which will have already been consumed at that point.
The more immediate problem here is that this is actually not safe
when combined with other optimizations. Suppose that after the
speculative devirtualization, our function is inlined into another
function where the operand was an upcast. Now we have this code:
%0 = upcast ...
checked_cast_br [exact] %0 : $Foo to $Bar, bb1, bb2

bb1(%1 : $Bar):
  %2 = unchecked_ref_cast %0 : $Foo to $Bar
  apply ...(..., %2)
  br ...

bb2:
  ...
The SILCombiner will simplify the unchecked_ref_cast of the upcast
into an unchecked_ref_cast of the upcast's original operand. At
this point, we have an unchecked_ref_cast between two unrelated
class types. While this means the block bb1 is unreachable, we
might perform further optimizations on it before we run the cast
optimizer and delete it altogether.
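For illustration, roughly what the folded instruction looks like, where %base and $Baz are hypothetical names for the upcast's original operand and its class (a class unrelated to $Bar):

  bb1(%1 : $Bar):
    // SILCombine has folded away the upcast, so the cast now operates
    // directly on the original operand, whose class is unrelated to $Bar:
    %2 = unchecked_ref_cast %base : $Baz to $Bar
    apply ...(..., %2)
    br ...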
In particular, the devirtualizer follows chains of reference cast
instructions, and it will get very confused if it finds this invalid
cast between unrelated types.
Fixes <rdar://problem/57712094>, <https://bugs.swift.org/browse/SR-11916>.
For alias analysis queries to be generally correct, we need to
effectively merge the escape state and use points for everything in a
defer web.
It was unclear from the current design whether the "escaping" property
applied to the pointer value or its content. The implementation was
inconsistent in how it treated this. It appears that some bugs have
been worked around by propagating forward through defer edges, some
have been worked around by querying the content instead of the
pointer, and others have been worked around by creating fake use
points at block arguments.
If we always simply query the content for escape state and use points,
then we never need to propagate along defer edges. The current code
that propagates escape state along defer edges in one direction is
simply incorrect from the perspective of alias analysis.
One very attractive solution is to merge nodes eagerly without
creating any defer edges, but that would be a much more radical change
even than what I've done here. It would also pose some new issues: how
to resolve the current "node types" when merging and how to deal with
missing content nodes.
This solution of applying escape state to content nodes solves all
these problems without too radical a change, at the expense of
eagerly creating content nodes. (The potential graph memory usage is
not really an issue because it's possible to drastically shrink the
size of the graph anyway in a future commit--I've been able to fit a
node within one cache line). This solution nicely preserves graph
structure which makes it easy to debug and relate to the IR.
Eagerly creating content nodes also solves the missing content node
problem. For example, when querying canEscapeTo, we need to know
whether to look at the escape state for just the pointer value itself,
or also for its content. It may be that its content node is
actually part of the same object at the IR level. If the content node
is missing, then we don't know if the object's interior address is not
recognizable/representable or whether we simply never saw an access to
the interior address. We can't simply look at whether the current IR
value happens to be a reference, because that doesn't tell us whether
the graph node may have been merged with a non-reference node or even
with its own content node. To be correct in general, this query would
need to be extremely conservative. However, if content nodes are
always created for references, then we only need to query the escape
state of a pointer's content node. The content node's flag tells us if
it's an interior node, in which case it will always point to another
content node which also needs to be queried.
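As a SIL-level illustration of why the content is what matters (the type and field names here are hypothetical):

  %ref = alloc_ref $Node
  %addr = ref_element_addr %ref : $Node, #Node.next
  store %x to %addr : $*AnyObject
  // A canEscapeTo query on %ref must consult the escape state of the
  // content node representing %ref's storage, not just the node for the
  // %ref value itself; and if that content node is flagged as interior,
  // the content node it points to must be queried as well.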
Add a new utility that deletes instructions and eliminates dead code. It is
meant to be a replacement for recursivelyDeleteTriviallyDeadInstructions, and
it performs more aggressive dead-code elimination for ownership SIL.
This patch also migrates most non-force-delete uses of
recursivelyDeleteTriviallyDeadInstructions to the new utility, and it
migrates one force-delete use (in IRGenPrepare) as well.
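For example (a hypothetical ownership-SIL pattern, assuming the new utility treats destroys of an otherwise dead value as deletable along with it):

  %1 = copy_value %0 : $Klass
  // %1 has no uses other than this destroy, so the new utility can
  // delete both instructions; a trivially-dead check alone would not,
  // because the copy still has a use.
  destroy_value %1 : $Klass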
All the context dependencies in SIL type lowering have been eradicated, but IRGen's
type info lowering is still context-dependent and doesn't systematically pass generic
contexts around. Sink GenericContextScope bookkeeping entirely into IRGen for now.
The cast optimizer was too eager to fold casts where the source and
target lowered types were the same, even though the formal types
might be different. Move the classifyFeasibility() check to deal
with this case.
Also in dead code elimination we have to mark all operands of a
cast branch instruction as live, not just the first operand,
otherwise we miss the special 'type dependent' self metadata
operand and replace it with 'undef'.
SIL type lowering erases DynamicSelfType, so we generate
incorrect code when casting to DynamicSelfType. Fixing this
requires a fair amount of plumbing, but most of the
changes are mechanical.
Note that the textual SIL syntax for casts has changed
slightly; the target type is now a formal type without a '$',
not a SIL type.
Also, the unconditional_checked_cast_value and
checked_cast_value_br instructions now take the _source_
formal type as well, just like the *_addr forms they are
intended to replace.
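For example (a sketch; operand details elided), a conditional cast that used to name a SIL target type now names the formal type:

  // before:
  checked_cast_br %0 : $Foo to $Bar, bb1, bb2
  // after:
  checked_cast_br %0 : $Foo to Bar, bb1, bb2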
Fixes a potential real bug where SinkAddressProjections moves
projections without notifying SimplifyCFG of the change. This could
fail to update analyses (though it probably won't break anything in practice).
Introduce SILInstruction::isPure. Among other things, this can tell
you if it's safe to duplicate instructions at their
uses. SinkAddressProjections should check this before sinking projections to their uses. I
couldn't find a way to expose this as a real bug, but it is a
theoretical bug.
Add the SinkAddressProjections functionality to the BasicBlockCloner
utility. Enable address projection sinking for all BasicBlockCloner
clients (the four different kinds of jump-threading that use it). This
brings the compiler much closer to banning all address phis.
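A rough sketch of the situation the utility handles (block, value, and type names hypothetical):

  // If bb2 is cloned (jump-threaded), its copy also defines the
  // projection, and bb3 would then need an address phi to merge them:
  bb2:
    %addr = struct_element_addr %base : $*S, #S.x
    br bb3
  bb3:
    %v = load %addr : $*Int

  // Sinking the projection to its use before cloning avoids this; the
  // rematerialized projection only needs %base, which dominates bb3:
  bb3:
    %addr2 = struct_element_addr %base : $*S, #S.x
    %v = load %addr2 : $*Int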
The "bugs" were originally introduced a week ago here:
commit f22371bf0b (fork/fix-address-phi, fix-address-phi)
Author: Andrew Trick <atrick@apple.com>
Date: Tue Sep 17 16:45:51 2019
Add SIL SinkAddressProjections utility to avoid address phis.
Enable this utility during jump-threading in SimplifyCFG.
Ultimately, the SIL verifier should prevent all address-phis and we'll
need to use this utility in a few more places.
Fixes <rdar://problem/55320867> SIL verification failed: Unknown
formal access pattern: storage
In SILInstruction::isTriviallyDuplicatable():
- Make deallocating instructions trivially duplicatable. They are by
any useful definition--duplicating an instruction does not imply
reordering it. Tail duplication was already treating deallocations
as duplicatable, but did so inconsistently: sometimes it checked
isTriviallyDuplicatable and sometimes it didn't, which appears to
have been an accident. Disallowing duplication of deallocations would
cause severe performance regressions. Instead, consistently allow
them to be duplicated, making tail duplication more powerful, which
could expose other bugs.
- Do not duplicate on-stack AllocRefInst (without special
consideration). This is a correctness fix that apparently was never
exposed.
Fix SILLoop::canDuplicate():
- Handle isDeallocatingStack. It's not clear how we were avoiding an
assertion before when a stack allocatable reference was confined to
a loop--probably just by luck.
- Handle begin/end_access inside a loop (see the sketch below). This is
  extremely important; the missing support probably prevented many loop
  optimizations from working with exclusivity enabled.
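A sketch of the kind of loop that canDuplicate() now accepts (names hypothetical); the access scope begins and ends inside the loop body, so duplicating the block keeps every begin_access paired with its end_access:

  bb1:                                   // loop header
    %a = begin_access [read] [static] %addr : $*Int
    %v = load %a : $*Int
    end_access %a : $*Int
    cond_br %cond, bb1, bb2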
Update LoopRotate canDuplicateOrMoveToPreheader(). This is NFC.
This is the first in a series of patches that reworks
EscapeAnalysis. For this patch, I extracted every change that does not
introduce new features, rewrite logic, or appear to change
functionality.
These cleanups were done in preparation for:
- adding a graph representation for reference counted objects
- rewriting parts of the query logic
- ...which then allows the analysis to safely assume that all
exclusive arguments are unique
- ...which then allows more aggressive optimization of local variables
that don't escape
There are two possible functional changes:
1. getUnderlyingAddressRoot in InstructionUtils now sees through OSSA
instructions: begin_borrow and copy_value
2. The getPointerBase helper in EscapeAnalysis now sees through all of
these reference and pointer casts:
+ case ValueKind::UncheckedRefCastInst:
+ case ValueKind::ConvertFunctionInst:
+ case ValueKind::UpcastInst:
+ case ValueKind::InitExistentialRefInst:
+ case ValueKind::OpenExistentialRefInst:
+ case ValueKind::RawPointerToRefInst:
+ case ValueKind::RefToRawPointerInst:
+ case ValueKind::RefToBridgeObjectInst:
+ case ValueKind::BridgeObjectToRefInst:
+ case ValueKind::UncheckedAddrCastInst:
+ case ValueKind::UnconditionalCheckedCastInst:
+ case ValueKind::RefTo##Name##Inst:
+ case ValueKind::Name##ToRefInst:
This coalesces a whole bunch of nodes together that were just there
because of casts. The existing code was already doing this for one
level of casts, but there was a bug that prevented it from happening
transitively. So, in theory, anything that breaks with this fix could
also break without this fix, but may not have been exposed. The fact
that this analysis coalesces address-to-reference casts at all is what
caused me to spend vast amounts of time debugging any time I tried to
force some structure on the graph via assertions. If it is done at
all, it should be done everywhere consistently to expose issues as
early as possible.
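For example (a sketch with hypothetical types), values connected only by such casts now collapse onto a single connection graph node:

  %1 = upcast %0 : $Derived to $Base
  %2 = unchecked_ref_cast %1 : $Base to $Builtin.NativeObject
  // %0, %1, and %2 all map to the same node; previously only one
  // level of cast was reliably looked through.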
Here is a description of the changes in diff order. If something in
the diff is not listed here, then the code probably just moved around
in the file:
Rename isNotAliasedIndirectParameter to
isExclusiveIndirectParameter. The argument may be aliased in the
caller's scope and its contents may have already escaped.
Add comments to SILType APIs (isTrivial, isReferenceCounted) that give
answers about the AST type which don't really make sense for address
SILTypes.
Add comments about CGNode's 'Value' field. I spent lots of time
attempting to tighten this down with asserts, but it's only possible
for non-content nodes. For content nodes, the node's value is highly
unpredictable and basically nonsense but needed for debugging.
Add comments about not assuming that the content node's pointsTo edge
represents physical indirection. This matters when reasoning about
aliasing and it's a tempting assumption to make.
Add a CGNode::mergeProperties placeholder for adding node properties.
Factor out a CGNode::canAddDeferred helper for use later.
Rename `setPointsTo` to `setPointsToEdge` because it actually creates
an edge rather than just setting `pointsTo`.
Add CGNode::getValue() and related helpers to help maintain invariants.
Factor out a `markEscaping` helper.
Clean up the `escapesInsideFunction` helper.
Add node visitor helpers: visitSuccessors, visitDefers. This made it
much easier to prototype utilities.
Add comments to clarify the `pointsTo` invariant. In particular, an
entire defer web may have a null pointsTo for a while.
Add an `activeWorklist` to avoid nasty bugs when composing multiple
helpers that each use the worklist.
Remove the `EA` argument from `getNode`. I ended up needing access to
the `EA` context from the ConnectionGraph many times during
prototyping, and passing `this` to all the `getNode` calls was very
silly.
Add graph visitor helpers: backwardTraverse, forwardTraverseDefer,
forwardTraversePointsToEdges, and mayReach for ease in developing new
logic and utilities.
Add isExclusiveArgument helper and distinguish exclusive arguments
from local objects. Confusing these properties could lead to scary
bugs. For example, unlike a local object, an exclusive argument's
contents may still escape even when the content's connection graph
node is non-escaping!
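A sketch of the caller-side situation that makes the distinction matter (function, type, and value names hypothetical):

  // Caller: the object stored into the exclusive (@inout) argument's
  // memory has already escaped before the call.
  store %escaped to %addr : $*C
  %f = function_ref @callee : $@convention(thin) (@inout C) -> ()
  apply %f(%addr) : $@convention(thin) (@inout C) -> ()
  // Inside @callee, the argument's content node may be locally
  // non-escaping, yet the object it refers to can still escape
  // through the caller.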
Add isUniquelyIdentified helper when we want to treat exclusive
arguments and local objects uniformly.
getUnderlyingAddressRoot now looks through OSSA instructions.
Rename `getAccessedMemory` to `getDirectlyAccessedMemory` with
comments. This is another dangerous API because it assumes the memory
access to a given address won't touch memory represented by different
graph nodes, but graph edges don't necessarily represent physical
indirection. Further clarify this issue in comments in
AliasAnalysis.cpp.
Factor out a 'findRecursiveRefType' helper from the old
'mayContainReference' for checking whether types may or must contain
references. Support both kinds of queries so the analysis can be
certain when a pointer's content is a physical heap object.
Factor out 'getPointerBase' and 'getPointerRoot' helpers that follow
address projections within what EscapeAnalysis will consider a single
node.
Create a CGNodeWorklist abstraction to safely formalize the worklist
mechanism that's used all over the place. In one place, there were
even two separate lists used independently (nodes added to one list
could appear to be in the other).
The CGNodeMap abstraction did not significantly change; it was just moved.
Added 'dumpCG' for dumping .dot files, making it possible to debug remotely.
Added '-escapes-enable-graphwriter' option to dump .dot files, since
they are so much more useful than the textual dump of the connection
graph, which lacks node identity!
By convention, most structs and classes in the Swift compiler include a `dump()` method which prints debugging information. This method is meant to be called only from the debugger, but this means they’re often unused and may be eliminated from optimized binaries. On the other hand, some parts of the compiler call `dump()` methods directly despite them being intended as a pure debugging aid. clang supports attributes which can be used to avoid these problems, but they’re used very inconsistently across the compiler.
This commit adds `SWIFT_DEBUG_DUMP` and `SWIFT_DEBUG_DUMPER(<name>(<params>))` macros to declare `dump()` methods with the appropriate set of attributes and adopts this macro throughout the frontend. It does not pervasively adopt this macro in SILGen, SILOptimizer, or IRGen; these components use `dump()` methods in a different way where they’re frequently called from debugging code. Nor does it adopt it in runtime components like swiftRuntime and swiftReflection, because I’m a bit worried about size.
Despite the large number of files and lines affected, this change is NFC.
ProtocolConformanceRef already has an invalid state. Drop all of the
uses of Optional<ProtocolConformanceRef> and just use
ProtocolConformanceRef::forInvalid() to represent it. Mechanically
translate all of the callers and callsites to use this new
representation.