Use the new EscapeAnalysis infrastructure to make ARC code motion and
ARC sequence opts much more powerful and fix a latent bug in
AliasAnalysis.
Adds a new API `EscapeAnalysis::mayReleaseContent()`. This replaces
all uses if `EscapeAnalysis::canEscapeValueTo()`, which affects
`AliasAnalysis::can[Apply|Builtin]DecrementRefCount()`.
Also rewrite `AliasAnalysis::mayValueReleaseInterferWithInstruction` to
directly use `EscapeAnalysis::mayReleaseContent`.
The new implementation in `EscapeAnalysis::mayReleaseContent()`
generalizes the logic to handle more cases while avoiding an incorrect
assumption in the prior code. In particular, it adds support for
disambiguating local references from accessed addresses. This helps
handle cases in which inlining was defeating ARC optimization. The
incorrect assumption was that a non-escaping address is never
reachable via a reference. However, if a reference does not escape,
then an address into its object also does not escape.
The bug in `AliasAnalysis::mayValueReleaseInterfereWithInstruction()`
appears not to have broken anything yet because it is always called by
`AliasAnalysis::mayHaveSymmetricInteference()`, which later checks
whether the accessed address may alias with the released reference
using a separate query, `EscapeAnalysis::canPointToSameMemory()`. This
happens to work because an address into memory that is directly
released when destroying a reference necesasarilly points to the same
memory object. For this reason, I couldn't figure out a simple way to
hand-code SIL tests to expose this bug.
The changes in diff order:
Replace EscapeAnalysis `canEscapeToValue` with `mayReleaseContent` to
make the semantics clear. It queries: "Can the given reference release
the content pointed to the given address".
Change `AliasAnalysis::canApplyDecrementRefCount` to use
`mayReleaseContent` instead if 'canEscapeToValue'.
Change `AliasAnalysis::mayValueReleaseInterferWithInstruction`: after
getting the memory address accessed by the instruction, simply call
`EscapeAnalysis::mayReleaseContent`, which now implements all the
logic. This avoids the bad assumption made by AliasAnalysis.
Handle two cases in mayReleaseContent: non-escaping instruction
addresses and non-escaping referenecs. Fix the non-escaping address
case by following all content nodes to determine whether the address
is reachable from the released reference. Introduce a new optimization
for the case in which the reference being released is allocated
locally.
The following test case is now optimized in arcsequenceopts.sil:
remove_as_local_object_indirectly_escapes_to_callee. It was trying to
test that ARC optimization was not too aggressive when it removed a
retain/release of a child object whose parent container is still in
use. But the retain/release should be removed. The original example
already over-releases the parent object.
Add new unit tests to late_release_hoisting.sil.
For alias analysis query to be generally correct, we need to
effectively merge the escape state and use points for everything in a
defer web.
It was unclear from the current design whether the "escaping" property
applied to the pointer value or its content. The implementation is
inconsistent in how it was treated. It appears that some bugs have
been worked around by propagating forward through defer edges, some
have been worked around by querying the content instead of the
pointer, and others have been worked around be creating fake use
points at block arguments.
If we always simply query the content for escape state and use points,
then we never need to propagate along defer edges. The current code
that propagates escape state along defer edges in one direction is
simply incorrect from the perspective of alias analysis.
One very attractive solution is to merge nodes eagerly without
creating any defer edges, but that would be a much more radical change
even than what I've done here. It would also pose some new issues: how
to resolve the current "node types" when merging and how to deal with
missing content nodes.
This solution of applying escape state to content nodes solves all
these problems without too radical of a change at the expense of
eagerly creating content nodes. (The potential graph memory usage is
not really an issue because it's possible to drastically shrink the
size of the graph anyway in a future commit--I've been able to fit a
node within one cache line). This solution nicely preserves graph
structure which makes it easy to debug and relate to the IR.
Eagerly creating content nodes also solves the missing content node
problem. For example, when querying canEscapeTo, we need to know
whether to look at the escape state for just the pointer value itself,
or also for its content. It may be possible the its content node is
actually part of the same object at the IR level. If the content node
is missing, then we don't know if the object's interior address is not
recognizable/representable or whether we simply never saw an access to
the interior address. We can't simply look at whether the current IR
value happens to be a reference, because that doesn't tell us whether
the graph node may have been merged with a non-reference node or even
with it's own content node. To be correct in general, this query would
need to be extremely conservative. However, if content nodes are
always created for references, then we only need to query the escape
state of a pointer's content node. The content node's flag tells us if
it's an interior node, in which case it will always point to another
content node which also needs to be queried.
For alias analysis query to be generally correct, we need to
effectively merge the escape state and use points for everything in a
defer web.
It was unclear from the current design whether the "escaping" property
applied to the pointer value or its content. The implementation is
inconsistent in how it was treated. It appears that some bugs have
been worked around by propagating forward through defer edges, some
have been worked around by querying the content instead of the
pointer, and others have been worked around be creating fake use
points at block arguments.
If we always simply query the content for escape state and use points,
then we never need to propagate along defer edges. The current code
that propagates escape state along defer edges in one direction is
simply incorrect from the perspective of alias analysis.
One very attractive solution is to merge nodes eagerly without
creating any defer edges, but that would be a much more radical change
even than what I've done here. It would also pose some new issues: how
to resolve the current "node types" when merging and how to deal with
missing content nodes.
This solution of applying escape state to content nodes solves all
these problems without too radical of a change at the expense of
eagerly creating content nodes. (The potential graph memory usage is
not really an issue because it's possible to drastically shrink the
size of the graph anyway in a future commit--I've been able to fit a
node within one cache line). This solution nicely preserves graph
structure which makes it easy to debug and relate to the IR.
Eagerly creating content nodes also solves the missing content node
problem. For example, when querying canEscapeTo, we need to know
whether to look at the escape state for just the pointer value itself,
or also for its content. It may be possible the its content node is
actually part of the same object at the IR level. If the content node
is missing, then we don't know if the object's interior address is not
recognizable/representable or whether we simply never saw an access to
the interior address. We can't simply look at whether the current IR
value happens to be a reference, because that doesn't tell us whether
the graph node may have been merged with a non-reference node or even
with it's own content node. To be correct in general, this query would
need to be extremely conservative. However, if content nodes are
always created for references, then we only need to query the escape
state of a pointer's content node. The content node's flag tells us if
it's an interior node, in which case it will always point to another
content node which also needs to be queried.
This is the first in a series of patches that reworks
EscapeAnalysis. For this patch, I extracted every change that does not
introduce new features, rewrite logic, or appear to change
functionality.
These cleanups were done in preparation for:
- adding a graph representation for reference counted objects
- rewriting parts to the query logic
- ...which then allows the analysis to safely assume that all
exclusive arguments are unique
- ...which then allows more aggressive optimization of local variables
that don't escape
There are two possible functional changes:
1. getUnderlyingAddressRoot in InstructionUtils now sees through OSSA
instructions: begin_borrow and copy_value
2. The getPointerBase helper in EscapeAnalysis now sees through all of
these reference and pointer casts:
+ case ValueKind::UncheckedRefCastInst:
+ case ValueKind::ConvertFunctionInst:
+ case ValueKind::UpcastInst:
+ case ValueKind::InitExistentialRefInst:
+ case ValueKind::OpenExistentialRefInst:
+ case ValueKind::RawPointerToRefInst:
+ case ValueKind::RefToRawPointerInst:
+ case ValueKind::RefToBridgeObjectInst:
+ case ValueKind::BridgeObjectToRefInst:
+ case ValueKind::UncheckedAddrCastInst:
+ case ValueKind::UnconditionalCheckedCastInst:
+ case ValueKind::RefTo##Name##Inst:
+ case ValueKind::Name##ToRefInst:
This coalesces a whole bunch of nodes together that were just there
because of casts. The existing code was already doing this for one
level of casts, but there was a bug that prevented it from happening
transitively. So, in theory, anything that breaks with this fix could
also break without this fix, but may not have been exposed. The fact
that this analysis coalesces address-to-reference casts at all is what
caused me to spent vast amounts of time debugging any time I tried to
force some structure on the graph via assertions. If it is done at
all, it should be done everywhere consistently to expose issues as
early as possible.
Here is a description of the changes in diff order. If something in
the diff is not listed here, then the code probably just moved around
in the file:
Rename isNotAliasedIndirectParameter to
isExclusiveIndirectParameter. The argument may be aliased in the
caller's scope and it's contents may have already escaped.
Add comments to SILType APIs (isTrivial, isReferenceCounted) that give
answers about the AST type which don't really make sense for address
SILTypes.
Add comments about CGNode's 'Value' field. I spent lots of time
attempting to tighten this down with asserts, but it's only possible
for non-content nodes. For content nodes, the node's value is highly
unpredictable and basically nonsense but needed for debugging.
Add comments about not assuming that the content nodes pointsTo edge
represents physical indirection. This matters when reasoning about
aliasing and it's a tempting assumption to make.
Add a CGNode::mergeProperties placeholder for adding node properties.
Factor out a CGNode::canAddDeferred helper for use later.
Rename `setPointsTo` to `setPointsToEdge` because it actually creates
an edge rather than just setting `pointsTo`.
Add CGNode::getValue() and related helpers to help maintain invariants.
Factor out a `markEscaping` helper.
Clean up the `escapesInsideFunction` helper.
Add node visitor helpers: visitSuccessors, visitDefers. This made is
much easier to prototype utilities.
Add comments to clarify the `pointsTo` invariant. In particular, an
entire defer web may have a null pointsTo for a while.
Add an `activeWorklist` to avoid nasty bugs when composing multiple
helpers that each use the worklist.
Remove the `EA` argument from `getNode`. I ended up needing access to
the `EA` context from the ConnectionGraph many times during
prototyping and passing `this` was all the `getNode` calls was very
silly.
Add graph visitor helpers: backwardTraverse, forwardTraverseDefer,
forwardTraversePointsToEdges, and mayReach for ease in developing new
logic and utilities.
Add isExclusiveArgument helper and distinguish exclusive arguments
from local objects. Confusing these properties could lead to scary
bugs. For example, unlike a local object, an exclusive argument's
contents may still escape even when the content's connection graph
node is non-escaping!
Add isUniquelyIdentified helper when we want to treat exclusive
arguments and local objects uniformly.
getUnderlyingAddressRoot now looks through OSSA instructions.
Rename `getAccessedMemory` to `getDirectlyAccessedMemory` with
comments. This is another dangerous API because it assumes the memory
access to a given address won't touch memory represented by different
graph nodes, but graph edges don't necessarily represent physical
indirection. Further clarify this issue in comments in
AliasAnalysis.cpp.
Factor out a 'findRecursiveRefType' helper from the old
'mayContainReference' for checking whether types may or must contain
references. Support both kinds of queries so the analysis can be
certain when a pointer's content is a physical heap object.
Factor out 'getPointerBase' and 'getPointerRoot' helpers that follow
address projections within what EscapeAnalysis will consider a single
node.
Create a CGNodeWorklist abstraction to safely formalize the worklist
mechanism that's used all over the place. In one place, there were
even two separate independent lists used independently (nodes added to
one list could appear to be in the other list).
The CGNodeMap abstraction did not significantly change, it was just moved.
Added 'dumpCG' for dumping .dot files making it possible to remote debug.
Added '-escapes-enable-graphwriter' option to dump .dot files, since
they are so much more useful than the textual dump of the connection
graph, which lacks node identity!
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilies out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
This removes it from the AST and largely replaces it with AnyObject
at the SIL and IRGen layers. Some notes:
- Reflection still uses the notion of "unknown object" to mean an
object with unknown refcounting. There's no real reason to make
this different from AnyObject (an existential containing a
single object with unknown refcounting), but this way nothing
changes for clients of Reflection, and it's consistent with how
native objects are represented.
- The value witness table and reflection descriptor for AnyObject
use the mangling "BO" instead of "yXl".
- The demangler and remangler continue to support "BO" because it's
still in use as a type encoding, even if it's not an AST-level
Type anymore.
- Type-based alias analysis for Builtin.UnknownObject was incorrect,
so it's a good thing we weren't using it.
- Same with enum layout. (This one assumed UnknownObject never
referred to an Objective-C tagged pointer. That certainly wasn't how
we were using it!)
This is a large patch; I couldn't split it up further while still
keeping things working. There are four things being changed at
once here:
- Places that call SILType::isAddressOnly()/isLoadable() now call
the SILFunction overload and not the SILModule one.
- SILFunction's overloads of getTypeLowering() and getLoweredType()
now pass the function's resilience expansion down, instead of
hardcoding ResilienceExpansion::Minimal.
- Various other places with '// FIXME: Expansion' now use a better
resilience expansion.
- A few tests were updated to reflect SILGen's improved code
generation, and some new tests are added to cover more code paths
that previously were uncovered and only manifested themselves as
standard library build failures while I was working on this change.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Make this a generic analysis so that it can be used to analyze any
kind of function effect.
FunctionSideEffect becomes a trivial specialization of the analysis.
The immediate need for this is to introduce an new
AccessedStorageAnalysis, although I foresee it as a generally very
useful utility. This way, new kinds of function effects can be
computed without adding any complexity or compile time to
FunctionSideEffects. We have the flexibility of computing different
kinds of function effects at different points in the pipeline.
In the case of AccessedStorageAnalysis, it will compute both
FunctionSideEffects and FunctionAccessedStorage in the same pass by
implementing a simple wrapper on top of FunctionEffects.
This cleanup reflects my feeling that nested classes make the code
extremely unreadable unless they are very small and either private or
only used directly via its parent class. It's easier to see how these
classes compose with a flat type system.
In addition to enabling new kinds of function effects analyses, I
think this makes the implementation of side effect analysis easier to
understand by separating concerns.
introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
Replace `NameOfType foo = dyn_cast<NameOfType>(bar)` with DRY version `auto foo = dyn_cast<NameOfType>(bar)`.
The DRY auto version is by far the dominant form already used in the repo, so this PR merely brings the exceptional cases (redundant repetition form) in line with the dominant form (auto form).
See the [C++ Core Guidelines](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es11-use-auto-to-avoid-redundant-repetition-of-type-names) for a general discussion on why to use `auto` to avoid redundant repetition of type names.
For a long time, we have:
1. Created methods on SILArgument that only work on either function arguments or
block arguments.
2. Created code paths in the compiler that only allow for "function"
SILArguments or "block" SILArguments.
This commit refactors SILArgument into two subclasses, SILPHIArgument and
SILFunctionArgument, separates the function and block APIs onto the subclasses
(leaving the common APIs on SILArgument). It also goes through and changes all
places in the compiler that conditionalize on one of the forms of SILArgument to
just use the relevant subclass. This is made easier by the relevant APIs not
being on SILArgument anymore. If you take a quick look through you will see that
the API now expresses a lot more of its intention.
The reason why I am performing this refactoring now is that SILFunctionArguments
have a ValueOwnershipKind defined by the given function's signature. On the
other hand, SILBlockArguments have a stored ValueOwnershipKind. Rather than
store ValueOwnershipKind in both instances and in the function case have a dead
variable, I decided to just bite the bullet and fix this.
rdar://29671437
The new instructions are: ref_tail_addr, tail_addr and a new attribute [ tail_elems ] for alloc_ref.
For details see docs/SIL.rst
As these new instructions are not generated so far, this is a NFC.
The new instructions are: ref_tail_addr, tail_addr and a new attribute [ tail_elems ] for alloc_ref.
For details see docs/SIL.rst
As these new instructions are not generated so far, this is a NFC.
Strict aliasing only applies to memory operations that use strict
addresses. The optimizer needs to be aware of this flag. Uses of raw
addresses should not have their address substituted with a strict
address.
Also add Builtin.LoadRaw which will be used by raw pointer loads.
We now consider effect of deinit in addition to the released value.
rdar://25362826
This is the only 10%+ regression i measured on my machine. no performance improvement.
Sim2DArray | 326 | 366 | +12.3% | **0.89x**
In many places, we're interested in whether a type with archetypes *might be* a superclass of another type with the right bindings, particularly in the optimizer. Provide a separate Type::isBindableToSuperclassOf method that performs this check. Use it in the devirtualizer to fix rdar://problem/24993618. Using it might unblock other places where the optimizer is conservative, but we can fix those separately.
This patch also implements some of the missing functions used by RLE and DSE in new projection
that exist in the old projection.
New projection provides better memory usage, eventually we will phase out the old projection code.
New projection is now copyable, i.e. we have a proper constructor for it. This helps make the code
more readable.
We do see a bit increase in compilation time in compiling stdlib -O, this is a result of the way
we now get types of a projection path, but I expect this to go down (away) with further improvement
on how memory locations are constructed and cached with later patches.
=== With the OLD Projection. ===
Total amount of memory allocated.
--------------------------------
Bytes Used Count Symbol Name
13032.01 MB 50.6% 2158819 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2879.70 MB 11.1% 3076018 (anonymous namespace)::ARCSequenceOpts::run()
2663.68 MB 10.3% 1375465 (anonymous namespace)::RedundantLoadElimination::run()
1534.35 MB 5.9% 5067928 (anonymous namespace)::SimplifyCFGPass::run()
1278.09 MB 4.9% 576714 (anonymous namespace)::SILCombine::run()
1052.68 MB 4.0% 935809 (anonymous namespace)::DeadStoreElimination::run()
771.75 MB 2.9% 1677391 (anonymous namespace)::SILCSE::run()
715.07 MB 2.7% 4198193 (anonymous namespace)::GenericSpecializer::run()
434.87 MB 1.6% 652701 (anonymous namespace)::SILSROA::run()
402.99 MB 1.5% 658563 (anonymous namespace)::SILCodeMotion::run()
341.13 MB 1.3% 962459 (anonymous namespace)::DCE::run()
279.48 MB 1.0% 415031 (anonymous namespace)::StackPromotion::run()
Compilation time breakdown.
--------------------------
Running Time Self (ms) Symbol Name
25716.0ms 35.8% 0.0 swift::runSILOptimizationPasses(swift::SILModule&)
25513.0ms 35.5% 0.0 swift::SILPassManager::runOneIteration()
20666.0ms 28.8% 24.0 swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19664.0ms 27.4% 77.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3272.0ms 4.5% 12.0 (anonymous namespace)::SimplifyCFGPass::run()
3266.0ms 4.5% 7.0 (anonymous namespace)::ARCSequenceOpts::run()
2608.0ms 3.6% 5.0 (anonymous namespace)::SILCombine::run()
2089.0ms 2.9% 104.0 (anonymous namespace)::SILCSE::run()
1929.0ms 2.7% 47.0 (anonymous namespace)::RedundantLoadElimination::run()
1280.0ms 1.7% 14.0 (anonymous namespace)::GenericSpecializer::run()
1010.0ms 1.4% 45.0 (anonymous namespace)::DeadStoreElimination::run()
966.0ms 1.3% 191.0 (anonymous namespace)::DCE::run()
496.0ms 0.6% 6.0 (anonymous namespace)::SILCodeMotion::run()
=== With the NEW Projection. ===
Total amount of memory allocated.
--------------------------------
Bytes Used Count Symbol Name
11876.64 MB 48.4% 22112349 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
2887.22 MB 11.8% 3079485 (anonymous namespace)::ARCSequenceOpts::run()
1820.89 MB 7.4% 1877674 (anonymous namespace)::RedundantLoadElimination::run()
1533.16 MB 6.2% 5073310 (anonymous namespace)::SimplifyCFGPass::run()
1282.86 MB 5.2% 577024 (anonymous namespace)::SILCombine::run()
772.21 MB 3.1% 1679154 (anonymous namespace)::SILCSE::run()
721.69 MB 2.9% 936958 (anonymous namespace)::DeadStoreElimination::run()
715.08 MB 2.9% 4196263 (anonymous namespace)::GenericSpecializer::run()
Compilation time breakdown.
--------------------------
Running Time Self (ms) Symbol Name
25137.0ms 37.3% 0.0 swift::runSILOptimizationPasses(swift::SILModule&)
24939.0ms 37.0% 0.0 swift::SILPassManager::runOneIteration()
20226.0ms 30.0% 29.0 swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>)
19241.0ms 28.5% 83.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
3214.0ms 4.7% 10.0 (anonymous namespace)::SimplifyCFGPass::run()
3005.0ms 4.4% 14.0 (anonymous namespace)::ARCSequenceOpts::run()
2438.0ms 3.6% 7.0 (anonymous namespace)::SILCombine::run()
2217.0ms 3.2% 54.0 (anonymous namespace)::RedundantLoadElimination::run()
2212.0ms 3.2% 131.0 (anonymous namespace)::SILCSE::run()
1195.0ms 1.7% 11.0 (anonymous namespace)::GenericSpecializer::run()
1168.0ms 1.7% 39.0 (anonymous namespace)::DeadStoreElimination::run()
853.0ms 1.2% 150.0 (anonymous namespace)::DCE::run()
499.0ms 0.7% 7.0 (anonymous namespace)::SILCodeMotion::run()
SILValue.h/.cpp just defines the SIL base classes. Referring to specific instructions is a (small) kind of layering violation.
Also I want to keep SILValue small so that it is really just a type alias of ValueBase*.
NFC.
As there are no instructions left which produce multiple result values, this is a NFC regarding the generated SIL and generated code.
Although this commit is large, most changes are straightforward adoptions to the changes in the ValueBase and SILValue classes.