An error reported by DiagnoseInvalidEscapingCaptures can indicate invalid
SIL, which will cause DiagnoseStaticExclusivity to assert during SIL
verification. When a source-level closure captures an inout argument, it
appears in SIL as a non-escaping closure. SIL verification then fails
because the "non-escaping" closure actually escapes.
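A minimal source-level sketch of the pattern (names are hypothetical, not taken from the bug report):

  // The returned closure captures the 'inout' parameter, which
  // DiagnoseInvalidEscapingCaptures rejects:
  //   error: escaping closure captures 'inout' parameter 'total'
  // The SIL for the closure is still formed as a non-escaping partial_apply,
  // and that closure does in fact escape, which is what the verifier trips on.
  func makeIncrementer(_ total: inout Int) -> () -> Void {
    return { total += 1 }
  }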
Ensure that capture diagnostics run on closures before exclusivity
enforcement runs on the parent function. Bypass the SIL verification
assert if a diagnostic error was found.
Fixes rdar://75364904 (Crash with assertion `noescape partial_apply
has unexpected use`)
* add is_escaping_closure as allowed instruction
* don't restrict the uses of copy_value to be a store
Fixes a verification crash (only in assert builds).
https://bugs.swift.org/browse/SR-14394
rdar://75740622
* ``begin_access [modify]`` returned MayWrite, but "modify" means it can be a read as well (see the small example after this list).
Instead, return MayReadWrite; only ``begin_access [init]`` returns MayWrite.
This is more or less cosmetic - probably this bug had no real impact on any optimization.
* ``begin_access [deinit]`` needs to return MayReadWrite for the same reason ``destroy_addr`` returns MayReadWrite (see SILInstruction::MemoryBehavior).
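As a small source-level illustration (not part of the change itself) of why [modify] implies MayReadWrite:

  // An inout mutation reads and then writes the variable within a single
  // begin_access [modify] ... end_access scope, so a [modify] access may
  // read as well as write.
  func increment(_ counter: inout Int) {
    counter += 1
  }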
Deinitializer side effects severely handicap the connection-graph
based EscapeAnalysis. Because the analysis is not flow-sensitive, it
falls apart whenever an object is released in the current function:
the object then appears to have escaped everywhere (escaping only at
the release point would usually be harmless, but the analysis cannot
discern that).
This can be slightly mitigated by observing that releasing an object
can only cause its contents to escape if the object itself contains
references or pointers.
Fix: Don't create an escaping content node when releasing an object
that can't contain any references or pointers. The previous commit,
"Fix EscapeAnalysis::mayReleaseContent", would otherwise defeat
release-hoisting in important cases. Adding this extra precision to
the connection graph avoids some regressions.
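For illustration (the types here are hypothetical): releasing an object whose stored properties are all trivial cannot make anything escape, because the object holds no references or pointers to other objects:

  final class Point {
    var x = 0.0
    var y = 0.0
  }

  func lastUse(_ p: Point) -> Double {
    // The release of `p` after this use no longer forces the connection graph
    // to create an escaping content node: `Point` stores only trivial values.
    return p.x + p.y
  }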
The previous implementation made extremely subtle and specific
assumptions about how the API is used, which don't apply
everywhere. It was trying very hard to avoid regressing performance
relative to an even older implementation that didn't even try to consider deinitializer side effects.
The aggressive logic was based on the idea that a release must have a
corresponding retain somewhere in the same function, and that we don't
care if the last release happens early if there are no more aliasing
uses. All the unit tests I wrote previously were based on release
hoisting, which happens to work given the way the API is used.
But this logic is incorrect for retain sinking. In that case sinking
past an "unrelated" release could cause the object to be freed
early. See test/SILOptimizer/arc_crash.swift.
With SemanticARC and other SIL improvements being made, I'm afraid
bugs like this will begin to surface.
To fix it, just remove the subtle logic to leave behind a simple and
sound EscapeAnalysis API. To do better, we will need to rewrite the
AliasAnalysis logic for release side effects, which is currently
a tangled web. In the meantime, SemanticARC can handle many cases without EscapeAnalysis.
Fixes rdar://74469299 (ARC miscompile:
EscapeAnalysis::mayReleaseContent; potential use-after-free)
While fixing this, add support for address-type queries too:
Fixes rdar://74360041 (Assertion failed:
(!releasedReference->getType().isAddress() && "an address is never a
reference"), function mayReleaseContent)
strong_retain/strong_release with an optional reference as operand are rejected by the verifier and are not supported by IRGen.
SILCombine created such retains/releases when optimizing ref_cast instructions.
This fixes a compiler crash.
rdar://74146617
If we know that we have a FunctionRefInst (and not another variant of FunctionRefBaseInst), we know that getting the referenced function will not be null (in contrast to FunctionRefBaseInst::getReferencedFunctionOrNull).
NFC
If an argument is deallocated in the function, it cannot be converted to a guaranteed argument.
We already handled the dealloc_ref instruction, but were missing the dealloc_partial_ref instruction.
This fixes a miscompile.
rdar://73871606
Instead, make `findJointPostDominatingSet` a stand-alone function.
There is no need to keep the temporary SmallVector alive across multiple calls of findJointPostDominatingSet just to re-use malloc'ed memory. The worklist usually contains far fewer elements than its small size.
Some notes:
1. I moved the identity round-trip case to InstSimplify since that is where
such optimizations belong.
2. I did not update in this commit the code that eliminates convert_function
when it is only destroyed. In a subsequent commit I am going to implement
that in a general way and apply it to all forwarding instructions.
3. I implemented eliminating convert_function with ownership-only uses in a
utility so that I can reuse it for other similar optimizations in SILCombine.
This removes the ambiguity when casting from a SingleValueInstruction to SILNode, which makes the code simpler. E.g. the "isRepresentativeSILNode" logic is not needed anymore.
Also, it reduces the size of the most used instruction class - SingleValueInstruction - by one pointer.
Conceptually, SILInstruction is still a SILNode. But implementation-wise SILNode is not a base class of SILInstruction anymore.
Only the two sub-classes of SILInstruction - SingleValueInstruction and NonSingleValueInstruction - inherit from SILNode. SingleValueInstruction's SILNode is embedded into a ValueBase and its relative offset in the class is the same as in NonSingleValueInstruction (see SILNodeOffsetChecker).
This makes it possible to cast from a SILInstruction to a SILNode without knowing which SILInstruction sub-class it is.
Casting to SILNode cannot be done implicitly, but only with an LLVM `cast` or with SILInstruction::asSILNode(). But this is a rare case anyway.
Importantly this also lets us use the analysis framework to validate that we do
properly invalidate DeadEndBlocks, preventing bugs.
I did not thread this all over the compiler. Instead, for now I used it only in
SemanticARCOpts to add some coverage without threading it into too many
places.
I am trying to be more careful about this rather than letting someone make this
mistake in the future. I also added some comments and a DeadEndBlocks instance
to TempRValueElimination so that it gets full simplifications in OSSA.
Btw for those curious, the reason this optimization is important is that the
stdlib often performs reinterpret casts by going address to pointer and then
pointer back to address. So we need to be able to reliably eliminate
these. Since the underlying base object's liveness is already what provides the
liveness of the underlying address, our approach of treating this as a pattern
that requires lifetime extension makes sense, since we are adding information
into the IR by canonicalizing these escapes to be normal address operations.
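A hedged sketch of the kind of source pattern being referred to (not taken from the stdlib): the typed address goes through a raw pointer and comes back as a typed address, and the optimizer wants to fold that identity round trip away:

  // withUnsafePointer produces the address-to-pointer step; assumingMemoryBound
  // plus pointee goes pointer-to-address and loads. The base value's liveness
  // already covers the derived address, so the round trip can be canonicalized
  // into an ordinary address operation.
  func roundTrip(_ value: inout Int32) -> Int32 {
    return withUnsafePointer(to: &value) { typed in
      UnsafeRawPointer(typed)
        .assumingMemoryBound(to: Int32.self)
        .pointee
    }
  }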
NOTE: In this commit, we only handle the identity transform. In a subsequent
commit, I am going to hit the non-identity transform that is handled by
SILCombine.
Also, I added some more general OSSA RAUW tests for this case to
ossa_rauw_tests, using this functionality in a more exhaustive way than would
be appropriate in sil_combine_ossa.sil (since I am stress testing this specific
piece of code).
In OSSA, we enforce that addresses derived from interior pointer instructions
are scoped within a borrow scope. This means that it is invalid to use such an
address outside of its parent borrow scope, and as a result one cannot just
RAUW an address value with a dominating address value, since the dominating
value may be invalid at the original value's uses. I foresee that I am going to
have to solve this problem, so I decided to write this API to handle the vast
majority of cases.
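For orientation, a source-level sketch (hypothetical types) of where such addresses come from: a class stored-property access yields an interior-pointer address that is only valid inside the borrow scope of the object:

  final class Container {
    var elements: [Int] = []
  }

  func total(_ c: Container) -> Int {
    // Accessing `c.elements` goes through an interior pointer instruction
    // (ref_element_addr) on a borrow of `c`; a replacement for that address
    // must remain valid over the same scope, which the copy+borrow fallback
    // described below guarantees.
    return c.elements.reduce(0, +)
  }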
The way this API works is that it:
1. Computes an access path with base for the new value. If we do not have a base
value and a valid access path with root, we bail.
2. Then we check if our base value is the result of an interior pointer
instruction. If it isn't, we are immediately done and can RAUW without further
delay.
3. If we do have an interior pointer instruction, we see if the immediate
guaranteed value we projected from has a single borrow introducer value. If not,
we bail. I think this is reasonable since with time, all guaranteed values will
always only have a single borrow introducing value (once struct, tuple,
destructure_struct, destructure_tuple become reborrows).
4. Then we gather up all inner uses of our access path. If for some reason that
fails, we bail.
5. Then we see if all of those uses are within our borrow scope. If so, we can
RAUW without any further worry.
6. Otherwise, we perform a copy+borrow of our interior pointer's operand value
at the interior pointer, create a copy of the interior pointer instruction upon
this new borrow and then RAUW oldValue with that instead. By construction all
uses of oldValue will be within this new interior pointer scope.
The reason I am doing this is that I am building up more utilities that pass
around this context struct but do not want the RAUW-specific state in it. So it
makes sense to keep that state on a helper (OwnershipRAUWHelper) that composes
with the context.
Just a refactor, should be NFC.
Currently all of these places in the code base perform simplifyInstruction and
then a replaceAllSimplifiedUsesAndErase(...). This is a bad pattern since:
1. simplifyInstruction assumes its result will be passed to
replaceAllSimplifiedUsesAndErase. So by leaving these as separate things, we
allow users to pointlessly make this mistake.
2. I am going to implement in a subsequent commit a utility that lifetime
extends interior pointer bases when replacing an address with an interior
pointer derived address. To do this efficiently, I want to reuse state I
compute during simplifyInstruction during the actual RAUW, meaning that if the
two operations are split, that is difficult without extending the API. So by
removing this split, I can enable the transform and eliminate mistakes at the
same time.
Access scopes for enforcing exclusivity are currently the only
exception to our ability to canonicalize OSSA lifetime purely based on
the SSA value's known uses. This is because access scopes have
semantics relative to object deinitializers.
In general, deinitializers are asynchronous with respect to code that
is unrelated to the object's uses. Ignoring exclusivity, the optimizer
may always destroy objects as early as it wants, as long as the object
won't be used again. The optimizer may also extend the lifetime
(although in the future this lifetime extension should be limited by
"synchronization points").
The optimizer's freedom is however limited by exclusivity
enforcement. Optimization may never introduce new exclusivity
violations. Destroying an object within an access scope is an
exclusivity violation if the deinitializer accesses the same variable.
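A source-level sketch of the hazard (hypothetical code, not from the test suite): the deinitializer touches the same variable that an open access scope covers, so the destroy must not be moved inside that scope:

  var counter = 0

  final class Logger {
    // The deinitializer writes the same variable that `demo` passes inout.
    deinit { counter += 1 }
  }

  func withExclusiveAccess(_ value: inout Int, _ body: () -> Void) {
    // `counter` is under a [modify] access for the duration of this call.
    body()
  }

  func demo() {
    let logger = Logger()
    withExclusiveAccess(&counter) {
      // Hoisting the destroy of `logger` into this scope would make the
      // deinit's write to `counter` overlap the open access: a new
      // exclusivity violation the optimizer must not introduce.
    }
    _ = logger  // last use; the destroy that follows stays outside the scope
  }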
To handle this, OSSA canonicalization must detect access scopes that
overlap with the end of the pruned extended lifetime. Essentially:
  %def
  begin_access // access scope unrelated to def
  use %def     // pruned liveness ends here
  end_access
  destroy %def
Support for access scopes composes cleanly with the existing algorithm
without adding significant cost in the usual case. Overlapping access
scopes are unusual. A single CFG walk within the original extended
lifetime is normally sufficient. Only the blocks that are not already
LiveOut in the pruned liveness need to be visited. During this walk,
local overlapping accesses are detected by scanning for end_access
instructions after the last use point. Global overlapping accesses are
detected by checking NonLocalAccessBlockAnalysis. This avoids scanning
instructions in the common case. NonLocalAccessBlockAnalysis is a
trivial analysis that caches the rare occurrence of nonlocal access
scopes. The analysis itself is a single linear scan over the
instruction stream. This analysis can be preserved across most
transformations, and I expect it to be used to speed up other
optimizations related to access markers.
When an overlapping access is detected, pruned liveness is simply
extended to include the end_access as a new use point. Extending the
lifetime is iterative, but with each iteration, blocks that are now
marked LiveOut no longer need to be visited. Furthermore, interleaved
access scopes are not expected to happen in practice.
This is a low level API already being used in multiple places besides
InstSimplify (e.g. Utils/OwnershipOptUtils), so it makes sense to move it into
InstOptUtil.
When a non-value instruction (e.g. a destroy_addr) was deleted, the corresponding cache entry in MemoryBehaviorCache was not invalidated.
If a new instruction was allocated at the same memory location, the old - and invalid - cache entry was re-used.
This bug triggered a SIL memory lifetime failure in TempRValueElimination.
https://bugs.swift.org/browse/SR-13985
rdar://problem/72614608
This change also fixes another problem (which I found by inspection): Individual result values of MultipleValueInstructions were not invalidated correctly.
This allows me to do a couple of things that improve quality/correctness/ease of use:
1. I reimplemented InstMod's RAUW and RAUW/erase helpers on top of
setUseValue/deleteInst. Beyond allowing the caller to specify fewer things, we
gain an orthogonality that prevents bugs like overriding erase/RAUW but not
overriding erase, or having the erase used in erase/RAUW act differently than
the erase for deleteInst.
2. There were a bunch of places using InstModCallbacks that were also setting
uses without InstModCallbacks being able to perform it (since it only supported
RAUW). This is an anti-pattern and could cause subtle bugs to be introduced
when relevant state in the caller is not updated.
Specifically before this PR, if a caller did not customize a specific callback
of InstModCallbacks, we would store a static default std::function into
InstModCallbacks. This means that we always would have an indirect jump. That is
unfortunate since this code is often called in loops.
In this PR, I eliminate this problem by:
1. I made all of the actual callback std::functions in InstModCallbacks private
and gave them a "Func" postfix (e.g. deleteInst -> deleteInstFunc).
2. I created public methods with the old callback names to actually call the
callbacks. This ensured that, as long as we are not escaping callbacks from
InstModCallbacks, this PR would not require any source changes, since we are
only changing a call of a std::function field into a call of a method.
3. I changed all of the places that were escaping inst mod's callbacks to take
an InstModCallbacks instead. We shouldn't be doing that anyway.
4. I changed the default value of each callback in InstModCallbacks to be a
nullptr and changed the public helper methods to check if a callback is
null. If the callback is not null, it is called; otherwise the method falls
back to an inline default implementation of the operation.
All together this means that the cost of a plain InstModCallbacks is reduced,
and one only pays the price of an indirect call as one customizes it further,
which scales better.
P.S. As a little extra thing, I added a madeChange field to
InstModCallbacks. Now that we have the helpers calling the callbacks, I can
easily insert instrumentation like this, allowing users to pass in an
InstModCallbacks and see if anything was RAUWed without needing to specify a
callback.
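A Swift analogue of the shape described above (the real InstModCallbacks is C++; names and types here are purely illustrative): the callback storage is private and optional, the public method falls back to an inline default when no callback was supplied, and it records madeChange:

  struct ModCallbacks {
    // Private optional callback; nil means "use the inline default below".
    private var deleteInstFunc: ((String) -> Void)?
    private(set) var madeChange = false

    init(deleteInst: ((String) -> Void)? = nil) {
      self.deleteInstFunc = deleteInst
    }

    // Callers keep writing `callbacks.deleteInst(...)`, exactly as before.
    mutating func deleteInst(_ inst: String) {
      madeChange = true
      if let deleteInstFunc = deleteInstFunc {
        deleteInstFunc(inst)                 // only customized clients take this path
      } else {
        print("default delete of \(inst)")   // stand-in for the default behavior
      }
    }
  }

  var plain = ModCallbacks()
  plain.deleteInst("copy_value %0")          // uses the inline default

  var custom = ModCallbacks(deleteInst: { print("pass-specific cleanup of \($0)") })
  custom.deleteInst("destroy_value %1")      // uses the supplied callback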
This is a generic API that, when ownership is enabled, allows one to replace all
uses of a value with a value of differing ownership by transforming/lifetime
extending as appropriate.
This API supports all pairings of ownership /except/ replacing a value that has
OwnershipKind::None with a value that does not. This is a more
complex optimization that we do not support today. As a result, we include on
our state struct a helper routine that callers can use to know if the two values
that they want to process can be handled by the algorithm.
My motivation is to use this to update InstSimplify and SILCombiner in a
less bug-prone way, rather than just turning stuff off.
Noting that this transformation inserts ownership instructions, I have made sure
to test this API in two ways:
1. With Mandatory Combiner alone (to make sure it works period).
2. With Mandatory Combiner + Semantic ARC Opts to make sure that we can
eliminate the extra ownership instructions it inserts.
As one can see from the tests, the optimizer today is able to handle all of
these transforms except one conditional case where I need to eliminate a dead
phi arg. I have a separate branch that handles that today, but it exposed unsafe
behavior in ClosureLifetimeFixup that I need to fix first before I can land
that. I don't want that to stop this PR, since I think the current low-level ARC
optimizer may be able to help me here, given that this is a simple transform it
does all of the time.
Fixes a compile time problem. The singly linked list of merge targets in connection graph nodes can be very large.
Record the final merge target in the map, so that the list has to be traversed only once for a given SILValue.
rdar://problem/71602804
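A small sketch of the idea (the Int node IDs and dictionary are placeholders for the real connection-graph representation): follow the chain of merge targets once, then record the final target so later lookups for the same value do not traverse the chain again:

  // Walk the merge-target chain to its end, then cache the result so the
  // (potentially very long) chain is only traversed once per value.
  func finalMergeTarget(of node: Int, in mergeTarget: inout [Int: Int]) -> Int {
    var current = node
    while let next = mergeTarget[current], next != current {
      current = next
    }
    mergeTarget[node] = current
    return current
  }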
Loads and stores in OSSA can have side-effects, and stores can release. Specifically:
Memory Behavior
---------------
* Load: unqualified, trivial, and take have a read side-effect, but copy
retains, so it has side-effects.
* Store: unqualified, trivial, and init may write, but assign releases, so it
may have side-effects.
Release Behavior
----------------
* Load: No changes.
* Store: May release if the store has assign as its ownership qualifier.
When casting from existentials to classes - and vice versa - it can happen that a cast is not RC identity preserving (because of potential bridging).
This also affects mayRelease() of such cast instructions.
For details see the comments in SILDynamicCastInst::isRCIdentityPreserving().
This change also includes some refactoring: I centralized the logic in SILDynamicCastInst::isRCIdentityPreserving().
rdar://problem/70454804
to check for improperly nested '@_semantics' functions.
Add a missing @_semantics("array.init") in ArraySlice found by the
diagnostic.
Distinguish between array.init and array.init.empty.
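For reference, the attribute looks like this in source (a simplified stand-in, not the actual stdlib declaration):

  // The optimizer recognizes calls to functions tagged with @_semantics;
  // "array.init.empty" marks an entry point that produces an empty array.
  @_semantics("array.init.empty")
  func emptyIntArray() -> [Int] {
    return []
  }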
Categorize the types of semantic functions by how they affect the
inliner and pass pipeline, and centralize this logic in
PerformanceInlinerUtils. The ultimate goal is to prevent inlining of
"Fundamental" @_semantics calls and @_effects calls until the late
pipeline where we can safely discard semantics. However, that requires
significant pipeline changes.
In the meantime, this change prevents the situation from getting worse
and makes the intention clear. However, it has no significant effect
on the pass pipeline and inliner.