To avoid introducing new copies (which is illegal for move-only values),
don't rewrite `load [take]`s and `copy_addr [take]`s as `load [copy]`s
and `copy_addr`s respectively and introduce new `destroy_addr`s after
them. Instead, get the effect of folding such a newly created
`destroy_addr` into the preceding rewritten `load [copy]` or
`copy_addr` by leaving the `load [take]` or `copy_addr [take]`
unmodified and adding no subsequent `destroy_addr`.
An example for each kind (`load [take]` and `copy_addr [take]`):
```
// Input 1 (`load [take]`)
copy_addr [take] %src to [init] %tmp
%val = load [take] %tmp
// Old Output 1
%val = load [copy] %src
destroy_addr %src
// New Output 1
%val = load [take] %src
```
```
// Input 2 (`copy_addr [take]`)
copy_addr [take] %src to [init] %tmp
copy_addr [take] %tmp to [init] %dst
// Old Output 2
copy_addr %src to [init] %dst
destroy_addr %src
// New Output 2
copy_addr [take] %src to [init] %dst
```
rdar://107839979
In preparation for "folding" an "inserted destroy" into a load [copy] or
copy_addr, rename the variable that indicates whether the copyInst's
source must be deinitialized after its last "load".
Previously, whenever an alloc_stack [lexical] was seen, optimization
bailed conservatively.
In the fullness of time, we should optimize an alloc_stack [lexical] so
long as:
(1) the root of the source of the store/copy_addr is lexical
(2) the range in which the alloc_stack [lexical] contains the value
store'd/copy_addr'd into it (i.e. the range ending at
destroy_addr/etc) is contained within the live range of that lexical
root
Here, an incremental step in that direction is taken: if the base of the
accessed storage of the source of a copy_addr is a guaranteed function
argument, then we know
(1) the base is lexical--guaranteed function arguments are always
lexical
(2) the range in which the alloc_stack contains the value copy_addr'd in
is trivially contained within the range of that lexical root--the
live range of a guaranteed function argument is the whole function
Added TODOs for the full optimization.
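A minimal SIL sketch of the newly handled case (names and the exact spelling are hypothetical):
```
// %arg is a guaranteed (@in_guaranteed) function argument: its root is
// lexical and it is live for the whole function, so this lexical
// alloc_stack can now be considered for elimination.
bb0(%arg : $*T):
  %temp = alloc_stack [lexical] $T
  copy_addr %arg to [init] %temp
  // ... uses of %temp that can be rewritten to use %arg ...
  destroy_addr %temp
  dealloc_stack %temp
```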
When optimizing an enum `store` to an `alloc_stack`, require that all uses are in the same block.
Otherwise it could be a `switch_enum` of an optional where the none-case does not have a destroy of the enum value.
After transforming such an `alloc_stack`, the value would leak in the none-case block.
Fixes an OSSA verification error.
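A hedged sketch of the rejected pattern (hypothetical SIL):
```
%stack = alloc_stack $Optional<T>
store %val to [init] %stack
switch_enum_addr %stack : $*Optional<T>, case #Optional.some!enumelt: bb1, case #Optional.none!enumelt: bb2

bb1:
  // The stored value is consumed on this path ...
  %payload = unchecked_take_enum_data_addr %stack : $*Optional<T>, #Optional.some!enumelt

bb2:
  // ... but not on this one; after transforming the alloc_stack,
  // %val would leak here.
```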
Previously, TempRValueElimination would peephole simple alloc_stacks,
even when they were lexical; here, they are left for Mem2Reg to properly
handle.
Previously, SemanticARCOpts would eliminate lexical begin_borrows,
incorrectly allowing the lifetime of the value borrowed by them to be
observably shortened. Here, those borrow scopes are not eliminated if
they are lexical.
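A minimal sketch of the kind of scope that is now preserved (names hypothetical):
```
%b = begin_borrow [lexical] %delegate
// Eliminating this scope would allow the lifetime of %delegate to be
// shortened past uses that must observe it alive.
...
end_borrow %b
```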
Added an executable test that verifies that a local variable strongly
referencing a delegate object keeps that delegate alive through the call
to an object that weakly references the delegate and calls out to it.
Instead of caching alias results globally for the module, make AliasAnalysis a FunctionAnalysisBase which caches the alias results per function.
Why?
* So far the result caches could only grow. They were reset when they reached a certain size. This was not ideal. Now, they are invalidated whenever the function changes.
* It was not possible to actually invalidate an alias analysis result. This is required, for example in TempRValueOpt and TempLValueOpt (so far it was done manually with invalidateInstruction).
* Type based alias analysis results were also cached for the whole module, while it is actually dependent on the function, because it depends on the function's resilience expansion. This was a potential bug.
I also added a new PassManager API to directly get a function-based analysis: `getAnalysis(SILFunction *f)`.
The second change of this commit is the removal of the instruction-index indirection for the cache keys. Now the cache keys directly work on instruction pointers instead of instruction indices. This reduces the number of hash table lookups for a cache lookup from 3 to 1.
This indirection was needed to avoid dangling instruction pointers in the cache keys. But this is not needed anymore, because of the new delayed instruction deletion mechanism.
Without this, when constructing an InstModCallbacks, it is hard to distinguish
which closure is meant for which operation when passed to the constructor
(if this were Swift, we could use argument labels, but we have no such thing
in C++).
This new value-type style of formulation makes it unambiguous which callback is
used for what when constructing one of these.
Don't move an end_access over a (non-aliasing) end_access. This would destroy the proper nesting of accesses.
Also, add some comments, asserts and tests.
Try to move an end_access down to extend the access scope over all uses of the temporary.
For example:
```
%a = begin_access %src
copy_addr %a to [initialization] %temp : $*T
end_access %a
use %temp
```
We must not replace %temp with %a after the end_access. Instead we try to move the end_access after "use %temp".
This fixes generation of invalid SIL and/or the invalid removal of access checks.
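Assuming the move succeeds, the result looks roughly like this (a sketch; the copy_addr and %temp are then removed):
```
%a = begin_access %src
use %a          // %temp replaced with %a ...
end_access %a   // ... because the end_access was moved below the use
```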
I am trying to be more careful about this rather than letting someone make this
mistake in the future. I also added some comments and a DeadEndBlocks instance
to TempRValueElimination so that it gets full simplifications in OSSA.
Currently all of these places in the code base perform simplifyInstruction and
then a replaceAllSimplifiedUsesAndErase(...). This is a bad pattern since:
1. simplifyInstruction assumes its result will be passed to
replaceAllSimplifiedUsesAndErase. By leaving these as separate steps, we allow
users to make this mistake needlessly.
2. I am going to implement, in a subsequent commit, a utility that lifetime
extends interior pointer bases when replacing an address with an interior
pointer derived address. To do this efficiently, I want to reuse state I
compute during simplifyInstruction during the actual RAUW; if the two
operations are split, that is difficult without extending the API. So by
removing the split, I can make the transform and eliminate mistakes at the
same time.
Specifically before this PR, if a caller did not customize a specific callback
of InstModCallbacks, we would store a static default std::function into
InstModCallbacks. This means that we would always have an indirect jump. That is
unfortunate since this code is often called in loops.
In this PR, I eliminate this problem by:
1. I made all of the actual callback std::functions in InstModCallbacks private
and gave them a "Func" suffix (e.g. deleteInst -> deleteInstFunc).
2. I created public methods with the old callback names to actually call the
callbacks. This ensured that, as long as we are not escaping callbacks from
InstModCallbacks, this PR would not require any source changes, since we are
changing a call of a std::function field to a call to a method.
3. I changed all of the places that were escaping InstModCallbacks' callbacks
to take an InstModCallbacks. We shouldn't be doing that anyway.
4. I changed the default value of each callback in InstModCallbacks to be a
nullptr and changed the public helper methods to check if a callback is
null. If the callback is not null, it is called; otherwise the helper falls
back to an inline default implementation of the operation.
Altogether, this means that the cost of a plain InstModCallbacks is reduced,
and one pays the indirect-call price only as one customizes it further, which
scales better.
P.S. As a little extra thing, I added a madeChange field to InstModCallbacks.
Now that we have the helpers calling the callbacks, I can easily insert
instrumentation like this, allowing users to pass in an InstModCallbacks and
see whether anything was RAUWed without needing to specify a callback.
When instructions are changed within a pass in a way that affects subsequent alias queries in the same pass run,
their alias analysis information must be invalidated.
Otherwise it can result in miscompiles and/or invalid SIL.
rdar://71924430
Instead of reusing the existing destroy_addr (or load/copy_addr [take]) of the temporary, insert a new destroy_addr - at the correct location.
This fixes a memory lifetime failure in case the copy-source is modified (actually: re-initialized) after the last use of the temporary: E.g. (before copy elimination):
```
copy_addr [take] %src to [initialization] %temp
%x = load %temp // last use of %temp
store %y to [init] %src
destroy_addr %temp
```
The correct location for the destroy_addr is after the last use (after copy elimination):
```
%x = load %src
destroy_addr %src
store %y to [init] %src
```
rdar://problem/69757314
A load [take] was not considered a use, and it was not detected if it was in a different basic block.
This fixes a miscompile in case there is a modification of the copy-source before a load [take].
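A hedged sketch of the miscompile pattern (hypothetical SIL):
```
  copy_addr %src to [initialization] %temp
  store %x to [assign] %src   // modifies the copy-source
  br bb1

bb1:
  // Previously missed: a use (and lifetime end) of %temp in another
  // block; replacing %temp with %src here would observe the new value.
  %v = load [take] %temp
```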
rdar://problem/69757314
Consider the related end_access instructions as uses to correctly mark the end of the lifetime of the temporary.
This fixes a miscompile in case there is a modification of the copy-source between a begin_access and its end_access.
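A sketch of the pattern, with hypothetical values:
```
copy_addr %src to [initialization] %temp
%a = begin_access [read] %temp
%v = load [copy] %a
store %x to [assign] %src   // modifies the copy-source before the scope ends
end_access %a               // now treated as a use of %temp, so the
                            // modification above lies inside %temp's lifetime
```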
... where it belongs.
This is mostly refactoring, but it also fixes a bug: we don't recurse into a begin_access scope in collectLoads, so if there is an apply in such a scope, the mayWrite check wouldn't be done.
In checkNoSourceModification all instructions are visited, so the check is always done.
This fixes a miscompile in case the source of the optimized copy_addr is modified in a called function through an alias that is not visible.
This can happen with class properties or global variables.
This fix removes the special handling of function parameters, which was just wrong.
Instead it simply uses the alias analysis API to check for modifications of the source object.
The fix makes TempRValueElimination more conservative and this can cause some performance regressions, but this is unavoidable.
rdar://problem/69605657
* Handle begin_apply in TempRVO
A temporary object passed to a begin_apply instruction that cannot be modified
by it (it is passed as guaranteed and doesn't alias with other inout
arguments) can be optimized away.
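A rough sketch of a now-handled case (the coroutine's type is hypothetical):
```
%temp = alloc_stack $T
copy_addr %src to [initialization] %temp
// OK: %temp is passed @in_guaranteed, so the coroutine cannot modify it.
(%yield, %token) = begin_apply %coro(%temp) : $@yield_once @convention(thin) (@in_guaranteed T) -> @yields @in_guaranteed U
end_apply %token
destroy_addr %temp
dealloc_stack %temp
```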
We should not optimize away the copy_addr of a guaranteed parameter of a
partial_apply that is not [on_stack].
This is not a bug currently, because TempRVO checks if the lifetime end
points of the temp object are destroy_addr's in
TempRValueOptPass::checkTempObjectDestroy.
This change makes it explicit which partial_applys are OK to optimize.
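A hedged sketch of the distinction (function types elided):
```
// Not OK: the closure may outlive this function, so it needs its own copy.
%pa1 = partial_apply [callee_guaranteed] %f(%temp) : ...

// OK: an on-stack closure's lifetime ends within this function.
%pa2 = partial_apply [callee_guaranteed] [on_stack] %f(%temp) : ...
```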
rdar://61349083
* Use in_guaranteed for let captures
With this, all let values will be captured with the in_guaranteed convention
by the closure. The main changes are:
SILGen changes:
- A new CaptureKind::Immutable is introduced, to capture let values as in_guaranteed.
- SILGen of in_guaranteed captures had to be fixed.
Per convention, in_guaranteed captures are consumed by the closure, so SILGen should not generate a destroy_addr for an in_guaranteed capture.
But LetValueInitialization can push Dealloc and Release states of the captured arg onto the Cleanup stack, and there is no way to access the CleanupHandle and disable the emission of the destroy_addr while emitting the captures in SILGenFunction::emitCaptures.
So we now create a temporary allocation of the in_guaranteed capture during SILGenFunction::emitCaptures without emitting a destroy_addr for it.
SILOptimizer changes:
- Handle in_guaranteed in CopyForwarding.
- Adjust dealloc_stack of in_guaranteed capture to occur after destroy_addr for on_stack closures in ClosureLifetimeFixup.
IRGen changes:
- Since HeapLayout can be non-fixed now, make sure emitSize is used conditionally
- Don't consider ClassPointerSource kind parameter type for fulfillments while generating code for partial apply forwarder.
The TypeMetadata of ClassPointerSource-kind sources is not populated in HeapLayout's NecessaryBindings. If we have a generic parameter on the HeapLayout which can be fulfilled by a ClassPointerSource, its TypeMetadata will not be found while constructing the dtor function of the HeapLayout.
So it is important to skip considering sources of ClassPointerSource kind, so that the TypeMetadata of dependent generic parameters gets populated in HeapLayout's NecessaryBindings.
In the case of copy_addr, we move it onto the source address, and in the case
of a store, we put it on the source object.
I just noted this pattern happening a bunch in the stdlib when I was looking
through it for ownership patterns.
unchecked_take_enum_data_addr only writes to memory in certain cases. Luckily,
Optional is not one of those cases.
Given that, I am adding support here just for optionals, so that we can get
rid of temporaries that are used with a switch_enum_addr. This is an example
of a case where we need to eliminate temporary allocations to allow Semantic
ARC Opts to eliminate further traffic.
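A sketch of the optional-only case this enables (hypothetical SIL):
```
%temp = alloc_stack $Optional<T>
copy_addr %src to [initialization] %temp
switch_enum_addr %temp : $*Optional<T>, case #Optional.some!enumelt: bb1, case #Optional.none!enumelt: bb2

bb1:
  // For Optional, this projection does not write to memory, so %temp
  // can safely be replaced with %src.
  %payload = unchecked_take_enum_data_addr %temp : $*Optional<T>, #Optional.some!enumelt
```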
Using fallthroughs in general can lead to easy mistakes. This PR changes the
relevant code to use a helper instead. The reason why is that I am about to add
another instance of needing this exact helper and having to fall through
multiple switch cases will just make this even worse.
Specifically, we do not handle cases where we use projections,
open_existential_addr, or load_borrow. We should be able to handle those in the
future.
The best part is that OSSA makes this really easy to do correctly.
rdar://60064817
- Consistently operate on the address value of the variable via
  stripAccessMarkers. Never use the address of the transient access, which
  may not have sufficient lifetime.
- Recognize begin_access [read] as a load.
- Delete dead access markers after deleting copies.
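A sketch of the pattern this enables (hypothetical SIL):
```
%a = begin_access [read] %src
copy_addr %a to [initialization] %temp   // begin_access [read] acts as a load
end_access %a
%v = load [copy] %temp
// The rewrite uses the underlying %src (via stripAccessMarkers), never %a,
// whose scope has already ended; dead access markers are deleted afterwards.
```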
Fixes <rdar://59345115> Unnecessary copy_addr not eliminated with
exclusivity checking
There are a few interesting things here:
1. All of this code is implicitly conditionalized on OSSA by checking for
load [take]. If we are in non-OSSA code, then we will have unqualified loads and
this code path will be skipped. I left in the old test that made sure we did not
support this in non-OSSA code for this purpose.
2. We treat the load [take] like a destroy_addr. Thus the current method of
providing safety applies: using lifetime analysis to show that all "destroys"
form a minimal post-dominating set for the copy_addr.
3. We do not allow for load [take] on sub-element initializations. The reason
why is that SILGen initializes temporaries completely in memory and generally
destroys addresses completely as well. Given that is the pattern we are looking
for, we just reduce the state space by banning this pattern.
4. If we have a copy_addr [init], then we not only change the load [take] to
have the original source address as an argument, but we also change the load
[take] to a load [copy] so we don't lose the +1. Additionally, we have to hoist
the load [copy] to the copy_addr [init] site to ensure that we do not insert a
use-after-free from the retain happening too late.
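A small sketch of point 4 (hypothetical SIL; the copy_addr is not a take):
```
// Before
copy_addr %src to [initialization] %temp
...
%v = load [take] %temp

// After: rewritten to load [copy] to keep the +1, and hoisted to the
// copy_addr site so the copy happens before %src can be invalidated.
%v = load [copy] %src
...
```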
I am doing this to clean up some simple temp rvalue patterns in SIL before
Semantic ARC Opts runs. This ensures that we are more likely to have a load
operation come from a more meaningful semantic entity (for instance, a let
field of a guaranteed class) and thus allows us to convert a load [copy] to a
load_borrow.
I am also going to add support in subsequent commits for eliminating
alloc_stacks that are stored into as well (another pattern I am seeing).