* Remove dead `load_borrow` instructions (replaces the old peephole optimization in SILCombine)
* If the `load_borrow` is followed by a `copy_value`, combine both into a `load [copy]`
It hoists `destroy_value` instructions without shrinking an object's lifetime.
This is done if it can be proved that another copy of a value (either in an SSA value or in memory) keeps the referenced object(s) alive until the original position of the `destroy_value`.
```
%1 = copy_value %0
...
last_use_of %0
// other instructions
destroy_value %0 // %1 is still alive here
```
->
```
%1 = copy_value %0
...
last_use_of %0
destroy_value %0
// other instructions
```
The benefit of this optimization is that it can enable copy-propagation by moving destroys above deinit barries and access scopes.
It removes a `copy_value` where the source is a guaranteed value, if possible:
```
%1 = copy_value %0 // %0 = a guaranteed value
// uses of %1
destroy_value %1 // borrow scope of %0 is still valid here
```
->
```
// uses of %0
```
This optimization is very similar to the LoadCopyToBorrow optimization.
Therefore I merged both optimizations into a single file and renamed it to "CopyToBorrowOptimization".
In preparation for only recording the defs once, replace the
GraphNodeWorklist of defs with a SetVector. Preserve the current
visitation order by creating a worklist of indices to be visited.
Add a type which distinguishes among the types of defs that are pushed
onto the "def-use worklist". Note that it's not possible to rely on the
kind of value because the root may itself be a copy_value. For now, the
distinction is discarded as soon as the def is visited.
As the utility runs, new gens may become local: as access scopes are
determined to contain deinit barriers, their `end_access` instructions
become kills; if such an `end_access` occurs in the same block above an
initially-non-local gen, that gen is now local.
Previously, it was asserted that initially-non-local gens would not
encounter when visiting the block backwards from that gen. Iteration
would also _stop_ at the discovered kill, if any. As described above,
the assertion was incorrect.
Stopping at the discovered kill was also incorrect. It's necessary to
continue walking the block after finding such a new kill because the
book-keeping the utility does for which access scopes contain barriers.
Concretely, there are two cases:
(1) It may contain another `end_access` and above it a deinit barrier
which must result in that second scope becoming a deinit barrier.
(2) Some of its predecessors may be in the region, all the access scopes
which are open at the begin of this block must be unioned into the set
of scopes open at each predecessors' end, and more such access scopes
may be discovered above the just-visited `end_access`.
Here, both the assertion failure and the early bailout are fixed by
walking from the indicated initially-non-local gen backwards over the
entire block, regardless of whether a kill was encountered. If a kill
is encountered, it is asserted that the kill is an `end_access` to
account for the case described above.
rdar://139840307
In terms of the test suite the only difference is that we allow for non-Sendable
types to be returned from nonisolated functions. This is safe due to the rules
of rbi. We do still error when we return non-Sendable functions across isolation
boundaries though.
The reason that I am doing this now is that I am implementing a prototype that
allows for nonisolated functions to inherit isolation from their caller. This
would have required me to implement support both in Sema for results and
arguments in SIL. Rather than implement results in Sema, I just finished the
work of transitioning the result checking out of Sema and into SIL. The actual
prototype will land in a subsequent change.
rdar://127477211
Propagating array element values is done by load-simplification and redundant-load-elimination.
So ArrayElementPropagation is not needed anymore.
ArrayElementPropagation also replaced `Array.append(contentsOf:)` with individual `Array.append` calls.
This optimization is removed, because the benefit is questionably, anyway.
In most cases it resulted in a code size increase.
I am doing this since I discovered that we are not printing certain errors as
early as we used to (due to the refactoring I did here), which makes it harder
to see the errors that we are emitting while processing individual instructions
and before we run the actual dataflow.
A nice side-effect of this is that it will make it easy to dump the error in the
debugger rather than having to wait until the point in the code where the normal
logging takes place.
TLDR: Was looking at some performance traces and saw that we need to cache the
result of this value.
----
Specifically, I noticed that we were spending a lot of time computing this
operation. When I looked at the code I saw that we already had a cache along the
relevant code paths... but the cache was from equivalence class representative
-> state. Before we hit that cache, we were performing the work to map the value
to the equivalence class representative... so the work to perform the relevant
lookup from value -> state (which goes through the equivalence class
representative) was not just a hash table lookup. This operation makes it
cheaper by making it two cache lookups.
It may be possible to make this cheaper by redoing the actual mapping of
information so that we can go straight from value to state. I think it would be
slightly different since we would probably need to represent the state in a
separate array and map with indices... which is really just a more efficient
hash table. We could also use malloc/etc but lets not even talk about that.
rdar://139520959
This makes it so that one does not need to deal with the differences in text in
between the task isolated case and the actor isolated case. This is done by
swallowing the entire part of this message in one method rather than having the
caller do the work.
This is going to let me just pass through the error struct to the diagnostic
rather than having the CRTP and then constructing an info object per CRTP.
Currently, to make it easier to refactor, I changed the code in
TransferNonSendable to just take in the new error and call the current CRTP
routines. In the next commit, I am going to refactor TransferNonSendable.cpp
itself. This just makes it easier to test that I did not break anything.
The optimization replaces a `load [copy]` with a `load_borrow` if possible.
```
%1 = load [copy] %0
// no writes to %0
destroy_value %1
```
->
```
%1 = load_borrow %0
// no writes to %0
end_borrow %1
```
The new implementation uses alias-analysis (instead of a simple def-use walk), which is much more powerful.
rdar://115315849
In Embedded Swift, witness method lookup is done from specialized witness tables.
For this to work, the type of witness_method must be specialized as well.
Otherwise the method call would be done with wrong parameter conventions (indirect instead of direct).