Although I don't plan to bring over new assertions wholesale
into the current qualification branch, it's entirely possible
that various minor changes in main will use the new assertions;
having this basic support in the release branch will simplify that.
(This is why I'm adding the includes as a separate pass from
rewriting the individual assertions.)
Compute, update, and handle the borrowed-from instruction in various utilities and passes.
Also, use borrowed-from to simplify `gatherBorrowIntroducers` and `gatherEnclosingValues`.
Replace those utilities with `Value.getBorrowIntroducers` and `Value.getEnclosingValues`, which return a lazily computed Sequence of borrowed/enclosing values.
This is a tiny routine that is useful for writing assertions for
quick sanity checks without computing Dominance.
It doesn't always make sense to require a pass to maintain and pass
around Dominance just for an assert.
Instead make `findJointPostDominatingSet` a stand-alone function.
There is no need to keep the temporary SmallVector alive across multiple calls of findJointPostDominatingSet just to re-use malloc'ed memory: the worklist usually contains far fewer elements than its small size.
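To make the intended use concrete, here is a minimal sketch of wrapping the stand-alone function in a debug-only check. This assumes the declaration lives in BasicBlockUtils.h; the callback parameter names and the exact signature are illustrative and may differ from the in-tree declaration.

```cpp
#include "swift/SIL/BasicBlockUtils.h"
#include "swift/SIL/SILBasicBlock.h"
#include "llvm/ADT/ArrayRef.h"

using namespace swift;

/// Do `useBlocks` jointly post-dominate `defBlock` on their own, i.e.
/// without the walk discovering any extra completion blocks?
static bool jointlyPostDominates(SILBasicBlock *defBlock,
                                 llvm::ArrayRef<SILBasicBlock *> useBlocks) {
  bool foundCompletionBlocks = false;
  findJointPostDominatingSet(
      defBlock, useBlocks,
      /*inputBlocksFoundDuringWalk=*/[](SILBasicBlock *) {},
      /*foundJointPostDomSetCompletionBlocks=*/
      [&](SILBasicBlock *) { foundCompletionBlocks = true; });
  return !foundCompletionBlocks;
}

// Typical use: a quick sanity check at the end of a utility, with no
// DominanceInfo needed:
//   assert(jointlyPostDominates(defBlock, endBlocks) &&
//          "lifetime ends do not jointly post-dominate the definition");
```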
1. dead-end blocks (= blocks which eventually end up in an unreachable):
We cannot simply ignore all dead-end blocks; doing so causes crashes in some corner cases (see the "infinite_loop_and_unreachable" test case). Instead, handle just the common case of a simple, single dead-end block, as we do in DestroyHoisting.
For other (more complex) dead-end control flow, the analysis is still not incorrect; in the worst case we end up inserting an unneeded destroy instruction.
2. sortUnique
I restructured the code a bit so that sortUnique is no longer needed; sortUnique on pointer arrays can result in non-deterministic behavior.
3. lower_bound
Also, using lower_bound on a vector is not good in this function because it can result in quadratic behavior. In practice, though, there are only very few elements in the vector, so this is more of a theoretical concern.
The restructuring also made the code a bit simpler; e.g., besides the worklist, no other vectors are needed anymore.
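As a sketch of the general pattern (not the restructured code itself): a visited set plus a worklist de-duplicates in O(1) per block and visits in deterministic discovery order, so neither sortUnique on a pointer vector nor lower_bound insertion is needed. `Block` below is an illustrative stand-in for SILBasicBlock.

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"

// Illustrative stand-in for SILBasicBlock.
struct Block {
  llvm::SmallVector<Block *, 2> predecessors;
};

/// Visit all blocks backward-reachable from `roots` exactly once, in
/// deterministic discovery order. The visited set replaces the old
/// sortUnique/lower_bound bookkeeping; besides the worklist, no other
/// vector is required.
void visitBackwardReachable(llvm::ArrayRef<Block *> roots,
                            llvm::function_ref<void(Block *)> visit) {
  llvm::SmallPtrSet<Block *, 8> visited;
  llvm::SmallVector<Block *, 8> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    Block *block = worklist.pop_back_val();
    if (!visited.insert(block).second)
      continue; // already handled; ordering stays address-independent
    visit(block);
    for (Block *pred : block->predecessors)
      worklist.push_back(pred);
  }
}
```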
Importantly, this also lets us use the analysis framework to validate that we
properly invalidate DeadEndBlocks, preventing bugs.
I did not thread this all over the compiler. Instead, for now, I used it only
in SemanticARCOpts, to add some coverage without threading it into too many
places.
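For illustration, a function pass can obtain the cached result from the pass manager instead of computing DeadEndBlocks locally. The pass body below is a hedged sketch; performTransform is a hypothetical placeholder for the pass's actual rewriting logic, and the header path is assumed.

```cpp
#include "swift/SILOptimizer/Analysis/DeadEndBlocksAnalysis.h"
#include "swift/SILOptimizer/PassManager/Transforms.h"

using namespace swift;

// Hypothetical stand-in for the pass's actual rewriting logic.
static bool performTransform(SILFunction &f, DeadEndBlocks &deBlocks);

class SemanticARCOptsSketch : public SILFunctionTransform {
  void run() override {
    // Fetch the cached DeadEndBlocks; the analysis framework can now
    // check that passes invalidate it when they change the CFG.
    DeadEndBlocks *deBlocks =
        getAnalysis<DeadEndBlocksAnalysis>()->get(getFunction());
    if (performTransform(*getFunction(), *deBlocks))
      invalidateAnalysis(SILAnalysis::InvalidationKind::Instructions);
  }
};
```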
While computing blocksThatLeakIfNeverVisited by processing the worklist,
we should also add all the successors of a predecessor block that was in
the dominatedBlockSet initially. If we don't, we will end up with an
incorrect result for a control flow like this:
dominating block: bb1
dominated block: bb2
     +----+
     | bb1|
     +----+
+------>|
|       v
|    +----+
|    | bb2|-----+
|    +----+     |
|       |       |
|       v       v
|    +----+  +----+
|    | bb3|  | bb4|
|    +----+  +----+
|       |
+-------+
Without the fix, blocksThatLeakIfNeverVisited will not contain bb4.
The SIL test for this fix is rle_redundantload_does_not_postdominate2
in redundant_load_elim_ossa_complex.sil.
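The sketch below is a hypothetical reconstruction of the fixed worklist loop, not the actual RLE code; all names are illustrative. The key line records the unvisited successors of every predecessor we step through, including a predecessor (bb2 above) that was in the dominated set initially, so its successor bb4 lands in the leak set.

```cpp
#include "swift/SIL/SILBasicBlock.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"

using namespace swift;

static void collectBlocksThatLeakIfNeverVisited(
    SILBasicBlock *dominatingBlock, SILBasicBlock *dominatedBlock,
    llvm::SmallPtrSetImpl<SILBasicBlock *> &blocksThatLeakIfNeverVisited) {
  llvm::SmallPtrSet<SILBasicBlock *, 8> visited = {dominatedBlock};
  llvm::SmallVector<SILBasicBlock *, 8> worklist = {dominatedBlock};
  while (!worklist.empty()) {
    SILBasicBlock *block = worklist.pop_back_val();
    for (SILBasicBlock *pred : block->getPredecessorBlocks()) {
      // The fix: record the unvisited successors of *every* predecessor
      // we walk through -- even one that was in the dominated set
      // initially (bb2 above), whose successor bb4 was missed before.
      for (SILBasicBlock *succ : pred->getSuccessorBlocks())
        if (!visited.count(succ))
          blocksThatLeakIfNeverVisited.insert(succ);
      if (pred != dominatingBlock && visited.insert(pred).second)
        worklist.push_back(pred);
    }
  }
}
```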
This is a generic API that, when ownership is enabled, allows one to replace all
uses of a value with another value of differing ownership, transforming/lifetime
extending as appropriate.
This API supports all pairings of ownership /except/ replacing a value that has
OwnershipKind::None with a value that does not have OwnershipKind::None. This is
a more complex optimization that we do not support today. As a result, we
include on our state struct a helper routine that callers can use to check
whether the two values that they want to process can be handled by the algorithm.
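The intended calling pattern looks roughly like this. OwnershipRAUWUtility, canHandle, and perform are illustrative placeholder names for the state struct, its query helper, and the transform; they are not necessarily the in-tree spelling.

```cpp
#include "swift/SIL/SILValue.h"

using namespace swift;

// Placeholder declaration for the state struct described above.
struct OwnershipRAUWUtility {
  OwnershipRAUWUtility(SILValue oldValue, SILValue newValue);
  bool canHandle() const; // the helper query on the state struct
  void perform();         // RAUW + ownership fixups
};

/// Replace all uses of `oldValue` with `newValue` if the ownership
/// pairing is supported; bail out (returning false) otherwise.
static bool tryOwnershipRAUW(SILValue oldValue, SILValue newValue) {
  OwnershipRAUWUtility utility(oldValue, newValue);
  if (!utility.canHandle())
    return false; // e.g. replacing a None value with a non-None value
  utility.perform(); // inserts copies/borrows or lifetime-extends
  return true;
}
```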
My motivation is to use this to update InstSimplify and SILCombiner in a
less bug-prone way, rather than just turning things off.
Since this transformation inserts ownership instructions, I have made sure
to test this API in two ways:
1. With Mandatory Combiner alone (to make sure it works period).
2. With Mandatory Combiner + Semantic ARC Opts to make sure that we can
eliminate the extra ownership instructions it inserts.
As one can see from the tests, the optimizer today is able to handle all of
these transforms except one conditional case where I need to eliminate a dead
phi arg. I have a separate branch that handles that today, but I have exposed
unsafe behavior in ClosureLifetimeFixup that I need to fix first before I can
land it. I don't want that to hold up this PR, since I think the current low
level ARC optimizer may be able to help me here, as this is a simple transform
it performs all of the time.
Oftentimes when one is working with ownership, one has a value V and a set of
use points Uses at which one wants V's lifetime to end, but those Uses together
(while not reachable from each other) only partially post-dominate
V. JointPostDominanceSetComputer is a struct that implements a general solution
to that operation at the block level. The struct itself is just a set of state
that the computation uses, so that a pass can clear the state (allowing us to
avoid re-mallocing if any small data structures went big).
To get into the semantics, the API
JointPostDominanceSetComputer::findJointPostDominatingSet() takes in a
"dominating block" and a "dominated block set" and returns two things to the user:
1. A set of blocks that, together with the "dominated block set",
jointly post-dominate the "dominating block".
2. A list of blocks in the "dominated block set" that were reachable from any of
the other "dominated blocks", including itself in the case of a block in a loop.
Conceptually, we perform a backwards walk up the CFG towards the
"dominating block", starting at each block in the "dominated block set". As we
go, we track successor blocks; any successor blocks that we do not hit during
our traversal become result blocks and are passed to the result callback.
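In code, driving the computation and collecting both results might look like this; the callback parameter names are illustrative, and the header location is assumed.

```cpp
#include "swift/SIL/BasicBlockUtils.h"
#include "swift/SIL/SILBasicBlock.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

using namespace swift;

void computeCompletionSet(JointPostDominanceSetComputer &computer,
                          SILBasicBlock *dominatingBlock,
                          llvm::ArrayRef<SILBasicBlock *> dominatedBlockSet) {
  // Result 1: blocks completing the joint post-dominating set.
  llvm::SmallVector<SILBasicBlock *, 8> completionBlocks;
  // Result 2: input blocks reachable from other input blocks.
  llvm::SmallVector<SILBasicBlock *, 8> reachableInputBlocks;
  computer.findJointPostDominatingSet(
      dominatingBlock, dominatedBlockSet,
      /*inputBlocksFoundDuringWalk=*/
      [&](SILBasicBlock *b) { reachableInputBlocks.push_back(b); },
      /*foundJointPostDomSetCompletionBlocks=*/
      [&](SILBasicBlock *b) { completionBlocks.push_back(b); });
  // completionBlocks, together with dominatedBlockSet, now jointly
  // post-dominate dominatingBlock; reachableInputBlocks lists the input
  // blocks reached from other input blocks (e.g. inside loops).
}
```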
Now, what does this actually mean?
1. All blocks in the "dominated block set" that are at the same loop nest level
as our dominating block will always be part of the final post-dominating block
set.
2. All "lifetime ending" blocks that are at a different loop nest level than our
dominating block are not going to be in our final result set. Let
LifetimeEndingBlock be such a block. Then note that our assumed condition
implies that there must be a sub-loop, SubLoop, at the same level of the
loop-nest as the dominating block that contains LifetimeEndingBlock. The
algorithm will yield to the caller the exiting blocks of that loop. It will also
flag the blocks that were found to be a different use level, so the caller can
introduce a copy at that point if needed.
NOTE: Part of the reason why I am writing this rather than using the linear
lifetime checker (LLChecker) is that LLChecker is being used in too many places
because it is convenient. Its original true use was for emitting diagnostics,
and that shows through in the implementation. I don't want to add more
contortions to that code, so as I find new use cases where I could either
write something new or add contortions to the LLChecker, I am doing the former.
This makes it conceptually easier to understand why a ValueOwnershipKind with
Any ownership is invalid, and it also allowed me to explicitly document the
lattice that relates ownership constraints and value ownership kinds.
I need SwitchEnum{,Addr}Inst to have different base classes
(TermInst, OwnershipForwardingTermInst). To do this, I need to add a template to
SwitchEnumInstBase so I can switch that BaseTy. Sadly, since we are using
SwitchEnumInstBase as an ADT type as well as an actual base type for
instructions, this is impossible to do without introducing a template in a ton
of places.
Rather than doing that, I changed the code that was using SwitchEnumInstBase as
an ADT to instead use a proper ADT, SwitchEnumBranch. I am happy to change the
name as people see fit (maybe SwitchEnumTerm?).
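A simplified sketch of the base-class split; the stub classes stand in for the real SIL hierarchy, which carries many more template parameters and members.

```cpp
// Stub bases for illustration only.
class TermInst {};
class OwnershipForwardingTermInst : public TermInst {};

// SwitchEnumInstBase is now parameterized over its base class, so each
// concrete instruction can pick a different one.
template <typename BaseTy>
class SwitchEnumInstBase : public BaseTy {
  // Shared case-list storage and accessors live here.
};

// The value form forwards ownership of its operand:
class SwitchEnumInst final
    : public SwitchEnumInstBase<OwnershipForwardingTermInst> {};

// The address form has no ownership to forward, so TermInst suffices:
class SwitchEnumAddrInst final : public SwitchEnumInstBase<TermInst> {};
```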
`get_async_continuation[_addr]` begins a suspend operation by accessing the continuation value that can resume
the task; the continuation can then be used in a callback or event handler before `await_async_continuation` is
executed to suspend the task.
Specifically, I split it into 3 initial categories: IR, Utils, Verifier. I did
this quickly; we can always split it further later if we want.
I followed the model that we use in SILOptimizer: ./lib/SIL/CMakeLists.txt vends
a macro (sil_register_sources) to the sub-folders, which registers the sources
of each subdirectory with a global variable that ./lib/SIL/CMakeLists.txt
defines. Then, after including those subdirectories, the parent CMakeLists
declares the SIL library. So the output is the same, but we gain the
flexibility of using subdirectories to categorize source files.
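Sketched in CMake terms (the in-tree macro differs in its details; the source file names below are just examples):

```cmake
# lib/SIL/CMakeLists.txt
set(SIL_SOURCES "")

# Sub-folders call this to append their sources to the shared list.
macro(sil_register_sources)
  foreach(source ${ARGN})
    list(APPEND SIL_SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/${source}")
  endforeach()
  # Propagate the accumulated list back up to the parent scope.
  set(SIL_SOURCES ${SIL_SOURCES} PARENT_SCOPE)
endmacro()

add_subdirectory(IR)
add_subdirectory(Utils)
add_subdirectory(Verifier)

# Same output as before: one library built from all registered sources.
add_swift_host_library(swiftSIL STATIC ${SIL_SOURCES})

# lib/SIL/IR/CMakeLists.txt
sil_register_sources(
  SILBasicBlock.cpp
  SILInstruction.cpp
)
```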