Previously, if it was determined that a proactive phi was unnecessary,
it was removed, along with the phis for the lifetime and the original
value of which the proactive phi was a copy. The uses of only one of
the three phis (namely, the proactive phi) was RAUW undef. In the case
where the only usage of the phi was to branch back to the block that
took the phi as an argument, that was a problem. Here, that is fixed by
giving all three phis the same treatment. To avoid duplicating code,
that treatment is pulled out into a new lambda.
This was exposed by adding lifetime versions of some OSSA versions of
mem2reg tests that had been missed previously.
Previously, the flag was a LangOptioins. That didn't make much sense because
this isn't really a user-facing behavior. More importantly, as a member
of that type type it couldn't be accessed when setting up pass
pipelines. Here, the flag is moved to SILOptions.
Previously, Mem2Reg would delete write-only alloc_stacks. That is
incorrect for lexical alloc_stacks--if a var was assigned but never
read, we still want to keep it alive for the duration of that var's
lexical scope.
Previously, the lexical borrow scopes that were introduced to replace
lexical stack allocations were tied to the uses of the stored value.
That meant that the borrow scope could be both too long and also too
short. It could be too long if there were uses of the value after the
dealloc stack and it would be too short if there were not uses of the
value while the value was still stored into the stack allocation.
Here, instead, the lexical borrow scopes are tied to the storage of a
value into the stack allocation. A value's lexical borrow scope begins
when the storage is initialied with the value; its lexical borrow scope
ends when the storage is deinitialized. That corresponds to the range
during which a var in the Swift source has a particular value assigned
to it.
Mem2Reg's implementation is split into a few steps:
(1) visiting the instructions in a basic block which contains all uses
of an alloc_stack
(2.a) visiting the instructions in each basic block which contains a use
of the alloc_stack
(2.b) adding phis
(2.c) using the last stored values as arguments to the new outgoing phi
arguments
(2.c) replacing initial uses of the storage with the new incoming phi
arguments
And here, (1) amounts to a special case of (2.a).
During (1) and (2.a):
(a) lexical borrow scopes are begun after store instructions for the
values that were stored
(b) when possible, lexical borrow scopes are ended before instructions
that deinitialize memory
- destroy_addr
- store [assign]
- load [take]
For (1), that is enough to create valid borrow scopes.
For (2), there are two complications:
(a) Borrow scopes that are begun may not be ended (when visiting a
single block's instructions).
For example, when visiting
bb1:
store %instance to [init] %addr
br bb2
a borrow scope is started after the store but cannot be ended.
(b) There may not be enough information available to end borrow scopes
when visiting instructions that would typically introduce borrow
scopes.
For example, when visiting
bb2:
%copy = load [copy] %addr
%instance = load [take] %addr
br bb3
there is not enough information available to end the borrow scope
that should be ended before the load [take]
To resolve these issues, both sorts of instructions are tracked. For
(a), in StackAllocationPromoter::initializationPoints. For (b), in
StackAllocationPromoter::deinitializationPoints. Finally, a new
step is added:
(2.d) StackAllocationPromoter::endLexicalLifetimes does a forward CFG
walk starting from the out-edge of each of the blocks which began
but did not end lexical lifetimes. At an out-edge, we do a check
regarding unreachables is done, and we may end the borrow scope.
Otherwise, the walk continues to the in-edge of each successor.
At an in-edge, we look for an instruction from (b) (in
unprecedentedDeinitializations) above. If one is found, then we
end the borrow scope before that instruction. Otherwise, the walk
continues to the out-edge of the block.
The ubuntu 18.04 Linux builder fails to build when the SmallVector does
not include the number of elements. This is a known issue and is
incorrect, but that is the version of clang on that OS.
The definition of SmallVector is already available transitively since
there are values of that type, I've just made it explicit.
llvm::SmallVector has a default parameter for the number of elements
while swift::SmallVector doesn't. I've gone ahead and made it explicitly
use the llvm::SmallVector to receive that.
SILGen turns vars into alloc_boxes. When possible, AllocBoxToStack
turns those into alloc_stacks. In order to preserve the lexical
lifetime of those vars, the alloc_stacks are annotated with the
[lexical] attribute. When Mem2Reg runs, it promotes the loads from
and stores into those alloc_stacks to uses of registers. In order to
preserve the lexical lifetime during that transformation, lexical borrow
scopes must be introduces that encode the same lifetime as the
alloc_stacks did. Here, that is done.
This patch replace all in-memory objects of DebugValueAddrInst with
DebugValueInst + op_deref, and duplicates logics that handles
DebugValueAddrInst with the latter. All related check in the tests
have been updated as well.
Note that this patch neither remove the DebugValueAddrInst class nor
remove `debug_value_addr` syntax in the test inputs.
With such alloc_stack we can pass over-consumed values to a
proactively added phi. Even though such a SIL does not have
issues at runtime, they raise an ownership verification error.
Mem2Reg creates phis proactively that may be unnecessary.
Unnecessary phis are those without uses or operands to other
proactive phis that are unnecessary.
Even though this does not translate to a real issue at runtime, we can
see ownership verification errors. This PR identifies such phis and deletes them.
Track in-use iterators and update them both when instructions are
deleted and when they are added.
Safe iteration in the presence of arbitrary changes now looks like
this:
for (SILInstruction *inst : deleter.updatingRange(&bb)) {
modify(inst);
}
Fix innumerable latent bugs with iterator invalidation and callback invocation.
Removes dead code earlier and chips away at all the redundant copies the compiler generates.
We previously were computing this eagerly meaning that if we did not actually
need the DomTreeLevel map, we would calculate it and additionally incur a
relatively large malloc for the DenseMap (which stores by default IIRC 64 key
value pairs).
Now, we do it lazily ensuring that we only compute this if we actually need it
(when promoting non-single block stack allocations).
We have for a long time talked about creating a scope like data structure for
use in the SILOptimizer. The discussion was whether or not to reuse the
infrastructure in SILGen that does this already. There were concerns about doing
so since the code in the SILOptimizer and SILGen can work differently.
With that in mind, I added a small AssertingScope class and built on top of that
a composition SIL level class called SILOptScope that one can use to add various
cleanups. One is able to both destructively pop at end of scope and pop along
early exits.
At an implementation level, I kept it simple and:
1. Represented a scope as a stack of Optional<Cleanup> which are just a wrapper
around a std::function. The Optional is so that we can invalidate a cleanup.
2. Based all of these scopes around the idea that the user of the scope must
invalidate the scope by hand. If not, the scope object will assert at the end
of its RAII scope.
3. Rather than creating a whole class hierarchy, I just used std::function
closures to keep things simple.
... with a fix for a non-assert build crash: I used the wrong ilist type for SlabList. This does not explain the crash, though. What I think happened here is that llvm miscompiled and put the llvm_unreachable from the Slab's deleteNode function unconditionally into the SILModule destructor.
Now by using simple_ilist, there is no need for a deleteNode at all.
This is a pretty old style of code that we are trying to move away from.
Specifically, we use to always store a global SILBuilder and manually manipulate
its insertion point/other state when we moved places. This was very bugprone so
we instead now create new SILBuilders when we want to create instructions at a
new insertion point and pass down a SILBuilderContext for global state (like the
list of inserted instructions/etc) to each SILBuilder.
I went through and updated all of the code to now follow this pattern.
This code organization already existed in the file in content, but all of the
routines were mixed together. This splits the code by topic providing to the eye
easy code locality when reading.
This is code that was written at the very beginning of Swift and has some early
style remaining interspirsed with newer swift style code. Since I am going to be
touching this and I don't expect other people to touch it for a while, I am
fixing up the formatting/reorganizing rather than leaving the style inconsistent.
This PR simplifies the handling of store [assign] in Mem2Reg by always
converting it to a store [init].
Also fixes an edge case where store [assign] in a non entry block was
not the first store of an alloc_stack with uses in multiple blocks.
This removes the ambiguity when casting from a SingleValueInstruction to SILNode, which makes the code simpler. E.g. the "isRepresentativeSILNode" logic is not needed anymore.
Also, it reduces the size of the most used instruction class - SingleValueInstruction - by one pointer.
Conceptually, SILInstruction is still a SILNode. But implementation-wise SILNode is not a base class of SILInstruction anymore.
Only the two sub-classes of SILInstruction - SingleValueInstruction and NonSingleValueInstruction - inherit from SILNode. SingleValueInstruction's SILNode is embedded into a ValueBase and its relative offset in the class is the same as in NonSingleValueInstruction (see SILNodeOffsetChecker).
This makes it possible to cast from a SILInstruction to a SILNode without knowing which SILInstruction sub-class it is.
Casting to SILNode cannot be done implicitly, but only with an LLVM `cast` or with SILInstruction::asSILNode(). But this is a rare case anyway.
This makes it easier to understand conceptually why a ValueOwnershipKind with
Any ownership is invalid and also allowed me to explicitly document the lattice
that relates ownership constraints/value ownership kinds.
SILMem2Reg has roughly 2 central algorithms, removal of alloc_stack with uses in a
single block vs multiple blocks. While replacing loads and stores to the
stack location in each of the 2 algorithms, new handling of qualifiers
like [assign], [copy] and [take] which are new to OSSA are needed.
Also Disable SILMem2Reg when we see this pattern:
load [take] (struct_element_addr/tuple_element_addr %ASI)
Convert SILMem2Reg tests into ossa
And add new SILMem2Reg tests for non-trivial values.
Thanks to zoecarver for additional tests.
A dealloc_stack ends the lifetime of an alloc_stack on a path. We don't
have to pass RunningVal beyond the dealloc_stack as phi argument to the
post dominating block.
An empty tuple doesn't need to be assigned before it is passed to destroy_addr.
I found this by experimenting with the pass pipeline. Not sure if this crash can happen with the current pipeline.
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilies out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
While tightening the requirements of the debug info generator in
IRGenSIL I noticed that SILCloner didn't correctly transfer variable
debug info on alloc_box and alloc_stack instructions. In order to make
these mistakes easier to find I added an assertion to SILBuilder and
fixed all issues uncovered by that assertion, too.
The result is a moderate increase in debug info coverage in optimized code.
On stdlib/public/core/OSX/x86_64/Swift.o "variables with location"
increases from 60134 to 60299.
This is a large patch; I couldn't split it up further while still
keeping things working. There are four things being changed at
once here:
- Places that call SILType::isAddressOnly()/isLoadable() now call
the SILFunction overload and not the SILModule one.
- SILFunction's overloads of getTypeLowering() and getLoweredType()
now pass the function's resilience expansion down, instead of
hardcoding ResilienceExpansion::Minimal.
- Various other places with '// FIXME: Expansion' now use a better
resilience expansion.
- A few tests were updated to reflect SILGen's improved code
generation, and some new tests are added to cover more code paths
that previously were uncovered and only manifested themselves as
standard library build failures while I was working on this change.
The ownership kind is Any for trivial types, or Owned otherwise, but
whether a type is trivial or not will soon depend on the resilience
expansion.
This means that a SILModule now uniques two SILUndefs per type instead
of one, and serialization uses two distinct sentinel IDs for this
purpose as well.
For now, the resilience expansion is not actually used here, so this
change is NFC, other than changing the module format.