Instead make `findJointPostDominatingSet` a stand-alone function.
There is no need to keep the temporary SmallVector alive across multiple calls to findJointPostDominatingSet just to re-use malloc'ed memory. The worklist usually contains far fewer elements than its inline capacity.
The key point is that all of these, while they do modify the branches of the
CFG, do not invalidate block-level CFG analyses such as dominance and dead-end
blocks.
All of the non-SILCombiner specific helpers have already been updated for OSSA,
so this was not too bad.
NOTE: I also added two small combines that delete copy_value and destroy_value
instructions whose arguments have OwnershipKind::None. I added these because
the change is small and many of the tests of this code rely on SILCombine being
able to eliminate such operations on thin_to_thick_function.
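As a hedged sketch (value names hypothetical): a thin_to_thick_function result has OwnershipKind::None, so copies and destroys of it are removable no-ops:
```
%thick = thin_to_thick_function %thin : $@convention(thin) () -> () to $@callee_guaranteed () -> ()
%copy = copy_value %thick : $@callee_guaranteed () -> ()  // .none operand: deleted, uses of
                                                          // %copy are rewritten to use %thick
destroy_value %copy : $@callee_guaranteed () -> ()        // .none operand: deleted
```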
NOTE: I also disabled TypePropagation in OSSA; we are going to redo that code
when we bring up opaque values.
In OSSA, we enforce that addresses from interior pointer instructions are scoped
within a borrow scope. This means that it is invalid to use such an address
outside of its parent borrow scope; as a result, one cannot simply RAUW an
address value with a dominating address value, since the latter may not be
valid at the former's location. I foresee that I am going to have to solve this
problem, so I decided to write this API to handle the vast majority of cases.
The way this API works is that it:
1. Computes an access path with base for the new value. If we do not have both
a base value and a valid access path with a root, we bail.
2. Then we check if our base value is the result of an interior pointer
instruction. If it isn't, we are immediately done and can RAUW without further
delay.
3. If we do have an interior pointer instruction, we check whether the
immediate guaranteed value we projected from has a single borrow-introducer
value. If not, we bail. I think this is reasonable since, with time, all
guaranteed values will have only a single borrow-introducing value (once
struct, tuple, destructure_struct, and destructure_tuple become reborrows).
4. Then we gather up all inner uses of our access path. If for some reason that
fails, we bail.
5. Then we see if all of those uses are within our borrow scope. If so, we can
RAUW without any further worry.
6. Otherwise, we perform a copy+borrow of our interior pointer's operand value
at the interior pointer, create a copy of the interior pointer instruction upon
this new borrow, and then RAUW oldValue with that instead. By construction, all
uses of oldValue will be within this new interior pointer scope (see the sketch
below).
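Here is a minimal SIL sketch of step 6, with hypothetical names and ref_element_addr standing in for the interior pointer instruction:
```
// Before: we want to RAUW %old_addr with %new_addr, but a use of %old_addr
// lies outside %borrow's scope, so a plain RAUW would be invalid.
%borrow = begin_borrow %obj : $Klass
%new_addr = ref_element_addr %borrow : $Klass, #Klass.field
end_borrow %borrow : $Klass
// ... use of %old_addr here ...

// After: copy+borrow the interior pointer's operand at the interior pointer,
// clone the projection upon the new borrow, and RAUW with the clone.
%copy = copy_value %obj : $Klass
%new_borrow = begin_borrow %copy : $Klass
%cloned_addr = ref_element_addr %new_borrow : $Klass, #Klass.field
// ... former uses of %old_addr now use %cloned_addr, inside the new scope ...
end_borrow %new_borrow : $Klass
destroy_value %copy : $Klass
```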
b644c80f90fb7099ec956bb44065b50e432c5146 caused all owned forwarding
instructions to be sunk to their uses in the exact cases where we could
eliminate the parent forwarding inst (namely that the value we want to fold has
no non-debug, non-consuming users). So I was able to implement this as a
single-basic-block algorithm that works via a planner struct using said
canonicalization. One initializes the planner struct with the
instruction that is going to either be eliminated or have its forwarding operand
set. Then one adds each of the individual chains that lead to the use that we
wish to fold, each time checking that we can eliminate the instruction.
Once the user has added all of the intermediate forwarding instructions, by
construction (see the paragraph above) we know we can optimize. So we eliminate
all intermediate values and then, depending on whether the user called the
set-value or replace-value method, we either set front's operand to the
passed-in value or RAUW/erase front with that value. It is important to note
that after eliminating the intermediate values, front's operand is undef, so
one of these two operations must be performed (see the sketch below).
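As an illustrative sketch (my reading of the API; instructions and types hypothetical), a chain of intermediate forwarding instructions with no other non-debug, non-consuming users is collapsed, and front's forwarding operand is set directly to the value being folded:
```
%1 = upcast %0 : $X to $Y                // intermediate, added to the planner
%2 = unchecked_ref_cast %1 : $Y to $Z    // intermediate, added to the planner
%3 = upcast %2 : $Z to $W                // "front"

// After eliminating the intermediates and calling the set-value method:
%3 = upcast %0 : $X to $W
```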
There are a bunch of optimizations in SILCombine where we try to fold an
ownership forwarding instruction A into another ownership forwarding instruction
B without deleting A. Consider the upcasts in the example below:
```
%0 = upcast %x : $X->Y
%1 = upcast %0 : $Y->Z
```
These sorts of optimizations fold the first instruction into the second like so:
```
%0 = upcast %x : $X->Y
%1 = upcast %x : $X->Z
```
This creates a problem when we are dealing with owned values since we have just
introduced two consumes for %x. To work around this, we have two options:
1. Introduce extra copies.
2. We recognize the situations where we can guarantee that we can delete the
first upcast.
I believe the first choice is not really a choice, since breaking a forwarding
chain of ownership in favor of extra copies is a less canonical form. That
leaves us with the second option. What are the necessary/sufficient conditions
for deleting the first upcast? Simply that the upcast cannot have any
non-debug, non-consuming uses! In such a case, we know that along all paths
through the program the value has exactly one non-debug use: one of its
consuming uses. If, when optimizing upcasts, we can recognize that pattern, we
can duplicate the inst along the paths not through our 2nd upcast and thus
delete the original upcast, fixing the ownership error!
While this is all nice and good, there is a problem with this approach: it
doesn't scale. As I was writing a few optimizations like this, I began to
notice that I had to write different versions of this same helper for many of
the visitors (they generally varied by how many forwarding instructions they
looked through).
As I pondered the above, I chatted a bit with @atrick, and during our
conversation we both realized that this problem is much easier to solve in one
block, and that the condition above would allow us to sink these instructions
into the same block. Thus, if we check for this condition and canonicalize the
IR to sink these instructions before visiting, we can use a single helper to
handle all of these cases.
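A hedged sketch of the resulting canonical form (types and names hypothetical): since the owned forwarding upcast's only non-debug uses are consuming, it can be duplicated into each consuming block while still consuming %x exactly once per path; afterwards both upcasts sit in one block and a single helper can fold them:
```
bb0(%x : @owned $X):
  cond_br undef, bb1, bb2

bb1:                          // sunk copy; both upcasts now in one block
  %1 = upcast %x : $X to $Y
  %2 = upcast %1 : $Y to $Z   // folds to: %2 = upcast %x : $X to $Z
  destroy_value %2 : $Z
  br bb3

bb2:                          // sunk copy on the other path
  %3 = upcast %x : $X to $Y
  destroy_value %3 : $Y
  br bb3

bb3:
  // ...
```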
This allows me to do a couple of things that improve quality/correctness/ease of use:
1. I reimplemented InstMod's RAUW and RAUW/erase helpers on top of
setUseValue/deleteInst. Beyond allowing the caller to specify fewer things, we
gain orthogonality, preventing bugs like overriding RAUW/erase but not erase,
or having the erase used in RAUW/erase act differently than the erase for
deleteInst.
2. There were a bunch of places using InstModCallbacks that were also setting
uses directly, without giving InstModCallbacks the ability to perform the
operation (since it only supported RAUW). This is an anti-pattern and could
cause subtle bugs when relevant state in the caller is not updated.
NOTE: The stdlib count/capacity propagation code is tested in an end-to-end
fashion in a separate Swift test. Once I flip the switch, that test will run.
The code is pretty simple, so I feel relatively confident with it.
These are always safe in OSSA since what we are doing here is hoisting the
ref_to_raw_pointer up the def-use chain without deleting any instructions unless
we know that they do not have any uses (in a strict sense so destroy_value is
considered a use). E.g.:
```
%0 = ...
%1 = unchecked_ref_cast %0
%2 = ref_to_raw_pointer %1
```
->
```
%0 = ...
%1 = unchecked_ref_cast %0
%2 = ref_to_raw_pointer %0
```
Notice how we are actually not changing %1 at all; instead we are just moving
an instantaneous use earlier. One important thing to realize is that this
/does/ require us to insert the ref_to_raw_pointer at %0's definition, since
%0's lifetime ends at the unchecked_ref_cast if the value is owned.
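A minimal sketch of the owned case (names hypothetical): the hoisted ref_to_raw_pointer has to sit right after %0's definition, because the forwarding unchecked_ref_cast consumes %0 when it is owned:
```
%0 = apply %f() : $@convention(thin) () -> @owned Klass
%2 = ref_to_raw_pointer %0 : $Klass to $Builtin.RawPointer  // hoisted to %0's definition
%1 = unchecked_ref_cast %0 : $Klass to $AnyObject           // consumes %0 when owned
```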
NOTE: I also identified the tests from sil_combine.sil that had to do with these
simplifications and extracted them into sil_combine_casts.sil and did the
ossa/non-ossa tests side by side. I am trying to fix up the SILCombine tests as
I update stuff, so if I find opportunities to move tests into a more descriptive
sub-file, I am going to do so.
As an aside, to make it easier to transition SILCombine away from using a
central builder, I added a withBuilder method that creates a new SILBuilder at
a requested insertPt and uses the same context as the main builder of
SILCombine. Through the use of auto, it also makes for really concise code.
Today, to do this with just a builder, we would write:
```
SILBuilderWithScope builder(insertPt, Builder);
builder.createInst1(insertPt->getLoc(), ...);
builder.createInst2(insertPt->getLoc(), ...);
builder.createInst3(insertPt->getLoc(), ...);
auto *finalValue = builder.createInst4(insertPt->getLoc(), ...);
```
That's a lot of typing and wastes a really commonly used temp name (builder)
in the local scope! Instead, using this API, one can write:
```
auto *finalValue = withBuilder(insertPt, [&](auto &b, auto l) {
  b.createInst1(l, ...);
  b.createInst2(l, ...);
  b.createInst3(l, ...);
  return b.createInst4(l, ...);
});
```
There is significantly less to type, and auto handles the types for us. The
withBuilder construct is purely syntactic since we always inline it.
This is a generic API that, when ownership is enabled, allows one to replace
all uses of a value with another value of differing ownership by
transforming/lifetime-extending as appropriate.
This API supports all pairings of ownership /except/ replacing a value that
has OwnershipKind::None with a value that does not. That is a more complex
optimization that we do not support today. As a result, we include on our
state struct a helper routine that callers can use to determine whether the
two values that they want to process can be handled by the algorithm.
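As an illustration of one supported pairing (not the exact algorithm; names hypothetical, and the utility is responsible for making sure %newValue is live where needed): replacing all uses of an owned %oldValue with a guaranteed %newValue inserts a copy so that %oldValue's consuming uses still have something to consume:
```
// Replace uses of owned %oldValue with guaranteed %newValue:
%copy = copy_value %newValue : $Klass
// ... every use of %oldValue is rewritten to use %copy; %copy is then
// consumed exactly where %oldValue used to be consumed ...
```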
My motivation is to use this to update InstSimplify and SILCombiner in a less
bug-prone way rather than just turning stuff off.
Noting that this transformation inserts ownership instructions, I have made sure
to test this API in two ways:
1. With Mandatory Combiner alone (to make sure it works period).
2. With Mandatory Combiner + Semantic ARC Opts to make sure that we can
eliminate the extra ownership instructions it inserts.
As one can see from the tests, the optimizer today is able to handle all of
these transforms except one conditional case where I need to eliminate a dead
phi arg. I have a separate branch that hits that today but I have exposed unsafe
behavior in ClosureLifetimeFixup that I need to fix first before I can land
that. I don't want that to stop this PR, since I think the current low-level
ARC optimizer may be able to help me here, as this is a simple transform it
performs all of the time.
This works around a problem where using an apply with an unsubstituted
substitution map causes issues in downstream optimizations.
```
%9 = alloc_stack $@opened("60E354F4-17B9-11EB-9427-ACDE48001122") NonClassProto
copy_addr %8 to [initialization] %9 : $*@opened("60E354F4-17B9-11EB-9427-ACDE48001122") NonClassProto
%11 = witness_method $ConformerClass, #NonClassProto.myVariable!getter : <Self where Self : NonClassProto> (Self) -> () -> SomeValue :
$@convention(witness_method: NonClassProto) <τ_0_0 where τ_0_0 : NonClassProto> (@in_guaranteed τ_0_0) -> SomeValue
apply %11<@opened("60E354F4-17B9-11EB-9427-ACDE48001122") NonClassProto>(%9) : $@convention(witness_method: NonClassProto) <τ_0_0 where τ_0_0 : NonClassProto> (@in_guaranteed τ_0_0) -> SomeValue
```
The problem arises when the devirtualizer replaces
`witness_method $ConformerClass, #NonClassProto.myVariable!getter` with the
underlying implementation. That implementation, for better or worse, is
further constrained to `Self : ConformerClass`, and applying it to an opened
existential which is not class-constrained is a recipe for disaster. The
proper solution would probably be for the devirtualizer to insert the cast if
necessary and update the substitution list.
That fix will be left for another day though.
rdar://70582785
Optimize the unconditional_checked_cast_addr in this pattern:
```
%box = alloc_existential_box $Error, $ConcreteError
%a = project_existential_box $ConcreteError in %box : $Error
store %value to %a : $*ConcreteError
%err = alloc_stack $Error
store %box to %err : $*Error
%dest = alloc_stack $ConcreteError
unconditional_checked_cast_addr Error in %err : $*Error to ConcreteError in %dest : $*ConcreteError
```
to:
```
...
retain_value %value : $ConcreteError
destroy_addr %err : $*Error
store %value to %dest : $*ConcreteError
```
This lets the alloc_existential_box become dead, so it can be removed by subsequent optimizations.
The same optimization is also done for conditional_checked_cast_addr.
There is also an implication for debugging:
Each "throw" in the code calls the runtime function swift_willThrow. The function is used by the debugger to set a breakpoint and also add hooks.
This optimization can completely eliminate a "throw", including the runtime call.
So, with optimized code, the user might not see the program to break at a throw, whereas in the source code it is actually throwing.
On the other hand, eliminating the existential box is a significant performance win and we don't guarantee any debugging behavior for optimized code anyway. So I think this is a reasonable trade-off.
I added an option "-Xllvm -keep-will-throw-call" to keep the runtime call which can be used if someone want's to reliably break on "throw" in optimized builds.
rdar://problem/66055678
This reinstates commit d7d829c059 with a fix for C tail-allocated arrays.
Replace a call of the getter of AnyKeyPath._storedInlineOffset with a "constant" offset, in case of a keypath literal.
"Constant" offset means a series of struct_element_addr and tuple_element_addr instructions with a 0-pointer as base address.
These instructions can then be lowered to "real" constants in IRGen for concrete types, or to metatype offset lookups for generic or resilient types.
Replace:
```
%kp = keypath ...
%offset = apply %_storedInlineOffset_method(%kp)
```
with:
```
%zero = integer_literal $Builtin.Word, 0
%null_ptr = unchecked_trivial_bit_cast %zero to $Builtin.RawPointer
%null_addr = pointer_to_address %null_ptr
%projected_addr = struct_element_addr %null_addr
... // other address projections
%offset_ptr = address_to_pointer %projected_addr
%offset_builtin_int = unchecked_trivial_bit_cast %offset_ptr
%offset_int = struct $Int (%offset_builtin_int)
%offset = enum $Optional<Int>, #Optional.some!enumelt, %offset_int
```
rdar://problem/53309403
Replaces an alloc_stack of an enum with an alloc_stack of the payload if only one enum case (with payload) is stored to that location.
For example:
```
%loc = alloc_stack $Optional<T>
%payload = init_enum_data_addr %loc
store %value to %payload
...
%take_addr = unchecked_take_enum_data_addr %loc
%l = load %take_addr
```
is transformed to:
```
%loc = alloc_stack $T
store %value to %loc
...
%l = load %loc
```
https://bugs.swift.org/browse/SR-12710
I am going to use this in mandatory combine, and it seems like a generally
useful transformation.
I also updated the routine to construct its own SILBuilder that injects a
user-passed-in SILBuilderContext, eliminating the bad pattern of passing in
SILBuilders.
This should be an NFC change.
Changes:
* Allow optimizing partial_apply capturing opened existential: we didn't do this originally because it was complicated to insert the required alloc/dealloc_stack instructions at the right places. Now we have the StackNesting utility, which makes this easier.
* Support indirect-in parameters. Not super important, but why not? It's also easy to do with the StackNesting utility.
* Share code between dead closure elimination and the apply(partial_apply) optimization (sketched below). It's a bit of refactoring and allowed us to eliminate some code that is no longer used.
* Fix an ownership problem: We inserted copies of partial_apply arguments _after_ the partial_apply (which consumes the arguments).
* When replacing an apply(partial_apply) -> apply and the partial_apply becomes dead, avoid inserting copies of the arguments twice.
These changes don't have any immediate effect on our current benchmarks, but will allow eliminating curry thunks for existentials.
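For reference, a hedged sketch of the apply(partial_apply) rewrite (conventions and names illustrative):
```
// Before:
%pa = partial_apply %f(%ctx) : $@convention(thin) (Int, @owned Ctx) -> ()
%r = apply %pa(%i) : $@callee_owned (Int) -> ()

// After, when %pa becomes dead and is deleted:
%r = apply %f(%i, %ctx) : $@convention(thin) (Int, @owned Ctx) -> ()
```
Per the ownership fix above, any argument copies that are still needed must be inserted before the partial_apply, since the partial_apply itself consumes its arguments.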
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
In the previous commit, various methods for adding, replacing, and
removing instructions were duplicated from SILCombiner into
SILInstructionWorklist. Here, SILCombiner is modified to call through
to the methods which were added to SILInstructionWorklist.
- Replaced usage of a raw map and vector with the type that wraps the
  combination (BlotSetVector); that provided significant deduplication, since
  a sizeable portion of the worklist's implementation was vector and map
  management now provided by BlotSetVector.
- Templated over the type of map and vector used by the blot set vector.
- Added SmallSILInstructionWorklist where the map and vector are
specified to be SmallDenseMap and SmallVector respectively.
- Replaced usages of bare ValueBase with usages of SILValue.
- Renamed zap to resetChecked.
Added a bit of functionality to BlotSetVector, specifically to support
SILInstructionWorklist:
- Made insert return not just the index at which the potentially-inserted
  item resides, but also whether an insertion occurred, matching the
  behavior of llvm::DenseMap::insert.
- Added a method to reserve capacity in the backing vector and map:
BlotSetVector::reserve.
- Added a method to free extra storage used by the backing vector and
map: BlotSetVector::clear.
- Modified SmallBlotSetVector's template parameters so that only a value
  and size can be specified; that type will always use SmallVector and
  SmallDenseMap for its superclass' VectorT and MapT template
  parameters.
- Updated variable names.
In the previous commit, SILInstructionWorklist was added as a verbatim
extraction (modulo some minor style tweaks) of SILCombineWorklist.
Here, SILCombine is moved over to using that renamed type.
Returns `true` if `T.Type` is known to refer to a concrete type. The
implementation allows the optimizer to specialize this at -O and eliminate
conditional code.
Includes `Swift._isConcrete<T>(T.Type) -> Bool` wrapper function.
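At the SIL level, my understanding is that this bottoms out in a builtin that the specializer can constant-fold once T is known; a hedged sketch:
```
// Unspecialized generic code: the query survives as a builtin.
%is_concrete = builtin "isConcrete"<T>(%t : $@thick T.Type) : $Builtin.Int1

// After specializing with a concrete T (e.g. T == Int), it folds to true:
%is_concrete = integer_literal $Builtin.Int1, -1
```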
If the keypath argument of a keypath access function is a keypath literal instruction, generate the projection inline and remove the access function.
For example, replaces (simplified SIL):
```
%kp = keypath ... stored_property #Foo.bar
apply %keypath_runtime_function(%root_object, %kp, %addr)
```
with:
```
%addr = struct_element_addr %root_object, #Foo.bar
load/store %addr
```
Currently this only handles stored property patterns.
rdar://problem/36244734
This is the complement to load canonicalization. Although store
canonicalization is not required before diagnostics, it should be
defined in the same utility.
NOTE: I changed all places where the CastOptimizer is created to just pass in
nullptr for now, so this is NFC.
----
Right now the interface of the CastOptimizer is muddled and confused. Sometimes
it returns a value that should be used by the caller; other times it returns an
instruction that is meant to be reprocessed by the caller.
This series of patches is attempting to clean this up by switching to the
following model:
1. If we are optimizing a cast of a value, we return a SILValue. If the cast
fails, we return an empty SILValue().
2. If we are optimizing a cast of an address, we return a boolean value to show
success/failure and require the user to use the SILBuilderContext to get the
cast if they need to.
Generalizes the ConcreteExistentialInfo abstraction so it can be used
both by the ExistentialSpecializer and SILCombine, allowing redundant
code in ExistentialSpecializer.cpp to be deleted.
Splits OpenedArchetypeInfo from ConcreteExistentialInfo. Adds a
ConcreteOpenedArchetypeInfo convenience wrapper around them both, for
use wherever we were originally using ConcreteExistentialInfo.
Splits getAddressOfStackInit into getStackInitInst. This is cleaner and
allows both the ExistentialSpecializer and SILCombine to handle more
interesting cases in the future, like unconditional_checked_cast.
Creates utilities, initializeSubstitutionMap and initializeConcreteTypeDef,
to simplify and generalize ConcreteExistentialInfo.
While rewriting ExistentialSpecializer to use the new
abstraction, I fixed a latent bug in which it was using a SIL
argument index as a function type parameter index (this would
have broken if/when we decide to enable calls with indirect
results).
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments redundant. Since
they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by:
```
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
```