Fix innumerable latent bugs with iterator invalidation and callback invocation.
Removes dead code earlier and chips away at all the redundant copies the compiler generates.
I have recently been running into the issue that many of these APIs perform the
deletion themselves and merely notify the caller that an instruction is about to
be deleted, instead of letting the caller specify how the instruction is
deleted. This causes subtle semantic issues (see the loop in deleteInstruction
that I simplified) and breaks composition, since many parts of the optimizer use
InstModCallbacks for this purpose.
To fix this, I added a "notify will be deleted" hook to InstModCallbacks. As
with the rest of the callbacks, if the notification hook is not set we do not
call any code, so we should get predictably good performance in loops since we
always skip the function call.
I also changed InstModCallbacks::deleteInst() to notify before deleting, so we
have a safe default behavior. None of the previous use sites of this API care
about being notified, and the only new use sites are in InstructionDeleter,
which has special notification behavior (it notifies for certain sets of
instructions it is going to delete before it deletes any of them). To
accommodate this, I added a bool parameter to deleteInst() that controls whether
we notify, defaulting to true. This should ensure that all other use sites still
compose correctly.
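In rough outline, the change has this shape (a sketch only; the struct is
deliberately renamed and the real signatures in InstModCallbacks may differ):
```
#include "swift/SIL/SILInstruction.h"
#include <functional>

// Sketch of the callback struct. The notify hook is optional, so callers
// that never set it only pay for an empty std::function check.
struct InstModCallbacksSketch {
  std::function<void(swift::SILInstruction *)> deleteInstFunc;
  std::function<void(swift::SILInstruction *)> notifyWillBeDeletedFunc;

  void notifyWillBeDeleted(swift::SILInstruction *inst) const {
    if (notifyWillBeDeletedFunc)
      notifyWillBeDeletedFunc(inst);
  }

  // Safe default: notify first, then delete. InstructionDeleter passes
  // notify=false since it already notified for a whole batch of
  // instructions before deleting any of them.
  void deleteInst(swift::SILInstruction *inst, bool notify = true) const {
    if (notify)
      notifyWillBeDeleted(inst);
    if (deleteInstFunc)
      deleteInstFunc(inst);
    else
      inst->eraseFromParent();
  }
};
```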
This makes it easier to understand conceptually why a ValueOwnershipKind with
Any ownership is invalid, and it also allowed me to explicitly document the
lattice relating ownership constraints and value ownership kinds.
MSVC does not realize that the switch is exhaustive and requires the
fall-through path to be explicitly marked as unreachable. This silences the
C4715 warning ("not all control paths return a value").
and eliminate dead code. This is meant to be a replacement for the utility
recursivelyDeleteTriviallyDeadInstructions. The new utility performs more
aggressive dead-code elimination on ownership SIL.
This patch also migrates most non-force-delete uses of
recursivelyDeleteTriviallyDeadInstructions to the new utility.
and migrates one force-delete use of recursivelyDeleteTriviallyDeadInstructions
(in IRGenPrepare) to use the new utility.
In certain cases in OSSA, non-trivial values with .none ownership can be merged
into .owned aggregates such that when we extract the value back out of the
aggregate, we have lost the information that the original value was not .owned.
As an example, consider the following SIL:
```
bb0(%0 : @owned $Builtin.NativeObject):
  %1 = enum $Optional<Builtin.NativeObject>, #Optional.none!enumelt
  %2 = tuple (%0 : $Builtin.NativeObject, %1 : $Optional<Builtin.NativeObject>)
  (%3, %4) = destructure_tuple %2 : $(Builtin.NativeObject, Optional<Builtin.NativeObject>)
```
In this case, %4 has .owned ownership, while %1 has .none ownership. This is
because we have lost the refined information that we originally had a .none
value as an input to the tuple we are destructuring. Because of this, when we
RAUW, we would need to insert a destroy on %4 to make sure that we maintain OSSA
invariants. This is safe, since the destroy_value will be applied to a
dynamically .none value, which is a no-op.
That being said, the intention was actually not to implement this pattern in
the code (as can be seen from the fact that @owned destructures are not
handled). Thus this commit just makes the constant folding code more
conservative to ensure that we do not try to handle this case.
The code that is deleted in this PR attempted to save a bit of compile time by:
1. Checking whether, after constant folding, we have a tuple whose elements are
immediately extracted via tuple_extract.
2. RAUWing those tuple_extracts directly with the corresponding tuple operands.
This adds a bunch of complexity to the pass and also moves code that should be
in a visitor into the main pass work loop function.
After this PR, what happens instead is that after we fold, we add the tuple to
the worklist. Then, during the next iteration, we look at the tuple_extract
users and simplify them at that point.
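Roughly, the new flow looks like this (a sketch only; the struct and helper
names are invented, not the pass's real ones):
```
#include "swift/SIL/SILInstruction.h"
#include "llvm/ADT/SmallVector.h"

// Sketch: instead of eagerly RAUW'ing tuple_extracts right after folding one
// of a tuple's operands, we just requeue the tuple. The main work loop's
// visitor then simplifies the tuple_extract users on a later iteration.
struct FoldingWorklistSketch {
  llvm::SmallVector<swift::SILInstruction *, 64> worklist;

  void addToWorklist(swift::SILInstruction *inst) {
    worklist.push_back(inst);
  }

  void didFoldOperandOf(swift::TupleInst *tuple) {
    // Old behavior: walk the tuple's uses here, match tuple_extract, and
    // RAUW directly with the corresponding tuple operand.
    // New behavior: one line; the visitor handles the users later.
    addToWorklist(tuple);
  }
};
```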
Add the -enable-ownership-stripping-after-serialization flag to the OSLog
optimization tests, and update the folding logic and end-of-use discovery logic
to handle both ownership and non-ownership SIL.
recursivelyDeleteTriviallyDeadInstructions will handle this for us when it
deletes the instruction. So in fact, this was actually a use of an invalid
pointer. Good thing we never dereferenced it.
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the Swift
project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
This involves teaching the constant folder to look through a borrow when trying
to find the string literal. I also added an additional run with ownership
lowering after diagnostics enabled to make sure this doesn't break again.
Specifically, this transforms:
builtin "generic_add"<Builtin.Vec4xInt32>(
->
builtin "add_Vec4xInt32"(
If we do not have a static overload for the type, we just leave the generic call
alone. If the generic builtin takes addresses as its arguments (i.e. 2x
in_guaranteed + 1x out), we load the arguments, evaluate the static overloaded
builtin and then store the result into the out parameter.
TL;DR: This patch introduces a new kind of builtin, a "polymorphic builtin".
One calls it like any other builtin, e.g.:
```
Builtin.generic_add(x, y)
```
but it has a contract: it must be specialized to a concrete builtin by the time
we hit Lowered SIL. In this commit, I add support for the following generic
operations:
Type            | Op
----------------|----------
FloatOrVector   | FAdd
FloatOrVector   | FDiv
FloatOrVector   | FMul
FloatOrVector   | FRem
FloatOrVector   | FSub
IntegerOrVector | AShr
IntegerOrVector | Add
IntegerOrVector | And
IntegerOrVector | ExactSDiv
IntegerOrVector | ExactUDiv
IntegerOrVector | LShr
IntegerOrVector | Mul
IntegerOrVector | Or
IntegerOrVector | SDiv
IntegerOrVector | SRem
IntegerOrVector | Shl
IntegerOrVector | Sub
IntegerOrVector | UDiv
IntegerOrVector | Xor
Integer         | URem
NOTE: I only implemented support for the builtins in SIL and in SILGen. I am
going to implement the optimizer parts of this in a separate series of commits.
DISCUSSION
----------
Today, LLVM IR has polymorphic instructions. Yet at the Swift and SIL level we
instead represent these operations as Builtins whose names are resolved by
splatting the type into the builtin's name. For example, adding two things in
LLVM:
```
%2 = add i64 %0, %1
%2 = add <2 x i64> %0, %1
%2 = add <4 x i64> %0, %1
%2 = add <8 x i64> %0, %1
```
Each of these add operations is performed by the same polymorphic instruction.
In contrast, we splat out these Builtins in Swift today, e.g.:
```
let x, y: Builtin.Int32
Builtin.add_Int32(x, y)
let x, y: Builtin.Vec4xInt32
Builtin.add_Vec4xInt32(x, y)
...
```
In SIL, we translate these verbatim and then IRGen just lowers them to the
appropriate polymorphic instruction. Beyond being verbose, this prevents these
Builtins (which need static types) from being used in polymorphic contexts where
we can guarantee that a static type will eventually be provided.
In contrast, the polymorphic builtins introduced in this commit can be passed
any type, with the proviso that the expert user of this feature can guarantee
that before we reach Lowered SIL, the generic_add has been eliminated. This is
enforced by IRGen asserting if passed such a builtin and by the SILVerifier
checking that the underlying builtin is never called once the module is in
Lowered SIL.
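A sketch of the verifier-side enforcement (helper names and exact signatures
are approximations; the real check does not simply match on the name prefix):
```
#include "swift/SIL/SILInstruction.h"
#include "swift/SIL/SILModule.h"
#include "llvm/Support/ErrorHandling.h"

// Sketch: once the module is Lowered, no polymorphic ("generic_*") builtin
// may remain; IRGen has a matching assertion as a backstop.
static void checkNoPolymorphicBuiltin(swift::BuiltinInst *bi,
                                      swift::SILModule &module) {
  if (module.getStage() != swift::SILStage::Lowered)
    return;
  if (bi->getName().str().startswith("generic_"))
    llvm::report_fatal_error(
        "polymorphic builtin must be specialized before Lowered SIL");
}
```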
In forthcoming commits, I am going to add two optimizations that give stdlib
authors the tools needed to use these builtins:
1. I am going to add an optimization to constant propagation that specializes a
"generic_*" op to the type of its argument, if that argument has a type that is
valid for the builtin (i.e. integer or vector).
2. I am going to teach the SILCloner how to specialize these as it inlines. This
ensures that when we transparent inline, we specialize the builtin automatically
and can then form SSA at -Onone using predictable memory access operations.
The main implication of these polymorphic builtins is that if an author is not
able to specialize the builtin, they need to ensure that after constant
propagation the generic builtin has been DCE'd. The general rule is that the
-Onone optimizer will constant fold branches with constant integer operands. So
if one can use a bool of some sort to gate the operation, one can be guaranteed
that the generic call will never be codegen'd. I am considering putting in some
sort of diagnostic to ensure that the stdlib writer has a good experience (e.g.
getting an error instead of crashing the compiler).
Returns `true` if `T.Type` is known to refer to a concrete type. The
implementation allows the optimizer to specialize this at -O and eliminate
conditional code.
Includes a `Swift._isConcrete<T>(T.Type) -> Bool` wrapper function.
This simplifies the IR and, in a certain sense, eliminates "constant
information" from the IR by allowing us to remove extract/nominal-literal round
trips.
NOTE: I had to modify the pound_assert test slightly, changing the expected
location of a note that it emits. I did this because the new test output is
technically correct: the instruction we are interpreting when we error is (after
this commit) a debug_value in the prelude of the function, and thus has the new
location. I am going to talk with Ravi and others about what to do with this.
When this was originally implemented, this transform was put into SILCombine.
This patch also puts it into constant folding so we can perform the transform at
-Onone as well.
My hunch is that the reason this is not happening with the -Onone serialization
change is that we were relying on this code being specialized before
serialization in the stdlib. But transparent functions are now not optimized
until after serialization, so that specialization no longer happens. Rather than
mess with that, I just added the support here.
I put in a little bit of infrastructure that should provide the appropriate
places for adding information about other cases where we can run into this with
other casts.
The reason I added this is that the codegen around condfail_message has changed
slightly and now contains this round trip, which messes with various
pattern-matching routines.
Example:
```
%x = string_literal
%x1 = ptr_to_int %x // <=== This messes with pattern matching from
%x2 = int_to_ptr %x1 // <=== the cond_fail to %x.
cond_fail(%x2)
```
=>
```
%x = string_literal // <=== Happiness = ).
cond_fail(%x)
```
"globalStringTablePointer": String -> Builtin.RawPointer` to a
string_literal instruction if the string that is passed is constructed
from a literal. Otherwise, emit diagnostics.
With the advent of dynamic_function_ref, the actual callee of such a ref may
vary. Optimizations should not assume they know the contents of a function
referenced by dynamic_function_ref. Introduce getReferencedFunctionOrNull, which
returns null for such function refs, and getInitialReferencedFunction, which
returns the initially referenced function.
Use as appropriate.
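For example, a pass that wants to reason about the callee body would now be
written along these lines (a sketch; `analyzeBody` is a hypothetical stand-in
for whatever the pass does, and the base-class name is from memory):
```
#include "swift/SIL/SILInstruction.h"

void analyzeBody(swift::SILFunction *f); // hypothetical analysis hook

// Only look at the function body when the reference is static.
void visitCallee(swift::FunctionRefBaseInst *fri) {
  // Returns nullptr for dynamic_function_ref, whose callee may be replaced
  // at runtime, so its body must not be inspected or assumed.
  if (swift::SILFunction *f = fri->getReferencedFunctionOrNull()) {
    analyzeBody(f);
    return;
  }
  // getInitialReferencedFunction() still yields the statically written
  // referent when only the original symbol is needed (e.g. for printing).
  swift::SILFunction *initial = fri->getInitialReferencedFunction();
  (void)initial;
}
```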
rdar://50959798
The specific case where this happened here is:
```
// Worklist = [
%0 = struct $Int (...)
%1 = $Klass
%2 = tuple (%0, %1)
(%3, %4) = destructure_tuple %2
store %3 to [trivial] %3stack
store %4 to [init] %4stack
```
What would happen is we would visit the destructure_tuple, replace %3 with %0
but fail to propagate %4:
```
%0 = struct $Int (...)
%1 = $Klass
%2 = tuple (%0, %1)
(%3, %4) = destructure_tuple %2
store %0 to [trivial] %3stack
store %4 to [init] %4stack
```
This then causes the tuple to be added to the worklist. When we visit the tuple,
we see that it has a destructure_tuple user and that it still uses that struct,
despite us having constant-propagated that component of the tuple. This causes
us to add the struct back to the worklist even though that tuple component has
no remaining uses. Then, when we visit the struct, we add the tuple back again,
and so on.
rdar://49947112
For context, String, Nil, Bool, and Int already behave this way.
Note: Swift can compile against 80-bit or 64-bit floats as the builtin literal
type. Thus, it was necessary to capture this bit somehow in the
FloatLiteralExpr. This was done with another Type field capturing this
information.
For context, String, Nil, and Bool already behave this way.
Note: Previously, this used to construct (call, ... (integer_literal)), and the
call would be made explicit or implicit depending on whether you wrote e.g.
Int(3) or just 3. This did not translate to the new world, however, so this PR
adds an IsExplicitConversion bit to NumberLiteralExpr. Some side effects of all
this are that some warnings changed a little and some instructions are emitted
in a different order.
I also ported the constant_propagation.sil tests over for ownership and updated
a few parts of the cast optimizer so that those tests pass with and without
ownership. I purposely only updated the parts of the cast optimizer that crashed
with ownership in the relevant test, so that I can add new SIL code coverage
for those uncovered code paths.
NOTE: I changed all places where the CastOptimizer is created to just pass in
nullptr for now, so this is NFC.
----
Right now the interface of the CastOptimizer is muddled and confusing. Sometimes
it returns a value that should be used by the caller; other times it returns an
instruction that is meant to be reprocessed by the caller.
This series of patches attempts to clean this up by switching to the following
model (sketched after the list):
1. If we are optimizing a cast of a value, we return a SILValue. If the cast
fails, we return an empty SILValue().
2. If we are optimizing a cast of an address, we return a boolean indicating
success or failure and require the caller to use the SILBuilderContext to get
the cast if they need it.
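In sketch form (the class is deliberately renamed and the method names are
illustrative, not the final API):
```
#include "swift/SIL/SILInstruction.h"

// Sketch of the intended calling convention for the two cases.
class CastOptimizerSketch {
public:
  // Case 1: value casts. The caller RAUWs with the returned value; an empty
  // SILValue() signals that the cast could not be optimized.
  swift::SILValue optimizeValueCast(swift::SingleValueInstruction *cast);

  // Case 2: address casts. The return value only signals success or failure;
  // any instructions created are observed through the SILBuilderContext the
  // optimizer was constructed with.
  bool optimizeAddressCast(swift::SILInstruction *cast);
};
```
With the value form, a caller can simply write `if (SILValue v = ...)` and
RAUW with `v`, rather than guessing whether the result needs reprocessing.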