When the self operand of a fixed_storage.check_index call is an address
type (e.g. @inout Span<Int>), the value at that address can be mutated
between bounds checks. Add a bailout to prevent illegal optimization in such cases.
We cannot use spare bits or other overlapping storage layout tricks with fundamentally
address-only enums, and we can take advantage of this to do borrowing switches or other
in-place projections without copying the value. However, for resilient enums, the
implementation may use spare bit packing, but the type must be handled address-only
outside of its defining module, and we didn't have a way to express that with
borrowing switch. Optimization passes have also been running into problems with the
complexity that we were using `unchecked_take_enum_data_addr` sometimes as a pure
operation. This patch splits the instruction into three:
- `unchecked_inplace_enum_data_addr` represents a nondestructive in-place enum
projection. It is only allowed for enums whose projection operation is
nondestructive.
- `unchecked_take_enum_data_addr` represents a destructive enum projection,
invalidating the enum and leaving the payload to be further consumed.
This matches the current instruction's semantics.
- `unchecked_borrow_enum_data_addr` represents a borrowing enum projection.
The instruction takes a second operand for "scratch" space, which the
enum representation may be copied into in order to avoid invalidating the
enum value, so the result is dependent on the lifetime of both the
original enum and the scratch buffer. This allows for borrowing switches
over resilient enums.
`unchecked_borrow_enum_data_addr` is implemented by taking advantage of the
"address-only enums can't do spare bit optimization" property at runtime.
We inspect the operand type's bitwise-borrowability from its metadata. If
the type is bitwise-borrowable, then we are allowed to bitwise-copy the
enum to the scratch space and apply the projection to the scratch space,
preserving the original value. If the type is not bitwise-borrowable, then
we cannot use spare bit optimization in its layout, so we apply the
projection in-place.
Fixes rdar://174952822.
When merging fixed-storage bounds checks with the same self value,
cloneFixedStorageIndex clones the index computation of a later check
to place it adjacent to an earlier one. The previous implementation could lead to
illegal SIL due to dominance violation in some cases by cloning an instruction
before it's operands were cloned.
Fix this with a recursive DFS that appends instructions in postorder.
Since a node is only added to the worklist after all its operands are recursively
processed, the resulting order is a valid topological sort that guarantees every
cloned instruction's operands are already available.
In cloneFixedStorageIndex, bail out when an instruction to be cloned has
a lifetime-ending operand. Cloning such an instruction (e.g.
destructure_struct/destructure_tuple with an @owned operand, end_borrow of a begin_borrow)
may create a second consume, violating OSSA invariants.
This change introduces an optimization that hoists bounds checks with fixed storage semantics when they operate on the same self value,
grouping them together within a basic block to enable downstream optimizations like partial vectorization.
Since the language guarantees the validity of underlying storage for types annotated with fixed storage semantics like Span, InlineArray etc,
we don't need to consult alias analysis to perform this optimization.
We have always hoisted cond_fail instructions over side effects in loops. This change enables hoisting over side-effects in blocks.
rdar://164947648 and rdar://166409418
ColdBlockInfo previously had to be constructed and run manually for each function.
Turning it into a SILAnalysis makes it lazily computed and cached per-function,
with automatic invalidation whenever the control-flow graph changes.
This is otherwise a non-functional change.
Mainly:
* look through ownership instructions in more places
* support `load_borrow` and `store_borrow` where `load` and `store`s are expected
* support `destructure_struct` where `struct_extract` is expected
* enable inout-keypath optimization for OSSA
* OSSA support in StringOptimization
This new OSSA invariant simplifies many optimizations because they don't have to take care of the corner case of incomplete lifetimes in dead-end blocks.
The implementation basically consists of these changes:
* add the lifetime completion utility
* add a flag in SILFunction which tells optimization that they need to run the lifetime completion utility
* let all optimizations complete lifetimes if necessary
* enable the ownership verifier to check complete lifetimes
These two new invariants eliminate corner cases which caused bugs if optimization didn't handle them.
Also, it will significantly simplify lifetime completion.
The implementation basically consists of these changes:
* add a flag in SILFunction which tells optimization if they need to take care of infinite loops
* add a utility to break infinite loops
* let all optimizations remove unreachable blocks and break infinite loops if necessary
* add verification to check the new SIL invariants
The new `breakIfniniteLoops` utility breaks infinite loops in the control flow by inserting an "artificial" loop exit to a new dead-end block with an `unreachable`.
It inserts a `cond_br` with a `builtin "infinite_loop_true_condition"`:
```
bb0:
br bb1
bb1:
br bb1 // back-end branch
```
->
```
bb0:
br bb1
bb1:
%1 = builtin "infinite_loop_true_condition"() // always true, but the compiler doesn't know
cond_br %1, bb2, bb3
bb2: // new back-end block
br bb1
bb3: // new dead-end block
unreachable
```
If a guaranteed value is used in a dead-end exit block and the enclosing value is _not_ destroyed in this block, we end up missing the enclosing value as phi-argument after duplicating the loop.
TODO: once we have complete lifetimes we can remove this check again.
rdar://159125605
When using a densemap subscript expression on both sides of an
assignment in the same statement of the same map we run into the issue
that the map can reallocate because of the assignment but we are
referencing the value of the RHS map subscript by reference --
i.e we can reference deallocated memory.
Not good.
My attempts at making a reproducer crash failed.
rdar://151031297
When creating a tuple, the type needs to be specified because otherwise, if the original tuple has labels, it will cause a type mismatch verification error.
rdar://152588539
Prevent sinking of stores if there are instructions other than `load` which may read from memory.
This kind of memory dependencies were ignored.
Fixes SIL verifier crashes or - in worst case - miscompiles.
rdar://150205299
The SimplifyCFG and LoopRotate passes result in verification failures
when built in a compiler that is not built with Swift sources enabled.
Fixes: rdar://146357242
The body of a function has to be re-analyzed for every call
site of the function, which is very expensive and if the
body is not changed would produce the same result.
This takes about ~10% from swift-syntax overall build time
in release configuration.
Currently we don't support hoisting ownership instructions.
But the check was missing a store of Optional.none because such a value has no ownership even if the optional is not trivial.
Fixes a SIL verifier crash.
https://github.com/swiftlang/swift/issues/79491
LICM in OSSA is enabled only for instructions involving trvial values.
begin_access/end_access is eligible for LICM in OSSA. However we need to
collect applies to check for conflicts.
rdar://143835241
Incomplete liveranges in the dead-end exit block can cause a missing adjacent phi-argument for a re-borrow if there is a borrow-scope is in the loop.
But even when we have complete lifetimes, it's probably not worth rotating a loop where the header block branches to a dead-end block.
Fixes a SIL verifier crash
The old analysis pass doesn't take into account profile data, nor does
it consider post-dominance. It primarily dealt with _fastPath/_slowPath.
A block that is dominated by a cold block is itself cold. That's true
whether it's forwards or backwards dominance.
We can also consider a call to any `Never` returning function as a
cold-exit, though the block(s) leading up to that call may be executed
frequently because of concurrency. For now, I'm ignoring the concurrency
case and assuming it's cold. To make use of this "no return" prediction,
use the `-enable-noreturn-prediction` flag, which is currently off by
default.
Use RegularLocation::getAutoGeneratedLocation() while hoisting a load.
Previously the source location of the preheader's terminator was used,
this need not be a RegularLocationKind triggering verification errors.