* Remove NewInsts from ARCSequenceOpts
* Remove more instances of InsertPts
* Address comments from #33504
* Make bottom up loop traversal simpler. Use better apis
* Update LoopRegion printer with more info
Add differentiation support for non-active `try_apply` SIL instructions.
Notable pullback generation changes:
* Original basic blocks are now visited in a different order:
* starting from the original basic block, all its predecessors
* are visited in a breadth-first search order. This ensures that
* all successors of any block are visited before the block itself.
Resolves TF-433.
LLVM, as of 77e0e9e17daf0865620abcd41f692ab0642367c4, now builds with
-Wsuggest-override. Let's clean up the swift sources rather than disable
the warning locally.
TLDR: This fixes an ownership verifier assert caused by not placing end_borrows
along paths where an enum is provable to have a trivial case. It only happens if
all non-trivial cases in a switch_enum are "dead end blocks" where the program
will end and we leak objects.
The Problem
-----------
The actual bug here only occurs in cases where we have a switch_enum on an enum
with mixed trivial, non-trivial cases and all of the non-trivial payloaded cases
are "dead end blocks". As an example, lets look at a simple switch_enum over an
optional where the .some case is a dead end block and we leak the Klass object
into program termination:
```
%0 = load [copy] %mem : $Klass
switch_enum %0 : $Optional<Klass>, case #Optional.some: bbDeadEnd, case #Optional.none: bbContinue
bbDeadEnd(%0a : @owned $Klass): // %0 is leaked into program end!
unreachable
bbContinue:
... // program continue.
```
In this case, if we were only looking at final destroying uses, we would pass a
def without any uses to the ValueLifetimeChecker causing us to not have a
frontier at all causing us to not insert any end_borrows, yielding:
```
%0 = load_borrow %mem : $Klass
switch_enum %0 : $Optional<Klass>, case #Optional.some: bbDeadEnd, case #Optional.none: bbContinue
bbDeadEnd(%0a : @guaranteed $Klass): // %0 is leaked into program end and
// doesnt need an end_borrow!
unreachable
bbContinue:
... // program continue... we need an end_borrow here though!
```
This then trips the ownership verifier since switch_enum is a transforming
terminator that acts like a forwarding instruction implying we need an
end_borrow on the base value along all non-dead end paths through the program.
Importantly this is not actually a leak of a value or unsafe behavior since the
only time that we enter into unsafe territory is along paths where the enum was
actually trivial. So the load_borrow is actually just loaded the trivial enum
value.
The Fix
-------
In order to work around this, I realized that the right solution is to also
include the forwarding consuming uses (in this case the switch_enum use) when
determining the lifetime and that this solves the problem.
That being said, after I made that change, I noticed that I needed to remove my
previous manner of computing the insertion point to use for arguments when
finding the lifetime using ValueLifetimeAnalysis. Previously since I was using
only the destroying uses I knew that the destroy_value could not be the first
instruction in the block of my argument since I handled that case individually
before using the ValueLifetimeAnalysis. That invariant is no longer true as can
be seen in the case above if %0 was from a SILArgument itself instead of a load
[copy] and we were converting that argument to be a guaranteed argument.
To fix this, I taught ValueLifetimeAnalysis how to handle defs from
Arguments. The key thing is I noticed while reading the code that the analysis
only generally cared about the instruction's parent block. Beyond that, the def
being from an instruction was only needed to determine if a user is earlier in
the same block as the def instruction. Those concerns not apply to SILArgument
which dominate all instructions in the same block, so in this patch, we just
skip those conditional checks when we have a SILArgument. The rest of the code
that uses the parent block is the same for both SILArgument/SILInstructions.
rdar://65244617
Specifically:
1. I made methods, variables camelCase.
2. I expanded out variable names (e.x.: bb -> block, predBB -> predBlocks, U -> wrappedUse).
3. I changed typedef -> using.
4. I changed a few c style for loops into for each loops using llvm::enumerate.
NOTE: I left the parts needed for syncing to LLVM in the old style since LLVM
needs these to exist for CRTP to work correctly for the SILSSAUpdater.
Optimize the unconditional_checked_cast_addr in this pattern:
%box = alloc_existential_box $Error, $ConcreteError
%a = project_existential_box $ConcreteError in %b : $Error
store %value to %a : $*ConcreteError
%err = alloc_stack $Error
store %box to %err : $*Error
%dest = alloc_stack $ConcreteError
unconditional_checked_cast_addr Error in %err : $*Error to ConcreteError in %dest : $*ConcreteError
to:
...
retain_value %value : $ConcreteError
destroy_addr %err : $*Error
store %value to %dest $*ConcreteError
This lets the alloc_existential_box become dead and it can be removed in following optimizations.
The same optimization is also done for conditional_checked_cast_addr.
There is also an implication for debugging:
Each "throw" in the code calls the runtime function swift_willThrow. The function is used by the debugger to set a breakpoint and also add hooks.
This optimization can completely eliminate a "throw", including the runtime call.
So, with optimized code, the user might not see the program to break at a throw, whereas in the source code it is actually throwing.
On the other hand, eliminating the existential box is a significant performance win and we don't guarantee any debugging behavior for optimized code anyway. So I think this is a reasonable trade-off.
I added an option "-Xllvm -keep-will-throw-call" to keep the runtime call which can be used if someone want's to reliably break on "throw" in optimized builds.
rdar://problem/66055678
Optimizes String operations with constant operands.
Specifically:
* Replaces x.append(y) with x = y if x is empty.
* Removes x.append("")
* Replaces x.append(y) with x = x + y if x and y are constant strings.
* Replaces _typeName(T.self) with a constant string if T is statically known.
With this optimization it's possible to constant fold string interpolations, like "the \(Int.self) type" -> "the Int type"
This new pass runs on high-level SIL, where semantic calls are still in place.
rdar://problem/65642843
Start `linear_function` canonicalization skeleton copying from
`differentiable_function` canonicalization. For now, transpose function
operands are filled in with `undef`.
Rename the existing pass to AccessedStorageAnalysisDumper.
AccessedStorage is useful on its own as a utility without the
analysis. We need a way to test the utility itself.
Add test cases for the previous commit that introduced
FindPhiStorageVisitor.
For use outside access enforcement passes.
Add isUniquelyIdentifiedAfterEnforcement.
Rename functions for clarity and generality.
Rename isUniquelyIdentifiedOrClass to isFormalAccessBase.
Rename findAccessedStorage to identifyFormalAccess.
Rename findAccessedStorageNonNested to findAccessedStorage.
Part of generalizing the utility for use outside the access
enforcement passes.
We do allow for limited def-use traversal to do things like find debug_value.
But nothing like a full data flow pass. As a proof of concept I just emit
opt-remarks when we see retains, releases. Eventually I would like to also print
out which lvalue the retain is from but that is more than I want to do with this
proof of concept.
`JVPCloner.h` is now tiny: `JVPCloner` exposes only a `bool run()` entry point.
All of the implementation is moved to `JVPCloner::Implementation` in
`JVPCloner.cpp`. Methods can be defined directly in `JVPCloner.cpp` without
separate declarations.
`VJPCloner.h` is now tiny: `VJPCloner` exposes only a `bool run()` entry point.
All of the implementation is moved to `VJPCloner::Implementation` in
`VJPCloner.cpp`. Methods can be defined directly in `VJPCloner.cpp` without
separate declarations.
Clarify `llvm_unreachable` after exhaustive switch is needed to silence
MSVC C4715, so we don't accidentally remove it.
Standardize some assertion messages.
Reimplement `PullbackCloner` using the pointer-to-implementation pattern.
`PullbackCloner.h` is now tiny: `PullbackCloner` exposes only a `bool run()`
entry point.
All of the implementation is moved to `PullbackCloner::Implementation` in
`PullbackCloner.cpp`.
Benefits of this approach:
- A main benefit is that methods can be defined directly in `PullbackCloner.cpp`
without needing to separately declare them in `PullbackCloner.h`.
- There is now no code duplication between `PullbackCloner.h` and
`PullbackCloner.cpp`.
- Consequently, method documentation is easier to read because it appears
directly on method definitions, instead of on method declarations in a
separate file. This is important for documentation of `PullbackCloner`
instruction visitor methods, which explain pullback transformation rules.
- Incremental recompilation may be faster since `PullbackCloner.h` changes less
often.
Partially resolves SR-13182.
Add base type parameter to `TangentStoredPropertyRequest`.
Use `TypeBase::getTypeOfMember` instead of `VarDecl::getType` to correctly
compute the member type of original stored properties, using the base type.
Resolves SR-13134.
Reverse-mode differentiation now supports `apply` instructions with multiple
active "semantic results" (formal results or `inout` parameters).
The "cannot differentiate through multiple results" non-differentiability error
is lifted.
Resolves TF-983.
Previously, PullbackEmitter assumed that original values' value category
matches their `TangentVector` types' value category. This was problematic
for loadable types with address-only `TangentVector` types.
Now, PullbackEmitter starts to support differentiation of loadable types with
address-only `TangentVector` types. This patch focuses on supporting and testing
class types, more support can be added incrementally.
Resolves TF-1149.
Use TangentStoredPropertyRequest in differentiation transform.
Improve non-differentiability diagnostics regarding invalid stored
property projection instructions:
`struct_extract`, `struct_element_addr`, `ref_element_addr`.
Diagnose the following cases:
- Original property's type does not conform to `Differentiable`.
- Base type's `TangentVector` is not a struct.
- Tangent property not found: base type's `TangentVector` does not have a
stored property with the same name as the original property.
- Tangent property's type is not equal to the original property's
`TangentVector` type.
- Tangent property is not a stored property.
Resolves TF-969 and TF-970.
Optimizes copies from a temporary (an "l-value") to a destination.
%temp = alloc_stack $Ty
instructions_which_store_to %temp
copy_addr [take] %temp to %destination
dealloc_stack %temp
is optimized to
destroy_addr %destination
instructions_which_store_to %destination
The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.
This pass also performs a small peephole optimization which simplifies copy_addr - destroy sequences.
copy_addr %source to %destination
destroy_addr %source
is replace with
copy_addr [take] %source to %destination
Since libDemangling is included in the Swift standard library,
ODR violations can occur on platforms that allow statically
linking stdlib if Swift code is linked with other compiler
libraries that also transitively pull in libDemangling, and if
the stdlib version and compiler version do not match exactly
(even down to commit drift between releases). This lets the
runtime conditionally segregate its copies of the libDemangling
symbols from those in the compiler using an inline namespace
without affecting usage throughout source.
- Show SIL type when printing `AdjointValue`.
- Add utilities to print `PullbackEmitter` adjoint value and buffer mappings.
- Print generated VJP before printing generated pullback.
- This is useful because pullback generation may crash after VJP
generation succeeds.
As part of bringing up ossa on the rest of the optimizer, I am going to be first
moving ossa back for the stdlib module since the stdlib module does not have any
sil based dependencies. To do so, I am adding this pass that I can place at the
beginning of the pipeline (where NonTransparentFunctionOwnershipModelEliminator
runs today) and then move NonTransparentFunctionOwnershipModelEliminator down as
I update passes. If we are processing the stdlib, the ome doesn't run early and
instead runs late. If we are not processing the stdlib, we perform first an OME
run early and then perform an additional OME run that does nothing since
lowering ownership is an idempotent operation.