To give a bit more information, the move function is currently implemented as
follows:
1. SILGen emits a builtin "move" that is called within the function _move in the
stdlib.
2. Mandatory Inlining today inlines builtin "move" as mark_unresolved_move_addr
if the final inlined type is address only. Otherwise, if the inlined type is
loadable, it performs a load [take] + move [diagnostic] + store [init].
3. In the diagnostic pipeline, before any mem optimizations have run, we run the
move checker for addresses. This eliminates /all/ mark_unresolved_move_addr
as part of emitting diagnostics. To make this work, we perform a small
optimization before the checker runs that moves the mark_unresolved_move_addr
from temporary alloc_stacks to the true underlying base address we are trying
to move. This optimization is necessary since _move is generic and SILGen will
often emit a temporary that we do not want.
4. Then, after we have run the guaranteed mem optimizations, we run the object
based move checker, emitting diagnostics.
This PR changes the scheme above to the following:
1. SILGen emits a builtin "move" that is called within the function _move in the
stdlib.
2. Mandatory Inlining inlines builtin "move" as mark_unresolved_move_addr.
3. In the diagnostic pipeline, before we have run any mem optimizations and
before we have run the actual move address checker, we massage the IR as above,
but in a separate pass that additionally tries to match this pattern:
```
%temporary = alloc_stack $LoadableType
store %1 to [init] %temporary : $*LoadableType
mark_unresolved_move_addr %temporary to %otherAddr : $*LoadableType
destroy_addr %temporary : $*LoadableType
```
and transform it to:
```
%temporary = alloc_stack $LoadableType
%2 = move_value [allows_diagnostics] %1 : $LoadableType
store %2 to [init] %temporary : $*LoadableType
destroy_addr %temporary : $*LoadableType
```
ensuring that the object move checker will handle this.
4. Then, after we have run the guaranteed mem optimizations, we run the object
based move checker, emitting diagnostics.
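At the Swift level, the kind of code this checking covers looks like the
following (a minimal sketch; `Klass` and the comments are illustrative, not
from the patch):
```
class Klass {
    func doSomething() {}
}

func test() {
    let k = Klass()
    let k2 = _move(k)    // ends the lifetime of 'k' here
    k2.doSomething()
    // k.doSomething()   // would be diagnosed: 'k' used after move
}
```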
* Replace the uniqueness result of a begin_cow_mutation of an empty Array/Set/Dictionary singleton with zero.
* Remove empty begin_cow_mutation - end_cow_mutation pairs
* Remove empty end_cow_mutation - begin_cow_mutation pairs
NOTE: This pass is disabled when -enable-experimental-lexical-lifetimes is
enabled.
When that flag is disabled, this pass removes the lexical flag from begin_borrow
and alloc_stack. This ensures that we can begin using begin_borrow [lexical] and
friends to emit diagnostics without impacting performance. I am going to prepare
a subsequent patch that causes us to emit lexical lifetimes by default. Due to
this pass, I am not expecting any perf issues.
This is just an initial prototype for people to play with. It is as always
behind the -enable-experimental-move-only flag.
NOTE: In this PR I implemented this only for 'local let' like things (local
lets/params). I did not implement support for local vars in this PR, and I
haven't done anything with class ivars or globals.
rdar://83957028
NOTE: This is only available when the flag -enable-experimental-move-only is
set. There are no effects when the flag is disabled.
The way that this works is that it takes advantage of the following changes to
SILGen emission:
* When SILGen initializes a let with NoImplicitCopyAttribute, SILGen now emits
a begin_borrow [lexical] + copy + move_only. This is a pattern that we can check
for and know that we are processing a move only value. When performing move
checking, we treat move_only as a move only value and verify that it isn't
consumed multiple times.
* The first point works well for emitting all diagnostics except for
initializing an additional let var. To work around that, I changed let
initialization to always bind an owned value to a move of that owned
value. There is no semantic difference, since that value is going to be consumed
by the binding operation anyway, so we effectively just move the cleanup from
the original value we wanted to bind onto the move. We still then actually
borrow the new let value with a begin_borrow [lexical]. This ensures that an
initialization of a let value appears to be a consuming use to the move only
value checker, while ensuring that the value has a proper
begin_borrow [lexical].
Some notes on functionality:
1. This attribute can only be applied to local 'let's.
2. Due to how we call it today with a vararg array, "print" is treated as a
consuming use (unfortunately).
3. I have not added the builtin copy operator yet, but I recently added a _move
skeleton attribute so one can end the lifetimes of these values early.
4. This supports all types that are not address only types (similar to
_move). To support full on address only types we need opaque values.
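Putting the notes above together, a minimal sketch of the diagnostic this
enables (the `Klass` and `takeOwned` helpers are assumptions for illustration):
```
class Klass {
    func use() {}
}

func takeOwned(_ k: __owned Klass) {}

func test() {
    @_noImplicitCopy let k = Klass()
    k.use()          // borrowing use: fine
    takeOwned(k)     // first consuming use: fine
    takeOwned(k)     // would be diagnosed: 'k' consumed more than once
}
```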
rdar://83957088
The PerformanceDiagnostics pass issues performance diagnostics for functions which are annotated with performance annotations, like @_noLocks, @_noAllocation.
This is done recursively for all functions which are called from performance-annotated functions.
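As a sketch of the kind of code this diagnoses (illustrative; the exact
diagnostic wording is not shown):
```
@_noAllocation
func sum(_ values: [Int]) -> Int {
    var result = 0
    for v in values { result += v }   // fine: no allocation or locking
    return result
}

@_noAllocation
func pair(_ a: Int, _ b: Int) -> [Int] {
    return [a, b]   // would be diagnosed: allocates an array
}
```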
rdar://83882635
This pass is only used for functions with performance annotations (@_noLocks, @_noAllocation).
It runs in the mandatory pipeline and specializes all function calls in performance-annotated functions and functions which are called from such functions.
In addition, the pass also does some other related optimizations: devirtualization, constant-folding Builtin.canBeClass, inlining of transparent functions and memory access optimizations.
To print the module, use the new llvm flag -sil-print-canonical-module
which parallels the existing flag -sil-view-canonical-cfg. When that
flag is passed, the new pass ModulePrinter is added to the diagnostic
pass pipeline after mandatory diagnostics have run. The new pass just
prints the module to stdout.
ARC operations have no effect on immortal objects, like the empty array singleton or statically allocated arrays.
Therefore we can freely remove retain/release instructions on such objects, even if there is no paired balancing ARC operation.
This optimization can only be done with a minimum deployment target of Swift 5.1, because in that version we added immortal ref count bits.
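For example (a sketch; the literal below lowers to the statically allocated
empty-array singleton):
```
func makeEmpty() -> [Int] {
    // The result is the immortal empty-array singleton, so a retain
    // emitted here can simply be deleted, even without finding a
    // balancing release.
    return []
}
```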
The optimization is implemented in libswift. Additionally, the remaining logic of simplifying strong_retain and strong_release is also ported to libswift.
rdar://81482156
* unify FunctionPassContext and InstructionPassContext
* add a modification API: PassContext.setOperand
* automatic invalidation notifications when the SIL is modified
TLDR: The reason why I am doing this is that people often confuse assembly
vision remarks with normal opt remarks. I want to accentuate that this is
actually trying to do something different from a traditional opt remark. To that
end, I renamed things in the compiler and added a true attribute
`@_assemblyVision` to trigger the compiler to emit these remarks, to help
everyone remember what this is in their ontology. I explain the difference
below.
----
Normal opt remarks work by the optimizer telling you whether it succeeded or
failed to perform an optimization. Another way of putting this is that opt
remarks try to give feedback to the user from an expert system about why it did
or did not do something. There is inherently an act of interpretation in the
optimizer about whether or not to report an 'action' that it perpetrated to the
user.
Assembly Vision Remarks instead tries to be an expert tool that acts like an
x-ray. Instead of telling the user what the optimizer did, it is a simple
visitor that walks the IR and emits source locations for where specific hazards
ended up in the program. In this sense it just tells the user where certain
instructions ended up, using heuristics to relate this to information at the IR
level. To get a sense of this difference, consider the following Swift code:
```
public class Klass {
    func doSomething() {}
}

var global: Klass = Klass()

@inline(__always)
func bar() -> Klass { global }

@_assemblyVision
@inline(never)
func foo() {
    bar().doSomething()
}
```
In this case, we will emit the following remarks:
```
test.swift:16:5: remark: begin exclusive access to value of type 'Klass'
    bar().doSomething()
    ^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
    ^
test.swift:16:9: remark: end exclusive access to value of type 'Klass'
    bar().doSomething()
        ^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
    ^
test.swift:16:11: remark: retain of type 'Klass'
    bar().doSomething()
          ^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
    ^
test.swift:16:23: remark: release of type 'Klass'
    bar().doSomething()
                      ^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
    ^
```
Notice how the begin/end exclusive access are marked as actually being before
the retain, release of global. That seems weird since exclusive access to memory
seems like something that should not escape an exclusivity scope... but in fact
this corresponds directly to what we eventually see in the SIL:
```
// test.sil
sil hidden [noinline] [_semantics "optremark"] @$ss3fooyyF : $@convention(thin) () -> () {
bb0:
  %0 = global_addr @$ss6globals5KlassCvp : $*Klass
  %1 = begin_access [read] [dynamic] [no_nested_conflict] %0 : $*Klass
  %2 = load %1 : $*Klass
  end_access %1 : $*Klass
  %4 = class_method %2 : $Klass, #Klass.doSomething : (Klass) -> () -> (), $@convention(method) (@guaranteed Klass) -> ()
  strong_retain %2 : $Klass
  %6 = apply %4(%2) : $@convention(method) (@guaranteed Klass) -> ()
  strong_release %2 : $Klass
  %8 = tuple ()
  return %8 : $()
} // end sil function '$ss3fooyyF'
```
and assembly,
```
// test.S
_$ss3fooyyF:
    pushq %rbp
    movq %rsp, %rbp
    pushq %r13
    pushq %rbx
    subq $32, %rsp
    leaq _$ss6globals5KlassCvp(%rip), %rdi
    leaq -40(%rbp), %rsi
    xorl %edx, %edx
    xorl %ecx, %ecx
    callq _swift_beginAccess
    movq _$ss6globals5KlassCvp(%rip), %r13
    movq (%r13), %rax
    movq 80(%rax), %rbx
    movq %r13, %rdi
    callq _swift_retain
    callq *%rbx
    movq %r13, %rdi
    callq _swift_release
    addq $32, %rsp
    popq %rbx
    popq %r13
    popq %rbp
    retq
```
So, as one can see, what we are trying to do is inform the user of hazards in
the code without trying to reason about them, automating a task that users often
have to perform by hand: inspecting assembly to determine where runtime calls
and other hazards ended up.
Instruction passes are basically visit functions in SILCombine for a specific instruction type.
With the macro SWIFT_INSTRUCTION_PASS such a pass can be declared in Passes.def.
SILCombine then calls the run function of the pass in libswift.
StackList is a very efficient data structure for worklist-type things.
This is a port of the C++ utility with the same name.
Compared to Array, it does not require any memory allocations.
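A self-contained sketch of the worklist pattern it is meant for (Array stands
in for StackList here, since the exact libswift API names are not shown in this
note):
```
struct Inst { var id: Int }

// Worklist-style traversal, as a pass would do with a StackList:
var worklist: [Inst] = [Inst(id: 0)]   // StackList would avoid this allocation
while let inst = worklist.popLast() {
    // visit 'inst'; newly discovered work is pushed back on
    if inst.id < 3 {
        worklist.append(Inst(id: inst.id + 1))
    }
}
```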
With the macro SWIFT_FUNCTION_PASS, a new libswift function pass can be defined in Passes.def.
SWIFT_FUNCTION_PASS_WITH_LEGACY is similar, but it allows keeping an original C++ “legacy” implementation of the pass, which is used if the compiler is not built with libswift.
It's not needed anymore with delayed instruction deletion.
It was used for two purposes:
1. For analyses which cache instructions, to avoid dangling instruction pointers.
2. For passes which maintain worklists of instructions, to remove deleted instructions from the worklist. This is now done by checking SILInstruction::isDeleted().
Instead of caching alias results globally for the module, make AliasAnalysis a FunctionAnalysisBase which caches the alias results per function.
Why?
* So far the result caches could only grow; they were reset when they reached a certain size. This was not ideal. Now they are invalidated whenever the function changes.
* It was not possible to actually invalidate an alias analysis result. This is required, for example in TempRValueOpt and TempLValueOpt (so far it was done manually with invalidateInstruction).
* Type based alias analysis results were also cached for the whole module, while they actually depend on the function, because they depend on the function's resilience expansion. This was a potential bug.
I also added a new PassManager API to directly get a function-based analysis:
getAnalysis(SILFunction *f)
The second change of this commit is the removal of the instruction-index indirection for the cache keys. Now the cache keys work directly on instruction pointers instead of instruction indices. This reduces the number of hash table lookups per cache lookup from 3 to 1.
This indirection was needed to avoid dangling instruction pointers in the cache keys. But this is not needed anymore, because of the new delayed instruction deletion mechanism.
Previously, because partial apply forwarders for async functions were
not themselves fully-fledged async functions, they were not able to
handle dynamic functions. Specifically, the reason was that it was not
possible to produce an async function pointer for the partial apply
forwarder because the size to be used was not knowable.
Thanks to https://github.com/apple/swift/pull/36700, that cause has been
eliminated. With it, partial apply forwarders are fully-fledged async
functions and in particular have their own async function pointers.
Consequently, it is again possible for these partial apply forwarders to
handle non-constant function pointers.
Here, that behavior is restored, by way of reverting part of
ee63777332 while preserving the ABI it
introduced.
rdar://76122027
The comment in LowerHopToActor explains the design here.
We want SILGen to emit hops to actors, ignoring executors,
because it's easier to fully optimize in a world where deriving
an executor is a non-trivial operation. But we also want something
prior to IRGen to lower the executor derivation because there are
useful static optimizations we can do, such as doing the derivation
exactly once on a dominance path and strength-reducing the derivation
(e.g. exploiting static knowledge that an actor is a default actor).
There are probably phase-ordering problems with doing this so late,
but hopefully they're restricted to situations like actors that
share an executor. We'll want to optimize that eventually, but
in the meantime, this unblocks the executor work.
Previously, thick async functions were represented sometimes as a pair of
(AsyncFunctionPointer, nullptr)--when, for example, the thick function was
produced via a thin_to_thick_function--and sometimes as a pair of
(FunctionPointer, ThickContext)--when the thick function was produced by
a partial_apply--with the size stored in the slot of the ThickContext.
That optimized for the wrong case: partial applies of dynamic async
functions; in that case, there is no appropriate AsyncFunctionPointer to
form when lowering the partial_apply instruction. The far more common
case is to know exactly which function is being partially applied. In
that case, we can form the appropriate AsyncFunctionPointer.
Furthermore, the previous representation made calling a thick function
more complex: it was always necessary to check whether the context was
in fact null and then proceed along one of two different paths accordingly.
Here, that behavior is corrected by creating a thunk in a mandatory
IRGen SIL pass in the case that the function that is being partially
applied is dynamic. That new thunk is then partially applied in place
of the original partial_apply of the dynamic function.
This bleeds into the implementation, where "guaranteed" is used
everywhere to talk about optimization of guaranteed values. We need to
use "mandatory" to indicate we're talking about the pass pipeline.
The DiagnoseLifetimeIssues pass prints a warning if an object is stored to a weak property (or is weakly captured) and destroyed before the property (or captured reference) is ever used again.
This can happen if the programmer relies on the lexical scope to keep an object alive, but copy-propagation can shrink the object's lifetime to its last use.
For example:
func test() {
    let k = Klass()
    // k is deallocated immediately after the closure capture (a store_weak).
    functionWithClosure({ [weak k] in
        // crash!
        k!.foo()
    })
}
Unfortunately this pass can only catch simple cases, but it's better than nothing.
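For reference, one way the programmer can make the intended lifetime explicit
in the example above (reusing the assumed Klass/functionWithClosure) is:
```
func test() {
    let k = Klass()
    functionWithClosure({ [weak k] in
        k?.foo()                  // tolerate the weak reference dying
    })
    withExtendedLifetime(k) {}    // or: keep 'k' alive to the end of its scope
}
```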
rdar://73910632
The main change is to rename DeadFunctionElimination -> DeadFunctionAndGlobalElimination, because the pass is now also doing dead-global elimination.
A second change is to remove the FunctionLivenessComputation base class. It’s not used anywhere else.
MandatoryCopyPropagation must be a separate pass in order to preserve
all debug_value instructions. CopyPropagation cannot preserve
debug_value because, as a rule, debug information cannot affect -O
behavior.
It's against the principles of pass design to check the driver mode
within the pass. A pass always needs to do the same thing regardless
of where it runs in the pass pipeline. It also needs to be possible to
test passes in isolation.
Adds a new flag "-experimental-skip-all-function-bodies" that skips
typechecking and SIL generation for all function bodies (where
possible).
`didSet` functions are still typechecked and have SIL generated, as their body
is checked for the `oldValue` parameter, but they are not serialized (see the
sketch below).
Parsing will generally be skipped as well, but this isn't necessarily the case,
since other flags (e.g. "-verify-syntax-tree") may force delayed parsing off.
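For reference, a `didSet` body that uses the implicit `oldValue` parameter,
which is why these bodies still need checking:
```
struct Counter {
    var count: Int = 0 {
        didSet {
            // Referencing 'oldValue' is what the body check looks for.
            print("count changed from \(oldValue) to \(count)")
        }
    }
}
```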
* Redundant hop_to_executor elimination: if a hop_to_executor is dominated by another hop_to_executor with the same operand, it is eliminated:

    hop_to_executor %a
    ... // no suspension points
    hop_to_executor %a // can be eliminated

* Dead hop_to_executor elimination: if a hop_to_executor is not followed by any code which requires running on its actor's executor, it is eliminated:

    hop_to_executor %a
    ... // no instructions which require running on %a
    return
rdar://problem/70304809
to check for improperly nested '@_semantics' functions.
Add a missing @_semantics("array.init") in ArraySlice found by the
diagnostic.
Distinguish between array.init and array.init.empty.
Categorize the types of semantic functions by how they affect the
inliner and pass pipeline, and centralize this logic in
PerformanceInlinerUtils. The ultimate goal is to prevent inlining of
"Fundamental" @_semantics calls and @_effects calls until the late
pipeline where we can safely discard semantics. However, that requires
significant pipeline changes.
In the meantime, this change prevents the situation from getting worse
and makes the intention clear. However, it has no significant effect
on the pass pipeline and inliner.
This attribute allows defining a pre-specialized entry point of a
generic function in a library.
The following definition provides a pre-specialized entry point for
`genericFunc(_:)` for the parameter type `Int` that clients of the
library can call.
```
@_specialize(exported: true, where T == Int)
public func genericFunc<T>(_ t: T) { ... }
```
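A client of the library can then call the generic function with the matching
concrete type as usual (illustrative):
```
import ModuleDefiningGenericFunc

// Can dispatch to the exported Int specialization instead of going
// through the unspecialized generic entry point.
genericFunc(42)
```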
Pre-specializations of internal `@inlinable` functions are allowed.
```
@usableFromInline
internal struct GenericThing<T> {
@_specialize(exported: true, where T == Int)
@inlinable
internal func genericMethod(_ t: T) {
}
}
```
There is syntax to pre-specialize a method from a different module.
```
import ModuleDefiningGenericFunc
@_specialize(exported: true, target: genericFunc(_:), where T == Double)
func prespecialize_genericFunc<T>(_ t: T) { fatalError("dont call") }
```
Specially marked extensions allow for pre-specialization of internal
methods across module boundaries (respecting `@inlinable` and
`@usableFromInline`).
```
import ModuleDefiningGenericThing
public struct Something {}
@_specializeExtension
extension GenericThing {
@_specialize(exported: true, target: genericMethod(_:), where T == Something)
func prespecialize_genericMethod(_ t: T) { fatalError("dont call") }
}
```
rdar://64993425
LLVM, as of 77e0e9e17daf0865620abcd41f692ab0642367c4, now builds with
-Wsuggest-override. Let's clean up the swift sources rather than disable
the warning locally.
Optimizes String operations with constant operands.
Specifically:
* Replaces x.append(y) with x = y if x is empty.
* Removes x.append("")
* Replaces x.append(y) with x = x + y if x and y are constant strings.
* Replaces _typeName(T.self) with a constant string if T is statically known.
With this optimization it's possible to constant fold string interpolations, like "the \(Int.self) type" -> "the Int type".
This new pass runs on high-level SIL, where semantic calls are still in place.
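For example (a sketch; folding only happens when all operands are statically
known):
```
func message() -> String {
    // Can be folded to the constant string "the Int type" because
    // Int.self is statically known.
    return "the \(Int.self) type"
}
```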
rdar://problem/65642843
Rename the existing pass to AccessedStorageAnalysisDumper.
AccessedStorage is useful on its own as a utility without the
analysis. We need a way to test the utility itself.
Add test cases for the previous commit that introduced
FindPhiStorageVisitor.
We do allow for limited def-use traversal to do things like find debug_value,
but nothing like a full data flow pass. As a proof of concept, I just emit
opt-remarks when we see retains and releases. Eventually I would like to also
print out which lvalue the retain is from, but that is more than I want to do
with this proof of concept.
Optimizes copies from a temporary (an "l-value") to a destination.

    %temp = alloc_stack $Ty
    instructions_which_store_to %temp
    copy_addr [take] %temp to %destination
    dealloc_stack %temp

is optimized to

    destroy_addr %destination
    instructions_which_store_to %destination

The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy, but it is more restricted (e.g. the copy-source must be an alloc_stack). On the other hand, this enables optimizing patterns which backwardPropagateCopy cannot handle.
This pass also performs a small peephole optimization which simplifies copy_addr - destroy_addr sequences:

    copy_addr %source to %destination
    destroy_addr %source

is replaced with

    copy_addr [take] %source to %destination
As part of bringing up ossa on the rest of the optimizer, I am going to first
move ossa lowering back in the pipeline for the stdlib module, since the stdlib
module does not have any sil based dependencies. To do so, I am adding this pass
that I can place at the beginning of the pipeline (where
NonTransparentFunctionOwnershipModelEliminator runs today) and then move
NonTransparentFunctionOwnershipModelEliminator down as I update passes. If we
are processing the stdlib, the OME doesn't run early and instead runs late. If
we are not processing the stdlib, we first run the OME early and then perform an
additional late OME run that does nothing, since lowering ownership is an
idempotent operation.
Start with some high-level checks of whether a declaration is formally final or has no visible overrides
in the domain of the program visible to the SIL module.
We can eventually adopt more of the logic from the Devirtualizer pass to tell whether call sites are
"effectively final", and maybe make the Devirtualizer consume the information applied by this pass.