NOTE: debug_value [moved] appearing in the source code implies a _move was
used, so this will not affect current stable Swift code.
This is just a first version that I am using to bring up tests for IRGen in
support of a full dataflow version of this patch.
Big picture is that there is a bunch of work done at the LLVM level in
the coroutine splitter to work around communicating live variables to the
various coroutine funclets. That logic is all done with debug.declare, and we
would need to update that logic in the coroutine splitter to handle
debug.addr. Rather than do this, after some conversation, AdrianP and I realized
that we could get the same effect as a debug.declare by just redeclaring the
current live set of debug_values after each possible coroutine funclet start. To
do this in full generality we need a full dataflow, but just to bring this up we
initially perform a dominance propagation algorithm of the following sort:
1. We walk the CFG along successors. By doing this we guarantee that we visit
blocks after their dominators.
2. When we visit a block, we walk the block from start->end. During this walk:
a. We grab a new block state from the centralized block->blockState map. This
state is a [SILDebugVariable : DebugValueInst].
b. If we see a debug_value, we map blockState[debug_value.getDbgVar()] =
debug_value. This ensures that when we get to the bottom of the block, we
have, for each SILDebugVariable, the last debug_value on it.
c. If we see any coroutine funclet boundaries, we clone the currently tracked
set in our block state and then walk up the dom tree, emitting from each
block any debug_value whose SILDebugVariable we have not already
emitted. This is maintained by using a visited set of SILDebugVariables for
each funclet boundary.
The end result is that at the beginning of each funclet we effectively
re-declare the debug info for each live address.
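A sketch of this walk, in Swift-flavored pseudocode (the real pass is C++; the
types and helpers here, such as blocksInReversePostOrder, DebugValueInst,
isCoroutineFuncletBoundary, and clone, stand in for compiler internals and are
not actual API):
```
// Swift-flavored pseudocode for the walk above; not real compiler API.
typealias BlockState = [SILDebugVariable: DebugValueInst]
var blockStates: [SILBasicBlock: BlockState] = [:]

// (1) Reverse post-order guarantees we visit blocks after their dominators.
for block in function.blocksInReversePostOrder() {
    var state = BlockState()
    for inst in block.instructions {
        // (2b) Track the last debug_value seen for each variable.
        if let dv = inst as? DebugValueInst {
            state[dv.debugVariable] = dv
        }
        // (2c) At a funclet boundary, re-declare the live set: first the
        // values tracked so far in this block, then walk up the dom tree,
        // emitting each variable at most once per boundary.
        if inst.isCoroutineFuncletBoundary {
            var emitted = Set(state.keys)
            for dv in state.values { clone(dv, at: inst) }
            var dom = block.immediateDominator
            while let d = dom {
                for (variable, dv) in blockStates[d] ?? [:]
                    where !emitted.contains(variable) {
                    emitted.insert(variable)
                    clone(dv, at: inst)
                }
                dom = d.immediateDominator
            }
        }
    }
    blockStates[block] = state
}
```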
This is of course insufficient for moves that are in conditional control flow,
e.g.:
```
let x = Klass()
if boolValue {
    await asyncCall()
    let _ = _move(x)
}
```
but this at least lets me begin writing tests for this in lldb using
straight-line code and work out the rest of the issues in CodeGen using those
tests.
Mandatory copy propagation was primarily a stop-gap until lexical
lifetimes were implemented. It supposedly made variables' lifetimes
more consistent between -O and -Onone builds. Now that lexical
lifetimes are enabled, it is no longer needed for that purpose (and
it would never satisfactorily meet that goal anyway).
Mandatory copy propagation may be enabled again later as an -Onone
optimization. But that requires a more careful audit of the effect on
debug information.
For now, it should be disabled.
Flow-isolation is a diagnostic SIL pass that finds
unsafe accesses to properties in initializers and
deinitializers that cannot gain isolation to
protect those accesses from concurrent modification.
See SE-0327 for more details about how and why it exists.
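As a rough illustration of the kind of code the pass diagnoses, here is a
hedged example (names are illustrative; the exact diagnostic wording is not
shown):
```
actor Account {
    var balance: Int = 0

    init(notify: (Account) -> Void) {
        balance = 100   // ok: 'self' has not escaped yet
        notify(self)    // 'self' escapes; isolation can no longer be assumed
        balance += 1    // flow-isolation diagnoses this access
    }
}
```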
This commit includes changes and features like:
- The removal of the escaping-use restriction
- Flow-isolation that works properly with `defer` statements
- Flow-isolation with an emphasis on helpful diagnostics.
It also includes known issues like:
- Local / nonescaping functions are not analyzed by
flow-isolation, despite it being technically possible.
The main challenge in supporting it efficiently is that
such functions do not have a single exit point, like
a `defer`. In particular, arbitrary functions can throw,
so there are points where nonisolation should _not_ flow
out of the function at a call-site in the initializer, etc.
- The implementation of the flow-isolation pass is not
particularly memory efficient; it relies on BitDataflow
even though the flow problem it solves is simple.
A more efficient implementation would be specialized to
this problem.
There are also some changes to the Swift language itself: defer
will respect its context when deciding its property access kind.
Previously, a defer in an initializer would always access a stored
property through its accessor methods, instead of doing so directly
like its enclosing function might. This inconsistency is unfortunate,
so for Swift 6+ we make this consistent. For Swift 5, only a defer
in a function that is a member of the following kinds of types
will gain this consistency:
- an actor type
- any nominal type that is actor-isolated, excluding UnsafeGlobalActor.
These types are still rather new, so there is much less of a chance of
breaking expected behaviors around defer. In particular, the danger is
that users are relying on the behavior of defer triggering a property
observer within an init or deinit, when it would not be triggering it
without the defer.
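A hedged sketch of the old inconsistency, using a class for simplicity (per
the above, Swift 5 only gains the consistent behavior for actors and
actor-isolated types):
```
class C {
    var x: Int = 0 {
        didSet { print("didSet fired") }
    }

    init() {
        x = 1             // direct-to-storage inside an init: didSet does not fire
        defer { x = 2 }   // previously this always went through the setter,
                          // so didSet fired here even inside the init
    }
}
```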
This is part of a larger piece of work that is going to introduce the ability in
SIL to wrap trivial values in a move-only type wrapper and thus perform
ownership-based diagnostics upon those trivial values. This is being done now so
that we can perform SSA-based ownership diagnostics on trivial types. This will
eventually allow us to do things like no-escape analysis on
UnsafePointer.
That all sounds great, but we must also consider how this affects the rest of
the optimizer. For instance, what if we want to have a no-escape integer and
have overflow checks used upon it? To ensure that we can do this, the authors
realized that we did not need to persist the ownership information so late in
that part of the pipeline: we can just do the ownership checking earlier than
constant propagation and then lower. This is safe since the rest of the
optimizer will not introduce escapes of a pointer or extra copies, unlike the
case where the underlying non-move-only type is also non-trivial.
With that in hand, this PR moves these two move-only passes earlier than
constant propagation for this purpose. The reason they were put in their current
position rather than earlier is that I wanted them to run after predictable dead
allocation elimination, since it cleaned up the SIL I was reading as I designed
them. There isn't any reason they can't run earlier. Once I bring in the new
SIL move-only type wrapper, trivial type lowering will run after these
passes.
* rename the CrossModuleSerializationSetup pass to simply CrossModuleOptimization
* remove the CMO specific serializer pass. Instead run the CrossModuleSerializationSetup pass directly before the standard serializer pass.
* correctly handle shared functions (e.g. specializations)
* refactoring
To give a bit more information, the move function is currently implemented as
follows:
1. SILGen emits a builtin "move" that is called within the function _move in the
stdlib.
2. Mandatory Inlining today inlines builtin "move" as mark_unresolved_move_addr
if the final inlined type is address-only. Otherwise, if the inlined type
is loadable, it performs a load [take] + move [diagnostic] + store [init].
3. In the diagnostic pipeline, before any mem optimizations have run, we run the
move checker for addresses. This eliminates /all/ mark_unresolved_move_addr
as part of emitting diagnostics. In order to make this work, we perform a
small optimization before the checker runs that moves the
mark_unresolved_move_addr from temporary alloc_stacks to the true
base underlying address we are trying to move. This optimization is necessary
since _move is generic and SILGen will often emit a temporary that
we do not want.
4. Then after we have run the guaranteed mem optimizations, we run the object
based move checker emitting diagnostics.
This PR changes the scheme above to the following:
1. SILGen emits a builtin "move" that is called within the function _move in the
stdlib.
2. Mandatory Inlining inlines builtin "move" as mark_unresolved_move_addr.
3. In the diagnostic pipeline, before we have run any mem optimizations and
before we have run the actual move address checker, we massage the IR as
above, but in a separate pass that in addition tries to match this pattern:
```
%temporary = alloc_stack $LoadableType
store %1 to [init] %temporary : $*LoadableType
mark_unresolved_move_addr %temporary to %otherAddr : $*LoadableType
destroy_addr %temporary : $*LoadableType
```
and transform it to:
```
%temporary = alloc_stack $LoadableType
%2 = move_value [allows_diagnostics] %1 : $LoadableType
store %2 to [init] %temporary : $*LoadableType
destroy_addr %temporary : $*LoadableType
```
ensuring that the object move checker will handle this.
4. Then after we have run the guaranteed mem optimizations, we run the object
based move checker emitting diagnostics.
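For reference, here is a minimal source-level input that exercises this
lowering; a hedged sketch assuming the experimental _move function and
-enable-experimental-move-only ('Klass' and 'consume' are placeholders):
```
class Klass {}

func consume(_ k: __owned Klass) {}

func test() {
    let x = Klass()
    let y = _move(x)   // SILGen: builtin "move"; Mandatory Inlining:
                       // mark_unresolved_move_addr, which for this loadable
                       // type is rewritten into the move_value form above
    consume(y)
    // _ = x           // would be diagnosed: 'x' used after being moved
}
```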
The reason why I am doing this is that we are going to be enabling lexical
lifetimes early in the pipeline so that I can use them for the move operator's
diagnostics.
To make it easy for passes to know whether or not they should support lexical
lifetimes, I included a query on SILOptions called
supportsLexicalLifetimes. This will return true if the pass (given the passed-in
option) should insert the lexical lifetime flag. This ensures that passes that
run in both pipelines (e.g. AllocBoxToStack) know whether or not to set the
lexical lifetime flag without having to locally reason about it.
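A hedged sketch of the intended usage pattern (the passes themselves are C++;
this Swift-flavored pseudocode only shows the shape of the query, and
'replaceWithAllocStack' is illustrative, not real API):
```
func promoteBox(_ box: AllocBoxInst, options: SILOptions) {
    // Consult the centralized query instead of locally reasoning about
    // which pipeline (with or without lexical lifetimes) we are in.
    let isLexical = options.supportsLexicalLifetimes
    replaceWithAllocStack(box, lexical: isLexical)
}
```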
This is just chopping off layers of a larger patch I am upstreaming.
NOTE: This is technically NFC since it leaves the default (not inserting
lexical lifetimes at all) alone.
NOTE: This pass is disabled when -enable-experimental-lexical-lifetimes is
enabled.
When that flag is disabled, this removes the lexical flag from begin_borrow and
alloc_stack. This ensures that we can begin using begin_borrow [lexical] and
friends to emit diagnostics without impacting performance. I am going to be
preparing a subsequent patch that causes us to emit lexical lifetimes by
default. Due to this pass, I am not expecting any issues around perf.
This is just an initial prototype for people to play with. It is, as always,
behind the -enable-experimental-move-only flag.
NOTE: In this PR I implemented this only for 'local let'-like things (local
lets/params). I did not implement support for local vars in this PR and haven't
done anything with class ivars or globals.
rdar://83957028
Replace the dynamic initialization of trivial globals with statically initialized globals, even in -Onone.
This is required to be able to use global variables in performance-annotated functions.
Also, it's a small performance improvement for -Onone.
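A hedged example of what this enables (names are illustrative; @_noLocks is
one of the performance annotations mentioned above):
```
// 'limit' is trivial and has a constant initializer, so it is now
// statically initialized even at -Onone: no lazy swift_once on first use.
var limit: Int = 100

@_noLocks
func withinLimit(_ n: Int) -> Bool {
    // Reading a statically initialized global takes no runtime
    // initialization lock, which is what makes this use legal here.
    return n <= limit
}
```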
NOTE: This is only available when the -enable-experimental-move-only flag is
passed. There is no effect when the flag is disabled.
The way that this works is that it takes advantage of the following changes to
SILGen emission:
* When SILGen initializes a let with NoImplicitCopyAttribute, SILGen now emits
a begin_borrow [lexical] + copy + move_only. This is a pattern that we can check
for and know that we are processing a move-only value. When performing move
checking, we treat move_only as a move-only value and verify that it isn't
consumed multiple times.
* The first point works well for emitting all diagnostics except for
initializing an additional let var. To work around that, I changed let
initialization to always bind an owned value to a move of that owned
value. There is no semantic difference since that value is going to be consumed
by the binding operation anyway, so we effectively just move the cleanup from
the original value we wanted to bind onto the move. We then still borrow
the new let value with a begin_borrow [lexical]. This
ensures that an initialization of a let value appears to be a consuming use to
the move-only value checker while ensuring that the value has a proper
begin_borrow [lexical].
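A hedged source-level example of the checking this enables (the
@_noImplicitCopy spelling and the diagnostic text are illustrative; 'Klass'
and 'consume' are placeholders):
```
class Klass {}

func consume(_ k: __owned Klass) {}

func test() {
    @_noImplicitCopy let x = Klass()
    consume(x)   // first consuming use: fine
    consume(x)   // diagnosed: move-only value consumed more than once
}
```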
Some notes on functionality:
1. This attribute can only be applied to local 'let's.
2. Due to how we call it today with a vararg array, "print" is (unfortunately)
treated as a consuming use.
3. I have not added the builtin copy operator yet, but I recently added a _move
skeleton attribute so one can end the lifetimes of these values early.
4. This supports all types that are not address-only types (similar to
_move). To fully support address-only types we need opaque values.
rdar://83957088
The PerformanceDiagnostics pass issues performance diagnostics for functions which are annotated with performance annotations, like @_noLocks and @_noAllocation.
This is done recursively for all functions which are called from performance-annotated functions.
rdar://83882635
This pass is only used for functions with performance annotations (@_noLocks, @_noAllocation).
It runs in the mandatory pipeline and specializes all function calls in performance-annotated functions and in functions which are called from such functions.
In addition, the pass also does some other related optimizations: devirtualization, constant-folding of Builtin.canBeClass, inlining of transparent functions, and memory access optimizations.
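A hedged illustration of the kind of code these annotations accept and reject
(function bodies are placeholders):
```
@_noAllocation
func sum(_ values: UnsafeBufferPointer<Int>) -> Int {
    var total = 0
    for v in values { total += v }   // fine: no allocation required
    return total
}

@_noAllocation
func makeArray() -> [Int] {
    return [1, 2, 3]   // diagnosed: constructing the array allocates
}
```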
Since we are now moving non-transparent OME down the pipeline by default, add a flag to
turn off this behavior.
When turned on, non-transparent OME will run early, just before PerformanceSILLinker.
To print the module, use the new llvm flag -sil-print-canonical-module
which parallels the existing flag -sil-view-canonical-cfg. When that
flag is passed, the new pass ModulePrinter is added to the diagnostic
pass pipeline after mandatory diagnostics have run. The new pass just
prints the module to stdout.
Optimize code like:
puts("\(String.self)")
Optimizing string interpolation and optimizing C-strings are both done in StringOptimization.
A second run of StringOptimization is needed in the pipeline to optimize such code, because the result of the interpolation optimization must be cleaned up so that the C-string optimization can kick in.
Also, StringOptimization must handle struct_extract(struct(literal)), where the struct_extract may be in a called function.
To solve a phase ordering problem with inlining String semantics and inlining the `String(stringInterpolation: DefaultStringInterpolation)` constructor, we do a simple analysis of the callee. Doing this simple "interprocedural" analysis avoids relying on inlining that String constructor.
rdar://74941849
- If any of the `-g<kind>` flags is given -- except `-gnone` -- debug
info will be printed into every generated SIL file.
- The `-gsil` flag is deprecated in favor of `-sil-based-debuginfo`. The
SILDebugInfoGenerator pass now generates an intermediate SIL file with
the name "<output file>.sil_dbg_<n>.sil". Other functionality of that
pass remains the same.
Only issue weak lifetime warnings for users who select object lifetime
optimization. The risk of spurious warnings outweighs the benefits.
Although the warnings are generally useful regardless of the level of
optimization, it isn't really critical to issue them unless the optimizer
aggressively shrinks reference lifetimes.
Fixes rdar://79146338 (Xcode warns that "referenced object is
deallocated here" but that object was passed into a method that causes
strong retention)
TLDR: The reason why I am doing this is that people often confuse assembly
vision remarks with normal opt remarks. I want to accentuate that this is
actually trying to do something different from a traditional opt remark. To that
end I renamed things in the compiler and added a true attribute
`@_assemblyVision` to trigger the compiler to emit these remarks, to help
everyone remember what this is in their ontology. I explain the
difference below.
----
Normal opt remarks work by the optimizer telling you whether it succeeded or failed
to perform an optimization. Another way of putting this is that opt remarks
try to give feedback to the user from an expert system about why it did
or did not do something. There is inherently an act of interpretation in the
optimizer about whether or not to report an 'action' that it performed to the
user.
Assembly Vision Remarks instead tries to be an expert tool that acts like an
X-ray. Instead of telling the user what the optimizer did, it is instead a
simple visitor that visits the IR and emits SourceLocations for where specific
hazards ended up in the program. In this sense it is just telling the user
where certain instructions ended up and using heuristics to relate this to
information at the IR level. To get a sense of this difference, consider the
following Swift code:
```
public class Klass {
    func doSomething() {}
}

var global: Klass = Klass()

@inline(__always)
func bar() -> Klass { global }

@_assemblyVision
@inline(never)
func foo() {
    bar().doSomething()
}
```
In this case, we will emit the following remarks:
```
test.swift:16:5: remark: begin exclusive access to value of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
test.swift:16:9: remark: end exclusive access to value of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
test.swift:16:11: remark: retain of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
test.swift:16:23: remark: release of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
```
Notice how the begin/end exclusive access are marked as actually occurring before
the retain and release of global. That seems weird, since exclusive access to memory
seems like something that should not escape an exclusivity scope... but in fact
this corresponds directly to what we eventually see in the SIL:
```
// test.sil
sil hidden [noinline] [_semantics "optremark"] @$ss3fooyyF : $@convention(thin) () -> () {
bb0:
%0 = global_addr @$ss6globals5KlassCvp : $*Klass
%1 = begin_access [read] [dynamic] [no_nested_conflict] %0 : $*Klass
%2 = load %1 : $*Klass
end_access %1 : $*Klass
%4 = class_method %2 : $Klass, #Klass.doSomething : (Klass) -> () -> (), $@convention(method) (@guaranteed Klass) -> ()
strong_retain %2 : $Klass
%6 = apply %4(%2) : $@convention(method) (@guaranteed Klass) -> ()
strong_release %2 : $Klass
%8 = tuple ()
return %8 : $()
} // end sil function '$ss3fooyyF'
```
and in the assembly:
```
// test.S
_$ss3fooyyF:
pushq %rbp
movq %rsp, %rbp
pushq %r13
pushq %rbx
subq $32, %rsp
leaq _$ss6globals5KlassCvp(%rip), %rdi
leaq -40(%rbp), %rsi
xorl %edx, %edx
xorl %ecx, %ecx
callq _swift_beginAccess
movq _$ss6globals5KlassCvp(%rip), %r13
movq (%r13), %rax
movq 80(%rax), %rbx
movq %r13, %rdi
callq _swift_retain
callq *%rbx
movq %r13, %rdi
callq _swift_release
addq $32, %rsp
popq %rbx
popq %r13
popq %rbp
retq
```
so as one can see, what we are trying to do is inform the user of hazards in the
code without trying to reason about them, automating a task that users often have
to perform by hand: inspecting assembly to determine where runtime calls and
other hazards ended up.
Problem: We continue to uncover code that assumes either precise local
variable lifetimes (to the end of the lexical scope) or extended
temporary lifetimes (to the end of the statement). These bugs require
heroic debugging to find the root cause. Because they only show up in
Release builds, they often manifest just before the affected project
“ships” under an impending deadline.
We now have enough information from projects that have been tested
with copy propagation that we can both understand common patterns and
identify some specific APIs that may cause trouble. We know what API
annotations the compiler will need for helpful warnings and can begin
adding those annotations.
Disabling copy propagation now is only a temporary deferral; we will
still need to bring it back by default. However, by then we should
have:
- LLDB and runtime support for debugging deinitialized objects
- A variant of lifetime shortening that can run in Debug builds to
catch problems before code ships
- Static compiler warnings for likely invalid lifetime assumptions
- Source annotations that allow those warnings to protect programmers
against existing dangerous APIs
In the meantime...
Projects can experiment with the behavior and gradually migrate.
Copy propagation will automatically be enabled in -enable-ossa-modules
mode. It is important to work toward a single performance
target. Supporting full OSSA and improving ARC performance without
copy propagation would be prohibitively complicated.
rdar://76438920 (Temporarily disable -O copy propagation by default)
Previously, because partial apply forwarders for async functions were
not themselves fully-fledged async functions, they were not able to
handle dynamic functions. Specifically, the reason was that it was not
possible to produce an async function pointer for the partial apply
forwarder because the size to be used was not knowable.
Thanks to https://github.com/apple/swift/pull/36700, that cause has been
eliminated. With it, partial apply forwarders are fully-fledged async
functions and in particular have their own async function pointers.
Consequently, it is again possible for these partial apply forwarders to
handle non-constant function pointers.
Here, that behavior is restored by way of reverting part of
ee63777332 while preserving the ABI it
introduced.
rdar://76122027
This feature degrades the debugging experience and causes a large
number of unit test failures.
These were both known issues, but our planned debugger improvements
won't be ready for a while. Until then, we'll leave the feature under
a compiler option, and developers can adopt it at their own speed
when they are ready to fix lifetime issues in their code.
rdar://76177280 (Disable mandatory-copy-propagation (-Onone only))
The comment in LowerHopToActor explains the design here.
We want SILGen to emit hops to actors, ignoring executors,
because it's easier to fully optimize in a world where deriving
an executor is a non-trivial operation. But we also want something
prior to IRGen to lower the executor derivation because there are
useful static optimizations we can do, such as doing the derivation
exactly once on a dominance path and strength-reducing the derivation
(e.g. exploiting static knowledge that an actor is a default actor).
There are probably phase-ordering problems with doing this so late,
but hopefully they're restricted to situations like actors that
share an executor. We'll want to optimize that eventually, but
in the meantime, this unblocks the executor work.
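A hedged source-level illustration (the hops are SIL hop_to_executor
instructions emitted by SILGen; 'Worker' is a placeholder):
```
actor Worker {
    func step() {}
}

func run(_ w: Worker) async {
    // SILGen emits a hop to 'w' for each call below without deriving an
    // executor. LowerHopToActor later materializes the executor, and since
    // the first derivation dominates the second, it can do so exactly once.
    await w.step()
    await w.step()
}
```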
Previously, thick async functions were sometimes represented as a pair
of (AsyncFunctionPointer, nullptr)--when the thick function was produced
via a thin_to_thick_function, e.g.--and sometimes as a pair of
(FunctionPointer, ThickContext)--when the thick function was produced by
a partial_apply--with the size stored in the slot of the ThickContext.
That optimized for the wrong case: partial applies of dynamic async
functions; in that case, there is no appropriate AsyncFunctionPointer to
form when lowering the partial_apply instruction. The far more common
case is to know exactly which function is being partially applied. In
that case, we can form the appropriate AsyncFunctionPointer.
Furthermore, the previous representation made calling a thick function
more complex: it was always necessary to check whether the context was
in fact null and then proceed along two different paths depending on the result.
Here, that behavior is corrected by creating a thunk in a mandatory
IRGen SIL pass in the case that the function that is being partially
applied is dynamic. That new thunk is then partially applied in place
of the original partial_apply of the dynamic function.
This shortens -Onone lifetimes.
To eliminate ARC traffic, the optimizer reorders object
destruction. This changes observable program behavior. If a custom
deinitializer produces side effects, code may observe those side
effects earlier after optimization. Similarly, code that dereferences
a weak reference may observe a 'nil' reference after optimization,
while the unoptimized code observed a valid object.
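A hedged example of the weak-reference case (helper names are placeholders):
```
class Resource {
    deinit { print("deinit") }
}

func use(_ r: Resource) {}

func demo() {
    let r = Resource()
    weak var w = r
    use(r)   // last use of 'r'
    // After optimization, 'r' may be released right here rather than at
    // the end of its scope, so 'w' can already be nil below.
    if w == nil { print("already deallocated") }
}
```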
Developers have overwhelmingly requested that object lifetimes have
similar behavior in -Onone and -O builds in order to find and diagnose
program bugs involving weak references and other lifetime assumptions.
Enabling copy propagation at -Onone is simply a matter of flipping
a switch. -Onone runtime and code size will improve. By design, copy
propagation has no direct effect on compile time. It will indirectly
improve optimized compile times, but in debug builds it simply isn't
a factor.
To support debugging, a "poison" flag was (in prior commits) added to
new destroy_value instructions generated by copy propagation. When
OwnershipModelEliminator lowers destroy_value [poison] it will
generate new debug_value instructions with a "poison" flag.
These additional poison stores to the stack could increase both code
size and -Onone runtime.
rdar://75012368 (-Onone compiler support for early object deinitialization with sentinel dead references)
-enable-copy-propagation: enables whatever form of copy propagation
the current pipeline runs (mandatory-copy-propagation at -Onone,
regular copy-propagation at -O).
-disable-copy-propagation: similarly disables any form of copy
propagation in the current pipeline.