To print the module, use the new LLVM flag -sil-print-canonical-module,
which parallels the existing flag -sil-view-canonical-cfg. When that
flag is passed, the new ModulePrinter pass is added to the diagnostic
pass pipeline after mandatory diagnostics have run. The pass simply
prints the module to stdout.
Optimize code like:
puts("\(String.self)")
Optimizing string interpolation and optimizing C-strings are both done in StringOptimization.
A second run of StringOptimization is needed in the pipeline to optimize such code, because the result of the interpolation optimization must be cleaned up before the C-string optimization can kick in.
Also, StringOptimization must handle struct_extract(struct(literal)), where the struct_extract may be in a called function.
To solve a phase ordering problem between inlining String semantics and inlining the `String(stringInterpolation: DefaultStringInterpolation)` constructor, we do a simple analysis of the callee. Doing this simple "interprocedural" analysis avoids relying on inlining of that String constructor.
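As a rough illustration, the pattern looks like this in simplified SIL (construction details and field names are approximate, not the exact output):
```
// Illustrative, simplified SIL. After the interpolation result is
// cleaned up, the String is built directly from literal parts:
%guts = ...                                      // literal-based _StringGuts
%str  = struct $String (%guts : $_StringGuts)
...
// The extraction may even sit in a called function:
%raw = struct_extract %str : $String, #String._guts  // folds to %guts
```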
rdar://74941849
- If any of the `-g<kind>` flags is given -- except `-gnone` -- debug
info will be printed into every generated SIL file.
- `-gsil` is deprecated in favor of `-sil-based-debuginfo`. The
SILDebugInfoGenerator pass now generates intermediate SIL files with
the name "<output file>.sil_dbg_<n>.sil". Other functionality of that
pass remains the same.
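For a rough sense of what that looks like, a SIL file printed with debug info carries source locations and scopes, e.g. (illustrative, simplified; locations and scope numbers are made up):
```
sil_scope 1 { loc "main.swift":1:6 parent @main }

// Variable info, source location, and scope attached to an instruction:
debug_value %0 : $Int, let, name "x", loc "main.swift":2:9, scope 1
```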
Only issue weak lifetime warnings for users who select object lifetime
optimization. The risk of spurious warnings outweighs the benefits.
Although the warnings are generally useful regardless of the level of
optimization, it isn't really critical to issue them unless the optimizer
aggressively shrinks reference lifetimes.
Fixes rdar://79146338 (Xcode warns that "referenced object is
deallocated here" but that object was passed into a method that causes
strong retention)
TLDR: The reason I am doing this is that people often confuse assembly
vision remarks with normal opt remarks. I want to accentuate that this is
actually trying to do something different from a traditional opt remark. To that
end, I renamed things in the compiler and added a true attribute
`@_assemblyVision` to trigger the compiler to emit these remarks, to help
everyone remember what this is in their ontology. I explain the
difference below.
----
Normal opt remarks work by the optimizer telling you whether it succeeded or failed
to perform an optimization. Another way of putting this is that opt remarks
try to give feedback to the user from an expert system about why it did
or did not do something. There is inherently an act of interpretation in the
optimizer about whether or not to report an 'action' that it performed to the
user.
Assembly Vision Remarks instead try to be an expert tool that acts like an
x-ray. Rather than telling the user what the optimizer did, it is a
simple visitor that visits the IR and emits source locations for where specific
hazards ended up in the program. In this sense it is just telling the user
where certain instructions ended up, using heuristics to relate this to
information at the IR level. To get a sense of this difference, consider the
following Swift code:
```
public class Klass {
    func doSomething() {}
}

var global: Klass = Klass()

@inline(__always)
func bar() -> Klass { global }

@_assemblyVision
@inline(never)
func foo() {
    bar().doSomething()
}
```
In this case, we will emit the following remarks:
```
test.swift:16:5: remark: begin exclusive access to value of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
test.swift:16:9: remark: end exclusive access to value of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
test.swift:16:11: remark: retain of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
test.swift:16:23: remark: release of type 'Klass'
bar().doSomething()
^
test.swift:7:5: note: of 'global'
var global: Klass = Klass()
^
```
Notice how the begin/end exclusive access are marked as occurring before
the retain and release of global. That seems weird, since exclusive access to memory
seems like something that should not escape an exclusivity scope... but in fact
this corresponds directly to what we eventually see in the SIL:
```
// test.sil
sil hidden [noinline] [_semantics "optremark"] @$ss3fooyyF : $@convention(thin) () -> () {
bb0:
%0 = global_addr @$ss6globals5KlassCvp : $*Klass
%1 = begin_access [read] [dynamic] [no_nested_conflict] %0 : $*Klass
%2 = load %1 : $*Klass
end_access %1 : $*Klass
%4 = class_method %2 : $Klass, #Klass.doSomething : (Klass) -> () -> (), $@convention(method) (@guaranteed Klass) -> ()
strong_retain %2 : $Klass
%6 = apply %4(%2) : $@convention(method) (@guaranteed Klass) -> ()
strong_release %2 : $Klass
%8 = tuple ()
return %8 : $()
} // end sil function '$ss3fooyyF'
```
and assembly,
```
// test.S
_$ss3fooyyF:
pushq %rbp
movq %rsp, %rbp
pushq %r13
pushq %rbx
subq $32, %rsp
leaq _$ss6globals5KlassCvp(%rip), %rdi
leaq -40(%rbp), %rsi
xorl %edx, %edx
xorl %ecx, %ecx
callq _swift_beginAccess
movq _$ss6globals5KlassCvp(%rip), %r13
movq (%r13), %rax
movq 80(%rax), %rbx
movq %r13, %rdi
callq _swift_retain
callq *%rbx
movq %r13, %rdi
callq _swift_release
addq $32, %rsp
popq %rbx
popq %r13
popq %rbp
retq
```
So, as one can see, what we are trying to do is inform the user of hazards in the
code without trying to reason about them, automating a task that users often have
to perform by hand: inspecting assembly to determine where runtime calls and
other hazards ended up.
Problem: We continue to uncover code that assumes either precise local
variable lifetimes (to the end of the lexical scope) or extended
temporary lifetimes (to the end of the statement). These bugs require
heroic debugging to find the root cause. Because they only show up in
Release builds, they often manifest just before the affected project
“ships” under an impending deadline.
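A minimal sketch of such a bug (all names hypothetical); the cure is to make the intended lifetime explicit, e.g. with withExtendedLifetime:
```
final class Resource {
    func use() { print("using resource") }
}

final class Observer {
    weak var target: Resource?
    func fire() { target?.use() }   // silently no-ops if target is gone
}

func run() {
    let resource = Resource()
    let observer = Observer()
    observer.target = resource
    // Under shortened lifetimes, 'resource' may be destroyed here -- its
    // last strong use was the assignment above -- even though its lexical
    // scope hasn't ended, so fire() can observe a nil weak reference.
    observer.fire()
    // Robust version: withExtendedLifetime(resource) { observer.fire() }
}
```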
We now have enough information from projects that have been tested
with copy propagation that we can both understand common patterns and
identify some specific APIs that may cause trouble. We know what API
annotations the compiler will need for helpful warnings and can begin
adding those annotations.
Disabling copy propagation now is only a temporary deferral; we will
still need to bring it back by default. By then, however, we should
have:
- LLDB and runtime support for debugging deinitialized objects
- A variant of lifetime shortening that can run in Debug builds to
catch problems before code ships
- Static compiler warnings for likely invalid lifetime assumptions
- Source annotations that allow those warnings to protect programmers
against existing dangerous APIs
In the meantime...
Projects can experiment with the behavior and gradually migrate.
Copy propagation will automatically be enabled in -enable-ossa-modules
mode. It is important to work toward a single performance
target. Supporting full OSSA and improving ARC performance without
copy propagation would be prohibitively complicated.
rdar://76438920 (Temporarily disable -O copy propagation by default)
Previously, because partial apply forwarders for async functions were
not themselves fully-fledged async functions, they were not able to
handle dynamic functions. Specifically, the reason was that it was not
possible to produce an async function pointer for the partial apply
forwarder because the size to be used was not knowable.
Thanks to https://github.com/apple/swift/pull/36700, that cause has been
eliminated. With it, partial apply forwarders are fully-fledged async
functions and in particular have their own async function pointers.
Consequently, it is again possible for these partial apply forwarders to
handle non-constant function pointers.
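For illustration (hypothetical source), the kind of code that produces such a forwarder:
```
// Partially applying an async function at the source level: the closure
// captures 'f' and 'value', and lowers to a partial_apply whose forwarder
// is now a fully-fledged async function with its own async function pointer.
func makeTask(_ f: @escaping (Int) async -> Int, _ value: Int) -> () async -> Int {
    return { await f(value) }
}
```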
Here, that behavior is restored, by way of reverting part of
ee63777332 while preserving the ABI it
introduced.
rdar://76122027
This feature degrades the debugging experience and causes a large
number of unit test failures.
These were both known issues, but our planned debugger improvements
won't be ready for a while. Until then, we'll leave the feature under
a compiler option, and developers can adopt it at their own pace
when they are ready to fix lifetime issues in their code.
rdar://76177280 (Disable mandatory-copy-propagation (-Onone only))
The comment in LowerHopToActor explains the design here.
We want SILGen to emit hops to actors, ignoring executors,
because it's easier to fully optimize in a world where deriving
an executor is a non-trivial operation. But we also want something
prior to IRGen to lower the executor derivation because there are
useful static optimizations we can do, such as doing the derivation
exactly once on a dominance path and strength-reducing the derivation
(e.g. exploiting static knowledge that an actor is a default actor).
There are probably phase-ordering problems with doing this so late,
but hopefully they're restricted to situations like actors that
share an executor. We'll want to optimize that eventually, but
in the meantime, this unblocks the executor work.
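A sketch of the lowering in simplified SIL (the derivation builtin shown is illustrative, not the exact emitted form):
```
// Before LowerHopToActor: SILGen hops on the actor value itself.
hop_to_executor %actor : $MyActor

// After lowering (sketch): the derivation becomes an explicit step that
// can be done once per dominance path and strength-reduced.
%exec = builtin "buildDefaultActorExecutorRef"<MyActor>(%actor : $MyActor) : $Builtin.Executor
hop_to_executor %exec : $Builtin.Executor
```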
Previously, thick async functions were represented sometimes as a pair
of (AsyncFunctionPointer, nullptr) -- e.g., when the thick function was produced
via a thin_to_thick_function -- and sometimes as a pair of
(FunctionPointer, ThickContext) -- when the thick function was produced by
a partial_apply -- with the size stored in the slot of the ThickContext.
That optimized for the wrong case: partial applies of dynamic async
functions; in that case, there is no appropriate AsyncFunctionPointer to
form when lowering the partial_apply instruction. The far more common
case is to know exactly which function is being partially applied. In
that case, we can form the appropriate AsyncFunctionPointer.
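Schematically, in simplified, illustrative SIL:
```
// Illustrative: the callee of the partial_apply is statically known,
// so IRGen can form (AsyncFunctionPointer of @knownAsyncFn, context)
// rather than hiding the size inside the context.
%f  = function_ref @knownAsyncFn : $@convention(thin) @async (Int, @guaranteed Ctx) -> ()
%pa = partial_apply [callee_guaranteed] %f(%ctx) : $@convention(thin) @async (Int, @guaranteed Ctx) -> ()
```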
Furthermore, the previous representation made calling a thick function
more complex: it was always necessary to check whether the context was
in fact null and then proceed along two different paths depending on the result.
Here, that behavior is corrected by creating a thunk in a mandatory
IRGen SIL pass in the case that the function that is being partially
applied is dynamic. That new thunk is then partially applied in place
of the original partial_apply of the dynamic function.
This shortens -Onone lifetimes.
To eliminate ARC traffic, the optimizer reorders object
destruction. This changes observable program behavior. If a custom
deinitializer produces side effects, code may observe those side
effects earlier after optimization. Similarly, code that dereferences
a weak reference may observe a 'nil' reference after optimization,
while the unoptimized code observed a valid object.
Developers have overwhelmingly requested that object lifetimes have
similar behavior in -Onone and -O builds in order to find and diagnose
program bugs involving weak references and other lifetime assumptions.
Enabling copy propagation at -Onone is simply a matter of flipping
a switch. -Onone runtime and code size will improve. By design, copy
propagation has no direct effect on compile time. It will indirectly
improve optimized compile times, but in debug builds, it simply isn't
a factor.
To support debugging, a "poison" flag was (in prior commits) added to
new destroy_value instructions generated by copy propagation. When
OwnershipModelEliminator lowers destroy_value [poison] it will
generate new debug_value instructions with a “poison” flag.
These additional poison stores to the stack could increase both code
size and -Onone runtime.
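Schematically (illustrative SIL; exact printed syntax aside):
```
// Emitted by copy propagation at the shortened end of a variable's lifetime:
destroy_value [poison] %obj : $Klass

// Lowered by OwnershipModelEliminator into a poison debug_value plus release:
debug_value [poison] %obj : $Klass, let, name "obj"  // store sentinel for debugger
strong_release %obj : $Klass
```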
rdar://75012368 (-Onone compiler support for early object deinitialization with sentinel dead references)
This bleeds into the implementation, where "guaranteed" is used
everywhere to talk about optimization of guaranteed values. We need to
use "mandatory" to indicate we're talking about the pass pipeline.
It is currently disabled so this commit is NFC.
MandatoryCopyPropagation canonicalizes all OSSA lifetimes with
either CopyValue or DestroyValue operations, while regular
CopyPropagation only canonicalizes lifetimes that involve copies. This ensures
that more lifetime program bugs are found in debug builds. Eventually,
regular CopyPropagation will also canonicalize all lifetimes, but for
now, we don't want to expose optimized code to more behavior change
than necessary.
Add frontend flags for developers to easily control copy propagation:
-enable-copy-propagation: enables whatever form of copy propagation
the current pipeline runs (mandatory-copy-propagation at -Onone,
regular copy-propagation at -O).
-disable-copy-propagation: similarly disables any form of copy
propagation in the current pipeline.
To control a specific variant of the passes, use
-Xllvm -disable-pass=mandatory-copy-propagation
or -Xllvm -disable-pass=copy-propagation instead.
The meaning of these flags will stay the same as we adjust the
defaults. Soon mandatory-copy-propagation will be enabled by
default. There are two reasons to do this, both related to predictable
behavior across Debug and Release builds.
1. Shortening object lifetimes can cause observable changes in program
behavior in the presence of weak/unowned references and
deinitializer side effects.
2. Programmers need to know reliably whether a given code pattern will
copy the storage for copy-on-write types (Array, Set). Eliminating
the "unexpected" copies the same way at -Onone and -O both makes
debugging tractable and provides assurance that the code isn't
relying on the luck of the optimizer in a particular compiler
release.
The DiagnoseLifetimeIssuesPass pass prints a warning if an object is stored to a weak property (or is weakly captured) and destroyed before the property (or captured reference) is ever used again.
This can happen if the programmer relies on the lexical scope to keep an object alive, but copy-propagation can shrink the object's lifetime to its last use.
For example:
func test() {
    let k = Klass()
    // k is deallocated immediately after the closure capture (a store_weak).
    functionWithClosure({ [weak k] in
        // crash!
        k!.foo()
    })
}
Unfortunately this pass can only catch simple cases, but it's better than nothing.
rdar://73910632
There is some sort of ASAN issue that this exposes on Linux, so I am going to do
this on Darwin only and then debug the Linux issue using ASAN over the weekend/next
week.
This eliminates some regressions by removing a phase-ordering issue between
ARCSequenceOpts and inlining involving read-only functions whose read-onlyness
is lost after inlining.
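For context, a hypothetical example of the pattern:
```
// Hypothetical: while 'lookup' is an opaque call, the readonly effect lets
// ARC optimization move retain/release operations across it. Once the body
// is inlined, that function-level summary is gone, so running the relevant
// passes in the wrong order loses the optimization.
@_effects(readonly)
func lookup(_ table: [Int: Int], _ key: Int) -> Int {
    return table[key] ?? 0
}
```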
The main change is to rename DeadFunctionElimination -> DeadFunctionAndGlobalElimination, because the pass is now also doing dead-global elimination.
A second change is to remove the FunctionLivenessComputation base class. It’s not used anywhere else.
-enable-subst-sil-function-types-for-function-values
-enable-large-loadable-types
These defaulted to on, and there were no corresponding flags for
turning them off, so the flags had no effect.
For now simply run the pass before SemanticARCOpts. This will probably
be called as a utility from within SemanticARCOpts so it can be
iteratively applied after other ARC-related transformations.
It's against the principles of pass design to check the driver mode
within the pass. A pass always needs to do the same thing regardless
of where it runs in the pass pipeline. It also needs to be possible to
test passes in isolation.
Adds a new flag "-experimental-skip-all-function-bodies" that skips
typechecking and SIL generation for all function bodies (where
possible).
`didSet` functions are still typechecked and have SIL generated as their
body is checked for the `oldValue` parameter, but are not serialized.
Parsing will generally be skipped as well, but this isn't necessarily
the case, since other flags (e.g. "-verify-syntax-tree") may force delayed
parsing off.
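For example, a `didSet` such as the following still gets typechecked, since whether the body references `oldValue` affects how the accessor is emitted (hypothetical snippet):
```
struct Counter {
    var value: Int = 0 {
        didSet {
            // Referencing 'oldValue' forces the accessor to receive the old
            // value; bodies that never use it can avoid fetching it.
            print("changed from \(oldValue) to \(value)")
        }
    }
}
```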
* Redundant hop_to_executor elimination: if a hop_to_executor is dominated by another hop_to_executor with the same operand, it is eliminated:
hop_to_executor %a
... // no suspension points
hop_to_executor %a // can be eliminated
* Dead hop_to_executor elimination: if a hop_to_executor is not followed by any code that requires running on its actor's executor, it is eliminated:
hop_to_executor %a
... // no instructions which require running on %a
return
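As a rough source-level illustration of the first case (hypothetical; exactly where SILGen emits hops varies):
```
@MainActor func render(_ x: Int) {}
func compute() -> Int { 42 }

@MainActor
func update() async {
    // SILGen hops to the MainActor executor on entry.
    let x = compute()     // synchronous call: no suspension point
    // A conservatively emitted hop before the isolated call below would be
    // dominated by the entry hop with the same operand, so it is removable.
    render(x)
}
```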
rdar://problem/70304809
Add a diagnostic to check for improperly nested '@_semantics' functions.
Add a missing @_semantics("array.init") in ArraySlice, found by the
diagnostic.
Distinguish between array.init and array.init.empty.
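For illustration, hypothetical annotated wrappers showing the two tags (the real annotations live on the stdlib initializers):
```
// Allocates storage for 'count' elements: the optimizer models its cost.
@_semantics("array.init")
func makeArray(repeating value: Int, count: Int) -> [Int] {
    return Array(repeating: value, count: count)
}

// Creates an empty array: a distinct, cheaper case for the optimizer.
@_semantics("array.init.empty")
func makeEmptyArray() -> [Int] {
    return []
}
```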
Categorize the types of semantic functions by how they affect the
inliner and pass pipeline, and centralize this logic in
PerformanceInlinerUtils. The ultimate goal is to prevent inlining of
"Fundamental" @_semantics calls and @_effects calls until the late
pipeline where we can safely discard semantics. However, that requires
significant pipeline changes.
In the meantime, this change prevents the situation from getting worse
and makes the intention clear. However, it has no significant effect
on the pass pipeline and inliner.
This attribute allows defining a pre-specialized entry point of a
generic function in a library.
The following definition provides a pre-specialized entry point for
`genericFunc(_:)` for the parameter type `Int` that clients of the
library can call.
```
@_specialize(exported: true, where T == Int)
public func genericFunc<T>(_ t: T) { ... }
```
Pre-specializations of internal `@inlinable` functions are allowed.
```
@usableFromInline
internal struct GenericThing<T> {
    @_specialize(exported: true, where T == Int)
    @inlinable
    internal func genericMethod(_ t: T) {
    }
}
```
There is syntax to pre-specialize a method from a different module.
```
import ModuleDefiningGenericFunc
@_specialize(exported: true, target: genericFunc(_:), where T == Double)
func prespecialize_genericFunc<T>(_ t: T) { fatalError("dont call") }
```
Specially marked extensions allow for pre-specialization of internal
methods across module boundaries (respecting `@inlinable` and
`@usableFromInline`).
```
import ModuleDefiningGenericThing
public struct Something {}
@_specializeExtension
extension GenericThing {
    @_specialize(exported: true, target: genericMethod(_:), where T == Something)
    func prespecialize_genericMethod(_ t: T) { fatalError("dont call") }
}
```
rdar://64993425
This can compensate for the performance regression from the more conservative handling of function calls in TempRValueOpt (see the previous commit).
The pass runs after the inlining passes and can therefore optimize some cases that are not possible before inlining.