The differentiation transform does the following:
- Canonicalizes differentiability witnesses by filling in missing derivative
function entries.
- Canonicalizes `differentiable_function` instructions by filling in missing
derivative function operands.
- If necessary, performs automatic differentiation: generating derivative
functions for original functions.
- When encountering non-differentiable code, produces a diagnostic and errors out.
Partially resolves TF-1211: add the main canonicalization loop.
To incrementally stage changes, derivative functions are currently created
with placeholder bodies that fatal-error with a helpful message.
Derivative emitters will be upstreamed separately.
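A hedged sketch of the kind of Swift code this transform serves (the bare @differentiable spelling and the _Differentiation module are assumptions here; since derivative emitters are not yet upstreamed, this is illustrative only):

import _Differentiation

@differentiable
func square(_ x: Float) -> Float {
  return x * x
}

// Once derivative emitters land, the canonicalized witness lets calls like
// gradient(at: 3, of: square) produce 6; today the synthesized derivative
// body would instead hit the staged-in fatal error.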
RedundantPhiElimination eliminates block phi arguments that have the same value as other arguments of the same block.
This also works with cycles, like two equivalent loop induction variables. Such patterns are generated, e.g., when using the stdlib's enumerated() on Array.
preheader:
br header(%initval, %initval)
header(%phi1, %phi2):
%next1 = builtin "add" (%phi1, %one)
%next2 = builtin "add" (%phi2, %one)
cond_br %loopcond, header(%next1, %next2), exit
exit:
is replaced with
preheader:
br header(%initval)
header(%phi1):
%next1 = builtin "add" (%phi1, %one)
%next2 = builtin "add" (%phi1, %one) // dead: will be cleaned up later
cond_br %loopcond, header(%next1), exit
exit:
Any remaining dead or "trivially" equivalent instructions will then be cleaned up by DCE and CSE, respectively.
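For reference, a hedged Swift-level sketch of source that can produce such a pattern (the function is illustrative): enumerated() maintains its own counter alongside the position the loop already tracks, which shows up in SIL as two equivalent phi arguments.

func sumWithIndices(_ a: [Int]) -> Int {
  var total = 0
  for (i, element) in a.enumerated() {
    // i and the iterator's internal position advance in lockstep, producing
    // two loop phi arguments that always hold the same value.
    total += i + element
  }
  return total
}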
rdar://problem/33438123
We need this anyway for -Onone, and I want to do some experiments with running
this very early so I can expose more of the stdlib (modulo inlining) to the new
ownership optimizing passes.
I also changed how the inliner handles inlining around OSSA: it now checks
early that, if the caller is in OSSA, we only inline when all of the callees
the caller calls are also in OSSA. The intention is to avoid weird swings in
code size/performance caused by the inliner heuristic's calculation being
artificially skewed when some callees are unavailable for inlining (due to this
difference) while others are already available.
Add ExecuteSILPipelineRequest which executes a
pipeline plan on a given SIL (and possibly IRGen)
module. This serves as a top-level request for
the SILOptimizer that we'll be able to hang
dependencies off.
Rather than registering individual IRGen passes
when we want to execute them, store function
pointers to all the pass constructors on the
ASTContext. This will make it easier to requestify
the execution of pass pipelines.
calls over arrays created from array literals. This enables further optimizing
the output of the OSLogOptimization pass, and results in highly compact,
optimized IR for calls to the new os log API.
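For context, a hedged sketch of a call to the new os log API whose lowered interpolation (built on arrays from array literals) benefits from this folding; the subsystem/category strings and the errorCode parameter are illustrative:

import os

let log = Logger(subsystem: "com.example.app", category: "networking")

func report(_ errorCode: Int) {
  // The interpolation below compiles into os log helper calls over array
  // literals; constant folding them produces compact, optimized IR.
  log.error("request failed with code \(errorCode)")
}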
<rdar://58928427>
... including all SIL functions which are transitively referenced from such witness tables.
After the last devirtualizer run, witness tables are no longer needed in the optimizer.
We can delete witness tables with an available-externally linkage. IRGen does not emit such witness tables anyway.
This can save a little bit of compile time, because it reduces the amount of SIL at the end of the optimizer pipeline.
It also reduces the size of the SIL output after the optimizer, which makes debugging the SIL output easier.
Otherwise it can happen that, e.g., specialization runs between CrossModuleSerializationSetup and serialization, with the result that an inlinable function references a shared function (which doesn't have public linkage).
The solution is to move serialization right after CrossModuleSerializationSetup, but to only do that if cross-module optimization is enabled (moving serialization in general would be a disruptive change).
This is a first version of cross module optimization (CMO).
The basic idea for CMO is to use the existing library evolution compiler features, but in an automated way. A new SIL module pass "annotates" functions and types with @inlinable and @usableFromInline. This results in functions being serialized into the swiftmodule file and thus available for optimizations in client modules.
The annotation is done with a worklist-algorithm, starting from public functions and continuing with entities which are used from already selected functions. A heuristic performs a preselection on which functions to consider - currently just generic functions are selected.
The serializer then writes annotated functions (including function bodies) into the swiftmodule file of the compiled module. Client modules are able to de-serialize such functions from their imported modules and use them for optimizations, like generic specialization.
The optimization is gated by a new compiler option -cross-module-optimization (also available in the swift driver).
By default this option is off. Without turning the option on, this change is (almost) NFC.
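As a hedged illustration (the function is made up), the kind of declaration the current heuristic selects:

// In the compiled library, a public generic function like this would be
// annotated as if it were @inlinable/@usableFromInline and serialized into
// the swiftmodule:
public func largest<T: Comparable>(_ values: [T]) -> T? {
  return values.max()
}

// A client built with -cross-module-optimization can then deserialize the
// body and generically specialize it, e.g. for [Int].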
rdar://problem/22591518
These are separate, mostly unrelated passes. Putting them in their own files
makes it easier to read the code and understand how to control the passes, and
makes it possible to independently trace and debug them.
This pipeline just runs the Serialize SIL pass. The reason I am adding this is
that currently, if one passes -disable-sil-perf-optzns or messes with
-sil-opt-pass-count, one can cause serialization to not occur, making it
difficult to bisect or toggle optimizations.
This flag, currently staged in as `-experimental-skip-non-inlinable-function-bodies`, will cause the typechecker to skip typechecking bodies of functions that will not be serialized in the resulting `.swiftmodule`. This patch also includes a SIL verifier that ensures that we don’t accidentally include a body that we should have skipped.
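A hedged sketch of the distinction (names illustrative):

@inlinable public func serialized() -> Int {
  // This body is emitted into the .swiftmodule, so it must still be type checked.
  return 1
}

public func notSerialized() -> Int {
  // This body never reaches the .swiftmodule; with the flag, type checking it is skipped.
  return 2
}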
There is still some work left to make sure the emitted .swiftmodule is exactly the same as what’s emitted without the flag, which is what’s causing the benchmark noise above. I’ll be committing follow-up patches to address those, but for now I’m going to land the implementation behind a flag.
of Swift code snippets. Add a new test: constant_evaluable_subset_test.swift
that tests Swift code snippets that are expected to be constant evaluable and
those that are not.
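A hedged sketch of the two kinds of snippets such a test might contain (the @_semantics spelling used to mark test entries is an assumption):

@_semantics("constant_evaluable")
func secondsPerDay() -> Int {
  return 60 * 60 * 24                  // straight-line integer arithmetic: evaluable
}

@_semantics("constant_evaluable")
func notEvaluable(_ upper: Int) -> Int {
  return Int.random(in: 0...upper)     // depends on runtime state: expected to be rejected
}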
Minimal commit to get the mandatory combiner rolling. The intent of the
mandatory combiner is to do basic transformations after the diagnostics
passes have run at the beginning of both the Onone and the performance
pipelines. For now, it simply visits reachable instructions and does
no work. In a subsequent commit, apply instructions will be processed
and replaced if they are calls to partial applies defined in the same
function. At that point an attempt will be made to eliminate the
partial apply altogether.
For now, the pass does no work and is not part of any pipeline.
Minimal commit to get the mandatory combiner rolling. At the moment,
the mandatory combiner only processes partial apply instructions,
attempting both to replace their applies with applies of their
underlying function refs and also to eliminate them altogether. That
processing is only done if all the arguments to the apply and to the
partial apply are trivial.
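A hedged Swift-level sketch of the targeted pattern (at the SIL level the closure below is a partial_apply of a function_ref that is then applied, and every capture is trivial):

func add(_ x: Int, _ y: Int) -> Int { return x + y }

func caller() -> Int {
  let addOne: (Int) -> Int = { add(1, $0) }  // partial application capturing only the trivial value 1
  return addOne(2)                           // this apply can be rewritten to call add directly,
                                             // after which the partial application is dead
}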
For now, the pass is not part of any pipeline.
DestroyHoisting moves destroys of memory locations up the control flow as far as possible.
Besides destroy_addr, "store [assign]" is also considered a destroy, because it is equivalent to a destroy_addr plus a "store [init]".
The main purpose of this optimization is to minimize copy-on-write operations for arrays, etc., especially if such COW containers are used as enum payloads and modified in place. E.g.
switch e {
case .A(var arr):
arr.append(x)
self = .A(arr)
...
In such a case, DestroyHoisting can move the destroy of the self-assignment up before the switch and thus leave the array buffer uniquely referenced at the time of the append.
When we have ownership SIL throughout the pass pipeline this optimization will replace the current destroy hoisting optimization in CopyForwarding.
For now, this optimization only runs in the mandatory pipeline (but not for -Onone) where we already have ownership SIL.
SR-10605
rdar://problem/50463362
(also referred to as flow-sensitive mode) so that the evaluator
can be used by clients to constant evaluate instructions in a
SILFunction body one by one following the flow of control.
The new pass is based on existing asserts in DiagnoseStaticExclusivity.
They were compiled out in release builds and only checked for captures of
inout parameters. This patch converts the assertions into diagnostics and
adds checks for captures of non-escaping function values.
Unlike the Sema-based checks that this replaces, the new code handles
transitive captures from recursive local functions, which means certain
invalid code that used to compile will now be rejected with an error.
The new analysis also looks at the ultimate usages of a local function
instead of just assuming all local functions are escaping, which fixes
issues where the compiler would reject valid code.
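A hedged sketch of previously-compiling but invalid code that the new check rejects (takesEscaping is a made-up stand-in for any escaping use):

var stored: [() -> Void] = []

func takesEscaping(_ f: @escaping () -> Void) {
  stored.append(f)
}

func increment(_ x: inout Int) {
  func inner() { x += 1 }     // local function capturing the inout parameter
  func outer() { inner() }    // transitive capture through another local function
  takesEscaping(outer)        // error: the capture of x is not allowed to escape
}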
Fixes a bunch of related issues, including:
- <rdar://problem/29403178>
- <https://bugs.swift.org/browse/SR-8546> / <rdar://problem/43355341>
- <https://bugs.swift.org/browse/SR-9043> / <rdar://problem/45511834>
yields in generalized accessors: _read and _modify, which are
yield-once coroutines. This pass is based on the existing SIL verifier
checks but diagnoses only those errors that can be introduced by programmers
when using yields.
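A hedged sketch of the kind of mistake this diagnoses: a _modify accessor with a control-flow path that never yields (the type is illustrative):

struct Box {
  private var storage = 0
  var isWritable = true

  var value: Int {
    _read { yield storage }
    _modify {
      if isWritable {
        yield &storage
      }
      // error: on the path where isWritable is false, the coroutine never yields
    }
  }
}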
<rdar://43578476>
This normalizes the creation of pass pipelines by ensuring that all pass
pipelines take SILOptions instead of only some. It also means we no longer
need to propagate options through various pipeline creation helpers.
This will let me strip ownership from non-transparent functions at the beginning
of the perf pipeline. Then, after we serialize, I will run OME on the transparent
functions. Otherwise, we cannot perform mandatory inlining successfully, since
we cannot inline OSSA into non-OSSA functions.
I discovered while updating PMO for ownership that for ~5 years there has been a
bug where we were treating copy_addr of trivial values like an "Assign" (in PMO
terminology) of a non-trivial value and thus stopping allocation
elimination. When I fixed this I discovered that this caused us to no longer
emit diagnostics in a predictable way. Specifically, consider the following
swift snippet:
var _: UInt = (-1) >> 0
Today, we emit a diagnostic that -1 cannot be put into a UInt. This occurs
since even though the underlying allocation is only stored into, the copy_addr
assign keeps it alive, causing the diagnostics pass to see the conversion. With
my fix though, we see that we are only storing into the allocation, causing the
allocation to be eliminated before the constant propagation diagnostic pass
runs, causing the diagnostic to no longer be emitted.
We should truly not be performing this type of DCE before we emit such
diagnostics. So in this commit, I split the pass into two parts:
1. A load promotion pass that performs the SSA formation needed for SSA-based
diagnostics to actually work.
2. An allocation elimination pass that runs /after/ the SSA-based diagnostics.
This should be NFC, since the SSA-based constant propagation diagnostics do not
create memory operations, so the output should be the same.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
General case:
begin_access A
...
strong_release / release_value / destroy
end_access
The release instruction can be sunk below the end_access instruction.
This extends the lifetime of the released value but might allow us to
mark the access scope as no_nested_conflict.
General case:
—
begin_access A (may or may not have no_nested_conflict)
load/store
end_access
apply // may have a scoped access that conflicts with A
begin_access A [no_nested_conflict]
load/store
end_access A
—
The second access scope does not need to be emitted.
NOTE: KeyPath access must be identified at the top-level, non-inlinable stdlib entry point.
As such, the stdlib entry point is annotated with a new @_semantics attribute that is equivalent to @inline(never).
Introduce an "early redundant load elimination", which does not optimize loads from arrays.
Later array optimizations, like ABCOpt, get confused if an array load in a loop is converted to a pattern with a phi argument.
This problem was introduced with accessors.
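A hedged Swift-level sketch of the pattern this ordering protects (the type is illustrative): repeated loads of an array stored property inside a loop, which ABCOpt still needs to recognize as array accesses.

final class Accumulator {
  var values: [Int] = []

  func total() -> Int {
    var sum = 0
    for i in 0 ..< values.count {
      // Each iteration loads `values` from memory; if early RLE rewrote those
      // loads into a phi-carried value, ABCOpt could no longer hoist the
      // bounds checks for values[i], so array loads are excluded here.
      sum += values[i]
    }
    return sum
  }
}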
rdar://problem/44184763
This is a simple "utility" pass that canonicalizes SSA SILValues with
respect to copies and destroys. It is a self-contained, provably
complete pass that eliminates spurious copy_value instructions from
scalar SSA SILValues. It fundamentally depends on ownership SIL, but
otherwise can be run efficiently after any other pass. It separates
the pure problem of handling scalar SSA values from the more important
and complex problems:
- Promoting variables to SSA form (PredictableMemOps and Mem2Reg
partially do this).
- Optimizing copies within "SIL borrow" scopes (another mandatory pass
will be introduced to do this).
- Composing and decomposing aggregates (SROA handles some of this).
- Coalescing phis (a BlockArgumentOptimizer will be introduced as part
of AddressLowering).
- Removing unnecessary retain/release when nothing within its scope
may release the same object (ARC Code Motion does some of this).
Note that removing SSA copies was more obviously necessary before the
migration to +0 argument convention.
To do so this commit does a few different things:
1. I changed SILOptFunctionBuilder to notify the pass manager's logging
functionality when new functions are added to the module and to notify analyses
as well. NOTE: This deliberately does not put the new function on the pass
manager's worklist, since we do not want to accidentally introduce a large
amount of re-optimization. Such a thing should be explicit.
2. I eliminated SILModuleTransform::notifyAddFunction. This just performed the
operations from 1. Now that SILOptFunctionBuilder performs this operation for
us, it is not needed.
3. I changed SILFunctionTransform::notifyAddFunction to just add the function to
the passmanager worklist. It does not need to notify the pass manager's logging
or analyses that a new function was added to the module since
SILOptFunctionBuilder now performs that operation. Given its reduced
functionality, I changed the name to addFunctionToPassManagerWorklist(...). The
name is a little long/verbose, but this is a feature since one should think
before getting the pass manager to rerun transforms on a function. Also, giving
it a longer name calls out the operation in the code visually, giving this
operation more prominence when reading code. NOTE: I did the rename using
Xcode's refactoring functionality!
rdar://42301529
I am going to add the code that does the notifications in a bit. I tried to pass
down the builder instead of the pass manager. I also tried not to change the
formatting.
rdar://42301529
This name makes it clear that the function has not yet been deleted, and also
contrasts with the past tense used in the API notifyAddedOrModifiedFunction,
which indicates that the function in question has already been added or modified.
The name notifyAddFunction is actively harmful since the pass manager uses this
entrypoint to notify analyses of added *OR* modified functions. It is up to
the analysis to distinguish between these cases.
I am not vouching for the design, just trying to make names match the
current behavior.