These do not specifically have to do with PartitionUtils... they are really
logging options for the whole infrastructure, so it makes sense to have them in
a different file.
I am going to reuse this for TransferNonSendable. In the process I made a few
changes to make it more amenable to both use cases and used the current set of
tests that we have for noncopyable types to validate that I didn't break
anything.
One needs to pass in the explicit flag to enable this as well as
-debug-flag=send-non-sendable. This makes it easier to debug the effect of
applying specific partition ops.
The new utility, to be run as part of copy propagation, hoists
destroy_values of owned lexical values up to deinit barriers. It is
heavily based on the rewritten ShrinkBorrowScope.
The new utility folds patterns like
    TOP:
      // %borrowee is some owned value
      %lifetime = begin_borrow %borrowee
    BOTTOM:
      // %copy is some transitive copy of %borrowee
      apply %f(%copy)
      end_borrow %lifetime
      destroy_value %borrowee
into
    TOP:
      %move = move_value [lexical] %borrowee
      %lifetime = begin_borrow [lexical] %move
    BOTTOM:
      end_borrow %lifetime
      apply %f(%move)
It is intended to be run after ShrinkBorrowScope moves the end_borrow up
to just before a relevant apply and after CanonicalizeOSSALifetime moves
destroy_value instructions up to just after their last guaranteed use,
at which point these patterns will exist.
During copy propagation (for which -enable-copy-propagation must still
be passed), also try to shrink borrow scopes by hoisting end_borrows
using the newly added ShrinkBorrowScope utility.
Allow end_borrow instructions to be hoisted over instructions that are
not deinit barriers for the value which is borrowed. Deinit barriers
include uses of the value, loads of memory, loads of weak references
that may be zeroed during deinit, and "synchronization points".
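The hoisting rule above can be pictured with a small toy model. This is only a sketch: ToyInst, its isDeinitBarrier flag, and hoistedPosition are hypothetical names invented for illustration and are not part of ShrinkBorrowScope. The real utility classifies barriers by inspecting SIL instructions and works across blocks; here the classification is assumed to be given and only a single block is considered.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; not the compiler's SILInstruction.
struct ToyInst {
  std::string text;
  // Assumed to be precomputed: true for a use of the borrowed value, a load
  // of memory, a load of a weak reference, or a synchronization point.
  bool isDeinitBarrier;
};

// Returns the index at which an end_borrow currently located at
// `endBorrowIdx` could be re-inserted. It slides upward past every
// instruction that is not a deinit barrier and stops directly below the
// first barrier it meets (or at the top of the block).
std::size_t hoistedPosition(const std::vector<ToyInst> &block,
                            std::size_t endBorrowIdx) {
  std::size_t pos = endBorrowIdx;
  while (pos > 0 && !block[pos - 1].isDeinitBarrier)
    --pos;
  return pos;
}
```

For a block whose last barrier is at index 1 and whose end_borrow sits at index 4, the sketch reports index 2, i.e. the end_borrow lands immediately after the barrier.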
rdar://79149830
We need to be able to inject a call to a distributed actor's
transport.actorReady, passing the actor instance to it,
during definite initialization. This means that its dependence
on SILGenFunction must be broken, hence this refactoring as
a SILOptimizer utility.
This rewrites functionality that was mostly disabled but is now ready
to be enabled.
Allow lifetime canonicalization of owned values and function arguments
as a simple stand-alone utility. This is now being called from within
SILCombine, so we should only do the kind of canonicalization that
makes sense in that context.
Canonicalizing other borrow scopes should *not* be invoked as a
single-value cleanup because it affects other lifetimes outside the
borrow scope boundary. It is a somewhat complicated process that
hoists and sinks forwarding instructions and can generate surrounding
compensation code. The copy propagation pass knows how to post-process
the related lifetimes in just the right order. So borrow scope
rewriting should only be done in the copy propagation pass.
Similarly, only do simple canonicalization of owned values and
function arguments at -Onone.
The feature to canonicalize borrow scopes is now ready to be
enabled (-canonical-ossa-rewrite-borrows), but flipping the switch
should be a separate commit. So most of the functionality that was
affected is not exposed by this PR.
Changes:
Split canonicalization of owned lifetimes vs. borrowed lifetimes into
separate utilities. The owned lifetime utility is now back to being
the simple utility that I originally envisioned. So not much happened
to it other than removing complexity.
We now have a separate entry point for finding the starting point for
rewriting borrow scopes:
CanonicalizeBorrowScope::getCanonicalBorrowedDef.
We now have a utility that defines forwarding instructions that we can
treat consistently as part of a guaranteed lifetime,
CanonicalizeBorrowScope::isRewritableOSSAForward.
We now have a utility that defines the uses of a borrowed value that
are considered part of its lifetime,
CanonicalizeBorrowScope::visitBorrowScopeUses. This single utility is
used to implement three different parts of the algorithm:
1. Find any uses of the borrowed value that need to be propagated
outside the borrow scope
2. RewriteInnerBorrowUses for SILFunction arguments and borrow scopes
with no outer uses.
3. RewriteOuterBorrowUses for borrow scopes with outer uses. Handling
these involves creating new copies outside the borrow scope and
hoisting forwarding instructions.
The end result is that a lot of borrow scopes can be eliminated and
owned values can be forwarded to destructures, reducing copies and
destroys.
If we stop generating borrow scopes for all interior pointers, then
we'll need to design a comparable optimization that works on
"implicit" borrow scopes:
    %ownedDef = ...
    %element = struct_extract %ownedDef
    %copy = copy_value %element
    apply(@guaranteed %element)
    apply(@owned %copy)
    destroy %ownedDef
Should be:
    %ownedDef = ...
    %borrowedElement = destructure_struct @guaranteed %ownedDef
    apply(@guaranteed %borrowedElement)
    %ownedElement = destructure_struct %ownedDef
    apply(@owned %ownedElement)
- `Mangle::ASTMangler::mangleAutoDiffDerivativeFunction()` and `Mangle::ASTMangler::mangleAutoDiffLinearMap()` accept original function declarations and return a mangled name for a derivative function or linear map. This is called during SILGen and TBDGen.
- `Mangle::DifferentiationMangler` handles differentiation function mangling in the differentiation transform. This part is necessary because we need to perform demangling on the original function and remangle it as part of a differentiation function mangling tree in order to get the correct substitutions in the mangled derivative generic signature.
A mangled differentiation function name includes:
- The original function.
- The differentiation function kind.
- The parameter indices for differentiation.
- The result indices for differentiation.
- The derivative generic signature.
Canonicalizing OSSA provably minimizes the number of retains and
releases within the boundaries of that lifetime. This eliminates the
need for ad-hoc optimization of OSSA copies.
This initial implementation only canonicalizes owned values, but
canonicalizing guaranteed values is a simple extension.
This was originally part of the CopyPropagation prototype years
ago. Now OSSA is specified completely enough that it can be turned
into a simple utility instead.
CanonicalOSSALifetime uses PrunedLiveness to find the extended live
range and identify the consumes on the boundary. All other consumes
need their own copy. No other copies are needed.
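As a rough illustration of the "boundary consume" rule, here is a sketch restricted to a single basic block, where the liveness boundary is simply the last use. UseKind, RewritePlan, and planSingleBlock are hypothetical names, not the utility's API. The real implementation derives the boundary from PrunedLiveness across the whole CFG.

```cpp
#include <cstddef>
#include <vector>

// Toy classification of a use of one owned value.
enum class UseKind { NonConsuming, Consuming };

struct RewritePlan {
  std::size_t copiesNeeded = 0; // consumes that are not on the boundary
  bool needsDestroy = false;    // true when the last use does not consume
};

// `uses` lists the uses of a single owned value in program order within one
// block. Only the final use can be the boundary consume; every earlier
// consume needs its own copy. If the final use does not consume the value,
// a destroy has to follow it.
RewritePlan planSingleBlock(const std::vector<UseKind> &uses) {
  RewritePlan plan;
  if (uses.empty()) {
    plan.needsDestroy = true;
    return plan;
  }
  for (std::size_t i = 0; i + 1 < uses.size(); ++i)
    if (uses[i] == UseKind::Consuming)
      ++plan.copiesNeeded;
  plan.needsDestroy = (uses.back() == UseKind::NonConsuming);
  return plan;
}
```

With uses {Consuming, NonConsuming, Consuming}, the sketch asks for one copy and no destroy: the final consume ends the lifetime and only the earlier consume needs a copy.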
By running this after other transformations that affect OSSA
lifetimes, we can avoid the need to run pattern-matching optimization
in SemanticARC to recover from suboptimal patterns, which is not
robust, maintainable, or efficient.
This bare-bones utility will be the basis for
CanonicalizeOSSALifetime. It is maximally flexible and can be adopted
by any analysis that needs SSA-based liveness expressed in terms of
the live blocks. It's meant to be layered underneath various
higher-level analyses.
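To make "SSA-based liveness expressed in terms of the live blocks" concrete, the following sketch marks blocks by walking backward from each use block to the def block. It is a toy under stated assumptions: blocks are plain integers, the CFG is a hypothetical predecessor map, and the def block is assumed to dominate every use block, as valid SSA guarantees. None of these names come from PrunedLiveness itself.

```cpp
#include <map>
#include <vector>

// Toy stand-ins; these are not the compiler's types.
enum class BlockLiveness { LiveWithin, LiveOut };

// Hypothetical CFG representation: block id -> ids of its predecessors.
using PredecessorMap = std::map<int, std::vector<int>>;

// Returns the liveness of every live block. Blocks absent from the result
// are dead. A block is LiveOut when the value is live on exit, and
// LiveWithin when liveness ends somewhere inside the block.
std::map<int, BlockLiveness>
computeLiveBlocks(const PredecessorMap &preds, int defBlock,
                  const std::vector<int> &useBlocks) {
  std::map<int, BlockLiveness> live;
  live[defBlock] = BlockLiveness::LiveWithin;
  std::vector<int> worklist;
  for (int useBlock : useBlocks) {
    if (live.count(useBlock))
      continue; // already marked, including the def block itself
    live[useBlock] = BlockLiveness::LiveWithin;
    worklist.push_back(useBlock);
  }
  while (!worklist.empty()) {
    int block = worklist.back();
    worklist.pop_back();
    auto predIt = preds.find(block);
    if (predIt == preds.end())
      continue;
    for (int pred : predIt->second) {
      bool wasDead = (live.find(pred) == live.end());
      // The value must be live on exit from every predecessor.
      live[pred] = BlockLiveness::LiveOut;
      // Previously dead blocks still need their own predecessors visited.
      // The def block is never pushed, so the walk stops there.
      if (wasDead)
        worklist.push_back(pred);
    }
  }
  return live;
}
```

On a diamond CFG (0 branches to 1 and 2, both join at 3) with the def in block 0 and a single use in block 1, only blocks 0 and 1 are live: block 0 is LiveOut and block 1 is LiveWithin. Blocks 2 and 3 are pruned, which is the point of the representation.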
We could consider revamping ValueLifetimeAnalysis and layering it on
top of this. If PrunedLiveness is adopted widely enough, we can
combine it with a block numbering analysis so we can micro-optimize
the internal data structures.
This is a generic API that when ownership is enabled allows one to replace all
uses of a value with a value with a differing ownership by transforming/lifetime
extending as appropriate.
This API supports all pairings of ownership /except/ replacing a value with
OwnershipKind::None with a value without OwnershipKind::None. This is a more
complex optimization that we do not support today. As a result, we include on
our state struct a helper routine that callers can use to know if the two values
that they want to process can be handled by the algorithm.
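The supported pairings can be summarized by a one-line predicate. The sketch below is an assumption about the shape of such a check, not the helper from this patch: ToyOwnershipKind and canReplaceAllUses are hypothetical names, and the real helper lives on the state struct and may apply further conditions.

```cpp
// Toy mirror of the ownership kinds discussed above; hypothetical type.
enum class ToyOwnershipKind { None, Owned, Guaranteed, Unowned };

// Returns false only for the one unsupported pairing: replacing a value
// that has no ownership with a value that does have ownership.
bool canReplaceAllUses(ToyOwnershipKind oldValue, ToyOwnershipKind newValue) {
  return !(oldValue == ToyOwnershipKind::None &&
           newValue != ToyOwnershipKind::None);
}
```

A caller would check this predicate first and leave the instruction untouched when it returns false.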
My motivation is to use this to update InstSimplify and SILCombiner in a
less bug-prone way rather than just turning stuff off.
Noting that this transformation inserts ownership instructions, I have made sure
to test this API in two ways:
1. With Mandatory Combiner alone (to make sure it works period).
2. With Mandatory Combiner + Semantic ARC Opts to make sure that we can
eliminate the extra ownership instructions it inserts.
As one can see from the tests, the optimizer today is able to handle all of
these transforms except one conditional case where I need to eliminate a dead
phi arg. I have a separate branch that hits that today but I have exposed unsafe
behavior in ClosureLifetimeFixup that I need to fix first before I can land
that. I don't want that to stop this PR since I think the current low level ARC
optimizer may be able to help me here since this is a simple transform it does
all of the time.
This simplifies the handling of the subdirectories in the SIL and
SILOptimizer paths. Create individual libraries as object libraries
which allows the analysis of the source changes to be limited in scope.
Because these are object libraries, this has 0 overhead compared to the
previous implementation. However, string operations over the filenames
are avoided. The cost for this is that any new sub-library needs to be
added into the list rather than added with the special local function.
Move differentiation-related SILOptimizer files to
{include/swift,lib}/SILOptimizer/Differentiation/.
This reduces directory nesting and gathers files together.
The differentiation transform does the following:
- Canonicalizes differentiability witnesses by filling in missing derivative
function entries.
- Canonicalizes `differentiable_function` instructions by filling in missing
derivative function operands.
- If necessary, performs automatic differentiation: generating derivative
functions for original functions.
- When encountering non-differentiable code, produces a diagnostic and
errors out.
Partially resolves TF-1211: add the main canonicalization loop.
To incrementally stage changes, derivative functions are currently created
with empty bodies that fatal error with a nice message.
Derivative emitters will be upstreamed separately.
We have an optimization in SILCombiner that "inlines" the use of compile-time constant key paths by performing the property access directly instead of calling a runtime function (leading to huge performance gains e.g. for heavy use of @dynamicMemberLookup). However, this optimization previously only supported key paths which solely access stored properties, so computed properties, optional chaining, etc. still had to call a runtime function. This commit generalizes the optimization to support all types of key paths.
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
CanonicalizeInstruction will be a superset of
simplifyInstruction (once all the transforms are fixed for ownership
SIL). Additionally, it will also include simple SSA-based
canonicalization that requires new instruction creation. It may not
perform any optimization that interferes with diagnostics or increases
compile time.
Canonicalization replaces simplifyInstruction in SILCombine so we can
easily factor some existing SILCombine transforms into canonicalization.
Implements a constant interpreter that can deal with basic integer operations.
Summary of the features that it includes:
* builtin integer values, and builtin integer insts
* struct and tuple values, and insts that construct and extract them (necessary to use stdlib integers)
* function referencing and application (necessary to call stdlib integer functions)
* error handling data structures and logic, for telling you why your value is not evaluatable
* metatype values (not necessary for integers, but it's only a few extra lines, so I thought it would be more trouble than it's worth to put them in a separate PR)
* conditional branches (ditto)
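The "error handling data structures and logic" item can be sketched as a result type that carries either a folded integer or the reason folding failed. This is a minimal illustration only: FoldResult, constant, unknown, and foldAdd are hypothetical names and do not reflect the interpreter's actual value representation.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Either a known integer constant or an explanation of why none exists.
struct FoldResult {
  bool ok;
  std::int64_t value; // meaningful only when ok is true
  std::string reason; // meaningful only when ok is false
};

FoldResult constant(std::int64_t v) { return {true, v, ""}; }
FoldResult unknown(const std::string &why) { return {false, 0, why}; }

// Folds an addition. Failures in the operands are propagated unchanged so
// the caller learns the original cause; overflow produces a new failure.
FoldResult foldAdd(const FoldResult &lhs, const FoldResult &rhs) {
  if (!lhs.ok)
    return lhs;
  if (!rhs.ok)
    return rhs;
  const std::int64_t maxV = std::numeric_limits<std::int64_t>::max();
  const std::int64_t minV = std::numeric_limits<std::int64_t>::min();
  if ((rhs.value > 0 && lhs.value > maxV - rhs.value) ||
      (rhs.value < 0 && lhs.value < minV - rhs.value))
    return unknown("integer overflow in add");
  return constant(lhs.value + rhs.value);
}
```

Chaining calls such as foldAdd(foldAdd(a, b), c) then reports the first failure encountered, which is the behavior one wants when explaining why a value is not evaluatable.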
All this does is automate the creation of the ${DIRNAME}_SOURCES variables that we already create and allows for the author to avoid having to prefix with the directory name, i.e.:
    set(FOOBAR_SOURCES
      FooBar/Source.cpp
      PARENT_SCOPE)
=>
    silopt_register_sources(
      Source.cpp)
Much easier and cleaner to read. I put the code that implements this in the
CMakeLists.txt file just for the SILOptimizer.
This functionality is really specific to FunctionSignatureOpts. It really
doesn't make sense to have it as a utils until it becomes more general or we
need it in multiple places.
NFC.
rdar://38196046
Local.cpp was ~3k lines of which 1.5k (i.e. 1/2) was the cast optimizer. This
commit extracts the cast optimizer into its own .cpp and .h file. It is large
enough to stand on its own and allows for Local.cpp to return to being a small
group of helper functions.
I am making some changes in this area due to the change in certain function
conventions caused by the +0-normal-arg work. I am just trying to leave the area
a little cleaner than before.
This patch implements collection and dumping of statistics about SILModules, SILFunctions and memory consumption during the execution of SIL optimization pipelines.
The following statistics can be collected:
* For SILFunctions: the number of SIL basic blocks, the number of SIL instructions, the number of SIL instructions of a specific kind, duration of a pass
* For SILModules: the number of SIL basic blocks, the number of SIL instructions, the number of SIL instructions of a specific kind, the number of SILFunctions, the amount of memory used by the compiler, duration of a pass
By default, any collection of statistics is disabled to avoid affecting compile times.
One can enable the collection of statistics and dumping of these statistics for the whole SILModule and/or for SILFunctions.
To reduce the amount of produced data, one can set thresholds so that changes in the statistics are only reported if the delta between the old and the new values is at least X%. The deltas are computed using the following formula:
Delta = (NewValue - OldValue) / OldValue
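As a worked example of the formula, the sketch below computes the delta and applies a percentage threshold. Two points are assumptions that the description above does not settle: the comparison uses the magnitude of the delta, so decreases are reported too, and a zero OldValue is treated as reportable whenever the value changed. The function names are hypothetical.

```cpp
#include <cmath>

// Delta = (NewValue - OldValue) / OldValue. The caller must ensure
// oldValue is non-zero.
double relativeDelta(double oldValue, double newValue) {
  return (newValue - oldValue) / oldValue;
}

// Decides whether a change clears a threshold given in percent.
bool shouldReport(double oldValue, double newValue, double thresholdPercent) {
  // Assumption: with no baseline, report any change at all.
  if (oldValue == 0.0)
    return newValue != 0.0;
  // Assumption: compare the magnitude so that shrinkage is reported as well.
  return std::fabs(relativeDelta(oldValue, newValue)) * 100.0 >=
         thresholdPercent;
}
```

For instance, going from 100 to 110 instructions is a delta of 0.1 (10%) and clears a 5% threshold, while going from 100 to 101 (1%) does not.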
Thresholds provide a simple way to perform a simple filtering of the collected statistics during the compilation. But if there is a need for a more complex analysis of collected data (e.g. aggregation by a pipeline stage or by the type of a transformation), it is often better to dump as much data as possible into a file using e.g. -sil-stats-dump-all -sil-stats-modules -sil-stats-functions and then e.g. use the helper scripts to store the collected data into a database and then perform complex queries on it. Many kinds of analysis can be then formulated pretty easily as SQL queries.
This is useful for optimizations (like AllocBoxToStack) which create (de-)alloc_stack instructions.
They can just insert the new instructions anywhere without worrying about nesting and correct the nesting afterwards.
The following classes provide symbol mangling for specific purposes:
*) Mangler: the base mangler class, just providing some basic utilities
*) ASTMangler: for mangling AST declarations
*) SpecializationMangler: to be used in the optimizer for mangling specialized function names
*) IRGenMangler: mangling all kinds of symbols in IRGen
None of these classes are used yet, so it’s basically an NFC.
Another change is that some demangler node types are added (either because they were missing or the new demangler needs them).
Those new nodes also need to be handled in the old demangler, but this should also be an NFC as those nodes are not created by the old demangler.
My plan is to keep the old and new mangling implementation in parallel for some time. After that we can remove the old mangler.
Currently the new implementation is scoped in the NewMangling namespace. This namespace should be renamed after the old mangler is removed.
- Move the common performance inliner functionality into PerformanceInlinerUtils.cpp.
- Move the functionality specific to non-generic inlining into NonGenericPerformanceInliner.cpp
- Temporarily disable the inlining of generics. It will be enabled in the subsequent commit.
This splits the function signature module pass into 2 function passes.
Doing so allows us to rewrite to use the FSO-optimized function prior
to attempting inlining, while still allowing us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
It also helps us move to a model in which a module pass is NOT used
unless necessary.
I see neither regressions nor improvements on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account information in the projection tree.
(libraries now)
It has been generally agreed that we need to do this reorg, and now
seems like the perfect time. Some major pass reorganization is in the
works.
This does not have to be the final word on the matter. The consensus
among those working on the code is that it's much better than what we
had and a better starting point for future bike shedding.
Note that the previous organization was designed to allow separate
analysis and optimization libraries. It turns out this is an
artificial distinction and not an important goal.