This became necessary after recent function type changes that keep
substituted generic function types abstract even after substitution, in
order to correctly handle automatic opaque result type substitution.
Instead of performing the opaque result type substitution as part of
substituting the generic arguments, the underlying type is now reified as
part of looking at the parameter/return types, which happens as part of
the function convention APIs.
rdar://62560867
When inlining many functions into a very large basic block, splitting the block at the call sites is quadratic when traversing in forward order, because each split moves the entire remaining tail of the block into a new block.
Traversing backwards fixes the problem: each split then only moves the instructions up to the previous split point.
rdar://problem/56268570
We need this anyway for -Onone, and I want to do some experiments with running
this very early so I can expose more of the stdlib (modulo inlining) to the new
ownership optimizing passes.
I also changed how the inliner handles inlining around OSSA: the inliner now
checks early that, if the caller is in OSSA, we only inline if all of the
callees that the caller calls are in OSSA. The intention is to avoid weird
swings in code size/performance caused by the inliner heuristic's calculation
being artificially skewed when some callees are not available to inline (due
to this difference) while others already are.
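A rough sketch of that early check, written as schematic C++ against the SIL
API (the callee-collection helper is hypothetical; the real inliner code is
structured differently):

    // Schematic only: refuse to inline into an OSSA caller unless every
    // callee we might inline is also in OSSA form.
    static bool canInlineIntoCaller(SILFunction *caller,
                                    ArrayRef<SILFunction *> callees) {
      if (!caller->hasOwnership())
        return true;
      for (SILFunction *callee : callees)
        if (callee && !callee->hasOwnership())
          return false; // mixed OSSA/non-OSSA would skew the heuristics
      return true;
    }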
This reverts commit 66474ed5a2.
The original commit triggers a crash in IRGen: rdar://problem/59456064
Reverting for now until the IRGen issue is fixed.
The partial_apply returned by a thunk is most likely optimized away if the thunk is inlined.
Because some thunks cannot be specialized (e.g. if an opened existential is in the substitution list), also inline such thunks when they are generic.
https://bugs.swift.org/browse/SR-12115
rdar://problem/59061452
https://forums.swift.org/t/improving-the-representation-of-polymorphic-interfaces-in-sil-with-substituted-function-types/29711
This prepares SIL to be able to more accurately preserve the calling convention of
polymorphic generic interfaces by letting the type system represent "substituted function types".
We add a couple of fields to SILFunctionType to support this:
- A substitution map, accessed by `getSubstitutions()`, which maps the generic signature
of the function to its concrete implementation. This will allow, for instance, a protocol
witness for a requirement of type `<Self: P> (Self, ...) -> ...` for a concrete conforming
type `Foo` to express its type as `<Self: P> (Self, ...) -> ... for <Foo>`, preserving the relation
to the protocol interface without relying on the pile of hacks that is the `witness_method`
convention.
- A bool for whether the generic signature of the function is "implied" by the substitutions.
If true, the generic signature isn't really part of the calling convention of the function.
This will allow closure types to distinguish a closure being passed to a generic function, like
`<T, U> in (*T, *U) -> T for <Int, String>`, from the concrete type `(*Int, *String) -> Int`,
which will make it easier for us to differentiate the representation of those as types, for
instance by giving them different pointer authentication discriminators to harden arm64e
code.
This patch is currently NFC; it just introduces the new APIs and takes a first pass at updating
code to use them. Much more work will need to be done once we start exercising these new
fields.
This does bifurcate some existing APIs:
- SILFunctionType now has two accessors to get its generic signature.
`getSubstGenericSignature` gets the generic signature that is used to apply its
substitution map, if any. `getInvocationGenericSignature` gets the generic signature
used to invoke the function at apply sites. These differ if the generic signature is
implied.
- SILParameterInfo and SILResultInfo values carry the unsubstituted types of the parameters
and results of the function. They now have two APIs to get that type. `getInterfaceType`
returns the unsubstituted type of the generic interface, and
`getArgumentType`/`getReturnValueType` produce the substituted type that is used at
apply sites.
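As a purely schematic illustration (the exact C++ parameter lists may differ
from what is shown here), a pass inspecting an apply site would pick between
the two signatures and the two parameter-type views roughly like this:

    // Schematic sketch of the bifurcated accessors; not the literal API.
    void inspect(ApplySite apply) {
      CanSILFunctionType fnTy = apply.getSubstCalleeType();

      // Signature the substitution map (getSubstitutions()) applies to.
      auto substSig = fnTy->getSubstGenericSignature();
      // Signature used to invoke the function at apply sites; differs
      // from the above when the generic signature is implied.
      auto invocationSig = fnTy->getInvocationGenericSignature();

      for (SILParameterInfo param : fnTy->getParameters()) {
        // Unsubstituted type of the generic interface.
        CanType interfaceTy = param.getInterfaceType();
        // param.getArgumentType(...) would give the substituted type
        // used at the apply site (arguments omitted here).
        (void)interfaceTy;
      }
      (void)substSig;
      (void)invocationSig;
    }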
The XXOptUtils.h convention is already established and parallels
the SIL/XXUtils convention.
New:
- InstOptUtils.h
- CFGOptUtils.h
- BasicBlockOptUtils.h
- ValueLifetime.h
Removed:
- Local.h
- Two conflicting CFG.h files
This reorganization is helpful before I introduce more
utilities for block cloning similar to SinkAddressProjections.
Move the control flow utilities out of Local.h, which was an
unreadable, unprincipled mess. Rename it to InstOptUtils.h, and
confine it to small APIs for working with individual instructions.
These are the optimizer's additions to /SIL/InstUtils.h.
Rename CFG.h to CFGOptUtils.h and remove the one in /Analysis. Now
there is only SIL/CFG.h, resolving the naming conflict within the
swift project (this has always been a problem for source tools). Limit
this header to low-level APIs for working with branches and CFG edges.
Add BasicBlockOptUtils.h for block level transforms (it makes me sad
that I can't use BBOptUtils.h, but SIL already has
BasicBlockUtils.h). These are larger APIs for cloning or removing
whole blocks.
Coroutines are so expensive (e.g. Array.subscript.read) that it makes sense to enable generic inlining of them.
This will speed up array iteration (e.g. for elem in array { }) in a generic context significantly.
Another example is ManagedBuffer.header.read, which gets much faster.
In both cases, the speedup is mainly because there is no malloc happening anymore.
https://bugs.swift.org/browse/SR-11231
rdar://problem/53777612
With the advent of dynamic_function_ref the actual callee of such a ref
may vary. Optimizations should not assume they know the contents of a
function referenced by dynamic_function_ref. Introduce
getReferencedFunctionOrNull, which returns null for such function refs,
and getInitialReferencedFunction, which returns the initially referenced
function.
Use them as appropriate.
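For illustration, a schematic fragment of how an optimization would use the
null-returning accessor (the surrounding code is made up; only the accessor
names come from this change):

    // Only reason about the callee's body when the reference is not a
    // dynamic_function_ref; getReferencedFunctionOrNull returns null there.
    if (auto *fri = dyn_cast<FunctionRefBaseInst>(calleeValue)) {
      if (SILFunction *callee = fri->getReferencedFunctionOrNull()) {
        // Safe to inspect and rely on callee's body here.
      } else {
        // dynamic_function_ref: the actual callee may vary; assume nothing.
      }
    }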
rdar://50959798
This is an obvious drive-by fix. Without it, the compiler will crash when
building Foundation after I commit changes to the pipeline. My attempts at
creating a unit test were unsuccessful because it depends on some
interaction between inlining and specialization heuristics.
Besides fixing the compiler crash, this change also improves the stack-nesting correction mechanism in the inliners:
* Instead of trying to correct the nesting after each inlining of a callee, correct the nesting once when inlining is finished for a caller function.
This fixes a potential compile-time problem, because StackNesting iterates over the whole function.
In the worst case this can lead to quadratic behavior when many begin_apply instructions with overlapping stack locations are inlined.
* Because we are doing it only once per caller, we can remove the complex logic for checking whether it is necessary.
We can just do it unconditionally whenever any coroutine gets inlined.
The inliners iterate over all instructions of a function anyway, so this does not increase the computational complexity (StackNesting is roughly linear in the number of instructions).
rdar://problem/47615442
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
A recent SILCloner rewrite removed a special case hack for single
basic block callee functions:
commit c6865c0dff
Merge: 76e6c4157e 9e440d13a6
Author: Andrew Trick <atrick@apple.com>
Date: Thu Oct 11 14:23:32 2018
Merge pull request #19786 from atrick/silcloner-cleanup
SILCloner and SILInliner rewrite.
Instead, the new inliner simply merges trivial unconditional branches
after inlining the return block. This way, the CFG is always in
canonical state after inlining. This is more robust, and avoids
interfering with subsequent SIL passes when non-single-block callees
are inlined.
The problem is that inlining a series of calls within a large block
could result in interleaved block splitting and merging operations,
which is quadratic in the block size. This showed up when inlining the
tens of thousands of array subscript calls emitted for a large array
initialization.
The first half of the fix is to simply defer block merging until all
calls are inlined. We can't expect SimplifyCFG to run immediately
after inlining, nor would we want to do that, *especially* for
mandatory inlining. This fix instead exposes block merging as a
trivial utility.
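A schematic version of that utility, with made-up names, just to show the
shape of the trivial merge (the splice itself is elided):

    // Merge a block into its successor when the terminator is an
    // unconditional branch with no arguments and the successor has no
    // other predecessors.
    static void mergeTrivialBranches(SILFunction *f) {
      for (SILBasicBlock &bb : *f) {
        auto *br = dyn_cast<BranchInst>(bb.getTerminator());
        if (!br || br->getNumArgs() != 0)
          continue;
        SILBasicBlock *succ = br->getDestBB();
        if (succ->getSinglePredecessorBlock() != &bb)
          continue;
        // ... splice succ's instructions into bb and erase succ ...
      }
    }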
Note: by eliminating some unconditional branches, this change could
reduce the number of debug locations emitted. This does not
fundamentally change any debug information guarantee, and I was unable
to observe any behavior difference in the debugger.
Currently, if a caller has more than 400 blocks, the inliner bails out of finding
inlinable targets. This is incorrect behavior for inline-always functions. In
such cases, we should continue inlining inline-always functions and skip any
functions that are not inline-always.
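Schematically (the constant name and loop structure here are illustrative, not
the actual inliner code):

    // Even past the caller block limit, still consider inline-always callees.
    bool onlyInlineAlways = caller->size() > OverallCallerBlockLimit;
    for (FullApplySite apply : callSites) {
      SILFunction *callee = apply.getReferencedFunctionOrNull();
      if (!callee)
        continue;
      if (onlyInlineAlways && callee->getInlineStrategy() != AlwaysInline)
        continue;
      // ... run the usual profitability checks and inline ...
    }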
rdar://45976860
This is a performance hack: inlining a coroutine can promote heap
allocations of the frame to stack allocations, which is valuable
out of proportion to the normal weight. There are surely more
principled ways of getting this effect, though.
Mostly functionally neutral:
- may fix latent bugs.
- may reduce useless basic blocks after inlining.
This rewrite encapsulates the cloner's internal state, providing a
clean API for the CRTP subclasses. The subclasses are rewritten to use
the exposed API and extension points. This makes it much easier to
understand, work with, and extend SIL cloners, which are central to
many optimization passes. Basic SIL invariants are now clearly
expressed and enforced. There is no longer an intricate dance between
multiple levels of subclasses operating on underlying low-level data
structures. All of the logic needed to keep the original SIL in a
consistent state is contained within the SILCloner itself. Subclasses
only need to be responsible for their own modifications.
The immediate motivation is to make CFG updates self-contained so
that SIL remains in a valid state. This will allow the removal of
critical edge splitting hacks and will allow general SIL utilities to
take advantage of the fact that we don't allow critical edges.
This rewrite establishes a simple principle that should be followed
everywhere: aside from the primitive mutation APIs on SIL data types,
each SIL utility is responsible for leaving SIL in a valid state and
the logic for doing so should exist in one central location.
This includes, for example:
- Generating a valid CFG, splitting edges if needed.
- Returning a valid instruction iterator if any instructions are removed.
- Updating dominance.
- Updating SSA (block arguments).
(Dominance info and SSA properties are fundamental to SIL verification).
LoopInfo is also somewhat fundamental to SIL, and should generally be
updated, but it isn't required.
This also fixes some latent bugs related to iterator invalidation in
recursivelyDeleteTriviallyDeadInstructions and SILInliner. Note that
the SILModule deletion callback should be avoided. It can be useful as
a simple cache invalidation mechanism, but it is otherwise bug prone,
too limited to be very useful, and basically bad design. Utilities
that mutate should return a valid instruction iterator and provide
their own deletion callbacks.
A few places around the compiler were checking for this module by its
name. The implementation still checks by name, but at least that only
has to occur in one place.
(Unfortunately I can't eliminate the string constant altogether,
because the implicit import for SwiftOnoneSupport happens by name.)
No functionality change.
When compiling benchmark/single-source/DictTest.swift, a multiplication can overflow in SILPerformanceInliner::isProfitableToInline(). The fix notices when the value is large enough to overflow and zeroes out the value it would have been subtracted from.
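The shape of the fix, as a self-contained sketch with hypothetical names (the
real variables live inside isProfitableToInline()):

    #include <climits>

    // If count * weight would overflow, the cost dwarfs any benefit, so
    // treat the remaining benefit as zero instead of wrapping around.
    static int applyWeight(int benefit, int count, int weight) {
      if (count != 0 && weight > INT_MAX / count)
        return 0;
      int cost = count * weight;
      return benefit > cost ? benefit - cost : 0;
    }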
To do so, this commit does a few different things:
1. I changed SILOptFunctionBuilder to notify the pass manager's logging
functionality when new functions are added to the module and to notify analyses
as well. NOTE: This on purpose does not put the new function on the pass manager
worklist, since we do not want to accidentally introduce a large amount of
re-optimization. Such a thing should be explicit.
2. I eliminated SILModuleTransform::notifyAddFunction. This just performed the
operations from 1. Now that SILOptFunctionBuilder performs this operation for
us, it is not needed.
3. I changed SILFunctionTransform::notifyAddFunction to just add the function to
the pass manager worklist. It does not need to notify the pass manager's logging
or analyses that a new function was added to the module, since
SILOptFunctionBuilder now performs that operation. Given its reduced
functionality, I changed the name to addFunctionToPassManagerWorklist(...). The
name is a little long/verbose, but this is a feature, since one should think
before getting the pass manager to rerun transforms on a function. Also, giving
it a longer name calls out the operation in the code visually, giving this
operation more prominence when reading code. NOTE: I did the rename using
Xcode's refactoring functionality!
rdar://42301529
This works around a potential circular dependence issue where TypeSubstCloner
needs access to SILOptFunctionBuilder but is in libswiftSIL.
rdar://42301529
@effects operates at too low a level and is not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
* allow small class methods to be inlined with -Osize
* inline pure calls: references to objects that are initialized with constants are considered constant arguments
This commit is mostly refactoring.
*) Introduce a new OptimizationMode enum and use it in SILOptions and IRGenOptions
*) Allow the optimization mode to also be specified for specific SILFunctions. This is not used in this commit yet and is thus still NFC.
Also, this fixes a minor bug: we didn't run mandatory IRGen passes for functions with @_semantics("optimize.sil.never")
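For reference, the enum is roughly of this shape (member names are approximate;
the per-function override mentioned above is stored on SILFunction):

    // Approximate shape of the new enum; exact members live in swift/Basic.
    enum class OptimizationMode : uint8_t {
      NotSet,
      NoOptimization, // -Onone
      ForSpeed,       // -O
      ForSize         // -Osize
    };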
This brings the capability from clang to save remarks in external YAML files.
YAML files can be viewed with tools like the opt-viewer.
Saving the remarks is activated with the new option -save-optimization-record.
Similarly to -emit-tbd, I've only added support for single-compile mode for now.
In this case the default filename is determined by
getOutputFilenameFromPathArgOrAsTopLevel, i.e. unless explicitly specified
with -save-optimization-record-path, the file is placed in the directory of the
main output file as <modulename>.opt.yaml.