Beside fixing the compiler crash, this change also improves the stack-nesting correction mechanisms in the inliners:
* Instead of trying to correct the nesting after each inlining of a callee, correct the nesting once when inlining is finished for a caller function.
This fixes a potential compile time problem, because StackNesting iterates over the whole function.
In worst case this can lead to quadratic behavior in case many begin_apply instructions with overlapping stack locations are inlined.
* Because we are doing it only once for a caller, we can remove the complex logic for checking if it is necessary.
We can just do it unconditionally in case any coroutine gets inlined.
The inliners iterate over all instruction of a function anyway, so this does not increase the computational complexity (StackNesting is roughly linear with the number of instructions).
rdar://problem/47615442
I am starting to use the linear lifetime checker in an optimizer role where it
no longer asserts but instead tells the optimizer pass what is needed to cause
the lifetime to be linear. To do so I need to be able to return richer
information to the caller such as whether or not a leak, double consume, or
use-after-free occurs.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Inlining has always been quadratic for no good reason. There was a
special hack for single-block callees that allowed linear inlining.
Instead, the now iterates over blocks and instructions in reverse,
splitting blocks as it inlines. There no longer needs to be special
case for single block callees, and the inliner is linear for all kinds
of callees.
This further simplifies and cleans up the code. There are just a few
basic invariants that the common inliner needs to provide about how
blocks are split and laid out. We can do this if we don't add hacks
for special cases within the inliner. Those invariants allow the
inliner clients to be much simpler and more efficient.
PerformanceInliner still needs to be fixed.
Fixes SR-9223: Inliner exhibits slow compilation time with a large
static array.
A recent SILCloner rewrite removed a special case hack for single
basic block callee functions:
commit c6865c0dff
Merge: 76e6c4157e9e440d13a6
Author: Andrew Trick <atrick@apple.com>
Date: Thu Oct 11 14:23:32 2018
Merge pull request #19786 from atrick/silcloner-cleanup
SILCloner and SILInliner rewrite.
Instead, the new inliner simply merges trivial unconditional branches
after inlining the return block. This way, the CFG is always in
canonical state after inlining. This is more robust, and avoids
interfering with subsequent SIL passes when non-single-block callees
are inlined.
The problem is that inlining a series of calls within a large block
could result in interleaved block splitting and merging operations,
which is quadratic in the block size. This showed up when inlining the
tens of thousands of array subscript calls emitted for a large array
initialization.
The first half of the fix is to simply defer block merging until all
calls are inlined. We can't expect SimplifyCFG to run immediately
after inlining, nor would we want to do that, *especially* for
mandatory inlining. This fix instead exposes block merging as a
trivial utility.
Note: by eliminating some unconditional branches, this change could
reduce the number of debug locations emitted. This does not
fundamentally change any debug information guarantee, and I was unable
to observe any behavior difference in the debugger.
Mostly functionally neutral:
- may fix latent bugs.
- may reduce useless basic blocks after inlining.
This rewrite encapsulates the cloner's internal state, providing a
clean API for the CRTP subclasses. The subclasses are rewritten to use
the exposed API and extension points. This makes it much easier to
understand, work with, and extend SIL cloners, which are central to
many optimization passes. Basic SIL invariants are now clearly
expressed and enforced. There is no longer a intricate dance between
multiple levels of subclasses operating on underlying low-level data
structures. All of the logic needed to keep the original SIL in a
consistent state is contained within the SILCloner itself. Subclasses
only need to be responsible for their own modifications.
The immediate motiviation is to make CFG updates self-contained so
that SIL remains in a valid state. This will allow the removal of
critical edge splitting hacks and will allow general SIL utilities to
take advantage of the fact that we don't allow critical edges.
This rewrite establishes a simple principal that should be followed
everywhere: aside from the primitive mutation APIs on SIL data types,
each SIL utility is responsibile for leaving SIL in a valid state and
the logic for doing so should exist in one central location.
This includes, for example:
- Generating a valid CFG, splitting edges if needed.
- Returning a valid instruction iterator if any instructions are removed.
- Updating dominance.
- Updating SSA (block arguments).
(Dominance info and SSA properties are fundamental to SIL verification).
LoopInfo is also somewhat fundamental to SIL, and should generally be
updated, but it isn't required.
This also fixes some latent bugs related to iterator invalidation in
recursivelyDeleteTriviallyDeadInstructions and SILInliner. Note that
the SILModule deletion callback should be avoided. It can be useful as
a simple cache invalidation mechanism, but it is otherwise bug prone,
too limited to be very useful, and basically bad design. Utilities
that mutate should return a valid instruction iterator and provide
their own deletion callbacks.
In order to make this reasonable, I needed to shift responsibilities
around a little; the devirtualization operation is now responsible for
replacing uses of the original apply. I wanted to remove the
phase-separation completely, but there was optimization-remark code
relying on the old apply site not having been deleted yet.
The begin_apply aspects of this aren't testable independently of
replacing materializeForSet because coroutines are currently never
called indirectly.
With this change and some other changes that I am committing in parallel, the
stdlib and all of the overlays send all proper pass manager notifications.
rdar://42301529
To do so this commit does a few different things:
1. I changed SILOptFunctionBuilder to notify the pass manager's logging
functionality when new functions are added to the module and to notify analyses
as well. NOTE: This on purpose does not put the new function on the pass manager
worklist since we do not want to by mistake introduce a large amount of
re-optimizations. Such a thing should be explicit.
2. I eliminated SILModuleTransform::notifyAddFunction. This just performed the
operations from 1. Now that SILOptFunctionBuilder performs this operation for
us, it is not needed.
3. I changed SILFunctionTransform::notifyAddFunction to just add the function to
the passmanager worklist. It does not need to notify the pass manager's logging
or analyses that a new function was added to the module since
SILOptFunctionBuilder now performs that operation. Given its reduced
functionality, I changed the name to addFunctionToPassManagerWorklist(...). The
name is a little long/verbose, but this is a feature since one should think
before getting the pass manager to rerun transforms on a function. Also, giving
it a longer name calls out the operation in the code visually, giving this
operation more prominance when reading code. NOTE: I did the rename using
Xcode's refactoring functionality!
rdar://42301529
This works around a potential circular dependence issue where TypeSubstCloner
needs access to SILOptFunctionBuilder but is in libswiftSIL.
rdar://42301529
We only need to deserialize the function itself, not its transitive
dependencies. Also, only deserialize a function after we've checked
that its transparent.
For now, this doesn't reduce the volume of SIL linking, because the
mandatory linker pass still links everything. But we're almost
there.
We only need to deserialize the function itself, not its transitive
dependencies. Also, only deserialize a function after we've checked
that its transparent.
For now, this doesn't reduce the volume of SIL linking, because the
mandatory linker pass still links everything. But we're almost
there.
If we have an apply of a partial_apply, all the substitutions
are performed at the partial_apply. We don't have substitutions
on the apply itself. This is unlikely to change in the future,
and it's not valid to 'concatenate' two lists of substitutions
like this anyway.
And fix it's handling of guaranteed closure contexts.
Guaranteed/unowned captures and guaranteed contexts are *not* released
by a call of the closure.
I assume we have not seen this because we don't see code that would
trigger this comming out of the frontend ...
SR-5441
rdar://33255593
@_silgen_name and @_cdecl functions are assumed to be referenced from
C code. Public and internal functions marked as such must not be deleted
by the optimizer, and their C symbols must be public or hidden respectively.
rdar://33924873, SR-6209
Support for @noescape SILFunctionTypes.
These are the underlying SIL changes necessary to implement the new
closure capture ABI.
Note: This includes a change to function name mangling that
primarily affects reabstraction thunks.
The new ABI will allow stack allocation of non-escaping closures as a
simple optimization.
The new ABI, and the stack allocation optimization, also require
closure context to be @guaranteed. That will be implemented as the
next step.
Many SIL passes pattern match partial_apply sequences. These all
needed to be fixed to handle the convert_function that SILGen now
emits. The conversion is now needed whenever a function declaration,
which has an escaping type, is passed into a @NoEscape argument.
In addition to supporting new SIL patterns, some optimizations like
inlining and SIL combine are now stronger which could perturb some
benchmark results.
These underlying SIL changes should be merged now to avoid conflicting
with other work. Minor benchmark discrepancies can be investigated as part of
the stack-allocation work.
* Add a noescape attribute to SILFunctionType.
And set this attribute correctly when lowering formal function types to SILFunctionTypes based on @escaping.
This will allow stack allocation of closures, and unblock a related ABI change.
* Flip the polarity on @noescape on SILFunctionType and clarify that
we don't default it.
* Emit withoutActuallyEscaping using a convert_function instruction.
It might be better to use a specialized instruction here, but I'll leave that up to Andy.
Andy: And I'll leave that to Arnold who is implementing SIL support for guaranteed ownership of thick function types.
* Fix SILGen and SIL Parsing.
* Fix the LoadableByAddress pass.
* Fix ClosureSpecializer.
* Fix performance inliner constant propagation.
* Fix the PartialApplyCombiner.
* Adjust SILFunctionType for thunks.
* Add mangling for @noescape/@escaping.
* Fix test cases for @noescape attribute, mangling, convert_function, etc.
* Fix exclusivity test cases.
* Fix AccessEnforcement.
* Fix SILCombine of convert_function -> apply.
* Fix ObjC bridging thunks.
* Various MandatoryInlining fixes.
* Fix SILCombine optimizeApplyOfConvertFunction.
* Fix more test cases after merging (again).
* Fix ClosureSpecializer. Hande convert_function cloning.
Be conservative when combining convert_function. Most of our code doesn't know
how to deal with function type mismatches yet.
* Fix MandatoryInlining.
Be conservative with function conversion. The inliner does not yet know how to
cast arguments or convert between throwing forms.
* Fix PartialApplyCombiner.
introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
This is one reason for the weird iteration maintainance code in
MandatoryInlining. This comment at least makes it clear what is going on (more
for historical purposes, since I am hoping to fix this soon).
rdar://31521023
I am refactoring this for two reasons:
1. This code was 60-70% of cleanupCalleeValue, yet cleanupCalleeValue also
performs other tasks. This is a classic situation where one should extract a
subroutine to make cleanupCalleeValue more readable.
2. I am going to be extracting out the cleanup of the callee to be outside the
main inlining loop. Without this refactoring, I would have to use goto to
perform a continue from the inside of an inner for loop of an outer for loop.
Instead by refactoring it this way, I can separate continuing in the inner loop
from continuing in the outer loop.
rdar://31521023
The main loop of mandatory inlining is spending a lot of time managing complex
iterator invalidation issues. This is the first in a series of commits that move
the main inlining loop to only delete the callee and to do all cleanups after we
have finished inlining.
This specific optimization (the quick retain/release peephole), I am not going
to do in MandatoryInlining, we already have guaranteed arc opts afterwards that
will be able to hit such a peephole so no perf should be lost.
*NOTE* The reason why I had to touch some of the code motion tests is that the
routine I am using to ensure that strong_retain/release_value is emitted as
appropriate is also used by codemotion. Code motion tests had cargo culted some
code from previous tests that retained Builtin.Int32. I changed the routines
though so that when a retain/release is inserted, if it is trivial, nothing is
inserted. No routine was relying on the actual usage of the inserted
retain/releases, so everything will be safe. This addition to the relevant code
caused me to need to change the tests in code motion to use actual non-trivial
values. The same code paths are being tested in terms of blocking code
motion/etc.
rdar://31521023
F{I,E} is generally used for function iterators over a module's functions.
B{I,E} are generally standard for block iterator, block end. I make the II
change just to make it clearer that the given value is an instruction iterator
like in other parts of the SIL Optimizer.
rdar://31521023