Adds a combined API to output both debug messages and optimization remarks.
The previously added test partial_specialization_debug.sil ensures that the
change is NFC for debug output.
Currently, the ownership verifier assumes that no cond_br with a critical edge
has non-trivial arguments. This utility method fixes any such violations and
should be run /after/ any pass that modifies the CFG in ownership SIL where
this situation could arise.
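To illustrate the shape of the fix, here is a minimal sketch on a toy CFG. All
types and helper names are invented for the example and are not the actual SIL
API; in real SIL the inserted block would also take over the branch arguments
so that non-trivial values get a place for compensating ownership
instructions.

    #include <memory>
    #include <vector>

    // Toy CFG node; stands in for a SIL basic block.
    struct Block {
      std::vector<Block *> succs;
      std::vector<Block *> preds;
    };

    // An edge is critical if the source has multiple successors and the
    // destination has multiple predecessors.
    static bool isCriticalEdge(Block *from, Block *to) {
      return from->succs.size() > 1 && to->preds.size() > 1;
    }

    // Split the edge by inserting a fresh block between `from` and `to`.
    static Block *splitEdge(Block *from, Block *to,
                            std::vector<std::unique_ptr<Block>> &blocks) {
      auto split = std::make_unique<Block>();
      Block *splitPtr = split.get();
      for (auto &succ : from->succs)
        if (succ == to) succ = splitPtr;
      for (auto &pred : to->preds)
        if (pred == from) pred = splitPtr;
      splitPtr->preds.push_back(from);
      splitPtr->succs.push_back(to);
      blocks.push_back(std::move(split));
      return splitPtr;
    }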
Chopping this off of a larger commit.
rdar://31521023
This helper utility, given a set of uses/defs, determines the set of blocks
that, together with the user blocks, jointly post-dominate the def blocks. In
a sense, this completes the provided set of user blocks to a joint
post-dominating set. This completion is not unique.
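For illustration, one way such a completion could be computed on a toy CFG
(all names and types here are invented for the example, not the actual
routine): mark every block that lies on a backward path from a user block to
the def block, then report each unmarked, non-user successor of a marked
block. Those successors are the edges over which the value "escapes" without
being covered by a use.

    #include <set>
    #include <vector>

    struct Block {
      std::vector<Block *> succs;
      std::vector<Block *> preds;
    };

    std::set<Block *> completeJointPostDominance(
        Block *def, const std::set<Block *> &userBlocks) {
      // Region: blocks between the def and the users.
      std::set<Block *> region;
      std::vector<Block *> worklist(userBlocks.begin(), userBlocks.end());
      while (!worklist.empty()) {
        Block *b = worklist.back();
        worklist.pop_back();
        if (b == def || !region.insert(b).second)
          continue;
        for (Block *p : b->preds)
          worklist.push_back(p);
      }
      region.insert(def);

      // Any edge leaving the region that does not hit a user block marks
      // a block where the completion must place a compensating position.
      std::set<Block *> completion;
      for (Block *b : region) {
        if (userBlocks.count(b))
          continue;  // the lifetime already ends in user blocks
        for (Block *s : b->succs)
          if (!region.count(s) && !userBlocks.count(s))
            completion.insert(s);
      }
      return completion;
    }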
I am adding this method because often when working with Ownership SIL, one
needs to hoist/sink operations and then compensate to ensure that ownership
properties are still preserved.
I think with a little bit of work I can get the ownership model eliminator to
use this routine (since the condition for ownership to be correct on an @owned
value is an empty joint post-dominating completion). I wrote it because I need
it in pred-memopts and I expect it to be a generally useful routine for
Ownership SIL.
rdar://31521023
Introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
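The equality caveat falls directly out of C++ multiple inheritance: a class
deriving from two bases that each contain a SILNode ends up with two distinct
SILNode subobjects at different addresses. A stripped-down illustration (not
the real class definitions):

    #include <cassert>

    struct SILNode { virtual ~SILNode() = default; };
    struct ValueBase : SILNode {};
    struct SILInstruction : SILNode {};

    // An instruction with exactly one result derives from both, and thus
    // contains *two* SILNode subobjects.
    struct SingleValueInstruction : ValueBase, SILInstruction {};

    int main() {
      SingleValueInstruction svi;
      SILNode *asValue = static_cast<ValueBase *>(&svi);
      SILNode *asInst  = static_cast<SILInstruction *>(&svi);
      // The two upcast paths yield different addresses, so raw SILNode
      // pointer equality is not a reliable identity test; a canonical
      // node must be chosen before comparing.
      assert(asValue != asInst);
    }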
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
This patch implements collection and dumping of statistics about SILModules, SILFunctions and memory consumption during the execution of SIL optimization pipelines.
The following statistics can be collected:
* For SILFunctions: the number of SIL basic blocks, the number of SIL instructions, the number of SIL instructions of a specific kind, duration of a pass
* For SILModules: the number of SIL basic blocks, the number of SIL instructions, the number of SIL instructions of a specific kind, the number of SILFunctions, the amount of memory used by the compiler, duration of a pass
By default, any collection of statistics is disabled to avoid affecting compile times.
One can enable the collection of statistics and dumping of these statistics for the whole SILModule and/or for SILFunctions.
To reduce the amount of produced data, one can set thresholds so that changes in the statistics are only reported if the delta between the old and the new values is at least X%. The deltas are computed using the following formula:
Delta = (NewValue - OldValue) / OldValue
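For example, a check along these lines (a hypothetical helper, not the actual
implementation) captures the intended behavior: with a 3% threshold, an
instruction count going from 1000 to 1029 is suppressed, while 1000 to 1030 is
reported.

    #include <cmath>

    // Report a statistic only if it changed by at least `thresholdPercent`
    // relative to its previous value: Delta = (NewValue - OldValue) / OldValue.
    bool shouldReport(double oldValue, double newValue, double thresholdPercent) {
      if (oldValue == 0)
        return newValue != 0;  // any change from zero is significant
      double delta = (newValue - oldValue) / oldValue;
      return std::fabs(delta) * 100.0 >= thresholdPercent;
    }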
Thresholds provide a simple way to filter the collected statistics during compilation. If there is a need for a more complex analysis of the collected data (e.g. aggregation by pipeline stage or by the type of transformation), it is often better to dump as much data as possible into a file, using e.g. -sil-stats-dump-all -sil-stats-modules -sil-stats-functions, and then use the helper scripts to store the collected data in a database and perform complex queries on it. Many kinds of analysis can then be formulated pretty easily as SQL queries.
The main loop of mandatory inlining spends a lot of time managing complex
iterator invalidation issues. This is the first in a series of commits that
change the main inlining loop to delete only the callee and to perform all
other cleanups after inlining has finished.
I am not going to perform this specific optimization (the quick retain/release
peephole) in MandatoryInlining; the guaranteed ARC optimizations that run
afterwards will be able to hit such a peephole, so no performance should be
lost.
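The cleanup structure this series moves toward is the usual deferred-deletion
pattern; a minimal sketch with stand-in types (not the actual inliner code):

    #include <vector>

    struct Instr {
      bool erased = false;
      void eraseFromParent() { erased = true; }  // stand-in for real deletion
    };

    // Inline every call site first, then delete, so the main loop never
    // has to reason about iterator invalidation.
    void inlineAll(std::vector<Instr *> &callSites) {
      std::vector<Instr *> deadApplies;
      for (Instr *apply : callSites) {
        // ... perform the inlining for `apply` ...
        deadApplies.push_back(apply);  // defer deletion
      }
      for (Instr *apply : deadApplies)
        apply->eraseFromParent();      // all cleanup after inlining is done
    }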
*NOTE* The reason I had to touch some of the code motion tests is that the
routine I am using to ensure that strong_retain/release_value is emitted as
appropriate is also used by code motion. The code motion tests had
cargo-culted some code from previous tests that retained Builtin.Int32. I
changed the routines so that when a retain/release is inserted for a value of
trivial type, nothing is inserted. No routine was relying on the actual use of
the inserted retains/releases, so this is safe. This change required updating
the code motion tests to use actual non-trivial values; the same code paths
are still being tested in terms of blocking code motion, etc.
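The change to the emission routine amounts to an early exit on trivial types,
roughly like this (all types and names are illustrative stubs, not the actual
function):

    struct Type { bool trivial; bool isTrivial() const { return trivial; } };
    struct Value { Type type; const Type &getType() const { return type; } };
    struct Builder { void createStrongRetain(const Value &) { /* ... */ } };

    // A retain of a trivial value (e.g. Builtin.Int32) is meaningless,
    // so insert nothing at all; callers never relied on the instruction
    // actually existing.
    void emitRetain(Builder &b, const Value &v) {
      if (v.getType().isTrivial())
        return;
      b.createStrongRetain(v);
    }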
rdar://31521023
The reasons to do this are:
1. The check in SILInliner of whether we can inline can be done without
triggering side effects.
2. This enables us to know whether inlining will succeed before attempting to
inline, which allows arguments to be adjusted with new SILInstructions and the
like before inlining occurs. I use this in a forthcoming patch that updates
mandatory inlining for ownership.
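The intended call pattern, sketched with placeholder names and stub types
(not the real SILInliner interface):

    struct ApplySite {};

    struct Inliner {
      // Pure query: must not create instructions or otherwise mutate IR.
      bool canInline(const ApplySite &) const { return true; }
      // Performs the actual inlining; only called after the check passed.
      void inlineCall(ApplySite &) {}
    };

    void process(Inliner &inliner, ApplySite &apply) {
      if (!inliner.canInline(apply))  // 1. side-effect-free check
        return;
      // 2. Safe to prepare the call site now: the check already told us
      //    inlining will succeed, so instructions added here (e.g. to
      //    adjust arguments) are never wasted work.
      inliner.inlineCall(apply);
    }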
rdar://31521023
The etymology of these terms isn't about race, but "black" = "blocked"
and "white" = "allowed" isn't really a good look these days. In most
cases we weren't using these terms particularly precisely anyway, so
the rephrasing is actually an improvement.
The existing simple mechanism for avoiding infinite generic specialization loops is based on checking the structural depth and width of types passed as generic type parameters. If the depth or the width of a type is above a certain threshold, the type is considered too complex for generic specialization and no specialization is produced. While this approach prevents the possibility of producing an infinite number of generic specializations for ever-growing generic type parameters, it catches the issue too late in some cases, leading to excessive CPU and memory usage.
Therefore, the new method tries to solve the problem at its root. An infinite generic specialization loop can be triggered by specializing a given generic call-site if and only if:
- Doing so would result in a loop inside the specialization graph represented by the `GenericSpecializationInformations`, i.e. it would produce direct or indirect recursion involving a generic call
- The substitutions used by the current generic call-site are structurally more complex than the substitutions used by the same call-site in the previous iteration inside the specialization graph. More complex in this context means that the new generic type parameter structurally contains the generic type parameter from a previous iteration inside the specialization graph and has a greater structural depth, e.g. `Array<Int>` is more complex than `Int`.
The generic specializer now records all the required information about the specializations it produces and uses it later to detect and prevent any generic specialization that would result in an infinite specialization loop. It detects such loops as early as possible and thus reduces compile times and memory consumption, and potentially also reduces code size by not generating useless specializations.
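A toy model of the detection (all names invented; the complexity comparison is
simplified to a depth check, whereas the real condition also requires that the
new type structurally contains the earlier one):

    #include <string>

    // Stand-in for one node of the specialization graph: which call site
    // was specialized, how structurally deep its substitution is
    // (e.g. Int = 1, Array<Int> = 2, ...), and which specialization
    // created this one.
    struct SpecializationRecord {
      std::string callSiteId;
      int substitutionDepth;
      const SpecializationRecord *parent = nullptr;
    };

    // Refuse a new specialization if the same call site already appears
    // in the creation chain with a structurally simpler substitution:
    // that is the signature of an infinite specialization loop.
    bool wouldLoopForever(const SpecializationRecord &current) {
      for (auto *p = current.parent; p; p = p->parent) {
        if (p->callSiteId == current.callSiteId &&
            current.substitutionDepth > p->substitutionDepth)
          return true;
      }
      return false;
    }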
In dead-array elimination we assume that the array allocation is
post-dominated by all its final releases. The only exceptions are branches to
dead-end ("unreachable") blocks. Previously we simply ignored all paths which
did not end in a final release.
Now we explicitly pass the set of dead-end blocks and ignore only those blocks.
This is safer, and it's also needed in the upcoming re-write of StackPromotion.
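A minimal sketch of how such a set can be computed on a toy CFG (types
invented for the example): a block is a dead-end block exactly when no path
from it reaches a function exit, which is the complement of backward
reachability from the exits.

    #include <set>
    #include <vector>

    struct Block {
      std::vector<Block *> preds;
      bool isFunctionExit = false;  // ends in `return` or `throw`
    };

    std::set<Block *> computeDeadEndBlocks(const std::vector<Block *> &allBlocks) {
      // Backward reachability from every exit block.
      std::set<Block *> reachesExit;
      std::vector<Block *> worklist;
      for (Block *b : allBlocks)
        if (b->isFunctionExit)
          worklist.push_back(b);
      while (!worklist.empty()) {
        Block *b = worklist.back();
        worklist.pop_back();
        if (!reachesExit.insert(b).second)
          continue;
        for (Block *p : b->preds)
          worklist.push_back(p);
      }
      // Everything that cannot reach an exit is a dead-end block.
      std::set<Block *> deadEnds;
      for (Block *b : allBlocks)
        if (!reachesExit.count(b))
          deadEnds.insert(b);
      return deadEnds;
    }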
IndexTrie is a more lightweight representation, and it works well in this case.
This requires recovering the represented sequence from an IndexTrieNode, so
also add a getParent() method.
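A minimal sketch of the data structure and of why getParent() suffices to
recover the sequence (this is a simplified model, not the actual class): each
node represents the sequence of indices on the path from the root, so shared
prefixes are stored only once, and walking parent links reconstructs the path.

    #include <memory>
    #include <vector>

    class IndexTrieNode {
      int index;              // index this node contributes to the path
      IndexTrieNode *parent;  // nullptr only for the root
      std::vector<std::unique_ptr<IndexTrieNode>> children;

    public:
      explicit IndexTrieNode(int idx = -1, IndexTrieNode *p = nullptr)
          : index(idx), parent(p) {}

      // Return the child for `idx`, creating it on first use.
      IndexTrieNode *getChild(int idx) {
        for (auto &c : children)
          if (c->index == idx)
            return c.get();
        children.push_back(std::make_unique<IndexTrieNode>(idx, this));
        return children.back().get();
      }

      // The new accessor: walking parent links recovers the represented
      // sequence, which a bare trie node otherwise doesn't expose.
      IndexTrieNode *getParent() const { return parent; }

      // Rebuild the index sequence for this node, root first.
      std::vector<int> recoverSequence() const {
        std::vector<int> seq;
        for (auto *n = this; n->getParent(); n = n->getParent())
          seq.push_back(n->index);
        return {seq.rbegin(), seq.rend()};
      }
    };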
Move IndexTrieNode from DeadObjectElimination into its own header. I plan to
use this data structure when diagnosing static violations of exclusive access.
A function is pure if it has no side-effects.
If there is a call to a pure function with constant arguments, it always makes sense to inline it, because we know that the whole computation will be constant folded.
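In the inliner's cost model this reduces to a single unconditional case,
roughly (field names are toy stand-ins for the real predicates):

    // A pure function applied to constants folds away entirely after
    // inlining, so the inliner can skip its usual cost model here.
    struct CallSite {
      bool calleeHasNoSideEffects;
      bool allArgumentsAreConstants;
    };

    bool shouldAlwaysInline(const CallSite &call) {
      return call.calleeHasNoSideEffects && call.allArgumentsAreConstants;
    }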
- Introduced a new helper class FunctionSignaturePartialSpecializer which provides most of the functionality required for producing a specialized generic signature based on the provided substitutions or requirements. The class consists of many small functions, which should make it easier to understand the code.
- Added full support for partial specialization of generic parameters with generic substitutions (use the flag `-Xllvm -sil-partial-specialization-with-generic-substitutions` to enable it)
- Removed the simpler version of the partial specializer, which could partially specialize only generic parameters with non-generic substitutions. It is not needed anymore, because we can now handle any substitutions when performing the partial specialization.
- The functionality used by the EagerSpecializer to implement the partial specializations required by @_specialize is expressed in terms of FunctionSignaturePartialSpecializer as well. The code implementing it is much smaller now.
Partial specialization of generic parameters with generic substitutions is fully functional, but it is disabled by default, because it needs some tweaks when it comes to compile times and the size of the produced code. These issues will be addressed in subsequent commits.
This commit changes how inline information is stored in SILDebugScope
from a tree to a linear chain of inlined call sites (similar to what
LLVM is using). This makes creating inlined SILDebugScopes slightly
more expensive, but makes lowering SILDebugScopes into LLVM metadata
much faster because entire inlined-at chains can now be cached. This
means that SIL no longer preserves the inlining history (i.e., ((a was
inlined into b) was inlined into c) is represented the same as (a was
inlined into (b was inlined into c))), but this information was not
used by anyone.
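A sketch of the cached lowering over such a linear chain (stand-in types; the
real code produces LLVM metadata nodes): because the chain is immutable, every
suffix of an inlined-at chain is lowered at most once.

    #include <unordered_map>

    struct DebugScope {
      const DebugScope *inlinedCallSite;  // next link in the inlined-at chain
      // ... source location, function, etc. ...
    };

    struct MDNode {};  // stand-in for an LLVM metadata node

    MDNode *lowerInlinedAt(const DebugScope *scope,
                           std::unordered_map<const DebugScope *, MDNode *> &cache) {
      if (!scope)
        return nullptr;
      auto it = cache.find(scope);
      if (it != cache.end())
        return it->second;  // whole suffix already lowered
      MDNode *parent = lowerInlinedAt(scope->inlinedCallSite, cache);
      (void)parent;  // a real implementation would link the new node to it
      MDNode *node = new MDNode{};
      cache[scope] = node;
      return node;
    }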
On my late 2012 i7 iMac, this saves about 4 seconds when compiling the
RelWithDebInfo x86_64 swift standard library — or 40% of IRGen time.
rdar://problem/28311051
In particular, support the following optimizations:
- owned-to-guaranteed
- dead argument elimination
Argument explosion is disabled for generics at the moment, as it usually leads to slower code.
Also, add a third [serializable] state for functions whose bodies we
*can* serialize, but only do so if they're referenced from another
serialized function.
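Conceptually, the serialization flag goes from a boolean to a three-valued
state, along these lines (a sketch; the actual enumerator names may differ):

    // Three-valued serialization state for a SIL function body.
    enum class SerializedKind {
      NotSerialized,  // body stays in this module
      Serializable,   // body *may* be serialized, but only if it is
                      // referenced from another serialized function
      Serialized      // body is always serialized
    };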
This will be used for bodies synthesized for imported definitions,
such as init(rawValue:), as well as various thunks, but for now this
change is NFC.
This is useful for optimizations (like AllocBoxToStack) which create (de-)alloc_stack instructions.
They can just insert the new instructions anywhere, without worrying about nesting, and correct the nesting afterwards.
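The invariant being re-established is simple LIFO nesting: the most recently
allocated stack slot must be deallocated first. A toy checker for that
property (types invented; this is the property the fix-up utility restores,
not the utility itself):

    #include <vector>

    enum class Kind { AllocStack, DeallocStack, Other };
    struct Instr { Kind kind; int allocId; };

    // Stack allocations must nest like parentheses.
    bool hasCorrectNesting(const std::vector<Instr> &body) {
      std::vector<int> open;
      for (const Instr &i : body) {
        if (i.kind == Kind::AllocStack)
          open.push_back(i.allocId);
        else if (i.kind == Kind::DeallocStack) {
          if (open.empty() || open.back() != i.allocId)
            return false;  // out of order: needs fixing
          open.pop_back();
        }
      }
      return open.empty();
    }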
Use -sil-partial-specialization-with-generic-substitutions to enable partial specialization even in cases where the substitutions contain generic replacement types.