The main loop of mandatory inlining is spending a lot of time managing complex
iterator invalidation issues. This is the first in a series of commits that move
the main inlining loop to only delete the callee and to do all cleanups after we
have finished inlining.
This specific optimization (the quick retain/release peephole), I am not going
to do in MandatoryInlining, we already have guaranteed arc opts afterwards that
will be able to hit such a peephole so no perf should be lost.
*NOTE* The reason why I had to touch some of the code motion tests is that the
routine I am using to ensure that strong_retain/release_value is emitted as
appropriate is also used by codemotion. Code motion tests had cargo culted some
code from previous tests that retained Builtin.Int32. I changed the routines
though so that when a retain/release is inserted, if it is trivial, nothing is
inserted. No routine was relying on the actual usage of the inserted
retain/releases, so everything will be safe. This addition to the relevant code
caused me to need to change the tests in code motion to use actual non-trivial
values. The same code paths are being tested in terms of blocking code
motion/etc.
rdar://31521023
The reason to do this is:
1. The check in SILInliner if we can inline can be done without triggering
side-effects.
2. This enables us to know if inlining will succeed before attempting to inline.
This enables for arguments to be adjusted with new SILInstructions and the like
before inlining occurs. I use this in a forthcoming patch that updates mandatory
inlining for ownership.
rdar://31521023
The etymology of these terms isn't about race, but "black" = "blocked"
and "white" = "allowed" isn't really a good look these days. In most
cases we weren't using these terms particularly precisely anyway, so
the rephrasing is actually an improvement.
The existing simple mechanism for avoiding infinite generic specialization loops is based on checking the structural depth and width of types passed as generic type parameters. If the depth or the width of a type is above a certain threshold, the type is considered too complex for generic specialization and no specialization is produced. While this approach prevents the possibility of producing an infinite number of generic specializations for ever-growing generic type parameters, it catches the issue too late in some cases, leading to excessive CPU and memory usage.
Therefore, the new method tries to solve the problem at its root. An infinite generic specialization loop can be triggered by specializing a given generic call-site if and only if:
- Doing so would result in a loop inside the specialization graph represented by the `GenericSpecializationInformations`, i.e. it would produce direct or indirect recursion involving a generic call
- The substitutions used by the current generic call-site are structurally more complex than the substitutions used by the same call-site in the previous iteration inside specialization graph. More complex in this context means that the new generic type parameter structurally contains the generic type parameter from a previous iteration inside the specialization graph and has greater structural depth, e.g. `Array<Int>` is more complex than `Int`.
The generic specializer now records all the required information about specializations it produces and uses it later to detect and prevent any generic specializations which would result in an infinite specialization loop. It detects them as early as possible and thus reduces compile times, memory consumption and potentially also reduces the code-size by not generating useless specializations.
In dead-array elimination we assume that the array allocation is post-dominated by all its final releases.
The only exception are branches to dead-end ("unreachable") blocks. So we just ignored all paths which didn't end up in a final release.
Now we explicitly pass the set of dead-end blocks and just ignore those blocks.
This is safer and it's also needed in the upcoming re-write of StackPromotion.
IndexTrie is a more light-weight representation and it works well in this case.
This requires recovering the represented sequence from an IndexTrieNode, so
also add a getParent() method.
Move IndexTrieNode from DeadObjectElimination into its own header. I plan to
use this data structure when diagnosing static violations of exclusive access.
A function is pure if it has no side-effects.
If there is a call of a pure function with constant arguments, it always makes sense to inline it, because we know that the whole computation will be constant folded.
- Introduced a new helper class FunctionSignaturePartialSpecializer which provides most of the functionality required for producing a specialized generic signature based on the provided substitutions or requirements. The class consists of many small functions, which should make it easier to understand the code.
- Added a full support for partial specialization of generic parameters with generic substitutions (use flag `-Xllvm -sil-partial-specialization-with-generic-substitutions` to enable it)
- Removed the simpler version of the partial specializer which could partially specialize only generic parameters with non-generic substitutions. It is not needed anymore, because we can handle any substations now when performing the partial specialization.
- The functionality used by the EagerSpecializer to implement the partial specializations required by @_specialize is expressed in terms of FunctionSignaturePartialSpecializer as well. The code implementing it is much smaller now.
Partial specialization of generic parameters with generic substitutions is fully functional, but it is disabled by default, because it needs some tweaks when it comes to compile times and size of produced code. These issues will be addressed in the subsequent commits.
This commit changes how inline information is stored in SILDebugScope
from a tree to a linear chain of inlined call sites (similar to what
LLVM is using). This makes creating inlined SILDebugScopes slightly
more expensive, but makes lowering SILDebugScopes into LLVM metadata
much faster because entire inlined-at chains can now be cached. This
means that SIL is no longer preserve the inlining history (i.e., ((a
was inlined into b) was inlined into c) is represented the same as (a
was inlined into (b was inlined into c)), but this information was not
used by anyone.
On my late 2012 i7 iMac, this saves about 4 seconds when compiling the
RelWithDebInfo x86_64 swift standard library — or 40% of IRGen time.
rdar://problem/28311051
In particular, support the following optimizations:
- owned-to-guaranteed
- dead argument elimination
Argument explosion is disabled for generics at the moment as it usually leads to a slower code.
Also, add a third [serializable] state for functions whose bodies we
*can* serialize, but only do so if they're referenced from another
serialized function.
This will be used for bodies synthesized for imported definitions,
such as init(rawValue:), etc, and various thunks, but for now this
change is NFC.
This is useful for optimizations (like AllocBoxToStack) which create (de-)alloc_stack instructions.
They can just insert the new instructions anywhere without worrying about nesting and correct the nesting afterwards.
Use -sil-partial-specialization-with-generic-substitutions to enable the partial specialization even in cases of substitutions containing generic replacement types.
Previously it was part of swiftBasic.
The demangler library does not depend on llvm (except some header-only utilities like StringRef). Putting it into its own library makes sure that no llvm stuff will be linked into clients which use the demangler library.
This change also contains other refactoring, like moving demangler code into different files. This makes it easier to remove the old demangler from the runtime library when we switch to the new symbol mangling.
Also in this commit: remove some unused API functions from the demangler Context.
fixes rdar://problem/30503344
Partial specialization is disabled by default. Use -sil-partial-specialization to enable it.
Use -sil-partial-specialization-with-generic-substitutions to enable the partial specialization even in cases of substitutions containing generic replacement types.
It looks like the devirtualizer used to have problems computing
method types in the presence of generic substitutions, covariant
returns and other things, so it would bail if a perticular set
of pre-conditions was not met on the types of original method call
and the devirtualized method call.
I don't think any of this is necessary anymore. If this patch
introduces any regressions, we need to fix the root cause instead
of re-introducing this logic.
It is important to de-serialize the devirtualized function (and its callees), especially because we must make sure that all transparent functions are de-serialized.
SILCombine did not do that. But as we have the same optimization in the Devirtualizer, it's not needed to duplicate the code in SILCombine.
The only reason we had this peephole in SILCombine is that the Devirtualizer pass could not handle partial applies.
So with this change the Devirtualizer can now also handle partial applies.
Fixes rdar://problem/30544344 (again, after my first attempt failed)
This predicate can be used to check if a given call can be devirtualized. One of the clients of this new API will be the inliner, which may want to check if a given method call becomes devirtualizable after inlining.
This reverts commit 1b3d29a163, reversing
changes made to b32424953e.
We're seeing a handful of issues from turning on inlining of generics,
so I'm reverting to unblock the bots.