The current dumping format consists of 1 row of information per function. This
will become unweildy to write patterns for when I add additional state to
FunctionInfo.
Instead, this commit converts the dumping format of the caller analysis into a
multi line yaml format. This yaml format looks as follows:
---
calleeName: closure1
hasCaller: false
minPartialAppliedArgs: 1
partialAppliers:
- partial_apply_one_arg
- partial_apply_two_args1
fullAppliers:
...
This can easily expand over time as we expand the queries that caller analysis
can answer.
As an additional advantage, there are definitely yaml parsers that can handle
multiple yaml documents in sequence in a stream. This means that by running via
sil-opt the caller-analysis-printer pass, one now will get a yaml description of
the caller analysis state, perfect and ready for analysis.
This converts a DenseMap to a SmallMapVector and a SetVector to a
SmallSetVector. Both of these create large malloced data structures by
default. This really makes no sense when there are many functions that don't use
a partial apply or many applies.
Additionally, by changing the DenseMap to a MapVector container, this commit is
eliminating a potential source of non-determinism in the compiler since often
times we are iterating over the DenseMap to produce the results. Today all of
the usages of the DenseMap in this way are safe, but to defensively future proof
this analysis, it makes sense to use a MapVector here.
I am tuning a new argument explosion heuristic to reduce code-size. One part of
the heuristic I am playing with is the part of the algorithm that attempts to
figure out if we could eliminate additonal arguments after performing
owned->guaranteed an additional release when we run FSO a second time. Today we
do this unconditionally. I am trying to do it in a more conservative way where
we only do it if we know that we aren't going to increase the number of
arguments too much.
rdar://41146023
This is particularly egrigious since we are only /reading/ from the DenseSet. So
we are basically mallocing/copying a DenseSet just to read from it... I don't
think I need to say more.
rdar://41146023
@effects is too low a level, and not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
Use AccessedStorageAnalysis to find access markers with no nested conflicts.
This optimization analyzes the scope of each access to determine
whether it contains a potentially conflicting access. If not, then it
can be demoted to an instantaneous check, which still catches
conflicts on any enclosing outer scope.
This removes up to half of the runtime calls associated with
exclusivity checking.
The actual algorithm used here has not changed at all so this is basically a NFC
commit. What this PR does is change the underlying algorithm to return the
operands that it computes internally rather than transforming the operand list
into the user list internally. This enables the callers of the optimization to
find the operand number related to the uses. This makes working with
instructions with multiple operands much easier since one does not need to mess
around with rederiving the operand number from the user instruction/SILValue
pair.
getRCUsers() works now by running getRCUses() internally and then maps the
operand list to the user list.
rdar://38196046
An interprocedural analysis pass that summarizes the dynamically
enforced formal accesses within a function. These summaries will be
used by a new AccessEnforcementOpts pass to locally fold access scopes
and remove dynamic checks based on whole module analysis.
Make this a generic analysis so that it can be used to analyze any
kind of function effect.
FunctionSideEffect becomes a trivial specialization of the analysis.
The immediate need for this is to introduce an new
AccessedStorageAnalysis, although I foresee it as a generally very
useful utility. This way, new kinds of function effects can be
computed without adding any complexity or compile time to
FunctionSideEffects. We have the flexibility of computing different
kinds of function effects at different points in the pipeline.
In the case of AccessedStorageAnalysis, it will compute both
FunctionSideEffects and FunctionAccessedStorage in the same pass by
implementing a simple wrapper on top of FunctionEffects.
This cleanup reflects my feeling that nested classes make the code
extremely unreadable unless they are very small and either private or
only used directly via its parent class. It's easier to see how these
classes compose with a flat type system.
In addition to enabling new kinds of function effects analyses, I
think this makes the implementation of side effect analysis easier to
understand by separating concerns.
The EscapeAnalysis:canEscapeTo function was actually broken, because it did not detect all escapes of a reference/pointer.
I completely replaced the implementation with the correct one (canObjectOrContentEscapeTo) and removed the now obsolete canObjectOrContentEscapeTo.
Fixes a miscompile.
rdar://problem/39161309
Support for @noescape SILFunctionTypes.
These are the underlying SIL changes necessary to implement the new
closure capture ABI.
Note: This includes a change to function name mangling that
primarily affects reabstraction thunks.
The new ABI will allow stack allocation of non-escaping closures as a
simple optimization.
The new ABI, and the stack allocation optimization, also require
closure context to be @guaranteed. That will be implemented as the
next step.
Many SIL passes pattern match partial_apply sequences. These all
needed to be fixed to handle the convert_function that SILGen now
emits. The conversion is now needed whenever a function declaration,
which has an escaping type, is passed into a @NoEscape argument.
In addition to supporting new SIL patterns, some optimizations like
inlining and SIL combine are now stronger which could perturb some
benchmark results.
These underlying SIL changes should be merged now to avoid conflicting
with other work. Minor benchmark discrepancies can be investigated as part of
the stack-allocation work.
* Add a noescape attribute to SILFunctionType.
And set this attribute correctly when lowering formal function types to SILFunctionTypes based on @escaping.
This will allow stack allocation of closures, and unblock a related ABI change.
* Flip the polarity on @noescape on SILFunctionType and clarify that
we don't default it.
* Emit withoutActuallyEscaping using a convert_function instruction.
It might be better to use a specialized instruction here, but I'll leave that up to Andy.
Andy: And I'll leave that to Arnold who is implementing SIL support for guaranteed ownership of thick function types.
* Fix SILGen and SIL Parsing.
* Fix the LoadableByAddress pass.
* Fix ClosureSpecializer.
* Fix performance inliner constant propagation.
* Fix the PartialApplyCombiner.
* Adjust SILFunctionType for thunks.
* Add mangling for @noescape/@escaping.
* Fix test cases for @noescape attribute, mangling, convert_function, etc.
* Fix exclusivity test cases.
* Fix AccessEnforcement.
* Fix SILCombine of convert_function -> apply.
* Fix ObjC bridging thunks.
* Various MandatoryInlining fixes.
* Fix SILCombine optimizeApplyOfConvertFunction.
* Fix more test cases after merging (again).
* Fix ClosureSpecializer. Hande convert_function cloning.
Be conservative when combining convert_function. Most of our code doesn't know
how to deal with function type mismatches yet.
* Fix MandatoryInlining.
Be conservative with function conversion. The inliner does not yet know how to
cast arguments or convert between throwing forms.
* Fix PartialApplyCombiner.
introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
This patch implements collection and dumping of statistics about SILModules, SILFunctions and memory consumption during the execution of SIL optimization pipelines.
The following statistics can be collected:
* For SILFunctions: the number of SIL basic blocks, the number of SIL instructions, the number of SIL instructions of a specific kind, duration of a pass
* For SILModules: the number of SIL basic blocks, the number of SIL instructions, the number of SIL instructions of a specific kind, the number of SILFunctions, the amount of memory used by the compiler, duration of a pass
By default, any collection of statistics is disabled to avoid affecting compile times.
One can enable the collection of statistics and dumping of these statistics for the whole SILModule and/or for SILFunctions.
To reduce the amount of produced data, one can set thresholds in such a way that changes in the statistics are only reported if the delta between the old and the new values are at least X%. The deltas are computed as using the following formula:
Delta = (NewValue - OldValue) / OldValue
Thresholds provide a simple way to perform a simple filtering of the collected statistics during the compilation. But if there is a need for a more complex analysis of collected data (e.g. aggregation by a pipeline stage or by the type of a transformation), it is often better to dump as much data as possible into a file using e.g. -sil-stats-dump-all -sil-stats-modules -sil-stats-functions and then e.g. use the helper scripts to store the collected data into a database and then perform complex queries on it. Many kinds of analysis can be then formulated pretty easily as SQL queries.
This patch fixes a number of issues:
The analysis was using EpilogueARCContext as a temporary when computing. This is
an performance problem since EpilogueARCContext contains all of the memory used
in the analysis. So essentially, we were mallocing tons of memory every time we
missed the analyses cache. This patch changes the pass to instead have 1
EpilogueARCContext whose internal state is cleared in between invocations. Since
the data structures (see below) used after this patch do not shrink memory after
being cleared, this should cause us to have far less memory churn.
The analysis was managing its block state data structure by allocating the
individual block state structs using a BumpPtrAllocator/DenseMap stored in
EpilogueARCContext. The individual state structures were allocated from the
BumpPtrAllocator and the DenseMap then mapped a specific SILBasicBlock to its
State data structure. Ignoring that we were mallocing this memory every time we
computed rather than reusing global state, this pessimizes performance on small
functions significantly. This is because the BumpPtrAllocator by default heap
allocates initially a page and DenseMap initially mallocs a 64 entry hash
table. Thus for a 1 block function, we would be allocating a large amount of
memory that is just unneeded.
Instead this patch changes the analysis to use a std::vector in combination with
PostOrderFunctionInfo to manage the per block state. The way this works is that
PostOrderFunctionInfo already contains a map from a SILBasicBlock to its post
order number. So, when we are allocating memory for each block, we visit the CFG
in post order. Thus we know that each block's state will be stored in the vector
at vector[post order number].
This has a number of nice effects:
1. By eliminating the need for the DenseMap, in large test cases, we are
signficiantly reducing the memory overhead (by 24 bytes per basic block assuming
8 byte ptrs).
2. We will use far less memory when applying this analysis to small functions.
rdar://33841629
Commonly when an analysis uses subanalyses, we eagerly create the sub function
info when constructing the main function info. This is not always necessary and
when the subanalyses do work in their constructor, can be actively harmful if
the parent analysis is never invoked.
This utility class solves this problem by being a very easy way to perform a
delayed call to the sub-analysis to get the sub-functioninfo combined with a
cache so that after the LazyFunctionInfo is used once, we do not reuse the
DenseMap in the sub-analysis unnecessarily.
An example of where this can happen is in EpilogueARCAnalysis in combination
with PostOrderFunctionInfo. PostOrderFunctionInfo eagerly creates a new post
order. So, if we were to eagerly create the PostOrderFunctionInfo (the
sub-functioninfo) when we created an EpilogueARCFunctionInfo, we would be
creating a post order even if we never actually invoke EpilogueARCFunctionInfo.
By using this the maybeGet API on FunctionAnalysisBase instead of get, we stop
EpilogueARCAnalysis from building itself if it does not yet exist, only to
invalidate itself.
rdar://33841629
Today, if one wants to invalidate state relative to your own function analysis,
you have to use FunctionAnalysisBase::get() to get the analysis. The problem
here is that if the analysis does not exist yet, then you are actually creating
the analysis. This is an issue when one wants to perform an action on an
analysis only if the analysis has already been built. An example of such a
situation is when one is processing a delete notification. If one does not have
an analysis for a function, one should just do nothing.
I am going to use this to fix a delete notification problem in
EpilogueARCAnalysis.
rdar://33841629
Otherwise, every time we include Analysis.h, we will try to include those other
files even if we have already included Analysis.h. This can increase compile
time.
rdar://33841629
Make the static enforcement of accesses in noescape closures stored-property
sensitive. This will relax the existing enforcement so that the following is
not diagnosed:
struct MyStruct {
var x = X()
var y = Y()
mutating
func foo() {
x.mutatesAndTakesClosure() {
_ = y.read() // no-warning
}
}
}
To do this, update the access summary analysis to summarize accesses to
subpaths of a capture.
rdar://problem/32987932
Make the static enforcement of accesses in noescape closures stored-property
sensitive. This will relax the existing enforcement so that the following is
not diagnosed:
struct MyStruct {
var x = X()
var y = Y()
mutating
func foo() {
x.mutatesAndTakesClosure() {
_ = y.read()
}
}
}
To do this, update the access summary analysis to be stored-property sensitive.
rdar://problem/32987932