With this re-abstraction, a specialized function has the same calling convention as if it had been written with the specialized types in the first place.
In general this results in fewer alloc_stacks and loads/stores.
It can also eliminate some re-abstraction thunks, e.g. if a generic closure is used in a non-generic context.
In some (hopefully rare) cases it may require adding re-abstraction thunks.
If a function has multiple indirect results, only the first is converted to a direct result; this is an open TODO.
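As a rough sketch (the function name and types below are hypothetical, and the SIL is simplified), a specialization of a generic function used to keep the fully indirect generic convention:
sil @min : $<T> (@in T, @in T) -> @out T
sil @min_Int : $(@in Int, @in Int) -> @out Int
With the re-abstraction, the specialized function instead gets the convention the concrete types would have had all along:
sil @min_Int : $(Int, Int) -> Int
so the caller no longer needs alloc_stacks for the arguments and the result.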
Similarly to how we've always handled parameter types, we
now recursively expand tuples in result types and separately
determine a result convention for each result.
The most important code-generation change here is that
indirect results are now returned separately from each
other and from any direct results. It is generally far
better, when receiving an indirect result, to receive it
as an independent result; the caller is much more likely
to be able to receive the result directly at the address
it wants to initialize, rather than having to receive it
in temporary memory and then copy parts of it into the
target.
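For example (a hedged sketch in simplified SIL; "Big" stands for any type that must be returned indirectly), each indirect result now gets its own caller-chosen address:
sil @f : $() -> (@out Big, Int, @out Big)
%a = alloc_stack $Big        // address for the first indirect result
%b = alloc_stack $Big        // address for the second indirect result
%i = apply %f(%a, %b)        // the Int comes back as the apply's direct result
Each address can be the final destination (e.g. storage inside a larger object) rather than a temporary.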
The most important conceptual change here that clients and
producers of SIL must be aware of is the new distinction
between a SILFunctionType's *parameters* and its *argument
list*. The former is just the formal parameters, derived
purely from the parameter types of the original function;
indirect results are no longer in this list. The latter
includes the indirect result arguments; as always, all
the indirect results strictly precede the parameters.
Apply instructions and entry block arguments follow the
argument list, not the parameter list.
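Concretely (again a simplified sketch with hypothetical names), a function type with one indirect result and one Int parameter has a one-entry parameter list but a two-entry argument list, and the entry block takes the argument list:
sil @g : $(Int) -> @out Big {
bb0(%0 : $*Big, %1 : $Int):  // indirect result address first, then the parameter
  ...
}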
A relatively minor change is that there can now be multiple
direct results, each with its own result convention.
This is a minor change because I've chosen to leave
return instructions as taking a single operand and
apply instructions as producing a single result; when
the type describes multiple results, they are implicitly
bound up in a tuple. It might make sense to split these
up and allow e.g. return instructions to take a list
of operands; however, it's not clear what to do on the
caller side, and this would be a major change that can
be separated out from this already over-large patch.
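For instance (sketch), with a function type carrying two direct results, the callee still returns a single tuple operand and the caller sees a single apply result of tuple type:
in the callee:
%t = tuple (%i, %b)
return %t                    // one operand; both results travel as a tuple
in the caller:
%r = apply %h()              // one value of type $(Int, Bool)
%i = tuple_extract %r, 0
%b = tuple_extract %r, 1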
Unsurprisingly, the most invasive changes here are in
SILGen; this requires substantial reworking of both call
emission and reabstraction. It also proved important
to switch several SILGen operations over to work with
RValue instead of ManagedValue, since otherwise they
would be forced to spuriously "implode" buffers.
If a value is returned as @owned, we can move the epilogue retain
to the caller and convert the return value to @unowned. This gives
the ARC optimizer more freedom to optimize the retain away on the
caller's side.
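A hedged sketch of the transformation (names simplified):
before:
sil @callee : $() -> @owned Foo
  ...
  strong_retain %x           // epilogue retain in the callee
  return %x
after:
sil @callee : $() -> Foo
  ...
  return %x
and in the caller:
%r = apply %callee()
strong_retain %r             // moved retain; ARC may now eliminate it here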
It appears that epilogue retains are harder to find than epilogue
releases. Most of the time they are not in the return block.
(1) Sometimes they are in predecessors of the return block (see the
    sketch after this list).
(2) Sometimes they come from a call which returns an @owned return
    value. This should improve if we fix (1) and go bottom-up.
(3) We do not handle exploded retain_value instructions.
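Case (1) looks roughly like this (sketch):
bb1:
  strong_retain %x
  br bb3
bb2:
  strong_retain %x
  br bb3
bb3:
  return %x                  // the epilogue retains live in the predecessors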
Currently, this catches a small number of opportunities. We probably
need to improve the epilogue retain matcher to handle more cases.
This is part of rdar://24022375.
We also need some refactoring in the pass, e.g. breaking functions
into smaller functions. I will do that in a subsequent commit.
We get some improvements in the number of parameters converted from
owned to guaranteed on the stdlib.
before
======
103 sil-function-signature-opts - Total owned args -> guaranteed args
after
======
118 sil-function-signature-opts - Total owned args -> guaranteed args
I see the following improvements from running the benchmarks with and
without this change. Only differences >= 1.05x are shown:
Benchmark         OLD    NEW    DELTA  DELTA%  SPEEDUP
ErrorHandling     8154   7497   -657   -8.1%   1.09x
LinkedList        9973   9529   -444   -4.5%   1.05x
ObjectAllocation  239    222    -17    -7.1%   1.08x
RC4               23167  21993  -1174  -5.1%   1.05x (!)
This is part of rdar://22380547
So instead of only being able to match %1 with the release of %1 as
in (1), we can also match %1 with the releases of %2 and %3 (i.e. an
exploded release_value) as in (2).
(1)
foo(%1)
strong_release %1
(2)
foo(%1)
%2 = struct_extract %1, field_a
%3 = struct_extract %1, field_b
strong_release %2
strong_release %3
This will allow function signature optimization to better move the
release instructions to the callers.
Currently, this is an NFC change other than tests that use the
epilogue match dumper.
We don't use this information after this point and are already
invalidating this (and everything else for this function body) at the
end of the pass, so this is not necessary.
NFC.
Prior to splitting devirtualization out of inlining we needed this to
ensure that we would not go into an infinite loop in certain cases.
Now that these are split out and we no longer iterate within the inliner
over new opportunities, we only need to check for self-recursive
functions.
NFC.
In function signature opt, instead of replacing %1 with undef in a
debug_value instruction that uses %1, we form an aggregate, taking the
live parts of %1 and filling the dead parts with undef.
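A sketch, assuming %1 is a struct $S with one live field and one dead field (names hypothetical):
before:
debug_value %1
after:
%2 = struct $S (%live_field, undef)   // live part of %1 kept, dead part is undef
debug_value %2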
rdar://23727705
Now that we process functions in bottom-up order in the pass manager and
have a mechanism to restart the pass pipeline on the current
function (or on a newly created callee function), we can split these
passes back out from the inliner and end up with the same benefits we
had from initially integrating them. We get the further benefit of fully
optimizing newly created callee functions before continuing with the
function that resulted in the creation of those callee
functions (e.g. as a result of a specialization pass running).
This effectively returns us to the code from dc65f70.
With the recent pass manager changes, combined with upcoming inliner
changes, we can potentially run the inliner more than we currently
do. Allowing self-recursive functions to be inlined, and running the
inliner more often, can result in a lot of code bloat, which increases
binary sizes and compile times. Even with a relatively small value (10)
for the number of times we allow a function to run through the pass
pipeline, we end up with a significant increase in the stdlib and
stdlib unit test build times.
This results in some performance regressions, but I think the trade-off
here is reasonable.
This is a first step towards moving the analysis portion of function
signature optimization into an actual SILAnalysis. I'll split this class
into two pieces (one for the analysis it does, and one for the rewrites
it does) next.
As there are no instructions left which produce multiple result values, this is an NFC change with regard to the generated SIL and generated code.
Although this commit is large, most changes are straightforward adaptations to the changes in the ValueBase and SILValue classes.