Similarly to how we've always handled parameter types, we
now recursively expand tuples in result types and separately
determine a result convention for each result.
The most important code-generation change here is that
indirect results are now returned separately from each
other and from any direct results. It is generally far
better, when receiving an indirect result, to receive it
as an independent result; the caller is much more likely
to be able to directly receive the result in the address
they want to initialize, rather than having to receive it
in temporary memory and then copy parts of it into the
target.
The most important conceptual change here that clients and
producers of SIL must be aware of is the new distinction
between a SILFunctionType's *parameters* and its *argument
list*. The former is just the formal parameters, derived
purely from the parameter types of the original function;
indirect results are no longer in this list. The latter
includes the indirect result arguments; as always, all
the indirect results strictly precede the parameters.
Apply instructions and entry block arguments follow the
argument list, not the parameter list.
A relatively minor change is that there can now be multiple
direct results, each with its own result convention.
This is a minor change because I've chosen to leave
return instructions as taking a single operand and
apply instructions as producing a single result; when
the type describes multiple results, they are implicitly
bound up in a tuple. It might make sense to split these
up and allow e.g. return instructions to take a list
of operands; however, it's not clear what to do on the
caller side, and this would be a major change that can
be separated out from this already over-large patch.
Unsurprisingly, the most invasive changes here are in
SILGen; this requires substantial reworking of both call
emission and reabstraction. It also proved important
to switch several SILGen operations over to work with
RValue instead of ManagedValue, since otherwise they
would be forced to spuriously "implode" buffers.
As there are no instructions left which produce multiple result values, this is a NFC regarding the generated SIL and generated code.
Although this commit is large, most changes are straightforward adoptions to the changes in the ValueBase and SILValue classes.
Having a separate address and container value returned from alloc_stack is not really needed in SIL.
Even if they differ we have both addresses available during IRGen, because a dealloc_stack is always dominated by the corresponding alloc_stack in the same function.
Although this commit quite large, most changes are trivial. The largest non-trivial change is in IRGenSIL.
This commit is a NFC regarding the generated code. Even the generated SIL is the same (except removed #0, #1 and @local_storage).
Don't allow this optimization to kick in for "inout" args.
The optimization may expose local writes to any aliases of the argument.
I can't prove that is memory safe.
Erik pointed out this case.
This requires a bit of code motion.
e.g.
1. %Tmp = alloc_stack
2. copy_addr %InArg to [initialization] %Tmp
3. DataAddr = init_enum_data_addr %OutArg
4. copy_addr %Tmp#1 to [initialization] %DataAddr
becomes
1. %Tmp = alloc_stack
4. DataAddr = init_enum_data_addr %OutArg
2. copy_addr %InArg to [initialization] %DataAddr
Fixes at least one regression resulting from '++' removal.
See rdar://23874709 [perf] -Onone Execution Time regression of up-to 19%
-Onone results
|.benchmark............|.bestbase.|.bestopt.|..delta.|.%delta.|speedup.|
|.StringWalk...........|....33570.|...20967.|.-12603.|.-37.5%.|..1.60x.|
|.OpenClose............|......446.|.....376.|....-70.|.-15.7%.|..1.19x.|
|.SmallPT..............|....98959.|...83964.|.-14995.|.-15.2%.|..1.18x.|
|.StrToInt.............|....17550.|...16377.|..-1173.|..-6.7%.|..1.07x.|
|.BenchLangCallingCFunc|......453.|.....428.|....-25.|..-5.5%.|..1.06x.|
|.CaptureProp..........|....50758.|...48156.|..-2602.|..-5.1%.|..1.05x.|
|.ProtocolDispatch.....|.....5276.|....5017.|...-259.|..-4.9%.|..1.05x.|
|.Join.................|.....1433.|....1372.|....-61.|..-4.3%.|..1.04x.|
AFAICT, this does not fix any existing bug, but eliminates unverified
assumptions about well-formed SIL, which could be broken by future
optimization.
Forward: The optimization will replace all in-scope uses of the
destination address with the source. With this change we will be sure
not eliminate writes into a destination address unless the destination
is an AllocStackInst. This hasn't been a problem in practice because
the optimization requires an in-scope deinit of the destination
address, which can't happen on typical address projections.
Backward: The optimization will replace in-scope uses of the source
with the destination. With this change we will be sure not to write
into the destination location prior to the copy unless the destination
is an AllocStackInst. This hasn't been a problem in practice because
the optimization requires the copy to be an initialization of the
address, which can't happen on typical address projections.
This change prevents both optimizations without an obvious guarantee
that any dependency on the destination address will manifest as a
SIL-level dependence on the address producer. For example,
init_enum_data_addr would not qualify because it simply projects an
address within a value that may have other dependencies.
(libraries now)
It has been generally agreed that we need to do this reorg, and now
seems like the perfect time. Some major pass reorganization is in the
works.
This does not have to be the final word on the matter. The consensus
among those working on the code is that it's much better than what we
had and a better starting point for future bike shedding.
Note that the previous organization was designed to allow separate
analysis and optimization libraries. It turns out this is an
artificial distinction and not an important goal.