Now we handle this case:
%stack = alloc_stack $Protocol
copy_addr %var to [initialization] %stack
open_existential_addr immutable_access %stack
...
destroy_addr %stack
dealloc_stack %stack
We only do this if we have immutable_access. To be conservative I still only let
the normal whitelist of other instructions through.
I am adding this optimization since I am going to be eliminating a SILGen
peephole so I can enable SIL ownership verification everywhere.
A recent fix added some code that is inconsistent with the design of
the pass, creates an implicit coupling between separate parts of the
optimization, and relies on incredibly subtle reasoning to ensure
correctness.
This sort of thing needs a big fat comment.
This disables a bunch of passes when ownership is enabled. This will allow me to
keep transparent functions in ossa and skip most of the performance pipeline without
being touched by passes that have not been updated for ownership.
This is important so that we can in -Onone code import transparent functions and
inline them into other ossa functions (you can't inline from ossa => non-ossa).
The problematic scenario is that a function receives an @in_guaranteed and @inout parameter where one is a copy of the other value. For example, when appending a container to itself.
In this case the optimization removed the copy_addr, resulting in passing the same stack location to both parameters.
rdar://problem/47632890
Once the algorithm has begun hoisting destroys globally, there's no
way to cleanly bailout. The previous attempt to bailout could result
in an assert or lost destroy in release mode.
This is continued fall-out from changes in the previous release to
upstream SILGen or mandatory passes, such as PredictableMemOps, that
no longer preserve natural variable lifetimes.
In this case, we end up with SIL like this before CopyForwarding:
bb(%arg)
%local_addr = alloc_stack
store %arg to %local_addr
%payload = switch_enum(%arg)
retain %arg
store %arg to %some_addr
destroy_addr %local_addr
release_value %arg
We're attempting to hoist the destroy_addr to its last use, but can't
because the lifetimes of the alloc_stack (%local_addr) and the value
being stored on the stack (%arg) have become mixed up by an upstream
pass. We actually detect this situation now in order to bail-out of
destroy hoisting. Sadly, the bailout might only partially recover in
the case of interesting control flow, as happens in the test case's
Graph.init function. This triggers an assert, but in release mode it
simply drops the destroy.
Fixed <rdar://problem/43888666> [SR-8526]: Memory leak after switch in
release configuration.
At least most of these were latent bugs since the code was
unreachable in the PartialApply case. But that's no excuse to misuse
the API.
Also, whenever referring to an integer index, be explicit about
whether it is an applied argument or callee argument.
Fixes <rdar://problem/39209102> [SR-7354]: Swift 4.1 Regression: EXC_BAD_ACCESS for Optimized Builds in Xcode 9.3
Commentary:
The underlying problem is that CopyForwarding is an inherently
dangerous pass by design that has been waiting for the SIL
representation to evolve to the point where it can be rewritten.
The regressions was caused by PredictableMemOps failing to preserve
normal patterns of ownership. When it forwards loads, it implicitly
extends the lifetime of stored value
store %val to %addr
...
retain %val
...
destroy_addr %addr
CopyForwarding already tried to detect such violations of ownership
rules and normally bypasses destroy hoisting in those cases. In this
case, it fails to detect the problem because PredictableMemOps has
already erased the load, so there's no evidence of the value's
lifetime being extended.
It might have been nice if PredictableMemOps had transformed the
retain %val into a retain %addr. However, for the immediate fix, we
don't want to change any existing behavior other than suppressing
optimization. In the long term, CopyForwarding does not really make
sense without SemanticSIL+SILOwnership and should be totally rewritten
and greatly simplified at that point.
Reviewing the code with Arnold revealed a corner case where forward propagating
a copy into multiple operands of the same instruction wasn't properly detected.
I don't think this case was possible given the language rules, but nonetheless
it is valid SIL and needs to be handled.
This is confusing because in some cases we optimize that situation correctly,
and in other cases we try to assert that it doesn't happen. So, I simplified the
code to bailout anywhere that we see multiple operands of the same value. This
isn't an expected situation that needs to be optimized.
Copy forwarding was designed with some assumptions about symmetry of
operations. If copy_value/destroy_value are expanded somewhere for a given
value, then they should be expanded everywhere. The pass took a conservative
approach to SIL patterns that were guaranteed safe, and bailed out on unknown
patterns. However, due to some over-aggressive code factoring, the assumption
wasn't being checked in one corner case.
This redesign makes a clear distinction between the requirements for forward
vs. backward propagation.
Fixes <rdar://35646292> Swift CI: resilience bot seg faults in
stdlib/RangeReplaceable.swift.gyb
introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
The etymology of these terms isn't about race, but "black" = "blocked"
and "white" = "allowed" isn't really a good look these days. In most
cases we weren't using these terms particularly precisely anyway, so
the rephrasing is actually an improvement.
This is a separate optimization that detects short-lived temporaries that can be eliminated.
This is necessary now that SILGen no longer performs basic RValue forwarding in some cases.
SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
At some point, pass definitions were heavily macro-ized. Pass
descriptive names were added in two places. This is not only redundant
but a source of confusion. You could waste a lot of time grepping for
the wrong string. I removed all the getName() overrides which, at
around 90 passes, was a fairly significant amount of code bloat.
Any pass that we want to be able to invoke by name from a tool
(sil-opt) or pipeline plan *should* have unique type name, enum value,
commend-line string, and name string. I removed a comment about the
various inliner passes that contradicted that.
Side note: We should be consistent with the policy that a pass is
identified by its type. We have a couple passes, LICM and CSE, which
currently violate that convention.
Separate formal lowered types from SIL types.
The SIL type of an argument will depend on the SIL module's conventions.
The module conventions are determined by the SIL stage and LangOpts.
Almost NFC, but specialized manglings are broken incidentally as a result of
fixes to the way passes handle book-keeping of aruments. The mangler is fixed in
the subsequent commit.
Otherwise, NFC is intended, but quite possible do to rewriting the logic in many
places.
This means using a struct so we can put methods on the struct and using an
anonymous enum to create namespaced values. Specifically:
struct SILArgumentConvention {
enum : uint8_t {
Indirect_In,
Indirect_In_Guaranteed,
Indirect_Inout,
Indirect_InoutAliasable,
Indirect_Out,
Direct_Owned,
Direct_Unowned,
Direct_Deallocating,
Direct_Guaranteed,
} Value;
SILArgumentConvention(decltype(Value) NewValue)
: Value(NewValue) {}
operator decltype(Value)() const {
return Value;
}
ParameterConvention getParameterConvention() const {
switch (Value) {
...
}
}
bool isIndirectConvention() const {
...
}
};
This allows for:
1. Avoiding abstraction leakage via the enum type. If someone wants to use
decltype as well, I think that is enough work that the leakage is acceptable.
2. Still refer to enum cases like we are working with an enum class
(e.g. SILArgumentConvention::Direct_Owned).
3. Avoid using the anonymous type in function arguments due to an implicit
conversion.
4. And most importantly... *drum roll* add methods to our enums!
For a long time, we have:
1. Created methods on SILArgument that only work on either function arguments or
block arguments.
2. Created code paths in the compiler that only allow for "function"
SILArguments or "block" SILArguments.
This commit refactors SILArgument into two subclasses, SILPHIArgument and
SILFunctionArgument, separates the function and block APIs onto the subclasses
(leaving the common APIs on SILArgument). It also goes through and changes all
places in the compiler that conditionalize on one of the forms of SILArgument to
just use the relevant subclass. This is made easier by the relevant APIs not
being on SILArgument anymore. If you take a quick look through you will see that
the API now expresses a lot more of its intention.
The reason why I am performing this refactoring now is that SILFunctionArguments
have a ValueOwnershipKind defined by the given function's signature. On the
other hand, SILBlockArguments have a stored ValueOwnershipKind. Rather than
store ValueOwnershipKind in both instances and in the function case have a dead
variable, I decided to just bite the bullet and fix this.
rdar://29671437
Changes:
* Terminate all namespaces with the correct closing comment.
* Make sure argument names in comments match the corresponding parameter name.
* Remove redundant get() calls on smart pointers.
* Prefer using "override" or "final" instead of "virtual". Remove "virtual" where appropriate.
Similarly to how we've always handled parameter types, we
now recursively expand tuples in result types and separately
determine a result convention for each result.
The most important code-generation change here is that
indirect results are now returned separately from each
other and from any direct results. It is generally far
better, when receiving an indirect result, to receive it
as an independent result; the caller is much more likely
to be able to directly receive the result in the address
they want to initialize, rather than having to receive it
in temporary memory and then copy parts of it into the
target.
The most important conceptual change here that clients and
producers of SIL must be aware of is the new distinction
between a SILFunctionType's *parameters* and its *argument
list*. The former is just the formal parameters, derived
purely from the parameter types of the original function;
indirect results are no longer in this list. The latter
includes the indirect result arguments; as always, all
the indirect results strictly precede the parameters.
Apply instructions and entry block arguments follow the
argument list, not the parameter list.
A relatively minor change is that there can now be multiple
direct results, each with its own result convention.
This is a minor change because I've chosen to leave
return instructions as taking a single operand and
apply instructions as producing a single result; when
the type describes multiple results, they are implicitly
bound up in a tuple. It might make sense to split these
up and allow e.g. return instructions to take a list
of operands; however, it's not clear what to do on the
caller side, and this would be a major change that can
be separated out from this already over-large patch.
Unsurprisingly, the most invasive changes here are in
SILGen; this requires substantial reworking of both call
emission and reabstraction. It also proved important
to switch several SILGen operations over to work with
RValue instead of ManagedValue, since otherwise they
would be forced to spuriously "implode" buffers.
As there are no instructions left which produce multiple result values, this is a NFC regarding the generated SIL and generated code.
Although this commit is large, most changes are straightforward adoptions to the changes in the ValueBase and SILValue classes.
Having a separate address and container value returned from alloc_stack is not really needed in SIL.
Even if they differ we have both addresses available during IRGen, because a dealloc_stack is always dominated by the corresponding alloc_stack in the same function.
Although this commit quite large, most changes are trivial. The largest non-trivial change is in IRGenSIL.
This commit is a NFC regarding the generated code. Even the generated SIL is the same (except removed #0, #1 and @local_storage).
Don't allow this optimization to kick in for "inout" args.
The optimization may expose local writes to any aliases of the argument.
I can't prove that is memory safe.
Erik pointed out this case.
This requires a bit of code motion.
e.g.
1. %Tmp = alloc_stack
2. copy_addr %InArg to [initialization] %Tmp
3. DataAddr = init_enum_data_addr %OutArg
4. copy_addr %Tmp#1 to [initialization] %DataAddr
becomes
1. %Tmp = alloc_stack
4. DataAddr = init_enum_data_addr %OutArg
2. copy_addr %InArg to [initialization] %DataAddr
Fixes at least one regression resulting from '++' removal.
See rdar://23874709 [perf] -Onone Execution Time regression of up-to 19%
-Onone results
|.benchmark............|.bestbase.|.bestopt.|..delta.|.%delta.|speedup.|
|.StringWalk...........|....33570.|...20967.|.-12603.|.-37.5%.|..1.60x.|
|.OpenClose............|......446.|.....376.|....-70.|.-15.7%.|..1.19x.|
|.SmallPT..............|....98959.|...83964.|.-14995.|.-15.2%.|..1.18x.|
|.StrToInt.............|....17550.|...16377.|..-1173.|..-6.7%.|..1.07x.|
|.BenchLangCallingCFunc|......453.|.....428.|....-25.|..-5.5%.|..1.06x.|
|.CaptureProp..........|....50758.|...48156.|..-2602.|..-5.1%.|..1.05x.|
|.ProtocolDispatch.....|.....5276.|....5017.|...-259.|..-4.9%.|..1.05x.|
|.Join.................|.....1433.|....1372.|....-61.|..-4.3%.|..1.04x.|
AFAICT, this does not fix any existing bug, but eliminates unverified
assumptions about well-formed SIL, which could be broken by future
optimization.
Forward: The optimization will replace all in-scope uses of the
destination address with the source. With this change we will be sure
not eliminate writes into a destination address unless the destination
is an AllocStackInst. This hasn't been a problem in practice because
the optimization requires an in-scope deinit of the destination
address, which can't happen on typical address projections.
Backward: The optimization will replace in-scope uses of the source
with the destination. With this change we will be sure not to write
into the destination location prior to the copy unless the destination
is an AllocStackInst. This hasn't been a problem in practice because
the optimization requires the copy to be an initialization of the
address, which can't happen on typical address projections.
This change prevents both optimizations without an obvious guarantee
that any dependency on the destination address will manifest as a
SIL-level dependence on the address producer. For example,
init_enum_data_addr would not qualify because it simply projects an
address within a value that may have other dependencies.