Eventually, we decided to do this
1. Have the function signature opts (used to be called the cloner to create
the optimized function.
2. Mark the thunk as always_inline
3. Rely on the inliner to inline the thunk to get the benefit of calling optimized
function directly.
We decided to use the inliner to rewrite the caller's callsites.
And eventually I will turn FunctionSignatureAnalysis into a Utility.
As its data should only be used and kept in the cloner pass.
This forces the callsites to be rewritten by the inliner.
we have the issue that the thunk changes from the time the its created to
the time its reread to figure out what we have done to the original function
This results in missed opportunities.
This solution solves the problem gracefully, because the thunk carries the information
on how to set up the call to the optimized functions.
Inlining the thunk makes the callsite calling the optimized function for free. i.e.
without any rewriting.
I did not measure any regression with this change.
This split the function signature module pass into 2 functin passes.
By doing so, this allows us to rewrite to using the FSO-optimized
function prior to attempting inlining, but allow us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
And also helps us to move to a model which module pass is NOT used unless
necesary.
I do not see regression nor improvement for on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account of information in the projection tree.
This occured if a stack-promoted object with a devirtualized final release is not actually allocated on the stack.
Now the ReleaseDevirtualizer models the procedure of a final release more accurately.
It inserts a set_deallocating instruction and calles the deallocator (instead of just the deinit).
This changes also includes two peephole optimizations in IRGen and LLVMStackPromotion which get rid of
unused runtime calls in case the stack promoted object is really allocated on the stack.
This fixes rdar://problem/25068118
In many places, we're interested in whether a type with archetypes *might be* a superclass of another type with the right bindings, particularly in the optimizer. Provide a separate Type::isBindableToSuperclassOf method that performs this check. Use it in the devirtualizer to fix rdar://problem/24993618. Using it might unblock other places where the optimizer is conservative, but we can fix those separately.
We were creating new uses of an argument just prior to erasing it from
the block argument list.
We need to replace references to that value in the side structure we
generate with references to the new value that we're replacing it with.
Fixes SR-884 / rdar://problem/25008398.
LSValue::reduce reduces a set of LSValues (mapped to a set of LSLocations) to
a single LSValue.
It can then be used as the forwarding value for the location.
Previously, we expand into intermediate nodes and leaf nodes and then go bottom
up, trying to create a single LSValue out of the given LSValues.
Instead, we now use a recursion to go top down. This simplifies the code. And this
is fine as we do not expect to run into type tree that are too deep.
Existing test cases ensure correctness.
For forwarding on allocstacks, we can invalidate the forwable bit when we
hit the deallocate stack.
This helps compilation time as we do not need to propagate these bits down
to subsequent basic blocks.
Reinstates commit 0c2ca94ef7
With two bug fixes:
*) use after free asan crash
*) wrong check in ValueLifetimeAnalysis::isWithinLifetime
And some refactoring
We were giving special handling to ApplyInst when we were attempting to use
getMemoryBehavior(). This commit changes the special handling to work on all
full apply sites instead of just AI. Additionally, we look through partial
applies and thin to thick functions.
I also added a dumper called BasicInstructionPropertyDumper that just dumps the
results of SILInstruction::get{Memory,Releasing}Behavior() for all instructions
in order to verify this behavior.
With this re-abstraction a specialized function has the same calling convention as if it would have been written with the specialized types in the first place.
In general this results in less alloc_stacks and load/stores.
It also can eliminate some re-abstraction thunks, e.g. if a generic closure is used in a non-generic context.
It some (hopefully rare) cases it may require to add re-abstraction thunks.
In case a function has multiple indirect results, only the first is converted to a direct result. This is an open TODO.
We were handling regular uses, but not handling promotions in things
like debug_value_addr.
This was exposed by some pass ordering changes I have in an upcoming
commit.
For a release on a guaranteed function paramater, we know right away
that its not the final release and therefore does not call deinit.
Therefore we know it does not read or write memory other than the reference
count.
This reduces the compilation time of dead store and redundant load elim. As
we need to go over alias analysis to make sure tracked locations do not alias
with it.
After collected enough information in the first iteration of the
data flow. We do not do second iteration (last iteration) for blocks
without loads as we will not forward any load there.
This improves compilation time of redundant load elimination.
When we emit calls to existential methods silgen produces a sequence of the
three instructions below:
open_existential_addr %0 : $*Pingable to $*@opened("1E467EB8-...") Pingable
witness_method $@opened("1E467EB8-...") Pingable, #Pingable.ping!1
apply %3<@opened("1E467EB8-...") Pingable>(%2)
This commit adds a new CSE-like pass that finds sequences of calls to protocol
methods and reuses the first two instructions open_existential_addr and
witness_method. The optimization finds arguments that must not alias and may not
escape and combines all of the existential method calls to use the same method
lookup. The optimization handles control flow by finding the top dominating
open_existential instruction, and uses that instruction.
related to rdar://22704464.
Similarly to how we've always handled parameter types, we
now recursively expand tuples in result types and separately
determine a result convention for each result.
The most important code-generation change here is that
indirect results are now returned separately from each
other and from any direct results. It is generally far
better, when receiving an indirect result, to receive it
as an independent result; the caller is much more likely
to be able to directly receive the result in the address
they want to initialize, rather than having to receive it
in temporary memory and then copy parts of it into the
target.
The most important conceptual change here that clients and
producers of SIL must be aware of is the new distinction
between a SILFunctionType's *parameters* and its *argument
list*. The former is just the formal parameters, derived
purely from the parameter types of the original function;
indirect results are no longer in this list. The latter
includes the indirect result arguments; as always, all
the indirect results strictly precede the parameters.
Apply instructions and entry block arguments follow the
argument list, not the parameter list.
A relatively minor change is that there can now be multiple
direct results, each with its own result convention.
This is a minor change because I've chosen to leave
return instructions as taking a single operand and
apply instructions as producing a single result; when
the type describes multiple results, they are implicitly
bound up in a tuple. It might make sense to split these
up and allow e.g. return instructions to take a list
of operands; however, it's not clear what to do on the
caller side, and this would be a major change that can
be separated out from this already over-large patch.
Unsurprisingly, the most invasive changes here are in
SILGen; this requires substantial reworking of both call
emission and reabstraction. It also proved important
to switch several SILGen operations over to work with
RValue instead of ManagedValue, since otherwise they
would be forced to spuriously "implode" buffers.