The hash function didn't take the "index" of an projection into account. This lead to quadratic behavior in redundant load elimination and dead store elimination when compiling very large functions with many loads and stores.
rdar://problem/56268570
Replaced some namespace qualified references to ArrayRef and Optional
with the unqualified type. Reordered the includes per clang-format.
Excerpted from @gottesm's https://github.com/apple/swift/pull/16756.
Added getUsers to ProjectionTree. The new method vands, via an out
parameter, a set of all the users of all of the nodes in the projection
tree that are themselves not in the projection tree by way of
getNonProjUsers. Took this opportunity to tweak getNonProjUsers to vend
a const ArrayRef rather than a SmallVector.
Excerpted from @gottesm's https://github.com/apple/swift/pull/16756.
Added getAllLeafTypes to ProjectionTree. The new method vends, via an
out paramter, a vector containing the types of all the leaves in a
projection tree in the order that they appear. The method relies uses a
new convenience on ProjectionTreeNode, isLeaf to include only the types
of those nodes which are leaves.
Excerpted from @gottesm's https://github.com/apple/swift/pull/16756.
This mostly requires changing various entry points to pass around a
TypeConverter instead of a SILModule. I've left behind entry points
that take a SILModule for a few methods like SILType::subst() to
avoid creating even more churn.
This improves on the previous situation:
- The request ensures that the backing storage for lazy properties
and property wrappers gets synthesized first; previously it was
only somewhat guaranteed by callers.
- Instead of returning a range this just returns an ArrayRef,
which simplifies clients.
- Indexing into the ArrayRef is O(1), which addresses some FIXMEs
in the SIL optimizer.
This patch comes out of my reading some generic code using .none in transparent
functions to conditionally compile out code at -Onone. Sadly, before this the
dead code in question wouldn't be compiled out unless the protocol was
constrained to be a class protocol.
I added a test that validates that this conditional compilation property can be
relied on in -Onone code in both cases.
The field's ordinal value is used by the Projection abstraction, which is
the basis of efficiently comparing and sorting access paths in SIL. It must
be cached before it is used by any SIL passes, including the verifier, or it
causes widespread quadratic complexity.
Fixes <rdar://problem/50353228> Swift compile time regression with optimizations enabled
In production code, a file that was taking 40 minutes to compile now
takes 1 minute, with more than half of the time in LLVM.
Here's a short script that reproduces the problem. It used to take 30s
and now takes 0.06s:
// swift genlazyinit.swift > lazyinit.sil
// sil-opt ./lazyinit.sil --access-enforcement-opts
var NumProperties = 300
print("""
sil_stage canonical
import Builtin
import Swift
import SwiftShims
public class LazyProperties {
""")
for i in 0..<NumProperties {
print("""
// public lazy var i\(i): Int { get set }
@_hasStorage @_hasInitialValue final var __lazy_storage__i\(i): Int? { get set }
""")
}
print("""
}
// LazyProperties.init()
sil @$s4lazy14LazyPropertiesCACycfc : $@convention(method) (@owned LazyProperties) -> @owned LazyProperties {
bb0(%0 : $LazyProperties):
%enum = enum $Optional<Int>, #Optional.none!enumelt
""")
for i in 0..<NumProperties {
let adr = (i*4) + 2
let access = adr + 1
print("""
%\(adr) = ref_element_addr %0 : $LazyProperties, #LazyProperties.__lazy_storage__i\(i)
%\(access) = begin_access [modify] [dynamic] %\(adr) : $*Optional<Int>
store %enum to %\(access) : $*Optional<Int>
end_access %\(access) : $*Optional<Int>
""")
}
print("""
return %0 : $LazyProperties
} // end sil function '$s4lazy14LazyPropertiesCACycfc'
""")
An interprocedural analysis pass that summarizes the dynamically
enforced formal accesses within a function. These summaries will be
used by a new AccessEnforcementOpts pass to locally fold access scopes
and remove dynamic checks based on whole module analysis.
introduce a common superclass, SILNode.
This is in preparation for allowing instructions to have multiple
results. It is also a somewhat more elegant representation for
instructions that have zero results. Instructions that are known
to have exactly one result inherit from a class, SingleValueInstruction,
that subclasses both ValueBase and SILInstruction. Some care must be
taken when working with SILNode pointers and testing for equality;
please see the comment on SILNode for more information.
A number of SIL passes needed to be updated in order to handle this
new distinction between SIL values and SIL instructions.
Note that the SIL parser is now stricter about not trying to assign
a result value from an instruction (like 'return' or 'strong_retain')
that does not produce any.
Applying nontrivial generic arguments to a nontrivial SIL layout requires lowered SILType substitution, which requires a SILModule. NFC yet, just an API change.
The new instructions are: ref_tail_addr, tail_addr and a new attribute [ tail_elems ] for alloc_ref.
For details see docs/SIL.rst
As these new instructions are not generated so far, this is a NFC.
The new instructions are: ref_tail_addr, tail_addr and a new attribute [ tail_elems ] for alloc_ref.
For details see docs/SIL.rst
As these new instructions are not generated so far, this is a NFC.
Several functionalities have been added to FSO over time and the logic has become
muddled.
We were always looking at a static image of the SIL and try to reason about what kind of
function signature related optimizations we can do.
This can easily lead to muddled logic. e.g. we need to consider 2 different function
signature optimizations together instead of independently.
Split 1 single function to do all sorts of different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function and eventually we collapse all intermediate thunks to in a single thunk.
With this change, it will be easier to implement function signature optimization as now
we can do them independently now.
Small modifications to the test cases.
Several functionalities have been added to FSO over time and the logic has become
muddled.
We were always looking at a static image of the SIL and try to reason about what kind of
function signature related optimizations we can do.
This can easily lead to muddled logic. e.g. we need to consider 2 different function
signature optimizations together instead of independently.
Split 1 single function to do all sorts of different analyses in FSO into several
small transformations, each of which does a specific job. After every analysis, we produce
a new function and eventually we collapse all intermediate thunks to in a single thunk.
With this change, it will be easier to implement function signature optimization as now
we can do them independently now.
Minimal modifications to the test cases.
This split the function signature module pass into 2 functin passes.
By doing so, this allows us to rewrite to using the FSO-optimized
function prior to attempting inlining, but allow us to do a substantial
amount of optimization on the current function before attempting to do
FSO on that function.
And also helps us to move to a model which module pass is NOT used unless
necesary.
I do not see regression nor improvement for on the performance test suite.
functionsignopts.sil and functionsignopt_sroa.sil are modified because the
mangler now takes into account of information in the projection tree.
a separate analysis pass.
This pass is run on every function and the optimized signature is return'ed through the
getArgDescList and getResultDescList.
Next step is to split to cloning and callsite rewriting into their own function passes.
rdar://24730896
"
analysis pass.
This pass is run on every function and the optimized signature is return'ed through the
getArgDescList and getResultDescList.
Next step is to split to cloning and callsite rewriting into their own function passes.
rdar://24730896
This change includes an option on how IsLive is defined/computed. the ProjectionTree
can now choose to ignore epilogue releases and mark a node as dead if its only non-debug
user is epilogue release.
It can also mark a node as alive even its only user is epilogue release as before.
Imagine a case where one passes in an array and not access its owner
besides to release it. In such a case, we *do* want to be able to eliminate
that argument even though there is a release in the function epilogue.
This will help to get rid of the retain and release pair at the callsite. i.e.
the guaranteed paramter is elimininated.
rdar://21114206
LSValue::reduce reduces a set of LSValues (mapped to a set of LSLocations) to
a single LSValue.
It can then be used as the forwarding value for the location.
Previously, we expand into intermediate nodes and leaf nodes and then go bottom
up, trying to create a single LSValue out of the given LSValues.
Instead, we now use a recursion to go top down. This simplifies the code. And this
is fine as we do not expect to run into type tree that are too deep.
Existing test cases ensure correctness.
When we have all the epilogue releases. Make sure they cover all the non-trivial
parts of the base. Otherwise, treat as if we've found no releases for the base.
Currently. this is a NFC other than epilogue dumper. I will wire it up with
function signature with next commit.
This is part of rdar://22380547
So instead of only being able to match %1 and release %1 in (1). we
can also match %1 with (release %2, and release%3, i.e. exploded release_value)
in (2).
(1)
foo(%1)
strong_release %1
(2)
foo(%1)
%2 = struct_extract %1, field_a
%3 = struct_extract %1, field_b
strong_release %2
strong_release %3
This will allow function signature to better move the release instructions to
the callers.
Currently, this is a NFC other than testing using the epilogue match dumper.