Pre-specialization of `Array._endMutation` (for AnyObject) prevents this function from being inlined, which results in sub-optimal code.
The function is basically a no-op, so it should be inlined.
Unfortunately, we cannot remove the specialize attributes anymore because the pre-specialized function(s) are now part of the stdlib's ABI.
Therefore, make an exception for `Array._endMutation` in the generic specializer.
For example:
```
var p = Point(x: 10, y: 20)
let o = UnsafePointer(&p)
```
Also support outlined arrays with pointers to other globals. For example:
```
var g1 = 1
var g2 = 2
func f() -> [UnsafePointer<Int>] {
return [UnsafePointer(&g1), UnsafePointer(&g2)]
}
```
Set the `[bare]` attribute on `alloc_ref` and `global_value` instructions if the object's header (reference count and metatype) is not used throughout the lifetime of the object.
The ComputeEffects pass derives escape information for function arguments and adds those effects to the function.
This requires many changes to the check lines in the tests, because the effects are printed in the SIL output.
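As a rough illustration (the function and types below are made up, not from this change), an argument like the following is a candidate for a derived "no escape" effect:
```
final class Node {
  var value = 0
}

// 'node' never escapes: it is only read inside the function body, so the
// ComputeEffects pass can derive an escape effect for this argument and
// record it on the function.
func sum(of node: Node, times count: Int) -> Int {
  var result = 0
  for _ in 0..<count {
    result += node.value
  }
  return result
}
```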
Reduces the number of `_ContiguousArrayStorage` metadata instantiations.
In order to support constant-time bridging, we need to set the correct
metadata when we bridge to Objective-C. This is so that the type check
succeeds when bridging back from Objective-C, letting us reuse the storage
instance rather than bridging the elements.
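As a hedged sketch of the round trip this enables (the class and variable names here are invented), on Apple platforms:
```
import Foundation

final class Person: NSObject {
  let name: String
  init(name: String) { self.name = name }
}

let people = [Person(name: "Ada"), Person(name: "Grace")]

// Bridging to Objective-C: the storage instance gets the correct
// _ContiguousArrayStorage element type set.
let nsPeople = people as NSArray

// Bridging back: the type check can succeed, so the original storage
// instance can be reused instead of bridging every element.
let roundTripped = nsPeople as! [Person]
```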
To support dynamically setting the `_ContiguousArrayStorage` element
type, I needed to add support for optimizing `alloc_ref_dynamic`
throughout the optimizer.
Possible future improvements:
* Use different metadata such that we can disambiguate native Swift classes during destruction, allowing native release rather than unknown release usage.
* Optimize the newly added semantic function `getContiguousArrayStorageType`.
rdar://86171143
Literal closures are only ever directly referenced in the context of the expression they're written in,
so it's wasteful to emit them at their fully-substituted calling convention and then reabstract them if
they're passed directly to a generic function. Avoid this by saving the abstraction pattern of the context
before emitting the closure, and then lowering its main entry point's calling convention at that
level of abstraction. Generalize some of the prolog/epilog code to handle converting arguments and returns
to the correct representation for a different abstraction level.
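For example (a minimal sketch; the generic function here is hypothetical):
```
func apply<T, U>(_ transform: (T) -> U, to value: T) -> U {
  return transform(value)
}

// The closure literal is only referenced at this call. Emitting its entry
// point at the generic abstraction level expected by 'apply' avoids first
// emitting it at the fully-substituted convention and then reabstracting
// it through a thunk.
let result = apply({ (x: Int) -> Int in x + 1 }, to: 41)
```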
We don't need to initialize the global object if it's never used for anything that can access the object header.
This lets, for example, lookup tables be compiled with zero Array overhead.
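A hedged example of the kind of lookup table this is aimed at (the names and contents are illustrative):
```
// Hypothetical lookup table: if reference counting on the immortal buffer is
// removed and only element reads remain, nothing touches the object header,
// so the global object doesn't need to be initialized at runtime.
let daysInMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

func days(inMonth month: Int) -> Int {
  return daysInMonth[month - 1]
}
```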
rdar://59874359
Tests which check whether the optimizer is able to generate certain code should never be "worked around" by adding command line options.
This defeats the purpose of such tests.
Unfortunately, some optimizer deficiencies went unnoticed because this option was added.
TODO: there are more such cases which I didn't fix in this PR yet.
Concretely: generate the array of (key, value) tuples in the data section, which is then passed to `Dictionary.init(dictionaryLiteral:)`.
We already do this for simple arrays, e.g. arrays with trivial element types.
The only change needed for dictionary literals is to support tuple types in the ObjectOutliner.
The effect of this optimization is a significant reduction in code size for dictionary literals, and an increase in data size.
But in most cases there is a considerable net win for code+data size in total.
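For illustration, a dictionary literal of the kind this targets (the contents are made up):
```
// The (key, value) tuples for this literal can be emitted as a statically
// initialized array in the data section and handed to
// Dictionary.init(dictionaryLiteral:).
let statusNames = [
  200: "OK",
  301: "Moved Permanently",
  404: "Not Found",
  500: "Internal Server Error",
]
```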
This change could impact Swift programs that previously appeared
well-behaved, but weren't fully tested in debug mode. Now, when running
in release mode, they may trap with the message "error: overlapping
accesses...".
Recent optimizations have brought performance to where I think it needs
to be for adoption. More optimizations are planned, and some
benchmarks should be further improved, but at this point we're ready
to begin receiving bug reports. That will help prioritize the
remaining work for Swift 5.
Of the 656 public microbenchmarks in the Swift repository, there are
still several regressions larger than 10%:
| TEST | OLD | NEW | DELTA | RATIO |
|---|---:|---:|---:|---|
| ClassArrayGetter2 | 139 | 1307 | +840.3% | **0.11x** |
| HashTest | 631 | 1233 | +95.4% | **0.51x** |
| NopDeinit | 21269 | 32389 | +52.3% | **0.66x** |
| Hanoi | 1478 | 2166 | +46.5% | **0.68x** |
| Calculator | 127 | 158 | +24.4% | **0.80x** |
| Dictionary3OfObjects | 391 | 455 | +16.4% | **0.86x** |
| CSVParsingAltIndices2 | 526 | 604 | +14.8% | **0.87x** |
| Prims | 549 | 626 | +14.0% | **0.88x** |
| CSVParsingAlt2 | 1252 | 1411 | +12.7% | **0.89x** |
| Dictionary4OfObjects | 206 | 232 | +12.6% | **0.89x** |
| ArrayInClass | 46 | 51 | +10.9% | **0.90x** |
The common pattern in these benchmarks is to define an array of data
as a class property and to repeatedly access that array through the
class reference. Each of those class property accesses now incurs a
runtime call. Naturally, introducing a runtime call in a loop that
otherwise does almost no work incurs substantial overhead. This is
similar to the issue caused by automatic reference counting. In some
cases, more sophisticated optimization will be able to determine that the
same object is repeatedly accessed. Furthermore, the overhead of the
runtime call itself can be improved. But regardless of how well we
optimize, there will always be a class of microbenchmarks in which the
runtime check has a noticeable impact.
As a general guideline, avoid performing class property access within
the most performance critical loops, particularly on different objects
in each loop iteration. If that isn't possible, it may help if the
visibility of those class properties is private or internal.
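As a hedged example of following that guideline (the types and names are invented):
```
final class Samples {
  var values: [Double] = []
}

// Every 'samples.values' access in the loop goes through the class
// reference and therefore incurs a runtime exclusivity check.
func total(_ samples: Samples) -> Double {
  var sum = 0.0
  for i in 0..<samples.values.count {
    sum += samples.values[i]
  }
  return sum
}

// Reading the property once before the loop avoids the repeated checks.
func totalHoisted(_ samples: Samples) -> Double {
  let values = samples.values
  var sum = 0.0
  for value in values {
    sum += value
  }
  return sum
}
```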
We run GlobalOpt multiple times in the pass pipeline, but in some cases object outlining shouldn't be done too early.
Having it in a separate pass makes it possible to run it independently of GlobalOpt.
* Reduce array abstraction on Apple platforms when dealing with literals
Part of the ongoing quest to reduce Swift array literal abstraction
penalties: make the SIL optimizer able to eliminate bridging overhead
when dealing with array literals.
Introduce a new `classify_bridge_object` SIL instruction to handle the
logic of extracting the platform-specific bits from a `Builtin.BridgeObject`
value that indicate whether it contains an ObjC tagged pointer object
or a normal ObjC object. This allows the SIL optimizer to eliminate
these checks, which allows constant folding a ton of code. On the example
added to test/SILOptimizer/static_arrays.swift, this results in 4x
less SIL code, and also leads to a lot more commonality between Linux
and Apple platform codegen when passing an array literal.
This also introduces a couple of SIL combines for patterns that occur
in the array literal passing case.
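For reference, a hedged sketch of the array-literal-to-NSArray case (the function and names are hypothetical):
```
import Foundation

func takesNSArray(_ array: NSArray) {}

func passLiteral() {
  let names = ["alpha", "beta", "gamma"]
  // Bridging the Swift array (built from a literal) to NSArray; the new
  // classify_bridge_object instruction lets the optimizer constant fold
  // much of the bridging logic for cases like this.
  takesNSArray(names as NSArray)
}
```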
The main part of the change is to support the `ptr_to_int` builtin in statically initialized globals. This builtin is used to build a `StaticString` from a `string_literal`.
On the other hand, I removed support for the `FPTrunc` builtin, which is not needed anyway (because it can be constant propagated).
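A small example of the case this enables (the global name is invented):
```
// A StaticString global built from a string literal: its static initializer
// stores the literal's address as a word, which requires the ptr_to_int
// builtin in the statically initialized global.
let buildTag: StaticString = "release-2021-demo"
```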
Mainly this is done for array literals.
This new optimization creates a statically initialized global variable which is the allocated object.
The `alloc_ref` instruction is replaced by a `global_value` instruction.
This optimization can give significant performance improvements for large array literals.
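For illustration, a hedged example of the kind of literal that benefits (the function is made up):
```
// The outliner can turn this literal into a statically initialized global
// object; the alloc_ref for the array buffer is then replaced by a
// global_value instruction.
func fibonacciTable() -> [Int] {
  return [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
}
```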