The new implementation has several benefits compared to the old C++ implementation:
* It is significantly simpler: it optimizes each load separately instead of handling all loads at once with bit-field based dataflow.
* It uses alias analysis more precisely, which enables more loads to be optimized.
* It avoids inserting additional copies in OSSA.
The algorithm is a dataflow analysis that starts at the original load and searches for preceding stores or loads by following the control flow backward.
The preceding stores and loads provide the "available values" with which the original load can be replaced.
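As a rough source-level sketch (the pass itself works on SIL loads and stores, not Swift source), the second read of the field below is redundant because the preceding store already provides the available value:

```swift
struct Point {
    var x: Int
}

func example(_ p: inout Point) -> Int {
    p.x = 42        // this store provides the "available value"
    return p.x      // redundant load: can be replaced by 42
}
```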
Although nonescaping closures are representationally trivial pointers to their
on-stack context, it is useful to model them as borrowing their captures, which
allows for checking correct use of move-only values across the closure, and
lets us model the lifetime dependence between a closure and its captures without
an ad-hoc web of `mark_dependence` instructions.
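A hedged source-level sketch of why the borrow model matters, assuming current compiler support for capturing noncopyable values in nonescaping closures (the type and function names here are made up for illustration): modeling the nonescaping closure as borrowing its capture lets the compiler check that the capture is not consumed while the closure may still run.

```swift
struct FileHandle: ~Copyable {
    var fd: Int32
    consuming func close() { /* release the descriptor */ }
}

func withLogging(_ body: () -> Void) { body() }

func example(_ handle: consuming FileHandle) {
    withLogging {
        _ = handle.fd   // the nonescaping closure borrows `handle`
    }
    handle.close()      // OK: the borrow ends with the call, then `handle` is consumed
}
```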
During ownership elimination, we eliminate copy/destroy_value instructions and
end the partial_apply's lifetime with an explicit dealloc_stack as before,
for compatibility with existing IRGen and non-OSSA aware passes.
The ComputeEffects pass derives escape information for function arguments and adds those effects to the function.
This requires many changes to the check lines in the tests, because the effects are printed in the SIL output.
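As a hedged illustration of the kind of information such a pass derives (shown at the source level only; the actual effects are attached to SIL functions): in the function below the array argument does not escape, and recording that lets later optimizations rely on it.

```swift
// `values` is only read inside the function and never stored anywhere that
// outlives the call, so an escape analysis can mark the argument as non-escaping.
func sum(_ values: [Int]) -> Int {
    var total = 0
    for v in values {
        total += v
    }
    return total
}
```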
-sil-inline-never-functions already exists, but it does a substring
match, which is not always what we want. Add a
-sil-inline-never-function flag that does a full string match and prevents
inlining of functions with exactly that name.
This has two advantages:
1. It does not force the Array into memory (to pass it as inout self to the non-inlinable _createNewBuffer).
2. The new _consumeAndCreateNew is annotated to consume self, which helps reduce unnecessary retains/releases (see the sketch after this list).
The change applies to both Array and ContiguousArray.
Also fix a test case that assumes Array.append is never inlined.
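A minimal sketch of the consuming pattern, using a made-up helper name rather than the actual stdlib code: because the method consumes self, the caller hands over ownership of the old storage instead of retaining it and spilling the array to memory for an inout call.

```swift
extension Array {
    // Hypothetical stand-in for an internal buffer-growing helper.
    consuming func growingCopy(minimumCapacity: Int) -> [Element] {
        var newArray = [Element]()
        newArray.reserveCapacity(Swift.max(minimumCapacity, count * 2))
        newArray.append(contentsOf: self)   // old storage is released here; no extra retain needed
        return newArray
    }
}
```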
We need to be able to prevent inlining of a function and any of its
specializations. There's no way to specify multiple function names on
the command line, so we use the partial match technique.
This additional check lets the optimizer eliminate most of the append code in specializations where the appended sequence is also an Array,
for example when "adding" arrays with arr += other_arr, as illustrated below.
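A minimal illustration of the case this targets (variable names are arbitrary):

```swift
var arr = [1, 2, 3]
let other_arr = [4, 5, 6]
arr += other_arr   // specialization where the appended Sequence is itself an Array
```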
This change could impact Swift programs that previously appeared
well-behaved but weren't fully tested in debug mode. Now, when running
in release mode, they may trap with the message "error: overlapping
accesses...".
Recent optimizations have brought performance to where I think it needs
to be for adoption. More optimizations are planned, and some
benchmarks should be further improved, but at this point we're ready
to begin receiving bug reports. That will help prioritize the
remaining work for Swift 5.
Of the 656 public microbenchmarks in the Swift repository, there are
still several regressions larger than 10%:
| TEST | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ClassArrayGetter2 | 139 | 1307 | +840.3% | **0.11x** |
| HashTest | 631 | 1233 | +95.4% | **0.51x** |
| NopDeinit | 21269 | 32389 | +52.3% | **0.66x** |
| Hanoi | 1478 | 2166 | +46.5% | **0.68x** |
| Calculator | 127 | 158 | +24.4% | **0.80x** |
| Dictionary3OfObjects | 391 | 455 | +16.4% | **0.86x** |
| CSVParsingAltIndices2 | 526 | 604 | +14.8% | **0.87x** |
| Prims | 549 | 626 | +14.0% | **0.88x** |
| CSVParsingAlt2 | 1252 | 1411 | +12.7% | **0.89x** |
| Dictionary4OfObjects | 206 | 232 | +12.6% | **0.89x** |
| ArrayInClass | 46 | 51 | +10.9% | **0.90x** |
The common pattern in these benchmarks is to define an array of data
as a class property and to repeatedly access that array through the
class reference. Each of those class property accesses now incurs a
runtime call. Naturally, introducing a runtime call in a loop that
otherwise does almost no work incurs substantial overhead. This is
similar to the overhead caused by automatic reference counting. In some
cases, more sophisticated optimization will be able to determine that the
same object is repeatedly accessed. Furthermore, the overhead of the
runtime call itself can be improved. But regardless of how well we
optimize, there will always be a class of microbenchmarks in which the
runtime check has a noticeable impact.
As a general guideline, avoid performing class property accesses within
the most performance-critical loops, particularly on different objects
in each loop iteration. If that isn't possible, it may help if the
visibility of those class properties is private or internal.
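A hedged illustration of the guideline (made-up types, not taken from the benchmarks above): hoisting the class property access out of the hot loop means the exclusivity check runs once rather than on every iteration.

```swift
final class Container {
    var values: [Int] = Array(1...1_000)
}

func sumSlow(_ container: Container) -> Int {
    var total = 0
    for i in 0..<container.values.count {
        total += container.values[i]    // class property access (and its check) on every iteration
    }
    return total
}

func sumFast(_ container: Container) -> Int {
    let values = container.values       // single property access, hoisted out of the loop
    var total = 0
    for value in values {
        total += value
    }
    return total
}
```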
We did this for @in => @owned for all parameters before enabling +0. We decided
to defer this work to after +0 was turned back on.
This also fixes the array_contentof_opt test without making append(contentsOf:)
take the container at +1.
rdar://38152291
* Give Sequence a top-level Element, constrain Iterator to match (see the sketch after this list)
* Remove many instances of Iterator.
* Fix various hard-coded tests
* XFAIL a few tests that need further investigation
* Change assoc type for arrayLiteralConvertible
* Mop up remaining "better expressed as a where clause" warnings
* Fix UnicodeDecoders prototype test
* Fix UIntBuffer
* Fix hard-coded Element identifier in CSDiag
* Fix up more tests
* Account for flatMap changes
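A simplified sketch of the resulting protocol shape (stripped down from the real standard library declarations, which carry more requirements):

```swift
protocol IteratorProtocol {
    associatedtype Element
    mutating func next() -> Element?
}

protocol Sequence {
    // Element now lives directly on Sequence, and Iterator is constrained
    // to produce the same Element via a where clause.
    associatedtype Element
    associatedtype Iterator: IteratorProtocol where Iterator.Element == Element
    func makeIterator() -> Iterator
}
```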
Replace calls to Array.append(contentsOf:) with individual append calls
if the argument is an array literal.
For example:
arr += [1, 2, 3]
is replaced by:
arr.append(1)
arr.append(2)
arr.append(3)
This gives considerable speedups, up to 10x in our micro-benchmarks that exercise this pattern.
This is based on the work of @ben-ng, who implemented the first version of this optimization (thanks!).