* Simplified the logic for creating static initializers and constant folding for global variables: instead of creating a getter function, directly inline the constant value into the use-sites.
* Wired up the constant folder in GlobalOpt, so that a chains for global variables can be propagated, e.g.
let a = 1
let b = a + 10
let c = b + 5
* Fixed a problem where we didn't create a static initializer if a global is not used in the same module. E.g. a public let variable.
* Simplified the code in general.
rdar://problem/31515927
This change could impact Swift programs that previously appeared
well-behaved, but weren't fully tested in debug mode. Now, when running
in release mode, they may trap with the message "error: overlapping
accesses...".
Recent optimizations have brought performance where I think it needs
to be for adoption. More optimizations are planned, and some
benchmarks should be further improved, but at this point we're ready
to begin receiving bug reports. That will help prioritize the
remaining work for Swift 5.
Of the 656 public microbenchmarks in the Swift repository, there are
still several regressions larger than 10%:
TEST OLD NEW DELTA RATIO
ClassArrayGetter2 139 1307 +840.3% **0.11x**
HashTest 631 1233 +95.4% **0.51x**
NopDeinit 21269 32389 +52.3% **0.66x**
Hanoi 1478 2166 +46.5% **0.68x**
Calculator 127 158 +24.4% **0.80x**
Dictionary3OfObjects 391 455 +16.4% **0.86x**
CSVParsingAltIndices2 526 604 +14.8% **0.87x**
Prims 549 626 +14.0% **0.88x**
CSVParsingAlt2 1252 1411 +12.7% **0.89x**
Dictionary4OfObjects 206 232 +12.6% **0.89x**
ArrayInClass 46 51 +10.9% **0.90x**
The common pattern in these benchmarks is to define an array of data
as a class property and to repeatedly access that array through the
class reference. Each of those class property accesses now incurs a
runtime call. Naturally, introducing a runtime call in a loop that
otherwise does almost no work incurs substantial overhead. This is
similar to the issue caused by automatic reference counting. In some
cases, more sophistacated optimization will be able to determine the
same object is repeatedly accessed. Furthermore, the overhead of the
runtime call itself can be improved. But regardless of how well we
optimize, there will always a class of microbenchmarks in which the
runtime check has a noticeable impact.
As a general guideline, avoid performing class property access within
the most performance critical loops, particularly on different objects
in each loop iteration. If that isn't possible, it may help if the
visibility of those class properties is private or internal.
This eliminates a pretty similar list of passes added in a similar order
with just re-using the ordering from AddSSAPasses. Beyond the particular
inliner pass (which is maintained with this change), there was nothing
really specific to low-level code with the order that was present before.
I measure a 1% increase in compile time of the stdlib, no perf
regressions (at -O), and a few decent improvements:
19 CaptureProp 5233 4129 -1104 -21.1% 1.27x
30 ErrorHandling 3053 2678 -375 -12.3% 1.14x
65 Sim2DArray 610 518 -92 -15.1% 1.18x
I expect to be able to get back the 1% compile-time hit (and probably
more) with future changes.