Both the syntax and relative order of the LLVM `nocapture` parameter
attribute changed upstream in 29441e4f5fa5f5c7709f7cf180815ba97f611297.
To reduce conflicts with rebranch, adjust FileCheck patterns to expect
both syntaxes and orders anywhere the presence of the attribute is not
critical to the test. These changes are temporary and will be cleaned
up once rebranch is merged into main.
This attribute was introduced in
7eca38ce76d5d1915f4ab7e665964062c0b37697 (llvm-project).
Match it using a wildcard regex, since it is not relevant to these
tests.
This is intended to reduce future conflicts with rebranch.
This change could impact Swift programs that previously appeared
well-behaved, but weren't fully tested in debug mode. Now, when running
in release mode, they may trap with the message "error: overlapping
accesses...".
Recent optimizations have brought performance where I think it needs
to be for adoption. More optimizations are planned, and some
benchmarks should be further improved, but at this point we're ready
to begin receiving bug reports. That will help prioritize the
remaining work for Swift 5.
Of the 656 public microbenchmarks in the Swift repository, there are
still several regressions larger than 10%:
TEST OLD NEW DELTA RATIO
ClassArrayGetter2 139 1307 +840.3% **0.11x**
HashTest 631 1233 +95.4% **0.51x**
NopDeinit 21269 32389 +52.3% **0.66x**
Hanoi 1478 2166 +46.5% **0.68x**
Calculator 127 158 +24.4% **0.80x**
Dictionary3OfObjects 391 455 +16.4% **0.86x**
CSVParsingAltIndices2 526 604 +14.8% **0.87x**
Prims 549 626 +14.0% **0.88x**
CSVParsingAlt2 1252 1411 +12.7% **0.89x**
Dictionary4OfObjects 206 232 +12.6% **0.89x**
ArrayInClass 46 51 +10.9% **0.90x**
The common pattern in these benchmarks is to define an array of data
as a class property and to repeatedly access that array through the
class reference. Each of those class property accesses now incurs a
runtime call. Naturally, introducing a runtime call in a loop that
otherwise does almost no work incurs substantial overhead. This is
similar to the issue caused by automatic reference counting. In some
cases, more sophistacated optimization will be able to determine the
same object is repeatedly accessed. Furthermore, the overhead of the
runtime call itself can be improved. But regardless of how well we
optimize, there will always a class of microbenchmarks in which the
runtime check has a noticeable impact.
As a general guideline, avoid performing class property access within
the most performance critical loops, particularly on different objects
in each loop iteration. If that isn't possible, it may help if the
visibility of those class properties is private or internal.
While most class field accesses go through accessors, a special
case is if you have a final field (or class) in a non-resilient
module. Then, we were allowed to directly access the field.
However, an earlier hack made it so that this access always went
through a field offset global, which is unnecessary in the case
where the class layout is fully known.
One consequence of this is that 'Array.count' would compile down
to a load from a global followed by an indirect load, instead of
a single load from a constant offset.