The first is copied from https://github.com/apple/swift/pull/13930's
contribution (with a minor bug fix applied). The second is an
adaptation that tries to avoid creating copies and operate using
indices directly.
UnsafePointer implementation contains the following note:
> Note: The following family of operator overloads are redundant with
Strideable. However, optimizer improvements are needed before they can
be removed without affecting performance.
... but it looks like there is no benchmark to support this claim.
The idea being, we need to decide what benchmarks to run solely based on
tags.
`--tag` allows to list all tags that are required;
`--skip-tags` allows to skip benchmarks that have any of those tags.
By default, skip-tags list contains .unstable and .String, which results
in the same subset of benchmarks as before.
*NOTE* We always prefer a registered benchmark if we have one.
I am going to use BenchmarkInfo to solve the "create data for benchmark while we
are already timing" problem. I am going to add a field to BenchmarkInfo that if
it is not-null is called before we start measuring time. This closure can be
used to initialize any global data structures/etc.
But to do this, I need to be able to combine the registered and legacy
not-registered benchmarks.
Only use the .abstraction tag for benchmarks that are primarilly stressing the
optimization of a specific kind of abstraction. Then I can use these as CPU
benchmarks.
Previously this tag was applied to any benchmark that happened to involve
classes or protocols. When a compiler engineer changes something that might
affect codegen in these cases, they should simply benchmark the entire stable
suite. There's no value in having a separate set of 97 "abstraction" benchmarks
that are primarilly testing something else, like a stdlib API.
The ReduceInto benchmark performs three tasks using both `reduce(_:_)` and `reduce(into:_:)` so that their performance can be compared:
1. Summing an array, reducing to `Int`
2. Filtering an array, reducing to `[Int]`
3. Counting letter frequencies, reducing to `[Character: Int]`
Implement and document `reduce(into:_:)`, with a few notes:
- The `initial` parameter was renamed `initialResult` to match the first parameter in `reduce(_:_:)`.
- The unnamed `combining` parameter was renamed `updateAccumulatingResult` to try and resemble the naming of the closure parameter in `reduce(_:_:)`.
- The closure throws and `reduce(into:_)` re-throws.
- This documentation mentions that `reduce(into:_)` is preferred over `reduce(_:_:)` when the result is a copy-on-write type and an example where the result is a dictionary.
Add benchmarks for reduce with accumulation into a scalar, an array, and a dictionary.
Update expected error message in closures test (since there are now two `reduce` methods, the diagnostic is different).
Many strings use non-sub-300 punctuation characters (e.g. unicode
hyphen, CJK quotes, etc). This can cause switching between fast and
slow paths for grapheme breaking. Add in fast-paths for general
punctuation characters and CJK punctuation and symbol characters.
This results in about a 5-8x speedup for heavily (unicode) punctuated
Latiny and CJKy workloads.
The purpose of the benchmark was to check a very specific ARC optimization, which is better tested with a lit test.
Beside that, the benchmark was broken anyway.