Commit Graph

1638 Commits

Author SHA1 Message Date
Michael Ilseman
d92098bd19 [String] Performance improvements to comparison
Also, disable normalization benchmarks and other changes until we
merge, so we can compare with master 1-to-1.
2018-11-04 10:42:41 -08:00
Lance Parker
7376009ccc Add benchmarks and tests for the normalized iterator (#32)
Add benchmarks and tests for the normalized iterator
2018-11-04 10:42:41 -08:00
Mishal Shah
48dff2bd29 Merge pull request #20231 from eeckstein/smoke-bench
benchmarks: combine everything which is needed into run_smoke_bench
2018-11-02 15:23:30 -07:00
Karim Chang
8a5c8b3592 Use the &>> operator in HashTest.
Gets rid of some obvious inefficiencies in this benchmark.
2018-11-02 14:36:25 -04:00
Erik Eckstein
040aa06fec benchmarks: combine everything which is needed into run_smoke_bench
Now, run_smoke_bench runs the benchmarks, compares performance and code size, and reports the results, both on stdout and as a markdown file.
No need to run bench_code_size.py and compare_perf_tests.py separately.

This has two benefits:
- It's much easier to run it locally
- It's now more transparent what's happening in '@swiftci benchmark', because all the logic is in run_smoke_bench rather than in a script on the CI bot that isn't visible.

I also remove the branch arguments from ReportFormatter in compare_perf_tests.py. They were not used anyway.

For a smooth rollout in CI, I created a new script rather than changing the existing one. Once everything is set up in CI, I'll delete the old run_smoke_test.py and bench_code_size.py.
2018-11-01 16:41:39 -07:00
Pavol Vaskovic
f121ee1231 [benchmark] Legacy factor ArraySetElement
Lowered base workload by a factor of 10.
2018-11-01 06:32:06 +01:00
Pavol Vaskovic
435e55f0c0 [benchmark] Legacy factor ArrayOf[Generic]Ref
Lowered the base workload by a factor of 10
2018-11-01 06:29:14 +01:00
Pavol Vaskovic
04d1384b2c [benchmark] Legacy factor AnyHashableWithAClass
Lowered the base workload by a factor of 500
2018-11-01 06:26:40 +01:00
Pavol Vaskovic
a7f832fb57 [benchmark] Legacy factor
This adds an optional `legacyFactor` to the `BenchmarkInfo`, which allows for linear modification of constants that unnecessarily inflate the base workload of benchmarks, while maintaining the continuity of long-term benchmark tracking.

For example, if a benchmark uses `for _ in N*10_000` in its run function, we could lower this to `for _ in N*1_000` and add a `legacyFactor: 10` to its `BenchmarkInfo`.

Note that this doesn’t affect the real measurements gathered from the `--verbose` output. The `BenchmarkDoctor` has been slightly adjusted to work with these real samples, therefore `Benchmark_Driver check` will not flag these benchmarks for slow run time reported in the summary, if their real runtimes fall into the recommended range.
2018-11-01 06:24:27 +01:00
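The scaling described in the commit above can be sketched as follows. This is a minimal stand-in, not the suite's actual implementation; the function name and signature are hypothetical, and the real reporting lives in the Swift benchmark harness:

```python
def report_runtime_us(measured_us, num_iters, legacy_factor=1):
    """Scale a measured runtime back to the historical workload size.

    If a benchmark's inner loop was reduced by `legacy_factor` (e.g. from
    N*10_000 to N*1_000 iterations with legacyFactor: 10), multiplying the
    per-iteration runtime by the factor keeps the long-term series continuous.
    """
    return measured_us * legacy_factor // num_iters

# A run that took 420 us for 1 iteration of the reduced (1/10) workload
# is reported as if the old 10x workload had run.
print(report_runtime_us(420, 1, legacy_factor=10))  # -> 4200
```

The raw 420 μs sample is still what `--verbose` would show; only the summarized, legacy-comparable number is scaled.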
eeckstein
1d326d73dd Merge pull request #20074 from palimondo/within-cells-interlinked
[benchmark] Baseline test
2018-10-31 09:08:27 -07:00
eeckstein
bd59bf10e4 Merge pull request #20123 from palimondo/just-eyes
[benchmark] Reduce unreasonable setup times
2018-10-30 13:47:14 -07:00
Pavol Vaskovic
d8adfc71a1 [benchmark] MapReduceClass2 and NSDecimalNumber
Since this benchmark has been significantly modified and needs to be renamed, we can also lower the workload by a factor of 10, to keep up with the best practices.

The old benchmark that uses `NSDecimalNumber` as the tested class is renamed to `MapReduceNSDecimalNumber`, and the renamed `MapReduceClass2` now newly measures the Swift class `Box` that wraps an `Int`. Short versions were modified analogously.
2018-10-30 21:23:04 +01:00
Pavol Vaskovic
21a4aa17e0 [benchmark] Move check-added to run_smoke_bench 2018-10-30 06:04:21 +01:00
eeckstein
cd920b69f4 Merge pull request #19910 from palimondo/fluctuation-of-the-pupil
[benchmark] More Robust Benchmark_Driver
2018-10-29 15:02:07 -07:00
Michael Gottesman
ba7815b663 [benchmark] Fix swiftpm based benchmark build on Linux. 2018-10-29 12:15:20 -07:00
Pavol Vaskovic
ca2a52b5e9 [benchmark] Setup CharacterPropertiesPrecomputed
Reduced the time to run the setUpFunction from 2.2s to 380ms on my ancient computer… This should fit well under 200ms on more modern machines.
2018-10-29 16:18:01 +01:00
Michael Gottesman
b80991d08c Merge pull request #20116 from gottesmm/pr-899da50e441c8ce022a355432dcc2e01334cbe1a
[benchmark] Add two benchmarks that show performance of flattening an…
2018-10-28 23:42:21 -07:00
Michael Gottesman
4d76ff9681 [benchmark] Add two benchmarks that show performance of flattening an array.
The first is a naive imperative approach using appends in a loop. The second
uses flatMap. We would like both of these to have equivalent performance.
2018-10-28 15:55:26 -07:00
Michael Gottesman
7d58b40a96 [benchmark] Fix the swiftpm based benchmark build.
This does a few things:

1. We were not updated for libProc's addition. I bumped the swiftpm version
number to get the systemLibrary functionality (thanks Ankit).

2. I split up a bunch of lines to help the typechecker out a little bit.
2018-10-28 15:50:38 -07:00
Pavol Vaskovic
bfbff45727 [benchmark] Downsized DictionaryKeysContains
The DictionaryKeysContains benchmark used an unreasonably large dictionary to demonstrate the pathological O(n) instead of O(1) performance in the case of a Cocoa Dictionary. The setup of the 1M element dictionary took 8 seconds on my old machine!

The old pathological behavior can be equally well demonstrated with a much smaller dictionary. (Validated by modifying `public func _customContainsEquatableElement` in `Dictionary.swift` to `return _variant.index(forKey: element) != nil`)

The reported performance with correct O(1) behavior is unchanged.
2018-10-27 07:34:33 +02:00
Pavol Vaskovic
897b9ef82e [benchmark] Gardening: Fix linter nitpicks 2018-10-27 06:15:23 +02:00
Pavol Vaskovic
eef71d4505 [benchmark] Check added benchmarks
Script for integrating BenchmarkDoctor’s check of newly added benchmarks into CI workflow.
2018-10-26 18:34:23 +02:00
Karoy Lorentey
f93dcf3dfa [benchmark] Add benchmark for [AnyHashable: Any] with String keys 2018-10-26 11:56:41 +01:00
Patrick Balestra
1c0778bb5b [benchmark] Add insert(_:Character) benchmark with ASCII and non-ASCII characters
Adds insert character benchmark with ASCII and non-ASCII characters
2018-10-24 18:55:20 -07:00
Pavol Vaskovic
96ff53d5c7 [benchmark] Extract setup: MapReduceClass(Small)
MapReduceClass had setup overhead of 868 μs (7%).

Setup overhead of MapReduceClassShort was practically lost in the measurement noise from its artificially high base load, but it was there.

Extracting the decimal array initialization into `setUpFunction` also takes out the cost of releasing the [NSDecimalNumber], which turns out to be about half of the measured runtime in the case of the MapReduceClass benchmark. This significantly changes the reported runtimes (to about half), therefore the modified benchmarks get a new name with suffix `2`.
2018-10-24 09:12:55 +02:00
Pavol Vaskovic
b55244d558 [benchmark] Extract setup: Sequence *Array benches
Sequence benchmarks that test operations on Arrays have setup overhead of 14 μs. (Up from 4 μs a year ago!) That’s just the creation of an [Int] with 2k elements from a range… This array is now extracted into a constant.

This commit also removes the .unstable tag from some CountableRange benchmarks, restoring them back to commit set of the Swift Benchmark Suite.
2018-10-24 08:59:59 +02:00
Pavol Vaskovic
6b703141e3 [benchmark] Extract setup from SubstringComparable
SubstringComparable had setup overhead of 58 μs (26%).

This was a tricky modification: extracting `substrings` and `comparison` constants out of the run function surprisingly resulted in decreased performance. For some reason this configuration causes significant increase in retain/release traffic. Aliasing the constants in the run function somehow works around this deoptimization.

Also, the initial split of the string into 8 substrings takes 44 ms!!! (I'm suspecting some kind of one-time ICU initialization?)
2018-10-23 23:50:44 +02:00
Pavol Vaskovic
6d3e6377d4 [benchmark] Extract setup from SortSortedStrings
SortSortedStrings had setup overhead of 914 μs (30%).

Renamed the [String] constants to be shorter and more descriptive. Extracted the lazy initialization of all these constants into `setUpFunction`, for cleaner measurements.
2018-10-23 23:37:52 +02:00
Pavol Vaskovic
524de6bec3 [benchmark] Fix setup overhead in RandomShuffle
RandomShuffleLCG2 had setup overhead of 902 μs (17%) even though it already used the setUpFunction. It turns out that copying a 100k element array is measurably costly.

The only way to eliminate this overhead from measurement I could think of is to let the numbersLCG array linger around (800 kB), because shuffling the IOU version had different performance.
2018-10-23 23:29:00 +02:00
Pavol Vaskovic
4bc41f8879 [benchmark] Extract setup from PolymorphicCalls
PolymorphicCalls has setup overhead of 4 μs (7%).
2018-10-23 23:07:53 +02:00
Pavol Vaskovic
58a195fd49 [benchmark] Extract setup from Phonebook
Phonebook had setup overhead of 1266 μs (7%).
2018-10-23 23:05:34 +02:00
Pavol Vaskovic
3ff92efdac [benchmark] Extract setup from IterateData
IterateData has setup overhead of 480 μs (10%).

There remained a strange setup overhead after extracting the data into the setUpFunction, because of an off-by-one error in the main loop. It should be either `for _ in 1...10*N` or `for _ in 0..<10*N`. It is an error to use `0...m*N`, because this results in `m*N + 1` iterations, which get divided by N in the reported measurement. The extra iteration then manifests as a mysterious setup overhead!
2018-10-23 22:49:25 +02:00
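The off-by-one effect described above is easy to demonstrate numerically. This Python sketch (with made-up timing numbers) stands in for the Swift ranges; an inclusive range starting at 0 yields one extra iteration, which after dividing by N looks like a constant setup cost:

```python
# Hypothetical workload: m*N loop iterations, 5 us of work per iteration.
N, m = 100, 10
work_per_iter_us = 5

correct_iters = m * N      # `for _ in 1...m*N` or `for _ in 0..<m*N`
buggy_iters = m * N + 1    # `for _ in 0...m*N`: inclusive on both ends

# Reported per-N runtime divides total time by N; the extra iteration
# shows up as a small constant offset that doesn't scale with N,
# i.e. it looks exactly like setup overhead.
correct_report = correct_iters * work_per_iter_us / N
buggy_report = buggy_iters * work_per_iter_us / N
print(correct_report, buggy_report)  # -> 50.0 50.05
```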
Pavol Vaskovic
63143d8c3f [benchmark] Extr. Setup DistinctClassFieldAccesses
DistinctClassFieldAccesses had setup overhead of 4 μs (14%).
Plus cosmetic code formatting fix.
2018-10-23 22:33:16 +02:00
Pavol Vaskovic
32003a708c [benchmark] Extract setup in Dictionary(OfObjects)
Dictionary had setup overhead of 136 μs (6%).
DictionaryOfObjects had setup overhead of 616 μs (7%).
Also fixed the variable naming convention (lowerCamelCase).
2018-10-23 22:28:14 +02:00
Pavol Vaskovic
ae9f5f18b0 [benchmark] Extract setup from DataBenchmarks
DataCount had setup overhead of 18 μs (20%).
DataSubscript had setup overhead of 18 μs (2%).
A `setUpFunction` wasn't necessary here, because the initialization is short (18 μs for `sampleData(.medium)`) and will inflate only the initial measurement.

Runtimes of other benchmarks hide the sampleData initialization in their artificially high runtimes — most use internal multiplier of 10 000 iterations — but were changed to use the same constant data, since it was already available. The overhead will already be extracted if we go for more precise measurement with lower multipliers in the future.
2018-10-23 22:12:22 +02:00
Pavol Vaskovic
03d984114f [benchmark] Extract setup from ArrayInClass
ArrayInClass had setup overhead of 88 μs (17%).
2018-10-23 21:47:04 +02:00
Pavol Vaskovic
53653de575 [benchmark] Extract setup from ArrayAppend*
ArrayAppendStrings had setup overhead of 10ms (42%). ArrayAppendLazyMap had setup overhead of 24 μs (1%).

ArrayAppendOptionals and ArrayAppendArrayOfInt also had barely visible, small overhead of ~18μs, that was mostly hidden in measurement noise, but I’ve extracted the setup from all places that had 10 000 element array initializations, in preparation for more precise measurement in the future.
2018-10-23 21:12:36 +02:00
Pavol Vaskovic
8617745b39 [benchmark] Gardening: extract tags constant 2018-10-23 20:49:56 +02:00
Pavol Vaskovic
4bbb635fca [benchmark] Gardening: Fix copy/paste comments 2018-10-23 20:49:56 +02:00
Pavol Vaskovic
a24d0ff7a5 [benchmark] BenchmarkDoctor checks setup time
Add a check against unreasonably long setup times for benchmarks that do their initialization work in the `setUpFunction`. Given the typical benchmark measurements will last about 1 second, it’s reasonable to expect the setup to take at most 20% extra, on top of that: 200 ms.

The `DictionaryKeysContains*` benchmarks are an instance of this mistake. The setup of `DictionaryKeysContainsNative` takes 3 seconds on my machine, to prepare a dictionary for the run function, whose typical runtime is 90 μs. The setup of the Cocoa version takes 8 seconds!!! It is trivial to rewrite these with much smaller dictionaries that demonstrate the point of these benchmarks perfectly well, without the need to wait for ages to set up these benchmarks.
2018-10-15 09:06:38 +02:00
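The check described above can be sketched like this. The function name and return convention are hypothetical; the real check lives in `BenchmarkDoctor` inside the suite's Python driver scripts:

```python
MAX_SETUP_US = 200_000  # 200 ms: ~20% on top of a typical 1 s measurement

def check_setup_overhead(name, setup_us):
    """Return a warning string when a benchmark's setUpFunction takes
    unreasonably long, or None when the setup time is acceptable."""
    if setup_us > MAX_SETUP_US:
        return f"'{name}' setup takes {setup_us / 1000:.0f} ms (> 200 ms limit)"
    return None

# Numbers taken from the commit message above.
print(check_setup_overhead("DictionaryKeysContainsNative", 3_000_000))
print(check_setup_overhead("Phonebook", 1_266))  # -> None
```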
Pavol Vaskovic
638f4f8e5e [benchmark] Recommended runtime should be < 1ms
* Lowered the threshold for healthy benchmark runtime to be under 1000 μs.
* Offer suitable divisor that is power of 10, in addition to the one that’s power of 2.
* Expanded the motivation in the docstring.
2018-10-13 22:09:25 +02:00
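The divisor suggestion mentioned above can be sketched as follows. This is a guess at the logic, not the actual `BenchmarkDoctor` code; it finds the smallest power-of-2 and power-of-10 divisors that would bring a benchmark's runtime under the healthy 1000 μs threshold:

```python
def suggested_divisors(runtime_us, threshold_us=1000):
    """Smallest power-of-2 and power-of-10 workload divisors that bring
    the runtime under the healthy threshold."""
    def smallest_power(base):
        d = 1
        while runtime_us / d >= threshold_us:
            d *= base
        return d
    return smallest_power(2), smallest_power(10)

# A 48 ms benchmark could divide its workload by 64 or, more readably, by 100.
print(suggested_divisors(48_000))  # -> (64, 100)
```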
Pavol Vaskovic
d9a89ffea2 [benchmark] Use header in CSV log
Since the meaning of some columns was changed, but their overall number remained, let’s include the header in the CSV log to make it clear that we are now reporting MIN, Q1, MEDIAN, Q3, MAX, MAX_RSS, instead of the old MIN, MAX, MEAN, SD, MEDIAN, MAX_RSS format.
2018-10-12 10:03:33 +02:00
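A minimal sketch of the self-describing log, using only the column names stated in the commit message (the real log format also carries test identification fields, which are omitted here):

```python
import csv
import io

# New column meanings from the commit message; the header row makes the
# format change visible to any consumer of the CSV log.
HEADER = ["MIN", "Q1", "MEDIAN", "Q3", "MAX", "MAX_RSS"]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(HEADER)
writer.writerow([90, 92, 94, 97, 1240, 10190848])  # illustrative values
print(buf.getvalue().splitlines()[0])  # -> MIN,Q1,MEDIAN,Q3,MAX,MAX_RSS
```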
Andrew Trick
599e5860c5 Remove references to SWIFT3 from benchmark Cmake files. 2018-10-11 21:53:19 -07:00
Pavol Vaskovic
397c44747b [benchmark] Exclude outliers from sample
Use the box-plot inspired technique for filtering out outlier measurements. Values that are higher than the top inner fence (TIF = Q3 + IQR * 1.5) are excluded from the sample.
2018-10-11 19:48:20 +02:00
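The fence computation above can be sketched directly with the standard library (note that `statistics.quantiles` uses the exclusive method by default, which may differ slightly from the suite's own quartile estimation):

```python
import statistics

def exclude_outliers(samples):
    """Drop values above the top inner fence TIF = Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    tif = q3 + 1.5 * (q3 - q1)
    return [s for s in samples if s <= tif]

# One sample (450) inflated by system load gets filtered out.
samples = [90, 91, 92, 92, 93, 94, 95, 96, 97, 450]
print(exclude_outliers(samples))  # -> [90, 91, 92, 92, 93, 94, 95, 96, 97]
```

Only the top fence is used: benchmark noise is one-sided, since preemption can only make a run slower, never faster than the true cost.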
Pavol Vaskovic
0d318b6464 [benchmark] Discard oversampled quantile values
When num_samples is less than quantile + 1, some of the measurements are repeated in the report summary. Parsed samples should strive to be a true reflection of the measured distribution, so we'll correct this by discarding the repeated artifacts from quantile estimation.

This avoids introducing a bias from this oversampling into the empirical distribution obtained from merging independent samples.

See also:
https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
2018-10-11 18:56:27 +02:00
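One way to discard the repeated artifacts can be sketched as below. This is an illustration of the idea only; the mapping from quantile index to sample index is an assumption, and the actual logic in compare_perf_tests.py may differ:

```python
def subsample(quantile_values, num_samples):
    """Recover underlying samples when num_samples < quantile + 1.

    With q+1 reported quantile values but only n < q+1 raw measurements,
    some measurements repeat in the report. Keep one value per original
    sample index, assuming the j-th quantile maps to sample
    round(j * (n - 1) / q).
    """
    q = len(quantile_values) - 1
    seen, out = set(), []
    for j, value in enumerate(quantile_values):
        i = round(j * (num_samples - 1) / q)
        if i not in seen:
            seen.add(i)
            out.append(value)
    return out

# 3 real samples reported as 5 quantile values (q=4): repeats are dropped.
print(subsample([10, 10, 12, 14, 14], 3))  # -> [10, 12, 14]
```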
Pavol Vaskovic
a04edd1d47 [benchmark] Quantiles in Benchmark_Driver
Switching the measurement technique from gathering `i` independent samples characterized by their mean values, to a finer grained characterization of these measurements using quantiles.

The distribution of benchmark measurements is non-normal, with outliers that significantly inflate the mean and standard deviation due to the presence of the uncontrolled variable of system load. Therefore the MEAN and SD were incorrect statistics for properly characterizing the benchmark measurements.

Benchmark_Driver now gathers more individual measurements from Benchmark_O. It is executed with `--num-iters=1`, because we don't want to average the runtimes; we want raw data. This collects a variable number of measurements gathered in about 1 second. Using `--quantile=20`, we get up to 20 measured values that properly characterize the empirical distribution of the benchmark from each independent run. The measurements from `i` independent executions are combined to form the final empirical distribution, which is reported as a five-number summary (MIN, Q1, MEDIAN, Q3, MAX).
2018-10-11 18:56:27 +02:00
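The merge-and-summarize step above can be sketched with the standard library (again, `statistics.quantiles` defaults to the exclusive method, which may differ in detail from the driver's own estimator):

```python
import statistics

def five_number_summary(samples):
    """MIN, Q1, MEDIAN, Q3, MAX of the merged empirical distribution."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return min(samples), q1, median, q3, max(samples)

# Merge raw --num-iters=1 measurements from two independent runs (values
# are illustrative), then summarize the combined distribution.
run_a = [102, 99, 101, 98, 100]
run_b = [97, 103, 100, 99, 250]  # one sample inflated by system load
print(five_number_summary(sorted(run_a + run_b)))
# -> (97, 98.75, 100.0, 102.25, 250)
```

The outlier only moves MAX; the quartiles and median stay representative, which is the point of preferring this summary over MEAN and SD.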
Pavol Vaskovic
0438c45e2d [benchmark] B_D iterations => independent-samples
Renamed Benchmark_Driver’s `iterations` argument to `independent-samples` to clarify its true meaning and disambiguate it from the concept of `num-iters` used in Benchmark_O. The short form of the argument, `-i`, remains unchanged.
2018-10-11 18:56:27 +02:00
Pavol Vaskovic
67b489dcb1 [benchmark] Auto-determine number of samples
When measuring with a specified number of iterations (generally, `--num-iters=1` makes sense), automatically determine the number of samples to take, so that the overall measurement duration comes close to `sample-time`.

This is the same technique used to scale `num-iters` before, but for `num-samples`.
2018-10-11 18:56:27 +02:00
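The scaling technique can be sketched as a simple budget division. The function name and the cap on the sample count are assumptions for illustration, not the driver's actual constants:

```python
def auto_num_samples(one_sample_us, sample_time_s=1.0, max_samples=200):
    """Pick num-samples so the total measurement ~= sample_time.

    Same idea previously used to scale num-iters: divide the time budget
    by the cost of one sample (hypothetical cap of 200 samples).
    """
    budget_us = sample_time_s * 1_000_000
    return max(1, min(max_samples, int(budget_us // one_sample_us)))

print(auto_num_samples(5_000))   # 5 ms per sample -> 200 (capped)
print(auto_num_samples(50_000))  # 50 ms per sample -> 20
```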
Pavol Vaskovic
61a092a695 [benchmark] LogParser delta quantiles support
Support for reading delta-encoded quantiles format.
2018-10-11 18:56:27 +02:00
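Assuming "delta-encoded" means each quantile is stored as its difference from the previous value (the commit message does not spell out the format), decoding is a running sum:

```python
from itertools import accumulate

def decode_delta_quantiles(base, deltas):
    """Reconstruct absolute quantile values from delta encoding:
    each stored value is the increment over the previous quantile."""
    return list(accumulate([base] + list(deltas)))

# MIN=90 followed by per-quantile increments (illustrative values).
print(decode_delta_quantiles(90, [2, 1, 3, 150]))  # -> [90, 92, 93, 96, 246]
```

Delta encoding keeps the log compact, since quantiles are non-decreasing and the increments are typically small.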
Erik Eckstein
af71a1b8b6 benchmarks: fix NSStringConversion benchmark
Make sure that the result of the conversion is not optimized away and not moved out of the loop
2018-10-10 09:55:00 -07:00