Commit Graph

51 Commits

Author SHA1 Message Date
Mishal Shah
40024718ac Update doc and links to support new main branch 2020-09-22 23:53:29 -07:00
Erik Eckstein
e743280d6c tests: fix the benchmark test after renaming some benchmarks 2020-04-08 13:28:46 +02:00
Pavol Vaskovic
5571b83353 [benchmark] Driver: log measurement metadata
Added --meta option to log measurement metadata:

* PAGES – number of memory pages used
* ICS – number of involuntary context switches
* YIELD – number of voluntary yields

(Pages and ICS were previously available only in --verbose mode.)
2019-07-23 17:40:45 +02:00
Pavol Vaskovic
ec32140aed [benchmark] Run benchmarks using substring filters
Added support for running benchmarks using substring filters. Positional arguments prefixed with a single + or - sign are interpreted as benchmark name filters.

Executes all benchmarks whose names include any of the strings prefixed with a plus sign but none of the strings prefixed with a minus sign.
2019-07-07 11:59:45 +02:00
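The filter rule from ec32140aed can be sketched in Python as follows. This is only an illustration, not the driver's code: `select_benchmarks` is a hypothetical name, and keeping every benchmark when no `+` filter is given is an assumption.

```python
def select_benchmarks(names, args):
    """Hypothetical helper: apply +/- substring filters to benchmark names."""
    # Only a single leading sign marks a filter; "--option" style flags are ignored.
    includes = [a[1:] for a in args if a.startswith("+")]
    excludes = [a[1:] for a in args if a.startswith("-") and not a.startswith("--")]
    selected = []
    for name in names:
        # Assumption: with no "+" filters, every name is a candidate.
        included = not includes or any(s in name for s in includes)
        excluded = any(s in name for s in excludes)
        if included and not excluded:
            selected.append(name)
    return selected


names = ["DropFirstArray", "DropFirstArrayLazy", "PrefixArray", "SuffixArrayLazy"]
print(select_benchmarks(names, ["+Array", "-Lazy"]))  # ['DropFirstArray', 'PrefixArray']
```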
Pavol Vaskovic
ad24ca4ba6 [benchmark] Add min-sample argument to drivers
Support for gathering a minimal number of samples per benchmark, using the optional `--min-samples` argument, which overrides the automatically computed number of samples per `sample-time` if this is lower.
2019-07-07 10:13:26 +02:00
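The override in ad24ca4ba6 amounts to taking the larger of two numbers. A minimal sketch, with a hypothetical function name and parameters:

```python
def samples_to_take(computed_samples, min_samples=None):
    """Hypothetical helper: raise the auto-computed sample count to the requested minimum."""
    if min_samples is None:
        return computed_samples
    # The minimum only takes effect when the computed count is lower.
    return max(computed_samples, min_samples)


print(samples_to_take(3, min_samples=10))   # 10
print(samples_to_take(25, min_samples=10))  # 25
```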
Pavol Vaskovic
e13e90ba77 [benchmark] Adjust lit tests for Benchmark_O
Change the test to work after removal of `HashQuadratic`.
2019-02-17 19:04:51 +01:00
Pavol Vaskovic
112ea9ca8a [benchmark] Fix: random fail Benchmark_O.test.md
When the two measured samples from `Ackermann` happen to have the exact same value, the delta compression omits the number. Accept both forms.

https://bugs.swift.org/browse/SR-9544
2018-12-21 23:06:39 +01:00
Pavol Vaskovic
49ce4402c8 [benchmark] Better ERE expression 2018-12-20 16:59:00 +01:00
Pavol Vaskovic
8a316de2bd [benchmark] Fix: random fail Benchmark_O.test.md
When the two measured samples from `Ackermann` happen to have the exact same value, the delta compression produces a comma instead of number. Accept both forms.

https://bugs.swift.org/browse/SR-9544
2018-12-20 16:53:49 +01:00
Pavol Vaskovic
7f858d1260 [benchmark] Re-enable Benchmark_Driver.test-sh 2018-12-10 14:51:06 +01:00
Pavol Vaskovic
aa9d5b6c75 [benchmark] Fix lit test for Benchmark_Driver 2018-12-08 22:18:47 +01:00
Arnold Schwaighofer
1c229d70f4 Disable Benchmark_Driver.test-sh test
rdar://46565291
2018-12-07 14:36:53 -08:00
Pavol Vaskovic
2582576048 [benchmark] Ackermann Redux
Reintroduce Ackermann benchmark with reasonably sized workload. Since this one was tagged `.unstable`, there’s no need to go through `legacyFactor`.

Adjusted `lit` test for Benchmark_O now that Ackermann isn’t marked `.unstable` anymore.

Removed incorrect `asserts` requirement from the benchmark lit tests.
2018-12-07 15:09:27 +01:00
Pavol Vaskovic
e0fa6cf827 [benchmark] Update lit test for variable sampling 2018-10-16 16:54:36 +02:00
Pavol Vaskovic
ce965380ff [benchmark] Update lit test to match rounding
The previous commit changed rounding to nearest or even, which now repeats min as median for 2 sample case.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
9bd599a914 [benchmark] Doctor explicitly measures memory
Small fix following the last refactoring of MAX_RSS: the `--memory` option is required to measure memory in `--verbose` mode. Added integration test for `check` command of Benchmark_Driver that depended on it.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
8b3b1f695a [benchmark] Option: delta encoded quantiles format
Added `--delta` argument to print the quantiles in delta encoded format, which omits 0s.

This results in machine and human readable output that highlights modes and is easily digestible, giving you the feel for the underlying probability distribution of the samples in the reported results:

````
$ ./Benchmark_O --num-iters=1 --num-samples=20 --quantile=20 --delta 170 171 184 185 198 199 418 419 432 433 619 620
#,TEST,SAMPLES,MIN(μs),𝚫V1,𝚫V2,𝚫V3,𝚫V4,𝚫V5,𝚫V6,𝚫V7,𝚫V8,𝚫V9,𝚫VA,𝚫VB,𝚫VC,𝚫VD,𝚫VE,𝚫VF,𝚫VG,𝚫VH,𝚫VI,𝚫VJ,𝚫MAX
170,DropFirstArray,20,171,,,,,,,,,,,,,,,,,,,2,29
171,DropFirstArrayLazy,20,168,,,,,,,,,,,,,,,,,,,,8
184,DropLastArray,20,55,,,,,,,,,,,,,,,,,,,,26
185,DropLastArrayLazy,20,65,,,,,,,,,,,,,,,,,,,1,90
198,DropWhileArray,20,214,1,,,,,,,,,,,,,,,,,1,27,2
199,DropWhileArrayLazy,20,464,,,,1,,,,,,,,1,1,1,4,9,1,9,113,2903
418,PrefixArray,20,132,,,,,,,,,,,,,,,,,1,1,32,394
419,PrefixArrayLazy,20,168,,,,,,,,,,,,1,,2,9,1,15,8,88,3338
432,PrefixWhileArray,20,252,1,,,,1,,,,,,,,,,,1,,,,30
433,PrefixWhileArrayLazy,20,168,,,,,,,,,,,,,1,,6,6,14,43,28,10200
619,SuffixArray,20,68,,,,,,,,,,,,,1,,,,22,1,1,4
620,SuffixArrayLazy,20,65,,,,,,,,,,,,,,,,,,1,9,340
````
2018-09-14 23:40:43 +02:00
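The encoding in 8b3b1f695a can be illustrated with a short Python sketch. It assumes the values are already sorted (as quantiles are) and that a zero delta is written as an empty field; `delta_encode` is a hypothetical name, not a function from the benchmark suite.

```python
def delta_encode(values, delim=","):
    """Encode sorted integers as the first value followed by successive deltas.

    Zero deltas are written as empty fields, so repeated values stand out.
    """
    fields = [str(values[0])]
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        fields.append(str(delta) if delta != 0 else "")
    return delim.join(fields)


print(delta_encode([55, 55, 55, 56, 81]))  # 55,,,1,25
```

Runs of empty fields correspond to a mode of the distribution, which is why the format makes modes easy to spot.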
Pavol Vaskovic
022e1111a9 [benchmark] Report quantiles from samples
The default benchmark result reports statistics of a normal distribution — mean and standard deviation. Unfortunately the samples from our benchmarks are *not normally distributed*. To get a better picture of the underlying probability distribution, this adds support for reporting quantiles.

See https://en.wikipedia.org/wiki/Quantile

This gives a better subsample of the measurements in the summary, without the need to resort to full verbose mode, which might be unnecessarily slow.
2018-09-14 23:40:43 +02:00
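As a rough illustration of what 022e1111a9 reports, the sketch below picks quantiles from a list of samples. The commit message does not say which estimator is used; the nearest-rank rule here is an assumption, and `quantiles` is a hypothetical name.

```python
def quantiles(samples, q):
    """Return q + 1 values: the minimum, the q - 1 inner q-quantiles and the maximum.

    Assumption: nearest-rank estimation, so every reported value is an actual sample.
    """
    ordered = sorted(samples)
    n = len(ordered)
    result = []
    for k in range(q + 1):
        # Smallest 1-based rank r with r / n >= k / q (rank 1 for k = 0).
        rank = max(1, (k * n + q - 1) // q)
        result.append(ordered[rank - 1])
    return result


samples = [214, 215, 215, 216, 243, 245, 250, 300]
print(quantiles(samples, 4))  # [214, 215, 216, 245, 300]
```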
Pavol Vaskovic
331c0bf772 [benchmark] Refactor numIters computation
The spaghetti if-else code was untangled into nested function that computes `iterationsPerSampleTime` and a single constant `numIters` expression that takes care of the overflow capping as well as the choice between fixed and computed `numIters` value.

The `numIters` is now computed and logged only once per benchmark measurement instead of on every sample.

The sampling loop is now just a single line. Hurrah!

Modified test to verify that the `LogParser` maintains `num-iters` derived from the `Measuring with scale` message across samples.
2018-09-14 23:40:43 +02:00
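The shape of the refactoring in 331c0bf772 (a nested helper plus one expression that caps the result and chooses between the fixed and the computed value) might look like the Python sketch below. The actual code is Swift and is not shown in this log; the cap value, the microsecond units, the handling of a zero measurement and all names here are assumptions.

```python
MAX_ITERS = 2**31 - 1  # assumed overflow cap, not taken from the source


def num_iters(sample_time_us, time_per_iteration_us, fixed_num_iters=None):
    """Hypothetical sketch: pick the number of iterations averaged into one sample."""

    def iterations_per_sample_time():
        # Assumption: an iteration too fast to measure gets the maximum count.
        if time_per_iteration_us <= 0:
            return MAX_ITERS
        return sample_time_us // time_per_iteration_us

    # Single expression: fixed value if given, else computed; clamped to [1, MAX_ITERS].
    chosen = fixed_num_iters if fixed_num_iters is not None else iterations_per_sample_time()
    return max(1, min(chosen, MAX_ITERS))


print(num_iters(1_000_000, 250))                     # 4000
print(num_iters(1_000_000, 250, fixed_num_iters=1))  # 1
```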
Ben Langmuir
423e145b0c Revert "[benchmark] Report Quantiles from Benchmark_O and a TON of Gardening" 2018-09-14 13:24:01 -07:00
Pavol Vaskovic
84bf15836d [benchmark] Doctor explicitly measures memory
Small fix following the last refactoring of MAX_RSS: the `--memory` option is required to measure memory in `--verbose` mode. Added integration test for `check` command of Benchmark_Driver that depended on it.
2018-09-06 18:21:50 +02:00
Pavol Vaskovic
313dfda5a4 [benchmark] Option: delta encoded quantiles format
Added `--delta` argument to print the quantiles in delta encoded format, which omits 0s.

This results in machine and human readable output that highlights modes and is easily digestible, giving you the feel for the underlying probability distribution of the samples in the reported results:

````
$ ./Benchmark_O --num-iters=1 --num-samples=20 --quantile=20 --delta 170 171 184 185 198 199 418 419 432 433 619 620
#,TEST,SAMPLES,MIN(μs),𝚫V1,𝚫V2,𝚫V3,𝚫V4,𝚫V5,𝚫V6,𝚫V7,𝚫V8,𝚫V9,𝚫VA,𝚫VB,𝚫VC,𝚫VD,𝚫VE,𝚫VF,𝚫VG,𝚫VH,𝚫VI,𝚫VJ,𝚫MAX
170,DropFirstArray,20,171,,,,,,,,,,,,,,,,,,,2,29
171,DropFirstArrayLazy,20,168,,,,,,,,,,,,,,,,,,,,8
184,DropLastArray,20,55,,,,,,,,,,,,,,,,,,,,26
185,DropLastArrayLazy,20,65,,,,,,,,,,,,,,,,,,,1,90
198,DropWhileArray,20,214,1,,,,,,,,,,,,,,,,,1,27,2
199,DropWhileArrayLazy,20,464,,,,1,,,,,,,,1,1,1,4,9,1,9,113,2903
418,PrefixArray,20,132,,,,,,,,,,,,,,,,,1,1,32,394
419,PrefixArrayLazy,20,168,,,,,,,,,,,,1,,2,9,1,15,8,88,3338
432,PrefixWhileArray,20,252,1,,,,1,,,,,,,,,,,1,,,,30
433,PrefixWhileArrayLazy,20,168,,,,,,,,,,,,,1,,6,6,14,43,28,10200
619,SuffixArray,20,68,,,,,,,,,,,,,1,,,,22,1,1,4
620,SuffixArrayLazy,20,65,,,,,,,,,,,,,,,,,,1,9,340
````
2018-09-03 16:00:05 +02:00
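A consumer of the `--delta` output from 313dfda5a4 has to rebuild absolute values by accumulating the deltas and treating empty fields as zero. A minimal Python sketch, assuming it is given only the numeric columns (from `MIN` onward); `delta_decode` is a hypothetical name.

```python
def delta_decode(row, delim=","):
    """Rebuild absolute values from a delta encoded row; empty fields mean delta 0."""
    values = []
    total = 0
    for field in row.split(delim):
        total += int(field) if field else 0
        values.append(total)
    return values


print(delta_decode("55,,,1,25"))  # [55, 55, 55, 56, 81]
```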
Pavol Vaskovic
1f465b9bf7 [benchmark] Report quantiles from samples
The default benchmark result reports statistics of a normal distribution — mean and standard deviation. Unfortunately the samples from our benchmarks are *not normally distributed*. To get a better picture of the underlying probability distribution, this adds support for reporting quantiles.

See https://en.wikipedia.org/wiki/Quantile

This gives a better subsample of the measurements in the summary, without the need to resort to full verbose mode, which might be unnecessarily slow.
2018-08-31 23:16:34 +02:00
Pavol Vaskovic
be39c02001 [benchmark] Refactor numIters computation
The spaghetti if-else code was untangled into nested function that computes `iterationsPerSampleTime` and a single constant `numIters` expression that takes care of the overflow capping as well as the choice between fixed and computed `numIters` value.

The `numIters` is now computed and logged only once per benchmark measurement instead of on every sample.

The sampling loop is now just a single line. Hurrah!

Modified test to verify that the `LogParser` maintains `num-iters` derived from the `Measuring with scale` message across samples.
2018-08-31 17:17:48 +02:00
Pavol Vaskovic
3faf9259e1 [benchmark] Loosen test to allow ‘Yielding’ msg
The non-essential 'Yielding again after estimated … μs' message in verbose log could interfere with the test.
2018-08-29 19:51:41 +02:00
Pavol Vaskovic
ce39b12929 [benchmark] Strangler: BenchmarkDriver get_tests
See https://www.martinfowler.com/bliki/StranglerApplication.html for more info on the used pattern for refactoring legacy applications.

Introduced class `BenchmarkDriver` as a beginning of strangler application that will gradually replace old functions. Used it instead of `get_tests()` function in Benchmark_Driver.

The interaction with Benchmark_O is simulated through mocking. `SubprocessMock` class records the invocations of command line processes and responds with canned replies in the format of Benchmark_O output.

Removed 3 redundant lit tests that are now covered by the unit test `test_gets_list_of_all_benchmarks_when_benchmarks_args_exist`. This saves 3 seconds from test execution. Keeping only a single integration test that verifies that the plumbing is connected correctly.
2018-08-17 00:32:04 +02:00
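The mocking approach described in ce39b12929 can be sketched as follows. The real `SubprocessMock` and `BenchmarkDriver` are not shown in this log, so the method names, the `get_tests` helper and the use of a tab delimiter (chosen here because the tag list itself contains commas) are assumptions for illustration.

```python
class SubprocessMock:
    """Hypothetical stand-in for subprocess: records calls, returns canned replies."""

    def __init__(self):
        self.calls = []
        self.replies = {}

    def expect(self, args, output):
        # Register the canned output for one exact command line.
        self.replies[tuple(args)] = output

    def check_output(self, args):
        key = tuple(args)
        self.calls.append(key)
        return self.replies[key]


def get_tests(subprocess_like, binary="Benchmark_O"):
    """List benchmark names by parsing the `--list` output of the binary."""
    output = subprocess_like.check_output([binary, "--list", "--delim=\t"])
    rows = output.splitlines()[1:]  # skip the header row
    return [row.split("\t")[1] for row in rows]


mock = SubprocessMock()
mock.expect(
    ["Benchmark_O", "--list", "--delim=\t"],
    "#\tTest\t[Tags]\n2\tAngryPhonebook\t[String, api, validation]\n",
)
tests = get_tests(mock)
print(tests)  # ['AngryPhonebook']
```

Because the code under test only sees an object with a `check_output` method, the unit test never launches the real binary, which is where the time saving comes from.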
Pavol Vaskovic
f29fef6b67 [benchmark] Print delim in verbose config 2018-08-16 08:11:13 +02:00
Pavol Vaskovic
1a382ab775 [benchmark] Warn about incorrect --memory use 2018-07-25 08:02:28 +02:00
Pavol Vaskovic
a61a756b4d [benchmark] Fix: alphabetic sorting of tests 2018-07-25 07:45:07 +02:00
Pavol Vaskovic
4ed3dcfcc5 [benchmark][Gardening] --sample-time renaming
Sample time is a better name for what was previously called `iter-scale`.
2018-07-23 17:15:54 +02:00
Pavol Vaskovic
19613733a4 [benchmark] Log the MAX_RSS only w/ --memory flag
Printing of the MAX_RSS is now hidden behind the optional `--memory` flag.
2018-07-22 06:25:23 +02:00
Pavol Vaskovic
f89d41ad3b [benchmark] Print detailed argument help
The `--help` option now prints standard usage description with documentation for all arguments:

````
 $ Benchmark_O --help
usage: Benchmark_O [--argument=VALUE] [TEST [TEST ...]]

positional arguments:
 TEST           name or number of the benchmark to measure

optional arguments:
 --help         show this help message and exit
 --num-samples  number of samples to take per benchmark; default: 1
 --num-iters    number of iterations averaged in the sample;
                default: auto-scaled to measure for 1 second
 --iter-scale   number of seconds used for num-iters calculation
                default: 1
 --verbose      increase output verbosity
 --delim        value delimiter used for log output; default: ,
 --tags         run tests matching all the specified categories
 --skip-tags    don't run tests matching any of the specified
                categories; default: unstable,skip
 --sleep        number of seconds to sleep after benchmarking
 --list         don't run the tests, just log the list of test
                numbers, names and tags (respects specified filters)
````
2018-07-21 22:58:44 +02:00
Pavol Vaskovic
0c0ed3d35d [benchmark][Gardening] Moved parseArgs into parser
The `parseArgs` function is now a private method on `ArgumentParser`.

Removed `Arguments` struct and moved the `Argument` as a nested struct into the parser.

Adjusted error messages and the corresponding checks.
2018-07-21 15:53:09 +02:00
Pavol Vaskovic
f674dd5cf0 [benchmark][Gardening] Handle --help inside parser
Moved the printing of help message inside the `ArgumentParser`, which has all the necessary info.

Added test that checks the `--help` option.
2018-07-21 13:16:08 +02:00
Pavol Vaskovic
7d19a03dce [benchmark] Gracefully type-check attribute values
We no longer crash when the argument value parsing fails, but report an error.
2018-07-21 01:32:40 +02:00
Pavol Vaskovic
371f155258 [benchmark] Exit gracefully on argument errors
Refactored to use Swift’s idiomatic error handling.
In case of invalid argument errors, the message is printed to `stderr` and we exit gracefully with error code 1. We no longer crash the app in most cases.
2018-07-21 01:32:40 +02:00
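The behaviour described in 371f155258 and 7d19a03dce (report the problem on `stderr`, exit with code 1, do not crash) follows a common pattern, sketched here in Python rather than the Swift of the actual change. The `ArgumentError` type, the message wording and the tiny parser are all hypothetical.

```python
import sys


class ArgumentError(Exception):
    """Hypothetical error type for invalid command line arguments."""


def parse_int(name, value):
    # Type-check the value instead of letting a failed conversion crash the program.
    try:
        return int(value)
    except ValueError:
        raise ArgumentError(f"'{value}' is not a valid integer for '{name}'") from None


def main(argv):
    try:
        options = {}
        for arg in argv:
            name, _, value = arg.partition("=")
            if name == "--num-samples":
                options[name] = parse_int(name, value)
            else:
                raise ArgumentError(f"unknown argument '{name}'")
    except ArgumentError as error:
        # Errors go to stderr; the caller turns the return value into the exit code.
        print(f"error: {error}", file=sys.stderr)
        return 1
    return 0


exit_code = main(["--num-samples=abc"])
```

A script would typically end with `sys.exit(main(sys.argv[1:]))`, so the process exit status carries the error code.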
Pavol Vaskovic
377ee464d2 [benchmark] Test error handling parsing arguments
* Fix: flushing stdout before crashing to enable testing.
* Added tests that verify reporting of errors when the parsing of command line arguments fails.
2018-07-21 01:32:40 +02:00
Pavol Vaskovic
c198f442d4 [benchmark] Measure environment with rusage
Measure and report system environment indicators during benchmark execution:

* Memory usage with maximum resident set size (MAX_RSS) in bytes

Proxy indicators of system load level:

* Number of Involuntary Context Switches (ICS)
* Number of Voluntary Context Switches (VCS)

MAX_RSS delta is always reported in the last column of the log report.

The `--verbose` mode additionally reports full values measured before and after the benchmark execution as well as their delta for MAX_RSS, ICS and VCS.
2018-07-21 01:32:40 +02:00
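The measurements listed in c198f442d4 map onto fields of the POSIX `getrusage` result. The sketch below uses Python's standard `resource` module (Unix only) to take before and after readings around a workload; the `measure` helper and the dictionary keys are illustrative choices, not the benchmark suite's code. Note that the unit of `ru_maxrss` differs by platform (kilobytes on Linux, bytes on macOS).

```python
import resource


def measure(workload):
    """Run workload and return the deltas of selected rusage counters."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "MAX_RSS": after.ru_maxrss - before.ru_maxrss,  # peak resident set size growth
        "ICS": after.ru_nivcsw - before.ru_nivcsw,      # involuntary context switches
        "VCS": after.ru_nvcsw - before.ru_nvcsw,        # voluntary context switches
    }


deltas = measure(lambda: sum(range(100_000)))
print(deltas)
```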
Pavol Vaskovic
8d0d25a576 [benchmark] Verbose mode checks
Test the `--verbose` mode output and the `--num-samples` option.
2018-07-17 07:28:39 +02:00
Pavol Vaskovic
e3cbd187a0 [benchmark] Improved docs and tests
* Added check for running by test number.
* Documented “dry run” using `--list`.
* Moved real run test to the end.
* Added checks for logging benchmarks measurements (header and stats).
* Added check that specifying the same test multiple times runs it just once.
2018-07-17 07:28:39 +02:00
Pavol Vaskovic
43390e8d5b [benchmark] Combined --list tests, documentation 2018-07-17 07:25:46 +02:00
Pavol Vaskovic
a74759bb49 [benchmark][docs] Re-formatting to markdown
Reformatting the `lit` test file to markdown format:

* Removed the unnecessary `// ` prefix.
* Requires preamble in comment
* Test documentation intro
2018-07-16 18:28:11 +02:00
Pavol Vaskovic
04894da39b [benchmark][Gardening] Literate testing (markdown)
The testing of `Benchmark_O` and its public interface needs more documentation. The `lit` tests can be embedded in anything. Let’s combine these.
2018-07-16 17:58:23 +02:00
Pavol Vaskovic
7f894268b2 [benchmark] Restore running benchmarks by numbers
Reintroduced feature lost during `BenchmarkInfo` modernization: All registered benchmarks are ordered alphabetically and assigned an index. This number can be used as a shortcut to invoke the test instead of its full name. (Adding and removing tests from the suite will naturally reassign the indices, but they are stable for a given build.)

The `--list` parameter now prints the test *number*, *name* and *tags* separated by delimiter.

The `--list` output format is modified from:
````
Enabled Tests,Tags
AngryPhonebook,[String, api, validation]
...
````
to this:
````
\#,Test,[Tags]
2,AngryPhonebook,[String, api, validation]
…
````
(There isn’t a backslash before the #, git was eating the whole line without it.)
Note: Test number 1 is Ackermann, which is marked as “skip”, so it’s not listed with the default `skip-tags` value.

Fixes the issue where running tests via `Benchmark_Driver` always reported each test as number 1. Each test is run independently, therefore every invocation was “first”. Restoring test numbers resolves this issue back to original state: The number reported in the first column when executing the tests is its ordinal number in the Swift Benchmark Suite.
2018-07-11 23:17:02 +02:00
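The numbering scheme from 7f894268b2 (sort the registered names alphabetically, number them from 1, accept either a name or a number on the command line) can be sketched in Python. The function names are hypothetical and the three benchmark names form a small illustrative list only.

```python
def number_benchmarks(names):
    """Assign 1-based ordinal numbers to benchmark names in alphabetical order."""
    return {index: name for index, name in enumerate(sorted(names), start=1)}


def resolve(spec, numbered):
    """Map a command line argument (a number or a full name) to a benchmark name."""
    if spec.isdigit():
        return numbered.get(int(spec))
    return spec if spec in numbered.values() else None


numbered = number_benchmarks(["AngryPhonebook", "ArrayAppend", "Ackermann"])
print(numbered[1], numbered[2])  # Ackermann AngryPhonebook
print(resolve("2", numbered))    # AngryPhonebook
```

Since the numbers come from the sorted list of everything registered, adding or removing a benchmark shifts later indices, as the commit message points out.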
Pavol Vaskovic
d82c996669 [benchmark] Fixed Benchmark_Driver running tests
Fixed failure in `get_tests` which depended on the removed `Benchmark_O --run-all` option for listing all tests (not just the pre-commit set).

Fix: Restored the ability to run tests by ordinal number from `Benchmark_Driver` after the support for this was removed from `Benchmark_O`.

Added tests that verify output format of `Benchmark_O --list` and the support for `--skip-tags= ` option which effectively replaced the old `--run-all` option. Other tools, like `Benchmark_Driver` depend on it.
Added integration tests for the dependency between `Benchmark_Driver` and `Benchmark_O`.

Running pre-commit test set isn’t tested explicitly here. It would take too long and it is run fairly frequently by CI bots, so if that breaks, we’ll know soon enough.
2018-07-11 23:17:02 +02:00
Pavol Vaskovic
2d004970fd [benchmark] Fix: Running skip-tag-marked benchmark
Also updated benchmark documentation with more detailed description of tag handling.
2018-07-11 23:17:02 +02:00
Pavol Vaskovic
cf1d78be6b [benchmark] Test running skip-tag-marked benchmark
Added test: It should be possible to run benchmark by name, even if its tags match the default skip-tags.

Added verification tests for benchmark filtering with `tags` and `skip-tags` parameters.
2018-07-11 23:17:02 +02:00
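The selection rules tested in cf1d78be6b can be summarized in a short Python sketch: a benchmark requested by name always runs, otherwise it must carry all of the `tags` and none of the `skip-tags`. The default skip set `unstable,skip` is taken from the `--help` text elsewhere in this log; the function name and signature are hypothetical.

```python
DEFAULT_SKIP_TAGS = frozenset({"unstable", "skip"})


def should_run(name, tags, requested_names=(), required_tags=frozenset(),
               skip_tags=DEFAULT_SKIP_TAGS):
    """Decide whether a benchmark runs under the name, tags and skip-tags rules."""
    if requested_names:
        # Naming a benchmark explicitly overrides skip-tags.
        return name in requested_names
    tags = set(tags)
    has_all_required = set(required_tags) <= tags
    has_any_skipped = bool(set(skip_tags) & tags)
    return has_all_required and not has_any_skipped


print(should_run("Ackermann", {"skip"}))                                 # False
print(should_run("Ackermann", {"skip"}, requested_names=["Ackermann"]))  # True
```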
Pavol Vaskovic
d40ddabcd5 [benchmark] Fix: running with --num-iters=1
Fixed bug where the `elapsed_time` was always 0 when `--num-iters=1` was specified.
2018-07-11 23:17:02 +02:00
Pavol Vaskovic
4c4c6a2409 [benchmark] Fix: Better tags in benchmark list
When listing benchmarks with `--list` parameter, present the tags in format that is actually accepted by the `--tags` and `--skip-tags` parameters.

Changes the `--list` output from
````
Enabled Tests,Tags
AngryPhonebook,[TestsUtils.BenchmarkCategory.validation, TestsUtils.BenchmarkCategory.api, TestsUtils.BenchmarkCategory.String]
...
````
into
````
Enabled Tests,Tags
AngryPhonebook,[String, api, validation]
…
````
2018-07-11 23:17:02 +02:00
Pavol Vaskovic
7438ffd0bb [benchmark] Tests for Benchmark_O
Added `lit` substitution for running Benchmark_O binary.
Introduced `lit` tests for `Benchmark_O` to demonstrate bugs:
* listing benchmarks’ tags
* when running with `--num-iters=1`
2018-07-11 23:16:57 +02:00