Added --meta option to log measurement metadata:
* PAGES – number of memory pages used
* ICS – number of involuntary context switches
* YIELD – number of voluntary yields
(Pages and ICS were previously available only in --verbose mode.)
Added support for running benchmarks using substring filters. Positional arguments prefixed with a single + or - sign are interpreted as benchmark name filters.
Executes all benchmarks whose names include any of the strings prefixed with a plus sign but none of the strings prefixed with a minus sign.
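The filter rule above can be sketched in Python as follows. This is an illustration only, not the code in Benchmark_O: the function name `select_benchmarks` is hypothetical, and treating an empty set of `+` filters as "everything is a candidate" is an assumption the text does not confirm.

```python
def select_benchmarks(names, args):
    """Pick benchmark names using +substring / -substring filters (hypothetical helper)."""
    # A single leading sign marks a filter; "--option" style arguments are ignored here.
    includes = [a[1:] for a in args if a.startswith("+")]
    excludes = [a[1:] for a in args
                if a.startswith("-") and not a.startswith("--")]
    return [
        name for name in names
        # Assumption: with no + filters, every name is a candidate.
        if (not includes or any(s in name for s in includes))
        and not any(s in name for s in excludes)
    ]
```

For example, `select_benchmarks(names, ["+Drop", "-Lazy"])` keeps names containing `Drop` and drops any that also contain `Lazy`.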
Added support for gathering a minimum number of samples per benchmark using the optional `--min-samples` argument. It overrides the number of samples automatically computed from `sample-time` when the computed number is lower.
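As a rough sketch of the override rule (the function name and the way the computed count is passed in are assumptions, not the actual implementation):

```python
def effective_num_samples(computed_samples, min_samples=None):
    """Return the sample count after applying an optional lower bound."""
    if min_samples is not None and computed_samples < min_samples:
        return min_samples  # the floor wins only when the computed count is lower
    return computed_samples
```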
When the two measured samples from `Ackermann` happen to have the exact same value, the delta compression omits the number. Accept both forms.
https://bugs.swift.org/browse/SR-9544
When the two measured samples from `Ackermann` happen to have the exact same value, the delta compression produces a comma instead of a number. Accept both forms.
https://bugs.swift.org/browse/SR-9544
Reintroduce Ackermann benchmark with a reasonably sized workload. Since this one was tagged `.unstable`, there’s no need to go through `legacyFactor`.
Adjusted `lit` test for Benchmark_O now that Ackermann isn’t marked `.unstable` anymore.
Removed incorrect `asserts` requirement from the benchmark lit tests.
Small fix following the last refactoring of MAX_RSS: the `--memory` option is required to measure memory in `--verbose` mode. Added an integration test for the `check` command of Benchmark_Driver that depended on it.
Added the `--delta` argument to print the quantiles in a delta-encoded format that omits 0s.
This results in machine and human readable output that highlights modes and is easily digestible, giving you the feel for the underlying probability distribution of the samples in the reported results:
````
$ ./Benchmark_O --num-iters=1 --num-samples=20 --quantile=20 --delta 170 171 184 185 198 199 418 419 432 433 619 620
#,TEST,SAMPLES,MIN(μs),𝚫V1,𝚫V2,𝚫V3,𝚫V4,𝚫V5,𝚫V6,𝚫V7,𝚫V8,𝚫V9,𝚫VA,𝚫VB,𝚫VC,𝚫VD,𝚫VE,𝚫VF,𝚫VG,𝚫VH,𝚫VI,𝚫VJ,𝚫MAX
170,DropFirstArray,20,171,,,,,,,,,,,,,,,,,,,2,29
171,DropFirstArrayLazy,20,168,,,,,,,,,,,,,,,,,,,,8
184,DropLastArray,20,55,,,,,,,,,,,,,,,,,,,,26
185,DropLastArrayLazy,20,65,,,,,,,,,,,,,,,,,,,1,90
198,DropWhileArray,20,214,1,,,,,,,,,,,,,,,,,1,27,2
199,DropWhileArrayLazy,20,464,,,,1,,,,,,,,1,1,1,4,9,1,9,113,2903
418,PrefixArray,20,132,,,,,,,,,,,,,,,,,1,1,32,394
419,PrefixArrayLazy,20,168,,,,,,,,,,,,1,,2,9,1,15,8,88,3338
432,PrefixWhileArray,20,252,1,,,,1,,,,,,,,,,,1,,,,30
433,PrefixWhileArrayLazy,20,168,,,,,,,,,,,,,1,,6,6,14,43,28,10200
619,SuffixArray,20,68,,,,,,,,,,,,,1,,,,22,1,1,4
620,SuffixArrayLazy,20,65,,,,,,,,,,,,,,,,,,1,9,340
````
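To make the encoding concrete, here is a minimal Python sketch of delta encoding with omitted zeros, plus the matching decoder. The function names are hypothetical and the input is assumed to be a non-empty, already sorted list of integers; it illustrates the format shown above rather than reproducing the Benchmark_O code.

```python
def delta_encode(values, delim=","):
    """First value verbatim, then differences to the previous value; zeros are left empty."""
    fields = [str(values[0])]
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        fields.append(str(delta) if delta != 0 else "")
    return delim.join(fields)


def delta_decode(text, delim=","):
    """Rebuild the original values by accumulating deltas; an empty field counts as 0."""
    total = 0
    values = []
    for field in text.split(delim):
        total += int(field) if field else 0
        values.append(total)
    return values
```

With this sketch, `delta_encode([171, 171, 171, 173, 202])` gives `171,,,2,29`, and two identical samples such as `[5, 5]` encode to `5,`, which is the trailing-comma form a log parser has to accept.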
The default benchmark result reports statistics of a normal distribution — mean and standard deviation. Unfortunately the samples from our benchmarks are *not normally distributed*. To get a better picture of the underlying probability distribution, this adds support for reporting quantiles.
See https://en.wikipedia.org/wiki/Quantile
This gives a better subsample of the measurements in the summary, without the need to resort to the full verbose mode, which might be unnecessarily slow.
The spaghetti if-else code was untangled into a nested function that computes `iterationsPerSampleTime` and a single constant `numIters` expression that takes care of the overflow capping as well as the choice between fixed and computed `numIters` value.
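For illustration, quantiles of a set of samples can be picked like this. The sketch assumes the nearest-rank method, which may differ from the estimator Benchmark_O actually uses; the function name is hypothetical.

```python
def quantiles(samples, q):
    """Return q + 1 values (minimum, q - 1 inner quantiles, maximum) by nearest rank."""
    ordered = sorted(samples)
    n = len(ordered)
    result = []
    for k in range(q + 1):
        # Integer ceiling of n * k / q, converted to a 0-based index and clamped at 0.
        index = max((n * k + q - 1) // q - 1, 0)
        result.append(ordered[index])
    return result
```

Calling `quantiles(samples, 4)` yields the minimum, the three quartiles and the maximum; a larger `q` gives a finer picture of the distribution.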
The `numIters` is now computed and logged only once per benchmark measurement instead of on every sample.
The sampling loop is now just a single line. Hurrah!
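The shape of that refactoring can be sketched in Python. This is not the Swift code: `choose_num_iters`, its parameters and the 64-bit `INT_MAX` cap are assumptions made for the example.

```python
INT_MAX = 2**63 - 1  # assumption: cap matching a 64-bit signed integer


def choose_num_iters(fixed_num_iters, sample_time, time_one_iteration):
    """Pick the iteration count once per measurement (hypothetical helper)."""

    def iterations_per_sample_time():
        elapsed = time_one_iteration()  # seconds taken by a single iteration
        if elapsed <= 0:
            return INT_MAX  # too fast to time; fall back to the cap
        return sample_time / elapsed

    # One expression chooses between the fixed and the computed value...
    computed = (fixed_num_iters if fixed_num_iters is not None
                else iterations_per_sample_time())
    # ...and caps it to avoid overflow, while running at least one iteration.
    return max(1, int(min(computed, INT_MAX)))
```

The nested function is only invoked when no fixed value is supplied, so the timing run is skipped for fixed iteration counts.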
Modified test to verify that the `LogParser` maintains `num-iters` derived from the `Measuring with scale` message across samples.
See https://www.martinfowler.com/bliki/StranglerApplication.html for more info on the pattern used for refactoring legacy applications.
Introduced class `BenchmarkDriver` as the beginning of a strangler application that will gradually replace old functions. Used it instead of the `get_tests()` function in Benchmark_Driver.
The interaction with Benchmark_O is simulated through mocking. `SubprocessMock` class records the invocations of command line processes and responds with canned replies in the format of Benchmark_O output.
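A minimal sketch of this style of mock is shown below. It is not the actual `SubprocessMock`: the class `RecordingSubprocess`, its `expect` method and the helper `list_benchmarks` are hypothetical names used only to illustrate recording invocations and returning canned replies.

```python
class RecordingSubprocess:
    """Stand-in for subprocess: records calls and returns canned output."""

    def __init__(self):
        self.calls = []       # every command line that was requested
        self.responses = {}   # command line -> canned reply

    def expect(self, args, response):
        self.responses[tuple(args)] = response

    def check_output(self, args):
        self.calls.append(list(args))
        return self.responses[tuple(args)]


def list_benchmarks(subprocess_like, binary="Benchmark_O"):
    """Return benchmark names from `--list` output (second delimited column)."""
    output = subprocess_like.check_output([binary, "--list"])
    lines = output.splitlines()[1:]  # skip the header row
    return [line.split(",")[1] for line in lines if line]
```

A unit test can then preload the reply with `expect(["Benchmark_O", "--list"], ...)`, call `list_benchmarks(mock)`, and assert on both the parsed names and `mock.calls`, without ever launching the real binary.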
Removed 3 redundant lit tests that are now covered by the unit test `test_gets_list_of_all_benchmarks_when_benchmarks_args_exist`. This saves 3 seconds from test execution. Kept only a single integration test that verifies that the plumbing is connected correctly.
The `--help` option now prints a standard usage description with documentation for all arguments:
````
$ Benchmark_O --help
usage: Benchmark_O [--argument=VALUE] [TEST [TEST ...]]

positional arguments:
  TEST                  name or number of the benchmark to measure

optional arguments:
  --help                show this help message and exit
  --num-samples         number of samples to take per benchmark; default: 1
  --num-iters           number of iterations averaged in the sample;
                        default: auto-scaled to measure for 1 second
  --iter-scale          number of seconds used for num-iters calculation
                        default: 1
  --verbose             increase output verbosity
  --delim               value delimiter used for log output; default: ,
  --tags                run tests matching all the specified categories
  --skip-tags           don't run tests matching any of the specified
                        categories; default: unstable,skip
  --sleep               number of seconds to sleep after benchmarking
  --list                don't run the tests, just log the list of test
                        numbers, names and tags (respects specified filters)
````
The `parseArgs` function is now a private method on `ArgumentParser`.
Removed `Arguments` struct and moved the `Argument` as a nested struct into the parser.
Adjusted error messages and the corresponding checks.
Refactored to use Swift’s idiomatic error handling.
In case of invalid argument errors, the message is printed to `stderr` and we exit gracefully with error code 1. We no longer crash the app in most cases.
* Fix: flushing stdout before crashing to enable testing.
* Added tests that verify reporting of errors when the parsing of command line arguments fails.
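The error-handling flow described above (raise on an invalid argument, print the message to `stderr`, exit with code 1) might look like this in Python. The Swift implementation differs; `ArgumentError`, `parse` and `run` are hypothetical names, and the message wording is invented for the example.

```python
import sys


class ArgumentError(Exception):
    """Raised when a command line argument cannot be parsed."""


def parse(argv):
    """Parse `--num-samples=VALUE`; only this one option is handled in the sketch."""
    options = {"num_samples": 1}
    for arg in argv:
        if arg.startswith("--num-samples="):
            value = arg.split("=", 1)[1]
            try:
                options["num_samples"] = int(value)
            except ValueError:
                # Hypothetical message text, not the real Benchmark_O wording.
                raise ArgumentError(f"error: invalid value for --num-samples: '{value}'")
    return options


def run(argv, err=sys.stderr):
    """Return the process exit code instead of crashing on bad input."""
    try:
        parse(argv)
    except ArgumentError as error:
        print(error, file=err)
        return 1
    return 0
```

Returning the exit code from `run` (and passing it to `sys.exit` at the top level) keeps the failure path easy to exercise from tests.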
Measure and report system environment indicators during benchmark execution:
* Memory usage with maximum resident set size (MAX_RSS) in bytes
Proxy indicators of system load level:
* Number of Involuntary Context Switches (ICS)
* Number of Voluntary Context Switches (VCS)
MAX_RSS delta is always reported in the last column of the log report.
The `--verbose` mode additionally reports full values measured before and after the benchmark execution as well as their delta for MAX_RSS, ICS and VCS.
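The same three indicators can be read on Unix-like systems through `getrusage`. The Python sketch below takes a snapshot before and after a workload and reports the before value, after value and delta for each; the helper names are hypothetical. Note that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS, so the unit of the MAX_RSS value here depends on the platform.

```python
import resource


def snapshot():
    """Read MAX_RSS and context switch counters for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "MAX_RSS": usage.ru_maxrss,  # platform-dependent unit (KB on Linux, bytes on macOS)
        "ICS": usage.ru_nivcsw,      # involuntary context switches
        "VCS": usage.ru_nvcsw,       # voluntary context switches
    }


def measure(workload):
    """Run the workload and return (before, after, delta) per indicator."""
    before = snapshot()
    workload()
    after = snapshot()
    return {key: (before[key], after[key], after[key] - before[key])
            for key in before}
```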
* Added check for running by test number.
* Documented “dry run” using `--list`.
* Moved real run test to the end.
* Added checks for logging benchmarks measurements (header and stats).
* Added check that specifying the same test multiple times runs it just once.
Reintroduced feature lost during `BenchmarkInfo` modernization: All registered benchmarks are ordered alphabetically and assigned an index. This number can be used as a shortcut to invoke the test instead of its full name. (Adding and removing tests from the suite will naturally reassign the indices, but they are stable for a given build.)
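A small sketch of the numbering scheme: names are sorted, numbered from 1, and a test can then be requested either by number or by name. The helper names are hypothetical, Python's default string ordering is assumed to stand in for the suite's alphabetical order, and duplicate requests are collapsed so each test runs once.

```python
def number_benchmarks(names):
    """Map ordinal numbers (as strings, starting at 1) to alphabetically sorted names."""
    return {str(i): name for i, name in enumerate(sorted(names), start=1)}


def resolve(specifiers, names):
    """Turn a mix of numbers and names into a de-duplicated list of test names."""
    by_number = number_benchmarks(names)
    selected = []
    for spec in specifiers:
        name = by_number.get(spec, spec)  # a number is a shortcut for the name
        if name in names and name not in selected:
            selected.append(name)
    return selected
```

With only `Ackermann`, `AngryPhonebook` and `DropFirstArray` registered, `resolve(["2", "AngryPhonebook"], names)` selects `AngryPhonebook` once.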
The `--list` parameter now prints the test *number*, *name* and *tags* separated by the delimiter.
The `--list` output format is modified from:
````
Enabled Tests,Tags
AngryPhonebook,[String, api, validation]
...
````
to this:
````
\#,Test,[Tags]
2,AngryPhonebook,[String, api, validation]
…
````
(There isn’t a backslash before the #, git was eating the whole line without it.)
Note: Test number 1 is Ackermann, which is marked as “skip”, so it’s not listed with the default `skip-tags` value.
Fixes the issue where running tests via `Benchmark_Driver` always reported each test as number 1. Each test is run independently, therefore every invocation was “first”. Restoring test numbers returns this to the original behavior: the number reported in the first column when executing a test is its ordinal number in the Swift Benchmark Suite.
Fixed failure in `get_tests` which depended on the removed `Benchmark_O --run-all` option for listing all tests (not just the pre-commit set).
Fix: Restored the ability to run tests by ordinal number from `Benchmark_Driver` after the support for this was removed from `Benchmark_O`.
Added tests that verify the output format of `Benchmark_O --list` and the support for the `--skip-tags= ` option, which effectively replaced the old `--run-all` option. Other tools, like `Benchmark_Driver`, depend on it.
Added integration tests for the dependency between `Benchmark_Driver` and `Benchmark_O`.
Running pre-commit test set isn’t tested explicitly here. It would take too long and it is run fairly frequently by CI bots, so if that breaks, we’ll know soon enough.
Added test: It should be possible to run benchmark by name, even if its tags match the default skip-tags.
Added verification tests for benchmark filtering with `tags` and `skip-tags` parameters.
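The filtering rules these tests cover can be summarized in a short Python sketch: a benchmark must carry all of the requested `tags` and none of the `skip-tags`, while a benchmark requested by name runs regardless of the skip-tags. The function name and data layout are hypothetical, and ignoring `tags` when names are given is an assumption.

```python
DEFAULT_SKIP_TAGS = frozenset({"unstable", "skip"})


def select(benchmarks, names=(), tags=frozenset(), skip_tags=DEFAULT_SKIP_TAGS):
    """benchmarks maps a name to its set of tags; returns the names to run."""
    if names:
        # Explicitly named benchmarks run even if their tags match skip-tags.
        return [name for name in names if name in benchmarks]
    return sorted(
        name for name, bench_tags in benchmarks.items()
        if set(tags) <= bench_tags            # has all of the requested tags
        and not (set(skip_tags) & bench_tags)  # has none of the skipped tags
    )
```

Passing an empty `skip_tags` set selects every benchmark, which mirrors how an empty `--skip-tags=` value stands in for the old `--run-all`.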
When listing benchmarks with the `--list` parameter, present the tags in a format that is actually accepted by the `--tags` and `--skip-tags` parameters.
Changes the `--list` output from
````
Enabled Tests,Tags
AngryPhonebook,[TestsUtils.BenchmarkCategory.validation, TestsUtils.BenchmarkCategory.api, TestsUtils.BenchmarkCategory.String]
...
````
into
````
Enabled Tests,Tags
AngryPhonebook,[String, api, validation]
…
````
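The transformation between the two outputs amounts to dropping the qualifying prefix from each tag. A hypothetical Python sketch follows; sorting the short names is an assumption inferred from the order in the example output, not something the text states.

```python
def format_tags(tags):
    """Render fully qualified tag names as the short form accepted on the command line."""
    short = sorted(tag.rsplit(".", 1)[-1] for tag in tags)  # keep text after the last dot
    return "[" + ", ".join(short) + "]"
```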
Added `lit` substitution for running Benchmark_O binary.
Introduced `lit` tests for `Benchmark_O` to demonstrate bugs:
* listing benchmarks’ tags
* when running with `--num-iters=1`