Reintroduced feature lost during `BenchmarkInfo` modernization: All registered benchmarks are ordered alphabetically and assigned an index. This number can be used as a shortcut to invoke the test instead of its full name. (Adding and removing tests from the suite will naturally reassign the indices, but they are stable for a given build.)
The `--list` parameter now prints the test *number*, *name* and *tags* separated by delimiter.
The `--list` output format is modified from:
````
Enabled Tests,Tags
AngryPhonebook,[String, api, validation]
...
````
to this:
````
\#,Test,[Tags]
2,AngryPhonebook,[String, api, validation]
…
````
(There isn’t a backslash before the #, git was eating the whole line without it.)
Note: Test number 1 is Ackermann, which is marked as “skip”, so it’s not listed with the default `skip-tags` value.
Fixes the issue where running tests via `Benchmark_Driver` always reported each test as number 1. Each test is run independently, therefore every invocation was “first”. Restoring test numbers resolves this issue back to original state: The number reported in the first column when executing the tests is its ordinal number in the Swift Benchmark Suite.
Fixed failure in `get_tests` which depended on the removed `Benchmark_O --run-all` option for listing all test (not just the pre-commit set).
Fix: Restored the ability to run tests by ordinal number from `Benchmark_Driver` after the support for this was removed from `Benchmark_O`.
Added tests that verify output format of `Benchmark_O --list` and the support for `--skip-tags= ` option which effectively replaced the old `--run-all` option. Other tools, like `Benchmark_Driver` depend on it.
Added integration tests for the dependency between `Benchmark_Driver` and `Benchmark_O`.
Running pre-commit test set isn’t tested explicitly here. It would take too long and it is run fairly frequently by CI bots, so if that breaks, we’ll know soon enough.
To use this, one needs to first build an installable root for swift (i.e. like
the smoke testbot does). Then use the tool ./benchmark/scripts/build_linux.py
with the appropriate locations of the build-directory, installable snapshot,
and it will build the benchmarks. (There are more arguments, just use --help).
rdar://40541972
Disable the random hash seed while benchmarking. By its nature, it makes the number of hash collisions fluctuate between runs, adding unnecessary noise to benchmark results.
I expect we'll be able to re-enable random seeding here once we have made hash collisions cheaper -- they are currently always resolved by calling the Key's Equatable implementation, which can be expensive.
This benchmark script is similar to the guard malloc/runtime runner, but it only
runs the tests. The intention is that one can use this to quickly in a
multithreaded way verify that all benchmarks run successfully. In contrast, the
normal driver will run only single threaded since it is meant to test
performance, so is not able to take advantage of all cores on a system.
I wrote this quickly to verify some benchmark tests still worked. No point in
not sharing with everyone else.
On the bots, we have a timeout without output of 60 minutes for the entire test.
This should ensure that we are able to kill mis-behaving tests and give a good
error instead of just getting a jenkins timeout error.
For those confused, this is for the guard malloc/leaks test.
rdar://36874229
Support specifying a baseline branch to compare the current results
against. Previously, the master branch was hardcoded.
Fixes: rdar://problem/32751587
Add support for running benchmarks by reffering to them by their ordinal number in `Benchmark_Driver`, as is supported by `Benchmark_O`(`Onone`, `Ounchecked`).
Updated documentation to reflect this.
SR-4780 Can not run performance tests that are not in precommit suite
Modified driver to honor command line arguments when listing enabled tests. Fixed interaction between filters (positional arguments) and --run-all option.
Benchmark_Driver lists available benchmarks with --run-all option when benchmarks or filters are specified.
Coverage at 99% according to coverage.py
* `compare_perf_tests.py` now always outputs the same format to stdout as is written to `--output` file
* Added integration test for the main() function
* Added tests for console output (and suppressed it leaking during testing)
* Fixed file name in test’s file header
compare_perf_test.py is now covered with unit tests and public methods are documented in the implementation.
Minor refactoring to better conform to Python conventions:
* classes declared in new style
* proper private method prefix of single underscore
* replacing map with list comprehension where it was clearer
Unit test are executed as part of validation-test.
.gitignore was modified to ignore .coverage and htmlcov artifacts generated by the coverage.py package
* Refactor compare_perf_tests.py
* Fix SR-4601 Report Added and Removed Benchmarks in Performance Comparison
Improved HTML styling.
* Added back support for reading concatenated Benchmark_Driver output
PerformanceTestResults can be merged, computing new MIN, MAX, and running MEAN and SD.
* Handle output from Benchmark_O again
Treat MAX_RSS as optional column
Column names must not contain spaces for tools that auto-format the table.
The extra "(%)" was completely redundant since every value in the column
reads as a percentage.
Both regressions and improvements are sorted by the delta, which in case
of improvements produces the reversed order due to negative values of
delta.
This change makes the improvements ordered 'naturally':
most-improved-first.