The test number column in the space justified column format emmited by the Benchmark_Driver to stdout while logging to file is right aligned, so it must handle leading whitespace.
Replaced guts of the `run_benchmarks` function with implementation from `BenchmarDriver`. There was only single client which called it with `verbose=True`, so this parameter could be safely removed.
Function `instrument_test` is replaced by running the `Benchmark_0` with `--memory` option, which implements the MAX_RSS measurement while also excluding the overhead from the benchmarking infrastructure. The incorrect computation of standard deviation was simply dropped for measurements of more than one independent sample. Bogus aggregated `Totals` statistics were removed, now reporting only the total number of executed benchmarks.
Introduce algorithm for excluding of outliers after collecting all samples using the Interquartile Range rule.
The `exclude_outliers` method uses 1st and 3rd Quartile to compute Interquartile Range, then uses inner fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR to remove samples outside this fence.
Based on experiments with collecting hundreads and thousands of samples (`num_samples`) per test with low iteration count (`num_iters`) with ~1s runtime, this rule is very effective in providing much better quality of sample population, effectively removing short environmental fluctuations that were previously averaged into the overall result (by the adaptively determined `num_iters` to run for ~1s), enlarging the reported result with these measurement errors. This technique can be used for some benchmarks, to get more stable results faster than before.
This outlier filering is employed when parsing `--verbose` test results.
* Moved the functionality to compute median, standard deviation and related statistics from `PerformanceTestResult` into `PerformanceTestSamples`.
* Fixed wrong unit in comments
Measure more of environment during test
In addition to measuring maximum resident set size, also extract number of voluntary and involuntary context switches from the verbose mode.
LogParser doesn’t use `csv.reader` anymore.
Parsing is handled by a Finite State Machine. Each line is matched against a set of (mutually exclusive) regular expressions that represent known states. When a match is found, corresponding parsing action is taken.
Moved result formatting methods from `PerformanceTestResult` and `ResultComparison` to `ReportFormatter`, in order to free PTR to take more computational responsibilities in the future.
Coverage at 99% according to coverage.py
* `compare_perf_tests.py` now always outputs the same format to stdout as is written to `--output` file
* Added integration test for the main() function
* Added tests for console output (and suppressed it leaking during testing)
* Fixed file name in test’s file header
compare_perf_test.py is now covered with unit tests and public methods are documented in the implementation.
Minor refactoring to better conform to Python conventions:
* classes declared in new style
* proper private method prefix of single underscore
* replacing map with list comprehension where it was clearer
Unit test are executed as part of validation-test.
.gitignore was modified to ignore .coverage and htmlcov artifacts generated by the coverage.py package
* Refactor compare_perf_tests.py
* Fix SR-4601 Report Added and Removed Benchmarks in Performance Comparison
Improved HTML styling.
* Added back support for reading concatenated Benchmark_Driver output
PerformanceTestResults can be merged, computing new MIN, MAX, and running MEAN and SD.
* Handle output from Benchmark_O again
Treat MAX_RSS as optional column
Column names must not contain spaces for tools that auto-format the table.
The extra "(%)" was completely redundant since every value in the column
reads as a percentage.
Both regressions and improvements are sorted by the delta, which in case
of improvements produces the reversed order due to negative values of
delta.
This change makes the improvements ordered 'naturally':
most-improved-first.
Make sure all Python code in the repo specifies which exceptions to
catch when using `except:`.
Regressions can be catched using `flake8-blind-except` going forward.