Commit Graph

20 Commits

Author SHA1 Message Date
Pavol Vaskovic
a56c55c8e4 [benchmark] Round quantile idx to nearest or even
Explicitly use round-half-to-even rounding algorithm to match the behavior of numpy's quantile(interpolation='nearest') and quantile estimate type R-3, SAS-2. See:
https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample
2018-09-10 10:45:00 +02:00
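The round-half-to-even index selection described in a56c55c8e4 might look roughly like the sketch below. The zero-based position `q * (n - 1)` and the helper names `round_half_to_even` and `nearest_quantile` are assumptions for illustration; the commit's actual code is not shown in this log.

```python
import math


def round_half_to_even(x):
    """Round x to the nearest integer; exact .5 ties go to the even neighbor."""
    floor = int(math.floor(x))
    diff = x - floor
    if diff > 0.5:
        return floor + 1
    if diff < 0.5:
        return floor
    # Exact tie: pick whichever neighbor is even.
    return floor if floor % 2 == 0 else floor + 1


def nearest_quantile(sorted_samples, q):
    """Return the sample nearest to quantile q (0 <= q <= 1).

    Assumes the position of the quantile is q * (n - 1), zero-based.
    """
    if not sorted_samples:
        raise ValueError("no samples")
    index = round_half_to_even(q * (len(sorted_samples) - 1))
    return sorted_samples[index]
```

Writing the tie-breaking out by hand keeps the behavior the same on Python 2.7 and Python 3, whose built-in `round` functions resolve ties differently.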
Pavol Vaskovic
be39c02001 [benchmark] Refactor numIters computation
The spaghetti if-else code was untangled into a nested function that computes `iterationsPerSampleTime` and a single constant `numIters` expression that takes care of the overflow capping as well as the choice between a fixed and a computed `numIters` value.

The `numIters` is now computed and logged only once per benchmark measurement instead of on every sample.

The sampling loop is now just a single line. Hurrah!

Modified test to verify that the `LogParser` maintains `num-iters` derived from the `Measuring with scale` message across samples.
2018-08-31 17:17:48 +02:00
Pavol Vaskovic
0db20feda2 [benchmark] Fix index computation for quantiles
It turns out that both the old code in `DriverUtils` that computed the median and the newer quartile code in `PerformanceTestSamples` had an off-by-one error.

It truly is the 3rd of the 2 hard things in computer science!
2018-08-31 07:32:10 +02:00
Pavol Vaskovic
7ae5d7754c [benchmark] Report totals as a sentence
Clean up after removing bogus aggregate statistics from the last line of the log. It makes more sense to report the total number of executed benchmarks as a sentence than to try to fit it into the format of the preceding table.

Added a test assertion that `run_benchmarks` returns a CSV-formatted log, as it is used to write the log into a file in `log_results`.
2018-08-23 18:01:46 +02:00
Pavol Vaskovic
049ffb34b0 [benchmark] Fix parsing formatted text
The test number column in the space-justified column format emitted by the Benchmark_Driver to stdout while logging to file is right-aligned, so the parser must handle leading whitespace.
2018-08-23 12:31:00 +02:00
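As a rough illustration of the fix in 049ffb34b0, a row pattern can allow optional leading whitespace before the test number. The pattern `ROW`, the helper `parse_row`, and the two-column row shape are hypothetical; the real log rows carry more columns.

```python
import re

# Hypothetical pattern: optional leading spaces, the test number,
# a comma/space/tab delimiter, then the test name.
ROW = re.compile(r'\s*(\d+)[, \t]+(\w+)')


def parse_row(line):
    """Return (test_number, test_name), or None for lines that are not rows."""
    match = ROW.match(line)
    if not match:
        return None
    return int(match.group(1)), match.group(2)
```

With this pattern, a right-aligned row such as `"  3 Ackermann"` and a comma-delimited row such as `"34,AngryPhonebook"` both parse, while a header line starting with `#` is rejected.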
Pavol Vaskovic
076415f969 [benchmark] Strangler run_benchmarks
Replaced the guts of the `run_benchmarks` function with the implementation from `BenchmarkDriver`. There was only a single client, which called it with `verbose=True`, so this parameter could be safely removed.

Function `instrument_test` is replaced by running the `Benchmark_0` with `--memory` option, which implements the MAX_RSS measurement while also excluding the overhead from the benchmarking infrastructure. The incorrect computation of standard deviation was simply dropped for measurements of more than one independent sample. Bogus aggregated `Totals` statistics were removed, now reporting only the total number of executed benchmarks.
2018-08-17 08:40:39 +02:00
Pavol Vaskovic
e80165f316 [benchmark] Exclude only outliers from the top
Added an option to exclude outliers only from the top of the range, leaving in the outliers on the min side.
2018-08-17 08:39:50 +02:00
Pavol Vaskovic
27cc77c590 [benchmark] Exclude outliers from samples
Introduce an algorithm for excluding outliers after all samples are collected, using the Interquartile Range rule.

The `exclude_outliers` method uses the 1st and 3rd Quartile to compute the Interquartile Range, then uses inner fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR to remove samples outside these fences.

Based on experiments with collecting hundreds and thousands of samples (`num_samples`) per test with a low iteration count (`num_iters`) and ~1s runtime, this rule is very effective at improving the quality of the sample population. It removes short environmental fluctuations that were previously averaged into the overall result (by the adaptively determined `num_iters` to run for ~1s), which inflated the reported result with these measurement errors. For some benchmarks, this technique can produce more stable results faster than before.

This outlier filtering is employed when parsing `--verbose` test results.
2018-08-17 08:39:50 +02:00
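The Interquartile Range rule from 27cc77c590 can be sketched as below. This is a minimal illustration, not the commit's code: the nearest-rank `quantile` helper, the standalone function form of `exclude_outliers`, and the `top_only` flag (modeled on the top-only option described in e80165f316) are all assumptions.

```python
import math


def quantile(sorted_samples, q):
    """Nearest-rank quantile; this index rule is an assumption of the sketch."""
    index = max(0, int(math.ceil(len(sorted_samples) * q)) - 1)
    return sorted_samples[index]


def exclude_outliers(samples, top_only=False):
    """Drop samples outside the inner fences Q1 - 1.5*IQR and Q3 + 1.5*IQR.

    `top_only` is a hypothetical flag: when set, only the upper fence applies.
    """
    ordered = sorted(samples)
    if len(ordered) < 4:
        return ordered  # too few samples to estimate quartiles meaningfully
    q1 = quantile(ordered, 0.25)
    q3 = quantile(ordered, 0.75)
    iqr = q3 - q1
    low = ordered[0] if top_only else q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return [s for s in ordered if low <= s <= high]
```

For example, with samples clustered around 100 to 106 plus a single 500, the upper fence sits at 111 and the 500 sample is dropped, while the clustered samples are all kept.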
Pavol Vaskovic
91077e3289 [benchmark] Introduced PerformanceTestSamples
* Moved the functionality to compute median, standard deviation and related statistics from `PerformanceTestResult` into `PerformanceTestSamples`.
* Fixed wrong unit in comments
2018-08-17 08:39:50 +02:00
Pavol Vaskovic
bea35cb7c1 [benchmark] LogParser measure environment
Measure more of the environment during the test

In addition to measuring maximum resident set size, also extract number of voluntary and involuntary context switches from the verbose mode.
2018-08-17 00:32:04 +02:00
Pavol Vaskovic
c60e223a3b [benchmark] LogParser: tab & space delimited logs
Added support for tab delimited and formatted log output (space aligned columns as output to console by Benchmark_Driver).
2018-08-17 00:32:04 +02:00
Pavol Vaskovic
d0cdaee798 [benchmark] LogParser support for --verbose mode
LogParser doesn’t use `csv.reader` anymore.
Parsing is handled by a Finite State Machine. Each line is matched against a set of (mutually exclusive) regular expressions that represent known states. When a match is found, the corresponding parsing action is taken.
2018-08-17 00:32:04 +02:00
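The regex-driven parsing approach described in d0cdaee798 could be sketched along these lines. The class name `TinyLogParser`, the `Sample` and result line formats, and the shape of the collected tuples are assumptions made for illustration; only the `Measuring with scale` message is mentioned elsewhere in this log.

```python
import re


class TinyLogParser(object):
    """Hypothetical, simplified parser; the line formats are assumptions."""

    def __init__(self):
        self.results = []   # finished (name, num_iters, samples) tuples
        self.num_iters = None
        self.samples = []
        # Each pattern stands for one known kind of line and maps to an action.
        self.actions = [
            (re.compile(r'\s*Measuring with scale (\d+)\.'), self._scale),
            (re.compile(r'\s*Sample (\d+),(\d+)'), self._sample),
            (re.compile(r'(\w+),(\d+)$'), self._result),
        ]

    def _scale(self, match):
        self.num_iters = int(match.group(1))

    def _sample(self, match):
        self.samples.append(int(match.group(2)))

    def _result(self, match):
        # A result line closes the current test and resets per-test state.
        self.results.append((match.group(1), self.num_iters, self.samples))
        self.num_iters, self.samples = None, []

    def parse(self, lines):
        for line in lines:
            for pattern, action in self.actions:
                match = pattern.match(line)
                if match:
                    action(match)
                    break  # patterns are mutually exclusive
        return self.results
```

Lines that match none of the patterns are skipped, so unrelated output interleaved with the log does not disturb the parse.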
Pavol Vaskovic
9852e9a32a [benchmark] Extracted LogParser class
2018-08-17 00:32:04 +02:00
Pavol Vaskovic
0b990a82a5 [benchmark] Extracted test_utils.py
Moving the `captured_output` function to its own file.

Adding homegrown unit testing helper classes `Stub` and `Mock`.

The issue is that `unittest.mock` was added in Python 3.3 and we need to run on Python 2.7. `Stub` and `Mock` were organically developed as minimal implementations to support the common testing patterns used on the original branch, but since I’m rewriting the commit history to provide an easily digestible narrative, it makes sense to introduce them here in one step as a custom unit testing library.
2018-08-16 20:08:34 +02:00
Pavol Vaskovic
179b12103f [benchmark] Refactor formatting responsibilities
Moved result formatting methods from `PerformanceTestResult` and `ResultComparison` to `ReportFormatter`, in order to free PTR to take more computational responsibilities in the future.
2018-08-16 17:44:59 +02:00
Pavol Vaskovic
686c761992 One more typo fix.
2017-06-04 18:48:19 +02:00
Pavol Vaskovic
dea7d8fe77 Consistent --output; Improved coverage: main()
Coverage at 99% according to coverage.py

* `compare_perf_tests.py` now always outputs the same format to stdout as is written to `--output` file
* Added integration test for the main() function
* Added tests for console output (and suppressed it leaking during testing)
* Fixed file name in test’s file header
2017-06-04 18:31:06 +02:00
Pavol Vaskovic
9265a71ac6 Improved coverage: ReportFormatter
Coverage at 87% according to coverage.py

Also fixed spelling errors in documentation.
2017-06-02 02:28:44 +02:00
Pavol Vaskovic
d178b6e0cd Improved coverage with more tests: parse_args
Coverage at 66% according to coverage.py
2017-06-01 22:19:33 +02:00
Pavol Vaskovic
49ddd96c83 Added documentation and test coverage.
compare_perf_tests.py is now covered with unit tests and public methods are documented in the implementation.

Minor refactoring to better conform to Python conventions:
* classes declared in new style
* proper private method prefix of single underscore
* replacing map with list comprehension where it was clearer

Unit tests are executed as part of validation-test.

.gitignore was modified to ignore .coverage and htmlcov artifacts generated by the coverage.py package
2017-06-01 20:05:40 +02:00