245 Commits

Author SHA1 Message Date
Sergej Jaskiewicz
cce9e81f0b Support Python 3 in the benchmark suite 2020-02-28 01:45:35 +03:00
Ross Bayer
b1961745e0 [Python: black] Reformatted the benchmark Python sources using utils/python_format.py. 2020-02-08 15:32:44 -08:00
Michael Gottesman
2840a7609d When gathering counters, check for instability and FAIL otherwise.
The way we already gather numbers for this test is that we run two runs of
`Benchmark_O $TEST` with num-samples=2, iters={2,3}. Under the assumption that
the only difference in counter numbers can be caused by that extra iteration,
subtracting the group of counts for 2,3 gives us the number of counts in that
iteration.

In certain cases, I have found that a small subset of the benchmarks are
producing weird output and I haven't had the time to look into why. That being
said, I do know what these weird results look like, so in this commit we do some
extra validation work to see if we need to fail a test due to instability.

The specific validation is that:

1. We perform another run with num-samples=2, iter=5 and subtract the iter=3
counts from that. Under the assumption that overall work should increase
linearly with iteration count in our benchmarks, we check that the difference is
actually 2x the count for a single iteration.

2. We check whether either `result[iter=3] - result[iter=2]` or `result[iter=5] -
result[iter=3]` is negative. None of the counters we gather should ever decrease
with iteration count.
2020-01-15 14:41:21 -08:00
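The two checks described in the commit message above can be sketched in a few lines of Python. This is only an illustration, not the script's actual code: the function name `is_stable` and the dictionary layout are hypothetical, and the exact-equality comparison for the 2x check is an assumption (the real script may allow some tolerance).

```python
def is_stable(counts_by_iters):
    """Return True if a counter behaves linearly across iters=2, 3, 5.

    counts_by_iters is a hypothetical mapping from iteration count to the
    counter value gathered for that run, e.g. {2: 20, 3: 30, 5: 50}.
    """
    # Count attributable to one extra iteration (iters 2 -> 3).
    one_iteration = counts_by_iters[3] - counts_by_iters[2]
    # Count attributable to two extra iterations (iters 3 -> 5).
    two_iterations = counts_by_iters[5] - counts_by_iters[3]

    # Check 2: counters should never decrease with iteration count.
    if one_iteration < 0 or two_iterations < 0:
        return False

    # Check 1: two extra iterations should cost twice one extra iteration.
    # Exact equality is an assumption made for this sketch.
    return two_iterations == 2 * one_iteration


print(is_stable({2: 20, 3: 30, 5: 50}))
print(is_stable({2: 20, 3: 30, 5: 45}))
```

The first call describes a counter that grows by 10 per iteration and passes both checks; the second grows by only 15 over two iterations and would be reported as unstable.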
Michael Gottesman
461f17e5b7 Change -csv flag to be --emit-csv. 2020-01-15 14:41:21 -08:00
Michael Gottesman
35aa0405d1 Pattern match test names, not numbers to capture test names from Benchmark_O --list
This makes the output of the test more readable.
2020-01-15 14:41:21 -08:00
Michael Gottesman
676411f0b0 Have dtrace aggregate rr opts and start tracking {retain,release}_n.
Otherwise, one can get results that seem to imply more rr traffic when in
reality one was simply not tracking {retain,release}_n calls that, as a result
of better optimization, became plain retain, release.
2020-01-15 14:39:55 -08:00
Michael Gottesman
6fff30c122 [benchmark-dtrace] Enabling multiprocessing option to speed up gathering data. 2020-01-08 16:06:56 -08:00
Michael Gottesman
c7c2e6e17b [benchmark-dtrace] Fix the number of samples taken alongside the number of iters.
Otherwise, the output is not stable.
2020-01-08 16:06:56 -08:00
Michael Gottesman
d48cdd9cad [benchmark-dtrace] Set SWIFT_DETERMINISTIC_HASHING=1 before calling subjobs.
This prevents a bunch of instability in the retain, release numbers. I am still
getting some of it, but this helps a lot.
2020-01-08 16:06:56 -08:00
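As a rough illustration of the approach in the commit above, a driver script can set the variable in a copy of its environment and pass that copy to each subprocess. The helper name `run_with_deterministic_hashing` is hypothetical, and the child command used below is a stand-in rather than a real benchmark invocation.

```python
import os
import subprocess
import sys


def run_with_deterministic_hashing(cmd):
    """Run cmd with SWIFT_DETERMINISTIC_HASHING=1 set in its environment."""
    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ)
    env["SWIFT_DETERMINISTIC_HASHING"] = "1"
    return subprocess.check_output(cmd, env=env)


# Stand-in child process that simply echoes the variable back.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['SWIFT_DETERMINISTIC_HASHING'])",
]
output = run_with_deterministic_hashing(child)
```

Passing an explicit `env` keeps the setting scoped to the subjobs instead of mutating `os.environ` for the whole driver.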
Alex Hoppen
932525d762 [gardening] Fix several python-lint warnings 2019-10-29 10:40:20 -07:00
Alex Hoppen
776e2c0030 Revert "Migrate building SwiftSyntax to swift_build_support" 2019-10-29 09:55:32 -07:00
Alex Hoppen
46501b881f [gardening] Fix several python-lint warnings 2019-10-25 15:58:07 -07:00
Erik Eckstein
81a5c0f479 run_smoke_bench: make num_retries configurable 2019-10-14 11:37:42 +02:00
Pavol Vaskovic
cc0e16ca34 [benchmark] LogParser: measurement metadata 2019-07-23 19:44:41 +02:00
Pavol Vaskovic
007d398f4a [Gardening] ReportFormatter: tying up loose ends 2019-05-24 00:18:44 +02:00
Pavol Vaskovic
b3f7996ea7 [benchmark] ReportFormatter: better inline headers
Improve inline headers in `single_table` mode to also print labels for the numeric columns.

Sections in the `single_table` are visually distinguished by a separator row preceding the inline headers.

Separated header label styles for git and markdown modes with UPPERCASE and **Bold** formatting respectively.

Inlined section template definitions.
2019-05-23 23:24:51 +02:00
Pavol Vaskovic
73b31006ee [benchmark] Fix help printing for run_smoke_bench 2019-05-23 21:40:44 +02:00
Pavol Vaskovic
9750581bf5 [benchmark] ReportFormatter: right-align num cols 2019-05-23 19:32:34 +02:00
Pavol Vaskovic
af7ef03aaf [benchmark] ReportFormatter: refactor header logic
Confine the logic for printing headers to the header function.
2019-05-23 17:28:21 +02:00
Pavol Vaskovic
a998e18e18 [benchmark] ReportFormatter: faster templating
It is slightly faster to simply concatenate strings that don’t require special formatting.
2019-05-23 12:29:19 +02:00
Pavol Vaskovic
49d25bfc51 [benchmark] ReportFormatter: de-tuple
Remove unnecessary list-to-tuple conversions.
2019-05-23 12:20:19 +02:00
Pavol Vaskovic
081e1c94a5 [benchmark] Add unit test for single table report 2019-05-22 14:54:00 +02:00
Michael Gottesman
c86c1763c6 [benchmarks] Add support to the build-script swiftpm benchmarks for building the benchmarks in -Osize. 2019-04-11 10:10:38 -07:00
Michael Gottesman
53ff97428a [benchmarks] Change the build_script_helper to use subdirectories for each build and install final binaries in a toplevel ./bin build directory.
This will let me:

1. Add -Osize support easily.
2. Put all of the binaries in the same directory so that Benchmark_Driver can
   work with them via the -tools argument.
2019-04-10 22:18:50 -07:00
Michael Gottesman
115f7a43e0 Move build_script_helper from ./benchmarks/utils => ./benchmarks/scripts. 2019-04-10 22:18:50 -07:00
Pavol Vaskovic
691007b029 [benchmark] LogParser: Accept -?! in bench. names
Extend the parser to support benchmark names that include `-?!`, to fully support the new Naming Convention from PR #20334.
2019-02-19 23:31:58 +01:00
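To illustrate the kind of change the commit above describes, here is a minimal sketch of a name pattern that admits `-`, `?` and `!` alongside word characters and dots. The pattern `NAME_PATTERN` is hypothetical and is not the regular expression LogParser actually uses.

```python
import re

# Hypothetical pattern: word characters, dots, and the extra "-?!" characters.
# The hyphen is escaped so it is not read as a range inside the class.
NAME_PATTERN = re.compile(r"^[\w.\-?!]+$")


def is_valid_name(name):
    """Return True if name consists only of the characters allowed above."""
    return NAME_PATTERN.match(name) is not None


print(is_valid_name("Array.append-Int?"))
print(is_valid_name("Dictionary.lookup!"))
print(is_valid_name("bad name"))
```

The first two names are accepted; the third is rejected because it contains a space.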
Pavol Vaskovic
84e7d4dfb8 [benchmark] Adjust Driver’s console output format
…to handle longer benchmark names, assuming maximum length of 40 characters.
2019-02-19 23:28:51 +01:00
Pavol Vaskovic
3f179f39e0 Increase # of independent samples for changes.
Multimodal benchmarks with significant delta between the modes can report false performance changes when we gather too few independent samples. This increases the minimal number of independent samples from 5 to 10.
Fix for https://bugs.swift.org/browse/SR-9907
2019-02-12 11:42:51 +01:00
Pavol Vaskovic
85ba83191e [benchmark] Remove unused function get_results
Remove the `get_results` function, which is no longer used after the refactoring that rebased the benchmark measurements on the `BenchmarkDriver` class in #21684.
2019-02-04 10:11:57 +01:00
Patrick Balestra
3c6b3ab4cc [benchmark] Fix linter errors in create_benchmark.py 2019-01-21 22:40:23 +01:00
Patrick Balestra
0b3fa54249 [benchmark] Split template into separate line and fix linter errors 2019-01-21 22:40:23 +01:00
Patrick Balestra
1ca47e7870 [benchmark] Add script to automate creation of new single-source benchmarks
Adds a `create_benchmark` script that automates the following three tasks:
1. Add a new Swift file (YourTestNameHere.swift), built according to the template below, to the `single-source` directory.
2. Add the filename of the new Swift file to CMakeLists.txt
3. Edit main.swift. Import and register your new Swift module.

The process of adding new benchmarks is now automated and a lot less error-prone.
2019-01-20 22:22:08 +01:00
Gwynne Raskind
faf8a5edb6 Fix indentation for python_lint 2019-01-17 02:00:40 -06:00
Gwynne Raskind
09b4159cb2 Global replace of "assertEquals" with "assertEqual" in compliance with deprecation of assertEquals name in Python 2.7 2019-01-16 04:06:38 -06:00
Pavol Vaskovic
2c271493d5 [benchmark] Limit of Accuracy in Setup Overhead
Clarified limit of accuracy in setup overhead detection.
2019-01-09 18:01:06 +01:00
Pavol Vaskovic
2096151ee9 [benchmark] BenchmarkDoctor: Lower runtime limit
Warn about runtimes under 20 μs and flag 0 μs runtimes as errors.
2019-01-08 19:16:40 +01:00
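The thresholds in the commit above amount to a simple classification. The sketch below is illustrative only: the function name `classify_runtime` and the returned labels are hypothetical, not BenchmarkDoctor's real interface.

```python
def classify_runtime(runtime_us):
    """Classify a benchmark runtime given in microseconds."""
    if runtime_us == 0:
        # A 0 μs runtime means nothing measurable happened: flag as an error.
        return "error"
    if runtime_us < 20:
        # Runtimes under 20 μs only produce a warning.
        return "warning"
    return "ok"


print(classify_runtime(0))
print(classify_runtime(12))
print(classify_runtime(20))
```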
Pavol Vaskovic
8a8a3ad6df [benchmark] Limit setup overhead detection (>20)
For really small runtimes (< 20 μs) this method of setup overhead detection doesn’t work. Even a 1 μs change in a 20 μs runtime is 5%. Just return no overhead.
2019-01-08 19:15:29 +01:00
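A minimal sketch of the guard described in the commit above follows. The overhead formula is an assumption made for illustration: it supposes that a run reports time per iteration, so a one-off setup cost S shows up as W + S with one iteration and W + S/2 with two, giving S = 2 × (t1 − t2). The function name and parameters are hypothetical.

```python
def setup_overhead(t_iters1, t_iters2):
    """Estimate setup overhead in μs from per-iteration runtimes.

    t_iters1 and t_iters2 are hypothetical per-iteration runtimes measured
    with num-iters=1 and num-iters=2 respectively.
    """
    # Below the 20 μs limit a 1 μs measurement error is already 5% of the
    # runtime, so the estimate is meaningless: report no overhead.
    if t_iters2 <= 20:
        return 0
    # Assumed model: t1 = W + S and t2 = W + S/2, hence S = 2 * (t1 - t2).
    return int(round(2.0 * (t_iters1 - t_iters2)))


print(setup_overhead(100, 80))
print(setup_overhead(20, 19))
```

With runtimes of 100 μs and 80 μs the sketch estimates 40 μs of setup; with 20 μs and 19 μs it returns 0 because the measurement falls below the accuracy limit.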
Pavol Vaskovic
d854f0f898 [benchmark] test_performance with BenchmarkDriver
Refactored the `test_performance` function to use the existing BenchmarkDriver and TestComparator.

This replaces the hand-rolled parser and comparison logic with library functions, which already have full unit test coverage.
2019-01-08 00:22:00 +01:00
Pavol Vaskovic
cd4886aa2b [Gardening] Move imports and DriverArgs to top 2019-01-07 20:59:47 +01:00
Pavol Vaskovic
4a716445df [benchmark] BenchmarkDriver run in batch mode
Finished support for running all active tests in one batch. Returns a dictionary of PerformanceTestResults.

Known test names are passed to the harness in a compressed form as test numbers.
2019-01-07 20:59:39 +01:00
Pavol Vaskovic
df3389259b [benchmark] BenchmarkDriver: store test_numbers 2019-01-07 20:57:47 +01:00
Pavol Vaskovic
1f58ad6662 [Gardening] Better names: _tests_by_name_or_number 2019-01-07 20:57:47 +01:00
Pavol Vaskovic
3023ab5545 [benchmark] BenchmarkDriver sample_time support
Added support for Benchmark_X’s `--sample-time` parameter.
2019-01-07 20:57:42 +01:00
Pavol Vaskovic
b831f93dd4 [benchmark] BenchmarkDoctor: Optional markdown arg
Don't require the presence of `markdown` argument for initialization.
(It doesn't exist when BenchmarkDoctor is used from `run_smoke_bench`.)
2018-12-21 21:26:24 +01:00
Pavol Vaskovic
46f94d7709 [benchmark] BenchmarkDriver check --markdown
Added `--markdown` flag for the `check` command to output the `BenchmarkDoctor`’s report in the Markdown format (as used by swift-ci on GitHub).
2018-12-21 01:22:38 +01:00
Andrew Trick
5154886491 Merge pull request #20334 from palimondo/within-cells-interlinked
[benchmark] Naming Convention
2018-12-13 08:20:23 -08:00
Pavol Vaskovic
ec836bd04b Merge pull request #20861 from palimondo/a-tall-white-fountain-played
[benchmark] Janitor Duty, Legacy Factor: A-C
2018-12-07 19:49:06 +01:00
Pavol Vaskovic
9d6f7ad160 [benchmark] Driver & Doctor: Lower the sample cap
Lowered the default sample cap from 2k to 200. (This doesn’t affect a manually specified `--num-samples` argument in the driver.)

Swift benchmarks have a fairly constant performance profile over time. It’s more beneficial to get multiple independent measurements faster than to get more samples from the same run.
2018-12-07 15:06:43 +01:00
Erik Eckstein
5797e3d8c7 benchmarks: Remove the obsolete bench_code_size.py and run_smoke_bench.py scripts.
Those scripts are replaced by run_smoke_bench.

This is a follow-up commit to 040aa06fec
2018-12-05 16:02:46 -08:00
Pavol Vaskovic
d6392cc014 Merge pull request #20807 from palimondo/and-dreadfully-distinct
[benchmark] Added Benchmark Check Report
2018-11-29 05:35:18 +01:00