The Swift benchmarking harness now has two distinct output formats:
* Default: formatted text intended for human consumption.
  Right now, this is just the minimum value, but we can augment that.
* `--json`: each output line is a JSON-encoded object containing raw data.
  This information is intended for use by Python scripts that aggregate
  or compare multiple independent tests.
Previously, we tried to use the same output for both purposes. This required
the Python scripts to do complex parsing of textual layouts, and it also meant
that they had only summary data to work with instead of full raw sample
information. That in turn made it almost impossible to derive meaningful
comparisons between runs or to aggregate multiple runs.
Typical output in the new JSON format looks like this:
```
{"number":89, "name":"PerfTest", "samples":[1.23, 2.35], "max_rss":16384}
{"number":91, "name":"OtherTest", "samples":[14.8, 19.7]}
```
This format is easy to parse in Python: just iterate over the
lines and decode each one separately. Optional fields
(such as `"max_rss"` above) are also trivial to handle:
```
import json

for line in lines:
    record = json.loads(line)
    # Default to 0 if the optional field is not present
    max_rss = record.get("max_rss", 0)
```
Note the `"samples"` array includes the runtime for each individual run.
Because optional fields are so much easier to handle in this form, I reworked
the Python logic to translate the old formats into this JSON format for more
uniformity. Hopefully, we can simplify the code in a year or so by stripping
out the old log formats entirely, along with some of the redundant statistical
calculations. In particular, the Python logic still makes an effort to preserve
mean, median, max, min, stdev, and other statistical data whenever the full set
of samples is not present. Once we've gotten to a point where we're always
keeping full samples, we can compute any such information on the fly as needed,
eliminating the need to record it.
This is a pretty big rearchitecture of the core benchmarking logic. In order to
try to keep things a bit more manageable, I have not taken this opportunity to
replace any of the actual statistics used in the higher level code or to change
how the actual samples are measured. (But I expect this rearchitecture will make
such changes simpler.) In particular, this should not actually change any
benchmark results.
For the future, please keep this general principle in mind: statistical
summaries (averages, medians, etc.) should, as a rule, be computed for
immediate output and rarely if ever stored or used as input for other
processing. Instead, aim to store and transfer raw data from which statistics
can be recomputed as necessary.
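As a minimal illustration of this principle, any summary can be derived on demand from the stored raw samples (the sample values here are made up):

```
import statistics

samples = [1.23, 2.35, 1.41, 1.19]  # raw data, as stored in the JSON log
summary = {
    "min": min(samples),
    "median": statistics.median(samples),
    "mean": statistics.mean(samples),
    "stdev": statistics.stdev(samples),
    "max": max(samples),
}
```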
The `__future__` we relied on is now the present: all four of the features we
imported are included [since Python 3.0](https://docs.python.org/3/library/__future__.html):
* absolute_import
* print_function
* unicode_literals
* division
These import statements are no-ops under Python 3 and are no longer necessary.
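Concretely, import lines like this one (combined onto a single line for illustration) can simply be deleted:

```
from __future__ import absolute_import, division, print_function, unicode_literals
```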
For really small runtimes (< 20 μs), this method of setup overhead detection doesn't work: even a 1 μs change in a 20 μs runtime is 5%. In that case, just report no overhead.
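A minimal sketch of that guard (the names are hypothetical, not the harness's actual API):

```
MIN_RUNTIME_US = 20  # below this, noise swamps any overhead estimate

def setup_overhead(runtime_us, estimated_overhead_us):
    if runtime_us < MIN_RUNTIME_US:
        return 0  # a 1 μs swing is already 5% here; report no overhead
    return estimated_overhead_us
```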
Finished support for running all active tests in one batch; this returns a dictionary of `PerformanceTestResults`.
Known test names are passed to the harness in a compressed form, as test numbers.
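A sketch of the idea (the exact invocation is an assumption): referencing tests by number keeps the command line short, even with hundreds of benchmarks.

```
# Hypothetical mapping of test numbers to names, as in the JSON example above.
tests = {89: "PerfTest", 91: "OtherTest"}

# Pass the compressed form (numbers) instead of the full names.
cmd = ["Benchmark_O"] + [str(number) for number in sorted(tests)]
# cmd == ["Benchmark_O", "89", "91"]
```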
Lowered the default sample cap from 2k to 200. (This doesn't affect a manually specified `--num-samples` argument in the driver.)
Swift benchmarks have a fairly constant performance profile over time, so it's more beneficial to get multiple independent measurements quickly than to collect more samples from the same run.
This adds an optional `legacyFactor` to `BenchmarkInfo`, which allows for linear modification of constants that unnecessarily inflate the base workload of benchmarks, while maintaining the continuity of long-term benchmark tracking.
For example, if a benchmark uses `for _ in N*10_000` in its run function, we could lower this to `for _ in N*1_000` and add `legacyFactor: 10` to its `BenchmarkInfo`.
Note that this doesn't affect the real measurements gathered from the `--verbose` output. `BenchmarkDoctor` has been slightly adjusted to work with these real samples, so `Benchmark_Driver check` will not flag these benchmarks for the slow runtime reported in the summary if their real runtimes fall into the recommended range.
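One way the continuity could be maintained on the reporting side (an assumption for illustration, not a description of the actual implementation):

```
# Scale samples from the reduced workload back up by legacyFactor so that
# long-term tracking charts keep their historical scale (assumed behavior).
def legacy_adjusted(samples_us, legacy_factor):
    return [sample * legacy_factor for sample in samples_us]

legacy_adjusted([92.0, 95.5], legacy_factor=10)  # [920.0, 955.0]
```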
Add a check against unreasonably long setup times for benchmarks that do their initialization work in the `setUpFunction`. Given that a typical benchmark measurement lasts about 1 second, it's reasonable to expect the setup to take at most 20% on top of that: 200 ms.
The `DictionaryKeysContains*` benchmarks are an instance of this mistake. The setup of `DictionaryKeysContainsNative` takes 3 seconds on my machine, to prepare a dictionary for a run function whose typical runtime is 90 μs. The setup of the Cocoa version takes 8 seconds! It is trivial to rewrite these with much smaller dictionaries that demonstrate the point of these benchmarks perfectly well, without the need to wait ages for them to set up.
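The rule itself reduces to a simple threshold check; a sketch with hypothetical names:

```
MAX_SETUP_MS = 200  # 20% on top of the ~1 s measurement budget

def check_setup_time(benchmark_name, setup_ms):
    if setup_ms > MAX_SETUP_MS:
        print(f"warning: '{benchmark_name}' setup took {setup_ms} ms "
              f"(limit: {MAX_SETUP_MS} ms)")
```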
* Lowered the threshold for a healthy benchmark runtime to under 1000 μs.
* Offer a suitable divisor that is a power of 10, in addition to the one that is a power of 2 (see the sketch below).
* Expanded the motivation in the docstring.
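A sketch of the divisor suggestion from the second point (the function name is illustrative): find the smallest powers of 2 and 10 that bring the runtime under the 1000 μs threshold.

```
import math

def suggested_divisors(runtime_us, threshold_us=1000):
    ratio = runtime_us / threshold_us
    power_of_2 = 2 ** max(0, math.ceil(math.log2(ratio)))
    power_of_10 = 10 ** max(0, math.ceil(math.log10(ratio)))
    return power_of_2, power_of_10

suggested_divisors(140_000)  # (256, 1000); 140 ms / 256 ≈ 547 μs
```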
Since the meaning of some columns has changed while their overall number remained the same, let's include a header in the CSV log to make it clear that we now report MIN, Q1, MEDIAN, Q3, MAX, MAX_RSS instead of the old MIN, MAX, MEAN, SD, MEDIAN, MAX_RSS format.
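An illustrative header row (the leading `#` and `TEST` columns are an assumption; the statistics columns are the ones listed above):

```
#,TEST,MIN,Q1,MEDIAN,Q3,MAX,MAX_RSS
```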
Switched the measurement technique from gathering `i` independent samples characterized by their mean values to a finer-grained characterization of these measurements using quantiles.
The distribution of benchmark measurements is non-normal, with outliers that significantly inflate the mean and standard deviation due to the uncontrolled variable of system load. MEAN and SD were therefore the wrong statistics for characterizing benchmark measurements.
Benchmark_Driver now gathers more individual measurements from Benchmark_O. It is executed with `--num-iters=1` because we don't want averaged runtimes, we want raw data. This collects a variable number of measurements in about 1 second. Using `--quantile=20`, we get up to 20 measured values that properly characterize the empirical distribution of the benchmark from each independent run. The measurements from `i` independent executions are combined to form the final empirical distribution, which is reported as a five-number summary (MIN, Q1, MEDIAN, Q3, MAX).
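Combining runs and computing the five-number summary is straightforward once raw samples are kept; a sketch (`statistics.quantiles` requires Python 3.8+):

```
import statistics

def five_number_summary(runs):
    # Merge the raw samples from all independent runs into a single
    # empirical distribution.
    combined = sorted(sample for run in runs for sample in run)
    q1, median, q3 = statistics.quantiles(combined, n=4)
    return combined[0], q1, median, q3, combined[-1]

five_number_summary([[14.8, 19.7, 15.2], [15.0, 14.9, 21.3]])
```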
Renamed Benchmark_Driver's `iterations` argument to `independent-samples` to clarify its true meaning and disambiguate it from the concept of `num-iters` used in Benchmark_O. The short form of the argument, `-i`, remains unchanged.
Small fix following the last refactoring of MAX_RSS: the `--memory` option is now required to measure memory in `--verbose` mode. Added an integration test for the `check` command of Benchmark_Driver that depended on it.
Clean up after removing bogus aggregate statistics from the last line of the log. It makes more sense to report the total number of executed benchmarks as a sentence than to try to fit it into the format of the preceding table.
Added a test assertion that `run_benchmarks` returns a CSV-formatted log, as it is used to write the log to a file in `log_results`.
`BenchmarkDoctor` analyzes performance tests and reports their conformance to a set of desired criteria. The first two rules verify the naming convention.
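A sketch of what such naming checks might look like (the exact criteria here are an assumption):

```
import re

UPPER_CAMEL_CASE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")

def naming_issues(name, max_length=40):
    issues = []
    if not UPPER_CAMEL_CASE.match(name):
        issues.append(f"'{name}' is not UpperCamelCase")
    if len(name) > max_length:
        issues.append(f"'{name}' is longer than {max_length} characters")
    return issues
```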
`BenchmarkDoctor` is invoked from `Benchmark_Driver` with the `check` argument.