The Swift benchmarking harness now has two distinct output formats:
* Default: formatted text intended for human consumption. Right now, this is
  just the minimum value, but we can augment that.
* `--json`: each output line is a JSON-encoded object that contains raw data.
  This information is intended for use by Python scripts that aggregate or
  compare multiple independent tests.
Previously, we tried to use the same output for both purposes. This required
the Python scripts to do complex parsing of textual layouts, and it also meant
that they had only summary data to work with instead of the full raw sample
information. This in turn made it almost impossible to derive meaningful
comparisons between runs or to aggregate multiple runs.
Typical output in the new JSON format looks like this:
```
{"number":89, "name":"PerfTest", "samples":[1.23, 2.35], "max_rss":16384}
{"number":91, "name":"OtherTest", "samples":[14.8, 19.7]}
```
This format is easy to parse in Python. Just iterate over
lines and decode each one separately. Also note that the
optional fields (`"max_rss"` above) are trivial to handle:
```
import json

for l in lines:  # `lines`: the JSON output lines from the harness
    j = json.loads(l)
    # Default to 0 if not present
    max_rss = j.get("max_rss", 0)
```
Note the `"samples"` array includes the runtime for each individual run.
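For example (a minimal sketch, not part of the harness), aggregating several
independent runs reduces to merging these arrays keyed by test name, assuming
each run wrote its `--json` output to a separate log file:
```
import json
from collections import defaultdict

def aggregate(log_paths):
    # Merge raw samples from several benchmark runs, keyed by test name.
    samples_by_test = defaultdict(list)
    for path in log_paths:
        with open(path) as f:
            for line in f:
                record = json.loads(line)
                samples_by_test[record["name"]].extend(record["samples"])
    return samples_by_test
```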
Because optional fields are so much easier to handle in this form, I reworked
the Python logic to translate old formats into this JSON format for more
uniformity. Hopefully, we can simplify the code in a year or so by stripping
out the old log formats entirely, along with some of the redundant statistical
calculations. In particular, the Python logic still makes an effort to preserve
mean, median, max, min, stdev, and other statistical data whenever the full set
of samples is not present. Once we've gotten to a point where we're always
keeping full samples, we can compute any such information on the fly as needed,
eliminating the need to record it.
This is a pretty big rearchitecture of the core benchmarking logic. In order to
try to keep things a bit more manageable, I have not taken this opportunity to
replace any of the actual statistics used in the higher level code or to change
how the actual samples are measured. (But I expect this rearchitecture will make
such changes simpler.) In particular, this should not actually change any
benchmark results.
For the future, please keep this general principle in mind: Statistical
summaries (averages, medians, etc.) should, as a rule, be computed for immediate
output and rarely if ever stored or used as input for other processing. Instead,
aim to store and transfer raw data from which statistics can be recomputed as
necessary.
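As an illustration of that principle (a sketch only; the harness and scripts
may organize this differently), any of the summary statistics mentioned above
can be recomputed at output time from the raw `samples` array:
```
import statistics

def summarize(samples):
    # Derive summary statistics on demand from raw samples,
    # rather than storing precomputed values in the log.
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }
```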
The script defaulted to a mode that no one uses, without checking
whether the input was compatible with that mode.
This is the script used for run-to-run comparison of benchmark
results. The in-tree benchmarks happened to work with the script only
because of a fragile string comparison buried deep within the
script. Other out-of-tree benchmark scripts that generate results were
silently broken when using this script for comparison.
The `__future__` features we relied on are all the default behavior
[since Python 3.0](https://docs.python.org/3/library/__future__.html):
* absolute_import
* print_function
* unicode_literals
* division
These import statements are no-ops and are no longer necessary.
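Concretely, a header like the following can now simply be deleted (an
illustrative example; the exact combination of imports varies per script):
```
from __future__ import absolute_import, division, print_function, unicode_literals
```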
The run_smoke_bench script fails to report code size changes if you have a
trailing '/' in <old_build_dir> but not in <new_build_dir>.
This change appends a separator if it is missing.
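A minimal sketch of the idea (the helper name here is hypothetical, not the
actual code in run_smoke_bench):
```
import os

def ensure_trailing_sep(path):
    # Append a path separator if the caller did not supply one, so that
    # <old_build_dir> and <new_build_dir> are treated consistently.
    return path if path.endswith(os.sep) else path + os.sep
```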
The way we already gather numbers for this test is to perform two runs of
`Benchmark_O $TEST` with num-samples=2 and iters={2,3}. Under the assumption
that the only difference in counter numbers is caused by that extra iteration,
subtracting the iters=2 counts from the iters=3 counts gives us the number of
counts in a single iteration.
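In other words (a sketch only; the data structures here are made up for
illustration):
```
def counts_per_iteration(counts_iters2, counts_iters3):
    # Each argument maps a performance-counter name to its value for the
    # whole run; the difference between the iters=3 and iters=2 runs is
    # attributed to the single extra iteration.
    return {name: counts_iters3[name] - counts_iters2[name]
            for name in counts_iters3}
```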
In certain cases, I have found that a small subset of the benchmarks produces
weird output, and I haven't had the time to look into why. That being
said, I do know what these weird results look like, so in this commit we do some
extra validation work to see if we need to fail a test due to instability.
The specific validation is as follows:
1. We perform another run with num-samples=2, iter=5 and subtract the iter=3
counts from that. Under the assumption that overall work should increase
linearly with iteration count in our benchmarks, we check whether the counts
are actually 2x.
2. We check whether either `result[iter=3] - result[iter=2]` or
`result[iter=5] - result[iter=3]` is negative; none of the counters we gather
should ever decrease with iteration count.
Otherwise, one can get results that seem to imply more retain/release traffic
when, in reality, one was simply not tracking {retain,release}_n calls that, as
a result of better optimization, became plain retain/release calls.
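A minimal sketch of both checks, assuming the per-run counter results are
available as dictionaries (names and the tolerance are illustrative, not the
actual implementation):
```
def is_unstable(result_iter2, result_iter3, result_iter5, tolerance=0.05):
    # Each argument maps a counter name to its value for that run.
    for name in result_iter3:
        one_extra = result_iter3[name] - result_iter2[name]  # 1 extra iteration
        two_extra = result_iter5[name] - result_iter3[name]  # 2 extra iterations
        # Counters should never decrease as the iteration count grows.
        if one_extra < 0 or two_extra < 0:
            return True
        # Work should grow linearly, so two extra iterations should cost
        # roughly twice as much as one extra iteration.
        if abs(two_extra - 2 * one_extra) > tolerance * max(two_extra, 1):
            return True
    return False
```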