Commit Graph

294 Commits

Author SHA1 Message Date
Erik Eckstein
2960f472a7 fix the swift library code size comparison in the run_smoke_bench script
In some configurations the script mixed up the build architectures and accidentally reported the code size difference between the x86 and arm builds.
2024-08-05 11:03:24 +02:00
Alexander Cyon
9d04bfd848 [benchmark] Fix typos 2024-07-06 13:17:13 +02:00
Oscar Byström Ericsson
bd2abc6cde Some create_benchmark.py script enhancements (v3).
This patch is held at linterpoint. Here's the ransom.
2024-02-25 11:22:51 +01:00
Oscar Byström Ericsson
783a9c6a77 Some create_benchmark.py script enhancements (v2). 2024-02-24 10:40:06 +01:00
Oscar Byström Ericsson
5b8ce67a3d Some create_benchmark.py script enhancements.
This commit addresses some trials and tribulations I encountered while working on (#71786). It:

1. fixes the auto-registration regex
2. fixes the auto-generated array's name
3. generates the current year for the license header
4. generates some dashes for the license header
2024-02-24 10:15:34 +01:00
Tim Kientzle
e3e72fc21c Update benchmark test script to correctly verify the --min-samples=2 default command-line arg 2023-09-25 11:57:34 -07:00
Tim Kientzle
7052de9399 Fix build-script -B
Without additional options, build-script -B was badly broken:
* It added a broken --independent-samples option to the driver command line
* Slow tests that ran only 1 sample by default would break the statistics

Fix the first issue by adding `--independent-samples` to the command
line only when a sample count was actually provided by other options.

Fix the second issue by including `--min-samples=2` in the command.
2023-09-22 17:44:42 -07:00
Egor Zhdan
f931de8948 [benchmark] Do not abort with TypeError if no memory measurements were taken 2023-01-23 13:34:17 +00:00
Tim Kientzle
961a38b636 Test the non-JSON output
We have to continue using the non-JSON forms
until the JSON-supporting code is universally
available.
2022-11-07 14:46:13 -08:00
Tim Kientzle
dfe8284462 pylint fixes 2022-11-07 14:45:59 -08:00
Tim Kientzle
168130741b Pylint fixes 2022-11-07 14:45:37 -08:00
Tim Kientzle
b0ce365b53 A better way to adapt to -num-samples 2022-11-05 16:05:26 -07:00
Tim Kientzle
c3a727486f Make --num-samples actually work 2022-11-05 14:18:41 -07:00
Tim Kientzle
5c14017bba For size comparisons, build the result objects directly with sample data 2022-11-05 14:17:34 -07:00
Tim Kientzle
e1ab70a4b0 Use results consistently 2022-11-05 13:29:52 -07:00
Tim Kientzle
a63adc9114 Use non-json format for now until we have switched over completely 2022-11-04 16:17:57 -07:00
Tim Kientzle
2a3e68a1f8 Match new benchmark driver default output 2022-11-04 16:16:37 -07:00
Tim Kientzle
40eaaac0b1 Do not use --json for listing tests (yet) 2022-11-04 14:02:03 -07:00
Tim Kientzle
998475bf80 Pylint cleanup, more comments 2022-11-04 14:02:03 -07:00
Tim Kientzle
b4fa3833d8 Comment some TODO items 2022-11-04 14:02:03 -07:00
Tim Kientzle
071e9f1c7e Python style fixes 2022-11-04 14:02:03 -07:00
Tim Kientzle
520fd79efd Fix some test failures
The new code stores test numbers as numbers (not strings), which
requires a few adjustments. I also apparently missed a few test updates.
2022-11-04 14:02:03 -07:00
Tim Kientzle
971a5d8547 Overhaul Benchmarking pipeline to use complete sample data, not summaries
The Swift benchmarking harness now has two distinct output formats:

* Default: Formatted text that's intended for human consumption.
  Right now, this is just the minimum value, but we can augment that.

* `--json`: Each output line is a JSON-encoded object that contains raw data.
  This information is intended for use by Python scripts that aggregate
  or compare multiple independent tests.

Previously, we tried to use the same output for both purposes.  This required
the Python scripts to do more complex parsing of textual layouts, and also meant
that the Python scripts had only summary data to work with instead of full raw
sample information.  This in turn made it almost impossible to derive meaningful
comparisons between runs or to aggregate multiple runs.

Typical output in the new JSON format looks like this:
```
{"number":89, "name":"PerfTest", "samples":[1.23, 2.35], "max_rss":16384}
{"number":91, "name":"OtherTest", "samples":[14.8, 19.7]}
```

This format is easy to parse in Python.  Just iterate over
lines and decode each one separately. Also note that the
optional fields (`"max_rss"` above) are trivial to handle:
```
import json
for line in lines:
    j = json.loads(line)
    # Default to 0 if the optional field is not present
    max_rss = j.get("max_rss", 0)
```
Note the `"samples"` array includes the runtime for each individual run.

Because optional fields are so much easier to handle in this form, I reworked
the Python logic to translate the old formats into this JSON format for more
uniformity.  Hopefully, we can simplify the code in a year or so by stripping
out the old log formats entirely, along with some of the redundant statistical
calculations.  In particular, the Python logic still makes an effort to preserve
mean, median, max, min, stdev, and other statistical data whenever the full set
of samples is not present.  Once we've gotten to a point where we're always
keeping full samples, we can compute any such information on the fly as needed,
eliminating the need to record it.

This is a pretty big rearchitecture of the core benchmarking logic. In order to
try to keep things a bit more manageable, I have not taken this opportunity to
replace any of the actual statistics used in the higher level code or to change
how the actual samples are measured. (But I expect this rearchitecture will make
such changes simpler.) In particular, this should not actually change any
benchmark results.

For the future, please keep this general principle in mind: Statistical
summaries (averages, medians, etc) should as a rule be computed for immediate
output and rarely if ever stored or used as input for other processing. Instead,
aim to store and transfer raw data from which statistics can be recomputed as
necessary.
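The principle above can be sketched in a few lines of plain Python: given the full per-run samples from a `--json` output line, any summary statistic is recomputed on demand rather than stored (a minimal illustration; the helper name is hypothetical, not part of the scripts):

```
import json
import statistics

def summarize(json_line):
    # Recompute summary statistics on the fly from the raw samples;
    # nothing but the samples themselves needs to be stored.
    record = json.loads(json_line)
    samples = record["samples"]
    return {
        "name": record["name"],
        "min": min(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
    }

print(summarize('{"number":89, "name":"PerfTest", "samples":[1.23, 2.35]}'))
```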
2022-11-04 14:02:03 -07:00
Alex Lorenz
cd634444c5 [benchmark] markdown report handler - write encoded message to byte buffer 2022-11-03 11:55:27 -07:00
Boris Bügling
f71eb8e2cb Remove use of deprecated option 2022-10-13 22:47:19 -07:00
YOCKOW
c1e154a9cb [Gardening] Remove trailing whitespaces in Python scripts. (W291)
That has been marked as 'FIXME' for three years.
This commit fixes it.
2022-08-25 16:08:36 +09:00
Andrew Trick
f09cc8cc8b Fix compare_perf_tests.py for running locally.
The script defaulted to a mode that no one uses, without checking
whether the input was compatible with that mode.

This is the script used for run-to-run comparison of benchmark
results. The in-tree benchmarks happened to work with the script only
because of a fragile string comparison buried deep within the
script. Other out-of-tree benchmark scripts that generate results were
silently broken when using this script for comparison.
2022-05-12 16:50:32 -07:00
Josh Soref
fa3ff899a9 Spelling benchmark (#42457)
* spelling: approximate
* spelling: available
* spelling: benchmarks
* spelling: between
* spelling: calculation
* spelling: characterization
* spelling: coefficient
* spelling: computation
* spelling: deterministic
* spelling: divisor
* spelling: encounter
* spelling: expected
* spelling: fibonacci
* spelling: fulfill
* spelling: implements
* spelling: into
* spelling: intrinsic
* spelling: markdown
* spelling: measure
* spelling: occurrences
* spelling: omitted
* spelling: partition
* spelling: performance
* spelling: practice
* spelling: preemptive
* spelling: repeated
* spelling: requirements
* spelling: requires
* spelling: response
* spelling: supports
* spelling: unknown
* spelling: utilities
* spelling: verbose

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
2022-04-25 09:02:06 -07:00
Erik Eckstein
fb65284995 benchmarks: fix run_smoke_bench after upgrading to python3
Need to decode the result of `subprocess.check_output`.
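For reference, a minimal sketch of the kind of fix described (the command here is purely illustrative, not the one the script runs):

```
import subprocess

# Under Python 3, check_output returns bytes; decode before doing
# any string processing on the result.
output = subprocess.check_output(["echo", "hello"]).decode("utf-8")
print(output.strip())
```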
2022-04-19 13:59:55 +02:00
Daniel Duan
3dfc40898c [NFC] Remove Python 2 imports from __future__ (#42086)
The `__future__` features we relied on are now all included
[since Python 3.0](https://docs.python.org/3/library/__future__.html):

* absolute_import
* print_function
* unicode_literals
* division

These import statements are no-ops and are no longer necessary.
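A quick way to see that these imports are no-ops under Python 3 (using `division` and `unicode_literals` as examples):

```
# These imports still parse under Python 3, but change nothing.
from __future__ import absolute_import, division, print_function, unicode_literals

# True division is the default in Python 3, with or without the import.
assert 3 / 2 == 1.5
# String literals are already Unicode (str) in Python 3.
assert isinstance("text", str)
```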
2022-04-13 14:01:30 -07:00
Daniel Duan
06a04624a6 [benchmark] Remove Python 2 logic (#42048)
Found a few remaining pieces of Python 2 code and removed them, since
we are entirely on Python 3.
2022-03-27 15:02:58 -07:00
swift-ci
32a967f1ea Merge pull request #39171 from eltociear/patch-22 2022-01-13 07:01:02 -08:00
Evan Wilde
6956b7c5c9 Replace /usr/bin/python with /usr/bin/env python
/usr/bin/python doesn't exist on Ubuntu 20.04, causing tests to fail.
I've updated the shebangs everywhere to use `/usr/bin/env python`
instead.
2021-09-28 10:05:05 -07:00
Karoy Lorentey
8304e6c0bf Merge pull request #39336 from lorentey/decapitate-benchmarks
[benchmark][NFC] Use Swift naming conventions
2021-09-20 17:16:35 -07:00
Karoy Lorentey
2fbf391b57 [benchmark] Benchmark_Driver: Correctly set SWIFT_DETERMINISTIC_HASHING 2021-09-16 16:57:35 -07:00
Karoy Lorentey
8910b75cfe [benchmark] Stop capitalizing function and variable names 2021-09-15 22:08:07 -07:00
Ikko Ashimine
c48f6e09bb [benchmark] Fix typo in compare_perf_tests.py
formating -> formatting
2021-09-04 09:10:34 +09:00
Guillaume Lessard
715b3fa7d0 [gardening] update copyright year in the benchmark template 2021-07-22 16:39:15 -06:00
Erik Eckstein
b5f1e265e0 benchmarks: disable the flaky test_log_file BenchmarkDriver test
I'm not sure if it makes sense to keep this test around at all.
For now I just disabled it.

rdar://79701124
2021-06-28 13:13:02 +02:00
Mishal Shah
ddabee30e2 Add arch info to benchmark report 2021-06-01 09:59:20 -07:00
Erik Eckstein
abcae7bfa1 benchmarks: fix smoke test run by setting the dynamic library path
This is a workaround for rdar://78584073
2021-05-31 15:10:08 +02:00
Mishal Shah
40024718ac Update doc and links to support new main branch 2020-09-22 23:53:29 -07:00
Michael Gottesman
0591fa0d6b [leaks-checker] Add verbose flag to dump out raw output from runtime to help debug failures on bots.
Just a quick hack to ease debugging on the bots.
2020-09-21 09:59:32 -05:00
Xiaodi Wu
514dce144f Update copyright year on benchmark and its template 2020-09-04 15:02:38 -04:00
tbkka
ab861d5890 Pass architecture into Benchmark_Driver to fix build-script -B (#33100)
* Pass architecture into Benchmark_Driver to fix `build-script -B`

* "Benchmark_Driver compare" does not need the architecture
2020-07-25 11:15:49 -07:00
tbkka
3181dd1e4c Fix a bunch of python lint errors (#32951)
* Fix a bunch of python lint errors

* adjust indentation
2020-07-17 14:30:21 -07:00
Erik Eckstein
2387732ab5 benchmarks: support new executable file names in perf_test_driver
rdar://problem/65508278
2020-07-16 15:43:37 +02:00
Erik Eckstein
a46cda8c51 benchmarks: fix run_smoke_bench to support new benchmark executable naming scheme
Find the right benchmark executable with a glob pattern.
Also, add an option "-arch" to select between executables for different architectures.
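A minimal sketch of locating an executable via a glob pattern, in the spirit of the change; the directory layout, naming pattern, and function name here are assumptions for illustration, not the script's actual ones:

```
import glob
import os

def find_benchmark_executable(build_dir, arch):
    # Match any executable whose name embeds the architecture,
    # e.g. "Benchmark_O-x86_64" (hypothetical naming scheme).
    matches = glob.glob(os.path.join(build_dir, "Benchmark_O-%s*" % arch))
    return matches[0] if matches else None
```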
2020-07-07 11:01:49 +02:00
Meghana Gupta
911ac8e45e Fix code size reporting when input directory is missing a trailing '/'
run_smoke_bench script fails to report code size changes if you have a
trailing '/' in <old_build_dir> but not <new_build_dir>.

This change appends a separator if it is missing.
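The fix amounts to normalizing both directory arguments before use; a minimal sketch (the function name is illustrative):

```
import os

def ensure_trailing_sep(path):
    # Append the platform path separator if it's missing, so that
    # later string concatenation of directory + filename works
    # whether or not the caller supplied a trailing '/'.
    return path if path.endswith(os.sep) else path + os.sep
```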
2020-05-14 13:49:35 -07:00
Sergej Jaskiewicz
cce9e81f0b Support Python 3 in the benchmark suite 2020-02-28 01:45:35 +03:00