run_smoke_bench now runs the benchmarks, compares performance and code size, and reports the results, both on stdout and as a markdown file.
There is no need to run bench_code_size.py and compare_perf_tests.py separately.
This has two benefits:
- It's much easier to run locally.
- It's more transparent what happens during '@swiftci benchmark', because all the logic is now in run_smoke_bench rather than in a script on the CI bot that is not publicly visible.
I also removed the branch arguments from ReportFormatter in compare_perf_tests.py. They were not used anyway.
For a smooth rollout in CI, I created a new script rather than changing the existing one. Once everything is set up in CI, I'll delete the old run_smoke_test.py and bench_code_size.py.
This adds an optional `legacyFactor` to the `BenchmarkInfo`, which allows for linear modification of constants that unnecessarily inflate the base workload of benchmarks, while maintaining the continuity of long-term benchmark tracking.
For example, if a benchmark uses `for _ in N*10_000` in its run function, we could lower this to `for _ in N*1_000` and add a `legacyFactor: 10` to its `BenchmarkInfo`.
Note that this doesn't affect the real measurements gathered from the `--verbose` output. The `BenchmarkDoctor` has been slightly adjusted to work with these real samples, so `Benchmark_Driver check` will not flag these benchmarks for the slow runtime reported in the summary if their real runtimes fall within the recommended range.
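To make this concrete, here is a minimal sketch of a scaled-down benchmark registration. The benchmark itself is made up, and the exact shape of the `BenchmarkInfo` initializer is assumed from context:

```swift
import TestsUtils

// Hypothetical benchmark whose workload was scaled down 10x; `legacyFactor`
// tells the reporting machinery to multiply results back up, preserving
// continuity with previously logged runtimes.
public let ExampleBench = BenchmarkInfo(
  name: "ExampleBench",
  runFunction: run_ExampleBench,
  tags: [.validation],
  legacyFactor: 10)

@inline(never)
public func run_ExampleBench(_ N: Int) {
  var sum = 0
  for i in 1...N * 1_000 {  // was `N * 10_000` before the scale-down
    sum &+= i
  }
  precondition(sum != 0)  // keep the loop from being optimized away
}
```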
Add a check against unreasonably long setup times for benchmarks that do their initialization work in the `setUpFunction`. Given that a typical benchmark measurement lasts about 1 second, it's reasonable to expect the setup to take at most 20% on top of that: 200 ms.
The `DictionaryKeysContains*` benchmarks are an instance of this mistake. The setup of `DictionaryKeysContainsNative` takes 3 seconds on my machine, to prepare a dictionary for a run function whose typical runtime is 90 μs. The setup of the Cocoa version takes 8 seconds! It is trivial to rewrite these with much smaller dictionaries that demonstrate the point of these benchmarks perfectly well, without the need to wait for ages while they set up.
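A sketch of the fix pattern follows; the names are illustrative, not the actual `DictionaryKeysContains*` sources, and the `BenchmarkInfo` signature is assumed as above:

```swift
import TestsUtils

var smallDictionary: [Int: Int] = [:]
let size = 1_000  // small enough to build in well under the 200 ms budget

public let DictionaryKeysContainsExample = BenchmarkInfo(
  name: "DictionaryKeysContainsExample",
  runFunction: run_DictionaryKeysContainsExample,
  tags: [.validation, .api, .Dictionary],
  setUpFunction: {
    smallDictionary = Dictionary(uniqueKeysWithValues: zip(0..<size, 0..<size))
  },
  tearDownFunction: { smallDictionary = [:] })

@inline(never)
public func run_DictionaryKeysContainsExample(_ N: Int) {
  for _ in 0..<(N * 100) {
    precondition(smallDictionary.keys.contains(size / 2))  // operation under test
  }
}
```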
* Lowered the threshold for healthy benchmark runtime to be under 1000 μs.
* Offer a suitable divisor that is a power of 10, in addition to the one that is a power of 2 (see the sketch after this list).
* Expanded the motivation in the docstring.
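The divisor suggestion boils down to simple arithmetic. A minimal sketch under the 1000 μs threshold above, with made-up names (the real logic lives in `BenchmarkDoctor`):

```swift
// Find the smallest power-of-2 and power-of-10 divisors that bring a
// benchmark's reported runtime under the healthy threshold.
func suggestedDivisors(runtime: Double, threshold: Double = 1_000) -> (powerOf2: Int, powerOf10: Int) {
  func smallestDivisor(base: Int) -> Int {
    var divisor = 1
    while runtime / Double(divisor) > threshold { divisor *= base }
    return divisor
  }
  return (smallestDivisor(base: 2), smallestDivisor(base: 10))
}

// A 27 ms benchmark could divide its workload by 32, or by 100.
print(suggestedDivisors(runtime: 27_000))  // (powerOf2: 32, powerOf10: 100)
```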
Since the meaning of some columns has changed, but their overall number remained the same, let's include the header in the CSV log to make it clear that we now report MIN, Q1, MEDIAN, Q3, MAX, MAX_RSS instead of the old MIN, MAX, MEAN, SD, MEDIAN, MAX_RSS format.
Use a box-plot-inspired technique for filtering out outlier measurements. Values higher than the top inner fence (TIF = Q3 + IQR * 1.5) are excluded from the sample.
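A minimal sketch of the filter, assuming nearest-rank quartiles on a sorted sample (the production code is in the Python driver; this just illustrates the technique):

```swift
func filterOutliers(_ samples: [Double]) -> [Double] {
  let sorted = samples.sorted()
  func quantile(_ q: Double) -> Double {
    return sorted[Int((Double(sorted.count - 1) * q).rounded())]
  }
  let q1 = quantile(0.25)
  let q3 = quantile(0.75)
  let topInnerFence = q3 + (q3 - q1) * 1.5  // TIF = Q3 + IQR * 1.5
  return sorted.filter { $0 <= topInnerFence }
}

let measured = [90.0, 91, 92, 92, 93, 94, 95, 140, 512]
print(filterOutliers(measured))  // drops the 140 and 512 outliers
```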
When num_samples is less than quantile + 1, some of the measurements are repeated in the reported summary. Parsed samples should strive to be a true reflection of the measured distribution, so we'll correct this by discarding the repeated artifacts from quantile estimation.
This avoids introducing a bias from this oversampling into the empirical distribution obtained from merging independent samples.
See also:
https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
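The correction might look something like this sketch. The rank-reconstruction approach is an assumption, not the actual parser code: with nearest-rank estimation, reported quantile `i` of a `q`-quantile summary comes from sample rank `round(i * (n - 1) / q)`, so when `n < q + 1` several quantiles share a rank and the repeats can be discarded.

```swift
// Keep one reported value per distinct underlying sample rank.
func discardRepeated(quantiles: [Double], numSamples n: Int) -> [Double] {
  let q = quantiles.count - 1
  var seenRanks = Set<Int>()
  return quantiles.enumerated().compactMap { (i, value) in
    let rank = Int((Double(i) * Double(n - 1) / Double(q)).rounded())
    return seenRanks.insert(rank).inserted ? value : nil
  }
}

// The 21 values of a 20-quantile summary estimated from only 5 samples
// collapse back to the 5 underlying measurements.
```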
This switches the measurement technique from gathering `i` independent samples characterized by their mean values to a finer-grained characterization of these measurements using quantiles.
The distribution of benchmark measurements is non-normal, with outliers that significantly inflate the mean and standard deviation due to the presence of system load as an uncontrolled variable. Therefore MEAN and SD were the wrong statistics to characterize the benchmark measurements.
Benchmark_Driver now gathers more individual measurements from Benchmark_O. It is executed with `--num-iters=1`, because we don't want to average the runtimes; we want raw data. This collects a variable number of measurements gathered in about 1 second. Using `--quantile=20`, we get up to 20 measured values that properly characterize the empirical distribution of the benchmark from each independent run. The measurements from `i` independent executions are combined to form the final empirical distribution, which is reported as a five-number summary (MIN, Q1, MEDIAN, Q3, MAX).
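A sketch of the aggregation step, under the same nearest-rank assumption as above (the function name is made up):

```swift
// Merge measurements from `i` independent runs and report the five-number
// summary of the combined empirical distribution.
func fiveNumberSummary(runs: [[Double]]) -> (min: Double, q1: Double, median: Double, q3: Double, max: Double) {
  let merged = runs.flatMap { $0 }.sorted()
  func quantile(_ q: Double) -> Double {
    return merged[Int((Double(merged.count - 1) * q).rounded())]
  }
  return (merged.first!, quantile(0.25), quantile(0.5), quantile(0.75), merged.last!)
}

let runs = [[92.0, 90, 95], [91.0, 94, 93]]
print(fiveNumberSummary(runs: runs))
// (min: 90.0, q1: 91.0, median: 93.0, q3: 94.0, max: 95.0)
```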
Renamed Benchmark_Driver's `iterations` argument to `independent-samples` to clarify its true meaning and disambiguate it from the concept of `num-iters` used in Benchmark_O. The short form of the argument (`-i`) remains unchanged.
Small fix following the last refactoring of MAX_RSS: the `--memory` option is required to measure memory in `--verbose` mode. Added an integration test for the `check` command of Benchmark_Driver that depends on it.
The spaghetti if-else code was untangled into a nested function that computes `iterationsPerSampleTime` and a single constant `numIters` expression that takes care of the overflow capping as well as the choice between a fixed and a computed `numIters` value.
The `numIters` is now computed and logged only once per benchmark measurement instead of on every sample.
The sampling loop is now just a single line. Hurrah!
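The untangled shape might look roughly like this (an illustrative sketch, not the actual `DriverUtils` source):

```swift
func computeNumIters(fixedNumIters: Int, sampleTime: Double, timePerIteration: Double) -> Int {
  // Nested helper: how many iterations fit into the desired sample time.
  func iterationsPerSampleTime() -> Int {
    let scale = sampleTime / timePerIteration
    // Cap rather than overflow Int for extremely fast single iterations.
    return scale >= Double(Int.max) ? Int.max : max(Int(scale), 1)
  }
  // A single constant expression chooses between fixed and computed values.
  let numIters = fixedNumIters > 0 ? fixedNumIters : iterationsPerSampleTime()
  return numIters
}
```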
Modified test to verify that the `LogParser` maintains `num-iters` derived from the `Measuring with scale` message across samples.
Turns out that both the old code in `DriverUtils` that computed the median and the newer quartiles in `PerformanceTestSamples` had an off-by-one error.
It truly is the 3rd of the 2 hard things in computer science!
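For reference, correct median indexing looks like the following sketch; the exact nature of the original bugs isn't spelled out here, this just shows the index math that is easy to get wrong by one:

```swift
// Median of a sorted sample: odd counts take the middle element,
// even counts average the two middle elements.
func median(_ sorted: [Double]) -> Double {
  let n = sorted.count
  return n % 2 == 1
    ? sorted[n / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2
}

print(median([1, 2, 3, 4]))     // 2.5 (an off-by-one yields 3 instead)
print(median([1, 2, 3, 4, 5]))  // 3.0
```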
Since the results comparisons are now also used to compare code sizes in addition to runtimes, it makes sense to rename the column label to the more neutral term "ratio" instead of the old "speedup".
Clean up after removing the bogus aggregate statistics from the last line of the log. It makes more sense to report the total number of executed benchmarks as a sentence rather than trying to fit it into the format of the preceding table.
Added a test assertion that `run_benchmarks` returns a CSV-formatted log, as it is used to write the log to a file in `log_results`.