Commit Graph

174 Commits

Author SHA1 Message Date
Slava Pestov
48eddac961 Benchmarks: Add support for async benchmarks 2025-08-27 10:37:10 -04:00
Slava Pestov
2ec19ecb46 Benchmarks: Skip long benchmarks in -Onone build 2025-08-27 10:37:10 -04:00
Max Desiatov
21a2b78801 stdlib/benchmark: add canImport(Musl) where needed (#67120)
This allows compiling stdlib and benchmarks when targeting musl instead of Glibc.
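For illustration, the usual conditional-import pattern this kind of change applies (a sketch; the exact import sites vary per file):
```
#if canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#else
import Darwin
#endif
```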
2023-07-05 19:55:08 +01:00
Tim Kientzle
b8e023ad53 Make the default output a little more like the old version (for now) 2022-11-04 18:07:12 -07:00
Tim Kientzle
08604eab40 Fix colliding fields; match old format more closely 2022-11-04 16:16:13 -07:00
Tim Kientzle
30b3763211 Fix underflow in the padding calculation 2022-11-04 14:02:03 -07:00
Tim Kientzle
971a5d8547 Overhaul Benchmarking pipeline to use complete sample data, not summaries
The Swift benchmarking harness now has two distinct output formats:

* Default: Formatted text that's intended for human consumption.
  Right now, this is just the minimum value, but we can augment that.

* `--json`: Each output line is a JSON-encoded object that contains raw data.
  This information is intended for use by Python scripts that aggregate
  or compare multiple independent tests.

Previously, we tried to use the same output for both purposes.  This required
the Python scripts to do more complex parsing of textual layouts, and also meant
that those scripts had only summary data to work with instead of full raw
sample information.  This in turn made it almost impossible to derive meaningful
comparisons between runs or to aggregate multiple runs.

Typical output in the new JSON format looks like this:
```
{"number":89, "name":"PerfTest", "samples":[1.23, 2.35], "max_rss":16384}
{"number":91, "name":"OtherTest", "samples":[14.8, 19.7]}
```

This format is easy to parse in Python.  Just iterate over
lines and decode each one separately. Also note that the
optional fields (`"max_rss"` above) are trivial to handle:
```
import json
import sys

for line in sys.stdin:
    j = json.loads(line)
    # Optional fields default to 0 when absent.
    max_rss = j.get("max_rss", 0)
```
Note the `"samples"` array includes the runtime for each individual run.

Because optional fields are so much easier to handle in this form, I reworked
the Python logic to translate old formats into this JSON format for more
uniformity.  Hopefully, we can simplify the code in a year or so by stripping
out the old log formats entirely, along with some of the redundant statistical
calculations.  In particular, the Python logic still makes an effort to preserve
mean, median, max, min, stdev, and other statistical data whenever the full set
of samples is not present.  Once we've gotten to a point where we're always
keeping full samples, we can compute any such information on the fly as needed,
eliminating the need to record it.

This is a pretty big rearchitecture of the core benchmarking logic. In order to
try to keep things a bit more manageable, I have not taken this opportunity to
replace any of the actual statistics used in the higher level code or to change
how the actual samples are measured. (But I expect this rearchitecture will make
such changes simpler.) In particular, this should not actually change any
benchmark results.

For the future, please keep this general principle in mind: Statistical
summaries (averages, medians, etc) should as a rule be computed for immediate
output and rarely if ever stored or used as input for other processing. Instead,
aim to store and transfer raw data from which statistics can be recomputed as
necessary.
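
As a hedged sketch of that principle (names hypothetical), summary statistics are cheap to recompute from the raw samples whenever they are needed:
```
// Recompute summaries on demand; persist only the raw `samples`.
func summary(of samples: [Double]) -> (min: Double, median: Double, mean: Double) {
    precondition(!samples.isEmpty)
    let sorted = samples.sorted()
    let mean = samples.reduce(0, +) / Double(samples.count)
    return (sorted.first!, sorted[sorted.count / 2], mean)
}
```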
2022-11-04 14:02:03 -07:00
Tim Kientzle
48c1931c78 Unbreak delta reporting in benchmarks (#61236)
The logic here was apparently intended to omit literal zeros from deltas
to save a few bytes, but it instead drops all zeros from all columns.
Remove the condition that drops zeros in order to avoid confusing
the many scripts that consume this data.

Alternatives Considered

I'm probably going to entirely drop the delta form in an upcoming
PR, so I didn't think it was worthwhile to do something more complex,
such as:

* Fixing this logic to only omit zeros from actual delta columns

* Rewriting all the client scripts to treat any empty column as zero
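
For illustration only, the shape of the bug (a hypothetical reconstruction, not the literal source): a cell formatter that blanks every zero value instead of only zero deltas.
```
// Buggy: applied to every column, so literal zeros vanish everywhere.
func cell(_ value: Int) -> String { value == 0 ? "" : String(value) }

// Fixed: always print the value; consuming scripts see no empty columns.
func fixedCell(_ value: Int) -> String { String(value) }
```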
2022-09-22 10:22:16 -07:00
Andrew Trick
f09cc8cc8b Fix compare_perf_tests.py for running locally.
The script defaulted to a mode that no one uses, without checking
whether the input was compatible with that mode.

This is the script used for run-to-run comparison of benchmark
results. The in-tree benchmarks happened to work with the script only
because of a fragile string comparison buried deep within the
script. Other out-of-tree benchmark scripts that generate results were
silently broken when using this script for comparison.
2022-05-12 16:50:32 -07:00
Karoy Lorentey
8304e6c0bf Merge pull request #39336 from lorentey/decapitate-benchmarks
[benchmark][NFC] Use Swift naming conventions
2021-09-20 17:16:35 -07:00
Karoy Lorentey
758c52bc2a [benchmark] Don't create array instance in modules with solitary benchmarks
It just produces unnecessary code-signing churn.
2021-09-16 18:54:14 -07:00
Karoy Lorentey
6cf798cd6d [benchmark] Trap if deterministic hashing isn't enabled 2021-09-16 16:57:06 -07:00
Karoy Lorentey
8944591e71 [benchmark] Simplify benchmark registration 2021-09-15 22:08:08 -07:00
Karoy Lorentey
8910b75cfe [benchmark] Stop capitalizing function and variable names 2021-09-15 22:08:07 -07:00
Ikko Ashimine
473e4af90a [benchmark] Fix typo in DriverUtils.swift
reseting -> resetting
2021-01-14 01:50:21 +09:00
Mao ZiJun
d1259cec50 eliminated "dangling pointer" warnings 2019-12-09 17:41:20 +09:00
Pavol Vaskovic
5571b83353 [benchmark] Driver: log measurement metadata
Added --meta option to log measurement metadata:

* PAGES – number of memory pages used
* ICS – number of involuntary context switches
* YIELD – number of voluntary yields

(Pages and ICS were previously available only in --verbose mode.)
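
On POSIX platforms these counters can be read via getrusage(2); a minimal sketch, assuming that is indeed their source here:
```
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// Involuntary context switches for the current process, via getrusage(2).
func involuntaryContextSwitches() -> Int {
    var usage = rusage()
    getrusage(RUSAGE_SELF, &usage)
    return Int(usage.ru_nivcsw)
}
```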
2019-07-23 17:40:45 +02:00
Pavol Vaskovic
ec32140aed [benchmark] Run benchmarks using substring filters
Added support for running benchmarks using substring filters. Positional arguments prefixed with a single + or - sign are interpreted as benchmark name filters.

Executes all benchmarks whose names include any of the strings prefixed with a plus sign but none of the strings prefixed with a minus sign.
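
A hedged sketch of those semantics (not the driver's actual code):
```
// Include if the name matches any "+" filter (or none are given),
// then exclude if it matches any "-" filter.
func isSelected(_ name: String, filters: [String]) -> Bool {
    let included = filters.filter { $0.hasPrefix("+") }.map { String($0.dropFirst()) }
    let excluded = filters.filter { $0.hasPrefix("-") }.map { String($0.dropFirst()) }
    let matchesInclude = included.isEmpty || included.contains { name.contains($0) }
    let matchesExclude = excluded.contains { name.contains($0) }
    return matchesInclude && !matchesExclude
}
```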
2019-07-07 11:59:45 +02:00
Pavol Vaskovic
ad24ca4ba6 [benchmark] Add min-sample argument to drivers
Support for gathering a minimum number of samples per benchmark, using the optional `--min-samples` argument, which overrides the number of samples automatically computed from `sample-time` when that computed count is lower.
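
In effect (a sketch with hypothetical names):
```
// The user-requested minimum wins when the auto-computed count is lower.
let numSamples = max(minSamples ?? 1, samplesPerSampleTime)
```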
2019-07-07 10:13:26 +02:00
Pavol Vaskovic
5190db0acd [Gardening][benchmark] Import MSVCRT on Windows
Import functions from the standard C library on Windows.
2019-07-01 16:11:55 +02:00
Pavol Vaskovic
9d6f7ad160 [benchmark] Driver & Doctor: Lower the sample cap
Lowered the default sample cap from 2k to 200. (This doesn’t affect a manually specified `--num-samples` argument in the driver.)

Swift benchmarks have a pretty constant performance profile over time. It’s more beneficial to get multiple independent measurements faster than to take more samples from the same run.
2018-12-07 15:06:43 +01:00
Pavol Vaskovic
0bdd3ef275 [benchmark] Equalize memory usage (w&w/o verbose)
The use of the `--verbose` parameter was affecting the reported memory usage (`--memory`), because it front-loads the initialization of string interpolation and printing.

By always computing the configuration string and always calling print, both modes now front-load this constant overhead before the baseline is taken, so the reported memory usage no longer includes it.
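
A minimal sketch of the idea (the `verbose` flag and the message are stand-ins):
```
// Always interpolate the configuration string and always call print,
// so the one-time setup cost is paid in both modes before measuring.
let config = "NumSamples: \(numSamples), NumIters: \(numIters)"
print(verbose ? config : "", terminator: verbose ? "\n" : "")
```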
2018-11-28 21:34:12 +01:00
Pavol Vaskovic
a7f832fb57 [benchmark] Legacy factor
This adds an optional `legacyFactor` to the `BenchmarkInfo`, which allows for linear modification of constants that unnecessarily inflate the base workload of benchmarks, while maintaining the continuity of long-term benchmark tracking.

For example, if a benchmark uses `for _ in N*10_000` in its run function, we could lower this to `for _ in N*1_000` and add a `legacyFactor: 10` to its `BenchmarkInfo`.

Note that this doesn’t affect the real measurements gathered from the `--verbose` output. The `BenchmarkDoctor` has been slightly adjusted to work with these real samples, so `Benchmark_Driver check` will not flag these benchmarks for the slow run times reported in the summary, if their real runtimes fall into the recommended range.
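
A sketch of that example (benchmark name, run function, and tags hypothetical; `legacyFactor` is the real `BenchmarkInfo` field):
```
public let myBench = BenchmarkInfo(
  name: "MyBench",
  runFunction: runMyBench,
  tags: [.validation],
  legacyFactor: 10)  // keeps reported results comparable with old data

func runMyBench(_ n: Int) {
  // Was `for _ in 0 ..< n * 10_000`; workload lowered 10x.
  for _ in 0 ..< n * 1_000 {
    // ... actual work ...
  }
}
```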
2018-11-01 06:24:27 +01:00
eeckstein
cd920b69f4 Merge pull request #19910 from palimondo/fluctuation-of-the-pupil
[benchmark] More Robust Benchmark_Driver
2018-10-29 15:02:07 -07:00
Michael Gottesman
ba7815b663 [benchmark] Fix swiftpm based benchmark build on Linux. 2018-10-29 12:15:20 -07:00
Pavol Vaskovic
67b489dcb1 [benchmark] Auto-determine number of samples
When measuring with a specified number of iterations (generally, `--num-iters=1` makes sense), automatically determine the number of samples to take, so that the overall measurement duration comes close to `sample-time`.

This is the same technique used to scale `num-iters` before, but for `num-samples`.
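
A minimal sketch of the scaling (names hypothetical):
```
// Take as many samples as fit into the sample-time budget,
// based on the measured duration of one sample.
let oneSampleTime = measureOneSample(numIters: fixedNumIters)
let numSamples = max(Int(sampleTime / oneSampleTime), 1)
```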
2018-10-11 18:56:27 +02:00
Pavol Vaskovic
9d9200e9eb [benchmark] Measure setUpFunction
Measure the duration of the `setUpFunction` and report it in verbose mode.

This will be used by `BenchmarkDoctor` to ensure there isn’t an unreasonably big imbalance between the time it takes to set up a benchmark and the time it takes to run it.
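
A sketch of the measurement itself (helper names hypothetical):
```
// Time the setup separately from the benchmark's run function.
let setUpStart = currentTimeNanos()
test.setUpFunction?()
let setUpTime = currentTimeNanos() - setUpStart
if verbose { print("    Setup \(setUpTime / 1_000) μs") }
```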
2018-10-02 14:34:43 +02:00
Pavol Vaskovic
a9f0ce4338 [benchmark] Fix quantile estimation type
The correct quantile estimation type for printing all measurements in the summary report when `quantile == num-samples - 1` is R-1, SAS-3 (sketched below).  It's the inverse of the empirical distribution function.

References:
* https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample
* discussion in https://github.com/apple/swift/pull/19097#issuecomment-421238197
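
A hedged Swift sketch of the R-1 estimator (inverse empirical CDF) over sorted samples:
```
// R-1 / SAS-3: Q(p) = x⌈n·p⌉ over n sorted samples (x₁ for p = 0).
func quantileR1(_ sortedSamples: [Double], _ p: Double) -> Double {
    let n = Double(sortedSamples.count)
    let rank = max(Int((n * p).rounded(.up)), 1)  // 1-based index ⌈n·p⌉
    return sortedSamples[min(rank, sortedSamples.count) - 1]
}
```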
2018-09-20 09:19:07 +02:00
Pavol Vaskovic
f0e7b8737a [benchmark] Round quantile idx to nearest or even
Explicitly use the round-half-to-even rounding algorithm to match the behavior of numpy's quantile(interpolation='nearest') and quantile estimate type R-3, SAS-2. See:
https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample
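
Swift has this rounding rule built in; a one-line sketch, where `h` stands for the estimator's real-valued sample position:
```
// Round half to even ("banker's rounding"), as numpy's 'nearest' does.
let index = Int(h.rounded(.toNearestOrEven))
```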
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
8b3b1f695a [benchmark] Option: delta encoded quantiles format
Added a `--delta` argument to print the quantiles in a delta-encoded format that omits 0s.

This results in machine- and human-readable output that highlights modes and is easily digestible, giving you a feel for the underlying probability distribution of the samples in the reported results:

````
$ ./Benchmark_O --num-iters=1 --num-samples=20 --quantile=20 --delta 170 171 184 185 198 199 418 419 432 433 619 620
#,TEST,SAMPLES,MIN(μs),𝚫V1,𝚫V2,𝚫V3,𝚫V4,𝚫V5,𝚫V6,𝚫V7,𝚫V8,𝚫V9,𝚫VA,𝚫VB,𝚫VC,𝚫VD,𝚫VE,𝚫VF,𝚫VG,𝚫VH,𝚫VI,𝚫VJ,𝚫MAX
170,DropFirstArray,20,171,,,,,,,,,,,,,,,,,,,2,29
171,DropFirstArrayLazy,20,168,,,,,,,,,,,,,,,,,,,,8
184,DropLastArray,20,55,,,,,,,,,,,,,,,,,,,,26
185,DropLastArrayLazy,20,65,,,,,,,,,,,,,,,,,,,1,90
198,DropWhileArray,20,214,1,,,,,,,,,,,,,,,,,1,27,2
199,DropWhileArrayLazy,20,464,,,,1,,,,,,,,1,1,1,4,9,1,9,113,2903
418,PrefixArray,20,132,,,,,,,,,,,,,,,,,1,1,32,394
419,PrefixArrayLazy,20,168,,,,,,,,,,,,1,,2,9,1,15,8,88,3338
432,PrefixWhileArray,20,252,1,,,,1,,,,,,,,,,,1,,,,30
433,PrefixWhileArrayLazy,20,168,,,,,,,,,,,,,1,,6,6,14,43,28,10200
619,SuffixArray,20,68,,,,,,,,,,,,,1,,,,22,1,1,4
620,SuffixArrayLazy,20,65,,,,,,,,,,,,,,,,,,1,9,340
````
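
A hedged sketch of the encoding itself: report the first value, then only the differences between successive quantiles, leaving zero deltas as empty cells.
```
// Delta-encode sorted quantile values; blanks stand for zero deltas.
func deltaEncode(_ quantiles: [Int]) -> [String] {
    guard let first = quantiles.first else { return [] }
    let deltas = zip(quantiles.dropFirst(), quantiles).map { $0 - $1 }
    return [String(first)] + deltas.map { $0 == 0 ? "" : String($0) }
}
```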
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
72e960457b [benchmark] Gardening maxRSS as Int? 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
022e1111a9 [benchmark] Report quantiles from samples
The default benchmark result reports statistics of a normal distribution — mean and standard deviation. Unfortunately the samples from our benchmarks are *not normally distributed*. To get a better picture of the underlying probability distribution, this adds support for reporting quantiles.

See https://en.wikipedia.org/wiki/Quantile

This gives a better subsample of the measurements in the summary, without the need to resort to full verbose mode, which might be unnecessarily slow.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
219a5d9290 [benchmark] Rename SampleRunner -> TestRunner
It is now running all the benchmarks, so it’s a TestRunner.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
0e751e2717 [benchmark] Gardening: Even nicer microseconds 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
d704557c88 [benchmark] Gardening: Fixed method indentation 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
12c6e39a20 [benchmark] Refactor run runBenchmarks logVerbose
Extracted the nested func logVerbose as an instance method on SampleRunner.

Internalized the free functions `runBench` and `runBenchmarks` into SampleRunner as the methods `run` and `runBenchmarks`.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
e7d1d482d8 [benchmark] Extract yield & add resetMeasurements 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
331c0bf772 [benchmark] Refactor numIters computation
The spaghetti if-else code was untangled into a nested function that computes `iterationsPerSampleTime` and a single constant `numIters` expression that takes care of the overflow capping as well as the choice between the fixed and computed `numIters` value.

The `numIters` is now computed and logged only once per benchmark measurement instead of on every sample.

The sampling loop is now just a single line. Hurrah!

Modified a test to verify that the `LogParser` maintains the `num-iters` derived from the `Measuring with scale` message across samples.
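
A hedged reconstruction of the result's shape (names follow the message; the capping constant is hypothetical):
```
// One calibration measurement determines how many iterations
// fit into the desired sample time.
func iterationsPerSampleTime() -> Int {
    let oneIterTime = measure(numIters: 1)
    return Int(sampleTime / oneIterTime)
}

// Single constant expression: the fixed value if given, otherwise the
// computed value, capped to guard against overflow in the run function.
let numIters = fixedNumIters ?? min(max(iterationsPerSampleTime(), 1), 10_000)
```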
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
29b2cc7397 [benchmark] Refactor sampling loop with addSample
Extracted sample saving into the inner func `addSample`.
Used it to save the `oneIter` sample from the `numIters` calibration when it comes out as 1, continuing the for loop to the next sample.

This simplified the following code, which can now always measure the sample with `numIters` and save it.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
b762f80a64 [benchmark] Gardening: Documentation of numIters
Clarified the need for capping `numIters` according to the discussion at https://github.com/apple/swift/pull/17268#issuecomment-404831035

The sampling loop is a hairy piece of code, because it’s trying to reuse the calibration measurement as a regular sample, in case the computed `numIters` turns out to be 1. But this conflicts with the case when `fixedNumIters` is 1, necessitating a separate measurement in the else branch… That was a quick fix back then, but it’s hard to make it clean. More thinking is required…
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
75604a285d [benchmark] Gardening: Sensibly rename variables
To make sense of this spaghetti code, let’s first use reasonable variable names:
* scale -> numIters
* elapsed_time -> time
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
a169606e60 [benchmark] Gardening: DRYer verbose log 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
9ae69908b0 [benchmark] Refactor to currency type Int
Removed the unnecessary use of UInt64 where appropriate, following the advice from the Swift Language Guide:

> Use the `Int` type for all general-purpose integer constants and variables in your code, even if they’re known to be nonnegative. Using the default integer type in everyday situations means that integer constants and variables are immediately interoperable in your code and will match the inferred type for integer literal values.
https://docs.swift.org/swift-book/LanguageGuide/TheBasics.html#ID324
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
28eb79819b [benchmark] Refactor to report samples in μs
Moved the adjustment of `lastSampleTime` to account for the `scale` (`numIters`), and the conversion to microseconds, into SampleRunner’s `measure` method.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
beabad86f4 [benchmark] Gardening: scale was always Int
Since the `scale` (or `numIters`) is passed to the `test.runFunction` as `Int`, the whole type-casting dance here was just silly!
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
e775b8fc60 [benchmark] Gardening: numSamples UInt vs Int
Type-check the command-line argument to be non-negative, but store the value in the currency type `Int`.
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
79d7730be8 [benchmark] Gardening: afterRunSleep is UInt32 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
7768cb3295 [benchmark] Move stats computation to BenchResults 2018-09-14 23:40:43 +02:00
Pavol Vaskovic
e48b5fdb34 [benchmark] Fix index computation for quantiles
Turns out that both the old code in `DriverUtils` that computed the median and the newer quartiles in `PerformanceTestSamples` had an off-by-one error.

It truly is the 3rd of the 2 hard things in computer science!
2018-09-14 23:40:43 +02:00
Pavol Vaskovic
ab3e6122c0 [benchmark] Refactor min max median computation
We can spare two array passes (for min and max) if we just sort first.
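
A sketch of the idea:
```
// One sort gives min, max, and median without extra passes.
let sorted = samples.sorted()
let (minimum, median, maximum) =
    (sorted.first!, sorted[sorted.count / 2], sorted.last!)
```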
2018-09-14 23:40:43 +02:00