Some of the benchmarks use Swift 3 APIs. Let’s keep them building that
way so that we do not perturb the benchmark numbers.
We should consider adding benchmarks that specifically enable -swift-version 4.
The benchmark bot uses this functionality today to run the benchmarks. By
default, build-script uses only 3 samples for each test. Given the noise on our
systems, this is definitely not enough to produce robust numbers.
Using this patch, I am going to change the benchmarking bot to take the minimum
of 20 samples, as we do for our internal benchmarking. This should help the
benchmark bot give better data, at the cost of making each bot run take more
time. The testing time issue can be solved down the line by changing to a
protocol where we first run each test with a small number of samples (< 5). Then
any benchmark with a delta > 5% is rerun with 20 samples, or perhaps until a
statistical criterion is satisfied. But until that is implemented, this at least
makes the bot useful.
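For illustration only, here is a rough Python sketch of the two-stage protocol
described above. Nothing in it is part of this patch or of the bot: the names
min_of_samples, relative_delta and two_stage_compare are hypothetical, the
quick pass is assumed to use 3 samples, and each run_* callable is assumed to
return one positive timing.

```python
from typing import Callable, Tuple


def min_of_samples(run_once: Callable[[], float], num_samples: int) -> float:
    """Run a benchmark num_samples times and keep the fastest timing."""
    return min(run_once() for _ in range(num_samples))


def relative_delta(old: float, new: float) -> float:
    """Relative change of new versus old; assumes old is a positive timing."""
    return abs(new - old) / old


def two_stage_compare(
    run_old: Callable[[], float],
    run_new: Callable[[], float],
    quick_samples: int = 3,    # assumed value for "a small number of samples (< 5)"
    full_samples: int = 20,    # the sample count proposed for the bot
    threshold: float = 0.05,   # rerun when the delta exceeds 5%
) -> Tuple[float, float]:
    """Take a cheap measurement first; rerun with more samples only if needed."""
    old = min_of_samples(run_old, quick_samples)
    new = min_of_samples(run_new, quick_samples)
    if relative_delta(old, new) > threshold:
        # The quick pass shows a suspicious delta, so pay for the full run.
        old = min_of_samples(run_old, full_samples)
        new = min_of_samples(run_new, full_samples)
    return old, new
```

Only benchmarks whose quick pass shows a delta above the threshold pay for the
20-sample rerun, which is where the time savings would come from. The
"statistical criterion" variant is not sketched here.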
There are other things that need to be changed on the benchmarking bot as well,
namely that it should build on a separate machine from the one on which it runs
the benchmarks. The benchmarking machine should be quiet, with no other work
being done on it. But that is also for another time.
This is an intermediate commit in a series of commits towards being able to
build the perf test suite from *.sib files.
Doing so will enable us to:
1. Stress the serialization pipeline.
2. Provide an interesting test case for verifying that executables compiled
from *.sib yield the same object files as compiling directly to *.o files.
Keep in mind this is a longer-term effort that I am doing on the side.
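As a rough illustration of the check in item 2, the comparison could be as
simple as a byte-for-byte diff of the two object files. This is only a sketch:
the helper name object_files_match and both path parameters are hypothetical,
and it assumes the two build modes are expected to produce bit-identical
output.

```python
import filecmp


def object_files_match(direct_obj: str, via_sib_obj: str) -> bool:
    """Return True if the two object files have identical contents."""
    # shallow=False forces a content comparison instead of trusting
    # os.stat() signatures (size and modification time).
    return filecmp.cmp(direct_obj, via_sib_obj, shallow=False)
```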