WebAssembly does not support _Float16 type, so we need to guard the use
of the type. Unfortunately, Clang does not provide a good way to detect
the support of _Float16 type at compile time, so just disable for wasm
targets.
We can easily test all 2**16 values, so let's do it. Also now _Float16 is properly supported in clang, so we can pass arguments to CPP that way, which lets us get snan right on more platforms.
We should always be using `<errno.h>`, not `<sys/errno.h>`. The
former is part of the C standard. The latter is a non-standard
header that happens to be present on some systems.
rdar://123507361
This replaces a number of `#include`-s like this:
```
#include "../../../stdlib/public/SwiftShims/Visibility.h"
```
with this:
```
#include "swift/shims/Visibility.h"
```
This is needed to allow SwiftCompilerSources to use C++ headers which include SwiftShims headers. Currently trying to do that results in errors:
```
swift/swift/include/swift/Demangling/../../../stdlib/public/SwiftShims/module.modulemap:1:8: error: redefinition of module 'SwiftShims'
module SwiftShims {
^
Builds.noindex/swift/swift/bootstrapping0/lib/swift/shims/module.modulemap:1:8: note: previously defined here
module SwiftShims {
^
```
This happens because the headers in both the source dir and the build dir refer to SwiftShims headers by relative path, and both the source root and the build root contain SwiftShims headers (which are equivalent, but since they are located in different dirs, Clang treats them as different modules).
Moved all the threading code to one place. Added explicit support for
Darwin, Linux, Pthreads, C11 threads and Win32 threads, including new
implementations of Once for Linux, Pthreads, C11 and Win32.
rdar://90776105
SWIFT_STDLIB_SINGLE_THREADED_RUNTIME is too much of a blunt instrument here.
It covers both the Concurrency runtime and the rest of the runtime, but we'd
like to be able to have e.g. a single-threaded Concurrency runtime while
the rest of the runtime is still thread safe (for instance).
So: rename it to SWIFT_STDLIB_SINGLE_THREADED_CONCURRENCY and make it just
control the Concurrency runtime, then add a SWIFT_STDLIB_THREADING_PACKAGE
setting at the CMake/build-script level, which defines
SWIFT_STDLIB_THREADING_xxx where xxx depends on the chosen threading package.
This is especially useful on systems where there may be a choice of threading
package that you could use.
rdar://90776105
Moved all the threading code to one place. Added explicit support for
Darwin, Linux, Pthreads, C11 threads and Win32 threads, including new
implementations of Once for Linux, Pthreads, C11 and Win32.
rdar://90776105
SWIFT_STDLIB_SINGLE_THREADED_RUNTIME is too much of a blunt instrument here.
It covers both the Concurrency runtime and the rest of the runtime, but we'd
like to be able to have e.g. a single-threaded Concurrency runtime while
the rest of the runtime is still thread safe (for instance).
So: rename it to SWIFT_STDLIB_SINGLE_THREADED_CONCURRENCY and make it just
control the Concurrency runtime, then add a SWIFT_STDLIB_THREADING_PACKAGE
setting at the CMake/build-script level, which defines
SWIFT_STDLIB_THREADING_xxx where xxx depends on the chosen threading package.
This is especially useful on systems where there may be a choice of threading
package that you could use.
rdar://90776105
This removes the explicit tree structure reference in the stubs to
locate the shims. Instead, it expects that the `SwiftShims` directory
will be added to the header search path.
OpenBSD doesn't have `pthread_attr_get_np` and expects something like
`pthread_attr_getstackaddr` to be used to get the initial stack size.
We need to use `pthread_stackseg_np` on this platform to get the
stack size and location of `pthread_self`.
Adds two new IRGen-level builtins (one for allocating, the other for deallocating), a stdlib shim function for enhanced stack-promotion heuristics, and the proposed public stdlib functions.
* SwiftDtoa v2: Better, Smaller, Faster floating-point formatting
SwiftDtoa is the C/C++ code used in the Swift runtime to produce the textual representations used by the `description` and `debugDescription` properties of the standard Swift floating-point types.
This update includes a number of algorithmic improvements to SwiftDtoa to improve portability, reduce code size, and improve performance but does not change the actual output.
About SwiftDtoa
===============
In early versions of Swift, the `description` properties used the C library `sprintf` functionality with a fixed number of digits.
In 2018, that logic was replaced with the first version of SwiftDtoa which used used a fast, adaptive algorithm to automatically choose the correct number of digits for a particular value.
The resulting decimal output is always:
* Accurate. Parsing the decimal form will yield exactly the same binary floating-point value again. This guarantee holds for any parser that accurately implements IEEE 754. In particular, the Swift standard library can guarantee that for any Double `d` that is not a NaN, `Double(d.description) == d`.
* Short. Among all accurate forms, this form has the fewest significant digits. (Caution: Surprisingly, this is not the same as minimizing the number of characters. In some cases, minimizing the number of characters requires producing additional significant digits.)
* Close. If there are multiple accurate, short forms, this code chooses the decimal form that is closest to the exact binary value. If there are two exactly the same distance, the one with an even final digit will be used.
Algorithms that can produce this "optimal" output have been known since at least 1990, when Steele and White published their Dragon4 algorithm.
However, Dragon4 and other algorithms from that period relied on high-precision integer arithmetic, which made them slow.
More recently, a surge of interest in this problem has produced dramatically better algorithms that can produce the same results using only fast fixed-precision arithmetic.
This format is ideal for JSON and other textual interchange: accuracy ensures that the value will be correctly decoded, shortness minimizes network traffic, and the existence of high-performance algorithms allows this form to be generated more quickly than many `printf`-based implementations.
This format is also ideal for logging, debugging, and other general display. In particular, the shortness guarantee avoids the confusion of unnecessary additional digits, so that the result of `1.0 / 10.0` consistently displays as `0.1` instead of `0.100000000000000000001`.
About SwiftDtoa v2
==================
Compared to the original SwiftDtoa code, this update is:
**Better**:
The core logic is implemented using only C99 features with 64-bit and smaller integer arithmetic.
If available, 128-bit integers are used for better performance.
The core routines do not require any floating-point support from the C/C++ standard library and with only minor modifications should be usable on systems with no hardware or software floating-point support at all.
This version also has experimental support for IEEE 754 binary128 format, though this support is obviously not included when compiling for the Swift standard library.
**Smaller**:
Code size reduction compared to the earlier versions was a primary goal for this effort.
In particular, the new binary128 support shares essentially all of its code with the float80 implementation.
**Faster**:
Even with the code size reductions, all formats are noticeably faster.
The primary performance gains come from three major changes:
Text digits are now emitted directly in the core routines in a form that requires only minimal adjustment to produce the final text.
Digit generation produces 2, 4, or even 8 digits at a time, depending on the format.
The double logic optimistically produces 7 digits in the initial scaling with a Ryu-inspired backtracking when fewer digits suffice.
SwiftDtoa's algorithms
======================
SwiftDtoa started out as a variation of Florian Loitsch' Grisu2 that addressed the shortness failures of that algorithm.
Subsequent work has incorporated ideas from Errol3, Ryu, and other sources to yield a production-quality implementation that is performance- and size-competitive with current research code.
Those who wish to understand the details can read the extensive comments included in the code.
Note that float16 actually uses a different algorithm than the other formats, as the extremely limited range can be handled with much simpler techniques.
The float80/binary128 logic sacrifices some performance optimizations in order to minimize the code size for these less-used formats; the goal for SwiftDtoa v2 has been to match the float80 performance of earlier implementations while reducing code size and widening the arithmetic routines sufficiently to support binary128.
SwiftDtoa Testing
=================
A newly-developed test harness generates several large files of test data that include known-correct results computed with high-precision arithmetic routines.
The test files include:
* Critical values generated by the algorithm presented in the Errol paper (about 48 million cases for binary128)
* Values for which the optimal decimal form is exactly midway between two binary floating-point values.
* All exact powers of two representable in this format.
* Floating-point values that are close to exact powers of ten.
In addition, several billion random values for each format were compared to the results from other implementations.
For binary16 and binary32 this provided exhaustive validation of every possible input value.
Code Size and Performance
=========================
The tables below summarize the code size and performance for the SwiftDtoa C library module by itself on several different processor architectures.
When used from Swift, the `.description` and `.debugDescription` implementations incur additional overhead for creating and returning Swift strings that are not captured here.
The code size tables show the total size in bytes of the compiled `.o` object files for a particular version of that code.
The headings indicate the floating-point formats supported by that particular build (e.g., "16,32" for a version that supports binary16 and binary32 but no other formats).
The performance numbers below were obtained from a custom test harness that generates random bit patterns, interprets them as the corresponding floating-point value, and averages the overall time.
For float80, the random bit patterns were generated in a way that avoids generating invalid values.
All code was compiled with the system C/C++ compiler using `-O2` optimization.
A few notes about particular implementations:
* **SwiftDtoa v1** is the original SwiftDtoa implementation as committed to the Swift runtime in April 2018.
* **SwiftDtoa v1a** is the same as SwiftDtoa v1 with added binary16 support.
* **SwiftDtoa v2** can be configured with preprocessor macros to support any subset of the supported formats. I've provided sizes here for several different build configurations.
* **Ryu** (Ulf Anders) implements binary32 and binary64 as completely independent source files. The size here is the total size of the two .o object files.
* **Ryu(size)** is Ryu compiled with the `RYU_OPTIMIZE_SIZE` option.
* **Dragonbox** (Junekey Jeon). The size here is the compiled size of a simple `.cpp` file that instantiates the template for the specified formats, plus the size of the associated text output logic.
* **Dragonbox(size)** is Dragonbox compiled to minimize size by using a compressed power-of-10 table.
* **gdtoa** has a very large feature set. For this reason, I excluded it from the code size comparison since I didn't consider the numbers to be comparable to the others.
x86_64
----------------
These were built using Apple clang 12.0.5 on a 2019 16" MacBook Pro (2.4GHz 8-core Intel Core i9) running macOS 11.1.
**Code Size**
Bold numbers here indicate the configurations that have shipped as part of the Swift runtime.
| | 16,32,64,80 | 32,64,80 | 32,64 |
|---------------|------------:|------------:|------------:|
|SwiftDtoa v1 | | **15128** | |
|SwiftDtoa v1a | **16888** | | |
|SwiftDtoa v2 | **20220** | 18628 | 8248 |
|Ryu | | | 40408 |
|Ryu(size) | | | 23836 |
|Dragonbox | | | 23176 |
|Dragonbox(size)| | | 15132 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 25ns | 46ns | 82ns | |
|SwiftDtoa v1a | 37ns | 26ns | 47ns | 83ns | |
|SwiftDtoa v2 | 22ns | 19ns | 31ns | 72ns | 90ns |
|Ryu | | 19ns | 26ns | | |
|Ryu(size) | | 17ns | 24ns | | |
|Dragonbox | | 19ns | 24ns | | |
|Dragonbox(size) | | 19ns | 29ns | | |
|gdtoa | 220ns | 381ns | 1184ns | 16044ns | 22800ns |
ARM64
----------------
These were built using Apple clang 12.0.0 on a 2020 M1 Mac Mini running macOS 11.1.
**Code Size**
| | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1 | | 7436 |
|SwiftDtoa v1a | 9124 | |
|SwiftDtoa v2 | 9964 | 8228 |
|Ryu | | 35764 |
|Ryu(size) | | 16708 |
|Dragonbox | | 27108 |
|Dragonbox(size)| | 19172 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 21ns | 39ns | | |
|SwiftDtoa v1a | 17ns | 21ns | 39ns | | |
|SwiftDtoa v2 | 15ns | 17ns | 29ns | 54ns | 71ns |
|Ryu | | 15ns | 19ns | | |
|Ryu(size) | | 29ns | 24ns | | |
|Dragonbox | | 16ns | 24ns | | |
|Dragonbox(size) | | 15ns | 34ns | | |
|gdtoa | 143ns | 242ns | 858ns | 25129ns | 36195ns |
ARM32
----------------
These were built using clang 8.0.1 on a BeagleBone Black (500MHz ARMv7) running FreeBSD 12.1-RELEASE.
**Code Size**
| | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1 | | 8668 |
|SwiftDtoa v1a | 10356 | |
|SwiftDtoa v2 | 9796 | 8340 |
|Ryu | | 32292 |
|Ryu(size) | | 14592 |
|Dragonbox | | 29000 |
|Dragonbox(size)| | 21980 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 459ns | 1152ns | | |
|SwiftDtoa v1a | 383ns | 451ns | 1148ns | | |
|SwiftDtoa v2 | 202ns | 357ns | 715ns | 2720ns | 3379ns |
|Ryu | | 345ns | 5450ns | | |
|Ryu(size) | | 786ns | 5577ns | | |
|Dragonbox | | 300ns | 904ns | | |
|Dragonbox(size) | | 294ns | 1021ns | | |
|gdtoa | 2180ns | 4749ns | 18742ns |293000ns | 440000ns |
* This is fast enough now even for non-optimized test runs
* Fix float80 Nan/Inf parsing, comment more thoroughly
Previously, overflow and underflow both caused this to return `nil`, which causes several problems:
* It does not distinguish between a large but valid input and a malformed input. `Float("3.402824e+38")` is perfectly well-formed but returns nil
* It differs from how the compiler handles literals. As a result, `Float(3.402824e+38)` is very different from `Float("3.402824e+38")`
* It's inconsistent with Foundation Scanner()
* It's inconsistent with other programming languages
This is exactly the same as #25313
Fixes rdar://problem/36990878
Most SwiftShims were put in the swift namespace in C++ mode which broke certain things when importing them in a swift file in C++ mode. This was OK when they were only imported as part of the swift runtime but, now they are used in C++ mode both in the swift runtime and when C++ interop is enabled.
This broke when C++ interop was enabled because the `Swift` module contains references to symbols in the SwiftShims headers which are built without C++ interop enabled (no "swift" namespace). But, when C++ interop is enabled, the SwiftShims headers would put everything in the swift namespace meaning the symbols couldn't be found in the global namespace. Then, the compiler would error when trying to deserialize the Swift module.
Currently, _swift_stdlib_strtoX_clocale_impl is present here twice: one
general definition for most platforms that wraps the standard strto*
functions, and another general definition with a slightly different
implementation for Cygwin, Haiku, and Windows, using stringstreams to
achieve a similar result. Furthermore, for Windows, the stringstream
implementation isn't even used but specialized away.
Firstly: the stringstream implementation is slightly broken, since it
causes some of the unit tests to fail; secondly, there is an awful lot
of repetition here that is ripe for simplification.
Instead of duplicating twice and using template specialization to induce
platform-specific behavior, we massage the stringstream implementation
for Cygwin and Haiku into something that looks like a standard strto*
function.
Now we have one definition (for each type) of swift_strto*_l. By
default, the _swift_stdlib_strto*_clocale stubs will refer to strto*_l,
and platforms withing to make use of the swift_strto*_l stubs can make
the necessary preprocessor definitions to utilize them.
Instead of putting the stubs alongside the redefinitions in each platform
preprocessor section, split these out, in anticipation for consolidation
in the next commit.
Here the template specializations can be adapted to a strto* wrapper, for
use with the general function-pointer implementation of
_swift_stdlib_strtoX_clocale_impl.
The template defined for Cygwin and friends does not handle failure cases
properly so the NumericParsing unit test fails. To get the correct test
behavior, we need to use an implementation such as like in the Windows
specializations or the non-Cygwin implementation that takes a function
pointer.
However, adding additional specializations would be too wordy, and
OpenBSD doesn't have locale-dependent definitions of the relevant strto*
functions. We could add a specialization for a two-argument function
pointer, but that becomes too repetitive.
Instead, implement a few stubs and use the preprocessor, a la the
implementation for Android.
This reduces the dependency on `LLVMSupport`. This is the first step
towards helping move towards a local fork of the LLVM ADT to ensure that
static linking of the Swift runtime and core library does not result in
ODR violations.
Extend SwiftDtoa to provide optimal formatting for Float16 and use that for `Float16.description` and `Float16.debugDescription`.
Notes on signaling NaNs: LLVM's Float16 support passes Float16s on x86
by legalizing to Float32. This works well for most purposes but incidentally
loses the signaling marker from any NaN (because it's a conversion as far
as the hardware is concerned), with a side effect that the print code never
actually sees a true sNaN. This is similar to what happens with Float and
Double on i386 backends. The earlier code here tried to detect sNaN in a
different way, but that approach isn't guaranteed to work so we decided to
make this code use the correct detection logic -- sNaN printing will just be
broken until we can get a better argument passing convention.
Resolves rdar://61414101
These should hopefully all be uncontroversial, minimal changes to deal
with progressing the build to completion on OpenBSD or addressing minor
portability issues. This is not the full set of changes to get a
successful build; other portability issues will be addressed in future
commits.
Most of this is just adding the relevant clauses to the ifdefs, but of
note in this commit:
* StdlibUnittest.swift: the default conditional in _getOSVersion assumes
an Apple platform, therefore the explicit conditional and the relevant
enums need filling out. The default conditional should be #error, but
we'll fix this in a different commit.
* tgmath.swift.gyb: inexplicably, OpenBSD is missing just lgammal_r.
Tests are updated correspondingly.
* ThreadLocalStorage.h: we use the pthread implementation, so it
seems we should typedef __swift_thread_key_t as pthread_key_t.
However, that's also a tweak for another commit.