OpenBSD doesn't have `pthread_attr_get_np` and expects something like
`pthread_attr_getstackaddr` to be used to get the initial stack size.
We need to use `pthread_stackseg_np` on this platform to get the
stack size and location of `pthread_self`.
Adds two new IRGen-level builtins (one for allocating, the other for deallocating), a stdlib shim function for enhanced stack-promotion heuristics, and the proposed public stdlib functions.
* SwiftDtoa v2: Better, Smaller, Faster floating-point formatting
SwiftDtoa is the C/C++ code used in the Swift runtime to produce the textual representations used by the `description` and `debugDescription` properties of the standard Swift floating-point types.
This update includes a number of algorithmic improvements to SwiftDtoa to improve portability, reduce code size, and improve performance but does not change the actual output.
About SwiftDtoa
===============
In early versions of Swift, the `description` properties used the C library `sprintf` functionality with a fixed number of digits.
In 2018, that logic was replaced with the first version of SwiftDtoa which used used a fast, adaptive algorithm to automatically choose the correct number of digits for a particular value.
The resulting decimal output is always:
* Accurate. Parsing the decimal form will yield exactly the same binary floating-point value again. This guarantee holds for any parser that accurately implements IEEE 754. In particular, the Swift standard library can guarantee that for any Double `d` that is not a NaN, `Double(d.description) == d`.
* Short. Among all accurate forms, this form has the fewest significant digits. (Caution: Surprisingly, this is not the same as minimizing the number of characters. In some cases, minimizing the number of characters requires producing additional significant digits.)
* Close. If there are multiple accurate, short forms, this code chooses the decimal form that is closest to the exact binary value. If there are two exactly the same distance, the one with an even final digit will be used.
Algorithms that can produce this "optimal" output have been known since at least 1990, when Steele and White published their Dragon4 algorithm.
However, Dragon4 and other algorithms from that period relied on high-precision integer arithmetic, which made them slow.
More recently, a surge of interest in this problem has produced dramatically better algorithms that can produce the same results using only fast fixed-precision arithmetic.
This format is ideal for JSON and other textual interchange: accuracy ensures that the value will be correctly decoded, shortness minimizes network traffic, and the existence of high-performance algorithms allows this form to be generated more quickly than many `printf`-based implementations.
This format is also ideal for logging, debugging, and other general display. In particular, the shortness guarantee avoids the confusion of unnecessary additional digits, so that the result of `1.0 / 10.0` consistently displays as `0.1` instead of `0.100000000000000000001`.
About SwiftDtoa v2
==================
Compared to the original SwiftDtoa code, this update is:
**Better**:
The core logic is implemented using only C99 features with 64-bit and smaller integer arithmetic.
If available, 128-bit integers are used for better performance.
The core routines do not require any floating-point support from the C/C++ standard library and with only minor modifications should be usable on systems with no hardware or software floating-point support at all.
This version also has experimental support for IEEE 754 binary128 format, though this support is obviously not included when compiling for the Swift standard library.
**Smaller**:
Code size reduction compared to the earlier versions was a primary goal for this effort.
In particular, the new binary128 support shares essentially all of its code with the float80 implementation.
**Faster**:
Even with the code size reductions, all formats are noticeably faster.
The primary performance gains come from three major changes:
Text digits are now emitted directly in the core routines in a form that requires only minimal adjustment to produce the final text.
Digit generation produces 2, 4, or even 8 digits at a time, depending on the format.
The double logic optimistically produces 7 digits in the initial scaling with a Ryu-inspired backtracking when fewer digits suffice.
SwiftDtoa's algorithms
======================
SwiftDtoa started out as a variation of Florian Loitsch' Grisu2 that addressed the shortness failures of that algorithm.
Subsequent work has incorporated ideas from Errol3, Ryu, and other sources to yield a production-quality implementation that is performance- and size-competitive with current research code.
Those who wish to understand the details can read the extensive comments included in the code.
Note that float16 actually uses a different algorithm than the other formats, as the extremely limited range can be handled with much simpler techniques.
The float80/binary128 logic sacrifices some performance optimizations in order to minimize the code size for these less-used formats; the goal for SwiftDtoa v2 has been to match the float80 performance of earlier implementations while reducing code size and widening the arithmetic routines sufficiently to support binary128.
SwiftDtoa Testing
=================
A newly-developed test harness generates several large files of test data that include known-correct results computed with high-precision arithmetic routines.
The test files include:
* Critical values generated by the algorithm presented in the Errol paper (about 48 million cases for binary128)
* Values for which the optimal decimal form is exactly midway between two binary floating-point values.
* All exact powers of two representable in this format.
* Floating-point values that are close to exact powers of ten.
In addition, several billion random values for each format were compared to the results from other implementations.
For binary16 and binary32 this provided exhaustive validation of every possible input value.
Code Size and Performance
=========================
The tables below summarize the code size and performance for the SwiftDtoa C library module by itself on several different processor architectures.
When used from Swift, the `.description` and `.debugDescription` implementations incur additional overhead for creating and returning Swift strings that are not captured here.
The code size tables show the total size in bytes of the compiled `.o` object files for a particular version of that code.
The headings indicate the floating-point formats supported by that particular build (e.g., "16,32" for a version that supports binary16 and binary32 but no other formats).
The performance numbers below were obtained from a custom test harness that generates random bit patterns, interprets them as the corresponding floating-point value, and averages the overall time.
For float80, the random bit patterns were generated in a way that avoids generating invalid values.
All code was compiled with the system C/C++ compiler using `-O2` optimization.
A few notes about particular implementations:
* **SwiftDtoa v1** is the original SwiftDtoa implementation as committed to the Swift runtime in April 2018.
* **SwiftDtoa v1a** is the same as SwiftDtoa v1 with added binary16 support.
* **SwiftDtoa v2** can be configured with preprocessor macros to support any subset of the supported formats. I've provided sizes here for several different build configurations.
* **Ryu** (Ulf Anders) implements binary32 and binary64 as completely independent source files. The size here is the total size of the two .o object files.
* **Ryu(size)** is Ryu compiled with the `RYU_OPTIMIZE_SIZE` option.
* **Dragonbox** (Junekey Jeon). The size here is the compiled size of a simple `.cpp` file that instantiates the template for the specified formats, plus the size of the associated text output logic.
* **Dragonbox(size)** is Dragonbox compiled to minimize size by using a compressed power-of-10 table.
* **gdtoa** has a very large feature set. For this reason, I excluded it from the code size comparison since I didn't consider the numbers to be comparable to the others.
x86_64
----------------
These were built using Apple clang 12.0.5 on a 2019 16" MacBook Pro (2.4GHz 8-core Intel Core i9) running macOS 11.1.
**Code Size**
Bold numbers here indicate the configurations that have shipped as part of the Swift runtime.
| | 16,32,64,80 | 32,64,80 | 32,64 |
|---------------|------------:|------------:|------------:|
|SwiftDtoa v1 | | **15128** | |
|SwiftDtoa v1a | **16888** | | |
|SwiftDtoa v2 | **20220** | 18628 | 8248 |
|Ryu | | | 40408 |
|Ryu(size) | | | 23836 |
|Dragonbox | | | 23176 |
|Dragonbox(size)| | | 15132 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 25ns | 46ns | 82ns | |
|SwiftDtoa v1a | 37ns | 26ns | 47ns | 83ns | |
|SwiftDtoa v2 | 22ns | 19ns | 31ns | 72ns | 90ns |
|Ryu | | 19ns | 26ns | | |
|Ryu(size) | | 17ns | 24ns | | |
|Dragonbox | | 19ns | 24ns | | |
|Dragonbox(size) | | 19ns | 29ns | | |
|gdtoa | 220ns | 381ns | 1184ns | 16044ns | 22800ns |
ARM64
----------------
These were built using Apple clang 12.0.0 on a 2020 M1 Mac Mini running macOS 11.1.
**Code Size**
| | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1 | | 7436 |
|SwiftDtoa v1a | 9124 | |
|SwiftDtoa v2 | 9964 | 8228 |
|Ryu | | 35764 |
|Ryu(size) | | 16708 |
|Dragonbox | | 27108 |
|Dragonbox(size)| | 19172 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 21ns | 39ns | | |
|SwiftDtoa v1a | 17ns | 21ns | 39ns | | |
|SwiftDtoa v2 | 15ns | 17ns | 29ns | 54ns | 71ns |
|Ryu | | 15ns | 19ns | | |
|Ryu(size) | | 29ns | 24ns | | |
|Dragonbox | | 16ns | 24ns | | |
|Dragonbox(size) | | 15ns | 34ns | | |
|gdtoa | 143ns | 242ns | 858ns | 25129ns | 36195ns |
ARM32
----------------
These were built using clang 8.0.1 on a BeagleBone Black (500MHz ARMv7) running FreeBSD 12.1-RELEASE.
**Code Size**
| | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1 | | 8668 |
|SwiftDtoa v1a | 10356 | |
|SwiftDtoa v2 | 9796 | 8340 |
|Ryu | | 32292 |
|Ryu(size) | | 14592 |
|Dragonbox | | 29000 |
|Dragonbox(size)| | 21980 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 459ns | 1152ns | | |
|SwiftDtoa v1a | 383ns | 451ns | 1148ns | | |
|SwiftDtoa v2 | 202ns | 357ns | 715ns | 2720ns | 3379ns |
|Ryu | | 345ns | 5450ns | | |
|Ryu(size) | | 786ns | 5577ns | | |
|Dragonbox | | 300ns | 904ns | | |
|Dragonbox(size) | | 294ns | 1021ns | | |
|gdtoa | 2180ns | 4749ns | 18742ns |293000ns | 440000ns |
* This is fast enough now even for non-optimized test runs
* Fix float80 Nan/Inf parsing, comment more thoroughly
Previously, overflow and underflow both caused this to return `nil`, which causes several problems:
* It does not distinguish between a large but valid input and a malformed input. `Float("3.402824e+38")` is perfectly well-formed but returns nil
* It differs from how the compiler handles literals. As a result, `Float(3.402824e+38)` is very different from `Float("3.402824e+38")`
* It's inconsistent with Foundation Scanner()
* It's inconsistent with other programming languages
This is exactly the same as #25313
Fixes rdar://problem/36990878
Most SwiftShims were put in the swift namespace in C++ mode which broke certain things when importing them in a swift file in C++ mode. This was OK when they were only imported as part of the swift runtime but, now they are used in C++ mode both in the swift runtime and when C++ interop is enabled.
This broke when C++ interop was enabled because the `Swift` module contains references to symbols in the SwiftShims headers which are built without C++ interop enabled (no "swift" namespace). But, when C++ interop is enabled, the SwiftShims headers would put everything in the swift namespace meaning the symbols couldn't be found in the global namespace. Then, the compiler would error when trying to deserialize the Swift module.
Currently, _swift_stdlib_strtoX_clocale_impl is present here twice: one
general definition for most platforms that wraps the standard strto*
functions, and another general definition with a slightly different
implementation for Cygwin, Haiku, and Windows, using stringstreams to
achieve a similar result. Furthermore, for Windows, the stringstream
implementation isn't even used but specialized away.
Firstly: the stringstream implementation is slightly broken, since it
causes some of the unit tests to fail; secondly, there is an awful lot
of repetition here that is ripe for simplification.
Instead of duplicating twice and using template specialization to induce
platform-specific behavior, we massage the stringstream implementation
for Cygwin and Haiku into something that looks like a standard strto*
function.
Now we have one definition (for each type) of swift_strto*_l. By
default, the _swift_stdlib_strto*_clocale stubs will refer to strto*_l,
and platforms withing to make use of the swift_strto*_l stubs can make
the necessary preprocessor definitions to utilize them.
Instead of putting the stubs alongside the redefinitions in each platform
preprocessor section, split these out, in anticipation for consolidation
in the next commit.
Here the template specializations can be adapted to a strto* wrapper, for
use with the general function-pointer implementation of
_swift_stdlib_strtoX_clocale_impl.
The template defined for Cygwin and friends does not handle failure cases
properly so the NumericParsing unit test fails. To get the correct test
behavior, we need to use an implementation such as like in the Windows
specializations or the non-Cygwin implementation that takes a function
pointer.
However, adding additional specializations would be too wordy, and
OpenBSD doesn't have locale-dependent definitions of the relevant strto*
functions. We could add a specialization for a two-argument function
pointer, but that becomes too repetitive.
Instead, implement a few stubs and use the preprocessor, a la the
implementation for Android.
This reduces the dependency on `LLVMSupport`. This is the first step
towards helping move towards a local fork of the LLVM ADT to ensure that
static linking of the Swift runtime and core library does not result in
ODR violations.
Extend SwiftDtoa to provide optimal formatting for Float16 and use that for `Float16.description` and `Float16.debugDescription`.
Notes on signaling NaNs: LLVM's Float16 support passes Float16s on x86
by legalizing to Float32. This works well for most purposes but incidentally
loses the signaling marker from any NaN (because it's a conversion as far
as the hardware is concerned), with a side effect that the print code never
actually sees a true sNaN. This is similar to what happens with Float and
Double on i386 backends. The earlier code here tried to detect sNaN in a
different way, but that approach isn't guaranteed to work so we decided to
make this code use the correct detection logic -- sNaN printing will just be
broken until we can get a better argument passing convention.
Resolves rdar://61414101
These should hopefully all be uncontroversial, minimal changes to deal
with progressing the build to completion on OpenBSD or addressing minor
portability issues. This is not the full set of changes to get a
successful build; other portability issues will be addressed in future
commits.
Most of this is just adding the relevant clauses to the ifdefs, but of
note in this commit:
* StdlibUnittest.swift: the default conditional in _getOSVersion assumes
an Apple platform, therefore the explicit conditional and the relevant
enums need filling out. The default conditional should be #error, but
we'll fix this in a different commit.
* tgmath.swift.gyb: inexplicably, OpenBSD is missing just lgammal_r.
Tests are updated correspondingly.
* ThreadLocalStorage.h: we use the pthread implementation, so it
seems we should typedef __swift_thread_key_t as pthread_key_t.
However, that's also a tweak for another commit.
These are supposed to be processed in the C locale always, irrespective
of the current locale. We were not doing this and so we would parse the
value incorrectly.
The conversion routines in MSVCPRT return "0" for the conversion of
"-inf" et al. Provde template specializations for `float`, `double`,
and `long double` to use `strtof`, `strtod`, and `strtold` respectively.
This fixes the lossless conversion of floating point constants.
When building on Linux, the definition of `swift_snprintf_l` would cause
an unused function warning. Expand the scope of the preprocessor guard
to encompass the function for the single use. This avoids the unused
function warning.
Some of the previously used stubs are no longer needed in newer releases
of the Android API. Android L and Android O provide the functions in
their associated versions of bionic. This is needed to build against a
newer version of the SDK.
The returned type `std::streamoff` on Windows x64 is a `long long`
rather than `int`. This results in a 64-to-32 bit shortening of the
value. Use the appropriate type to avoid the truncation.
`strlen` returns a unsigned value, but `std::streamoff` is an signed
value. Explicitly cast the value to avoid the warning about the
implicit signed conversion.
Move the duplicated compiler-rt support routines into its own source
file. This will need to be expanded for Windows. As on Linux, there
are certain builtin routines which are not available from the standard
runtime and need to be augmented for now.
* SR-106: New floating-point `description` implementation
This replaces the current implementation of `description` and
`debugDescription` for the standard floating-point types with a new
formatting routine based on a variation of Florian Loitsch' Grisu2
algorithm with changes suggested by Andrysco, Jhala, and Lerner's 2016
paper describing Errol3.
Unlike the earlier code based on `sprintf` with a fixed number of
digits, this version always chooses the optimal number of digits. As
such, we can now use the exact same output for both `description` and
`debugDescription` (except of course that `debugDescription` provides
full detail for NaNs).
The implementation has been extensively commented; people familiar with
Grisu-style algorithms should find the code easy to understand.
This implementation is:
* Fast. It uses only fixed-width integer arithmetic and has constant
memory and time requirements.
* Simple. It is only a little more complex than Loitsch' original
implementation of Grisu2. The digit decomposition logic for double is
less than 300 lines of standard C (half of which is common arithmetic
support routines).
* Always Accurate. Converting the decimal form back to binary (using an
accurate algorithm such as Clinger's) will always yield exactly the
original binary value. For the IEEE 754 formats, the round-trip will
produce exactly the same bit pattern in memory. This is an essential
requirement for JSON serialization, debugging, and logging.
* Always Short. This always selects an accurate result with the minimum
number of decimal digits. (So that `1.0 / 10.0` will always print
`0.1`.)
* Always Close. Among all accurate, short results, this always chooses
the result that is closest to the exact floating-point value. (In case
of an exact tie, it rounds the last digit even.)
This resolves SR-106 and related issues that have complained
about the floating-point `description` properties being inexact.
* Remove duplicate infinity handling
* Use defined(__SIZEOF_INT128__) to detect uint128_t support
* Separate `extracting` the integer part from `clearing` the integer part
The previous code was unnecessarily obfuscated by the attempt to combine
these two operations.
* Use `UINT32_MAX` to mask off 32 bits of a larger integer
* Correct the expected NaN results for 32-bit i386
* Make the C++ exceptions here consistent
Adding a C source file somehow exposed an issue in an unrelated C++ file.
Thanks to Joe Groff for the fix.
* Rename SwiftDtoa to ".cpp"
Having a C file in stdlib/public/runtime causes strange
build failures on Linux in unrelated C++ files.
As a workaround, rename SwiftDtoa.c to .cpp to see
if that avoids the problems.
* Revert "Make the C++ exceptions here consistent"
This reverts commit 6cd5c20566.
On `istringstream`, `tellg()` returns -1 if the stream is at the end of the file. This indicates success in this circumstance, so we should update `pos` to reflect that the whole string has been read.
NFC on platforms other than Windows, Cygwin, and Haiku.
Use the KeyPath implementation's new support for instantiating and dealing with captures to lower the generic context required to dispatch computed accessors with dependent generics.