SwiftDtoa v2: Better, Smaller, Faster floating-point formatting (#35299)

* SwiftDtoa v2: Better, Smaller, Faster floating-point formatting

SwiftDtoa is the C/C++ code used in the Swift runtime to produce the textual representations used by the `description` and `debugDescription` properties of the standard Swift floating-point types. This update includes a number of algorithmic improvements to SwiftDtoa to improve portability, reduce code size, and improve performance but does not change the actual output.

About SwiftDtoa
===============

In early versions of Swift, the `description` properties used the C library `sprintf` functionality with a fixed number of digits. In 2018, that logic was replaced with the first version of SwiftDtoa, which used a fast, adaptive algorithm to automatically choose the correct number of digits for a particular value.

The resulting decimal output is always:

* Accurate. Parsing the decimal form will yield exactly the same binary floating-point value again. This guarantee holds for any parser that accurately implements IEEE 754. In particular, the Swift standard library can guarantee that for any Double `d` that is not a NaN, `Double(d.description) == d`.
* Short. Among all accurate forms, this form has the fewest significant digits. (Caution: Surprisingly, this is not the same as minimizing the number of characters. In some cases, minimizing the number of characters requires producing additional significant digits.)
* Close. If there are multiple accurate, short forms, this code chooses the decimal form that is closest to the exact binary value. If there are two exactly the same distance, the one with an even final digit will be used.

Algorithms that can produce this "optimal" output have been known since at least 1990, when Steele and White published their Dragon4 algorithm. However, Dragon4 and other algorithms from that period relied on high-precision integer arithmetic, which made them slow.
More recently, a surge of interest in this problem has produced dramatically better algorithms that can produce the same results using only fast fixed-precision arithmetic.

This format is ideal for JSON and other textual interchange: accuracy ensures that the value will be correctly decoded, shortness minimizes network traffic, and the existence of high-performance algorithms allows this form to be generated more quickly than many `printf`-based implementations.

This format is also ideal for logging, debugging, and other general display. In particular, the shortness guarantee avoids the confusion of unnecessary additional digits, so that the result of `1.0 / 10.0` consistently displays as `0.1` instead of `0.100000000000000000001`.

About SwiftDtoa v2
==================

Compared to the original SwiftDtoa code, this update is:

**Better**: The core logic is implemented using only C99 features with 64-bit and smaller integer arithmetic. If available, 128-bit integers are used for better performance. The core routines do not require any floating-point support from the C/C++ standard library and with only minor modifications should be usable on systems with no hardware or software floating-point support at all. This version also has experimental support for IEEE 754 binary128 format, though this support is obviously not included when compiling for the Swift standard library.

**Smaller**: Code size reduction compared to the earlier versions was a primary goal for this effort. In particular, the new binary128 support shares essentially all of its code with the float80 implementation.

**Faster**: Even with the code size reductions, all formats are noticeably faster. The primary performance gains come from three major changes: Text digits are now emitted directly in the core routines in a form that requires only minimal adjustment to produce the final text. Digit generation produces 2, 4, or even 8 digits at a time, depending on the format.
The double logic optimistically produces 7 digits in the initial scaling with a Ryu-inspired backtracking when fewer digits suffice.

SwiftDtoa's algorithms
======================

SwiftDtoa started out as a variation of Florian Loitsch's Grisu2 that addressed the shortness failures of that algorithm. Subsequent work has incorporated ideas from Errol3, Ryu, and other sources to yield a production-quality implementation that is performance- and size-competitive with current research code.

Those who wish to understand the details can read the extensive comments included in the code. Note that float16 actually uses a different algorithm than the other formats, as the extremely limited range can be handled with much simpler techniques. The float80/binary128 logic sacrifices some performance optimizations in order to minimize the code size for these less-used formats; the goal for SwiftDtoa v2 has been to match the float80 performance of earlier implementations while reducing code size and widening the arithmetic routines sufficiently to support binary128.

SwiftDtoa Testing
=================

A newly-developed test harness generates several large files of test data that include known-correct results computed with high-precision arithmetic routines. The test files include:

* Critical values generated by the algorithm presented in the Errol paper (about 48 million cases for binary128)
* Values for which the optimal decimal form is exactly midway between two binary floating-point values.
* All exact powers of two representable in this format.
* Floating-point values that are close to exact powers of ten.

In addition, several billion random values for each format were compared to the results from other implementations. For binary16 and binary32 this provided exhaustive validation of every possible input value.
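
The round-trip check behind this kind of validation can be sketched in a few lines of C. This is not the SwiftDtoa test harness: `shortest_float` is a hypothetical brute-force stand-in that simply tries increasing `printf` precisions, and the strided sweep only samples the binary32 bit patterns instead of visiting all of them. The sketch also assumes that `float` is IEEE 754 binary32 and that the C library's `snprintf` and `strtof` round correctly.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a real formatter: try 1..9 significant digits
   and keep the first result that parses back to exactly the same float.
   Returns the string length, or 0 on failure. */
static int shortest_float(float f, char *dest, size_t length) {
    for (int digits = 1; digits <= 9; digits++) {
        int n = snprintf(dest, length, "%.*g", digits, (double)f);
        if (n < 0 || (size_t)n >= length) return 0;
        if (strtof(dest, NULL) == f) return n;
    }
    return 0;
}

/* Reinterpret a 32-bit pattern as a float without aliasing problems. */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Sweep bit patterns with the given stride and count the values whose
   text form failed to round-trip.  Infinity and NaN are skipped. */
static unsigned long count_roundtrip_failures(uint32_t stride) {
    unsigned long failures = 0;
    char buf[64];
    for (uint64_t bits = 0; bits <= UINT32_MAX; bits += stride) {
        uint32_t b = (uint32_t)bits;
        if ((b & 0x7F800000u) == 0x7F800000u) continue; /* inf or NaN */
        if (shortest_float(float_from_bits(b), buf, sizeof buf) == 0)
            failures++;
    }
    return failures;
}
```

Calling `count_roundtrip_failures(1)` would visit every binary32 value; a larger stride gives a quick sample. Note that the brute-force formatter only demonstrates the "accurate" and "short" properties and may not pick the closest candidate in every boundary case.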
Code Size and Performance
=========================

The tables below summarize the code size and performance for the SwiftDtoa C library module by itself on several different processor architectures. When used from Swift, the `.description` and `.debugDescription` implementations incur additional overhead for creating and returning Swift strings that are not captured here.

The code size tables show the total size in bytes of the compiled `.o` object files for a particular version of that code. The headings indicate the floating-point formats supported by that particular build (e.g., "16,32" for a version that supports binary16 and binary32 but no other formats).

The performance numbers below were obtained from a custom test harness that generates random bit patterns, interprets them as the corresponding floating-point value, and averages the overall time. For float80, the random bit patterns were generated in a way that avoids generating invalid values.

All code was compiled with the system C/C++ compiler using `-O2` optimization.

A few notes about particular implementations:

* **SwiftDtoa v1** is the original SwiftDtoa implementation as committed to the Swift runtime in April 2018.
* **SwiftDtoa v1a** is the same as SwiftDtoa v1 with added binary16 support.
* **SwiftDtoa v2** can be configured with preprocessor macros to support any subset of the supported formats. I've provided sizes here for several different build configurations.
* **Ryu** (Ulf Adams) implements binary32 and binary64 as completely independent source files. The size here is the total size of the two .o object files.
* **Ryu(size)** is Ryu compiled with the `RYU_OPTIMIZE_SIZE` option.
* **Dragonbox** (Junekey Jeon). The size here is the compiled size of a simple `.cpp` file that instantiates the template for the specified formats, plus the size of the associated text output logic.
* **Dragonbox(size)** is Dragonbox compiled to minimize size by using a compressed power-of-10 table.
* **gdtoa** has a very large feature set. For this reason, I excluded it from the code size comparison since I didn't consider the numbers to be comparable to the others.

x86_64
----------------

These were built using Apple clang 12.0.5 on a 2019 16" MacBook Pro (2.4GHz 8-core Intel Core i9) running macOS 11.1.

**Code Size**

Bold numbers here indicate the configurations that have shipped as part of the Swift runtime.

|               | 16,32,64,80 |    32,64,80 |       32,64 |
|---------------|------------:|------------:|------------:|
|SwiftDtoa v1   |             |   **15128** |             |
|SwiftDtoa v1a  |   **16888** |             |             |
|SwiftDtoa v2   |   **20220** |       18628 |        8248 |
|Ryu            |             |             |       40408 |
|Ryu(size)      |             |             |       23836 |
|Dragonbox      |             |             |       23176 |
|Dragonbox(size)|             |             |       15132 |

**Performance**

|                | binary16 | binary32 | binary64 | float80 | binary128 |
|----------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1    |          |     25ns |     46ns |    82ns |           |
|SwiftDtoa v1a   |     37ns |     26ns |     47ns |    83ns |           |
|SwiftDtoa v2    |     22ns |     19ns |     31ns |    72ns |      90ns |
|Ryu             |          |     19ns |     26ns |         |           |
|Ryu(size)       |          |     17ns |     24ns |         |           |
|Dragonbox       |          |     19ns |     24ns |         |           |
|Dragonbox(size) |          |     19ns |     29ns |         |           |
|gdtoa           |    220ns |    381ns |   1184ns | 16044ns |   22800ns |

ARM64
----------------

These were built using Apple clang 12.0.0 on a 2020 M1 Mac Mini running macOS 11.1.

**Code Size**

|               | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1   |          |  7436 |
|SwiftDtoa v1a  |     9124 |       |
|SwiftDtoa v2   |     9964 |  8228 |
|Ryu            |          | 35764 |
|Ryu(size)      |          | 16708 |
|Dragonbox      |          | 27108 |
|Dragonbox(size)|          | 19172 |

**Performance**

|                | binary16 | binary32 | binary64 | float80 | binary128 |
|----------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1    |          |     21ns |     39ns |         |           |
|SwiftDtoa v1a   |     17ns |     21ns |     39ns |         |           |
|SwiftDtoa v2    |     15ns |     17ns |     29ns |    54ns |      71ns |
|Ryu             |          |     15ns |     19ns |         |           |
|Ryu(size)       |          |     29ns |     24ns |         |           |
|Dragonbox       |          |     16ns |     24ns |         |           |
|Dragonbox(size) |          |     15ns |     34ns |         |           |
|gdtoa           |    143ns |    242ns |    858ns | 25129ns |   36195ns |

ARM32
----------------

These were built using clang 8.0.1 on a BeagleBone Black (500MHz ARMv7) running FreeBSD 12.1-RELEASE.

**Code Size**

|               | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1   |          |  8668 |
|SwiftDtoa v1a  |    10356 |       |
|SwiftDtoa v2   |     9796 |  8340 |
|Ryu            |          | 32292 |
|Ryu(size)      |          | 14592 |
|Dragonbox      |          | 29000 |
|Dragonbox(size)|          | 21980 |

**Performance**

|                | binary16 | binary32 | binary64 |  float80 | binary128 |
|----------------|---------:|---------:|---------:|---------:|----------:|
|SwiftDtoa v1    |          |    459ns |   1152ns |          |           |
|SwiftDtoa v1a   |    383ns |    451ns |   1148ns |          |           |
|SwiftDtoa v2    |    202ns |    357ns |    715ns |   2720ns |    3379ns |
|Ryu             |          |    345ns |   5450ns |          |           |
|Ryu(size)       |          |    786ns |   5577ns |          |           |
|Dragonbox       |          |    300ns |    904ns |          |           |
|Dragonbox(size) |          |    294ns |   1021ns |          |           |
|gdtoa           |   2180ns |   4749ns |  18742ns | 293000ns |  440000ns |

* This is fast enough now even for non-optimized test runs
* Fix float80 NaN/Inf parsing, comment more thoroughly
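
The commit message credits part of the speedup to generating several digits per step. As a rough illustration of that general technique (not the SwiftDtoa code itself), the sketch below converts an integer to text two digits at a time by dividing by 100 and copying a pair of ASCII characters from a lookup table. The names `digit_pairs` and `emit_digits_by_pairs` are hypothetical, and the lazy table initialization is not thread-safe.

```c
#include <stdint.h>
#include <string.h>

/* "00" "01" ... "99": two ASCII characters per table entry. */
static char digit_pairs[200];

static void init_digit_pairs(void) {
    for (int i = 0; i < 100; i++) {
        digit_pairs[2 * i]     = (char)('0' + i / 10);
        digit_pairs[2 * i + 1] = (char)('0' + i % 10);
    }
}

/* Write the decimal form of `value` into `dest`, NUL-terminated.
   Returns the number of characters written, or 0 if `length` is too small. */
static size_t emit_digits_by_pairs(uint64_t value, char *dest, size_t length) {
    char tmp[20];                /* UINT64_MAX has 20 decimal digits */
    char *p = tmp + sizeof tmp;  /* fill the scratch buffer from the right */
    if (digit_pairs[1] == 0) init_digit_pairs();
    while (value >= 100) {       /* one division yields two digits */
        p -= 2;
        memcpy(p, &digit_pairs[2 * (value % 100)], 2);
        value /= 100;
    }
    if (value >= 10) {           /* two leading digits remain */
        p -= 2;
        memcpy(p, &digit_pairs[2 * value], 2);
    } else {                     /* one leading digit remains */
        *--p = (char)('0' + value);
    }
    size_t n = (size_t)(tmp + sizeof tmp - p);
    if (n + 1 > length) return 0;
    memcpy(dest, p, n);
    dest[n] = '\0';
    return n;
}
```

Each loop iteration replaces two divide-by-10 steps with a single divide-by-100 and a table copy; the same idea extends to 4 or 8 digits per step with wider arithmetic.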
2025-12-14 20:36:38 +01:00 · 2021-01-27 14:35:55 -08:00
parent 61aba8d896
commit a32dacb131
4 changed files with 2188 additions and 2008 deletions
--- a/include/swift/Runtime/SwiftDtoa.h
+++ b/include/swift/Runtime/SwiftDtoa.h
@@ -2,69 +2,224 @@
 //
 // This source file is part of the Swift.org open source project
 //
-// Copyright (c) 2018 Apple Inc. and the Swift project authors
+// Copyright (c) 2018, 2020 Apple Inc. and the Swift project authors
 // Licensed under Apache License v2.0 with Runtime Library Exception
 //
 // See https://swift.org/LICENSE.txt for license information
 // See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors
 //
 //===---------------------------------------------------------------------===//
 //
 /// About SwiftDtoa
 /// ===============
 ///
 /// SwiftDtoa is the C implementation that supports the `.description`
 /// and `.debugDescription` properties for the standard Swift
 /// floating-point types.  These functions produce the "optimal form"
 /// for the binary floating point value.  The optimal form is a
 /// decimal representation that satisfies the following properties:
 ///
 /// 1. Accurate.  Parsing the value back to a binary floating-point
 ///    value of the same precision will exactly yield the original
 ///    value.  For example, `Double(d.description) == d` for all `Double`
 ///    values `d` (except for NaN values, of course).
 ///
 /// 2. Short.  Of all accurate results, the returned value will
 ///    contain the minimum number of significant digits.  Note that
 ///    this is not quite the same as C++ `to_chars` which promises the
 ///    minimal number of characters.
 ///
 /// 3. Close.  Of all accurate, short results, the value printed will
 ///    be the one that is closest to the exact binary floating-point
 ///    value.
 ///
 /// The optimal form is the ideal textual form for use in JSON and
 /// similar interchange formats because it is accurate, compact, and
 /// can be generated very quickly.  It is also ideal for logging and
 /// debugging use; the accuracy guarantees that the result can be
 /// cut-and-pasted to obtain the exact original value, and the
 /// shortness property eliminates unnecessary digits that can be
 /// confusing to readers.
 ///
 /// Algorithms that produce such output have been known since at least
 /// 1990, when Steele and White published their Dragon4 algorithm.
 /// However, the earliest algorithms required high-precision
 /// arithmetic which limited their use.  Starting in 2010 with the
 /// publication of Grisu3, there has been a surge of interest and
 /// there are now a number of algorithms that can produce optimal
 /// forms very quickly.  This particular implementation is loosely
 /// based on Grisu2 but incorporates concepts from Errol and Ryu that
 /// make it significantly faster and ensure accuracy in all cases.
 ///
 /// About SwiftDtoa v1
 /// ------------------
 ///
 /// The first version of SwiftDtoa was committed to the Swift runtime
 /// in 2018.  It supported Swift's Float, Double, and Float80 formats.
 ///
 /// About SwiftDtoa v1a
 /// -------------------
 ///
 /// Version 1a of SwiftDtoa added support for Float16.
 ///
 /// About SwiftDtoa v2
 /// ------------------
 ///
 /// Version 2 of SwiftDtoa is a major overhaul with a number of
 /// algorithmic improvements to make it faster (especially for Float16
 /// and Float80), smaller, and more portable (the code only requires
 /// C99 and makes no use of C or C++ floating-point facilities).  It
 /// also includes experimental support for IEEE 754 quad-precision
 /// binary128 format, which is not currently supported by Swift.
 //
 //===---------------------------------------------------------------------===//
 #ifndef SWIFT_DTOA_H
 #define SWIFT_DTOA_H
 #include <float.h>
 #include <stdbool.h>
 #include <stdint.h>
 #include <stdlib.h>
-// This implementation strongly assumes that `float` is
+//
-// IEEE 754 single-precision binary32 format and that
+// IEEE 754 Binary16 support (also known as "half-precision")
-// `double` is IEEE 754 double-precision binary64 format.
+//
-// Essentially all modern platforms use IEEE 754 floating point
+// Enable this by default.
-// types now, so enable these by default:
+// Force disable: -DSWIFT_DTOA_BINARY16_SUPPORT=0
-#define SWIFT_DTOA_FLOAT16_SUPPORT 1
+#ifndef SWIFT_DTOA_BINARY16_SUPPORT
-#define SWIFT_DTOA_FLOAT_SUPPORT 1
+ #define SWIFT_DTOA_BINARY16_SUPPORT 1
-#define SWIFT_DTOA_DOUBLE_SUPPORT 1
+#endif
-// This implementation assumes `long double` is Intel 80-bit extended format.
+//
-#if defined(_WIN32)
+// IEEE 754 Binary32 support (also known as "single-precision")
- // Windows has `long double` == `double` on all platforms, so disable this.
+//
- #undef SWIFT_DTOA_FLOAT80_SUPPORT
+
-#elif defined(__ANDROID__)
+// Does "float" on this system use binary32 format?
- // At least for now Float80 is disabled. See: https://github.com/apple/swift/pull/25502
+// (Almost all modern systems do this.)
-#elif defined(__APPLE__) || defined(__linux__) || defined(__OpenBSD__)
+#if (FLT_RADIX == 2) && (FLT_MANT_DIG == 24) && (FLT_MIN_EXP == -125) && (FLT_MAX_EXP == 128)
- // macOS and Linux support Float80 on X86 hardware but not on ARM
+  #define FLOAT_IS_BINARY32 1
- #if defined(__x86_64__) || defined(__i386)
+#else
  #undef FLOAT_IS_BINARY32
 #endif
 // We can format binary32 values even if the local C environment
 // does not support it.  But `float` == binary32 almost everywhere,
 // so we enable it by default.
 // Force disable: -DSWIFT_DTOA_BINARY32_SUPPORT=0
 #ifndef SWIFT_DTOA_BINARY32_SUPPORT
 #define SWIFT_DTOA_BINARY32_SUPPORT 1
 #endif
 //
 // IEEE 754 Binary64 support (also known as "double-precision")
 //
 // Does "double" on this system use binary64 format?
 // (Almost all modern systems do this.)
 #if (FLT_RADIX == 2) && (DBL_MANT_DIG == 53) && (DBL_MIN_EXP == -1021) && (DBL_MAX_EXP == 1024)
  #define DOUBLE_IS_BINARY64 1
 #else
  #undef DOUBLE_IS_BINARY64
 #endif
 // Does "long double" on this system use binary64 format?
 // (Windows, for example.)
 #if (FLT_RADIX == 2) && (LDBL_MANT_DIG == 53) && (LDBL_MIN_EXP == -1021) && (LDBL_MAX_EXP == 1024)
  #define LONG_DOUBLE_IS_BINARY64 1
 #else
  #undef LONG_DOUBLE_IS_BINARY64
 #endif
 // We can format binary64 values even if the local C environment
 // does not support it.  But `double` == binary64 almost everywhere,
 // so we enable it by default.
 // Force disable: -DSWIFT_DTOA_BINARY64_SUPPORT=0
 #ifndef SWIFT_DTOA_BINARY64_SUPPORT
 #define SWIFT_DTOA_BINARY64_SUPPORT 1
 #endif
 //
 // Intel x87 Float80 support
 //
 // Is "long double" on this system the same as Float80?
 // (macOS, Linux, and FreeBSD when running on x86 or x86_64 processors.)
 #if (FLT_RADIX == 2) && (LDBL_MANT_DIG == 64) && (LDBL_MIN_EXP == -16381) && (LDBL_MAX_EXP == 16384)
 #define LONG_DOUBLE_IS_FLOAT80 1
 #else
 #undef LONG_DOUBLE_IS_FLOAT80
 #endif
 // We can format float80 values even if the local C environment
 // does not support it.  However, by default, we only enable it for
 // environments where float80 == long double.
 // Force enable: -DSWIFT_DTOA_FLOAT80_SUPPORT=1
 // Force disable: -DSWIFT_DTOA_FLOAT80_SUPPORT=0
 #ifndef SWIFT_DTOA_FLOAT80_SUPPORT
 #if LONG_DOUBLE_IS_FLOAT80
  #define SWIFT_DTOA_FLOAT80_SUPPORT 1
 #endif
 #endif
 //
 // IEEE 754 Binary128 support
 //
 // Is "long double" on this system the same as Binary128?
 // (Android on LP64 hardware.)
 #if (FLT_RADIX == 2) && (LDBL_MANT_DIG == 113) && (LDBL_MIN_EXP == -16381) && (LDBL_MAX_EXP == 16384)
 #define LONG_DOUBLE_IS_BINARY128 1
 #else
 #undef LONG_DOUBLE_IS_BINARY128
 #endif
 // We can format binary128 values even if the local C environment
 // does not support it.  However, by default, we only enable it for
 // environments where binary128 == long double.
 // Force enable: -DSWIFT_DTOA_BINARY128_SUPPORT=1
 // Force disable: -DSWIFT_DTOA_BINARY128_SUPPORT=0
 #ifndef SWIFT_DTOA_BINARY128_SUPPORT
 #if LONG_DOUBLE_IS_BINARY128
  #define SWIFT_DTOA_BINARY128_SUPPORT 1
 #endif
 #endif
 #ifdef __cplusplus
 extern "C" {
 #endif
-#if SWIFT_DTOA_DOUBLE_SUPPORT
+// Format a floating point value as an ASCII string
 // Compute the optimal decimal digits and exponent for a double.
 //
 // Input:
-// * `d` is the number to be decomposed
+// * `d` is the number to be formatted
-// * `digits` is an array of `digits_length`
+// * `dest` is a buffer of length `length`
 // * `decimalExponent` is a pointer to an `int`
 //
 // Ouput:
-// * `digits` will receive the decimal digits
+// * Return value is the length of the string placed into `dest`
-// * `decimalExponent` will receive the decimal exponent
+//   or zero if the buffer is too small.
-// * function returns the number of digits generated
+// * For infinity, it copies "inf" or "-inf".
-// * the sign of the input number is ignored
+// * For NaN, it outputs a Swift-style detailed dump, including
 //   sign, signaling/quiet, and payload (if any).  Typical output:
 //   "nan", "-nan", "-snan(0x1234)".
 // * For zero, it outputs "0.0" or "-0.0" depending on the sign.
 // * The destination buffer is always null-terminated (even on error)
 //   unless the length is zero.
 //
 // Note: If you want to customize the output for Infinity, zero, or
 // Nan, you can easily write a wrapper function that uses `fpclassify`
 // to identify those cases and only calls through to these functions
 // for normal and subnormal values.
 //
 // Guarantees:
 //
-// * Accurate. If you parse the result back to a double via an accurate
+// * Accurate. If you parse the result back to the same floating-point
-//   algorithm (such as Clinger's algorithm), the resulting double will
+//   format via an accurate algorithm (such as Clinger's algorithm),
-//   be exactly equal to the original value.  On most systems, this
+//   the resulting value will be _exactly_ equal to the original value.
-//   implies that using `strtod` to parse the output of
+//   On most systems, this implies that using `strtod` to parse the
-//   `swift_format_double` will yield exactly the original value.
+//   output of `swift_dtoa_optimal_double` will yield exactly the
 //   original value.
 //
 // * Short. No other accurate result will have fewer digits.
 //
@@ -72,82 +227,58 @@ extern "C" {
 //   both accurate and short, the form computed here will be
 //   closest to the original binary value.
 //
-// Notes:
+// Naming: The `_p` forms take a `const void *` pointing to the value
-//
+// in memory.  These forms do not require any support from the local C
-// If the input value is infinity or NaN, or `digits_length < 17`, the
+// environment.  In particular, they should work correctly even on
-// function returns zero and generates no ouput.
+// systems with no floating-point support.  Forms ending in a C
-//
+// floating-point type (e.g., "_float", "_double") are identical but
-// If the input value is zero, it will return `decimalExponent = 0` and
+// take the corresponding argument type.  These forms obviously
-// a single digit of value zero.
+// require the C environment to support passing floating-point types as
-//
+// function arguments.
 int swift_decompose_double(double d,
    int8_t *digits, size_t digits_length, int *decimalExponent);
-// Format a double as an ASCII string.
+#if SWIFT_DTOA_BINARY16_SUPPORT
-//
+size_t swift_dtoa_optimal_binary16_p(const void *, char *dest, size_t length);
 // For infinity, it outputs "inf" or "-inf".
 //
 // For NaN, it outputs a Swift-style detailed dump, including
 // sign, signaling/quiet, and payload (if any).  Typical output:
 // "nan", "-nan", "-snan(0x1234)".
 //
 // For zero, it outputs "0.0" or "-0.0" depending on the sign.
 //
 // For other values, it uses `swift_decompose_double` to compute the
 // digits, then uses either `swift_format_decimal` or
 // `swift_format_exponential` to produce an ASCII string depending on
 // the magnitude of the value.
 //
 // In all cases, it returns the number of ASCII characters actually
 // written, or zero if the buffer was too small.
 size_t swift_format_double(double, char *dest, size_t length);
 #endif
-#if SWIFT_DTOA_FLOAT16_SUPPORT
+#if SWIFT_DTOA_BINARY32_SUPPORT
-// See swift_decompose_double.  `digits_length` must be at least 5.
+size_t swift_dtoa_optimal_binary32_p(const void *, char *dest, size_t length);
-int swift_decompose_float16(const __fp16 *f,
+#if FLOAT_IS_BINARY32
-    int8_t *digits, size_t digits_length, int *decimalExponent);
+// If `float` happens to be binary32, define the convenience wrapper.
-// See swift_format_double.
+size_t swift_dtoa_optimal_float(float, char *dest, size_t length);
-size_t swift_format_float16(const __fp16 *, char *dest, size_t length);
+#endif
 #endif
-#if SWIFT_DTOA_FLOAT_SUPPORT
+#if SWIFT_DTOA_BINARY64_SUPPORT
-// See swift_decompose_double.  `digits_length` must be at least 9.
+size_t swift_dtoa_optimal_binary64_p(const void *, char *dest, size_t length);
-int swift_decompose_float(float f,
+#if DOUBLE_IS_BINARY64
-    int8_t *digits, size_t digits_length, int *decimalExponent);
+// If `double` happens to be binary64, define the convenience wrapper.
-// See swift_format_double.
+size_t swift_dtoa_optimal_double(double, char *dest, size_t length);
-size_t swift_format_float(float, char *dest, size_t length);
+#endif
 #if LONG_DOUBLE_IS_BINARY64
 // If `long double` happens to be binary64, define the convenience wrapper.
 size_t swift_dtoa_optimal_long_double(long double, char *dest, size_t length);
 #endif
 #endif
 #if SWIFT_DTOA_FLOAT80_SUPPORT
-// See swift_decompose_double.  `digits_length` must be at least 21.
+// Universal entry point works on all platforms, regardless of
-int swift_decompose_float80(long double f,
+// whether the local system has direct support for float80
-    int8_t *digits, size_t digits_length, int *decimalExponent);
+size_t swift_dtoa_optimal_float80_p(const void *, char *dest, size_t length);
-// See swift_format_double.
+#if LONG_DOUBLE_IS_FLOAT80
-size_t swift_format_float80(long double, char *dest, size_t length);
+// If 'long double' happens to be float80, define a convenience wrapper.
 size_t swift_dtoa_optimal_long_double(long double, char *dest, size_t length);
 #endif
 #endif
-// Generate an ASCII string from the raw exponent and digit information
+#if SWIFT_DTOA_BINARY128_SUPPORT
-// as generated by `swift_decompose_double`.  Returns the number of
+// Universal entry point works on all platforms, regardless of
-// bytes actually used.  If `dest` was not big enough, these functions
+// whether the local system has direct support for float80
-// return zero.  The generated string is always terminated with a zero
+size_t swift_dtoa_optimal_binary128_p(const void *, char *dest, size_t length);
-// byte unless `length` was zero.
+#if LONG_DOUBLE_IS_BINARY128
-
+// If 'long double' happens to be binary128, define a convenience wrapper.
-// "Exponential" form uses common exponential format, e.g., "-1.234e+56"
+size_t swift_dtoa_optimal_long_double(long double, char *dest, size_t length);
-// The exponent always has a sign and at least two digits.  The
+#endif
-// generated string is never longer than `digits_count + 9` bytes,
+#endif
 // including the trailing zero byte.
 size_t swift_format_exponential(char *dest, size_t length,
    bool negative, const int8_t *digits, int digits_count, int decimalExponent);
 // "Decimal" form writes the value without using exponents.  This
 // includes cases such as "0.000001234", "123.456", and "123456000.0".
 // Note that the result always has a decimal point with at least one
 // digit before and one digit after.  The generated string is never
 // longer than `digits_count + abs(exponent) + 4` bytes, including the
 // trailing zero byte.
 size_t swift_format_decimal(char *dest, size_t length,
    bool negative, const int8_t *digits, int digits_count, int decimalExponent);
 #ifdef __cplusplus
 }
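
The header comment suggests wrapping these functions with `fpclassify` when different spellings are wanted for infinity, zero, or NaN. A minimal sketch of such a wrapper follows. Everything in it is illustrative: `format_custom`, `copy_literal`, and the chosen spellings are hypothetical, and `stand_in_formatter` merely substitutes `snprintf` for the real formatter so the example can be compiled without the Swift runtime. The only thing taken from the header is the `(double, char *dest, size_t length)` signature and the "return zero, but still NUL-terminate, if the buffer is too small" contract.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Signature shared by value-taking formatters such as
   swift_dtoa_optimal_double. */
typedef size_t (*double_formatter)(double, char *dest, size_t length);

/* Copy a fixed string under the documented buffer contract: return its
   length, or 0 (leaving an empty, NUL-terminated buffer) if it does not fit. */
static size_t copy_literal(const char *text, char *dest, size_t length) {
    size_t n = strlen(text);
    if (length == 0) return 0;
    if (n + 1 > length) { dest[0] = '\0'; return 0; }
    memcpy(dest, text, n + 1);
    return n;
}

/* Hypothetical wrapper: custom text for the special cases, delegating
   normal and subnormal values to `format_finite`. */
static size_t format_custom(double d, char *dest, size_t length,
                            double_formatter format_finite) {
    switch (fpclassify(d)) {
    case FP_NAN:
        return copy_literal("NaN", dest, length);
    case FP_INFINITE:
        return copy_literal(signbit(d) ? "-Infinity" : "Infinity", dest, length);
    case FP_ZERO:
        return copy_literal(signbit(d) ? "-0" : "0", dest, length);
    default:
        return format_finite(d, dest, length);
    }
}

/* Stand-in for the real formatter so the sketch can be tried on its own. */
static size_t stand_in_formatter(double d, char *dest, size_t length) {
    int n = snprintf(dest, length, "%.17g", d);
    return (n < 0 || (size_t)n >= length) ? 0 : (size_t)n;
}
```

In a build that links SwiftDtoa, `swift_dtoa_optimal_double` could presumably be passed in place of `stand_in_formatter`, since it has the same signature.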
--- a/stdlib/public/runtime/SwiftDtoa.cpp
+++ b/stdlib/public/runtime/SwiftDtoa.cpp
--- a/stdlib/public/stubs/Stubs.cpp
+++ b/stdlib/public/stubs/Stubs.cpp
@@ -167,84 +167,6 @@ static locale_t getCLocale() {
 }
 #endif
 #if !SWIFT_DTOA_FLOAT80_SUPPORT
 #if defined(__APPLE__)
 #define swift_snprintf_l snprintf_l
 #elif defined(__CYGWIN__) || defined(_WIN32) || defined(__HAIKU__)
 // swift_snprintf_l() is not used.
 #else
 static int swift_snprintf_l(char *Str, size_t StrSize, locale_t Locale,
                            const char *Format, ...) {
  if (Locale == nullptr) {
    Locale = getCLocale();
  }
  locale_t OldLocale = uselocale(Locale);
  va_list Args;
  va_start(Args, Format);
  int Result = std::vsnprintf(Str, StrSize, Format, Args);
  va_end(Args);
  uselocale(OldLocale);
  return Result;
 }
 #endif
 template <typename T>
 static uint64_t swift_floatingPointToString(char *Buffer, size_t BufferLength,
                                            T Value, const char *Format, 
                                            bool Debug) {
  if (BufferLength < 32)
    swift::crash("swift_floatingPointToString: insufficient buffer size");
  int Precision = std::numeric_limits<T>::digits10;
  if (Debug) {
    Precision = std::numeric_limits<T>::max_digits10;
  }
 #if defined(__CYGWIN__) || defined(_WIN32) || defined(__HAIKU__)
  // Cygwin does not support uselocale(), but we can use the locale feature 
  // in stringstream object.
  std::ostringstream ValueStream;
  ValueStream.width(0);
  ValueStream.precision(Precision);
  ValueStream.imbue(std::locale::classic());
  ValueStream << Value;
  std::string ValueString(ValueStream.str());
  size_t i = ValueString.length();
  if (i < BufferLength) {
    std::copy(ValueString.begin(), ValueString.end(), Buffer);
    Buffer[i] = '\0';
  } else {
    swift::crash("swift_floatingPointToString: insufficient buffer size");
  }
 #else
  // Pass a null locale to use the C locale.
  int i = swift_snprintf_l(Buffer, BufferLength, /*Locale=*/nullptr, Format,
                           Precision, Value);
  if (i < 0)
    swift::crash(
        "swift_floatingPointToString: unexpected return value from sprintf");
  if (size_t(i) >= BufferLength)
    swift::crash("swift_floatingPointToString: insufficient buffer size");
 #endif
  // Add ".0" to a float that (a) is not in scientific notation, (b) does not
  // already have a fractional part, (c) is not infinite, and (d) is not a NaN
  // value.
  if (strchr(Buffer, 'e') == nullptr && strchr(Buffer, '.') == nullptr &&
      strchr(Buffer, 'n') == nullptr) {
    Buffer[i++] = '.';
    Buffer[i++] = '0';
  }
  return i;
 }
 #endif
 // TODO: replace this with a float16 implementation instead of calling _float.
 // Argument type will have to stay float, though; only the formatting changes.
 // Note, return type is __swift_ssize_t, not uint64_t as with the other
@@ -254,33 +176,31 @@ SWIFT_CC(swift) SWIFT_RUNTIME_STDLIB_API
 __swift_ssize_t swift_float16ToString(char *Buffer, size_t BufferLength,
                                      float Value, bool Debug) {
  __fp16 v = Value;
-  return swift_format_float16(&v, Buffer, BufferLength);
+  return swift_dtoa_optimal_binary16_p(&v, Buffer, BufferLength);
 }
 SWIFT_CC(swift) SWIFT_RUNTIME_STDLIB_API
 uint64_t swift_float32ToString(char *Buffer, size_t BufferLength,
                               float Value, bool Debug) {
-  return swift_format_float(Value, Buffer, BufferLength);
+  return swift_dtoa_optimal_float(Value, Buffer, BufferLength);
 }
 SWIFT_CC(swift) SWIFT_RUNTIME_STDLIB_API
 uint64_t swift_float64ToString(char *Buffer, size_t BufferLength,
                               double Value, bool Debug) {
-  return swift_format_double(Value, Buffer, BufferLength);
+  return swift_dtoa_optimal_double(Value, Buffer, BufferLength);
 }
 // We only support float80 on platforms that use that exact format for 'long double'
 // This should match the conditionals in Runtime.swift
 #if !defined(_WIN32) && !defined(__ANDROID__) && (defined(__i386__) || defined(__i686__) || defined(__x86_64__))
 SWIFT_CC(swift) SWIFT_RUNTIME_STDLIB_API
 uint64_t swift_float80ToString(char *Buffer, size_t BufferLength,
                               long double Value, bool Debug) {
-#if SWIFT_DTOA_FLOAT80_SUPPORT
+  // SwiftDtoa.cpp automatically enables float80 on platforms that use it for 'long double'
-  return swift_format_float80(Value, Buffer, BufferLength);
+  return swift_dtoa_optimal_float80_p(&Value, Buffer, BufferLength);
 #else
  // Use this when 'long double' is not true Float80
  return swift_floatingPointToString<long double>(Buffer, BufferLength, Value,
                                                  "%0.*Lg", Debug);
 #endif
 }
 #endif
 /// \param[out] LinePtr Replaced with the pointer to the malloc()-allocated
 /// line.  Can be NULL if no characters were read. This buffer should be
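
The stubs above pass `&v` and `&Value` to the `_p` entry points, which read the value from memory rather than receiving it as a floating-point argument. To illustrate why that makes floating-point support in the C environment unnecessary, here is a small sketch that decodes a binary64 value using integer operations only. It is not taken from SwiftDtoa: `binary64_fields`, `decode_binary64_p`, and `classify_binary64` are hypothetical names, and the value is assumed to be stored in the host's native byte order.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    int      negative;    /* sign bit */
    int      biased_exp;  /* 11-bit exponent field, 0..2047 */
    uint64_t fraction;    /* low 52 bits of the significand */
} binary64_fields;

/* Decode a binary64 value from memory without using any floating-point type. */
static binary64_fields decode_binary64_p(const void *p) {
    uint64_t raw;
    binary64_fields f;
    memcpy(&raw, p, sizeof raw);  /* native byte order assumed */
    f.negative   = (int)(raw >> 63);
    f.biased_exp = (int)((raw >> 52) & 0x7FF);
    f.fraction   = raw & ((UINT64_C(1) << 52) - 1);
    return f;
}

/* Classify from the integer fields alone: an all-ones exponent means
   infinity or NaN, an all-zeros exponent means zero or subnormal. */
static const char *classify_binary64(binary64_fields f) {
    if (f.biased_exp == 0x7FF) return f.fraction == 0 ? "infinity" : "nan";
    if (f.biased_exp == 0)     return f.fraction == 0 ? "zero" : "subnormal";
    return "normal";
}
```

A formatter built this way can continue with integer arithmetic on the exponent and significand, which is consistent with the commit's claim that the core routines need no floating-point support from the C library.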
--- a/test/stdlib/PrintFloat.swift.gyb
+++ b/test/stdlib/PrintFloat.swift.gyb
@@ -6,9 +6,6 @@
 // RUN: %line-directive %t/FloatingPointPrinting.swift -- %target-run %t/main.out --locale ru_RU.UTF-8
 // REQUIRES: executable_test
 // With a non-optimized stdlib the test takes very long.
 // REQUIRES: optimized_stdlib
 import StdlibUnittest
 #if canImport(Darwin)
  import Darwin