NFC except that I added swift_errorRetain and swift_errorRelease
functions on non-ObjC targets so that we have consistent
functions to call in the runtime. I have not yet updated the rest of
the runtime to use these, nor have I changed the compiler to call them.
The file static-executable-args.lnk was added to the stdlib install
component, but stdlib didn't depend on it, so when invoking
install-stdlib or install-swift-components, the file was not being
generated.
Because of the magic target marked as `ALL`, when one built all the
targets and later issued an `install-*` target, the file happened to be
there by lucky coincidence.
This change simply adds that dependency, so installing the stdlib also
generates the file.
When available, take advantage of precomputed protocol conformances in the shared cache on Darwin platforms to accelerate conformance lookups and cut down on memory usage.
We consult the precomputed conformances before doing the runtime's standard conformance lookup. When a conformance is precomputed, this avoids the slow linear scan of all loaded conformances for the first access, and it avoids the memory cost of storing the conformance in the cache.
When the shared cache has no images overridden (the normal case), then we can also skip scanning conformances in the shared cache in all circumstances, because the precomputed conformances will always cover those results. This greatly speeds up the slow linear scan for the initial lookup of anything that doesn't have a conformance in the shared cache, including lookups with conformances in apps or app frameworks, and negative lookups.
A validation mode is available by setting the SWIFT_DEBUG_VALIDATE_SHARED_CACHE_PROTOCOL_CONFORMANCES environment variable. When enabled, results from the precomputed conformances are compared with the results from a slow scan of the conformances in the shared cache. This is extremely slow, but good at catching bugs in the system.
When the calls for precomputed conformances are unavailable, the new code is omitted and we remain in the same situation as before, with the runtime performing all lookups by scanning conformance sections in all loaded Swift images.
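For illustration, here is a small self-contained Swift example (not taken from this change) of the kind of dynamic conformance lookup that goes through this path; with the precomputed tables available, the runtime can answer it without scanning the shared cache's conformance sections:
```
// Hypothetical example for illustration only; the protocol and types here are
// not part of this change.
protocol Describable { var summary: String { get } }
extension Int: Describable { var summary: String { "Int(\(self))" } }

func describe(_ value: Any) -> String {
    // This dynamic check calls into the runtime's conformance lookup, which now
    // consults the shared cache's precomputed conformances before scanning the
    // conformance sections of loaded images.
    if let d = value as? Describable { return d.summary }
    return "not Describable"
}

print(describe(42))       // "Int(42)" – conformance lives in the app binary
print(describe("hello"))  // "not Describable" – a negative lookup, which no
                          // longer requires scanning the shared cache
```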
rdar://71128983
The new cast logic checks and aborts if a non-nullable pointer has a null value
at runtime. However, because this was tolerated by the old casting logic, some
apps inadvertently rely on being able to cast such null references.
This change adds specific checks for null pointers in compatibility mode and
handles them similarly to the previous casting logic:
* Casting to Obj-C or CF type: casting a null pointer succeeds
* Casting to class-constrained existential: casting a null pointer succeeds
* Casting to a Swift class type: cast fails without crashing
* Bridging from Obj-C class to Swift struct type: cast fails without crashing
This also adds a guard to avoid trying to look up the dynamic type of a null
class reference.
In non-compatibility mode, all of the above cause an immediate runtime crash.
The changes here are only intended to support existing binaries running against
new Swift runtimes.
Resolves rdar://72323929
In the uncached case, we'd scan conformances, cache them, then re-query the cache. This worked fine when the cache always grew, but now we clear the cache when loading new Swift images into the process. If that happens between the scan and the re-query, we lose the entry and return a false negative.
Instead, track what we've found in the scan in a separate local table, then query that after completing the scan.
While we're in there, fix a bug in TypeLookupError where operator= accidentally copied this->Context instead of other.Context. This caused the runtime to crash when trying to print error messages due to the false negative.
Add a no-parameter constructor to TypeLookupErrorOr<> to distinguish the case where it's being initialized with nothing from the case where it's being initialized with nullptr.
* Harden `isKindOfClass:` check
We've seen some Obj-C NSProxy subclass implementations that are broken and will
crash if you send them a standard `isKindOfClass:` message. This surfaced because
the casting machinery is now more aggressive about trying different casting options,
which means some of these broken objects are being queried in new ways.
This adds an additional check before calling `isKindOfClass:` to verify that
this object is not using the default NSProxy selector routing. If it is,
we just assume it's not whatever class is being tested for since we cannot
do anything better.
Resolves rdar://73799124
* Fixes from Mike Ash
* Fix build
This restores the earlier behavior of Optionals cast to
AnyHashable, so that [String?:Any] dictionaries cast
to [AnyHashable:Any] can be indexed by plain String
keys.
This is a little problematic because it's not consistent with the
compiler-optimized casts.
But the ability to index such dictionaries by plain String
keys seems important to preserve. SR-9047 will expand
Optional/AnyHashable interoperability so that such
indexing works without this special case.
An earlier proposal would link-check this, changing the behavior
depending on the caller. But it's not really workable to
change the behavior seen by intermediate frameworks depending
on the app they're being called by.
As part of making casting more consistent, the behavior of Optional -> AnyHashable casts
was changed in some cases. This PR provides a hook for re-enabling the old behavior
in certain contexts.
Background: Most of the time, casts from String? to AnyHashable get optimized
to simply inject the String? into the AnyHashable, so the following
has long been true and remains true in Swift 5.4:
```
let s = "abc"
let o: String? = s
// Next test is true because s is promoted to Optional<String>
print(s == o)
// Next test is false: Optional<String> and String are different types
print(s as AnyHashable == o as AnyHashable)
```
But when casts ended up going through the runtime, Swift 5.3 behaved
differently, as you could see by casting a dictionary with `String?` keys (in
the generic dictionary code, key and value casts always use the runtime logic). In
the following code, both print statements behave differently in Swift 5.4 than
before:
```
let a: [String?:String] = ["Foo":"Bar"]
let b = a as [AnyHashable:Any]
print(b["Foo"] == "Bar") // Works before Swift 5.4
print(b["Foo" as String?] == "Bar") // Works in Swift 5.4 and later
```
Old behavior: The `String?` keys would get unwrapped to `String` before being injected into AnyHashable. This allows the first to work but strangely breaks the second.
New behavior: The `String?` keys do not get unwrapped. This breaks the first but makes the second work.
TODO: The long-term goal, of course, is for `AnyHashable("Foo" as String?)` to
test equal to `AnyHashable("Foo")` (and hash the same). Once that happens,
all of the tests above will succeed.
Resolves rdar://73301155
* SwiftDtoa v2: Better, Smaller, Faster floating-point formatting
SwiftDtoa is the C/C++ code used in the Swift runtime to produce the textual representations used by the `description` and `debugDescription` properties of the standard Swift floating-point types.
This update includes a number of algorithmic improvements to SwiftDtoa to improve portability, reduce code size, and improve performance but does not change the actual output.
About SwiftDtoa
===============
In early versions of Swift, the `description` properties used the C library `sprintf` functionality with a fixed number of digits.
In 2018, that logic was replaced with the first version of SwiftDtoa, which used a fast, adaptive algorithm to automatically choose the correct number of digits for a particular value.
The resulting decimal output is always:
* Accurate. Parsing the decimal form will yield exactly the same binary floating-point value again. This guarantee holds for any parser that accurately implements IEEE 754. In particular, the Swift standard library can guarantee that for any Double `d` that is not a NaN, `Double(d.description) == d`.
* Short. Among all accurate forms, this form has the fewest significant digits. (Caution: Surprisingly, this is not the same as minimizing the number of characters. In some cases, minimizing the number of characters requires producing additional significant digits.)
* Close. If there are multiple accurate, short forms, this code chooses the decimal form that is closest to the exact binary value. If two forms are exactly the same distance away, the one with an even final digit is used.
Algorithms that can produce this "optimal" output have been known since at least 1990, when Steele and White published their Dragon4 algorithm.
However, Dragon4 and other algorithms from that period relied on high-precision integer arithmetic, which made them slow.
More recently, a surge of interest in this problem has produced dramatically better algorithms that can produce the same results using only fast fixed-precision arithmetic.
This format is ideal for JSON and other textual interchange: accuracy ensures that the value will be correctly decoded, shortness minimizes network traffic, and the existence of high-performance algorithms allows this form to be generated more quickly than many `printf`-based implementations.
This format is also ideal for logging, debugging, and other general display. In particular, the shortness guarantee avoids the confusion of unnecessary additional digits, so that the result of `1.0 / 10.0` consistently displays as `0.1` instead of `0.100000000000000000001`.
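As a small Swift illustration of these guarantees (using only standard library API, nothing added by this change):
```
let d = 1.0 / 10.0
print(d)                            // "0.1" – the shortest accurate form
assert(Double(d.description) == d)  // parsing the description round-trips exactly
print(0.1 + 0.2)                    // "0.30000000000000004" – extra digits appear
                                    // only when needed for accuracy
```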
About SwiftDtoa v2
==================
Compared to the original SwiftDtoa code, this update is:
**Better**:
The core logic is implemented using only C99 features with 64-bit and smaller integer arithmetic.
If available, 128-bit integers are used for better performance.
The core routines do not require any floating-point support from the C/C++ standard library and with only minor modifications should be usable on systems with no hardware or software floating-point support at all.
This version also has experimental support for IEEE 754 binary128 format, though this support is obviously not included when compiling for the Swift standard library.
**Smaller**:
Code size reduction compared to the earlier versions was a primary goal for this effort.
In particular, the new binary128 support shares essentially all of its code with the float80 implementation.
**Faster**:
Even with the code size reductions, all formats are noticeably faster.
The primary performance gains come from three major changes:
* Text digits are now emitted directly in the core routines in a form that requires only minimal adjustment to produce the final text.
* Digit generation produces 2, 4, or even 8 digits at a time, depending on the format.
* The double logic optimistically produces 7 digits in the initial scaling, with Ryu-inspired backtracking when fewer digits suffice.
SwiftDtoa's algorithms
======================
SwiftDtoa started out as a variation of Florian Loitsch's Grisu2 that addressed the shortness failures of that algorithm.
Subsequent work has incorporated ideas from Errol3, Ryu, and other sources to yield a production-quality implementation that is performance- and size-competitive with current research code.
Those who wish to understand the details can read the extensive comments included in the code.
Note that float16 actually uses a different algorithm than the other formats, as the extremely limited range can be handled with much simpler techniques.
The float80/binary128 logic sacrifices some performance optimizations in order to minimize the code size for these less-used formats; the goal for SwiftDtoa v2 has been to match the float80 performance of earlier implementations while reducing code size and widening the arithmetic routines sufficiently to support binary128.
SwiftDtoa Testing
=================
A newly-developed test harness generates several large files of test data that include known-correct results computed with high-precision arithmetic routines.
The test files include:
* Critical values generated by the algorithm presented in the Errol paper (about 48 million cases for binary128)
* Values for which the optimal decimal form is exactly midway between two binary floating-point values.
* All exact powers of two representable in this format.
* Floating-point values that are close to exact powers of ten.
In addition, several billion random values for each format were compared to the results from other implementations.
For binary16 and binary32 this provided exhaustive validation of every possible input value.
Code Size and Performance
=========================
The tables below summarize the code size and performance for the SwiftDtoa C library module by itself on several different processor architectures.
When used from Swift, the `.description` and `.debugDescription` implementations incur additional overhead for creating and returning Swift strings that are not captured here.
The code size tables show the total size in bytes of the compiled `.o` object files for a particular version of that code.
The headings indicate the floating-point formats supported by that particular build (e.g., "16,32" for a version that supports binary16 and binary32 but no other formats).
The performance numbers below were obtained from a custom test harness that generates random bit patterns, interprets them as the corresponding floating-point value, and averages the overall time.
For float80, the random bit patterns were generated in a way that avoids generating invalid values.
All code was compiled with the system C/C++ compiler using `-O2` optimization.
A few notes about particular implementations:
* **SwiftDtoa v1** is the original SwiftDtoa implementation as committed to the Swift runtime in April 2018.
* **SwiftDtoa v1a** is the same as SwiftDtoa v1 with added binary16 support.
* **SwiftDtoa v2** can be configured with preprocessor macros to support any subset of the supported formats. I've provided sizes here for several different build configurations.
* **Ryu** (Ulf Adams) implements binary32 and binary64 as completely independent source files. The size here is the total size of the two `.o` object files.
* **Ryu(size)** is Ryu compiled with the `RYU_OPTIMIZE_SIZE` option.
* **Dragonbox** (Junekey Jeon). The size here is the compiled size of a simple `.cpp` file that instantiates the template for the specified formats, plus the size of the associated text output logic.
* **Dragonbox(size)** is Dragonbox compiled to minimize size by using a compressed power-of-10 table.
* **gdtoa** has a very large feature set. For this reason, I excluded it from the code size comparison since I didn't consider the numbers to be comparable to the others.
x86_64
----------------
These were built using Apple clang 12.0.5 on a 2019 16" MacBook Pro (2.4GHz 8-core Intel Core i9) running macOS 11.1.
**Code Size**
Bold numbers here indicate the configurations that have shipped as part of the Swift runtime.
| | 16,32,64,80 | 32,64,80 | 32,64 |
|---------------|------------:|------------:|------------:|
|SwiftDtoa v1 | | **15128** | |
|SwiftDtoa v1a | **16888** | | |
|SwiftDtoa v2 | **20220** | 18628 | 8248 |
|Ryu | | | 40408 |
|Ryu(size) | | | 23836 |
|Dragonbox | | | 23176 |
|Dragonbox(size)| | | 15132 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 25ns | 46ns | 82ns | |
|SwiftDtoa v1a | 37ns | 26ns | 47ns | 83ns | |
|SwiftDtoa v2 | 22ns | 19ns | 31ns | 72ns | 90ns |
|Ryu | | 19ns | 26ns | | |
|Ryu(size) | | 17ns | 24ns | | |
|Dragonbox | | 19ns | 24ns | | |
|Dragonbox(size) | | 19ns | 29ns | | |
|gdtoa | 220ns | 381ns | 1184ns | 16044ns | 22800ns |
ARM64
----------------
These were built using Apple clang 12.0.0 on a 2020 M1 Mac Mini running macOS 11.1.
**Code Size**
| | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1 | | 7436 |
|SwiftDtoa v1a | 9124 | |
|SwiftDtoa v2 | 9964 | 8228 |
|Ryu | | 35764 |
|Ryu(size) | | 16708 |
|Dragonbox | | 27108 |
|Dragonbox(size)| | 19172 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 21ns | 39ns | | |
|SwiftDtoa v1a | 17ns | 21ns | 39ns | | |
|SwiftDtoa v2 | 15ns | 17ns | 29ns | 54ns | 71ns |
|Ryu | | 15ns | 19ns | | |
|Ryu(size) | | 29ns | 24ns | | |
|Dragonbox | | 16ns | 24ns | | |
|Dragonbox(size) | | 15ns | 34ns | | |
|gdtoa | 143ns | 242ns | 858ns | 25129ns | 36195ns |
ARM32
----------------
These were built using clang 8.0.1 on a BeagleBone Black (500MHz ARMv7) running FreeBSD 12.1-RELEASE.
**Code Size**
| | 16,32,64 | 32,64 |
|---------------|---------:|------:|
|SwiftDtoa v1 | | 8668 |
|SwiftDtoa v1a | 10356 | |
|SwiftDtoa v2 | 9796 | 8340 |
|Ryu | | 32292 |
|Ryu(size) | | 14592 |
|Dragonbox | | 29000 |
|Dragonbox(size)| | 21980 |
**Performance**
| | binary16 | binary32 | binary64 | float80 | binary128 |
|--------------|---------:|---------:|---------:|--------:|----------:|
|SwiftDtoa v1 | | 459ns | 1152ns | | |
|SwiftDtoa v1a | 383ns | 451ns | 1148ns | | |
|SwiftDtoa v2 | 202ns | 357ns | 715ns | 2720ns | 3379ns |
|Ryu | | 345ns | 5450ns | | |
|Ryu(size) | | 786ns | 5577ns | | |
|Dragonbox | | 300ns | 904ns | | |
|Dragonbox(size) | | 294ns | 1021ns | | |
|gdtoa | 2180ns | 4749ns | 18742ns |293000ns | 440000ns |
* This is fast enough now even for non-optimized test runs
* Fix float80 NaN/Inf parsing, comment more thoroughly
Introduce `@concurrent` attribute on function types, including:
* Parsing as a type attribute
* (De-/re-/)mangling for concurrent function types
* Implicit conversion from @concurrent to non-@concurrent
* (De-)serialization for concurrent function types
* AST printing and dumping support
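A brief syntax sketch of what the list above describes; `@concurrent` was still in flux at this point, so the exact spelling and rules may differ in later Swift releases:
```
// Sketch only – reflects the attribute as introduced here, not its final form.

// `@concurrent` is parsed as a type attribute on function types:
func runDetached(_ operation: @concurrent () -> Void) {
    operation()
}

let concurrentClosure: @concurrent () -> Void = { print("may run concurrently") }

// Implicit conversion from @concurrent to non-@concurrent is allowed:
let plainClosure: () -> Void = concurrentClosure
```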
This gives us build-time warnings about format string mistakes, like we would get if we called the built-in asprintf directly.
Make TypeLookupError's format string constructor a macro instead so that its callers can get these build-time warnings.
This reveals various mistakes in format strings and arguments in the runtime, which are now fixed.
rdar://73417805
Background: We've noticed a lot of problems from Obj-C APIs that returned null
even though they were declared to never do so. These mismatches subvert Swift's
type system and can lead to hard-to-diagnose crashes much later in the program.
This fatal error was introduced into the primary casting function to help catch
such problems closer to the point where they occur so developers could more
easily identify and fix them.
However, there's been some concern about what this means for old binaries, so
we're considering a check here that would allow the old behavior in certain
cases yet to be determined. This PR adds the framework for such a check.
Resolves rdar://72323929
When building for Windows x86, we do not have `__i686__` defined, but do
have `__i386__` defined. Ensure that the routines are included for the
x86 Windows target.
Use the StackAllocator as the task allocator.
TODO: we could pass a pre-allocated initial slab to the allocator, allocated either on the stack or with the parent task's allocator.
rdar://problem/71157018
A StackAllocator performs fast allocation and deallocation of memory by implementing a bump-pointer allocation strategy.
In contrast to a pure bump-pointer allocator, it's possible to free memory.
Allocations and deallocations must follow a strict stack discipline.
In general, slabs which become unused are _not_ freed, but reused for subsequent allocations.
The first slab can be placed into pre-allocated memory.
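The runtime's StackAllocator is C++; the following is a minimal Swift sketch of the same idea (single fixed-size slab, no slab reuse or pre-allocated first slab) to illustrate the bump-pointer, strict-stack-discipline scheme:
```
// Toy sketch for illustration only; not the runtime's implementation.
final class ToyStackAllocator {
    private let slab: UnsafeMutableRawPointer
    private let capacity: Int
    private var offset = 0

    init(capacity: Int) {
        self.capacity = capacity
        self.slab = UnsafeMutableRawPointer.allocate(byteCount: capacity, alignment: 16)
    }
    deinit { slab.deallocate() }

    // Allocation just bumps a pointer: O(1), no per-allocation bookkeeping.
    func allocate(_ size: Int) -> UnsafeMutableRawPointer {
        precondition(offset + size <= capacity, "slab exhausted")
        defer { offset += size }
        return slab + offset
    }

    // Deallocation must be of the most recent live allocation (stack discipline).
    func deallocate(_ pointer: UnsafeMutableRawPointer, _ size: Int) {
        precondition(pointer == slab + (offset - size), "deallocation violates stack discipline")
        offset -= size
    }
}

let allocator = ToyStackAllocator(capacity: 1024)
let a = allocator.allocate(64)
let b = allocator.allocate(128)
allocator.deallocate(b, 128)   // frees must come in reverse order
allocator.deallocate(a, 64)
```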
Instead of scribbling each allocation as it's parceled out, scribble the entire chunk up-front. Then, when handing out an allocation, check to make sure it still has the right scribbled data in it.
When differentiating loops, no longer allocate boxes for indirect case payloads. Instead, use a custom pullback context in the runtime that contains a bump-pointer allocator.
When a function contains a differentiated loop, the closure context is a `Builtin.NativeObject`, which contains a `swift::AutoDiffLinearMapContext` and a tail-allocated top-level linear map struct (which represents the linear map struct that was previously directly partial-applied into the pullback). In branching trace enums, the payloads of previously indirect cases will be allocated by `swift::AutoDiffLinearMapContext::allocate` and stored as a `Builtin.RawPointer`.
While the existing _forEachField in ReflectionMirror.swift
already gives the offsets and types for each field, this isn't
enough information to construct a keypath for that field in
order to modify it.
For reference, this should be sufficient to implement the features
described here (https://forums.swift.org/t/storedpropertyiterable/19218/62)
purely at runtime, without any derived conformances, for many types.
Note: Since there isn't enough reflection information for
`.mutatingGetSet` fields, we're not able to support reflecting certain
kinds of fields (functions, nonfinal class fields, etc.). Whether this
is treated as an error is controlled by the `.ignoreUnknown` option.
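A hypothetical usage sketch of the new functionality: the entry-point name, the `options:` label, and the closure signature below are assumptions for illustration and are not taken verbatim from this change:
```
// Hypothetical API shape – names and signature are assumed for illustration.
struct Point { var x = 0.0, y = 0.0 }

var p = Point()
_ = _forEachFieldWithKeyPath(of: Point.self, options: .ignoreUnknown) { _, keyPath in
    // `keyPath` is a PartialKeyPath<Point>; stored Double fields can be mutated
    // by downcasting to a WritableKeyPath.
    if let writable = keyPath as? WritableKeyPath<Point, Double> {
        p[keyPath: writable] = 1.0
    }
    return true  // continue with the next field
}
print(p)  // Point(x: 1.0, y: 1.0)
```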
To prevent rdar://problem/68997282 from regressing, verify at runtime in
debug builds that in calls to swift_allocateGenericValueMetadata the
extraDataSize argument matches the OffsetInWords and SizeInWords
specified by the GenericMetadataPartialPattern available within the
pattern argument.
initClassFieldOffsetVector writes the instanceStart and size into the class's rodata. In some cases the values already match, and writing them anyway dirties memory unnecessarily and prevents the compiler from emitting those rodatas into read-only memory.
rdar://problem/71119533
On Darwin, back Mutex with os_unfair_lock, which is much smaller than pthread_mutex_t (4 bytes versus 64) and a bit faster.
However, os_unfair_lock doesn't support condition variables. Most of our uses of Mutex don't need condition variables, but a few do. Introduce ConditionMutex and StaticConditionMutex, which support condition variables and continue to use pthread_mutex_t.
On all other platforms, we continue to use the same backing mutex type for both Mutex and ConditionMutex.
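The runtime's Mutex is C++; the Swift sketch below (illustrative names, not the runtime API) shows the constraint driving the split: os_unfair_lock has no condition-variable API, while pthread_cond_wait requires a pthread_mutex_t:
```
import Darwin
import os

// Illustrative sketch only – not the runtime's C++ Mutex/ConditionMutex.

// Plain mutual exclusion can use the small, fast unfair lock.
final class UnfairLock {
    private let lock: os_unfair_lock_t = {
        let l = os_unfair_lock_t.allocate(capacity: 1)
        l.initialize(to: os_unfair_lock())
        return l
    }()
    deinit { lock.deinitialize(count: 1); lock.deallocate() }

    func withLock<T>(_ body: () -> T) -> T {
        os_unfair_lock_lock(lock)
        defer { os_unfair_lock_unlock(lock) }
        return body()
    }
}

// Anything that needs to wait on a condition must keep using pthreads.
final class ConditionLock {
    private let mutex = UnsafeMutablePointer<pthread_mutex_t>.allocate(capacity: 1)
    private let cond = UnsafeMutablePointer<pthread_cond_t>.allocate(capacity: 1)

    init() {
        pthread_mutex_init(mutex, nil)
        pthread_cond_init(cond, nil)
    }
    deinit {
        pthread_cond_destroy(cond)
        pthread_mutex_destroy(mutex)
        cond.deallocate()
        mutex.deallocate()
    }

    func wait(until predicate: () -> Bool) {
        pthread_mutex_lock(mutex)
        while !predicate() { pthread_cond_wait(cond, mutex) }
        pthread_mutex_unlock(mutex)
    }
    func signal() {
        pthread_mutex_lock(mutex)
        pthread_cond_signal(cond)
        pthread_mutex_unlock(mutex)
    }
}
```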
rdar://problem/45412121