Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out of bounds accesses now result in reliable runtime errors. Similarly, attempts at applying an index originally vended by a UTF-8 string on a UTF-16 string now result in a reliable runtime error.
As is usually the case, new traps to the stdlib exposes code that contains previously undiagnosed / unreliably diagnosed coding issues.
Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.
In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)
rdar://93379333
The exhaustive substring.replaceSubrange test probably takes too long
to include in regular testing, but let’s enable it for now: it has
caught a bunch of problems already and it will probably catch more
before this lands.
Misaligned indices were fixed in 5.1, but we should disable the test
when testing back deployment.
Adds a shared helper to StdlibUnittest for the run time check.
* Replace stdlib and test/stdlib 9999 availability.
macOS 9999 -> macOS 10.15
iOS 9999 -> iOS 13
tvOS 9999 -> tvOS 13
watchOS 9999 -> watchOS 6
* Restore the pre-10.15 version of public init?(_: NSRange, in: __shared String)
We need this to allow master to work on 10.14 systems (in particular, to allow PR testing to work correctly without disabling back-deployment tests).
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.
SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The below shows what views need to check for alignment before
they can operate, and whether the indices they produce are aligned.
┌───────────────╥────────────────────┬──────────────────────────┐
│ View ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8 ║ no │ no │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16 ║ yes │ no │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8 ║ yes │ no │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no │ no │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes │ yes │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character ║ yes │ yes │
└───────────────╨────────────────────┴──────────────────────────┘
The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:
* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
Simplify String.Index by sinking transcoded offsets into the .utf8
variant. This is in preparation for a more resilient index type
capable of supporting existential string indices.