These simply expose the preexisting internal
`_StringGuts.validate*Index` functions that indexing operations
use to implicitly round indices down to the nearest valid index. (Or, in the case of the encoding views, the nearest scalar boundary.)
Being able to do this as a standalone, explicit, efficient operation
is crucial when implementing some `String` algorithms that need to
work with arbitrary indices.
Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out of bounds accesses now result in reliable runtime errors. Similarly, attempts at applying an index originally vended by a UTF-8 string on a UTF-16 string now result in a reliable runtime error.
As is usually the case, new traps to the stdlib exposes code that contains previously undiagnosed / unreliably diagnosed coding issues.
Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.
In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)
rdar://93379333
In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.
Since https://github.com/apple/swift/pull/41417, the
`String(_:within:)` initializers recognize miscoded indices and reject
them by returning nil. This is technically correct, but it
unfortunately may be a binary compatibility issue, as these used to
return non-nil in previous versions.
Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use an UTF-8 index
on a UTF-16 string is still rejected — we do not implicitly convert
strings in that direction.)
rdar://89369680
* Replace stdlib and test/stdlib 9999 availability.
macOS 9999 -> macOS 10.15
iOS 9999 -> iOS 13
tvOS 9999 -> tvOS 13
watchOS 9999 -> watchOS 6
* Restore the pre-10.15 version of public init?(_: NSRange, in: __shared String)
We need this to allow master to work on 10.14 systems (in particular, to allow PR testing to work correctly without disabling back-deployment tests).
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
Beside the general goal to remove inlinable functions, this reduces code size and also improves performance for several benchmarks.
The performance problem was that by inlining top-level String API functions into client code (like String.count) it ended up calling non-inlinable internal String functions eventually.
This is much slower than to make a single call at the top-level API boundary into the library. Inside the library all the internal String functions can be specialized and inlined.
rdar://problem/39921548
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.
I failed to merge the upstream changes to swift-corelibs-foundation at the same
time as I merged that #9806, and it broke on linux. Going to get it right this
time.
This removes the `_core` property from UnicodeScalarView.Index
and moves any remaining index-moving logic from the index to
the view in UnicodeScalarView and CharacterView.
This documentation revision covers a large number of types & protocols:
String, its views and their indices, the Unicode codec types and protocol,
as well as Character, UnicodeScalar, and StaticString, among others.
This also includes a few small changes across the standard library for
consistency.
The RangeProtocol was a very weak and fragile abstraction because it
didn't specify the interpretation of the endpoints. To write a
non-trivial algorithm, one usually needed to consult that information.
The standard library code only actually worked correctly with half-open
and closed ranges (and didn't handle fully open ranges, for example).
The other two protocols, HalfOpenRangeProtocol and ClosedRangeProtocol,
were only used for code sharing, and present an ABI burden. We can use
gyb instead.