[stdlib] Make String.Index(_:within:) initializers more permissive

In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.
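
For illustration, a minimal sketch of that pattern (the string and
names here are hypothetical, and whether the value actually stays
lazily bridged as UTF-16 depends on the platform and contents):

    import Foundation

    // Hypothetical sketch of the broken pattern: an index is captured
    // while the string is still backed by a Cocoa (UTF-16) buffer,
    // then kept across a mutation that transcodes the storage to UTF-8.
    var text = NSString(string: "café au lait") as String
    let oldIndex = text.unicodeScalars.index(text.startIndex, offsetBy: 5)
    text.makeContiguousUTF8()  // storage is now native UTF-8
    // `oldIndex` still encodes a UTF-16 offset. For ASCII-only contents
    // the UTF-8 and UTF-16 offsets coincide, which is why such code
    // could appear to keep working.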

Since https://github.com/apple/swift/pull/41417, the
`String.Index(_:within:)` initializers recognize such miscoded indices
and reject them by returning nil. This is technically correct, but it
is potentially a binary compatibility issue, as these initializers
used to return non-nil values in previous Swift releases.
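
In client terms, that compatibility surface looks roughly like this
(a hedged sketch continuing the snippet above, not code from the
actual report):

    // `oldIndex` was captured before makeContiguousUTF8() switched
    // `text` over to UTF-8 storage.
    if let i = String.Index(oldIndex, within: text.unicodeScalars) {
      // Swift 5.6 and earlier took this branch (correctly only for
      // ASCII contents); after this change it is taken again, with
      // the offset transcoded to UTF-8.
      print(text.unicodeScalars[i])
    } else {
      // Swift 5.7 with only #41417: the miscoded index is rejected
      // as nil, which existing binaries did not expect.
      print("index rejected")
    }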

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use a UTF-8 index
on a UTF-16 string is still rejected; we do not implicitly convert
strings in that direction.)
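
Conceptually, "transcoding the offset" means mapping a UTF-16
code-unit offset onto the equivalent UTF-8 code-unit offset in the
same contents. The helper below is only an illustration built from
public API (its name is made up here); internally the stdlib goes
through the `ensureMatchingEncodingNoTrap` path shown in the diff
below:

    // Illustration only, not the stdlib's internal implementation.
    func utf8Offset(forUTF16Offset utf16Offset: Int, in string: String) -> Int? {
      let utf16 = string.utf16
      guard let utf16Index = utf16.index(
        utf16.startIndex, offsetBy: utf16Offset, limitedBy: utf16.endIndex
      ) else { return nil }
      // There is no exact UTF-8 counterpart if the offset points into
      // the middle of a surrogate pair; samePosition(in:) reports that
      // by returning nil.
      guard let utf8Index = utf16Index.samePosition(in: string.utf8) else {
        return nil
      }
      return string.utf8.distance(from: string.utf8.startIndex, to: utf8Index)
    }

For example, utf8Offset(forUTF16Offset: 4, in: "café") is 5, since
"é" takes one UTF-16 code unit but two UTF-8 bytes.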

rdar://89369680
Author: Karoy Lorentey
Date: 2022-04-18 21:02:14 -07:00
parent e21b846828
commit 4d557b0b45
6 changed files with 101 additions and 20 deletions

@@ -429,13 +429,13 @@ extension String.UnicodeScalarIndex {
     within unicodeScalars: String.UnicodeScalarView
   ) {
     guard
-      unicodeScalars._guts.hasMatchingEncoding(sourcePosition),
-      sourcePosition._encodedOffset <= unicodeScalars._guts.count,
-      unicodeScalars._guts.isOnUnicodeScalarBoundary(sourcePosition)
+      let i = unicodeScalars._guts.ensureMatchingEncodingNoTrap(sourcePosition),
+      i._encodedOffset <= unicodeScalars._guts.count,
+      unicodeScalars._guts.isOnUnicodeScalarBoundary(i)
     else {
       return nil
     }
-    self = sourcePosition
+    self = i
   }

   /// Returns the position in the given string that corresponds exactly to this