[stdlib] Make String.Index(_:within:) initializers more permissive

In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.
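
Why ASCII strings masked the problem can be illustrated with a small sketch: for ASCII content, the UTF-16 and UTF-8 code unit offsets of a position are identical, so a stale UTF-16 offset happens to land on the right spot after transcoding. The helper `codeUnitOffsets(of:in:)` is hypothetical and exists only for this illustration; it is not part of the standard library or of this commit.

```swift
// Hypothetical helper for illustration only: returns the UTF-16 and UTF-8
// code unit offsets of the first occurrence of `character` in `string`,
// or nil if the character does not occur.
func codeUnitOffsets(of character: Character, in string: String) -> (utf16: Int, utf8: Int)? {
    guard let i = string.firstIndex(of: character) else { return nil }
    let utf16Offset = string.utf16.distance(from: string.utf16.startIndex, to: i)
    let utf8Offset = string.utf8.distance(from: string.utf8.startIndex, to: i)
    return (utf16: utf16Offset, utf8: utf8Offset)
}

// ASCII: both encodings use one code unit per character, so the offsets agree.
let ascii = codeUnitOffsets(of: "l", in: "hello")!
print(ascii.utf16, ascii.utf8)

// Non-ASCII: U+00E9 takes one UTF-16 code unit but two UTF-8 code units,
// so the offsets of everything after it diverge.
let accented = codeUnitOffsets(of: "l", in: "h\u{E9}llo")!
print(accented.utf16, accented.utf8)
```

With non-ASCII content the two offsets differ, which is why reusing an index across an encoding change is invalid in general.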

Since https://github.com/apple/swift/pull/41417, the
`String.Index(_:within:)` initializers recognize miscoded indices and
reject them by returning nil. This is technically correct, but it is
potentially a binary compatibility issue: existing binaries may rely on
these initializers returning non-nil for such indices, as they did in
previous versions.

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use a UTF-8 index
on a UTF-16 string is still rejected; we do not implicitly convert
strings in that direction.)
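
As a rough sketch of what "transcoding the offset" means, the function below maps a UTF-16 code unit offset to the matching UTF-8 code unit offset by walking the string's Unicode scalars. This is not the standard library's implementation (the internal `ensureMatchingEncodingNoTrap` is not shown in this commit page); `utf8Offset(forUTF16Offset:in:)` is a hypothetical name, and rejecting offsets that fall inside a surrogate pair is an assumption made for this example.

```swift
// Hypothetical helper, for illustration only. Converts a UTF-16 code unit
// offset into the equivalent UTF-8 code unit offset within `string`.
// Returns nil if the offset is negative, past the end, or (by assumption
// in this sketch) falls in the middle of a surrogate pair.
func utf8Offset(forUTF16Offset utf16Offset: Int, in string: String) -> Int? {
    guard utf16Offset >= 0 else { return nil }
    var utf16Count = 0
    var utf8Count = 0
    for scalar in string.unicodeScalars {
        if utf16Count == utf16Offset { return utf8Count }
        // Overshooting means the requested offset was inside the
        // surrogate pair of the previous scalar.
        if utf16Count > utf16Offset { return nil }
        let encoded = String(scalar)
        utf16Count += encoded.utf16.count
        utf8Count += encoded.utf8.count
    }
    // The end position is a valid offset too.
    return utf16Count == utf16Offset ? utf8Count : nil
}

// "a" (1 UTF-16 unit, 1 UTF-8 unit), U+00E9 (1, 2), U+1F600 (2, 4), "b" (1, 1)
let sample = "a\u{E9}\u{1F600}b"
print(utf8Offset(forUTF16Offset: 2, in: sample) as Any) // start of U+1F600
print(utf8Offset(forUTF16Offset: 3, in: sample) as Any) // inside the surrogate pair
```

For an all-ASCII string this mapping is the identity, which is consistent with the old behavior appearing to work for ASCII content.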

rdar://89369680
Karoy Lorentey
2022-04-18 21:02:14 -07:00
parent e21b846828
commit 4d557b0b45
6 changed files with 101 additions and 20 deletions

@@ -359,8 +359,17 @@ extension String.UTF8View.Index {
   public init?(_ idx: String.Index, within target: String.UTF8View) {
     // Note: This method used to be inlinable until Swift 5.7.
-    guard target._guts.hasMatchingEncoding(idx) else { return nil }
-    guard idx._encodedOffset <= target._guts.count else { return nil }
+    // As a special exception, we allow `idx` to be a UTF-16 index when `target`
+    // is a UTF-8 string, to preserve compatibility with (broken) code that
+    // keeps using indices from a bridged string after converting the string to
+    // a native representation. Such indices are invalid, but returning nil here
+    // can break code that appeared to work fine for ASCII strings in Swift
+    // releases prior to 5.7.
+    guard
+      let idx = target._guts.ensureMatchingEncodingNoTrap(idx),
+      idx._encodedOffset <= target._guts.count
+    else { return nil }
     if _slowPath(target._guts.isForeign) {
       guard idx._foreignIsWithin(target) else { return nil }
     } else {