Mirror of https://github.com/apple/swift.git, synced 2025-12-21 12:14:44 +01:00
[stdlib] Add bookkeeping to keep track of the encoding of strings and indices

Assign some previously reserved bits in String.Index and _StringObject to keep track of their associated storage encoding (either UTF-8 or UTF-16).

None of these bits will be reliably set in processes that load binaries compiled with older stdlib releases, but when they do end up getting set, we can use them opportunistically to more reliably detect cases where an index is applied on a string with a mismatching encoding.

As more and more code gets recompiled with 5.7+, the stdlib will gradually become able to detect such issues with complete accuracy.

Code that misuses indices this way was always considered broken; however, String wasn’t able to reliably detect these runtime errors before. Therefore, I expect there is a large amount of broken code out there that keeps using bridged Cocoa String indices (UTF-16) after a mutation turns them into native UTF-8 strings. Therefore, instead of trapping, this commit silently corrects the issue, transcoding the offsets into the correct encoding.

It would probably be a good idea to also emit a runtime warning in addition to recovering from the error. This would generate some noise that would gently nudge folks to fix their code.

rdar://89369680
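
The recovery path described in the commit message can be sketched with public APIs only. This is a rough model, not the stdlib's implementation: `StorageEncoding`, `TaggedOffset`, and `utf8Offset(of:in:)` are hypothetical names invented for illustration, and the real bookkeeping lives in reserved bits of String.Index and _StringObject rather than in a separate struct.

```swift
// Illustrative model only: these types are hypothetical and do not mirror
// the stdlib's actual bit layout for String.Index or _StringObject.
enum StorageEncoding {
  case utf8
  case utf16
}

struct TaggedOffset {
  var offset: Int
  // nil models an index created by code built against an older stdlib,
  // where the encoding bit was never set.
  var encoding: StorageEncoding?
}

// Resolve an offset against a native (UTF-8) string. An offset known to be
// UTF-16 is transcoded instead of trapping; anything else is used as-is.
func utf8Offset(of index: TaggedOffset, in string: String) -> Int {
  guard index.encoding == .utf16 else { return index.offset }
  let utf16 = string.utf16
  let position = utf16.index(utf16.startIndex, offsetBy: index.offset)
  return string.utf8.distance(from: string.utf8.startIndex, to: position)
}

let text = "h\u{E9}llo" // "é" is one UTF-16 code unit but two UTF-8 bytes
let stale = TaggedOffset(offset: 2, encoding: .utf16)
let resolved = utf8Offset(of: stale, in: text)
```

In this sketch the UTF-16 offset 2 (the first "l") resolves to UTF-8 offset 3, because "é" occupies two bytes in UTF-8. An untagged offset passes through unchanged, which mirrors the commit's point that mismatches can only be detected when the bits happen to be set.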
@@ -233,6 +233,12 @@ extension _StringGuts {
     return self.withFastUTF8 { _decodeScalar($0, startingAt: i).0 }
   }
 
+  @_alwaysEmitIntoClient
+  @inline(__always)
+  internal func isOnUnicodeScalarBoundary(_ offset: Int) -> Bool {
+    isOnUnicodeScalarBoundary(String.Index(_encodedOffset: offset))
+  }
+
   @usableFromInline
   @_effects(releasenone)
   internal func isOnUnicodeScalarBoundary(_ i: String.Index) -> Bool {
@@ -244,7 +250,7 @@ extension _StringGuts {
 
     if _fastPath(isFastUTF8) {
       return self.withFastUTF8 {
-        return !UTF8.isContinuation($0[i._encodedOffset])
+        return !UTF8.isContinuation($0[_unchecked: i._encodedOffset])
       }
     }
 
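
The fast path in the second hunk treats an offset as a scalar boundary when the byte at that offset is not a UTF-8 continuation byte. A minimal sketch of that check over a plain byte array follows. `isOnScalarBoundary(_:at:)` is a hypothetical helper, not the stdlib function; it bounds-checks once up front rather than using the internal `_unchecked:` subscript, and it assumes the end position counts as a boundary.

```swift
// Hypothetical helper over a plain byte array; the stdlib version operates
// on _StringGuts and is not reproduced here.
func isOnScalarBoundary(_ utf8: [UInt8], at offset: Int) -> Bool {
  precondition(offset >= 0 && offset <= utf8.count, "offset out of bounds")
  // Assumption for this sketch: the end position counts as a boundary.
  if offset == utf8.count { return true }
  // Continuation bytes have the bit pattern 10xxxxxx and never start a scalar.
  return !UTF8.isContinuation(utf8[offset])
}

let bytes = Array("h\u{E9}llo".utf8) // 68 C3 A9 6C 6C 6F
let onLeadByte = isOnScalarBoundary(bytes, at: 1)     // 0xC3 starts "é"
let onContinuation = isOnScalarBoundary(bytes, at: 2) // 0xA9 continues "é"
```

`UTF8.isContinuation(_:)` is the same public stdlib call the diff uses; only the storage access differs.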