[String.Index] Deprecate encodedOffset var/init

String.Index has an encodedOffset-based initializer and computed
property that exist for serialization purposes. The SE proposal that
introduced them documented the offset as UTF-16, which was String's
underlying encoding at the time, but even then the goal for String
was to abstract away whatever encoding happened to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets have to align with the view they were
measured against. Now that String uses UTF-8 for its native contents
in Swift 5, serializing in terms of UTF-16 isn't necessarily the most
efficient choice either.
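
A sketch (illustrative only, not from this commit) of why an archived
offset is meaningless without knowing which view it was measured in:

    let s = "héllo"
    let i = s.firstIndex(of: "l")!

    // The same index maps to different integer offsets per view:
    let utf8Offset  = s.utf8.distance(from: s.utf8.startIndex, to: i)   // 3: "h" + 2 bytes for "é"
    let utf16Offset = s.utf16.distance(from: s.utf16.startIndex, to: i) // 2: "h" + "é"

    // A raw "encodedOffset" of 3 round-trips correctly only if both
    // sides agree it is a UTF-8 offset; read as UTF-16 it points past "l".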

Furthermore, the majority of encodedOffset usage in the wild is
buggy: it operates under the assumption that a UTF-16 code unit is a
Swift Character, which doesn't hold even for strings known to be
all-ASCII (because CR-LF forms a single Character).
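
For example (illustrative, not part of the commit), even an all-ASCII
string breaks the one-code-unit-per-Character assumption:

    let s = "a\r\nb"      // all-ASCII source text
    print(s.count)        // 3 -- "a", "\r\n", "b": CR-LF is one Character
    print(s.utf16.count)  // 4 -- "a", CR, LF, "b": four code units

    // Code that feeds a Character count into Index(encodedOffset:)
    // lands at offset 3 ("b"), not at the string's endIndex (offset 4).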

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current misuses of encodedOffset.

Author: Michael Ilseman
Date:   2019-01-25 16:39:06 -08:00
Parent: 706f15f228
Commit: 415cc8fb0c
23 changed files with 223 additions and 111 deletions

@@ -274,11 +274,11 @@ extension _StringGuts {
   @inlinable
   internal var startIndex: String.Index {
-    @inline(__always) get { return Index(encodedOffset: 0) }
+    @inline(__always) get { return Index(_encodedOffset: 0) }
   }
   @inlinable
   internal var endIndex: String.Index {
-    @inline(__always) get { return Index(encodedOffset: self.count) }
+    @inline(__always) get { return Index(_encodedOffset: self.count) }
   }
 }