String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
* Refactor out RRC implementation into dedicated file.
* Change our `_invariantCheck` pattern to generate efficient code in
asserts builds and make the optimizer job's easier.
* Drop a few Bidi shims we no longer need.
* Restore View decls to String, workaround no longer needed
* Cleaner unicode helper facilities
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
As a general rule, it is safe to mark the implementation of hash(into:) and _rawHashValue(seed:) for @_fixed_layout structs as inlinable.
However, some structs (like String guts, Character, KeyPath-related types) have complicated enough hashing that it seems counterproductive to inline them. Mark these with @effects(releasenone) instead.
Move the shifts to index creation time rather than index comparison
time. This seems to benefit micro benchmarks and cover up
inefficiencies in our generic index distance calculations.
Simplify String.Index by sinking transcoded offsets into the .utf8
variant. This is in preparation for a more resilient index type
capable of supporting existential string indices.
String.Index is 3 words in size, which means that Range<String.Index>
is 6, and Substring is 8 words total. This is pretty wasteful, so make
a very minor adjustment to the index cache's UTF-8 buffer to bring it
down to 2 words total.
Do other simplifications too.
This includes various revisions to the APIs landing in Swift 4.2, including:
- Random and other randomness APIs
- Hashable changes
- MemoryLayout.offset(of:)
In theory there could be a "fixed-layout" enum that's not exhaustive
but promises not to add any more cases with payloads, but we don't
need that distinction today.
(Note that @objc enums are still "fixed-layout" in the actual sense of
"having a compile-time known layout". There's just no special way to
spell that.)
This is a step along the way toward handling backward-compatiblity of UTF8View
slicing and preventing inadvertent creation of String instances that keep
inaccessible memory alive.
I failed to merge the upstream changes to swift-corelibs-foundation at the same
time as I merged that #9806, and it broke on linux. Going to get it right this
time.