* Implement GraphemeWalker that does native grapheme breaking
* Bridged strings use native grapheme breaking for forward strides
* Implement bidirectional native grapheme breaking for native and foreign strings
* Remove ICU's grapheme breaking support
* Use UnicodeScalarView to implement GraphemeWalker
use an Iterator approach
remove Iterator conformance
* Incorporate Michael's feedback
more comments addressed
fix crlf bug
* Try bringing back some old fast paths
* Parameterize nextBoundary and previousBoundary
Parameterize nextBoundary and previousBoundary
* Implement Michael's suggestions
ICU will return different results if we call with an offset into a
code unit buffer vs if we slice the buffer first and provide an offset
of zero. Slicing more closely models the semantics of SE-0180, so use
that.
Test case coming in subsequent commit enforcing index
scalar-alignment.
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
* Refactor out RRC implementation into dedicated file.
* Change our `_invariantCheck` pattern to generate efficient code in
asserts builds and make the optimizer job's easier.
* Drop a few Bidi shims we no longer need.
* Restore View decls to String, workaround no longer needed
* Cleaner unicode helper facilities
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
Simplify String.Index by sinking transcoded offsets into the .utf8
variant. This is in preparation for a more resilient index type
capable of supporting existential string indices.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.