Carefully overhaul our word breaking implementation to follow the recommendations of Unicode Annex #29. Start exposing the core primitives (as well as `String`-level interfaces), so that folks can prototype proper API for these concepts.
- Fix `_wordIndex(after:)` to always advance forward. It now requires its input index to be on a word boundary. Remove the `@_spi` attribute, exposing it as a (hidden, but) public entry point.
- The old SPIs `_wordIndex(before:)` and `_nearestWordIndex(atOrBelow:)` were irredemably broken; follow the Unicode recommendation for implementing random-access text segmentation and replace them both with a new public `_wordIndex(somewhereAtOrBefore:)` entry pont.
- Expose handcrafted low-level state machines for detecting word boundaries (_WordRecognizer`, `_RandomAccessWordRecognizer`), following the design of `_CharacterRecognizer`.
- Add tests to reliably validate that the two state machine flavors always produce consistent results.
rdar://155482680
These simply expose the preexisting internal
`_StringGuts.validate*Index` functions that indexing operations
use to implicitly round indices down to the nearest valid index. (Or, in the case of the encoding views, the nearest scalar boundary.)
Being able to do this as a standalone, explicit, efficient operation
is crucial when implementing some `String` algorithms that need to
work with arbitrary indices.
* Implement String.WordView
* Add isWordAligned bit
* Hide WordView for now (also separate Index type)
add bidirectional conformance
Fix tests
* Address comments from Karoy and Michael
* Remove word view, use index methods
* Address Karoy's comments
aaa
Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out of bounds accesses now result in reliable runtime errors. Similarly, attempts at applying an index originally vended by a UTF-8 string on a UTF-16 string now result in a reliable runtime error.
As is usually the case, new traps to the stdlib exposes code that contains previously undiagnosed / unreliably diagnosed coding issues.
Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.
In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)
rdar://93379333