Commit Graph

8 Commits

Author SHA1 Message Date
Karoy Lorentey
847df728fc [test] Resolve failures detected by CI 2025-08-08 14:01:09 -07:00
Karoy Lorentey
3e18a07187 [stdlib] Fix implementation of Unicode text segmentation for word boundaries
Carefully overhaul our word breaking implementation to follow the recommendations of Unicode Annex #29. Start exposing the core primitives (as well as `String`-level interfaces), so that folks can prototype proper API for these concepts.

- Fix `_wordIndex(after:)` to always advance forward. It now requires its input index to be on a word boundary. Remove the `@_spi` attribute, exposing it as a (hidden, but) public entry point.
- The old SPIs `_wordIndex(before:)` and `_nearestWordIndex(atOrBelow:)` were irredemably broken; follow the Unicode recommendation for implementing random-access text segmentation and replace them both with a new public `_wordIndex(somewhereAtOrBefore:)` entry pont.
- Expose handcrafted low-level state machines for detecting word boundaries (_WordRecognizer`, `_RandomAccessWordRecognizer`), following the design of `_CharacterRecognizer`.
- Add tests to reliably validate that the two state machine flavors always produce consistent results.

rdar://155482680
2025-08-05 20:04:46 -07:00
Alejandro Alonso
708d72d2c3 Update tests for 6.1 Unicode 16 stdlib 2025-01-15 14:09:57 -08:00
Alexander Cyon
4a2942bb4e Fix typos in: cmake, tools, utils, unittests, validation-test
Co-authored-by: Saleem Abdulrasool <compnerd@compnerd.org>
2024-07-12 02:34:00 +03:00
cui fliter
127077b3aa chore: fix some comments
Signed-off-by: cui fliter <imcusg@gmail.com>
2024-03-05 17:23:22 +08:00
Alejandro Alonso
3deacf8944 Fix backwards iteration of word breaking 2023-11-08 13:26:20 -08:00
Alejandro Alonso
f16f0c3c23 Update the stdlib to use Unicode 15 data 2023-03-29 10:18:16 -07:00
Alejandro Alonso
95da55b182 [stdlib] Implement String.WordView (#42414)
* Implement String.WordView

* Add isWordAligned bit

* Hide WordView for now (also separate Index type)

add bidirectional conformance

Fix tests

* Address comments from Karoy and Michael

* Remove word view, use index methods

* Address Karoy's comments

aaa
2022-06-22 09:10:09 -07:00