Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.
Removes the legacy grapheme breaking code paths. Simplifies and
clarifies the non-contiguous grapheme breaking code through consistent
naming and handling of absolute positions vs relative offsets.
This adds Unicode 9 grapheme breaking support for non-contiguous
NSStrings. Non-contiguous NSStrings that don't hit our fast paths are
very rare, but should still behave identically to contiguous
strings.
We first copy a fixed number of code units into a fixed size buffer
(currently 16 in size) and try to grapheme break inside of that
buffer. This is sufficient storage for all known non-pathological
graphemes. Any graphemes larger than the buffer are handled by copying
larger portions of the string into an Array.
Test cases added, including pathological "zalgo" text that stresses
extremely long graphemes.