* Implement GraphemeWalker that does native grapheme breaking
* Bridged strings use native grapheme breaking for forward strides
* Implement bidirectional native grapheme breaking for native and foreign strings
* Remove ICU's grapheme breaking support
* Use UnicodeScalarView to implement GraphemeWalker
use an Iterator approach
remove Iterator conformance
* Incorporate Michael's feedback
more comments addressed
fix crlf bug
* Try bringing back some old fast paths
* Parameterize nextBoundary and previousBoundary
Parameterize nextBoundary and previousBoundary
* Implement Michael's suggestions
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
ICU's uText APIs offer a cross-encoding means of interacting with
grapheme breaking. Storing one on our TLS allows us to explore mised
encodings, and even for UTF-16 might be profitable if setText
mallocs/frees an internal uText (which seems to be the case). This
isn't hooked up to anything yet, but its highly likely we will be
using it soon and adding it now enables more rapid development of the
UTF-8 prototype, due to it's tie-in with the runtime.
ThreadLocalStorage is not accessible in an inlinable context, so drop
all such annotations. Also, drop a redundant one from Optional to
suppress a warning.
NFC
Aggressively remove all `@inlinable` from any function that's
`@inline(never)` to see the impact.
`@inlinable @inline(never)` is a potential code smell. While it might
expose some optimization and specialization opportunities to the
optimizer, it's most commonly a sign that more thought is needed.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.
In grand LLVM tradition, the first step to redesigning _StringCore is
to first rename it to _LegacyStringCore. Subsequent commits will
introduce the replacement, and eventually all uses of the old one will
be moved to the new one.
NFC.
Windows x86 requires that the TLS destructor callback uses the "stdcall"
calling convention. Provide a shim which will adjust the calling
convention and call back into the swift portion of the code code with
the expected calling convention.
Rename the explicit `pthread` to `thread` to indicate that this is not
`pthread` specifically. This allows us to implement the TLS support for
Windows using Fibers.
* Unify the capitalization across all user-visible error messages (fatal errors, assertion failures, precondition failures) produced by the runtime, standard library and the compiler.
* Update some more tests to the new expectations.
This adds Unicode 9 grapheme breaking support for non-contiguous
NSStrings. Non-contiguous NSStrings that don't hit our fast paths are
very rare, but should still behave identically to contiguous
strings.
We first copy a fixed number of code units into a fixed size buffer
(currently 16 in size) and try to grapheme break inside of that
buffer. This is sufficient storage for all known non-pathological
graphemes. Any graphemes larger than the buffer are handled by copying
larger portions of the string into an Array.
Test cases added, including pathological "zalgo" text that stresses
extremely long graphemes.
The overhead of the weak reference overwhelms the savings from
avoiding the setText, when there's a fair amount of reference counting
involved. The real solution will be to use an unowned reference. For
now, drop the cache and introduce it once we have more runtime
functionality.
Adds in Linux platform support for our pthread TLS. Replace usage of
PTHREAD_KEYS_MAX with a sentinel value, as it's tricky to define
cross-platform and was only lightly used inside sanity checks.
Introduces a _ThreadLocalStorage struct to hold thread-local, but
global resources. Set it up to host a UBreakIterator and a cache key
for resetting text.
UBreakIterators are extremely expensive to create on the fly, so we
store one for each thread. UBreakIterators are also expensive to bind
to new text, so we cache the text it's currently bound to in order to
help avoid it.
The struct can be expanded with more functionality in the future, but
the standard library should only ever use a single key, and thus
everything should go on this struct. The _ThreadLocalStorage struct is
not meant to be copyable, creatable (by anyone else except the
once-per-thread initialize routine), and should accessed through the
pointers it provides.
Future immediate directions could include cashing multiple
UBreakIterators (e.g. avoid a text reset for mutual character
iteration patterns, etc).
Test added in test/stdlib/ThreadLocalStorage.swift.
Adds in Linux platform support for our pthread TLS. Replace usage of
PTHREAD_KEYS_MAX with a sentinel value, as it's tricky to define
cross-platform and was only lightly used inside sanity checks.
Introduces a _ThreadLocalStorage struct to hold thread-local, but
global resources. Set it up to host a UBreakIterator and a cache key
for resetting text.
UBreakIterators are extremely expensive to create on the fly, so we
store one for each thread. UBreakIterators are also expensive to bind
to new text, so we cache the text it's currently bound to in order to
help avoid it.
The struct can be expanded with more functionality in the future, but
the standard library should only ever use a single key, and thus
everything should go on this struct. The _ThreadLocalStorage struct is
not meant to be copyable, creatable (by anyone else except the
once-per-thread initialize routine), and should accessed through the
pointers it provides.
Future immediate directions could include cashing multiple
UBreakIterators (e.g. avoid a text reset for mutual character
iteration patterns, etc).
Test added in test/stdlib/ThreadLocalStorage.swift.