Introduce shims for using UBreakIterators from ICU. Also introduce
shims for using thread local storage via pthreads.
We will be relying on ICU and UBreakIterators for grapheme
breaking. However, UBreakIterators are very expensive to create,
especially for the way we do grapheme breaking, which is relatively
stateless. Thus, we will stash one or more in thread local storage
and reset them as needed.
Note: Currently, pthread_key_t is hard coded for a single platform
(Darwin), but I have a static_assert alongside directions on how to
adapt it to any future platforms that differ in key type.
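
The stash-and-reset pattern described above can be sketched roughly as
follows. This is only an illustration, not the shim itself:
ExpensiveIterator, iteratorKey and threadLocalIterator are hypothetical
names, the class merely stands in for a UBreakIterator, and the key is
created without a destructor, so the cached object is never released
when a thread exits.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// Hypothetical stand-in for an object that is expensive to create,
// such as a UBreakIterator. Not part of the real shims.
final class ExpensiveIterator {
  var text: String = ""
  func reset(to newText: String) { text = newText }
}

// One process-wide key. No destructor is registered (assumption made to
// keep the sketch portable), so the object leaks at thread exit.
let iteratorKey: pthread_key_t = {
  var key = pthread_key_t()
  let status = pthread_key_create(&key, nil)
  precondition(status == 0, "pthread_key_create failed")
  return key
}()

// Returns this thread's cached iterator, creating it on first use,
// and resets it to the new text instead of building a fresh one.
func threadLocalIterator(for text: String) -> ExpensiveIterator {
  if let raw = pthread_getspecific(iteratorKey) {
    let cached = Unmanaged<ExpensiveIterator>.fromOpaque(raw).takeUnretainedValue()
    cached.reset(to: text)
    return cached
  }
  let created = ExpensiveIterator()
  // Hand one retain to thread local storage so the object stays alive.
  let status = pthread_setspecific(iteratorKey, Unmanaged.passRetained(created).toOpaque())
  precondition(status == 0, "pthread_setspecific failed")
  created.reset(to: text)
  return created
}
```

Repeated calls on the same thread return the same object, so the
creation cost is paid once per thread rather than once per use.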
The best high-level APIs for decoding/transcoding are still under active
investigation. It's likely we want more views. Therefore, leave
de-underscored/public only the lowest-level APIs for now.
The following code behaves incorrectly due to the presence of this
overload:

    let a: Int = 1
    let b: Int? = 2
    let c: Int? = nil
    let result: [Any] = [a, b, c].flatMap { $0 }

Fixes: <rdar://problem/31910642>
The UnsafeBufferPointer subscript used in the fast path only checks
bounds in Debug mode, so extra checks are needed.
Addresses: <rdar://problem/31992473>
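
As a rough illustration of the kind of extra check meant here (not the
actual fast path; codeUnit(at:in:) is a hypothetical helper), the index
can be validated explicitly before the buffer subscript is used:

```swift
// Hypothetical helper: validates the index itself instead of relying on
// the buffer subscript, whose bounds check is only active in Debug builds.
func codeUnit(at index: Int, in buffer: UnsafeBufferPointer<UInt16>) -> UInt16? {
  guard buffer.indices.contains(index) else { return nil }
  return buffer[index]
}

let units: [UInt16] = [0x61, 0x62]
let inRange = units.withUnsafeBufferPointer { codeUnit(at: 1, in: $0) }
let outOfRange = units.withUnsafeBufferPointer { codeUnit(at: 2, in: $0) }
```

Whether to return nil or trap with a precondition is a design choice;
the sketch returns nil only to keep the example easy to exercise.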
At some point during the implementation of integer protocols these
overloads were necessary to make expressions like `i32 < 0` be faster
and unambiguous.
Now they are no longer necessary, and they also cause problems for
expressions like `(u64 - u64) < u64`, where they cause the deprecated
`func - (Strideable, Strideable) -> Stride` to be used. That is wrong,
because it traps in many cases where `func - (UInt64, UInt64) -> UInt64`
would not.
Fixes: <rdar://problem/31909031>
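
A small sketch of why the choice of overload matters; the variable
names are illustrative only. The unsigned subtraction stays within
UInt64, while the same difference is not representable in the signed
Stride type (Int for UInt64), which is the situation in which the
Strideable form would trap:

```swift
let large: UInt64 = .max
let small: UInt64 = 1

// With the concrete overload, this is plain UInt64 subtraction,
// which traps only on underflow.
let difference = large - small
let isBelow = (large - small) < large

// The Strideable form would have to produce an Int. This difference
// does not fit in Int, so a conversion to the Stride type fails.
let fitsInStride = Int(exactly: difference) != nil
```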
This is a follow-up fix for making struct constructors inline(__always) in
155db0a4bd: Let Character literals, which fit into 64 bits, be folded into a single integer constant.
and
d8f1caf4a6: Inline all the new low-level bits
If we decide that these structs should not have fixed layout, we must re-evaluate the performance cost of not being able to inline
the struct constructors.
This adds a fast path for single-code-unit Character
construction. Rather than use the general-purpose String-based
initializer (which then repeats grapheme breaking to ensure a trap,
amongst other inefficiencies), just make the Character from the single
Unicode scalar value directly.
This also speeds up simple iteration of BMP strings when the optimizer
is unable to eliminate the subscript. The speedup is around 2x for
ASCII and around 20% for BMP UTF-16.
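
At the level of the public API, the two construction routes contrasted
above look roughly like this. It is only a sketch of the idea; the
actual fast path lives in the standard library internals and is not
shown here.

```swift
// A scalar that occupies a single UTF-16 code unit (U+00E9).
let scalar: Unicode.Scalar = "\u{E9}"

// Direct route: build the Character from the scalar.
let direct = Character(scalar)

// General route: build a String first, then a Character from it.
// This initializer has to verify that the String holds exactly one
// grapheme cluster.
let viaString = Character(String(scalar))
```

Both routes produce the same Character; the direct one skips the
intermediate String and the repeated grapheme breaking.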
This is done by ensuring that the corresponding Character constructor is inlined. LLVM will do the constant folding.
Also add a test which checks this.
It makes character literals much faster (3x improvement for the CharacterLiteralsSmall benchmark).
It also removes _a lot_ of redundant code (~80% for the CharacterLiteralsSmall benchmark).