This commit supersedes 2866b4a which was overwritten by 4ab45df.
Store small string code units in little-endian byte order. This way
the code units are in the same order on all machines and can be
safely treated as an array of bytes.
In anticipation of potential future HW features, e.g. armv8.5 memory
tagging, only use the high 4 bytes as discriminator bits in
_BridgeObject rather than the top 8 bits. Utilize two perf flags to
cover this instead. This requires shifting around a fair amount of
internal complexity.
Remove Discriminator, Flags, etc., abstractions from
StringObject. These cause code divergence between 32-bit and 64-bit
ABI, complicate ABI changes, and otherwise contribute to bloat.
These should be audited since some might not actually need to be
@inlinable, but for now:
- Anything public and @inline(__always) is now also @inlinable
- Anything @usableFromInline and @inline(__always) is now @inlinable
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
Add inlinability annotations to restore performance parity with 4.2 String.
Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.
RRC improvements and optmizations.
* Refactor out RRC implementation into dedicated file.
* Change our `_invariantCheck` pattern to generate efficient code in
asserts builds and make the optimizer job's easier.
* Drop a few Bidi shims we no longer need.
* Restore View decls to String, workaround no longer needed
* Cleaner unicode helper facilities
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
Exclusively store small strings in little-endian byte order. This
will insert byte swaps when accessing small strings on big endian
platforms, however these are usually extremely cheap.
This approach means that the layout of the code points and count
in memory will be the same on both big and little endian machines
simplifying future development. Prior to this change this code
was broken on big endian machines because the memory layout was
different (the count ending up in the middle of the string).
@effects is too low a level, and not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
The main part of this is to rewrite the small string literal-constructor to work with values (= shifting bytes) instead of setting bytes in memory.
This allows the compiler to fold away everything and end up with the optimal code for small string literals.
As this function is generic, it makes a big difference when it can be specialized for concrete sequences, like arrays or unsafe buffers.
This fixes a performance regression of String(decoding:as:), e.g. when constructing a String from a byte buffer.
This adds a small string representation capable of holding up to 15
ASCII code units directly in registers. This is extendable to UTF-8 in
the future.
It is intended to be the preferred representation whenever possible
for String, and is intended to be a String fast-path. Future small
forms may be added in the future (likely off the fast-path).
Small strings are available on 64-bit, where they are most beneficial
and well accomodated by limited address spaces. They are unavailable
on 32-bit, where they are less of a win and would require much more
hackery due to full address spaces.