These weren’t doing the right thing, and all callers have now
migrated to the new `_StringGuts.validate*` methods, which combine
bounds checks with encoding validation and scalar alignment.
To prevent unaligned indices from breaking well-defined index distance
and index offset calculations, round every index down to the nearest
whole Character.
For the horrific details, see the forum discussion below.
https://forums.swift.org/t/string-index-unification-vs-bidirectionalcollection-requirements/55946
To avoid rounding from regressing String performance in the regular
case (when indices aren’t being passed across string views), introduce
a new String.Index flag bit that indicates that the index is already
Character aligned.
There are three flavors, corresponding to i < endIndex, i <= endIndex, and range containment checks.
Additionally, we have separate variants for index validation in substrings.
`String.+=` function to `String.append` function, and use a new
semantics attribute for String.+=.
Teach the constant evaluator about `String.Append` instead of `String.+=`.
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
Add inlinability annotations to restore performance parity with 4.2 String.
Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.
RRC improvements and optmizations.
* Refactor out RRC implementation into dedicated file.
* Change our `_invariantCheck` pattern to generate efficient code in
asserts builds and make the optimizer job's easier.
* Drop a few Bidi shims we no longer need.
* Restore View decls to String, workaround no longer needed
* Cleaner unicode helper facilities
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
* [stdlib] Update complexity docs for seq/collection algorithms
This corrects and standardizes the complexity documentation for Sequence
and Collection methods. The use of constants is more consistent, with `n`
equal to the length of the target collection, `m` equal to the length of
a collection passed in as a parameter, and `k` equal to any other passed
or calculated constant.
* Apply notes from @brentdax about complexity nomenclature
* Change `n` to `distance` in `index(_:offsetBy:)`
* Use equivalency language more places; sync across array types
* Use k instead of n for parameter names
* Slight changes to index(_:offsetBy:) discussion.
* Update tests with new parameter names
Simplify String.Index by sinking transcoded offsets into the .utf8
variant. This is in preparation for a more resilient index type
capable of supporting existential string indices.
Drop append-related @inlinable annotations for String, StringGuts,
StringStorage, and the Views. Drop several for larger operations, such
as case conversion. Drop as many as we can from StringGuts for now.
Beside the general goal to remove inlinable functions, this reduces code size and also improves performance for several benchmarks.
The performance problem was that by inlining top-level String API functions into client code (like String.count) it ended up calling non-inlinable internal String functions eventually.
This is much slower than to make a single call at the top-level API boundary into the library. Inside the library all the internal String functions can be specialized and inlined.
rdar://problem/39921548
Use the visitor pattern in most of the opaque-by-hand call
sites. Inspecting the compiler output does not show excessive and
unanticipated ARC, but there may need to be further tweaks.
One downside of the visitor pattern as written is that there's extra
shuffling around of registers for the closure CC. Hopefully this will
also be fixed soon.
Stop inlining _asOpaque into user code. Inlining it bloats user code
as there's a bit-test-and-branch to a block containing the _asOpaque
call, followed up some operations to e.g. manipulate the range or
re-align the calling convention, etc., followed by a final branch to
opaque stdlib code.
Instead, branch directly into opaque stdlib code. In theory, this
means that supporting all opaque patterns can be done with minimal
bloat. On ARM, this is a single tbnz instruction.