String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
Different tests used different os checks for importing Darwin, Glibc and
MSVCRT. This commit use the same pattern for importing those libraries,
in order to avoid the #else branches of the incorrect patterns to be
applied to the wrong platform. This was very normal for Android, which
normally should follow the Linux branches, but sometimes was trying to
import Darwin or not importing anything.
The standarized pattern imports Darwin for macOS, iOS, tvOS and watchOS.
It imports Glibc for Linux, FreeBSD, PS4, Android, Cygwin and Haiku; and
imports MSVCRT for Windows. If a new platform is introduced, the else
branch will report an error, so the new platform can be added to one of
the branches (or maybe add a new specific branch).
In some cases the standard pattern was modified because some test required
it (importing extra modules, or extra type aliases), and in some other
cases some branches were removed because the test will not have used
them (but it is not exhaustive, so there might be some unnecessary
branches).
This should, at least, fix three tests for Android (the three
dynamic_replacement*.swift ones).
Include some tuning and tweaking to reduce the constant factors
involved in string comparison. This yields considerable improvement on
our micro-benchmarks, and allows us to make less inlinable code and
have a smaller ABI surface area.
Adds more extensive testing of corner cases in our existing
fast-paths.
When in a post-binary-prefix-scan fast-path, we need to make sure we
are comparing a full-segment scalar, otherwise we miss situations
where a combining end-of-segment scalar would be reordered with a
prior combining scalar in the same segment under normalization in one
string but not the other.
This was hidden by the fact that many combining scalars are not
NFC_QC=maybe, but those which are not present in any precomposed form
have NFC_QC=yes. Added tests.
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
Most of this is just "remember to specify the inputs and outputs on
the command line, so remote-run can see them". A bit is "prefix
environment variables with '%env-'". And the last few are "yeah,
this was never going to work in a remote environment".
In the few cases where I couldn't think of anything reasonable, I just
marked the test as "UNSUPPORTED: remote_run", a new "feature".
_StringGuts shouldn't expose a subscript, implying efficient
access. Switch to the explicit code unit fetch method. Update tests
accordingly, and switch off of deprecated typealiases.
Create a _StringRepresentation struct to standardize internal testing
on. Internalize much of _StringGuts, except for some SPI hacks, and
update tests to use _StringRepresentation.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
Streamline internal String creation. Previously, everything funneled
into a single generic function, however, every single call of the
generic funnel had relevant specific information that could be used
for a more efficient algorithm.
In preparation for efficiently forming small strings, refactor this
logic into a handful of more specialized subroutines to preserve more
specific information from the callers.
Restore (un-revert) sting comparison, with fixes
More exhaustive testing of opaque strings, which consistently reproduces prior sporadic failure. Shims fixups. Some test tweaking.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.
In grand LLVM tradition, the first step to redesigning _StringCore is
to first rename it to _LegacyStringCore. Subsequent commits will
introduce the replacement, and eventually all uses of the old one will
be moved to the new one.
NFC.
* Remove a bunch of Default(Bidirectional|RandomAccess)Indices usage from stdlib and test
* Remove some DefaultRandomAccessIndices and IndexDistance usage from Foundation
* Remove no-longer-used internal type in Existentials.swift
* Get rid of indicesForTraversal
* Eradicate IndexDistance associated type, replacing with Int everywhere
* Consistently use Int for ExistentialCollection’s IndexDistance type.
* Fix test for IndexDistance removal
* Remove a handful of no-longer-needed explicit types
* Add compatibility shims for non-Int index distances
* Test compatibility shim
* Move IndexDistance typealias into the Collection protocol
This necessary for ensuring the property that String doesn't keep
inaccessible memory alive. For example, before this change,
String(s.dropFirst().unicodeScalars)
would compile and produce a String that owned inaccessible memory.
Now it no longer compiles.
String's view's SubSequences are the same as the Substring's
view. E.g. String.UnicodeScalarView.SubSequence is
Substring.UnicodeScalarView.
New compatibility inits added, to work around the fact that many
previously failable initializers are now non-failable.