PR 79186 (https://github.com/swiftlang/swift/pull/79186) moved one of
the mandatory passes from the C++ implementation to the Swift
implementation resulting in a compiler that is unable to build the
standard library. The pass used to ensure that inaccessible control-flow
positions after an infinite loop was marked with `unreachable` in SIL.
Since the pass is no longer running, any function that returns a value
that also has an infinite loop internally must place a fatalError after
the infinite loop or it will fail to compile as the compiler will
determine that the function does not return from all control flow paths
even though some of the paths are unreachable.
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
Hand-incrementing the loop variable allows us to skip overflow
detection, and will permit more vectorization improvements in the
future. For now, it gives us perf improvements in nano-benchmarks.
Include some tuning and tweaking to reduce the constant factors
involved in string comparison. This yields considerable improvement on
our micro-benchmarks, and allows us to make less inlinable code and
have a smaller ABI surface area.
Adds more extensive testing of corner cases in our existing
fast-paths.
When in a post-binary-prefix-scan fast-path, we need to make sure we
are comparing a full-segment scalar, otherwise we miss situations
where a combining end-of-segment scalar would be reordered with a
prior combining scalar in the same segment under normalization in one
string but not the other.
This was hidden by the fact that many combining scalars are not
NFC_QC=maybe, but those which are not present in any precomposed form
have NFC_QC=yes. Added tests.
Add a use an unchecked subscript on UnsafeBufferPointer, which skips
debugPrecondition checks (in case we're not inlined) as well as a
force-unwrap check.
Less inlining for hashing and comparison. Saves code size on very
frequent String comparison in exchange for costing us in some of our
ridiculuous micro-benchmarks.
Also adds in more _effects for better codegen
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
Add inlinability annotations to restore performance parity with 4.2 String.
Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.
RRC improvements and optmizations.
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
The functions in LibcShims are used externally, some directly and some through @inlineable functions. These are changed to SWIFT_RUNTIME_STDLIB_SPI to better match their actual usage. Their names are also changed to add "_swift" to the front to match our naming conventions.
Three functions from SwiftObject.mm are changed to SPI and get a _swift prefix.
A few other support functions are also changed to SPI. They already had a prefix and look like they were meant to be SPI anyway. It was just hard to notice any mixup when they were #defined to the same thing.
rdar://problem/35863717
_StringGuts shouldn't expose a subscript, implying efficient
access. Switch to the explicit code unit fetch method. Update tests
accordingly, and switch off of deprecated typealiases.
- numericValue returns nil instead of .nan for non-numerics
- Remove small-string optimizations from _scalarName that failed on 32-bit archs
- Put case mappings back into U.S.Properties
- Added more sanity tests
@effects is too low a level, and not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
Promote small-string to small-string comparison into the fast path for
equality and less-than.
Small ASCII strings that are not binary equal do not compare equal,
allowing us to early exit. Small ASCII strings otherwise compare
lexicographically, which we can call prior to jumping through a few
intermediaries.
Unicode Kelvin sign normalizes to ASCII 'K', but our comparison logic
didn't handle this situation when the other side was single-byte all
ASCII. Fall back to the slow comparison path if the point of
difference between an all-ASCII string and a UTF-16 string falls on
such a non-ASCII-yet-normalizes-to-ASCII scalar (rare).
This property is too specific in that it forces a particular normalization; let's not expose it this way, but instead in the future with a full normalization API.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
In theory there could be a "fixed-layout" enum that's not exhaustive
but promises not to add any more cases with payloads, but we don't
need that distinction today.
(Note that @objc enums are still "fixed-layout" in the actual sense of
"having a compile-time known layout". There's just no special way to
spell that.)
* Add partial range subscripts to _UnmanagedOpaqueString
* Use SipHash13+_NormalizedCodeUnitIterator for String hashes on all platforms
* Remove unecessary collation algorithm shims
* Pass the buffer to the SipHasher for ASCII
* Hash the ascii parts of UTF16 strings the same way we hash pure ascii strings
* De-dupe some code that can be shared between _UnmanagedOpaqueString and _UnmanagedString<UInt16>
* ASCII strings now hash consistently for in hashASCII() and hashUTF16()
* Fix zalgo comparison regression
* Use hasher
* Fix crash when appending to an empty _FixedArray
* Compact ASCII characters into a single UInt64 for hashing
* String: Switch to _hash(into:)-based hashing
This should speed up String hashing quite a bit, as doing it through hashValue involves two rounds of SipHash nested in each other.
* Remove obsolete workaround for ARC traffic
* Ditch _FixedArray<UInt8> in favor of _UIntBuffer<UInt64, UInt8>
* Bad rebase remnants
* Fix failing benchmarks
* michael's feedback
* clarify the comment about nul-terminated string hashes