- String hashing is not inlinable, so it can use _Hasher._core operations directly. Remove custom buffering.
- Speed up ASCII hashing by as much as 5.5x by feeding the storage buffer directly into hasher in a single go.
- For other strings, just feed the UTF-8 encoding of the normalized string to the hasher; don't switch to UTF-16 at the first non-ASCII scalar. (Doing that would make the hash encoding of some string sequences ambiguous, leading to artificial collisions.)
- Add a single unconditional terminator byte, 0xFF. It's not a valid UTF-8 code unit, so it won't ever occur within a normalized string encoding.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
* Add partial range subscripts to _UnmanagedOpaqueString
* Use SipHash13+_NormalizedCodeUnitIterator for String hashes on all platforms
* Remove unecessary collation algorithm shims
* Pass the buffer to the SipHasher for ASCII
* Hash the ascii parts of UTF16 strings the same way we hash pure ascii strings
* De-dupe some code that can be shared between _UnmanagedOpaqueString and _UnmanagedString<UInt16>
* ASCII strings now hash consistently for in hashASCII() and hashUTF16()
* Fix zalgo comparison regression
* Use hasher
* Fix crash when appending to an empty _FixedArray
* Compact ASCII characters into a single UInt64 for hashing
* String: Switch to _hash(into:)-based hashing
This should speed up String hashing quite a bit, as doing it through hashValue involves two rounds of SipHash nested in each other.
* Remove obsolete workaround for ARC traffic
* Ditch _FixedArray<UInt8> in favor of _UIntBuffer<UInt64, UInt8>
* Bad rebase remnants
* Fix failing benchmarks
* michael's feedback
* clarify the comment about nul-terminated string hashes
In preparation for small strings optimizations, bifurcate StringGuts's
inits to denote when the caller is choosing to skip any checks for is
small. In the majority of cases, the caller has more local information
to guide the decision.
Adds todos and comments as well:
* TODO(SSO) denotes any task that should be done simultaneously with
the introduction of small strings.
* TODO(TODO: JIRA) denotes tasks that should eventually happen
later (and will have corresponding JIRAs filed for).
* TODO(Comparison) denotes tasks when the new string comparison
lands.
Beyond switching hashing algorithms, this also enables per-execution hash seeds, fulfilling a long-standing prophecy in Hashable’s documentation.
To reduce the possibility of random test failures, StdlibUnittest’s TestSuite overrides the random hash seed on initialization.
rdar://problem/24109692
rdar://problem/35052153
Avoid a source of ARC for hashValue, which is perf-sensitive
especially for hashed collection growth.
This improves the Dictionary benchmark by around 30%.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.