Less inlining for hashing and comparison. Saves code size on very
frequent String comparison in exchange for costing us in some of our
ridiculuous micro-benchmarks.
Also adds in more _effects for better codegen
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
Add inlinability annotations to restore performance parity with 4.2 String.
Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.
RRC improvements and optmizations.
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
- Don’t expose the raw execution seed to _rawHashValue.
- Change the type of _rawHashValue’s seed from (UInt64,UInt64) to a single Int. Working with a pair of UInt64s is unwieldy, and overkill in practice. Int as a seed also integrates nicely with Int as a hash value.
- Remove _HasherCore._generateSeed(). Instead, simply call finalize() on a copy of the hasher to get a seed suitable for _rawHashValue.
- Update Set and Dictionary to store a single Int as the seed value.
Note that this doesn’t affect the core hasher, which still mixes in the actual 128-bit execution seed during its initialization. To reduce the potential of confusion, use the name “rawSeed” to refer to an actual 128-bit seed value.
@effects is too low a level, and not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
This includes various revisions to the APIs landing in Swift 4.2, including:
- Random and other randomness APIs
- Hashable changes
- MemoryLayout.offset(of:)
- String hashing is not inlinable, so it can use _Hasher._core operations directly. Remove custom buffering.
- Speed up ASCII hashing by as much as 5.5x by feeding the storage buffer directly into hasher in a single go.
- For other strings, just feed the UTF-8 encoding of the normalized string to the hasher; don't switch to UTF-16 at the first non-ASCII scalar. (Doing that would make the hash encoding of some string sequences ambiguous, leading to artificial collisions.)
- Add a single unconditional terminator byte, 0xFF. It's not a valid UTF-8 code unit, so it won't ever occur within a normalized string encoding.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
* Add partial range subscripts to _UnmanagedOpaqueString
* Use SipHash13+_NormalizedCodeUnitIterator for String hashes on all platforms
* Remove unecessary collation algorithm shims
* Pass the buffer to the SipHasher for ASCII
* Hash the ascii parts of UTF16 strings the same way we hash pure ascii strings
* De-dupe some code that can be shared between _UnmanagedOpaqueString and _UnmanagedString<UInt16>
* ASCII strings now hash consistently for in hashASCII() and hashUTF16()
* Fix zalgo comparison regression
* Use hasher
* Fix crash when appending to an empty _FixedArray
* Compact ASCII characters into a single UInt64 for hashing
* String: Switch to _hash(into:)-based hashing
This should speed up String hashing quite a bit, as doing it through hashValue involves two rounds of SipHash nested in each other.
* Remove obsolete workaround for ARC traffic
* Ditch _FixedArray<UInt8> in favor of _UIntBuffer<UInt64, UInt8>
* Bad rebase remnants
* Fix failing benchmarks
* michael's feedback
* clarify the comment about nul-terminated string hashes
In preparation for small strings optimizations, bifurcate StringGuts's
inits to denote when the caller is choosing to skip any checks for is
small. In the majority of cases, the caller has more local information
to guide the decision.
Adds todos and comments as well:
* TODO(SSO) denotes any task that should be done simultaneously with
the introduction of small strings.
* TODO(TODO: JIRA) denotes tasks that should eventually happen
later (and will have corresponding JIRAs filed for).
* TODO(Comparison) denotes tasks when the new string comparison
lands.
Beyond switching hashing algorithms, this also enables per-execution hash seeds, fulfilling a long-standing prophecy in Hashable’s documentation.
To reduce the possibility of random test failures, StdlibUnittest’s TestSuite overrides the random hash seed on initialization.
rdar://problem/24109692
rdar://problem/35052153
Avoid a source of ARC for hashValue, which is perf-sensitive
especially for hashed collection growth.
This improves the Dictionary benchmark by around 30%.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.