Fix Hashable conformance of standard integer types so that the number of bits they feed into hasher is exactly Self.bitWidth.
This was intended to be part of SE-0206. However, it would have introduced additional issues with AnyHashable. The custom AnyHashable representations introduced in the previous commit unify hashing for numeric types, eliminating the problem.
AnyHashable has numerous edge cases where two AnyHashable values compare equal but produce different hashes. This breaks Set and Dictionary invariants and can cause unexpected behavior and/or traps. This change overhauls AnyHashable's implementation to fix these edge cases, hopefully without introducing new issues.
- Fix transitivity of ==. Previously, comparisons involving AnyHashable values with Objective-C provenance were handled specially, breaking Equatable:
let a = (42 as Int as AnyHashable)
let b = (42 as NSNumber as AnyHashable)
let c = (42 as Double as AnyHashable)
a == b // true
b == c // true
a == c // was false(!), now true
let d = ("foo" as AnyHashable)
let e = ("foo" as NSString as AnyHashable)
let f = ("foo" as NSString as NSAttributedStringKey as AnyHashable)
d == e // true
e == f // true
d == f // was false(!), now true
- Fix Hashable conformance for numeric types boxed into AnyHashable:
b == c // true
b.hashValue == c.hashValue // was false(!), now true
Fixing this required adding a custom AnyHashable box for all standard integer and floating point types. The custom box was needed to ensure that two AnyHashables containing the same number compare equal and hash the same way, no matter what their original type was. (This behavior is required to ensure consistency with NSNumber, which has not been preserving types since SE-0170.
- Add custom AnyHashable representations for Arrays, Sets and Dictionaries, so that when they contain numeric types, they hash correctly under the new rules above.
- Remove AnyHashable._usedCustomRepresentation. The provenance of a value should not affect its behavior.
- Allow AnyHashable values to be downcasted into compatible types more often.
- Forward _rawHashValue(seed:) to AnyHashable box. This fixes AnyHashable hashing for types that customize single-shot hashing.
https://bugs.swift.org/browse/SR-7496
rdar://problem/39648819
Drop append-related @inlinable annotations for String, StringGuts,
StringStorage, and the Views. Drop several for larger operations, such
as case conversion. Drop as many as we can from StringGuts for now.
With the number of tests Swift does, this had a relatively high chance to fail
regularly somewhere.
Also, rejecting the lower tail means rejecting things that are perfectly
uniform, which I don't think should be the purpose of this test.
I noticed this during testing, but it has nothing to do with the other changes
in this PR. This static violation has always been present as a warning and would
continue to be a warning after my changes.
When Set/Dictionary is nested in another Set, the boundaries of the nested collections weren’t correctly delineated in commutative hashing.
For example, these Sets all hashed the same:
[[1, 2], [3, 4]]
[[1, 3], [2, 4]]
[[1, 4], [2, 3]]
Hash collisions could thus be systematically generated.
To fix this, remove collection-level support for one-shot hashing and revert to the previous method of generating hash values. (Set is still able to support one-shot hashing for its members, though.)
The new _rawHashValue(seed:) requirement allows stdlib types to specialize their hashing when they’re hashed on their own (i.e., not as a component of some composite type).
This makes it possible to get rid of discriminator/terminator values and to eliminate most of Hasher’s resiliency overhead, leading to a measurable speedup, especially for tiny keys.
Newly internal declarations include Hasher._seed and the integer overloads of Hasher._combine(_:), as well as _SipHash13 and _SipHash24.
Unify the interfaces of these SipHash testing structs with Hasher. Update SipHash test to cover Hasher, too.
Add @usableFromInline to all newly internal stuff. In addition to its normal use, it also enables white box testing; compile tests that need to use these declarations with -disable-access-control.
This prototype is not fully implemented, and it relies on specific hash values to not trigger unhandled cases.
To keep its test working, define and use a custom hashing interface that emulates hashValue behavior prior to SE-0206.
This is safe to do with hash(into:), because random hash collisions can be eliminated with awesome certainty by trying a number of different hash seeds. (Unless there is a weakness in SipHash.)
In some cases, we intentionally want hashing to produce looser equivalency classes than equality — to let those cases keep working, add an optional hashEqualityOracle parameter.
Review usages of checkHashable and add hash oracles as needed.
StdlibUnittest uses gyb to avoid duplicating many source-context
arguments. However, this means that any test that wishes to add new
expect helpers has to also be gybbed. Given that this structure hasn't
changed in years, and we should have a real language support
eventually, de-gyb it.
Since this code was first written, we've added more language features that introduce the opportunity for a materializeForSet protocol witness to have an incompatible polymorphic convention with its concrete implementation:
- In a conditional conformance, if a witness comes from a constrained extension with additional protocol requirements, then the witness will require those conformances as additional polymorphic arguments, making its materializeForSet uncallable from code using the protocol witness.
- Given a subscript requirement, the witness may be a generic subscript with a more general signature than the witness, making the generic arguments to the concrete materializeForSet callback incompatible with those expected for the witness.
Longer term, representing materializeForSet patterns using accessor coroutines should obviate the need for this hack. For now, it's necessary for correctness, addressing rdar://problem/35760754.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
Streamline internal String creation. Previously, everything funneled
into a single generic function, however, every single call of the
generic funnel had relevant specific information that could be used
for a more efficient algorithm.
In preparation for efficiently forming small strings, refactor this
logic into a handful of more specialized subroutines to preserve more
specific information from the callers.