Commit Graph

25 Commits

Author SHA1 Message Date
Karoy Lorentey
ebf290a598 [stdlib] Optimize ASCII String hashing
- String hashing is not inlinable, so it can use _Hasher._core operations directly. Remove custom buffering.
- Speed up ASCII hashing by as much as 5.5x by feeding the storage buffer directly into hasher in a single go.
- For other strings, just feed the UTF-8 encoding of the normalized string to the hasher; don't switch to UTF-16 at the first non-ASCII scalar. (Doing that would make the hash encoding of some string sequences ambiguous, leading to artificial collisions.)
- Add a single unconditional terminator byte, 0xFF. It's not a valid UTF-8 code unit, so it won't ever occur within a normalized string encoding.
2018-04-19 11:49:51 +01:00
Karoy Lorentey
ccdc218cbd [stdlib] _Hasher: append => combine 2018-04-18 14:18:44 +01:00
Slava Pestov
2e5aef9c8d stdlib: Remove redundant @usableFromInline attributes 2018-04-06 00:02:30 -07:00
Slava Pestov
e1f50b2d36 SE-0193: Rename @_inlineable to @inlinable, @_versioned to @usableFromInline 2018-03-30 21:55:30 -07:00
Michael Ilseman
93d6130066 [string] Integrate small strings.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
2018-03-27 14:00:59 -07:00
Lance Parker
10707acfe8 Clarified the comment on ascii string hashes 2018-03-19 11:42:25 -07:00
Lance Parker
cbf157f924 [stdlib]Unify String hashing implementation (#14921)
* Add partial range subscripts to _UnmanagedOpaqueString

* Use SipHash13+_NormalizedCodeUnitIterator for String hashes on all platforms

* Remove unecessary collation algorithm shims

* Pass the buffer to the SipHasher for ASCII

* Hash the ascii parts of UTF16 strings the same way we hash pure ascii strings

* De-dupe some code that can be shared between _UnmanagedOpaqueString and _UnmanagedString<UInt16>

* ASCII strings now hash consistently for in hashASCII() and hashUTF16()

* Fix zalgo comparison regression

* Use hasher

* Fix crash when appending to an empty _FixedArray

* Compact ASCII characters into a single UInt64 for hashing

* String: Switch to _hash(into:)-based hashing

This should speed up String hashing quite a bit, as doing it through hashValue involves two rounds of SipHash nested in each other.

* Remove obsolete workaround for ARC traffic

* Ditch _FixedArray<UInt8> in favor of _UIntBuffer<UInt64, UInt8>

* Bad rebase remnants

* Fix failing benchmarks

* michael's feedback

* clarify the comment about nul-terminated string hashes
2018-03-17 22:13:37 -07:00
Michael Ilseman
884356fc0a [string] Bifurcate StringGuts inits for large strings.
In preparation for small strings optimizations, bifurcate StringGuts's
inits to denote when the caller is choosing to skip any checks for is
small. In the majority of cases, the caller has more local information
to guide the decision.

Adds todos and comments as well:

  * TODO(SSO) denotes any task that should be done simultaneously with
    the introduction of small strings.

  * TODO(TODO: JIRA) denotes tasks that should eventually happen
    later (and will have corresponding JIRAs filed for).

  * TODO(Comparison) denotes tasks when the new string comparison
    lands.
2018-03-13 15:32:19 -07:00
Karoy Lorentey
8cf5bc8bdc [stdlib] Switch to using SipHash-1-3 as the standard hash function
Beyond switching hashing algorithms, this also enables per-execution hash seeds, fulfilling a long-standing prophecy in Hashable’s documentation.

To reduce the possibility of random test failures, StdlibUnittest’s TestSuite overrides the random hash seed on initialization.

rdar://problem/24109692
rdar://problem/35052153
2018-03-09 14:48:59 +00:00
Karoy Lorentey
f8ce96273e [stdlib] Implement _hash(into:) for standard Hashable types. 2018-03-09 14:35:22 +00:00
Michael Ilseman
436475fb99 [string] ARC hack around hashValue
Avoid a source of ARC for hashValue, which is perf-sensitive
especially for hashed collection growth.

This improves the Dictionary benchmark by around 30%.
2018-01-23 21:43:17 -08:00
Karoy Lorentey
90e894729a [StringGuts] Linux support
Add support for compiling StringGuts without the Objective-C runtime.
2018-01-21 12:37:36 -08:00
Michael Ilseman
3be2faf5d3 [String] Initial implementation of 64-bit StringGuts.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.
2018-01-21 12:32:26 -08:00
Slava Pestov
c272d41e2f Re-apply "SIL: Remove special meaning for @_semantics("stdlib_binary_only")"
With -sil-serialize-all gone, this no longer means anything; just
don't declare the function as @_inlineable instead.

Fixes <rdar://problem/34564380>.
2017-10-04 14:07:52 -07:00
Jordan Rose
aab5f7aa4f Revert "SIL: Remove special meaning for @_semantics("stdlib_binary_only")" (#12270)
It still affects StdlibUnittest, which is still using -sil-serialize-all.
2017-10-04 12:49:21 -07:00
Slava Pestov
0fad13eeba SIL: Remove special meaning for @_semantics("stdlib_binary_only")
With -sil-serialize-all gone, this no longer means anything; just
don't declare the function as @_inlineable instead.

Fixes <rdar://problem/34564380>.
2017-10-03 13:48:22 -07:00
Max Moiseev
ef6b5c4795 Add missing @_inlineable attributes and deinits 2017-09-29 11:26:56 -07:00
Max Moiseev
53b8419279 [stdlib] Make all the stdlib APIs @_inlineable
This change in theory should allow us to remove a special stdlib-only
sil-serialize-all compilation mode.

<rdar://problem/34138683>
2017-09-29 11:26:56 -07:00
Dave Abrahams
97f875ad84 [stdlib] De-underscore Unicode "namespace" 2017-05-11 15:23:25 -07:00
Ben Cohen
d8be7ae29e Use CF for Hashing (#9203) 2017-05-03 05:42:40 -07:00
practicalswift
6d1ae2a39c [gardening] 2016 → 2017 2017-01-06 16:41:22 +01:00
practicalswift
797b80765f [gardening] Use the correct base URL (https://swift.org) in references to the Swift website
Remove all references to the old non-TLS enabled base URL (http://swift.org)
2016-11-20 17:36:03 +01:00
Dmitri Gribenko
e8e8b35610 stdlib: use SipHash-1-3 for string hashing on non-ObjC platforms
Part of rdar://problem/24109692
2016-09-06 20:41:03 -07:00
Dmitri Gribenko
77813904f8 stdlib: replace error-prone pairs of isASCII/startASCII calls with an API that returns Optional 2016-09-05 22:47:24 -07:00
Dmitri Gribenko
c314b599fc stdlib: move 'String : Hashable' conformance to a separate file
I'm about to add more code around it.
2016-09-02 19:34:14 -07:00