39 Commits

Author SHA1 Message Date
David Smith
dbaada435c Stay in vectors longer before doing a horizontal sum 2022-03-18 15:27:40 -07:00
David Smith
eaf3f316ec Vectorize UTF16 offset calculations 2022-03-17 14:18:21 -07:00
Hassan
1d4f220ed4 [stdlib] Replace precondition with the internal _precondition 2021-11-04 23:51:10 +02:00
Kuba (Brecka) Mracek
404badb49a Introduce SWIFT_ENABLE_REFLECTION to turn on/off the support for Mirrors and reflection (#33617) 2021-09-08 13:08:13 -07:00
Doug Gregor
9579390024 [SE-0304] Rename ConcurrentValue to Sendable 2021-03-18 22:48:20 -07:00
Doug Gregor
1a1f79c0de Introduce safety checking for ConcurrentValue conformance.
Introduce checking of ConcurrentValue conformances:
- For structs, check that each stored property conforms to ConcurrentValue
- For enums, check that each associated value conforms to ConcurrentValue
- For classes, check that each stored property is immutable and conforms
  to ConcurrentValue

Because all of the stored properties / associated values need to be
visible for this check to work, limit ConcurrentValue conformances to
be in the same source file as the type definition.

This checking can be disabled by conforming to a new marker protocol,
UnsafeConcurrentValue, that refines ConcurrentValue.
UnsafeConcurrentValue otherwise has no specific meaning. This allows
both "I know what I'm doing" for types that manage concurrent access
themselves as well as enabling retroactive conformance, both of which
are fundamentally unsafe but also quite necessary.

The bulk of this change ended up being to the standard library, because
all conformances of standard library types to the ConcurrentValue
protocol needed to be sunk down into the standard library so they
would benefit from the checking above. There were numerous little
mistakes in the initial pass through the standard library types that
have now been corrected.
2021-02-04 03:45:09 -08:00
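The checks listed in commit 1a1f79c0de can be illustrated with the protocol's later name, Sendable (the rename is commit 9579390024 above). This is a minimal sketch, not code from the commit: the type names are hypothetical, and `@unchecked Sendable` is assumed here to be the present-day spelling of the opt-out that the message calls UnsafeConcurrentValue.

```swift
import Foundation

// Struct: every stored property must itself be Sendable.
struct Point: Sendable {
    var x: Int
    var y: Int
}

// Enum: every associated value must be Sendable.
enum Shape: Sendable {
    case circle(radius: Double)
    case polygon([Point])
}

// Class: every stored property must be immutable (`let`) and Sendable.
// Current compilers also expect such a class to be final.
final class Label: Sendable {
    let text: String
    init(text: String) { self.text = text }
}

// Opt-out ("I know what I'm doing"): the compiler skips the checks,
// so the type has to manage concurrent access itself, here with a lock.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increment() -> Int {
        lock.lock()
        defer { lock.unlock() }
        value += 1
        return value
    }
}

let counter = Counter()
let first = counter.increment()
let second = counter.increment()
```

Making `Label.text` a `var`, or giving `Point` a stored property of a non-Sendable class type, would be expected to produce a diagnostic on the checked conformances, while `Counter` compiles either way because its conformance is unchecked.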
Valeriy Van
f434cbab05 Fixes example snippets in StringUTF16View.swift 2020-04-27 23:54:04 +02:00
Michael Ilseman
d8f25be9fa [string] Skip unnecessary self UTF-16 length in isEqual
For isEqual bridging comparisons, skip checking our own UTF-16 length when the
string we're comparing against is known to be ASCII.
2020-03-05 16:13:15 -08:00
Michael Ilseman
7ff3ecf2e5 [string] Skip unnecessary self UTF-16 length in isEqual
For isEqual bridging comparisons, skip checking our own UTF-16 length when the
string we're comparing against is known to be ASCII.
2020-03-05 16:13:15 -08:00
Michael Ilseman
0ca42e9ef7 [string] Shrink storage class sizes.
* Don't allocate breadcrumbs pointer if under threshold
* Increase breadcrumbs threshold
* Linear 16-byte bucketing until 128 bytes, malloc_size after
* Allow cap less than _SmallString.capacity (bridging non-ASCII)

This change decreases the amount of heap usage for moderate-length
strings (< 64 UTF-8 code units in length) and increases the amount of
spare code unit capacity available (less growth needed).

Average improvements for moderate-length strings:

* 64-bit: on average, 8 bytes saved and 4 bytes of extra capacity
* 32-bit: on average, 4 bytes saved and 6 bytes of extra capacity

Additionally, on 32-bit, large-length strings also gain an average of
6 bytes of extra spare capacity.

Details:

On 64-bit, half of moderate-length allocations will save 16 bytes
while the other half get an extra 8 bytes of spare capacity.

On 32-bit, a quarter of moderate-length allocations will save 16
bytes, and the rest get an extra 4 bytes of spare
capacity. Additionally, 32-bit string's storage class now claims its
full allocation, which is its birthright. Prior to this change, we'd
have on average 1.5 bytes of spare capacity, and now we have 7.5 bytes
of spare capacity.

Breadcrumbs threshold is increased from the super-conservative 32 to
the pretty-conservative 64. Some speed improvements are incorporated
in this change, but more are in flight. Even without those eventual
improvements, this is a worthwhile change (ASCII is still fast-pathed
and irrelevant to breadcrumbing).

For a complex real-world workload, this amounts to around a 5%
improvement to transient heap usage due to all strings and a 4%
improvement to peak heap usage due to all strings. For moderate-length
strings specifically, this gives around 11% improvement to both.
2020-03-05 16:10:23 -08:00
Paul Hudson
06f82a53b5 Replaced the majority of ' : ' with ': '. 2019-07-18 20:46:07 +01:00
Michael Ilseman
63a6794cf9 [String] Switch scalar-aligned bit to a reserved bit.
Since scalar-alignment is set in inlinable code, switch the alignment
bit to one of the previously-reserved bits rather than a grapheme
cache bit. Setting a grapheme cache bit in inlinable would break
backward deployment, as older versions would interpret it as a cached
value.

Also adjust the name to "scalar-aligned", which is clearer, and
remove the assertion (which should be a real precondition).
2019-07-02 16:25:04 -07:00
Michael Ilseman
bd5a40ff1b [gardening] Add underscore to internal member 2019-06-27 11:11:44 -07:00
Michael Ilseman
4cd1e812b7 [String] Scalar-alignment bug fixes.
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.

SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The below shows what views need to check for alignment before
they can operate, and whether the indices they produce are aligned.

┌───────────────╥────────────────────┬──────────────────────────┐
│ View          ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8   ║ no                 │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16  ║ yes                │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8  ║ yes                │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no                 │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes                │ yes                      │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character     ║ yes                │ yes                      │
└───────────────╨────────────────────┴──────────────────────────┘

The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:

* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
2019-06-26 16:42:58 -07:00
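What a non-scalar-aligned index looks like, as discussed in commit 4cd1e812b7, can be shown with public API alone. The sketch below takes an index from the UTF-8 view that lands in the middle of a scalar and asks whether it corresponds exactly to a position in the unicode scalar view. It only detects the misalignment; the internal known-to-be-aligned bit and the alignment step itself are not visible from outside the standard library.

```swift
let s = "\u{E9}"                       // "é": one scalar, two UTF-8 code units
let start = s.utf8.startIndex
let mid = s.utf8.index(after: start)   // points at the second UTF-8 code unit

// samePosition(in:) returns nil when the index does not fall
// exactly on a unicode scalar boundary.
let alignedStart = start.samePosition(in: s.unicodeScalars)  // non-nil
let alignedMid = mid.samePosition(in: s.unicodeScalars)      // nil
```

Per the table in the commit message, views that require alignment have to handle an index like `mid` before operating on it, whereas the native UTF-8 view can use it directly.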
Ben Cohen
e9d4687e31 De-underscore @frozen, apply it to structs (#24185)
* De-underscore @frozen for enums

* Add @frozen for structs, deprecate @_fixed_layout for them

* Switch usage from _fixed_layout to frozen
2019-05-30 17:55:37 -07:00
Michael Ilseman
f7cdda2720 [gardening] Clean up many String computed vars 2019-04-08 15:16:48 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happened to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
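A short sketch of the off-ramp described in commit 415cc8fb0c, assuming the pair of alternatives is `utf16Offset(in:)` and `String.Index(utf16Offset:in:)` as they appear in the current standard library. The strings are made up for illustration.

```swift
let s = "a\u{1F600}b"   // "a", an emoji that needs two UTF-16 code units, "b"
let index = s.firstIndex(of: "b")!

// Explicitly UTF-16: "a" is 1 code unit and the emoji is 2, so "b" is at offset 3.
let offset = index.utf16Offset(in: s)
let restored = String.Index(utf16Offset: offset, in: s)
let roundTripped = s[restored]          // "b"

// Why "one UTF-16 code unit per Character" fails even for ASCII:
let crlf = "\r\n"
let characterCount = crlf.count         // 1 (CR-LF is a single Character)
let utf16Count = crlf.utf16.count       // 2
```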
Michael Ilseman
b01ee7267a [String] Custom iterator for UTF16View (#20929)
Defining a custom iterator for the UTF16View avoids some redundant
computation over the indexing model. This speeds up iteration by
around 40% on non-ASCII strings.
2018-12-01 09:35:27 -08:00
Michael Ilseman
c3c6fdc77f [String] ASCII fast-path for UTF16View (#20848)
Add an isASCII fast-path for many UTF16View operations. These are
heavily utilized in random-access scenarios, allowing us to both be
more efficient and skip generating breadcrumbs for ASCII strings.
2018-11-29 18:19:32 -08:00
David Smith
8b57921905 Assorted bridging changes:
• Convert _AbstractStringStorage to a protocol, and the free functions used to deduplicate implementations to extensions on that protocol.
• Move 'start' into the abstract type and use that to simplify some code
• Move the ASCII fast path for length into UTF16View.
• Add a weirder but faster way to check which (if any) of our NSString subclasses a given object is, and adopt it
2018-11-28 16:04:34 -08:00
Michael Ilseman
4111b21cb0 [String] Bug fix for empty-range getCharacters 2018-11-26 12:13:23 -08:00
Michael Ilseman
3a0ac0270d [stdlib] Unchecked subscript on UnsafeBufferPointer
Add and use an unchecked subscript on UnsafeBufferPointer, which skips
debugPrecondition checks (in case we're not inlined) as well as a
force-unwrap check.
2018-11-16 11:12:29 -08:00
Ben Cohen
1673c12d78 [stdlib] Replace "sanityCheck" with "internalInvariant" (#20616)
* Replace "sanityCheck" with "internalInvariant"
2018-11-15 20:50:22 -08:00
Michael Ilseman
63fe485758 [String] Audit and publish the rest of the ABI 2018-11-15 11:06:33 -08:00
Michael Ilseman
c749123297 [String] DCE and drop inlinable
Remove some more inlinable annotations and drop dead code.
2018-11-15 11:06:30 -08:00
Michael Ilseman
1939d165ae Make corelibs-foundation build
Expose the old SPIs again so corelibs-foundation can build. We'll want
to wean them off of these and establish proper APIs soon.
2018-11-04 10:42:44 -08:00
Michael Ilseman
c04dcf3b38 [String] More efficient breadcrumb-scanning code.
Rather than rely on the UTF16View, scan between breadcrumbs by hand
for a decent 20% speedup. This code will also make it more obvious how
to slot in a vectorized solution later.
2018-11-04 10:42:44 -08:00
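The idea of scanning UTF-8 by hand to count UTF-16 code units (commit c04dcf3b38) can be sketched as follows. This is not the standard library's code; `utf16Length(ofValidUTF8:)` is a hypothetical helper and it assumes its input is valid UTF-8. It relies on two facts: every byte that is not a continuation byte starts a scalar, and a scalar with a 4-byte encoding needs a surrogate pair in UTF-16.

```swift
/// Hypothetical helper: the number of UTF-16 code units needed to
/// represent `bytes`, which are assumed to be valid UTF-8.
func utf16Length<C: Collection>(ofValidUTF8 bytes: C) -> Int
where C.Element == UInt8 {
    var count = 0
    for byte in bytes {
        // Continuation bytes look like 0b10xx_xxxx; anything else starts a scalar.
        if byte & 0xC0 != 0x80 { count += 1 }
        // A lead byte of 0xF0 or above starts a 4-byte sequence,
        // which becomes a surrogate pair (one extra code unit).
        if byte >= 0xF0 { count += 1 }
    }
    return count
}

let text = "a\u{E9}\u{20AC}\u{1F600}"   // 1-, 2-, 3- and 4-byte scalars
let scanned = utf16Length(ofValidUTF8: text.utf8)
```

Because each byte is classified independently of its neighbours, a loop of this shape is the kind of code that lends itself to a vectorized version.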
Michael Ilseman
948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman
cb0fbc6fc7 [String] 5X Faster getCharacters implementation
Rather than bounce through the UTF-16 view, implement custom
transcoding for getCharacters. This speeds it up by around 5X. Adds
tests.
2018-11-04 10:42:41 -08:00
Michael Ilseman
7aea40680d [String] NFC iterator fast-paths
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.

Also, fix a bug in _typeName, which used to assume ASCIIness, and
improve SIL optimizations on StringObject.
2018-11-04 10:42:41 -08:00
Michael Ilseman
8851bac1be [String] Inlining, NFC fast paths, and more.
Add inlinability annotations to restore performance parity with 4.2 String.

Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.

RRC improvements and optimizations.
2018-11-04 10:42:41 -08:00
Michael Ilseman
752423b86c [String] Remove dead code and decls 2018-11-04 10:42:41 -08:00
Michael Ilseman
2e368a3f6a [String] Introduce StringBreadcrumbs
Breadcrumbs provide us amortized O(1) access to the UTF-16 view, which
is vital for efficient Cocoa interoperability.
2018-11-04 10:42:40 -08:00
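The breadcrumb idea from commit 2e368a3f6a can be sketched as a table that records, at a fixed stride of UTF-16 code units, where the matching position lies in the UTF-8 storage. A lookup then jumps to the nearest earlier crumb and scans at most one stride forward. Everything below is a simplified illustration under stated assumptions: the `Breadcrumbs` type, its layout and the stride of 4 are hypothetical, and the real StringBreadcrumbs implementation differs.

```swift
/// Hypothetical breadcrumb table; names and layout are illustrative only.
struct Breadcrumbs {
    let stride: Int
    let bytes: [UInt8]
    /// crumbs[k] is the position of the first scalar whose UTF-16 offset
    /// is at least k * stride, in both encodings.
    private var crumbs: [(utf16: Int, utf8: Int)] = []

    init(_ string: String, stride: Int = 4) {
        // A scalar is at most 2 UTF-16 code units wide, so a stride of at
        // least 2 guarantees that no crumb slot is skipped.
        precondition(stride >= 2)
        self.stride = stride
        self.bytes = Array(string.utf8)
        var utf16Offset = 0
        var utf8Offset = 0
        for scalar in string.unicodeScalars {
            if utf16Offset >= crumbs.count * stride {
                crumbs.append((utf16: utf16Offset, utf8: utf8Offset))
            }
            utf16Offset += UTF16.width(scalar)
            utf8Offset += UTF8.width(scalar)
        }
    }

    /// UTF-8 offset for a UTF-16 offset that lies on a scalar boundary
    /// and does not exceed the string's UTF-16 length.
    func utf8Offset(forUTF16Offset target: Int) -> Int {
        guard !crumbs.isEmpty else { return 0 }
        let crumb = crumbs[min(target / stride, crumbs.count - 1)]
        var utf16Offset = crumb.utf16
        var utf8Offset = crumb.utf8
        // Scan forward scalar by scalar, using the lead byte to get the width.
        while utf16Offset < target {
            let lead = bytes[utf8Offset]
            let width = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4
            utf16Offset += width == 4 ? 2 : 1
            utf8Offset += width
        }
        return utf8Offset
    }
}

let table = Breadcrumbs("ab\u{1F600}cd")
let offsetOfC = table.utf8Offset(forUTF16Offset: 4)   // 6
let offsetOfEnd = table.utf8Offset(forUTF16Offset: 6) // 8
```

Building the table costs one pass over the string, after which each lookup scans a bounded number of code units, which is where the amortized O(1) behaviour comes from.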
Michael Ilseman
fe7c3ce2e4 [String] Refactorings and cleanup
* Refactor out RRC implementation into dedicated file.

* Change our `_invariantCheck` pattern to generate efficient code in
  asserts builds and make the optimizer's job easier.

* Drop a few Bidi shims we no longer need.

* Restore View decls to String, workaround no longer needed

* Cleaner unicode helper facilities
2018-11-04 10:42:40 -08:00
Michael Ilseman
f23a3c19b8 [String] Bounds checking and Index cleanup 2018-11-04 10:42:40 -08:00
Michael Ilseman
89d18e1a3a [String] Refactor helper code into UnicodeHelpers.swift.
Clean up some of the index assumptions, stick index-aware methods on
_StringGuts, and otherwise migrate code over to UnicodeHelpers.swift.
2018-11-04 10:42:40 -08:00
Michael Ilseman
4ab45dfe20 [String] Drop in initial UTF-8 String prototype
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.

My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
2018-11-04 10:42:40 -08:00
Michael Ilseman
8294c0003a [string] Drop _StringGuts subscript; NFC
_StringGuts shouldn't expose a subscript, implying efficient
access. Switch to the explicit code unit fetch method. Update tests
accordingly, and switch off of deprecated typealiases.
2018-08-02 16:34:22 -07:00
Michael Ilseman
2195cda3ec [gardening] Rename StringUTFx.swift to StringUTFxView.swift 2018-07-25 14:09:45 -07:00