Commit Graph

48 Commits

Author SHA1 Message Date
David Smith f08afd00c3 Vectorize UTF16->UTF8 transcoding (#83073)
Fixes rdar://141789595
2026-05-08 11:49:00 -07:00
Doug Gregor 453277eb74 Mark the various with* functions as @safe
Functions like withUnsafeBufferPointer are, by themselves, safe to
call. It's only the operations on the unsafe pointers passed into the
closure that are the safety issue.

This was the intent spelled out in SE-0458 but was not fully realized
in the library.

Fixes rdar://174519372.
2026-04-16 22:37:54 -07:00
David Smith def9ee7464 Introduce a "single breadcrumb mode" for Strings decoded from UTF16. (#83987)
This allows us to quickly answer .utf16.count without requiring
additional allocations

Fixes rdar://160656317
2026-03-26 18:09:18 -07:00
Clinton Nkwocha 2385488c84 Generalize String functions for typed throws (#87495)
Generalizes `String`:
- `init(unsafeUninitializedCapacity:initializingUTF8With:)`
- `withCString(_:)`
- `withCString(encodedAs:_:)`
- `withUTF8(_:)`
- `_withNFCCodeUnits(_:)`

and `Substring`:
  - `withCString(_:)`
  - `withCString(encodedAs:_:)`
  - `withUTF8(_:)`

for typed throws.
2026-03-10 11:27:06 +00:00
Stephen Canon d10b3f82fc Optimization pass over String and UTF8Span's allASCII helper (#82540)
This ranges between parity (for very small strings) and 5x faster (for
32-63B strings) in benchmarking on M1 MBP. For largeish strings it
delivers a roughly 2x speedup; further increase in blocksize nets a
small win in microbenchmarks that I do not expect would translate to
real world usage due to codesize impact and the fact that most strings
are smallish.

There's some opportunity for further work here; in particular, if people
start building Swift for a baseline of AVX2 or AVX512, we should have
paths for that (and we should also implement them if/when we get better
multiversioning dispatch machinery in the language). Span adoption would
be interesting. It's likely we should have a dedicated "small core"
implementation that uses only aligned accesses. Still, this is a
significant improvement as-is, and we should land it.


![allASCII](https://github.com/user-attachments/assets/ebbc45ba-5ba8-42dd-bf63-31ca77844fca)
2025-07-11 12:08:17 -04:00
Guillaume Lessard 77f34f4f29 [stdlib] update doc-comment and add a code comment 2025-07-08 13:20:11 -07:00
Guillaume Lessard 62115791c0 [stdlib] make makeContiguousUTF8 stricter 2025-07-08 13:20:11 -07:00
Michael Ilseman e6e4bd6056 UTF8Span (#78531)
Add support for UTF8Span

Also, refactor validation and grapheme breaking
2025-04-11 16:11:11 -06:00
Allan Shortlidge 60e66f3613 stdlib: Address StrictMemorySafety warnings in String related code. 2025-03-31 16:45:08 -07:00
Doug Gregor 22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Kuba Mracek 7ae20b7039 [embedded] Port Swift.String to embedded Swift 2024-05-08 11:11:37 -07:00
Allan Shortlidge ca925ab379 stdlib: Resolve an un-mutated var warning.
NFC.
2024-03-18 16:33:17 -07:00
David Smith dc405525f9 Optimize checking for all-ASCII bytes (#72312) 2024-03-14 16:13:34 -07:00
Mishal Shah af112c1591 Update the Swift version to 6.0 from 5.11 2024-02-19 17:47:16 -08:00
Guillaume Lessard 98273aa6c2 Update stdlib/public/core/StringCreate.swift
Was off by one.
2024-01-10 14:32:26 -08:00
Guillaume Lessard fa9c80ae08 [test] round out testing for String.init?(validating:as:) 2024-01-10 14:32:26 -08:00
Guillaume Lessard 4617553ee7 [se-0405] improve slow path 2023-12-21 10:44:52 -08:00
Guillaume Lessard 0ba58de1e1 [se-0405] improve fast path 2023-12-21 10:44:52 -08:00
Guillaume Lessard 566fbf4fec [se-0405] update availability to a realistic release target 2023-12-21 10:44:52 -08:00
Guillaume Lessard f7006880c7 [se-0405] adapt implementation from staging package 2023-12-21 10:44:52 -08:00
David Smith 89144300fb Keep the __StringStorage alive while we're using its buffer 2021-08-31 08:51:33 -07:00
David Smith 4209361695 Revert "Copy the code units into a temporary buffer in the invalid UTF8 handling path of String(unsafeUninitializedCapacity:initializingWith:) before calling a function which might invalidate the String's buffer. rdar://80379070"
This reverts commit 4bdc8e3c99.
2021-08-31 08:50:35 -07:00
David Smith 52f9c77560 Revert "Remove unnecessary _fixLifetime"
This reverts commit ce17877d50.
2021-08-31 08:50:16 -07:00
David Smith ce17877d50 Remove unnecessary _fixLifetime 2021-08-31 08:36:27 -07:00
David Smith 4bdc8e3c99 Copy the code units into a temporary buffer in the invalid UTF8 handling path of String(unsafeUninitializedCapacity:initializingWith:) before calling a function which might invalidate the String's buffer. rdar://80379070 2021-08-30 21:27:50 -07:00
Michael Ilseman ae224cacdb [string] Restore _HasContiguousBytes for untyped storage
UnsafeRawBufferPointer cannot implement
withContiguousStorageIfAvailable because doing so would potentially
create a typed pointer from untyped data.
2020-04-09 13:38:28 -07:00
Michael Ilseman c2631004d7 [string] _HasContiguousBytes -> withContiguousStorageIfAvailable
Switch String(decoding:as) and other entry points to call
withContiguousStorageIfAvailable rather than use _HasContiguousBytes.
2020-04-09 13:38:28 -07:00
Michael Ilseman 19b332c8e2 [gardening] Delete Trailing Whitespace 2020-04-09 13:38:27 -07:00
Michael Ilseman 0ca42e9ef7 [string] Shrink storage class sizes.
* Don't allocate breadcrumbs pointer if under threshold
* Increase breadcrumbs threshold
* Linear 16-byte bucketing until 128 bytes, malloc_size after
* Allow cap less than _SmallString.capacity (bridging non-ASCII)

This change decreases the amount of heap usage for moderate-length
strings (< 64 UTF-8 code units in length) and increases the amount of
spare code unit capacity available (less growth needed).

Average improvements for moderate-length strings:

* 64-bit: on average, 8 bytes saved and 4 bytes of extra capacity
* 32-bit: on average, 4 bytes saved and 6 bytes of extra capacity

Additionally, on 32-bit, large-length strings also gain an average of
6 bytes of extra spare capacity.

Details:

On 64-bit, half of moderate-length allocations will save 16 bytes
while the other half get an extra 8 bytes of spare capacity.

On 32-bit, a quarter of moderate-length allocations will save 16
bytes, and the rest get an extra 4 bytes of spare
capacity. Additionally, 32-bit string's storage class now claims its
full allocation, which is its birthright. Prior to this change, we'd
have on average 1.5 bytes of spare capacity, and now we have 7.5 bytes
of spare capacity.

Breadcrumbs threshold is increased from the super-conservative 32 to
the pretty-conservative 64. Some speed improvements are incorporated
in this change, but more are in flight. Even without those eventual
improvements, this is a worthwhile change (ASCII is still fast-pathed
and irrelevant to breadcrumbing).

For a complex real-world workload, this amounts to around a 5%
improvement to transient heap usage due to all strings and a 4%
improvement to peak heap usage due to all strings. For moderate-length
strings specifically, this gives around 11% improvement to both.
2020-03-05 16:10:23 -08:00
David Smith 007ff00617 Add fast paths for String(decoding:…, as: Unicode.ASCII.self) 2019-10-15 17:26:23 -07:00
David Smith b06137b283 Add a private implementation of a String initializer with access to uninitialized storage (https://github.com/apple/swift-evolution/pull/1022) and use it to speed up uppercased() and lowercased() 2019-07-09 15:05:00 -07:00
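The initializer introduced by the commit above was private at the time. A public form, `String(unsafeUninitializedCapacity:initializingUTF8With:)`, is named in a later entry of this log. The following is a minimal sketch of calling that public form. The values and variable names are illustrative and do not come from the commit.

```swift
// Illustrative use of the public initializer; not code from the commit.
let source = Array("hello".utf8)

let greeting = String(unsafeUninitializedCapacity: source.count) { buffer in
    // Write UTF-8 code units directly into the string's uninitialized storage.
    for (index, byte) in source.enumerated() {
        buffer[index] = byte
    }
    // Report how many code units were initialized.
    return source.count
}
```

The closure returns the number of initialized code units, so a caller can request more capacity than it ends up using.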
Michael Ilseman aab8063267 [SE-0247] Add contiguous string APIs
Adds API for querying, enforcing, and using contiguous strings.
2019-04-02 20:30:02 -07:00
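As a rough illustration of "querying, enforcing, and using" contiguous strings, the sketch below uses `isContiguousUTF8`, `makeContiguousUTF8()`, and `withUTF8(_:)`. The sample string and variable names are assumptions chosen for the example.

```swift
// Illustrative only; the sample value is not from the commit.
var word = "caf\u{E9}"   // "café" with a precomposed é

// Enforce: ensure the string is backed by contiguous UTF-8 storage.
word.makeContiguousUTF8()

// Query: ask whether the contiguous fast path is available.
let isContiguous = word.isContiguousUTF8

// Use: read the UTF-8 code units through a buffer pointer.
let byteCount = word.withUTF8 { utf8 in utf8.count }
```

`withUTF8(_:)` is a mutating method because it may first need to make the storage contiguous, which is why `word` is declared with `var`.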
Michael Ilseman 0ece62d911 [String] Add Substring.base
Adds Substring.base, analogous to Slice.base, to access the entire
String.

Tests added.
2019-03-29 15:43:00 -07:00
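A small sketch of what `Substring.base` provides, using made-up sample values:

```swift
// Illustrative only; the strings are not from the commit or its tests.
let sentence = "Hello, world"
let tail = sentence.dropFirst(7)   // a Substring covering "world"

// `base` returns the entire String that the Substring is a slice of.
let whole = tail.base
```

Here `whole` compares equal to `sentence`, while `tail` covers only its last five characters.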
Michael Ilseman 19014a85af [stdlib] Some cleanup enabled by _alwaysEmitIntoClient.
Refactor some copy-pasted code into a helper computed variable and
outline some cold paths.
2019-03-03 14:10:01 -08:00
Michael Ilseman 877a20ead0 [String] Fix crash when given null UBP 2019-02-06 14:44:01 -08:00
Michael Ilseman 3df92911f9 [String] Speed up ASCII checking.
Perform ASCII checking using pointer-width strides, making sure to
align properly.
2019-01-22 15:06:03 -08:00
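The idea of checking for ASCII in pointer-width strides can be sketched as follows. This is a simplified illustration, not the standard library's implementation; the function name `isAllASCII` and the sample inputs are hypothetical.

```swift
// Hypothetical helper illustrating the technique; not the stdlib code.
func isAllASCII(_ bytes: UnsafeRawBufferPointer) -> Bool {
    guard let base = bytes.baseAddress else { return true }
    let count = bytes.count
    let wordSize = MemoryLayout<UInt>.size
    // A mask with the high bit of every byte in a word set.
    let highBits = UInt(truncatingIfNeeded: 0x8080_8080_8080_8080 as UInt64)
    var i = 0

    // Head: check byte by byte until the address is word-aligned.
    while i < count, UInt(bitPattern: base + i) & UInt(wordSize - 1) != 0 {
        if bytes[i] & 0x80 != 0 { return false }
        i += 1
    }
    // Body: check one aligned, pointer-width word per iteration.
    while i + wordSize <= count {
        let word = (base + i).load(as: UInt.self)
        if word & highBits != 0 { return false }
        i += wordSize
    }
    // Tail: check any remaining bytes one at a time.
    while i < count {
        if bytes[i] & 0x80 != 0 { return false }
        i += 1
    }
    return true
}

let plain = Array("pointer-width strides over plain ASCII text".utf8)
let accented = Array("pointer-width strides over caf\u{E9} text".utf8)
let plainResult = plain.withUnsafeBytes { isAllASCII($0) }
let accentedResult = accented.withUnsafeBytes { isAllASCII($0) }
```

The byte-wise head loop is what "making sure to align properly" amounts to in this sketch: `load(as:)` requires an aligned address, so the word-wide loop starts only once the pointer sits on a word boundary.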
Michael Ilseman a088e13224 [String] Add UTF-8 fast-paths for Foundation initializers
Many Foundation initializers could benefit from faster string
construction and subsequent reads in Swift 5. Add UTF-8 fast paths for
when constructing a string from valid UTF-8 code units.
2019-01-17 14:15:40 -08:00
Mike Ash fa5888fb3f [Stdlib][Overlays] Rename various classes to avoid conflicting ObjC names.
Old Swift and new Swift runtimes and overlays need to coexist in the same process. This means there must not be any classes which have the same ObjC runtime name in old and new, because the ObjC runtime doesn't like name collisions.

When possible without breaking source compatibility, classes were renamed in Swift, which results in a different ObjC name.

Public classes were renamed only on the ObjC side using the @_objcRuntimeName attribute.

This is similar to the work done in pull request #19295. That only renamed @objc classes. This renames all of the others, since even pure Swift classes still get an ObjC name.

rdar://problem/46646438
2019-01-15 12:21:20 -05:00
Michael Ilseman 255c17aeb6 [String] String-from-whole-Substring fast-path.
Add in a fast-path for Strings created from Substring which covers the
entire String. Put String-from-Substring behind a non-inlinable
resilience barrier for future flexibility.
2018-12-05 18:22:47 -08:00
Ben Cohen 1673c12d78 [stdlib] Replace "sanityCheck" with "internalInvariant" (#20616)
* Replace "sanityCheck" with "internalInvariant"
2018-11-15 20:50:22 -08:00
Michael Ilseman 948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Johannes Weiss 79e9f26ad7 integrating utf8 validation 2018-11-04 10:42:41 -08:00
Michael Ilseman 8851bac1be [String] Inlining, NFC fast paths, and more.
Add inlinability annotations to restore performance parity with 4.2 String.

Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.

RRC improvements and optimizations.
2018-11-04 10:42:41 -08:00
Michael Ilseman 9d9f9005e3 [String] Define performance flags and plumb them throughout 2018-11-04 10:42:41 -08:00
Michael Ilseman 9bf2c4d3d3 [String] Use small string at string creation 2018-11-04 10:42:40 -08:00
Michael Ilseman 4ab45dfe20 [String] Drop in initial UTF-8 String prototype
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.

My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
2018-11-04 10:42:40 -08:00
Michael Ilseman ced2e63d95 [test] Make string internal testing a little more robust; NFC
Add an isSmall query to Character so testing doesn't have to bake in
internal format. Clarify the purpose of the invalid UTF-16 backdoor
creation method.
2018-08-02 16:34:19 -07:00
Michael Ilseman 4a66c4719f [gardening] Move string creation internals to StringCreate.swift
NFC
2018-07-25 14:05:46 -07:00