36 Commits

Author SHA1 Message Date
Guillaume Lessard
dfb2e2f12e [stdlib] annotate uses of Range.init(_uncheckedBounds:) 2025-03-05 18:52:11 -08:00
Doug Gregor
22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Nate Cook
23c19333df [stdlib] Fix String.reserveCapacity underallocation (#65902)
When called on a string that is not uniquely referenced,
`String.reserveCapacity(_:)` ignores the current capacity, using
the passed-in capacity for the size of its new storage. This can
result in an underallocation and write past the end of the new
buffer.

This fix changes the new size calculation to use the current UTF-8
count as the minimum. Non-native or non-unique strings
now allocate the requested capacity (or space enough for the
current contents, if that's larger than what's requested).

rdar://109275875
Fixes #53483
2023-05-17 12:11:23 -05:00
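The size calculation described in 23c19333df can be sketched as follows. This is only an illustration, not the stdlib's actual implementation: `newStorageCapacity(requested:currentUTF8Count:)` is a hypothetical helper invented here, and the second half simply exercises the public `String.reserveCapacity(_:)` API on a string whose storage is shared.

```swift
// Hypothetical helper (not the stdlib's code): choose the size of the new
// storage so that it can always hold the current contents.
func newStorageCapacity(requested: Int, currentUTF8Count: Int) -> Int {
    return max(requested, currentUTF8Count)
}

// A string whose storage is shared, so it is not uniquely referenced.
let original = String(repeating: "a", count: 100)
var copy = original

// The request is smaller than the current contents. With the fix, the new
// storage must still be large enough for all 100 code units.
copy.reserveCapacity(10)
copy.append("b")
```

Taking the larger of the two values is what keeps the copy into the new buffer in bounds when the requested capacity is below the current UTF-8 count.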
Karoy Lorentey
7a11700d7b [stdlib] Avoid retaining storage in _StringGuts.updateNativeStorage 2023-02-14 18:51:15 -08:00
Karoy Lorentey
2086313f70 [stdlib] Work around the optimizer helpfully reintroducing retain/releases
This is just wild flailing at this point, but it does seem to get
us more plausible assembly.
2023-02-14 15:33:53 -08:00
Karoy Lorentey
cf1b9d9404 [stdlib] String: Fix more potential UB, and rework access patterns 2023-02-13 22:55:32 -08:00
Alejandro Alonso
284f8d4fdd Fix Substring.removeSubrange for entire substring (#60744)
fix start and end

fix test
2022-08-25 16:08:33 -07:00
Karoy Lorentey
47109ac8d6 [stdlib] Fix thinko 2022-04-19 13:08:43 -07:00
Karoy Lorentey
847337efd7 [stdlib][cosmetics] Clean up unused/underused interfaces, update naming
There is little point to having `isUTF16` properties when they simply
return `!isUTF8`; remove them.

Rename `String.Index._copyEncoding(from:)` to
`_copyingEncoding(from:)`.
2022-04-18 21:06:20 -07:00
Karoy Lorentey
83df814c63 [stdlib] _StringObject.isKnownUTF16 → isForeignUTF8
This fixes a compatibility issue with potential future UTF-8 encoded
foreign String forms, as well as simplifying the code a bit — we no
longer need to do an availability check on inlinable fast paths.

The isForeignUTF8 bit is never set by any past or current stdlib
version, but it allows us to introduce UTF-8 encoded foreign forms
without breaking inlinable index encoding validation introduced in
Swift 5.7.
2022-04-09 21:33:53 -07:00
Karoy Lorentey
73312fedd4 [stdlib] Grapheme breaking: Refactor to simplify logic
- Split forward and backward direction into separate code paths.
  This makes the code more readable and paves the way for future
  improvements. (E.g., switching to a linear-time algorithm for
  breaking backwards.)
- `Substring.index(after:)` now uses the same grapheme breaking paths
  as `String.index(after:)`.
- The cached stride value in string indices is now well-defined even
  on indices that aren’t character-aligned.
2022-04-05 20:47:42 -07:00
Karoy Lorentey
e0bd5f7a79 [stdlib] Fix Substring.UnicodeScalarView.replaceSubrange 2022-04-05 20:47:42 -07:00
Karoy Lorentey
755712a25d [stdlib] StringGuts.replaceSubrange: Fast path for replacing with a fast substring
If the replacement collection is a fast UTF-8 substring, we can simply
access its backing store directly; we don’t need to use a circuitous
lazy algorithm.
2022-03-29 20:00:08 -07:00
Karoy Lorentey
87073f2af8 [stdlib] Substring.replaceSubrange: fix startIndex/endIndex adjustment
This used to forward to `Slice.replaceSubrange`, but that’s a generic algorithm that isn’t aware of the peculiarities of Unicode extended grapheme clusters, and it can be misled by unusual cases, like a substring or subrange whose bounds aren’t `Character`-aligned, or a replacement string that starts with a continuation scalar.
2022-03-24 21:00:00 -07:00
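A small sketch of why a generic algorithm can go wrong in the case 87073f2af8 describes: when a replacement starts with a continuation scalar, `Character` counts are not additive across the join. The variable names are illustrative only; the second half shows the ordinary, aligned use of `Substring.replaceSubrange(_:with:)`.

```swift
// A replacement that starts with a continuation scalar merges with the
// preceding Character instead of adding a new one.
var word = "cafe"
let accent = "\u{301}"          // COMBINING ACUTE ACCENT on its own
let countBefore = word.count    // number of Characters before appending
let accentCount = accent.count  // the lone accent counts as one Character
word.append(accent)             // the accent attaches to the final "e"

// The ordinary, Character-aligned case on a substring.
let base = "Hello, world!"
var sub = base.prefix(5)        // Substring "Hello"
sub.replaceSubrange(sub.startIndex..<sub.index(after: sub.startIndex), with: "J")
```

Here `countBefore + accentCount` is 5 but `word.count` is 4, so index adjustments based on adding up element counts would not match the resulting string.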
Matt Zanchelli
be13b470aa Fix typos
becuase -> because
preceeds -> precedes
initalizer -> initializer
intialize -> initialize
libary -> library
notfication -> notification
reciever -> receiver
collecton -> collection
exlcusive -> exclusive
techincal -> technical
compatability -> compatibility
setps -> steps
accomodate -> accommodate
brakcet -> bracket
fraciton -> fraction
programm -> program
concequently -> consequently
ecoding -> encoding
timeIntervalforSelfEnd -> timeIntervalForSelfEnd
2020-12-21 18:44:03 -05:00
Michael Ilseman
8d5d3815a1 Merge pull request #30180 from benrimmington/se-0263-test
[SE-0263] Add test, rename API, update docs
2020-03-06 08:54:24 -08:00
Michael Ilseman
0ca42e9ef7 [string] Shrink storage class sizes.
* Don't allocate breadcrumbs pointer if under threshold
* Increase breadcrumbs threshold
* Linear 16-byte bucketing until 128 bytes, malloc_size after
* Allow cap less than _SmallString.capacity (bridging non-ASCII)

This change decreases the amount of heap usage for moderate-length
strings (< 64 UTF-8 code units in length) and increases the amount of
spare code unit capacity available (less growth needed).

Average improvements for moderate-length strings:

* 64-bit: on average, 8 bytes saved and 4 bytes of extra capacity
* 32-bit: on average, 4 bytes saved and 6 bytes of extra capacity

Additionally, on 32-bit, large-length strings also gain an average of
6 bytes of extra spare capacity.

Details:

On 64-bit, half of moderate-length allocations will save 16 bytes
while the other half get an extra 8 bytes of spare capacity.

On 32-bit, a quarter of moderate-length allocations will save 16
bytes, and the rest get an extra 4 bytes of spare
capacity. Additionally, 32-bit string's storage class now claims its
full allocation, which is its birthright. Prior to this change, we'd
have on average 1.5 bytes of spare capacity, and now we have 7.5 bytes
of spare capacity.

Breadcrumbs threshold is increased from the super-conservative 32 to
the pretty-conservative 64. Some speed improvements are incorporated
in this change, but more are in flight. Even without those eventual
improvements, this is a worthwhile change (ASCII is still fast-pathed
and irrelevant to breadcrumbing).

For a complex real-world workload, this amounts to around a 5%
improvement to transient heap usage due to all strings and a 4%
improvement to peak heap usage due to all strings. For moderate-length
strings specifically, this gives around 11% improvement to both.
2020-03-05 16:10:23 -08:00
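The "linear 16-byte bucketing" mentioned in 0ca42e9ef7 amounts to rounding an allocation size up to the next multiple of 16. The sketch below shows only that rounding step; `roundedUpToBucket(_:bucketSize:)` is a hypothetical name, and the real storage sizing (headers, the 128-byte cutoff, `malloc_size`) is not modeled here.

```swift
// Hypothetical sketch: round a requested byte count up to the next multiple
// of the bucket size (16 bytes by default).
func roundedUpToBucket(_ byteCount: Int, bucketSize: Int = 16) -> Int {
    precondition(byteCount >= 0 && bucketSize > 0)
    return (byteCount + bucketSize - 1) / bucketSize * bucketSize
}

// Sizes for a few sample requests.
let sampleRequests = [1, 16, 17, 100]
let bucketed = sampleRequests.map { roundedUpToBucket($0) }
```

Any bytes between the request and the bucket boundary are what the commit message calls spare capacity.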
Michael Ilseman
1255e10b62 [stdlib] [gardening] Remove whitespace, document growth issue 2020-03-05 12:16:26 -08:00
Ben Rimmington
cf8988455f [SE-0263] Rename internal API 2020-03-05 14:51:10 +00:00
David Smith
35e21b0bbd SR-10556 _foreignGrow should use the uninitialized-buffer String initializer once it's in 2020-02-19 11:17:11 -08:00
Paul Hudson
06f82a53b5 Replaced the majority of ' : ' with ': '. 2019-07-18 20:46:07 +01:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happened to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
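As an illustration of the points in 415cc8fb0c, the sketch below shows that a UTF-16 code unit is not a `Character` even in all-ASCII text (CR-LF is a single `Character`), and uses the explicit UTF-16 APIs `String.Index.utf16Offset(in:)` and `String.Index(utf16Offset:in:)`. The assumption here is that these are the pair of alternatives the commit refers to; the sample string is made up.

```swift
// All-ASCII text in which Characters and UTF-16 code units still differ:
// "\r\n" is a single Character made of two code units.
let text = "a\r\nb"
let characterCount = text.count        // Characters
let utf16Count = text.utf16.count      // UTF-16 code units

// Serialize an index as an explicit UTF-16 offset, then restore it.
let index = text.firstIndex(of: "b")!
let offset = index.utf16Offset(in: text)
let restored = String.Index(utf16Offset: offset, in: text)
```

Spelling out the encoding in the API name makes the offset meaningful to whoever deserializes it, regardless of how the string is stored internally.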
Mike Ash
fa5888fb3f [Stdlib][Overlays] Rename various classes to avoid conflicting ObjC names.
Old Swift and new Swift runtimes and overlays need to coexist in the same process. This means there must not be any classes which have the same ObjC runtime name in old and new, because the ObjC runtime doesn't like name collisions.

When possible without breaking source compatibility, classes were renamed in Swift, which results in a different ObjC name.

Public classes were renamed only on the ObjC side using the @_objcRuntimeName attribute.

This is similar to the work done in pull request #19295. That only renamed @objc classes. This renames all of the others, since even pure Swift classes still get an ObjC name.

rdar://problem/46646438
2019-01-15 12:21:20 -05:00
Ben Cohen
1673c12d78 [stdlib] Replace "sanityCheck" with "internalInvariant" (#20616)
* Replace "sanityCheck" with "internalInvariant"
2018-11-15 20:50:22 -08:00
Michael Ilseman
034f76d10b [String] Remove some unneeded inlinable annotations 2018-11-15 09:43:34 -08:00
Maxim Moiseev
cbf83ac04f [NFC][stdlib] Add FIXME markers to simplify audit 2018-11-14 11:58:42 -08:00
Slava Pestov
f6c2caf64b stdlib: Add @inlinable to @inline(__always) declarations
These should be audited since some might not actually need to be
@inlinable, but for now:

- Anything public and @inline(__always) is now also @inlinable
- Anything @usableFromInline and @inline(__always) is now @inlinable
2018-11-13 15:15:07 -05:00
Karoy Lorentey
3820393ea2 [stdlib] Ensure that reserved capacity survives CoW copies (#46)
This shouldn’t really be necessary, but it makes sense to wait until we can reclaim memory like this consistently across the entire stdlib.
2018-11-04 10:42:44 -08:00
Michael Ilseman
948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman
75728ebee3 [String] Implement in-place generic RRC 2018-11-04 10:42:41 -08:00
Michael Ilseman
9135c07cac [String] Perform small string append in-register 2018-11-04 10:42:41 -08:00
Michael Ilseman
7aea40680d [String] NFC iterator fast-paths
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.

Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
2018-11-04 10:42:41 -08:00
Johannes Weiss
79e9f26ad7 integrating utf8 validation 2018-11-04 10:42:41 -08:00
Michael Ilseman
8851bac1be [String] Inlining, NFC fast paths, and more.
Add inlinability annotations to restore performance parity with 4.2 String.

Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.

RRC improvements and optimizations.
2018-11-04 10:42:41 -08:00
Michael Ilseman
9d9f9005e3 [String] Define performance flags and plumb them throughout 2018-11-04 10:42:41 -08:00
Michael Ilseman
89d18e1a3a [String] Refactor helper code into UnicodeHelpers.swift.
Clean up some of the index assumptions, stick index-aware methods on
_StringGuts, and otherwise migrate code over to UnicodeHelpers.swift.
2018-11-04 10:42:40 -08:00