Commit Graph

62 Commits

Author SHA1 Message Date
Doug Gregor
050a514588 [Strict memory safety] Update standard library for unsafe treated as a call effect 2025-04-25 21:54:23 -07:00
Evan Wilde
ddaf003c56 Get stdlib building again
PR 79186 (https://github.com/swiftlang/swift/pull/79186) moved one of
the mandatory passes from the C++ implementation to the Swift
implementation resulting in a compiler that is unable to build the
standard library. The pass used to ensure that inaccessible control-flow
positions after an infinite loop was marked with `unreachable` in SIL.
Since the pass is no longer running, any function that returns a value
that also has an infinite loop internally must place a fatalError after
the infinite loop or it will fail to compile as the compiler will
determine that the function does not return from all control flow paths
even though some of the paths are unreachable.
2025-03-06 13:32:54 -08:00
Doug Gregor
22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Alex Martini
63323e04a9 Match parameter names in docs to the declaration 2024-08-01 11:01:02 -07:00
Karoy Lorentey
73f349cb15 [stdlib] Rework String breadcrumbs initialization/loading
This is a wild guess at what might be causing our persistent, random
String failures on the main branch:

```
  Swift(macosx-x86_64) :: Prototypes/CollectionTransformers.swift
  Swift(macosx-x86_64) :: stdlib/NSSlowString.swift
  Swift(macosx-x86_64) :: stdlib/NSStringAPI.swift
  Swift(macosx-x86_64) :: stdlib/StringIndex.swift
  Swift-validation(macosx-x86_64) :: stdlib/String.swift
  Swift-validation(macosx-x86_64) :: stdlib/StringBreadcrumbs.swift
  Swift-validation(macosx-x86_64) :: stdlib/StringUTF8.swift
```

FWIW, it appears this is *not* caused by https://github.com/apple/swift/pull/62717:
that change has also landed on release/5.8, and I haven’t seen these
issues on that branch.

Our atomic breadcrumbs initialization vs its non-atomic loading
gives me an uneasy feeling that this may in fact be a long standing
synchronization issue that is only now causing problems (for whatever
reason). I am unable to reproduce these issues locally, so this guess
may be (and probably is) wildly off the mark, but this PR is likely
to be a good idea anyway, if only to rule out this possibility.

rdar://104751936
2023-02-10 20:23:56 -08:00
Karoy Lorentey
e46f8f8244 [stdlib] String.UTF16View: Align indices before calling default algorithms
[Bidirectional]Collection’s default index manipulation methods (as
well as _utf16Distance) do not expect to be given unreachable
indices, and they tend to fail when operating on them. Round indices
down to the nearest scalar boundary before calling these.
2023-01-03 16:12:04 -08:00
Karoy Lorentey
5d354ceb96 [stdlib] Fix String.UTF16View.distance(from:to:)
- Align input indices to scalar boundaries
- Don’t pass decreasing indices to _utf16Distance
2023-01-01 20:58:25 -08:00
Karoy Lorentey
fce428e715 [stdlib] String.UTF16View: Tweak ASCII paths 2022-12-29 16:32:40 -08:00
Karoy Lorentey
d00f8ed44b [stdlib] Optimize StringProtocol._toUTF16Indices/_toUTF16Offsets
Speed up conversion between UTF-16 offset ranges
and string index ranges, by carefully switching
between absolute and relative index calculations,
depending on the distance we need to go.

It is a surprisingly tricky puzzle to do this
correctly while avoiding redundant calculations.
Offset ranges within substrings add the additional
complication of having to bias offset values with
the absolute offset of the substring’s start index.
2022-12-28 20:08:05 -08:00
Karoy Lorentey
ec35728b8d [stdlib] String.UTF16View: Rework thresholds for relative indexing
We commonly start from the `startIndex`, in which case
`_nativeGetOffset` is essentially free. Consider this
case when calculating the threshold for using breadcrumbs.
2022-12-28 20:07:53 -08:00
Karoy Lorentey
6fee1b372b [stdlib] Breadcrumbs are spaced in UTF-16 code units, not UTF-8 2022-12-27 20:22:38 -08:00
Karoy Lorentey
f3a930592f [stdlib] Simplify breadcrumbs avoidance paths in String.UTF16View 2022-12-27 20:22:37 -08:00
Karoy Lorentey
483087a47d [stdlib] Speed up short UTF-16 distance calculations
Previously we insisted on using breadcrumbs even if we only needed to
travel a very short way. This could be as much as ten times slower
than the naive algorithm of simply visiting all the Unicode scalars
in between the start and the end.

(Using breadcrumbs generally means that we need to walk to both
endpoints from their nearest breadcrumb, which on average requires
walking half the distance between breadcrumbs — and this can mean
visiting vastly more Unicode scalars than the ones that are simply
lying in between the endpoints themselves.)
2022-12-27 20:22:34 -08:00
Guillaume Lessard
1b78a1f356 [stdlib] change String’s SIMD bits to load using loadUnaligned 2022-07-13 10:29:58 -06:00
Karoy Lorentey
50c2399a94 [stdlib] Work around binary compatibility issues with String index validation fixes in 5.7
Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out of bounds accesses now result in reliable runtime errors. Similarly, attempts at applying an index originally vended by a UTF-8 string on a UTF-16 string now result in a reliable runtime error.

As is usually the case, new traps to the stdlib exposes code that contains previously undiagnosed / unreliably diagnosed coding issues.

Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.

In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)

rdar://93379333
2022-05-17 19:25:10 -07:00
Karoy Lorentey
5f9828fb88 [stdlib] Don’t reject trailing surrogates in UTF16View overload of String.Index(_:within:)
Fix a long-standing issue where the UTF16View overload of
`String.Index.init(_:within:)` used to return nil for valid indices
that happened to point to a trailing surrogate in a UTF-8-encoded
string.

rdar://91935537
2022-04-18 21:03:44 -07:00
Karoy Lorentey
4d557b0b45 [stdlib] Make String.Index(_:within:) initializers more permissive
In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.

Since https://github.com/apple/swift/pull/41417, the
`String(_:within:)` initializers recognize miscoded indices and reject
them by returning nil. This is technically correct, but it
unfortunately may be a binary compatibility issue, as these used to
return non-nil in previous versions.

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use an UTF-8 index
on a UTF-16 string is still rejected — we do not implicitly convert
strings in that direction.)

rdar://89369680
2022-04-18 21:02:14 -07:00
Karoy Lorentey
58ab3fea34 Apply suggestions from code review
Co-authored-by: Alejandro Alonso <alejandro_alonso@apple.com>
2022-04-10 00:14:43 -07:00
Karoy Lorentey
4ad8b26ab3 [stdlib] String.UTF16View: Review/fix index validation
Also, in UTF-16 slices, forward collection methods to the base view
instead of `Slice`, to make behavior a bit easier to understand.

(There is no need to force readers to page in `Slice`
implementations _in addition to_ whatever the base view is doing.)
2022-03-29 20:00:08 -07:00
Karoy Lorentey
90fee621b6 [stdlib] String.UTF16View: Mark foreign indices as UTF-16 encoded 2022-03-24 21:00:00 -07:00
Karoy Lorentey
321284e9a9 [stdlib] Review & fix index validation during String index conversions
- Validate that the index has the same encoding as the string
- Validate that the index is within bounds
2022-03-24 21:00:00 -07:00
Karoy Lorentey
836bf9ad73 [stdlib] Mark index encodings in String.UTF8View & UTF16View 2022-03-24 21:00:00 -07:00
David Smith
c05e47dd60 Only use SIMD when stdlib vector types are available 2022-03-22 15:48:24 -07:00
David Smith
dbaada435c Stay in vectors longer before doing a horizontal sum 2022-03-18 15:27:40 -07:00
David Smith
eaf3f316ec Vectorize UTF16 offset calculations 2022-03-17 14:18:21 -07:00
Hassan
1d4f220ed4 [stdlib] Replace precondition with the internal _precondition 2021-11-04 23:51:10 +02:00
Kuba (Brecka) Mracek
404badb49a Introduce SWIFT_ENABLE_REFLECTION to turn on/off the support for Mirrors and reflection (#33617) 2021-09-08 13:08:13 -07:00
Doug Gregor
9579390024 [SE-0304] Rename ConcurrentValue to Sendable 2021-03-18 22:48:20 -07:00
Doug Gregor
1a1f79c0de Introduce safety checkin for ConcurrentValue conformance.
Introduce checking of ConcurrentValue conformances:
- For structs, check that each stored property conforms to ConcurrentValue
- For enums, check that each associated value conforms to ConcurrentValue
- For classes, check that each stored property is immutable and conforms
  to ConcurrentValue

Because all of the stored properties / associated values need to be
visible for this check to work, limit ConcurrentValue conformances to
be in the same source file as the type definition.

This checking can be disabled by conforming to a new marker protocol,
UnsafeConcurrentValue, that refines ConcurrentValue.
UnsafeConcurrentValue otherwise his no specific meaning. This allows
both "I know what I'm doing" for types that manage concurrent access
themselves as well as enabling retroactive conformance, both of which
are fundamentally unsafe but also quite necessary.

The bulk of this change ended up being to the standard library, because
all conformances of standard library types to the ConcurrentValue
protocol needed to be sunk down into the standard library so they
would benefit from the checking above. There were numerous little
mistakes in the initial pass through the stsandard library types that
have now been corrected.
2021-02-04 03:45:09 -08:00
Valeriy Van
f434cbab05 Fixes example snippets in StringUTF16View.swift 2020-04-27 23:54:04 +02:00
Michael Ilseman
d8f25be9fa [string] Skip unnecessary self UTF-16 length in isEqual
For isEqual bridging comparisons, skip checking our own UTF-16 length when the
string we're comparing against is known to be ASCII.
2020-03-05 16:13:15 -08:00
Michael Ilseman
7ff3ecf2e5 [string] Skip unnecessary self UTF-16 length in isEqual
For isEqual bridging comparisons, skip checking our own UTF-16 length when the
string we're comparing against is known to be ASCII.
2020-03-05 16:13:15 -08:00
Michael Ilseman
0ca42e9ef7 [string] Shrink storage class sizes.
* Don't allocate breadrumbs pointer if under threshold
* Increase breadrumbs threshold
* Linear 16-byte bucketing until 128 bytes, malloc_size after
* Allow cap less than _SmallString.capacity (bridging non-ASCII)

This change decreases the amount of heap usage for moderate-length
strings (< 64 UTF-8 code units in length) and increases the amount of
spare code unit capacity available (less growth needed).

Average improvements for moderate-length strings:

* 64-bit: on average, 8 bytes saved and 4 bytes of extra capacity
* 32-bit: on average, 4 bytes saved and 6 bytes of extra capacity

Additionally, on 32-bit, large-length strings also gain an average of
6 bytes of extra spare capacity.

Details:

On 64-bit, half of moderate-length allocations will save 16 bytes
while the other half get an extra 8 bytes of spare capacity.

On 32-bit, a quarter of moderate-length allocations will save 16
bytes, and the rest get an extra 4 bytes of spare
capacity. Additionally, 32-bit string's storage class now claims its
full allocation, which is its birthright. Prior to this change, we'd
have on average 1.5 bytes of spare capacity, and now we have 7.5 bytes
of spare capacity.

Breadcrumbs threshold is increased from the super-conservative 32 to
the pretty-conservative 64. Some speed improvements are incorporated
in this change, but more are in flight. Even without those eventual
improvements, this is a worthwhile change (ASCII is still fast-pathed
and irrelevant to breadcrumbing).

For a complex real-world workload, this amounts to around a 5%
improvement to transient heap usage due to all strings and a 4%
improvement to peak heap usage due to all strings. For moderate-length
strings specifically, this gives around 11% improvement to both.
2020-03-05 16:10:23 -08:00
Paul Hudson
06f82a53b5 Replaced the majority of ' : ' with ': '. 2019-07-18 20:46:07 +01:00
Michael Ilseman
63a6794cf9 [String] Switch scalar-aligned bit to a reserved bit.
Since scalar-alignment is set in inlinable code, switch the alignment
bit to one of the previously-reserved bits rather than a grapheme
cache bit. Setting a grapheme cache bit in inlinable would break
backward deployment, as older versions would interpret it as a cached
value.

Also adjust the name to "scalar-aligned", which is clearer, and
removed assertion (which should be a real precondition).
2019-07-02 16:25:04 -07:00
Michael Ilseman
bd5a40ff1b [gardening] Add underscore to internal member 2019-06-27 11:11:44 -07:00
Michael Ilseman
4cd1e812b7 [String] Scalar-alignment bug fixes.
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.

SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The below shows what views need to check for alignment before
they can operate, and whether the indices they produce are aligned.

┌───────────────╥────────────────────┬──────────────────────────┐
│ View          ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8   ║ no                 │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16  ║ yes                │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8  ║ yes                │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no                 │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes                │ yes                      │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character     ║ yes                │ yes                      │
└───────────────╨────────────────────┴──────────────────────────┘

The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:

* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
2019-06-26 16:42:58 -07:00
Ben Cohen
e9d4687e31 De-underscore @frozen, apply it to structs (#24185)
* De-underscore @frozen for enums

* Add @frozen for structs, deprecate @_fixed_layout for them

* Switch usage from _fixed_layout to frozen
2019-05-30 17:55:37 -07:00
Michael Ilseman
f7cdda2720 [gardening] Clean up many String computed vars 2019-04-08 15:16:48 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
Michael Ilseman
b01ee7267a [String] Custom iterator for UTF16View (#20929)
Defining a custom iterator for the UTF16View avoid some redundant
computation over the indexing model. This speeds up iteration by
around 40% on non-ASCII strings.
2018-12-01 09:35:27 -08:00
Michael Ilseman
c3c6fdc77f [String] ASCII fast-path for UTF16View (#20848)
Add an isASCII fast-path for many UTF16View operations. These are
heavily utilized in random-access scenarios, allowing us to both be
more efficient and skip generating breadcrumbs for ASCII strings.
2018-11-29 18:19:32 -08:00
David Smith
8b57921905 Assorted bridging changes:
• Convert _AbstractStringStorage to a protocol, and the free functions used to deduplicate implementations to extensions on that protocol.
• Move 'start' into the abstract type and use that to simplify some code
• Move the ASCII fast path for length into UTF16View.
• Add a weirder but faster way to check which (if any) of our NSString subclasses a given object is, and adopt it
2018-11-28 16:04:34 -08:00
Michael Ilseman
4111b21cb0 [String] Bug fix for empty-range getCharacters 2018-11-26 12:13:23 -08:00
Michael Ilseman
3a0ac0270d [stdlib] Unchecked subscript on UnsafeBufferPointer
Add a use an unchecked subscript on UnsafeBufferPointer, which skips
debugPrecondition checks (in case we're not inlined) as well as a
force-unwrap check.
2018-11-16 11:12:29 -08:00
Ben Cohen
1673c12d78 [stdlib] Replace "sanityCheck" with "internalInvariant" (#20616)
* Replace "sanityCheck" with "internalInvariant"
2018-11-15 20:50:22 -08:00
Michael Ilseman
63fe485758 [String] Audit and publish the rest of the ABI 2018-11-15 11:06:33 -08:00
Michael Ilseman
c749123297 [String] DCE and drop inlinable
Remove some more inlinable annotations and drop dead code.
2018-11-15 11:06:30 -08:00
Michael Ilseman
1939d165ae Make corelibs-foundation build
Expose the old SPIs again so corelibs-foundation can build. We'll want
to wean them off of these and establish proper APIs soon.
2018-11-04 10:42:44 -08:00
Michael Ilseman
c04dcf3b38 [String] More efficient breadcrumb-scanning code.
Rather than rely on the UTF16View, scan between breadcrumbs by hand
for a decent 20% speedup. This code will also make it more obvious how
to slot in a vectorized solution later.
2018-11-04 10:42:44 -08:00