Commit Graph

201 Commits

Author SHA1 Message Date
David Smith
31fa54d3b7 Fix off by one 2025-03-10 02:36:12 -07:00
David Smith
5c5b850f0a Vectorize UTF8 scalar counting 2025-03-10 02:15:22 -07:00
Karoy Lorentey
50c2399a94 [stdlib] Work around binary compatibility issues with String index validation fixes in 5.7
Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out of bounds accesses now result in reliable runtime errors. Similarly, attempts at applying an index originally vended by a UTF-8 string on a UTF-16 string now result in a reliable runtime error.

As is usually the case, new traps to the stdlib exposes code that contains previously undiagnosed / unreliably diagnosed coding issues.

Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.

In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)

rdar://93379333
2022-05-17 19:25:10 -07:00
Karoy Lorentey
4d557b0b45 [stdlib] Make String.Index(_:within:) initializers more permissive
In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.

Since https://github.com/apple/swift/pull/41417, the
`String(_:within:)` initializers recognize miscoded indices and reject
them by returning nil. This is technically correct, but it
unfortunately may be a binary compatibility issue, as these used to
return non-nil in previous versions.

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use an UTF-8 index
on a UTF-16 string is still rejected — we do not implicitly convert
strings in that direction.)

rdar://89369680
2022-04-18 21:02:14 -07:00
Karoy Lorentey
cb2194c024 [stdlib] Fix ABI and portability issues 2022-04-13 19:15:30 -07:00
Karoy Lorentey
b33fefb71c [stdlib] String: be more consistent about when markEncoding is called 2022-04-13 18:38:41 -07:00
Karoy Lorentey
bbb004854e [stdlib] Minor enhancements 2022-04-10 16:49:01 -07:00
Karoy Lorentey
67adcabefc Apply notes from code review 2022-04-10 00:14:43 -07:00
Karoy Lorentey
83df814c63 [stdlib] _StringObject.isKnownUTF16 → isForeignUTF8
This fixes a compatibility issue with potential future UTF-8 encoded
foreign String forms, as well as simplifying the code a bit — we no
longer need to do an availability check on inlinable fast paths.

The isForeignUTF8 bit is never set by any past or current stdlib
version, but it allows us to introduce UTF-8 encoded foreign forms
without breaking inlinable index encoding validation introduced in
Swift 5.7.
2022-04-09 21:33:53 -07:00
Karoy Lorentey
4247279889 [stdlib] String.UnicodeScalarView: Optimize replaceSubrange 2022-04-05 20:47:42 -07:00
Karoy Lorentey
8610bdf515 [stdlib] String.unicodeScalars: Add a _modify accessor
This will eliminate unnecessary CoW copies when calling mutating
Unicode scalar view methods directly through this property.
2022-03-29 18:40:25 -07:00
Karoy Lorentey
d58811262d [stdlib] String.UnicodeScalarView: Review index validation 2022-03-29 18:40:25 -07:00
Karoy Lorentey
321284e9a9 [stdlib] Review & fix index validation during String index conversions
- Validate that the index has the same encoding as the string
- Validate that the index is within bounds
2022-03-24 21:00:00 -07:00
Karoy Lorentey
836bf9ad73 [stdlib] Mark index encodings in String.UTF8View & UTF16View 2022-03-24 21:00:00 -07:00
Karoy Lorentey
a44997eeea [stdlib] Factor scalar-aligned String index validation out into a set of common routines
There are three flavors, corresponding to i < endIndex, i <= endIndex, and range containment checks.
Additionally, we have separate variants for index validation in substrings.
2022-03-24 21:00:00 -07:00
Karoy Lorentey
683b9fa021 [stdlib] Adjust/fix String’s indexing operations to deal with the consequences of SE-0180 2022-03-24 20:59:59 -07:00
Karoy Lorentey
6f9edc8be8 [stdlib] String.UnicodeScalarView: Fix some index validation issues
`String.UnicodeScalarView` currently lacks proper index validation in its `index(after:)` and `index(before:)` methods, leading to out-of-bounds memory accesses when index navigation methods in this view are given invalid indices.

(Also see https://github.com/apple/swift/pull/41598 and https://github.com/apple/swift/pull/41417)

rdar://89498541
2022-02-28 19:49:22 -08:00
Hassan
1d4f220ed4 [stdlib] Replace precondition with the internal _precondition 2021-11-04 23:51:10 +02:00
Kuba (Brecka) Mracek
404badb49a Introduce SWIFT_ENABLE_REFLECTION to turn on/off the support for Mirrors and reflection (#33617) 2021-09-08 13:08:13 -07:00
Doug Gregor
9579390024 [SE-0304] Rename ConcurrentValue to Sendable 2021-03-18 22:48:20 -07:00
Doug Gregor
1a1f79c0de Introduce safety checkin for ConcurrentValue conformance.
Introduce checking of ConcurrentValue conformances:
- For structs, check that each stored property conforms to ConcurrentValue
- For enums, check that each associated value conforms to ConcurrentValue
- For classes, check that each stored property is immutable and conforms
  to ConcurrentValue

Because all of the stored properties / associated values need to be
visible for this check to work, limit ConcurrentValue conformances to
be in the same source file as the type definition.

This checking can be disabled by conforming to a new marker protocol,
UnsafeConcurrentValue, that refines ConcurrentValue.
UnsafeConcurrentValue otherwise his no specific meaning. This allows
both "I know what I'm doing" for types that manage concurrent access
themselves as well as enabling retroactive conformance, both of which
are fundamentally unsafe but also quite necessary.

The bulk of this change ended up being to the standard library, because
all conformances of standard library types to the ConcurrentValue
protocol needed to be sunk down into the standard library so they
would benefit from the checking above. There were numerous little
mistakes in the initial pass through the stsandard library types that
have now been corrected.
2021-02-04 03:45:09 -08:00
Valeriy Van
b0425a6cd8 Fixes example snippet 2020-04-26 22:08:27 +02:00
Paul Hudson
06f82a53b5 Replaced the majority of ' : ' with ': '. 2019-07-18 20:46:07 +01:00
Michael Ilseman
63a6794cf9 [String] Switch scalar-aligned bit to a reserved bit.
Since scalar-alignment is set in inlinable code, switch the alignment
bit to one of the previously-reserved bits rather than a grapheme
cache bit. Setting a grapheme cache bit in inlinable would break
backward deployment, as older versions would interpret it as a cached
value.

Also adjust the name to "scalar-aligned", which is clearer, and
removed assertion (which should be a real precondition).
2019-07-02 16:25:04 -07:00
Michael Ilseman
bd5a40ff1b [gardening] Add underscore to internal member 2019-06-27 11:11:44 -07:00
Michael Ilseman
4cd1e812b7 [String] Scalar-alignment bug fixes.
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.

SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The below shows what views need to check for alignment before
they can operate, and whether the indices they produce are aligned.

┌───────────────╥────────────────────┬──────────────────────────┐
│ View          ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8   ║ no                 │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16  ║ yes                │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8  ║ yes                │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no                 │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes                │ yes                      │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character     ║ yes                │ yes                      │
└───────────────╨────────────────────┴──────────────────────────┘

The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:

* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
2019-06-26 16:42:58 -07:00
Ben Cohen
e9d4687e31 De-underscore @frozen, apply it to structs (#24185)
* De-underscore @frozen for enums

* Add @frozen for structs, deprecate @_fixed_layout for them

* Switch usage from _fixed_layout to frozen
2019-05-30 17:55:37 -07:00
Michael Ilseman
f7cdda2720 [gardening] Clean up many String computed vars 2019-04-08 15:16:48 -07:00
Michael Ilseman
4967fc08eb [Unicode] Add convenience APIs to Unicode encodings
Add convenience APIs to the stdlib's Unicode encodings:

* Unicode.UTF16
  * isASCII
  * isSurrogate
* Unicode.UTF8
  * isASCII
  * width
* Unicode.UTF32
  * isASCII
* Unicode.ASCII
  * isASCII

Tests added
2019-03-29 15:43:00 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
Lance Parker
15aaa1e777 [stdlib]String normalization functions (#21026)
* fast/foreignNormalize functions
2019-01-08 13:55:29 -08:00
Michael Ilseman
b01ee7267a [String] Custom iterator for UTF16View (#20929)
Defining a custom iterator for the UTF16View avoid some redundant
computation over the indexing model. This speeds up iteration by
around 40% on non-ASCII strings.
2018-12-01 09:35:27 -08:00
Ben Cohen
1673c12d78 [stdlib] Replace "sanityCheck" with "internalInvariant" (#20616)
* Replace "sanityCheck" with "internalInvariant"
2018-11-15 20:50:22 -08:00
Michael Ilseman
abe101c5b9 [String] Custom iterator for UnicodeScalarView
Provide a custom iterator rather than relying a the IndexingIterator,
as an indexing model is less efficient for stateful processing of
strings. Provides around a 30% speedup.
2018-11-08 18:00:39 -08:00
Michael Ilseman
948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman
7aea40680d [String] NFC iterator fast-paths
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.

Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
2018-11-04 10:42:41 -08:00
Michael Ilseman
8851bac1be [String] Inlining, NFC fast paths, and more.
Add inlinability annotations to restore performance parity with 4.2 String.

Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.

RRC improvements and optmizations.
2018-11-04 10:42:41 -08:00
Michael Ilseman
9d9f9005e3 [String] Define performance flags and plumb them throughout 2018-11-04 10:42:41 -08:00
Michael Ilseman
c51aa5988f [String] Cleanup normalization code.
Clean up some of the code surrounding the normalized code unit
iterator.
2018-11-04 10:42:41 -08:00
Lance Parker
f1a35bd1c9 String comparison iterator for UTF8 strings 2018-11-04 10:42:41 -08:00
Michael Ilseman
a0e639eaf5 [String] Grapheme breaking fast-paths
Add in our scalar-based fast-paths for UTF-8 and foreign strings, and
update the grapheme cache.
2018-11-04 10:42:40 -08:00
Michael Ilseman
fe7c3ce2e4 [String] Refactorings and cleanup
* Refactor out RRC implementation into dedicated file.

* Change our `_invariantCheck` pattern to generate efficient code in
  asserts builds and make the optimizer job's easier.

* Drop a few Bidi shims we no longer need.

* Restore View decls to String, workaround no longer needed

* Cleaner unicode helper facilities
2018-11-04 10:42:40 -08:00
Michael Ilseman
f23a3c19b8 [String] Bounds checking and Index cleanup 2018-11-04 10:42:40 -08:00
Michael Ilseman
89d18e1a3a [String] Refactor helper code into UnicodeHelpers.swift.
Clean up some of the index assumptions, stick index-aware methods on
_StringGuts, and otherwise migrate code over to UnicodeHelpers.swift.
2018-11-04 10:42:40 -08:00
Michael Ilseman
4ab45dfe20 [String] Drop in initial UTF-8 String prototype
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.

My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
2018-11-04 10:42:40 -08:00
Michael Ilseman
8294c0003a [string] Drop _StringGuts subscript; NFC
_StringGuts shouldn't expose a subscript, implying efficient
access. Switch to the explicit code unit fetch method. Update tests
accordingly, and switch off of deprecated typealiases.
2018-08-02 16:34:22 -07:00
Ben Cohen
a4230ab2ad [stdlib] Update stdlib to 4.0 and reorganize compatibility shims (#17580)
* Update stdlib to 4.0 and move all compatibility shims into a dedicated source file
2018-06-29 06:26:52 -07:00
Ben Cohen
92b6d8cb8f Remove inlineability from mirrors (#17476) 2018-06-25 19:54:13 -07:00
Michael Ilseman
3ee17102ed [String.Index] Restore compound offsets.
Move the shifts to index creation time rather than index comparison
time. This seems to benefit micro benchmarks and cover up
inefficiencies in our generic index distance calculations.
2018-05-25 09:54:35 -07:00
Michael Ilseman
4a368ab46c [string] Drop many @inlinable from big API.
Drop append-related @inlinable annotations for String, StringGuts,
StringStorage, and the Views. Drop several for larger operations, such
as case conversion. Drop as many as we can from StringGuts for now.
2018-05-13 07:38:55 -07:00