
41 Commits

Author SHA1 Message Date
David Smith def9ee7464 Introduce a "single breadcrumb mode" for Strings decoded from UTF16. (#83987)
This allows us to quickly answer `.utf16.count` without requiring
additional allocations.

Fixes rdar://160656317
2026-03-26 18:09:18 -07:00
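The commit above concerns `.utf16.count` on strings built from UTF-16 input. The minimal sketch below only shows the observable behavior: it decodes UTF-16 code units and compares the lengths of the views. Whether a given stdlib takes an allocation-free path for this is an internal detail that depends on its version. The result is the same either way.

```swift
// Decode UTF-16 code units ("Hi" followed by U+1F44B as a surrogate pair).
let units: [UInt16] = [0x0048, 0x0069, 0xD83D, 0xDC4B]
let decoded = String(decoding: units, as: UTF16.self)

print(decoded.utf16.count) // 4: one code unit per ASCII letter, two for the emoji
print(decoded.utf8.count)  // 6: 1 + 1 + 4 bytes
print(decoded.count)       // 3 Characters
```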
Guillaume Lessard 4b96f38864 Adjust test to be more resilient regarding inlining. 2025-10-29 17:38:15 -07:00
Michael Ilseman 2815cf9322 Adjust test to be more resilient w.r.t. inlining. 2025-09-26 10:48:11 -06:00
Karoy Lorentey 29cf262ef7 [stdlib] String.Index: conform to CustomDebugStringConvertible instead
Apply the LSG’s modifications as detailed in their review notes.
2024-10-07 17:00:13 -07:00
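With the `CustomDebugStringConvertible` conformance, a `String.Index` is rendered through its debug description, for example by `String(reflecting:)` or `debugPrint`. This is a small sketch. The exact text is an implementation detail that differs between stdlib versions, so no particular output is assumed here.

```swift
let s = "café"
let i = s.index(after: s.startIndex)

// Both calls go through the debug description; the format is not guaranteed.
print(String(reflecting: i))
debugPrint(i)
```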
Karoy Lorentey 01792372a9 [stdlib] String.Index: conform to CustomStringConvertible
This better exposes the internals of string indices, demystifying their operation and radically simplifying working with them.
2024-10-07 16:02:16 -07:00
Erik Eckstein 7f54c63b29 tests: Disable some tests which fail due to problems in Foundation
Those tests should be part of the Foundation overlay, which is no longer part of the Swift project.

rdar://112643333
2023-07-24 08:34:06 +02:00
Karoy Lorentey 67f049ed10 [test] Reenable stdlib/StringIndex, requiring optimized_stdlib 2023-01-06 13:13:03 -08:00
Karoy Lorentey 2f0eab7fbf [test] Disable test/stdlib/StringIndex.swift to unblock CI 2023-01-05 17:45:26 -08:00
Karoy Lorentey c94556165e Merge pull request #62794 from lorentey/character-recognizer
[stdlib] Export grapheme breaking facility
2023-01-04 23:54:48 -08:00
Karoy Lorentey d358ece41d Merge pull request #62798 from lorentey/string-index-rounding
[stdlib] Expose index rounding entry points
2023-01-04 21:24:32 -08:00
Karoy Lorentey 4ffc5fe737 Merge pull request #62717 from lorentey/string-utf16-speedup
[stdlib] Speed up short UTF-16 distance calculations
2023-01-04 21:20:41 -08:00
Karoy Lorentey cd550160a1 [test] Cleanup 2023-01-03 16:08:14 -08:00
Karoy Lorentey 051f9ede46 [test] String.UTF16View: Add some basic collection tests
Evidently we did not have any tests that exercised
`distance(from:to:)` and `index(_:offsetBy:)`. :-O
2023-01-01 20:58:24 -08:00
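Here is a minimal illustration of the two `String.UTF16View` operations that the commit says were previously untested. A non-BMP scalar occupies two UTF-16 code units, so distances in this view differ from Character counts.

```swift
let s = "a👋b"
let utf16 = s.utf16

// "a" = 1 code unit, the emoji = 2 (surrogate pair), "b" = 1.
print(utf16.distance(from: utf16.startIndex, to: utf16.endIndex)) // 4

// Step over "a" and both surrogates to land on "b" (0x62).
let bIndex = utf16.index(utf16.startIndex, offsetBy: 3)
print(utf16[bIndex] == 0x62) // true

// Distances can be negative when measuring backwards.
print(utf16.distance(from: bIndex, to: utf16.startIndex)) // -3
```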
Karoy Lorentey 9dd7475d88 [test] Fix and reenable a Substring.removeSubrange test 2022-12-31 17:58:26 -08:00
Karoy Lorentey f8b997b068 [test] Add tests for string index rounding 2022-12-31 17:42:35 -08:00
Karoy Lorentey 4d9edad297 [test] Improve grapheme breaking tests
Instead of just checking the number of breaks in each test case,
expose and check the actual positions of those breaks, too.
2022-12-29 17:56:45 -08:00
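This commit checks where grapheme breaks fall, not only how many there are. The sketch below shows one way to express break positions using only public API: each Character's starting index is converted to a UTF-8 offset. The helper name `graphemeBreakOffsets` is hypothetical and is not the test suite's actual helper.

```swift
/// Hypothetical helper: UTF-8 offsets at which each Character of `s` begins.
func graphemeBreakOffsets(_ s: String) -> [Int] {
    s.indices.map { s.utf8.distance(from: s.utf8.startIndex, to: $0) }
}

// "e" + U+0301 (2 bytes) form one Character, so "x" starts at byte 3.
print(graphemeBreakOffsets("e\u{301}x")) // [0, 3]
// CR-LF is a single Character spanning 2 bytes.
print(graphemeBreakOffsets("\r\nab"))    // [0, 2, 3]
```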
Alejandro Alonso bff02ddfbd Disable a test in StringIndex
update

add code
2022-09-08 09:18:18 -07:00
Alejandro Alonso 284f8d4fdd Fix Substring.removeSubrange for entire substring (#60744)
fix start and end

fix test
2022-08-25 16:08:33 -07:00
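The case fixed here removes a range that spans an entire substring. The minimal sketch below shows the expected behavior: the substring becomes empty, and because of value semantics the base string is left untouched.

```swift
let base = "hello world"
var sub = base.dropFirst(6) // Substring "world"

// Remove the whole substring, from its startIndex to its endIndex.
sub.removeSubrange(sub.startIndex..<sub.endIndex)

print(sub.isEmpty) // true
print(base)        // hello world
```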
Karoy Lorentey 50c2399a94 [stdlib] Work around binary compatibility issues with String index validation fixes in 5.7
Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out-of-bounds accesses now result in reliable runtime errors. Similarly, attempts to apply an index originally vended by a UTF-8 string to a UTF-16 string now result in a reliable runtime error.

As is usually the case, adding new traps to the stdlib exposes code that contains previously undiagnosed or unreliably diagnosed coding issues.

Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.

In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)

rdar://93379333
2022-05-17 19:25:10 -07:00
Josh Soref 624a54b9cf Spelling stdlib (#42544)
* spelling: abcdefghijklmnopqrstuvwxyz
* spelling: clazz
* spelling: collection
* spelling: compressible
* spelling: constituent
* spelling: contiguous
* spelling: convertibility
* spelling: element
* spelling: enforce
* spelling: exhaustive
* spelling: exhausts
* spelling: existential
* spelling: facilitate
* spelling: ignored
* spelling: incorporated
* spelling: intersection
* spelling: laziness
* spelling: misaligned
* spelling: overhaul
* spelling: preamble
* spelling: precondition
* spelling: replacement
* spelling: trailing
* spelling: unambiguous
* spelling: uncompressible
* spelling: world

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>

Co-authored-by: Josh Soref <jsoref@users.noreply.github.com>
2022-04-22 19:18:38 -07:00
Karoy Lorentey e21b846828 [test] Add new tests for String.Index(_:within:) 2022-04-18 20:57:54 -07:00
Karoy Lorentey 318277c3aa [test] stdlib/StringIndex: Spin off O(n^4) substring replacement test into a standalone long test
Also trim down its input a bit so that this doesn’t take 20 minutes.
2022-04-15 21:29:32 -07:00
Karoy Lorentey 66a8ae07dc [test] Move string test helper methods to StdlibUnittest
This fixes a Windows regression triggered by
https://github.com/apple/swift/pull/41417.
2022-04-14 21:36:45 -07:00
Karoy Lorentey cb2194c024 [stdlib] Fix ABI and portability issues 2022-04-13 19:15:30 -07:00
Karoy Lorentey 71216009e3 [test] Move useful helpers into StdlibUnicodeUnittest 2022-04-06 20:11:05 -07:00
Karoy Lorentey 42c823847e [test] stdlib/StringIndex: Simplify 2022-04-05 20:47:42 -07:00
Karoy Lorentey 1c9c5ccbf6 [test] test/StringIndex: Add some tests exercising replaceSubrange
The exhaustive substring.replaceSubrange test probably takes too long
to include in regular testing, but let’s enable it for now: it has
caught a bunch of problems already and it will probably catch more
before this lands.
2022-03-29 20:00:08 -07:00
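The exhaustive `replaceSubrange` test mentioned above is not reproduced here. The sketch below shows the general idea on a much smaller scale: every Character-aligned range of a short input (O(n^2) ranges) is replaced through `Substring`, and each result is compared against an `Array<Character>` model. The function `checkAllReplacements` and the inputs are hypothetical.

```swift
/// Hypothetical checker: Substring.replaceSubrange must agree with an
/// Array<Character> model for every Character-aligned range of `input`.
func checkAllReplacements(_ input: String, with replacement: String) -> Bool {
    let chars = Array(input)
    for lower in 0...chars.count {
        for upper in lower...chars.count {
            // Replace through a Substring covering the whole input.
            var sub = input[...]
            let lo = sub.index(sub.startIndex, offsetBy: lower)
            let hi = sub.index(sub.startIndex, offsetBy: upper)
            sub.replaceSubrange(lo..<hi, with: replacement)

            // Perform the same edit on the Character array model.
            var model = chars
            model.replaceSubrange(lower..<upper, with: Array(replacement))

            if String(sub) != String(model) { return false }
        }
    }
    return true
}

print(checkAllReplacements("ab\r\nc\u{E9}", with: "x👋")) // true
```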
Karoy Lorentey 06090ce7f2 [test] Add more String coverage 2022-03-29 20:00:08 -07:00
Karoy Lorentey 99f693e4ba [test] stdlib/StringIndex: Review & extend with more cases 2022-03-24 21:00:00 -07:00
Karoy Lorentey 683b9fa021 [stdlib] Adjust/fix String’s indexing operations to deal with the consequences of SE-0180 2022-03-24 20:59:59 -07:00
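Under SE-0180, String and all of its views share a single `String.Index` type. As a result, a position produced by one view can be handed to another view, even if it falls inside a Unicode scalar. The sketch below uses the public `String.Index(_:within:)` conversion to show that a UTF-16 index pointing into a surrogate pair has no exact counterpart in the Unicode scalar view.

```swift
let s = "👋!"
let utf16 = s.utf16

// Offset 1 lies between the two surrogates of U+1F44B.
let inside = utf16.index(utf16.startIndex, offsetBy: 1)
// Offset 2 is the start of "!".
let bang = utf16.index(utf16.startIndex, offsetBy: 2)

print(String.Index(inside, within: s.unicodeScalars) == nil) // true: not scalar-aligned
print(String.Index(bang, within: s.unicodeScalars) != nil)   // true
print(s[bang])                                                // !
```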
Kuba (Brecka) Mracek 15400c3ef7 Set build-swift-stdlib-unicode-data=0 for the freestanding preset (#41258) 2022-02-08 12:31:49 -08:00
Karoy Lorentey e2cfab4f28 [stdlib][test] Adopt availability macros in tests 2021-10-31 15:00:58 -07:00
Michael Ilseman 774788ac18 [test] Disable misaligned indices test prior to 5.1
Misaligned indices were fixed in 5.1, but we should disable the test
when testing back deployment.

Adds a shared helper to StdlibUnittest for the runtime check.
2019-08-27 15:13:38 -07:00
Stephen Canon dc5915cdb5 Replace stdlib and test/stdlib 9999 availability. (#26108)
* Replace stdlib and test/stdlib 9999 availability.

macOS 9999 -> macOS 10.15
iOS 9999 -> iOS 13
tvOS 9999 -> tvOS 13
watchOS 9999 -> watchOS 6

* Restore the pre-10.15 version of public init?(_: NSRange, in: __shared String)

We need this to allow master to work on 10.14 systems (in particular, to allow PR testing to work correctly without disabling back-deployment tests).
2019-07-12 16:30:36 -04:00
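The mapping above replaces the 9999 placeholder versions with concrete OS releases. The sketch below shows how those versions appear in an availability annotation and in a runtime `#available` check. The function names are hypothetical. On platforms not listed (for example, Linux), the trailing `*` makes the check succeed.

```swift
// Hypothetical API gated on the OS versions from the mapping above.
@available(macOS 10.15, iOS 13, tvOS 13, watchOS 6, *)
func newerAPI() -> String { "newer API" }

func describe() -> String {
    if #available(macOS 10.15, iOS 13, tvOS 13, watchOS 6, *) {
        return newerAPI()
    } else {
        return "fallback for older OS releases"
    }
}

print(describe())
```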
Michael Ilseman 4cd1e812b7 [String] Scalar-alignment bug fixes.
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.

SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The table below shows which views need to check for alignment
before they can operate, and whether the indices they produce are aligned.

┌───────────────╥────────────────────┬──────────────────────────┐
│ View          ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8   ║ no                 │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16  ║ yes                │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8  ║ yes                │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no                 │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes                │ yes                      │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character     ║ yes                │ yes                      │
└───────────────╨────────────────────┴──────────────────────────┘

The "Requires Alignment" column applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:

* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
2019-06-26 16:42:58 -07:00
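The commit describes caching a "known to be aligned" bit so that alignment checks can skip loading the underlying storage. The toy model below illustrates that idea over UTF-8 bytes. It is a sketch under stated assumptions: `ToyIndex` and `scalarAlign(_:in:)` are hypothetical and do not reflect the stdlib's real index layout or code. When the flag is clear, the function rounds down by stepping back over UTF-8 continuation bytes.

```swift
/// Hypothetical toy index: a UTF-8 byte offset plus a cached
/// "known to be scalar-aligned" flag. Not the stdlib's real layout.
struct ToyIndex: Equatable {
    var offset: Int
    var isKnownScalarAligned: Bool
}

/// Rounds `i` down to the start of the Unicode scalar containing it.
/// If the flag is already set, no bytes are read at all.
func scalarAlign(_ i: ToyIndex, in bytes: [UInt8]) -> ToyIndex {
    if i.isKnownScalarAligned { return i }
    var offset = i.offset
    // UTF-8 continuation bytes have the form 0b10xxxxxx; step back over them.
    while offset > 0, offset < bytes.count,
          bytes[offset] & 0b1100_0000 == 0b1000_0000 {
        offset -= 1
    }
    return ToyIndex(offset: offset, isKnownScalarAligned: true)
}

let bytes = Array("a\u{E9}".utf8) // [0x61, 0xC3, 0xA9]
let raw = ToyIndex(offset: 2, isKnownScalarAligned: false)
let aligned = scalarAlign(raw, in: bytes)
print(aligned.offset) // 1
```

In this model, the flag makes repeated alignment checks free once an index has been aligned. This matches the motivation described in the commit, although the real implementation may differ.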
Michael Ilseman c36daeb106 [tests] Adjust tests for Linux 2019-04-01 09:29:20 -07:00
Michael Ilseman 3923fb2268 [String] String.Index.init(_:within:) bounds checks
Bounds-check the given index in String.Index's generic initializer,
which makes sure that a passed index is valid for the given
StringProtocol value.
2019-03-29 15:43:00 -07:00
Michael Ilseman b19c2cf9c3 [String] Add generic String.Index and range inits within a String
Adds a generic version of String.Index.init?(_:within:) and
Range<String.Index>.init?(_:in:).

Tests added
2019-03-29 15:43:00 -07:00
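`String.Index.init?(_:within:)` returns `nil` when a position has no exact counterpart in the target. This covers indices that fall between the scalars of a single Character and, after the bounds-check fix, indices past the end of the target. The sketch below uses the concrete `String` and `String.UnicodeScalarView` overloads. Apart from the two strings, nothing in it is specific to the tests in this log.

```swift
let s = "e\u{301}x" // 2 Characters, 3 scalars, 4 UTF-8 bytes
let utf8 = s.utf8
let accent = utf8.index(utf8.startIndex, offsetBy: 1) // start of U+0301
let x = utf8.index(utf8.startIndex, offsetBy: 3)

print(String.Index(accent, within: s) == nil)                // true: mid-Character
print(String.Index(accent, within: s.unicodeScalars) != nil) // true: scalar-aligned
print(String.Index(x, within: s).map { s[$0] } == "x")       // true

// An index taken from a longer string, beyond the end of `s`.
let longer = "e\u{301}xyz"
let pastEnd = longer.utf8.index(longer.utf8.startIndex, offsetBy: 5)
print(String.Index(pastEnd, within: s) == nil)               // true
```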
Michael Ilseman 415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exist for serialization purposes. They were documented as
UTF-16 in the SE proposal introducing them, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happened to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serializing
in UTF-16 isn't necessarily the most efficient choice.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit is a
Swift Character, which isn't valid even if the String is known to be
all-ASCII (because of CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
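The explicitly UTF-16 alternatives to `encodedOffset` in current Swift are `String.Index.utf16Offset(in:)` and `String.Index(utf16Offset:in:)`. The sketch below shows that they round-trip, that the UTF-16 offset differs from the UTF-8 storage offset for non-ASCII text, and why counting UTF-16 code units as Characters fails even for ASCII.

```swift
let s = "a👋b"
let b = s.firstIndex(of: "b")!

print(b.utf16Offset(in: s))                       // 3: "a" + surrogate pair
print(s.utf8.distance(from: s.startIndex, to: b)) // 5: "a" + 4 UTF-8 bytes
print(String.Index(utf16Offset: 3, in: s) == b)   // true

// CR-LF is two UTF-16 code units but a single Character.
let crlf = "\r\n"
print(crlf.utf16.count, crlf.count)               // 2 1
```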
Michael Ilseman 614016fecd [String.Index] Simplify and prepare for more resilience.
Simplify String.Index by sinking transcoded offsets into the .utf8
variant. This is in preparation for a more resilient index type
capable of supporting existential string indices.
2018-05-24 14:47:04 -07:00
Michael Ilseman 7d64d49917 [tests] Add some quick new String index testing 2018-05-14 07:01:44 -07:00