Commit Graph

15 Commits

Author SHA1 Message Date
Michael Ilseman
4cd1e812b7 [String] Scalar-alignment bug fixes.
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.

SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The below shows what views need to check for alignment before
they can operate, and whether the indices they produce are aligned.

┌───────────────╥────────────────────┬──────────────────────────┐
│ View          ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8   ║ no                 │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16  ║ yes                │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8  ║ yes                │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no                 │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes                │ yes                      │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character     ║ yes                │ yes                      │
└───────────────╨────────────────────┴──────────────────────────┘

The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:

* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
2019-06-26 16:42:58 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
Michael Ilseman
e6582c37ee [test] Adjust String tests for UTF-8 representation.
Adjust tests for the UTF-8 representation, in preparation for 32-bit
support. Includes UTF-8 literal update.
2018-11-04 10:42:41 -08:00
Arnold Schwaighofer
2d8a1dbbfe Codesign test/stdlib 2018-08-10 06:58:40 -07:00
Ben Cohen
a70e857d59 [stdlib] Fix issue with UTF16 index(_:offsetBy:limitedBy) (#12378)
* Fix issue with empty string ranges

* Add tests for basic offsetBy operation on UTF16View.Index
2017-10-16 13:44:51 -07:00
Lance Parker
acf94f94ec Add -Onone to all the _Debug tests 2017-09-18 12:56:52 -07:00
Dave Abrahams
9159239995 Un-revert "[stdlib] String index interchange, etc." (#10812)
I failed to merge the upstream changes to swift-corelibs-foundation at the same
time as I merged that #9806, and it broke on linux. Going to get it right this
time.
2017-07-07 12:13:25 -07:00
Xi Ge
d9fb110674 Revert "[stdlib] String index interchange, etc." (#10812)
rdar://33186295
2017-07-07 12:03:16 -07:00
Dave Abrahams
2e0bb2f533 [stdlib] String index interchange, part II (UTF16) 2017-07-07 06:15:23 -07:00
Dmitri Gribenko
984210aa53 tests: replace '// RUN: rm -rf' '// RUN: mkdir' pairs with '%empty-directory(...)'
These changes were made using a script.
2017-06-04 11:08:39 -07:00
Max Moiseev
9fc37efee4 [test] renaming test/1_stdlib to just test/stdlib 2016-09-01 16:51:43 -07:00
Jordan Rose
e83c117c30 [test] Hack: run stdlib tests first to start long-running tests earlier.
This decreases total testing time by over a minute on my old Mac Pro.
It probably has much less effect on systems with fewer cores, but shouldn't
be any worse there.

Swift SVN r22745
2014-10-15 01:30:51 +00:00
Dmitri Hrybenko
5c71e5075e stdlib/String: add tests for trapping on out-of-range String.UTF16View indexes
rdar://17539993


Swift SVN r20246
2014-07-21 08:44:40 +00:00
Dmitri Hrybenko
e6d2194b22 stdlib/String: trap when attempting to increment or index using
String.UTF8View.endIndex

rdar://17539993

Swift SVN r20244
2014-07-21 08:31:30 +00:00
Dmitri Hrybenko
4814e00fda stdlib/String: implement Unicode extended grapheme cluster segmentation
algorithm

The implementation uses a specialized trie that has not been tuned to the table
data.  I tried guessing parameter values that should work well, but did not do
any performance measurements.

There is no efficient way to initialize arrays with static data in Swift.  The
required tables are being generated as C++ code in the runtime library.

rdar://16013860


Swift SVN r19340
2014-06-30 14:38:53 +00:00