Commit Graph

13 Commits

Author SHA1 Message Date
Alejandro Alonso
98aaa157ec Implement native normalization for String
use >/< instead of !=

fix some bugs

fix
2021-09-29 14:20:21 -07:00
Karoy Lorentey
d83a4257f0 [stdlib] Don’t use assert() in the stdlib
assert() is designed to be used in user code only; the equivalent stdlib function is called _internalInvariant().

rdar://57101013
2019-12-11 19:23:46 -08:00
Michael Ilseman
4967fc08eb [Unicode] Add convenience APIs to Unicode encodings
Add convenience APIs to the stdlib's Unicode encodings:

* Unicode.UTF16
  * isASCII
  * isSurrogate
* Unicode.UTF8
  * isASCII
  * width
* Unicode.UTF32
  * isASCII
* Unicode.ASCII
  * isASCII

Tests added
2019-03-29 15:43:00 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
Lance Parker
15aaa1e777 [stdlib]String normalization functions (#21026)
* fast/foreignNormalize functions
2019-01-08 13:55:29 -08:00
Michael Ilseman
1706d4c02d [String] Refactor and fast-path normalization
Refactor some normalization queries into StringNormalization.swift,
and add more latiny (<0x300) fast-paths.
2018-12-03 13:22:57 -08:00
Michael Ilseman
948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman
4ab45dfe20 [String] Drop in initial UTF-8 String prototype
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.

My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
2018-11-04 10:42:40 -08:00
Tony Allevato
54f4c77ce7 [stdlib] Revert hasNormalizationBoundaryBefore
This property is too specific in that it forces a particular normalization; let's not expose it this way, but instead in the future with a full normalization API.
2018-04-22 12:01:03 -07:00
Tony Allevato
5a50f27ae9 [stdlib] Migrate normalization usage to public properties 2018-03-28 06:55:53 -07:00
Lance Parker
0661de22a2 [stdlib]Un-revert string comparison (#14694)
Restore (un-revert) sting comparison, with fixes

More exhaustive testing of opaque strings, which consistently reproduces prior sporadic failure. Shims fixups. Some test tweaking.
2018-02-18 10:50:33 -08:00
Lance Parker
abe6a6d177 Revert string comparison (#14657) 2018-02-15 14:37:43 -08:00
Lance Parker
897963a6f8 Unified String comparison strategy for all platforms 2018-02-14 15:44:11 -08:00