
19 Commits

Author SHA1 Message Date
Clinton Nkwocha 2385488c84 Generalize String functions for typed throws (#87495)
Generalizes `String`:
- `init(unsafeUninitializedCapacity:initializingUTF8With:)`
- `withCString(_:)`
- `withCString(encodedAs:_:)`
- `withUTF8(_:)`
- `_withNFCCodeUnits(_:)`

and `Substring`:
- `withCString(_:)`
- `withCString(encodedAs:_:)`
- `withUTF8(_:)`

for typed throws.
2026-03-10 11:27:06 +00:00
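A minimal sketch of what generalizing these APIs for typed throws (SE-0413, Swift 6) gives callers: the closure's concrete error type flows through unchanged instead of being widened to `any Error`. `withUTF8Bytes(of:_:)` and `ParseError` are hypothetical names introduced only for this illustration. Unlike the real `withUTF8(_:)`, the helper copies the bytes into an array rather than lending an `UnsafeBufferPointer`, so the sketch does not depend on which toolchain contains this commit.

```swift
// Hypothetical error type, used only for this illustration.
enum ParseError: Error {
    case empty
    case notASCII
}

// Hypothetical helper with the same shape as a typed-throws `withUTF8`:
// whatever error type `E` the closure throws is rethrown unchanged.
// (It copies the UTF-8 bytes into an array; the stdlib API does not.)
func withUTF8Bytes<R, E: Error>(
    of string: String,
    _ body: ([UInt8]) throws(E) -> R
) throws(E) -> R {
    try body(Array(string.utf8))
}

// The closure is annotated `throws(ParseError)`, so `E` is inferred as
// `ParseError` and this function can declare that same concrete error type.
func firstASCIIByte(of string: String) throws(ParseError) -> UInt8 {
    try withUTF8Bytes(of: string) { (bytes: [UInt8]) throws(ParseError) -> UInt8 in
        guard let first = bytes.first else { throw ParseError.empty }
        guard first < 0x80 else { throw ParseError.notASCII }
        return first
    }
}

// Converts the outcome to a string for display.
func describe(_ string: String) -> String {
    do {
        let byte = try firstASCIIByte(of: string)
        return "ok \(byte)"
    } catch {
        return "error \(error)"
    }
}
```

Because the helper is generic over `E`, a non-throwing closure such as `{ $0.count }` infers `E == Never`, and the call needs no `try`; this is how typed throws can stand in for `rethrows`.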
Michael Ilseman e6e4bd6056 UTF8Span (#78531)
Add support for UTF8Span

Also, refactor validation and grapheme breaking
2025-04-11 16:11:11 -06:00
Doug Gregor 22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Karl d66e06a971 Remove unnecessary import 2023-10-15 20:03:52 +02:00
Karl e9f11d70a6 [NFC][In Both Senses] Use _NormData type instead of performing a lookup directly 2023-10-15 19:36:10 +02:00
Karoy Lorentey a3435704f0 [stdlib][NFC] String normalization: fix terminology (index ⟹ offset) 2022-03-24 21:00:00 -07:00
Alejandro Alonso 98aaa157ec Implement native normalization for String
use >/< instead of !=

fix some bugs

fix
2021-09-29 14:20:21 -07:00
Karoy Lorentey d83a4257f0 [stdlib] Don’t use assert() in the stdlib
assert() is designed to be used in user code only; the equivalent stdlib function is called _internalInvariant().

rdar://57101013
2019-12-11 19:23:46 -08:00
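For code outside the stdlib, the same pattern can be sketched as follows. `internalInvariant(_:_:file:line:)` is a hypothetical stand-in for the stdlib's internal `_internalInvariant()`; gating it on an `INTERNAL_CHECKS_ENABLED` compilation condition is an assumption about how such a check is switched on, and the real function's signature may differ. The point of the sketch is that the check is enabled by how the library itself is built, independently of the optimization setting (`-Onone` vs. `-O`) that governs `assert()`.

```swift
// Hypothetical stand-in for the stdlib's internal `_internalInvariant()`.
// The check exists only when the build defines the custom condition
// INTERNAL_CHECKS_ENABLED (e.g. `swiftc -D INTERNAL_CHECKS_ENABLED`);
// otherwise the body is empty and `condition` is never evaluated.
@inline(__always)
func internalInvariant(
    _ condition: @autoclosure () -> Bool,
    _ message: @autoclosure () -> String = String(),
    file: StaticString = #file,
    line: UInt = #line
) {
    #if INTERNAL_CHECKS_ENABLED
    if !condition() {
        fatalError(message(), file: file, line: line)
    }
    #endif
}

// Reports whether the invariant's condition was actually evaluated.
func conditionWasEvaluated() -> Bool {
    var evaluated = false
    internalInvariant({ evaluated = true; return true }(), "never fails")
    return evaluated
}
```

Building with `-D INTERNAL_CHECKS_ENABLED` turns the check on; without the flag the condition is skipped entirely, which `conditionWasEvaluated()` makes observable.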
Michael Ilseman 4967fc08eb [Unicode] Add convenience APIs to Unicode encodings
Add convenience APIs to the stdlib's Unicode encodings:

* Unicode.UTF16
  * isASCII
  * isSurrogate
* Unicode.UTF8
  * isASCII
  * width
* Unicode.UTF32
  * isASCII
* Unicode.ASCII
  * isASCII

Tests added
2019-03-29 15:43:00 -07:00
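The queries listed above reduce to simple range checks on code-unit or scalar values. The free functions below are hypothetical equivalents written for illustration. The stdlib versions are static members on the encoding types and may differ in naming and detail. `utf8Width(_:)` can be cross-checked against `String(Character(scalar)).utf8.count`.

```swift
// Hypothetical free functions mirroring the convenience queries above.

// ASCII occupies 0x00...0x7F in every Unicode encoding form.
func isASCII(utf8 unit: UInt8) -> Bool { unit < 0x80 }
func isASCII(utf16 unit: UInt16) -> Bool { unit < 0x80 }

// UTF-16 surrogates occupy 0xD800...0xDFFF
// (leading: D800-DBFF, trailing: DC00-DFFF).
func isSurrogate(_ unit: UInt16) -> Bool {
    unit >= 0xD800 && unit <= 0xDFFF
}

// Number of UTF-8 code units needed to encode a scalar.
func utf8Width(_ scalar: Unicode.Scalar) -> Int {
    switch scalar.value {
    case 0..<0x80: return 1
    case 0x80..<0x800: return 2
    case 0x800..<0x1_0000: return 3
    default: return 4
    }
}
```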
Michael Ilseman 415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happened to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, most uses of encodedOffset in the wild are buggy: they
assume that a UTF-16 code unit is a Swift Character, which doesn't hold
even when the String is known to be all-ASCII (CR-LF, "\r\n", is a
single Character made of two code units).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current misuses of encodedOffset.
2019-02-13 18:42:40 -08:00
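The CR-LF pitfall and the explicit UTF-16 alternatives can be shown concretely. In released Swift (5.0 and later) the alternatives are `String.Index(utf16Offset:in:)` and `String.Index.utf16Offset(in:)`; the sketch assumes those are the pair this commit introduced. The sample string is arbitrary.

```swift
// "\r\n" is a single Character (one grapheme cluster) but two UTF-16 code units.
let text = "a\r\nb"

// Offsets measured in Characters and in UTF-16 code units disagree.
let characterCount = text.count      // 3: "a", "\r\n", "b"
let utf16Count = text.utf16.count    // 4

// Find "b" by Character offset, then ask for its UTF-16 offset explicitly.
let bIndex = text.index(text.startIndex, offsetBy: 2)
let bUTF16Offset = bIndex.utf16Offset(in: text)   // 3, not 2

// Round-trip: rebuild the index from the serialized UTF-16 offset.
let restored = String.Index(utf16Offset: bUTF16Offset, in: text)
```

Code that treated the UTF-16 offset as a Character count would have looked for "b" at offset 2 and landed inside the CR-LF pair.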
Lance Parker 15aaa1e777 [stdlib]String normalization functions (#21026)
* fast/foreignNormalize functions
2019-01-08 13:55:29 -08:00
Michael Ilseman 1706d4c02d [String] Refactor and fast-path normalization
Refactor some normalization queries into StringNormalization.swift,
and add more fast paths for Latin-like scalars (below U+0300).
2018-12-03 13:22:57 -08:00
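The idea behind the below-U+0300 fast path can be sketched as a whole-string check. `isTriviallyNFC(_:)` is a hypothetical helper, not the stdlib's code, which applies such checks scalar by scalar inside its normalization routines. The sketch relies on every scalar below U+0300 having canonical combining class 0 and an NFC quick-check value of Yes, so a string made only of such scalars is already in NFC.

```swift
// Hypothetical fast-path check. U+0300 is the first code point of the
// Combining Diacritical Marks block; every scalar below it has canonical
// combining class 0 and NFC_Quick_Check=Yes, so a string consisting only
// of such scalars is already NFC and can skip the full normalization path.
func isTriviallyNFC(_ string: String) -> Bool {
    string.unicodeScalars.allSatisfy { $0.value < 0x300 }
}
```

A `false` result only means the fast path doesn't apply (for example, `"\u{AC00}"` is already NFC but lies above U+0300), not that the string needs normalizing.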
Michael Ilseman 948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman 4ab45dfe20 [String] Drop in initial UTF-8 String prototype
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.

My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
2018-11-04 10:42:40 -08:00
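With String natively encoded as UTF-8, the UTF-8 view corresponds directly to storage and the UTF-16 view is computed from it. The sketch uses `isContiguousUTF8` and `withUTF8(_:)`, which were added later (Swift 5.1, SE-0247) and are not part of this commit; the example string is arbitrary.

```swift
// Native Swift strings (including literals) store their contents as UTF-8.
var greeting = "h\u{E9}llo"   // "héllo"

let isNativeUTF8 = greeting.isContiguousUTF8
let utf8Bytes = Array(greeting.utf8)     // "é" is two bytes: C3 A9
let utf16Units = Array(greeting.utf16)   // "é" is one unit: 00E9

// `withUTF8` lends the contiguous UTF-8 buffer. It is `mutating` because a
// non-contiguous (e.g. bridged) string may first be converted in place.
let byteCount = greeting.withUTF8 { buffer in buffer.count }
```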
Tony Allevato 54f4c77ce7 [stdlib] Revert hasNormalizationBoundaryBefore
This property is too specific in that it forces a particular normalization; let's not expose it this way, but instead in the future with a full normalization API.
2018-04-22 12:01:03 -07:00
Tony Allevato 5a50f27ae9 [stdlib] Migrate normalization usage to public properties 2018-03-28 06:55:53 -07:00
Lance Parker 0661de22a2 [stdlib]Un-revert string comparison (#14694)
Restore (un-revert) string comparison, with fixes

More exhaustive testing of opaque strings, which consistently reproduces the prior sporadic failure. Shims fixups. Some test tweaking.
2018-02-18 10:50:33 -08:00
Lance Parker abe6a6d177 Revert string comparison (#14657) 2018-02-15 14:37:43 -08:00
Lance Parker 897963a6f8 Unified String comparison strategy for all platforms 2018-02-14 15:44:11 -08:00
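The behavior these comparison commits deal with can be observed from user code: String equality is based on canonical equivalence, so differently encoded spellings of the same text compare equal. The example assumes a current Swift toolchain, where ordering is defined over the NFC-normalized scalar values; it does not reproduce the 2018 implementation.

```swift
// Two spellings of "café": precomposed U+00E9, and "e" + U+0301 COMBINING ACUTE ACCENT.
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

// Equality uses canonical equivalence, so these compare equal even though
// their scalars and code units differ.
let equal = precomposed == decomposed
let scalarCounts = (precomposed.unicodeScalars.count,
                    decomposed.unicodeScalars.count)        // (4, 5)
let characterCounts = (precomposed.count, decomposed.count) // (4, 4)

// Ordering is consistent with equality: both spellings sort the same way.
let ordered = ["cafe\u{301}s", "caf\u{E9}", "cab"].sorted()
```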