Commit Graph

54 Commits

Author SHA1 Message Date
Michael Ilseman
e6e4bd6056 UTF8Span (#78531)
Add support for UTF8Span

Also, refactor validation and grapheme breaking
2025-04-11 16:11:11 -06:00
Evan Wilde
ddaf003c56 Get stdlib building again
PR 79186 (https://github.com/swiftlang/swift/pull/79186) moved one of
the mandatory passes from the C++ implementation to the Swift
implementation resulting in a compiler that is unable to build the
standard library. The pass used to ensure that inaccessible control-flow
positions after an infinite loop was marked with `unreachable` in SIL.
Since the pass is no longer running, any function that returns a value
that also has an infinite loop internally must place a fatalError after
the infinite loop or it will fail to compile as the compiler will
determine that the function does not return from all control flow paths
even though some of the paths are unreachable.
2025-03-06 13:32:54 -08:00
Doug Gregor
22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Karl
7a57bd8ae4 [stdlib] Refactor Unicode normalization (#73590)
* [stdlib] Refactor Unicode normalization

* Tweak inlining
2024-05-31 08:22:37 -06:00
Alejandro Alonso
c9115e1cec Create wrapper types with availability NFD and NFC
Swap the bases to unicodeScalarView
2022-04-05 09:06:32 -07:00
swift-ci
691b62f84f Merge pull request #40314 from Catfish-Man/count-von-count 2021-11-30 16:09:31 -08:00
David Smith
3d8c29eaea Early out for unequal NFC counts in String == 2021-11-29 19:49:00 -08:00
Alejandro Alonso
014e822cb2 Address Michael's comments
fix infinite recursion bug

NFC: Remove early ccc check

remember that false is turned on
2021-09-29 14:20:22 -07:00
Alejandro Alonso
98aaa157ec Implement native normalization for String
use >/< instead of !=

fix some bugs

fix
2021-09-29 14:20:21 -07:00
Ben Cohen
e9d4687e31 De-underscore @frozen, apply it to structs (#24185)
* De-underscore @frozen for enums

* Add @frozen for structs, deprecate @_fixed_layout for them

* Switch usage from _fixed_layout to frozen
2019-05-30 17:55:37 -07:00
Michael Ilseman
4967fc08eb [Unicode] Add convenience APIs to Unicode encodings
Add convenience APIs to the stdlib's Unicode encodings:

* Unicode.UTF16
  * isASCII
  * isSurrogate
* Unicode.UTF8
  * isASCII
  * width
* Unicode.UTF32
  * isASCII
* Unicode.ASCII
  * isASCII

Tests added
2019-03-29 15:43:00 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
samding01
197c9634b4 fixed small String comparsion for big_endian 2019-01-21 19:09:07 +00:00
Lance Parker
15aaa1e777 [stdlib]String normalization functions (#21026)
* fast/foreignNormalize functions
2019-01-08 13:55:29 -08:00
Michael Ilseman
b08d94d6ba [String] In-register smol ASCII string compare
Compare small strings in-register when they store ASCII (and thus NFC)
contents.
2018-12-05 18:16:46 -08:00
Michael Ilseman
8530a2c940 [String] Hand-increment loop variable for perf.
Hand-incrementing the loop variable allows us to skip overflow
detection, and will permit more vectorization improvements in the
future. For now, it gives us perf improvements in nano-benchmarks.
2018-12-04 11:51:21 -08:00
Michael Ilseman
c0c530aef8 [String] Speed up constant factors on comparison.
Include some tuning and tweaking to reduce the constant factors
involved in string comparison. This yields considerable improvement on
our micro-benchmarks, and allows us to make less inlinable code and
have a smaller ABI surface area.

Adds more extensive testing of corner cases in our existing
fast-paths.
2018-12-03 15:49:38 -08:00
Michael Ilseman
94942c5b3b [String] Fix corner case in comparison fast-path. (#20937)
When in a post-binary-prefix-scan fast-path, we need to make sure we
are comparing a full-segment scalar, otherwise we miss situations
where a combining end-of-segment scalar would be reordered with a
prior combining scalar in the same segment under normalization in one
string but not the other.

This was hidden by the fact that many combining scalars are not
NFC_QC=maybe, but those which are not present in any precomposed form
have NFC_QC=yes. Added tests.
2018-12-03 10:41:45 -08:00
Michael Ilseman
3a0ac0270d [stdlib] Unchecked subscript on UnsafeBufferPointer
Add a use an unchecked subscript on UnsafeBufferPointer, which skips
debugPrecondition checks (in case we're not inlined) as well as a
force-unwrap check.
2018-11-16 11:12:29 -08:00
Ben Cohen
1673c12d78 [stdlib] Replace "sanityCheck" with "internalInvariant" (#20616)
* Replace "sanityCheck" with "internalInvariant"
2018-11-15 20:50:22 -08:00
Michael Ilseman
53ccd9e054 [string] Less inlining for code size.
Less inlining for hashing and comparison. Saves code size on very
frequent String comparison in exchange for costing us in some of our
ridiculuous micro-benchmarks.

Also adds in more _effects for better codegen
2018-11-04 10:42:42 -08:00
Michael Ilseman
948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman
d5da6fdbfd [String] More comparison speedups and cleanup 2018-11-04 10:42:41 -08:00
Michael Ilseman
7aea40680d [String] NFC iterator fast-paths
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.

Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
2018-11-04 10:42:41 -08:00
Lance Parker
7376009ccc Add benchmarks and tests for the normalized iterator (#32)
Add benchmarks and tests for the normalized iterator
2018-11-04 10:42:41 -08:00
Michael Ilseman
8851bac1be [String] Inlining, NFC fast paths, and more.
Add inlinability annotations to restore performance parity with 4.2 String.

Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.

RRC improvements and optmizations.
2018-11-04 10:42:41 -08:00
Michael Ilseman
9d9f9005e3 [String] Define performance flags and plumb them throughout 2018-11-04 10:42:41 -08:00
Lance Parker
f1a35bd1c9 String comparison iterator for UTF8 strings 2018-11-04 10:42:41 -08:00
Michael Ilseman
f23a3c19b8 [String] Bounds checking and Index cleanup 2018-11-04 10:42:40 -08:00
Michael Ilseman
4ab45dfe20 [String] Drop in initial UTF-8 String prototype
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.

My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
2018-11-04 10:42:40 -08:00
Mike Ash
e18e03171f [Stdlib] Change SWIFT_RUNTIME_STDLIB_INTERNAL to not export the symbol.
The functions in LibcShims are used externally, some directly and some through @inlineable functions. These are changed to SWIFT_RUNTIME_STDLIB_SPI to better match their actual usage. Their names are also changed to add "_swift" to the front to match our naming conventions.

Three functions from SwiftObject.mm are changed to SPI and get a _swift prefix.

A few other support functions are also changed to SPI. They already had a prefix and look like they were meant to be SPI anyway. It was just hard to notice any mixup when they were #defined to the same thing.

rdar://problem/35863717
2018-10-03 09:55:33 -04:00
Michael Ilseman
8294c0003a [string] Drop _StringGuts subscript; NFC
_StringGuts shouldn't expose a subscript, implying efficient
access. Switch to the explicit code unit fetch method. Update tests
accordingly, and switch off of deprecated typealiases.
2018-08-02 16:34:22 -07:00
Tony Allevato
d0e93acb00 Various fixes to Unicode.Scalar.Properties.
- numericValue returns nil instead of .nan for non-numerics
- Remove small-string optimizations from _scalarName that failed on 32-bit archs
- Put case mappings back into U.S.Properties
- Added more sanity tests
2018-07-05 20:42:56 -07:00
Tony Allevato
8eef50f6a9 Merge branch 'master' into unicode-properties 2018-07-04 08:42:35 -07:00
Slava Pestov
5d1f48e3ae stdlib: Update for stricter enforcement of @usableFromInline 2018-06-25 16:26:56 -07:00
Erik Little
863f3a19ff Rename @effects to @_effects
@effects is too low a level, and not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
2018-06-06 12:53:03 -04:00
Michael Ilseman
ebdd5e6d98 [string] Fast-path for small string comparison
Promote small-string to small-string comparison into the fast path for
equality and less-than.

Small ASCII strings that are not binary equal do not compare equal,
allowing us to early exit. Small ASCII strings otherwise compare
lexicographically, which we can call prior to jumping through a few
intermediaries.
2018-05-24 14:47:04 -07:00
Michael Ilseman
ebbfd8c639 [string] Comparison bug fix: Kelvin
Unicode Kelvin sign normalizes to ASCII 'K', but our comparison logic
didn't handle this situation when the other side was single-byte all
ASCII. Fall back to the slow comparison path if the point of
difference between an all-ASCII string and a UTF-16 string falls on
such a non-ASCII-yet-normalizes-to-ASCII scalar (rare).
2018-04-23 17:45:04 -07:00
Tony Allevato
54f4c77ce7 [stdlib] Revert hasNormalizationBoundaryBefore
This property is too specific in that it forces a particular normalization; let's not expose it this way, but instead in the future with a full normalization API.
2018-04-22 12:01:03 -07:00
Slava Pestov
2e5aef9c8d stdlib: Remove redundant @usableFromInline attributes 2018-04-06 00:02:30 -07:00
Tony Allevato
fb9f7ecca1 Merge branch 'master' into unicode-properties 2018-03-31 09:54:18 -07:00
Slava Pestov
e1f50b2d36 SE-0193: Rename @_inlineable to @inlinable, @_versioned to @usableFromInline 2018-03-30 21:55:30 -07:00
Tony Allevato
5a50f27ae9 [stdlib] Migrate normalization usage to public properties 2018-03-28 06:55:53 -07:00
Michael Ilseman
93d6130066 [string] Integrate small strings.
Switch StringObject and StringGuts from opaquely storing tagged cocoa
strings into storing small strings. Plumb small string support
throughout the standard library's routines.
2018-03-27 14:00:59 -07:00
Jordan Rose
9034ba617b Ban @_fixed_layout on enums in favor of @_frozen
In theory there could be a "fixed-layout" enum that's not exhaustive
but promises not to add any more cases with payloads, but we don't
need that distinction today.

(Note that @objc enums are still "fixed-layout" in the actual sense of
"having a compile-time known layout". There's just no special way to
spell that.)
2018-03-20 14:49:10 -07:00
Lance Parker
cbf157f924 [stdlib]Unify String hashing implementation (#14921)
* Add partial range subscripts to _UnmanagedOpaqueString

* Use SipHash13+_NormalizedCodeUnitIterator for String hashes on all platforms

* Remove unecessary collation algorithm shims

* Pass the buffer to the SipHasher for ASCII

* Hash the ascii parts of UTF16 strings the same way we hash pure ascii strings

* De-dupe some code that can be shared between _UnmanagedOpaqueString and _UnmanagedString<UInt16>

* ASCII strings now hash consistently for in hashASCII() and hashUTF16()

* Fix zalgo comparison regression

* Use hasher

* Fix crash when appending to an empty _FixedArray

* Compact ASCII characters into a single UInt64 for hashing

* String: Switch to _hash(into:)-based hashing

This should speed up String hashing quite a bit, as doing it through hashValue involves two rounds of SipHash nested in each other.

* Remove obsolete workaround for ARC traffic

* Ditch _FixedArray<UInt8> in favor of _UIntBuffer<UInt64, UInt8>

* Bad rebase remnants

* Fix failing benchmarks

* michael's feedback

* clarify the comment about nul-terminated string hashes
2018-03-17 22:13:37 -07:00
Sho Ikeda
415ee8d703 [gardening] Change static internal to internal static for consistency
Before the changes:

- `git grep "internal static " | wc -l`: 161
- `git grep "static internal " | wc -l`: 10
2018-03-13 02:29:56 +09:00
Lance Parker
7b55cccf1e shorterPrefixesOther doesn't consume the longer segment, it only invalidates the shorter one. The pathological case needs to compare the entire segment, not skip the end of the longer one. 2018-02-28 11:27:40 -08:00
Lance Parker
4f9bd18c46 Gardening, hasNormalizationBoundary(after:) doesn't need a count parameter 2018-02-28 11:27:40 -08:00
Lance Parker
ecfb6c931e Fix indentation (#21)
* Ditched the simple/complex test distinction as they all pass now

* fixed indentation
2018-02-19 10:14:59 -08:00