Commit Graph

11 Commits

Author SHA1 Message Date
Guillaume Lessard
dfb2e2f12e [stdlib] annotate uses of Range.init(_uncheckedBounds:) 2025-03-05 18:52:11 -08:00
Doug Gregor
22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Karoy Lorentey
67adcabefc Apply notes from code review 2022-04-10 00:14:43 -07:00
Karoy Lorentey
5a22ceb72b [stdlib] _StringGutsSlice: Small adjustments 2022-03-24 21:00:00 -07:00
Karoy Lorentey
0c0cbe290d [stdlib] _StringGutsSlice: Don’t mark methods on non-@usableFromInline internal types @inlinable
(This really ought to be diagnosed by the compiler.)
2022-03-24 21:00:00 -07:00
Karoy Lorentey
6e18955f90 [stdlib] Add bookkeeping to keep track of the encoding of strings and indices
Assign some previously reserved bits in String.Index and _StringObject to keep track of their associated storage encoding (either UTF-8 or UTF-16).

None of these bits will be reliably set in processes that load binaries compiled with older stdlib releases, but when they do end up getting set, we can use them opportunistically to more reliably detect cases where an index is applied on a string with a mismatching encoding.

As more and more code gets recompiled with 5.7+, the stdlib will gradually become able to detect such issues with complete accuracy.

Code that misuses indices this way was always considered broken; however, String wasn’t able to reliably detect these runtime errors before. Therefore, I expect there is a large amount of broken code out there that keeps using bridged Cocoa String indices (UTF-16) after a mutation turns them into native UTF-8 strings. Therefore, instead of trapping, this commit silently corrects the issue, transcoding the offsets into the correct encoding.

It would probably be a good idea to also emit a runtime warning in addition to recovering from the error. This would generate some noise that would gently nudge folks to fix their code.

rdar://89369680
2022-03-24 20:59:59 -07:00
Alejandro Alonso
98aaa157ec Implement native normalization for String
use >/< instead of !=

fix some bugs

fix
2021-09-29 14:20:21 -07:00
Michael Ilseman
415cc8fb0c [String.Index] Deprecate encodedOffset var/init
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happend to be used.

Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
utilizing UTF-8 encoding for native contents in Swift 5, serialization
isn't necessarily the most efficient in UTF-16.

Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit was a
Swift Character, which isn't even valid if the String is known to be
all-ASCII (because CR-LF).

This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
2019-02-13 18:42:40 -08:00
Michael Ilseman
e44972dcea [String] Expunge _StringGutsSlice from ABI
_StringGutsSlice is an implementation helper that's no longer needed
from inlinable code. Internalize it.
2018-11-27 15:07:27 -08:00
Michael Ilseman
948655e850 [String] Cleanups, comments, documentation
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
2018-11-04 10:42:42 -08:00
Michael Ilseman
7aea40680d [String] NFC iterator fast-paths
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.

Also, fix bug in _typeName which used to assume ASCIIness and better
SIL optimizations on StringObject.
2018-11-04 10:42:41 -08:00