112 Commits

Author SHA1 Message Date
Doug Gregor
22eecacc35 Adopt unsafe annotations throughout the standard library 2025-02-26 14:28:01 -08:00
Carl Peto
3689427834 [AVR] standard library support for AVR
- when compiling embedded cross compile target standard libraries, include AVR
- add 16-bit pointer as a conditional compilation condition and get the void pointer size right for gyb sources
- attempt to fix clang importer not importing __swift_intptr_t correctly on 16 bit platforms
- changed the unit test target to avr-none-none-elf to match the cmake build

[AVR] got the standard library compiling in a somewhat restricted form:

General
- updated the Embedded Runtime
- tweaked CTypes.swift to fix clang import on 16 bit platforms

Strings
- as discussed in https://forums.swift.org/t/stringguts-stringobject-internals-how-to-layout-on-16-bit-platforms/73130, I went for just using the same basic layout in 16 bit as 32 bit but with 16 bit pointers/ints... the conversation is ongoing, I think something more efficient is possible but at least this compiles and will probably work (inefficiently)

Unicode
- the huge arrays of unicode stuff in UnicodeStubs would not compile, so I skipped it for AVR for now.

Synchronization
- disabled building the Synchronization library on AVR for now. It's arguable if it adds value on this platform anyway.
2024-07-16 12:28:27 +01:00
David Smith
3589044213 A new way to bridge constant NSStrings (#74881) 2024-07-03 20:38:33 -07:00
Kuba Mracek
7ae20b7039 [embedded] Port Swift.String to embedded Swift 2024-05-08 11:11:37 -07:00
Jeremy Schonfeld
e59ba66970 Un-deprecate _StringGuts._isContiguousASCII 2024-01-04 14:58:41 -08:00
Karoy Lorentey
298ab208d8 [stdlib][nfc] Prefer to put the _pointerBitWidth(_64) case first, as that is far more common 2023-04-27 16:33:50 -07:00
Karoy Lorentey
b82ce9c3be [stdlib] Adopt _pointerBitWidth conditional 2023-04-27 13:33:24 -07:00
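A minimal sketch of how the `_pointerBitWidth` condition adopted here might be used. It assumes a toolchain that recognizes this underscored condition (believed to be Swift 5.9 or later); `pointerBits` is a hypothetical name used only for illustration. The 64-bit case comes first, matching the ordering preference in the follow-up commit listed above.

```swift
// Hypothetical constant: pick a value per pointer width at compile time.
#if _pointerBitWidth(_64)
let pointerBits = 64
#elseif _pointerBitWidth(_32)
let pointerBits = 32
#else
// Fallback for any other width (for example, 16-bit targets).
let pointerBits = MemoryLayout<UnsafeRawPointer>.size * 8
#endif

print(pointerBits)
```

Keeping an `#else` branch means the code still compiles on pointer widths that the first two cases do not name.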
Karoy Lorentey
3f5dfea4b1 [stdlib] String: Avoid retain/release operations around use sites of sharedStorage and cocoaObject 2023-02-15 14:21:46 -08:00
Karoy Lorentey
cf1b9d9404 [stdlib] String: Fix more potential UB, and rework access patterns 2023-02-13 22:55:32 -08:00
Guillaume Lessard
cc16a9f997 [stdlib] assign → update 2022-08-26 17:36:40 -06:00
Anthony Latsis
a12106963b stdlib: Mark some deprecated corelibs-foundation SPI that are no longer used as unavailable
This keeps them around in the ABI while preventing actual usage as we build
up confidence to finally remove them.
2022-07-26 04:29:57 +03:00
Karoy Lorentey
50c2399a94 [stdlib] Work around binary compatibility issues with String index validation fixes in 5.7
Swift 5.7 added stronger index validation for `String`, so some illegal cases that previously triggered inconsistently diagnosed out of bounds accesses now result in reliable runtime errors. Similarly, attempts at applying an index originally vended by a UTF-8 string on a UTF-16 string now result in a reliable runtime error.

As is usually the case, adding new traps to the stdlib exposes code that contains previously undiagnosed / unreliably diagnosed coding issues.

Allow invalid code in binaries built with earlier versions of the stdlib to continue running with the 5.7 library by disabling some of the new traps based on the version of Swift the binary was built with.

In the case of an index encoding mismatch, allow transcoding of string storage regardless of the direction of the mismatch. (Previously we only allowed transcoding a UTF-8 string to UTF-16.)

rdar://93379333
2022-05-17 19:25:10 -07:00
Karoy Lorentey
3eed5347c7 [stdlib] Update comments 2022-04-19 14:02:13 -07:00
Karoy Lorentey
847337efd7 [stdlib][cosmetics] Clean up unused/underused interfaces, update naming
There is little point to having `isUTF16` properties when they simply
return `!isUTF8`; remove them.

Rename `String.Index._copyEncoding(from:)` to
`_copyingEncoding(from:)`.
2022-04-18 21:06:20 -07:00
Karoy Lorentey
4d557b0b45 [stdlib] Make String.Index(_:within:) initializers more permissive
In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.

Since https://github.com/apple/swift/pull/41417, the
`String.Index(_:within:)` initializers recognize miscoded indices and reject
them by returning nil. This is technically correct, but it
unfortunately may be a binary compatibility issue, as these used to
return non-nil in previous versions.

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use a UTF-8 index
on a UTF-16 string is still rejected — we do not implicitly convert
strings in that direction.)

rdar://89369680
2022-04-18 21:02:14 -07:00
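As a rough illustration of what the `String.Index(_:within:)` initializer accepts and rejects, the sketch below uses only public API on a native UTF-8 string. It does not reproduce the bridged-Cocoa scenario described above, which needs Foundation and a UTF-16 `NSString`; it only shows the nil-returning behavior for an index that is not on a Character boundary.

```swift
let s = "caf\u{E9}!"   // "é" is one Character but two UTF-8 code units
let utf8 = s.utf8

// Offset 3 is the first byte of "é": a valid Character boundary.
let aligned = utf8.index(utf8.startIndex, offsetBy: 3)
// Offset 4 is the second byte of "é": not on any Character boundary.
let midScalar = utf8.index(utf8.startIndex, offsetBy: 4)

let accepted = String.Index(aligned, within: s)    // expected to be non-nil
let rejected = String.Index(midScalar, within: s)  // expected to be nil

print(accepted != nil, rejected == nil)
```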
Karoy Lorentey
57f0e67658 Merge pull request #41417 from lorentey/the-horror-of-se-0180
[stdlib] Fix String indexing edge cases, anomalies & validation bugs
2022-04-14 14:08:53 -07:00
Karoy Lorentey
b33fefb71c [stdlib] String: be more consistent about when markEncoding is called 2022-04-13 18:38:41 -07:00
Karoy Lorentey
ed7d60c711 [stdlib] Remove unused fn 2022-04-11 14:03:01 -07:00
Karoy Lorentey
3c9968945e [stdlib] String: Implement happy paths for index validation 2022-04-10 00:14:43 -07:00
Karoy Lorentey
d18b5f573f [stdlib] Branchless _StringGuts.hasMatchingEncoding 2022-04-09 21:33:53 -07:00
Karoy Lorentey
b06e6e5dd3 [stdlib] String: Fix major perf regression due to extra arc traffic 2022-04-09 21:33:53 -07:00
Karoy Lorentey
83df814c63 [stdlib] _StringObject.isKnownUTF16 → isForeignUTF8
This fixes a compatibility issue with potential future UTF-8 encoded
foreign String forms, as well as simplifying the code a bit — we no
longer need to do an availability check on inlinable fast paths.

The isForeignUTF8 bit is never set by any past or current stdlib
version, but it allows us to introduce UTF-8 encoded foreign forms
without breaking inlinable index encoding validation introduced in
Swift 5.7.
2022-04-09 21:33:53 -07:00
Guillaume Lessard
8379b2422a [stdlib] remove most uses of _asCChar and _asUInt8 2022-04-06 15:21:00 -06:00
Karoy Lorentey
2e9fd9eb6b [stdlib] Substring.UnicodeScalarView: Add _invariantCheck 2022-04-05 20:47:42 -07:00
Karoy Lorentey
4eab8355ca [stdlib] String: prefer passing ranges to start+end argument pairs 2022-03-29 20:00:08 -07:00
Karoy Lorentey
e8212690d1 [stdlib] String: Apply transcoded offset when converting indices from UTF-16 2022-03-29 20:00:08 -07:00
Karoy Lorentey
5f6c300adb [stdlib] String.UTF8View: Review/fix index validation
Also, in UTF-8 slices, forward collection methods to the base view
instead of `Slice`, to make behavior a bit easier to understand.

(There is no need to force readers to page in `Slice`
implementations _in addition to_ whatever the base view is doing.)
2022-03-29 18:40:25 -07:00
Karoy Lorentey
d58811262d [stdlib] String.UnicodeScalarView: Review index validation 2022-03-29 18:40:25 -07:00
Karoy Lorentey
67f01a1159 [stdlib] Stop inlining String.subscript
`index(after:)`/`index(before:)` aren’t inlinable, so I don’t expect
force-inlining the subscript has much benefit.
2022-03-24 21:00:00 -07:00
Karoy Lorentey
8ab2379946 [stdlib] Round indices down to nearest Character in String’s index algorithms
To prevent unaligned indices from breaking well-defined index distance
and index offset calculations, round every index down to the nearest
whole Character.

For the horrific details, see the forum discussion below.

https://forums.swift.org/t/string-index-unification-vs-bidirectionalcollection-requirements/55946

To avoid rounding from regressing String performance in the regular
case (when indices aren’t being passed across string views), introduce
a new String.Index flag bit that indicates that the index is already
Character aligned.
2022-03-24 21:00:00 -07:00
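The rounding described above can be pictured with a small sketch. `roundedDownToCharacter` is a hypothetical helper written against public API; it scans linearly and is not how the stdlib implements rounding (the commit relies on a flag bit to skip the work in the common case).

```swift
// Hypothetical helper: round `i` down to the nearest Character boundary of `s`
// by scanning the Character indices. O(n); for illustration only.
func roundedDownToCharacter(_ i: String.Index, in s: String) -> String.Index {
    if i >= s.endIndex { return s.endIndex }
    return s.indices.last(where: { $0 <= i }) ?? s.startIndex
}

let s = "e\u{301}x"   // "e" + combining acute accent form one Character, then "x"
let scalars = s.unicodeScalars
// Points at U+0301, which sits inside the first Character.
let mid = scalars.index(after: scalars.startIndex)
let rounded = roundedDownToCharacter(mid, in: s)

print(rounded == s.startIndex)
```

With rounding applied first, distance and offset calculations always start from a whole Character, whichever view the index came from.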
Karoy Lorentey
a44997eeea [stdlib] Factor scalar-aligned String index validation out into a set of common routines
There are three flavors, corresponding to i < endIndex, i <= endIndex, and range containment checks.
Additionally, we have separate variants for index validation in substrings.
2022-03-24 21:00:00 -07:00
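A sketch of the three flavors, using hypothetical function names and modeling only the bounds comparisons (the scalar-alignment part of the real routines is left out). The functions return `Bool` rather than trapping so they are easy to exercise. Passing a `Substring` as the collection gives the substring variant, because the bounds are then those of the slice.

```swift
// Hypothetical names; bounds checks only.

// Flavor 1: i < endIndex, so the index is safe to subscript.
func isValidElementIndex<C: Collection>(_ i: C.Index, in c: C) -> Bool {
    return c.startIndex <= i && i < c.endIndex
}

// Flavor 2: i <= endIndex, so the index is safe to use as a range bound.
func isValidPosition<C: Collection>(_ i: C.Index, in c: C) -> Bool {
    return c.startIndex <= i && i <= c.endIndex
}

// Flavor 3: the whole range lies within the collection's bounds.
func isValidRange<C: Collection>(_ r: Range<C.Index>, in c: C) -> Bool {
    return c.startIndex <= r.lowerBound && r.upperBound <= c.endIndex
}

let whole = "hello"
let tail = whole.dropFirst(2)   // Substring "llo"
print(isValidElementIndex(whole.startIndex, in: tail))
```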
Karoy Lorentey
15c7721caf [stdlib] Use the new index encoding flags when marking the encoding of indices
This removes an unnecessary opaque call from the inlinable path, but it preserves a runtime version check.
2022-03-24 20:59:59 -07:00
Karoy Lorentey
6e18955f90 [stdlib] Add bookkeeping to keep track of the encoding of strings and indices
Assign some previously reserved bits in String.Index and _StringObject to keep track of their associated storage encoding (either UTF-8 or UTF-16).

None of these bits will be reliably set in processes that load binaries compiled with older stdlib releases, but when they do end up getting set, we can use them opportunistically to more reliably detect cases where an index is applied on a string with a mismatching encoding.

As more and more code gets recompiled with 5.7+, the stdlib will gradually become able to detect such issues with complete accuracy.

Code that misuses indices this way was always considered broken; however, String wasn’t able to reliably detect these runtime errors before. I therefore expect there is a large amount of broken code out there that keeps using bridged Cocoa String indices (UTF-16) after a mutation turns them into native UTF-8 strings. So instead of trapping, this commit silently corrects the issue, transcoding the offsets into the correct encoding.

It would probably be a good idea to also emit a runtime warning in addition to recovering from the error. This would generate some noise that would gently nudge folks to fix their code.

rdar://89369680
2022-03-24 20:59:59 -07:00
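To see why an offset has to be transcoded when the storage encoding changes, the sketch below measures the same position in both encodings using public API only. The variable names are illustrative.

```swift
let s = "caf\u{E9}!"   // "é" is 2 UTF-8 code units but 1 UTF-16 code unit
let bang = s.firstIndex(of: "!")!

// The same position, measured in each encoding's code units.
let utf8Offset = s.utf8.distance(from: s.utf8.startIndex, to: bang)
let utf16Offset = s.utf16.distance(from: s.utf16.startIndex, to: bang)

print(utf8Offset, utf16Offset)   // prints "5 4"
```

An index that stores offset 4 as a UTF-16 position would land in the middle of "é" if it were read as a UTF-8 offset, which is the kind of mismatch the encoding bits are meant to detect.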
Doug Gregor
353daabf8d Replace UnsafeSendable with @unchecked Sendable in the standard library. 2021-11-12 07:56:10 -08:00
Robert Widmann
0149ccd0ca Add arm64_32 support for Swift
Commit the platform definition and build script work necessary to
cross-compile for arm64_32.

arm64_32 is a variant of AARCH64 that supports an ILP32 architecture.
2021-04-20 14:59:04 -07:00
Doug Gregor
9579390024 [SE-0304] Rename ConcurrentValue to Sendable 2021-03-18 22:48:20 -07:00
Doug Gregor
1a1f79c0de Introduce safety checking for ConcurrentValue conformance.
Introduce checking of ConcurrentValue conformances:
- For structs, check that each stored property conforms to ConcurrentValue
- For enums, check that each associated value conforms to ConcurrentValue
- For classes, check that each stored property is immutable and conforms
  to ConcurrentValue

Because all of the stored properties / associated values need to be
visible for this check to work, limit ConcurrentValue conformances to
be in the same source file as the type definition.

This checking can be disabled by conforming to a new marker protocol,
UnsafeConcurrentValue, that refines ConcurrentValue.
UnsafeConcurrentValue otherwise has no specific meaning. This allows
both "I know what I'm doing" for types that manage concurrent access
themselves as well as enabling retroactive conformance, both of which
are fundamentally unsafe but also quite necessary.

The bulk of this change ended up being to the standard library, because
all conformances of standard library types to the ConcurrentValue
protocol needed to be sunk down into the standard library so they
would benefit from the checking above. There were numerous little
mistakes in the initial pass through the standard library types that
have now been corrected.
2021-02-04 03:45:09 -08:00
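A sketch of the checks listed above, written in the later spelling that the newer commits in this list introduce (`Sendable` instead of `ConcurrentValue`, `@unchecked Sendable` instead of the unsafe marker protocol). The type names are hypothetical, and the `final` requirement on the class reflects the rules as eventually shipped rather than anything stated in this commit.

```swift
import Foundation

// Checked conformance: every stored property is itself Sendable.
struct Point: Sendable {
    var x: Int
    var y: Int
}

// Checked conformance for a class: only immutable, Sendable stored properties.
final class Config: Sendable {
    let name: String
    init(name: String) { self.name = name }
}

// Opt-out: the type manages concurrent access itself with a lock,
// so the compiler is told not to check its stored properties.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increment() -> Int {
        lock.lock()
        defer { lock.unlock() }
        value += 1
        return value
    }
}

let counter = Counter()
print(counter.increment())
```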
Andrew Trick
5eafc20cdd Fix undefined behavior in SmallString.withUTF8
withUTF8 currently vends a typed UInt8 pointer to the underlying
SmallString. That pointer type differs from SmallString's
representation. It should simply vend a raw pointer, which would be
both type safe and convenient for UTF8 data. However, since this
method is already @inlinable, I added calls to bindMemory to prevent
the optimizer from reasoning about access to the typed pointer that we
vend.

rdar://67983613 (Undefined behavior in SmallString.withUTF8 is miscompiled)

Additional commentary:

SmallString creates a situation where there are two types, the
in-memory type, (UInt64, UInt64), vs. the element type,
UInt8. `UnsafePointer<T>` specifies the in-memory type of the pointee,
because that's how C works. If you want to specify an element type,
not the in-memory type, then you need to use something other than
UnsafePointer to view the memory. A trivial `BufferView<UInt8>` would
be fine, although, frankly, I think UnsafeRawPointer is a perfectly
good type on its own for UTF8 bytes.

Unfortunately, a lot of the UTF8 helper code is ABI-exposed, so to
work around this, we need to insert calls to bindMemory at strategic
points to avoid undefined behavior. This is high-risk and can
negatively affect performance. So far, I was able to resolve the
regressions in our microbenchmarks just by tweaking the inliner.
2020-09-24 18:36:42 -07:00
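The distinction between the in-memory type and the element type can be sketched with a raw buffer. Here `storage` is a hypothetical stand-in for a small string's `(UInt64, UInt64)` representation; this is not the stdlib's `_SmallString` code, only an illustration of reading and writing UTF-8 bytes through raw pointers so that no typed `UInt8` pointer to the tuple is formed.

```swift
// Hypothetical stand-in for a small string's in-memory representation.
var storage: (UInt64, UInt64) = (0, 0)

// Write UTF-8 code units through a raw (untyped) buffer.
withUnsafeMutableBytes(of: &storage) { raw in
    for (i, byte) in "hi".utf8.enumerated() {
        raw[i] = byte
    }
}

// Read them back the same way; the raw buffer is a collection of UInt8.
let decoded = withUnsafeBytes(of: storage) { raw in
    String(decoding: raw.prefix(2), as: UTF8.self)
}

print(decoded)   // prints "hi"
```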
Michael Ilseman
4715d68890 Merge pull request #30237 from valeriyvan/RemoveRedundantZeroingStringGuts
Changes implementation of _persistCString from _StringGuts
2020-03-09 14:36:05 -07:00
Valeriy Van
f49f6a99ba Fixes variable name 2020-03-06 06:32:53 +01:00
Michael Ilseman
79bac4e6a3 Merge pull request #30244 from milseman/string_shrink
[string] Shrink storage class sizes
2020-03-05 19:57:40 -08:00
Michael Ilseman
0ca42e9ef7 [string] Shrink storage class sizes.
* Don't allocate breadcrumbs pointer if under threshold
* Increase breadcrumbs threshold
* Linear 16-byte bucketing until 128 bytes, malloc_size after
* Allow cap less than _SmallString.capacity (bridging non-ASCII)

This change decreases the amount of heap usage for moderate-length
strings (< 64 UTF-8 code units in length) and increases the amount of
spare code unit capacity available (less growth needed).

Average improvements for moderate-length strings:

* 64-bit: on average, 8 bytes saved and 4 bytes of extra capacity
* 32-bit: on average, 4 bytes saved and 6 bytes of extra capacity

Additionally, on 32-bit, large-length strings also gain an average of
6 bytes of extra spare capacity.

Details:

On 64-bit, half of moderate-length allocations will save 16 bytes
while the other half get an extra 8 bytes of spare capacity.

On 32-bit, a quarter of moderate-length allocations will save 16
bytes, and the rest get an extra 4 bytes of spare
capacity. Additionally, 32-bit string's storage class now claims its
full allocation, which is its birthright. Prior to this change, we'd
have on average 1.5 bytes of spare capacity, and now we have 7.5 bytes
of spare capacity.

Breadcrumbs threshold is increased from the super-conservative 32 to
the pretty-conservative 64. Some speed improvements are incorporated
in this change, but more are in flight. Even without those eventual
improvements, this is a worthwhile change (ASCII is still fast-pathed
and irrelevant to breadcrumbing).

For a complex real-world workload, this amounts to around a 5%
improvement to transient heap usage due to all strings and a 4%
improvement to peak heap usage due to all strings. For moderate-length
strings specifically, this gives around 11% improvement to both.
2020-03-05 16:10:23 -08:00
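The "linear 16-byte bucketing until 128 bytes" rule can be sketched as simple round-up arithmetic. `bucketedSize` is a hypothetical helper, not the stdlib's allocation code; requests above 128 bytes are returned unchanged here because the commit defers to `malloc_size` for those, which this sketch does not model.

```swift
// Hypothetical helper: round a requested byte count up to the next multiple
// of 16 while it is at most 128 bytes; larger requests pass through unchanged.
func bucketedSize(_ requested: Int) -> Int {
    precondition(requested >= 0)
    guard requested <= 128 else { return requested }
    return (requested + 15) & ~15
}

print(bucketedSize(17))   // prints "32"
```

Any bytes between the requested size and the bucket size become spare capacity, which is where the "extra capacity" figures above come from.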
Valeriy Van
190b8a73db Changes implementation of _persistCString from _StringGuts to be in sync with implementation from stdlib 2020-03-05 15:25:45 +01:00
Valeriy Van
47acd72a6b Removes redundant buffer zeroing 2020-02-28 23:23:49 +01:00
Max Desiatov
67297904ac [WebAssembly] Add ifdefs for the WASI target 2020-02-08 07:37:10 +00:00
David Smith
d091ecb009 Restore more-correct behavior of getting the full contents of bridged NSStrings containing invalid UTF-8 2019-07-16 12:05:56 -07:00
Michael Ilseman
63a6794cf9 [String] Switch scalar-aligned bit to a reserved bit.
Since scalar-alignment is set in inlinable code, switch the alignment
bit to one of the previously-reserved bits rather than a grapheme
cache bit. Setting a grapheme cache bit in inlinable would break
backward deployment, as older versions would interpret it as a cached
value.

Also adjust the name to "scalar-aligned", which is clearer, and
removed assertion (which should be a real precondition).
2019-07-02 16:25:04 -07:00
Michael Ilseman
bd5a40ff1b [gardening] Add underscore to internal member 2019-06-27 11:11:44 -07:00
Michael Ilseman
4cd1e812b7 [String] Scalar-alignment bug fixes.
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.

SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee behavior, we often have to check and perform scalar
alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The below shows what views need to check for alignment before
they can operate, and whether the indices they produce are aligned.

┌───────────────╥────────────────────┬──────────────────────────┐
│ View          ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8   ║ no                 │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16  ║ yes                │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8  ║ yes                │ no                       │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no                 │ no                       │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes                │ yes                      │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character     ║ yes                │ yes                      │
└───────────────╨────────────────────┴──────────────────────────┘

The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:

* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
2019-06-26 16:42:58 -07:00
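What "performing scalar alignment" means for UTF-8 storage can be sketched over a plain byte array. `scalarAligned` is a hypothetical helper that assumes valid UTF-8; it is not the stdlib routine, and it does not model the known-to-be-aligned bit that lets the real check skip the load.

```swift
// Hypothetical helper over raw UTF-8 bytes: move `offset` back to the first
// byte of the scalar it falls inside. Continuation bytes look like 10xxxxxx.
func scalarAligned(_ offset: Int, in utf8: [UInt8]) -> Int {
    var i = offset
    while i > 0, i < utf8.count, (utf8[i] & 0xC0) == 0x80 {
        i -= 1
    }
    return i
}

let bytes = Array("caf\u{E9}!".utf8)   // 63 61 66 C3 A9 21
print(scalarAligned(4, in: bytes))     // prints "3"
```

Offset 4 points at the continuation byte `A9`, so it is moved back to offset 3, the lead byte `C3` of "é".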
Ben Cohen
e9d4687e31 De-underscore @frozen, apply it to structs (#24185)
* De-underscore @frozen for enums

* Add @frozen for structs, deprecate @_fixed_layout for them

* Switch usage from _fixed_layout to frozen
2019-05-30 17:55:37 -07:00