7762 Commits

Author SHA1 Message Date
Karoy Lorentey
cf1b9d9404 [stdlib] String: Fix more potential UB, and rework access patterns 2023-02-13 22:55:32 -08:00
Andrew Trick
86467bbe63 Fix potentially undefined behavior in StringObject.nativeStorage
Speculatively fixing this to rule out potential miscompiles.

The compiler needs to know if a reference is being materialized out of
thin air. The proper way to do that is with the Unmanaged API.

Under the hood, this forces the reference into an "unowned(unsafe)"
variable which the reference must be reloaded from. That tells the
compiler that it can't optimize some seemingly unrelated object which
the reference may happen to refer to at runtime.

/// Warning: Casting from an integer or a pointer type to a reference type
/// is undefined behavior. It may result in incorrect code in any future
/// compiler release. To convert a bit pattern to a reference type:
/// 1. convert the bit pattern to an UnsafeRawPointer.
/// 2. create an unmanaged reference using Unmanaged.fromOpaque()
/// 3. obtain a managed reference using Unmanaged.takeUnretainedValue()
/// The programmer must ensure that the resulting reference has already been
/// manually retained.
2023-02-13 22:14:54 -08:00
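The three steps listed in the warning above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the actual `StringObject.nativeStorage` change: the `Box` class and the `loadBox(fromBits:)` helper are hypothetical names introduced for the example.

```swift
// Hypothetical class standing in for any reference type.
final class Box {
    let value: Int
    init(value: Int) { self.value = value }
}

// A strong reference that keeps the object alive for the whole example.
let box = Box(value: 42)

// Turn the reference into a raw bit pattern, e.g. to store it in a tagged word.
let bits = UInt(bitPattern: Unmanaged.passUnretained(box).toOpaque())

// Recover the reference through Unmanaged instead of casting the integer
// directly to a reference type (which the warning describes as undefined behavior).
func loadBox(fromBits bits: UInt) -> Box? {
    // 1. Convert the bit pattern to an UnsafeRawPointer.
    guard let raw = UnsafeRawPointer(bitPattern: bits) else { return nil }
    // 2. Create an unmanaged reference, and
    // 3. obtain a managed reference without consuming a retain.
    return Unmanaged<Box>.fromOpaque(raw).takeUnretainedValue()
}

let recovered = loadBox(fromBits: bits)
```

The caller remains responsible for keeping the object alive while only the bit pattern exists; here the global `box` does that.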
Karoy Lorentey
73f349cb15 [stdlib] Rework String breadcrumbs initialization/loading
This is a wild guess at what might be causing our persistent, random
String failures on the main branch:

```
  Swift(macosx-x86_64) :: Prototypes/CollectionTransformers.swift
  Swift(macosx-x86_64) :: stdlib/NSSlowString.swift
  Swift(macosx-x86_64) :: stdlib/NSStringAPI.swift
  Swift(macosx-x86_64) :: stdlib/StringIndex.swift
  Swift-validation(macosx-x86_64) :: stdlib/String.swift
  Swift-validation(macosx-x86_64) :: stdlib/StringBreadcrumbs.swift
  Swift-validation(macosx-x86_64) :: stdlib/StringUTF8.swift
```

FWIW, it appears this is *not* caused by https://github.com/apple/swift/pull/62717:
that change has also landed on release/5.8, and I haven’t seen these
issues on that branch.

Our atomic breadcrumbs initialization vs its non-atomic loading
gives me an uneasy feeling that this may in fact be a long standing
synchronization issue that is only now causing problems (for whatever
reason). I am unable to reproduce these issues locally, so this guess
may be (and probably is) wildly off the mark, but this PR is likely
to be a good idea anyway, if only to rule out this possibility.

rdar://104751936
2023-02-10 20:23:56 -08:00
Erik Eckstein
fc6f1d862e stdlib: make type comparison functions transparent
This is needed so that they can be optimized at -Onone.
2023-02-09 06:49:58 +01:00
Bradley Mackey
78535a23c0 Revert "Format to 80w"
This reverts commit abf877ada2.
2023-02-06 19:50:48 +00:00
Bradley Mackey
abf877ada2 Format to 80w 2023-02-04 11:27:09 +00:00
Bradley Mackey
6f97ee2379 Fix signature of fatalError 2023-02-04 11:18:27 +00:00
Jonathan Grynspan
75dfb56b67 Clarify documentation for CommandLine.unsafeArgv re: the trailing nil. (#58484) (#63413) 2023-02-03 16:28:45 -05:00
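As a rough illustration of the trailing nil that this documentation change describes, the sketch below walks `CommandLine.unsafeArgv` and then reads the slot one past the last argument. The variable names are illustrative only.

```swift
let argc = Int(CommandLine.argc)
let argv = CommandLine.unsafeArgv

// Collect the argc real arguments as Swift strings.
var arguments: [String] = []
for i in 0..<argc {
    if let cString = argv[i] {
        arguments.append(String(cString: cString))
    }
}

// Including the trailing nil, the array has argc + 1 elements,
// so index argc is still in bounds and holds the terminator.
let terminator = argv[argc]
```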
swift-ci
d77fa0b016 Merge pull request #63307 from lorentey/identical-string-check
[stdlib] Add String._isIdentical(to:)
2023-02-02 21:04:20 -08:00
Arnold Schwaighofer
a96e5d06c6 Merge pull request #63350 from aschwaighofer/relative_protocol_witness_tables_test_config
Relative protocol witness tables test configuration
2023-02-02 06:48:20 -08:00
Kavon Farvardin
ab130883a3 Initial ban of move-only types from being used generically
Since values of generic type are currently assumed to always
support copying, we need to prevent move-only types from
being substituted for generic type parameters.

This approach leans on a `_Copyable` marker protocol to which
all generic type parameters implicitly must conform.

A few other changes in this initial implementation:

- Now every concrete type that can conform to Copyable will do so. This fixes issues with conforming to a protocol that requires Copyable.
- Narrowly ban writing a concrete type `[T]` when `T` is move-only.
2023-02-01 23:38:28 -08:00
Mike Ash
08e12243a2 Merge pull request #63302 from mikeash/keypath-big-32-bit-pointers
[Runtime] Fix key paths on 32-bit with KVC string pointers in the top half of memory.
2023-02-01 10:22:44 -05:00
Arnold Schwaighofer
90cb8056bd Fix _StringObject._dump() under SWIFT_STDLIB_STATIC_PRINT
It does not compile in this mode.

```
error: no exact matches in call to instance method 'appendInterpolation'
        owner: \(repr._objectIdentifier!), \
```
2023-02-01 07:12:49 -08:00
Karoy Lorentey
108bc0e7b2 [stdlib] Add String._isIdentical(to:)
rdar://104828814
2023-01-30 12:08:35 -08:00
Mike Ash
1f8acac3d4 [Runtime] Fix key paths on 32-bit with KVC string pointers in the top half of memory.
Key paths can store an offset or a pointer in the same field. On 32-bit, the field is considered to be an offset when it's less than the 4kB zero page, and a pointer otherwise.

The check uses a signed comparison, so pointers in the top half of memory would look like negative offsets. Add a check that the offset is zero or positive to avoid this.

rdar://103886537
2023-01-30 13:05:48 -05:00
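The signed-comparison pitfall described above can be reproduced with plain integers. This is a simplified model, not the runtime's code: the function names and the 32-bit field values are hypothetical, and only the 4kB threshold comes from the commit message.

```swift
// Size of the zero page used as the offset/pointer threshold.
let zeroPageSize: Int32 = 4096

// Original logic: a signed comparison only. A pointer in the top half of
// memory has its high bit set, so it reads as a negative "offset".
func isOffsetSignedOnly(_ field: UInt32) -> Bool {
    return Int32(bitPattern: field) < zeroPageSize
}

// Fixed logic: additionally require the value to be zero or positive.
func isOffsetWithSignCheck(_ field: UInt32) -> Bool {
    let signed = Int32(bitPattern: field)
    return signed >= 0 && signed < zeroPageSize
}

let smallOffset: UInt32 = 16           // a genuine offset
let lowPointer: UInt32 = 0x0001_0000   // pointer in the bottom half of memory
let highPointer: UInt32 = 0x8000_1000  // pointer in the top half of memory
```

Only `highPointer` is classified differently by the two functions, which matches the failure mode the commit describes.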
Kyle Murray
1f0b2e59fa Remove a stale fixme comment.
Per Ben's feedback in the PR.
2023-01-30 10:42:42 -05:00
Kyle Murray
15ebec6e23 [stdlib] NFS: Restore documentation comment for Unsafe[Mutable]BufferPointer
Moves a `//` comment up above a `///` documentation comment, since the latter needs to be attached directly to the declaration or it won't be picked up as documentation.
2023-01-28 12:57:41 -05:00
Alex Martini
2f73addf6c Tighten phrasing.
Co-authored-by: Bradley Mackey <bradley@mcky.dev>
2023-01-27 13:43:41 -08:00
Alex Martini
5ca7f7aa71 Avoid repeating "type".
Co-authored-by: Bradley Mackey <bradley@mcky.dev>
2023-01-27 13:40:10 -08:00
Alex Martini
779a4105b1 Revise docs for Never for reference style.
- Reduce the emphasis on the type theory that Never is an uninhabited
  type, focusing more on its meaning and usage in code.
- Move the definition of uninhabited out of the abstract.  Define
  "nonreturning" more explicitly.
- Expand the favoriteNumber example's code comment into a brief
  paragraph to walk through the code listing.
- Avoid italics in the abstract, future tense, and parenthetical asides.
- Use contractions.
2023-01-27 12:57:36 -08:00
Doug Gregor
5a9a654adb Adopt @freestanding(expression) for all @expression macros 2023-01-25 17:07:38 -08:00
swift-ci
f2302b926b Merge pull request #63034 from xwu/62260-heterogeneous-integer-comparisons-produce-suboptimal-code
[stdlib] Improve performance of heterogeneous binary integer `==` and `<`
2023-01-17 21:23:55 -08:00
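For context, "heterogeneous" here means comparing integers of different types, which `BinaryInteger` supports by value rather than by bit pattern. A small sketch of the operations being optimized (the variable names and the `isLess` helper are illustrative):

```swift
let negative: Int8 = -1
let huge: UInt64 = .max

// Compared by mathematical value: -1 is less than UInt64.max,
// even though both are "all bits set" in their own widths.
let lessThan = negative < huge
let equal = negative == huge

let byte: UInt8 = 200
let word: Int = 200
let sameValue = byte == word

// The same operators are available in generic code.
func isLess<T: BinaryInteger, U: BinaryInteger>(_ lhs: T, _ rhs: U) -> Bool {
    return lhs < rhs
}
```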
Xiaodi Wu
df6a56ecbc [stdlib] Make a stylistic change per reviewer feedback. 2023-01-17 19:39:59 -05:00
Karoy Lorentey
1241df3fab [stdlib] String.debugDescription: Fix quoting behavior
`String.debugDescription` currently fails to protect the contents of
the string from combining with the opening or closing `"` characters
or one of the characters of a quoted scalar:

```swift
let s = "\u{301}A\n\u{302}B\u{70F}"
print(s.debugDescription)
// ⟹ "́A\n̂B܏"  (characters: "́, A, \, n̂, B, ܏")
```

This can make debug output difficult to read, as string contents are
allowed to spread over and pollute neighboring meta-characters.

This change fixes this by force-quoting the problematic scalars in
these cases:

```swift
let s = "\u{301}A\n\u{302}B\u{70F}"
print(s.debugDescription)
// ⟹ "\u{301}A\n\u{302}B\u{70F}"
```

Of course, Unicode scalars that don’t engage in such behavior are
still allowed to pass through unchanged:

```swift
let s = "Cafe\u{301}"
print(s.debugDescription)
// ⟹ "Café"
```
2023-01-16 01:15:39 -08:00
swift-ci
7a0bcfa09d Merge pull request #63046 from lorentey/stdlib-5.9
[stdlib] Define and bump to version 5.9
2023-01-16 00:27:03 -08:00
Karoy Lorentey
0a041c084d [stdlib] Define and bump to version 5.9 2023-01-15 19:56:18 -08:00
Karoy Lorentey
a3e517ed36 [stdlib] String: Fix forward implementation of grapheme breaking rule 11
Rule GB11 in Unicode Annex 29 is:

GB11: Extended_Pictographic Extend* ZWJ × Extended_Pictographic

However, our forward grapheme breaking state machine implements it as:

GB11: Extended_Pictographic Extend* ZWJ+ × Extended_Pictographic

We implement the correct rule when going backward, which can cause String values to have different counts depending on whether we’re going forward or backward.

The rule as implemented would be fine (Unicode doesn’t care much about the placement of grapheme breaks in invalid sequences), but the directional inconsistency messes with String’s Collection conformance.

rdar://104279671
2023-01-15 16:12:38 -08:00
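One way to observe the directional inconsistency described above is to count the characters of a string containing a doubled ZWJ in both directions. This is a sketch, assuming a standard library that includes this fix; the sample string is an illustrative choice and is not taken from the commit.

```swift
// Extended_Pictographic, ZWJ, ZWJ, Extended_Pictographic:
// the doubled ZWJ is where "ZWJ" and "ZWJ+" behave differently.
let s = "\u{1F468}\u{200D}\u{200D}\u{1F469}"

// Forward count: walks from startIndex toward endIndex.
let forwardCount = s.count

// Backward count: walks from endIndex toward startIndex.
var backwardCount = 0
var i = s.endIndex
while i > s.startIndex {
    i = s.index(before: i)
    backwardCount += 1
}
```

With the fix in place the two counts should agree, which is what String’s Collection conformance requires.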
Xiaodi Wu
f671e50265 [stdlib] Further optimize heterogeneous binary integer comparison 2023-01-15 13:32:24 -05:00
Xiaodi Wu
d6ac4e3476 [stdlib] Improve performance of generic binary integer == and < 2023-01-15 13:32:24 -05:00
Cory Benfield
9cb3641ce8 Remove checks in UR[M]BP.Iterator.next() (#62965)
Swift tends to emit unnecessary checks and traps when iterating unsafe
raw buffer pointers. These traps are confirming that the position
pointer isn't nil, but this check is redundant with the bounds check
that is already present. We can safely remove it.
2023-01-12 11:04:42 -05:00
Doug Gregor
4b9615e244 Don't make older compilers parse @expression at all 2023-01-10 11:32:08 -08:00
Doug Gregor
cfa7d379e1 Merge pull request #62932 from DougGregor/stdlib-expression-attr-check 2023-01-09 18:05:20 -08:00
Doug Gregor
728598907d Allow the standard library to build with a slightly older compiler
Check for the `@expression` attribute before using it. Fixes
rdar://104036723.
2023-01-09 11:46:57 -08:00
Karoy Lorentey
2f1ed631e2 [stdlib] _CharacterRecognizer: Add Sendable, Equatable, CustomStringConvertible conformances
Equatability allows faster implementations for updating cached grapheme boundary state after a text mutation, because it enables quick detection of before/after state equality, without having to feed the recognizers until they produce a synchronized grapheme break.

The CustomStringConvertible conformance makes it orders of magnitude more pleasant to debug code that uses this.

Sendable is a baseline requirement for value types these days.
2023-01-06 14:51:37 -08:00
Karoy Lorentey
0bdc84b8de Merge pull request #62859 from lorentey/mark-utf8-index-encoding
[stdlib] String.UTF8View.index(_:offsetBy:limitedBy:): mark encoding of result
2023-01-05 13:57:25 -08:00
Karoy Lorentey
c94556165e Merge pull request #62794 from lorentey/character-recognizer
[stdlib] Export grapheme breaking facility
2023-01-04 23:54:48 -08:00
Karoy Lorentey
d358ece41d Merge pull request #62798 from lorentey/string-index-rounding
[stdlib] Expose index rounding entry points
2023-01-04 21:24:32 -08:00
Karoy Lorentey
4ffc5fe737 Merge pull request #62717 from lorentey/string-utf16-speedup
[stdlib] Speed up short UTF-16 distance calculations
2023-01-04 21:20:41 -08:00
Karoy Lorentey
892d0a278d [stdlib] String.UTF8View.index(_:offsetBy:limitedBy:): mark encoding of result 2023-01-04 20:22:39 -08:00
Evan Wilde
e88e947272 Merge pull request #62712 from etcwilde/ewilde/zipping-zippers-zip
Add zippering support
2023-01-03 21:45:33 -08:00
Karoy Lorentey
fa2f63cae0 [stdlib] _CharacterRecognizer._firstBreak(inUncheckedUnsafeUTF8Buffer:startingAt:) 2023-01-03 21:00:01 -08:00
Karoy Lorentey
87422e5dc4 [stdlib] _CharacterRecognizer: Remove initializer argument 2023-01-03 20:59:24 -08:00
Stephen Canon
701a03b41e Remove prefix+ from StaticBigInt (#62733)
* Remove prefix+ from StaticBigInt

This operator causes source breakage in cases like:
```
let a: Int = 7
let b = +1
let c = a + b  // Error: Cannot convert `b` from `StaticBigInt` to `Int`
```
2023-01-03 19:30:01 -05:00
Karoy Lorentey
e46f8f8244 [stdlib] String.UTF16View: Align indices before calling default algorithms
[Bidirectional]Collection’s default index manipulation methods (as
well as _utf16Distance) do not expect to be given unreachable
indices, and they tend to fail when operating on them. Round indices
down to the nearest scalar boundary before calling these.
2023-01-03 16:12:04 -08:00
Doug Gregor
0868889ba9 Add new source file Macros.swift 2023-01-02 21:22:05 -08:00
Doug Gregor
7000969f14 Introduce and use #externalMacro for externally-defined macros.
Align the grammar of macro declarations with SE-0382, so that macro
definitions are parsed as an expression. External macro definitions
are referenced via a reference to the macro `#externalMacro`. Define
that macro in the standard library, and recognize uses of it as the
definition of other macros to use externally-defined macros. For
example, this means that the "stringify" macro used in a lot of
examples is now defined as something like this:

    @expression macro stringify<T>(_ value: T) -> (T, String) =
        #externalMacro(module: "MyMacros", type: "StringifyMacro")

We still parse the old "A.B" syntax for two reasons. First, it's
helpful to anyone who has existing code using the prior syntax, so they
get a warning + Fix-It to rewrite to the new syntax. Second, we use it
to define builtin macros like `externalMacro` itself, which looks like this:

    @expression
    public macro externalMacro<T>(module: String, type: String) -> T =
        Builtin.ExternalMacro

This uses the same virtual `Builtin` module as other library builtins,
and we can expand it to handle other builtin macro implementations
(such as #line) over time.
2023-01-02 21:22:05 -08:00
Karoy Lorentey
5d354ceb96 [stdlib] Fix String.UTF16View.distance(from:to:)
- Align input indices to scalar boundaries
- Don’t pass decreasing indices to _utf16Distance
2023-01-01 20:58:25 -08:00
Karoy Lorentey
e885037068 [stdlib] String: Expose _index(roundingDown:) functions in all String views
These simply expose the preexisting internal
`_StringGuts.validate*Index` functions that indexing operations
use to implicitly round indices down to the nearest valid index. (Or, in the case of the encoding views, the nearest scalar boundary.)

Being able to do this as a standalone, explicit, efficient operation
is crucial when implementing some `String` algorithms that need to
work with arbitrary indices.
2022-12-31 17:42:32 -08:00
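To illustrate what rounding an index down to a scalar boundary means, the sketch below emulates the idea using only public API. It is not the `_index(roundingDown:)` implementation: `roundDownToScalar(_:in:)` is a hypothetical helper, and it is far less efficient than the internal `_StringGuts` validation functions.

```swift
// "a", U+1F600 (two UTF-16 code units), "b"
let s = "a\u{1F600}b"

// UTF-16 offset 2 is the trailing surrogate of U+1F600,
// so this index does not sit on a Unicode scalar boundary.
let mid = s.utf16.index(s.utf16.startIndex, offsetBy: 2)

// Hypothetical helper: step back in the UTF-16 view until the index
// corresponds to a position in the Unicode scalar view.
func roundDownToScalar(_ index: String.Index, in string: String) -> String.Index {
    var i = index
    while i > string.startIndex, i.samePosition(in: string.unicodeScalars) == nil {
        i = string.utf16.index(before: i)
    }
    return i
}

let rounded = roundDownToScalar(mid, in: s)
let roundedOffset = s.utf16.distance(from: s.utf16.startIndex, to: rounded)
```

After rounding, the index can be used safely with the scalar view, for example `s.unicodeScalars[rounded]`.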
Karoy Lorentey
55583ac13c [stdlib] Add new SPI for grapheme breaking (outside String)
`Unicode._CharacterRecognizer` is a newly exported opaque type that
exposes the stdlib’s extended grapheme cluster breaking facility,
independent of `String`.

This essentially makes the underlying simple state machine public,
without exposing any of the (unstable) Unicode details.

The ability to perform grapheme breaking over, say, the scalars stored
in multiple `String` values can be extremely useful while building
custom text processing algorithms and data structures.

Ideally this would eventually become API, but before proposing this
to Swift Evolution, I’d like to prove the shape of the type in actual
use (and we’ll also need to find better names for its operations).
2022-12-30 16:32:01 -08:00
Karoy Lorentey
ef0e79b70f [stdlib] String: Move shouldBreak into _GraphemeBreakingState
This turns _GraphemeBreakingState into a more proper state
machine, although it is only able to recognize breaks in the
forward direction.

The backward direction requires arbitrarily long lookback,
and it currently remains in _StringGuts.
2022-12-29 18:04:02 -08:00