withUTF8 currently vends a typed UInt8 pointer to the underlying
SmallString. That pointer type differs from SmallString's
representation. It should simply vend a raw pointer, which would be
both type safe and convenient for UTF8 data. However, since this
method is already @inlinable, I added calls to bindMemory to prevent
the optimizer from reasoning about access to the typed pointer that we
vend.
rdar://67983613 (Undefined behavior in SmallString.withUTF8 is miscompiled)
Additional commentary:
SmallString creates a situation where there are two distinct types:
the in-memory type, (UInt64, UInt64), and the element type,
UInt8. `UnsafePointer<T>` specifies the in-memory type of the pointee,
because that's how C works. If you want to specify an element type,
not the in-memory type, then you need to use something other than
UnsafePointer to view the memory. A trivial `BufferView<UInt8>` would
be fine, although, frankly, I think UnsafeRawPointer is a perfectly
good type on its own for UTF8 bytes.
Unfortunately, a lot of the UTF8 helper code is ABI-exposed, so to
work around this, we need to insert calls to bindMemory at strategic
points to avoid undefined behavior. This is high-risk and can
negatively affect performance. So far, I was able to resolve the
regressions in our microbenchmarks just by tweaking the inliner.
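For illustration, a minimal sketch of the bindMemory pattern, using a
hypothetical TinyString rather than the real SmallString:

```swift
// The storage's in-memory type is (UInt64, UInt64), but callers want
// the bytes as UInt8. bindMemory(to:capacity:) re-binds the region so
// accessing it through a typed UInt8 pointer is not undefined behavior.
struct TinyString {
  var storage: (UInt64, UInt64) = (0, 0)
  var count = 0

  mutating func withUTF8<R>(
    _ body: (UnsafeBufferPointer<UInt8>) throws -> R
  ) rethrows -> R {
    let count = self.count
    return try withUnsafeBytes(of: &storage) { raw in
      // Re-bind from (UInt64, UInt64) to UInt8; without this, the
      // optimizer may assume the typed pointer matches the in-memory
      // type and miscompile accesses.
      let base = raw.baseAddress.unsafelyUnwrapped
        .bindMemory(to: UInt8.self, capacity: count)
      return try body(UnsafeBufferPointer(start: base, count: count))
    }
  }
}
```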
* Don't allocate breadcrumbs pointer if under threshold
* Increase breadcrumbs threshold
* Linear 16-byte bucketing until 128 bytes, malloc_size after
* Allow cap less than _SmallString.capacity (bridging non-ASCII)
This change decreases the amount of heap usage for moderate-length
strings (< 64 UTF-8 code units in length) and increases the amount of
spare code unit capacity available (less growth needed).
Average improvements for moderate-length strings:
* 64-bit: on average, 8 bytes saved and 4 bytes of extra capacity
* 32-bit: on average, 4 bytes saved and 6 bytes of extra capacity
Additionally, on 32-bit, large-length strings also gain an average of
6 bytes of extra spare capacity.
Details:
On 64-bit, half of moderate-length allocations will save 16 bytes
while the other half get an extra 8 bytes of spare capacity.
On 32-bit, a quarter of moderate-length allocations will save 16
bytes, and the rest get an extra 4 bytes of spare
capacity. Additionally, 32-bit string's storage class now claims its
full allocation, which is its birthright. Prior to this change, we'd
have on average 1.5 bytes of spare capacity, and now we have 7.5 bytes
of spare capacity.
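A minimal sketch of the rounding strategy, with hypothetical names
(malloc_size is the Darwin API):

```swift
import Darwin  // malloc, malloc_size (Apple platforms)

// Requests up to 128 bytes round up to the next 16-byte bucket; larger
// requests allocate and then claim whatever the allocator actually
// returned as usable capacity.
func allocateStringStorage(
  requested: Int
) -> (ptr: UnsafeMutableRawPointer, capacity: Int) {
  if requested <= 128 {
    let bucketed = (requested + 15) & ~15  // 1...16 -> 16, 17...32 -> 32, ...
    return (malloc(bucketed)!, bucketed)
  }
  let ptr = malloc(requested)!
  // Claim the full allocation: malloc often hands back more than asked.
  return (ptr, malloc_size(ptr))
}
```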
Breadcrumbs threshold is increased from the super-conservative 32 to
the pretty-conservative 64. Some speed improvements are incorporated
in this change, but more are in flight. Even without those eventual
improvements, this is a worthwhile change (ASCII is still fast-pathed
and irrelevant to breadcrumbing).
For a complex real-world workload, this amounts to around a 5%
improvement to transient heap usage due to all strings and a 4%
improvement to peak heap usage due to all strings. For moderate-length
strings specifically, this gives around 11% improvement to both.
Since scalar-alignment is set in inlinable code, switch the alignment
bit to one of the previously-reserved bits rather than a grapheme
cache bit. Setting a grapheme cache bit in inlinable code would break
backward deployment, as older versions would interpret it as a cached
value.
Also adjust the name to "scalar-aligned", which is clearer, and remove
the assertion (which should be a real precondition).
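For illustration, a sketch of the backward-deployment constraint with
hypothetical bit positions:

```swift
// Inlinable code compiled into old client binaries must only set bits
// that shipped runtimes ignore. A previously-reserved bit is safe; a
// grapheme cache bit is not, since old runtimes would read it as a
// populated cache.
struct Flags {
  var rawBits: UInt16
  // Hypothetical layout: bit 15 already means "grapheme cache valid"
  // to shipped runtimes, so new inlinable code must never set it.
  static let graphemeCacheValidBit: UInt16 = 1 << 15
  // Bit 13 was previously reserved (always zero), so old runtimes
  // ignore it and new code can safely repurpose it.
  static let isScalarAlignedBit: UInt16 = 1 << 13

  var isScalarAligned: Bool {
    get { rawBits & Self.isScalarAlignedBit != 0 }
    set {
      if newValue { rawBits |= Self.isScalarAlignedBit }
      else { rawBits &= ~Self.isScalarAlignedBit }
    }
  }
}
```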
Fixes a general category (pun intended) of scalar-alignment bugs
surrounding exchanging non-scalar-aligned indices between views and
for slicing.
SE-0180 unifies the Index type of String and all its views and allows
non-scalar-aligned indices to be used across views. In order to
guarantee correct behavior, we often have to check for and perform
scalar alignment. To speed up these checks, we allocate a bit denoting
known-to-be-aligned, so that the alignment check can skip the
load. The table below shows which views need to check for alignment
before they can operate, and whether the indices they produce are
aligned.
┌───────────────╥────────────────────┬──────────────────────────┐
│ View ║ Requires Alignment │ Produces Aligned Indices │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Native UTF8 ║ no │ no │
├───────────────╫────────────────────┼──────────────────────────┤
│ Native UTF16 ║ yes │ no │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ Foreign UTF8 ║ yes │ no │
├───────────────╫────────────────────┼──────────────────────────┤
│ Foreign UTF16 ║ no │ no │
╞═══════════════╬════════════════════╪══════════════════════════╡
│ UnicodeScalar ║ yes │ yes │
├───────────────╫────────────────────┼──────────────────────────┤
│ Character ║ yes │ yes │
└───────────────╨────────────────────┴──────────────────────────┘
The "requires alignment" applies to any operation taking a
String.Index that's not defined entirely in terms of other operations
taking a String.Index. These include:
* index(after:)
* index(before:)
* subscript
* distance(from:to:) (since `to` is compared against directly)
* UTF16View._nativeGetOffset(for:)
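A sketch of the fast path the known-aligned bit enables, with
hypothetical types rather than the real stdlib internals:

```swift
// A known-scalar-aligned index returns immediately; only unaligned
// indices pay for inspecting the code units.
struct SketchIndex {
  var encodedOffset: Int
  var isKnownScalarAligned: Bool
}

func scalarAligned(_ i: SketchIndex, in utf8: [UInt8]) -> SketchIndex {
  if i.isKnownScalarAligned { return i }  // fast path: no memory load
  // Slow path: round down past UTF-8 continuation bytes (0b10xxxxxx)
  // until we land on a scalar boundary.
  var offset = i.encodedOffset
  while offset > 0, offset < utf8.count, utf8[offset] & 0xC0 == 0x80 {
    offset -= 1
  }
  return SketchIndex(encodedOffset: offset, isKnownScalarAligned: true)
}
```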
String.Index has an encodedOffset-based initializer and computed
property that exists for serialization purposes. It was documented as
UTF-16 in the SE proposal introducing it, which was String's
underlying encoding at the time, but the dream of String even then was
to abstract away whatever encoding happened to be used.
Serialization needs an explicit encoding for serialized indices to
make sense: the offsets need to align with the view. With String
adopting UTF-8 as its native encoding in Swift 5, serializing in
UTF-16 is no longer necessarily the most efficient choice.
Furthermore, the majority of usage of encodedOffset in the wild is
buggy and operates under the assumption that a UTF-16 code unit is a
Swift Character, which isn't valid even if the String is known to be
all-ASCII (because of CR-LF).
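For example:

```swift
// "\r\n" is a single Character but two UTF-16 code units, so code-unit
// offsets diverge from Character positions even in all-ASCII content.
let s = "a\r\nb"
print(s.count)        // 3 Characters: "a", "\r\n", "b"
print(s.utf16.count)  // 4 code units
```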
This change introduces a pair of semantics-preserving alternatives to
encodedOffset that explicitly call out the UTF-16 assumption. These
serve as a gentle off-ramp for current mis-uses of encodedOffset.
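Assuming these are the utf16Offset pair from SE-0241, usage looks like:

```swift
let str = "café"
let i = String.Index(utf16Offset: 3, in: str)  // was: String.Index(encodedOffset: 3)
let offset = i.utf16Offset(in: str)            // was: i.encodedOffset
```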
Old Swift and new Swift runtimes and overlays need to coexist in the same process. This means there must not be any classes which have the same ObjC runtime name in old and new, because the ObjC runtime doesn't like name collisions.
When possible without breaking source compatibility, classes were renamed in Swift, which results in a different ObjC name.
Public classes were renamed only on the ObjC side using the @_objcRuntimeName attribute.
This is similar to the work done in pull request #19295. That only renamed @objc classes. This renames all of the others, since even pure Swift classes still get an ObjC name.
rdar://problem/46646438
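A sketch of the ObjC-side-only rename, with a hypothetical class and
runtime name:

```swift
// @_objcRuntimeName is the underscored, stdlib-internal attribute:
// Swift source keeps the same class name, but the ObjC runtime
// registers it under a distinct one, so an old stdlib's copy of the
// class can coexist in the same process.
@_objcRuntimeName(_ExampleStringStorage5)  // hypothetical runtime name
public class ExampleStringStorage {}
```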
In anticipation of potential future HW features, e.g. armv8.5 memory
tagging, only use the high 4 bits as discriminator bits in
_BridgeObject rather than the top 8 bits. Utilize two perf flags to
cover this instead. This requires shifting around a fair amount of
internal complexity.
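Illustrative masks only (not the real layout):

```swift
// Shrinking the discriminator from the top byte to the top nibble
// leaves the rest of the top byte untouched for schemes like ARM
// memory tagging.
let oldDiscriminatorMask: UInt64 = 0xFF00_0000_0000_0000  // top 8 bits
let newDiscriminatorMask: UInt64 = 0xF000_0000_0000_0000  // top 4 bits
```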
Include some tuning and tweaking to reduce the constant factors
involved in string comparison. This yields considerable improvement on
our micro-benchmarks, and allows us to make less code inlinable and
have a smaller ABI surface area.
Adds more extensive testing of corner cases in our existing
fast-paths.
After rebasing on master and incorporating more 32-bit support,
perform a bunch of cleanup, documentation updates, comments, move code
back to String declaration, etc.
Refactor and rename _StringGutsSlice, apply NFC-aware fast paths to a
new buffered iterator.
Also, fix a bug in _typeName, which used to assume ASCII-ness, and
improve SIL optimizations on StringObject.
Add inlinability annotations to restore performance parity with 4.2 String.
Take advantage of known NFC as a fast-path for comparison, and
overhaul comparison dispatch.
RRC improvements and optimizations.
* Refactor out RRC implementation into dedicated file.
* Change our `_invariantCheck` pattern to generate efficient code in
asserts builds and make the optimizer's job easier.
* Drop a few Bidi shims we no longer need.
* Restore View decls to String, workaround no longer needed
* Cleaner unicode helper facilities
This is a giant squashing of a lot of individual changes prototyping a
switch of String in Swift 5 to be natively encoded as UTF-8. It
includes what's necessary for a functional prototype, dropping some
history, but still leaves plenty of history available for future
commits.
My apologies to anyone trying to do code archeology between this
commit and the one prior. This was the lesser of evils.
Exclusively store small strings in little-endian byte order. This
will insert byte swaps when accessing small strings on big-endian
platforms; however, these are usually extremely cheap.
This approach means that the layout of the code points and count
in memory will be the same on both big- and little-endian machines,
simplifying future development. Prior to this change, this code
was broken on big-endian machines because the memory layout was
different (the count ended up in the middle of the string).
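A minimal sketch of the access pattern:

```swift
// The words are stored little-endian; UInt64(littleEndian:) is a no-op
// on little-endian hosts and a byte swap on big-endian ones, so the
// in-memory layout is identical on both.
func loadSmallStringWords(_ stored: (UInt64, UInt64)) -> (UInt64, UInt64) {
  (UInt64(littleEndian: stored.0), UInt64(littleEndian: stored.1))
}
```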
_StringGuts shouldn't expose a subscript, which implies efficient
access. Switch to the explicit code unit fetch method. Update tests
accordingly, and switch off of deprecated typealiases.
Add an isSmall query to Character so testing doesn't have to bake in
internal format. Clarify the purpose of the invalid UTF-16 backdoor
creation method.
Create a _StringRepresentation struct to standardize internal testing
on. Internalize much of _StringGuts, except for some SPI hacks, and
update tests to use _StringRepresentation.
@effects is too low a level, and not meant for general usage outside
the standard library. Therefore it deserves to be underscored like
other such attributes.
_StringGuts is not meant to be an abstraction across all the forms a
String may take. It's meant to abstract the book-keeping, and the
visitor is a parameterization over operations.
Extract slow paths into non-inlinable functions so that fast-paths can
be faster and we don't pay the large code bloat for the Unicode
parsers.
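A minimal sketch of the outlining pattern, with hypothetical names:

```swift
// The ASCII fast path stays inlinable so it specializes into callers,
// while the bulky multi-byte decoder compiles once, out of line, in
// the library.
@usableFromInline
internal struct UTF8Bytes {
  @usableFromInline internal var bytes: [UInt8]
  @usableFromInline internal init(_ bytes: [UInt8]) { self.bytes = bytes }

  @inlinable
  internal func scalar(startingAt i: Int) -> Unicode.Scalar {
    let b = bytes[i]
    if b < 0x80 { return Unicode.Scalar(b) }  // ASCII: inlined fast path
    return _slowScalar(startingAt: i)         // one call into the library
  }

  @usableFromInline @inline(never)
  internal func _slowScalar(startingAt i: Int) -> Unicode.Scalar {
    var iterator = bytes[i...].makeIterator()
    var decoder = UTF8()
    switch decoder.decode(&iterator) {
    case .scalarValue(let s): return s
    default: fatalError("invalid UTF-8")
    }
  }
}
```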
Some tests proactively extended to highlight UTF8View of multiple
kinds of Strings.
Drop append-related @inlinable annotations for String, StringGuts,
StringStorage, and the Views. Drop several for larger operations, such
as case conversion. Drop as many as we can from StringGuts for now.
Besides the general goal of removing inlinable functions, this reduces code size and also improves performance for several benchmarks.
The performance problem was that inlining top-level String API functions (like String.count) into client code meant the client eventually called non-inlinable internal String functions.
This is much slower than making a single call at the top-level API boundary into the library, where all the internal String functions can be specialized and inlined.
rdar://problem/39921548
StringStorage.create is the primary means of allocating storage for a
string, so drop inlinability to allow for future evolution.
StringStorage also exposes some .appendInPlace methods, which we
currently need to keep inlinable for benchmark performance. We'd
really like to drop inlinability for these for evolution purposes
(e.g. imagine a future version that adjusts nul-termination or changes
in coordination with create). These are flagged with:
` // TODO(inlinability): @usableFromInline - P3`
Where "P3" reflects urgency on a scale from 1 (stop the presses) to 5
(whatevs).