Commit Graph

177 Commits

Author SHA1 Message Date
Michael Ilseman
463e3747a8 [gardening] Factor out String bidi conformance
Add StringCharacterView.swift for String's bidi conformance. NFC.
2018-07-25 14:14:37 -07:00
Ben Cohen
a4230ab2ad [stdlib] Update stdlib to 4.0 and reorganize compatibility shims (#17580)
* Update stdlib to 4.0 and move all compatibility shims into a dedicated source file
2018-06-29 06:26:52 -07:00
Ben Cohen
a51cc89b11 Replace _CharacterView with a typealias (#17472) 2018-06-25 13:22:09 -07:00
Karoy Lorentey
23c630ac92 [stdlib] Add @usableFromInline to internal typealiases that need it
This fixes 3659 warnings in the standard library.
2018-06-18 16:34:19 +01:00
Michael Ilseman
3ee17102ed [String.Index] Restore compound offsets.
Move the shifts to index creation time rather than index comparison
time. This seems to benefit micro benchmarks and cover up
inefficiencies in our generic index distance calculations.
2018-05-25 09:54:35 -07:00
Michael Ilseman
614016fecd [String.Index] Simplify and prepare for more resilience.
Simplify String.Index by sinking transcoded offsets into the .utf8
variant. This is in preparation for a more resilient index type
capable of supporting existential string indices.
2018-05-24 14:47:04 -07:00
Michael Ilseman
4a368ab46c [string] Drop many @inlinable from big API.
Drop append-related @inlinable annotations for String, StringGuts,
StringStorage, and the Views. Drop several for larger operations, such
as case conversion. Drop as many as we can from StringGuts for now.
2018-05-13 07:38:55 -07:00
Nate Cook
58933d88c5 [stdlib] Rename index(...) methods to firstIndex(...)
A la SE-0204.
2018-04-21 18:07:25 -05:00
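A minimal sketch of the renamed methods from commit 58933d88c5 as they appear to callers. The array and variable names are invented for illustration; only `firstIndex(of:)` and `firstIndex(where:)` are standard library API.

```swift
// Hypothetical sample data; any Collection with Equatable elements works.
let scores = [72, 85, 91, 85]

// Formerly spelled index(of:): position of the first element equal to 85.
let firstMatch = scores.firstIndex(of: 85)

// Formerly spelled index(where:): position of the first element above 90.
let firstHigh = scores.firstIndex(where: { $0 > 90 })
```

Both calls return an optional index, so a missing element yields nil rather than a sentinel value.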
Slava Pestov
2e5aef9c8d stdlib: Remove redundant @usableFromInline attributes 2018-04-06 00:02:30 -07:00
Slava Pestov
e1f50b2d36 SE-0193: Rename @_inlineable to @inlinable, @_versioned to @usableFromInline 2018-03-30 21:55:30 -07:00
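As a rough illustration of the two attributes renamed in commit e1f50b2d36, the sketch below marks a method `@inlinable` and the internal storage it touches `@usableFromInline`. The `Counter` type is hypothetical and is not taken from the standard library.

```swift
// Hypothetical type, used only to show how the two attributes pair up.
public struct Counter {
    // Internal storage that inlinable code is allowed to reference.
    @usableFromInline
    internal var _value: Int = 0

    public init() {}

    // The body may be inlined into client modules, so everything it
    // references must be public or @usableFromInline.
    @inlinable
    public mutating func increment() -> Int {
        _value += 1
        return _value
    }
}

var counter = Counter()
let first = counter.increment()
let second = counter.increment()
```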
Karoy Lorentey
e6afe829a1 [stdlib] Silence deprecation warnings about CharacterView in stdlib
- Rename `Substring.CharacterView` to `Substring._CharacterView`, adding a deprecated typealias for the original name, like we do for `String.CharacterView`.
- Add a non-deprecated `Substring._characters` property, emulating `String.characters`.
- Explicitly deprecate the following members:
    * String.withMutableCharacters<R>(_: (inout CharacterView) -> R) -> R
    * String.subscript(Range<Index>) -> String.CharacterView
    * Substring._CharacterView.subscript(Range<Index>) -> Substring.CharacterView
    * Substring.init(_: CharacterView)
    * String.init(_: Substring.CharacterView)
2018-01-24 21:16:48 +00:00
Michael Ilseman
3be2faf5d3 [String] Initial implementation of 64-bit StringGuts.
Include the initial implementation of _StringGuts, a 2-word
replacement for _LegacyStringCore. 64-bit Darwin supported, 32-bit and
Linux support in subsequent commits.
2018-01-21 12:32:26 -08:00
Michael Ilseman
75463e30f3 [stdlib] Rename _StringCore to _LegacyStringCore. NFC.
In grand LLVM tradition, the first step to redesigning _StringCore is
to first rename it to _LegacyStringCore. Subsequent commits will
introduce the replacement, and eventually all uses of the old one will
be moved to the new one.

NFC.
2018-01-21 12:28:56 -08:00
Ben Cohen
4ddac3fbbd [stdlib] Eradicate IndexDistance associated type (#12641)
* Eradicate IndexDistance associated type, replacing with Int everywhere

* Consistently use Int for ExistentialCollection’s IndexDistance type.

* Fix test for IndexDistance removal

* Remove a handful of no-longer-needed explicit types

* Add compatibility shims for non-Int index distances

* Test compatibility shim

* Move IndexDistance typealias into the Collection protocol
2017-12-08 12:00:23 -08:00
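One way to picture the effect of commit 4ddac3fbbd: generic code over Collection can treat counts and offsets as plain Int instead of a per-collection distance type. The helper name `middleIndex(of:)` below is invented for this sketch.

```swift
// Hypothetical helper: returns the index halfway through any collection.
// With distances fixed to Int, no conversion to a per-collection
// distance type is needed before calling index(_:offsetBy:).
func middleIndex<C: Collection>(of collection: C) -> C.Index {
    let half: Int = collection.count / 2
    return collection.index(collection.startIndex, offsetBy: half)
}

let word = "swift"
let numbers = [10, 20, 30, 40]
let wordMiddle = middleIndex(of: word)
let numbersMiddle = middleIndex(of: numbers)
```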
Ben Cohen
dcab9493ae Removed some warnings (#12753) 2017-11-30 15:12:56 -08:00
Max Moiseev
a24998a5b1 [stdlib] Add missing @_fixed_layout attributes to fix resilience build 2017-10-02 15:19:06 -07:00
Max Moiseev
ef6b5c4795 Add missing @_inlineable attributes and deinits 2017-09-29 11:26:56 -07:00
Max Moiseev
53b8419279 [stdlib] Make all the stdlib APIs @_inlineable
This change in theory should allow us to remove a special stdlib-only
sil-serialize-all compilation mode.

<rdar://problem/34138683>
2017-09-29 11:26:56 -07:00
swift-ci
79a3f9c415 Merge pull request #11670 from natecook1000/nc-rev-77-2 2017-09-19 10:15:59 -07:00
Nate Cook
050268d876 [stdlib] Documentation revisions
- Update NSRange -> Range guidance
- Fix example in Optional
- Improve RangeExpression docs
- Fix issue in UnsafeRawBufferPointer.initializeMemory
- Code point -> scalar value most places
- Reposition the dot above the scripty `i'
- Fix ExpressibleByArrayLiteral code sample
2017-08-29 09:41:55 -05:00
Maxim Moiseev
ee5fb33656 [stdlib] Remove the Grand Renaming artifacts of Swift 3 era 2017-08-28 15:54:11 -07:00
Michael Ilseman
7c705c3a75 [stdlib] Deprecate String/Substring.CharacterView
CharacterView is now entirely redundant in Swift 4. Deprecate its
use. This also allows us to schedule the unbreaking of
String.CharacterView leakiness without a hard source break.
2017-08-10 17:24:06 -07:00
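The redundancy described in commit 7c705c3a75 can be seen from the caller's side: in Swift 4, String is itself a collection of Character values, so the operations below need no separate character view. The sample string is an arbitrary choice for illustration.

```swift
// "e" followed by a combining acute accent forms a single Character.
let word = "cafe\u{301}"

// Collection operations apply directly to the String.
let characterCount = word.count
let lastCharacter = word.last
let scalarCount = word.unicodeScalars.count
```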
Dave Abrahams
9159239995 Un-revert "[stdlib] String index interchange, etc." (#10812)
I failed to merge the upstream changes to swift-corelibs-foundation at the same
time as I merged #9806, and it broke on Linux. Going to get it right this
time.
2017-07-07 12:13:25 -07:00
Xi Ge
d9fb110674 Revert "[stdlib] String index interchange, etc." (#10812)
rdar://33186295
2017-07-07 12:03:16 -07:00
Dave Abrahams
e523c80339 [stdlib] Index interchange, part I 2017-07-07 00:59:04 -07:00
Michael Ilseman
5bc20cba08 [stdlib] Clean up non-contiguous string grapheme breaking code.
Removes the legacy grapheme breaking code paths. Simplifies and
clarifies the non-contiguous grapheme breaking code through consistent
naming and handling of absolute positions vs relative offsets.
2017-06-28 15:46:44 -07:00
Michael Ilseman
b3b28e0c50 [gardening] 80 columns; NFC 2017-06-28 15:46:39 -07:00
Michael Ilseman
a37a823e6e [stdlib] Update non-contiguous NSStrings to Unicode 9
This adds Unicode 9 grapheme breaking support for non-contiguous
NSStrings. Non-contiguous NSStrings that don't hit our fast paths are
very rare, but should still behave identically to contiguous
strings.

We first copy a fixed number of code units into a fixed size buffer
(currently 16 in size) and try to grapheme break inside of that
buffer. This is sufficient storage for all known non-pathological
graphemes. Any graphemes larger than the buffer are handled by copying
larger portions of the string into an Array.

Test cases added, including pathological "zalgo" text that stresses
extremely long graphemes.
2017-06-28 15:35:25 -07:00
Michael Ilseman
4c0ba61e53 [gardening] Remove done TODO comments 2017-06-27 20:37:16 -07:00
Michael Ilseman
bd5189c25a [String] Grapheme fast paths for punctuation: 5-8x speedup.
Many strings use non-sub-300 punctuation characters (e.g. Unicode
hyphen, CJK quotes, etc). This can cause switching between fast and
slow paths for grapheme breaking. Add in fast-paths for general
punctuation characters and CJK punctuation and symbol characters.

This results in about a 5-8x speedup for heavily (unicode) punctuated
Latiny and CJKy workloads.
2017-06-27 19:18:51 -07:00
Nate Cook
825e9d077d [stdlib] More documentation revisions / consistency fixes. 2017-06-13 14:08:00 -05:00
Dave Abrahams
562fd79aa6 [stdlib] Encode small Characters as UTF-16
This takes care of the standard library portion, but we need a new
BuiltinUTF16ExtendedGraphemeClusterLiteralConvertible protocol in order to
fully recover the performance of character literals.

Note that part of the character_literals.swift test is currently disabled.  That
will need to be fixed before we can merge this work.
2017-06-01 20:57:25 -07:00
Michael Ilseman
44cccba22d [stdlib] Change dynamic check to sanity check.
Double-checking for CR-LF is redundant in
_internalExtraCheckGraphemeBreakBetween. Add in a sanity check and
omit the overly conservative CR check.
2017-05-31 14:55:24 -07:00
Michael Ilseman
0a88de53d3 [stdlib] Grapheme break fast-paths for Cyrillic, Arabic, Hangul
Add in more grapheme break fast paths for scripts based on Cyrillic,
Arabic, or Hangul. Generates significant performance wins, similar to
those for the unihan fast paths.

While every extra check does slow down the runtime of
_internalExtraCheckGraphemeBreakBetween as currently implemented, I've
not found the performance cost to be relevant for workloads with
occasional mixed emoji contents, nor for workloads that hit the
earlier checks. A pure Korean workload (currently the last check) does
pay a rather noticeable price for the previous checks, but this is
only because the workload is now so greatly improved. Optimizing this
implementation is interesting future work, but not urgent.
2017-05-31 11:09:43 -07:00
Dave Abrahams
801b9c5544 [stdlib] Move specialization from init to append
Since init just calls append anyway, it's 2 birds/1 stone
2017-05-24 16:10:34 -07:00
Dave Abrahams
794a287c27 Kill a stray TAB
How'd that get in there?

Thanks, @moiseev
2017-05-24 04:10:25 -07:00
Dave Abrahams
3d789cff2d Inlineable character fast paths 2017-05-23 01:42:28 -07:00
swift-ci
c0623c42ce Merge pull request #9722 from apple/stringprotocol-interchange 2017-05-17 19:13:25 -07:00
Dave Abrahams
d6fee05375 [stdlib] Enable interchange among StringProtocol models 2017-05-17 17:21:43 -07:00
Michael Ilseman
97511d65bf [stdlib] Unicode 9 here we come: use ICU for grapheme breaking
Use UBreakIterators to perform grapheme breaking. This gives Unicode 9
grapheme breaking (e.g. family emoji) and provides a means to upgrade
to future versions. It also serves as a model for how to serve up
other advanced functionality in ICU to users.

This has tricky performance implications. Some things are faster and a
number of cases are slower. But, careful use of ICU can help mitigate
and amortize these costs. In conjunction with more early detection of
fast paths, overall grapheme breaking for the average user should be
much faster than in Swift 3.

NOTE: This is incomplete. It currently falls back on the legacy tries
for some bridged strings. There are many potential directions for a
general solution, but for now we'll be iteratively adding support for
more and more special cases.
2017-05-16 20:29:21 -07:00
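To make the "family emoji" case from commit 97511d65bf concrete, the sketch below builds a family emoji from four scalars joined by zero-width joiners and counts it at three levels. It assumes a Swift runtime whose grapheme breaking follows Unicode 9 or later rules; the variable names are illustrative.

```swift
// Man, woman, girl, boy, joined by ZERO WIDTH JOINER (U+200D).
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Under Unicode 9+ grapheme rules the whole sequence is one Character.
let graphemeCount = family.count

// The underlying representation is still seven scalars
// (four emoji plus three joiners).
let scalarCount = family.unicodeScalars.count
let utf16Count = family.utf16.count
```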
practicalswift
aae419ad30 [gardening] Fix word processing artefacts 2017-05-15 11:30:25 +02:00
Michael Ilseman
fb5734c24f Merge pull request #9575 from milseman/unihan_fasterhan
[stdlib] String: Walk Chinese/Japanese faster: 2x/4x forwards/backwards
2017-05-14 13:50:15 -07:00
Ben Cohen
ea2f64cad2 [stdlib] Add Sequence.Element, change ExpressibleByArrayLiteral.Element to ArrayLiteralElement (#8990)
* Give Sequence a top-level Element, constrain Iterator to match

* Remove many instances of Iterator.

* Fixed various hard-coded tests

* XFAIL a few tests that need further investigation

* Change assoc type for arrayLiteralConvertible

* Mop up remaining "better expressed as a where clause" warnings

* Fix UnicodeDecoders prototype test

* Fix UIntBuffer

* Fix hard-coded Element identifier in CSDiag

* Fix up more tests

* Account for flatMap changes
2017-05-14 06:33:25 -07:00
Nate Cook
f650e0a7da [stdlib] String and range expressions
* finish string documentation revisions
* revise examples throughout to use range expressions instead of e.g.
  prefix(upTo: _)
2017-05-13 10:06:12 -05:00
Michael Ilseman
f08ee0fd93 [stdlib] Walk Chinese/Japanese faster: 2x/4x forwards/backwards
This adds more fast path checks for grapheme breaks between BMP
scalars. Notably the rather vast range of 0x3400–0xA4CF which includes
unified common Han ideographs as well as the first extension to
unified Han ideographs. It also happens to pick up various Yijing and
Yi symbols/radicals. Additionally, the narrow hiragana/katakana ranges
0x3041-0x3096 and 0x30A1-0x30FA (including pre-composed semi-voiced
characters but excluding the combining semi-voice marks) have fast
paths.

The net effect is that the vast majority of modern Chinese and
Japanese text should be fast-pathed. This is especially important, as
adopting Unicode 9 might otherwise pessimize performance here relative
to the tries.
2017-05-12 13:13:33 -07:00
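The range checks described in commit f08ee0fd93 might look roughly like the sketch below. This is a simplified illustration rather than the standard library's actual code: the function names are invented, and only the three ranges quoted in the message are modeled.

```swift
// Hypothetical predicate: true when a UTF-16 code unit falls in one of the
// ranges the commit message lists as eligible for the fast path.
func isFastPathUnit(_ unit: UInt16) -> Bool {
    switch unit {
    case 0x3400...0xA4CF: return true  // Han ideographs, Yijing and Yi symbols
    case 0x3041...0x3096: return true  // hiragana
    case 0x30A1...0x30FA: return true  // katakana
    default: return false
    }
}

// Hypothetical check: when both neighbors are fast-path units, assume a
// grapheme break lies between them without consulting the slow path.
func hasFastPathBreak(between lhs: UInt16, and rhs: UInt16) -> Bool {
    return isFastPathUnit(lhs) && isFastPathUnit(rhs)
}
```

Note that the combining voiced mark U+3099 falls outside all three ranges, so a kana followed by that mark is not fast-pathed in this sketch, matching the exclusion mentioned in the message.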
Dave Abrahams
ddf7ad517f UnicodeScalar => Unicode.Scalar 2017-05-11 15:23:25 -07:00
Michael Ilseman
f0abff5539 Revert "Merge pull request #9265 from milseman/tls_ftw"
This reverts commit 26f7659efe, reversing
changes made to 7b927e55e8.
2017-05-11 10:39:58 -07:00
Michael Ilseman
18104c616c [stdlib] Unicode 9 here we come: use ICU for grapheme breaking
Use UBreakIterators to perform grapheme breaking. This gives Unicode 9
grapheme breaking (e.g. family emoji) and provides a means to upgrade
to future versions. It also serves as a model for how to serve up
other advanced functionality in ICU to users.

This has tricky performance implications. Some things are faster and a
number of cases are slower. But, careful use of ICU can help mitigate
and amortize these costs. In conjunction with more early detection of
fast paths, overall grapheme breaking for the average user should be
much faster than in Swift 3.

NOTE: This is incomplete. It currently falls back on the legacy tries
for some bridged strings. There are many potential directions for a
general solution, but for now we'll be iteratively adding support for
more and more special cases.
2017-05-10 15:21:08 -07:00
Max Moiseev
178b9f0b44 [stdlib] Adding bounds check in a.subscript(Index) fast path
UnsafeBufferPointer subscript used in the fast path only checks bounds
in Debug mode, therefore extra checks are needed.

Addresses: <rdar://problem/31992473>
2017-05-05 15:26:24 -07:00
Michael Ilseman
47d0247476 [stdlib] Speed up Character construction from CharacterView.subscript (#9252)
This adds a fast path for single-code-unit Character
construction. Rather than use the general purpose String based
initializer (which then repeats grapheme breaking to ensure a trap,
amongst other inefficiencies), just make the Character from the single
unicode scalar value directly.

This also speeds up simple iteration of BMP strings when the optimizer
is unable to eliminate the subscript. Around 2x for ASCII, and around
20% for BMP UTF16.
2017-05-04 06:59:30 -07:00
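The idea behind commit 47d0247476 can be sketched as follows: when a grapheme is known to occupy a single UTF-16 code unit, build the Character from the scalar directly instead of going through a String. The function name is hypothetical, and the sketch leaves out the check that decides whether the grapheme really is a single unit.

```swift
// Hypothetical fast path: make a Character from one UTF-16 code unit.
// Returns nil for surrogate code units, which are not scalars on their own.
func fastCharacter(fromSingleUnit unit: UInt16) -> Character? {
    guard let scalar = Unicode.Scalar(unit) else { return nil }
    return Character(scalar)
}

let units = Array("Hi".utf16)
let firstCharacter = fastCharacter(fromSingleUnit: units[0])
let loneSurrogate = fastCharacter(fromSingleUnit: 0xD800)
```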