algorithm
The implementation uses a specialized trie that has not been tuned to the table
data. I tried guessing parameter values that should work well, but did not do
any performance measurements.
There is no efficient way to initialize arrays with static data in Swift. The
required tables are being generated as C++ code in the runtime library.
rdar://16013860
Swift SVN r19340
underlying NSString when it ends in a high-surrogate code unit
The tests did not catch this because they were creating CFString, which,
as it turns out, does not perform bounds checking. Replaced the use of
CFString with a custom NSString subclass.
Swift SVN r19329
If underlying NSString contained isolated surrogates, then we were crashing in
following ways:
- subscripting by index could crash;
- index pointing to the second code unit sequence was not moved backwards
correctly. Instead of moving it to pointing to the beginning of the view it
could be moved to point to the code unit before the beginning of the view.
Swift SVN r19230
don't call into CoreFoundation to perform UTF-8 transcoding. CoreFoundation
can replace ill-formed sequences with a single byte, which is not good enough
to implement U+FFFD insertion. Instead, use the same transcoding routine as
for contiguous buffer.
Pulled out the transcoding routine into a generic function that should be
specialized and simplified for the case when input is UnsafeArray; we should
not be losing efficiency here.
Fixes <rdar://problem/17297055> [unicode] println crashes when given string
with unpaired surrogate
Swift SVN r19157
Keep calm: remember that the standard library has many more public exports
than the average target, and that this contains ALL of them at once.
I also deliberately tried to tag nearly every top-level decl, even if that
was just to explicitly mark things @internal, to make sure I didn't miss
something.
This does export more than we might want to, mostly for protocol conformance
reasons, along with our simple-but-limiting typealias rule. I tried to also
mark things private where possible, but it's really going to be up to the
standard library owners to get this right. This is also only validated
against top-level access control; I haven't fully tested against member-level
access control yet, and none of our semantic restrictions are in place.
Along the way I also noticed bits of stdlib cruft; to keep this patch
understandable, I didn't change any of them.
Swift SVN r19145
This is motivated by <rdar://problem/17051606>.
This ends up renaming variables as well, which seems right for
consistency since we use "predicate" as variable name.
Swift SVN r19135
In UTF-8 decoder:
- implement U+FFFD insertion according to the recommendation given in the
Unicode spec. This required changing the decoder to become stateful, which
significantly increased complexity due to the need to maintain an internal
buffer.
- reject invalid code unit sequences properly instead of crashing rdar://16767868
- reject overlong sequences rdar://16767911
In stdlib:
- change APIs that assume that UTF decoding can never fail to account for
possibility of errors
- fix a bug in UnicodeScalarView that could cause a crash during backward
iteration if U+8000 is present in the string
- allow noncharacters in UnicodeScalar. They are explicitly allowed in the
definition of "Unicode scalar" in the specification. Disallowing noncharacters
in UnicodeScalar prevents actually using these scalar values as internal
special values during string processing, which is exactly the reason why they
are reserved in the first place.
- fix a crash in String.fromCString() that could happen if it was passed a null
pointer
In Lexer:
- allow noncharacters in string literals. These Unicode scalar values are not
allowed to be exchanged externally, but it is totally reasonable to have them
in literals as long as they don't escape the program. For example, using
U+FFFF as a delimiter and then calling str.split("\uffff") is completely
reasonable.
This is a lot of changes in a single commit; the primary reason why they are
lumped together is the need to change stdlib APIs to account for the
possibility of UTF decoding failure, and this has long-reaching effects
throughout stdlib where these APIs are used.
Swift SVN r19045
Once we know that the storage is contiguous we use the new API _NthContiguous.
We can further optimize this code by specializing the access to ascii or UTF-16.
Swift SVN r18167