[stdlib] Update UTF8Span documentation (#83418)

Amend formatting of `Substring.utf8Span` example code.
Use DocC tables in `Unicode.UTF8.ValidationError` overview.

---------

Co-authored-by: Alex Martini <amartini@apple.com>
Ben Rimmington
2025-08-01 14:56:19 +01:00
committed by GitHub
parent d63bbb9d0c
commit b57b8368ac
7 changed files with 197 additions and 110 deletions


@@ -1,3 +1,15 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2025 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
// See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors
//
//===----------------------------------------------------------------------===//
extension Unicode.UTF8 {
/**
@@ -5,21 +17,17 @@ extension Unicode.UTF8 {
Valid UTF-8 is represented by this table:
```
Scalar value        Byte 0  Byte 1  Byte 2  Byte 3
U+0000..U+007F      00..7F
U+0080..U+07FF      C2..DF  80..BF
U+0800..U+0FFF      E0      A0..BF  80..BF
U+1000..U+CFFF      E1..EC  80..BF  80..BF
U+D000..U+D7FF      ED      80..9F  80..BF
U+E000..U+FFFF      EE..EF  80..BF  80..BF
U+10000..U+3FFFF    F0      90..BF  80..BF  80..BF
U+40000..U+FFFFF    F1..F3  80..BF  80..BF  80..BF
U+100000..U+10FFFF  F4      80..8F  80..BF  80..BF
```
| Scalar value | Byte 0 | Byte 1 | Byte 2 | Byte 3 |
| ------------------ | ------ | ------ | ------ | ------ |
| U+0000..U+007F | 00..7F | | | |
| U+0080..U+07FF | C2..DF | 80..BF | | |
| U+0800..U+0FFF | E0 | A0..BF | 80..BF | |
| U+1000..U+CFFF | E1..EC | 80..BF | 80..BF | |
| U+D000..U+D7FF | ED | 80..9F | 80..BF | |
| U+E000..U+FFFF | EE..EF | 80..BF | 80..BF | |
| U+10000..U+3FFFF | F0 | 90..BF | 80..BF | 80..BF |
| U+40000..U+FFFFF | F1..F3 | 80..BF | 80..BF | 80..BF |
| U+100000..U+10FFFF | F4 | 80..8F | 80..BF | 80..BF |
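The second-byte column carries most of the table's subtlety: `E0`, `ED`, `F0`, and `F4` narrow the allowed range to exclude overlong forms, surrogates, and values above U+10FFFF. A minimal sketch for spot-checking those boundaries against the standard library's strict decoder, assuming only the `UnicodeCodec` conformance of `Unicode.UTF8` (`decode(_:)` returning `.scalarValue`, `.error`, or `.emptyInput`); `decodeAll(_:)` and the chosen cases are hypothetical illustrations:

```swift
/// Hypothetical helper: decodes `bytes` with the standard library's
/// `Unicode.UTF8` codec, collecting scalars and counting decoding errors.
func decodeAll(_ bytes: [UInt8]) -> (scalars: [Unicode.Scalar], errors: Int) {
    var codec = Unicode.UTF8()
    var iterator = bytes.makeIterator()
    var scalars: [Unicode.Scalar] = []
    var errors = 0
    decoding: while true {
        switch codec.decode(&iterator) {
        case .scalarValue(let scalar): scalars.append(scalar)
        case .error: errors += 1
        case .emptyInput: break decoding
        }
    }
    return (scalars, errors)
}

// Boundary cases taken from the second-byte column of the table;
// `nil` marks a sequence that the table rules out.
let boundaryCases: [(bytes: [UInt8], expected: Unicode.Scalar?)] = [
    ([0xE0, 0xA0, 0x80], "\u{0800}"),          // E0 requires A0..BF
    ([0xE0, 0x9F, 0xBF], nil),                 // overlong encoding of U+07FF
    ([0xED, 0x9F, 0xBF], "\u{D7FF}"),          // ED requires 80..9F
    ([0xED, 0xA0, 0x80], nil),                 // would be surrogate U+D800
    ([0xF4, 0x8F, 0xBF, 0xBF], "\u{10FFFF}"),  // largest scalar value
    ([0xF4, 0x90, 0x80, 0x80], nil),           // would be U+110000
]

for (bytes, expected) in boundaryCases {
    let result = decodeAll(bytes)
    let verdict = result.errors == 0 ? "accepted" : "rejected"
    print(bytes, verdict, expected == nil ? "(table: invalid)" : "(table: valid)")
}
```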
### Classifying errors
@@ -49,8 +57,8 @@ extension Unicode.UTF8 {
encodings are invalid UTF-8 and can lead to security issues if not
correctly detected:
- https://nvd.nist.gov/vuln/detail/CVE-2008-2938
- https://nvd.nist.gov/vuln/detail/CVE-2000-0884
- <https://nvd.nist.gov/vuln/detail/CVE-2008-2938>
- <https://nvd.nist.gov/vuln/detail/CVE-2000-0884>
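For example, the two-byte overlong form of `NUL` (`C0 80`) must never decode to U+0000. A small sketch, assuming only `String(decoding:as:)`, which repairs invalid input with U+FFFD instead of failing; the byte array is an illustrative assumption. For strict rejection rather than repair, a validating decoder is still needed, because U+FFFD can also appear in valid input.

```swift
// Java's Modified UTF-8 writes U+0000 as the overlong pair C0 80.
let modifiedUTF8: [UInt8] = [0x61, 0xC0, 0x80, 0x62]

// `String(decoding:as:)` substitutes U+FFFD for invalid sequences,
// so the overlong pair cannot smuggle a real NUL into the string.
let repaired = String(decoding: modifiedUTF8, as: UTF8.self)
let values = repaired.unicodeScalars.map(\.value)

print(values.contains(0) ? "decoded a NUL" : "no NUL scalar")
print(values.contains(0xFFFD) ? "replacement character inserted" : "input was valid")
```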
An overlong encoding of `NUL`, `0xC0 0x80`, is used in Java's Modified
UTF-8 but is invalid UTF-8. Overlong encoding errors often catch attempts
@@ -85,15 +93,11 @@ extension Unicode.UTF8 {
the reported range. Similarly, a single error for the longest invalid byte
range can be constructed by joining adjacent error ranges.
```
                 61   F1   80   80   E1   80   C2   62
Longest range    U+61 err                           U+62
Maximal subpart  U+61 err            err       err  U+62
Error per byte   U+61 err  err  err  err  err  err  U+62
```
| Algorithm | 61 | F1 | 80 | 80 | E1 | 80 | C2 | 62 |
| --------------- | ---- | --- | --- | --- | --- | --- | --- | ---- |
| Longest range | U+61 | err | | | | | | U+62 |
| Maximal subpart | U+61 | err | | | err | | err | U+62 |
| Error per byte | U+61 | err | err | err | err | err | err | U+62 |
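Because the three algorithms differ only in how they group invalid bytes, one way to illustrate them is to find maximal subparts first and derive the other two, as the text describes. A minimal sketch, assuming a hand-written reading of the validity table rather than the standard library's implementation; `Segment`, `Strategy`, `sequenceShape(lead:)`, `maximalSubparts(_:)`, and `render(_:_:)` are hypothetical names:

```swift
/// A decoded piece of input: a scalar value or a range of invalid bytes.
enum Segment: Equatable {
    case scalar(UInt32)
    case error(Range<Int>)
}

/// Hypothetical helper: expected sequence length and allowed second-byte
/// range, read off the validity table. `nil` means the byte can never
/// start a sequence (C0, C1, F5...FF, or a continuation byte).
func sequenceShape(lead: UInt8) -> (length: Int, second: ClosedRange<UInt8>)? {
    switch lead {
    case 0x00...0x7F: return (1, 0x80...0xBF)  // second range unused
    case 0xC2...0xDF: return (2, 0x80...0xBF)
    case 0xE0: return (3, 0xA0...0xBF)
    case 0xE1...0xEC, 0xEE...0xEF: return (3, 0x80...0xBF)
    case 0xED: return (3, 0x80...0x9F)
    case 0xF0: return (4, 0x90...0xBF)
    case 0xF1...0xF3: return (4, 0x80...0xBF)
    case 0xF4: return (4, 0x80...0x8F)
    default: return nil
    }
}

/// Hypothetical helper: splits `bytes` into scalars and maximal-subpart errors.
func maximalSubparts(_ bytes: [UInt8]) -> [Segment] {
    var segments: [Segment] = []
    var i = 0
    while i < bytes.count {
        guard let shape = sequenceShape(lead: bytes[i]) else {
            segments.append(.error(i..<(i + 1)))
            i += 1
            continue
        }
        // Keep the payload bits of the lead byte, then append 6 bits
        // from each continuation byte while the bytes stay valid.
        var value = UInt32(bytes[i]) & (UInt32(0xFF) >> (shape.length + 1))
        var end = i + 1
        while end < i + shape.length, end < bytes.count {
            let allowed: ClosedRange<UInt8> = (end == i + 1) ? shape.second : 0x80...0xBF
            guard allowed.contains(bytes[end]) else { break }
            value = value << 6 | UInt32(bytes[end] & 0x3F)
            end += 1
        }
        // A complete sequence is a scalar; a cut-short one is a single
        // error covering its longest valid prefix (a maximal subpart).
        if end == i + shape.length {
            segments.append(.scalar(value))
        } else {
            segments.append(.error(i..<end))
        }
        i = end
    }
    return segments
}

enum Strategy { case longestRange, maximalSubpart, errorPerByte }

/// Renders segments as tokens like the table rows ("U+61", "err", ...).
func render(_ segments: [Segment], _ strategy: Strategy) -> [String] {
    var tokens: [String] = []
    var previousWasError = false
    for segment in segments {
        switch segment {
        case .scalar(let value):
            tokens.append("U+" + String(value, radix: 16, uppercase: true))
            previousWasError = false
        case .error(let range):
            switch strategy {
            case .maximalSubpart: tokens.append("err")
            case .errorPerByte: tokens += Array(repeating: "err", count: range.count)
            case .longestRange: if !previousWasError { tokens.append("err") }
            }
            previousWasError = true
        }
    }
    return tokens
}

let bytes: [UInt8] = [0x61, 0xF1, 0x80, 0x80, 0xE1, 0x80, 0xC2, 0x62]
let segments = maximalSubparts(bytes)
for strategy in [Strategy.longestRange, .maximalSubpart, .errorPerByte] {
    print(strategy, render(segments, strategy))
}
```

Deriving the other two views from maximal subparts keeps a single decoding pass: an error range's `count` gives the per-byte expansion, and suppressing consecutive `err` tokens joins adjacent ranges into the longest one.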
*/
@available(SwiftStdlib 6.2, *)