mirror of
https://github.com/apple/swift.git
synced 2025-12-21 12:14:44 +01:00
Replace the inline sed commands with sed scripts to avoid the subshells on Windows. Additionally, the unicode handling on Windows causes problems and using the scripts circumvents that problem. Implement an inline dos2unix as the BSD sed does not support `-b` and on Windows, sed will convert the line endings.
33 lines
1.0 KiB
Sed
33 lines
1.0 KiB
Sed
|
||
# [0xC2] is utf8 2 byte character start byte.
|
||
# 0xC2 without second byte is invalid UTF-8 sequence.
|
||
# It becomes garbage text trivia.
|
||
# Marker(1) is replaced to this sequence.
|
||
s/Z1/Â/g
|
||
|
||
# [0xCC, 0x82] in UTF-8 is U+0302.
|
||
# This character is invalid for identifier start, but valid for identifier body.
|
||
# It becomes unknown token.
|
||
# If this type characters are conitguous, they are concatenated to one long unknown token.
|
||
# Marker(2) is replaced to this sequence.
|
||
s/Z2/Ì‚/g
|
||
|
||
# [0xE2, 0x80, 0x9C] in UTF-8 is U+201C, left quote.
|
||
# It becomes single character unknown token.
|
||
# If this left quote and right quote enclosure text,
|
||
# they become one long unknown token.
|
||
# Marker(3) is replaced to this sequence.
|
||
s/Z3/“/g
|
||
|
||
# [0xE2, 0x80, 0x9D] in UTF-8 is U+201D, right quote.
|
||
# It becomes single character unknown token.
|
||
# Marker(4) is replaced to this sequence.
|
||
s/Z4/â€<EFBFBD>/g
|
||
|
||
# [0xE1, 0x9A, 0x80] in UTF-8 is U+1680.
|
||
# This character is invalid for swift source.
|
||
# It becomes garbage trivia.
|
||
# Marker(5) is replaced to this sequence.
|
||
s/Z5/ /g
|
||
|