[String] Grapheme fast paths for punctuation: 5-8x speedup.

Many strings use punctuation characters outside the sub-0x300 range
(e.g. Unicode hyphen, CJK quotes, etc.). This can cause switching
between the fast and slow paths for grapheme breaking. Add fast paths
for general punctuation characters and for CJK punctuation and symbol
characters.

This results in about a 5-8x speedup for heavily (Unicode) punctuated
Latin-like and CJK-like workloads.
Michael Ilseman
2017-06-15 13:19:13 -07:00
parent 7580d6a2cb
commit bd5189c25a
4 changed files with 207 additions and 25 deletions

@@ -344,6 +344,14 @@ extension String.CharacterView : BidirectionalCollection {
// 0xAC00-0xD7AF
case 0xac00...0xd7af: return true
// Common general use punctuation, excluding extenders:
// 0x2010-0x2029
case 0x2010...0x2029: return true
// CJK punctuation characters, excluding extenders:
// 0x3000-0x3029
case 0x3000...0x3029: return true
default: return false
}
}
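
As a rough standalone sketch of how the cases in this hunk might combine into a fast-path decision: the code below assumes that a break is reported only when both UTF-16 code units of a pair pass the same range check, and that CR-LF is rejected before the check. The names `hasBreakWhenPaired` and `isFastPathBreak` are illustrative, the sub-0x300 case is inferred from the commit message rather than shown in this hunk, and the other ranges in the real switch are omitted.

```swift
// Illustrative sketch only; not the standard library's implementation.

/// Returns true if `x` is a UTF-16 code unit assumed to always have a
/// grapheme break next to another code unit that also passes this check.
func hasBreakWhenPaired(_ x: UInt16) -> Bool {
    switch x {
    // Assumption from the commit message: everything below 0x300
    // (combining marks start at 0x0300).
    case 0x0000...0x02ff: return true
    // Precomposed Hangul syllables: 0xAC00-0xD7AF
    case 0xac00...0xd7af: return true
    // Common general use punctuation, excluding extenders: 0x2010-0x2029
    case 0x2010...0x2029: return true
    // CJK punctuation characters, excluding extenders: 0x3000-0x3029
    case 0x3000...0x3029: return true
    default: return false
    }
}

/// Returns true when the pair can be split on the fast path. A false result
/// means "unknown here", so the caller falls back to the slow path.
func isFastPathBreak(_ lhs: UInt16, _ rhs: UInt16) -> Bool {
    // CR-LF forms a single grapheme even though both units are below 0x300.
    if lhs == 0x000d && rhs == 0x000a { return false }
    return hasBreakWhenPaired(lhs) && hasBreakWhenPaired(rhs)
}
```

With this sketch, `isFastPathBreak(0x0061, 0x2010)` ("a" followed by U+2010 HYPHEN) returns `true`, while `isFastPathBreak(0x0061, 0x0301)` ("a" followed by U+0301 COMBINING ACUTE ACCENT) returns `false` and would be left to the slow path.
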