47 Commits

Author SHA1 Message Date
Kovid Goyal
82e2fe82d6 Add a couple more gseg tests 2025-04-11 13:34:16 +05:30
Kovid Goyal
c03dd673ae Restore fast path for printable ASCII 2025-04-11 09:34:21 +05:30
Kovid Goyal
e976cf67fd Make GraphemeBreakProperty available globally 2025-04-11 09:34:21 +05:30
Kovid Goyal
6712169c0f ... 2025-04-01 17:18:11 +05:30
Kovid Goyal
057dde35a7 Use a two stage lookup table for segmentation
Saves one extra array lookup at no cost in size
2025-04-01 14:25:24 +05:30
Kovid Goyal
557e6547f2 ... 2025-04-01 13:31:20 +05:30
Kovid Goyal
d4d2ae969e Use a branchless check for unicode range 2025-04-01 12:32:17 +05:30
Kovid Goyal
6ecd78d9db Remove bounds checking for unicode table access in Go 2025-04-01 10:41:17 +05:30
Kovid Goyal
de1adeee5e DRYer 2025-03-31 22:01:49 +05:30
Kovid Goyal
66856e7b52 Use a multi-stage lookup table for grapheme segmentation 2025-03-31 21:51:28 +05:30
Kovid Goyal
163b3de85b Also forgot to add non-characters to invalid class 2025-03-30 10:44:26 +05:30
Kovid Goyal
a5a25fbd8c Fix some codepoints being missed when porting is_non_rendered to the unicode lookup table
Fixes #8495
2025-03-30 10:40:19 +05:30
Kovid Goyal
2eed7b62ab More work on seg lookup tables 2025-03-29 09:35:44 +05:30
Kovid Goyal
d9d483d2c1 More work on segmentation lookup table 2025-03-29 08:49:52 +05:30
Kovid Goyal
01cdfcd002 Work on table based lookup for grapheme segmentation 2025-03-28 15:06:48 +05:30
Kovid Goyal
3e50588525 Add a test for PUA recognition 2025-03-25 16:52:01 +05:30
Kovid Goyal
fd2bbf57e3 Make unicode category data useable in other modules 2025-03-25 16:35:09 +05:30
Kovid Goyal
294de16898 Use ms table for remaining UCD lookups 2025-03-25 15:41:34 +05:30
Kovid Goyal
aad58cf703 Declare CharProps just once 2025-03-25 14:08:47 +05:30
Kovid Goyal
d429f732e1 DRYer 2025-03-25 13:45:56 +05:30
Kovid Goyal
61ae12e0a9 DRYer 2025-03-25 13:29:11 +05:30
Kovid Goyal
b66a763ddf Use a 3 stage table for Unicode properties
Halves the data size and reduces source code size by 50x
Shows no significant runtime performance effect.
Allows for easily adding more properties to the table in the future
2025-03-25 13:16:59 +05:30
Kovid Goyal
9f7643078c Use unicode multi-table for remaining hot path lookups
Results in a 15% improvement in the unicode throughput benchmark
2025-03-24 15:04:33 +05:30
Kovid Goyal
3d0e45ace8 Use the new multi-stage unicode table for wcwidth 2025-03-24 14:20:40 +05:30
Kovid Goyal
7697a1650d Add is_emoji_presentation_base to char props table 2025-03-24 13:55:49 +05:30
Kovid Goyal
16f7380cb0 Implement grapheme segmentation in Go 2025-03-23 19:24:12 +05:30
Kovid Goyal
aa8c32006f Implement grapheme seg algo in Go 2025-03-22 14:54:58 +05:30
Kovid Goyal
7e780a2294 CharProps data for Go 2025-03-22 13:18:09 +05:30
Kovid Goyal
9663f935fb ... 2025-03-22 11:56:56 +05:30
Kovid Goyal
583a858769 Use a multistage lookup table for grapheme segmentation 2025-03-22 11:50:04 +05:30
Kovid Goyal
0d866b1f13 Add tests for grapheme segmentation
Test data provided by Unicode organisation
2025-03-13 13:48:35 +05:30
Kovid Goyal
9c1c141775 Start work on grapheme segmentation algorithm 2025-03-13 11:19:54 +05:30
Kovid Goyal
98f9a568ce Add Extended_Pictographic property 2025-03-13 10:01:41 +05:30
Kovid Goyal
039af78785 Add Indic Conjunct Break data 2025-03-13 09:18:42 +05:30
Kovid Goyal
1ee0b3369d Fix GBP generation 2025-03-13 08:37:52 +05:30
Kovid Goyal
9cb56c2775 Run gofmt on grapheme-segmentation-data 2025-03-13 07:11:21 +05:30
Kovid Goyal
dc625c5e0c Add grapheme break properties when generating wcwidth data 2025-03-13 07:06:46 +05:30
Kovid Goyal
61edc2aef7 Don't treat multicells containing narrow emoji as having emoji presentation
Fixes #8330
2025-02-14 20:37:31 +05:30
Kovid Goyal
1481fb4fe9 Don't generate mark mapping 2024-11-04 09:10:07 +05:30
Kovid Goyal
2b3f2258ff More pyupgrade to 3.9 2024-08-05 11:00:51 +05:30
Kovid Goyal
9bea8bb5bc remove no longer needed code 2024-02-05 13:54:22 +05:30
Kovid Goyal
8cc2cad4d9 Use list of legal chars in URL from the WHATWG standard
Notably this excludes some ASCII chars: <>{}[]`|
See https://url.spec.whatwg.org/#url-code-points

Fixes #7095
2024-02-05 13:27:22 +05:30
Kovid Goyal
77292a16d6 Make shebangs consistent
Follow PEP 0394 and use /usr/bin/env python so that the python in the
user's venv is respected. Not that the kitty python files are meant to be
executed standalone anyway, but, whatever.

Fixes #6810
2023-11-11 08:32:05 +05:30
Kovid Goyal
119582a9d4 Make relative imports work in gen scripts even when directly executed 2023-10-15 09:51:03 +05:30
Kovid Goyal
a79dd3996a Also move data files for gen scripts into gen dir 2023-10-14 08:04:37 +05:30
Kovid Goyal
e6ef2fceea py3.8 support 2023-10-14 07:57:03 +05:30
Kovid Goyal
56063b96fd Move gen scripts into their own package 2023-10-14 07:44:18 +05:30
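
Several of the commits above move Unicode property lookups into multi-stage tables. As a rough illustration of the general technique only, and not the code from these commits, a two-stage table can be sketched in Python as follows. The names build_two_stage, lookup and category, and the 256-entry block size, are assumptions made for this sketch.

```python
import unicodedata
from typing import Callable, Dict, List, Tuple

MAX_CODEPOINT = 0x10FFFF
SHIFT = 8                  # hypothetical block size: 2**8 = 256 code points
BLOCK = 1 << SHIFT
MASK = BLOCK - 1


def category(cp: int) -> str:
    # Example property: the Unicode general category of a code point.
    return unicodedata.category(chr(cp))


def build_two_stage(prop: Callable[[int], str]) -> Tuple[List[int], List[str]]:
    # stage1 maps a block number to the offset of that block's data in stage2.
    # Identical blocks are stored in stage2 only once.
    stage1: List[int] = []
    stage2: List[str] = []
    seen: Dict[Tuple[str, ...], int] = {}
    for start in range(0, MAX_CODEPOINT + 1, BLOCK):
        block = tuple(prop(cp) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(stage2)
            stage2.extend(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1: List[int], stage2: List[str], cp: int) -> str:
    # Two array reads: one to find the block, one to find the entry in it.
    return stage2[stage1[cp >> SHIFT] + (cp & MASK)]


if __name__ == "__main__":
    s1, s2 = build_two_stage(category)
    print(len(s1), len(s2), lookup(s1, s2, ord("A")))
```

Large runs of the code space share the same property values, so many blocks are identical and stored once. That deduplication is where the size saving of such tables comes from, while each lookup stays at a fixed number of array reads.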