34 Commits

Author SHA1 Message Date
Kovid Goyal
294de16898 Use ms table for remaining UCD lookups 2025-03-25 15:41:34 +05:30
Kovid Goyal
9f7643078c Use unicode multi-table for remaining hot path lookups
Results in a 15% improvement in the unicode throughput benchmark
2025-03-24 15:04:33 +05:30
Kovid Goyal
cabd6c0589 Initial port of code to use TextCache 2024-11-04 09:10:07 +05:30
Kovid Goyal
8cc2cad4d9 Use list of legal chars in URL from the WHATWG standard
Notably this excludes some ASCII chars: <>{}[]`|
See https://url.spec.whatwg.org/#url-code-points

Fixes #7095
2024-02-05 13:27:22 +05:30
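The membership test described in the commit above can be sketched as follows. This is only an illustration of the URL code point list in the WHATWG standard, not the code from the commit; the names `is_url_code_point` and `is_noncharacter` are hypothetical.

```python
# Punctuation allowed by the WHATWG "URL code points" definition.
_URL_PUNCTUATION = frozenset("!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for code points ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(cp: int) -> bool:
    """Check a code point against the WHATWG URL code point list."""
    if cp < 0x80:
        ch = chr(cp)
        # ASCII alphanumerics plus the listed punctuation.
        return ch.isalnum() or ch in _URL_PUNCTUATION
    # Non-ASCII: U+00A0..U+10FFFD, minus surrogates and noncharacters.
    if cp < 0xA0 or cp > 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:
        return False
    return not is_noncharacter(cp)
```

With this list, every character named in the commit message (<>{}[]`|) is rejected. Note that `%` and `#` are also absent from the spec's list even though they occur in URLs, so a real URL detector would presumably need to treat them separately.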
Sergei Grechanik
d63eeada73 Image placement using Unicode placeholders
This commit introduces the Unicode placeholder image placement method.
In particular:
- Virtual placements can be created by passing `U=1` in a put command.
- Images with virtual placements can be displayed using the placeholder
  character `U+10EEEE` with diacritics indicating rows and columns.
- The image ID is indicated by the foreground color of the placeholder.
  Additionally, the most significant byte of the ID can be specified via
  the third diacritic.
- Underline color can be optionally used to specify the placement ID.
- A bug was fixed, which caused incomplete image removal when it was
  overwritten by another image with the same ID.
2023-02-21 18:23:16 -08:00
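A rough sketch of how a client might build one placeholder cell from the description above. Several details are assumptions rather than facts from the commit: the split of the low 24 bits of the ID into red, green and blue in that order, the order of the diacritics (row, column, then most significant byte), and the `diacritics` lookup table, which is passed in by the caller because the actual table is not given here. The function names are hypothetical.

```python
PLACEHOLDER = "\U0010EEEE"  # placeholder character named in the commit message


def split_image_id(image_id: int) -> tuple[int, tuple[int, int, int]]:
    """Split a 32-bit image ID into (most significant byte, (r, g, b))."""
    if not 0 <= image_id <= 0xFFFFFFFF:
        raise ValueError("image ID must fit in 32 bits")
    msb = (image_id >> 24) & 0xFF
    # Assumption: the low 24 bits map to red, green, blue in that order.
    rgb = ((image_id >> 16) & 0xFF, (image_id >> 8) & 0xFF, image_id & 0xFF)
    return msb, rgb


def placeholder_cell(image_id: int, row: int, col: int, diacritics: list[str]) -> str:
    """Return the text for one cell of a virtual placement.

    `diacritics` maps a small integer to a combining character; it is a
    caller-supplied stand-in for the real row/column diacritics table.
    """
    msb, (r, g, b) = split_image_id(image_id)
    set_fg = f"\x1b[38;2;{r};{g};{b}m"  # 24-bit foreground color carries the ID
    reset_fg = "\x1b[39m"               # restore the default foreground
    marks = diacritics[row] + diacritics[col] + diacritics[msb]
    return set_fg + PLACEHOLDER + marks + reset_fg
```

For example, image ID 0x12345678 would use the foreground color (0x34, 0x56, 0x78) and the diacritic for 0x12 in the third position.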
pagedown
13a3c6b5b2 Update to Unicode 15.0 2022-09-29 10:13:21 +08:00
Kovid Goyal
d875615c03 Fix a regression in the handling of some combining characters such as zero width joiners
Fixes #4439
2022-01-05 08:50:55 +05:30
Kovid Goyal
fbf47f75d5 Fix soft hyphens not being preserved when round tripping text through the terminal
Also roundtrip all characters in the Cf category.

Characters with the DI (Default Ignorable) property are now
preserved but not rendered and treated as zero-width
as per the unicode standard.
See https://www.unicode.org/faq/unsup_char.html
2021-10-07 12:44:22 +05:30
Kovid Goyal
31e623afb3 Add support for Unicode 14
Fixes #3542
2021-10-04 14:00:35 +05:30
Kovid Goyal
397638998b Don't use static memory for the list of chars options
Saves a couple of KB of RAM and is more flexible in terms
of max number of allowed chars, although for large numbers one really
needs a hash for fast lookups.
2021-06-17 13:27:11 +05:30
Kovid Goyal
6ddbda00df Clean up url excluded chars PR 2021-06-17 13:11:23 +05:30
Radu Butoi
5ee0651f56 Add url_excluded_characters option to exclude characters from URLs.
This option, like select_by_word_characters, takes a set of characters, but
these characters are *excluded* from URL parsing. See
https://github.com/kovidgoyal/kitty/issues/3688#issuecomment-862711148.
2021-06-17 01:55:21 -04:00
Kovid Goyal
81411e6b54 Fix trailing parentheses in URLs not being detected
Also fix URLs starting near the end of the line not being detected.

Fixes #3688
2021-06-04 18:13:36 +05:30
Kovid Goyal
9bc2ab3245 Function to detect flag pairs 2020-04-06 21:16:14 +05:30
Kovid Goyal
bf4e8c490c Update to Unicode 13.0
Fixes #2513
2020-04-06 18:59:35 +05:30
Kovid Goyal
e86c712424 Don't strip `&` and `-` from the end of URLs
Fixes #2436
2020-03-15 08:29:56 +05:30
Kovid Goyal
1fcd6e1811 macOS: Fix finding fallback font for private use unicode symbols not working reliably
Fixes #1650
2019-06-30 18:11:58 +05:30
Kovid Goyal
facd353228 Update to using the Unicode 12 standard 2019-03-06 13:58:16 +05:30
Kovid Goyal
094ddd9333 Round-trip the zwj unicode character
Rendering of sequences containing zwj is still not implemented, since it
can cause the collapse of an unbounded number of characters into a
single cell. However, kitty at least preserves the zwj by storing it as
a combining character.
2018-08-04 18:29:45 +05:30
Kovid Goyal
61dd52b50f Ignore the non-characters from the unicode standard in addition to ignoring the control characters 2018-06-14 10:20:13 +05:30
Kovid Goyal
ff2e5b3966 Avoid unnecessary calls to mark_for_codepoint 2018-02-06 11:23:39 +05:30
Kovid Goyal
80301d465b Handle non-BMP combining characters
Use a level of indirection to store combining characters. This allows
combining characters to be stored using only two bytes, even if they are
after USHORT_MAX
2018-01-18 16:25:42 +05:30
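The indirection described in the commit above can be illustrated with a small table that maps each combining character to a 16-bit index. This is a sketch of the general idea only, not the actual data structure; the class `MarkTable` and the method `codepoint_for_mark` are hypothetical, and `mark_for_codepoint` merely borrows a name that appears elsewhere in this log.

```python
class MarkTable:
    """Map combining code points to small indices that fit in two bytes."""

    MAX_MARKS = 0xFFFF  # largest index representable in 16 bits

    def __init__(self, combining_code_points):
        # Index 0 is reserved to mean "no combining character".
        self._codes = [0] + sorted(set(combining_code_points))
        if len(self._codes) - 1 > self.MAX_MARKS:
            raise ValueError("too many combining characters for a 16-bit index")
        self._marks = {cp: i for i, cp in enumerate(self._codes) if i}

    def mark_for_codepoint(self, cp: int) -> int:
        """Return the 16-bit index for cp, or 0 if cp is not in the table."""
        return self._marks.get(cp, 0)

    def codepoint_for_mark(self, mark: int) -> int:
        """Recover the full code point from a stored index."""
        return self._codes[mark]
```

A cell then stores only the index, so a non-BMP character such as U+1D165 costs the same two bytes as U+0301, at the price of one extra lookup when the text is read back.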
Kovid Goyal
5faa649452 Drop the dependency on libunistring 2018-01-18 00:09:40 +05:30
Kovid Goyal
ed700ff830 ... 2018-01-17 21:59:10 +05:30
Kovid Goyal
33ed873997 Remove unnecessary extra test for combining characters
There are no combining characters with a non-zero combining class that
are not in the marks category.
2018-01-17 21:56:30 +05:30
Kovid Goyal
804c4fbe19 Recognize characters from the unicode Mark categories as combining characters, even if they do not have a combining class (i.e. are not re-ordered). Fixes #286 2018-01-15 11:24:11 +05:30
Kovid Goyal
0fcce6ec58 Remove trailing whitespace from native code files 2017-12-20 08:44:47 +05:30
Kovid Goyal
464291bbb1 Port click on URL code to C 2017-09-15 10:45:27 +05:30
Kovid Goyal
ed3427f349 Don't use the python unicodedata module as we use libunistring
No sense in loading two huge unicode datasets into memory
2017-09-15 10:45:27 +05:30
Kovid Goyal
1c1d0a4e91 Port mouse cursor change over hyperlinks to C 2017-09-15 10:45:25 +05:30
Kovid Goyal
9bea1001f9 Speedup unicode character property lookup
Use libunistring instead of building predicates from the unicode
database
2017-09-15 10:45:19 +05:30
Kovid Goyal
3989413ff9 ... 2016-11-19 14:41:40 +05:30
Kovid Goyal
5dc0b9af13 Fix unicode data generation 2016-11-13 13:57:02 +05:30
Kovid Goyal
fab2213c25 More work on native streams 2016-11-13 10:24:00 +05:30