Problem: blob to string conversion can be improved
Solution: Compute the output size up front and use a single alloc plus
mch_memmove() (Yasuhiro Matsumoto).
Replace per-byte ga_append/snprintf loops with bulk allocation and
mch_memmove in three hot paths: blob2string() (used by string()),
string_from_blob(), and the UTF-16/UCS path of f_blob2str(). For a
16 MiB blob, string(blob) is ~28x faster and blob2str() is ~2x faster.
Benchmark (16 MiB blob, 5 iterations, total seconds):
| | Before | After | Speedup |
|---|---:|---:|---:|
| `string(blob)` | 6.422 | 0.225 | 28.5x |
| `blob2str(b)` | 0.504 | 0.265 | 1.90x |
| `blob2str(b, {encoding: 'utf-8'})` | 0.507 | 0.282 | 1.80x |
| `blob2str(b, {encoding: 'utf-16le'})` | 0.407 | 0.202 | 2.01x |
closes: #20112
Signed-off-by: Yasuhiro Matsumoto <mattn.jp@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Duplicate code with literal string_T assignment
Solution: Add STR_LITERAL_SET() macro for string_T literal assignment
(Hirohito Higashi).
Previously, assigning a string literal to a string_T variable required
two lines that repeated the literal:
s.string = (char_u *)"open";
s.length = STRLEN_LITERAL("open");
Writing the literal twice is error-prone -- a typo in one of them
leaves the pointer and the cached length out of sync.
Add a STR_LITERAL_SET() macro in macros.h so that the assignment can
be written in one statement with the literal appearing only once:
STR_LITERAL_SET(s, "open");
Replace all occurrences of the two-line pattern across the codebase
with the new macro.
No functional change.
related: #19999
related: #20023
closes: #20025
Signed-off-by: Hirohito Higashi <h.east.727@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: C-type names are marked as translatable
Solution: Use them as-is, do not translate them
(Eisuke Kawashima)
closes: #19861
Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
problem: unnecessary work in vim_strchr() and find_term_bykeys()
Solution: Redirect vim_strchr() to vim_strbyte() for ASCII input
Add an early exit to find_term_bykeys() using the terminal
leader table, mirroring check_termcode(). Reduces instruction
count on startup by about 27%. (Yasuhiro Matsumoto)
closes: #19902
Signed-off-by: Yasuhiro Matsumoto <mattn.jp@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Compiler warning in strings.c
(Timothy Rice, after v9.2.0031)
Solution: Return early when str_m is zero
(Hirohito Higashi)
fixes: #19795closes: #19800
Signed-off-by: Hirohito Higashi <h.east.727@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: MS-Windows: Compile warning for converting from size_t to int
breaks the Appveyor CI (after v9.2.0168)
Solution: Explicitly cast to int in convert_string() (ichizok).
closes: #19722
Signed-off-by: ichizok <gclient.gaap@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: invalid pointer casting in string_convert() arguments
Solution: Use a temporary local int variable (James McCoy)
string_convert()/string_convert_ext() accept an "int *lenp" parameter,
however a few call sites were taking the address of a possibly larger
type (long, size_t) and casting it as an int * when calling these
functions.
On big-endian platforms, this passes the (likely) zeroed high bytes of
the known length through to string_convert(). This indicates it received
an empty string and returns an allocated empty string rather than
converting the input. This is exhibited by test failures like
From test_blob.vim:
Found errors in Test_blob2str_multi_byte_encodings():
command line..script src/testdir/runtest.vim[636]..function RunTheTest[63]..Test_blob2str_multi_byte_encodings line 2: Expected ['Hello'] but got ['']
command line..script src/testdir/runtest.vim[636]..function RunTheTest[63]..Test_blob2str_multi_byte_encodings line 3: Expected ['Hello'] but got ['']
command line..script srctestdir/runtest.vim[636]..function RunTheTest[63]..Test_blob2str_multi_byte_encodings line 6: Expected ['Hello'] but got ['']
Instead, use a temporary local int variable as the in/out variable for
string_convert() and assign the result back to the larger typed length
variable post-conversion.
closes: #19672
Signed-off-by: James McCoy <jamessan@debian.org>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: blob_from_string() is slow for long strings
Solution: Use ga_grow() to allocate memory once, perform a bulk copy
with mch_memmove() then translate NL to NUL in-place
(Yasuhiro Matsumoto).
closes: #19665
Signed-off-by: Yasuhiro Matsumoto <mattn.jp@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: byteidx_common() and f_utf16idx() are calling ptr2len() twice
per iteration, instead of reusing the already computed clen.
Solution: Reuse clen for pointer advancement in both functions
(Yasuhiro Matsumoto).
closes: #19573
Signed-off-by: Yasuhiro Matsumoto <mattn.jp@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Inefficient use of list_append_string()
Solution: Pass string length to list_append_string() where it is known
(John Marriott).
closes: #19491
Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Coverity reports unreachable code, CID: 1681310
Solution: Drop the ternary checking for non-NULL of from_encoding_raw.
closes: #19402
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: blob2str() does not handle UTF-16 encoding
(Hirohito Higashi)
Solution: Refactor the code and fix remaining issues, see below
(Yasuhiro Matsumoto).
blob2str() function did not properly handle UTF-16/UCS-2/UTF-32/UCS-4
encodings with endianness suffixes (e.g., utf-16le, utf-16be, ucs-2le).
The encoding name was canonicalized too aggressively, losing the
endianness information needed by iconv.
This change include few fixes:
- Preserve the raw encoding name with endianness suffix for iconv calls
- Normalize encoding names properly: "ucs2be" → "ucs-2be", "utf16le" →
"utf-16le"
- For multi-byte encodings (UTF-16/32, UCS-2/4), convert the entire blob
first, then split by newlines
convert_string() cannot handle UTF-16 because it uses string_convert()
which expects NUL-terminated strings. UTF-16 contains 0x00 bytes within
characters (e.g., "H" = 0x48 0x00), causing premature termination.
Therefore, for UTF-16/32 encodings, the fix uses string_convert_ext()
with an explicit input length to convert the entire blob at once.
The code appends two NUL bytes (ga_append(&blob_ga, NUL) twice) because
UTF-16 requires a 2-byte NUL terminator (0x00 0x00), not a single-byte
NUL.
- src/strings.c: Add from_encoding_raw to preserve endianness, special
handling for UTF-16/32 and UCS-2/4
- src/mbyte.c: Fix convert_setup_ext() to use == ENC_UNICODE instead of
& ENC_UNICODE. The bitwise AND was incorrectly treating UTF-16/UCS-2
(which have ENC_UNICODE + ENC_2BYTE etc.) as UTF-8, causing iconv
setup to be skipped.
fixes: #19198closes: #19246
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yasuhiro Matsumoto <mattn.jp@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: blob2string() stopped after an empty line
Solution: Specifically check for empty content (Foxe Chen)
closes: #19001
Signed-off-by: Foxe Chen <chen.foxe@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: blob2str() may call strcpy with a NULL pointer
Solution: Check for NULL before calling STRNCPY()
closes: #18907
Signed-off-by: Foxe Chen <chen.foxe@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: string handling in strings.c can be improved
Solution: Refactor strings.c and remove calls to STRLEN()
(John Marriott)
This change does:
- In vim_strsave_shellescape() a small cosmetic change.
- In string_count() move the call to STRLEN() outside the while loop.
- In blob_from_string() refactor to remove call to STRLEN().
- In string_from_blob() call vim_strnsave() instead of vim_strsave().
- In vim_snprintf_safelen() call vim_vsnprintf_typval() directly instead
of vim_vsnprintf() which then calls vim_vsnprintf_typval().
- In copy_first_char_to_tv() change to return -1 on failure or the length
of resulting v_string. Change string_filter_map() and string_reduce() to
use the return value of copy_first_char_to_tv().
closes: #18617
Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Generating prototype files does not work on all platforms
Solution: Rework prototypes generation using python instead of cproto,
enable it in CI to test it for each PR (Hirohito Higashi).
closes: #18045
Signed-off-by: Hirohito Higashi <h.east.727@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Vim script: no support for URI de-/encoding
(ubaldot)
Solution: Add the uri_encode() and uri_decode() functions
(Yegappan Lakshmanan)
fixes: #17861closes: #18034
Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: str2blob() treats NULL string and empty string differently
Solution: Treats a NULL string the same as an empty string
(zeertzjq).
closes: #17778
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: style: more wrong indentation
Solution: reformat a few more places
(Yegappan Lakshmanan)
closes: #17309
Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: max allowed string width too small
Solution: increased MAX_ALLOWED_STRING_WIDTH from 6400 to 1MiB
(Hirohito Higashi)
closes: #17138
Co-authored-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Hirohito Higashi <h.east.727@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: too many strlen() calls in buffer.c
Solution: refactor buffer.c and remove strlen() calls
(John Marriott)
closes: #17063
Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: wrong translation for encoding failures because of using
literal "from" and "to" in the resulting error message
(RestorerZ)
Solution: use separate error messages for errors "from" and "to"
encoding errors.
fixes: #16898closes: #16918
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Unnecessary use of vim_tolower() in vim_strnicmp_asc().
Solution: Use TOLOWER_ASC() instead (zeertzjq).
It was passing *s1 and *s2 to vim_tolower(). When char is signed, which
is the case on most platforms, c < 0x80 is always true, so it already
behaves the same as TOLOWER_ASC().
closes: #16826
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: no way to create raw strings from a blob
Solution: support the "encoding": "none" option
to create raw strings (which may be invalid!)
(Bakudankun)
closes: #16666
Signed-off-by: Bakudankun <bakudankun@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: vim_strnchr() is strange and unnecessary (after v9.1.1009)
Solution: Remove vim_strnchr() and use memchr() instead. Also remove a
comment referencing an #if that is no longer present.
vim_strnchr() is strange in several ways:
- It's named like vim_strchr(), but unlike vim_strchr() it doesn't
support finding a multibyte char.
- Its logic is similar to vim_strbyte(), but unlike vim_strbyte() it
uses char instead of char_u.
- It takes a pointer as its size argument, which isn't convenient for
all its callers.
- It allows embedded NULs, unlike other "strn*" functions which stop
when encountering a NUL byte.
In comparison, memchr() also allows embedded NULs, and it converts bytes
in the string to (unsigned char).
closes: #16579
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Coverity complains about insecure data handling
(v9.1.1024)
Solution: use int consistently to access the blob index
(Yegappan Lakshmanan)
related: #16468
Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: blob2str/str2blob() do not support list of strings
(after v9.1.1016)
Solution: Add support for using a list of strings (Yegappan Lakshmanan)
closes: #16459
Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Not possible to convert string2blob and blob2string
Solution: add support for the blob2str() and str2blob() functions
closes: #16373
Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: diff feature can be improved
Solution: include the linematch diff alignment algorithm
(Jonathon)
closes: #9661
Signed-off-by: Jonathon <jonathonwhite@protonmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: gcc warns about uninitialized variable in vim_strnicmp_asc()
Solution: initialize variable to 1
(John Marriott)
closes: #16108
Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: GUIEnter not found in Turkish locale
(James McCoy, after v9.1.0256, the issue was there before,
but v9.1.0256 made it more apparent)
Solution: explicitly compare autocommand events by ASCII value and
ignoring locale, because according to the documentation,
events are case insensitive (:h autocommand-events)
fixes: #15574closes: #15603
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: E1510 may happen when formatting a message
(after 9.1.0181).
Solution: Only give E1510 when using typval. (zeertzjq)
closes: #15391
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: No test for escaping '<' with shellescape()
Solution: Add a test. Use memcpy() in code to make it easier to
understand. Fix a typo (zeertzjq).
closes: #14876
Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: MS-Windows: Compiler warnings
Solution: Resolve size_t to int warnings
closes: #14874
A couple of warnings in ex_docmd.c have been resolved by modifying their
function argument types, followed by some changes in various function
call sites. This also allowed removal of some casts to cope with
size_t/int conversion.
Signed-off-by: Mike Williams <mrmrdubya@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: no overflow check for string formatting
Solution: Check message formatting function for overflow.
(Chris van Willegen)
closes: #13799
Signed-off-by: Christ van Willegen <cvwillegen@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: Vim is missing a foreach() func
Solution: Implement foreach({expr1}, {expr2}) function,
which applies {expr2} for each item in {expr1}
without changing it (Ernie Rael)
closes: #12166
Signed-off-by: Ernie Rael <errael@raelity.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Problem: trim(): hard to use default mask (partly revert v9.0.2040)
Solution: use default mask when it is empty
The default 'mask' value is pretty complex, as it includes many
characters. Yet, if one needs to specify the trimming direction, the
third argument, 'trim()' currently requires the 'mask' value to be
provided explicitly.
Currently, an empty 'mask' will make 'trim()' call return 'text' value
that is passed in unmodified. It is unlikely that someone is using it,
so the chances of scripts being broken by this change are low.
Also, this reverts commit 9.0.2040 (which uses v:none for the default
and requires to use an empty string instead).
closes: #13358
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Illia Bobyr <illia.bobyr@gmail.com>
Problem: trim(): hard to use default mask
Solution: Use default 'mask' when it is v:none
The default 'mask' value is pretty complex, as it includes many
characters. Yet, if one needs to specify the trimming direction, the
third argument, 'trim()' currently requires the 'mask' value to be
provided explicitly.
'v:none' is already used to mean "use the default argument value" in
user defined functions. See |none-function_argument| in help.
closes: #13363
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Illia Bobyr <illia.bobyr@gmail.com>
Problem: Some unused code in move.c and string.c
Solution: Remove it
closes: #13288
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: dundargoc <gocdundar@gmail.com>
Problem: Wrong order of arguments for error messages
Solution: Reverse order or arguments for e_aptypes_is_null_nr_str
closes: #13051
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Christ van Willegen <cvwillegen@gmail.com>
Problem: wrong format specifiers in e_aptypes_is_null_str_nr
Solution: Fix the wrong format specifier
closes: #13020
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: zeertzjq <zeertzjq@outlook.com>
Problem: C4090 warnings in strings.c
Solution: Add type casts
closes: #12917
MSVC shows the following warnings:
```
strings.c(2436): warning C4090: 'function': different 'const' qualifiers
strings.c(2774): warning C4090: 'function': different 'const' qualifiers
strings.c(3865): warning C4090: 'function': different 'const' qualifiers
```
So add type casts to suppress them.
Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Ken .Takata <kentkt@csc.jp>