68 Commits

Author SHA1 Message Date
Kovid Goyal d36a64087e Bump Go to 1.23
We need this because Go < 1.23 produces binaries that dont work on
modern OpenBSD because OpenBSD decided to remove syscall() from their
libc. Mad buggers, who removes functions from libc breaking all
binaries!!

Also increase minimum macOS version to 11.0 as Go 1.23 requires that
2024-08-24 08:06:02 +05:30
Kovid Goyal eb07307370 Ignore pedantic warnings from simde headers 2024-04-30 09:54:14 +05:30
Kovid Goyal 393169f79d Fix #7225 2024-03-14 20:55:05 +05:30
Kovid Goyal daeaf65d7e fix compiler warning 2024-02-25 11:17:26 +05:30
Kovid Goyal f4f06222d4 ... 2024-02-25 09:57:44 +05:30
Kovid Goyal ad3ab877f8 Use a fast SIMD implementation to XOR data going into the disk cache 2024-02-25 09:57:43 +05:30
Kovid Goyal 1db7ac5f6b Use our new shift by n functions to improve function to zero last N bytes
Benchmark neutral but cleaner code using one less vector register and equal
number of operations.
2024-02-25 09:57:43 +05:30
Kovid Goyal e77a970ca1 Also implement arbitrary byte shift for 128 bit registers 2024-02-25 09:57:43 +05:30
Kovid Goyal a7c06b38e6 We dont actually need vzeroupper at start of function
GCC emits vzeroupper automatically when compiling with native
optimizations but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal 0a1eb038a5 Implement functions for arbitrary byte shifts in vector registers 2024-02-25 09:57:42 +05:30
Kovid Goyal eb1e3b33b4 Fix test failure on some systems
Broken ass compilers strike again
2024-02-25 09:57:42 +05:30
Kovid Goyal b021e9b648 Do the default func test last so we can see what the failure is on more explicitly 2024-02-25 09:57:42 +05:30
Kovid Goyal 1acd223f45 ... 2024-02-25 09:57:42 +05:30
Kovid Goyal f48e4ffd5e Port aligned load based find algorithm to C 2024-02-25 09:57:42 +05:30
Kovid Goyal 36773c09d3 Functions to get bytes to first match ignoring leading bytes 2024-02-25 09:57:42 +05:30
Kovid Goyal 687340003d ... 2024-02-25 09:57:42 +05:30
Kovid Goyal 493fc900e9 Fix build on ARM 2024-02-25 09:57:41 +05:30
Kovid Goyal f1fe0bf40a Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal d5f34c401d Better vector registers to pre-calculate before the loop 2024-02-25 09:57:41 +05:30
Kovid Goyal 920b8a2496 Use VZEROUPPER in avx functions
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal d4c4805f96 const away to glory 2024-02-25 09:57:40 +05:30
Kovid Goyal 6cdc7ac91d A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal 0bccada9d1 No longer need to abort after dealing with trailing bytes 2024-02-25 09:57:40 +05:30
Kovid Goyal 9cb9373274 Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal d987ffe49a Use unaligned stores
Makes no measurable difference in the benchmark. And will eventually
allow us to process larger chunks of data without need to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal 131716da00 Ignore another warning on some compiler versions in simde 2024-02-25 09:57:39 +05:30
Kovid Goyal 4d35fc2928 Use a custom movmask for ARM rather than the one from simde
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal 9bca415af2 Use aligned loads when finding either of two bytes
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal 60bc8e6c25 ... 2024-02-25 09:57:39 +05:30
Kovid Goyal 8aa1b112b8 Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit 2024-02-25 09:57:39 +05:30
Kovid Goyal 0bd47d8457 Cleanup KITTY_NO_SIMD compilation 2024-02-25 09:57:39 +05:30
Kovid Goyal fcbda63023 Move finding byte code into separate functions
movemask() is inefficient on ARM64 this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
Kovid Goyal 73342411bc Dont build any SIMD code when the target is neither ARM64 nor x86/amd64 2024-02-25 09:57:38 +05:30
Kovid Goyal 8dd6f9b07c Get universal builds working again
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal 7e77a196e6 Build only the SIMD code with SIMD compiler flags 2024-02-25 09:57:38 +05:30
Kovid Goyal 0e4c49a0d6 Fix building on macOS ARM 2024-02-25 09:57:35 +05:30
Kovid Goyal e783eccc97 fix handling of bits from high byte of 4 byte sequences 2024-02-25 09:57:35 +05:30
Kovid Goyal 7e6459a5e4 DRYer 2024-02-25 09:57:35 +05:30
Kovid Goyal 67d22b0ec6 Avoid multiple branches for checking for trailing sequence 2024-02-25 09:57:34 +05:30
Kovid Goyal 79f99bb3ad Make print_register useable without full debug 2024-02-25 09:57:34 +05:30
Kovid Goyal fa3579656b More invalid utf-8 tests 2024-02-25 09:57:34 +05:30
Kovid Goyal 8a10fcaf5a More tests 2024-02-25 09:57:34 +05:30
Kovid Goyal 4c8b8caead Handle trailing incomplete sequences 2024-02-25 09:57:34 +05:30
Kovid Goyal 4238fedee7 More tests 2024-02-25 09:57:34 +05:30
Kovid Goyal b0dcdf74bd More tests and micro-optimize switch to ASCII fast path 2024-02-25 09:57:34 +05:30
Kovid Goyal a63d62fb4e ... 2024-02-25 09:57:34 +05:30
Kovid Goyal 8dbb0cff6f Dont call __builtin_ctz with zero 2024-02-25 09:57:34 +05:30
Kovid Goyal 07bba337f5 fix various bugs in AVX2 utility functions 2024-02-25 09:57:34 +05:30
Kovid Goyal b28fbf6817 fix zero-ing of last n bytes 2024-02-25 09:57:34 +05:30
Kovid Goyal daa169b8ed More work on utf8 SIMD decode 2024-02-25 09:57:34 +05:30