Kovid Goyal
d36a64087e
Bump Go to 1.23
...
We need this because Go < 1.23 produces binaries that don't work on
modern OpenBSD because OpenBSD decided to remove syscall() from their
libc. Mad buggers, who removes functions from libc breaking all
binaries!!
Also increase the minimum macOS version to 11.0, as Go 1.23 requires that.
2024-08-24 08:06:02 +05:30
Kovid Goyal
eb07307370
Ignore pedantic warnings from simde headers
2024-04-30 09:54:14 +05:30
Kovid Goyal
393169f79d
Fix #7225
2024-03-14 20:55:05 +05:30
Kovid Goyal
daeaf65d7e
fix compiler warning
2024-02-25 11:17:26 +05:30
Kovid Goyal
f4f06222d4
...
2024-02-25 09:57:44 +05:30
Kovid Goyal
ad3ab877f8
Use a fast SIMD implementation to XOR data going into the disk cache
2024-02-25 09:57:43 +05:30
Kovid Goyal
1db7ac5f6b
Use our new shift by n functions to improve function to zero last N bytes
...
Benchmark neutral, but cleaner code using one fewer vector register and an
equal number of operations.
2024-02-25 09:57:43 +05:30
Kovid Goyal
e77a970ca1
Also implement arbitrary byte shift for 128 bit registers
2024-02-25 09:57:43 +05:30
Kovid Goyal
a7c06b38e6
We don't actually need vzeroupper at start of function
...
GCC emits vzeroupper automatically when compiling with native
optimizations but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal
0a1eb038a5
Implement functions for arbitrary byte shifts in vector registers
2024-02-25 09:57:42 +05:30
Kovid Goyal
eb1e3b33b4
Fix test failure on some systems
...
Broken ass compilers strike again
2024-02-25 09:57:42 +05:30
Kovid Goyal
b021e9b648
Do the default func test last so we can see more explicitly what the failure is on
2024-02-25 09:57:42 +05:30
Kovid Goyal
1acd223f45
...
2024-02-25 09:57:42 +05:30
Kovid Goyal
f48e4ffd5e
Port aligned load based find algorithm to C
2024-02-25 09:57:42 +05:30
Kovid Goyal
36773c09d3
Functions to get bytes to first match ignoring leading bytes
2024-02-25 09:57:42 +05:30
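A minimal scalar sketch of the idea named in the commit above, not the code from the commit itself. It assumes a movemask-style bitmask in which bit i is set when byte i of a 16-byte vector matched. The function name, the 16-lane width and the choice to count the result from the start of the vector are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the first matching byte at or after
 * num_ignored, counted from the start of the vector, or -1 if none.
 * mask has bit i set when byte i of a 16-byte vector matched. */
static inline int
first_match_ignoring_leading(uint32_t mask, unsigned num_ignored) {
    assert(num_ignored <= 16);
    mask &= 0xFFFFu;        /* keep only the 16 lane bits */
    mask >>= num_ignored;   /* discard matches in the ignored leading bytes */
    if (mask == 0) return -1;  /* also avoids calling __builtin_ctz with zero */
    return __builtin_ctz(mask) + (int)num_ignored;
}
```

Ignoring leading bytes is what makes an aligned load usable when the data starts part way into the aligned block: the bytes before the real start are loaded but their matches are shifted out.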
Kovid Goyal
687340003d
...
2024-02-25 09:57:42 +05:30
Kovid Goyal
493fc900e9
Fix build on ARM
2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a
Code to easily compare SIMD and scalar decode in a live instance
...
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d
Better vector registers to pre-calculate before the loop
2024-02-25 09:57:41 +05:30
Kovid Goyal
920b8a2496
Use VZEROUPPER in avx functions
...
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96
const away to glory
2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d
A further 5% speedup for UTF-8 decoding
...
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1
No longer need to abort after dealing with trailing bytes
2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274
Allow unbounded output in UTF8Decoder
...
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a
Use unaligned stores
...
Makes no measurable difference in the benchmark. And will eventually
allow us to process larger chunks of data without need to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00
Ignore another warning on some compiler versions in simde
2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928
Use a custom movmask for ARM rather than the one from simde
...
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2
Use aligned loads when finding either of two bytes
...
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25
...
2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8
Turns out the simde implementation of movemask is not slow enough to cancel out the speedup from 256 bit
2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457
Cleanup KITTY_NO_SIMD compilation
2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023
Move finding byte code into separate functions
...
movemask() is inefficient on ARM64; this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
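For context, a scalar model of what a 128-bit movemask computes: one bit per byte, taken from each byte's top bit. On x86 this is a single instruction (`_mm_movemask_epi8`), while NEON has no direct equivalent, so it has to be emulated with several operations. The function name below is hypothetical and the loop is only a reference model, not an ARM64 implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model only: bit i of the result is the most significant bit
 * of bytes[i], which is the layout x86 movemask produces for 16 lanes. */
static uint16_t
movemask_scalar(const uint8_t bytes[16]) {
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (uint16_t)((bytes[i] >> 7) << i);
    }
    return mask;
}
```

A byte search typically compares every lane against the needle, giving 0xFF or 0x00 per lane, and then reduces that to such a mask to locate the first hit.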
Kovid Goyal
73342411bc
Don't build any SIMD code when the target is neither ARM64 nor x86/amd64
2024-02-25 09:57:38 +05:30
Kovid Goyal
8dd6f9b07c
Get universal builds working again
...
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6
Build only the SIMD code with SIMD compiler flags
2024-02-25 09:57:38 +05:30
Kovid Goyal
0e4c49a0d6
Fix building on macOS ARM
2024-02-25 09:57:35 +05:30
Kovid Goyal
e783eccc97
fix handling of bits from high byte of 4 byte sequences
2024-02-25 09:57:35 +05:30
Kovid Goyal
7e6459a5e4
DRYer
2024-02-25 09:57:35 +05:30
Kovid Goyal
67d22b0ec6
Avoid multiple branches for checking for trailing sequence
2024-02-25 09:57:34 +05:30
Kovid Goyal
79f99bb3ad
Make print_register useable without full debug
2024-02-25 09:57:34 +05:30
Kovid Goyal
fa3579656b
More invalid utf-8 tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
8a10fcaf5a
More tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
4c8b8caead
Handle trailing incomplete sequences
2024-02-25 09:57:34 +05:30
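A scalar sketch of one way to detect a trailing incomplete sequence, so those bytes can be held back until more input arrives. This is an illustration under assumptions, not the SIMD code from the commit; the function name is hypothetical, and invalid lead bytes are simply reported as "nothing to hold back" so that normal error handling sees them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes at the end of buf that start a
 * UTF-8 sequence whose remaining continuation bytes have not arrived.
 * Returns 0 when the buffer ends on a sequence boundary. */
static size_t
incomplete_tail_length(const uint8_t *buf, size_t len) {
    /* A sequence is at most 4 bytes, so an incomplete one is at most 3. */
    size_t max_back = len < 3 ? len : 3;
    for (size_t back = 1; back <= max_back; back++) {
        uint8_t b = buf[len - back];
        if ((b & 0xC0) == 0x80) continue;  /* continuation byte, keep looking for the lead */
        size_t need;
        if (b < 0x80) need = 1;                  /* ASCII */
        else if ((b & 0xE0) == 0xC0) need = 2;
        else if ((b & 0xF0) == 0xE0) need = 3;
        else if ((b & 0xF8) == 0xF0) need = 4;
        else return 0;                           /* invalid lead byte */
        return need > back ? back : 0;
    }
    return 0;  /* no lead byte within the last 3 bytes */
}
```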
Kovid Goyal
4238fedee7
More tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
b0dcdf74bd
More tests and micro-optimize switch to ASCII fast path
2024-02-25 09:57:34 +05:30
Kovid Goyal
a63d62fb4e
...
2024-02-25 09:57:34 +05:30
Kovid Goyal
8dbb0cff6f
Don't call __builtin_ctz with zero
2024-02-25 09:57:34 +05:30
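Background for the commit above: the result of `__builtin_ctz(0)` is undefined in GCC and clang, so a zero mask must be handled before the builtin is reached. A minimal sketch of such a guard follows; the helper name and the -1 "not found" convention are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: index of the lowest set bit (0..31), or -1 when
 * mask is zero. The zero case never reaches __builtin_ctz. */
static inline int
first_set_bit(uint32_t mask) {
    if (mask == 0) return -1;
    return __builtin_ctz(mask);
}
```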
Kovid Goyal
07bba337f5
fix various bugs in AVX2 utility functions
2024-02-25 09:57:34 +05:30
Kovid Goyal
b28fbf6817
fix zero-ing of last n bytes
2024-02-25 09:57:34 +05:30
Kovid Goyal
daa169b8ed
More work on utf8 SIMD decode
2024-02-25 09:57:34 +05:30