50 Commits

Author SHA1 Message Date
Kovid Goyal
7e12cc57c6 Fix #7245 2024-03-21 20:50:05 +05:30
Kovid Goyal
3bb9e36fc8 ... 2024-03-14 21:00:57 +05:30
Kovid Goyal
76ae5f5b9b DRYer: Use the SIMD detection in setup.py to avoid calling __builtin_cpu_supports 2024-03-14 20:57:09 +05:30
Kovid Goyal
a839af04dc Fix #7219 2024-03-14 11:13:54 +05:30
Kovid Goyal
1a9a7a59ac Make XOR64 test also test alignment issues 2024-02-25 09:57:44 +05:30
Kovid Goyal
ad3ab877f8 Use a fast SIMD implementation to XOR data going into the disk cache 2024-02-25 09:57:43 +05:30
Kovid Goyal
b021e9b648 Do the default func test last so we can see what the failure is on more explicitly 2024-02-25 09:57:42 +05:30
Kovid Goyal
d0797a025b Add dedicated tests for find_either_of_two 2024-02-25 09:57:42 +05:30
Kovid Goyal
f1fe0bf40a Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
6cdc7ac91d A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274 Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
66341aa28e Make the env var controlling which SIMD level to use more capable 2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6 Build only the SIMD code with SIMD compiler flags 2024-02-25 09:57:38 +05:30
Kovid Goyal
4b846e0106 Turns out that using 256 bit code on ARM is slightly faster even though it is emulated with 128 bit registers 2024-02-25 09:57:38 +05:30
Kovid Goyal
76c6630084 Don't use 256 bit code paths on ARM
ARM only has 128 bit registers. simde simulates 256 bit operations using
them, which is fairly pointless for us.
2024-02-25 09:57:38 +05:30
Kovid Goyal
23a4012aeb Add an env var to turn off use of SIMD instructions 2024-02-25 09:57:38 +05:30
Kovid Goyal
eee14ae148 Workaround for machines on GitHub Actions that incorrectly report CPU vector instruction availability 2024-02-25 09:57:37 +05:30
Kovid Goyal
bbaccfdaae DRYer 2024-02-25 09:57:37 +05:30
Kovid Goyal
43f64f71e4 DRYer 2024-02-25 09:57:36 +05:30
Kovid Goyal
0e4c49a0d6 Fix building on macOS ARM 2024-02-25 09:57:35 +05:30
Kovid Goyal
b3ca5d51fb Use the new SIMD utf-8 decoder 2024-02-25 09:57:35 +05:30
Kovid Goyal
7e6459a5e4 DRYer 2024-02-25 09:57:35 +05:30
Kovid Goyal
4c8b8caead Handle trailing incomplete sequences 2024-02-25 09:57:34 +05:30
Kovid Goyal
99e67f0859 ... 2024-02-25 09:57:33 +05:30
Kovid Goyal
2cb87861c0 Ensure cpu is inited before calling cpu_supports() 2024-02-25 09:57:33 +05:30
Kovid Goyal
74391d7c50 More work on SIMD utf-8 decode 2024-02-25 09:57:31 +05:30
Kovid Goyal
8975d1a9f4 no need to parametrize sentinel 2024-02-25 09:57:31 +05:30
Kovid Goyal
0ed1c6f840 Simplify utf8 parser func
Also show a replacement char for incomplete utf-8 sequences interrupted by an esc char
2024-02-25 09:57:31 +05:30
Kovid Goyal
95eac2e510 ... 2024-02-25 09:57:31 +05:30
Kovid Goyal
bc499000a5 Infrastructure for developing and testing UTF-8 SIMD decode 2024-02-25 09:57:31 +05:30
Kovid Goyal
e2be8c2d37 Use unaligned loads for SIMD
makes no difference to the benchmarks and simplifies the code
2024-02-25 09:57:31 +05:30
Kovid Goyal
fd4c8e1e2d Get rid of ByteLoader
Doesn't move the benchmarks
2024-02-25 09:57:31 +05:30
Kovid Goyal
ba18c5a669 Move ByteLoader back to simd-string.c in preparation for getting rid of it 2024-02-25 09:57:31 +05:30
Kovid Goyal
c79baa56e4 Remove unused SIMD code 2024-02-25 09:57:30 +05:30
Kovid Goyal
8742fb8cce Detect availability of intrinsics on intel macs just in case 2024-02-25 09:57:30 +05:30
Kovid Goyal
718f4b328f Go back to a single code path for drawing text
Slightly reduces pure ASCII performance and improves Unicode
performance. We should be able to get pure ASCII performance back
via SIMD eventually.
2024-02-25 09:57:30 +05:30
Kovid Goyal
794bd85371 Ignore warning from simde on clang 2024-02-25 09:57:29 +05:30
Kovid Goyal
49a54b086f Use simde so SIMD speedups work on ARM as well 2024-02-25 09:57:28 +05:30
Kovid Goyal
fe2cd543ba Switch to same algorithm for 128bit SIMD as used for 256 bit SIMD
Avoids needing to write to the haystack and also less chance of a bug in
the never tested simd since all CPUs I have access to have AVX2
2024-02-25 09:57:28 +05:30
Kovid Goyal
1925d5ea65 Prepare for plain sse4 fallback 2024-02-25 09:57:27 +05:30
Kovid Goyal
aacdffd539 DRYer 2024-02-25 09:57:27 +05:30
Kovid Goyal
a0e1eb4985 AVX2 implementation for find either of two 2024-02-25 09:57:27 +05:30
Kovid Goyal
e4c48a5f17 Add AVX2 implementation of find byte not in range
Also fix alignment bug and ensure the simd finders don't return a pointer
beyond the end
2024-02-25 09:57:27 +05:30
Kovid Goyal
b032313c45 Only use SIMD if CPU supports it at runtime 2024-02-25 09:57:27 +05:30
Kovid Goyal
19a41b4d9a Use sse4.2 instruction for normal mode printable ascii detection 2024-02-25 09:57:27 +05:30
Kovid Goyal
25e7a2882d Work on using SIMD for normal mode dispatch 2024-02-25 09:57:27 +05:30
Kovid Goyal
e3d6aa2c60 Use simd in a few loops 2024-02-25 09:57:27 +05:30
Kovid Goyal
89d416806b ... 2024-02-25 09:57:26 +05:30
Kovid Goyal
200e5bf6e3 Examine 8 bytes at once for terminator char 2024-02-25 09:57:26 +05:30
Kovid Goyal
f4819175b0 Start work on vectorizing searches 2024-02-25 09:57:26 +05:30
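
The earliest commits in this series ("Start work on vectorizing searches", "Examine 8 bytes at once for terminator char") describe a search that inspects a whole machine word per step before any real SIMD is involved. The log does not show the code itself, so the following is only a minimal sketch of that general word-at-a-time technique, not the implementation from these commits. The names `has_zero_byte` and `find_byte_swar` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if and only if at least one byte of w is 0x00.
 * This is the classic "haszero" bit trick applied to a 64-bit word. */
static inline uint64_t has_zero_byte(uint64_t w) {
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Hypothetical helper: return a pointer to the first occurrence of ch in
 * buf[0..len), or NULL if it does not occur. Examines 8 bytes per step. */
static const uint8_t *find_byte_swar(const uint8_t *buf, size_t len, uint8_t ch) {
    /* Broadcast ch into every byte lane of a 64-bit word. */
    const uint64_t pattern = 0x0101010101010101ULL * ch;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);      /* unaligned-safe load */
        /* XOR turns every byte equal to ch into zero. */
        if (has_zero_byte(w ^ pattern)) break;
    }
    /* Scalar scan: pins down the exact position inside the matching word,
     * and also handles the tail of fewer than 8 bytes. */
    for (; i < len; i++) {
        if (buf[i] == ch) return buf + i;
    }
    return NULL;
}
```

Finishing with a byte-by-byte scan keeps the sketch independent of byte order, at the cost of up to eight extra comparisons once a matching word is found.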