50 Commits

Author SHA1 Message Date
Kovid Goyal 7e12cc57c6 Fix #7245 2024-03-21 20:50:05 +05:30
Kovid Goyal 3bb9e36fc8 ... 2024-03-14 21:00:57 +05:30
Kovid Goyal 76ae5f5b9b DRYer: Use the SIMD detection in setup.py to avoid calling __builtin_cpu_supports 2024-03-14 20:57:09 +05:30
Kovid Goyal a839af04dc Fix #7219 2024-03-14 11:13:54 +05:30
Kovid Goyal 1a9a7a59ac Make XOR64 test also test alignment issues 2024-02-25 09:57:44 +05:30
Kovid Goyal ad3ab877f8 Use a fast SIMD implementation to XOR data going into the disk cache 2024-02-25 09:57:43 +05:30
Kovid Goyal b021e9b648 Do the default func test last so we can see more explicitly what the failure is on 2024-02-25 09:57:42 +05:30
Kovid Goyal d0797a025b Add dedicated tests for find_either_of_two 2024-02-25 09:57:42 +05:30
Kovid Goyal f1fe0bf40a Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal 6cdc7ac91d A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal 9cb9373274 Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal 66341aa28e Make the env var controlling which SIMD level to use more capable 2024-02-25 09:57:38 +05:30
Kovid Goyal 7e77a196e6 Build only the SIMD code with SIMD compiler flags 2024-02-25 09:57:38 +05:30
Kovid Goyal 4b846e0106 Turns out that using 256 bit code on ARM is slightly faster even though it is emulated with 128 bit registers 2024-02-25 09:57:38 +05:30
Kovid Goyal 76c6630084 Don't use 256 bit code paths on ARM
ARM only has 128 bit registers. simde simulates 256 bit operations using
them, which is fairly pointless for us.
2024-02-25 09:57:38 +05:30
Kovid Goyal 23a4012aeb Add an env var to turn off use of SIMD instructions 2024-02-25 09:57:38 +05:30
Kovid Goyal eee14ae148 Workaround for machines on GitHub Actions that incorrectly report CPU vector instruction availability 2024-02-25 09:57:37 +05:30
Kovid Goyal bbaccfdaae DRYer 2024-02-25 09:57:37 +05:30
Kovid Goyal 43f64f71e4 DRYer 2024-02-25 09:57:36 +05:30
Kovid Goyal 0e4c49a0d6 Fix building on macOS ARM 2024-02-25 09:57:35 +05:30
Kovid Goyal b3ca5d51fb Use the new SIMD utf-8 decoder 2024-02-25 09:57:35 +05:30
Kovid Goyal 7e6459a5e4 DRYer 2024-02-25 09:57:35 +05:30
Kovid Goyal 4c8b8caead Handle trailing incomplete sequences 2024-02-25 09:57:34 +05:30
Kovid Goyal 99e67f0859 ... 2024-02-25 09:57:33 +05:30
Kovid Goyal 2cb87861c0 Ensure cpu is inited before calling cpu_supports() 2024-02-25 09:57:33 +05:30
Kovid Goyal 74391d7c50 More work on SIMD utf-8 decode 2024-02-25 09:57:31 +05:30
Kovid Goyal 8975d1a9f4 no need to parametrize sentinel 2024-02-25 09:57:31 +05:30
Kovid Goyal 0ed1c6f840 Simplify utf8 parser func
Also show a replacement char for incomplete utf-8 sequences interrupted by an esc char
2024-02-25 09:57:31 +05:30
Kovid Goyal 95eac2e510 ... 2024-02-25 09:57:31 +05:30
Kovid Goyal bc499000a5 Infrastructure for developing and testing UTF-8 SIMD decode 2024-02-25 09:57:31 +05:30
Kovid Goyal e2be8c2d37 Use unaligned loads for SIMD
makes no difference to the benchmarks and simplifies the code
2024-02-25 09:57:31 +05:30
Kovid Goyal fd4c8e1e2d Get rid of ByteLoader
Doesn't move the benchmarks
2024-02-25 09:57:31 +05:30
Kovid Goyal ba18c5a669 Move ByteLoader back to simd-string.c in preparation for getting rid of it 2024-02-25 09:57:31 +05:30
Kovid Goyal c79baa56e4 Remove unused SIMD code 2024-02-25 09:57:30 +05:30
Kovid Goyal 8742fb8cce Detect availability of intrinsics on intel macs just in case 2024-02-25 09:57:30 +05:30
Kovid Goyal 718f4b328f Go back to a single code path for drawing text
Slightly reduces pure ASCII performance and improves Unicode
performance. We should be able to get pure ASCII performance back
via SIMD eventually.
2024-02-25 09:57:30 +05:30
Kovid Goyal 794bd85371 Ignore warning from simde on clang 2024-02-25 09:57:29 +05:30
Kovid Goyal 49a54b086f Use simde so SIMD speedups work on ARM as well 2024-02-25 09:57:28 +05:30
Kovid Goyal fe2cd543ba Switch to the same algorithm for 128 bit SIMD as used for 256 bit SIMD
Avoids needing to write to the haystack and also leaves less chance of a bug in
the never-tested SIMD path, since all CPUs I have access to have AVX2
2024-02-25 09:57:28 +05:30
Kovid Goyal 1925d5ea65 Prepare for plain sse4 fallback 2024-02-25 09:57:27 +05:30
Kovid Goyal aacdffd539 DRYer 2024-02-25 09:57:27 +05:30
Kovid Goyal a0e1eb4985 AVX2 implementation for find either of two 2024-02-25 09:57:27 +05:30
Kovid Goyal e4c48a5f17 Add AVX2 implementation of find byte not in range
Also fix an alignment bug and ensure the SIMD finders don't return a pointer
beyond the end
2024-02-25 09:57:27 +05:30
Kovid Goyal b032313c45 Only use SIMD if CPU supports it at runtime 2024-02-25 09:57:27 +05:30
Kovid Goyal 19a41b4d9a Use sse4.2 instruction for normal mode printable ascii detection 2024-02-25 09:57:27 +05:30
Kovid Goyal 25e7a2882d Work on using SIMD for normal mode dispatch 2024-02-25 09:57:27 +05:30
Kovid Goyal e3d6aa2c60 Use simd in a few loops 2024-02-25 09:57:27 +05:30
Kovid Goyal 89d416806b ... 2024-02-25 09:57:26 +05:30
Kovid Goyal 200e5bf6e3 Examine 8 bytes at once for terminator char 2024-02-25 09:57:26 +05:30
Kovid Goyal f4819175b0 Start work on vectorizing searches 2024-02-25 09:57:26 +05:30
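
Commit 200e5bf6e3 above ("Examine 8 bytes at once for terminator char") names a technique but the log does not show its code. The following is a minimal sketch of one common way to do this in portable C (a SWAR zero-byte test on 64-bit words), not the repository's implementation. The function name `find_byte_swar` and its signature are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the repository): return the index of the
 * first occurrence of `needle` in haystack[0..len), or len if it is absent. */
static size_t
find_byte_swar(const uint8_t *haystack, size_t len, uint8_t needle) {
    assert(haystack != NULL || len == 0);
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pattern = ones * needle;  /* needle copied into all 8 bytes */
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, haystack + i, sizeof word);  /* safe unaligned load */
        uint64_t x = word ^ pattern;  /* bytes equal to needle become 0x00 */
        /* Non-zero if and only if at least one byte of x is zero. */
        if ((x - ones) & ~x & highs) break;
    }
    /* Scan byte by byte: either the 8-byte word that holds the match, or the
     * tail shorter than 8 bytes. This keeps the result endianness-independent. */
    for (; i < len; i++) {
        if (haystack[i] == needle) return i;
    }
    return len;
}
```

Calling `find_byte_swar(buf, n, 0x1b)` would locate the first ESC byte in `buf`, returning `n` when there is none. The later commits in this log replace this kind of scalar trick with SSE4.2/AVX2 code paths, which test 16 or 32 bytes per iteration instead of 8.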