Kovid Goyal
|
7e12cc57c6
|
Fix #7245
|
2024-03-21 20:50:05 +05:30 |
|
Kovid Goyal
|
3bb9e36fc8
|
...
|
2024-03-14 21:00:57 +05:30 |
|
Kovid Goyal
|
76ae5f5b9b
|
DRYer: Use the SIMD detection in setup.py to avoid calling __builtin_cpu_supports
|
2024-03-14 20:57:09 +05:30 |
|
Kovid Goyal
|
a839af04dc
|
Fix #7219
|
2024-03-14 11:13:54 +05:30 |
|
Kovid Goyal
|
1a9a7a59ac
|
Make XOR64 test also test alignment issues
|
2024-02-25 09:57:44 +05:30 |
|
Kovid Goyal
|
ad3ab877f8
|
Use a fast SIMD implementation to XOR data going into the disk cache
|
2024-02-25 09:57:43 +05:30 |
|
Kovid Goyal
|
b021e9b648
|
Do the default func test last so we can see what the failure is on more explicitly
|
2024-02-25 09:57:42 +05:30 |
|
Kovid Goyal
|
d0797a025b
|
Add dedicated tests for find_either_of_two
|
2024-02-25 09:57:42 +05:30 |
|
Kovid Goyal
|
f1fe0bf40a
|
Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
|
2024-02-25 09:57:41 +05:30 |
|
Kovid Goyal
|
6cdc7ac91d
|
A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
|
2024-02-25 09:57:40 +05:30 |
|
Kovid Goyal
|
9cb9373274
|
Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
|
2024-02-25 09:57:39 +05:30 |
|
Kovid Goyal
|
66341aa28e
|
Make the env var controlling which SIMD level to use more capable
|
2024-02-25 09:57:38 +05:30 |
|
Kovid Goyal
|
7e77a196e6
|
Build only the SIMD code with SIMD compiler flags
|
2024-02-25 09:57:38 +05:30 |
|
Kovid Goyal
|
4b846e0106
|
Turns out that using 256 bit code on ARM is slightly faster even though it is emulated with 128 bit registers
|
2024-02-25 09:57:38 +05:30 |
|
Kovid Goyal
|
76c6630084
|
Don't use 256 bit code paths on ARM
ARM only has 128 bit registers. simde simulates 256 bit operations using
them, which is fairly pointless for us.
|
2024-02-25 09:57:38 +05:30 |
|
Kovid Goyal
|
23a4012aeb
|
Add an env var to turn off use of SIMD instructions
|
2024-02-25 09:57:38 +05:30 |
|
Kovid Goyal
|
eee14ae148
|
Workaround for machines on GitHub Actions that incorrectly report CPU vector instruction availability
|
2024-02-25 09:57:37 +05:30 |
|
Kovid Goyal
|
bbaccfdaae
|
DRYer
|
2024-02-25 09:57:37 +05:30 |
|
Kovid Goyal
|
43f64f71e4
|
DRYer
|
2024-02-25 09:57:36 +05:30 |
|
Kovid Goyal
|
0e4c49a0d6
|
Fix building on macOS ARM
|
2024-02-25 09:57:35 +05:30 |
|
Kovid Goyal
|
b3ca5d51fb
|
Use the new SIMD utf-8 decoder
|
2024-02-25 09:57:35 +05:30 |
|
Kovid Goyal
|
7e6459a5e4
|
DRYer
|
2024-02-25 09:57:35 +05:30 |
|
Kovid Goyal
|
4c8b8caead
|
Handle trailing incomplete sequences
|
2024-02-25 09:57:34 +05:30 |
|
Kovid Goyal
|
99e67f0859
|
...
|
2024-02-25 09:57:33 +05:30 |
|
Kovid Goyal
|
2cb87861c0
|
Ensure cpu is inited before calling cpu_supports()
|
2024-02-25 09:57:33 +05:30 |
|
Kovid Goyal
|
74391d7c50
|
More work on SIMD utf-8 decode
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
8975d1a9f4
|
no need to parametrize sentinel
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
0ed1c6f840
|
Simplify utf8 parser func
Also show a replacement char for incomplete utf-8 sequences interrupted by an esc char
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
95eac2e510
|
...
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
bc499000a5
|
Infrastructure for developing and testing UTF-8 SIMD decode
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
e2be8c2d37
|
Use unaligned loads for SIMD
makes no difference to the benchmarks and simplifies the code
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
fd4c8e1e2d
|
Get rid of ByteLoader
Doesn't move the benchmarks
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
ba18c5a669
|
Move ByteLoader back to simd-string.c in preparation for getting rid of it
|
2024-02-25 09:57:31 +05:30 |
|
Kovid Goyal
|
c79baa56e4
|
Remove unused SIMD code
|
2024-02-25 09:57:30 +05:30 |
|
Kovid Goyal
|
8742fb8cce
|
Detect availability of intrinsics on intel macs just in case
|
2024-02-25 09:57:30 +05:30 |
|
Kovid Goyal
|
718f4b328f
|
Go back to a single code path for drawing text
Slightly reduces pure ASCII performance and improves Unicode
performance. We should be able to get pure ASCII performance back
via SIMD eventually.
|
2024-02-25 09:57:30 +05:30 |
|
Kovid Goyal
|
794bd85371
|
Ignore warning from simde on clang
|
2024-02-25 09:57:29 +05:30 |
|
Kovid Goyal
|
49a54b086f
|
Use simde so SIMD speedups work on ARM as well
|
2024-02-25 09:57:28 +05:30 |
|
Kovid Goyal
|
fe2cd543ba
|
Switch to same algorithm for 128bit SIMD as used for 256 bit SIMD
Avoids needing to write to the haystack and also less chance of a bug in
the never tested simd since all CPUs I have access to have AVX2
|
2024-02-25 09:57:28 +05:30 |
|
Kovid Goyal
|
1925d5ea65
|
Prepare for plain sse4 fallback
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
aacdffd539
|
DRYer
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
a0e1eb4985
|
AVX2 implementation for find either of two
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
e4c48a5f17
|
Add AVX2 implementation of find byte not in range
Also fix alignment bug and ensure the SIMD finders don't return a pointer
beyond the end
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
b032313c45
|
Only use SIMD if CPU supports it at runtime
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
19a41b4d9a
|
Use sse4.2 instruction for normal mode printable ascii detection
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
25e7a2882d
|
Work on using SIMD for normal mode dispatch
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
e3d6aa2c60
|
Use simd in a few loops
|
2024-02-25 09:57:27 +05:30 |
|
Kovid Goyal
|
89d416806b
|
...
|
2024-02-25 09:57:26 +05:30 |
|
Kovid Goyal
|
200e5bf6e3
|
Examine 8 bytes at once for terminator char
|
2024-02-25 09:57:26 +05:30 |
|
Kovid Goyal
|
f4819175b0
|
Start work on vectorizing searches
|
2024-02-25 09:57:26 +05:30 |
|