32 Commits

Author SHA1 Message Date
Kovid Goyal 32f0da2e77 Ensure no frame is created for assembly functions 2024-03-15 07:58:09 +05:30
Kovid Goyal 65923b1aba Add some benchmarking 2024-03-07 11:09:24 +05:30
Kovid Goyal 47fea26b62 Add an IndexByte implementation useful for benchmarking against stdlib SIMD implementation 2024-03-07 09:36:40 +05:30
Kovid Goyal a7c06b38e6 We don't actually need vzeroupper at start of function
GCC emits vzeroupper automatically when compiling with native
optimizations but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal 720618bc37 Use go 1.22 for building
It supports PCALIGN on non-ARM architectures as well
2024-02-25 09:57:43 +05:30
Kovid Goyal ede4d7fbca ... 2024-02-25 09:57:42 +05:30
Kovid Goyal c01b959723 Fix Go unaligned index implementation 2024-02-25 09:57:42 +05:30
Kovid Goyal 7467307200 Add some alignment tests 2024-02-25 09:57:42 +05:30
Kovid Goyal bbdb0b15f3 DRYer 2024-02-25 09:57:42 +05:30
Kovid Goyal b5edd9ad57 Don't precalculate mask in loop body
No need since we don't shift. Avoids the extra mask instructions for the
not-found case.
2024-02-25 09:57:42 +05:30
Kovid Goyal f9fd6ffd46 Use only aligned loads for index funcs
Also obviates the necessity for safe slice wrappers
2024-02-25 09:57:41 +05:30
Kovid Goyal 31a5fcf297 DRYer 2024-02-25 09:57:41 +05:30
Kovid Goyal 561712090d Fix cmplt implementation 2024-02-25 09:57:41 +05:30
Kovid Goyal d9190ea675 DRYer 2024-02-25 09:57:41 +05:30
Kovid Goyal 57f4ea4d4a Add some tests for broadcast from constant intrinsic 2024-02-25 09:57:41 +05:30
Kovid Goyal 9b0ae8d403 Don't use VEX-encoded instructions for 128-bit ISA 2024-02-25 09:57:41 +05:30
Kovid Goyal aed0611fb8 Avoid double trailing RET 2024-02-25 09:57:40 +05:30
Kovid Goyal 5a5e31c38b Also zero upper at start of function 2024-02-25 09:57:40 +05:30
Kovid Goyal db2e0e816d Fix mixing of register types in the same function 2024-02-25 09:57:40 +05:30
Kovid Goyal a298781b85 DRYer 2024-02-25 09:57:40 +05:30
Kovid Goyal d5cd9ef2ca ... 2024-02-25 09:57:40 +05:30
Kovid Goyal da31db3212 ... 2024-02-25 09:57:40 +05:30
Kovid Goyal 601c4ad4df Fix some typos 2024-02-25 09:57:40 +05:30
Kovid Goyal 68d800d4fa make clean should clean generated asm as well 2024-02-25 09:57:40 +05:30
Kovid Goyal 9fc3db1dd1 Work on C0 index func 2024-02-25 09:57:40 +05:30
Kovid Goyal 161eae78b6 Make generated asm_* files world readable 2024-02-25 09:57:40 +05:30
Kovid Goyal 77cfd44f24 More efficient clearing of register to all zeros or all ones 2024-02-25 09:57:39 +05:30
Kovid Goyal 59be7213cf Make set1_epi8 more general 2024-02-25 09:57:39 +05:30
Kovid Goyal d60dacbd09 Implement > and < intrinsics for vector registers 2024-02-25 09:57:39 +05:30
Kovid Goyal 82b7b4fcce Make a reusable template for generating ASM index functions with different tests 2024-02-25 09:57:39 +05:30
Kovid Goyal 4e6138d785 Generate SIMD code during build 2024-02-25 09:57:39 +05:30
Kovid Goyal de8c1e0206 Work on porting SIMD VT parser to Go for the kittens 2024-02-25 09:57:39 +05:30