
Add SVE2 optimization for Fill32f #476

Merged
ermig1979 merged 1 commit into dev from cursor/sve2-fill32f-3226 on Jun 18, 2026

@ermig1979
Owner

Summary

  • Add Simd::Sve2::Fill32f, which uses SVE2/SVE predicated stores for both the aligned main body and the tail.
  • Route SimdFill32f through SVE2 before NEON when available.
  • Extend Fill32fAutoTest to cover SVE2 and register the new source file in the VS2022 SVE2 project files.
  • Document the SVE2 Fill32f optimization in the 7.2.163 release notes.
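The predicated body/tail scheme from the first bullet can be sketched roughly as follows. This is a minimal illustration using ACLE SVE intrinsics from `<arm_sve.h>`, not the code merged in this PR: `FillF32Sketch` and `CheckFillSketch` are hypothetical names, alignment of the body is not handled, and a plain scalar loop stands in when the compiler is not targeting SVE (`__ARM_FEATURE_SVE` undefined). SVE2 builds on SVE, so these SVE intrinsics are also available on SVE2 targets.

```cpp
#include <cassert>  // for assert() in the usage checks
#include <cstddef>
#include <vector>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper illustrating a predicated SVE fill; NOT Simd::Sve2::Fill32f.
// Writes `value` to dst[0..size).
void FillF32Sketch(float* dst, size_t size, float value)
{
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t v = svdup_n_f32(value);      // broadcast value to every lane
    const size_t step = svcntw();            // 32-bit lanes per vector (runtime VL)
    const size_t body = size - size % step;  // largest multiple of step <= size
    svbool_t all = svptrue_b32();            // every lane active
    size_t i = 0;
    for (; i < body; i += step)
        svst1_f32(all, dst + i, v);          // full-vector store
    // Tail: lane k is active only while i + k < size, so nothing is written
    // past dst[size - 1]. With an all-false predicate the store is a no-op.
    svbool_t tail = svwhilelt_b32_u64(i, size);
    svst1_f32(tail, dst + i, v);
#else
    // Portable scalar fallback so the sketch also builds on non-SVE targets.
    for (size_t i = 0; i < size; ++i)
        dst[i] = value;
#endif
}

// Hypothetical check: fills `size` floats followed by one sentinel slot, then
// verifies every target element was written and the sentinel was untouched.
bool CheckFillSketch(size_t size, float value)
{
    const float sentinel = -1.0f;
    std::vector<float> buf(size + 1, sentinel);
    FillF32Sketch(buf.data(), size, value);
    for (size_t i = 0; i < size; ++i)
        if (buf[i] != value)
            return false;
    return buf[size] == sentinel;
}
```

For example, `assert(CheckFillSketch(37, 2.5f));` checks a length that is not a multiple of any possible SVE float lane count (always a multiple of 4), so the predicated tail stores a partial vector regardless of the hardware vector length. A single fully predicated loop (`svwhilelt` on every iteration) would also work; splitting off an all-true body keeps the predicate computation out of the hot loop.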

Testing

  • cmake ./prj/cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc -DSIMD_TOOLCHAIN="g++" -DSIMD_TARGET="" -DSIMD_AVX512VNNI=ON -DSIMD_AMXBF16=ON -DSIMD_TEST_FLAGS="-march=native" -DSIMD_SHARED=ON
  • cmake --build build --parallel $(nproc)
  • export LD_LIBRARY_PATH="/workspace/build:$LD_LIBRARY_PATH" && ./Test "-r=.." -fi=Fill32f -tt=1 -ts=1
  • aarch64-linux-gnu-g++ -march=armv9-a+sve2 -msve-vector-bits=scalable -DSIMD_STATIC -I./src -c src/Simd/SimdSve2Fill.cpp -o /tmp/SimdSve2Fill.o
  • aarch64-linux-gnu-g++ -march=armv9-a+sve2 -msve-vector-bits=scalable -DSIMD_STATIC -I./src -c src/Simd/SimdLib.cpp -o /tmp/SimdLib.o
  • aarch64-linux-gnu-g++ -march=armv9-a+sve2 -msve-vector-bits=scalable -I./src -D_GLIBCXX_USE_NANOSLEEP -c src/Test/TestFill.cpp -o /tmp/TestFill.o

Co-authored-by: Ihar Yermalayeu <ermig1979@gmail.com>
@ermig1979 ermig1979 marked this pull request as ready for review June 18, 2026 12:30
@ermig1979 ermig1979 merged commit 878a54e into dev Jun 18, 2026
1 check passed