perf(avx512vl): native EVEX int64 shift, signed rotr, compress/expand by DiamonDinoia · Pull Request #1358 · xtensor-stack/xsimd

DiamonDinoia · 2026-06-02T18:21:19Z

avx512vl_{128,256} inherit the AVX2/SSE codegen for any kernel they do not override. This PR adds three pure-VL overrides (no DQ/CD, no VBMI2), each lowering to a single EVEX instruction:

- signed int64 >>: vpsraq/vpsravq (previously a ~5–10-instruction emulation); unsigned and 32-bit shifts keep the inherited path.
- signed rotr: vprorvq/vprolq (drops the is_unsigned guard so that rotr mirrors rotl).
- compress/expand: VPCOMPRESSD/Q, VCOMPRESSPS/PD and the corresponding VPEXPAND* forms, covering 32/64-bit int plus float/double.
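For reviewers who want the lane semantics of the first two items spelled out, here is a minimal scalar sketch. It is not the xsimd code from this PR: `sra64_emulated` and `rotr64_signed` are hypothetical names, and the emulation shown is only one plausible way to build an arithmetic shift from logical shifts, not necessarily the inherited AVX2/SSE sequence. On AVX512VL hardware, intrinsics such as `_mm256_srai_epi64` / `_mm256_srav_epi64` (vpsraq/vpsravq) and `_mm256_rorv_epi64` (vprorvq) should produce the same per-lane results in one instruction.

```cpp
#include <cstdint>

// Hypothetical helper: arithmetic right shift of one signed 64-bit lane,
// built from logical shifts only. Assumes 0 <= n <= 63.
inline std::int64_t sra64_emulated(std::int64_t x, unsigned n)
{
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    const std::uint64_t logical = ux >> n;                    // zero-filled shift
    const std::uint64_t sign = std::uint64_t{0} - (ux >> 63); // all ones if x < 0
    // Replicate the sign into the top n bits; n == 0 is special-cased
    // because shifting a 64-bit value by 64 is undefined behaviour.
    const std::uint64_t fill = (n == 0) ? std::uint64_t{0} : (sign << (64u - n));
    return static_cast<std::int64_t>(logical | fill);
}

// Hypothetical helper: rotate right on a signed 64-bit lane. The rotation is
// a pure bit permutation, so signedness only affects how the result is
// reinterpreted, which is why no is_unsigned guard is needed.
inline std::int64_t rotr64_signed(std::int64_t x, unsigned n)
{
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    n &= 63u;                                                 // rotate count modulo 64
    const std::uint64_t r = (ux >> n) | (ux << ((64u - n) & 63u));
    return static_cast<std::int64_t>(r);
}
```

The extra sign-replication steps in `sra64_emulated` give a rough idea of why a path without a native 64-bit arithmetic shift costs several instructions per vector.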

8/16-bit compress/expand (VPCOMPRESSB/VPCOMPRESSW) require AVX512_VBMI2 and are intentionally deferred: here they fall through to the common{} fallback, and they will be added later alongside new avx512vbmi2_{128,256} arch types.
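Whichever path an element width takes (the EVEX forms for 32/64-bit, the common{} fallback for 8/16-bit), the observable result should be the same. A minimal scalar reference model, assuming the zero-filling compress/expand semantics and at most 64 lanes; `compress_ref` and `expand_ref` are hypothetical names, not xsimd API:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Compress: pack the lanes whose mask bit is set into the low lanes of the
// result, preserving order; the remaining lanes are zero. Assumes N <= 64.
template <class T, std::size_t N>
std::array<T, N> compress_ref(const std::array<T, N>& v, std::uint64_t mask)
{
    std::array<T, N> out{}; // value-initialised, so unselected lanes are zero
    std::size_t k = 0;
    for (std::size_t i = 0; i < N; ++i)
        if ((mask >> i) & 1u)
            out[k++] = v[i];
    return out;
}

// Expand: the inverse layout. Consecutive low lanes of v are placed into the
// lanes whose mask bit is set; all other lanes are zero. Assumes N <= 64.
template <class T, std::size_t N>
std::array<T, N> expand_ref(const std::array<T, N>& v, std::uint64_t mask)
{
    std::array<T, N> out{};
    std::size_t k = 0;
    for (std::size_t i = 0; i < N; ++i)
        if ((mask >> i) & 1u)
            out[i] = v[k++];
    return out;
}
```

Because the model is templated on the element type, the same check can be run against `std::int8_t` or `std::int16_t` lanes to confirm the deferred widths still behave correctly through the fallback.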

Depends on #1357 (its swizzle fix unblocks the avx512vl_256 test build).

Add pure-VL overrides so these stop falling back to AVX2/SSE:
- int64 signed >> -> vpsraq/vpsravq (unsigned/32-bit unchanged)
- signed rotr -> vprorvq/vprolq (drop is_unsigned guard, mirror rotl)
- compress/expand -> EVEX forms; 8/16-bit fall through to common{}

DiamonDinoia marked this pull request as draft June 2, 2026 18:23

DiamonDinoia wants to merge 1 commit into xtensor-stack:master from DiamonDinoia:perf/avx512vl-pure-wins
