perf(avx512vl): native EVEX int64 shift, signed rotr, compress/expand #1358

Draft
DiamonDinoia wants to merge 1 commit into xtensor-stack:master from DiamonDinoia:perf/avx512vl-pure-wins
@DiamonDinoia DiamonDinoia commented Jun 2, 2026

avx512vl_{128,256} inherit AVX2/SSE codegen for any kernel they do not override. This PR adds three pure-VL overrides (no DQ/CD, no VBMI2 required), each lowering to a single EVEX instruction:

  • int64 >> (signed): vpsraq/vpsravq (was ~5–10-instr emulation); unsigned/32-bit keep the inherited path.
  • signed rotr: vprorvq/vprolq (dropped the is_unsigned guard to mirror rotl).
  • compress/expand: VPCOMPRESSD/Q and VCOMPRESSPS/PD, plus the matching VPEXPANDD/Q and VEXPANDPS/PD forms, for 32/64-bit int and float/double.
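
For reviewers, a scalar sketch of what the first two overrides compute per 64-bit lane may help. It is not code from this PR: `sra64_emulated` and `rotr64` are hypothetical names, and the emulation shown is only one plausible shape of a multi-step fallback, not necessarily the sequence xsimd inherits from AVX2/SSE. It also illustrates why a rotate is indifferent to signedness, which is presumably what makes dropping the `is_unsigned` guard safe.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of one 64-bit lane; not xsimd API.

// Arithmetic (sign-filling) right shift, built from unsigned operations only.
// A native 64-bit arithmetic shift produces this result in one instruction.
// Assumes n < 64. The final cast relies on two's-complement conversion
// (guaranteed since C++20, universal in practice before that).
inline std::int64_t sra64_emulated(std::int64_t x, unsigned n)
{
    assert(n < 64);
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    const std::uint64_t sign = std::uint64_t{0} - (ux >> 63); // all ones if x < 0
    const std::uint64_t logical = ux >> n;                    // zero-filling shift
    const std::uint64_t fill = (n == 0) ? 0 : (sign << (64 - n)); // top n bits
    return static_cast<std::int64_t>(logical | fill);
}

// Rotate right: a pure bit-pattern operation, so the signed version is the
// unsigned rotate applied to the same bits.
inline std::int64_t rotr64(std::int64_t x, unsigned n)
{
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    n &= 63;
    const std::uint64_t r = (ux >> n) | (ux << ((64 - n) & 63));
    return static_cast<std::int64_t>(r);
}
```

Models like these are handy as test oracles when comparing the new EVEX path against the inherited one.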

8/16-bit compress/expand (VPCOMPRESSB/VPCOMPRESSW) require AVX512_VBMI2 and are intentionally deferred — they fall through to the common{} fallback here and will be added later alongside new avx512vbmi2_{128,256} arch types.
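
As a reference for the semantics that both the EVEX forms and any generic fallback have to reproduce, here is a minimal scalar model of compress and expand. It is a sketch under assumptions: `compress_ref` and `expand_ref` are hypothetical helpers, not xsimd functions, and the model zero-fills unselected lanes (the zero-masking behaviour), which may differ from how a given kernel treats the tail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference helpers; bit i of `mask` controls lane i.

// compress: pack the selected lanes contiguously at the low end, zero the rest.
template <class T, std::size_t N>
std::array<T, N> compress_ref(const std::array<T, N>& v, std::uint32_t mask)
{
    static_assert(N <= 32, "mask holds at most 32 lanes");
    std::array<T, N> out{}; // value-initialized: unselected lanes stay zero
    std::size_t j = 0;
    for (std::size_t i = 0; i < N; ++i)
        if ((mask >> i) & 1u)
            out[j++] = v[i];
    return out;
}

// expand: the inverse scatter; consecutive low lanes of v are placed at the
// selected positions, and unselected positions are zero.
template <class T, std::size_t N>
std::array<T, N> expand_ref(const std::array<T, N>& v, std::uint32_t mask)
{
    static_assert(N <= 32, "mask holds at most 32 lanes");
    std::array<T, N> out{};
    std::size_t j = 0;
    for (std::size_t i = 0; i < N; ++i)
        if ((mask >> i) & 1u)
            out[i] = v[j++];
    return out;
}
```

Because the model is generic over the element type, the same oracle covers the 8/16-bit cases that currently take the fallback path.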

Depends on #1357 (its swizzle fix unblocks the avx512vl_256 test build).

Add pure-VL overrides so these stop falling back to AVX2/SSE:
- int64 signed >> -> vpsraq/vpsravq (unsigned/32-bit unchanged)
- signed rotr -> vprorvq/vprolq (drop is_unsigned guard, mirror rotl)
- compress/expand -> EVEX forms; 8/16-bit fall through to common{}
@DiamonDinoia DiamonDinoia marked this pull request as draft June 2, 2026 18:23