Open
18 commits
d93e4e4
feat: enable native mxfp8 moe for minimax m3 mi300x
Oseltamivir Jun 14, 2026
6b70497
chore: trigger MiniMax M3 MI300X MXFP8 sweep
Oseltamivir Jun 14, 2026
980f9c8
Merge branch 'main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 14, 2026
e9fa9b7
perf: tune MiniMax M3 gfx942 MXFP8 tiles
Oseltamivir Jun 14, 2026
7f159d3
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 14, 2026
c3cdc37
perf: recover MiniMax M3 MI300X serving curve
Oseltamivir Jun 14, 2026
60a0002
fix: rebuild MI300X patch from pinned vLLM
Oseltamivir Jun 14, 2026
33584f9
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 14, 2026
2bfc584
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 14, 2026
a38f5ab
perf: update MiniMax M3 MI300X MXFP8 patch
Oseltamivir Jun 14, 2026
7678b0b
perf(mi300x): pack MiniMax M3 MXFP8 scales
Oseltamivir Jun 14, 2026
684b6a3
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 14, 2026
280c030
perf(mi300x): tune MiniMax M3 MXFP8 refill dispatch
Oseltamivir Jun 15, 2026
ba30da1
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 15, 2026
23925cc
perf(mi300x): tune short-k MXFP8 MoE GEMM2
Oseltamivir Jun 15, 2026
dd871ac
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 15, 2026
1e3bfdd
fix(benchmarks): fail if MI300X patch is not applied
Oseltamivir Jun 15, 2026
d1638a0
Merge remote-tracking branch 'origin/main' into feat/m3-mi300x-mxfp8
Oseltamivir Jun 15, 2026
9 changes: 5 additions & 4 deletions .github/configs/amd-master.yaml
@@ -2847,10 +2847,11 @@ minimaxm3-fp8-mi355x-vllm-mtp:
- { tp: 4, conc-start: 1, conc-end: 64, spec-decoding: mtp }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 128, conc-end: 256, spec-decoding: mtp }

-# MiniMax-M3 MXFP8 MI300X day-zero recipe. Reuse the dedicated ROCm image and
-# MI355X serving shape, but retain the default BF16 KV cache because this
-# checkpoint lacks calibrated ROCm FP8 attention scales. Use the TP8-only H100
-# search space: TP8 for latency and TP8+EP8 (TEP) at high concurrency.
+# MiniMax-M3 MXFP8 MI300X recipe. Apply the checked-in hybrid gfx94x MXFP8 MoE
+# patch to the dedicated ROCm image: BF16 for small TP batches and EP, native
+# compressed MXFP8 for larger TP batches and long context. Retain the default
+# BF16 KV cache because this checkpoint lacks calibrated ROCm FP8 attention
+# scales. Use TP8 for latency and TP8+EP8 at high concurrency.
minimaxm3-fp8-mi300x-vllm:
image: vllm/vllm-openai-rocm:minimax-m3
model: MiniMaxAI/MiniMax-M3-MXFP8
32 changes: 28 additions & 4 deletions benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi300x.sh
@@ -1,10 +1,12 @@
#!/usr/bin/env bash

# MiniMax-M3 MXFP8 MI300X (gfx942) single-node vLLM recipe.
-# Reuses the dedicated ROCm image and the MI355X serving shape. Block size 128
-# is mandatory for MSA sparse attention. Keep the default BF16 KV cache on
-# gfx942: the checkpoint has no calibrated q/prob scales for ROCm FP8
-# attention, and vLLM's fallback scale of 1.0 corrupts model accuracy.
+# Reuses the dedicated ROCm image and applies the checked-in hybrid gfx94x
+# MXFP8 MoE patch before starting vLLM. Block size 128 is mandatory for MSA
+# sparse attention. Keep the default BF16 KV cache on gfx942: the checkpoint
+# has no calibrated q/prob scales for ROCm FP8 attention, and vLLM's fallback
+# scale of 1.0 corrupts model accuracy.
+# Target image vLLM revision: 4a560dd8db67c270f5e2afb614558271b76f2294.

source "$(dirname "$0")/../../benchmark_lib.sh"

@@ -24,6 +26,28 @@ if [[ -n "$SLURM_JOB_ID" ]]; then
echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

+VLLM_PACKAGE_ROOT="$(
+python - <<'PY'
+from pathlib import Path
+
+import vllm
+
+print(Path(vllm.__file__).resolve().parent.parent)
+PY
+)"
+MXFP8_PATCH="$(dirname "$0")/minimaxm3_mi300x_mxfp8.patch"
+MXFP8_ORACLE="$VLLM_PACKAGE_ROOT/vllm/model_executor/layers/fused_moe/oracle/mxfp8.py"
+if ! grep -q "Using fused CDNA3 (gfx94x)" "$MXFP8_ORACLE"; then
+  if ! patch --batch --forward -d "$VLLM_PACKAGE_ROOT" -p1 < "$MXFP8_PATCH"; then
+    echo "Failed to apply the MI300X MXFP8 patch" >&2
+    exit 1
+  fi
+fi
+if ! grep -q "Using fused CDNA3 (gfx94x)" "$MXFP8_ORACLE"; then
+  echo "MI300X MXFP8 backend marker is missing after patching" >&2
+  exit 1
+fi
cursor[bot] marked this conversation as resolved.

MTP script skips MXFP8 patch

Medium Severity

Runtime MXFP8 patching was added only to the non-MTP MI300X benchmark script. launch_mi300x-amds.sh runs minimaxm3_fp8_mi300x_mtp.sh for spec-decoding: mtp configs, so those jobs never apply minimaxm3_mi300x_mxfp8.patch despite the MTP script claiming it mirrors this recipe.

Reviewed by Cursor Bugbot for commit c3cdc37.


if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi

if [ -n "$ROCR_VISIBLE_DEVICES" ]; then
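The guard added to minimaxm3_fp8_mi300x.sh follows a check, patch, re-check sequence, and the review note "MTP script skips MXFP8 patch" points out that minimaxm3_fp8_mi300x_mtp.sh never runs it. One possible way to share that logic is a small helper that both scripts source. The sketch below is hypothetical and not part of this PR: the name `ensure_patched` and its argument order are assumptions, it matches the marker as a fixed string (`grep -F`), and it relies only on the `patch --batch --forward -d ... -p1` invocation already used in the script.

```shell
# Hypothetical helper (not from the PR): apply a patch at most once and
# verify that a marker string proves it took effect.
# Usage: ensure_patched <root_dir> <patch_file> <marker_file> <marker_text>
ensure_patched() {
  local root="$1" patch_file="$2" marker_file="$3" marker="$4"

  # Marker already present: the tree is patched, so reruns are no-ops.
  if grep -qF "$marker" "$marker_file" 2>/dev/null; then
    return 0
  fi

  # --batch avoids interactive prompts; --forward refuses to reverse hunks
  # that look already applied. A missing patch file also lands here.
  if ! patch --batch --forward -d "$root" -p1 < "$patch_file"; then
    echo "Failed to apply $patch_file" >&2
    return 1
  fi

  # Post-condition: the patch must have introduced the marker.
  if ! grep -qF "$marker" "$marker_file"; then
    echo "Marker '$marker' missing after patching $marker_file" >&2
    return 1
  fi
}
```

For example, the MTP script could source such a helper and call `ensure_patched "$VLLM_PACKAGE_ROOT" "$MXFP8_PATCH" "$MXFP8_ORACLE" "Using fused CDNA3 (gfx94x)" || exit 1`, so both recipes fail the same way when the patch does not apply.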