[Klaud Cold] minimaxm3-fp8-mi325x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI325X recipe#1759

Merged: functionstackx merged 3 commits into main from feat/minimax-m3-mi325-mtp-dayzero on Jun 14, 2026

@functionstackx functionstackx commented Jun 14, 2026

Summary

Adds the EAGLE3 (spec-decoding: mtp) sibling of minimaxm3-fp8-mi325x-vllm (#1748): MiniMax-M3 MXFP8 on MI325X (gfx942) single-node vLLM (ROCm), pairing MiniMaxAI/MiniMax-M3-MXFP8 with the Inferact/MiniMax-M3-EAGLE3 draft. Based on the MI325X non-MTP recipe + the MI300X MTP recipe.

New script minimaxm3_fp8_mi325x_mtp.sh

Config + launcher

  • minimaxm3-fp8-mi325x-vllm-mtp in amd-master.yaml: H200-style search space (TP4/TP8 latency, TP4+EP4/TP8+EP8 TEP, TP8+EP8 dp-attn DEP), trimmed at the high-conc end, latency rows at conc 1.
  • launch_mi325x-amds.sh had no SPEC_SUFFIX (an mtp config would run the non-MTP script); added it so spec-decoding mtp routes to the _mtp.sh script.
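
A minimal sketch of the suffix routing described above, not the launcher's actual code. Only SPEC_SUFFIX, SPEC_DECODING, and the `_mtp` script name come from this PR; the helper functions `spec_suffix` and `benchmark_script`, and the non-MTP script name `minimaxm3_fp8_mi325x.sh`, are assumptions for illustration.

```shell
# Hypothetical helper: map the spec-decoding mode to a script-name suffix.
spec_suffix() {
  case "${1:-none}" in
    mtp) printf '%s' "_mtp" ;;   # mtp configs get the _mtp script
    *)   printf '%s' "" ;;       # everything else keeps the base script
  esac
}

# Hypothetical helper: compose the benchmark script name from a base name and mode.
benchmark_script() {
  base="$1"
  mode="${2:-none}"
  printf '%s%s.sh' "$base" "$(spec_suffix "$mode")"
}

# SPEC_DECODING is assumed to be exported by the workflow; default to no spec decoding.
SPEC_DECODING="${SPEC_DECODING:-none}"
SPEC_SUFFIX="$(spec_suffix "$SPEC_DECODING")"
echo "would run: $(benchmark_script minimaxm3_fp8_mi325x "$SPEC_DECODING")"
```

Without the suffix, both modes resolve to the same base script, which is the misrouting the bullet describes.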

Validation

  • 36 configs generate (all mtp/mi325x, conc-1, max-model-len 2304/9472); bash -n clean on script + launcher; embedded patch dry-runs cleanly against the image's amd/model.py; routing simulated.

Like the other ROCm MTP PRs, this is a validation harness (runtime monkey-patch); once the upstream fix is in a rebuilt image, the patch idempotently no-ops.

🤖 Generated with Claude Code


Note

Medium Risk
Benchmark-only changes, but the job mutates installed vLLM source at runtime; patch drift or a bad apply could fail jobs or skew results until upstream ships the fix in the image.

Overview
Adds EAGLE3 speculative decoding (spec-decoding: mtp) for MiniMax-M3 MXFP8 on MI325X, as the MTP sibling of the existing non-MTP MI325X recipe.

Registers minimaxm3-fp8-mi325x-vllm-mtp in amd-master.yaml with an H200-style search space (TP4/TP8 latency from conc 1, EP and dp-attn rows trimmed vs the base config). Documents the change in perf-changelog.yaml.

Introduces minimaxm3_fp8_mi325x_mtp.sh, which serves with Inferact/MiniMax-M3-EAGLE3 (3 speculative tokens), CUDA graphs (VLLM_USE_BREAKABLE_CUDAGRAPH=0), and --use-chat-template for benchmarks. Because the shipped ROCm image’s AMD model lacks SupportsEagle3, the script applies an idempotent in-place patch to vllm’s amd/model.py before vllm serve (fails if anchors drift).
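
The "idempotent, fails if anchors drift" behavior can be sketched as below. This is an illustration of the pattern only, not the patch embedded in the recipe: the function `apply_patch_once`, the marker `EAGLE3_PATCH`, the anchor string, and the demo file are all hypothetical, and the real patch targets vLLM's amd/model.py with its own anchors.

```shell
# Hypothetical helper: insert one line after an anchor, at most once.
# Usage: apply_patch_once FILE ANCHOR MARKER ADDITION (ADDITION must contain MARKER).
apply_patch_once() {
  file="$1"; anchor="$2"; marker="$3"; addition="$4"

  # Marker already present (patched earlier, or the fix shipped): no-op.
  if grep -qF -- "$marker" "$file"; then
    echo "already patched: $file"
    return 0
  fi

  # Anchor missing means the file drifted; fail instead of guessing.
  if ! grep -qF -- "$anchor" "$file"; then
    echo "anchor not found in $file; aborting" >&2
    return 1
  fi

  # Print every line; after the first line containing the anchor, print the addition.
  ANCHOR="$anchor" ADDITION="$addition" awk '
    { print }
    !done && index($0, ENVIRON["ANCHOR"]) { print ENVIRON["ADDITION"]; done = 1 }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  echo "patched: $file"
}

# Demo on a throwaway file; the second call finds the marker and does nothing.
demo="$(mktemp)"
printf 'class Model:\n    pass\n' > "$demo"
apply_patch_once "$demo" "class Model:" "EAGLE3_PATCH" "    # EAGLE3_PATCH (placeholder line)"
apply_patch_once "$demo" "class Model:" "EAGLE3_PATCH" "    # EAGLE3_PATCH (placeholder line)"
```

Checking the marker before the anchor is what lets the same step no-op once a rebuilt image already contains the change.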

Updates launch_mi325x-amds.sh with SPEC_SUFFIX so SPEC_DECODING=mtp runs the _mtp script, matching H200 launchers.

Reviewed by Cursor Bugbot for commit 36ada34.

Adds the spec-decoding=mtp sibling of minimaxm3-fp8-mi325x-vllm (#1748),
based on the MI325X non-MTP recipe + the MI300X MTP recipe. gfx942 serve
shape (BF16 KV cache, --no-enable-prefix-caching, TRITON_ATTN, minimax_m3
parsers), runs with CUDA graphs (no --enforce-eager,
VLLM_USE_BREAKABLE_CUDAGRAPH=0), plus the Inferact/MiniMax-M3-EAGLE3 draft
via --speculative-config (eagle3, 3 tokens) + chat-template prompts.
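
As a rough sketch of how the draft model is wired in, the snippet below builds the `--speculative-config` JSON and prints a serve command rather than running it. The model names, the eagle3 method, the 3-token setting, `--no-enable-prefix-caching`, and VLLM_USE_BREAKABLE_CUDAGRAPH=0 come from this PR; the TP default and the variable names are assumptions, and the attention-backend, KV-cache, and parser settings are left out because their exact flags are not shown here.

```shell
MODEL="MiniMaxAI/MiniMax-M3-MXFP8"
DRAFT="Inferact/MiniMax-M3-EAGLE3"
NUM_SPEC_TOKENS=3
TP="${TP:-4}"   # assumed default; the search space covers TP4 and TP8

# JSON payload for --speculative-config (EAGLE3 draft, 3 speculative tokens).
SPEC_CONFIG="$(printf '{"method": "eagle3", "model": "%s", "num_speculative_tokens": %d}' \
  "$DRAFT" "$NUM_SPEC_TOKENS")"

# Print the command instead of executing it; CUDA graphs stay on (no --enforce-eager).
echo "VLLM_USE_BREAKABLE_CUDAGRAPH=0 vllm serve $MODEL" \
  "--tensor-parallel-size $TP" \
  "--no-enable-prefix-caching" \
  "--speculative-config '$SPEC_CONFIG'"
```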

Carries the same in-place EAGLE3 patch as the mi300x/mi355x MTP recipes
(functionstackx/vllm#1, upstream vllm-project/vllm#45546): the ROCm image
lacks SupportsEagle3, so the recipe patches the installed amd/model.py
before serving. H200-style search space trimmed at the high-conc end,
latency rows at conc 1. Also adds SPEC_SUFFIX to launch_mi325x-amds.sh so
spec-decoding=mtp routes to the _mtp script.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@functionstackx functionstackx requested a review from a team June 14, 2026 17:22
@github-actions

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@functionstackx functionstackx changed the title from "[Klaud Cold][AI draft test] minimaxm3-fp8-mi325x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI325X recipe" to "[Klaud Cold] minimaxm3-fp8-mi325x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI325X recipe" on Jun 14, 2026
@functionstackx

/reuse-sweep-run

@functionstackx functionstackx merged commit 41c27c5 into main Jun 14, 2026
4 of 6 checks passed
@functionstackx functionstackx deleted the feat/minimax-m3-mi325-mtp-dayzero branch June 14, 2026 20:05