[Klaud Cold] minimaxm3-fp8-mi325x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI325X recipe by functionstackx · Pull Request #1759 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-14T17:22:02Z

Summary

Adds the EAGLE3 (spec-decoding: mtp) sibling of minimaxm3-fp8-mi325x-vllm (#1748): MiniMax-M3 MXFP8 on MI325X (gfx942) single-node vLLM (ROCm), pairing MiniMaxAI/MiniMax-M3-MXFP8 with the Inferact/MiniMax-M3-EAGLE3 draft. Based on the MI325X non-MTP recipe + the MI300X MTP recipe.

New script `minimaxm3_fp8_mi325x_mtp.sh`

Mirrors the non-MTP MI325X serve shape: BF16 KV cache (gfx942 lacks calibrated ROCm FP8 attn scales), --no-enable-prefix-caching, --block-size 128, --attention-backend TRITON_ATTN, minimax_m3 parsers.
Runs with CUDA graphs — no --enforce-eager, export VLLM_USE_BREAKABLE_CUDAGRAPH=0.
Adds --speculative-config '{"method":"eagle3","model":"Inferact/MiniMax-M3-EAGLE3","num_speculative_tokens":3}' (no attention_backend pin — ROCm runs TRITON_ATTN) + --use-chat-template.
Carries the same in-place EAGLE3 patch as the mi300x/mi355x MTP recipes (functionstackx/vllm#1, "[AI generated draft] minimax_m3(amd): implement SupportsEagle3 for EAGLE3 spec decoding on ROCm"; upstream vllm-project/vllm#45546, "[Bug Fix] [MiniMax-M3] Implement EAGLE3 support on the AMD MiniMax M3"): the shipped ROCm image's AMD model lacks SupportsEagle3, so the recipe patches the installed amd/model.py before serving. The patch is idempotent and was dry-run verified against the image's file.
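
For orientation, the serve shape described above might be assembled roughly as follows. This is a minimal sketch, not the script from the PR: the `serve_args` helper and the `TP` variable are hypothetical, and the KV-cache, parser, and benchmark-client options (including --use-chat-template) are left out because the description does not spell out their exact form. It prints the arguments instead of launching vLLM.

```shell
#!/usr/bin/env bash
# Sketch only: prints the serve arguments, one per line, without running vLLM.

# Target and draft model names are taken from the PR description.
MODEL="MiniMaxAI/MiniMax-M3-MXFP8"
SPEC_CONFIG='{"method":"eagle3","model":"Inferact/MiniMax-M3-EAGLE3","num_speculative_tokens":3}'

# TP is a hypothetical variable; the PR sweeps TP4 and TP8.
TP="${TP:-8}"

# Set as in the PR; --enforce-eager is deliberately absent so CUDA graphs are used.
export VLLM_USE_BREAKABLE_CUDAGRAPH=0

# Hypothetical helper that emits the command line one argument per line.
serve_args() {
  printf '%s\n' \
    vllm serve "$MODEL" \
    --tensor-parallel-size "$TP" \
    --no-enable-prefix-caching \
    --block-size 128 \
    --attention-backend TRITON_ATTN \
    --speculative-config "$SPEC_CONFIG"
}

serve_args
```

Keeping the speculative config in a single-quoted variable avoids escaping the JSON double quotes when it is passed as one argument.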

Config + launcher

minimaxm3-fp8-mi325x-vllm-mtp in amd-master.yaml: H200-style search space (TP4/TP8 latency, TP4+EP4/TP8+EP8 TEP, TP8+EP8 dp-attn DEP), trimmed at the high-conc end, latency rows at conc 1.
launch_mi325x-amds.sh had no SPEC_SUFFIX (an mtp config would run the non-MTP script); added it so mtp → _mtp.sh.
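
The routing fix can be pictured with a small sketch. The helper names `spec_suffix` and `script_for` are hypothetical, and the non-MTP script name is inferred from the MTP one rather than quoted from the launcher, so treat this as an illustration of the idea only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the spec-decoding mode to a script suffix.
spec_suffix() {
  case "${1:-none}" in
    mtp) printf '%s' "_mtp" ;;   # mtp configs route to the *_mtp.sh script
    *)   printf '%s' "" ;;       # anything else keeps the base script
  esac
}

# Hypothetical helper: build the script name for a given mode.
# The base name is an assumption derived from minimaxm3_fp8_mi325x_mtp.sh.
script_for() {
  printf '%s%s.sh\n' "minimaxm3_fp8_mi325x" "$(spec_suffix "$1")"
}

script_for mtp
script_for none
```

Without the suffix, both calls would resolve to the same base script, which is the failure mode the PR describes for mtp configs.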

Validation

36 configs generate (all mtp/mi325x, conc-1, max-model-len 2304/9472); bash -n clean on script + launcher; embedded patch dry-runs cleanly against the image's amd/model.py; routing simulated.

Like the other ROCm MTP PRs, this is a validation harness (runtime monkey-patch); once the upstream fix is in a rebuilt image, the patch idempotently no-ops.
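
The "idempotent, fails on anchor drift" behaviour can be sketched as below. This is not the patch shipped in the PR: `apply_patch_once`, its marker, and its anchor arguments are hypothetical, and the real recipe edits amd/model.py with its own anchors. The sketch only shows the guard logic: skip when the marker is already present, refuse when the anchor is missing, otherwise insert once.

```shell
#!/usr/bin/env bash
# apply_patch_once FILE MARKER ANCHOR INSERT   (hypothetical helper)
# Inserts INSERT on a new line after the first line containing ANCHOR,
# unless MARKER already appears in FILE. Returns 1 if ANCHOR is missing.
apply_patch_once() {
  file=$1
  marker=$2
  anchor=$3
  insert=$4

  # Already patched (or fixed upstream): do nothing.
  if grep -qF -- "$marker" "$file"; then
    echo "already patched: $file"
    return 0
  fi

  # Anchor drift: fail loudly instead of patching the wrong place.
  if ! grep -qF -- "$anchor" "$file"; then
    echo "anchor not found, refusing to patch: $file" >&2
    return 1
  fi

  tmp=$(mktemp)
  awk -v a="$anchor" -v ins="$insert" '
    { print }
    !done && index($0, a) { print ins; done = 1 }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }

  # Write back in place so the original file permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
  echo "patched: $file"
}
```

Running the helper twice on the same file leaves a single inserted line, which matches the no-op behaviour expected once a rebuilt image already contains the upstream fix.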

🤖 Generated with Claude Code

Note

Medium Risk
Benchmark-only changes, but the job mutates installed vLLM source at runtime; patch drift or a bad apply could fail jobs or skew results until upstream ships the fix in the image.

Overview
Adds EAGLE3 speculative decoding (spec-decoding: mtp) for MiniMax-M3 MXFP8 on MI325X, as the MTP sibling of the existing non-MTP MI325X recipe.

Registers minimaxm3-fp8-mi325x-vllm-mtp in amd-master.yaml with an H200-style search space (TP4/TP8 latency from conc 1, EP and dp-attn rows trimmed vs the base config). Documents the change in perf-changelog.yaml.

Introduces minimaxm3_fp8_mi325x_mtp.sh, which serves with Inferact/MiniMax-M3-EAGLE3 (3 speculative tokens), CUDA graphs (VLLM_USE_BREAKABLE_CUDAGRAPH=0), and --use-chat-template for benchmarks. Because the shipped ROCm image’s AMD model lacks SupportsEagle3, the script applies an idempotent in-place patch to vllm’s amd/model.py before vllm serve (fails if anchors drift).

Updates launch_mi325x-amds.sh with SPEC_SUFFIX so SPEC_DECODING=mtp runs the _mtp script, matching H200 launchers.

Reviewed by Cursor Bugbot for commit 36ada34. Bugbot is set up for automated code reviews on this repo.

Adds the spec-decoding=mtp sibling of minimaxm3-fp8-mi325x-vllm (#1748), based on the MI325X non-MTP recipe + the MI300X MTP recipe. gfx942 serve shape (BF16 KV cache, --no-enable-prefix-caching, TRITON_ATTN, minimax_m3 parsers), runs with CUDA graphs (no --enforce-eager, VLLM_USE_BREAKABLE_CUDAGRAPH=0), plus the Inferact/MiniMax-M3-EAGLE3 draft via --speculative-config (eagle3, 3 tokens) + chat-template prompts. Carries the same in-place EAGLE3 patch as the mi300x/mi355x MTP recipes (functionstackx/vllm#1, upstream vllm-project/vllm#45546): the ROCm image lacks SupportsEagle3, so the recipe patches the installed amd/model.py before serving. H200-style search space trimmed at the high-conc end, latency rows at conc 1. Also adds SPEC_SUFFIX to launch_mi325x-amds.sh so spec-decoding=mtp routes to the _mtp script.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-14T17:22:10Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-14T17:22:36Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27506407976
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27506407976

github-actions · 2026-06-14T19:54:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27506408589
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27506408589

functionstackx · 2026-06-14T20:04:47Z

/reuse-sweep-run

github-actions · 2026-06-14T20:05:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27510493230
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27510493230

functionstackx requested a review from a team June 14, 2026 17:22

perf-changelog: fill in PR link for minimaxm3-fp8-mi325x-vllm-mtp

d4c3a67

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 14, 2026 17:22

github-project-automation Bot added this to InferenceMAX Board Jun 14, 2026

functionstackx added the full-sweep-enabled label Jun 14, 2026

functionstackx changed the title ~~[Klaud Cold][AI draft test] minimaxm3-fp8-mi325x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI325X recipe~~ [Klaud Cold] minimaxm3-fp8-mi325x-vllm-mtp: day-zero MiniMax-M3 EAGLE3 (MTP) MI325X recipe Jun 14, 2026

Merge branch 'main' into feat/minimax-m3-mi325-mtp-dayzero

36ada34

functionstackx merged commit 41c27c5 into main Jun 14, 2026
4 of 6 checks passed

functionstackx deleted the feat/minimax-m3-mi325-mtp-dayzero branch June 14, 2026 20:05

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 14, 2026

functionstackx merged 3 commits into main from feat/minimax-m3-mi325-mtp-dayzero

1 participant
