[Klaud Cold] minimaxm3 MI300X/MI325X non-MTP: start TP-only latency rows at conc 1 by functionstackx · Pull Request #1760 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-14T20:11:49Z

Summary

Extends the MiniMax-M3 MXFP8 MI300X and MI325X non-MTP sweeps down to concurrency 1 on the TP-only latency rows (they previously started at conc 4), to capture the single-request latency point. Mirrors the H100/H200 conc-1 change (#1743).

minimaxm3-fp8-mi300x-vllm: TP8 latency rows (1k1k + 8k1k) now start at conc 1.
minimaxm3-fp8-mi325x-vllm: TP4 and TP8 latency rows (1k1k + 8k1k) now start at conc 1.

TEP (tp+ep) and DEP (tp+ep+dp-attn) rows keep their higher concurrency starts (128/256) — they only pay off at scale. Config/search-space change only; no script changes.
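
To illustrate what lowering the start value adds to a sweep, here is a minimal sketch. It assumes concurrency points double from the start value up to an end value; the doubling rule, the function name `concurrency_points`, and the end value 64 are assumptions for illustration, not taken from the repo's search-space YAML.

```python
def concurrency_points(conc_start: int, conc_end: int) -> list[int]:
    """Return doubling concurrency values from conc_start up to conc_end, inclusive.

    Hypothetical helper: the real sweep generator may step differently.
    """
    if conc_start < 1 or conc_end < conc_start:
        raise ValueError("expected 1 <= conc_start <= conc_end")
    points = []
    conc = conc_start
    while conc <= conc_end:
        points.append(conc)
        conc *= 2
    return points


# Illustrative TP-only latency row; the end value 64 is made up.
before = concurrency_points(4, 64)
after = concurrency_points(1, 64)

# Points that only exist once the row starts at conc 1.
added = [c for c in after if c not in before]
print(added)
```

Under these assumptions the change adds only the low-concurrency points below 4 and leaves the rest of the row untouched, which matches the "search-space change only" scope of this PR.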

Validation

generate_sweep_configs.py test-config → 57 configs; min concurrency confirms the TP-only rows now start at 1 (mi300x tp8 / mi325x tp4 / mi325x tp8 → 1), TEP/DEP unchanged (128/256).
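
The min-concurrency check described above could be scripted roughly as follows. This is a sketch only: the field names (`name`, `tp`, `ep`, `dp_attn`, `conc`) and the sample entries are hypothetical and do not reflect the actual output format of generate_sweep_configs.py.

```python
def min_conc_by_row(configs):
    """Map (config key, tp, ep enabled, dp-attn enabled) to the lowest concurrency seen."""
    lowest = {}
    for cfg in configs:
        row = (cfg["name"], cfg["tp"], cfg["ep"], cfg["dp_attn"])
        lowest[row] = min(lowest.get(row, cfg["conc"]), cfg["conc"])
    return lowest


# Tiny made-up sample, not the real 57 generated configs.
sample = [
    {"name": "minimaxm3-fp8-mi300x-vllm", "tp": 8, "ep": False, "dp_attn": False, "conc": 1},
    {"name": "minimaxm3-fp8-mi300x-vllm", "tp": 8, "ep": False, "dp_attn": False, "conc": 2},
    {"name": "minimaxm3-fp8-mi300x-vllm", "tp": 8, "ep": False, "dp_attn": False, "conc": 4},
    {"name": "minimaxm3-fp8-mi300x-vllm", "tp": 8, "ep": True, "dp_attn": False, "conc": 128},
    {"name": "minimaxm3-fp8-mi300x-vllm", "tp": 8, "ep": True, "dp_attn": False, "conc": 256},
]

result = min_conc_by_row(sample)

# TP-only rows are those with neither expert parallelism nor dp-attn enabled.
tp_only = {row: conc for row, conc in result.items() if not row[2] and not row[3]}
assert all(conc == 1 for conc in tp_only.values()), "TP-only rows should start at conc 1"
```

Grouping by the parallelism flags keeps the TP-only assertion separate from the TEP/DEP rows, whose higher starts are expected to stay unchanged.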

🤖 Generated with Claude Code

Note

Low Risk
Benchmark search-space YAML only; adds low-concurrency sweep points without changing runtime code or serve recipes.

Overview
Lowers the starting concurrency on TP-only fixed-seq-len rows for minimaxm3-fp8-mi300x-vllm and minimaxm3-fp8-mi325x-vllm from 4 → 1, so sweeps include the single-request latency point (aligned with the prior H100/H200 change in #1743).

On MI300X, only TP8 latency rows for 1k1k and 8k1k change. On MI325X, TP4 and TP8 latency rows for both ISL/OSL pairs change. Rows with expert parallelism (tp+ep) or dp-attn keep their existing higher conc-start values.

perf-changelog.yaml documents the two config keys. No launch scripts or serving flags are touched—search-space YAML only.

Reviewed by Cursor Bugbot for commit 0c221e5. Bugbot is set up for automated code reviews on this repo.

Drop the conc-start of the TP-only (latency) search-space rows from 4 to 1 for minimaxm3-fp8-mi300x-vllm and minimaxm3-fp8-mi325x-vllm, capturing the single-request latency point. TEP/DEP rows keep their higher concurrency starts. Mirrors the H100/H200 conc-1 change (#1743). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-14T20:11:57Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-14T20:17:25Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27510666835
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27510666835

github-actions · 2026-06-14T23:20:50Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27510667862
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27510667862

functionstackx · 2026-06-14T23:22:34Z

/reuse-sweep-run

functionstackx requested a review from a team June 14, 2026 20:11

perf-changelog: fill in PR link for mi300x/mi325x non-MTP conc-1

0c221e5

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 14, 2026 20:11

github-project-automation Bot added this to InferenceMAX Board Jun 14, 2026

functionstackx added the full-sweep-enabled label Jun 14, 2026

functionstackx mentioned this pull request Jun 14, 2026

[Experimental][DNM till upstream PR merges][AMD] perf: hybrid MXFP8 MoE for MiniMax M3 on MI300X #1753

Open

functionstackx merged commit e2f84d7 into main Jun 14, 2026
83 checks passed

functionstackx deleted the feat/minimax-m3-mi300-mi325-conc1 branch June 14, 2026 23:22

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 14, 2026

functionstackx merged 2 commits into main from feat/minimax-m3-mi300-mi325-conc1
