[Klaud Cold] minimaxm3 MI300X/MI325X non-MTP: start TP-only latency rows at conc 1 #1760
Drop the conc-start of the TP-only (latency) search-space rows from 4 to 1 for minimaxm3-fp8-mi300x-vllm and minimaxm3-fp8-mi325x-vllm, capturing the single-request latency point. TEP/DEP rows keep their higher concurrency starts. Mirrors the H100/H200 conc-1 change (#1743).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you! PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27510666835 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27510667862 |
|
/reuse-sweep-run |
Summary
Extends the MiniMax-M3 MXFP8 MI300X and MI325X non-MTP sweeps down to concurrency 1 on the TP-only latency rows (which previously started at conc 4) to capture the single-request latency point. Mirrors the H100/H200 conc-1 change (#1743).
minimaxm3-fp8-mi300x-vllm: TP8 latency rows (1k1k + 8k1k) now start at conc 1.
minimaxm3-fp8-mi325x-vllm: TP4 and TP8 latency rows (1k1k + 8k1k) now start at conc 1.
TEP (tp+ep) and DEP (tp+ep+dp-attn) rows keep their higher concurrency starts (128/256), since they only pay off at scale.
Config/search-space change only; no script changes.
Validation
generate_sweep_configs.py test-config → 57 configs; min concurrency confirms the TP-only rows now start at 1 (mi300x tp8 / mi325x tp4 / mi325x tp8 → 1), TEP/DEP unchanged (128/256).
🤖 Generated with Claude Code
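The rule described in the summary (lower conc-start on TP-only rows, leave TEP/DEP rows alone) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the field names `tp`, `ep`, `dp-attn`, and `conc-end`, the `conc-end` values, and the helper names are all hypothetical and are not taken from the repository's YAML or from generate_sweep_configs.py.

```python
from copy import deepcopy

# Hypothetical search-space rows. Field names and conc-end values are
# assumptions for illustration, not copied from the repository's YAML.
ROWS = [
    {"tp": 8, "ep": 1, "dp-attn": False, "conc-start": 4, "conc-end": 64},
    {"tp": 8, "ep": 8, "dp-attn": False, "conc-start": 128, "conc-end": 256},
    {"tp": 8, "ep": 8, "dp-attn": True, "conc-start": 256, "conc-end": 512},
]


def is_tp_only(row):
    """A row is TP-only when it uses neither expert parallelism nor dp-attn."""
    return row.get("ep", 1) <= 1 and not row.get("dp-attn", False)


def lower_latency_start(rows, new_start=1):
    """Return a copy of rows with conc-start lowered on TP-only rows only."""
    updated = deepcopy(rows)
    for row in updated:
        if is_tp_only(row):
            # Never raise an existing start; only lower it.
            row["conc-start"] = min(row["conc-start"], new_start)
    return updated


if __name__ == "__main__":
    for row in lower_latency_start(ROWS):
        kind = "TP-only" if is_tp_only(row) else "TEP/DEP"
        print(kind, "tp", row["tp"], "conc-start", row["conc-start"])
```

Checking the minimum conc-start per row after such a transformation is the same kind of check the validation step above reports, although the real script may compute it differently.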
Note
Low Risk
Benchmark search-space YAML only; adds low-concurrency sweep points without changing runtime code or serve recipes.
Overview
Lowers the starting concurrency on TP-only fixed-seq-len rows for minimaxm3-fp8-mi300x-vllm and minimaxm3-fp8-mi325x-vllm from 4 → 1, so sweeps include the single-request latency point (aligned with the prior H100/H200 change in #1743).
On MI300X, only TP8 latency rows for 1k1k and 8k1k change. On MI325X, TP4 and TP8 latency rows for both ISL/OSL pairs change. Rows with expert parallelism (tp+ep) or dp-attn keep their existing higher conc-start values.
perf-changelog.yaml documents the two config keys. No launch scripts or serving flags are touched; the change is search-space YAML only.
Reviewed by Cursor Bugbot for commit 0c221e5. Bugbot is set up for automated code reviews on this repo.
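To make concrete what lowering a row's starting concurrency adds to a sweep, here is a small sketch that expands a start/end range into concurrency points and lists the ones that appear only after the start is lowered. The doubling step, the end value of 64, and the function names are assumptions for illustration; the actual generator may space its points differently.

```python
def concurrency_points(start, end, factor=2):
    """List concurrency values from start up to end, multiplying by factor.

    The geometric step is an assumption for illustration; the real sweep
    generator may space its points differently.
    """
    if start < 1 or end < start or factor < 2:
        raise ValueError("need 1 <= start <= end and factor >= 2")
    points = []
    conc = start
    while conc <= end:
        points.append(conc)
        conc *= factor
    return points


def added_points(old_start, new_start, end):
    """Concurrency values that appear only after lowering the start."""
    old = set(concurrency_points(old_start, end))
    return [c for c in concurrency_points(new_start, end) if c not in old]


if __name__ == "__main__":
    # Hypothetical row: start lowered from 4 to 1, end assumed to be 64.
    print(added_points(old_start=4, new_start=1, end=64))
```

Under the doubling assumption, a row that moves from a start of 4 to a start of 1 gains the points 1 and 2, and every point from 4 upward is unchanged.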