[AMD][MI35X] 0612 DSV4#1715
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27406714195 |
|
@functionstackx could you please review this? |
|
yes. lgtm once passes and then i will do /reuse on it https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27475314602/job/81213395277?pr=1715 |
|
hi @1am9trash it seems like conc512 is failing, can u take a look? (will cancel the rest of the conc for now to avoid clogging up the CI queue since conc512 failed already)
|
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27475314602 |
|
Hi, @functionstackx
My assumption is that there were other workloads on the server competing for resources at the time, and therefore the failure is unrelated to the v4 testing changes introduced in this PR. I reran the task (conc=512), and it completed successfully without encountering the issue. Thanks. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27475314602 |
|
i am merging it rn, ty for these changes! |
|
thanks for the contribution @1am9trash! any chance y'all can optimize the MTP shapes & DI too (especially DI with MTP composed together would be amazing)? mi355 agg MTP doesn't seem close to SOL.
|
|
/reuse-sweep-run |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27519237142 |



Successful run:
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27406714195
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27406714195
Change:
Note
Low Risk
Benchmark-only image pin and serving-flag tweak for one AMD DSv4 config; no application or auth changes.
Overview
Updates the dsv4-fp4-mi355x-sglang benchmark to SGLang ROCm v0.5.13 (lmsysorg/sglang-rocm:v0.5.13-rocm720-mi35x-20260612), which picks up upstream MoE intermediate_pad fixes (sglang PR #27858) so padding work is not wasted in MoE compute.
In dsv4_fp4_mi355x_sglang.sh, --chunked-prefill-size is no longer fixed at 8192 when DP attention is on: it stays 8192 for TP-only paths and becomes 8192 × TP (e.g. 65536 at TP8) for the TP8/DP8 sweep.
perf-changelog.yaml records the image bump and chunked-prefill correction for this config key.
Reviewed by Cursor Bugbot for commit dbf706b. Bugbot is set up for automated code reviews on this repo.
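For readers who want the chunked-prefill rule from the Overview spelled out, here is a minimal sketch in shell. The function name, the BASE_CHUNK variable, and the "true"/"false" argument convention are hypothetical and are not taken from dsv4_fp4_mi355x_sglang.sh; only the 8192 base value and the "multiply by TP when DP attention is on" rule come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the rule described in the PR summary.
# Names here are illustrative only; the real script may be structured differently.

BASE_CHUNK=8192  # per-rank chunk size stated in the PR description

chunked_prefill_size() {
  local tp="$1"            # tensor-parallel size, e.g. 8
  local dp_attention="$2"  # "true" when DP attention is enabled (assumed convention)

  if [ "$dp_attention" = "true" ]; then
    # DP attention on: scale the base chunk by the TP size
    echo $(( BASE_CHUNK * tp ))
  else
    # TP-only path: keep the base chunk unchanged
    echo "$BASE_CHUNK"
  fi
}

chunked_prefill_size 8 true   # TP8/DP8 sweep
chunked_prefill_size 8 false  # TP-only
```

With these inputs the first call yields 65536 (8192 × 8), matching the TP8 example in the description, and the second yields 8192. The result would then be passed to the server via --chunked-prefill-size.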