[NV] Update MiniMax-M3 B200 MTP, B300, and B300 MTP vLLM serving settings by xinli-sw · Pull Request #1802 · SemiAnalysisAI/InferenceX

xinli-sw · 2026-06-16T03:51:31Z

Summary

Extends the B200 fixes from #1779 to the three remaining MiniMax-M3 Blackwell variants:

- minimaxm3_fp8_b200_mtp.sh / minimaxm3_fp8_b300.sh / minimaxm3_fp8_b300_mtp.sh: set VLLM_FLOAT32_MATMUL_PRECISION=high; replace the dynamic CAPTURE_SIZE computation with a hardcoded --max-cudagraph-capture-size 2048
- nvidia-master.yaml: add TP4+EP4 coverage to three config keys:
  - minimaxm3-fp8-b300-vllm: add tp4ep4 dp-attn for 1k1k; add tp4ep4 and tp4ep4 dp-attn for 8k1k
  - minimaxm3-fp8-b200-vllm-mtp: same additions (concurrencies trimmed at the high end per MTP convention)
  - minimaxm3-fp8-b300-vllm-mtp: same additions as B200 MTP

Test plan

CI sweep on minimaxm3-fp8-b200-vllm-mtp
CI sweep on minimaxm3-fp8-b300-vllm
CI sweep on minimaxm3-fp8-b300-vllm-mtp

Generated with Claude Code

Note

Low Risk
Benchmark and CI sweep configuration only; no application runtime or security-sensitive code paths.

Overview
Brings MiniMax-M3 B200 MTP, B300, and B300 MTP vLLM fixed-seq-len serving in line with the B200 changes from #1779.

The three benchmark launch scripts now export VLLM_FLOAT32_MATMUL_PRECISION=high and pass a fixed --max-cudagraph-capture-size 2048 instead of computing capture size from concurrency (and spec-token count on MTP).
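A minimal sketch of what the fixed settings might look like in one of these launch scripts. Only VLLM_FLOAT32_MATMUL_PRECISION=high and --max-cudagraph-capture-size 2048 come from this PR; the function name `print_serve_cmd`, the variable `MAX_CUDAGRAPH_CAPTURE_SIZE`, and the model argument are hypothetical, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual contents of the minimaxm3_fp8_*.sh scripts.

# Environment variable named in this PR; exported so a child server
# process would inherit it.
export VLLM_FLOAT32_MATMUL_PRECISION=high

# Fixed capture size from this PR. It replaces a value that was previously
# derived from concurrency (and spec-token count on MTP).
MAX_CUDAGRAPH_CAPTURE_SIZE=2048

# Print the serve command instead of running it, so the sketch is safe
# to execute anywhere. The model argument is a placeholder.
print_serve_cmd() {
  local model="${1:?model name or path required}"
  echo "vllm serve $model --max-cudagraph-capture-size $MAX_CUDAGRAPH_CAPTURE_SIZE"
}
```

Calling `print_serve_cmd my-model` shows the resulting command line; a real script would pass its other serving flags alongside the fixed capture size.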

nvidia-master.yaml gains TP4+EP4 sweep rows for minimaxm3-fp8-b300-vllm, minimaxm3-fp8-b200-vllm-mtp, and minimaxm3-fp8-b300-vllm-mtp: DP-attention at 1k1k, plus non-DP and DP-attention at 8k1k (with MTP-appropriate concurrency caps). perf-changelog.yaml records the same for those config keys.
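The added coverage can be enumerated with a short loop. This is an illustrative sketch only: the config keys and row types come from the PR description, while the one-line row format and the names `CONFIG_KEYS` and `list_new_rows` are hypothetical and do not reflect the nvidia-master.yaml schema.

```shell
#!/usr/bin/env bash
# Illustrative enumeration of the new sweep rows; the output format is
# made up for this sketch and is not nvidia-master.yaml syntax.

CONFIG_KEYS="minimaxm3-fp8-b300-vllm minimaxm3-fp8-b200-vllm-mtp minimaxm3-fp8-b300-vllm-mtp"

list_new_rows() {
  local key
  for key in $CONFIG_KEYS; do
    echo "$key 1k1k tp4ep4 dp-attn"   # DP-attention at 1k1k
    echo "$key 8k1k tp4ep4"           # non-DP at 8k1k
    echo "$key 8k1k tp4ep4 dp-attn"   # DP-attention at 8k1k
  done
}
```

Each config key gets three new rows, so `list_new_rows` prints nine lines in total. Concurrency lists are omitted because the PR description does not state them.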

Reviewed by Cursor Bugbot for commit bb0bca9. Bugbot is set up for automated code reviews on this repo.

…ings

Apply the same fixes from PR #1779 (B200) to the remaining three variants:
- Set VLLM_FLOAT32_MATMUL_PRECISION=high in all three runner scripts
- Hardcode --max-cudagraph-capture-size 2048 (remove dynamic CAPTURE_SIZE computation)
- Add TP4+EP4 dp-attn coverage for 1k1k, and TP4+EP4 + TP4+EP4 dp-attn rows for 8k1k in nvidia-master.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-06-16T03:51:41Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-16T06:48:35Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27593402054
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27593402054

xinli-sw · 2026-06-16T13:34:22Z

duplicate of #1784 #1781

xinli-sw · 2026-06-16T18:23:06Z

closed as duplicate

xinli-sw requested a review from a team June 16, 2026 03:51

xinli-sw requested review from jgangani and kedarpotdar-nv as code owners June 16, 2026 03:51

github-project-automation Bot added this to InferenceMAX Board Jun 16, 2026

xinli-sw added the full-sweep-enabled label Jun 16, 2026

Update perf-changelog.yaml with new coverage details

bb0bca9

xinli-sw closed this Jun 16, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 16, 2026

xinli-sw wants to merge 2 commits into main from minimaxm3-settings-clean
