[NV] Update MiniMax-M3 B200 MTP, B300, and B300 MTP vLLM serving settings #1802

Closed
xinli-sw wants to merge 2 commits into main from minimaxm3-settings-clean

Conversation

@xinli-sw xinli-sw commented Jun 16, 2026

Collaborator

Summary

Extends the B200 fixes from #1779 to the three remaining MiniMax-M3 Blackwell variants:

  • minimaxm3_fp8_b200_mtp.sh / minimaxm3_fp8_b300.sh / minimaxm3_fp8_b300_mtp.sh: set VLLM_FLOAT32_MATMUL_PRECISION=high; replace dynamic CAPTURE_SIZE computation with hardcoded --max-cudagraph-capture-size 2048
  • nvidia-master.yaml — add TP4+EP4 coverage to three config keys:
    • minimaxm3-fp8-b300-vllm: add tp4ep4 dp-attn for 1k1k; add tp4ep4 and tp4ep4 dp-attn for 8k1k
    • minimaxm3-fp8-b200-vllm-mtp: same additions (concs trimmed at high end per MTP convention)
    • minimaxm3-fp8-b300-vllm-mtp: same additions as B200 MTP

Test plan

  • CI sweep on minimaxm3-fp8-b200-vllm-mtp
  • CI sweep on minimaxm3-fp8-b300-vllm
  • CI sweep on minimaxm3-fp8-b300-vllm-mtp

Generated with Claude Code


Note

Low Risk
Benchmark and CI sweep configuration only; no application runtime or security-sensitive code paths.

Overview
Brings MiniMax-M3 B200 MTP, B300, and B300 MTP vLLM fixed-seq-len serving in line with the B200 changes from #1779.

The three benchmark launch scripts now export VLLM_FLOAT32_MATMUL_PRECISION=high and pass a fixed --max-cudagraph-capture-size 2048 instead of computing capture size from concurrency (and spec-token count on MTP).
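As a rough illustration of the script change described above, the sketch below contrasts a concurrency-derived capture size with the fixed value. The helper names (`dynamic_capture_size`, `capture_flag`) and the formula in the first function are assumptions for illustration only; the PR description does not show the removed computation. Only the environment variable and the `--max-cudagraph-capture-size 2048` flag come from the PR text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# From the PR description: each updated launch script exports this.
export VLLM_FLOAT32_MATMUL_PRECISION=high

# HYPOTHETICAL stand-in for the removed logic. The real formula is not shown
# in the PR; it is only said to depend on concurrency and, on MTP, the
# spec-token count.
dynamic_capture_size() {
  local conc="$1" num_spec_tokens="${2:-0}"
  echo $(( conc * (num_spec_tokens + 1) ))
}

# Fixed value the scripts now pass regardless of concurrency.
FIXED_CAPTURE_SIZE=2048

# Hypothetical helper that prints the serving flag named in the PR.
capture_flag() {
  echo "--max-cudagraph-capture-size ${FIXED_CAPTURE_SIZE}"
}

# The old-style value varies per run; the new flag does not.
echo "dynamic (conc=64, spec=3): $(dynamic_capture_size 64 3)"
echo "fixed flag: $(capture_flag)"
```

With a fixed value, every concurrency point in a sweep is launched with the same capture setting, so the launch command no longer changes between runs for this reason.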

nvidia-master.yaml gains TP4+EP4 sweep rows for minimaxm3-fp8-b300-vllm, minimaxm3-fp8-b200-vllm-mtp, and minimaxm3-fp8-b300-vllm-mtp: DP-attention at 1k1k, plus non-DP and DP-attention at 8k1k (with MTP-appropriate concurrency caps). perf-changelog.yaml records the same for those config keys.
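To make the added coverage concrete, the sketch below enumerates the new sweep rows per config key as listed in the summary. The output row format and the `list_added_rows` helper are assumptions for illustration; they do not reflect the actual nvidia-master.yaml schema, and the concurrency values are omitted because the PR text does not give them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Config keys named in the PR description.
CONFIG_KEYS=(
  minimaxm3-fp8-b300-vllm
  minimaxm3-fp8-b200-vllm-mtp
  minimaxm3-fp8-b300-vllm-mtp
)

# Rows added per the PR text, written as "seq-len parallelism dp-attn".
# This format is illustrative only, not the YAML schema.
ADDED_ROWS=(
  "1k1k tp4ep4 dp-attn=true"
  "8k1k tp4ep4 dp-attn=false"
  "8k1k tp4ep4 dp-attn=true"
)

# Hypothetical helper: print one line per (config key, added row) pair.
list_added_rows() {
  local key row
  for key in "${CONFIG_KEYS[@]}"; do
    for row in "${ADDED_ROWS[@]}"; do
      echo "${key} ${row}"
    done
  done
}

list_added_rows
```

Three config keys with three added rows each gives nine new sweep entries in total.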

Reviewed by Cursor Bugbot for commit bb0bca9. Bugbot is set up for automated code reviews on this repo.

…ings

Apply the same fixes from PR #1779 (B200) to the remaining three variants:
- Set VLLM_FLOAT32_MATMUL_PRECISION=high in all three runner scripts
- Hardcode --max-cudagraph-capture-size 2048 (remove dynamic CAPTURE_SIZE computation)
- Add TP4+EP4 dp-attn coverage for 1k1k, and TP4+EP4 + TP4+EP4 dp-attn rows for 8k1k in nvidia-master.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@xinli-sw

Collaborator Author

duplicate of #1784 #1781

@xinli-sw

Collaborator Author

closed as duplicate
