[NV] Use Marlin for MiniMax M3 TP-only configs #1809
…25-params' into nv/jasonli/minimaxm3-stack-base-1781-1784 (conflicts: perf-changelog.yaml)
…tp-serving-settings' into nv/jasonli/minimaxm3-stack-base-1781-1784 (conflicts: perf-changelog.yaml)
Thanks for the contribution! For vLLM and SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please open a PR there first, before we can merge your single-node PR into the master branch. Let's make the documentation first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
     PARALLEL_ARGS="--tensor-parallel-size=$TP --enable-expert-parallel"
 else
-    PARALLEL_ARGS="--tensor-parallel-size=$TP"
+    PARALLEL_ARGS="--tensor-parallel-size=$TP --moe-backend marlin"
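A minimal sketch of the branch selection as a standalone bash function. It follows the PR description's condition (TP-only when EP_SIZE ≤ 1 and DP attention is off) and reuses only the flags that appear in the diff. The function name build_parallel_args, its argument order, and the dp_attention argument are hypothetical. The DP-attention branch, which the diff does not show, is deliberately not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick vLLM parallelism flags for a MiniMax-M3 launch.
# Arguments (assumed, not taken from the real scripts):
#   $1 = tensor-parallel size, $2 = EP_SIZE, $3 = "true" if DP attention is on
build_parallel_args() {
  local tp="$1" ep_size="$2" dp_attention="$3"
  if [[ "$dp_attention" == "true" ]]; then
    # DP-attention path is unchanged by this PR and not reproduced here.
    echo "dp-attention branch not modeled in this sketch" >&2
    return 1
  elif (( ep_size > 1 )); then
    # TP+EP path (unchanged): expert parallelism enabled.
    printf '%s\n' "--tensor-parallel-size=$tp --enable-expert-parallel"
  else
    # TP-only path: expert parallelism is off, so select the Marlin MoE backend.
    printf '%s\n' "--tensor-parallel-size=$tp --moe-backend marlin"
  fi
}

PARALLEL_ARGS=$(build_parallel_args 8 1 false)
printf '%s\n' "$PARALLEL_ARGS"   # prints --tensor-parallel-size=8 --moe-backend marlin
```

Keeping the selection in one function makes each launch path easy to check with a few assertions without starting vLLM.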
Thanks for the contribution! Can you add Marlin for Blackwell to the vLLM recipes?
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27630519240
@functionstackx can you merge? Thanks!
Happy to, once there is a PR to the vLLM recipes repo with the recipe changes, per #1809 (comment).
@functionstackx PR to the vLLM recipes: vllm-project/recipes#558
/reuse-sweep-run
Stacked on #1781 and #1784.
Adds --moe-backend marlin for MiniMax-M3 B200/B300 TP-only vLLM launch paths when expert parallelism is disabled.

Validation:
- bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b200_mtp.sh benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b300_mtp.sh
- git diff --check
- perf-changelog.yaml and .github/configs/nvidia-master.yaml
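The validation steps above can be wrapped in a small helper. This is a sketch, not the PR's actual tooling: check_scripts is a hypothetical name, and it only runs bash -n (parse each script without executing it) over whatever paths it is given. The commented usage line assumes it is run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: syntax-check shell scripts without running them.
# `bash -n` parses each file and reports syntax errors, but executes nothing.
check_scripts() {
  local status=0 script
  for script in "$@"; do
    if ! bash -n "$script"; then
      echo "syntax error in: $script" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage from the repo root (assumed layout, matching the paths listed above).
# The brace expansion yields the b200, b200_mtp, b300, and b300_mtp scripts:
# check_scripts benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_b{200,300}{,_mtp}.sh \
#   && git diff --check
```

Checking every file before returning, rather than stopping at the first failure, reports all broken scripts in one pass.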
Low Risk
Benchmark-only vLLM serve flag change on a narrow parallelism branch; no auth, data, or production runtime impact.
Overview
For MiniMax-M3 MXFP8 B200/B300 fixed-sequence vLLM recipes (standard and EAGLE3 MTP), the TP-only launch path now passes --moe-backend marlin when expert parallelism is off (EP_SIZE ≤ 1 and DP attention is false). The DP-attention and TP+EP branches are unchanged. perf-changelog.yaml records this for minimaxm3-fp8-b200-vllm, minimaxm3-fp8-b300-vllm, and the matching MTP config keys.
Reviewed by Cursor Bugbot for commit 0ddf2cd. Bugbot is set up for automated code reviews on this repo.