[NV]Add GLM-5 NVFP4 GB300 disagg-mtp TRT-LLM benchmarks via Dynamo #1799

Open
xinli-sw wants to merge 7 commits into
mainfrom
rihuo/glm5-gb300-dynamo-trt-mtp

@xinli-sw xinli-sw commented Jun 16, 2026

Collaborator

opened on behalf of @richardhuo-nv


Note

Low Risk
Benchmark and launcher configuration only; no application runtime or security-sensitive logic changes.

Overview
Adds glm5-fp4-gb300-dynamo-trt-mtp to the NVIDIA master benchmark matrix: GLM-5 NVFP4 on GB300 with disaggregated prefill/decode, MTP spec-decoding, and Dynamo + TensorRT-LLM (tensorrtllm-runtime:1.3.0-dev.1-cuda13). Coverage is 23 MTP search points (13 at 1K/1K and 10 at 8K/1K), each wired to a CONFIG_FILE recipe on NVIDIA srt-slurm sa-submission-q2-2026.

runners/launch_gb300-nv.sh gains a glm5 + fp4 + dynamo-trt branch that sets SERVED_MODEL_NAME, MODEL_PATH, and SRT_SLURM_MODEL_PREFIX for nvidia/GLM-5-NVFP4. perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 4dc40fe. Bugbot is set up for automated code reviews on this repo.

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 6598e48. Configure here.

num-worker: 1
tp: 32
ep: 32
dp-attn: true

Duplicate non-monotonic conc-list 615 suggests misplaced entry

Medium Severity

In the ISL 8192 section, conc-list: [615] appears twice — once at line 2668 (with 10 prefill workers, decode tp=16) and again at line 2698 (with 11 prefill workers, decode tp=32) placed after the conc-list: [1229] entry. This breaks the otherwise strictly ascending concurrency order (5, 15, 30, 84, 180, 333, 615, 1229, 615, 2253). The second 615 likely has an incorrect concurrency value or is misplaced, potentially causing the benchmark to run two different hardware configurations at the same concurrency target unintentionally, or missing a distinct concurrency sweep point.
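The ordering concern above can be checked mechanically. The following is a minimal sketch, not part of this PR: the helper name `check_conc_order` is hypothetical, and the input is simply the sequence of concurrency values quoted in the comment, typed in by hand rather than parsed from the config file. It reports values that appear more than once and positions where a value does not exceed its predecessor. Whether the repeated 615 is intentional is for the author to confirm; the check only flags it.

```python
from collections import Counter


def check_conc_order(points):
    """Return (duplicates, out_of_order) for concurrency values in file order.

    duplicates:   sorted list of values that occur more than once.
    out_of_order: list of (index, previous, current) tuples where the value at
                  `index` is not strictly greater than the one before it.
    """
    counts = Counter(points)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    out_of_order = [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(points, points[1:]), start=1)
        if cur <= prev
    ]
    return duplicates, out_of_order


# Values quoted in the review comment for the ISL 8192 section, in file order.
isl_8192 = [5, 15, 30, 84, 180, 333, 615, 1229, 615, 2253]

duplicates, out_of_order = check_conc_order(isl_8192)
print("duplicates:", duplicates)
print("out of order:", out_of_order)
```

For the quoted sequence this flags 615 as a duplicate and the step from 1229 down to 615 as the single ordering violation, which matches the two locations named in the comment.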

@functionstackx functionstackx left a comment

Collaborator

waiting on glm5 nvfp4 gb300 mtp sgl PR first
