[NV] Multi-node synthetic AL injection for MTP (dsv4 mtp2 bring-up) by qiching · Pull Request #1789 · SemiAnalysisAI/InferenceX

qiching · 2026-06-15T20:29:13Z

Summary

Adds opt-in synthetic-acceptance injection for multi-node MTP recipes, enabled on a single dsv4 MTP2 agg cell as the bring-up / e2e test.

runners/inject_synthetic_acceptance.py (new): when SYNTHETIC_ACCEPTANCE=true, rewrites each speculative-config in the srt-slurm recipe to use synthetic rejection sampling. No-op when the env var is unset/false.
runners/launch_gb200-nv.sh: generalizes the watchtower shared-FS staging from minimax-only to a USE_SHARED_FS flag (now also enabled for dsv4 dynamo-vllm), and invokes the injection after the name override and before srtctl apply.
.github/configs/nvidia-master.yaml: enables SYNTHETIC_ACCEPTANCE (acceptance length 2.27) on the dsv4 gb200 dynamo-vllm mtp2 agg cell only.
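The injection step described above can be sketched as a pure function over a parsed recipe. This is a minimal sketch, not the code in runners/inject_synthetic_acceptance.py: it assumes the recipe has already been loaded into nested dicts/lists, and the field names `rejection_sample_method`, `synthetic_acceptance_length` and the env var `SYNTHETIC_ACCEPTANCE_LENGTH` are hypothetical placeholders. Only `SYNTHETIC_ACCEPTANCE` and the `speculative-config` key come from this PR.

```python
import json
import os
from typing import Any, Mapping

SPEC_KEY = "speculative-config"  # key named in this PR


def synthetic_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Opt-in gate: only an explicit 'true' enables injection; unset/false is a no-op."""
    return env.get("SYNTHETIC_ACCEPTANCE", "").strip().lower() == "true"


def _patch_spec(spec: Any, length: float) -> Any:
    # vLLM's --speculative-config takes a JSON string, so accept that form too.
    if isinstance(spec, str):
        return json.dumps(_patch_spec(json.loads(spec), length))
    if isinstance(spec, dict):
        patched = dict(spec)  # copy so the input recipe is not mutated
        patched["rejection_sample_method"] = "synthetic"  # HYPOTHETICAL field name
        patched["synthetic_acceptance_length"] = length   # HYPOTHETICAL field name
        return patched
    return spec


def _walk(node: Any, length: float) -> Any:
    # Rewrite every speculative-config, wherever it appears in the recipe tree.
    if isinstance(node, dict):
        return {
            k: _patch_spec(v, length) if k == SPEC_KEY else _walk(v, length)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [_walk(item, length) for item in node]
    return node


def inject_synthetic_acceptance(recipe: dict, env: Mapping[str, str] = os.environ) -> dict:
    if not synthetic_enabled(env):
        return recipe  # no-op: hand back the original object untouched
    length = float(env["SYNTHETIC_ACCEPTANCE_LENGTH"])  # HYPOTHETICAL env var
    return _walk(recipe, length)
```

Returning the untouched object when the flag is off keeps the default path byte-for-byte identical, which fits the "opt-in, easy rollback" goal. In a launcher the recipe would typically be read and written with a YAML library around this call.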

Design

Bring-up approach: keep it opt-in so support can be enabled incrementally per framework/model/config and rolled back easily. For now the flag is set per recipe via additional-settings; it will be promoted to a first-class field once it is enforced everywhere.

Not in this PR (follow-ups)

Per-model toggle ("flip once per model"); the flag is currently set per recipe cell.
AL auto-resolve from benchmarks/speedbench-reference-al.yaml. For now this cell hardcodes 2.27, since the reference YAML isn't in this repo yet.
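To make the hardcoded 2.27 concrete, here is a back-of-the-envelope model, assuming AL counts the bonus token (so MTP2, with 2 draft tokens, gives 1 ≤ AL ≤ 3) and assuming each draft position is accepted independently with probability p, stopping at the first rejection. Under that model E[AL] = 1 + p + p^2 for k = 2. This is an illustrative model only; it is not necessarily how the synthetic rejection sampler draws acceptances.

```python
import random


def expected_al(p: float, k: int) -> float:
    # 1 bonus token + sum over draft positions i of P(first i drafts accepted) = p**i
    return 1.0 + sum(p**i for i in range(1, k + 1))


def solve_p(target_al: float, k: int, tol: float = 1e-12) -> float:
    """Bisection for the per-token acceptance prob that yields target_al."""
    if not 1.0 <= target_al <= 1.0 + k:
        raise ValueError(f"AL must lie in [1, {1 + k}] for k={k}")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_al(mid, k) < target_al:  # expected_al is increasing in p
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_al(p: float, k: int, steps: int, seed: int = 0) -> float:
    # Monte Carlo check: accept drafts left to right until a rejection or k accepts.
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        accepted = 0
        while accepted < k and rng.random() < p:
            accepted += 1
        total += 1 + accepted  # accepted drafts + bonus token
    return total / steps
```

Under these assumptions `solve_p(2.27, 2)` comes out at roughly 0.733, matching the closed form (-1 + sqrt(6.08)) / 2. Bisection is used instead of the quadratic formula so the same helper also covers other draft counts.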

Port of internal PR SemiAnalysisAI#95. Adds opt-in synthetic-acceptance injection for the multi-node dsv4 MTP2 agg recipe:
- runners/inject_synthetic_acceptance.py: rewrites each speculative-config in the srt-slurm recipe to use synthetic rejection sampling when SYNTHETIC_ACCEPTANCE=true (no-op otherwise).
- runners/launch_gb200-nv.sh: USE_SHARED_FS flag (dsv4 dynamo-vllm now uses the same compute-visible shared-FS staging as minimax on watchtower), plus invoking the injection after the name override and before srtctl apply.
- .github/configs/nvidia-master.yaml: enables SYNTHETIC_ACCEPTANCE on the dsv4 gb200 dynamo-vllm mtp2 agg cell (length 2.27) for the e2e test.

qiching · 2026-06-15T20:46:09Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-gb200-dynamo-vllm-mtp2 --conc 1 --no-evals

github-actions · 2026-06-15T20:46:33Z

@qiching Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27575304265
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-gb200-dynamo-vllm-mtp2 --conc 1 --no-evals
Pinned ref: 7b6736f
Approval: not required (trusted collaborator).

github-project-automation Bot added this to InferenceMAX Board Jun 15, 2026

qiching mentioned this pull request Jun 15, 2026

[Tracking Issue] Synthetic Acceptance for MTP Benchmarks #1651 (Open, 3 tasks)


qiching wants to merge 1 commit into SemiAnalysisAI:main from qiching:albecheng/gb200-synthetic-mtp-multinode
