[NV] Multi-node synthetic AL injection for MTP (dsv4 mtp2 bring-up)#1789

Draft
qiching wants to merge 1 commit into SemiAnalysisAI:main from qiching:albecheng/gb200-synthetic-mtp-multinode

qiching (Collaborator) commented Jun 15, 2026

Summary

Adds opt-in synthetic-acceptance injection for multi-node MTP recipes. It is enabled on a single dsv4 MTP2 agg cell, which serves as the bring-up / e2e test.

  • runners/inject_synthetic_acceptance.py (new): when SYNTHETIC_ACCEPTANCE=true, rewrites each speculative-config in the srt-slurm recipe to use synthetic rejection sampling. No-op when the env var is unset/false.
  • runners/launch_gb200-nv.sh: generalizes the watchtower shared-FS staging from minimax-only to a USE_SHARED_FS flag (now also used by dsv4 dynamo-vllm), and invokes the injection after the name override and before srtctl apply.
  • .github/configs/nvidia-master.yaml: enables SYNTHETIC_ACCEPTANCE (length 2.27) on the dsv4 gb200 dynamo-vllm mtp2 agg cell only.
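
For readers who want a concrete picture of the injection step, here is a minimal sketch of an opt-in rewrite over a recipe that has already been parsed into nested dicts and lists. It is not the code in runners/inject_synthetic_acceptance.py. The SYNTHETIC_ACCEPTANCE_LENGTH variable and the rejection-sample-method / synthetic-acceptance-length keys are hypothetical placeholders chosen for illustration, and the sketch assumes each speculative-config is a mapping rather than a serialized string.

```python
import copy
from typing import Any, Mapping


def is_enabled(env: Mapping[str, str]) -> bool:
    # Opt-in gate: only the literal value "true" (case-insensitive) enables it.
    return env.get("SYNTHETIC_ACCEPTANCE", "").strip().lower() == "true"


def _patch(node: Any, length: float) -> None:
    # Walk nested dicts/lists and rewrite every speculative-config mapping.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "speculative-config" and isinstance(value, dict):
                # Hypothetical key names; the real fields may differ.
                value["rejection-sample-method"] = "synthetic"
                value["synthetic-acceptance-length"] = length
            else:
                _patch(value, length)
    elif isinstance(node, list):
        for item in node:
            _patch(item, length)


def inject(recipe: Any, env: Mapping[str, str]) -> Any:
    # No-op when the flag is unset or false: the input is returned untouched.
    if not is_enabled(env):
        return recipe
    # Hypothetical variable name for the target acceptance length.
    length = float(env["SYNTHETIC_ACCEPTANCE_LENGTH"])
    patched = copy.deepcopy(recipe)  # leave the caller's recipe unmodified
    _patch(patched, length)
    return patched
```

Taking the environment as an argument rather than reading os.environ directly keeps the no-op path easy to check: with an empty mapping, inject returns the same object it was given.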

Design

Bring-up approach: keep the feature opt-in so that support can be enabled incrementally per framework/model/config and rolled back easily. The flag is currently set per recipe via additional-settings; it should be promoted to a first-class field once it is enforced everywhere.

Not in this PR (follow-ups)

  • Per-model toggle ("flip once per model"): the flag is currently set per recipe cell.
  • AL auto-resolve from benchmarks/speedbench-reference-al.yaml: this cell hardcodes 2.27 because the reference YAML isn't in this repo yet.

Port of internal PR SemiAnalysisAI#95. Adds opt-in synthetic-acceptance injection for the
multi-node dsv4 MTP2 agg recipe:

- runners/inject_synthetic_acceptance.py: rewrites each speculative-config in
  the srt-slurm recipe to use synthetic rejection sampling when
  SYNTHETIC_ACCEPTANCE=true (no-op otherwise).
- runners/launch_gb200-nv.sh: USE_SHARED_FS flag (dsv4 dynamo-vllm now uses the
  same compute-visible shared-FS staging as minimax on watchtower) + invoke the
  injection after the name override, before srtctl apply.
- .github/configs/nvidia-master.yaml: enable SYNTHETIC_ACCEPTANCE on the dsv4
  gb200 dynamo-vllm mtp2 agg cell (length 2.27) for the e2e test.

qiching (Collaborator, Author) commented Jun 15, 2026

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-gb200-dynamo-vllm-mtp2 --conc 1 --no-evals

@github-actions (Contributor)

@qiching Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27575304265
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-gb200-dynamo-vllm-mtp2 --conc 1 --no-evals
Pinned ref: 7b6736f
Approval: not required (trusted collaborator).
