[AMD] minimaxm3-fp8-mi300x-vllm: enable AITER kernels + safe ROCm knobs by JohnQinAMD · Pull Request #1804 · SemiAnalysisAI/InferenceX

JohnQinAMD · 2026-06-16T06:58:08Z

Enable AITER on MI300X/gfx942 for MiniMax-M3 MXFP8 via the single master toggle VLLM_ROCM_USE_AITER=1. The per-component AITER flags (_MOE, _LINEAR, _RMSNORM, _FP8BMM) default to True and are gated behind the master flag, so they are left at their defaults. VLLM_ROCM_USE_AITER_MHA defaults to True and is explicitly set to 0 to keep attention on TRITON_ATTN, since the MXFP8 checkpoint lacks calibrated q/prob scales for ROCm FP8 attention.

Also set numerically-inert runtime knobs (hipBLASLt preference, NCCL channels, HW queues). All changes are kernel-selection/runtime only; GSM8K holds ~0.95.

Measured uplift (8xMI300X, 1k1k, total tok/s/gpu): +5.6..+10.8% across conc 4..256; conc 1-2 unchanged (latency-bound).
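The uplift percentages are relative changes in total tok/s/gpu against the pre-change baseline. A minimal sketch of that arithmetic follows; the `uplift_pct` helper and the throughput figures passed to it are hypothetical placeholders, not measurements from this PR.

```shell
# Hypothetical helper: relative change of a new throughput vs. a baseline,
# printed as a signed percentage with one decimal place.
uplift_pct() {
  # $1 = baseline tok/s/gpu, $2 = new tok/s/gpu
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}

# Placeholder numbers for illustration only.
uplift_pct 1000 1056
```

With real benchmark output, the two arguments would be the baseline and AITER-enabled throughput at the same concurrency.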

Note

Low Risk
Benchmark-only env exports and changelog; kernel/runtime selection with reported stable GSM8K, no auth or data-path changes.

Overview
MiniMax-M3 MXFP8 MI300X fixed-sequence vLLM recipe now exports VLLM_ROCM_USE_AITER=1 so decode GEMMs and fused MoE use AITER instead of generic ROCm kernels, with VLLM_ROCM_USE_AITER_MHA=0 so attention stays on TRITON_ATTN (MXFP8 lacks calibrated FP8 attention scales).
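The intended combination (master toggle on, AITER MHA off) can be guarded with a small pre-launch check. This is a sketch only: `check_aiter_env` is a hypothetical helper that is not part of the PR, it accepts only the literal values `1` and `0` that the recipe exports, and it assumes the master toggle counts as off when unset.

```shell
# Hypothetical pre-launch guard (not part of the PR).
check_aiter_env() {
  # Assumption: an unset master toggle means AITER is off.
  if [ "${VLLM_ROCM_USE_AITER:-0}" != "1" ]; then
    echo "VLLM_ROCM_USE_AITER is not 1: AITER kernels are disabled"
    return 1
  fi
  # Per the PR description, the MHA flag defaults to True, so an unset
  # value would move attention off TRITON_ATTN once the master flag is on.
  if [ "${VLLM_ROCM_USE_AITER_MHA:-1}" != "0" ]; then
    echo "VLLM_ROCM_USE_AITER_MHA is not 0: attention would use AITER MHA"
    return 1
  fi
  echo "ok"
}

export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MHA=0
check_aiter_env
```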

Also sets numerically inert MI300X knobs: TORCH_BLAS_PREFER_HIPBLASLT=1, NCCL_MIN_NCHANNELS default 112, and GPU_MAX_HW_QUEUES default 2.

perf-changelog.yaml documents the change for minimaxm3-fp8-mi300x-vllm, including measured 1k1k throughput uplift at conc 4–256 and unchanged GSM8K (~0.95).

Reviewed by Cursor Bugbot for commit 5cbf877. Bugbot is set up for automated code reviews on this repo.

Enable AITER on MI300X/gfx942 for MiniMax-M3 MXFP8 via the single master toggle VLLM_ROCM_USE_AITER=1. The per-component AITER flags (_MOE, _LINEAR, _RMSNORM, _FP8BMM) default to True and are gated behind the master flag, so they are left at their defaults. VLLM_ROCM_USE_AITER_MHA defaults to True and is explicitly set to 0 to keep attention on TRITON_ATTN, since the MXFP8 checkpoint lacks calibrated q/prob scales for ROCm FP8 attention.

Also set AMD-recommended numerically-inert MI300X runtime knobs: TORCH_BLAS_PREFER_HIPBLASLT=1, NCCL_MIN_NCHANNELS=112 (RCCL channels, raised above the ~32-64 default for TP8), GPU_MAX_HW_QUEUES=2 (HIP streams, capped below the default of 4).

All changes are kernel-selection/runtime only; GSM8K holds ~0.95.

Measured uplift (8xMI300X, 1k1k, total tok/s/gpu): +5.6..+10.8% across conc 4..256; conc 1-2 unchanged (latency-bound).

Co-authored-by: Cursor <cursoragent@cursor.com>

functionstackx · 2026-06-16T07:55:00Z

+export VLLM_ROCM_USE_AITER=1
+export VLLM_ROCM_USE_AITER_MHA=0
+
+export TORCH_BLAS_PREFER_HIPBLASLT=1
+export NCCL_MIN_NCHANNELS="${NCCL_MIN_NCHANNELS:-112}"
+export GPU_MAX_HW_QUEUES="${GPU_MAX_HW_QUEUES:-2}"
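
The last two exports use the shell's `${VAR:-default}` expansion, so a value already set by the caller survives and the recipe default applies only when the variable is unset or empty. A short sketch of that behavior follows; the pre-set `GPU_MAX_HW_QUEUES=4` is an illustrative override, not a recommended setting.

```shell
# Start from a clean state so the defaults can apply.
unset NCCL_MIN_NCHANNELS GPU_MAX_HW_QUEUES

# Simulate a caller that pins one knob before the recipe runs.
export GPU_MAX_HW_QUEUES=4

# Same form as the diff: keep a pre-set value, otherwise use the default.
export NCCL_MIN_NCHANNELS="${NCCL_MIN_NCHANNELS:-112}"
export GPU_MAX_HW_QUEUES="${GPU_MAX_HW_QUEUES:-2}"

echo "NCCL_MIN_NCHANNELS=${NCCL_MIN_NCHANNELS}"  # recipe default applied
echo "GPU_MAX_HW_QUEUES=${GPU_MAX_HW_QUEUES}"    # caller's value kept
```

By contrast, VLLM_ROCM_USE_AITER, VLLM_ROCM_USE_AITER_MHA, and TORCH_BLAS_PREFER_HIPBLASLT are exported unconditionally, so the recipe overwrites any value the caller set for them.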


@JohnQinAMD thank you for the PR, can u please update the image in amd-master.yaml to include these changes & do an upstream branch instead of forked branch so that we can kick off CI?

@JohnQinAMD can u also update https://github.com/vllm-project/recipes/tree/main with these new env vars

> @JohnQinAMD thank you for the PR, can u please update the image in amd-master.yaml to include these changes & do an upstream branch instead of forked branch so that we can kick off CI?

@functionstackx, an upstream branch has been created. I will close this PR and switch to #1808 to trigger the CI test.

The image tag in this PR currently uses the same default image tag as amd-master.yaml on the main branch.

> @JohnQinAMD can u also update https://github.com/vllm-project/recipes/tree/main with these new env vars

Updating the vLLM recipes via vllm-project/recipes#556.

JohnQinAMD · 2026-06-16T16:27:52Z

Closing this PR and switching to #1808 to trigger the CI test.

JohnQinAMD requested a review from a team June 16, 2026 06:58

github-project-automation Bot added this to InferenceMAX Board Jun 16, 2026

JohnQinAMD changed the title ~~minimaxm3-fp8-mi300x-vllm: enable AITER kernels + safe ROCm knobs~~ [AMD] minimaxm3-fp8-mi300x-vllm: enable AITER kernels + safe ROCm knobs Jun 16, 2026

functionstackx reviewed Jun 16, 2026


JohnQinAMD mentioned this pull request Jun 16, 2026

[AMD] [MI300X] minimaxm3-fp8-mi300x-vllm: enable AITER kernels for MXFP8 on MI300X #1808

Open

JohnQinAMD closed this Jun 16, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 16, 2026

JohnQinAMD wants to merge 1 commit into SemiAnalysisAI:main from ZhengGong-amd:minimaxm3-mi300x-aiter-tuning

3 participants
