Update MiniMax M2.5 FP8 H200 vLLM agg recipes #1354
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25772346949
Force-pushed from 3dea91d to c3d1ef6
(Identical to #1298 except the source branch is no longer from a fork so that CI can run)
Set vLLM serving knobs in `benchmarks/single_node/minimaxm2.5_fp8_h200.sh`: a per-benchmark generated max-model-len (keeping the previous max-model-len handling for evals), FP8 KV cache, FlashInfer attention with autotune, Triton MoE, and MiniMax QK-norm fusion. A sketch of these knobs follows.
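For orientation, here is a minimal sketch of what these knobs can look like on the vLLM CLI, not the recipe itself. Only `--max-model-len`, `--kv-cache-dtype fp8`, and `VLLM_ATTENTION_BACKEND=FLASHINFER` are standard vLLM options; the model ID and context length are hypothetical placeholders, and the exact switches for FlashInfer autotune, Triton MoE, and MiniMax QK-norm fusion vary by vLLM version, so the recipe script remains the source of truth.

```bash
# A minimal sketch of the serving knobs described above, assuming vLLM's
# standard `vllm serve` CLI. The model ID and max-model-len value are
# placeholders; the FlashInfer-autotune, Triton-MoE, and QK-norm-fusion
# knobs are version-dependent and intentionally omitted here -- take their
# authoritative names from the recipe script.

MODEL="MiniMaxAI/MiniMax-M2.5"            # hypothetical model ID
MAX_MODEL_LEN="${MAX_MODEL_LEN:-65536}"   # generated per benchmark in the recipe

# FlashInfer attention backend (documented vLLM environment variable)
export VLLM_ATTENTION_BACKEND=FLASHINFER

vllm serve "$MODEL" \
    --max-model-len "$MAX_MODEL_LEN" \
    --kv-cache-dtype fp8                  # FP8 KV cache
```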