[NV] Kimi fp4 b200 vllm configs#862

Merged
ankursingh-nv merged 4 commits into main from kimik2.5-fp4-b200-vllm
Mar 11, 2026
Conversation

@ankursingh-nv
Contributor

@ankursingh-nv ankursingh-nv commented Mar 3, 2026

Summary

Add Kimi K2.5 FP4 benchmark configuration for B200 using vLLM.

Changes

  • New config kimik2.5-fp4-b200-vllm in nvidia-master.yaml
    • Model: nvidia/Kimi-K2.5-NVFP4
    • Image: vllm/vllm-openai:v0.16.0
  • Parallelism: TP=8/EP=1 (concurrency 4–128) and TP=4/EP=4 (concurrency 4–64)
    • Sequence lengths: 1k1k, 1k8k, 8k1k
  • New benchmark script benchmarks/single_node/kimik2.5_fp4_b200.sh
    • Uses --reasoning-parser kimi_k2 and --tool-call-parser kimi_k2
    • Enables --compilation_config.pass_config.fuse_allreduce_rms true
    • Sets TORCH_CUDA_ARCH_LIST="10.0" for B200
    • GPU memory utilization: 0.90
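Putting the script settings above together, the launch looks roughly like the following. This is a hedged sketch, not the PR's actual `kimik2.5_fp4_b200.sh`: the `vllm serve` invocation shape is an assumption; only the model name, parser flags, fusion-pass flag, arch list, parallelism, and memory utilization come from the description.

```shell
# Sketch assuming the `vllm serve` CLI from the vllm/vllm-openai:v0.16.0 image.
# TP=8/EP=1 variant shown; the TP=4/EP=4 config would differ in parallelism flags.
export TORCH_CUDA_ARCH_LIST="10.0"   # B200 (Blackwell, SM 10.0)

vllm serve nvidia/Kimi-K2.5-NVFP4 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.90 \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --compilation_config.pass_config.fuse_allreduce_rms true
```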

@ankursingh-nv
Contributor Author

@claude update the PR description and also add an entry to the perf-changelog file.

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @ankursingh-nv's task in 1m 27s


Completed

  • Gather context on PR changes
  • Update PR description
  • Add entry to perf-changelog.yaml
  • Commit and push changes

Done:

  • Updated PR description with a summary of config and script changes (model, image, parallelism, sequence lengths, key flags)
  • Added kimik2.5-fp4-b200-vllm entry to perf-changelog.yaml referencing this PR

@functionstackx
Contributor

@claude update the PR description to match the actual vllm image being used

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @functionstackx's task in 44s


Completed

  • Gather context on actual vLLM image used
  • Update PR description to match

Done: Updated the PR description to reflect the actual vLLM image vllm/vllm-openai:v0.16.0 (was incorrectly listed as v0.15.1).

@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 2101d26 to 065f123 on March 4, 2026 19:03
@ankursingh-nv ankursingh-nv changed the title from "[WIP] [NV] add kimi fp4 configs" to "[WIP] [NV] Kimi fp4 configs" Mar 5, 2026
@cquil11
Collaborator

cquil11 commented Mar 9, 2026

@ankursingh-nv what is the hold-up here? probably should update to vllm 0.17.0 before merging anyway since it's up now

@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 04107b3 to 3423568 on March 10, 2026 16:21
@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 9efee23 to f3f5c75 on March 10, 2026 19:45
@ankursingh-nv ankursingh-nv changed the title from "[WIP] [NV] Kimi fp4 configs" to "[WIP] [NV] Kimi fp4 b200 vllm configs" Mar 10, 2026
@ankursingh-nv ankursingh-nv changed the title from "[WIP] [NV] Kimi fp4 b200 vllm configs" to "[NV] Kimi fp4 b200 vllm configs" Mar 10, 2026
@cquil11
Collaborator

cquil11 commented Mar 11, 2026

@ankursingh-nv ankursingh-nv enabled auto-merge (squash) March 11, 2026 03:21
@ankursingh-nv ankursingh-nv merged commit a58dedd into main Mar 11, 2026
20 of 40 checks passed
@ankursingh-nv ankursingh-nv deleted the kimik2.5-fp4-b200-vllm branch March 11, 2026 16:59
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026

6 participants