[NV] Kimi fp4 b200 vllm configs#862

Merged
ankursingh-nv merged 4 commits into main from kimik2.5-fp4-b200-vllm
Mar 11, 2026
Conversation

@ankursingh-nv
Contributor

@ankursingh-nv ankursingh-nv commented Mar 3, 2026

Summary

Add Kimi K2.5 FP4 benchmark configuration for B200 using vLLM.

Changes

  • New config kimik2.5-fp4-b200-vllm in nvidia-master.yaml
    • Model: nvidia/Kimi-K2.5-NVFP4
    • Image: vllm/vllm-openai:v0.16.0
  • Parallelism: TP=8/EP=1 (concurrency 4–128) and TP=4/EP=4 (concurrency 4–64)
    • Sequence lengths: 1k1k, 1k8k, 8k1k
  • New benchmark script benchmarks/single_node/kimik2.5_fp4_b200.sh
    • Uses --reasoning-parser kimi_k2 and --tool-call-parser kimi_k2
    • Enables --compilation_config.pass_config.fuse_allreduce_rms true
    • Sets TORCH_CUDA_ARCH_LIST="10.0" for B200
    • GPU memory utilization: 0.90
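Putting the script settings above together, the launch looks roughly like the following. This is a hedged sketch, not the PR's actual `kimik2.5_fp4_b200.sh`: the `vllm serve` invocation shape is an assumption; only the model name, parser flags, fusion-pass flag, arch list, parallelism, and memory utilization come from the description.

```shell
# Sketch assuming the `vllm serve` CLI from the vllm/vllm-openai:v0.16.0 image.
# TP=8/EP=1 variant shown; the TP=4/EP=4 config would differ in parallelism flags.
export TORCH_CUDA_ARCH_LIST="10.0"   # B200 (Blackwell, SM 10.0)

vllm serve nvidia/Kimi-K2.5-NVFP4 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.90 \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --compilation_config.pass_config.fuse_allreduce_rms true
```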

@ankursingh-nv
Contributor Author

@claude update the PR description and also add an entry to the perf-changelog file.

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @ankursingh-nv's task in 1m 27s


Completed

  • Gather context on PR changes
  • Update PR description
  • Add entry to perf-changelog.yaml
  • Commit and push changes

Done:

  • Updated PR description with a summary of config and script changes (model, image, parallelism, sequence lengths, key flags)
  • Added kimik2.5-fp4-b200-vllm entry to perf-changelog.yaml referencing this PR

@functionstackx
Contributor

@claude update the PR description to match the actual vllm image being used

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @functionstackx's task in 44s


Completed

  • Gather context on actual vLLM image used
  • Update PR description to match

Done: Updated the PR description to reflect the actual vLLM image vllm/vllm-openai:v0.16.0 (was incorrectly listed as v0.15.1).

@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 2101d26 to 065f123 on March 4, 2026 19:03
@ankursingh-nv ankursingh-nv changed the title from "[WIP] [NV] add kimi fp4 configs" to "[WIP] [NV] Kimi fp4 configs" Mar 5, 2026
@cquil11
Collaborator

cquil11 commented Mar 9, 2026

@ankursingh-nv what is the hold-up here? probably should update to vllm 0.17.0 before merging anyway since it's up now

@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 04107b3 to 3423568 on March 10, 2026 16:21
@ankursingh-nv ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 9efee23 to f3f5c75 on March 10, 2026 19:45
@ankursingh-nv ankursingh-nv changed the title from "[WIP] [NV] Kimi fp4 configs" to "[WIP] [NV] Kimi fp4 b200 vllm configs" Mar 10, 2026
@ankursingh-nv ankursingh-nv changed the title from "[WIP] [NV] Kimi fp4 b200 vllm configs" to "[NV] Kimi fp4 b200 vllm configs" Mar 10, 2026
@cquil11
Collaborator

cquil11 commented Mar 11, 2026

@ankursingh-nv ankursingh-nv enabled auto-merge (squash) March 11, 2026 03:21
@ankursingh-nv ankursingh-nv merged commit a58dedd into main Mar 11, 2026
20 of 40 checks passed
@ankursingh-nv ankursingh-nv deleted the kimik2.5-fp4-b200-vllm branch March 11, 2026 16:59
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026

6 participants