Add GLM5 NVFP4 disaggregated inference recipes for GB200/GB300 by yeswanthk-26 · Pull Request #48 · NVIDIA/srt-slurm

yeswanthk-26 · 2026-04-20T19:41:33Z

Summary

Add optimized disaggregated inference recipes for the GLM-5 model at NVFP4 precision on GB200 and GB300 GPUs. The recipe set also includes the sa-bench GLM5 tokenizer configuration.

Recipes Added (66 YAML configs)

GB200 NVFP4:

ISL1K_OSL1K STP: 7 configs | ISL1K_OSL1K MTP: 8 configs
ISL8K_OSL1K STP: 6 configs | ISL8K_OSL1K MTP: 8 configs

GB300 NVFP4:

ISL1K_OSL1K STP: 8 configs | ISL1K_OSL1K MTP: 10 configs
ISL8K_OSL1K STP: 9 configs | ISL8K_OSL1K MTP: 10 configs
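
The per-group counts above can be tallied to confirm the stated total of 66. A minimal sketch in Python; the dictionary layout and function name are illustrative assumptions, and only the numbers come from this summary:

```python
# Recipe counts per (platform, workload, decoding mode), copied from the summary above.
RECIPE_COUNTS = {
    ("GB200", "ISL1K_OSL1K", "STP"): 7,
    ("GB200", "ISL1K_OSL1K", "MTP"): 8,
    ("GB200", "ISL8K_OSL1K", "STP"): 6,
    ("GB200", "ISL8K_OSL1K", "MTP"): 8,
    ("GB300", "ISL1K_OSL1K", "STP"): 8,
    ("GB300", "ISL1K_OSL1K", "MTP"): 10,
    ("GB300", "ISL8K_OSL1K", "STP"): 9,
    ("GB300", "ISL8K_OSL1K", "MTP"): 10,
}


def totals_by_platform(counts):
    """Sum the config counts for each GPU platform."""
    totals = {}
    for (platform, _workload, _mode), n in counts.items():
        totals[platform] = totals.get(platform, 0) + n
    return totals


per_platform = totals_by_platform(RECIPE_COUNTS)
total = sum(per_platform.values())
print(per_platform, total)  # {'GB200': 29, 'GB300': 37} 66
```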

Config Details

All configs use dynamo frontend with trtllm backend
Benchmark type: sa-bench with custom_tokenizer: glm_moe_dsa
Model path standardized to nvidia/GLM5-NVFP4 across all GLM5 recipes
Container standardized to nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.3 across all GLM5 recipes
Configs with identical topology consolidated into allconc files
All configs validated against source data tables
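
The standardization described above could be spot-checked with a small script. This is a sketch only: it treats each recipe as plain text and looks for the three standardized values as substrings, because the actual YAML schema is not shown in this PR. The key names in the sample recipes (`model:`, `container:`) and the file names are hypothetical.

```python
# Values the PR description says are standardized across all GLM5 recipes.
EXPECTED_MODEL = "nvidia/GLM5-NVFP4"
EXPECTED_CONTAINER = "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.3"
EXPECTED_TOKENIZER = "custom_tokenizer: glm_moe_dsa"
REQUIRED = (EXPECTED_MODEL, EXPECTED_CONTAINER, EXPECTED_TOKENIZER)


def find_inconsistent(recipes):
    """Map each recipe name to the required strings missing from its text."""
    problems = {}
    for name, text in recipes.items():
        missing = [s for s in REQUIRED if s not in text]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical in-memory recipes; the key names are assumptions, not the real schema.
sample = {
    "good.yaml": (
        f"model: {EXPECTED_MODEL}\n"
        f"container: {EXPECTED_CONTAINER}\n"
        f"{EXPECTED_TOKENIZER}\n"
    ),
    "bad.yaml": (
        f"model: {EXPECTED_MODEL}\n"
        "container: example.invalid/other-image:0.0.0\n"
        f"{EXPECTED_TOKENIZER}\n"
    ),
}

print(find_inconsistent(sample))
```

To run it against a checkout, the `recipes` mapping could be built with `pathlib.Path(root).rglob("*.yaml")` and `read_text()`, where `root` is wherever the recipe directory lives.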

Add 66 GLM5 NVFP4 disaggregated recipe configs for GB200 and GB300 on the sa-submission branch; standardize model path and container values across the recipe set for consistency.

codecov-commenter · 2026-04-20T19:42:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (sa-submission-q2-2026@10f4ac9).

Additional details and impacted files

@@                   Coverage Diff                    @@
##             sa-submission-q2-2026      #48   +/-   ##
========================================================
  Coverage                         ?   61.06%           
========================================================
  Files                            ?       48           
  Lines                            ?     4138           
  Branches                         ?        0           
========================================================
  Hits                             ?     2527           
  Misses                           ?     1611           
  Partials                         ?        0
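
The reported percentage can be cross-checked from the hit and line counts in the table. A minimal sketch, assuming the figure is truncated to two decimals rather than rounded to nearest (an assumption about the report's rounding, not something stated in this comment); the function name is illustrative:

```python
# Counts copied from the coverage table above.
HITS = 2527
MISSES = 1611
LINES = 4138


def truncated_percent(hits, lines):
    """Return hits/lines as a percentage string, truncated to two decimals."""
    hundredths = hits * 10000 // lines  # integer math avoids float rounding
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


lines_accounted_for = HITS + MISSES  # partials are 0 in this report
coverage = truncated_percent(HITS, LINES)
print(lines_accounted_for == LINES, coverage)  # True 61.06%
```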

Add GLM5 disaggregated recipes for SA submission

7867e8f


xinli-sw self-requested a review April 22, 2026 00:23

xinli-sw approved these changes Apr 22, 2026

xinli-sw merged commit a10acd3 into NVIDIA:sa-submission-q2-2026 Apr 22, 2026
5 checks passed

xinli-sw merged 1 commit into NVIDIA:sa-submission-q2-2026 from yeswanthk-26:glm5-nvfp4
