
Add GLM5 NVFP4 disaggregated inference recipes for GB200/GB300 #48

Merged
xinli-sw merged 1 commit into NVIDIA:sa-submission-q2-2026 from yeswanthk-26:glm5-nvfp4 on Apr 22, 2026


@yeswanthk-26

Summary

Add optimized disaggregated inference recipes for the GLM-5 model at NVFP4 precision on GB200 and GB300 GPUs. The recipe set also includes the sa-bench GLM5 tokenizer configuration.

Recipes Added (66 YAML configs)

GB200 NVFP4:

  • ISL1K_OSL1K STP: 7 configs | ISL1K_OSL1K MTP: 8 configs
  • ISL8K_OSL1K STP: 6 configs | ISL8K_OSL1K MTP: 8 configs

GB300 NVFP4:

  • ISL1K_OSL1K STP: 8 configs | ISL1K_OSL1K MTP: 10 configs
  • ISL8K_OSL1K STP: 9 configs | ISL8K_OSL1K MTP: 10 configs
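
As a quick sanity check on the headline number, the per-workload counts listed above can be tallied. This is only a sketch of the arithmetic; the nested dictionary layout is an illustrative choice and does not reflect how the recipes are organized in the repository.

```python
# Config counts copied from the lists above (platform -> workload -> mode).
counts = {
    "GB200": {
        "ISL1K_OSL1K": {"STP": 7, "MTP": 8},
        "ISL8K_OSL1K": {"STP": 6, "MTP": 8},
    },
    "GB300": {
        "ISL1K_OSL1K": {"STP": 8, "MTP": 10},
        "ISL8K_OSL1K": {"STP": 9, "MTP": 10},
    },
}

# Sum every mode count under each platform, then across platforms.
per_platform = {
    platform: sum(n for modes in workloads.values() for n in modes.values())
    for platform, workloads in counts.items()
}
total = sum(per_platform.values())

print(per_platform)
print(total)
```

The tally gives 29 configs for GB200 and 37 for GB300, which matches the 66 YAML configs stated in the heading.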

Config Details

  • All configs use the dynamo frontend with the trtllm backend
  • Benchmark type: sa-bench with custom_tokenizer: glm_moe_dsa
  • Model path is standardized to nvidia/GLM5-NVFP4 across all GLM5 recipes
  • Container is standardized to nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.3 across all GLM5 recipes
  • Configs with identical topology are consolidated into allconc files
  • All configs were validated against source data tables
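
A standardization claim like the one above could be checked mechanically. The sketch below is a minimal illustration, not the validation used for this PR. It assumes each recipe has already been parsed into a dictionary, and the key names `model_path` and `container` as well as the file names are hypothetical; the real recipe schema may nest or name these fields differently.

```python
# Expected values taken from the PR description.
# The key names are hypothetical placeholders for the real recipe schema.
EXPECTED = {
    "model_path": "nvidia/GLM5-NVFP4",
    "container": "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.3",
}


def find_mismatches(recipes):
    """Return (recipe_name, key, actual_value) for every non-standard field."""
    mismatches = []
    for name, config in recipes.items():
        for key, expected in EXPECTED.items():
            actual = config.get(key)  # None if the key is missing
            if actual != expected:
                mismatches.append((name, key, actual))
    return mismatches


# Illustrative input: file names and contents are made up for this example.
recipes = {
    "gb200_stp_example.yaml": dict(EXPECTED),
    "gb300_mtp_example.yaml": {**EXPECTED, "model_path": "some/other-path"},
}

mismatches = find_mismatches(recipes)
for name, key, actual in mismatches:
    print(f"{name}: {key} = {actual!r}")
```

An empty result would indicate that every parsed recipe carries the standardized model path and container.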

Add 66 GLM5 NVFP4 disaggregated recipe configs for GB200 and GB300 on the sa-submission branch; standardize model path and container values across the recipe set for consistency.
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (sa-submission-q2-2026@10f4ac9).

Additional details and impacted files
@@                   Coverage Diff                    @@
##             sa-submission-q2-2026      #48   +/-   ##
========================================================
  Coverage                         ?   61.06%           
========================================================
  Files                            ?       48           
  Lines                            ?     4138           
  Branches                         ?        0           
========================================================
  Hits                             ?     2527           
  Misses                           ?     1611           
  Partials                         ?        0           


xinli-sw self-requested a review on April 22, 2026 00:23
xinli-sw merged commit a10acd3 into NVIDIA:sa-submission-q2-2026 on Apr 22, 2026
5 checks passed
3 participants