fix: use custom_tokenizer to workaround the trtllm + glm5 tokenizer loading issue #13

Closed

richardhuo-nv wants to merge 4 commits into NVIDIA:main from
Conversation
Auto-detect container type at runtime: if /sgl-workspace exists (SGLang), use original install path unchanged; otherwise use portable /tmp build path with conditional dependency installation for non-SGLang containers.
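A minimal sketch of that runtime detection, assuming only the `/sgl-workspace` marker named above; the `/tmp` build path and the dependency-install step are illustrative, not the PR's actual code:

```python
import os
import subprocess

# SGLang images ship with /sgl-workspace; use its presence as the marker.
if os.path.isdir("/sgl-workspace"):
    # SGLang container: keep the original install path unchanged.
    build_dir = "/sgl-workspace"
else:
    # Any other container: fall back to a portable build path under /tmp
    # and install the dependencies this image does not bundle.
    # (requirements.txt is a placeholder for whatever the build needs.)
    build_dir = "/tmp/build"
    os.makedirs(build_dir, exist_ok=True)
    subprocess.run(["pip", "install", "-r", "requirements.txt"], check=True)
```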
* Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host
  - Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from NVIDIA#7)
  - Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to the node's routable IP in get_process_environment() instead of leaving it as 0.0.0.0/localhost, which caused transfer handshake failures (see the sketch below)
  - Update test_vllm_get_process_environment to cover the NIXL host env var
* ci: run checks on PRs targeting sa-submission-q2-2026

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
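For illustration, a minimal sketch of discovering a routable host IP and injecting it into the process environment. The shape of get_process_environment() and the helper name are assumptions for this sketch, not the PR's actual implementation:

```python
import socket


def _node_routable_ip() -> str:
    # Hypothetical helper: discover the node's routable IP by "connecting"
    # a UDP socket toward an external address. connect() on UDP sends no
    # packets; it only selects the local interface the kernel would route
    # through, which getsockname() then reports.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]


def get_process_environment(base_env: dict) -> dict:
    # Assumed shape of the function named in the commit message: point the
    # NIXL side channel at a routable IP instead of 0.0.0.0/localhost,
    # which breaks cross-node transfer handshakes.
    env = dict(base_env)
    env["VLLM_NIXL_SIDE_CHANNEL_HOST"] = _node_routable_ip()
    return env
```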
fix
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@ Coverage Diff @@
##        main      #13   +/- ##
================================
  Coverage      ?   60.15%
================================
  Files         ?       48
  Lines         ?     4081
  Branches      ?        0
================================
  Hits          ?     2455
  Misses        ?     1626
  Partials      ?        0
================================
```

☔ View full report in Codecov by Sentry.
TRT-LLM is still on Transformers v4, while the GLM-5 model was built with Transformers v5. As a result, the GLM-5 tokenizer cannot be loaded directly with AutoTokenizer in Transformers v4.
Our current workaround is adapted from TensorRT-LLM’s glm_moe_dsa tokenizer implementation:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/tokenizer/glm_moe_dsa/tokenizer.py
This workaround uses the Rust-backed `tokenizers` library to load tokenizer.json, then initializes a Transformers v4 AutoTokenizer with the relevant settings translated from tokenizer_config.json.
At the moment the workaround does not support chat_template, so chat templating has to be disabled for now.
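A minimal sketch of this approach using the public `tokenizers` and `transformers` APIs. The file path and special-token names below are illustrative placeholders, not the actual GLM-5 values, and this is not the PR's code:

```python
from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast

# Load the raw tokenizer.json with the Rust-backed `tokenizers` library,
# sidestepping AutoTokenizer's inability to parse the Transformers-v5-era
# config under Transformers v4.
raw = Tokenizer.from_file("/path/to/glm5/tokenizer.json")

# Wrap it in a Transformers v4 fast tokenizer, translating the relevant
# fields from tokenizer_config.json by hand (the token strings here are
# assumptions for illustration).
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=raw,
    bos_token="<bos>",  # illustrative; read the real value from tokenizer_config.json
    eos_token="<eos>",  # illustrative
    pad_token="<pad>",  # illustrative
)

# chat_template is intentionally left unset: the workaround does not
# support it, so chat templating must stay disabled downstream.
ids = tokenizer("Hello, GLM-5!")["input_ids"]
```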