fix: use custom_tokenizer to workaround the trtllm + glm5 tokenizer loading issue#13

Closed
richardhuo-nv wants to merge 4 commits into NVIDIA:main from richardhuo-nv:rihuo/fix_glm5_tokenizer

Conversation

@richardhuo-nv
Collaborator

TRT-LLM is still on Transformers v4, while the GLM-5 model was built with Transformers v5. As a result, the GLM-5 tokenizer cannot be loaded directly with AutoTokenizer in Transformers v4.

Our current workaround is adapted from TensorRT-LLM’s glm_moe_dsa tokenizer implementation:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/tokenizer/glm_moe_dsa/tokenizer.py

This workaround uses the Rust tokenizers library to load tokenizer.json, then initializes a Transformers v4 AutoTokenizer with settings translated from tokenizer_config.json.

At the moment, this workaround does not support chat_template, so we need to disable chat templating for now.

benchmark:
  custom_tokenizer: "glm_moe_dsa"
  use_chat_template: false
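
The loading trick described above can be sketched roughly as follows. This is a minimal illustration, not the actual glm_moe_dsa implementation: the special-token value is a placeholder, and the real workaround translates these settings from tokenizer_config.json.

```python
# Sketch: load a Transformers-v5-era tokenizer.json via the Rust tokenizers
# library and wrap it in a Transformers v4 fast tokenizer. Illustrative only;
# the eos_token value would really come from tokenizer_config.json.
import os

from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast


def load_glm5_tokenizer(model_dir: str) -> PreTrainedTokenizerFast:
    # tokenizer.json is consumed directly by the Rust tokenizers library,
    # so it loads even though AutoTokenizer in Transformers v4 rejects the
    # v5-format tokenizer config.
    rust_tok = Tokenizer.from_file(os.path.join(model_dir, "tokenizer.json"))
    return PreTrainedTokenizerFast(
        tokenizer_object=rust_tok,
        eos_token="<|endoftext|>",  # placeholder; translate from tokenizer_config.json
    )
```

Note that a tokenizer built this way carries no chat_template, which is why chat templating is disabled in the benchmark config above.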

Albert Cheng (Engrg-Hardware 1) and others added 3 commits April 2, 2026 14:17
Auto-detect container type at runtime: if /sgl-workspace exists (SGLang),
use original install path unchanged; otherwise use portable /tmp build path
with conditional dependency installation for non-SGLang containers.
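
The runtime detection in this commit might look roughly like the following fragment; the /tmp path and the dependency step are placeholders, not the recipe's actual commands.

```shell
# Illustrative sketch of container-type auto-detection at runtime.
if [ -d /sgl-workspace ]; then
    BUILD_DIR=/sgl-workspace      # SGLang container: keep original install path
else
    BUILD_DIR=/tmp/build          # non-SGLang container: portable build path
    mkdir -p "$BUILD_DIR"
    # conditional dependency installation for non-SGLang containers goes here
fi
echo "using build dir: $BUILD_DIR"
```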
* Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host

- Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from NVIDIA#7)
- Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to the
  node's routable IP in get_process_environment() instead of leaving it
  as 0.0.0.0/localhost, which caused transfer handshake failures
- Update test_vllm_get_process_environment to cover NIXL host env var

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: run checks on PRs targeting sa-submission-q2-2026

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
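
The side-channel fix above amounts to picking the node's routable address instead of a wildcard. A hedged sketch of that idea (the wiring into get_process_environment() and the fallback are assumptions based on the commit message, not the actual vLLM code):

```python
# Sketch: choose a concrete routable IP for VLLM_NIXL_SIDE_CHANNEL_HOST
# rather than 0.0.0.0/localhost, as described in the commit message above.
import socket


def routable_ip() -> str:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket selects the outbound interface the OS
        # would route through; no packets are actually sent.
        s.connect(("8.8.8.8", 53))
        return s.getsockname()[0]
    except OSError:
        # No default route (e.g. offline host): fall back to loopback so the
        # side channel still binds a concrete address. Illustrative fallback.
        return "127.0.0.1"
    finally:
        s.close()


env = {"VLLM_NIXL_SIDE_CHANNEL_HOST": routable_ip()}
```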
@richardhuo-nv richardhuo-nv changed the base branch from sa-submission-q2-2026 to main April 9, 2026 19:25
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ No coverage report was uploaded for BASE (main@e93856b), so no coverage comparison is available.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #13   +/-   ##
=======================================
  Coverage        ?   60.15%           
=======================================
  Files           ?       48           
  Lines           ?     4081           
  Branches        ?        0           
=======================================
  Hits            ?     2455           
  Misses          ?     1626           
  Partials        ?        0           
