fix: load chat template from chat_template.jinja when available#7
Open
ramkrishna2910 wants to merge 1 commit intomainfrom
Open
fix: load chat template from chat_template.jinja when available#7ramkrishna2910 wants to merge 1 commit intomainfrom
ramkrishna2910 wants to merge 1 commit intomainfrom
Conversation
OGA's ApplyChatTemplate fails for some models (e.g. gpt-oss-20b-NPU) when using the template from tokenizer_config.json, falling back to a simple "System:/User:/Assistant:" format that produces garbage output. The Python reference (model_chat.py) handles this by preferring the chat_template.jinja file from the model folder, which OGA can process correctly. This change mirrors that behavior: check for chat_template.jinja first, fall back to tokenizer_config.json. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Author
Test Results ✅Tested end-to-end on Windows with Key finding: This model's Before fix (old binary)
After fix (PR binary from CI)
The token count jump (7 → 68) confirms the model's chat template is now properly wrapping messages. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
chat_template.jinjafile from the model folder overtokenizer_config.jsonfor loading chat templatestokenizer_config.jsonif no jinja file is present (preserving existing behavior)model_chat.py) which already handles this correctlyProblem
For models like
gpt-oss-20b-NPU(and likely other MoE models), OGA'sApplyChatTemplatefails when using the template string fromtokenizer_config.json:The fallback template (
System: ... User: ... Assistant: ...) produces garbage output because it doesn't match what the model was trained on.Relates to lemonade-sdk/lemonade#1111
Test plan
gpt-oss-20b-NPUand verify chat completions produce correct output (no fallback warning)chat_template.jinja(e.g.chatglm3-6b-NPU) and verify it still loads template fromtokenizer_config.json🤖 Generated with Claude Code