
fix: load chat template from chat_template.jinja when available#7

Open
ramkrishna2910 wants to merge 1 commit into main from fix/chat-template-jinja-fallback

Conversation

@ramkrishna2910
Contributor

Summary

  • Prefer the chat_template.jinja file in the model folder over tokenizer_config.json when loading chat templates
  • Fall back to tokenizer_config.json if no jinja file is present (preserving existing behavior)
  • Matches the Python reference implementation (model_chat.py) which already handles this correctly
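The lookup order above can be sketched in Python, mirroring the behavior the PR describes (function name and structure are illustrative, not the actual implementation; the file names are the standard Hugging Face conventions):

```python
import json
from pathlib import Path
from typing import Optional


def load_chat_template(model_dir: str) -> Optional[str]:
    """Return the chat template for a model folder, or None if absent.

    Sketch of the precedence this PR introduces:
    1. chat_template.jinja (standalone file, which OGA can process)
    2. tokenizer_config.json's "chat_template" field (legacy behavior)
    3. None (caller falls back to its simple built-in template)
    """
    jinja_path = Path(model_dir) / "chat_template.jinja"
    if jinja_path.is_file():
        # Preferred: the standalone jinja template.
        return jinja_path.read_text(encoding="utf-8")

    config_path = Path(model_dir) / "tokenizer_config.json"
    if config_path.is_file():
        # Fallback: the embedded template. Note this key may be missing
        # entirely (as with gpt-oss-20b-NPU), in which case we return None.
        config = json.loads(config_path.read_text(encoding="utf-8"))
        return config.get("chat_template")

    # Neither file present: graceful fallback.
    return None
```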

Problem

For models like gpt-oss-20b-NPU (and likely other MoE models), OGA's ApplyChatTemplate fails when using the template string from tokenizer_config.json:

[WARNING] OGA chat template failed: Invalid or unsupported chat template.
[WARNING] Using simple fallback template

The fallback template (System: ... User: ... Assistant: ...) produces garbage output because it doesn't match what the model was trained on.

Relates to lemonade-sdk/lemonade#1111

Test plan

  • Load gpt-oss-20b-NPU and verify chat completions produce correct output (no fallback warning)
  • Load a model without chat_template.jinja (e.g. chatglm3-6b-NPU) and verify it still loads the template from tokenizer_config.json
  • Load a model with neither file and verify graceful fallback

🤖 Generated with Claude Code

OGA's ApplyChatTemplate fails for some models (e.g. gpt-oss-20b-NPU)
when using the template from tokenizer_config.json, falling back to a
simple "System:/User:/Assistant:" format that produces garbage output.

The Python reference (model_chat.py) handles this by preferring the
chat_template.jinja file from the model folder, which OGA can process
correctly. This change mirrors that behavior: check for
chat_template.jinja first, fall back to tokenizer_config.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ramkrishna2910
Contributor Author

Test Results ✅

Tested end-to-end on Windows with gpt-oss-20b-NPU (amd/gpt-oss-20b-onnx-ryzenai-npu).

Key finding: This model's tokenizer_config.json has no chat_template field at all, but ships with a chat_template.jinja file. The old code ignores the jinja file, passes nullptr to OGA, OGA fails, and the fallback template produces garbage.

Before fix (old binary)

  • Prompt tokens: 7 (fallback "System:/User:/Assistant:" template)
  • Response to "hello": "404\n```\n\nThis example demonstrates the application of the "404" style..." — garbage

After fix (PR binary from CI)

  • Prompt tokens: 68 (proper jinja template applied)
  • Response to "hello": "Hello! 👋 How can I assist you today?" — correct

The token count jump (7 → 68) confirms the model's chat template is now properly wrapping messages.
