
fix: load chat template from chat_template.jinja when available#7

Open
ramkrishna2910 wants to merge 1 commit into main from fix/chat-template-jinja-fallback

Conversation

@ramkrishna2910
Contributor

Summary

  • Prefer the chat_template.jinja file in the model folder over tokenizer_config.json when loading chat templates
  • Fall back to tokenizer_config.json if no jinja file is present (preserving existing behavior)
  • Matches the Python reference implementation (model_chat.py) which already handles this correctly
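The lookup order above can be sketched in Python, mirroring the behavior the PR describes (function name and structure are illustrative, not the actual implementation; the file names are the standard Hugging Face conventions):

```python
import json
from pathlib import Path
from typing import Optional


def load_chat_template(model_dir: str) -> Optional[str]:
    """Return the chat template for a model folder, or None if absent.

    Sketch of the precedence this PR introduces:
    1. chat_template.jinja (standalone file, which OGA can process)
    2. tokenizer_config.json's "chat_template" field (legacy behavior)
    3. None (caller falls back to its simple built-in template)
    """
    jinja_path = Path(model_dir) / "chat_template.jinja"
    if jinja_path.is_file():
        # Preferred: the standalone jinja template.
        return jinja_path.read_text(encoding="utf-8")

    config_path = Path(model_dir) / "tokenizer_config.json"
    if config_path.is_file():
        # Fallback: the embedded template. Note this key may be missing
        # entirely (as with gpt-oss-20b-NPU), in which case we return None.
        config = json.loads(config_path.read_text(encoding="utf-8"))
        return config.get("chat_template")

    # Neither file present: graceful fallback.
    return None
```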

Problem

For models like gpt-oss-20b-NPU (and likely other MoE models), OGA's ApplyChatTemplate fails when using the template string from tokenizer_config.json:

[WARNING] OGA chat template failed: Invalid or unsupported chat template.
[WARNING] Using simple fallback template

The fallback template (System: ... User: ... Assistant: ...) produces garbage output because it doesn't match what the model was trained on.

Relates to lemonade-sdk/lemonade#1111

Test plan

  • Load gpt-oss-20b-NPU and verify chat completions produce correct output (no fallback warning)
  • Load a model without chat_template.jinja (e.g. chatglm3-6b-NPU) and verify it still loads the template from tokenizer_config.json
  • Load a model with neither file and verify graceful fallback

🤖 Generated with Claude Code

OGA's ApplyChatTemplate fails for some models (e.g. gpt-oss-20b-NPU)
when using the template from tokenizer_config.json, falling back to a
simple "System:/User:/Assistant:" format that produces garbage output.

The Python reference (model_chat.py) handles this by preferring the
chat_template.jinja file from the model folder, which OGA can process
correctly. This change mirrors that behavior: check for
chat_template.jinja first, fall back to tokenizer_config.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ramkrishna2910
Contributor Author

Test Results ✅

Tested end-to-end on Windows with gpt-oss-20b-NPU (amd/gpt-oss-20b-onnx-ryzenai-npu).

Key finding: This model's tokenizer_config.json has no chat_template field at all, but ships with a chat_template.jinja file. The old code ignores the jinja file, passes nullptr to OGA, OGA fails, and the fallback template produces garbage.

Before fix (old binary)

  • Prompt tokens: 7 (fallback "System:/User:/Assistant:" template)
  • Response to "hello": "404\n```\n\nThis example demonstrates the application of the "404" style..." — garbage

After fix (PR binary from CI)

  • Prompt tokens: 68 (proper jinja template applied)
  • Response to "hello": "Hello! 👋 How can I assist you today?" — correct

The token count jump (7 → 68) confirms the model's chat template is now properly wrapping messages.
