feat: Extend chat template preprocessing to support multi-modality content blocks by guygir · Pull Request #255 · llm-d/llm-d-kv-cache

guygir · 2026-01-14T14:43:21Z

This PR extends chat template preprocessing to support structured multi-modality content blocks (OpenAI API format).
This is the first stage of multi-modality support and only establishes basic technical feasibility. It covers images only, not audio or video: GAIE already supports images, while audio and video support requires additional GAIE changes, which will be addressed in the next stage.

Changes:

Extended ChatMessage structure to support both string and structured content blocks
Added ContentBlock and ImageBlock structs matching OpenAI API format
Maintains backward compatibility with text-only content
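
To illustrate the "string or structured blocks" shape described in the list above, here is a minimal Python sketch, not the PR's actual code. The helper name `normalize_content` is hypothetical; the only thing taken as given is the OpenAI chat format, in which a message's `content` is either a plain string or a list of parts such as `{"type": "text", ...}` and `{"type": "image_url", ...}`.

```python
from typing import Any, Dict, List, Union

# A message's content is either a plain string (text-only, the old shape)
# or a list of OpenAI-style content parts (the new structured shape).
Content = Union[str, List[Dict[str, Any]]]


def normalize_content(content: Content) -> List[Dict[str, Any]]:
    """Hypothetical helper: return content as a list of content blocks.

    A plain string is wrapped in a single text block, so text-only
    requests keep working unchanged.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]

    blocks: List[Dict[str, Any]] = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            blocks.append({"type": "text", "text": block["text"]})
        elif kind == "image_url":
            # The URL may be an http(s) link or a base64 data URL.
            url = block["image_url"]["url"]
            blocks.append({"type": "image_url", "image_url": {"url": url}})
        else:
            # Audio and video are out of scope for this stage.
            raise ValueError(f"unsupported content block type: {kind!r}")
    return blocks
```

Wrapping the string case in a single text block is one way to keep a single code path downstream; the PR itself may handle the two shapes differently.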

What Works:

Chat template rendering accepts structured content blocks
Tokenization includes image URLs/base64 in rendered template strings
Python wrapper correctly handles OpenAI API format
Backward compatible with text-only requests
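
As a rough picture of what "image URLs/base64 in rendered template strings" means, the sketch below flattens messages into a prompt string. It is a toy stand-in, not the real chat template or the PR's Python wrapper: `render_content`, `render_chat`, and the `<|role|>` layout are all made up for illustration.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def render_content(content: Content) -> str:
    """Flatten string or block content into plain text (hypothetical)."""
    if isinstance(content, str):
        return content
    parts: List[str] = []
    for block in content:
        if block["type"] == "text":
            parts.append(block["text"])
        elif block["type"] == "image_url":
            # The image reference is emitted as ordinary text, so a text
            # tokenizer sees the URL characters, not vision tokens.
            parts.append(block["image_url"]["url"])
    return "\n".join(parts)


def render_chat(messages: List[Dict[str, Any]]) -> str:
    """Toy template: one '<|role|>' header per message (made-up layout)."""
    return "".join(
        f"<|{m['role']}|>\n{render_content(m['content'])}\n" for m in messages
    )
```

With this toy renderer, a text-only message renders the same whether its content is a bare string or a single text block, which is the backward-compatibility property listed above.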

Known Limitations (for current stage):

Tokenization may not match vLLM exactly: images are tokenized as text, not as vision tokens. This only affects merged preprocessor models like Qwen2-VL.
Block hashes may not match vLLM exactly because mm_hash is missing. Our hashes are self-consistent but won't match vLLM's hashes for multimodal blocks. This is not an issue, just FYI.
Both limitations will be addressed in the next stage.
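
The block-hash limitation can be pictured with a generic chained hash. The sketch below is an assumption-laden illustration, not vLLM's hashing scheme and not this repository's: the function `block_hashes`, the SHA-256 choice, and the `extra_keys` parameter (standing in for something like mm_hash) are all hypothetical. It only shows why leaving out an extra per-block input gives hashes that are stable on their own but differ from a scheme that includes it.

```python
import hashlib
from typing import List, Optional, Sequence


def block_hashes(
    tokens: Sequence[int],
    block_size: int,
    extra_keys: Optional[Sequence[Optional[str]]] = None,
) -> List[str]:
    """Hash each full block of tokens, chaining in the previous block's hash.

    extra_keys[i], when present, is mixed into block i. It is a
    hypothetical stand-in for a per-block multimodal hash.
    """
    hashes: List[str] = []
    parent = b""
    for i in range(len(tokens) // block_size):
        chunk = tokens[i * block_size : (i + 1) * block_size]
        h = hashlib.sha256()
        h.update(parent)
        h.update(",".join(str(t) for t in chunk).encode())
        extra = extra_keys[i] if extra_keys is not None else None
        if extra is not None:
            h.update(b"|" + extra.encode())
        parent = h.digest()
        hashes.append(parent.hex())
    return hashes
```

Calling it twice on the same tokens without `extra_keys` gives identical results. Supplying a key for one block changes that block's hash and, through chaining, every block after it, while earlier blocks are unaffected.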

Testing:

Python wrapper tests for multi-modality support

…ntent blocks
- Changed Conversation.Content from string to interface{} to support structured content blocks
- Added ContentBlock and ImageBlock structs matching OpenAI API format
- Maintains backward compatibility with text-only content
- Follows upstream API structure (Conversation, ApplyChatTemplateRequest)

hyeongyun0916 · 2026-02-10T08:01:48Z

It looks like #219 is working toward the same goal — adding multi-modal content support to the chat template preprocessing.

guygir requested review from dannyharnik, elevran, kfirtoledo and vMaroon as code owners January 14, 2026 14:43

vMaroon requested review from hyeongyun0916, liu-cong, sagearc and yankay January 14, 2026 14:43

guygir changed the title ~~feat: Extend chat template preprocessing to support multi-modality content blocks~~ (WIP) feat: Extend chat template preprocessing to support multi-modality content blocks Jan 14, 2026

guygir added the do-not-merge and hold labels Jan 14, 2026

guygir force-pushed the multi-modality branch from c217ff1 to f0d0016 Compare January 14, 2026 20:54

guygir removed the do-not-merge and hold labels Jan 14, 2026

guygir force-pushed the multi-modality branch from f0d0016 to fc06074 Compare January 14, 2026 21:02

guygir changed the title ~~(WIP) feat: Extend chat template preprocessing to support multi-modality content blocks~~ feat: Extend chat template preprocessing to support multi-modality content blocks Jan 14, 2026

guygir mentioned this pull request Jan 14, 2026

feat: Add multi-modality support for content blocks in PrecisePrefixCacheScorer llm-d/llm-d-inference-scheduler#565

Open

guygir wants to merge 1 commit into llm-d:main from guygir:multi-modality