Skip to content

feat: Extend chat template preprocessing to support multi-modality content blocks#255

Open
guygir wants to merge 1 commit intollm-d:mainfrom
guygir:multi-modality
Open

feat: Extend chat template preprocessing to support multi-modality content blocks#255
guygir wants to merge 1 commit intollm-d:mainfrom
guygir:multi-modality

Conversation

@guygir
Copy link
Collaborator

@guygir guygir commented Jan 14, 2026

This PR extends chat template preprocessing to support structured multi-modality content blocks (OpenAI API format).
This is the first stage of multi-modality support - only basic technical feasibility. This PR focuses solely on images - not audio or video, because GAIE already supports images but audio/video support requires additional GAIE changes (These will be addressed in the next stage)

Changes:

  • Extended ChatMessage structure to support both string and structured content blocks
  • Added ContentBlock and ImageBlock structs matching OpenAI API format
  • Maintains backward compatibility with text-only content

What Works:

  • Chat template rendering accepts structured content blocks
  • Tokenization includes image URLs/base64 in rendered template strings
  • Python wrapper correctly handles OpenAI API format
  • Backward compatible with text-only requests

Known Limitations (for current stage):

  • Tokenization may not match vLLM exactly (images tokenized as text, not vision tokens - this only affects merged preprocessor models like Qwen2-VL)
  • Block hashes may not match vLLM exactly (missing mm_hash - we're consistent with ourselves but won't match vLLM's hashes for multimodal blocks, which is not an issue, just FYI)
  • These will be addressed in the next stage.

Testing:

  • Python wrapper tests for multi-modality support

@guygir guygir changed the title feat: Extend chat template preprocessing to support multi-modality content blocks (WIP) feat: Extend chat template preprocessing to support multi-modality content blocks Jan 14, 2026
@guygir guygir added do-not-merge Indicates that a PR should not merge hold PRs that are blocked on design, other features, release cycle, etc. labels Jan 14, 2026
@guygir guygir removed do-not-merge Indicates that a PR should not merge hold PRs that are blocked on design, other features, release cycle, etc. labels Jan 14, 2026
…ntent blocks

- Changed Conversation.Content from string to interface{} to support structured content blocks
- Added ContentBlock and ImageBlock structs matching OpenAI API format
- Maintains backward compatibility with text-only content
- Follows upstream API structure (Conversation, ApplyChatTemplateRequest)
@guygir guygir changed the title (WIP) feat: Extend chat template preprocessing to support multi-modality content blocks feat: Extend chat template preprocessing to support multi-modality content blocks Jan 14, 2026
@hyeongyun0916
Copy link
Collaborator

It looks like #219 is working toward the same goal — adding multi-modal content support to the chat template preprocessing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants