[Feat][Router] Add Multimodal Intent Recognition Support to Semantic Router Categories #831

NPranitha · 2025-12-14T23:41:25Z

This PR extends the vLLM Semantic Router with CLIP embedding-based multimodal classification and image generation intent detection.

Key Features

Multimodal Classification (Image → Text Routing):

Native CLIP vision transformer integration for image embedding extraction
Embedding fusion pipeline combining text (BERT/Qwen3) and image (CLIP) embeddings via weighted combination
Zero-shot classification using cosine similarity matching against category descriptions
Support for OpenAI-compatible API formats with Base64-encoded images
Automatic fallback to llava:7b when vision transformer is unavailable

Image Generation Routing (Text → Image):

Fine-tuned BERT binary classifier for image generation intent detection
Automatic routing to Stable Diffusion inference service
OpenAI-compatible response format with Base64-encoded image outputs

Technical Implementation

Rust/Candle Layer:

CLIP vision transformer (ViT-B/32) integration for native image embeddings
FFI bindings for Go interop with auto-initialization support
Image preprocessing pipeline (decode, resize to 224x224, CLIP normalization)

Go Service Layer:

Vision Embedding-based multimodal classification pipeline
Weighted fusion with L2 normalization
Cosine similarity-based category matching
Fine-tuned BERT classifier integration for image generation detection

Configuration:

New multimodal_enabled flag for categories
Category descriptions for embedding-based classification
Model capability specifications (capabilities: ["text", "image"])

Testing

Comprehensive unit tests for embedding fusion and classification logic
Backward compatibility validation with existing text-only workflows

Documentation

Updated API documentation for multimodal endpoints
Configuration examples for multimodal categories
Implementation guide for vision transformer integration

Breaking Changes

None. This PR maintains full backward compatibility. Multimodal routing is opt-in via configuration.

[x ] Make sure the code changes pass the pre-commit checks.
Sign-off your commit by using -s when doing git commit
Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].

Detailed Checklist (Click to Expand)

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

[Bugfix] for bug fixes.
[CI/Build] for build or continuous integration improvements.
[Doc] for documentation fixes and improvements.
[Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
[Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
[Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR need to meet the following code quality standards:

Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
The code need to be well-documented to ensure future contributors can easily understand the code.
Please include sufficient tests to ensure the change is stay correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

- Resolved merge conflicts in tools/make/models.mk - Added binary_classification model to config/models.yaml - Integrated multimodal vision transformer support - Preserved local changes for multimodal classification - Updated to new modular extproc structure

netlify · 2025-12-14T23:41:34Z

❌ Deploy Preview for vllm-semantic-router failed.

Name	Link
🔨 Latest commit	`e6f2f52`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/693f8c58b8c13d00081b2c13

rootfs · 2025-12-15T14:42:21Z

@NPranitha thank you for the contribution! Looking forward to this feature!

rafkamicheldaou and others added 22 commits November 11, 2025 19:57

Add image detection

37a43a1

Extend to server, router integ and classification

696a225

Added image generation categories

3d35bbe

Added code to detect image generation- pending routing

8f2410a

Connect image model

6f3c346

Add fine-tuned BERT model

a3717ba

Add gemini integration

6e76508

Fix fine-tuned model loading

4245f49

Test scripts to validate integrating model end-to-end

68ff84e

Added envoy for DNS resolution to launch on collab

8ffc658

Added fine tune model to make downloads file

fb35a20

Added unit tests

857116b

rust vit

b2e161a

ffi for vit

60b5d57

vit test

ba4e87e

go bindings

6170a75

multimodal support

842725e

multimodal test

a18a3ba

config changes

51e1a79

feat: Add multimodal vision transformer support

ec012da

Documentation

ea59deb

chore: update performance baselines (nightly run)

e6f2f52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Feat][Router] Add Multimodal Intent Recognition Support to Semantic Router Categories #831

[Feat][Router] Add Multimodal Intent Recognition Support to Semantic Router Categories #831

Uh oh!

NPranitha commented Dec 14, 2025

Uh oh!

netlify bot commented Dec 14, 2025 •

edited

Loading

Uh oh!

rootfs commented Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[Feat][Router] Add Multimodal Intent Recognition Support to Semantic Router Categories #831

Are you sure you want to change the base?

[Feat][Router] Add Multimodal Intent Recognition Support to Semantic Router Categories #831

Uh oh!

Conversation

NPranitha commented Dec 14, 2025

Key Features

Technical Implementation

Testing

Documentation

Breaking Changes

PR Title and Classification

Code Quality

DCO and Signed-off-by

What to Expect for the Reviews

Uh oh!

netlify bot commented Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ Deploy Preview for vllm-semantic-router failed.

Uh oh!

rootfs commented Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

netlify bot commented Dec 14, 2025 •

edited

Loading