A simple web application that showcases a multimodal LLM router, which directs chat conversations to different LLM endpoints based on intelligent routing decisions.
- Chat Interface: Clean, intuitive chat UI built with Gradio
- Image Upload: Support for multimodal conversations with image uploads
- Intelligent Routing: Automatic model selection based on request analysis
- Model Transparency: See which model was selected for each request
- Session History: Message history maintained during your session
Two routing strategies are supported (only one can be active at a time due to GPU constraints):
| Method | Routing Backend | Best For |
|---|---|---|
| Intent-Based (default) | Qwen3-1.7B LLM | Fast, lightweight routing via text classification |
| Neural Network | CLIP embeddings + trained NN | Complex multimodal routing with learned patterns |
Configuration: Edit `objective_fn` in `src/nat_sfc_router/configs/config.yml`:
- Intent-based: `objective_fn: hf_intent_objective_fn`
- Neural network: `objective_fn: nn_objective_fn`
The router selects among the following downstream models:
- GPT-5-chat (Azure OpenAI)
- Nemotron Nano v2 (NVIDIA Build API)
- Nemotron Nano VLM 12B (NVIDIA Build API), multimodal
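As an illustration of the intent-based strategy, here is a minimal Python sketch: a classifier (stubbed here by an intent label) maps each turn to one of the models above. The intent names, the mapping, and the fallback rule are assumptions for illustration, not the repo's actual configuration.

```python
# Hypothetical intent-to-model table. In the real system, the Qwen3-1.7B
# router LLM produces the intent label; here it is passed in directly.
INTENT_TO_MODEL = {
    "hard_question": "gpt-5-chat",                              # complex reasoning
    "simple_question": "nvidia/nvidia-nemotron-nano-9b-v2",     # cheap text model
    "image_question": "nemotron-nano-vlm-12b",                  # multimodal turns
}

def route(intent: str, has_image: bool) -> str:
    """Pick a model for one chat turn; image turns always go to the VLM."""
    if has_image:
        return INTENT_TO_MODEL["image_question"]
    # Unknown intents fall back to the cheapest text model (an assumption).
    return INTENT_TO_MODEL.get(intent, INTENT_TO_MODEL["simple_question"])
```

The key property this sketch shows is that routing is a cheap text classification plus a table lookup, which is why the intent-based method is fast and lightweight.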
- Configure environment:

  ```bash
  cp demo/env_template.txt .env
  # Edit .env with your API keys (OPENAI_API_KEY, NVIDIA_API_KEY, AZURE_OPENAI_ENDPOINT)
  ```
- Choose and start your routing method:

  Intent-based router (default):

  ```bash
  docker compose --profile intent up -d --build
  ```
  Neural network router:

  First, edit `src/nat_sfc_router/configs/config.yml` to include:

  ```yaml
  ...
  sfc_router_fn:
    _type: sfc_router
    objective_fn: nn_objective_fn

  workflow:
    _type: sfc_router
    objective_fn: nn_objective_fn
  ```
  Then start the demo with the `nn` profile:

  ```bash
  # First, update config.yml objective_fn to nn_objective_fn
  docker compose --profile nn up -d --build
  ```

  The GitHub repository includes a pre-trained neural network whose weights are stored in `llm-router/src/nat_sfc_router/training/router_artifacts`. The notebook `2_Embedding_NN_Training.ipynb` re-trains the neural network and overwrites those weights. You can run the demo app with the existing neural network without running the training notebook, or run the training notebook first and then use the demo app with your own network.
- Wait for services to be ready (first start takes ~2-3 minutes):

  Services start in order with health checks:
  1. `qwen-router` or `clip-server` loads its model (~2 min)
  2. `router-backend` waits for the routing service to be healthy
  3. `demo-app` waits for `router-backend` to be healthy
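This ordering is the kind of dependency chain Docker Compose expresses with `depends_on` and health checks. A minimal sketch follows; the service names come from this project, but the conditions are assumed rather than copied from the actual `docker-compose.yml`:

```yaml
# Sketch only -- the project's real docker-compose.yml may differ.
# `condition: service_healthy` requires a healthcheck on the dependency.
services:
  router-backend:
    depends_on:
      qwen-router:
        condition: service_healthy   # wait until the routing model has loaded
  demo-app:
    depends_on:
      router-backend:
        condition: service_healthy   # wait until the backend is up
```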
- Check status:

  ```bash
  docker compose ps
  ```

- Access the UI: http://localhost:7860
- Understanding Route Decisions: The best way to understand the routing decisions is to watch the `router-backend` logs while interacting with the application. Open a terminal and run `docker logs -f router-backend`, then interact with the demo application. The logs will contain information on the route selection per turn:
```text
## intent routing log
router-backend | User intent: hard_question (total response time: 94.51ms)

## NN routing log
router-backend | Routing decision | Model: nvidia/nvidia-nemotron-nano-9b-v2 | Confidence: 0.978 | Selection: cost_optimized | Probabilities: {gpt-5-chat: 0.957, nvidia/nvidia-nemotron-nano-9b-v2: 0.978, Qwen/Qwen3-VL-8B-Instruct: 0.821} | Route+Select time: 80.14ms
```
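One plausible reading of the `cost_optimized` selection in the NN log (an assumption, not the repo's verified algorithm): among models whose routing probability clears a confidence threshold, pick the cheapest. The cost table and the 0.9 threshold below are illustrative:

```python
# Assumed "cost_optimized" logic: among models whose routing probability
# clears a threshold, choose the cheapest. Costs and threshold are
# illustrative placeholders, not values from the repo.
COST_PER_1K_TOKENS = {
    "gpt-5-chat": 1.00,
    "nvidia/nvidia-nemotron-nano-9b-v2": 0.05,
    "Qwen/Qwen3-VL-8B-Instruct": 0.10,
}

def select_cost_optimized(probs: dict, threshold: float = 0.9) -> str:
    """Return the cheapest model whose probability >= threshold,
    falling back to the highest-probability model if none qualifies."""
    eligible = [m for m, p in probs.items() if p >= threshold]
    if not eligible:
        return max(probs, key=probs.get)
    return min(eligible, key=COST_PER_1K_TOKENS.get)

# Probabilities from the example log line above
probs = {
    "gpt-5-chat": 0.957,
    "nvidia/nvidia-nemotron-nano-9b-v2": 0.978,
    "Qwen/Qwen3-VL-8B-Instruct": 0.821,
}
```

Under these assumptions, both `gpt-5-chat` and the Nemotron model clear the threshold for the logged probabilities, and the cheaper Nemotron model wins, which matches the logged selection.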
To switch between routing methods:
- Stop current services:

  ```bash
  docker compose down
  # or: docker compose --profile <current-profile> down
  ```
- Update configuration in `src/nat_sfc_router/configs/config.yml` (change lines 63 and 68):

  ```yaml
  # For intent-based:
  objective_fn: hf_intent_objective_fn

  # For neural network:
  objective_fn: nn_objective_fn
  ```
- Start with the new profile:

  ```bash
  docker compose --profile intent up -d --build  # or --profile nn
  ```
Running locally (without Docker Compose):

```bash
cd demo
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp env_template.txt .env   # Edit with your API keys
```

Intent-based router (ensure `config.yml` has `objective_fn: hf_intent_objective_fn`):

```bash
# Terminal 1: Start Qwen router
docker run -d --rm --name qwen-router --gpus all -p 8011:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v $(pwd)/qwen3_nonthinking.jinja:/app/qwen3_nonthinking.jinja \
  vllm/vllm-openai:latest --model Qwen/Qwen3-1.7B \
  --chat-template /app/qwen3_nonthinking.jinja

# Terminal 2: Start router service
cd llm-router && ./scripts/run_local.sh

# Terminal 3: Start demo
cd demo && python app.py
```
Neural network router (ensure `config.yml` has `objective_fn: nn_objective_fn`):

```bash
# Terminal 1: Start CLIP server
docker run -d --rm --name clip-server --gpus all -p 51000:51000 \
  jinaai/clip-as-service:latest

# Terminal 2: Start router service (with the CLIP_SERVER env var set)
export CLIP_SERVER=localhost:51000
cd llm-router && ./scripts/run_local.sh

# Terminal 3: Start demo
cd demo && python app.py
```
Access: http://localhost:7860
- Text queries: Type and click "Send"
- Multimodal: Upload image + optional text, then "Send"
- Clear: Use "Clear Chat" to reset
Example queries:
- "Explain quantum computing in simple terms"
- "Write a Python function to calculate fibonacci numbers"
- "Describe what you see in this image" (with image)
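Under the hood, a multimodal turn is typically forwarded to an OpenAI-compatible chat endpoint. Here is a hedged sketch of how such a message might be assembled; the function name, the PNG mime type, and the exact payload shape are assumptions for illustration, not taken from `app.py`:

```python
import base64

def build_multimodal_message(text, image_bytes=None):
    """Build one user message in the OpenAI chat-content-parts format,
    attaching the image as a base64 data URL when present.
    PNG mime type is assumed here for simplicity."""
    content = [{"type": "text", "text": text}]
    if image_bytes is not None:
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": content}
```

A text-only turn produces a single text part, while "image + optional text" produces two parts, which is what lets the router treat the turn as multimodal.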
| Issue | Solution |
|---|---|
| Services taking long to start | Normal on first start (~2-3 min); models must load before the backend starts.<br>• Check status: `docker compose ps`<br>• Check health: `docker inspect <container> --format='{{.State.Health.Status}}'`<br>• View logs: `docker logs qwen-router` or `docker logs clip-server` |
| Router connection refused | Verify the router service is running on port 8001: `docker ps` |
| API key errors | Check that `.env` has the correct keys. Azure OpenAI needs both `AZURE_OPENAI_ENDPOINT` and `OPENAI_API_KEY` |
| Router fails or wrong routing | Ensure `objective_fn` in `config.yml` matches the running service:<br>• `hf_intent_objective_fn` needs `qwen-router`<br>• `nn_objective_fn` needs `clip-server`<br>Check: `docker ps` |
| CLIP server connection failed | 1. Verify it is running: `docker ps \| grep clip`<br>2. Check logs: `docker logs clip-server`<br>3. Check health: `docker inspect clip-server --format='{{.State.Health.Status}}'` |
| Qwen router connection failed | 1. Verify it is running: `docker ps \| grep qwen`<br>2. Check logs: `docker logs qwen-router`<br>3. Check health: `curl http://localhost:8011/health` |
| Image upload errors | Ensure the image format is supported (JPEG, PNG, etc.) |
```bash
# Start with a profile
docker compose --profile intent up -d --build  # or: --profile nn

# Check service status and health
docker compose ps
docker inspect <container-name> --format='{{.State.Health.Status}}'

# Stop
docker compose down

# View logs
docker logs router-backend
docker logs qwen-router   # for intent profile
docker logs clip-server   # for nn profile
docker logs router-demo

# Follow logs in real-time
docker logs -f qwen-router  # or clip-server

# Restart a specific service
docker compose restart router-backend
```

- Router config: `src/nat_sfc_router/configs/config.yml`
- Environment: `demo/.env` or `.env` (project root)
- Docker Compose: `docker-compose.yml`
```text
demo/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── env_template.txt   # Environment template
├── Dockerfile         # Demo app container
└── README.md          # This file
```
- Port: Modify `server_port` in `app.py`
- Models: Update the `MODELS` dictionary in `app.py`
- UI: Adjust Gradio components in the `create_demo()` function