feat: add router architecture deployment configs for multi-model production setup #170

blefo · 2025-11-28T10:53:23Z

New AI model services and orchestration:

- Added qwen3_coder_30b_gpu service to docker-compose.nilai-router-1.yml for deploying the Qwen3-Coder-30B-A3B-Instruct model with GPU support and health checks.
- Added gpt_oss_20b_gpu and qwen3_thinking_4b_gpu services to docker-compose.nilai-router-2.yml for deploying the GPT-OSS-20B and Qwen3-4B-Thinking-2507 models, including inter-service health dependencies.
- Added arch_router_1_5b_gpu and qwen3_vl_4b_gpu services to docker-compose.nilai-router-3.yml for deploying the Arch-Router-1.5B and Qwen3-VL-4B-Instruct models, with multimodal support and health dependencies.
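
For reviewers unfamiliar with the pattern, here is a minimal sketch of how a GPU reservation, a health check, and an inter-service health dependency are typically expressed in Docker Compose. It is not copied from the files in this PR. Only the two service names come from the description above. The image name, port, health endpoint, timing values, and the direction of the dependency are hypothetical placeholders.

```yaml
# Illustrative sketch only; values below are assumptions, not the PR's actual config.
services:
  gpt_oss_20b_gpu:
    image: example/model-server:latest   # hypothetical image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia             # request one NVIDIA GPU for this container
              count: 1
              capabilities: [gpu]
    healthcheck:
      # Assumes the image ships curl and serves a /health endpoint on port 8000.
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s                 # allow time for model weights to load

  qwen3_thinking_4b_gpu:
    image: example/model-server:latest   # hypothetical image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    depends_on:
      gpt_oss_20b_gpu:
        condition: service_healthy       # start only after the other service reports healthy
```

With `condition: service_healthy`, Compose delays starting the dependent container until the other service's health check passes, which can help stagger model loading when several services share a GPU.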

blefo added 2 commits November 28, 2025 11:18

feat: add new Docker Compose configurations for multiple GPU services including Qwen3 and GPT models

2401aba

fix: update GPU memory utilization in Docker Compose configuration for Qwen3 model

d3bcc77

blefo marked this pull request as ready for review November 28, 2025 12:58

fix: adjust GPU memory utilization to 0.20 in Docker Compose for Qwen3 model

639e69f
