This guide covers installing and running SmarterRouter in various environments.
Prerequisites:

- Docker and Docker Compose installed
- (Optional) NVIDIA GPU with drivers and NVIDIA Container Toolkit for VRAM monitoring
- Python 3.11+
- pip package manager
- Access to an LLM backend (Ollama, llama.cpp server, or OpenAI-compatible API)
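A quick way to confirm the prerequisites are in place (standard version flags, nothing SmarterRouter-specific):

```bash
# Sanity-check the prerequisites
docker --version
docker compose version
python3 --version  # needs to be 3.11+ for the manual install
pip --version
```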
Docker is the easiest and most reliable way to run SmarterRouter.
```bash
git clone https://github.com/peva3/SmarterRouter.git
cd SmarterRouter
```
Copy the environment template and customize:
```bash
cp ENV_DEFAULT .env
nano .env  # or use your preferred editor
```
Minimum configuration:
- If your Ollama runs on `localhost:11434` (default), no changes needed
- If Ollama is on a different host/port, set `ROUTER_OLLAMA_URL`
**Important Docker networking note:** When SmarterRouter runs in Docker and Ollama runs on the host machine, use `http://172.17.0.1:11434` instead of `http://localhost:11434`, because `localhost` inside the container refers to the container itself.
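For that setup, the corresponding `.env` entry would look like this (`172.17.0.1` is Docker's default bridge gateway and may differ on your system):

```bash
# .env — reach Ollama on the Docker host from inside the container
ROUTER_OLLAMA_URL=http://172.17.0.1:11434
```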
```bash
docker-compose up -d
```
This will:
- Pull the SmarterRouter image from GitHub Container Registry
- Start SmarterRouter on `http://localhost:11436`
- Mount the database at `./router.db` for persistence
Check the container logs:

```bash
docker logs smarterrouter
```
Look for:
```
INFO: Uvicorn running on http://0.0.0.0:11436
INFO: Starting router...
INFO: Profiling complete - X models ready
```
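First-time profiling can take a while (see the timeline later in this guide), so a small wait loop against the `/health` endpoint described below can be handy. This is just a sketch, not part of SmarterRouter:

```bash
# Wait until the router reports that initial profiling is done
until curl -s http://localhost:11436/health | grep -Eq '"profiling_complete": ?true'; do
  echo "Still profiling..."
  sleep 30
done
echo "Router is ready"
```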
Manual installation (without Docker):

```bash
git clone https://github.com/peva3/SmarterRouter.git
cd SmarterRouter
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp ENV_DEFAULT .env
nano .env
```
At minimum, verify that `ROUTER_OLLAMA_URL` points to your backend.
Run the server:

```bash
python -m uvicorn main:app --host 0.0.0.0 --port 11436
```
For production, use a production ASGI server like gunicorn:

```bash
pip install gunicorn
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:11436
```
SmarterRouter supports automatic VRAM monitoring across multiple GPU vendors:

| Vendor | Detection Method | Docker Support | Template |
|---|---|---|---|
| NVIDIA | nvidia-smi | ✅ Full | docker-compose.nvidia.yml |
| AMD | rocm-smi or sysfs | ✅ ROCm containers | docker-compose.amd.yml |
| Intel Arc | sysfs (lmem) | ⚠️ Device passthrough | docker-compose.intel.yml |
| Apple Silicon | Unified memory | ❌ Run on host | docker-compose.apple.md |
| Multi-GPU | Combined detection | ✅ Mixed vendors | docker-compose.multi-gpu.yml |
Quick Start: Copy the appropriate template to your project root:
```bash
# For NVIDIA GPUs
cp docs/docker-compose.nvidia.yml docker-compose.yml

# For AMD GPUs
cp docs/docker-compose.amd.yml docker-compose.yml

# For Intel Arc GPUs
cp docs/docker-compose.intel.yml docker-compose.yml

# For multi-GPU setups
cp docs/docker-compose.multi-gpu.yml docker-compose.yml
```
NVIDIA provides the best Docker GPU support with the NVIDIA Container Toolkit.
Requirements:
- NVIDIA GPU with proprietary drivers installed
- NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html
Quick Setup:
```bash
# Use the NVIDIA template
cp docs/docker-compose.nvidia.yml docker-compose.yml
docker-compose --compatibility up -d
```
Verify installation:
```bash
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```
Enable GPU in Docker:
```bash
# Method A: Use the --compatibility flag
docker-compose --compatibility up -d

# Method B: Newer Docker Compose (v2) honors the GPU reservation
# in the compose file directly, no flag needed
docker compose up -d
```
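To confirm the running container actually sees the GPU, you can execute `nvidia-smi` inside it (the Container Toolkit injects the binary into GPU-enabled containers); `smarterrouter` is the container name used in the logs example above:

```bash
# Run nvidia-smi inside the SmarterRouter container
docker exec smarterrouter nvidia-smi
```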
AMD GPUs use ROCm (Radeon Open Compute) for GPU monitoring.

Requirements:
- AMD GPU with ROCm support (RX 6000/7000 series, Radeon Instinct, Radeon Pro)
- ROCm runtime installed on host
Quick Setup:
```bash
# Use the AMD template
cp docs/docker-compose.amd.yml docker-compose.yml
docker-compose up -d
```
Verify installation:
```bash
# Check if rocm-smi is available
rocm-smi

# Or check sysfs
ls /sys/class/drm/card*/device/mem_info_vram_total
```
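The sysfs files report raw byte counts; to convert to GB (dividing by 1073741824, as in the APU verification below), assuming your GPU is `card0`:

```bash
# Convert the reported VRAM byte count to GB
awk '{printf "%.1f GB\n", $1 / 1073741824}' /sys/class/drm/card0/device/mem_info_vram_total
```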
**Note:** For full ROCm support in containers, you may need to use a ROCm base image. See [docker-compose.amd.yml](docker-compose.amd.yml) for detailed options.
#### AMD APUs (Unified Memory)
AMD APUs (Accelerated Processing Units) like Ryzen AI 300 series with Radeon 800M graphics use **unified memory** where CPU and GPU share system RAM. This requires special configuration.
**Supported APUs:**
- Ryzen AI 9 HX 370 (Radeon 890M)
- Ryzen AI 9 HX 470 (Radeon 890M)
- Ryzen 8000G series (Radeon 780M/760M)
- Ryzen 5000/6000 mobile series with Radeon Graphics
**Auto-Detection:**
SmarterRouter automatically detects APUs and uses GTT (Graphics Translation Table) to report the unified memory pool, not the small BIOS VRAM carve-out.
**BIOS Configuration (Critical for APUs):**
1. Enter BIOS/UEFI settings
2. Find "UMA Frame Buffer Size" or "UMA Mode" (often under Advanced > NB Configuration)
3. **Set to minimum (512MB - 2GB)** - NOT maximum!
- Why? The BIOS setting is a VRAM *carve-out* that reduces available system RAM
- APUs use GTT for actual GPU memory, which dynamically allocates from system RAM
- Large carve-out just wastes RAM; GTT pool is the real usable memory
4. Save and reboot
**Manual Override (if auto-detection fails):**
```bash
# In .env - set to ~90% of your system RAM for the GPU
# Example: 64GB system -> set ~58GB
ROUTER_AMD_UNIFIED_MEMORY_GB=58
```

**Verification:**
```bash
# Check GTT pool size (actual unified memory)
cat /sys/class/drm/card*/device/mem_info_gtt_total
# Divide by 1073741824 to get GB

# Check VRAM carve-out (usually small for APUs)
cat /sys/class/drm/card*/device/mem_info_vram_total
```
**Architecture Override (for gfx1150/gfx1151):** Some newer APUs need a ROCm architecture override:
```bash
export HSA_OVERRIDE_GFX_VERSION=11.5.1
```
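The `export` only lasts for the current shell; to persist it, add the line to your shell profile (or to the environment of whatever service runs your ROCm backend), for example:

```bash
# Persist the ROCm architecture override for future shells
echo 'export HSA_OVERRIDE_GFX_VERSION=11.5.1' >> ~/.bashrc
```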
Intel Arc GPUs use sysfs for memory monitoring via local memory (lmem).

Requirements:
- Intel Arc A-series GPU (A380, A770, etc.) or Data Center GPU
- Intel GPU drivers (i915 kernel module)
Quick Setup:
```bash
# Use the Intel template
cp docs/docker-compose.intel.yml docker-compose.yml
docker-compose up -d
```
Verify installation:
```bash
# Check for Intel GPU with dedicated memory
ls /sys/class/drm/card*/device/lmem_total
```
Note: Intel GPU support in Docker requires the device to be passed through. Compute workloads may need oneAPI/Level Zero setup. See docker-compose.intel.yml for details.
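Passing the device through generally means exposing the DRI render nodes to the container; you can confirm they exist on the host first (a generic check, not specific to SmarterRouter):

```bash
# Intel GPUs are exposed via DRI render nodes
ls -l /dev/dri/
```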
Apple Silicon uses unified memory, where the CPU and GPU share system RAM. VRAM monitoring estimates GPU availability as 75% of total RAM; for example, a 16 GB machine is treated as having roughly 12 GB available to the GPU.
Important: Docker Desktop on macOS cannot pass GPU to containers. You must run SmarterRouter directly on the host (not in Docker) for Apple Silicon GPU support.
See docker-compose.apple.md for complete native installation instructions.
Quick Start:
```bash
# Clone and setup
git clone https://github.com/peva3/SmarterRouter.git
cd SmarterRouter
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure for Apple Silicon (optional)
echo "ROUTER_APPLE_UNIFIED_MEMORY_GB=16" >> .env  # Set if auto-detect fails

# Run
python -m uvicorn main:app --host 0.0.0.0 --port 11436
```
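Once the router is running, you can confirm what it detected using the `/admin/vram` endpoint mentioned in the multi-GPU notes below:

```bash
# Inspect the detected GPU / unified-memory information
curl http://localhost:11436/admin/vram
```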
SmarterRouter automatically detects all GPUs across vendors. For mixed GPU setups (e.g., NVIDIA + AMD, or all three), use the multi-GPU template:

```bash
# Copy the multi-GPU template
cp docs/docker-compose.multi-gpu.yml docker-compose.yml

# Edit to uncomment the sections for your specific GPUs
nano docker-compose.yml

# Run with NVIDIA support (if included)
docker-compose --compatibility up -d
```
Multi-GPU configuration:
- Set `ROUTER_VRAM_MAX_TOTAL_GB` to limit total VRAM usage across all GPUs (see the `.env` sketch below)
- GPUs are indexed globally (0, 1, 2, ...) regardless of vendor
- Check the `/admin/vram` endpoint to see detected GPUs
Example combinations:
- NVIDIA + AMD: Uncomment both `deploy.resources` (NVIDIA) and `devices` (AMD)
- NVIDIA + Intel: Uncomment `deploy.resources` (NVIDIA) and `devices` (Intel)
- AMD + Intel: Uncomment `devices` section only
- All three: Uncomment all GPU sections
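As referenced in the configuration list above, a `.env` sketch for capping total VRAM (the 20 GB value is purely illustrative):

```bash
# .env — cap the total VRAM SmarterRouter will use across all GPUs
ROUTER_VRAM_MAX_TOTAL_GB=20
```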
If no GPU is detected, SmarterRouter continues to function but:
- No VRAM monitoring available
- No automatic model unloading based on memory
- All model management falls back to the backend (Ollama, etc.)
To explicitly disable VRAM monitoring:
```bash
ROUTER_VRAM_MONITOR_ENABLED=false
```
Feature comparison:

| Feature | NVIDIA | AMD | Intel Arc | Apple Silicon |
|---|---|---|---|---|
| VRAM Detection | ✅ | ✅ | ✅ | ✅ (estimated) |
| Memory Usage | ✅ | ✅ | ✅ | |
| Docker GPU Passthrough | ✅ | ✅ | ⚠️ | ❌ |
| Multi-GPU | ✅ | ✅ | ✅ | N/A |
| Model Auto-Unload | ✅ | ✅ | ✅ | ✅ |
| Pinned Model | ✅ | ✅ | ✅ | ✅ |
Check the health endpoint:

```bash
curl http://localhost:11436/health
```
Expected response:
```json
{
  "status": "healthy",
  "profiling_complete": true,
  "models_available": 5,
  "backend_connected": true
}
```
List the available models:

```bash
curl http://localhost:11436/v1/models
```
Should return:
```json
{
  "object": "list",
  "data": [
    {
      "id": "smarterrouter/main",
      "object": "model",
      "created": 1708162374.0,
      "owned_by": "local"
    }
  ]
}
```
Send a test chat completion:

```bash
curl -X POST http://localhost:11436/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 50
  }'
```
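Assuming the endpoint follows the usual OpenAI-compatible conventions, streaming should also work via the standard `stream` flag; this is an assumption to verify against the API documentation linked at the end of this guide:

```bash
# Streaming request (assumes standard OpenAI-style "stream": true support)
curl -N -X POST http://localhost:11436/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Tell me a joke"}], "stream": true}'
```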
On first startup, SmarterRouter:

- Discovers all models from your configured backend
- Profiles each model with standardized prompts (reasoning, coding, creativity)
- Downloads benchmark data from HuggingFace and LMSYS
- Initializes the routing database
Expected timeline:
- ~18 models: 30-60 minutes
- ~50 models: 2-4 hours
- Profile progress is logged; check logs with `docker logs -f smarterrouter`
Profiling is one-time only. Subsequent startups only profile newly added models.
To update a Docker installation:

```bash
docker pull ghcr.io/peva3/smarterrouter:latest
docker-compose down
docker-compose up -d
```
Your `router.db` file is preserved automatically. Database migrations happen on startup if needed.
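For an extra safety net, you can back up the database between `down` and `up`; it is a single file, so a plain copy suffices (the `.bak` name is arbitrary):

```bash
# Optional: back up the routing database while the container is stopped
cp router.db router.db.bak
```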
To update a manual installation:

```bash
git pull
pip install -r requirements.txt
# Restart your server
```
To uninstall a Docker installation:

```bash
docker-compose down
docker volume rm smarterrouter_router-db  # if you want to delete the database
docker rmi ghcr.io/peva3/smarterrouter:latest
```
To uninstall a manual installation:

```bash
# Stop the process (Ctrl+C or kill)
rm -rf venv router.db data/
```
Next steps:

- Configuration Reference - All available settings
- Backend Providers - Setting up different backends
- API Documentation - Complete API reference
- Troubleshooting - Common issues and solutions