Deploy SmarterRouter in production with proper security, monitoring, and reliability.
- Prerequisites
- Docker Compose Production Setup
- Security Hardening
- Monitoring & Alerting
- SSL/TLS Termination
- Backup Strategy
- Scaling Considerations
- Docker and Docker Compose
- SSL certificate (for production HTTPS)
- Reverse proxy (nginx, Traefik, Caddy) - optional but recommended
- Monitoring infrastructure (Prometheus, Grafana) - optional but recommended
Create docker-compose.prod.yml:
version: '3.8'
services:
  smarterrouter:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: smarterrouter
    ports:
      - "11436:11436"
    env_file:
      - .env
    volumes:
      - ./router.db:/app/router.db:ro # read-only mount
      - ./logs:/app/logs
      - ./data:/app/data
    restart: unless-stopped
    read_only: true # immutable filesystem
    security_opt:
      - no-new-privileges:true
    networks:
      - smarterrouter-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11436/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    deploy:
      resources:
        limits:
          memory: 8G
        reservations:
          memory: 2G
networks:
  smarterrouter-network:
    driver: bridge

# Build production image
docker-compose -f docker-compose.prod.yml build
# Start service
docker-compose -f docker-compose.prod.yml up -d
# Check health
docker-compose -f docker-compose.prod.yml ps
docker logs smarterrouter

# Generate secure random key
openssl rand -hex 32
# Add to .env
ROUTER_ADMIN_API_KEY=sk-smarterrouter-<your-random-key>

Never leave admin endpoints unprotected in production!
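Where `openssl` is not available, the same kind of key can be produced with Python's standard library. This is a minimal sketch, not part of SmarterRouter: the function name `generate_admin_key` is hypothetical, and the `sk-smarterrouter-` prefix is simply copied from the example above.

```python
import secrets

# Prefix copied from the .env example; assumed to be a naming convention only.
KEY_PREFIX = "sk-smarterrouter-"


def generate_admin_key(num_bytes: int = 32) -> str:
    """Return a prefixed key carrying num_bytes of randomness, hex-encoded."""
    # token_hex(32) yields 64 hex characters, equivalent to `openssl rand -hex 32`.
    return KEY_PREFIX + secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print a line that can be pasted into .env.
    print(f"ROUTER_ADMIN_API_KEY={generate_admin_key()}")
```
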
ROUTER_RATE_LIMIT_ENABLED=true
ROUTER_RATE_LIMIT_REQUESTS_PER_MINUTE=120
ROUTER_RATE_LIMIT_ADMIN_REQUESTS_PER_MINUTE=10

# Only allow your frontend origins
ROUTER_CORS_ALLOWED_ORIGINS=https://your-app.com,https://admin.your-app.com

The Dockerfile already creates a non-root user smarterrouter. Ensure it's being used:
# In docker-compose.prod.yml
services:
  smarterrouter:
    user: "1000:1000" # smarterrouter user

Use internal Docker network; expose port only to trusted network:
networks:
  smarterrouter-network:
    internal: true # no external internet access

Or use firewall rules to restrict port 11436 to specific IPs.
# Pull latest security patches
docker-compose -f docker-compose.prod.yml pull
docker-compose -f docker-compose.prod.yml up -d

Monitor security advisories:
- Subscribe to GitHub Security Advisories for this repo
- Watch for Python/Docker/NVIDIA security updates
SmarterRouter exposes metrics at GET /metrics.
Add to Prometheus config:
scrape_configs:
  - job_name: 'smarterrouter'
    static_configs:
      - targets: ['localhost:11436']
    scrape_interval: 30s

Create alerts for:
- High error rate: rate(smarterrouter_errors_total[5m]) > 0.1
- High latency: histogram_quantile(0.95, rate(smarterrouter_request_duration_seconds_bucket[5m])) > 10
- VRAM pressure: smarterrouter_vram_utilization_pct > 90
- Low cache hit rate: rate(smarterrouter_cache_hits_total[5m]) / (rate(smarterrouter_cache_hits_total[5m]) + rate(smarterrouter_cache_misses_total[5m])) < 0.5
- Service down: up{job="smarterrouter"} == 0
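To sanity-check the cache hit rate rule without a running Prometheus, the ratio can be approximated from two scrapes of `/metrics`. The sketch below is an illustration only: it assumes the two cache counters are exposed without labels, the sample values are made up, and the helper names `parse_metrics` and `cache_hit_ratio` are hypothetical. Taking the difference between two snapshots mirrors what `rate()` does over a window.

```python
from typing import Dict, Optional


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabeled samples from Prometheus text exposition format."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and labeled series (simplification).
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def cache_hit_ratio(prev: Dict[str, float], curr: Dict[str, float]) -> Optional[float]:
    """Hit ratio over the interval between two scrapes, or None if no traffic."""
    hits = curr["smarterrouter_cache_hits_total"] - prev["smarterrouter_cache_hits_total"]
    misses = curr["smarterrouter_cache_misses_total"] - prev["smarterrouter_cache_misses_total"]
    total = hits + misses
    return hits / total if total > 0 else None


# Made-up scrape bodies for illustration; real output will contain more series.
BEFORE = """
# TYPE smarterrouter_cache_hits_total counter
smarterrouter_cache_hits_total 100
smarterrouter_cache_misses_total 50
"""
AFTER = """
smarterrouter_cache_hits_total 130
smarterrouter_cache_misses_total 120
"""

ratio = cache_hit_ratio(parse_metrics(BEFORE), parse_metrics(AFTER))
if ratio is not None and ratio < 0.5:
    print(f"cache hit ratio {ratio:.2f} is below the 0.5 alert threshold")
```

With these sample numbers the interval saw 30 hits and 70 misses, so the ratio is 0.30 and the alert condition holds.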
Import dashboard (example JSON to be provided). Key panels:
- Request rate and latency
- Error rate by endpoint
- Model selection distribution
- VRAM usage over time
- Cache hit rates
- Backend connectivity status
Use nginx/Traefik/Caddy to handle HTTPS:
# nginx config
server {
    listen 443 ssl http2;
    server_name router.your-domain.com;
    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
    location / {
        proxy_pass http://localhost:11436;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Caddy auto-configures SSL:
router.your-domain.com {
    reverse_proxy localhost:11436
}

Mount certificates in container:
services:
  smarterrouter:
    volumes:
      - ./certs:/app/certs:ro
    environment:
      - SSL_CERT_PATH=/app/certs/fullchain.pem
      - SSL_KEY_PATH=/app/certs/privkey.pem

(Note: SmarterRouter doesn't have built-in TLS; use reverse proxy approach)
- Database (router.db or PostgreSQL): Contains all model profiles and routing history
- Configuration (.env): Your settings (redact secrets before storing)
- Logs: For debugging and audit trail (optional, can be large)
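Redacting `.env` by hand is easy to get wrong, because secrets can also hide inside URLs such as `ROUTER_DATABASE_URL`. The following is a rough sketch of a broader redaction pass, under the assumption that sensitive variable names contain KEY, SECRET, TOKEN, or PASSWORD; the function name `redact_env` and the patterns are illustrative and should be adapted to your actual settings.

```python
import re

# Assumption: variable names containing these words hold secrets.
SENSITIVE_NAME = re.compile(r"KEY|SECRET|TOKEN|PASSWORD", re.IGNORECASE)
# Matches the password part of scheme://user:password@host URLs.
URL_PASSWORD = re.compile(r"(://[^:/@\s]+:)[^@\s]+(@)")


def redact_env(text: str) -> str:
    """Return .env content with secret values replaced by REDACTED."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Keep blank lines, comments, and anything that is not NAME=value.
        if not stripped or stripped.startswith("#") or "=" not in line:
            out.append(line)
            continue
        name, _, value = line.partition("=")
        if SENSITIVE_NAME.search(name):
            value = "REDACTED"
        else:
            value = URL_PASSWORD.sub(r"\1REDACTED\2", value)
        out.append(f"{name}={value}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "ROUTER_ADMIN_API_KEY=sk-smarterrouter-example\n"
        "ROUTER_DATABASE_URL=postgresql://user:pass@postgres-host:5432/smarterrouter\n"
        "ROUTER_LOG_LEVEL=WARNING"
    )
    print(redact_env(sample))
```

Review the output before storing it; a name-based pattern cannot catch every secret.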
Create backup.sh:
#!/bin/bash
BACKUP_DIR=/backups/smarterrouter
DATE=$(date +%Y%m%d-%H%M%S)
# Make sure the backup directory exists
mkdir -p "$BACKUP_DIR"
# Backup database
cp /path/to/router.db "$BACKUP_DIR/router-$DATE.db"
# Backup .env (redact API keys first!); use an absolute path, since cron does not run in the project directory
sed 's/^ROUTER_ADMIN_API_KEY=.*/ROUTER_ADMIN_API_KEY=REDACTED/' /path/to/.env > "$BACKUP_DIR/env-$DATE.txt"
# Optional: compress old backups
find "$BACKUP_DIR" -name "*.db" -mtime +30 -exec gzip {} \;
find "$BACKUP_DIR" -name "*.txt" -mtime +7 -delete
# Optional: upload to S3
# aws s3 cp "$BACKUP_DIR/router-$DATE.db" s3://your-bucket/backups/

Add to cron:
0 2 * * * /path/to/backup.sh

# Stop SmarterRouter
docker-compose down
# Restore database
cp /backups/router-20240220.db router.db
# Restore config (manual edit)
cp /backups/env-20240220.txt .env
# Edit .env to add back your actual secrets
# Restart
docker-compose up -d

For high availability and load distribution:
# docker-compose.yml
services:
  smarterrouter-1:
    # ... same config
    ports:
      - "11436:11436"
  smarterrouter-2:
    # ... same config
    ports:
      - "11437:11436"

Use a load balancer (nginx, HAProxy) in front:
upstream smarterrouter {
    server localhost:11436;
    server localhost:11437;
}
server {
    listen 80;
    location / {
        proxy_pass http://smarterrouter;
    }
}

All instances must share the same database:
services:
  smarterrouter-1:
    volumes:
      - postgres_data:/app/data # Use PostgreSQL
  smarterrouter-2:
    volumes:
      - postgres_data:/app/data
volumes:
  postgres_data:

Or use external PostgreSQL:
ROUTER_DATABASE_URL=postgresql://user:pass@postgres-host:5432/smarterrouter

Warning: SQLite doesn't work well with multiple writers. Use PostgreSQL for multi-instance deployments.
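A deployment script can guard against accidentally starting several instances against SQLite. The check below is only a sketch: `is_postgres_url` is a hypothetical helper, and the SQLite URL form used in the usage note is an assumption, since only the PostgreSQL form appears in this guide.

```python
from urllib.parse import urlparse


def is_postgres_url(database_url: str) -> bool:
    """Return True if the URL scheme names PostgreSQL."""
    scheme = urlparse(database_url).scheme.lower()
    # Accept driver suffixes such as postgresql+psycopg2 by comparing the base scheme.
    return scheme.split("+", 1)[0] in {"postgresql", "postgres"}


def safe_for_multi_instance(database_url: str, instance_count: int) -> bool:
    """A single instance may use any database; more than one needs PostgreSQL."""
    return instance_count <= 1 or is_postgres_url(database_url)
```

For example, `safe_for_multi_instance("sqlite:///router.db", 2)` evaluates to `False`, while the PostgreSQL URL above passes for any instance count.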
- Database: Use managed PostgreSQL (RDS, CloudSQL) with replication
- Multiple router instances: At least 2 in different availability zones
- Load balancer health checks: Route traffic only to healthy instances
- Regular backups: Automated, tested restore process
- Monitoring alerts: Immediate notification of failures
Document step-by-step:
- How to manually failover to backup instance
- How to restore database from backup
- How to rebuild router instance from scratch
- Contact information for critical incidents
See Performance Tuning for detailed guidance.
Production recommendations:
- Set ROUTER_CACHE_ENABLED=true with appropriate size (1000-2000)
- Pin a small model: ROUTER_PINNED_MODEL=phi3:mini
- Use PostgreSQL instead of SQLite for concurrent access
- Enable ROUTER_VRAM_AUTO_UNLOAD_ENABLED=true
- Set appropriate ROUTER_VRAM_MAX_TOTAL_GB (leave 10-15% headroom)
- Use ROUTER_LOG_FORMAT=json for log aggregation
- Set ROUTER_LOG_LEVEL=WARNING (avoid INFO logs in production)
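The headroom recommendation for `ROUTER_VRAM_MAX_TOTAL_GB` is simple arithmetic, sketched here for convenience. The helper name `vram_budget_gb` and the 12.5% default (the midpoint of the 10-15% range) are illustrative choices, not SmarterRouter settings.

```python
def vram_budget_gb(total_vram_gb: float, headroom_pct: float = 12.5) -> float:
    """Return the VRAM budget in GB after reserving headroom_pct percent."""
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    # Round to one decimal place, which is plenty for a GB-level setting.
    return round(total_vram_gb * (1 - headroom_pct / 100), 1)


if __name__ == "__main__":
    # A 24 GB card with the default 12.5% headroom leaves a 21.0 GB budget.
    print(f"ROUTER_VRAM_MAX_TOTAL_GB={vram_budget_gb(24)}")
```
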
Monitor these endpoints:
- GET /health - Overall health (profiling complete, backend connected)
- GET /metrics - Prometheus metrics
- GET /admin/vram - VRAM status
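An external probe for these endpoints can be as small as the sketch below, which treats any 2xx response as healthy. It is an assumption-laden illustration: the response body is not inspected because its format is not described here, `/admin/vram` may require the admin API key (not handled), and `check_endpoint` is a hypothetical name. The `opener` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request


def check_endpoint(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, DNS failures, and HTTP error statuses.
        return False


# Example usage against a local instance (port taken from the compose file):
#   healthy = check_endpoint("http://localhost:11436/health")
```
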
Configure log rotation to prevent disk fill:
# /etc/logrotate.d/smarterrouter
/path/to/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 smarterrouter smarterrouter
    postrotate
        docker exec smarterrouter kill -USR1 1
    endscript
}

For PostgreSQL:
- Regular VACUUM ANALYZE
- Monitor table size
- Set up point-in-time recovery
For SQLite:
- Periodically run VACUUM during maintenance windows
- Backup before large schema changes
- Monitor file size growth
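The SQLite maintenance steps can be scripted with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: it should run during a maintenance window with the router stopped or idle, the function name `backup_and_vacuum` is hypothetical, and the paths are placeholders. It uses SQLite's online backup API rather than a plain file copy, so the copy is a consistent snapshot.

```python
import sqlite3


def backup_and_vacuum(db_path: str, backup_path: str) -> None:
    """Copy the database to backup_path, then VACUUM the original."""
    # isolation_level=None keeps the connection in autocommit mode,
    # which VACUUM requires (it cannot run inside a transaction).
    src = sqlite3.connect(db_path, isolation_level=None)
    try:
        dst = sqlite3.connect(backup_path)
        try:
            src.backup(dst)  # consistent snapshot via SQLite's backup API
        finally:
            dst.close()
        src.execute("VACUUM")  # rebuild the file and reclaim free pages
    finally:
        src.close()


# Example usage with placeholder paths:
#   backup_and_vacuum("/path/to/router.db", "/backups/smarterrouter/router-pre-vacuum.db")
```
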
- Configuration Reference - All available settings
- Troubleshooting - Production issues and solutions
- Performance Tuning - Optimize for your workload
- API Documentation - Complete API reference