SmarterRouter is configured via environment variables in the .env file. This reference documents all available options.
- Benchmark Data Sources
- Backend Provider Configuration
- Security Settings
- Routing Configuration
- Timeout Settings
- Profiling Settings
- Cache Configuration
- VRAM Monitoring
- Monitoring & Logging
- Database
- LLM-as-Judge
Comma-separated list of benchmark data sources. Options:
- `huggingface` (default)
- `lmsys`
- `artificial_analysis`
Example: ROUTER_BENCHMARK_SOURCES=huggingface,lmsys,artificial_analysis
Note: Sources are queried in the order listed. If multiple sources provide data for the same model, the last source's data wins (non-null values overwrite earlier ones).
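The "last non-null value wins" merge can be illustrated with a small sketch (the field names `mmlu` and `speed` are hypothetical, not SmarterRouter's actual schema):

```python
# Illustrative sketch of last-source-wins merging for non-null values.
# Field names are hypothetical, not SmarterRouter's real benchmark schema.

def merge_benchmarks(records):
    """Merge per-source records in listed order; later non-null values win."""
    merged = {}
    for source_data in records:  # ordered as in ROUTER_BENCHMARK_SOURCES
        for key, value in source_data.items():
            if value is not None:  # null values never overwrite earlier data
                merged[key] = value
    return merged

huggingface = {"mmlu": 0.71, "speed": None}
lmsys = {"mmlu": 0.74, "speed": 42.0}
print(merge_benchmarks([huggingface, lmsys]))  # {'mmlu': 0.74, 'speed': 42.0}
```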
API key for ArtificialAnalysis.ai (required if artificial_analysis in ROUTER_BENCHMARK_SOURCES).
Get your free API key from: https://artificialanalysis.ai/insights
Rate limit: 1,000 requests per day (free tier). Data is cached for 24 hours by default to stay within limits.
Cache TTL for ArtificialAnalysis data (seconds). Default: 86400 (24 hours).
Increase if you have a paid plan with higher rate limits; decrease if you need fresher data.
Path to YAML file mapping ArtificialAnalysis model identifiers to SmarterRouter model names.
ArtificialAnalysis uses different naming conventions than Ollama. This file lets you explicitly map their model IDs or names to your local model tags.
Example mapping file format (see artificial_analysis_models.example.yaml):
```yaml
mappings:
  # By ArtificialAnalysis model ID (UUID) - most reliable
  "2dad8957-4c16-4e74-bf2d-8b21514e0ae9": "openai/o3-mini"
  # By ArtificialAnalysis model name/slug
  "o3-mini": "openai/o3-mini"
  "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet"
  "gemini-2.5-pro": "google/gemini-2.5-pro"
```
If no explicit mapping is found, the provider attempts to auto-generate a name using the pattern `{creator-slug}/{model-slug}`.
Why mapping needed: Your Ollama model tags might be llama3.1:70b while ArtificialAnalysis calls it "Llama-3.1-70B". The mapping bridges this gap.
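The `{creator-slug}/{model-slug}` fallback can be sketched as follows; the exact slugging rules are an assumption, so rely on explicit mappings when the generated name does not match your tags:

```python
# Sketch of the {creator-slug}/{model-slug} fallback naming described above.
# The precise slug rules are an assumption, not SmarterRouter's exact code.

def auto_name(creator: str, model: str) -> str:
    slug = lambda s: s.strip().lower().replace(" ", "-")
    return f"{slug(creator)}/{slug(model)}"

print(auto_name("Meta", "Llama 3.1 70B"))  # meta/llama-3.1-70b
```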
Which backend to use. Options:
- `ollama` (default) - local Ollama instance
- `llama.cpp` - llama.cpp server
- `openai` - OpenAI-compatible API
URL of your Ollama instance or OpenAI-compatible endpoint.
Default: http://localhost:11434
Docker note: When SmarterRouter runs in Docker and Ollama runs on the host, use http://172.17.0.1:11434 (the default Docker bridge gateway).
String to prepend to all model names sent to the backend.
Example: ROUTER_MODEL_PREFIX=myorg/ makes model llama3 become myorg/llama3
Use cases: Organizational naming, model registries, API gateways.
When ROUTER_PROVIDER=openai:
```
ROUTER_OPENAI_BASE_URL=https://api.openai.com/v1
ROUTER_OPENAI_API_KEY=your-api-key-here
```
Works with OpenAI, Anthropic (via compatibility layer), vLLM, TGI, LiteLLM, or any OpenAI-compatible API.
Authentication key for admin endpoints (`/admin/*`), which expose:
- Full model performance profiles
- VRAM monitoring data
- Cache management
- Reprofile controls
Generate a secure key:
```
openssl rand -hex 32
# Copy output to .env: ROUTER_ADMIN_API_KEY=sk-smarterrouter-<output>
```
Default: (empty - insecure)
Enable rate limiting to prevent abuse and DoS attacks.
Default: false
General endpoint rate limit per client IP.
Default: 60
Dedicated chat endpoint (/v1/chat/completions) rate limit per client IP.
This limit is applied specifically to chat completions and takes precedence over the general per-minute limit for that endpoint.
Default: 100
Admin endpoint rate limit per client IP.
Default: 10
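The per-IP, per-minute limits above behave like a fixed-window counter. The sketch below illustrates the concept only; it is not SmarterRouter's actual limiter implementation:

```python
# Minimal per-IP fixed-window limiter illustrating the per-minute limits.
# Conceptual sketch only, not SmarterRouter's real implementation.
import time
from collections import defaultdict

class MinuteLimiter:
    def __init__(self, limit: int):
        self.limit = limit
        self.windows = defaultdict(lambda: [0, 0])  # ip -> [window, count]

    def allow(self, ip: str, now=None) -> bool:
        now = time.time() if now is None else now
        window = int(now // 60)
        state = self.windows[ip]
        if state[0] != window:
            state[0], state[1] = window, 0  # new minute: reset counter
        state[1] += 1
        return state[1] <= self.limit  # over the limit -> reject (HTTP 429)

limiter = MinuteLimiter(limit=60)
print(all(limiter.allow("10.0.0.1", now=0) for _ in range(60)))  # True
print(limiter.allow("10.0.0.1", now=0))  # False: 61st request this minute
```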
Quality vs speed tradeoff. Range: 0.0 (max speed) to 1.0 (max quality).
Default: 0.5
Effects:
- Low (0.0-0.3): Prefers smaller, faster models
- Medium (0.4-0.6): Balanced approach
- High (0.7-1.0): Prefers larger, higher-quality models
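One way to picture the tradeoff is a weighted blend of per-model quality and speed scores. This is a hypothetical scoring function for illustration; SmarterRouter's real ranking may weigh additional signals:

```python
# Hypothetical blend of quality and speed scores to illustrate
# ROUTER_QUALITY_PREFERENCE; not SmarterRouter's actual formula.

def blended_score(quality: float, speed: float, preference: float) -> float:
    """preference=0.0 favors speed, 1.0 favors quality."""
    return preference * quality + (1.0 - preference) * speed

# A small fast model vs. a large high-quality model at preference=0.2:
fast = blended_score(quality=0.6, speed=0.9, preference=0.2)
big = blended_score(quality=0.9, speed=0.4, preference=0.2)
print(fast > big)  # True: a low preference picks the faster model
```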
If a selected model fails, automatically retry with the next best capable model.
Default: true
Collect user feedback (/v1/feedback) to improve future routing decisions.
Default: true
Prefer smaller models for simple tasks when quality is equal.
Default: true
Prefer newer models when scores are similar.
Default: true
Name the router presents itself as to external UIs (e.g., OpenWebUI).
Default: smarterrouter/main
Timeout for model generation requests (seconds).
Default: 120
Increase for: Large models (14B+), complex reasoning tasks
Enable global request timeout enforcement across full request processing (routing, model loading, generation, and post-processing).
Default: true
Overall request timeout budget in seconds. Requests exceeding this limit are cancelled and return HTTP 504.
Default: 300
Base timeout for profiling operations (seconds).
Default: 90
Increase for: Profiling large models to avoid premature timeouts
Number of test prompts per category (reasoning, coding, creativity) during profiling.
Default: 3
- Higher values: more accurate profiles, longer profiling time
- Lower values: faster profiling, less accuracy
Measure actual VRAM usage during profiling.
Default: true
Delay after loading model before measuring VRAM (seconds). Allows memory to stabilize.
Default: 2.0
Number of VRAM samples to take during profiling (averaged).
Default: 3
Safety factor for adaptive timeout calculation. Higher = more buffer, lower = more aggressive.
Default: 2.0
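A plausible sketch of how such a safety factor enters the calculation; the real formula is internal to SmarterRouter, so this only illustrates the factor's role:

```python
# Sketch of a safety-factor-based adaptive timeout; the real formula
# inside SmarterRouter may differ.

def adaptive_timeout(measured_seconds: float, safety_factor: float = 2.0,
                     base_timeout: float = 90.0) -> float:
    """Scale a measured profiling time, never dropping below the base timeout."""
    return max(base_timeout, measured_seconds * safety_factor)

print(adaptive_timeout(30.0))   # 90.0  (base timeout dominates)
print(adaptive_timeout(120.0))  # 240.0 (2x the measured time)
```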
Enable smart caching of routing decisions and responses.
Default: true
Maximum number of routing cache entries (SHA-256 hash based).
Default: 500
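A hash-based cache key might be derived as below; the exact fields SmarterRouter hashes are an assumption here, only the SHA-256 mechanism is from the source:

```python
# How a SHA-256 routing-cache key might be derived. The hashed fields
# (prompt, quality preference) are assumptions for illustration.
import hashlib
import json

def routing_cache_key(prompt: str, quality_preference: float) -> str:
    payload = json.dumps({"prompt": prompt, "q": quality_preference},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key = routing_cache_key("Explain recursion", 0.5)
print(len(key))  # 64 hex characters
```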
Time-to-live for cache entries (seconds).
Default: 3600 (1 hour)
Cache backend implementation.
- `memory` (default)
- `redis`
Redis connection URL used when ROUTER_CACHE_BACKEND=redis.
Default: redis://localhost:6379
Prefix for Redis cache keys.
Default: smarterrouter:
Interval for background persistent-cache cleanup task.
Set to 0 to disable the periodic cleanup task.
Default: 24
Maximum number of response cache entries.
Default: 200
Embedding model for semantic similarity matching. If set, enables semantic caching in addition to exact hash matching.
Example: nomic-embed-text:latest
Similarity threshold for semantic matching (0.0-1.0). Higher = more strict matching.
Default: 0.85
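Semantic matching boils down to comparing embedding vectors against this threshold. The sketch below uses toy vectors; in practice the configured embedding model (e.g. nomic-embed-text) produces them:

```python
# Cosine-similarity thresholding as used conceptually by semantic caching.
# Toy vectors here; real embeddings come from the configured model.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_cache_hit(similarity: float, threshold: float = 0.85) -> bool:
    return similarity >= threshold

print(is_cache_hit(cosine([1.0, 0.2], [0.9, 0.3])))  # similar prompts -> hit
print(is_cache_hit(cosine([1.0, 0.0], [0.0, 1.0])))  # orthogonal -> miss
```

Raising the threshold toward 1.0 reduces false cache hits at the cost of fewer reused responses.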
Enable VRAM monitoring with auto-detection across all GPU vendors (NVIDIA, AMD, Intel, Apple Silicon).
Default: true
Override auto-detected unified memory for Apple Silicon Macs. SmarterRouter estimates GPU memory as a percentage of system RAM (default: 75%). Set this to explicitly define the total GB available for GPU workloads on Apple Silicon.
Default: (auto-detect as 75% of system RAM)
VRAM sampling interval (seconds).
Default: 30
Maximum VRAM the router can allocate. Leave empty to auto-detect 90% of total GPU memory across all detected GPUs.
Example: For 24GB GPU, set to 22.0 to reserve 2GB for system
Default: (auto-detect 90% of total detected VRAM)
VRAM utilization percentage for warnings (not automatic unloads).
Default: 85.0
Automatically unload unused models when VRAM pressure is high.
Default: true
Strategy for selecting models to unload:
- `lru` (default) - least recently used
- `largest` - unload biggest models first
Default VRAM estimate (in GB) for models without measured data.
Default: 8.0
Controls how long models stay loaded in VRAM after each request (passed to backend's keep_alive parameter).
- `-1` (default): Keep models loaded indefinitely. They stay in VRAM until explicitly unloaded or the router shuts down.
- `0`: Unload models immediately after each response. Good for conserving VRAM at the cost of slower subsequent requests (model must reload).
- Positive integer: Number of seconds to keep the model loaded after the response (e.g., `300` = 5 minutes).
Note: This setting only affects backends that support keep_alive (Ollama). Other backends may ignore it.
Example: Set ROUTER_MODEL_KEEP_ALIVE=0 to ensure only the most recently used model remains loaded, freeing VRAM for other applications.
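For reference, the value is forwarded as the `keep_alive` field of the backend request. In Ollama's native chat API, a request with immediate unload looks roughly like this (illustrative request body):

```json
{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Hello"}],
  "keep_alive": 0
}
```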
Comma-separated list of glob patterns to include when discovering models. Only models matching these patterns will be available for routing and profiling. Case-insensitive matching.
Patterns:
- `*` matches everything
- `?` matches any single character
- `[seq]` matches any character in seq
- `[!seq]` matches any character not in seq
Default: (empty - include all models)
Examples:
- `ROUTER_MODEL_FILTER_INCLUDE=gemma*,mistral*` - only include gemma and mistral models
- `ROUTER_MODEL_FILTER_INCLUDE=llama*,phi*` - include llama and phi model families
Comma-separated list of glob patterns to exclude when discovering models. Models matching these patterns will be removed from the available set. Case-insensitive matching. Exclude patterns take precedence over include patterns.
Default: (empty - exclude no models)
Examples:
- `ROUTER_MODEL_FILTER_EXCLUDE=*qwen*,*deepseek*` - exclude qwen and deepseek models
- `ROUTER_MODEL_FILTER_EXCLUDE=*test*,*dev*` - exclude test/dev models
You can use both settings together. The filtering logic is:
- First, exclude patterns are applied (models matching exclude are removed)
- Then, include patterns are applied (if include is non-empty, only matching models remain)
Example - Use gemma and mistral but exclude quantized versions:
```
ROUTER_MODEL_FILTER_INCLUDE=gemma*,mistral*
ROUTER_MODEL_FILTER_EXCLUDE=*q4_*,*q5_*,*q8_*
```
Example - Exclude everything except specific models:
```
ROUTER_MODEL_FILTER_EXCLUDE=*
ROUTER_MODEL_FILTER_INCLUDE=llama3.1:8b,phi3:mini
```
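The filtering order described above can be sketched with the stdlib `fnmatch` module (case-insensitive globbing); this is an illustration of the documented logic, not SmarterRouter's actual code:

```python
# Sketch of the exclude-then-include filtering order described above,
# using case-insensitive glob matching via the stdlib fnmatch module.
from fnmatch import fnmatch

def filter_models(models, include=(), exclude=()):
    def matches(name, patterns):
        return any(fnmatch(name.lower(), p.lower()) for p in patterns)
    kept = [m for m in models if not matches(m, exclude)]  # excludes first
    if include:  # then includes, only if any are configured
        kept = [m for m in kept if matches(m, include)]
    return kept

models = ["gemma2:9b", "mistral:7b", "qwen2:7b", "gemma2:9b-q4_0"]
print(filter_models(models, include=["gemma*", "mistral*"],
                    exclude=["*q4_*"]))  # ['gemma2:9b', 'mistral:7b']
```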
Multi-GPU Support: SmarterRouter automatically detects all available GPUs regardless of vendor and combines their memory. GPU indexing is global across vendors (0, 1, 2, ...). If no GPUs are detected on startup, VRAM monitoring is disabled with a warning. GPU detection runs on every startup, so adding new hardware requires only a restart.
Supported Vendors:
- NVIDIA: via `nvidia-smi`
- AMD: via `rocm-smi` or sysfs
- Intel: Arc GPUs with dedicated VRAM (via sysfs `lmem_total`)
- Apple Silicon: unified memory estimation (default 75% of system RAM)
Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
Default: INFO
Log format: text (human-readable) or json (structured for log aggregation)
Default: text
For production: Use json for easy parsing by log aggregation tools
When using json logging, warning/error records include structured context fields where available (e.g., request_id, user_ip, model_name, prompt_hash) to improve incident triage and cross-service correlation.
How often to check for new models in backend (seconds).
Default: 60
Enable gzip compression middleware for API responses.
Default: false
Minimum response size (bytes) before gzip compression is applied.
Default: 1024
Enable slow-request logging middleware.
When enabled, requests that exceed ROUTER_SLOW_QUERY_THRESHOLD_MS are logged with request metadata and a stack snapshot.
Default: false
Slow request threshold in milliseconds.
Default: 500
Database connection URL.
Default: sqlite:///data/router.db
For PostgreSQL in production:
postgresql://user:password@localhost:5432/smarterrouter
Note: The database file and parent directories are automatically created on startup.
These settings tune SQLAlchemy connection pooling (primarily for non-SQLite backends):
- `ROUTER_DATABASE_POOL_SIZE` (default: `10`)
- `ROUTER_DATABASE_MAX_OVERFLOW` (default: `20`)
- `ROUTER_DATABASE_POOL_RECYCLE` (default: `3600`)
- `ROUTER_DATABASE_POOL_PRE_PING` (default: `true`)
For SQLite, pool settings are less impactful due to file-based locking, but are still accepted.
- `ROUTER_PROVIDER_DB_ENABLED` - Enable provider.db benchmark usage (default: `true`)
- `ROUTER_PROVIDER_DB_PATH` - Path to provider.db (default: `data/provider.db`)
- `ROUTER_PROVIDER_DB_MAX_AGE_HOURS` - Mark provider.db stale if `last_build` is older than this many hours (default: `168`)
- `ROUTER_PROVIDER_DB_AUTO_UPDATE_HOURS` - Background auto-update interval (default: `4`)
- `ROUTER_PROVIDER_DB_DOWNLOAD_URL` - Download source URL for provider.db
- `ROUTER_DB_SLOW_FALLBACK_ENABLED` - Enable temporary stale-cache fallback when provider.db is slow/unavailable (default: `true`)
- `ROUTER_DB_SLOW_QUERY_THRESHOLD_MS` - Query latency threshold that triggers the degraded fallback window (default: `250`)
- `ROUTER_DB_SLOW_FALLBACK_WINDOW_SECONDS` - Duration of the degraded fallback window after a slow/failing query (default: `30`)
- `ROUTER_DB_STALE_CACHE_MAX_AGE_SECONDS` - Maximum age of in-memory benchmark cache allowed for fallback serving (default: `300`)
- `ROUTER_BACKEND_RETRY_ENABLED` - Enable retry for transient backend errors (default: `true`)
- `ROUTER_BACKEND_MAX_RETRIES` - Maximum retry attempts (default: `3`)
- `ROUTER_BACKEND_RETRY_BASE_DELAY` - Initial backoff delay in seconds (default: `0.5`)
- `ROUTER_BACKEND_RETRY_MAX_DELAY` - Maximum backoff delay in seconds (default: `8.0`)
Retryable failures include timeouts, network errors, HTTP 429, and HTTP 5xx.
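With exponential backoff (doubling per attempt is an assumption here, capped at the max delay), the default settings produce the following schedule:

```python
# Exponential backoff schedule matching the retry settings above.
# Doubling per attempt is an assumption; the cap at max_delay is from
# ROUTER_BACKEND_RETRY_MAX_DELAY.

def backoff_delays(max_retries=3, base_delay=0.5, max_delay=8.0):
    return [min(base_delay * (2 ** attempt), max_delay)
            for attempt in range(max_retries)]

print(backoff_delays())             # [0.5, 1.0, 2.0]
print(backoff_delays(6, 0.5, 8.0))  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```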
- `ROUTER_BACKEND_CIRCUIT_BREAKER_ENABLED` (default: `true`)
- `ROUTER_BACKEND_CIRCUIT_BREAKER_FAILURE_THRESHOLD` (default: `5`)
- `ROUTER_BACKEND_CIRCUIT_BREAKER_RESET_TIMEOUT` (default: `60.0`)
- `ROUTER_BACKEND_CIRCUIT_BREAKER_HALF_OPEN_MAX_ATTEMPTS` (default: `3`)
- `ROUTER_BACKEND_CIRCUIT_BREAKER_SLIDING_WINDOW_SIZE` (default: `100`)
When enabled, backend operations open their circuit after repeated failures, fail fast while open, and probe recovery in half-open state.
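The open / half-open / closed behavior can be sketched as a minimal three-state breaker; thresholds mirror the defaults above, but the implementation details (sliding window, half-open attempt budget) are simplified away:

```python
# Minimal three-state circuit breaker illustrating the behavior above.
# Simplified sketch: no sliding window or half-open attempt budget.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self, now=None) -> bool:
        now = time.time() if now is None else now
        if self.state == "open":
            if now - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # probe recovery
                return True
            return False  # fail fast while open
        return True

    def record(self, success: bool, now=None):
        now = time.time() if now is None else now
        if success:
            self.failures, self.state = 0, "closed"
        else:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "open", now

cb = CircuitBreaker()
for _ in range(5):
    cb.record(success=False, now=0)
print(cb.state, cb.allow(now=1), cb.allow(now=61))  # open False True
```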
Enable persistent dead-letter-queue capture for failed background jobs.
Default: true
Maximum retry attempts per failed background task before marking it dead.
Default: 3
Base retry delay in seconds for DLQ retries. Backoff is exponential per attempt.
Default: 60
Maximum number of due DLQ entries retried per retry-worker iteration.
Default: 10
DLQ captures failures from background sync/cleanup workflows and stores them in background_task_dlq for later inspection and retry.
Use an LLM to grade model outputs during profiling (higher quality scores).
Default: false (requires external API)
Enable for: More accurate model capability assessment
Model to use as the judge (e.g., gpt-4o, claude-3-opus).
Default: gpt-4o
Base URL for judge's API endpoint.
Default: https://api.openai.com/v1
API key for judge's service.
Default: (empty)
HTTP referer header (required by some providers like OpenRouter).
Default: (empty)
X-Title header for provider analytics.
Default: (empty)
Max retry attempts for transient errors.
Default: 3
Initial retry delay in seconds (doubles on each retry).
Default: 1.0
```
# Backend
ROUTER_PROVIDER=ollama
ROUTER_OLLAMA_URL=http://localhost:11434

# Security (CRITICAL FOR PRODUCTION)
ROUTER_ADMIN_API_KEY=sk-smarterrouter-$(openssl rand -hex 32)
ROUTER_RATE_LIMIT_ENABLED=true

# Routing
ROUTER_QUALITY_PREFERENCE=0.5
ROUTER_PINNED_MODEL=phi3:mini
ROUTER_CASCADING_ENABLED=true

# Cache
ROUTER_CACHE_ENABLED=true
ROUTER_CACHE_MAX_SIZE=500

# VRAM
ROUTER_VRAM_MAX_TOTAL_GB=22.0
ROUTER_VRAM_AUTO_UNLOAD_ENABLED=true
ROUTER_MODEL_KEEP_ALIVE=-1

# Logging
ROUTER_LOG_LEVEL=INFO
ROUTER_LOG_FORMAT=json

# Database
ROUTER_DATABASE_URL=sqlite:///data/router.db
```
See ENV_DEFAULT for the complete list with inline comments.