Troubleshooting Guide

GPU Support: SmarterRouter supports NVIDIA, AMD (ROCm), Intel Arc, and Apple Silicon GPUs. If your GPU isn't detected, check the GPU issues section below for vendor-specific troubleshooting.

Common Issues

"All connection attempts failed" / "Failed to list models"

Symptom: SmarterRouter cannot reach your LLM backend (Ollama, llama.cpp, etc.).

Diagnosis:

curl http://localhost:11434/api/tags  # adjust port if needed

Should return:

{
  "models": [...]
}

Solutions:

  1. Verify backend is running: systemctl status ollama or docker ps
  2. Check ROUTER_OLLAMA_URL in .env matches backend URL
  3. Test connectivity from SmarterRouter container:
    docker exec smarterrouter curl http://<backend-ip>:<port>/api/tags
  4. For Docker networking (test each candidate address as sketched below):
    • Backend on host, router in container: use host.docker.internal (Mac/Windows) or 172.17.0.1 (Linux)
    • Docker Compose: use the service name (e.g., ollama:11434)
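
A quick way to find the right address is to try each candidate from inside the container (the container name smarterrouter is assumed; adjust to yours):

docker exec smarterrouter curl -s http://host.docker.internal:11434/api/tags  # Mac/Windows
docker exec smarterrouter curl -s http://172.17.0.1:11434/api/tags            # Linux bridge gateway
docker exec smarterrouter curl -s http://ollama:11434/api/tags                # Compose service name

Whichever command returns the models JSON is the value to use for ROUTER_OLLAMA_URL.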

"Port already in use"

Symptom: Container fails to start; logs show "address already in use"

Solution:

# Check what's using the port
lsof -i :11436
# or
netstat -tulpn | grep 11436

# Option 1: Stop existing container
docker stop smarterrouter && docker rm smarterrouter

# Option 2: Change port in .env or docker-compose.yml
ROUTER_PORT=11437

"Database path is a directory"

Symptom: Error "CRITICAL: Database path is a directory"

Cause: Docker created a directory instead of a file because the path didn't exist before mounting.

Solution:

# Stop container
docker-compose down

# Remove the erroneous directory
rm -rf router.db  # or data/router.db depending on your setup

# Ensure your docker-compose.yml has correct volume mapping:
# - ./router.db:/app/router.db  (single file)
# OR
# - ./data:/app/data            (directory, database inside)

# Restart
docker-compose up -d

GPU Issues (All Vendors)

SmarterRouter auto-detects GPUs from all vendors on startup. Check logs for detection messages.
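
To confirm what was detected, grep the startup logs (the exact wording of the detection messages is an assumption; widen the pattern if nothing matches):

docker logs smarterrouter 2>&1 | grep -iE "gpu|vram|cuda|rocm|metal|detect"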

"nvidia-smi not found"

Symptom: GPU monitoring disabled; VRAM usage not tracked.

Solution:

  1. Install NVIDIA drivers on host
  2. Install NVIDIA Container Toolkit:
    # Ubuntu/Debian
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update && sudo apt-get install -y nvidia-docker2
    sudo systemctl restart docker
  3. Test:
    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
  4. Restart SmarterRouter with --gpus all or --compatibility

"AMD APU shows wrong VRAM (e.g., 7.7GB instead of 58GB)"

Symptom: AMD APU (Ryzen AI, Radeon 800M series) detected with small VRAM instead of full unified memory.

Diagnosis:

# Check what's being reported
cat /sys/class/drm/card*/device/mem_info_vram_total
cat /sys/class/drm/card*/device/mem_info_gtt_total

# VRAM total should be small (< 8GB) for APUs
# GTT total should be large (~system RAM) for APUs

Explanation: APUs have two memory pools:

  • VRAM (mem_info_vram_*): Small BIOS carve-out (512MB-8GB)
  • GTT (mem_info_gtt_*): Unified memory pool (actual usable GPU memory)

SmarterRouter auto-detects APUs (VRAM < 4GB) and uses GTT for total memory.
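
You can check the same heuristic by hand. A minimal sketch, assuming the amdgpu driver exposes the sysfs files shown above (values are in bytes):

# Treat a device whose VRAM carve-out is under 4GB as an APU and report GTT instead.
for dev in /sys/class/drm/card*/device; do
  [ -f "$dev/mem_info_vram_total" ] || continue
  vram=$(cat "$dev/mem_info_vram_total")
  gtt=$(cat "$dev/mem_info_gtt_total" 2>/dev/null || echo 0)
  if [ "$vram" -lt $((4 * 1024 * 1024 * 1024)) ]; then
    echo "$dev: APU, usable pool = $((gtt / 1024 / 1024)) MiB (GTT)"
  else
    echo "$dev: dGPU, usable pool = $((vram / 1024 / 1024)) MiB (VRAM)"
  fi
done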

Solutions:

  1. Check BIOS UMA Buffer - Should be set to minimum (512MB-2GB):

    • Large UMA buffer wastes RAM and confuses detection
    • GTT pool is the real usable memory, not VRAM
  2. Manual Override if detection still fails:

    # In .env
    ROUTER_AMD_UNIFIED_MEMORY_GB=58  # ~90% of your RAM
  3. Verify ROCm architecture (for gfx1150/gfx1151):

    export HSA_OVERRIDE_GFX_VERSION=11.5.1

"AMD GPU not detected"

Symptom: AMD GPU present but VRAM monitoring disabled.

Diagnosis:

# Check for rocm-smi
rocm-smi

# Check sysfs entries
ls /sys/class/drm/card*/device/mem_info_vram_total

Solutions:

  1. Install ROCm runtime:
    # Ubuntu 22.04
    sudo amdgpu-install --usecase=rocm,graphics --no-dkms
    sudo usermod -a -G render,video $USER
  2. For Docker, ensure device passthrough:
    devices:
      - /dev/kfd
      - /dev/dri
  3. Verify AMD vendor ID in sysfs:
    cat /sys/class/drm/card*/device/vendor
    # Should show: 0x1002 for AMD

"Intel Arc GPU not detected"

Symptom: Intel Arc GPU present but VRAM monitoring disabled.

Diagnosis:

# Check for Intel GPU with dedicated memory
ls /sys/class/drm/card*/device/lmem_total

# Check driver loaded
lsmod | grep i915

Solutions:

  1. Ensure Intel GPU drivers are installed (kernel 5.19+ recommended)
  2. For Docker, ensure device passthrough:
    devices:
      - /dev/dri
  3. Verify Intel vendor ID:
    cat /sys/class/drm/card*/device/vendor
    # Should show: 0x8086 for Intel

Note: Only Intel Arc dedicated GPUs with local memory (lmem) are supported. Integrated Intel UHD/Iris GPUs use shared system memory and are not monitored.


"Apple Silicon GPU memory incorrect"

Symptom: Apple Silicon detected but VRAM estimate is wrong.

Solutions:

  1. Manually set unified memory:
    # In .env
    ROUTER_APPLE_UNIFIED_MEMORY_GB=16  # for 16GB Mac
  2. Apple Silicon VRAM is estimated as 75% of total RAM by default (a one-liner reproducing this estimate follows the list)
  3. Run SmarterRouter natively on macOS host (not in Docker) for accurate detection:
    system_profiler SPHardwareDataType | grep "Memory:"
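
To reproduce the default 75% estimate on the macOS host (assumes sysctl and awk, both standard on macOS):

sysctl -n hw.memsize | awk '{printf "ROUTER_APPLE_UNIFIED_MEMORY_GB=%.0f\n", $1 * 0.75 / 1024 / 1024 / 1024}'

On a 16GB Mac this prints ROUTER_APPLE_UNIFIED_MEMORY_GB=12.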

Important: Docker Desktop on macOS cannot pass GPU to containers. You must run SmarterRouter on the host directly.


First request is very slow (30+ seconds)

Causes:

  • Model cold-start: First time loading a model into GPU memory
  • VRAM pressure triggering model unload/reload

Solutions:

  1. Pin a small model for fast responses:
    ROUTER_PINNED_MODEL=phi3:mini
  2. Increase VRAM allocation if possible:
    ROUTER_VRAM_MAX_TOTAL_GB=<your-gpu-total-minus-2>
  3. Accept that the first request is always slower; subsequent requests to the same model will be fast (or pre-warm a model at startup, as sketched below)
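
One way to pre-warm is a throwaway request right after startup. This sketch assumes SmarterRouter exposes an OpenAI-compatible /v1/chat/completions endpoint (only /v1/models is confirmed in this guide) and uses phi3:mini as a placeholder model name:

curl -s http://localhost:11436/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi3:mini", "messages": [{"role": "user", "content": "ping"}]}' > /dev/null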

"All models failed" error

Diagnosis:

  1. Check backend health:
    curl http://localhost:11434/api/tags
  2. Check SmarterRouter logs:
    docker logs smarterrouter
  3. Look for:
    • VRAM OOM errors
    • Model loading failures
    • Backend connection timeouts
  4. Check VRAM usage:
    curl http://localhost:11436/admin/vram

Common fixes:

  • Reduce ROUTER_VRAM_MAX_TOTAL_GB if set too high
  • Enable auto-unload: ROUTER_VRAM_AUTO_UNLOAD_ENABLED=true
  • Free up GPU memory (stop other containers/processes)
  • Use smaller models

Low-quality routing / Wrong model selected

Diagnosis:

  1. Check model profiles:

    curl http://localhost:11436/admin/profiles
    • Are scores reasonable (0.0-1.0)?
    • Are all models 0.0? → Profiling failed
  2. Reprofile if needed:

    curl -X POST "http://localhost:11436/admin/reprofile?force=true" \
      -H "Authorization: Bearer your-admin-key"
  3. Use explain endpoint to understand routing decision:

    curl "http://localhost:11436/admin/explain?prompt=Your prompt here"
  4. Check complexity detection:

    • Simple prompts → smaller models preferred
    • Complex prompts → larger models required
    • Adjust ROUTER_QUALITY_PREFERENCE if routing is consistently too aggressive or too conservative

Profiling takes too long or times out

Cause: Small ROUTER_PROFILE_TIMEOUT for large models.

Solution:

  1. Increase timeout in .env:
    ROUTER_PROFILE_TIMEOUT=180  # 3 minutes per prompt
  2. For very large models (14B+), consider:
    ROUTER_PROFILE_TIMEOUT=300  # 5 minutes per prompt
  3. Adaptive profiling should handle this automatically; check the logs for timeout errors (a grep is sketched below)
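
A quick way to surface them (the exact log wording is an assumption; adjust the pattern if nothing matches):

docker logs smarterrouter 2>&1 | grep -iE "timeout|profil"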

High memory usage / Out of Memory

Solutions:

  1. Unload unused models:
    # Manual trigger
    curl -X POST "http://localhost:11436/admin/cache/invalidate?all=true"
  2. Enable aggressive auto-unload:
    ROUTER_VRAM_AUTO_UNLOAD_ENABLED=true
    ROUTER_VRAM_UNLOAD_THRESHOLD_PCT=75
    ROUTER_VRAM_UNLOAD_STRATEGY=largest  # free most memory first
  3. Reduce ROUTER_VRAM_MAX_TOTAL_GB to be more conservative
  4. Use smaller models
  5. Increase system swap (not ideal but prevents crashes)

Model not appearing in /v1/models

Causes:

  • Model not discovered by backend
  • Backend not running or accessible
  • Profiling not complete

Check:

# 1. Backend connection
curl http://localhost:11434/api/tags

# 2. SmarterRouter logs for discovery
docker logs smarterrouter | grep -i "discover\|profile"

# 3. Router health
curl http://localhost:11436/health

Fix: Ensure backend is running and ROUTER_OLLAMA_URL is correct.


Response signature showing wrong model

Symptom: Response says "Model: llama3:70b" but you expected a different model.

Explanation: This is correct behavior! SmarterRouter selected that model based on its routing algorithm. To see why:

curl "http://localhost:11436/admin/explain?prompt=your prompt"

Stale routing decisions (same model always selected)

Cause: Routing cache preventing re-evaluation.

Solution: Invalidate cache:

curl -X POST "http://localhost:11436/admin/cache/invalidate?all=true"

Or disable caching temporarily:

ROUTER_CACHE_ENABLED=false

Performance Problems

Slow Response Times

Checklist:

  1. Cold start? The first request to a model is slow while it loads; subsequent requests are faster.
  2. VRAM pressure? Check /admin/vram - if utilization >90%, models may be unloading/reloading.
  3. Cache enabled? ROUTER_CACHE_ENABLED=true speeds up repeat requests.
  4. Pinned model? Set ROUTER_PINNED_MODEL=phi3:mini for instant responses to simple queries.
  5. Profiling complete? If profiles are missing or 0.0 scores, router may be making poor decisions.

Quick wins:

# Pin a small model
ROUTER_PINNED_MODEL=phi3:mini

# Increase VRAM limit
ROUTER_VRAM_MAX_TOTAL_GB=22.0

# Enable caching
ROUTER_CACHE_ENABLED=true
ROUTER_CACHE_MAX_SIZE=1000

High VRAM Usage

Diagnose:

curl http://localhost:11436/admin/vram | jq .

Look for the following fields (a jq one-liner to extract them is sketched after the list):

  • loaded_models array showing which models are in memory
  • utilization_pct near 100%
  • warnings array
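
Assuming /admin/vram returns a flat JSON object with those keys:

curl -s http://localhost:11436/admin/vram | jq '{loaded_models, utilization_pct, warnings}'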

Fix:

  1. Enable auto-unload (if not already):
    ROUTER_VRAM_AUTO_UNLOAD_ENABLED=true
    ROUTER_VRAM_UNLOAD_THRESHOLD_PCT=80
    ROUTER_VRAM_UNLOAD_STRATEGY=largest
  2. Manually unload specific models via admin API (coming soon) or restart
  3. Reduce ROUTER_VRAM_MAX_TOTAL_GB to be more conservative
  4. Use smaller models (7B instead of 70B)
  5. Quantize models (Q4_K_M, Q5_K_S)

Docker Issues

Container exits immediately

Check logs:

docker logs smarterrouter

Common causes:

  • Invalid .env syntax (use quotes for values with spaces)
  • Missing required files
  • Port conflict (11436 already in use)
  • Backend not accessible at startup

Fix: Read error message from logs; adjust configuration accordingly.


Container can't find Ollama

Symptoms: "Failed to list models: All connection attempts failed"

Solutions:

  1. Both on host (not Docker):

    ROUTER_OLLAMA_URL=http://localhost:11434
  2. SmarterRouter in Docker, Ollama on host (Linux):

    ROUTER_OLLAMA_URL=http://172.17.0.1:11434
  3. Both in Docker Compose:

    services:
      ollama:
        image: ollama/ollama
        ports:
          - "11434:11434"
      smarterrouter:
        build: .
        environment:
          - ROUTER_OLLAMA_URL=http://ollama:11434  # use service name
  4. Test connectivity from inside container:

    docker exec smarterrouter curl http://172.17.0.1:11434/api/tags

Volume mount not persisting data

Symptom: Database resets on container restart.

Check:

# Verify volume is mounted
docker inspect smarterrouter | grep -A 10 "Mounts"

Ensure docker-compose.yml has:

services:
  smarterrouter:
    volumes:
      - ./router.db:/app/router.db  # or ./data:/app/data

Note: If router.db doesn't exist on the host before the first start, Docker will create the mount source as a directory, which triggers the "Database path is a directory" error above. Create the file (or the parent directory) on the host first, as shown below.
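
For example, before the first docker-compose up:

touch router.db    # single-file mapping: the file must exist on the host first
mkdir -p data      # directory mapping: create the directory up front; the DB file is created inside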


Debug Mode

Enable verbose logging for investigation:

# In .env
ROUTER_LOG_LEVEL=DEBUG
ROUTER_LOG_FORMAT=json  # easier to parse programmatically

View logs with request tracking:

# Make request with tracking ID
curl -H "X-Request-ID: debug-123" http://localhost:11436/v1/models

# Follow logs
docker logs -f smarterrouter | grep "debug-123"

# Or for JSON logs, use jq
docker logs smarterrouter 2>&1 | jq 'select(.request_id=="debug-123")'

Check detailed request logs (if enabled with --enable-request-logging flag):

logs/detailed_logs/<timestamp>/
├── request.json          # Full request payload
├── response.json         # Full response or error
├── streaming_chunks.jsonl # SSE chunks if streaming
└── metadata.json         # Timing, model selected, scores

Still Stuck?

  1. Check the logs - Most issues are clearly explained there
  2. Run health check - curl http://localhost:11436/health
  3. Verify backend connectivity - curl http://localhost:11434/api/tags
  4. Open an issue - Include:
    • SmarterRouter version (from git commit or image tag)
    • Full error messages
    • Relevant log snippets
    • Steps to reproduce
    • Your configuration (redact API keys!)