Troubleshooting

Common issues and diagnostic steps for LLMTrace.

Proxy Won't Start

Config file errors

Error: Failed to parse config file

Check the YAML syntax of your config file. The proxy parses it at startup and refuses to start if it is invalid, so rerun the proxy after each fix:

./target/release/llmtrace-proxy --config config.yaml

Common config mistakes:

Indentation errors in YAML
Missing required upstream_url field
Invalid storage profile name (valid: lite, production, memory)
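
The three mistakes above can be caught before launching the proxy. The following is a rough pre-flight sketch, not an LLMTrace feature: check_config is a hypothetical helper that pattern-matches lines rather than parsing YAML, and it assumes the storage profile is written as a profile: key, as in the storage examples in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of LLMTrace.
# It greps for known keys instead of parsing YAML, so treat it as a rough filter.
check_config() {
  local file="$1" status=0 profile tab
  tab=$(printf '\t')

  if [ ! -r "$file" ]; then
    echo "cannot read $file"
    return 1
  fi

  # YAML forbids tabs for indentation; flag lines that start with one.
  if grep -q "^${tab}" "$file"; then
    echo "tab indentation found"
    status=1
  fi

  # upstream_url is a required field.
  if ! grep -Eq '^[[:space:]]*upstream_url:' "$file"; then
    echo "missing upstream_url"
    status=1
  fi

  # A storage profile, when present, must be lite, production, or memory.
  profile=$(sed -nE 's/^[[:space:]]*profile:[[:space:]]*"?([A-Za-z0-9_-]+)"?.*$/\1/p' "$file" | head -n 1)
  case "$profile" in
    "" | lite | production | memory) ;;
    *)
      echo "invalid storage profile: $profile"
      status=1
      ;;
  esac

  return "$status"
}
```

Running check_config config.yaml prints one line per problem found and exits non-zero if there is any. A clean result does not prove the file is valid YAML; it only rules out these specific mistakes.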

Port already in use

Error: Address already in use (os error 98)

Another process is using port 8080. Find and stop it:

lsof -i :8080
kill <PID>

Or change the listen address in config:

listen_addr: "0.0.0.0:9090"
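
If the proxy is configured for a port other than 8080, it helps to read the port from the config rather than guess. A minimal sketch, assuming listen_addr sits on a single line in host:port form; config_port and show_listener are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of LLMTrace.

# Print the port from the first listen_addr line in a config file.
config_port() {
  local file="$1" addr
  addr=$(sed -nE 's/^[[:space:]]*listen_addr:[[:space:]]*"?([^"#[:space:]]+)"?.*$/\1/p' "$file" | head -n 1)
  [ -n "$addr" ] || return 1
  # Keep only the text after the last colon.
  echo "${addr##*:}"
}

# List processes listening on the given TCP port.
show_listener() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN
}
```

For example, show_listener "$(config_port config.yaml)" lists whatever is bound to the configured port. config_port returns a non-zero status when the file has no listen_addr line.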

ML model download failures

Error: Failed to download model

The proxy downloads ML models from Hugging Face Hub on first startup. If the download fails, try the following:

Check internet connectivity

Increase the download timeout:

security_analysis:
  ml_download_timeout_seconds: 1200

Pre-download models and set the cache directory (see ML Model Reference)

Disable ML and fall back to regex-only detection:

```
security_analysis:
  ml_enabled: false
```

Connection Refused / Timeouts

Proxy not reachable

# Verify the proxy is running and listening
curl http://localhost:8080/health

If connection refused:

Confirm the proxy process is running
Check listen_addr in config matches the URL you're hitting
If running in Docker, make sure the port is published (-p 8080:8080)

Upstream provider timeouts

If requests to the LLM provider time out, increase timeouts:

timeout_ms: 60000              # request timeout
connection_timeout_ms: 10000   # connection timeout

Security Findings Not Appearing

Async analysis delay

Security analysis runs asynchronously after the response is forwarded. Findings may take 1-5 seconds to appear after the request completes.

# Wait briefly, then check findings
sleep 3
curl http://localhost:8080/api/v1/security/findings | jq
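
Because the delay varies, a fixed sleep can be too short or longer than needed. One alternative is to poll until findings show up. This is a sketch: poll_until and findings_present are hypothetical helpers, and the jq filter assumes the endpoint returns a JSON array, which this guide does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of LLMTrace.

# poll_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0; gives up after ATTEMPTS tries.
poll_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Exits 0 once the findings endpoint returns a non-empty JSON array.
# Assumption: the body is a JSON array; adjust the jq filter if it is not.
findings_present() {
  curl -fsS http://localhost:8080/api/v1/security/findings |
    jq -e 'type == "array" and length > 0' > /dev/null
}
```

Calling poll_until 10 1 findings_present checks up to ten times, one second apart, which covers the 1-5 second window with some margin.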

Storage not configured

If enable_trace_storage is false, or no storage backend is configured, requests are still analysed but the findings are not persisted. Enable storage and configure a backend:

enable_security_analysis: true
enable_trace_storage: true

storage:
  profile: "lite"
  database_path: "llmtrace.db"

ML not compiled in

If the proxy was built without --features ml, only regex detection is active. Rebuild with the feature enabled:

cargo build --release --features ml

False Positives

Benign requests are flagged as threats.

Quick fixes

Switch to a higher-precision operating point:

   security_analysis:
     operating_point: "high_precision"

Enable over-defence suppression:

   security_analysis:
     over_defence: true

Raise the per-model threshold:

   security_analysis:
     ml_threshold: 0.9

See Threshold Tuning for a detailed tuning workflow.

False Negatives

Known attacks are not detected.

Quick fixes

Enable additional detectors:

   security_analysis:
     ml_enabled: true
     injecguard_enabled: true
     piguard_enabled: true

Switch to the high-recall operating point:

   security_analysis:
     operating_point: "high_recall"

Lower per-model thresholds:

   security_analysis:
     ml_threshold: 0.7
     injecguard_threshold: 0.75

Rate Limiting / Cost Cap Errors

429 Too Many Requests

The proxy's rate limiter is rejecting requests. Adjust limits in config:

rate_limiting:
  enabled: true
  requests_per_second: 200   # increase limit
  burst_size: 400
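
Raising the limit is not always possible, for example on a shared proxy. Clients can instead back off and retry when they receive a 429. The following sketches capped exponential backoff; backoff_delay and request_with_retry are hypothetical names, and this is generic client-side practice rather than anything LLMTrace prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical client-side helpers; not part of LLMTrace.

# backoff_delay ATTEMPT BASE CAP
# Prints BASE * 2^(ATTEMPT-1) seconds, limited to CAP.
backoff_delay() {
  local attempt="$1" delay="$2" cap="$3" i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay="$cap"
  fi
  echo "$delay"
}

# request_with_retry MAX_ATTEMPTS CURL_ARGS...
# Repeats the request while the response status is 429.
# Prints the final HTTP status code; the response body is discarded.
request_with_retry() {
  local max="$1" attempt=1 code
  shift
  while [ "$attempt" -le "$max" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$@")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep "$(backoff_delay "$attempt" 1 30)"
    attempt=$((attempt + 1))
  done
  echo "429"
  return 1
}
```

With a base of 1 second and a cap of 30, the waits grow as 1, 2, 4, 8, 16, then stay at 30.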

Budget exceeded

If cost caps are configured and exceeded, the proxy returns a budget error. Check current spend:

curl http://localhost:8080/api/v1/costs/current | jq

Streaming Issues

SSE not working

Ensure streaming is enabled in config:

enable_streaming: true

streaming_analysis:
  enabled: true
  token_interval: 50
  output_enabled: true

Use the -N flag with curl to disable output buffering:

curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}],"stream":true}'

Storage Connectivity

ClickHouse

# Test ClickHouse connectivity
curl http://localhost:8123/ping

Verify the URL in config:

storage:
  profile: "production"
  clickhouse_url: "http://localhost:8123"
  clickhouse_database: "llmtrace"

PostgreSQL

# Test PostgreSQL connectivity
psql "postgres://llmtrace:llmtrace@localhost:5432/llmtrace" -c "SELECT 1"

Redis

# Test Redis connectivity
redis-cli ping

Dashboard Not Loading

Proxy URL configuration

The dashboard needs to connect to the proxy API. Verify the proxy URL is correct in the dashboard settings page, or set the environment variable:

NEXT_PUBLIC_PROXY_URL=http://localhost:8080 npm run dev

CORS errors

The proxy enables CORS by default, so CORS errors in the browser console usually mean the request never reached it. Confirm the proxy is running and reachable from the browser's network.

Diagnostic Commands

Health check

curl http://localhost:8080/health | jq

Returns proxy status, storage connectivity, circuit breaker state, and ML model status.

Prometheus metrics

curl http://localhost:8080/metrics

Returns request counts, latency histograms, error rates, and circuit breaker state in Prometheus exposition format.
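
The full metrics output can be long. The exposition format marks metadata lines with a leading #, so sample lines are easy to isolate. A sketch with a hypothetical metric_samples helper; the metric names LLMTrace exports are not listed in this guide, so the pattern in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of LLMTrace.

# metric_samples PATTERN
# Reads Prometheus exposition text on stdin, drops "# HELP" and "# TYPE"
# comment lines, and prints the sample lines matching PATTERN (extended regex).
metric_samples() {
  grep -v '^#' | grep -E -- "$1"
}
```

For example, curl -s http://localhost:8080/metrics | metric_samples 'circuit' would show only samples whose line contains "circuit", if any such metric exists.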

Log levels

Increase log verbosity for debugging:

RUST_LOG=debug ./target/release/llmtrace-proxy --config config.yaml

Available levels: error, warn, info, debug, trace.

For production, use structured JSON logs:

logging:
  level: "info"
  format: "json"
