Monitoring & Observability Guide

LLMTrace exposes Prometheus metrics, a health endpoint, and structured logging for production monitoring.

Health Endpoint

curl http://localhost:8080/health | jq

The /health endpoint returns:

status: healthy or degraded

storage: storage backend connectivity status

circuit_breaker: current circuit breaker state (closed, open, half_open)

ml_models: ML model loading status

uptime: proxy uptime in seconds

Use this endpoint for Kubernetes liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
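
For monitors that run outside Kubernetes, the fields listed above can be checked programmatically. The following is a minimal sketch, assuming the response is a flat JSON object keyed by the field names above; the `is_ready` helper and the `storage` and `ml_models` values in the sample are hypothetical, not taken from LLMTrace itself.

```python
import json


def is_ready(health_body: str) -> bool:
    """Return True only when the proxy reports healthy with a closed breaker."""
    payload = json.loads(health_body)
    return (
        payload.get("status") == "healthy"
        and payload.get("circuit_breaker") == "closed"
    )


# Illustrative payload: the keys come from the field list above, but the
# "storage" and "ml_models" values are assumed placeholders.
sample = json.dumps({
    "status": "healthy",
    "storage": "connected",
    "circuit_breaker": "closed",
    "ml_models": "loaded",
    "uptime": 3600,
})
print(is_ready(sample))
```

Treating an open circuit breaker as "not ready" is a design choice for this sketch; a deployment that prefers pass-through traffic over no traffic may want to check only `status`.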

Prometheus Metrics

curl http://localhost:8080/metrics

The /metrics endpoint returns metrics in Prometheus exposition format.

Core Metrics

Metric	Type	Labels	Description
`llmtrace_requests_total`	Counter	`provider`, `model`, `status_code`	Total proxied LLM requests
`llmtrace_request_duration_seconds`	Histogram	`provider`, `model`	Request latency distribution
`llmtrace_tokens_total`	Counter	`direction`, `provider`, `model`	Total tokens (direction = `prompt` or `completion`)
`llmtrace_security_findings_total`	Counter	`severity`, `finding_type`	Security findings detected
`llmtrace_security_detector_latency_seconds`	Histogram	`detector`	Per-detector analysis latency
`llmtrace_circuit_breaker_state`	Gauge	`subsystem`, `state`	Circuit breaker state (1.0 = active state)
`llmtrace_storage_operations_total`	Counter	`operation`, `status`	Storage operations (status = `success` or `error`)
`llmtrace_cost_usd_total`	Counter	`tenant`, `model`	Estimated cost in micro-USD (divide by 1,000,000 to get USD)
`llmtrace_anomalies_total`	Counter	`anomaly_type`	Anomalies detected
`llmtrace_active_connections`	Gauge	(none)	Current active connections
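
Because `llmtrace_cost_usd_total` is reported in micro-USD, consumers need to convert it before display. The sketch below sums a metric across all label sets from raw exposition text and converts the result to USD. It assumes sample lines carry no trailing timestamp; the helper names, tenant, and model values are illustrative only.

```python
def sum_metric(exposition: str, name: str) -> float:
    """Sum every sample of `name` across all label combinations."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Sample lines look like: name{labels} value (no timestamp assumed)
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            total += float(value)
    return total


def micro_usd_to_usd(value: float) -> float:
    """Convert micro-USD to USD, as described in the table above."""
    return value / 1_000_000


# Made-up scrape output for illustration.
sample = """
# TYPE llmtrace_cost_usd_total counter
llmtrace_cost_usd_total{tenant="acme",model="gpt-4"} 1250000
llmtrace_cost_usd_total{tenant="acme",model="gpt-3.5"} 250000
llmtrace_active_connections 4
"""
print(micro_usd_to_usd(sum_metric(sample, "llmtrace_cost_usd_total")))
```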

Boundary Defence Metrics

These metrics track the boundary token injection defence. They are emitted when boundary_defense.enabled is set to true in the config, including in shadow mode.

Metric	Type	Labels	Description
`llmtrace_boundary_defense_applied_total`	Counter	`provider`, `mode`	Requests where defence was applied (`mode` = `active` or `shadow`)
`llmtrace_boundary_defense_messages_wrapped`	Histogram	`provider`	Number of tool messages wrapped per request
`llmtrace_boundary_defense_reminder_injected_total`	Counter	`provider`	Requests where system prompt reminder was injected
`llmtrace_boundary_defense_overhead_bytes`	Histogram	`provider`	Byte delta per request from boundary wrapping
`llmtrace_boundary_defense_errors_total`	Counter	`error_type`	Errors in boundary pipeline (`parse_failed`, `serialize_failed`)
`llmtrace_boundary_defense_skipped_total`	Counter	`reason`	Requests skipped (`disabled`, `no_tool_messages`, `unsupported_provider`)
`llmtrace_boundary_defense_shadow_mode`	Gauge	(none)	1 when shadow mode is active, 0 otherwise

Prometheus Scrape Config

# prometheus.yml
scrape_configs:
  - job_name: 'llmtrace'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics

Grafana Dashboard Setup

Common Panels

Request Rate:

rate(llmtrace_requests_total[5m])

P95 Latency:

histogram_quantile(0.95, rate(llmtrace_request_duration_seconds_bucket[5m]))
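
To make the P95 query less opaque, here is a rough Python sketch of the linear interpolation that `histogram_quantile` performs over cumulative buckets. It is a simplified illustration, not Prometheus's actual code: it assumes buckets are sorted by upper bound, end with a `+Inf` bucket, include at least one finite bucket, and that `q` lies in (0, 1]. The bucket boundaries and counts are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest
                # finite upper bound instead.
                return buckets[-2][0]
            in_bucket = count - prev_count
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in seconds.
buckets = [(0.5, 50), (1.0, 80), (2.5, 95), (5.0, 100), (math.inf, 100)]
print(histogram_quantile(0.9, buckets))
```

The estimate is only as precise as the bucket layout: a quantile that lands in a wide bucket is interpolated across the whole bucket width.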

Error Rate:

sum(rate(llmtrace_requests_total{status_code=~"5.."}[5m])) / sum(rate(llmtrace_requests_total[5m]))

Security Findings Rate:

rate(llmtrace_security_findings_total[5m])

Circuit Breaker State:

llmtrace_circuit_breaker_state{state="open"}

Boundary Defence -- Requests with Defence Applied (per second, by provider and mode):

sum(rate(llmtrace_boundary_defense_applied_total[5m])) by (provider, mode)

Boundary Defence -- Error Rate:

rate(llmtrace_boundary_defense_errors_total[5m])

Alerting Thresholds

Suggested Grafana alert rules:

Alert	Condition	Severity
High error rate	Error rate > 5% for 5 minutes	Critical
High latency	P95 > 5s for 5 minutes	Warning
Circuit breaker open	`llmtrace_circuit_breaker_state{state="open"} == 1` for > 30s	Critical
Security spike	Finding rate > 10/min for 5 minutes	Warning
Boundary defence errors	`rate(llmtrace_boundary_defense_errors_total[5m]) > 0` for 5 minutes	Warning
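
The "for 5 minutes" qualifier in the table means the condition must hold continuously over that period before the alert fires, so a single spike does not page anyone. A minimal sketch of that logic, assuming evenly scraped `(unix_timestamp, value)` samples in ascending order; the `sustained` helper is hypothetical and only approximates how an alerting engine evaluates a pending period.

```python
def sustained(samples, threshold, for_seconds):
    """Return True if value > threshold held for the trailing for_seconds."""
    if not samples:
        return False
    breach_since = None
    for ts, value in samples:
        if value > threshold:
            if breach_since is None:
                breach_since = ts  # start of the current breach streak
        else:
            breach_since = None  # any dip resets the streak
    end = samples[-1][0]
    return breach_since is not None and end - breach_since >= for_seconds


# Error rate sampled once a minute for five minutes (made-up values).
steady = [(t, 0.06) for t in range(0, 301, 60)]
dipped = [(t, 0.01 if t == 120 else 0.06) for t in range(0, 301, 60)]
print(sustained(steady, 0.05, 300), sustained(dipped, 0.05, 300))
```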

Logging

Log Levels

Set via environment variable or config:

RUST_LOG=info ./target/release/llmtrace-proxy --config config.yaml

logging:
  level: "info"    # error | warn | info | debug | trace
  format: "text"   # text | json

error: only critical failures

warn: degraded operations (circuit breaker trips, timeouts)

info: request summaries, security findings, startup/shutdown

debug: detailed request/response logging, threshold filtering decisions

trace: full request bodies, ML inference details (high volume)

Structured JSON Logs

For production log aggregation (ELK, Datadog, etc.), use JSON format:

logging:
  level: "info"
  format: "json"

Output:

{"timestamp":"2026-01-15T10:30:00Z","level":"INFO","target":"llmtrace_proxy","message":"request completed","trace_id":"abc123","status":200,"latency_ms":142}
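
Structured output like this is straightforward to post-process. As a small sketch, the function below filters JSON log lines for slow requests using the `trace_id` and `latency_ms` fields shown in the sample; the `slow_requests` helper and the threshold are illustrative, and other log events may not carry these fields.

```python
import json


def slow_requests(lines, threshold_ms):
    """Yield (trace_id, latency_ms) for records slower than threshold_ms."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any non-JSON lines mixed into the stream
        if not isinstance(record, dict):
            continue
        latency = record.get("latency_ms")
        if isinstance(latency, (int, float)) and latency > threshold_ms:
            yield record.get("trace_id"), latency


sample = [
    '{"timestamp":"2026-01-15T10:30:00Z","level":"INFO",'
    '"target":"llmtrace_proxy","message":"request completed",'
    '"trace_id":"abc123","status":200,"latency_ms":142}'
]
print(list(slow_requests(sample, 100)))
```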

Alerting Integration

LLMTrace integrates with external alerting via the custom policies system. Configure alert channels for Slack, PagerDuty, or generic webhooks:

alerts:
  enabled: true
  channels:
    - type: slack
      url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
      min_severity: "Medium"
      min_security_score: 50
    - type: pagerduty
      routing_key: "your-pagerduty-key"
      min_severity: "Critical"
      min_security_score: 90
    - type: webhook
      url: "https://your-endpoint.com/alerts"
      min_severity: "High"
      min_security_score: 70
  cooldown_seconds: 300

See Custom Security Policies for full alerting configuration.
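
One plausible reading of the channel thresholds is sketched below: a finding reaches a channel only when it meets both `min_severity` and `min_security_score`. This is an assumption for illustration, not confirmed LLMTrace behaviour. The severity ordering (including a `Low` level), the finding field names, and the `channels_for` helper are all hypothetical, and `cooldown_seconds` is not modelled.

```python
# Assumed ordering; only Medium, High, and Critical appear in the config above.
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]


def channels_for(finding, channels):
    """Return the channel types whose thresholds the finding satisfies."""
    rank = SEVERITY_ORDER.index(finding["severity"])
    selected = []
    for channel in channels:
        min_rank = SEVERITY_ORDER.index(channel["min_severity"])
        if rank >= min_rank and finding["security_score"] >= channel["min_security_score"]:
            selected.append(channel["type"])
    return selected


# Mirrors the thresholds from the YAML example above.
channels = [
    {"type": "slack", "min_severity": "Medium", "min_security_score": 50},
    {"type": "pagerduty", "min_severity": "Critical", "min_security_score": 90},
    {"type": "webhook", "min_severity": "High", "min_security_score": 70},
]
print(channels_for({"severity": "High", "security_score": 75}, channels))
```

Under this reading, stepping the thresholds up from Slack to webhook to PagerDuty means routine findings stay in chat while only the most severe ones page an on-call engineer.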

Circuit Breaker

The proxy includes a circuit breaker that degrades to pure pass-through on repeated failures:

circuit_breaker:
  enabled: true
  failure_threshold: 10       # failures before opening
  recovery_timeout_ms: 30000  # time in open state before half-open
  half_open_max_calls: 3      # test calls in half-open state

States:

Closed: normal operation, all features active

Open: storage/security failures exceeded threshold; proxy passes traffic through without analysis

Half-Open: testing recovery with limited calls

Monitor via the /health endpoint or the llmtrace_circuit_breaker_state Prometheus metric.
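
The three states and the configuration keys above can be modelled as a small state machine. The sketch below is an illustration, not the proxy's implementation. It assumes that failures are counted consecutively and reset on success, that any failure during half-open reopens the breaker, and that the breaker closes once all trial calls succeed.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=10, recovery_timeout_ms=30000,
                 half_open_max_calls=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_ms = recovery_timeout_ms
        self.half_open_max_calls = half_open_max_calls
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.half_open_calls = 0
        self.half_open_successes = 0

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0

    def allow(self) -> bool:
        """True: run analysis for this request. False: pass it through."""
        if self.state == "open":
            elapsed_ms = (self.clock() - self.opened_at) * 1000
            if elapsed_ms < self.recovery_timeout_ms:
                return False
            # Recovery timeout elapsed: start admitting trial calls.
            self.state = "half_open"
            self.half_open_calls = 0
            self.half_open_successes = 0
        if self.state == "half_open":
            if self.half_open_calls >= self.half_open_max_calls:
                return False
            self.half_open_calls += 1
        return True

    def record_success(self):
        if self.state == "half_open":
            self.half_open_successes += 1
            if self.half_open_successes >= self.half_open_max_calls:
                self.state = "closed"
        self.failures = 0

    def record_failure(self):
        if self.state == "half_open":
            self._open()  # a failed trial call reopens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()
```

Injecting `clock` keeps the sketch testable without sleeping; a caller would check `allow()` before running storage or security analysis and report the outcome with `record_success()` or `record_failure()`.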
