Version: 2.0
Date: 2026-02-04
Status: Living
Related Documents: docs/architecture/SYSTEM_ARCHITECTURE.md
This document explains, at a technical level, how the LLMTrace transparent proxy works today in this repository. It focuses on the runtime behaviour, request/response flow, streaming handling, security analysis, and failure modes as implemented in llmtrace-proxy.
LLMTrace is an HTTP proxy that forwards OpenAI-compatible requests to an upstream LLM provider and returns responses verbatim while capturing data for observability and security analysis. The client changes only the base URL; request/response format remains unchanged.
Key properties implemented today:
Protocol compatibility: OpenAI-style /v1/chat/completions and /v1/completions payloads are parsed for metadata and forwarded unchanged. All other paths are proxied as-is without LLM-specific parsing.
SSE passthrough: Streaming responses are relayed to the client while being parsed incrementally.
Async analysis: Trace capture and security analysis are performed in background tasks and do not block the response path.
Fail-open: If analysis fails or circuit breakers open, the proxy still forwards upstream responses.
Receive request: The proxy accepts the HTTP request and reads the full body.
- It extracts metadata: tenant ID, agent ID, provider, and model info.
Resolve tenant: If X-LLMTrace-Tenant-ID is present and is a valid UUID, it is used.
- When auth is disabled, a deterministic UUID v5 is derived from the API key if present.
- When auth is enabled, tenant identity is derived from the API key record (or from the admin key + header).
Forward upstream: The request is forwarded to the configured upstream_url.
- Response headers and status code are mirrored to the client.
Capture + analyse in background: The proxy spawns a background task that:
- Aggregates the response body.
- Runs security analysis (`SecurityAnalyzer::analyze_interaction`).
- Runs output safety analysis when enabled.
- Stores the trace + findings.
- Emits alerts if thresholds are exceeded.
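The non-streaming flow above returns the upstream body immediately and leaves analysis and storage to a background task. The sketch below shows that split under stated assumptions. A `std::thread` stands in for the async task the proxy spawns. An `mpsc::Sender` stands in for the storage backend. `analyze` is a hypothetical one-phrase check, not `SecurityAnalyzer::analyze_interaction`. `handle_non_streaming`, `Interaction` and `Trace` are illustrative names.

```rust
use std::sync::mpsc::Sender;
use std::thread::{self, JoinHandle};

/// Data captured from one request/response pair (illustrative fields).
#[derive(Debug)]
struct Interaction {
    prompt: String,
    response_body: String,
}

/// A trace as handed to storage: the interaction plus any findings.
#[derive(Debug)]
struct Trace {
    interaction: Interaction,
    findings: Vec<String>,
}

/// Hypothetical stand-in for the real analyzer: a single case-insensitive
/// phrase check instead of regex/ML analysis.
fn analyze(i: &Interaction) -> Result<Vec<String>, String> {
    if i.prompt.to_lowercase().contains("ignore previous instructions") {
        Ok(vec!["prompt_injection".to_string()])
    } else {
        Ok(Vec::new())
    }
}

/// Returns the upstream body to the caller right away; analysis and
/// storage run on a background thread the response never waits for.
fn handle_non_streaming(
    prompt: String,
    upstream_body: String,
    storage: Sender<Trace>,
) -> (String, JoinHandle<()>) {
    let captured = Interaction {
        prompt,
        response_body: upstream_body.clone(),
    };
    let worker = thread::spawn(move || {
        // Fail-open: an analysis error is logged and the trace is stored without findings.
        let findings = analyze(&captured).unwrap_or_else(|e| {
            eprintln!("security analysis failed: {e}");
            Vec::new()
        });
        // A send error (storage unavailable) is ignored; the client already has its response.
        let _ = storage.send(Trace { interaction: captured, findings });
    });
    (upstream_body, worker)
}
```

The `JoinHandle` is returned only so a caller or test can wait for the background work. In the request path nothing joins it, which is what keeps analysis off the latency path.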
Receive request: The proxy detects stream: true in the body.
Stream passthrough: The upstream response stream is read chunk-by-chunk.
- Each chunk is forwarded to the client immediately.
Incremental parsing: SSE chunks are parsed to accumulate content and token counts.
- Time-to-first-token (TTFT) is recorded on the first token.
Incremental security checks: StreamingSecurityMonitor runs regex-based checks on new content every N tokens.
- StreamingOutputMonitor performs output-safety checks (PII/leakage, optional toxicity) on the delta.
Early stop (optional): If streaming_analysis.early_stop_on_critical is enabled and a critical output issue is detected, the proxy injects a warning event and terminates the stream.
Post-stream analysis: After the stream finishes, the full captured prompt/response is analysed in the background (same as non-streaming).
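Incremental parsing has to cope with SSE events that arrive split across network chunks. Below is a minimal sketch of that framing step, assuming each chunk is forwarded to the client unchanged and a copy is fed to the framer. It is not the parser in proxy.rs. `SseFramer` is a hypothetical name. The sketch makes three simplifications. Chunks are treated as valid UTF-8, whereas real byte chunks can split a multi-byte character. Only `\n` line endings are handled, whereas SSE also allows `\r\n` and `\r`. Extracting delta content and token counts from the JSON payloads would need a JSON parser, so that step is omitted.

```rust
/// Incremental SSE framer: feed chunks as they arrive from upstream and get
/// back the `data:` payload of every event completed so far. A partial event
/// stays buffered until a later chunk finishes it.
#[derive(Default)]
struct SseFramer {
    buf: String,
    /// Set once the terminal `data: [DONE]` event has been seen.
    done: bool,
}

impl SseFramer {
    fn feed(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut payloads = Vec::new();
        // Events end with a blank line ("\n\n" in this simplified sketch).
        while let Some(end) = self.buf.find("\n\n") {
            let event: String = self.buf.drain(..end + 2).collect();
            // Collect every `data:` line, dropping one optional leading space.
            // Other fields and `:` comment lines are ignored.
            let data: Vec<&str> = event
                .lines()
                .filter_map(|line| line.strip_prefix("data:"))
                .map(|d| d.strip_prefix(' ').unwrap_or(d))
                .collect();
            if data.is_empty() {
                continue;
            }
            // Multiple data lines in one event are joined with '\n'.
            let payload = data.join("\n");
            if payload == "[DONE]" {
                self.done = true;
            } else {
                payloads.push(payload);
            }
        }
        payloads
    }
}
```

For example, feeding `data: {"a":` and then `1}\n\n` yields nothing for the first chunk and the full `{"a":1}` payload for the second.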
Implemented in crates/llmtrace-proxy/src/proxy.rs:
- Parses requests
- Determines tenant and agent IDs
- Detects provider and model
- Forwards to upstream with original headers
- Spawns background tasks for analysis and storage
Implemented by llmtrace-security:
- Default: regex analyser for prompt injection, PII, leakage, jailbreak patterns
- Optional ML ensemble when the `ml` feature and `ml_preload` are enabled
- Output safety analysis uses `OutputAnalyzer` with response-only checks
Implemented by llmtrace-storage:
- memory: in-memory only (tests/dev)
- lite: SQLite (traces + metadata), in-memory cache
- production: ClickHouse + PostgreSQL + Redis (feature gated)
Implemented in crates/llmtrace-proxy/src/auth.rs:
- auth.enabled = true: API key required; roles enforced
- auth.enabled = false: permissive admin context; tenant resolved from header or API key
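With auth disabled, the tenant comes from X-LLMTrace-Tenant-ID when that header holds a valid UUID. Otherwise it is a deterministic UUID v5 derived from the API key. The sketch below encodes that precedence. The derivation is passed in as a closure because the namespace the proxy uses is not stated here. `is_hyphenated_uuid`, `Tenant` and `resolve_tenant_auth_disabled` are hypothetical names. So is the `Unresolved` outcome: the fallback when neither header nor key is present is not described in this document.

```rust
/// True if `s` is a hyphenated UUID string (8-4-4-4-12 hex digits).
/// Hypothetical helper; a real implementation would likely use a UUID crate.
fn is_hyphenated_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let expected_lens = [8, 4, 4, 4, 12];
    groups.len() == expected_lens.len()
        && groups
            .iter()
            .zip(expected_lens.iter())
            .all(|(g, &n)| g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit()))
}

#[derive(Debug, PartialEq)]
enum Tenant {
    /// Taken verbatim from a valid X-LLMTrace-Tenant-ID header.
    FromHeader(String),
    /// Deterministically derived from the API key.
    FromApiKey(String),
    /// Neither source available (the proxy's real fallback is not covered here).
    Unresolved,
}

/// Tenant resolution with auth disabled. `derive_v5` stands in for the
/// UUID v5 derivation over the API key.
fn resolve_tenant_auth_disabled<F: Fn(&str) -> String>(
    tenant_header: Option<&str>,
    api_key: Option<&str>,
    derive_v5: F,
) -> Tenant {
    // A present header that parses as a UUID takes precedence.
    if let Some(h) = tenant_header.filter(|h| is_hyphenated_uuid(h)) {
        return Tenant::FromHeader(h.to_string());
    }
    // Otherwise fall back to the API key, if any.
    match api_key {
        Some(key) => Tenant::FromApiKey(derive_v5(key)),
        None => Tenant::Unresolved,
    }
}
```

In real code, `derive_v5` could wrap something like the `uuid` crate's `Uuid::new_v5(&namespace, key.as_bytes())`. Because v5 is name-based, the same API key always maps to the same tenant ID across restarts.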
Implemented in crates/llmtrace-proxy/src/alerts.rs:
- Supported: webhook, Slack, PagerDuty
- Email channel is recognised in config but not implemented
- Cooldown-based deduplication per finding type
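Cooldown-based deduplication per finding type can be sketched as a map from finding type to the time of the last alert sent for it. This is an illustration, not the code in alerts.rs. `AlertDeduper` and `should_send` are hypothetical names, and the cooldown length is a free parameter here. The current time is passed in explicitly so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Cooldown-based deduplication keyed by finding type: at most one alert per
/// type per `cooldown` window.
struct AlertDeduper {
    cooldown: Duration,
    last_sent: HashMap<String, Instant>,
}

impl AlertDeduper {
    fn new(cooldown: Duration) -> Self {
        Self { cooldown, last_sent: HashMap::new() }
    }

    /// Returns true if an alert for `finding_type` may be sent at `now`,
    /// recording `now` as the last send time when it returns true.
    fn should_send(&mut self, finding_type: &str, now: Instant) -> bool {
        let in_cooldown = self
            .last_sent
            .get(finding_type)
            .map_or(false, |&prev| now.saturating_duration_since(prev) < self.cooldown);
        if in_cooldown {
            return false;
        }
        self.last_sent.insert(finding_type.to_string(), now);
        true
    }
}
```

Keying by finding type means a burst of PII findings produces one alert, while a prompt-injection finding in the same window still alerts on its own.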
Upstream errors: return 502 Bad Gateway with an error response body.
Security analysis failures: logged; request still succeeds.
Storage failures: logged; proxy response still succeeds.
Circuit breakers: if open, skip storage/analysis to protect latency.
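The fail-open rules and the circuit breaker can be sketched together for the storage step. This assumes a simple breaker that opens after a number of consecutive failures. The proxy's actual thresholds and reset policy are not described in this document. This sketch also omits the half-open probe that would let a real breaker close again after a timeout. `CircuitBreaker`, `StoreOutcome` and `store_trace` are hypothetical names.

```rust
/// Minimal consecutive-failure circuit breaker (no half-open/reset state).
struct CircuitBreaker {
    failure_threshold: u32,
    consecutive_failures: u32,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32) -> Self {
        Self { failure_threshold, consecutive_failures: 0 }
    }

    fn is_open(&self) -> bool {
        self.consecutive_failures >= self.failure_threshold
    }

    fn record(&mut self, success: bool) {
        if success {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures += 1;
        }
    }
}

#[derive(Debug, PartialEq)]
enum StoreOutcome {
    Stored,
    Failed,
    Skipped,
}

/// Background storage step. Whatever it returns, the client response has
/// already been sent, so every branch is fail-open.
fn store_trace<W>(breaker: &mut CircuitBreaker, write: W) -> StoreOutcome
where
    W: FnOnce() -> Result<(), String>,
{
    if breaker.is_open() {
        // Skip the backend entirely rather than add load or latency.
        return StoreOutcome::Skipped;
    }
    match write() {
        Ok(()) => {
            breaker.record(true);
            StoreOutcome::Stored
        }
        Err(e) => {
            // Logged only; the request has already succeeded.
            eprintln!("trace storage failed: {e}");
            breaker.record(false);
            StoreOutcome::Failed
        }
    }
}
```

A success resets the failure count, so only an unbroken run of failures opens the breaker.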
Key proxy controls (see ProxyConfig in llmtrace-core):
- upstream_url: destination LLM API
- enable_security_analysis: enable/disable background analysis
- enable_trace_storage: enable/disable trace persistence
- enable_streaming: enable streaming passthrough
- streaming_analysis.*: token interval, early stop, output analysis
- security_analysis.*: ML model config, thresholds, preload behaviour
- auth.*: API key enforcement
- storage.*: profile and backing services
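The `streaming_analysis.*` settings govern the per-stream check. New content is inspected every N tokens, and a critical hit may end the stream when `early_stop_on_critical` is set. Below is a minimal sketch of that control flow. `early_stop_on_critical` is named in this document, but `token_interval`, `Verdict` and `StreamMonitor` are hypothetical. The check is a closure standing in for the regex and output-safety checks. Tokens are approximated by whitespace-separated words rather than the model's tokenizer.

```rust
/// Subset of `streaming_analysis.*` (`token_interval` is a hypothetical name).
struct StreamingAnalysisConfig {
    token_interval: usize,
    early_stop_on_critical: bool,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Keep streaming; nothing found (or no check ran yet).
    Continue,
    /// Critical content found but early stop is disabled.
    Flagged,
    /// Critical content found and early stop is enabled: end the stream.
    Stop,
}

struct StreamMonitor<F> {
    cfg: StreamingAnalysisConfig,
    /// Stand-in for the regex/output-safety checks.
    is_critical: F,
    /// Content received since the last check.
    pending: String,
    tokens_since_check: usize,
}

impl<F: Fn(&str) -> bool> StreamMonitor<F> {
    fn new(cfg: StreamingAnalysisConfig, is_critical: F) -> Self {
        Self { cfg, is_critical, pending: String::new(), tokens_since_check: 0 }
    }

    /// Feed one content delta; a check runs once enough tokens have accumulated.
    fn on_delta(&mut self, delta: &str) -> Verdict {
        self.pending.push_str(delta);
        self.tokens_since_check += delta.split_whitespace().count();
        if self.tokens_since_check < self.cfg.token_interval {
            return Verdict::Continue;
        }
        // Check only the new content, then reset the window.
        let hit = (self.is_critical)(self.pending.as_str());
        self.pending.clear();
        self.tokens_since_check = 0;
        match (hit, self.cfg.early_stop_on_critical) {
            (true, true) => Verdict::Stop,
            (true, false) => Verdict::Flagged,
            (false, _) => Verdict::Continue,
        }
    }
}
```

On `Verdict::Stop` the proxy would inject its warning event and close the stream, which is not modelled here. Because each check sees only the window since the previous one, a pattern split across two windows can be missed. The post-stream analysis of the full response covers that gap.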
- No inline policy enforcement or request blocking beyond optional output early-stop during streaming.
- No UI/dashboard served by the proxy runtime.
- No external IdP integration; auth is API-key based.
- ML models load only when the `ml` feature is enabled and `ml_preload` is true; otherwise regex-only analysis is used.
- Proxy routing and main server: crates/llmtrace-proxy/src/main.rs
- Proxy handler and SSE parsing: crates/llmtrace-proxy/src/proxy.rs
- Streaming monitors: crates/llmtrace-proxy/src/streaming.rs
- Auth and RBAC: crates/llmtrace-proxy/src/auth.rs
- Storage profiles: crates/llmtrace-storage/src/lib.rs
- Security analysers: crates/llmtrace-security/src/lib.rs