Skip to content

Latest commit

 

History

History
730 lines (530 loc) · 27 KB

File metadata and controls

730 lines (530 loc) · 27 KB

Custom Policy Setup Guide

This guide covers how to configure LLMTrace's security policies, rate limiting, cost controls, alerting, and other operational features. Every option shown here corresponds to a real configuration field — copy the snippets directly into your config.yaml.

Quick links: Minimal config · Production config · High-security config · Cost-control config

Table of Contents

Configuration Basics

LLMTrace loads configuration from a YAML file. Settings can be overridden via environment variables or CLI flags.

Precedence (highest wins): CLI flags → environment variables → config file → defaults.

# Start with the example config
cp config.example.yaml config.yaml

# Validate before running
./target/release/llmtrace-proxy validate --config config.yaml

# Run the proxy
./target/release/llmtrace-proxy --config config.yaml

Storage Profiles

Choose one of three storage profiles depending on your environment:

Profile Backend Use Case
memory In-memory (lost on restart) Testing, CI/CD
lite SQLite Development, single-node production
production ClickHouse + PostgreSQL + Redis Multi-node, high-throughput production
storage:
  profile: "lite"                # "memory", "lite", or "production"
  database_path: "llmtrace.db"  # Used by "lite" profile

  # Production profile requires all three:
  # clickhouse_url: "http://localhost:8123"
  # clickhouse_database: "llmtrace"
  # postgres_url: "postgres://llmtrace:llmtrace@localhost:5432/llmtrace"
  # redis_url: "redis://127.0.0.1:6379"

Security Analysis Policies

LLMTrace provides regex-based security analysis by default. When the binary is built with the ml feature and ML is enabled in config, the proxy uses an ensemble analyser that combines regex findings with ML classifiers.

Regex-Based Detection (Default)

Regex analysis is enabled by a single toggle and requires no additional infrastructure:

# Master toggle — enables the regex-based security engine
enable_security_analysis: true

# How long to wait for security analysis before falling back
# Set lower for latency-sensitive workloads, higher for thoroughness
security_analysis_timeout_ms: 5000

The regex engine detects a mix of prompt injection, role injection, jailbreak, and leakage patterns. Examples include:

Category Examples Severity
Prompt injection "ignore previous instructions", "forget everything" High
Role injection system:, assistant: appearing in user messages High/Medium
Jailbreaks DAN-style patterns with "no restrictions" Critical
Encoding attacks Base64-encoded malicious instructions High
Delimiter injection ---system:, ===instructions: High
PII patterns Email, phone, SSN, credit card, IBAN, UK NIN Medium
Data leakage System prompt leaks, credential exposure High/Critical
Agent actions Dangerous commands, suspicious URLs, sensitive files Critical/High

No configuration is needed beyond enable_security_analysis: true — all patterns are compiled at startup.

ML-Based Detection (Optional)

For higher accuracy and fewer false positives, enable ML-based detection. When ml_preload: true, the proxy loads HuggingFace models at startup and runs local inference alongside regex analysis. If ml_preload is set to false, the proxy currently stays in regex-only mode (ML models are not loaded lazily).

security_analysis:
  # Enable ML prompt injection detection
  # Requires: binary built with `cargo build --features ml`
  ml_enabled: true

  # HuggingFace model for prompt injection classification
  # This DeBERTa v3 model is specifically trained for prompt injection
  ml_model: "protectai/deberta-v3-base-prompt-injection-v2"

  # Confidence threshold (0.0–1.0)
  # Lower = more sensitive (more detections, more false positives)
  # Higher = more specific (fewer detections, fewer false positives)
  # 0.8 is a good starting point; tune based on your false-positive tolerance
  ml_threshold: 0.8

  # Where to cache downloaded models (avoids re-downloading on restart)
  ml_cache_dir: "~/.cache/llmtrace/models"

  # Pre-load models at startup (recommended for production)
  # If false, first request incurs model download latency
  ml_preload: true

  # Timeout for model download at startup (seconds)
  # Large models (DeBERTa) can be ~500MB — give enough time on slow connections
  ml_download_timeout_seconds: 300

  # Enable NER-based PII detection for person names, organizations, locations
  # Catches PII that regex patterns miss (e.g., "John Smith lives in London")
  ner_enabled: true

  # HuggingFace model for Named Entity Recognition
  ner_model: "dslim/bert-base-NER"

  # Optional jailbreak classifier (runs alongside prompt injection)
  jailbreak_enabled: true
  jailbreak_threshold: 0.7

  # Optional feature-level fusion classifier (ADR-013)
  fusion_enabled: false
  fusion_model_path: null

When to use ML detection:

  • You need to catch adversarial prompt injections that evade regex patterns
  • You handle untrusted user input at scale
  • You want NER-based PII detection for names and organisations (regex can only catch structured formats like emails/SSNs)

When regex-only is sufficient:

  • Internal tools where inputs are semi-trusted
  • You want zero external dependencies and minimal latency overhead
  • Your primary concern is PII in structured formats

Ensemble Behaviour

When ML is enabled, LLMTrace uses the ensemble analyser:

  • Regex findings are always included.
  • ML findings are added when the model is loaded successfully.
  • Findings are merged and deduplicated; overlapping items include metadata such as ensemble_agreement.

Rate Limiting

Rate limiting protects your LLM provider from excessive requests and prevents runaway agents from consuming your entire quota.

Default Configuration

rate_limiting:
  enabled: true

  # Sustained requests per second (across all tenants unless overridden)
  # 100 RPS is a safe starting point for most OpenAI plans
  requests_per_second: 100

  # Burst size — how many requests can fire in a quick burst
  # Set to 2x your RPS for bursty agent workloads
  # Set equal to RPS for smooth, predictable traffic
  burst_size: 200

  # Sliding window for rate calculation (seconds)
  window_seconds: 60

Per-Tenant Overrides

Different tenants (teams, applications, environments) often need different limits:

rate_limiting:
  enabled: true
  requests_per_second: 100   # Default for all tenants
  burst_size: 200
  window_seconds: 60

  # Override specific tenants by UUID
  tenant_overrides:
    # Production AI assistant — needs higher throughput
    "550e8400-e29b-41d4-a716-446655440001":
      requests_per_second: 500
      burst_size: 1000

    # Development/testing tenant — keep it low
    "550e8400-e29b-41d4-a716-446655440002":
      requests_per_second: 20
      burst_size: 40

    # Batch processing tenant — moderate steady, low burst
    "550e8400-e29b-41d4-a716-446655440003":
      requests_per_second: 200
      burst_size: 250

How it works:

  • Requests exceeding the limit receive HTTP 429 Too Many Requests
  • The proxy includes Retry-After headers so well-behaved clients back off
  • Tenant is identified via the X-LLMTrace-Tenant-ID header or API key derivation

Choosing Values

Scenario RPS Burst Why
Single chatbot 10–50 2× RPS Human typing speed limits natural request rate
Agent swarm (10 agents) 100–500 2× RPS Agents fire requests concurrently
Batch embedding pipeline 200–1000 1× RPS Steady throughput, no burst needed
Rate-limited LLM plan Match plan limit +20% headroom Prevent provider-side 429s

Cost Caps & Budgets

Cost caps prevent billing surprises by enforcing per-agent and per-tenant spending limits. When a budget is exceeded, requests are rejected before they reach the LLM provider.

Basic Budget Setup

cost_caps:
  enabled: true

  # Default budgets applied to all tenants/agents
  default_budget_caps:
    # Hourly cap: catch runaway loops quickly
    - window: hourly
      hard_limit_usd: 10.0    # Reject requests above this
      soft_limit_usd: 8.0     # Alert (but allow) above this

    # Daily cap: overall spending control
    - window: daily
      hard_limit_usd: 100.0
      soft_limit_usd: 80.0

Soft vs Hard Limits

Limit Type Behaviour Use Case
Soft limit Triggers an alert but allows the request Early warning — "you're spending fast"
Hard limit Rejects the request with HTTP 429 Hard stop — "no more spending this period"

Set soft limits at ~80% of hard limits to give teams time to react before hitting the wall.

Per-Request Token Caps

Prevent individual requests from consuming excessive tokens (useful for catching infinite loops or excessively long prompts):

cost_caps:
  enabled: true

  default_token_cap:
    max_prompt_tokens: 8192       # Reject prompts longer than this
    max_completion_tokens: 4096   # Limit response length
    max_total_tokens: 16384       # Combined cap

Per-Agent Overrides

Different agents have different cost profiles. Override budgets per agent using the X-LLMTrace-Agent-ID header:

cost_caps:
  enabled: true

  default_budget_caps:
    - window: daily
      hard_limit_usd: 50.0

  agents:
    # Heavy research agent — needs a bigger budget
    - agent_id: "research-agent"
      budget_caps:
        - window: daily
          hard_limit_usd: 500.0
          soft_limit_usd: 400.0
        - window: hourly
          hard_limit_usd: 100.0
      token_cap:
        max_prompt_tokens: 16384
        max_completion_tokens: 8192

    # Simple Q&A bot — keep it tight
    - agent_id: "faq-bot"
      budget_caps:
        - window: daily
          hard_limit_usd: 10.0
          soft_limit_usd: 8.0
      token_cap:
        max_prompt_tokens: 2048
        max_completion_tokens: 1024
        max_total_tokens: 4096

    # Batch processing — weekly budget, no hourly limit
    - agent_id: "batch-embedder"
      budget_caps:
        - window: weekly
          hard_limit_usd: 1000.0
          soft_limit_usd: 800.0

Budget Windows

Window Duration Good For
hourly 1 hour rolling Catching runaway loops quickly
daily 24 hours rolling Day-to-day budget control
weekly 7 days rolling Batch workloads with variable daily usage
monthly 30 days rolling Long-term budget planning

Custom Model Pricing

If you use fine-tuned or self-hosted models, add custom pricing so cost estimation is accurate:

cost_estimation:
  enabled: true

  # Load pricing from an external file (hot-reloaded on SIGHUP)
  # pricing_file: "config/pricing.yaml"

  # Inline pricing overrides (per 1M tokens, in USD)
  custom_models:
    my-fine-tuned-gpt4:
      input_per_million: 5.0
      output_per_million: 10.0
    local-llama-70b:
      input_per_million: 0.0    # Self-hosted = no API cost
      output_per_million: 0.0

Alert Channels

Alerts notify your team when security findings exceed configured thresholds. LLMTrace supports multiple simultaneous channels with independent severity filters.

Single Channel (Legacy Mode)

The simplest setup — one webhook URL:

alerts:
  enabled: true
  webhook_url: "https://hooks.slack.com/services/T00/B00/xxx"
  min_severity: "High"        # Only alert on High and Critical
  min_security_score: 70       # Minimum confidence score (0–100)
  cooldown_seconds: 300        # 5 minutes between duplicate alerts

Multi-Channel Setup

For production, send different severity levels to different channels:

alerts:
  enabled: true
  cooldown_seconds: 300    # Global deduplication window

  channels:
    # Slack: Medium and above — for the security team's awareness
    - type: slack
      url: "https://hooks.slack.com/services/T00/B00/xxx"
      min_severity: "Medium"
      min_security_score: 50

    # PagerDuty: Critical only — wake someone up
    - type: pagerduty
      routing_key: "your-pagerduty-events-v2-routing-key"
      min_severity: "Critical"
      min_security_score: 90

    # Custom webhook: High and above — for your SIEM or log aggregator
    - type: webhook
      url: "https://your-siem.internal/api/llm-alerts"
      min_severity: "High"
      min_security_score: 70

Channel Types

Type Required Fields Notes
slack url (Incoming Webhook URL) Posts formatted Slack messages with finding details
pagerduty routing_key (Events API v2) Creates PagerDuty events
webhook url (any HTTP endpoint) POSTs a JSON payload with full finding data
email (none) Accepted in config but currently skipped (not implemented)

Alert Severity Mapping

LLMTrace compares each finding's severity (InfoCritical) and confidence score (0–100) against the configured minimums. There is no fixed mapping between score and severity; both are emitted by the analysers.

Alert Deduplication

The cooldown_seconds setting prevents alert storms:

alerts:
  cooldown_seconds: 300  # 5 minutes

This means: if the same finding type fires multiple times within 5 minutes, only the first alert is sent. This prevents a flood of identical alerts when an attacker probes your system repeatedly.

Escalation (Not Implemented Yet)

The config schema includes an alerts.escalation block, but the proxy does not currently implement escalation behaviour. Any escalation settings are ignored at runtime.

Circuit Breaker

The circuit breaker degrades LLMTrace to a pure pass-through proxy when internal subsystems (storage, security analysis) fail repeatedly. This ensures your application keeps working even if LLMTrace's analysis layer has issues.

circuit_breaker:
  enabled: true

  # Open the circuit after this many consecutive failures
  # Lower = faster degradation (more protective of upstream)
  # Higher = more tolerant of transient errors
  failure_threshold: 10

  # How long to wait before trying again (half-open state)
  # 30s is a good default; increase for slow-recovering backends
  recovery_timeout_ms: 30000

  # Number of probe requests in half-open state
  # If these succeed, the circuit closes and normal operation resumes
  half_open_max_calls: 3

Circuit States

CLOSED (normal) → failures exceed threshold → OPEN (pass-through)
                                                    │
                                         recovery_timeout_ms
                                                    │
                                               HALF-OPEN
                                              (probe calls)
                                                 ╱     ╲
                                          success      failure
                                             │            │
                                          CLOSED        OPEN

Tuning Guidelines

Environment failure_threshold recovery_timeout_ms Why
Development 3 5000 Fail fast, recover fast
Production (standard) 10 30000 Tolerate transient blips
Production (high-availability) 5 60000 Open quickly, recover cautiously

Anomaly Detection

Anomaly detection uses statistical analysis (moving averages and standard deviations) to identify unusual behaviour per tenant — cost spikes, token spikes, velocity surges, and latency outliers.

anomaly_detection:
  enabled: true

  # Sliding window size (number of recent observations)
  # Larger = more stable baseline, slower to react to real changes
  # Smaller = faster to react, but more false positives
  window_size: 100

  # Sigma threshold for anomaly flagging
  # 2.0 = ~5% false positive rate (aggressive)
  # 3.0 = ~0.3% false positive rate (balanced, recommended)
  # 4.0 = ~0.006% false positive rate (conservative)
  sigma_threshold: 3.0

  # Which dimensions to check
  check_cost: true       # Flag requests with abnormally high estimated cost
  check_tokens: true     # Flag requests with abnormally high token count
  check_velocity: true   # Flag tenants with abnormally high request rate
  check_latency: true    # Flag requests with abnormally high latency

Anomaly Types

Type What It Detects Typical Cause
cost_spike Single request costs much more than the tenant's average Runaway prompt, model upgrade without budget adjustment
token_spike Unusually high token count for a single request Prompt injection inflating context, copy-paste of large documents
velocity_spike Tenant is sending requests much faster than usual Automated loop, DDoS-like behaviour, misconfigured retry logic
latency_spike Response taking much longer than typical Provider degradation, overly complex prompt, model overload

Choosing Window Size and Sigma

                    Sigma Threshold
                 2.0      3.0      4.0
              ┌────────┬────────┬────────┐
   Window 50  │ Noisy  │  OK    │ Quiet  │
   Window 100 │  OK    │ Best   │  OK    │
   Window 200 │ Good   │  OK    │ Slow   │
              └────────┴────────┴────────┘

Start with window_size: 100 and sigma_threshold: 3.0. Adjust based on your false-positive tolerance.

Streaming Security Analysis

For streaming responses (SSE), LLMTrace can run incremental security checks during the stream rather than waiting for completion. This provides early warning when a response starts leaking sensitive data mid-stream.

# Must also have streaming enabled
enable_streaming: true

streaming_analysis:
  enabled: true

  # Check every N tokens during the stream
  # Lower = faster detection, marginally more CPU
  # Higher = less overhead, slower detection
  # 50 is a good balance for most workloads
  token_interval: 50

  # Enable output-side checks (PII/secrets/toxicity) on streaming content
  output_enabled: true

  # If a critical finding is detected mid-stream, inject a warning and stop
  early_stop_on_critical: true

How it works:

  • As SSE chunks arrive, the proxy accumulates tokens
  • Every token_interval tokens, it runs lightweight regex pattern matching on the accumulated content (streaming analysis uses regex only)
  • If a critical finding is detected mid-stream, an alert fires immediately (doesn't wait for stream completion)
  • After the stream completes, full analysis runs on the complete response

Note: Output-side streaming checks require output_safety.enabled: true in addition to streaming_analysis.output_enabled.

When to enable:

  • Long-running streaming responses (creative writing, code generation)
  • High-security environments where early detection matters
  • When you want real-time alerts, not post-hoc analysis

PII Detection & Redaction

Beyond detecting PII, LLMTrace can optionally redact it inside the security analyser output. The proxy does not currently replace upstream responses or stored traces with redacted text — redaction is available for downstream processing and future pipeline use.

pii:
  # "alert_only"       — Detect and report PII, but don't modify text (default)
  # "alert_and_redact" — Detect, report, AND replace PII with [PII:TYPE] tags
  # "redact_silent"    — Redact PII silently without generating findings
  action: "alert_only"

PII Types Detected

Type Pattern Example Confidence
email Standard email format user@example.com 0.90
phone_number US formats 555-123-4567, (555) 123-4567 0.85
ssn US Social Security Number 456-78-9012 0.95
credit_card 16-digit card numbers 4111 1111 1111 1111 0.90
uk_nin UK National Insurance Number AB 12 34 56 C 0.90
iban International Bank Account Number DE89 3704 0044 0532 0130 00 0.85
eu_passport_de German passport C01X00T2Z 0.60
eu_passport_fr French passport 12AB34567 0.65
eu_passport_it Italian passport AA1234567 0.60
eu_passport_es Spanish passport ABC123456 0.60
eu_passport_nl Dutch passport NX1A2B3C4 0.60
intl_phone International phone numbers +44 20 7946 0958 0.80
nhs_number UK NHS number 943 476 5919 0.70
canadian_sin Canadian Social Insurance Number 046-454-286 0.80
australian_tfn Australian Tax File Number 123 456 789 0.70

False Positive Suppression

LLMTrace automatically suppresses PII matches that are likely false positives:

  • Matches inside fenced code blocks (```)
  • Matches on indented code lines (4+ spaces)
  • Matches inside URLs
  • Well-known placeholder values (e.g., 123-45-6789, all-zeros)

Redaction Example

With action: "alert_and_redact", the text:

Contact John at john@example.com or call 555-123-4567

Becomes:

Contact John at [PII:EMAIL] or call [PII:PHONE_NUMBER]

Compliance Reporting

LLMTrace generates compliance reports (SOC2, GDPR, HIPAA) from stored audit events and trace data. Reports are persisted in the metadata repository and can be retrieved via the API.

How Reports Work

  1. Audit events are recorded for tenant and API key operations (and report generation)
  2. Reports aggregate audit events and trace data over a time period into a structured compliance document
  3. Storage: Reports are stored as JSON in the metadata repository (SQLite or PostgreSQL)

Report Types

Type Standard What It Covers
soc2 SOC 2 Type II Audit trail of all operations, access control events, security findings
gdpr GDPR Data processing activity records, PII detection events, data retention compliance
hipaa HIPAA Audit logs for healthcare data, access tracking, security incident records

Enabling Audit Trail

Audit events are generated automatically when trace storage is enabled. For complete compliance coverage, ensure:

# Recommended: traces must be stored to include trace data in reports
enable_trace_storage: true

# Optional: security analysis adds findings that enrich reports
enable_security_analysis: true

# Recommended: production storage for durable audit trail
storage:
  profile: "production"
  postgres_url: "postgres://llmtrace:llmtrace@localhost:5432/llmtrace"

  # Disable auto-migrate in production — run migrations explicitly
  auto_migrate: false

OWASP LLM Top 10 Coverage

LLMTrace includes built-in tests mapped to the OWASP Top 10 for LLM Applications. The current test suite covers the following categories:

Coverage Summary (Tests Present)

OWASP ID Category Tests
LLM01 Prompt Injection
LLM02 Insecure Output Handling
LLM06 Sensitive Information Disclosure
LLM07 Insecure Plugin Design

Other OWASP categories are not currently covered by tests in this repo.

Running OWASP Tests

# All OWASP tests
cargo test --test owasp_llm_top10

# Specific category
cargo test --test owasp_llm_top10 owasp_llm01   # Prompt Injection
cargo test --test owasp_llm_top10 owasp_llm06   # PII Detection
cargo test --test owasp_llm_top10 owasp_llm07   # Agent Action Analysis

For detailed coverage breakdown, see docs/security/OWASP_LLM_TOP10.md.

Putting It All Together

The example configurations in the examples/ directory show complete, ready-to-use setups:

Config Use Case Key Features
config-minimal.yaml Getting started SQLite, regex security, basic rate limits
config-production.yaml Full production ClickHouse + Postgres + Redis, all features, multi-channel alerts
config-high-security.yaml Maximum security ML detection, streaming analysis, strict limits, PagerDuty escalation
config-cost-control.yaml Cost management Tight budgets, per-agent caps, cost anomaly alerts

Recommended Rollout Order

Start with minimal: — Get the proxy running with config-minimal.yaml

Add alerting: — Connect Slack/webhook so you see what's happening

Enable cost caps: — Prevent billing surprises

Tune rate limits: — Adjust per-tenant based on observed traffic

Enable anomaly detection: — After you have ~100 requests of baseline data

Consider ML detection: — If you handle untrusted user input

Enable streaming analysis: — For long-running streaming responses

Set up compliance: — When you need SOC2/GDPR/HIPAA audit trails

Environment Variable Quick Reference

Variable Overrides Example
LLMTRACE_LISTEN_ADDR listen_addr 0.0.0.0:9090
LLMTRACE_UPSTREAM_URL upstream_url http://localhost:11434
LLMTRACE_STORAGE_PROFILE storage.profile production
LLMTRACE_STORAGE_DATABASE_PATH storage.database_path /var/lib/llmtrace/traces.db
LLMTRACE_CLICKHOUSE_URL storage.clickhouse_url http://clickhouse:8123
LLMTRACE_CLICKHOUSE_DATABASE storage.clickhouse_database llmtrace
LLMTRACE_POSTGRES_URL storage.postgres_url postgres://user:pass@pg:5432/llmtrace
LLMTRACE_REDIS_URL storage.redis_url redis://redis:6379
LLMTRACE_LOG_LEVEL logging.level (via CLI env) debug
LLMTRACE_LOG_FORMAT logging.format (via CLI env) json
RUST_LOG Fine-grained tracing llmtrace_proxy=debug,info