Date: 2025-01-10
Phase: Phase 1 - Design & Contracts
This document defines the data entities and their relationships for the Request Metrics feature.
Purpose: Central coordinator for metrics collection, gauge computation, and state management.
Attributes:
registry: Arc<Registry>- Reference to backend registry for computing gaugesstart_time: Instant- Gateway startup time for uptime calculationlabel_cache: DashMap<String, String>- Thread-safe cache for sanitized Prometheus labels
Responsibilities:
- Initialize Prometheus exporter at gateway startup
- Compute gauge metrics from Registry state (backends_total, backends_healthy, models_available)
- Sanitize backend and model names for Prometheus label compatibility
- Provide snapshots for JSON stats endpoint
Lifecycle: Created once at gateway startup, shared via Arc<MetricsCollector> in AppState.
Thread Safety: Uses Arc for shared ownership, DashMap for concurrent label cache access.
Purpose: Represents a single HTTP request event for metrics tracking.
Attributes:
model: String- Requested model name (from request body)backend_id: String- Backend that served the requeststatus_code: u16- HTTP status code (200, 500, etc.)duration_seconds: f64- Total request duration from handler entry to exittimestamp: Instant- Request start time
Recorded Metrics:
- Counter:
nexus_requests_total{model, backend, status} - Histogram:
nexus_request_duration_seconds{model, backend}
Source: Instrumented in src/api/completions.rs handler.
Validation Rules:
- Model and backend names must be sanitized before use as labels
- Duration must be non-negative
- Status code must be valid HTTP code (100-599)
Purpose: Represents an error event with classification for debugging.
Attributes:
error_type: ErrorType- Enum:Timeout,BackendError,NoBackend,NoHealthyBackend,ParseError,Othermodel: String- Model associated with the errorbackend_id: Option<String>- Backend if error occurred at backend leveltimestamp: Instant- Error occurrence time
Recorded Metrics:
- Counter:
nexus_errors_total{error_type, model}
Source: Instrumented at error paths in src/api/completions.rs.
Classification Rules:
- Map
RoutingError::NoHealthyBackend→ErrorType::NoHealthyBackend - Map HTTP 503 from backend →
ErrorType::BackendError - Map request timeout →
ErrorType::Timeout
Purpose: Tracks routing fallback events when primary model is unavailable.
Attributes:
from_model: String- Original requested modelto_model: String- Fallback model usedreason: String- Why fallback occurred (e.g., "primary_unhealthy")timestamp: Instant- Fallback occurrence time
Recorded Metrics:
- Counter:
nexus_fallbacks_total{from_model, to_model}
Source: Instrumented in routing layer when fallback chain is traversed.
Validation Rules:
- Both model names must be sanitized
- Fallback should only increment if actual model substitution occurred
Purpose: Tracks backend health check latency for performance monitoring.
Attributes:
backend_id: String- Backend identifierlatency_ms: u32- Health check round-trip time in millisecondstimestamp: Instant- Health check time
Recorded Metrics:
- Histogram:
nexus_backend_latency_seconds{backend}
Source: Instrumented in src/health/mod.rs after each health check.
Conversion: Convert milliseconds to seconds for histogram (Prometheus convention).
Purpose: Tracks token usage for cost estimation and monitoring.
Attributes:
model: String- Model usedbackend_id: String- Backend that served the requesttoken_type: TokenType- Enum:Prompt,Completioncount: u32- Number of tokens
Recorded Metrics:
- Histogram:
nexus_tokens_total{model, backend, type}
Source: Extracted from backend response (if available in OpenAI-compatible format).
Note: Optional - not all backends provide token counts.
Purpose: Represents current state of the backend fleet (point-in-time snapshot).
Computed Attributes (derived from Registry):
backends_total: usize- Total registered backendsbackends_healthy: usize- Backends with status = Healthymodels_available: usize- Distinct models across all healthy backendspending_requests: usize- Sum of pending requests across all backends (future)
Recorded Metrics:
- Gauge:
nexus_backends_total - Gauge:
nexus_backends_healthy - Gauge:
nexus_models_available - Gauge:
nexus_pending_requests{backend}
Update Strategy: Computed on-demand when /metrics or /v1/stats is called.
Purpose: JSON representation of metrics for /v1/stats endpoint.
Schema:
{
"uptime_seconds": 3600,
"requests": {
"total": 1000,
"success": 950,
"errors": 50
},
"backends": [
{
"id": "ollama-local",
"requests": 500,
"average_latency_ms": 1250.5,
"pending": 2
}
],
"models": [
{
"name": "llama3:70b",
"requests": 300,
"average_duration_ms": 5000.0
}
]
}Rust Types:
#[derive(Serialize)]
pub struct StatsResponse {
pub uptime_seconds: u64,
pub requests: RequestStats,
pub backends: Vec<BackendStats>,
pub models: Vec<ModelStats>,
}
#[derive(Serialize)]
pub struct RequestStats {
pub total: u64,
pub success: u64,
pub errors: u64,
}
#[derive(Serialize)]
pub struct BackendStats {
pub id: String,
pub requests: u64,
pub average_latency_ms: f64,
pub pending: usize,
}
#[derive(Serialize)]
pub struct ModelStats {
pub name: String,
pub requests: u64,
pub average_duration_ms: f64,
}Computation:
- Uptime:
Instant::now() - start_time - Request stats: Query Prometheus metrics handle for counter values
- Per-backend/model stats: Aggregate from Prometheus histogram data
┌─────────────────────┐
│ MetricsCollector │
│ │
│ - registry │◄────┐
│ - start_time │ │
│ - label_cache │ │
└─────────────────────┘ │
│ │
│ queries │
▼ │
┌──────────────┐ │
│ Registry │────────┘
│ │
│ - backends │
│ - models │
└──────────────┘
▲
│
│ updated by
│
┌──────────────┐
│ HealthChecker│
│ │
│ Records: │
│ - backend │
│ latency │
└──────────────┘
┌─────────────────────┐
│ Request Handler │
│ (completions.rs) │
│ │
│ Records: │
│ - request count │
│ - duration │
│ - errors │
│ - fallbacks │
└─────────────────────┘
│
│ writes to
▼
┌──────────────┐
│ Prometheus │
│ Recorder │
│ │
│ (global) │
└──────────────┘
│
│ scraped by
▼
┌──────────────┐
│ /metrics │
│ handler │
└──────────────┘
When Registry.update_status(backend_id, status) is called:
- No direct metric update
- Next
/metricsor/v1/statsquery computes gauges from current Registry state nexus_backends_healthygauge reflects new count
Request arrives
↓
Start timer
↓
Route to backend (record fallback if occurs)
↓
Proxy request
↓
Response received OR error
↓
Stop timer
↓
Record metrics:
- nexus_requests_total (counter)
- nexus_request_duration_seconds (histogram)
- nexus_errors_total (counter, if error)
- nexus_tokens_total (histogram, if available)
Rule: Prometheus labels must match regex [a-zA-Z_][a-zA-Z0-9_]*
Implementation:
fn sanitize_label(s: &str) -> String {
// Replace invalid chars with underscore
let mut result = s.chars()
.map(|c| if c.is_alphanumeric() || c == '_' { c } else { '_' })
.collect::<String>();
// Ensure first char is not a digit
if result.chars().next().map_or(false, |c| c.is_ascii_digit()) {
result.insert(0, '_');
}
result
}Cached: Use DashMap<String, String> to cache sanitized labels for performance.
Risk: Unbounded label values cause memory exhaustion.
Mitigation:
- Only use known dimensions:
model,backend,status,error_type,type - Never use request IDs, timestamps, or user data as labels
- Backend and model names come from finite Registry
- Monitor Prometheus memory usage in production
Requirement: All metrics recording must be thread-safe without locks in hot path.
Implementation:
metricscrate usesAtomicU64for counters- Histograms use pre-allocated atomic buckets
MetricsCollector.label_cacheusesDashMap(concurrent HashMap)- No
MutexorRwLockin request path
| Operation | Target Latency | Implementation |
|---|---|---|
| Record counter | < 50ns | atomic.fetch_add(1, Ordering::Relaxed) |
| Record histogram | < 100ns | Find bucket + atomic increment |
| Sanitize label (cached) | < 10ns | DashMap lookup |
| Sanitize label (uncached) | < 500ns | String iteration + DashMap insert |
| Compute gauges | < 500µs | Iterate Registry backends (typically < 100) |
| Render /metrics | < 1ms | Prometheus exporter serialization |
| Compute /v1/stats | < 2ms | Aggregate histograms + compute averages |
Total Request Overhead: < 0.1ms (counter + histogram recording only)
-
Label sanitization:
- Test valid names pass through unchanged
- Test invalid chars replaced with
_ - Test first char digit handling
- Property test: Output always matches Prometheus regex
-
Gauge computation:
- Mock Registry with known backends
- Verify
backends_total= backend count - Verify
backends_healthycounts only healthy - Verify
models_availablecounts unique models
-
Stats aggregation:
- Mock Prometheus data
- Verify JSON output matches expected schema
- Verify averages computed correctly
-
End-to-end request tracking:
- Send request to
/v1/chat/completions - Query
/metrics - Assert
nexus_requests_totalincremented - Assert
nexus_request_duration_secondshas sample
- Send request to
-
Error tracking:
- Trigger error condition (no backend)
- Query
/metrics - Assert
nexus_errors_total{error_type="no_backend"}incremented
-
Fallback tracking:
- Configure fallback chain
- Trigger fallback
- Query
/metrics - Assert
nexus_fallbacks_totalincremented
-
Benchmark metric recording overhead:
- Use
criterioncrate - Measure request handler latency with/without metrics
- Assert overhead < 0.1ms (100µs)
- Use
-
Benchmark /metrics endpoint:
- Pre-populate with 10k requests
- Measure GET /metrics latency
- Assert < 1ms p95
- Request queue depth: Track pending requests per backend
- Circuit breaker state: Expose circuit breaker status as gauge
- Cache hit rate: If request caching is added
- Streaming metrics: Track SSE stream duration separately
- Cost tracking: If cost-per-request becomes available
These are mentioned for awareness but are NOT part of F09 implementation.