- Update Metrics Collector to reference actual split files
(registration/saturation.go, prometheus_source.go, replica_metrics.go)
- Update Interfaces section to reference saturation_analyzer.go with
correct type descriptions (ReplicaMetrics, VariantDecision,
VariantReplicaState, SaturationAnalyzer)
- Fix test path from internal/capacity to internal/saturation
- Fix Prometheus query strings to use actual metric names (model_name label)
Signed-off-by: Srujan Reddy <srjnreddy33@gmail.com>
- Defines data structures for replica metrics (including variant cost)
-- Defines analysis results and per-variant decision types
-- Provides interface for capacity analysis
-- Defines `VariantDecision` for per-variant scaling decisions
-- Defines `VariantReplicaState` for current/desired replica tracking
+- **Query registration** (`internal/collector/registration/saturation.go`): defines the `max_over_time[1m]` PromQL templates for `vllm:kv_cache_usage_perc` and `vllm:num_requests_waiting`
+- **Prometheus source** (`internal/collector/source/prometheus/prometheus_source.go`): executes queries and caches results
+- **Replica metrics** (`internal/collector/replica_metrics.go`): enriches raw query results with pod metadata (variant name, accelerator type)
- Sibling `analyzer.go` holds the generic `Analyzer` interface, `AnalyzerInput`, `AnalyzerResult`, and `VariantCapacity` (used by the V2 engine)

 ### Data Flow

@@ -314,7 +316,7 @@ T+90s: All 5 pods now ready, but we have 3 extra replicas (over-provisioned)
 skip // Wait for pending pods to become ready
 if variant.Cost < cheapest.Cost:
 cheapest = variant
-
+
 scale_up(cheapest) // Only if no pending replicas
 ```

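Reviewer sketch of the selection step in the pseudocode above: skip variants that still have pending pods, then pick the lowest-cost remaining variant. The `Variant` struct, its `PendingReplicas` field, and `cheapestWithoutPending` are hypothetical names chosen for illustration; only `Cost` and the skip-then-compare logic come from the hunk, and the cost values are made up.

```go
package main

import "fmt"

// Variant is a hypothetical, simplified stand-in for per-variant state.
// Only Cost appears in the pseudocode; the other fields are assumptions.
type Variant struct {
	Name            string
	Cost            float64
	PendingReplicas int
}

// cheapestWithoutPending returns the lowest-cost variant that has no
// pending replicas. The boolean is false when every variant is still
// waiting for pods to become ready, in which case nothing should scale up.
func cheapestWithoutPending(variants []Variant) (Variant, bool) {
	var cheapest Variant
	found := false
	for _, v := range variants {
		if v.PendingReplicas > 0 {
			continue // wait for pending pods to become ready
		}
		if !found || v.Cost < cheapest.Cost {
			cheapest, found = v, true
		}
	}
	return cheapest, found
}

func main() {
	variants := []Variant{
		{Name: "a100", Cost: 40, PendingReplicas: 0},
		{Name: "l4", Cost: 10, PendingReplicas: 2}, // cheapest, but still starting up
		{Name: "h100", Cost: 80, PendingReplicas: 0},
	}
	if v, ok := cheapestWithoutPending(variants); ok {
		fmt.Println("scale up:", v.Name)
	}
}
```

Returning a boolean instead of a sentinel variant makes the "all variants pending" case explicit, so the caller cannot accidentally scale a zero-value variant.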
@@ -353,20 +355,19 @@ data:

 **Per-model overrides:**
 ```yaml
-llama-70b-prod: |
-  model_id: meta/llama-70b
-  namespace: production
-  kvCacheThreshold: 0.85
-  kvSpareTrigger: 0.15
+llama-70b-prod: |
+  model_id: meta/llama-70b
+  namespace: production
+  kvCacheThreshold: 0.85
+  kvSpareTrigger: 0.15
 ```

 ## Testing

-Comprehensive unit tests are provided in `internal/capacity/analyzer_test.go`:
+Comprehensive unit tests are provided in `internal/saturation/analyzer_test.go`:

 ```bash
-cd internal/capacity
-go test -v
+go test ./internal/saturation/...
 ```

 **Test coverage:**
@@ -432,9 +433,10 @@ INFO Capacity target: scale-up cheapest variant

 ### Prometheus Queries

-**Two queries per model:**
-1. `max_over_time(constants.VLLMKvCacheUsagePerc{namespace="prod",model_id="llama-70b"}[1m])` (returns N samples with peak values)
-2. `max_over_time(constants.VLLMNumRequestsWaiting{namespace="prod",model_id="llama-70b"}[1m])` (returns N samples with peak values)
+**Two queries per model** (registered in `internal/collector/registration/saturation.go`):
+
+1. `max by (pod) (max_over_time(vllm:kv_cache_usage_perc{namespace="prod",model_name="llama-70b"}[1m]))`: peak KV cache utilization per pod
+2. `max by (pod) (max_over_time(vllm:num_requests_waiting{namespace="prod",model_name="llama-70b"}[1m]))`: peak queue length per pod

 **Query strategy:** Uses `max_over_time[1m]` to capture peak capacity usage in the last minute, providing conservative safety-first analysis that prevents missing saturation events between queries. The `model_id` filter ensures metrics are scoped to the specific model being analyzed, preventing cross-model metric pollution.

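Reviewer sketch of how the two added query strings could be produced from one template. The metric names and the `namespace` / `model_name` labels are taken from the added lines in this hunk; `saturationQuery` is a hypothetical helper and is not claimed to match the code in `internal/collector/registration/saturation.go`.

```go
package main

import "fmt"

// saturationQuery builds a per-pod peak query over a 1-minute window for
// one vLLM metric. Hypothetical helper for illustration only.
// %q emits a double-quoted, escaped string, which is also valid as a
// PromQL label value for plain names like the ones used here.
func saturationQuery(metric, namespace, modelName string) string {
	return fmt.Sprintf(
		`max by (pod) (max_over_time(%s{namespace=%q,model_name=%q}[1m]))`,
		metric, namespace, modelName,
	)
}

func main() {
	metrics := []string{
		"vllm:kv_cache_usage_perc",  // KV cache utilization
		"vllm:num_requests_waiting", // queue length
	}
	for _, m := range metrics {
		fmt.Println(saturationQuery(m, "prod", "llama-70b"))
	}
}
```

Keeping both queries on one template means the label set stays identical for the KV cache and queue-length metrics, so the per-pod results can be joined on `pod`.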
@@ -467,9 +469,10 @@ The saturation analyzer is integrated into the controller's reconciliation loop:

 ### Metrics Requirements

-The analyzer requires these Prometheus metrics from vLLM (defined in `internal/constants/metrics.go`):