Commit 6400520

docs: fix file paths in saturation-analyzer.md

- Update Metrics Collector to reference actual split files (registration/saturation.go, prometheus_source.go, replica_metrics.go)
- Update Interfaces section to reference saturation_analyzer.go with correct type descriptions (ReplicaMetrics, VariantDecision, VariantReplicaState, SaturationAnalyzer)
- Fix test path from internal/capacity to internal/saturation
- Fix Prometheus query strings to use actual metric names (model_name label)

Signed-off-by: Srujan Reddy <srjnreddy33@gmail.com>
1 parent 1de48e3 commit 6400520

1 file changed

Lines changed: 30 additions & 27 deletions

docs/saturation-analyzer.md

@@ -17,24 +17,26 @@ The Saturation Analyzer is a **fast, reactive, and safe saturation guardrail** t
 
 ### Components
 
-**1. Saturation Analyzer (`internal/capacity/analyzer.go`)**
+**1. Saturation Analyzer (`internal/saturation/analyzer.go`)**
 - Core analysis logic for saturation-based scaling decisions
 - Implements spare capacity calculations
 - Performs worst-case scale-down safety simulation
 - Makes **per-variant** scaling decisions with cost-awareness
 
-**2. Metrics Collector (`internal/collector/capacity_metrics.go`)**
-- Collects vLLM metrics from Prometheus using `max_over_time[1m]` queries
-- Queries `constants.VLLMKvCacheUsagePerc` and `constants.VLLMNumRequestsWaiting`
-- Uses peak values over 1 minute for safety-first capacity analysis
-- Enriches metrics with pod metadata (variant name, accelerator type)
+**2. Metrics Collector**
 
-**3. Interfaces (`internal/interfaces/capacity_analyzer.go`)**
-- Defines data structures for replica metrics (including variant cost)
-- Defines analysis results and per-variant decision types
-- Provides interface for capacity analysis
-- Defines `VariantDecision` for per-variant scaling decisions
-- Defines `VariantReplicaState` for current/desired replica tracking
+- **Query registration** (`internal/collector/registration/saturation.go`): defines the `max_over_time[1m]` PromQL templates for `vllm:kv_cache_usage_perc` and `vllm:num_requests_waiting`
+- **Prometheus source** (`internal/collector/source/prometheus/prometheus_source.go`): executes queries and caches results
+- **Replica metrics** (`internal/collector/replica_metrics.go`): enriches raw query results with pod metadata (variant name, accelerator type)
+
+**3. Interfaces (`internal/interfaces/saturation_analyzer.go`)**
+
+- Defines `ReplicaMetrics` data structure (with variant cost, KV cache, queue fields)
+- Defines analysis results: `ModelSaturationAnalysis`, `VariantSaturationAnalysis`
+- Defines `VariantDecision` for per-variant scaling decisions (with pipeline step history)
+- Defines `VariantReplicaState` for current/desired/pending replica tracking
+- Defines `SaturationAnalyzer` interface (`AnalyzeModelSaturation`, `CalculateSaturationTargets`)
+- Sibling `analyzer.go` holds the generic `Analyzer` interface, `AnalyzerInput`, `AnalyzerResult`, and `VariantCapacity` (used by the V2 engine)
 
 ### Data Flow
 
@@ -314,7 +316,7 @@ T+90s: All 5 pods now ready, but we have 3 extra replicas (over-provisioned)
 skip // Wait for pending pods to become ready
 if variant.Cost < cheapest.Cost:
 cheapest = variant
-
+
 scale_up(cheapest) // Only if no pending replicas
 ```
 
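For readers following the pseudocode in the hunk above, the selection step can be sketched in Go. This is a minimal illustration, not the repository's implementation: the `Variant` type, its fields, and `cheapestWithoutPending` are hypothetical names, and the sketch assumes that `skip` applies per variant (a variant with pending replicas is passed over rather than blocking the whole scale-up).

```go
package main

import "fmt"

// Variant is a hypothetical stand-in for the analyzer's per-variant state.
type Variant struct {
	Name            string
	Cost            float64
	PendingReplicas int
}

// cheapestWithoutPending returns the lowest-cost variant that has no pending
// replicas. The boolean is false when every variant is still waiting on pods,
// in which case the caller should not scale up.
func cheapestWithoutPending(variants []Variant) (Variant, bool) {
	var cheapest Variant
	found := false
	for _, v := range variants {
		if v.PendingReplicas > 0 {
			continue // wait for pending pods to become ready
		}
		if !found || v.Cost < cheapest.Cost {
			cheapest = v
			found = true
		}
	}
	return cheapest, found
}

func main() {
	variants := []Variant{
		{Name: "a100", Cost: 4.0, PendingReplicas: 0},
		{Name: "l4", Cost: 1.0, PendingReplicas: 2}, // cheapest, but still starting pods
		{Name: "h100", Cost: 8.0, PendingReplicas: 0},
	}
	if v, ok := cheapestWithoutPending(variants); ok {
		fmt.Println("scale up:", v.Name) // prints "scale up: a100"
	}
}
```

Returning a boolean alongside the variant keeps the "nothing eligible" case explicit, so the caller cannot scale up a zero-value variant by accident.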
@@ -353,20 +355,19 @@ data:
 
 **Per-model overrides:**
 ```yaml
-llama-70b-prod: |
-model_id: meta/llama-70b
-namespace: production
-kvCacheThreshold: 0.85
-kvSpareTrigger: 0.15
+llama-70b-prod: |
+model_id: meta/llama-70b
+namespace: production
+kvCacheThreshold: 0.85
+kvSpareTrigger: 0.15
 ```
 
 ## Testing
 
-Comprehensive unit tests are provided in `internal/capacity/analyzer_test.go`:
+Comprehensive unit tests are provided in `internal/saturation/analyzer_test.go`:
 
 ```bash
-cd internal/capacity
-go test -v
+go test ./internal/saturation/...
 ```
 
 **Test coverage:**
@@ -432,9 +433,10 @@ INFO Capacity target: scale-up cheapest variant
 
 ### Prometheus Queries
 
-**Two queries per model:**
-1. `max_over_time(constants.VLLMKvCacheUsagePerc{namespace="prod",model_id="llama-70b"}[1m])` (returns N samples with peak values)
-2. `max_over_time(constants.VLLMNumRequestsWaiting{namespace="prod",model_id="llama-70b"}[1m])` (returns N samples with peak values)
+**Two queries per model** (registered in `internal/collector/registration/saturation.go`):
+
+1. `max by (pod) (max_over_time(vllm:kv_cache_usage_perc{namespace="prod",model_name="llama-70b"}[1m]))` — peak KV cache utilization per pod
+2. `max by (pod) (max_over_time(vllm:num_requests_waiting{namespace="prod",model_name="llama-70b"}[1m]))` — peak queue length per pod
 
 **Query strategy:** Uses `max_over_time[1m]` to capture peak capacity usage in the last minute, providing conservative safety-first analysis that prevents missing saturation events between queries. The `model_id` filter ensures metrics are scoped to the specific model being analyzed, preventing cross-model metric pollution.
 
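The query shape added in the hunk above can be reproduced with a small Go helper. This is only a sketch for checking the strings by hand: `peakPerPodQuery` is a hypothetical function, and the actual templates in `internal/collector/registration/saturation.go` may be built differently.

```go
package main

import "fmt"

// peakPerPodQuery builds a PromQL expression that takes the peak value of a
// metric over the given window and keeps one series per pod. It mirrors the
// query strings quoted in the diff; it is not the repository's own template.
func peakPerPodQuery(metric, namespace, modelName, window string) string {
	return fmt.Sprintf(
		`max by (pod) (max_over_time(%s{namespace=%q,model_name=%q}[%s]))`,
		metric, namespace, modelName, window,
	)
}

func main() {
	for _, metric := range []string{"vllm:kv_cache_usage_perc", "vllm:num_requests_waiting"} {
		fmt.Println(peakPerPodQuery(metric, "prod", "llama-70b", "1m"))
	}
}
```

Running it prints the two queries listed in the updated documentation, which makes it easy to paste them into the Prometheus expression browser when verifying label names.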
@@ -467,9 +469,10 @@ The saturation analyzer is integrated into the controller's reconciliation loop:
 
 ### Metrics Requirements
 
-The analyzer requires these Prometheus metrics from vLLM (defined in `internal/constants/metrics.go`):
-- `constants.VLLMKvCacheUsagePerc` (`vllm:kv_cache_usage_perc`) — KV cache utilization (0.0-1.0)
-- `constants.VLLMNumRequestsWaiting` (`vllm:num_requests_waiting`) — Queue length (integer)
+The analyzer requires these Prometheus metrics from vLLM. Queries are registered in `internal/collector/registration/saturation.go`:
+
+- `vllm:kv_cache_usage_perc` — KV cache utilization (0.0-1.0)
+- `vllm:num_requests_waiting` — Queue length (integer)
 
 These metrics must include the following labels:
 - `pod` or `pod_name` — Pod identification