Commit 71231a8

Improvements: rename KvCacheTokensTotal, expand help text, use unit label

- Rename KvCacheTokensTotal -> KvCacheTokensCapacity on VariantDecision, Actuator.EmitSaturationMetrics, MetricsEmitter.EmitSaturationMetrics, DeleteSaturationMetrics, and the Prometheus metric name itself (wva_kv_cache_tokens_total -> wva_kv_cache_tokens_capacity). "Total" was confusing: by Prometheus naming convention a _total suffix denotes a cumulative counter, but this metric is a gauge of capacity.
- Replace the analyzer_version="v1"/"v2" label on wva_required_capacity with a unit="binary"/"continuous" label. The label's purpose is to describe the unit of the metric value (a boolean scale-up signal in V1, a continuous token demand in V2), not the code path that produced it. "binary"/"continuous" remains meaningful after V1 is deprecated, whereas "v1"/"v2" becomes vestigial.
  - Rename VariantDecision.AnalyzerVersion -> RequiredCapacityUnit.
  - Rename constants.LabelAnalyzerVersion -> LabelUnit.
  - Rename constants.AnalyzerVersionV1/V2 -> UnitBinary/UnitContinuous.
- Expand the help strings on wva_saturation_utilization, wva_spare_capacity, wva_kv_cache_tokens_used, and wva_kv_cache_tokens_capacity to specify what is being measured (KV cache) and how the V1 and V2 paths differ.
- Use constants.LabelUnit, UnitBinary, and UnitContinuous in the wva_required_capacity help string via fmt.Sprintf, for consistency with how labels are referenced elsewhere.
Parent: 0707e1c


6 files changed (+193, -135 lines)

internal/actuator/actuator.go

Lines changed: 11 additions & 2 deletions
@@ -100,12 +100,21 @@ func (a *Actuator) EmitSaturationMetrics(ctx context.Context, decision interface
 		ctx,
 		decision.VariantName,
 		decision.Namespace,
+		decision.ModelID,
 		decision.AcceleratorName,
-		decision.AnalyzerVersion,
+		decision.RequiredCapacityUnit,
 		decision.Utilization,
 		decision.SpareCapacity,
 		decision.RequiredCapacity,
 		decision.KvCacheTokensUsed,
-		decision.KvCacheTokensTotal,
+		decision.KvCacheTokensCapacity,
 	)
 }
+
+// DeleteSaturationMetricsForVariant removes all saturation metric series for a
+// variant. Call this when the current optimization cycle produced no fresh
+// decision for the variant, or when the VA is being deleted — so dashboards
+// don't show stale values.
+func (a *Actuator) DeleteSaturationMetricsForVariant(variantName, namespace string) {
+	a.MetricsEmitter.DeleteSaturationMetricsForVariant(variantName, namespace)
+}
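DeleteSaturationMetricsForVariant only forwards to MetricsEmitter, whose implementation is not part of this diff. The sketch below illustrates the behaviour the name implies, under the assumption that every series carries variant_name and namespace alongside other labels (model_name, accelerator_type, unit): deletion matches on the two identifying labels and ignores the rest, so all of a variant's series are removed together. gaugeStore, labelSet, and DeletePartialMatch here are hypothetical, standard-library-only stand-ins; with prometheus/client_golang, metric vectors offer a similarly named DeletePartialMatch method in versions that provide it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelSet is a hypothetical stand-in for a Prometheus label set.
type labelSet map[string]string

// key renders a label set deterministically so it can index a map.
func (l labelSet) key() string {
	names := make([]string, 0, len(l))
	for n := range l {
		names = append(names, n)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, n+"="+l[n])
	}
	return strings.Join(parts, ",")
}

// gaugeStore is a hypothetical in-memory gauge vector: one value per label set.
type gaugeStore struct {
	series map[string]labelSet
	values map[string]float64
}

func newGaugeStore() *gaugeStore {
	return &gaugeStore{series: map[string]labelSet{}, values: map[string]float64{}}
}

// Set creates or overwrites the series identified by the full label set.
func (g *gaugeStore) Set(l labelSet, v float64) {
	k := l.key()
	g.series[k] = l
	g.values[k] = v
}

// DeletePartialMatch removes every series whose labels contain all of the
// given label pairs, regardless of its other labels, and returns how many
// series were removed. Deleting map entries during range is safe in Go.
func (g *gaugeStore) DeletePartialMatch(match labelSet) int {
	deleted := 0
	for k, l := range g.series {
		matches := true
		for n, v := range match {
			if l[n] != v {
				matches = false
				break
			}
		}
		if matches {
			delete(g.series, k)
			delete(g.values, k)
			deleted++
		}
	}
	return deleted
}

func main() {
	g := newGaugeStore()
	g.Set(labelSet{"variant_name": "llama-a100", "namespace": "prod", "model_name": "llama", "unit": "binary"}, 1)
	g.Set(labelSet{"variant_name": "llama-a100", "namespace": "prod", "model_name": "llama", "unit": "continuous"}, 4200)
	g.Set(labelSet{"variant_name": "llama-h100", "namespace": "prod", "model_name": "llama", "unit": "continuous"}, 900)

	// No fresh decision for llama-a100 this cycle: clear all of its series.
	n := g.DeletePartialMatch(labelSet{"variant_name": "llama-a100", "namespace": "prod"})
	fmt.Println(n, len(g.series)) // prints: 2 1
}
```

Matching on a label subset is what lets the delete path ignore labels it does not know about, such as the model_name label added in this commit.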

internal/constants/metrics.go

Lines changed: 19 additions & 14 deletions
@@ -110,28 +110,28 @@ const (
 	WVADesiredRatio = "wva_desired_ratio"
 
 	// WVASaturationUtilization is a gauge that tracks per-variant utilization ratio (0.0-1.0).
-	// Labels: variant_name, namespace, accelerator_type
+	// Labels: variant_name, namespace, model_name, accelerator_type
 	WVASaturationUtilization = "wva_saturation_utilization"
 
 	// WVASpareCapacity is a gauge that tracks per-variant spare capacity (0.0-1.0).
-	// Labels: variant_name, namespace, accelerator_type
+	// Labels: variant_name, namespace, model_name, accelerator_type
 	WVASpareCapacity = "wva_spare_capacity"
 
 	// WVARequiredCapacity is a gauge that tracks model-level required capacity.
 	// >0 means scale-up needed.
-	// Units differ by analyzer (use the analyzer_version label to distinguish):
-	// - V1: binary signal (0.0 = no scale-up, 1.0 = scale-up needed)
-	// - V2: continuous token-based demand
-	// Labels: variant_name, namespace, analyzer_version
+	// Value semantics differ by analyzer (use the "unit" label to distinguish):
+	// - unit="binary" (V1): 0.0 = no scale-up, 1.0 = scale-up needed
+	// - unit="continuous" (V2): continuous token-based demand
+	// Labels: variant_name, namespace, model_name, unit
 	WVARequiredCapacity = "wva_required_capacity"
 
 	// WVAKvCacheTokensUsed is a gauge that tracks total KV cache tokens currently in use per variant.
-	// Labels: variant_name, namespace
+	// Labels: variant_name, namespace, model_name
 	WVAKvCacheTokensUsed = "wva_kv_cache_tokens_used"
 
-	// WVAKvCacheTokensTotal is a gauge that tracks total KV cache token capacity per variant.
-	// Labels: variant_name, namespace
-	WVAKvCacheTokensTotal = "wva_kv_cache_tokens_total"
+	// WVAKvCacheTokensCapacity is a gauge that tracks total KV cache token capacity per variant.
+	// Labels: variant_name, namespace, model_name
+	WVAKvCacheTokensCapacity = "wva_kv_cache_tokens_capacity"
 )
 
 // Metric Label Names

@@ -144,11 +144,16 @@ const (
 	LabelReason = "reason"
 	LabelAcceleratorType = "accelerator_type"
 	LabelControllerInstance = "controller_instance"
-	LabelAnalyzerVersion = "analyzer_version"
+	// LabelUnit distinguishes the unit of a metric value when a single metric name
+	// carries values with different semantic units. Currently applied to
+	// wva_required_capacity, whose value is either a binary scale-up signal (V1)
+	// or a continuous token-demand value (V2).
+	LabelUnit = "unit"
 )
 
-// Analyzer version label values used in saturation metrics.
+// Values for the LabelUnit Prometheus label, describing how to interpret the
+// metric value ("binary" 0/1 vs. "continuous" absolute quantity).
 const (
-	AnalyzerVersionV1 = "v1"
-	AnalyzerVersionV2 = "v2"
+	UnitBinary = "binary"
+	UnitContinuous = "continuous"
 )
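Because the unit label replaces analyzer_version, dashboards and alerts that selected `wva_required_capacity{analyzer_version="v2"}` would need to switch to `wva_required_capacity{unit="continuous"}`. A consumer that reads the raw value can branch on the label instead of on which analyzer ran. The decoder below is a hypothetical sketch (requiredCapacityReading and interpretRequiredCapacity are illustrative names, not part of the codebase); it copies the two label values locally and rejects values it does not recognise, such as the retired "v2".

```go
package main

import "fmt"

// Local copies of the label values introduced in this commit
// (constants.UnitBinary / constants.UnitContinuous).
const (
	unitBinary     = "binary"
	unitContinuous = "continuous"
)

// requiredCapacityReading is a hypothetical consumer-side view of one
// wva_required_capacity sample.
type requiredCapacityReading struct {
	ScaleUp   bool    // true when the value signals that scale-up is needed
	Demand    float64 // token demand; only meaningful when HasDemand is true
	HasDemand bool    // set only for unit="continuous"
}

// interpretRequiredCapacity decodes a sample using its unit label rather than
// the analyzer that produced it.
func interpretRequiredCapacity(unit string, value float64) (requiredCapacityReading, error) {
	switch unit {
	case unitBinary:
		// V1 path: 0.0 = no scale-up, 1.0 = scale-up needed.
		return requiredCapacityReading{ScaleUp: value > 0}, nil
	case unitContinuous:
		// V2 path: a token-demand magnitude; >0 still means scale-up needed.
		return requiredCapacityReading{ScaleUp: value > 0, Demand: value, HasDemand: true}, nil
	default:
		return requiredCapacityReading{}, fmt.Errorf("unknown unit label %q", unit)
	}
}

func main() {
	samples := []struct {
		unit  string
		value float64
	}{{unitBinary, 1}, {unitContinuous, 3500}, {"v2", 3500}}
	for _, s := range samples {
		r, err := interpretRequiredCapacity(s.unit, s.value)
		fmt.Printf("unit=%s reading=%+v err=%v\n", s.unit, r, err)
	}
}
```

Failing loudly on an unknown label value surfaces dashboards or tooling that still assume the old analyzer_version scheme.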

internal/engines/saturation/engine.go

Lines changed: 22 additions & 10 deletions
@@ -764,17 +764,23 @@ func enrichDecisionsFromReplicaMetrics(decisions []interfaces.VariantDecision, r
 	for i := range decisions {
 		d := &decisions[i]
 		d.RequiredCapacity = requiredCapacity
-		d.AnalyzerVersion = constants.AnalyzerVersionV1
+		d.RequiredCapacityUnit = constants.UnitBinary
 		if a, ok := agg[d.VariantName]; ok && a.count > 0 {
 			d.KvCacheTokensUsed = a.kvUsed
-			d.KvCacheTokensTotal = a.kvTotal
+			d.KvCacheTokensCapacity = a.kvTotal
+			// V1 reasons about saturation per-replica using KvCacheUsage fractions
+			// (rm.KvCacheUsage is 0.0-1.0), not tokens. Report the mean of those
+			// per-replica fractions as the variant-level utilization — this
+			// matches what the V1 analyzer actually evaluates against its
+			// thresholds. V2 uses a different (token-demand / capacity) formula;
+			// see the field doc on VariantDecision.Utilization.
 			d.Utilization = a.kvUsageSum / float64(a.count)
 		}
 	}
 }
 
-// enrichDecisionsWithKvTokenData sets KvCacheTokensUsed, KvCacheTokensTotal, and
-// AnalyzerVersion on decisions from replica metrics aggregated per (model, variant).
+// enrichDecisionsWithKvTokenData sets KvCacheTokensUsed, KvCacheTokensCapacity, and
+// RequiredCapacityUnit on decisions from replica metrics aggregated per (model, variant).
 // Used by V2 path where Utilization and RequiredCapacity are already set from
 // AnalyzerResult.
 //

@@ -805,10 +811,10 @@ func enrichDecisionsWithKvTokenData(decisions []interfaces.VariantDecision, mode
 
 	for i := range decisions {
 		d := &decisions[i]
-		d.AnalyzerVersion = constants.AnalyzerVersionV2
+		d.RequiredCapacityUnit = constants.UnitContinuous
 		if a, ok := agg[variantKey{modelID: d.ModelID, variant: d.VariantName}]; ok {
 			d.KvCacheTokensUsed = a.kvUsed
-			d.KvCacheTokensTotal = a.kvTotal
+			d.KvCacheTokensCapacity = a.kvTotal
 		}
 	}
 }

@@ -1157,14 +1163,20 @@ func (e *Engine) applySaturationDecisions(
 		}
 
 		// Emit saturation and capacity metrics for observability.
-		// Note: stale time series for deleted VAs are not cleaned up automatically here.
-		// The metrics package exposes DeleteSaturationMetrics for callers (e.g., the
-		// VariantAutoscaling reconciler's delete handler / finalizer) to remove series
-		// when a VA is removed.
+		// When this cycle produced no fresh decision for the variant, actively
+		// clear the existing series so dashboards show a gap ("no fresh data")
+		// rather than stale values that would otherwise persist until Prometheus'
+		// 5-minute staleness marker fires. For fully-deleted VAs, additional
+		// cleanup via the reconciler's delete handler / finalizer is still
+		// required (see DeleteSaturationMetricsForVariant).
 		if hasDecision {
 			if err := act.EmitSaturationMetrics(ctx, decision); err != nil {
 				logger.Error(err, "Failed to emit saturation metrics", "variant", updateVa.Name)
 			}
+		} else {
+			act.DeleteSaturationMetricsForVariant(updateVa.Name, updateVa.Namespace)
+			logger.V(logging.DEBUG).Info("Cleared stale saturation metrics (no fresh decision this cycle)",
+				"variant", updateVa.Name, "namespace", updateVa.Namespace)
 		}
 
 		// Update Shared State and Trigger Reconcile via Channel
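Both enrich helpers follow one pattern: aggregate replica metrics per variant, copy the token sums into the decision, and stamp the unit label that matches the analyzer path. The sketch below condenses that pattern under stated assumptions: replicaMetrics and decision are hypothetical trimmed types whose field names are borrowed from the doc comments in this diff; both paths are keyed by (model, variant) as the V2 helper does (the V1 helper shown above looks up by variant name alone); and on the V2 path Utilization is assumed to be populated already from the analyzer result.

```go
package main

import "fmt"

// replicaMetrics is a hypothetical per-replica sample.
type replicaMetrics struct {
	ModelID               string
	VariantName           string
	TokensInUse           int64
	TotalKvCapacityTokens int64
	KvCacheUsage          float64 // fraction, 0.0-1.0
}

type variantKey struct{ modelID, variant string }

// kvAgg accumulates per-variant sums across replicas.
type kvAgg struct {
	kvUsed, kvTotal int64
	kvUsageSum      float64
	count           int
}

// decision is a hypothetical, trimmed stand-in for interfaces.VariantDecision.
type decision struct {
	ModelID, VariantName  string
	KvCacheTokensUsed     int64
	KvCacheTokensCapacity int64
	Utilization           float64
	RequiredCapacityUnit  string
}

// aggregate sums token counts and usage fractions per (model, variant).
func aggregate(rms []replicaMetrics) map[variantKey]*kvAgg {
	agg := map[variantKey]*kvAgg{}
	for _, rm := range rms {
		k := variantKey{rm.ModelID, rm.VariantName}
		a, ok := agg[k]
		if !ok {
			a = &kvAgg{}
			agg[k] = a
		}
		a.kvUsed += rm.TokensInUse
		a.kvTotal += rm.TotalKvCapacityTokens
		a.kvUsageSum += rm.KvCacheUsage
		a.count++
	}
	return agg
}

// enrich fills the token fields and the unit label. With v1 set it also
// derives Utilization as the mean per-replica KvCacheUsage; otherwise the
// existing Utilization (assumed to come from the V2 analyzer) is left alone.
func enrich(ds []decision, agg map[variantKey]*kvAgg, v1 bool) {
	for i := range ds {
		d := &ds[i]
		if v1 {
			d.RequiredCapacityUnit = "binary"
		} else {
			d.RequiredCapacityUnit = "continuous"
		}
		a, ok := agg[variantKey{d.ModelID, d.VariantName}]
		if !ok || a.count == 0 {
			continue
		}
		d.KvCacheTokensUsed = a.kvUsed
		d.KvCacheTokensCapacity = a.kvTotal
		if v1 {
			d.Utilization = a.kvUsageSum / float64(a.count)
		}
	}
}

func main() {
	rms := []replicaMetrics{
		{"llama", "llama-a100", 2500, 10000, 0.25},
		{"llama", "llama-a100", 5000, 10000, 0.50},
	}
	ds := []decision{{ModelID: "llama", VariantName: "llama-a100"}}
	enrich(ds, aggregate(rms), true)
	fmt.Printf("%+v\n", ds[0])
}
```

The unit label is set even when no replica metrics were found, mirroring the hunks above, where RequiredCapacityUnit is assigned before the aggregate lookup.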

internal/interfaces/saturation_analyzer.go

Lines changed: 17 additions & 10 deletions
@@ -195,25 +195,32 @@ type VariantDecision struct {
 	// V1: threshold-relative spare KV capacity (AvgSpareKvCapacity).
 	// V2: 1.0 - Utilization (absolute spare).
 	SpareCapacity float64
-	// Utilization is the variant-level utilization ratio (0.0-1.0).
-	// V2: from AnalyzerResult.VariantCapacities[].Utilization.
-	// V1: average KvCacheUsage across this variant's replicas.
+	// Utilization is the variant-level utilization ratio (0.0-1.0) reported for
+	// observability. The exact formula differs by analyzer because V1 and V2
+	// reason about saturation differently:
+	// V1: mean of per-replica KvCacheUsage fractions (matches what V1's
+	// per-replica threshold check operates on).
+	// V2: TotalDemand / TotalCapacity from AnalyzerResult (token-demand-based).
+	// For uniform-capacity replicas the two are numerically equivalent; for
+	// mixed-capacity replicas V2's value is capacity-weighted.
 	Utilization float64
 	// KvCacheTokensUsed is the sum of TokensInUse across this variant's replicas.
 	KvCacheTokensUsed int64
-	// KvCacheTokensTotal is the sum of TotalKvCapacityTokens across this variant's replicas.
-	KvCacheTokensTotal int64
+	// KvCacheTokensCapacity is the sum of TotalKvCapacityTokens across this variant's replicas.
+	KvCacheTokensCapacity int64
 	// RequiredCapacity is the model-level required capacity (>0 means scale-up needed).
 	// Same value for all variants of a model.
 	// V1: binary (1.0 if shouldScaleUp, else 0.0).
 	// V2: continuous token-based demand from AnalyzerResult.
-	// Use AnalyzerVersion to disambiguate the units when consuming this field
+	// Use RequiredCapacityUnit to disambiguate the units when consuming this field
 	// (or its corresponding Prometheus metric).
 	RequiredCapacity float64
-	// AnalyzerVersion identifies which analyzer produced this decision ("v1" or "v2").
-	// Exposed as a Prometheus label on saturation metrics so dashboards can filter
-	// by analyzer to handle the V1/V2 unit difference in RequiredCapacity.
-	AnalyzerVersion string
+	// RequiredCapacityUnit describes the unit of RequiredCapacity ("binary" or "continuous").
+	// Exposed as the `unit` Prometheus label on wva_required_capacity so dashboards
+	// can filter by semantics rather than by which analyzer produced the value.
+	// "binary": V1 path, value is 0.0 or 1.0
+	// "continuous": V2 path, value is a token-demand magnitude
+	RequiredCapacityUnit string
 	// ScaleTargetRef references the Deployment/StatefulSet for scheduling constraints
 	ScaleTargetRef *autoscalingv2.CrossVersionObjectReference
 
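The new Utilization doc comment states that the V1 and V2 formulas agree for uniform-capacity replicas and diverge, with V2 capacity-weighted, for mixed capacities. A small worked check follows, assuming V2's TotalDemand can be approximated by the tokens currently in use (the real V2 demand comes from AnalyzerResult and may differ); replica, v1Utilization, and v2StyleUtilization are hypothetical names.

```go
package main

import "fmt"

// replica is a hypothetical per-replica KV-cache sample, in tokens.
type replica struct {
	usedTokens, capacityTokens float64
}

// v1Utilization is the unweighted mean of per-replica usage fractions.
func v1Utilization(rs []replica) float64 {
	if len(rs) == 0 {
		return 0
	}
	sum := 0.0
	for _, r := range rs {
		sum += r.usedTokens / r.capacityTokens
	}
	return sum / float64(len(rs))
}

// v2StyleUtilization is total demand over total capacity. Assumption: demand
// is approximated by tokens in use, which makes the result capacity-weighted.
func v2StyleUtilization(rs []replica) float64 {
	var demand, capacity float64
	for _, r := range rs {
		demand += r.usedTokens
		capacity += r.capacityTokens
	}
	if capacity == 0 {
		return 0
	}
	return demand / capacity
}

func main() {
	uniform := []replica{{2000, 8000}, {6000, 8000}}
	mixed := []replica{{2000, 8000}, {6000, 32000}}
	fmt.Println(v1Utilization(uniform), v2StyleUtilization(uniform)) // prints: 0.5 0.5
	fmt.Println(v1Utilization(mixed), v2StyleUtilization(mixed))     // prints: 0.21875 0.2
}
```

With a uniform capacity c, the mean of u_i/c equals Σu_i/(n·c), which is exactly Σu_i/Σc. Once capacities differ, the large, lightly loaded replica pulls the capacity-weighted value below the unweighted mean.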
