fix: always set MetricsAvailable condition in VA status by clubanderson · Pull Request #567 · llm-d/llm-d-workload-variant-autoscaler

clubanderson · 2026-01-09T20:21:22Z

Summary

- Fix the MetricsReady condition not showing in VA status
- The condition was never set because the code tried to copy it from a local object on which it did not exist
- The code now sets the condition directly, based on whether metrics data is available

Test plan

Deploy, then check that `oc get variantautoscaling -A` shows the MetricsReady column populated
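The printer column can also be cross-checked against the raw condition. The sketch below reads the JSON that `oc get variantautoscaling -A -o json` would print (a standard Kubernetes List with `items[].metadata` and `items[].status.conditions`) and reports the MetricsAvailable status per VA. It assumes the condition type string is `MetricsAvailable`, as in the PR title; the program, the `vaList` type, and `metricsAvailable` are hypothetical, and `main` uses an inline sample instead of live cluster output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vaList mirrors only the fields this sketch needs from a Kubernetes List
// of VariantAutoscaling objects: standard metadata plus status.conditions.
type vaList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Conditions []struct {
				Type   string `json:"type"`
				Status string `json:"status"`
			} `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// metricsAvailable maps "namespace/name" to the MetricsAvailable status
// ("True", "False", or "<missing>" when the condition was never set).
func metricsAvailable(data []byte) (map[string]string, error) {
	var list vaList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(list.Items))
	for _, item := range list.Items {
		key := item.Metadata.Namespace + "/" + item.Metadata.Name
		out[key] = "<missing>"
		for _, c := range item.Status.Conditions {
			if c.Type == "MetricsAvailable" {
				out[key] = c.Status
			}
		}
	}
	return out, nil
}

func main() {
	// Sample shaped like `oc get variantautoscaling -A -o json` output.
	sample := []byte(`{"items":[{"metadata":{"namespace":"llm-d","name":"my-va"},
		"status":{"conditions":[{"type":"MetricsAvailable","status":"False"}]}}]}`)
	statuses, err := metricsAvailable(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for key, status := range statuses {
		fmt.Printf("%s\tMetricsAvailable=%s\n", key, status)
	}
}
```

Before this fix, every VA would be expected to report `<missing>`; after it, each should report either True or False.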

The MetricsAvailable condition was not showing in VA status because the code tried to copy it from a local VA object on which it was never set. Instead of copying a potentially nil condition, we now set MetricsAvailable directly based on whether we have metrics data:

- True if we have an allocation (from metrics collection) or a decision (from saturation analysis)
- False otherwise, indicating that pods may not be ready or metrics have not yet been scraped
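The rule above reduces to a single boolean check. This is a minimal sketch, not the PR's code: `Condition` is a trimmed stand-in for `metav1.Condition`, and the constant names, reasons, and messages are hypothetical placeholders for the strings the PR later extracts into constants.

```go
package main

import "fmt"

// Hypothetical constants: the real names and values are not shown in this thread.
const (
	conditionTypeMetricsAvailable = "MetricsAvailable"
	metricsReasonAvailable        = "MetricsFound"
	metricsReasonUnavailable      = "MetricsMissing"
	metricsMessageAvailable       = "Metrics data is available for this variant"
	metricsMessageUnavailable     = "No metrics data yet: pods may not be ready or metrics not yet scraped"
)

// Condition is a trimmed, hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// metricsAvailableCondition derives the condition from the data on hand
// instead of copying a condition that may never have been set.
// Either source counts as evidence of metrics: an allocation comes from
// metrics collection, and a decision comes from saturation analysis.
func metricsAvailableCondition(hasAllocation, hasDecision bool) Condition {
	if hasAllocation || hasDecision {
		return Condition{
			Type:    conditionTypeMetricsAvailable,
			Status:  "True",
			Reason:  metricsReasonAvailable,
			Message: metricsMessageAvailable,
		}
	}
	return Condition{
		Type:    conditionTypeMetricsAvailable,
		Status:  "False",
		Reason:  metricsReasonUnavailable,
		Message: metricsMessageUnavailable,
	}
}

func main() {
	cases := []struct{ alloc, decision bool }{{true, false}, {false, true}, {false, false}}
	for _, tc := range cases {
		c := metricsAvailableCondition(tc.alloc, tc.decision)
		fmt.Printf("allocation=%v decision=%v -> %s=%s (%s)\n",
			tc.alloc, tc.decision, c.Type, c.Status, c.Reason)
	}
}
```

Because the function always returns a condition, the status can never end up without one, which is the failure mode this PR addresses.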

Copilot

Pull request overview

This PR fixes a bug where the MetricsReady condition was not being displayed in the VariantAutoscaling (VA) status. The issue occurred because the code attempted to copy a condition from a local object where it didn't exist. The fix directly sets the MetricsAvailable condition based on whether metrics data (allocation or decision) is available for the VA.

Changes:

- Replaced condition copying logic with direct condition setting based on metrics availability
- Added logic to check for metrics data presence using allocation or decision existence
- Introduced clear condition messages for both available and unavailable metrics states

Address Copilot review feedback: the message now accurately reflects that metrics data is available, rather than implying active collection.

The previous fix set the condition on a local object that was never persisted. The condition must flow through the DecisionCache to the controller, which is what actually updates the API server. Changes:

- Add MetricsAvailable fields to the VariantDecision struct
- Store metrics availability in the decision cache
- Controller reads from the cache and sets the condition on the VA status
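The hand-off described above can be pictured as a small concurrency-safe map shared by the engine (writer) and the controller (reader). This is a sketch under assumptions: only the names `VariantDecision` and `DecisionCache` come from the PR, while the field names, the `namespace/name` key format, and the `Set`/`Get` methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// VariantDecision is a trimmed stand-in; all field names are assumed.
type VariantDecision struct {
	AcceleratorName string
	NumReplicas     int
	// Metrics availability travels with the decision so the controller,
	// which owns the API server update, can set the condition.
	MetricsAvailable bool
	MetricsReason    string
	MetricsMessage   string
}

// DecisionCache is a minimal map guarded by a RWMutex, keyed by "namespace/name".
type DecisionCache struct {
	mu        sync.RWMutex
	decisions map[string]VariantDecision
}

func NewDecisionCache() *DecisionCache {
	return &DecisionCache{decisions: make(map[string]VariantDecision)}
}

// Set is called by the engine after each analysis cycle.
func (c *DecisionCache) Set(key string, d VariantDecision) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.decisions[key] = d
}

// Get is called by the controller during reconciliation.
func (c *DecisionCache) Get(key string) (VariantDecision, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	d, ok := c.decisions[key]
	return d, ok
}

func main() {
	cache := NewDecisionCache()
	// Engine side: record availability alongside the scaling decision.
	cache.Set("llm-d/my-va", VariantDecision{AcceleratorName: "A100", NumReplicas: 2, MetricsAvailable: true})
	// Controller side: read it back before writing the VA status.
	if d, ok := cache.Get("llm-d/my-va"); ok {
		fmt.Println("MetricsAvailable:", d.MetricsAvailable)
	}
}
```

Setting the condition on a local copy in the engine is invisible to users. Routing it through the cache means the only writer of VA status, the controller, is also the one that sets the condition.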

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

When pods aren't ready yet, the engine skips full status updates because accelerator info is missing. However, we still need to set MetricsAvailable=False so users can see the condition in the VA status. The engine now populates the cache and triggers reconciliation even in this case.
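The pods-not-ready path can be sketched as follows. It is hypothetical throughout: the map stands in for the decision cache, and the buffered channel stands in for whatever mechanism the engine uses to trigger reconciliation, which this thread does not show.

```go
package main

import "fmt"

// VariantDecision is a hypothetical, trimmed stand-in for the cached decision.
type VariantDecision struct {
	AcceleratorName  string
	NumReplicas      int
	MetricsAvailable bool
	MetricsMessage   string
}

// recordMetricsOnly handles the case where accelerator info is missing:
// no allocation can be computed, but the engine still caches a partial
// decision carrying only the metrics fields and requests a reconcile.
func recordMetricsOnly(cache map[string]VariantDecision, trigger chan<- string, key string) {
	cache[key] = VariantDecision{
		MetricsAvailable: false,
		MetricsMessage:   "pods may not be ready or metrics not yet scraped",
	}
	// Non-blocking send: when the buffer is full the request is dropped
	// rather than stalling the engine loop. This sketch assumes a later
	// cycle or periodic resync would pick the key up again.
	select {
	case trigger <- key:
	default:
	}
}

func main() {
	cache := map[string]VariantDecision{}
	trigger := make(chan string, 8)
	recordMetricsOnly(cache, trigger, "llm-d/my-va")
	key := <-trigger
	fmt.Printf("reconcile %s: MetricsAvailable=%v\n", key, cache[key].MetricsAvailable)
}
```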

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

When a cache entry carries only MetricsAvailable=false (no accelerator or replica count), do not update DesiredOptimizedAlloc, because that would fail CRD validation. The MetricsAvailable condition is still applied in all cases.

Address Copilot review feedback:

- Extract duplicated MetricsReason/MetricsMessage strings as constants
- Add a comment explaining the hasAllocation || hasDecision logic

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

- Add a comment explaining the partial decision used for metrics status only
- Allow numReplicas=0 for scale-to-zero scenarios (only require an accelerator)
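Taken together, the last two fixes give the controller a simple rule: write DesiredOptimizedAlloc only when the cached decision names an accelerator (zero replicas is allowed for scale-to-zero), but always apply the condition. The sketch below assumes trimmed, hypothetical stand-ins for the status types; `allocIsValid` and `applyDecision` are illustrative names, not the PR's functions.

```go
package main

import "fmt"

// Hypothetical, trimmed stand-ins for the cached decision and VA status.
type VariantDecision struct {
	AcceleratorName  string
	NumReplicas      int
	MetricsAvailable bool
}

type OptimizedAlloc struct {
	Accelerator string
	NumReplicas int
}

type VAStatus struct {
	DesiredOptimizedAlloc OptimizedAlloc
	MetricsAvailable      string // "True"/"False", stand-in for the condition
}

// allocIsValid requires only an accelerator; NumReplicas == 0 is a legal
// scale-to-zero target, so it is not checked here.
func allocIsValid(d VariantDecision) bool {
	return d.AcceleratorName != ""
}

// applyDecision skips the allocation for metrics-only (partial) decisions,
// which would fail CRD validation, but always sets the condition.
func applyDecision(status *VAStatus, d VariantDecision) {
	if allocIsValid(d) {
		status.DesiredOptimizedAlloc = OptimizedAlloc{Accelerator: d.AcceleratorName, NumReplicas: d.NumReplicas}
	}
	if d.MetricsAvailable {
		status.MetricsAvailable = "True"
	} else {
		status.MetricsAvailable = "False"
	}
}

func main() {
	var status VAStatus
	applyDecision(&status, VariantDecision{MetricsAvailable: false}) // pods not ready yet
	fmt.Printf("%+v\n", status)
	applyDecision(&status, VariantDecision{AcceleratorName: "A100", NumReplicas: 0, MetricsAvailable: true}) // scale to zero
	fmt.Printf("%+v\n", status)
}
```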

- Add MetricsAvailable condition fix to CHANGELOG v0.5.0
- Enhance CRD reference with detailed condition documentation
- Add Operations & Monitoring section to main README
- Include examples of condition usage and kubectl commands
- Link to comprehensive metrics health monitoring guide

This update ensures the MetricsAvailable condition feature (PR #567) is properly documented across all relevant guides.

* fix: always set MetricsAvailable condition in VA status
* fix: use more accurate MetricsAvailable condition message
* fix: persist MetricsAvailable condition via decision cache
* fix: set MetricsAvailable=False even when no accelerator info
* debug: add INFO logging for cache operations
* fix: only update DesiredOptimizedAlloc if values are valid
* refactor: extract MetricsAvailable constants and add explanatory comment
* fix: address additional Copilot review feedback

Copilot AI review requested due to automatic review settings January 9, 2026 20:21

Copilot AI reviewed Jan 9, 2026


Comment thread internal/engines/saturation/engine.go Outdated

Comment thread internal/engines/saturation/engine.go Outdated

clubanderson added 2 commits January 9, 2026 15:25

fix: use more accurate MetricsAvailable condition message

de890f3


Copilot AI review requested due to automatic review settings January 9, 2026 20:41

Copilot AI reviewed Jan 9, 2026


clubanderson added 2 commits January 9, 2026 15:49

debug: add INFO logging for cache operations

8452253

Copilot AI review requested due to automatic review settings January 9, 2026 20:57

Copilot AI reviewed Jan 9, 2026


Comment thread internal/engines/saturation/engine.go

Comment thread internal/engines/saturation/engine.go Outdated

clubanderson added 2 commits January 9, 2026 16:03

fix: only update DesiredOptimizedAlloc if values are valid

9148d1f


refactor: extract MetricsAvailable constants and add explanatory comment

b4198a4


Copilot AI review requested due to automatic review settings January 9, 2026 21:07

Copilot AI reviewed Jan 9, 2026


Comment thread internal/engines/saturation/engine.go

Comment thread internal/controller/variantautoscaling_controller.go Outdated

fix: address additional Copilot review feedback

4f37c7f


clubanderson requested a review from asm582 January 9, 2026 21:25

asm582 approved these changes Jan 9, 2026


clubanderson merged commit 2963cc7 into main Jan 9, 2026
6 checks passed

github-actions bot mentioned this pull request Jan 9, 2026

docs: enhance MetricsAvailable condition documentation #568 (closed)

clubanderson merged 8 commits into main from fix/always-set-metrics-available