fix: always set MetricsAvailable condition in VA status #567
Merged
clubanderson merged 8 commits into main on Jan 9, 2026
The MetricsAvailable condition was not showing in VA status because the
code tried to copy it from a local VA object where it was never set.
Instead of copying a potentially nil condition, we now directly set
MetricsAvailable based on whether we have metrics data:
- True if we have an allocation (from metrics collection) or a decision
(from saturation analysis)
- False otherwise, indicating pods may not be ready or metrics not yet
scraped
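The rule described above can be sketched in Go as follows. This is a minimal illustration, not the PR's actual code: the `Condition` struct is a simplified stand-in for the Kubernetes condition type, and the function name, reason strings, and messages are all hypothetical.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition
// (assumption: the real controller uses the upstream type).
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// Hypothetical reason and message strings; the PR's real constants may differ.
const (
	reasonFound    = "MetricsFound"
	reasonMissing  = "MetricsMissing"
	messageFound   = "Metrics data is available"
	messageMissing = "Metrics data is not available; pods may not be ready or metrics not yet scraped"
)

// metricsAvailableCondition builds the condition directly from what the
// engine knows, instead of copying a condition that may never have been set.
// Either an allocation (from metrics collection) or a decision (from
// saturation analysis) counts as evidence that metrics data exists.
func metricsAvailableCondition(hasAllocation, hasDecision bool) Condition {
	if hasAllocation || hasDecision {
		return Condition{Type: "MetricsAvailable", Status: "True", Reason: reasonFound, Message: messageFound}
	}
	return Condition{Type: "MetricsAvailable", Status: "False", Reason: reasonMissing, Message: messageMissing}
}

func main() {
	fmt.Printf("%+v\n", metricsAvailableCondition(false, true))
	fmt.Printf("%+v\n", metricsAvailableCondition(false, false))
}
```

Because the function always returns a condition, the caller never has to handle a missing one, which is the failure mode the original copy-based code ran into.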
Pull request overview
This PR fixes a bug where the MetricsAvailable condition was not being displayed in the VariantAutoscaling (VA) status. The issue occurred because the code attempted to copy a condition from a local object where it didn't exist. The fix directly sets the MetricsAvailable condition based on whether metrics data (allocation or decision) is available for the VA.
Changes:
- Replaced condition copying logic with direct condition setting based on metrics availability
- Added logic to check for metrics data presence using allocation or decision existence
- Introduced clear condition messages for both available and unavailable metrics states
Address Copilot review feedback - the message now accurately reflects that metrics data is available rather than implying active collection.
The previous fix set the condition on a local object that was never persisted. The condition must flow through the DecisionCache to the controller, which actually updates the API server.
Changes:
- Add MetricsAvailable fields to VariantDecision struct
- Store metrics availability in the decision cache
- Controller reads from cache and sets the condition on VA status
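The flow through the cache might look roughly like the sketch below. Only the names `VariantDecision`, `DecisionCache`, `MetricsAvailable`, `MetricsReason`, and `MetricsMessage` come from the PR text; the cache methods, the `VAStatus` and `Condition` types, and `applyCachedDecision` are hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// VariantDecision carries the engine's result for one VA. The three
// metrics fields mirror the ones the PR describes adding.
type VariantDecision struct {
	MetricsAvailable bool
	MetricsReason    string
	MetricsMessage   string
}

// DecisionCache is a hypothetical in-memory cache shared by engine and controller.
type DecisionCache struct {
	mu      sync.RWMutex
	entries map[string]VariantDecision
}

func NewDecisionCache() *DecisionCache {
	return &DecisionCache{entries: make(map[string]VariantDecision)}
}

// Set is called by the engine after each analysis pass.
func (c *DecisionCache) Set(key string, d VariantDecision) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = d
}

// Get is called by the controller during reconciliation.
func (c *DecisionCache) Get(key string) (VariantDecision, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	d, ok := c.entries[key]
	return d, ok
}

// Condition and VAStatus are simplified stand-ins for the real API types.
type Condition struct{ Type, Status, Reason, Message string }

type VAStatus struct{ Conditions []Condition }

// setCondition replaces an existing condition of the same type or appends it.
func setCondition(s *VAStatus, c Condition) {
	for i := range s.Conditions {
		if s.Conditions[i].Type == c.Type {
			s.Conditions[i] = c
			return
		}
	}
	s.Conditions = append(s.Conditions, c)
}

// applyCachedDecision copies metrics availability from the cache onto the
// status object that the controller will persist. It reports whether a
// cache entry was found.
func applyCachedDecision(cache *DecisionCache, key string, status *VAStatus) bool {
	d, ok := cache.Get(key)
	if !ok {
		return false
	}
	state := "False"
	if d.MetricsAvailable {
		state = "True"
	}
	setCondition(status, Condition{Type: "MetricsAvailable", Status: state, Reason: d.MetricsReason, Message: d.MetricsMessage})
	return true
}

func main() {
	cache := NewDecisionCache()
	cache.Set("default/my-va", VariantDecision{MetricsAvailable: false, MetricsReason: "MetricsMissing", MetricsMessage: "pods not ready"})

	var status VAStatus
	applyCachedDecision(cache, "default/my-va", &status)
	fmt.Printf("%+v\n", status.Conditions)
}
```

The point of the design is that the engine never writes to the API server itself; it only records what it knows, and the controller remains the single writer of VA status.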
When pods aren't ready yet, the engine skips full status updates due to missing accelerator info. However, we still need to set MetricsAvailable=False so users can see the condition in the VA status. Now populates the cache and triggers reconciliation even in this case.
When cache entry only has MetricsAvailable=false (no accelerator/replicas), don't try to update DesiredOptimizedAlloc as it would fail CRD validation. Still apply MetricsAvailable condition in all cases.
Address Copilot review feedback:
- Extract duplicated MetricsReason/MetricsMessage strings as constants
- Add comment explaining hasAllocation || hasDecision logic
- Add comment explaining partial decision for metrics status only
- Allow numReplicas=0 for scale-to-zero scenarios (only require accelerator)
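The guard around DesiredOptimizedAlloc, including the scale-to-zero allowance, could be sketched like this. The struct shapes, field names other than `DesiredOptimizedAlloc`, and the helper names are assumptions made for illustration; the real CRD types are not shown in this conversation.

```go
package main

import "fmt"

// VariantDecision here is a partial, hypothetical view of a cache entry.
// A "metrics status only" entry has an empty Accelerator.
type VariantDecision struct {
	Accelerator      string
	NumReplicas      int
	MetricsAvailable bool
}

// OptimizedAlloc and VAStatus are simplified stand-ins for the CRD types.
type OptimizedAlloc struct {
	Accelerator string
	NumReplicas int
}

type VAStatus struct {
	DesiredOptimizedAlloc OptimizedAlloc
	MetricsAvailable      string // "True" or "False"
}

// hasValidAlloc requires only an accelerator. Zero replicas is accepted
// so that scale-to-zero decisions are still written.
func hasValidAlloc(d VariantDecision) bool {
	return d.Accelerator != ""
}

// applyDecision always records metrics availability, but leaves the
// allocation untouched when the entry is a partial decision.
func applyDecision(status *VAStatus, d VariantDecision) {
	if hasValidAlloc(d) {
		status.DesiredOptimizedAlloc = OptimizedAlloc{Accelerator: d.Accelerator, NumReplicas: d.NumReplicas}
	}
	if d.MetricsAvailable {
		status.MetricsAvailable = "True"
	} else {
		status.MetricsAvailable = "False"
	}
}

func main() {
	status := VAStatus{DesiredOptimizedAlloc: OptimizedAlloc{Accelerator: "A100", NumReplicas: 2}}

	// Partial decision: only the condition changes.
	applyDecision(&status, VariantDecision{MetricsAvailable: false})
	fmt.Printf("%+v\n", status)

	// Scale-to-zero decision: the allocation is updated to zero replicas.
	applyDecision(&status, VariantDecision{Accelerator: "A100", NumReplicas: 0, MetricsAvailable: true})
	fmt.Printf("%+v\n", status)
}
```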
asm582 approved these changes on Jan 9, 2026
github-actions bot added a commit that referenced this pull request on Jan 9, 2026
- Add MetricsAvailable condition fix to CHANGELOG v0.5.0
- Enhance CRD reference with detailed condition documentation
- Add Operations & Monitoring section to main README
- Include examples of condition usage and kubectl commands
- Link to comprehensive metrics health monitoring guide
This update ensures the MetricsAvailable condition feature (PR #567) is properly documented across all relevant guides.
ev-shindin pushed a commit to ev-shindin/workload-variant-autoscaler that referenced this pull request on Jan 14, 2026
* fix: always set MetricsAvailable condition in VA status
The MetricsAvailable condition was not showing in VA status because the
code tried to copy it from a local VA object where it was never set.
Instead of copying a potentially nil condition, we now directly set
MetricsAvailable based on whether we have metrics data:
- True if we have an allocation (from metrics collection) or a decision
(from saturation analysis)
- False otherwise, indicating pods may not be ready or metrics not yet
scraped
* fix: use more accurate MetricsAvailable condition message
Address Copilot review feedback - the message now accurately reflects
that metrics data is available rather than implying active collection.
* fix: persist MetricsAvailable condition via decision cache
The previous fix set the condition on a local object that was never
persisted. The condition must flow through the DecisionCache to the
controller which actually updates the API server.
Changes:
- Add MetricsAvailable fields to VariantDecision struct
- Store metrics availability in the decision cache
- Controller reads from cache and sets the condition on VA status
* fix: set MetricsAvailable=False even when no accelerator info
When pods aren't ready yet, the engine skips full status updates due to
missing accelerator info. However, we still need to set MetricsAvailable=False
so users can see the condition in the VA status.
Now populates the cache and triggers reconciliation even in this case.
* debug: add INFO logging for cache operations
* fix: only update DesiredOptimizedAlloc if values are valid
When cache entry only has MetricsAvailable=false (no accelerator/replicas),
don't try to update DesiredOptimizedAlloc as it would fail CRD validation.
Still apply MetricsAvailable condition in all cases.
* refactor: extract MetricsAvailable constants and add explanatory comment
Address Copilot review feedback:
- Extract duplicated MetricsReason/MetricsMessage strings as constants
- Add comment explaining hasAllocation || hasDecision logic
* fix: address additional Copilot review feedback
- Add comment explaining partial decision for metrics status only
- Allow numReplicas=0 for scale-to-zero scenarios (only require accelerator)
mamy-CS pushed a commit to mamy-CS/inferno-autoscaler that referenced this pull request on Feb 10, 2026
Summary
Test plan
oc get variantautoscaling -A shows MetricsReady column populated