Skip to content

fix: always set MetricsAvailable condition in VA status#567

Merged
clubanderson merged 8 commits intomainfrom
fix/always-set-metrics-available
Jan 9, 2026
Merged

fix: always set MetricsAvailable condition in VA status#567
clubanderson merged 8 commits intomainfrom
fix/always-set-metrics-available

Conversation

@clubanderson
Copy link
Copy Markdown
Contributor

Summary

  • Fix MetricsReady condition not showing in VA status
  • The condition was never being set because the code tried to copy from a local object where it didn't exist
  • Now directly sets the condition based on whether metrics data is available

Test plan

  • Deploy and check oc get variantautoscaling -A shows MetricsReady column populated

  The MetricsAvailable condition was not showing in VA status because the
  code tried to copy it from a local VA object where it was never set.

  Instead of copying a potentially nil condition, we now directly set
  MetricsAvailable based on whether we have metrics data:
  - True if we have an allocation (from metrics collection) or a decision
    (from saturation analysis)
  - False otherwise, indicating pods may not be ready or metrics not yet
    scraped
Copilot AI review requested due to automatic review settings January 9, 2026 20:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug where the MetricsReady condition was not being displayed in the VariantAutoscaling (VA) status. The issue occurred because the code attempted to copy a condition from a local object where it didn't exist. The fix directly sets the MetricsAvailable condition based on whether metrics data (allocation or decision) is available for the VA.

Changes:

  • Replaced condition copying logic with direct condition setting based on metrics availability
  • Added logic to check for metrics data presence using allocation or decision existence
  • Introduced clear condition messages for both available and unavailable metrics states

Comment thread internal/engines/saturation/engine.go Outdated
Comment thread internal/engines/saturation/engine.go Outdated
  Address Copilot review feedback - the message now accurately reflects
  that metrics data is available rather than implying active collection.
  The previous fix set the condition on a local object that was never
  persisted. The condition must flow through the DecisionCache to the
  controller which actually updates the API server.

  Changes:
  - Add MetricsAvailable fields to VariantDecision struct
  - Store metrics availability in the decision cache
  - Controller reads from cache and sets the condition on VA status
Copilot AI review requested due to automatic review settings January 9, 2026 20:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

  When pods aren't ready yet, the engine skips full status updates due to
  missing accelerator info. However, we still need to set MetricsAvailable=False
  so users can see the condition in the VA status.

  Now populates the cache and triggers reconciliation even in this case.
Copilot AI review requested due to automatic review settings January 9, 2026 20:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread internal/engines/saturation/engine.go
Comment thread internal/engines/saturation/engine.go Outdated
  When cache entry only has MetricsAvailable=false (no accelerator/replicas),
  don't try to update DesiredOptimizedAlloc as it would fail CRD validation.
  Still apply MetricsAvailable condition in all cases.
  Address Copilot review feedback:
  - Extract duplicated MetricsReason/MetricsMessage strings as constants
  - Add comment explaining hasAllocation || hasDecision logic
Copilot AI review requested due to automatic review settings January 9, 2026 21:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread internal/engines/saturation/engine.go
Comment thread internal/controller/variantautoscaling_controller.go Outdated
- Add comment explaining partial decision for metrics status only
- Allow numReplicas=0 for scale-to-zero scenarios (only require accelerator)
@clubanderson clubanderson requested a review from asm582 January 9, 2026 21:25
@clubanderson clubanderson merged commit 2963cc7 into main Jan 9, 2026
6 checks passed
github-actions bot added a commit that referenced this pull request Jan 9, 2026
- Add MetricsAvailable condition fix to CHANGELOG v0.5.0
- Enhance CRD reference with detailed condition documentation
- Add Operations & Monitoring section to main README
- Include examples of condition usage and kubectl commands
- Link to comprehensive metrics health monitoring guide

This update ensures the MetricsAvailable condition feature (PR #567)
is properly documented across all relevant guides.
ev-shindin pushed a commit to ev-shindin/workload-variant-autoscaler that referenced this pull request Jan 14, 2026
* fix: always set MetricsAvailable condition in VA status

  The MetricsAvailable condition was not showing in VA status because the
  code tried to copy it from a local VA object where it was never set.

  Instead of copying a potentially nil condition, we now directly set
  MetricsAvailable based on whether we have metrics data:
  - True if we have an allocation (from metrics collection) or a decision
    (from saturation analysis)
  - False otherwise, indicating pods may not be ready or metrics not yet
    scraped

* fix: use more accurate MetricsAvailable condition message

  Address Copilot review feedback - the message now accurately reflects
  that metrics data is available rather than implying active collection.

* fix: persist MetricsAvailable condition via decision cache

  The previous fix set the condition on a local object that was never
  persisted. The condition must flow through the DecisionCache to the
  controller which actually updates the API server.

  Changes:
  - Add MetricsAvailable fields to VariantDecision struct
  - Store metrics availability in the decision cache
  - Controller reads from cache and sets the condition on VA status

* fix: set MetricsAvailable=False even when no accelerator info

  When pods aren't ready yet, the engine skips full status updates due to
  missing accelerator info. However, we still need to set MetricsAvailable=False
  so users can see the condition in the VA status.

  Now populates the cache and triggers reconciliation even in this case.

* debug: add INFO logging for cache operations

* fix: only update DesiredOptimizedAlloc if values are valid

  When cache entry only has MetricsAvailable=false (no accelerator/replicas),
  don't try to update DesiredOptimizedAlloc as it would fail CRD validation.
  Still apply MetricsAvailable condition in all cases.

* refactor: extract MetricsAvailable constants and add explanatory comment

  Address Copilot review feedback:
  - Extract duplicated MetricsReason/MetricsMessage strings as constants
  - Add comment explaining hasAllocation || hasDecision logic

* fix: address additional Copilot review feedback

- Add comment explaining partial decision for metrics status only
- Allow numReplicas=0 for scale-to-zero scenarios (only require accelerator)
mamy-CS pushed a commit to mamy-CS/inferno-autoscaler that referenced this pull request Feb 10, 2026
* fix: always set MetricsAvailable condition in VA status

  The MetricsAvailable condition was not showing in VA status because the
  code tried to copy it from a local VA object where it was never set.

  Instead of copying a potentially nil condition, we now directly set
  MetricsAvailable based on whether we have metrics data:
  - True if we have an allocation (from metrics collection) or a decision
    (from saturation analysis)
  - False otherwise, indicating pods may not be ready or metrics not yet
    scraped

* fix: use more accurate MetricsAvailable condition message

  Address Copilot review feedback - the message now accurately reflects
  that metrics data is available rather than implying active collection.

* fix: persist MetricsAvailable condition via decision cache

  The previous fix set the condition on a local object that was never
  persisted. The condition must flow through the DecisionCache to the
  controller which actually updates the API server.

  Changes:
  - Add MetricsAvailable fields to VariantDecision struct
  - Store metrics availability in the decision cache
  - Controller reads from cache and sets the condition on VA status

* fix: set MetricsAvailable=False even when no accelerator info

  When pods aren't ready yet, the engine skips full status updates due to
  missing accelerator info. However, we still need to set MetricsAvailable=False
  so users can see the condition in the VA status.

  Now populates the cache and triggers reconciliation even in this case.

* debug: add INFO logging for cache operations

* fix: only update DesiredOptimizedAlloc if values are valid

  When cache entry only has MetricsAvailable=false (no accelerator/replicas),
  don't try to update DesiredOptimizedAlloc as it would fail CRD validation.
  Still apply MetricsAvailable condition in all cases.

* refactor: extract MetricsAvailable constants and add explanatory comment

  Address Copilot review feedback:
  - Extract duplicated MetricsReason/MetricsMessage strings as constants
  - Add comment explaining hasAllocation || hasDecision logic

* fix: address additional Copilot review feedback

- Add comment explaining partial decision for metrics status only
- Allow numReplicas=0 for scale-to-zero scenarios (only require accelerator)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants