
cluster-alerts: Add alert rule UnsupportedOrDeprecatedMachineType #3358


Draft: dasionov wants to merge 1 commit into main from add_alert_for_vms_with_deprecated_machine_type

Conversation

dasionov
Contributor

@dasionov dasionov commented Mar 26, 2025

What this PR does / why we need it:

This PR adds the UnsupportedOrDeprecatedMachineType alert to detect when a VM is using a deprecated or unsupported machine type.

Rule Expression:

    kubevirt_vm_info * on(machine_type) group_left(deprecated) (
      kubevirt_supported_machine_types unless kubevirt_supported_machine_types{deprecated="yes"}
    )

How It Works:

  • Filters out machine types that have any deprecated="yes" entry.
  • Joins kubevirt_vm_info with the remaining supported types.
  • Triggers an alert if a VM uses a deprecated or unsupported type.
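
To make the steps above concrete, here is a minimal Go sketch (not part of this PR) that models the two PromQL operators over plain label maps. It assumes that `unless` without an `on(...)` modifier compares full label sets, as the PromQL documentation describes, and that each machine type has a single `kubevirt_supported_machine_types` series. The VM names, machine type names, and the `deprecated="no"` value are hypothetical sample data. Under those assumptions the join keeps the VMs whose machine type survives the filter, which may be worth comparing against the intended alert condition.

```go
package main

import "fmt"

// Series is a simplified stand-in for one Prometheus series: only its labels.
type Series map[string]string

// sameLabels reports whether two label sets are identical.
func sameLabels(a, b Series) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// selectLabel mimics a selector such as metric{deprecated="yes"}.
func selectLabel(in []Series, key, value string) []Series {
	var out []Series
	for _, s := range in {
		if s[key] == value {
			out = append(out, s)
		}
	}
	return out
}

// unless mimics `lhs unless rhs` with default matching on all labels:
// an lhs series is dropped only if rhs holds an identically labelled series.
func unless(lhs, rhs []Series) []Series {
	var out []Series
	for _, l := range lhs {
		dropped := false
		for _, r := range rhs {
			if sameLabels(l, r) {
				dropped = true
				break
			}
		}
		if !dropped {
			out = append(out, l)
		}
	}
	return out
}

// joinOnMachineType mimics `vms * on(machine_type) group_left(deprecated) types`
// and returns the names of the VMs that find a matching machine_type.
func joinOnMachineType(vms, types []Series) []string {
	var names []string
	for _, vm := range vms {
		for _, t := range types {
			if vm["machine_type"] == t["machine_type"] {
				names = append(names, vm["name"])
				break
			}
		}
	}
	return names
}

// matchedVMs evaluates the modelled expression over hypothetical sample data.
func matchedVMs() []string {
	types := []Series{
		{"machine_type": "type-current", "deprecated": "no"},
		{"machine_type": "type-old", "deprecated": "yes"},
	}
	vms := []Series{
		{"name": "vm-a", "machine_type": "type-current"},
		{"name": "vm-b", "machine_type": "type-old"},
		{"name": "vm-c", "machine_type": "type-unknown"},
	}
	remaining := unless(types, selectLabel(types, "deprecated", "yes"))
	return joinOnMachineType(vms, remaining)
}

func main() {
	fmt.Println(matchedVMs()) // prints [vm-a]
}
```

With this sample data only `vm-a`, the VM on the non-deprecated type, is returned; `vm-b` (deprecated type) and `vm-c` (type absent from the supported list) drop out of the join.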

This helps ensure VMs run on supported machine types, preventing potential issues.

Reviewer Checklist

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

none

Release note:

none

@kubevirt-bot kubevirt-bot added the release-note-none (Denotes a PR that doesn't merit a release note.) and dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.) labels Mar 26, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign machadovilaca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot
Contributor

@dasionov: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-hyperconverged-cluster-operator-unit-test-s390x | 7e08196 | link | true | /test pull-hyperconverged-cluster-operator-unit-test-s390x |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hco-bot
Collaborator

hco-bot commented Mar 26, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure



openshift-ci bot commented Mar 26, 2025

@dasionov: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/hco-e2e-operator-sdk-gcp | 7e08196 | link | true | /test hco-e2e-operator-sdk-gcp |
| ci/prow/hco-e2e-operator-sdk-sno-aws | 7e08196 | link | false | /test hco-e2e-operator-sdk-sno-aws |
| ci/prow/hco-e2e-operator-sdk-aws | 7e08196 | link | true | /test hco-e2e-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-aws | 7e08196 | link | true | /test hco-e2e-upgrade-prev-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws | 7e08196 | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-aws |
| ci/prow/hco-e2e-upgrade-operator-sdk-aws | 7e08196 | link | true | /test hco-e2e-upgrade-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-operator-sdk-sno-aws | 7e08196 | link | false | /test hco-e2e-upgrade-operator-sdk-sno-aws |
| ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-aws | 7e08196 | link | true | /test hco-e2e-consecutive-operator-sdk-upgrades-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | 7e08196 | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure |
| ci/prow/hco-e2e-operator-sdk-sno-azure | 7e08196 | link | false | /test hco-e2e-operator-sdk-sno-azure |
| ci/prow/hco-e2e-operator-sdk-azure | 7e08196 | link | true | /test hco-e2e-operator-sdk-azure |
| ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure | 7e08196 | link | true | /test hco-e2e-consecutive-operator-sdk-upgrades-azure |
| ci/prow/hco-e2e-upgrade-operator-sdk-azure | 7e08196 | link | true | /test hco-e2e-upgrade-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | 7e08196 | link | true | /test hco-e2e-upgrade-prev-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure | 7e08196 | link | false | /test hco-e2e-upgrade-operator-sdk-sno-azure |


@dasionov dasionov force-pushed the add_alert_for_vms_with_deprecated_machine_type branch from 7e08196 to 871c2da Compare March 26, 2025 14:30
@dasionov dasionov marked this pull request as draft March 26, 2025 14:31
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label Mar 26, 2025
@dasionov
Contributor Author

/hold

Waiting for kubevirt/kubevirt#14255 to introduce the metric.

@coveralls
Collaborator

coveralls commented Mar 26, 2025

Pull Request Test Coverage Report for Build 14086128705

Details

  • 13 of 13 (100.0%) changed or added relevant lines in 1 file are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 72.235%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| controllers/operands/operandHandler.go | 3 | 86.14% |

Totals
  • Change from base Build 14057157658: 0.02%
  • Covered Lines: 6499
  • Relevant Lines: 8997

💛 - Coveralls

This commit adds the `UnsupportedOrDeprecatedMachineType` alert to
detect when a VM is using a deprecated or unsupported machine type.

Rule Expression:
kubevirt_vm_info * on(machine_type) group_left(deprecated) (
  kubevirt_supported_machine_types unless
  kubevirt_supported_machine_types{deprecated="yes"}
)

How It Works:
- Filters out machine types that have any `deprecated="yes"` entry.
- Joins `kubevirt_vm_info` with the remaining supported types.
- Triggers an alert if a VM uses a deprecated or unsupported type.

This helps ensure VMs run on supported machine types, preventing
potential issues.

Signed-off-by: Daniel Sionov <[email protected]>
@dasionov dasionov force-pushed the add_alert_for_vms_with_deprecated_machine_type branch from 871c2da to a2c3fb6 Compare March 26, 2025 14:41

@@ -59,5 +59,18 @@ func clusterAlerts() []promv1.Rule {
				"operator_health_impact": "none",
			},
		},
		{
			Alert: "UnsupportedOrDeprecatedMachineType",
			Expr:  intstr.FromString(`kubevirt_vm_info * on(machine_type) group_left(deprecated) (kubevirt_supported_machine_types unless kubevirt_supported_machine_types{deprecated="yes"})`),
Collaborator


I think you need to do a count by node so that the alert fires once for each node that has this issue.
I would do something like:

count by (node) (kubevirt_vm_info * on(machine_type) group_left(deprecated) (kubevirt_supported_machine_types{deprecated="yes"}))
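
The grouping proposed in this comment can be illustrated with a small Go sketch (an illustration only, not code from this PR). It assumes each `kubevirt_vm_info` series carries a `node` label, as the suggestion implies; the `vmInfo` type, the VM, node, and machine type names are hypothetical. Like `count by (node)`, it produces one entry per node with at least one matching VM and no entry for nodes without any.

```go
package main

import (
	"fmt"
	"sort"
)

// vmInfo is a hypothetical, simplified view of one kubevirt_vm_info series.
type vmInfo struct {
	Name        string
	Node        string
	MachineType string
}

// countByNode mimics `count by (node) (...)` over the VMs whose machine type
// is marked deprecated. Nodes without such VMs get no entry at all.
func countByNode(vms []vmInfo, deprecated map[string]bool) map[string]int {
	counts := make(map[string]int)
	for _, vm := range vms {
		if deprecated[vm.MachineType] {
			counts[vm.Node]++
		}
	}
	return counts
}

// sampleCounts runs countByNode over hypothetical sample data.
func sampleCounts() map[string]int {
	deprecated := map[string]bool{"type-old": true}
	vms := []vmInfo{
		{Name: "vm-a", Node: "node-1", MachineType: "type-old"},
		{Name: "vm-b", Node: "node-1", MachineType: "type-old"},
		{Name: "vm-c", Node: "node-2", MachineType: "type-current"},
		{Name: "vm-d", Node: "node-3", MachineType: "type-old"},
	}
	return countByNode(vms, deprecated)
}

func main() {
	counts := sampleCounts()
	nodes := make([]string, 0, len(counts))
	for n := range counts {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes) // stable output order; map iteration is random
	for _, n := range nodes {
		fmt.Printf("%s: %d VM(s) on a deprecated machine type\n", n, counts[n])
	}
}
```

Each resulting entry corresponds to one alert instance, so a node with many affected VMs raises a single alert rather than one per VM.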

			},
			Labels: map[string]string{
				"severity":               "warning",
				"operator_health_impact": "none",
Collaborator


Isn't this a blocker for an upgrade? Shouldn't we degrade the operator and set operator_health_impact to warning?

@kubevirt-bot
Contributor

PR needs rebase.


@kubevirt-bot kubevirt-bot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label May 21, 2025
Labels
  • dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.)
  • do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
  • needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/M

5 participants