Report recommended CPU models for heterogeneous clusters#3944

Open
dankenigsberg wants to merge 1 commit into kubevirt:main from dankenigsberg:recommended-cpu

Conversation

@dankenigsberg
Member

@dankenigsberg dankenigsberg commented Dec 30, 2025

In heterogeneous clusters with multiple CPU vendors and generations, choosing the best CPU model for VMs is challenging:

  • Using a newer CPU model limits VM placement to a subset of nodes
  • Using an older model impacts guest performance unnecessarily

This change adds automatic CPU model recommendations to the HyperConverged status. The recommendations are sorted by a weighted score that balances:

  • PassMark performance score (50%) - for guest performance
  • Available CPU cores (20%) - for workload capacity
  • Available memory (15%) - for workload capacity
  • Node count (15%) - for availability and redundancy
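
The weighting above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the `modelStats` type, the `weightedScore` and `rank` functions, and the choice to normalize each metric by its cluster-wide maximum are all hypothetical. Only the four weights are taken from the description.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// modelStats is a hypothetical per-model aggregate; the field names are illustrative.
type modelStats struct {
	Name     string
	PassMark float64 // approximate benchmark score of the model
	Cores    float64 // CPU cores available on nodes supporting the model
	MemoryGi float64 // memory available on those nodes, in GiB
	Nodes    float64 // number of nodes supporting the model
}

// ratio returns v/maxV, or 0 when maxV is not positive.
func ratio(v, maxV float64) float64 {
	if maxV <= 0 {
		return 0
	}
	return v / maxV
}

// weightedScore applies the 50/20/15/15 weights from the PR description.
// Normalizing each metric by its maximum is an assumption of this sketch.
func weightedScore(m, maxima modelStats) float64 {
	return 0.50*ratio(m.PassMark, maxima.PassMark) +
		0.20*ratio(m.Cores, maxima.Cores) +
		0.15*ratio(m.MemoryGi, maxima.MemoryGi) +
		0.15*ratio(m.Nodes, maxima.Nodes)
}

// rank returns a copy of models sorted by descending weighted score.
func rank(models []modelStats) []modelStats {
	var maxima modelStats
	for _, m := range models {
		maxima.PassMark = math.Max(maxima.PassMark, m.PassMark)
		maxima.Cores = math.Max(maxima.Cores, m.Cores)
		maxima.MemoryGi = math.Max(maxima.MemoryGi, m.MemoryGi)
		maxima.Nodes = math.Max(maxima.Nodes, m.Nodes)
	}
	out := append([]modelStats(nil), models...)
	sort.SliceStable(out, func(i, j int) bool {
		return weightedScore(out[i], maxima) > weightedScore(out[j], maxima)
	})
	return out
}

func main() {
	// Placeholder numbers, not real benchmark data.
	models := []modelStats{
		{Name: "older-model", PassMark: 10000, Cores: 96, MemoryGi: 384, Nodes: 6},
		{Name: "newer-model", PassMark: 30000, Cores: 48, MemoryGi: 192, Nodes: 3},
	}
	for _, m := range rank(models) {
		fmt.Println(m.Name)
	}
}
```

With these placeholder inputs the newer model ranks first: its benchmark advantage outweighs the older model's wider availability, which is the trade-off the weights are meant to arbitrate.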

The top recommendations appear in status.nodeInfo.recommendedCpuModels, helping cluster administrators make informed decisions when setting spec.defaultCPUModel.

The CPU model data is gathered from node labels (cpu-model.node.kubevirt.io/*) set by KubeVirt's virt-handler, and is refreshed automatically whenever nodes change, as well as hourly.
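
As a rough illustration of gathering models from those labels, the sketch below counts how many nodes advertise each model. It works on plain label maps rather than real Node objects, and it assumes that the model name is the label key's suffix and that a supported model carries the value "true". The function name `countNodesPerModel` and the model names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cpuModelLabelPrefix is the label prefix named in the PR description.
const cpuModelLabelPrefix = "cpu-model.node.kubevirt.io/"

// countNodesPerModel takes one label map per node and returns, for each CPU
// model, the number of nodes that advertise it.
// Assumption: a model is supported only when the label value is "true".
func countNodesPerModel(nodeLabels []map[string]string) map[string]int {
	counts := make(map[string]int)
	for _, labels := range nodeLabels {
		for key, value := range labels {
			if !strings.HasPrefix(key, cpuModelLabelPrefix) || value != "true" {
				continue
			}
			counts[strings.TrimPrefix(key, cpuModelLabelPrefix)]++
		}
	}
	return counts
}

func main() {
	// Placeholder model names, used only for illustration.
	nodes := []map[string]string{
		{cpuModelLabelPrefix + "Model-A": "true", cpuModelLabelPrefix + "Model-B": "true"},
		{cpuModelLabelPrefix + "Model-A": "true", "kubernetes.io/hostname": "node-2"},
	}
	fmt.Println(countNodesPerModel(nodes))
}
```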

Assisted-by: claude-4-sonnet

Reviewer Checklist

Reviewers are supposed to review the PR for every aspect below one by one. To check an item means the PR is either "OK" or "Not Applicable" in terms of that item. All items are supposed to be checked before merging a PR.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

https://issues.redhat.com/browse/CNV-42906

Release note:

HCO CR status recommends CPU models for VMs

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

1 similar comment
@openshift-ci

openshift-ci bot commented Dec 30, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 30, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nunnatsa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dankenigsberg
Member Author

/test all

Collaborator

@nunnatsa nunnatsa left a comment

Thanks for fixing the comment. I added some more inline.

Please also handle the linter issues (redefinition of the builtin function max).
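
For context, `max` has been a built-in function since Go 1.21, so a local identifier named `max` shadows it and is flagged by linters. The flagged code itself is not shown in this conversation. The sketch below only illustrates the usual fix, renaming the local, and the function `highest` is hypothetical.

```go
package main

import "fmt"

// Flagged pattern, kept as a comment so this example stays lint-clean:
//
//	max := 0.0 // shadows the builtin max
//
// highest returns the largest value in scores, or 0 for an empty slice.
// It assumes scores are non-negative.
func highest(scores []float64) float64 {
	maxScore := 0.0 // renamed so the builtin is not redefined
	for _, s := range scores {
		if s > maxScore {
			maxScore = s
		}
	}
	return maxScore
}

func main() {
	fmt.Println(highest([]float64{1, 3, 2}))
}
```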

@coveralls
Collaborator

coveralls commented Jan 2, 2026

Pull Request Test Coverage Report for Build 20880735694

Details

  • 122 of 134 (91.04%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 76.215%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
controllers/hyperconverged/hyperconverged_controller.go | 1 | 4 | 25.0%
pkg/internal/nodeinfo/cpu_models.go | 119 | 128 | 92.97%

Totals
Change from base Build 20851170488: 0.2%
Covered Lines: 8623
Relevant Lines: 11314

💛 - Coveralls

@dankenigsberg dankenigsberg marked this pull request as ready for review January 4, 2026 16:02
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2026
@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp

@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure

@hco-bot
Collaborator

hco-bot commented Jan 4, 2026

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

Collaborator

@nunnatsa nunnatsa left a comment

Thanks @dankenigsberg

Added some more comments.

Comment on lines +517 to +525

if cpuModels := nodeinfo.GetRecommendedCpuModels(); !slices.EqualFunc(req.Instance.Status.NodeInfo.RecommendedCpuModels, cpuModels, func(a, b hcov1beta1.CpuModelInfo) bool {
	return a.Name == b.Name && a.Benchmark == b.Benchmark && a.Nodes == b.Nodes &&
		((a.CPU == nil && b.CPU == nil) || (a.CPU != nil && b.CPU != nil && a.CPU.Equal(*b.CPU))) &&
		((a.Memory == nil && b.Memory == nil) || (a.Memory != nil && b.Memory != nil && a.Memory.Equal(*b.Memory)))
}) {
	req.Instance.Status.NodeInfo.RecommendedCpuModels = cpuModels
	req.StatusDirty = true
}
Collaborator

  1. We already have a comparison function in the nodeinfo package. We can't expose it for reuse because it's in an internal package, but maybe we can add it as a method of the CpuModel type instead?
  2. This code is not tested. Can we have a few unit tests for it? We can use a simple mock, as done for the rest of the functions in the nodeinfo package, to return any desired value.

In heterogeneous clusters with multiple CPU vendors and generations,
choosing the best CPU model for VMs is challenging:

- Using a newer CPU model limits VM placement to a subset of nodes
- Using an older model impacts guest performance unnecessarily

This change adds automatic CPU model recommendations to the
HyperConverged status. The recommendations are sorted by a weighted
score that balances:

- PassMark performance score (50%) - for guest performance
- Available CPU cores (20%) - for workload capacity
- Available memory (15%) - for workload capacity
- Node count (15%) - for availability and redundancy

The top recommendations appear in status.nodeInfo.recommendedCpuModels,
helping cluster administrators make informed decisions when setting
spec.defaultCPUModel.

The CPU model data is gathered from node labels (cpu-model.node.kubevirt.io/*)
set by KubeVirt's virt-handler, and is automatically refreshed when nodes
change or hourly.

Assisted-by: claude-4-sonnet
Signed-off-by: Dan Kenigsberg <danken@redhat.com>

@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp
hco-e2e-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-gcp, ci/prow/hco-e2e-operator-sdk-sno-azure

@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-azure

@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-consecutive-operator-sdk-upgrades-aws lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

@openshift-ci

openshift-ci bot commented Jan 10, 2026

@dankenigsberg: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | f435588 | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure
ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | f435588 | link | true | /test hco-e2e-upgrade-prev-operator-sdk-azure

Full PR test history. Your PR dashboard.

@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@hco-bot
Collaborator

hco-bot commented Jan 10, 2026

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

@Barakmor1
Member

Thank you for this important effort. Have you considered clusters with multiple vendors or architectures? Would this recommendation still be valid in those cases?

@dankenigsberg
Member Author

> Thank you for this important effort. Have you considered clusters with multiple vendors or architectures? Would this recommendation still be valid in those cases?

Yes: if most of the CPU power of the cluster comes from an "exotic" architecture, the code would recommend models from that architecture.

)

// cpuModelPassMarkScores maps libvirt CPU model names to approximate PassMark scores.
// Keys must match exactly the cpu-model.node.kubevirt.io/* label values.
Member

@Barakmor1 Barakmor1 Jan 22, 2026

Just a heads-up that the keys in this list tend to change with different libvirt versions, so this would require continued maintenance.
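
One way to make that maintenance burden visible, sketched here under assumptions, is to report model names seen on nodes that have no entry in the score table, so that drift after a libvirt update surfaces in logs or tests. The map contents, the model names, and the function `unscoredModels` are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// passMarkScores is a hypothetical excerpt of a score table.
// The names and values are placeholders, not real benchmark data.
var passMarkScores = map[string]int{
	"Model-A": 1000,
	"Model-B": 2000,
}

// unscoredModels returns the sorted, de-duplicated model names that were seen
// on nodes but have no entry in scores.
func unscoredModels(seen []string, scores map[string]int) []string {
	missing := make(map[string]struct{})
	for _, name := range seen {
		if _, ok := scores[name]; !ok {
			missing[name] = struct{}{}
		}
	}
	out := make([]string, 0, len(missing))
	for name := range missing {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	seen := []string{"Model-B", "Model-C", "Model-C"}
	fmt.Println(unscoredModels(seen, passMarkScores))
}
```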

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 22, 2026
@kubevirt-bot
Contributor

PR needs rebase.

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL

6 participants