hypervisor: refactor VMM overhead into shared library and use in eve-k#5681

Open
zedi-pramodh wants to merge 2 commits into lf-edge:master from zedi-pramodh:vmm-overhead

Conversation

@zedi-pramodh

This PR adds VMM overhead support to eve-k.

Extract VMM overhead calculation functions from kvm.go into a new shared file vmm_overhead.go (no build tag) so they are available to all hypervisor backends:

  • vmmOverhead: top-level function respecting global config, per-app override, and automatic estimation priority
  • estimatedVMMOverhead: combines all overhead components
  • ramVMMOverhead: 2.5% of domain RAM for page table overhead
  • qemuVMMOverhead: 20 MB for QEMU binaries/libraries
  • mmioVMMOverhead: 1% of total MMIO size for GPU/VGA passthrough devices
  • cpuVMMOverhead: 3 MB per vCPU
  • undefinedVMMOverhead: 350 MB base overhead for QEMU internal use

kvm.go: remove the above functions (now in vmm_overhead.go); CountMemOverhead wrapper is unchanged.

kubevirt.go: add CountMemOverhead method on kubevirtContext that delegates to the shared vmmOverhead function, replacing the inherited zero-overhead ctrdContext implementation. Add uuid import.

How to test and validate this PR

  1. Create a VM with a VMM overhead value.
  2. Verify that the value is calculated and published in the zedmanager AppInstanceStatus.
  3. Verify the VM still starts with the actual RAM value provided, and that getRemainingMemory(ctx) uses this overhead in the remaining-memory calculation.

Changelog notes

Users can now set a VMM overhead value in the AppInstance config for eve-k.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

And last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

…nto KubeVirt


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
Previously AllocatedMB in KubeVirt DomainMetric was set to the Kubernetes
resource limit (guest RAM only), so reported metrics and device-level
AllocatedAppsMB undercounted total host memory consumption by the full
VMM overhead.

Calculate and store the VMM overhead (via the shared vmmOverhead function)
in vmiMetaData when the domain config is created — both for VMI replicasets
(CreateReplicaVMIConfig) and pod replicasets (CreateReplicaPodConfig).

In GetDomsCPUMem (VMI virt-handler metrics path), add the stored overhead
to AllocatedMB in the post-processing loop after all metrics are filled.

In getPodMetrics (pod replicaset path), add the stored overhead to
AllocatedMB before constructing the DomainMetric.

This makes KubeVirt consistent with KVM, where AllocatedMB is read from
the cgroup HierarchicalMemoryLimit which naturally includes QEMU overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
@zedi-pramodh
Author

Verified the domain metrics to confirm the overhead was added:
[kube] root@c2c8906f-4230-4fd6-99c7-4456fcebc6d4:/run/domainmgr/DomainMetric$ cat 5454a0a1-7da4-4278-bb5a-a6c91c09fbb3.json | jq
{
  "UUIDandVersion": {
    "UUID": "5454a0a1-7da4-4278-bb5a-a6c91c09fbb3",
    "Version": "1"
  },
  "CPUTotalNs": 0,
  "CPUScaled": 2,
  "AllocatedMB": 2648,    <-- includes the 600 MB overhead
  "UsedMemory": 305,
  "MaxUsedMemory": 305,
  "AvailableMemory": 1598,
  "UsedMemoryPercent": 0.11518126888217523,
  "LastHeard": "2026-03-17T00:17:13.383722193Z",
  "Activated": true,
  "NodeName": ""
}

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 29.49%. Comparing base (2281599) to head (1103db0).
⚠️ Report is 341 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5681      +/-   ##
==========================================
+ Coverage   19.52%   29.49%   +9.96%     
==========================================
  Files          19       18       -1     
  Lines        3021     2417     -604     
==========================================
+ Hits          590      713     +123     
+ Misses       2310     1552     -758     
- Partials      121      152      +31     


Copilot AI left a comment


Pull request overview

This PR refactors KVM VMM memory overhead estimation into a shared hypervisor helper and applies the same overhead calculation to the KubeVirt (“eve-k”) backend so that AppInstance memory accounting can include hypervisor overhead.

Changes:

  • Extracted VMM overhead calculation helpers from kvm.go into a new shared vmm_overhead.go.
  • Updated KubeVirt backend to report/track VMM memory overhead (including adding it into AllocatedMB metrics output).
  • Removed the duplicated overhead helpers from kvm.go (KVM continues to call the same vmmOverhead wrapper).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
pkg/pillar/hypervisor/vmm_overhead.go New shared VMM overhead estimation implementation (RAM/QEMU/MMIO/CPU/base components + global/per-app override priority).
pkg/pillar/hypervisor/kvm.go Removes overhead helper implementations now moved into the shared file.
pkg/pillar/hypervisor/kubevirt.go Adds CountMemOverhead for kubevirt and propagates overhead into domain/pod metrics accounting via stored metadata.


Comment on lines +199 to +200:

    result, err := vmmOverhead(domainName, domainUUID, domainRAMSize, vmmMaxMem,
        domainMaxCpus, domainVCpus, domainIoAdapterList, aa, globalConfig)
    return uint64(result), err

Comment on lines +515 to +521:

    overhead, err := vmmOverhead(domainName, config.UUIDandVersion.UUID,
        int64(config.Memory), int64(config.VMMMaxMem),
        int64(config.MaxCpus), int64(config.VCpus), config.IoAdapterList, aa, nil)
    if err != nil {
        logrus.Warnf("CreateReplicaVMIConfig: vmmOverhead failed for %s: %v, using 0", domainName, err)
        overhead = 0
    }

Comment on lines +1439 to +1445:

    overhead, err := vmmOverhead(domainName, config.UUIDandVersion.UUID,
        int64(config.Memory), int64(config.VMMMaxMem),
        int64(config.MaxCpus), int64(config.VCpus), config.IoAdapterList, aa, nil)
    if err != nil {
        logrus.Warnf("CreateReplicaPodConfig: vmmOverhead failed for %s: %v, using 0", domainName, err)
        overhead = 0
    }

Further review-comment anchors (fragments):

    // Add VMM overhead to AllocatedMB so it reflects total host memory
    // consumption (guest RAM + hypervisor overhead), consistent with KVM.
    if vmis, ok := ctx.vmiList[n]; ok && vmis.memOverhead > 0 {
        r.AllocatedMB += uint32(vmis.memOverhead / (1024 * 1024))

    // overhead so it reflects total host memory consumption, consistent with KVM.
    allocatedMB := uint32(memoryLimits.Value()) / BytesInMegabyte
    if vmis != nil && vmis.memOverhead > 0 {
        allocatedMB += uint32(vmis.memOverhead / uint64(BytesInMegabyte))

    overhead = vmmMaxMem << 10

    // Global node setting has a higher priority.
    // Note: globalConfig can be nil only in unit tests.

    // memory allocated by QEMU for its own purposes.
    // statistical analysis did not reveal any correlation between
