fix: check group map before enable pci device passthrough #97

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

Member

@WebberHuang1118 WebberHuang1118 commented Nov 15, 2024

Problem:
After a node reboot, the PCIDevice device plugin uses an out-of-date IOMMU group.

Solution:
Check the IOMMU group map before enabling the PCIDevice device plugin.

Related Issue:
harvester/harvester#6892

Test plan:

  • Enable a PCIDevice
  • Create a VM and attach the PCIDevice
  • After the VM starts successfully, reboot the node
  • Check that the VM can still start successfully after the node reboot


ihcsim commented Nov 15, 2024

For my own understanding: if the device's status.iommuGroup is stale, should it be updated to match the one found in the group path, as the PCI device controller does in

devCopy.Status.Update(dev, nodename, iommuGroupMap) // update the in-memory CR with the current PCI info

Looks like the pci device controller is also doing the same group map check in

// Build up the IOMMU group map
iommuGroupPaths, err := iommu.GroupPaths()
if err != nil {
	return err
}
iommuGroupMap := iommu.GroupMapForPCIDevices(iommuGroupPaths)
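To make the snippet above easier to follow, here is a minimal sketch of how a group map could be derived from sysfs-style paths. It is not the project's implementation: `groupMapFromPaths` is a hypothetical stand-in for `iommu.GroupMapForPCIDevices`, and the path layout `/sys/kernel/iommu_groups/<group>/devices/<pci-address>` and the `map[string]int` return type are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// groupMapFromPaths is a hypothetical helper (not the real
// iommu.GroupMapForPCIDevices). It assumes every path has the form
// /sys/kernel/iommu_groups/<group>/devices/<pci-address>
// and returns a map from PCI address to IOMMU group number.
func groupMapFromPaths(paths []string) map[string]int {
	groups := make(map[string]int)
	for _, p := range paths {
		parts := strings.Split(strings.Trim(p, "/"), "/")
		// Expected segments: sys, kernel, iommu_groups, <group>, devices, <address>.
		if len(parts) != 6 || parts[2] != "iommu_groups" || parts[4] != "devices" {
			continue // skip anything that does not match the assumed layout
		}
		group, err := strconv.Atoi(parts[3])
		if err != nil {
			continue // skip non-numeric group directories
		}
		groups[parts[5]] = group
	}
	return groups
}

func main() {
	// Example paths are made up for illustration.
	paths := []string{
		"/sys/kernel/iommu_groups/12/devices/0000:04:00.0",
		"/sys/kernel/iommu_groups/13/devices/0000:05:00.0",
	}
	fmt.Println(groupMapFromPaths(paths))
}
```

Because the map is rebuilt from the node's current state on every call, it reflects any group renumbering that happened across a reboot or OS upgrade.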

@ibrokethecloud
Collaborator

@ihcsim the issue is that when the IOMMU group is stale, likely after an OS upgrade, the device plugin gets set up with the wrong IOMMU group, which results in the wrong IOMMU group being passed to the launcher pod.

With this change, we wait for the periodic device reconcile to update the pcidevice object, which blocks an incorrect device plugin from being set up.
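The guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: `groupIsCurrent` is a hypothetical helper, the recorded group is assumed to be stored as a decimal string in status.iommuGroup, and the current group map is assumed to map PCI addresses to group numbers.

```go
package main

import (
	"fmt"
	"strconv"
)

// groupIsCurrent is a hypothetical check. It reports whether the IOMMU group
// recorded on the PCIDevice object (assumed to be a decimal string) matches
// the group currently reported by the node for the same PCI address.
func groupIsCurrent(address, recordedGroup string, current map[string]int) bool {
	group, found := current[address]
	if !found {
		return false // device not present in the current map: treat as stale
	}
	return strconv.Itoa(group) == recordedGroup
}

func main() {
	// Made-up state: the device moved from group 12 to group 15 after a reboot.
	current := map[string]int{"0000:04:00.0": 15}

	for _, recorded := range []string{"12", "15"} {
		if groupIsCurrent("0000:04:00.0", recorded, current) {
			fmt.Printf("recorded group %s is current: enable the device plugin\n", recorded)
		} else {
			fmt.Printf("recorded group %s is stale: skip and wait for the reconcile\n", recorded)
		}
	}
}
```

Skipping rather than correcting the object in place leaves the periodic device reconcile as the only writer of status.iommuGroup, which matches the behavior described in the comment above.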

Collaborator

@ibrokethecloud ibrokethecloud left a comment


lgtm. thanks.

@WebberHuang1118 WebberHuang1118 merged commit 0cd5672 into harvester:master Dec 17, 2024
5 checks passed
4 participants