fix(pillar): unbind IOMMU group siblings during PCI passthrough reserve #5670
rucoder wants to merge 2 commits into lf-edge:master
When reserving a PCI device for VFIO passthrough, PCIReserveGeneric only bound the target device to vfio-pci, leaving other devices in the same IOMMU group with their kernel drivers. Any kernel driver bound to an IOMMU group sibling calls iommu_device_use_default_domain() during probe, which increments the group's DMA owner_cnt. This makes the VFIO group non-viable because iommu_group_dma_owner_claimed() returns true, and QEMU refuses to use the group for passthrough.

This was exposed after upgrading from EVE-OS 14.5.3 (kernel 6.1.112) to 16.0.0 (kernel 6.12.49) because CONFIG_I2C_I801 was added as a module in the new kernel. The i801_smbus driver auto-loaded and bound to the SMBus controller (80:1f.4) sharing IOMMU group 19 with the target NIC (80:1f.6), making passthrough impossible.

Fix by enumerating actual IOMMU group members from sysfs and unbinding kernel drivers from all sibling devices before binding the target to vfio-pci. On release, re-probe siblings so their original drivers rebind.

The IOMMU group helpers are implemented on an iommuGroupContext struct with configurable sysfs paths to enable unit testing with fake sysfs.

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
Add 9 test cases covering IOMMU group operations using a fake sysfs tree in a temp directory:
- IOMMU group discovery from sysfs symlinks
- group member enumeration (multi-device and single-device groups)
- vfio-pci driver detection via os.SameFile
- sibling unbind (kernel drivers unbound, vfio-pci and unbound skipped)
- sibling reprobe (unbound devices probed, already-bound skipped)

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
8e92dc8 to 6fd1719
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #5670      +/-   ##
==========================================
+ Coverage   19.52%   29.49%   +9.96%
==========================================
  Files          19       18       -1
  Lines        3021     2417     -604
==========================================
+ Hits          590      713     +123
+ Misses       2310     1552     -758
- Partials      121      152      +31
@rucoder I'm not sure we should deliberately unbind sibling devices like this. You might end up unbinding important devices without notice and cause the system to freeze or crash. Users must be aware of devices in the same IOMMU group that cannot be split by the ACS patch; if they really want to perform the passthrough, they should pass through all the devices of the group. We have seen different cases of the same situation where the sibling device was a system device or a Thunderbolt controller... so I think this approach might be error prone. I certainly see the advantages as well, which is why I'm not against it, but I'd like to discuss it a bit more. ☝️ @eriknordmark
@rene yes, but EVE only has some devices in the model, and we unbind only the devices we know about. The kernel upgrade introduced a new driver that did not exist before, so previously there was no problem passing through the whole group. Now we are still trying to pass through the whole group, BUT one device got a driver and EVE doesn't know about it. So nothing changed in the way we treat the group ("all or nothing"), and the group content is exactly the same on both EVE versions, but since we do not care about devices unknown to EVE, we can no longer pass through the whole group: the driver prevents it.
@rene the fix works for the pass-through issue, but a WD reset was reported. Converting to draft for now.
Description
When reserving a PCI device for VFIO passthrough, PCIReserveGeneric only bound the target device to vfio-pci, leaving other devices in the same IOMMU group with their kernel drivers. Any kernel driver bound to an IOMMU group sibling calls iommu_device_use_default_domain() during probe, which increments the group's DMA owner_cnt. This makes the VFIO group non-viable because iommu_group_dma_owner_claimed() returns true, and QEMU refuses to use the group for passthrough with:

vfio: group <N> is not viable

This was exposed after upgrading from EVE-OS 14.5.3 (kernel 6.1.112) to 16.0.0 (kernel 6.12.49) because CONFIG_I2C_I801 was added as a module in the new kernel. The i801_smbus driver auto-loaded and bound to the SMBus controller (80:1f.4) sharing IOMMU group 19 with the target NIC (80:1f.6), making passthrough impossible.

Fix by enumerating actual IOMMU group members from sysfs and unbinding kernel drivers from all sibling devices before binding the target to vfio-pci. On release, re-probe siblings so their original drivers rebind.

The IOMMU group helpers are implemented on an iommuGroupContext struct with configurable sysfs paths, enabling unit testing with a fake sysfs tree.

Changes:
- iommuGroupContext struct with configurable sysfs paths and methods: getIOMMUGroup, getMembers, isBoundToVfioPci, unbindSiblings, reprobeSiblings
- PCIReserveGeneric: unbind IOMMU group siblings before binding target to vfio-pci
- PCIReleaseGeneric: re-probe siblings after releasing the target device
- Unit tests for IOMMU group discovery, member enumeration, vfio-pci driver detection, sibling unbind, and sibling reprobe using a fake sysfs tree
How to test and validate this PR
- "vfio: group 19 is not viable" when i801_smbus was bound to a sibling device
- go test ./hypervisor/... -run "TestGetIOMMU|TestGetMembers|TestIsBound|TestUnbind|TestReprobe" -v

Changelog notes
Fixed VFIO PCI passthrough failure ("group is not viable") that occurred when kernel drivers were bound to sibling devices in the same IOMMU group. This commonly affected systems after upgrading to EVE-OS 16.0.0 where the i801_smbus driver was newly enabled.