[VFIO] add basic implementation #5870
Open
ShadowCurse wants to merge 23 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5870 +/- ##
==========================================
- Coverage 83.00% 80.99% -2.02%
==========================================
Files 277 280 +3
Lines 30106 31189 +1083
==========================================
+ Hits 24989 25260 +271
- Misses 5117 5929 +812
f6d6fea to 50e789e
2f84f01 to a21e87e
b2ea5ea to 528e62b
Add the VfioConfig and VfioConfigs types for describing VFIO device configuration. Wire them into VmResources and VmmConfig so that VFIO devices can be specified before boot. Actual device setup will be added in later commits. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO BAR regions containing MSI-X table/PBA must be split into mmappable and emulated parts. KVM memory slots require host-page alignment, but MSI-X structures can sit at arbitrary offsets within a BAR. Add align_up_host_page, align_down_host_page, and offset_from_lower_host_page helpers to expand emulated regions to page boundaries. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
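The page-alignment helpers named in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names come from the commit message, but the signatures, the fixed `HOST_PAGE_SIZE` constant, and the `expand_to_host_pages` helper are assumptions made for the example.

```rust
/// Assumed host page size. Real code would likely query it at runtime
/// instead of hard-coding 4 KiB.
const HOST_PAGE_SIZE: u64 = 4096;

/// Round `addr` up to the next host-page boundary (no-op if already aligned).
/// Assumes `addr + HOST_PAGE_SIZE - 1` does not overflow.
fn align_up_host_page(addr: u64) -> u64 {
    (addr + HOST_PAGE_SIZE - 1) & !(HOST_PAGE_SIZE - 1)
}

/// Round `addr` down to the previous host-page boundary.
fn align_down_host_page(addr: u64) -> u64 {
    addr & !(HOST_PAGE_SIZE - 1)
}

/// Distance of `addr` from the page boundary at or below it.
fn offset_from_lower_host_page(addr: u64) -> u64 {
    addr & (HOST_PAGE_SIZE - 1)
}

/// Hypothetical helper: expand the byte range `[offset, offset + len)`
/// so that it starts and ends on host-page boundaries.
/// Returns the expanded `(offset, len)`.
fn expand_to_host_pages(offset: u64, len: u64) -> (u64, u64) {
    let start = align_down_host_page(offset);
    let end = align_up_host_page(offset + len);
    (start, end - start)
}
```

Under these assumptions, an MSI-X table at BAR offset `0x1800` with length `0x100` would be widened to the full page starting at `0x1000`, which is the region that would then be emulated rather than mmapped.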
Add the rust-vmm vfio-bindings (0.6.2) and vfio-ioctls (0.6.0) crates that provide wrappers around VFIO kernel interfaces. These are needed by the VFIO code. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add the core VFIO passthrough implementation. This allows physical PCI devices bound to vfio-pci on the host to be presented to the guest with minimal overhead. The implementation covers:
- PCI config space: most reads/writes are proxied to the physical device. BARs, the MSI-X capability, and select extended capabilities are emulated or masked by Firecracker.
- BAR regions: device MMIO regions are mmap'd from the VFIO device fd and mapped into guest address space as KVM memory slots. BARs containing the MSI-X table/PBA are split around the emulated regions using either sparse-mmap caps or manual hole calculation.
- MSI-X interrupts: the table and PBA are emulated in Firecracker. Physical device interrupts are delivered via eventfds wired through KVM irqfd.
- DMA: guest RAM regions are mapped into the VFIO container's IOMMU so the device can DMA directly to guest memory.
Only MSI-X interrupts are supported. IO BARs, ROM BARs, legacy INTx, and MSI (non-X) are not handled. Hot-plug/unplug of VFIO devices is also not supported at this point, so there is no cleanup for created VFIO devices. The only cleanup-related part is the device setup code, which ensures that all resources are released if an error occurs during device set-up. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
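The "manual hole calculation" mentioned for BAR regions can be illustrated with a small sketch. It assumes the emulated regions (MSI-X table and PBA) have already been expanded to host-page boundaries; the `Range` type and the `mmappable_ranges` function are hypothetical names for this example and are not taken from the PR.

```rust
/// Hypothetical byte range inside a BAR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Range {
    offset: u64,
    len: u64,
}

/// Given the BAR size and the page-aligned regions that must be emulated,
/// return the remaining ranges that could be mmapped, in ascending order.
fn mmappable_ranges(bar_size: u64, emulated: &[Range]) -> Vec<Range> {
    // Process the emulated regions in ascending offset order.
    let mut holes: Vec<Range> = emulated.to_vec();
    holes.sort_by_key(|r| r.offset);

    let mut out = Vec::new();
    // `cursor` is the first BAR offset not yet classified.
    let mut cursor = 0u64;
    for hole in holes {
        // Clamp the hole to the BAR so malformed input cannot escape it.
        let start = hole.offset.min(bar_size);
        let end = hole.offset.saturating_add(hole.len).min(bar_size);
        if start > cursor {
            out.push(Range { offset: cursor, len: start - cursor });
        }
        // Overlapping or adjacent holes only ever move the cursor forward.
        cursor = cursor.max(end);
    }
    if cursor < bar_size {
        out.push(Range { offset: cursor, len: bar_size - cursor });
    }
    out
}
```

For a 16 KiB BAR with one emulated page at offset `0x1000`, this yields two mmappable pieces: `[0x0, 0x1000)` and `[0x2000, 0x4000)`. When the kernel reports sparse-mmap capabilities, the commit message indicates those are used instead of a calculation like this one.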
Add devtool options for preparing a PCI device for VFIO passthrough testing. --vfio-device accepts a block device path (e.g. /dev/nvme1n1) or a PCI SBDF, resolves it to a PCI device, binds it to vfio-pci, and passes the SBDF and sysfs path to the test container via environment variables. --first-vfio-pci-device is a fallback that searches for the first NVMe device already bound to vfio-pci if the primary device is not found. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
afa4c2b to ee3c804
Add integration tests that verify VFIO passthrough with a physical NVMe device. Tests are gated behind the `vfio` pytest mark and the FC_VFIO_PCI_SBDF environment variable so they only run when a suitable device is available. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The current VFIO implementation has some restrictions:
- Does not work without PCI, since VFIO devices are PCI devices
- Does not work with the virtio-mem device, since we don't update DMA mappings on hot-plug/unplug
- Does not work with virtio-balloon, since it can `fadvise` on memory
To prevent VMs from being launched with invalid configurations, implement multiple checks:
- At the API level, prevent adding incompatible combinations (VFIO after balloon/mem or the reverse)
- At VM creation or snapshot restoration, since they get VmResources from other sources.
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
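A configuration check of the kind described in this commit could look roughly like the sketch below. All type, field, and function names here (`Resources`, `VfioConfigError`, `validate_vfio_resources`) are hypothetical stand-ins; the actual checks in the PR operate on `VmResources` and may be structured differently.

```rust
/// Hypothetical, simplified view of the resources a VM is configured with.
#[derive(Debug, Default)]
struct Resources {
    pci_enabled: bool,
    has_balloon: bool,
    has_virtio_mem: bool,
    vfio_devices: usize,
}

/// Hypothetical error type naming each incompatible combination.
#[derive(Debug, PartialEq, Eq)]
enum VfioConfigError {
    PciDisabled,
    BalloonPresent,
    VirtioMemPresent,
}

/// Reject configurations that combine VFIO devices with features the
/// commit message lists as incompatible. A VM without VFIO devices
/// always passes.
fn validate_vfio_resources(res: &Resources) -> Result<(), VfioConfigError> {
    if res.vfio_devices == 0 {
        return Ok(());
    }
    if !res.pci_enabled {
        return Err(VfioConfigError::PciDisabled);
    }
    if res.has_balloon {
        return Err(VfioConfigError::BalloonPresent);
    }
    if res.has_virtio_mem {
        return Err(VfioConfigError::VirtioMemPresent);
    }
    Ok(())
}
```

Running the same function both when a device is added through the API and again at VM creation or snapshot restoration would cover the two entry points the commit message describes, regardless of the order in which devices were configured.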
VFIO device state is opaque to the VMM and cannot be serialized or restored. Add VFIO devices to the list of snapshot-incompatible devices so that snapshot requests are rejected with a clear error instead of producing a corrupt snapshot. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices will use the pread64/pwrite64 syscalls (from vfio-ioctls) to interact with BARs at runtime. Add them to the vCPU thread syscall lists. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The VFIO integration tests use an NVMe device to verify passthrough functionality. Update a kernel config to enable the NVMe core and block device drivers so the guest can detect and use the passthrough device. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO tests need exclusive access to the passthrough device, so they cannot run in parallel with other tests. Add a separate Buildkite step in the PR pipeline that runs only the vfio-marked tests, similar to the existing performance step. CI instances will have an additional 1GB NVMe device at /dev/nvme1n1 for this purpose. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Wire up the code to allow hot-plugging of VFIO devices after VM boot. The API is the same as for the usual VFIO device addition. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
With VFIO device hot-plug support, we need to add all syscalls needed for VFIO device creation to the VMM thread. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
During VFIO hot-unplug, the device needs to return its BAR ranges to the resource allocator. This requires knowing which BARs are in use and what their sizes are, so add new utilities to the Bars type for that purpose. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
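To make the idea concrete, here is a minimal sketch of what "which BARs are used and what their sizes are" could look like. The `Bars` and `BarInfo` definitions below are hypothetical and much simpler than the real type in Firecracker; in particular, 64-bit BARs, which occupy two consecutive slots, are not modeled.

```rust
/// Hypothetical description of one allocated BAR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BarInfo {
    guest_addr: u64,
    size: u64,
}

/// Hypothetical container for the six BAR slots of a PCI function.
#[derive(Debug, Default)]
struct Bars {
    slots: [Option<BarInfo>; 6],
}

impl Bars {
    /// Iterate over the BARs that are in use, with their slot index.
    fn used(&self) -> impl Iterator<Item = (usize, BarInfo)> + '_ {
        self.slots
            .iter()
            .enumerate()
            .filter_map(|(index, slot)| slot.as_ref().map(|bar| (index, *bar)))
    }

    /// Total number of guest address bytes held by all used BARs.
    fn total_size(&self) -> u64 {
        self.used().map(|(_, bar)| bar.size).sum()
    }
}
```

On hot-unplug, a caller could walk `used()` and hand each `(guest_addr, size)` pair back to whatever allocator issued it.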
ee3c804 to 1b9aafc
1b9aafc to 2084e08
Implement VFIO device deinit logic and wire it up to the DELETE API. During VFIO device removal, the device returns all resources it allocated back to the VM (except kvm_slots, since we are not currently concerned with running out of them). The destruction happens in two parts (just like initialization) because it requires cooperation from both the device and a pci_mngr. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add docs/vfio.md covering how VFIO passthrough works in Firecracker, prerequisites (IOMMU, vfio-pci binding), configuration via API and config file, security considerations, snapshot incompatibility, and current limitations. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a changelog entry for the new VFIO PCI device passthrough feature. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
do not merge: point to vfio artifacts Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add VFIO devices as hot-plug/unplug supported. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
2084e08 to ca19430
Changes
Add a basic implementation of VFIO device pass-through.
Current version only allows devices to be added before VM boot.
Other limitations:
Reason
Provide a way to pass physical PCI devices into a VM.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
tools/devtool checkbuild --all to verify that the PR passes build checks on all supported architectures.
tools/devtool checkstyle to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.
Runbook for Firecracker API changes.
integration tests.
TODO.
rust-vmm.