[VFIO] add basic implementation #5870

Open
ShadowCurse wants to merge 23 commits into firecracker-microvm:main from ShadowCurse:vfio_with_dependencies

@ShadowCurse ShadowCurse commented May 8, 2026

Contributor

Changes

Add a basic implementation of VFIO device pass-through.
The current version only allows devices to be added before VM boot.
Other limitations:

  • Only devices with MSI-X interrupts are supported.
  • No INTx interrupt support.
  • No ROM BAR or I/O BAR support.
  • No BAR relocation or resizing.

Reason

Provide a way to pass physical PCI devices into a VM.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this May 8, 2026
@codecov

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 24.26405% with 849 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.99%. Comparing base (9e8b921) to head (ca19430).
⚠️ Report is 14 commits behind head on main.

Files with missing lines                                 Patch %   Lines
src/vmm/src/vfio.rs                                      21.14%    604 Missing ⚠️
src/vmm/src/device_manager/pci_mngr.rs                    0.00%     60 Missing ⚠️
src/vmm/src/pci/configuration.rs                         22.95%     47 Missing ⚠️
src/vmm/src/rpc_interface.rs                              8.33%     44 Missing ⚠️
src/vmm/src/device_manager/mod.rs                         2.56%     38 Missing ⚠️
src/vmm/src/lib.rs                                        4.16%     23 Missing ⚠️
src/vmm/src/resources.rs                                 35.00%     13 Missing ⚠️
.../firecracker/src/api_server/request/hotplug/mod.rs     0.00%      8 Missing ⚠️
src/vmm/src/pci/msix.rs                                  79.31%      6 Missing ⚠️
src/vmm/src/builder.rs                                   70.58%      5 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5870      +/-   ##
==========================================
- Coverage   83.00%   80.99%   -2.02%     
==========================================
  Files         277      280       +3     
  Lines       30106    31189    +1083     
==========================================
+ Hits        24989    25260     +271     
- Misses       5117     5929     +812     
Flag Coverage Δ
5.10-m5n.metal 81.06% <24.26%> (-2.25%) ⬇️
5.10-m6a.metal 80.35% <24.26%> (-2.30%) ⬇️
5.10-m6g.metal 77.78% <24.26%> (-2.17%) ⬇️
5.10-m6i.metal 81.06% <24.26%> (-2.25%) ⬇️
5.10-m7a.metal-48xl 80.35% <24.26%> (-2.30%) ⬇️
5.10-m7g.metal 77.78% <24.26%> (-2.17%) ⬇️
5.10-m7i.metal-24xl 81.03% <24.26%> (-2.25%) ⬇️
5.10-m7i.metal-48xl 81.03% <24.26%> (-2.25%) ⬇️
5.10-m8g.metal-24xl 77.78% <24.26%> (-2.17%) ⬇️
5.10-m8g.metal-48xl 77.78% <24.26%> (-2.17%) ⬇️
5.10-m8i.metal-48xl 81.03% <24.26%> (-2.25%) ⬇️
5.10-m8i.metal-96xl 81.04% <24.26%> (-2.25%) ⬇️
6.1-m5n.metal 81.09% <24.26%> (-2.25%) ⬇️
6.1-m6a.metal 80.38% <24.26%> (-2.29%) ⬇️
6.1-m6g.metal 77.78% <24.26%> (-2.17%) ⬇️
6.1-m6i.metal 81.08% <24.26%> (-2.25%) ⬇️
6.1-m7a.metal-48xl 80.37% <24.26%> (-2.30%) ⬇️
6.1-m7g.metal 77.78% <24.26%> (-2.17%) ⬇️
6.1-m7i.metal-24xl 81.09% <24.26%> (-2.26%) ⬇️
6.1-m7i.metal-48xl 81.10% <24.26%> (-2.25%) ⬇️
6.1-m8g.metal-24xl 77.78% <24.26%> (-2.17%) ⬇️
6.1-m8g.metal-48xl 77.78% <24.26%> (-2.17%) ⬇️
6.1-m8i.metal-48xl 81.10% <24.26%> (-2.25%) ⬇️
6.1-m8i.metal-96xl 81.10% <24.26%> (-2.26%) ⬇️
6.18-m5n.metal 81.08% <24.26%> (-2.26%) ⬇️
6.18-m6a.metal 80.38% <24.26%> (-2.30%) ⬇️
6.18-m6g.metal 77.78% <24.26%> (-2.17%) ⬇️
6.18-m6i.metal 81.08% <24.26%> (-2.25%) ⬇️
6.18-m7a.metal-48xl 80.37% <24.26%> (-2.30%) ⬇️
6.18-m7g.metal 77.78% <24.26%> (-2.17%) ⬇️
6.18-m7i.metal-24xl 81.10% <24.26%> (-2.25%) ⬇️
6.18-m7i.metal-48xl 81.10% <24.26%> (-2.25%) ⬇️
6.18-m8g.metal-24xl 77.78% <24.26%> (-2.17%) ⬇️
6.18-m8g.metal-48xl 77.78% <24.26%> (-2.17%) ⬇️
6.18-m8i.metal-48xl 81.10% <24.26%> (-2.25%) ⬇️
6.18-m8i.metal-96xl 81.10% <24.26%> (-2.25%) ⬇️

@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 11 times, most recently from f6d6fea to 50e789e Compare May 14, 2026 16:29
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 12 times, most recently from 2f84f01 to a21e87e Compare May 27, 2026 13:32
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 4 times, most recently from b2ea5ea to 528e62b Compare May 29, 2026 11:53
Add the VfioConfig and VfioConfigs types for describing VFIO device
configuration. Wire them into VmResources and VmmConfig so that VFIO
devices can be specified before boot. Actual device setup will be added
in later commits.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO BAR regions containing MSI-X table/PBA must be split into
mmappable and emulated parts. KVM memory slots require host-page
alignment, but MSI-X structures can sit at arbitrary offsets
within a BAR. Add align_up_host_page, align_down_host_page, and
offset_from_lower_host_page helpers to expand emulated regions
to page boundaries.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
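
A minimal sketch of what the three helpers named in the commit message above might look like. The function names come from the commit message; the signatures, the fixed 4 KiB `HOST_PAGE_SIZE`, and the `expand_to_host_pages` helper are assumptions made for illustration, not the code in this PR.

```rust
/// Assumed host page size for this sketch; real code would query it at runtime.
const HOST_PAGE_SIZE: u64 = 4096;

/// Round `addr` up to the next host-page boundary (unchanged if already aligned).
/// Note: overflows for addresses within one page of `u64::MAX`.
fn align_up_host_page(addr: u64) -> u64 {
    (addr + HOST_PAGE_SIZE - 1) & !(HOST_PAGE_SIZE - 1)
}

/// Round `addr` down to the containing host-page boundary.
fn align_down_host_page(addr: u64) -> u64 {
    addr & !(HOST_PAGE_SIZE - 1)
}

/// Distance from `addr` back to the start of its host page.
fn offset_from_lower_host_page(addr: u64) -> u64 {
    addr & (HOST_PAGE_SIZE - 1)
}

/// Hypothetical helper: expand an MSI-X structure at `offset..offset + size`
/// inside a BAR to the half-open, page-aligned range that must be emulated.
fn expand_to_host_pages(offset: u64, size: u64) -> (u64, u64) {
    (
        align_down_host_page(offset),
        align_up_host_page(offset + size),
    )
}
```

With these definitions, an MSI-X table of 0x100 bytes at BAR offset 0x1800 would make the whole page 0x1000..0x2000 emulated, leaving the rest of the BAR eligible for mmap.
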
Add the rust-vmm vfio-bindings (0.6.2) and vfio-ioctls (0.6.0)
crates that provide wrappers around VFIO kernel interfaces.
These are needed by the VFIO code.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add the core VFIO passthrough implementation. This allows physical PCI
devices bound to vfio-pci on the host to be presented to the guest with
minimal overhead.

The implementation covers:
- PCI config space: most reads/writes are proxied to the physical
  device. BARs, MSI-X capability, and select extended capabilities are
  emulated or masked by Firecracker.
- BAR regions: device MMIO regions are mmap'd from the VFIO device fd
  and mapped into guest address space as KVM memory slots. BARs
  containing MSI-X table/PBA are split around the emulated regions using
  either sparse-mmap caps or manual hole calculation.
- MSI-X interrupts: the table and PBA are emulated in Firecracker.
  Physical device interrupts are delivered via eventfds wired through
  KVM irqfd.
- DMA: guest RAM regions are mapped into the VFIO container's IOMMU so
  the device can DMA directly to guest memory.

Only MSI-X interrupts are supported. IO BARs, ROM BARs, legacy INTx, and
MSI (non-X) are not handled.

Hot-plug/unplug of VFIO devices is not supported at this point, so there
is no teardown path for successfully created VFIO devices. The only
cleanup is in the device setup code, which releases all resources it
acquired if an error occurs during device set-up.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
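
The "manual hole calculation" mentioned for BAR regions above can be sketched roughly as follows: given the BAR size and the page-aligned ranges that must stay emulated (MSI-X table and PBA), compute the remaining ranges that could be mmap'd. The `Range` alias and `mmappable_ranges` function are hypothetical names for illustration and are not taken from the PR.

```rust
/// Half-open byte range `[start, end)` inside a BAR.
type Range = (u64, u64);

/// Return the parts of a BAR of `bar_size` bytes that are NOT covered by any
/// of the `emulated` ranges. The emulated ranges are assumed to be already
/// expanded to host-page boundaries; they may be unsorted or overlapping.
fn mmappable_ranges(bar_size: u64, emulated: &[Range]) -> Vec<Range> {
    let mut holes: Vec<Range> = emulated.to_vec();
    holes.sort_unstable();

    let mut out = Vec::new();
    // `cursor` is the first byte not yet classified as emulated or mmappable.
    let mut cursor = 0u64;
    for (start, end) in holes {
        // Clamp to the BAR so a range rounded past the end does not leak out.
        let start = start.min(bar_size);
        let end = end.min(bar_size);
        if start > cursor {
            out.push((cursor, start));
        }
        cursor = cursor.max(end);
    }
    if cursor < bar_size {
        out.push((cursor, bar_size));
    }
    out
}
```

For a 16 KiB BAR with the MSI-X table page at 0x1000..0x2000, this yields two mmappable pieces, 0x0..0x1000 and 0x2000..0x4000, each of which could back its own KVM memory slot.
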
Add devtool options for preparing a PCI device for VFIO passthrough
testing. --vfio-device accepts a block device path (e.g. /dev/nvme1n1)
or a PCI SBDF, resolves it to a PCI device, binds it to vfio-pci, and
passes the SBDF and sysfs path to the test container via environment
variables. --first-vfio-pci-device is a fallback that searches for the
first NVMe device already bound to vfio-pci if the primary device is not
found.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch 3 times, most recently from afa4c2b to ee3c804 Compare June 8, 2026 13:55
Add integration tests that verify VFIO passthrough with a physical
NVMe device. Tests are gated behind the `vfio` pytest mark and
FC_VFIO_PCI_SBDF environment variable so they only run when a suitable
device is available.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The current VFIO implementation has some restrictions:
- Does not work without PCI since VFIO devices are PCI devices
- Does not work with the virtio-mem device since we don't update DMA
  mappings on hot-plug/unplug
- Does not work with virtio-balloon since it can `fadvise` on memory

To prevent VMs from being launched with invalid configurations,
implement checks at two levels:
- At the API level, reject incompatible combinations (VFIO added after
  balloon/mem, or the reverse)
- At VM creation and snapshot restoration, since these get VmResources
  from other sources.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
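
A rough sketch of the kind of check the restrictions above imply. Every name here (`ResourcesSketch`, `VfioConfigError`, `validate_vfio`) is hypothetical and stands in for whatever the PR actually adds to VmResources; the order of the checks is also an assumption.

```rust
/// Hypothetical error type for invalid VFIO configurations.
#[derive(Debug, PartialEq, Eq)]
enum VfioConfigError {
    PciDisabled,
    MemoryHotplugConfigured,
    BalloonConfigured,
}

/// Hypothetical, heavily reduced stand-in for the VM resource description.
#[derive(Default)]
struct ResourcesSketch {
    pci_enabled: bool,
    vfio_devices: usize,
    has_balloon: bool,
    has_memory_hotplug: bool,
}

/// Reject configurations that combine VFIO with features it cannot coexist with.
/// A single function like this could be shared by the API handlers and by the
/// VM creation / snapshot restore paths.
fn validate_vfio(res: &ResourcesSketch) -> Result<(), VfioConfigError> {
    if res.vfio_devices == 0 {
        // Nothing to validate when no VFIO device is configured.
        return Ok(());
    }
    if !res.pci_enabled {
        return Err(VfioConfigError::PciDisabled);
    }
    if res.has_memory_hotplug {
        return Err(VfioConfigError::MemoryHotplugConfigured);
    }
    if res.has_balloon {
        return Err(VfioConfigError::BalloonConfigured);
    }
    Ok(())
}
```

Because the check only looks at the final resource set, it gives the same answer regardless of whether the VFIO device or the balloon/virtio-mem device was added first.
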
VFIO device state is opaque to the VMM and cannot be serialized
or restored. Add VFIO devices to the list of snapshot-incompatible
devices so that snapshot requests are rejected with a clear error
instead of producing a corrupt snapshot.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO devices will use the pread64/pwrite64 syscalls (from vfio-ioctls) to
interact with BARs at runtime. Add them to the vCPU thread syscall
lists.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
The VFIO integration tests use an NVMe device to verify passthrough
functionality. Update a kernel config to enable the NVMe core and block
device drivers so the guest can detect and use the passthrough device.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
VFIO tests need exclusive access to the passthrough device, so they
cannot run in parallel with other tests. Add a separate Buildkite step
in the PR pipeline that runs only the vfio-marked tests, similar to the
existing performance step. CI instances will have an additional 1GB NVMe
device at /dev/nvme1n1 for this purpose.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Wire up the code to allow hot-plugging of VFIO devices after VM boot.
The API is the same as for the usual VFIO device addition.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
With VFIO device hot-plug support, all syscalls needed for VFIO device
creation must be added to the VMM thread syscall list.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
During VFIO hot-unplug, the device needs to return its BAR ranges to
the resource allocator. This requires knowing which BARs are in use and
what their sizes are, so add new utilities for that to the Bars type.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
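
As an illustration of the kind of utility described above, here is a sketch of a BAR container that can report which BARs are in use and how large they are. `BarSketch`, `BarsSketch`, and both methods are hypothetical; the real Bars type in the PR may be structured differently (for example, a 64-bit BAR occupies two consecutive slots, which this sketch ignores).

```rust
/// Hypothetical description of one programmed BAR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BarSketch {
    addr: u64,
    size: u64,
}

/// Hypothetical container: a PCI type-0 header has six BAR slots.
#[derive(Default)]
struct BarsSketch {
    slots: [Option<BarSketch>; 6],
}

impl BarsSketch {
    /// Iterate over `(index, bar)` for every BAR that is in use.
    fn used(&self) -> impl Iterator<Item = (usize, BarSketch)> + '_ {
        self.slots
            .iter()
            .enumerate()
            .filter_map(|(i, bar)| bar.as_ref().map(|b| (i, *b)))
    }

    /// Address ranges `(addr, size)` to hand back to an allocator on unplug.
    fn ranges_to_release(&self) -> Vec<(u64, u64)> {
        self.used().map(|(_, b)| (b.addr, b.size)).collect()
    }
}
```

On hot-unplug, the caller could walk `ranges_to_release()` and free each range in whatever allocator originally handed it out.
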
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from ee3c804 to 1b9aafc Compare June 8, 2026 14:05
@ShadowCurse ShadowCurse marked this pull request as ready for review June 8, 2026 14:30
@ShadowCurse ShadowCurse added the Status: Awaiting review and Type: Enhancement labels Jun 8, 2026
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from 1b9aafc to 2084e08 Compare June 8, 2026 15:14
Implement VFIO device deinit logic and wire it up to the DELETE API.

During VFIO device removal, the device returns all resources it allocated
back to the VM (except kvm_slots, since we are not currently concerned
with running out of them). Like initialization, destruction happens in
2 parts because it requires cooperation from both the device and the
pci_mngr.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add docs/vfio.md covering how VFIO passthrough works in Firecracker,
prerequisites (IOMMU, vfio-pci binding), configuration via API and
config file, security considerations, snapshot incompatibility, and
current limitations.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a changelog entry for the new VFIO PCI device passthrough feature.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
do not merge: point to vfio artifacts

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
List VFIO devices as supporting hot-plug/unplug.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the vfio_with_dependencies branch from 2084e08 to ca19430 Compare June 8, 2026 15:33
@bacarrdy bacarrdy mentioned this pull request Jun 8, 2026
11 tasks