[RFC] Support hardware-accelerated nested translation via iommufd #131
Add infrastructure to enable VFIO devices to leverage hardware IOMMU
acceleration through iommufd's uAPIs. This allows userspace VMMs to
attach VFIO devices to hardware-accelerated virtual IOMMUs, particularly
enabling userspace to configure stage-1 (guest-managed) page tables that
are composed with stage-2 (host-managed) page tables in hardware.
This depends on the `IommufdVIommu` and `IommufdVDevice` abstractions
introduced in the iommufd-ioctls crate [1].
New Public Interfaces:
1. VfioIommufd::new() signature change:
- Added `s1_hwpt_data_type: Option<iommu_hwpt_data_type>` parameter
- When `Some`, enables nested translation mode for subsequently attached
VFIO devices
- Supported types: IOMMU_HWPT_DATA_ARM_SMMUV3, IOMMU_HWPT_DATA_VTD_S1
2. VfioDevice::new_with_iommufd():
- New constructor for vfio devices backed by iommufd with
hardware-accelerated nested HWPT support
- Automatically creates IommufdVIommu/IommufdVDevice when nested mode
is enabled via `VfioIommufd`
- Supports sharing a single `IommufdVIommu` instance across multiple
VFIO devices
- Returns `IommufdVDevice` handle for subsequent S1 HWPT operations
- Attaches device to bypass HWPT by default (until guest enables IOMMU)
3. VfioDevice::install_s1_hwpt():
- Install guest-configured stage-1 page tables into hardware
- Called when guest writes to virtual IOMMU stream table entries
- Atomically replaces existing S1 HWPT if present
- Uses `IommufdHwptData` enum for type-safe hardware-specific configuration
4. VfioDevice::uninstall_s1_hwpt():
- Revert device to bypass or abort mode
- abort=true: Use abort HWPT (fault all DMA)
- abort=false: Use bypass HWPT (passthrough translation)
- Called during guest IOMMU reset or shutdown
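The attach states implied by interfaces 2 to 4 can be sketched as a small state model. This is a minimal sketch under stated assumptions, not the crate's implementation: `S1Attachment` and `ModelDevice` are hypothetical names, and the numeric HWPT id is illustrative only.

```rust
/// Hypothetical model of which stage-1 HWPT a device is attached to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum S1Attachment {
    /// Default after construction: stage 1 passes DMA through.
    Bypass,
    /// All DMA from the device faults.
    Abort,
    /// A guest-configured stage-1 HWPT, tagged with an illustrative id.
    Nested(u32),
}

/// Toy stand-in for a device. It is NOT the real `VfioDevice`.
#[derive(Debug)]
pub struct ModelDevice {
    attachment: S1Attachment,
    next_hwpt_id: u32,
}

impl ModelDevice {
    /// A new device starts in bypass, as the description states.
    pub fn new() -> Self {
        ModelDevice { attachment: S1Attachment::Bypass, next_hwpt_id: 1 }
    }

    pub fn attachment(&self) -> S1Attachment {
        self.attachment
    }

    /// Models install: any previously installed S1 HWPT is replaced.
    pub fn install_s1_hwpt(&mut self) -> u32 {
        let id = self.next_hwpt_id;
        self.next_hwpt_id += 1;
        self.attachment = S1Attachment::Nested(id);
        id
    }

    /// Models uninstall: `abort = true` faults all DMA, `false` reverts to bypass.
    pub fn uninstall_s1_hwpt(&mut self, abort: bool) {
        self.attachment = if abort { S1Attachment::Abort } else { S1Attachment::Bypass };
    }
}
```

In this model a second install simply moves the device from one `Nested` id to another, which is the replace-on-install behaviour described in item 3.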
Dependencies on iommufd-ioctls:
This implementation builds upon three types from iommufd-ioctls [1]:
- `IommufdVIommu`: Represents a physical IOMMU slice managing S2 HWPT
and default S1 HWPTs (bypass/abort). Shared across devices behind the
same virtual IOMMU.
- `IommufdVDevice`: Represents a device attached to an `IommufdVIommu`.
Handles dynamic S1 HWPT allocation and lifecycle management.
- `IommufdHwptData`: Type-safe enum for architecture-specific HWPT
configuration (SMMUv3 STE data, VT-d context entries).
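To illustrate what "type-safe enum" buys the caller, here is a minimal sketch of such an enum. It is not the definition of `IommufdHwptData` from iommufd-ioctls: the names `HwptDataType` and `HwptData`, the variant fields, and `check_matches()` are all assumptions made for illustration.

```rust
/// Hypothetical tag mirroring the two supported data types named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwptDataType {
    ArmSmmuV3,
    VtdS1,
}

/// Hypothetical payload enum. The field layouts are assumptions and
/// are not copied from the real `IommufdHwptData`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HwptData {
    /// Stream table entry words for an SMMUv3 stage-1 configuration.
    ArmSmmuV3 { ste: [u64; 2] },
    /// Stage-1 page-table description for VT-d.
    VtdS1 { flags: u64, pgtbl_addr: u64, addr_width: u32 },
}

impl HwptData {
    /// The tag is derived from the variant, so payload and tag cannot disagree.
    pub fn data_type(&self) -> HwptDataType {
        match self {
            HwptData::ArmSmmuV3 { .. } => HwptDataType::ArmSmmuV3,
            HwptData::VtdS1 { .. } => HwptDataType::VtdS1,
        }
    }

    /// Rejects data whose type differs from the one nested mode was configured with.
    pub fn check_matches(&self, configured: HwptDataType) -> Result<(), String> {
        let actual = self.data_type();
        if actual == configured {
            Ok(())
        } else {
            Err(format!("expected {:?}, got {:?}", configured, actual))
        }
    }
}
```

Deriving the tag from the variant means a caller cannot pass SMMUv3 data while claiming it is VT-d data, which is the kind of mismatch a raw pointer-plus-type-field interface allows.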
Integration Notes for VMMs:
1. VMM creates `VfioIommufd` with `s1_hwpt_data_type` set if a
hardware-accelerated virtual IOMMU is enabled and used to manage
VFIO devices
2. VMM calls `VfioDevice::new_with_iommufd()` per passthrough device
- Devices behind the same virtual IOMMU should reuse the same
`IommufdVIommu` instance
- Each VFIO device has its own `VfioDevice` and `IommufdVDevice`
instance
3. VMM needs to make sure the virtual IOMMU is compatible with the
physical IOMMU:
- `IommufdVDevice::get_hw_info` is used to retrieve hardware
information of the physical IOMMU
4. VMM traps guest IOMMU commands and calls:
- `install_s1_hwpt()` when guest enables IOMMU
- `uninstall_s1_hwpt()` when guest disables IOMMU
- `IommufdVIommu::invalidate_hwpt()` when guest invalidates IOTLB
entries
This lets a VMM use a hardware-accelerated virtual IOMMU to manage VFIO
devices, with the physical IOMMU hardware processing guest page tables
directly.
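The sharing rule in step 2 (one vIOMMU instance reused by every device behind the same virtual IOMMU) can be sketched with mock types. Everything below is hypothetical: `MockVIommu`, `MockVDevice`, `MockVfioIommufd`, and this free-standing `new_with_iommufd()` only model the create-or-reuse behaviour and are not the crate's API.

```rust
use std::rc::Rc;

/// Mock of a shared virtual IOMMU object (hypothetical).
#[derive(Debug)]
pub struct MockVIommu {
    pub id: u32,
}

/// Mock of a per-device handle that points at the shared vIOMMU (hypothetical).
#[derive(Debug)]
pub struct MockVDevice {
    pub viommu: Rc<MockVIommu>,
    pub virt_sid: u32,
}

/// Mock of the iommufd context; `nested` stands in for `s1_hwpt_data_type.is_some()`.
pub struct MockVfioIommufd {
    pub nested: bool,
}

/// Models the create-or-reuse rule: in nested mode the first call fills
/// `viommu`, later calls reuse it. In standard mode nothing is created.
pub fn new_with_iommufd(
    ctx: &MockVfioIommufd,
    viommu: &mut Option<Rc<MockVIommu>>,
    virt_sid: u32,
) -> Option<MockVDevice> {
    if !ctx.nested {
        return None;
    }
    let shared = Rc::clone(viommu.get_or_insert_with(|| Rc::new(MockVIommu { id: 1 })));
    Some(MockVDevice { viommu: shared, virt_sid })
}
```

A VMM following this pattern keeps one `Option` slot per virtual IOMMU and passes it to every device constructor, so the workflow is the same whether or not nested mode is enabled.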
[1] cloud-hypervisor/iommufd#5
Signed-off-by: Bo Chen <[email protected]>
c3a9795 to
f4bae29
    iommufd: Arc<IommuFd>,
    ioas_id: Option<u32>,
    device_fd: Option<VfioContainerDeviceHandle>,
    s1_hwpt_data_type: Option<iommu_hwpt_data_type>,
Suggested change:
    - s1_hwpt_data_type: Option<iommu_hwpt_data_type>,
    + nested_hwpt: Option<iommu_hwpt_data_type>,
I think using nested_hwpt conveys more clearly that we're trying to use HWPT_NESTED with IOMMUFD.
That's a good suggestion.
    /// - If `None` and nested HWPT is enabled, a new vIOMMU instance is created and returned.
    /// - If `Some`, the provided instance is reused.
Not sure we should allow so much flexibility. If we want to maintain a clear API, I'd rather expect the caller to always create the IommufdVIommu (when needed). That means this function should only return Result<Self>.
Very good discussion here.
The goal is to provide a unified API that supports both use cases: standard mode (current behavior with iommufd where devices are not managed by a virtual IOMMU in userspace) and accelerated mode (nested HWPT with a hardware-accelerated virtual IOMMU).
With the current design, the VMM maintains a consistent workflow. The only variation is whether the VfioIommufd instance is initialized with nested_hwpt enabled.
While the interfaces could be decomposed into more primitive operations, this would significantly increase the management burden on the VMM without providing clear added value.
Comparison of the workflows from the caller (e.g. userspace VMM):
    // 1. Current Proposal (Unified API)
    // The VMM only handles high-level initialization.
    let (vfio_device, iommufd_vdevice) = VfioDevice::new_with_iommufd(
        vfio_path,
        vfio_iommufd,
        &mut iommufd_viommu,
        virt_sid,
    );

    // 2. Hypothetical "Primitive" API
    // This forces the VMM to manually glue the components together.
    let vfio_device = VfioDevice::new(vfio_path, vfio_iommufd);
    // The VMM must manually extract IDs and link objects:
    // new API for the accelerated mode only
    let vfio_dev_id = vfio_device.get_dev_id();
    // new API from iommufd that VMM needs to interact with directly
    let iommufd_viommu = IommufdVIommu::new(iommufd, vfio_dev_id);
    // new API from iommufd that VMM needs to interact with directly
    let iommufd_vdevice = IommufdVDevice::new(iommufd_viommu, virt_sid);
    // And manually attach the Stage-1 page table:
    vfio_device.attach_default_s1_hwpt(iommufd_viommu); // new API for the accelerated mode only
I acknowledge that balancing API simplicity with effective encapsulation is always a trade-off.
It will be easier to gauge the trade-off with this design once we have a concrete implementation. We are currently working on that reference case: integrating accelerated vSMMUv3 support into Cloud Hypervisor.
I think the decision comes down to the expectation we have from this crate. I always think of this crate as a simple Rust layer, which is why I'm expecting the implementation to be as simple as possible. But if others think it's a good idea to embed a bit more logic into it, I'm fine with it!
    /// # Parameters
    /// * `vdevice`: the `IommufdVDevice` instance associated with the vfio device.
    /// * `hwpt_data`: the hwpt data to create s1 hwpt.
    pub fn install_s1_hwpt(
nit: I'd define the install function above the uninstall one.
        hwpt_data: &IommufdHwptData,
    ) -> Result<()> {
        // Uninstall existing s1 hwpt if exists
        self.uninstall_s1_hwpt(vdevice, true)?;
I don't think this should be part of the install function. The function should fail if some page tables are already there (meaning the caller should be in charge of installing/uninstalling). The API should be as simple as possible, which means it shouldn't perform too many tasks (I think the caller should be in charge of driving the creation/cleanup).
I think this more or less falls into the same trade-off as discussed above, though I am much less opinionated in this case - given the uninstall_s1_hwpt() is always exposed to the caller and is very simple to use.
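For reference, the two install policies being weighed in this thread can be sketched side by side. This is a toy model with hypothetical names (`S1Slot`, `InstallError`, `install_strict`, `install_replacing`); a `u64` stands in for whatever identifies a stage-1 page table, and none of this is code from the PR.

```rust
/// Error returned by the strict policy (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum InstallError {
    AlreadyInstalled,
}

/// Toy slot holding at most one installed stage-1 table.
#[derive(Debug, Default)]
pub struct S1Slot {
    installed: Option<u64>,
}

impl S1Slot {
    /// Reviewer's preference: fail if a table is already installed,
    /// leaving the caller in charge of uninstalling first.
    pub fn install_strict(&mut self, table: u64) -> Result<(), InstallError> {
        if self.installed.is_some() {
            return Err(InstallError::AlreadyInstalled);
        }
        self.installed = Some(table);
        Ok(())
    }

    /// PR's current behaviour: replace whatever was there and
    /// return the previous table, if any.
    pub fn install_replacing(&mut self, table: u64) -> Option<u64> {
        self.installed.replace(table)
    }

    /// Removes the installed table, if any, and returns it.
    pub fn uninstall(&mut self) -> Option<u64> {
        self.installed.take()
    }

    pub fn current(&self) -> Option<u64> {
        self.installed
    }
}
```

With the strict policy the caller has to pair every install with an uninstall; with the replacing policy a reconfiguration is one call, at the cost of the function doing two jobs.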