| 1 | +# Intel iGPU Passthrough in EVE OS |
| 2 | + |
| 3 | +This document explains how Intel integrated GPU (iGPU) passthrough works in EVE OS, |
| 4 | +what was wrong with the original approach, how it is implemented today, what works and |
| 5 | +what does not, and what needs to be updated when Intel releases new GPU generations. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Background: what the iGPU needs from firmware |
| 10 | + |
| 11 | +Before the OS driver (i915 on Linux, or the Intel display driver on Windows) can |
| 12 | +initialize the Intel iGPU in a VM, two things must be set up in the guest PCI config |
| 13 | +space by firmware — either BIOS or UEFI — during the VM boot phase. |
| 14 | + |
| 15 | +### OpRegion — ASLS register (PCI config offset 0xFC) |
| 16 | + |
| 17 | +The OpRegion is an Intel-defined in-memory structure populated by the host platform |
| 18 | +firmware. It contains the **Video BIOS Table (VBT)** which describes the physical display |
| 19 | +topology: which ports exist (HDMI, DisplayPort, eDP), EDID overrides, hotplug |
| 20 | +configuration, panel sequencing, and more. |
| 21 | + |
| 22 | +Without it, the i915 driver has no idea which physical connectors are wired up. DP/HDMI |
| 23 | +detection and hotplug will not work. |
| 24 | + |
| 25 | +QEMU copies the host's OpRegion into the VM via the fw_cfg entry `etc/igd-opregion` when |
| 26 | +`x-igd-opregion=on` is set on the vfio-pci device. Guest firmware must: |
| 27 | + |
| 28 | +1. Read this fw_cfg file |
| 29 | +2. Allocate a reserved memory region below 4 GB (ACPI NVS, 4 KB aligned) |
| 30 | +3. Copy the content into it |
| 31 | +4. Write the 32-bit physical address into ASLS (PCI config offset 0xFC) of the iGPU |
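The four firmware steps above can be modelled in a few lines. This is only an illustrative sketch, not the actual firmware code: guest RAM and PCI config space are plain `bytearray` objects, and `install_opregion` and `next_free` are hypothetical names invented for the example.

```python
import struct

ASLS_OFFSET = 0xFC      # PCI config offset of ASLS (from the text)
PAGE_SIZE = 0x1000      # 4 KB alignment (from the text)
FOUR_GB = 1 << 32


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def install_opregion(opregion, guest_ram, pci_config, next_free):
    """Copy the OpRegion into guest RAM and publish its address in ASLS.

    opregion   -- bytes read from the fw_cfg file etc/igd-opregion
    guest_ram  -- bytearray standing in for guest memory (hypothetical model)
    pci_config -- bytearray standing in for the iGPU's 256-byte config space
    next_free  -- first free guest address (hypothetical allocator state)
    """
    base = align_up(next_free, PAGE_SIZE)
    end = base + len(opregion)
    if end > FOUR_GB or end > len(guest_ram):
        raise MemoryError("no room for the OpRegion below 4 GB")
    guest_ram[base:end] = opregion
    # ASLS is a 32-bit little-endian register holding the guest physical address.
    struct.pack_into("<I", pci_config, ASLS_OFFSET, base)
    return base
```

In real firmware the region would additionally be marked as ACPI NVS in the memory map so the guest OS does not reuse it; the sketch leaves that out.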
| 32 | + |
| 33 | +### Stolen memory — BDSM register (PCI config offset 0x5C or 0xC0) |
| 34 | + |
| 35 | +During host POST, platform firmware reserves a region of system RAM for the iGPU's |
| 36 | +graphics data (stolen memory). The base address of this region is held in the BDSM |
| 37 | +(Base Data of Stolen Memory) register. The i915 driver reads BDSM to locate it. |
| 38 | + |
| 39 | +If BDSM is zero or contains the host physical address (not a valid guest address), i915 |
| 40 | +fails to initialize the GPU. |
| 41 | + |
| 42 | +QEMU writes the stolen memory size to fw_cfg as `etc/igd-bdsm-size` (8-byte little-endian |
| 43 | +integer). Guest firmware must: |
| 44 | + |
| 45 | +1. Read this fw_cfg file to get the size |
| 46 | +2. Allocate a 1 MB-aligned reserved memory region below 4 GB |
| 47 | +3. Write the physical address into BDSM |
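As a rough illustration of steps 1 and 2, the sketch below decodes the 8-byte little-endian size and picks a 1 MB-aligned base. The placement policy (highest aligned address that fits under a given limit) and the names `parse_bdsm_size`, `place_stolen_memory`, and `top_of_low_ram` are assumptions made for the example, not a description of any particular firmware.

```python
import struct

MB = 1 << 20
FOUR_GB = 1 << 32


def parse_bdsm_size(blob):
    """Decode etc/igd-bdsm-size: one 8-byte little-endian unsigned integer."""
    if len(blob) != 8:
        raise ValueError("etc/igd-bdsm-size must be exactly 8 bytes")
    return struct.unpack("<Q", blob)[0]


def place_stolen_memory(size, top_of_low_ram):
    """Return the highest 1 MB-aligned base so that the region of `size`
    bytes ends at or below top_of_low_ram (hypothetical placement policy)."""
    if top_of_low_ram > FOUR_GB:
        raise ValueError("region must live below 4 GB")
    if size <= 0 or size > top_of_low_ram:
        raise ValueError("stolen memory does not fit")
    return (top_of_low_ram - size) & ~(MB - 1)
```

For example, a 64 MB region placed under a 2 GB limit lands at `0x7C000000`.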
| 48 | + |
| 49 | +The register width changed with Intel Gen11: |
| 50 | + |
| 51 | +- **Gen6–Gen10** (Sandy Bridge through Comet Lake): BDSM is a **32-bit** register at |
| 52 | + offset **0x5C** |
| 53 | +- **Gen11+** (Ice Lake, Tiger Lake, Alder Lake, Raptor Lake, and later): BDSM is a |
| 54 | + **64-bit** register at offset **0xC0** |
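The offset and width rule can be written down directly. The following sketch models PCI config space as a `bytearray`; `bdsm_register` and `write_bdsm` are hypothetical helper names. Devices that have no BDSM register at all (reported as generation -1 later in this document) are deliberately rejected.

```python
import struct


def bdsm_register(gen):
    """Return (config offset, width in bytes) of BDSM for a given generation."""
    if 6 <= gen <= 10:
        return 0x5C, 4      # 32-bit register at 0x5C
    if gen >= 11:
        return 0xC0, 8      # 64-bit register at 0xC0
    raise ValueError(f"no BDSM register known for generation {gen}")


def write_bdsm(pci_config, gen, guest_addr):
    """Store the guest physical address of stolen memory in BDSM."""
    offset, width = bdsm_register(gen)
    fmt = "<I" if width == 4 else "<Q"   # little-endian, 32 or 64 bits
    struct.pack_into(fmt, pci_config, offset, guest_addr)
```

Writing a Gen11+ address with the Gen6–Gen10 layout leaves offset 0xC0 untouched, which is exactly the class of bug described in the next sentence.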
| 55 | + |
| 56 | +This distinction is critical and was the main bug in the original EVE implementation. |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## Why SeaBIOS / i440fx works |
| 61 | + |
| 62 | +SeaBIOS is a legacy BIOS. When it encounters a PCI option ROM (the VBIOS — the 64 KB |
| 63 | +Video BIOS built into the IGD device), it executes it in 16-bit real mode. The VBIOS: |
| 64 | + |
| 65 | +1. Checks the device is at guest BDF `00:02.0` (hardcoded) |
| 66 | +2. Reads the LPC/ISA bridge at `00:1f.0` to verify device IDs match real Intel hardware |
| 67 | +3. Reads the GMCH register at `00:00.0` to find stolen memory size |
| 68 | +4. Allocates stolen memory, writes address to BDSM |
| 69 | +5. Initializes the framebuffer |
| 70 | + |
| 71 | +For this to work in a VM under i440fx: |
| 72 | + |
| 73 | +- The device must be at guest BDF `00:02.0` |
| 74 | +- A fake LPC bridge must exist at `1f.0` with host device IDs copied in via `x-igd-lpc` |
| 75 | +- The i440fx machine type has no permanent occupant at `1f.0`, so QEMU creates a |
| 76 | + `vfio-pci-igd-lpc-bridge` device there |
| 77 | + |
| 78 | +**i440fx works because slot `1f.0` is free** and can host the fake LPC bridge. |
| 79 | + |
| 80 | +--- |
| 81 | + |
| 82 | +## Why q35/UEFI fails without special handling |
| 83 | + |
| 84 | +UEFI/OVMF in EVE has no CSM (Compatibility Support Module). The VBIOS never executes. |
| 85 | +Nobody sets BDSM. Nobody allocates OpRegion memory. The OS driver fails to initialize. |
| 86 | + |
| 87 | +The q35 machine type permanently occupies `1f.0` with the ICH9 LPC controller. QEMU |
| 88 | +explicitly refuses to enable the legacy VBIOS path when it finds a real device there |
| 89 | +(`hw/vfio/igd.c`: "cannot support legacy mode due to existing devices at 1f.0", also |
| 90 | +called "Sorry Q35" in comments). `x-igd-lpc` — the QEMU option that copies LPC bridge |
| 91 | +device IDs for the VBIOS path — does nothing useful on q35/UEFI. |
| 92 | + |
| 93 | +A further problem was in QEMU's `vfio_probe_igd_bar4_quirk()`: the code that writes |
| 94 | +`etc/igd-bdsm-size` to fw_cfg and emulates BDSM/GMCH was placed *after* the BDF and LPC |
| 95 | +bridge checks. On q35, the "Sorry Q35" path exits early before reaching that code, so the |
| 96 | +fw_cfg entry is never written and BDSM is never emulated. |
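The ordering problem is easier to see in a toy model. The sketch below is not QEMU code: `probe_original` and `probe_reworked` are hypothetical functions, fw_cfg is a plain dictionary, and the return value only says whether the legacy VBIOS path was enabled. The second function shows the reordering that the QEMU patches described in this document apply.

```python
FW_CFG_BDSM_SIZE = "etc/igd-bdsm-size"


def probe_original(slot_1f0_occupied, stolen_size, fw_cfg):
    """Original ordering: the LPC slot check comes first."""
    if slot_1f0_occupied:
        return False            # "Sorry Q35": exits before any fw_cfg write
    fw_cfg[FW_CFG_BDSM_SIZE] = stolen_size.to_bytes(8, "little")
    return True


def probe_reworked(slot_1f0_occupied, stolen_size, fw_cfg):
    """Reworked ordering: fw_cfg is populated before the slot check."""
    fw_cfg[FW_CFG_BDSM_SIZE] = stolen_size.to_bytes(8, "little")
    if slot_1f0_occupied:
        return False            # legacy mode still refused, data already published
    return True
```

With `slot_1f0_occupied=True` (the q35 case) the first function leaves the dictionary empty, while the second still publishes the size for guest firmware.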
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## The correct approach: EFI Option ROM with IgdAssignmentDxe |
| 101 | + |
| 102 | +### VfioIgdPkg |
| 103 | + |
| 104 | +Upstream OVMF maintainers declined to accept IGD-specific code (TianoCore Bug #935). |
| 105 | +The solution is a standalone EFI Option ROM delivered as `romfile=` on the vfio-pci |
| 106 | +device. The project that implements this is |
| 107 | +[VfioIgdPkg](https://github.com/tomitamoeko/VfioIgdPkg). |
| 108 | + |
| 109 | +VfioIgdPkg builds `igd.rom`, an EFI Option ROM containing: |
| 110 | + |
| 111 | +- **IgdAssignmentDxe** — sets up OpRegion and BDSM; this is the required component |
| 112 | +- **PlatformGopPolicy** — implements the protocol needed by the proprietary Intel GOP |
| 113 | + driver for pre-OS framebuffer; only needed if a proprietary GOP ROM is also loaded |
| 114 | + |
| 115 | +### How it works end to end |
| 116 | + |
| 117 | +1. EVE builds `igd.rom` from VfioIgdPkg as part of `pkg/uefi` and ships it in the image. |
| 118 | +2. When an Intel iGPU is configured for passthrough, EVE's KVM hypervisor adds |
| 119 | + `romfile=<path to igd.rom>` to the vfio-pci device arguments and enables |
| 120 | + `x-igd-opregion=on`. |
| 121 | +3. The iGPU is placed at guest BDF `00:02.0` (required by VfioIgdPkg's BDSM allocation |
| 122 | + path). |
| 123 | +4. OVMF's PCI bus driver discovers the EFI image in the device's option ROM (identified |
| 124 | + by the EFI code type in the ROM's PCI data structure, not as a legacy x86 image) and |
| 125 | + loads `IgdAssignmentDxe.efi` as a DXE driver. |
| 126 | +5. `IgdAssignmentDxe` runs during the DXE phase: |
| 127 | + - Reads `etc/igd-opregion` from fw_cfg, allocates ACPI NVS memory below 4 GB, copies |
| 128 | + the OpRegion content, and writes the guest physical address to ASLS (0xFC). |
| 129 | + - Registers a PciIo notification callback; when the iGPU PciIo protocol appears, it |
| 130 | + reads the GMS field from the (emulated) GMCH register, allocates 1 MB-aligned |
| 131 | + reserved memory for stolen memory, and writes the guest physical address to BDSM |
| 132 | + (0x5C for Gen6–Gen10, 0xC0 for Gen11+). |
| 133 | +6. The OS driver (i915 / Intel display driver) initializes successfully. |
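Steps 2 and 3 boil down to a handful of vfio-pci device properties. The sketch below assembles them in Python purely for illustration; EVE's real logic lives in `kvm.go` and may differ. The function name, the host address, and the ROM path are hypothetical, and `bus=pcie.0` assumes the default q35 root bus name.

```python
def igd_vfio_device_args(host_bdf, rom_path):
    """Build a QEMU -device argument for an Intel iGPU passed through on q35.

    host_bdf -- host PCI address of the iGPU, e.g. "0000:00:02.0"
    rom_path -- path to igd.rom on the host (hypothetical location)
    """
    props = [
        "vfio-pci",
        f"host={host_bdf}",
        "bus=pcie.0",            # q35 root bus (assumed default name)
        "addr=0x2",              # slot 2, function 0 -> guest BDF 00:02.0
        f"romfile={rom_path}",   # EFI Option ROM carrying IgdAssignmentDxe
        "x-igd-opregion=on",     # expose etc/igd-opregion via fw_cfg
    ]
    return ["-device", ",".join(props)]
```

Calling `igd_vfio_device_args("0000:00:02.0", "/hypothetical/igd.rom")` yields a two-element list that can be appended to the QEMU command line.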
| 134 | + |
| 135 | +### Changes to QEMU's vfio-igd quirk |
| 136 | + |
| 137 | +The QEMU patches in `pkg/xen-tools` (patches 08–11) rework `hw/vfio/igd.c`: |
| 138 | + |
| 139 | +**Patch 08 — igd_gen() backport**: upstream's `igd_gen()` returns correct generation |
| 140 | +numbers for Gen7 through Gen12 (Haswell through Raptor Lake). The old function returned |
| 141 | +8 for all unrecognised device IDs, making generation-specific checks (BDSM register |
| 142 | +offset, GMS encoding) ineffective on Gen9+ hardware. |
| 143 | + |
| 144 | +**Patch 09 — main rework of `vfio_probe_igd_bar4_quirk()`**: |
| 145 | + |
| 146 | +- **GMCH emulation, `etc/igd-bdsm-size` fw_cfg write, and BDSM emulation are moved |
| 147 | + before the BDF/LPC bridge checks.** On q35 the "Sorry Q35" path exits early; without |
| 148 | + this move, those registers are never set and `IgdAssignmentDxe` cannot do its job. |
| 149 | +- **BDSM is emulated at the correct PCI config offset**: 0x5C (32-bit) for Gen6–Gen10, |
| 150 | + 0xC0 (64-bit) for Gen11+. Initialized to zero so `IgdAssignmentDxe`'s idempotency |
| 151 | + guard (skip if BDSM ≠ 0) is not falsely triggered by the host physical address. |
| 152 | +- **GMS is preserved** in the emulated GMCH register. The guest driver reads GMS to |
| 153 | + determine stolen memory size; zeroing it caused the Windows driver to crash (no |
| 154 | + stolen memory available). Upstream QEMU does not zero GMS. |
| 155 | +- **Stale GTT entries are cleared** before the BDF check. After host POST the GTT |
| 156 | + contains entries pointing to host physical addresses, causing IOMMU faults. |
| 157 | +- **GMS encoding for Gen9+ Atom SKUs** (codes 0xf0–0xff, 4 MB granularity) is fixed to |
| 158 | + match the Linux kernel's `i915_gem_stolen.c`. |
| 159 | +- The generation check is fixed to accept any recognized generation (`gen >= 0`) instead |
| 160 | + of the old hard-coded `gen == 6 || gen == 8` which silently blocked Gen9–Gen12 devices. |
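The Gen9+ GMS decoding mentioned above can be sketched as follows. The field position (bits 15:8 of the GMCH control word) and the 32 MB granularity for codes below 0xf0 are assumptions based on how the Linux kernel is understood to decode this register; only the 4 MB granularity for codes 0xf0 and above is stated in the text. `gen9_stolen_size` is a name chosen for this example.

```python
MB = 1 << 20


def gen9_stolen_size(gmch_ctrl):
    """Decode the stolen memory size from a Gen9+ GMCH control value.

    Assumes GMS occupies bits 15:8 (assumption, see the lead-in).
    """
    gms = (gmch_ctrl >> 8) & 0xFF
    if gms < 0xF0:
        # Codes 0x00-0xef: 32 MB increments starting at 0 MB.
        return gms * 32 * MB
    # Codes 0xf0 and above: 4 MB increments starting at 4 MB.
    return (gms - 0xF0 + 1) * 4 * MB
```

Under these assumptions a GMS code of `0x02` decodes to 64 MB and `0xf0` decodes to 4 MB.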
| 161 | + |
| 162 | +**Patch 10 — BAR0 BDSM MMIO mirror** (backported from upstream): the guest driver reads BDSM |
| 163 | +through BAR0 MMIO at offset `0x1080C0` as well as through PCI config space. Without this |
| 164 | +quirk, the MMIO read returns the host physical address while PCI config returns the |
| 165 | +emulated guest PA. The driver sees conflicting values and crashes. This was the |
| 166 | +critical missing piece for Tiger Lake and other Gen11+ devices. |
| 167 | + |
| 168 | +Based on upstream QEMU commits: |
| 169 | +- [`11b5ce95`](https://github.com/qemu/qemu/commit/11b5ce95beecfd51d1b17858d23fe9cbb0b5783f) |
| 170 | + "vfio/igd: add new bar0 quirk to emulate BDSM mirror" by Corvin Köhne |
| 171 | +- [`f926baa0`](https://github.com/qemu/qemu/commit/f926baa03b7babb8291ea4c1cbeadaf224977dae) |
| 172 | + "vfio/igd: emulate BDSM in mmio bar0 for gen 6-10 devices" by Tomita Moeko |
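The idea behind the BAR0 mirror can be modelled as a read hook: accesses that fall inside the mirrored register are answered from the emulated config space, and everything else goes to the device. This is a simplified sketch rather than the QEMU implementation; the class name, the `host_read` callback, and the 8-byte mirror window for the 64-bit register are assumptions.

```python
import struct

BDSM_MIRROR_OFFSET = 0x1080C0   # BAR0 offset of the BDSM mirror (from the text)
BDSM_MIRROR_SIZE = 8            # assumed: 64-bit register on Gen11+


class Bar0WithBdsmMirror:
    """BAR0 read path that returns the emulated BDSM instead of the host value."""

    def __init__(self, host_read, pci_config, bdsm_cfg_offset=0xC0):
        self.host_read = host_read              # callable(offset, size) -> int
        self.pci_config = pci_config            # emulated config space (bytearray)
        self.bdsm_cfg_offset = bdsm_cfg_offset  # 0xC0 for Gen11+

    def read(self, offset, size):
        inside = (BDSM_MIRROR_OFFSET <= offset
                  and offset + size <= BDSM_MIRROR_OFFSET + BDSM_MIRROR_SIZE)
        if inside:
            # Serve the bytes from the emulated BDSM so that MMIO and
            # PCI config reads agree on the guest physical address.
            start = self.bdsm_cfg_offset + (offset - BDSM_MIRROR_OFFSET)
            return int.from_bytes(self.pci_config[start:start + size], "little")
        return self.host_read(offset, size)
```

Any read outside the mirror window is passed through unchanged, so the rest of BAR0 behaves as ordinary device MMIO.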
| 173 | + |
| 174 | +--- |
| 175 | + |
| 176 | +## What works and what does not |
| 177 | + |
| 178 | +| Feature | Status | Notes | |
| 179 | +| ------- | ------ | ----- | |
| 180 | +| i915 / Intel display driver initialization | ✓ Works | OpRegion + BDSM set correctly | |
| 181 | +| DP / HDMI output in the guest OS | ✓ Works | VBT from OpRegion describes connectors | |
| 182 | +| Hotplug detection (DP, HDMI) | ✓ Works | Hotplug configuration comes from the VBT in the OpRegion |
| 183 | +| Display connector topology | ✓ Works | | |
| 184 | +| Multiple monitors | ✓ Works | Driver-managed | |
| 185 | +| Windows Intel display driver (no Code 43) | ✓ Works | | |
| 186 | +| UEFI framebuffer during firmware phase | ✗ Not available | Requires proprietary Intel GOP driver | |
| 187 | +| Pre-OS graphical output | ✗ Not available | Same reason | |
| 188 | +| Meteor Lake and later (Arrow Lake, Lunar Lake) OpRegion | ✓ Works | BDSM quirk not needed (LMEMBAR) |
| 189 | + |
| 190 | +Pre-OS display requires `IntelGopDriver.efi`, the proprietary Intel GOP driver from the |
| 191 | +host platform firmware. EVE's design does not depend on pre-OS display. If pre-OS display |
| 192 | +is needed in a future use case, a proprietary GOP ROM can be placed alongside `igd.rom` |
| 193 | +and loaded as a second option ROM on the device. |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +## Supported Intel GPU generations |
| 198 | + |
| 199 | +VfioIgdPkg supports the following generations: |
| 200 | + |
| 201 | +| igd_gen() | BDSM register | Microarchitectures | PCI Device ID prefix | |
| 202 | +| --------- | ------------- | ------------------ | -------------------- | |
| 203 | +| 6 | 32-bit 0x5C | Sandy Bridge, Ivy Bridge | `0x01xx` | |
| 204 | +| 7 | 32-bit 0x5C | Haswell, Valleyview/Bay Trail | `0x04xx`, `0x0axx`, `0x0cxx`, `0x0dxx`, `0x0fxx` | |
| 205 | +| 8 | 32-bit 0x5C | Broadwell, Cherryview | `0x16xx`, `0x22xx` | |
| 206 | +| 9 | 32-bit 0x5C | Skylake, Kaby Lake, Coffee Lake, Comet Lake, Gemini Lake, Broxton | `0x19xx`, `0x59xx`, `0x3exx`, `0x9Bxx`, `0x31xx`, `0x_a84` | |
| 207 | +| 11 | 64-bit 0xC0 | Ice Lake, Elkhart Lake, Jasper Lake | `0x8Axx`, `0x45xx`, `0x4Exx` | |
| 208 | +| 12 | 64-bit 0xC0 | Tiger Lake, Rocket Lake, Alder Lake, Raptor Lake | `0x9Axx`, `0x4Cxx`, `0x46xx`, `0xA7xx` | |
| 209 | +| -1 (unknown) | No BDSM (LMEMBAR) | Meteor Lake, Arrow Lake, Lunar Lake, Panther Lake | `0x7Dxx` (Meteor Lake), `0x64xx` (Lunar Lake); not yet in `igd_gen()` |
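The table can be turned into a small lookup to show how a device ID maps to a generation. This is a sketch derived only from the prefixes listed above, not a copy of QEMU's `igd_gen()`. It reads `0x_a84` as "any leading nibble followed by `a84`", which is an interpretation, and it checks that pattern first because `0x0a84` would otherwise match the `0x0axx` prefix.

```python
# High byte of the PCI Device ID -> generation, taken from the table above.
GEN_BY_PREFIX = {
    0x01: 6,
    0x04: 7, 0x0A: 7, 0x0C: 7, 0x0D: 7, 0x0F: 7,
    0x16: 8, 0x22: 8,
    0x19: 9, 0x59: 9, 0x3E: 9, 0x9B: 9, 0x31: 9,
    0x8A: 11, 0x45: 11, 0x4E: 11,
    0x9A: 12, 0x4C: 12, 0x46: 12, 0xA7: 12,
}


def igd_gen(device_id):
    """Return the generation for a 16-bit device ID, or -1 if unknown."""
    if (device_id & 0x0FFF) == 0xA84:
        return 9                      # `0x_a84` pattern (interpretation)
    return GEN_BY_PREFIX.get(device_id >> 8, -1)
```

IDs with prefixes that are not in the table, such as `0x7Dxx`, fall through to -1.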
| 210 | + |
| 211 | +### Meteor Lake and later (no BDSM) |
| 212 | + |
| 213 | +Starting from Meteor Lake, Intel moved stolen memory access to LMEMBAR (MMIO BAR2) |
| 214 | +and **removed the BDSM register** from PCI config space. For these devices: |
| 215 | + |
| 216 | +- `IgdAssignmentDxe` recognises these devices (they are in VfioIgdPkg's device table |
| 217 | + with `GetStolenSize = NULL` and `&NullPrivate`), so it **skips stolen memory setup |
| 218 | + entirely** and only sets up OpRegion (ASLS). It never reads `etc/igd-bdsm-size` — |
| 219 | + VfioIgdPkg calculates stolen size from GMS internally, so a missing fw_cfg entry is |
| 220 | + irrelevant. |
| 221 | +- OpRegion passthrough works via `x-igd-opregion=on`, which is independent of |
| 222 | + `igd_gen()` and the vfio-igd BAR4 quirk. EVE's `kvm.go` always enables this for |
| 223 | + Intel iGPUs. |
| 224 | +- The QEMU `igd_gen()` function does not yet recognise Meteor Lake+ device IDs |
| 225 | + (returns -1), so GMCH/BDSM emulation and the BAR0 mirror are skipped. This is |
| 226 | + correct — there is no BDSM to emulate. |
| 227 | +- **Meteor Lake+ may already work** with no QEMU changes needed: OpRegion is handled, |
| 228 | + stolen memory is accessed through LMEMBAR (BAR2) which VFIO passes through as a |
| 229 | + normal BAR, and VfioIgdPkg skips BDSM setup. The only thing to verify is that the |
| 230 | + guest driver's LMEMBAR access works correctly through VFIO BAR passthrough. |
| 231 | + |
| 232 | +--- |
| 233 | + |
| 234 | +## Updating EVE for a new Intel iGPU generation |
| 235 | + |
| 236 | +When Intel releases a new GPU generation, check the following in order: |
| 237 | + |
| 238 | +### 1. VfioIgdPkg device table |
| 239 | + |
| 240 | +The primary check: does the new device ID appear in VfioIgdPkg's device table? |
| 241 | + |
| 242 | +- Repository: <https://github.com/tomitamoeko/VfioIgdPkg> |
| 243 | +- File: `IgdAssignmentDxe/IgdPrivate.c` — look for the `IgdIds` array |
| 244 | +- If the new Device ID is missing, open an issue or PR against VfioIgdPkg |
| 245 | +- After a new VfioIgdPkg commit is available, update `VFIOIGD_COMMIT` in |
| 246 | + `pkg/uefi/Dockerfile` and rebuild |
| 247 | + |
| 248 | +### 2. BDSM register location |
| 249 | + |
| 250 | +Check if the new generation uses a new BDSM register offset or width: |
| 251 | + |
| 252 | +- Intel publishes graphics PRM (Programmer's Reference Manual) at |
| 253 | + <https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/> |
| 254 | +- Look for "Base Data of Stolen Memory" in the PCI config space register map |
| 255 | +- If the offset or width changed, `IgdAssignmentDxe/IgdAssignment.c` in VfioIgdPkg needs |
| 256 | + a new generation handler, and the QEMU patch in `pkg/xen-tools` may also need updating |
| 257 | + |
| 258 | +### 3. QEMU vfio-igd quirk |
| 259 | + |
| 260 | +Check `hw/vfio/igd.c` in the upstream QEMU repository: |
| 261 | + |
| 262 | +- The `igd_gen()` function maps PCI Device IDs to generations — new IDs must be added |
| 263 | +- The GMS decoding for the new generation (the QEMU counterpart of VfioIgdPkg's |
| 264 | + `GetStolenSize()`) must handle any encoding changes (check Linux kernel `drivers/gpu/drm/i915/gem/i915_gem_stolen.c` for reference) |
| 265 | +- If upstream QEMU already has support, the patch in `pkg/xen-tools` should be rebased |
| 266 | + onto the newer xen-qemu base |
| 267 | + |
| 268 | +### 4. EDK2 / OVMF compatibility |
| 269 | + |
| 270 | +- VfioIgdPkg tracks EDK2 stable releases; check VfioIgdPkg's `VfioIgdPkg.dsc` for any |
| 271 | + new EDK2 library dependencies |
| 272 | +- Update `EDK_VERSION` and `EDK_COMMIT` in `pkg/uefi/Dockerfile` if needed |
| 273 | +- Regenerate edk2 patches in `pkg/uefi/edk2-patches/edk2-stable<version>/` against the |
| 274 | + new EDK2 base (use `git apply --ignore-whitespace` for CRLF-tolerant patch application, |
| 275 | + and `git format-patch` from within an actual edk2 checkout to preserve CRLF in context |
| 276 | + lines) |
| 277 | + |
| 278 | +### 5. Stolen memory size encoding |
| 279 | + |
| 280 | +If a new generation introduces new GMS encoding codes in the GMCH register, update both: |
| 281 | + |
| 282 | +- VfioIgdPkg's `IgdAssignmentDxe/IgdPrivate.c` (`GetStolenSize()`) |
| 283 | +- The QEMU patch (`pkg/xen-tools/patches-4.19.0/x86_64/09-vfio-igd-q35-uefi-bdsm-opregion.patch`) |
| 284 | + — specifically the GMS decoding block before the fw_cfg write |
| 285 | + |
| 286 | +--- |
| 287 | + |
| 288 | +## Component map |
| 289 | + |
| 290 | +| Component | File | Role | |
| 291 | +| --------- | ---- | ---- | |
| 292 | +| UEFI Option ROM build | `pkg/uefi/Dockerfile`, `pkg/uefi/build.sh` | Builds `igd.rom` from VfioIgdPkg | |
| 293 | +| EFI Option ROM (runtime) | `pkg/xen-tools/` ships `igd.rom` to host rootfs | Loaded by OVMF; runs `IgdAssignmentDxe` | |
| 294 | +| KVM hypervisor integration | `pkg/pillar/hypervisor/kvm.go` | Detects iGPU, sets `romfile=`, BDF, opregion | |
| 295 | +| QEMU igd_gen() backport | `pkg/xen-tools/.../08-vfio-igd-backport-igd-gen.patch` | Gen7–Gen12 device ID detection | |
| 296 | +| QEMU vfio-igd rework | `pkg/xen-tools/.../09-vfio-igd-q35-uefi-bdsm-opregion.patch` | GMCH/BDSM/fw_cfg/GTT for q35/UEFI | |
| 297 | +| QEMU BAR0 BDSM mirror | `pkg/xen-tools/.../10-vfio-igd-bar0-bdsm-mirror.patch` | Intercepts BAR0 MMIO BDSM reads | |
| 298 | +| EDK2 base | `pkg/uefi/edk2-patches/edk2-stable*/` | EVE-specific patches on top of EDK2 | |