Commit afcda61

docs: add Intel iGPU passthrough architecture document
Explain how Intel integrated GPU passthrough works in EVE OS: the OpRegion (ASLS) and stolen memory (BDSM) firmware requirements, why SeaBIOS/i440fx works via the legacy VBIOS path, why q35/UEFI fails without special handling, and how VfioIgdPkg's IgdAssignmentDxe EFI Option ROM solves it. Covers the four QEMU patches (igd_gen backport, main rework, BAR0 BDSM mirror, diagnostic logging), supported GPU generations (Gen6 through Gen12+), what works and what does not, and a step-by-step guide for updating EVE when Intel releases a new iGPU generation. Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
1 parent 16dc3c6 commit afcda61

1 file changed

+298
-0
lines changed

docs/INTEL-IGPU-PASSTHROUGH.md

Lines changed: 298 additions & 0 deletions
@@ -0,0 +1,298 @@
1+
# Intel iGPU Passthrough in EVE OS
2+
3+
This document explains how Intel integrated GPU (iGPU) passthrough works in EVE OS,
4+
what was wrong with the original approach, how it is implemented today, what works and
5+
what does not, and what needs to be updated when Intel releases new GPU generations.
6+
7+
---
8+
9+
## Background: what the iGPU needs from firmware
10+
11+
Before the OS driver (i915 on Linux, or the Intel display driver on Windows) can
12+
initialize the Intel iGPU in a VM, two things must be set up in the guest PCI config
13+
space by firmware — either BIOS or UEFI — during the VM boot phase.
14+
15+
### OpRegion — ASLS register (PCI config offset 0xFC)
16+
17+
The OpRegion is an Intel-defined in-memory structure populated by the host platform
18+
firmware. It contains the **Video BIOS Table (VBT)** which describes the physical display
19+
topology: which ports exist (HDMI, DisplayPort, eDP), EDID overrides, hotplug
20+
configuration, panel sequencing, and more.
21+
22+
Without it, the i915 driver has no idea which physical connectors are wired up. DP/HDMI
23+
detection and hotplug will not work.
24+
25+
QEMU copies the host's OpRegion into the VM via the fw_cfg entry `etc/igd-opregion` when
26+
`x-igd-opregion=on` is set on the vfio-pci device. Guest firmware must:
27+
28+
1. Read this fw_cfg file
29+
2. Allocate a reserved memory region below 4 GB (ACPI NVS, 4 KB aligned)
30+
3. Copy the content into it
31+
4. Write the 32-bit physical address into ASLS (PCI config offset 0xFC) of the iGPU
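
The four firmware steps above can be sketched as a small model. This is only an illustration, not the firmware code: `install_opregion`, the `bytearray` stand-ins for guest RAM and PCI config space, and the `free_base` allocator hint are all hypothetical names introduced here.

```python
import struct

ASLS_OFFSET = 0xFC   # ASLS register offset in PCI config space (from the text)
PAGE_SIZE = 0x1000   # the OpRegion copy must be 4 KB aligned


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def install_opregion(config_space: bytearray, guest_ram: bytearray,
                     opregion: bytes, free_base: int) -> int:
    """Copy the OpRegion into guest RAM and publish its address in ASLS.

    config_space and guest_ram are toy stand-ins for the real device
    config space and guest memory; free_base is a hypothetical hint for
    where the reserved (ACPI NVS) allocation may start.
    """
    addr = align_up(free_base, PAGE_SIZE)
    if addr + len(opregion) > (1 << 32):
        raise ValueError("OpRegion must be placed below 4 GB")
    # Step 3: copy the fw_cfg content into the reserved region.
    guest_ram[addr:addr + len(opregion)] = opregion
    # Step 4: write the 32-bit little-endian physical address into ASLS.
    struct.pack_into("<I", config_space, ASLS_OFFSET, addr)
    return addr
```

In this model, calling `install_opregion(cfg, ram, blob, 0x1234)` places the copy at `0x2000` and leaves that address readable at config offset `0xFC`.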
32+
33+
### Stolen memory — BDSM register (PCI config offset 0x5C or 0xC0)
34+
35+
The Intel iGPU reserves a region of RAM during POST for the Graphics Translation Table
36+
(GTT). The base address of this stolen region is held in the BDSM register. The i915
37+
driver reads BDSM to locate it.
38+
39+
If BDSM is zero or contains the host physical address (not a valid guest address), i915
40+
fails to initialize the GPU.
41+
42+
QEMU writes the stolen memory size to fw_cfg as `etc/igd-bdsm-size` (8-byte little-endian
43+
integer). Guest firmware must:
44+
45+
1. Read this fw_cfg file to get the size
46+
2. Allocate a 1 MB-aligned reserved memory region below 4 GB
47+
3. Write the physical address into BDSM
48+
49+
The register width changed with Intel Gen11:
50+
51+
- **Gen6–Gen10** (Sandy Bridge through Comet Lake): BDSM is a **32-bit** register at
52+
offset **0x5C**
53+
- **Gen11+** (Ice Lake, Tiger Lake, Alder Lake, Raptor Lake, and later): BDSM is a
54+
**64-bit** register at offset **0xC0**
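
A minimal sketch of the generation-dependent BDSM write, assuming the offsets and widths listed above. The function names and the `bytearray` model of config space are hypothetical and exist only for illustration.

```python
import struct

MB = 1 << 20


def bdsm_register(gen: int) -> tuple[int, int]:
    """Return (config offset, width in bytes) of BDSM for a generation."""
    if gen < 6:
        raise ValueError(f"unsupported generation: {gen}")
    # Gen6-Gen10: 32-bit register at 0x5C; Gen11+: 64-bit register at 0xC0.
    return (0x5C, 4) if gen < 11 else (0xC0, 8)


def write_bdsm(config_space: bytearray, gen: int, guest_addr: int) -> None:
    """Store a 1 MB-aligned guest address (below 4 GB) in BDSM."""
    if guest_addr % MB != 0:
        raise ValueError("stolen memory base must be 1 MB aligned")
    if guest_addr >= (1 << 32):
        raise ValueError("stolen memory must be allocated below 4 GB")
    offset, width = bdsm_register(gen)
    fmt = "<I" if width == 4 else "<Q"   # little-endian, 32- or 64-bit
    struct.pack_into(fmt, config_space, offset, guest_addr)
```

Writing the same address with `gen=9` and `gen=12` touches different offsets, which is why a generation check that returns the wrong value leaves the register the driver actually reads untouched.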
55+
56+
This distinction is critical and was the main bug in the original EVE implementation.
57+
58+
---
59+
60+
## Why SeaBIOS / i440fx works
61+
62+
SeaBIOS is a legacy BIOS. When it encounters a PCI option ROM (the VBIOS — the 64 KB
63+
Video BIOS built into the IGD device), it executes it in 16-bit real mode. The VBIOS:
64+
65+
1. Checks the device is at guest BDF `00:02.0` (hardcoded)
66+
2. Reads the LPC/ISA bridge at `00:1f.0` to verify device IDs match real Intel hardware
67+
3. Reads the GMCH register at `00:00.0` to find stolen memory size
68+
4. Allocates stolen memory, writes address to BDSM
69+
5. Initializes the framebuffer
70+
71+
For this to work in a VM under i440fx:
72+
73+
- The device must be at guest BDF `00:02.0`
74+
- A fake LPC bridge must exist at `1f.0` with host device IDs copied in via `x-igd-lpc`
75+
- The i440fx machine type has no permanent occupant at `1f.0`, so QEMU creates a
76+
`vfio-pci-igd-lpc-bridge` device there
77+
78+
**i440fx works because slot `1f.0` is free** and can host the fake LPC bridge.
79+
80+
---
81+
82+
## Why q35/UEFI fails without special handling
83+
84+
UEFI/OVMF in EVE has no CSM (Compatibility Support Module). The VBIOS never executes.
85+
Nobody sets BDSM. Nobody allocates OpRegion memory. The OS driver fails to initialize.
86+
87+
The q35 machine type permanently occupies `1f.0` with the ICH9 LPC controller. QEMU
88+
explicitly refuses to enable the legacy VBIOS path when it finds a real device there
89+
(`hw/vfio/igd.c`: "cannot support legacy mode due to existing devices at 1f.0", also
90+
called "Sorry Q35" in comments). `x-igd-lpc` — the QEMU option that copies LPC bridge
91+
device IDs for the VBIOS path — does nothing useful on q35/UEFI.
92+
93+
A second problem was in QEMU's `vfio_probe_igd_bar4_quirk()`: the code that writes
94+
`etc/igd-bdsm-size` to fw_cfg and emulates BDSM/GMCH was placed *after* the BDF and LPC
95+
bridge checks. On q35, the "Sorry Q35" path exits early before reaching that code, so the
96+
fw_cfg entry is never written and BDSM is never emulated.
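
The ordering problem can be illustrated with a toy model. This is not QEMU code: both functions and the dictionary standing in for fw_cfg are hypothetical, and the second function only shows the kind of reordering that avoids the early exit.

```python
FW_CFG_KEY = "etc/igd-bdsm-size"


def probe_old(is_q35: bool, fw_cfg: dict, stolen_size: int) -> None:
    """Original ordering: the legacy-mode checks run first."""
    if is_q35:
        # A real device occupies 00:1f.0, so legacy mode is refused and
        # the function returns before any emulation is set up.
        return
    fw_cfg[FW_CFG_KEY] = stolen_size.to_bytes(8, "little")


def probe_reordered(is_q35: bool, fw_cfg: dict, stolen_size: int) -> None:
    """Reordered: publish the size first, then do the legacy checks."""
    fw_cfg[FW_CFG_KEY] = stolen_size.to_bytes(8, "little")
    if is_q35:
        return
    # Legacy VBIOS setup (LPC bridge IDs and so on) would follow here.
```

With `is_q35=True`, `probe_old` leaves the dictionary empty, while `probe_reordered` still publishes the 8-byte little-endian size.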
97+
98+
---
99+
100+
## The correct approach: EFI Option ROM with IgdAssignmentDxe
101+
102+
### VfioIgdPkg
103+
104+
Upstream OVMF maintainers declined to accept IGD-specific code (TianoCore Bug #935).
105+
The solution is a standalone EFI Option ROM delivered as `romfile=` on the vfio-pci
106+
device. The project that implements this is
107+
[VfioIgdPkg](https://github.com/tomitamoeko/VfioIgdPkg).
108+
109+
VfioIgdPkg builds `igd.rom`, an EFI Option ROM containing:
110+
111+
- **IgdAssignmentDxe** — sets up OpRegion and BDSM; this is the required component
112+
- **PlatformGopPolicy** — implements the protocol needed by the proprietary Intel GOP
113+
driver for pre-OS framebuffer; only needed if a proprietary GOP ROM is also loaded
114+
115+
### How it works end to end
116+
117+
1. EVE builds `igd.rom` from VfioIgdPkg as part of `pkg/uefi` and ships it in the image.
118+
2. When an Intel iGPU is configured for passthrough, EVE's KVM hypervisor adds
119+
`romfile=<path to igd.rom>` to the vfio-pci device arguments and enables
120+
`x-igd-opregion=on`.
121+
3. The iGPU is placed at guest BDF `00:02.0` (required by VfioIgdPkg's BDSM allocation
122+
path).
123+
4. OVMF's PCI bus driver discovers the EFI ROM on the device (identified by EFI ROM
124+
header and PCIR code type 0x03, not by the `0x55 0xAA` signature alone) and loads `IgdAssignmentDxe.efi` as a
125+
DXE driver.
126+
5. `IgdAssignmentDxe` runs during the DXE phase:
127+
- Reads `etc/igd-opregion` from fw_cfg, allocates ACPI NVS memory below 4 GB, copies
128+
the OpRegion content, and writes the guest physical address to ASLS (0xFC).
129+
- Registers a PciIo notification callback; when the iGPU PciIo protocol appears, it
130+
reads the GMS field from the (emulated) GMCH register, allocates 1 MB-aligned
131+
reserved memory for stolen memory, and writes the guest physical address to BDSM
132+
(0x5C for Gen6–Gen10, 0xC0 for Gen11+).
133+
6. The OS driver (i915 / Intel display driver) initializes successfully.
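
Steps 2 and 3 amount to assembling one `-device` argument. The sketch below is an assumption about what such an argument could look like, not a copy of what EVE's `kvm.go` emits: the helper name, the `bus=pcie.0` placement, and the ROM path are hypothetical, while `host`, `addr`, `romfile`, and `x-igd-opregion` are existing vfio-pci properties.

```python
def igd_device_arg(host_bdf: str, rom_path: str) -> str:
    """Build a QEMU -device value for an Intel iGPU on a q35 guest."""
    props = [
        "vfio-pci",
        f"host={host_bdf}",       # host PCI address of the iGPU
        "bus=pcie.0",             # q35 root bus (assumed placement)
        "addr=02.0",              # guest BDF 00:02.0, as the text requires
        f"romfile={rom_path}",    # EFI Option ROM built from VfioIgdPkg
        "x-igd-opregion=on",      # expose etc/igd-opregion via fw_cfg
    ]
    return ",".join(props)


if __name__ == "__main__":
    # The ROM path is a placeholder, not EVE's real install location.
    print("-device", igd_device_arg("0000:00:02.0", "/path/to/igd.rom"))
```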
134+
135+
### Changes to QEMU's vfio-igd quirk
136+
137+
The QEMU patches in `pkg/xen-tools` (patches 08–11) rework `hw/vfio/igd.c`:
138+
139+
**Patch 08 — igd_gen() backport**: upstream's `igd_gen()` returns correct generation
140+
numbers for Gen7 through Gen12 (Haswell through Raptor Lake). The old function returned
141+
8 for all unrecognised device IDs, making generation-specific checks (BDSM register
142+
offset, GMS encoding) ineffective on Gen9+ hardware.
143+
144+
**Patch 09 — main rework of `vfio_probe_igd_bar4_quirk()`**:
145+
146+
- **GMCH emulation, `etc/igd-bdsm-size` fw_cfg write, and BDSM emulation are moved
147+
before the BDF/LPC bridge checks.** On q35 the "Sorry Q35" path exits early; without
148+
this move, those registers are never set and `IgdAssignmentDxe` cannot do its job.
149+
- **BDSM is emulated at the correct PCI config offset**: 0x5C (32-bit) for Gen6–Gen10,
150+
0xC0 (64-bit) for Gen11+. Initialized to zero so `IgdAssignmentDxe`'s idempotency
151+
guard (skip if BDSM ≠ 0) is not falsely triggered by the host physical address.
152+
- **GMS is preserved** in the emulated GMCH register. The guest driver reads GMS to
153+
determine stolen memory size; zeroing it caused the Windows driver to crash (no
154+
stolen memory available). Upstream QEMU does not zero GMS.
155+
- **Stale GTT entries are cleared** before the BDF check. After host POST the GTT
156+
contains entries pointing to host physical addresses, causing IOMMU faults.
157+
- **GMS encoding for Gen9+ Atom SKUs** (codes 0xf0–0xff, 4 MB granularity) is fixed to
158+
match the Linux kernel's `i915_gem_stolen.c`.
159+
- The generation check is fixed to accept any recognized generation (`gen >= 0`) instead
160+
of the old hard-coded `gen == 6 || gen == 8` which silently blocked Gen9–Gen12 devices.
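
The Gen9+ GMS decoding mentioned above can be sketched as follows. It assumes the GMS field sits in bits 15:8 of the GMCH control register and follows the two-range scheme used by the Linux kernel for Gen9 (32 MB steps below `0xf0`, 4 MB steps from `0xf0` upward); the function name is hypothetical and this is not the patch's actual code.

```python
MB = 1 << 20


def gen9_stolen_size(gmch_ctl: int) -> int:
    """Decode the stolen memory size in bytes from a GMCH control value."""
    gms = (gmch_ctl >> 8) & 0xFF   # assumed field position: bits 15:8
    if gms < 0xF0:
        # Codes 0x00-0xef: 32 MB increments starting at 0 MB.
        return gms * 32 * MB
    # Codes 0xf0 and up: 4 MB increments starting at 4 MB.
    return (gms - 0xF0 + 1) * 4 * MB
```

A decoder that treats every code as a 32 MB multiple would report several gigabytes for `0xf0`, which is the kind of mismatch the fix addresses.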
161+
162+
**Patch 10 — BAR0 BDSM MMIO mirror** (backported from upstream): the GPU reads BDSM
163+
through BAR0 MMIO at offset `0x1080C0` as well as PCI config space. Without this
164+
quirk, the MMIO read returns the host physical address while PCI config returns the
165+
emulated guest PA. The driver sees conflicting values and crashes. This was the
166+
critical missing piece for Tiger Lake and other Gen11+ devices.
167+
168+
Based on upstream QEMU commits:
169+
- [`11b5ce95`](https://github.com/qemu/qemu/commit/11b5ce95beecfd51d1b17858d23fe9cbb0b5783f)
170+
"vfio/igd: add new bar0 quirk to emulate BDSM mirror" by Corvin Köhne
171+
- [`f926baa0`](https://github.com/qemu/qemu/commit/f926baa03b7babb8291ea4c1cbeadaf224977dae)
172+
"vfio/igd: emulate BDSM in mmio bar0 for gen 6-10 devices" by Tomita Moeko
173+
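
The idea behind the BAR0 mirror quirk can be shown with a toy read handler. It assumes a 64-bit BDSM mirrored at `0x1080C0`, read as two 32-bit halves; the class and the `hw_read` callback are hypothetical and do not reflect QEMU's MemoryRegion API.

```python
BDSM_MIRROR_OFFSET = 0x1080C0   # BAR0 offset of the BDSM mirror (from the text)


class Bar0WithBdsmMirror:
    """Wraps raw BAR0 reads and substitutes the emulated BDSM value."""

    def __init__(self, hw_read, emulated_bdsm: int):
        self._hw_read = hw_read             # callable: offset -> 32-bit value
        self.emulated_bdsm = emulated_bdsm  # guest physical address

    def read32(self, offset: int) -> int:
        if offset == BDSM_MIRROR_OFFSET:
            return self.emulated_bdsm & 0xFFFFFFFF          # low half
        if offset == BDSM_MIRROR_OFFSET + 4:
            return (self.emulated_bdsm >> 32) & 0xFFFFFFFF  # high half
        return self._hw_read(offset)        # everything else passes through
```

With the wrapper in place, a read of the mirror returns the same guest address as the emulated PCI config register, so the two views the driver compares no longer disagree.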
174+
---
175+
176+
## What works and what does not
177+
178+
| Feature | Status | Notes |
179+
| ------- | ------ | ----- |
180+
| i915 / Intel display driver initialization | ✓ Works | OpRegion + BDSM set correctly |
181+
| DP / HDMI output in the guest OS | ✓ Works | VBT from OpRegion describes connectors |
182+
| Hotplug detection (DP, HDMI) | ✓ Works | HPD interrupts forwarded via OpRegion |
183+
| Display connector topology | ✓ Works | |
184+
| Multiple monitors | ✓ Works | Driver-managed |
185+
| Windows Intel display driver (no Code 43) | ✓ Works | |
186+
| UEFI framebuffer during firmware phase | ✗ Not available | Requires proprietary Intel GOP driver |
187+
| Pre-OS graphical output | ✗ Not available | Same reason |
188+
| Meteor Lake and later (Meteor Lake, Arrow Lake, Lunar Lake) OpRegion | ✓ Works | BDSM quirk not needed (LMEMBAR) |
189+
190+
Pre-OS display requires `IntelGopDriver.efi`, the proprietary Intel GOP driver from the
191+
host platform firmware. EVE's design does not depend on pre-OS display. If pre-OS display
192+
is needed in a future use case, a proprietary GOP ROM can be placed alongside `igd.rom`
193+
and loaded as a second option ROM on the device.
194+
195+
---
196+
197+
## Supported Intel GPU generations
198+
199+
VfioIgdPkg supports the following generations:
200+
201+
| igd_gen() | BDSM register | Microarchitectures | PCI Device ID prefix |
202+
| --------- | ------------- | ------------------ | -------------------- |
203+
| 6 | 32-bit 0x5C | Sandy Bridge, Ivy Bridge | `0x01xx` |
204+
| 7 | 32-bit 0x5C | Haswell, Valleyview/Bay Trail | `0x04xx`, `0x0axx`, `0x0cxx`, `0x0dxx`, `0x0fxx` |
205+
| 8 | 32-bit 0x5C | Broadwell, Cherryview | `0x16xx`, `0x22xx` |
206+
| 9 | 32-bit 0x5C | Skylake, Kaby Lake, Coffee Lake, Comet Lake, Gemini Lake, Broxton | `0x19xx`, `0x59xx`, `0x3exx`, `0x9Bxx`, `0x31xx`, `0x_a84` |
207+
| 11 | 64-bit 0xC0 | Ice Lake, Elkhart Lake, Jasper Lake | `0x8Axx`, `0x45xx`, `0x4Exx` |
208+
| 12 | 64-bit 0xC0 | Tiger Lake, Rocket Lake, Alder Lake, Raptor Lake | `0x9Axx`, `0x4Cxx`, `0x46xx`, `0xA7xx` |
209+
| -1 (unknown) | No BDSM (LMEMBAR) | Meteor Lake (`0x7Dxx`), Arrow Lake, Lunar Lake (`0x64xx`), Panther Lake | not yet in `igd_gen()` |
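
The table can be read as a prefix lookup. The sketch below encodes only the rows above and is not QEMU's `igd_gen()`, which may mask device IDs differently; the function name is hypothetical, and treating `0x_a84` as "any leading digit followed by `a84`" is an assumption.

```python
# High byte of the PCI Device ID -> generation, taken from the table rows.
GEN_BY_PREFIX = {
    0x01: 6,
    0x04: 7, 0x0A: 7, 0x0C: 7, 0x0D: 7, 0x0F: 7,
    0x16: 8, 0x22: 8,
    0x19: 9, 0x59: 9, 0x3E: 9, 0x9B: 9, 0x31: 9,
    0x8A: 11, 0x45: 11, 0x4E: 11,
    0x9A: 12, 0x4C: 12, 0x46: 12, 0xA7: 12,
}


def igd_gen_from_table(device_id: int) -> int:
    """Return the generation for a device ID, or -1 if it is not listed."""
    # Check the Broxton pattern first: 0x0a84 would otherwise match the
    # Gen7 prefix 0x0a.
    if (device_id & 0x0FFF) == 0x0A84:
        return 9
    return GEN_BY_PREFIX.get(device_id >> 8, -1)
```

Device IDs from the last table row, such as anything under `0x7Dxx`, fall through to `-1`, matching the "unknown" behaviour described for Meteor Lake and later.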
210+
211+
### Meteor Lake and later (no BDSM)
212+
213+
Starting from Meteor Lake, Intel moved stolen memory access to LMEMBAR (MMIO BAR2)
214+
and **removed the BDSM register** from PCI config space. For these devices:
215+
216+
- `IgdAssignmentDxe` recognises these devices (they are in VfioIgdPkg's device table
217+
with `GetStolenSize = NULL` and `&NullPrivate`), so it **skips stolen memory setup
218+
entirely** and only sets up OpRegion (ASLS). It never reads `etc/igd-bdsm-size`;
219+
VfioIgdPkg calculates stolen size from GMS internally, so a missing fw_cfg entry is
220+
irrelevant.
221+
- OpRegion passthrough works via `x-igd-opregion=on`, which is independent of
222+
`igd_gen()` and the vfio-igd BAR4 quirk. EVE's `kvm.go` always enables this for
223+
Intel iGPUs.
224+
- The QEMU `igd_gen()` function does not yet recognise Meteor Lake+ device IDs
225+
(returns -1), so GMCH/BDSM emulation and the BAR0 mirror are skipped. This is
226+
correct — there is no BDSM to emulate.
227+
- **Meteor Lake+ may already work** with no QEMU changes needed: OpRegion is handled,
228+
stolen memory is accessed through LMEMBAR (BAR2) which VFIO passes through as a
229+
normal BAR, and VfioIgdPkg skips BDSM setup. The only thing to verify is that the
230+
guest driver's LMEMBAR access works correctly through VFIO BAR passthrough.
231+
232+
---
233+
234+
## Updating EVE for a new Intel iGPU generation
235+
236+
When Intel releases a new GPU generation, check the following in order:
237+
238+
### 1. VfioIgdPkg device table
239+
240+
The primary check: does the new device ID appear in VfioIgdPkg's device table?
241+
242+
- Repository: <https://github.com/tomitamoeko/VfioIgdPkg>
243+
- File: `IgdAssignmentDxe/IgdPrivate.c` — look for the `IgdIds` array
244+
- If the new Device ID is missing, open an issue or PR against VfioIgdPkg
245+
- After a new VfioIgdPkg commit is available, update `VFIOIGD_COMMIT` in
246+
`pkg/uefi/Dockerfile` and rebuild
247+
248+
### 2. BDSM register location
249+
250+
Check if the new generation uses a new BDSM register offset or width:
251+
252+
- Intel publishes graphics PRM (Programmer's Reference Manual) at
253+
<https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/>
254+
- Look for "Base Data of Stolen Memory" in the PCI config space register map
255+
- If the offset or width changed, `IgdAssignmentDxe/IgdAssignment.c` in VfioIgdPkg needs
256+
a new generation handler, and the QEMU patch in `pkg/xen-tools` may also need updating
257+
258+
### 3. QEMU vfio-igd quirk
259+
260+
Check `hw/vfio/igd.c` in the upstream QEMU repository:
261+
262+
- The `igd_gen()` function maps PCI Device IDs to generations — new IDs must be added
263+
- The GMS decoding for the new generation (the QEMU counterpart of VfioIgdPkg's `GetStolenSize()`) must handle any GMS encoding
264+
changes (check Linux kernel `drivers/gpu/drm/i915/gem/i915_gem_stolen.c` for reference)
265+
- If upstream QEMU already has support, the patch in `pkg/xen-tools` should be rebased
266+
onto the newer xen-qemu base
267+
268+
### 4. EDK2 / OVMF compatibility
269+
270+
- VfioIgdPkg tracks EDK2 stable releases; check VfioIgdPkg's `VfioIgdPkg.dsc` for any
271+
new EDK2 library dependencies
272+
- Update `EDK_VERSION` and `EDK_COMMIT` in `pkg/uefi/Dockerfile` if needed
273+
- Regenerate edk2 patches in `pkg/uefi/edk2-patches/edk2-stable<version>/` against the
274+
new EDK2 base (use `git apply --ignore-whitespace` for CRLF-tolerant patch application,
275+
and `git format-patch` from within an actual edk2 checkout to preserve CRLF in context
276+
lines)
277+
278+
### 5. Stolen memory size encoding
279+
280+
If a new generation introduces new GMS encoding codes in the GMCH register, update both:
281+
282+
- VfioIgdPkg's `IgdAssignmentDxe/IgdPrivate.c` (`GetStolenSize()`)
283+
- The QEMU patch (`pkg/xen-tools/patches-4.19.0/x86_64/09-vfio-igd-q35-uefi-bdsm-opregion.patch`)
284+
— specifically the GMS decoding block before the fw_cfg write
285+
286+
---
287+
288+
## Component map
289+
290+
| Component | File | Role |
291+
| --------- | ---- | ---- |
292+
| UEFI Option ROM build | `pkg/uefi/Dockerfile`, `pkg/uefi/build.sh` | Builds `igd.rom` from VfioIgdPkg |
293+
| EFI Option ROM (runtime) | `pkg/xen-tools/` ships `igd.rom` to host rootfs | Loaded by OVMF; runs `IgdAssignmentDxe` |
294+
| KVM hypervisor integration | `pkg/pillar/hypervisor/kvm.go` | Detects iGPU, sets `romfile=`, BDF, opregion |
295+
| QEMU igd_gen() backport | `pkg/xen-tools/.../08-vfio-igd-backport-igd-gen.patch` | Gen7–Gen12 device ID detection |
296+
| QEMU vfio-igd rework | `pkg/xen-tools/.../09-vfio-igd-q35-uefi-bdsm-opregion.patch` | GMCH/BDSM/fw_cfg/GTT for q35/UEFI |
297+
| QEMU BAR0 BDSM mirror | `pkg/xen-tools/.../10-vfio-igd-bar0-bdsm-mirror.patch` | Intercepts BAR0 MMIO BDSM reads |
298+
| EDK2 base | `pkg/uefi/edk2-patches/edk2-stable*/` | EVE-specific patches on top of EDK2 |
