| 1 | +--- |
| 2 | +name: windows-bsod-ssh |
| 3 | +description: Use this skill when diagnosing Windows BSOD (Blue Screen of Death), Code 43 device errors, or driver failures over SSH without a GUI. Applies when the user asks to analyze a Windows crash, investigate a device manager error code, check why a driver failed, or diagnose PCI device failures remotely. |
| 4 | +version: 1.0.0 |
| 5 | +--- |
| 6 | + |
| 7 | +# Windows BSOD and Device Error 43 Diagnosis Over SSH |
| 8 | + |
| 9 | +Remote diagnosis of Windows driver failures and BSODs via SSH (no RDP, no GUI). |
| 10 | + |
| 11 | +## SSH Connection |
| 12 | + |
| 13 | +All commands run via: |
| 14 | +```bash |
| 15 | +ssh -i <key> -l Administrator <host> -p <port> -o ConnectTimeout=10 |
| 16 | +``` |
| 17 | + |
| 18 | +**Important:** Always use one of these two invocation styles — bare `cmd.exe` commands |
| 19 | +or PowerShell with explicit flags. Do NOT mix them in the same ssh call. |
| 20 | + |
| 21 | +```bash |
| 22 | +# PowerShell commands: |
| 23 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"<cmdlet>\"" |
| 24 | + |
| 25 | +# cmd.exe commands (simpler, for reg/pnputil/dir/type): |
| 26 | +ssh ... "reg query ..." |
| 27 | +ssh ... "pnputil ..." |
| 28 | +``` |
| 29 | + |
| 30 | +Prefer `reg query`, `pnputil`, and `dir`/`type` over PowerShell when possible — |
| 31 | +they produce clean text output reliably. |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## Step 1 — Identify the Failing Device |
| 36 | + |
| 37 | +```bash |
| 38 | +# List all devices with errors (fastest overview) |
| 39 | +ssh ... "pnputil /enum-devices /problem" |
| 40 | + |
| 41 | +# Get details for a specific device (replace instance ID) |
| 42 | +ssh ... "pnputil /enum-devices /instanceid \"PCI\\VEN_8086&DEV_A721&SUBSYS_22128086&REV_04\\3&2411E6FE&0&10\" /drivers" |
| 43 | +``` |
| 44 | + |
| 45 | +**Code 43 = CM_PROB_FAILED_POST_START**: the driver loaded and the device started, |
| 46 | +but a driver then reported the device as failed. The failure comes from the driver or device, not from Windows PnP. |
| 47 | + |
| 48 | +Other common codes: 10 = cannot start, 28 = no driver, 31 = device not working properly. |
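
When scanning `pnputil` output it can help to translate problem codes locally. The sketch below only covers the four codes named above; `problem_code_desc` is a hypothetical helper name, not part of `pnputil` or Windows.

```shell
# Hypothetical helper: map a Device Manager problem code to the short
# meaning used in this guide. Only the codes listed above are covered.
problem_code_desc() {
  case "$1" in
    10) echo "cannot start" ;;
    28) echo "no driver" ;;
    31) echo "device not working properly" ;;
    43) echo "CM_PROB_FAILED_POST_START" ;;
    *)  echo "unknown (not covered by this guide)" ;;
  esac
}

problem_code_desc 43   # prints CM_PROB_FAILED_POST_START
```

Any other code falls through to the "unknown" branch, so extend the `case` list as needed.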
| 49 | + |
| 50 | +```bash |
| 51 | +# Find all Code 43 devices via WMI |
| 52 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 53 | +Get-WmiObject Win32_PnPEntity | Where-Object ConfigManagerErrorCode -eq 43 | \ |
| 54 | +Select-Object Name, PNPDeviceID | Format-Table\"" |
| 55 | +``` |
| 56 | + |
| 57 | +--- |
| 58 | + |
| 59 | +## Step 2 — Check for BSODs |
| 60 | + |
| 61 | +```bash |
| 62 | +# List recent bugchecks with timestamps and parameters |
| 63 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 64 | +\$x = Get-WinEvent -LogName System -MaxEvents 500; \ |
| 65 | +\$x | Where-Object Id -eq 1001 | ForEach-Object { \ |
| 66 | + Write-Host \$_.TimeCreated; Write-Host \$_.Message; Write-Host '---' }\"" |
| 67 | +``` |
| 68 | + |
| 69 | +**Bugcheck code interpretation:** |
| 70 | + |
| 71 | +| Code | Name | Common cause in driver context | |
| 72 | +|--------|-----------------------------------|-------------------------------------------| |
| 73 | +| 0x7E | SYSTEM_THREAD_EXCEPTION_NOT_HANDLED | Access violation (0xC0000005) in kernel | |
| 74 | +| 0xBE | ATTEMPTED_WRITE_TO_READONLY_MEMORY | Driver wrote to read-only page (e.g. BDSM=0 → PA 0) | |
| 75 | +| 0x1A | MEMORY_MANAGEMENT | Corrupt PTE, bad allocation | |
| 76 | +| 0x50 | PAGE_FAULT_IN_NONPAGED_AREA | NULL dereference or bad pointer | |
| 77 | +| 0xD1 | DRIVER_IRQL_NOT_LESS_OR_EQUAL | Bad pointer at DISPATCH_LEVEL | |
| 78 | +| 0x3B | SYSTEM_SERVICE_EXCEPTION | Kernel exception during syscall | |
| 79 | + |
| 80 | +For `0xBE`, Param 1 is the virtual address that was written and Param 2 is the |
| 81 | +PTE value. If bit 1 (the R/W bit) of the PTE is 0, the page is not writable. |
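
The bit test can be done locally once Param 2 has been copied out of the event message. This is a minimal sketch: `pte_is_writable` is a hypothetical helper, and the PTE value in the example is made up for illustration. Because bit 1 lives in the lowest hex digit, the helper inspects only that digit rather than parsing the full 64-bit value, which may overflow shell arithmetic in some shells.

```shell
# Hypothetical helper: succeeds (exit 0) if bit 1 (R/W) of the PTE is set.
# Input is a hex string, with or without a 0x prefix.
pte_is_writable() {
  hex=${1#0x}                # drop an optional 0x prefix
  hex=${hex#0X}
  last=${hex#"${hex%?}"}     # lowest hex digit of the PTE
  [ $(( 0x$last & 2 )) -ne 0 ]
}

# Made-up PTE value: lowest digit is 1, so bit 1 is clear.
if pte_is_writable 0x8A00000000200121; then
  echo "writable"
else
  echo "read-only"           # this branch runs for the value above
fi
```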
| 82 | + |
| 83 | +```bash |
| 84 | +# List minidump files (one per BSOD) |
| 85 | +ssh ... "dir C:\\Windows\\Minidump /B" |
| 86 | + |
| 87 | +# Read WER crash report for a specific bugcheck type |
| 88 | +ssh ... "dir \"C:\\ProgramData\\Microsoft\\Windows\\WER\\ReportQueue\"" |
| 89 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"Get-Content 'C:\ProgramData\Microsoft\Windows\WER\ReportQueue\<folder>\Report.wer'\"" |
| 90 | +``` |
| 91 | + |
| 92 | +**WER folder names** encode the bugcheck code: `Kernel_be_...` = 0xBE crash. |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +## Step 3 — Check Driver Failure Events |
| 97 | + |
| 98 | +```bash |
| 99 | +# Most recent boot-start or system-start driver load failure (event 7026) |
| 100 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 101 | +Get-WinEvent -LogName System -MaxEvents 500 | Where-Object Id -eq 7026 | Select-Object -First 1 -ExpandProperty Message\"" |
| 102 | + |
| 103 | +# Most recent service terminated with error (event 7023); prints nothing if there is no match |
| 104 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 105 | +Get-WinEvent -LogName System -MaxEvents 500 | Where-Object Id -eq 7023 | Select-Object -First 1 -ExpandProperty Message\"" |
| 106 | + |
| 107 | +# All non-trivial system events (excluding Windows Update noise) |
| 108 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 109 | +Get-WinEvent -LogName System -MaxEvents 500 | \ |
| 110 | +Where-Object { \$_.ProviderName -notmatch 'WindowsUpdate|Kernel-General' } | \ |
| 111 | +Select-Object TimeCreated, ProviderName, Id, Message | \ |
| 112 | +Select-Object -First 30 | Out-String\"" |
| 113 | +``` |
| 114 | + |
| 115 | +--- |
| 116 | + |
| 117 | +## Step 4 — Read Driver Registry Info |
| 118 | + |
| 119 | +Find the device class slot (display class GUID example): |
| 120 | +```bash |
| 121 | +# Enumerate display driver slots to find the device |
| 122 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 123 | +for(\$i=0; \$i -le 6; \$i++) { \ |
| 124 | + \$p = Get-ItemProperty ('HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4d36e968-e325-11ce-bfc1-08002be10318}\' + \$i.ToString('0000')) -EA SilentlyContinue; \ |
| 125 | + if(\$p) { Write-Host 'Slot' \$i ':' \$p.DriverDesc } \ |
| 126 | +}\"" |
| 127 | +``` |
| 128 | + |
| 129 | +Read all values from a found slot: |
| 130 | +```bash |
| 131 | +ssh ... "reg query \"HKLM\\SYSTEM\\CurrentControlSet\\Control\\Class\\{4d36e968-e325-11ce-bfc1-08002be10318}\\0005\"" |
| 132 | +``` |
| 133 | + |
| 134 | +Key values to look for: |
| 135 | +- `DriverDesc` — human-readable name |
| 136 | +- `DriverVersion` — e.g. `32.0.101.7085` |
| 137 | +- `MatchingDeviceId` — e.g. `PCI\VEN_8086&DEV_A721` |
| 138 | +- `HardwareInformation.MemorySize` — VRAM size (little-endian DWORD) |
| 139 | +- `VgaCompatible` — 0 means driver does not register as VGA |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +## Step 5 — Check Crash Protection State |
| 144 | + |
| 145 | +Windows enters crash-protection mode after 3+ BSODs from the same driver: the |
| 146 | +driver is still loaded but the device is not fully started → Code 43. |
| 147 | + |
| 148 | +Crash protection is **per-boot** — a clean reboot where the driver does NOT crash |
| 149 | +clears it. If the underlying bug is fixed, one reboot is enough to recover. |
| 150 | + |
| 151 | +```bash |
| 152 | +# Check if device is in problem state after reboot |
| 153 | +ssh ... "pnputil /enum-devices /problem /class Display" |
| 154 | + |
| 155 | +# Force driver re-enumeration (non-destructive) |
| 156 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 157 | +\$dev = Get-WmiObject Win32_PnPEntity | Where-Object { \$_.PNPDeviceID -like '*A721*' }; \ |
| 158 | +\$dev.Enable()\"" |
| 159 | +``` |
| 160 | + |
| 161 | +--- |
| 162 | + |
| 163 | +## Step 6 — Gather System and BIOS Info |
| 164 | + |
| 165 | +```bash |
| 166 | +# BIOS/UEFI version (useful to verify firmware was updated) |
| 167 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 168 | +(Get-WmiObject Win32_BIOS).SMBIOSBIOSVersion; (Get-WmiObject Win32_BIOS).BIOSVersion\"" |
| 169 | + |
| 170 | +# System product name (from SMBIOS — QEMU sets "Standard PC (Q35 + ICH9, 2009)") |
| 171 | +ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\ |
| 172 | +(Get-WmiObject Win32_ComputerSystem).Model\"" |
| 173 | + |
| 174 | +# All installed drivers for a device |
| 175 | +ssh ... "pnputil /enum-devices /instanceid \"<instance-id>\" /drivers" |
| 176 | +``` |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +## Pitfalls and Lessons Learned |
| 181 | + |
| 182 | +**PowerShell output quirks:** |
| 183 | +- Empty output (no error) often means a pipeline silently produced nothing |
| 184 | + — add `| Out-String` or use `Write-Host` explicitly. |
| 185 | +- `Get-EventLog` is a legacy cmdlet that is not available in PowerShell 7 and only reads classic event logs; use `Get-WinEvent` instead. |
| 186 | +- Unicode text files (WER XML, CSVs) need `Get-Content` not `type` to read |
| 187 | + properly, or use `Select-String` which handles encoding. |
| 188 | + |
| 189 | +**PCI config space:** |
| 190 | +- No built-in PowerShell cmdlet reads raw PCI config registers (ASLS at 0xFC, |
| 191 | + BDSM at 0x5C, GMCH at 0x50). Requires a kernel driver or third-party tool. |
| 192 | +- Intel GPU driver does NOT store ASLS/BDSM values in the registry. |
| 193 | +- Workaround: add `error_report()` logging in the QEMU/firmware source and |
| 194 | + check the hypervisor logs instead. |
| 195 | + |
| 196 | +**Code 43 vs BSOD relationship:** |
| 197 | +- Repeated BSODs (same driver) → Windows disables full init → Code 43 |
| 198 | +- Code 43 WITHOUT recent BSODs in event log = driver failing at startup, not crash protection |
| 199 | +- Code 43 WITH ≥3 recent BSODs = likely crash protection; reboot and check again |
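
The rules of thumb above can be written as a small triage helper. This is only a sketch of this guide's heuristic: `classify_code43` is a hypothetical name, its argument is the number of recent bugcheck events you counted in Step 2 (assumed to be a non-negative integer), and the "inconclusive" branch for one or two BSODs is an assumption, since the guide does not cover that case.

```shell
# Hypothetical helper: classify a Code 43 device from the number of recent
# BSODs attributed to the same driver, following the rules listed above.
classify_code43() {
  if [ "$1" -ge 3 ]; then
    echo "likely crash protection: reboot and check again"
  elif [ "$1" -eq 0 ]; then
    echo "driver failing at startup"
  else
    # Assumption: 1-2 BSODs is not covered by the guide's rules.
    echo "inconclusive"
  fi
}

classify_code43 4
```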
| 200 | + |
| 201 | +**WER folder naming:** |
| 202 | +- `Kernel_be_...` = 0xBE (ATTEMPTED_WRITE_TO_READONLY_MEMORY) |
| 203 | +- `Kernel_7e_...` = 0x7E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED) |
| 204 | +- `Kernel_1a_...` = 0x1A (MEMORY_MANAGEMENT) |
| 205 | +- Report.wer contains Sig[0].Value = hex bugcheck code, Sig[1-4].Value = parameters |
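
Since the folder name carries the bugcheck code, it can be decoded from a `dir` listing without opening `Report.wer`. A minimal sketch, assuming the `Kernel_<hex>_...` pattern shown above; `wer_bugcheck_code` is a hypothetical helper and the folder name in the example is made up.

```shell
# Hypothetical helper: print the bugcheck code encoded in a WER folder
# name of the form Kernel_<hex>_... (returns 1 for other names).
wer_bugcheck_code() {
  case "$1" in
    Kernel_*_*) ;;
    *) echo "not a Kernel_<hex>_ folder name" >&2; return 1 ;;
  esac
  hex=${1#Kernel_}     # strip the Kernel_ prefix
  hex=${hex%%_*}       # keep the text before the next underscore
  printf '0x%X\n' "$(( 0x$hex ))"
}

wer_bugcheck_code "Kernel_be_example"   # prints 0xBE
```

The helper does not validate that the middle field is hexadecimal, so an unexpected folder name will produce a shell arithmetic error rather than a clean message.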
| 206 | + |
| 207 | +**iGPU-specific (Intel passthrough on q35/KVM):** |
| 208 | +- 0xBE from `dxgmms2.sys` = BDSM register is 0 → driver maps physical address 0 |
| 209 | + as stolen GPU memory → write to read-only null page |
| 210 | +- Root cause: OVMF IgdAssignmentDxe not setting BDSM because QEMU did not write |
| 211 | + `etc/igd-bdsm-size` fw_cfg entry (requires QEMU patch compiled into Xen binary) |
| 212 | +- To verify: add `error_report()` to the QEMU igd.c patch and check hypervisor logs |
| 213 | + for "IGD bdsm-size:" messages after domain start |