Commit 056c12c

rucoder and claude committed
.claude/skills: add windows-bsod-ssh diagnosis skill
Step-by-step playbook for diagnosing Windows BSOD and device Code 43 errors over SSH without GUI access. Covers: pnputil device enumeration, WinEvent bugcheck log parsing, WER crash report reading, driver registry inspection, crash-protection detection, and iGPU-specific BDSM/ASLS notes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 49fb6b6 commit 056c12c

1 file changed: 213 additions, 0 deletions
@@ -0,0 +1,213 @@
---
name: windows-bsod-ssh
description: Use this skill when diagnosing Windows BSOD (Blue Screen of Death), Code 43 device errors, or driver failures over SSH without a GUI. Applies when the user asks to analyze a Windows crash, investigate a Device Manager error code, check why a driver failed, or diagnose PCI device failures remotely.
version: 1.0.0
---

# Windows BSOD and Device Error 43 Diagnosis Over SSH

Remote diagnosis of Windows driver failures and BSODs via SSH (no RDP, no GUI).

## SSH Connection

All commands run via:
```bash
# Options first, host last, so the remote command can simply be appended
ssh -i <key> -p <port> -o ConnectTimeout=10 -l Administrator <host>
```

**Important:** Always use one of these two invocation styles: bare `cmd.exe` commands
or PowerShell with explicit flags. Do NOT mix them in the same ssh call.

```bash
# PowerShell commands:
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"<cmdlet>\""

# cmd.exe commands (simpler, for reg/pnputil/dir/type):
ssh ... "reg query ..."
ssh ... "pnputil ..."
```

Prefer `reg query`, `pnputil`, and `dir`/`type` over PowerShell when possible;
they produce clean text output reliably.
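
Nested quoting is the usual failure point in the PowerShell style above. One way to sidestep it, sketched here under assumptions, is to encode the script locally and pass it with `-EncodedCommand`, a standard `powershell.exe` option that expects Base64 of the UTF-16LE script text. The helper name `ps_encode` is hypothetical, and the sketch assumes `iconv` and `base64` exist on the local machine.

```bash
# Hypothetical helper: encode a PowerShell script for -EncodedCommand.
# powershell.exe expects Base64 of the UTF-16LE bytes of the script text.
ps_encode() {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-16LE | base64 | tr -d '\n'
}

# Usage sketch (same "ssh ..." placeholder as above):
#   ssh ... "powershell -NonInteractive -EncodedCommand $(ps_encode 'Get-Date')"
```

The encoded form is roughly 2.7 times longer than the script, so this suits short commands; very long scripts may hit command-line length limits on the Windows side.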

---

## Step 1: Identify the Failing Device

```bash
# List all devices with errors (fastest overview)
ssh ... "pnputil /enum-devices /problem"

# Get details for a specific device (replace instance ID)
ssh ... "pnputil /enum-devices /instanceid \"PCI\\VEN_8086&DEV_A721&SUBSYS_22128086&REV_04\\3&2411E6FE&0&10\" /drivers"
```

**Code 43 = CM_PROB_FAILED_POST_START**: the driver loaded and the device started, but the
driver later reported the device as failed (typically by calling `IoInvalidateDeviceState`
and returning `PNP_DEVICE_FAILED`). The failure comes from the driver/device, not from Windows PnP.

Other common codes: 10 = cannot start, 28 = no driver installed, 31 = device not working properly (required drivers could not be loaded).
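
When scanning `pnputil` output for several devices, a small lookup can keep the codes straight. This is a minimal sketch covering only the four codes named above; the function name `problem_code_desc` is hypothetical.

```bash
# Hypothetical helper: map a Device Manager problem code to the short
# descriptions used in this playbook. Only the codes named above are covered.
problem_code_desc() {
  case "$1" in
    10) echo "cannot start" ;;
    28) echo "no driver" ;;
    31) echo "device not working properly" ;;
    43) echo "device reported failure after start (CM_PROB_FAILED_POST_START)" ;;
    *)  echo "not covered here" ;;
  esac
}
```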

```bash
# Find all Code 43 devices via WMI
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
Get-WmiObject Win32_PnPEntity | Where-Object ConfigManagerErrorCode -eq 43 | \
Select-Object Name, PNPDeviceID | Format-Table\""
```

---

## Step 2: Check for BSODs

```bash
# List recent bugchecks with timestamps and parameters
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
\$x = Get-WinEvent -LogName System -MaxEvents 500; \
\$x | Where-Object Id -eq 1001 | ForEach-Object { \
Write-Host \$_.TimeCreated; Write-Host \$_.Message; Write-Host '---' }\""
```

**Bugcheck code interpretation:**

| Code | Name                                | Common cause in driver context                      |
|------|-------------------------------------|-----------------------------------------------------|
| 0x7E | SYSTEM_THREAD_EXCEPTION_NOT_HANDLED | Access violation (0xC0000005) in kernel             |
| 0xBE | ATTEMPTED_WRITE_TO_READONLY_MEMORY  | Driver wrote to read-only page (e.g. BDSM=0 → PA 0) |
| 0x1A | MEMORY_MANAGEMENT                   | Corrupt PTE, bad allocation                         |
| 0x50 | PAGE_FAULT_IN_NONPAGED_AREA         | NULL dereference or bad pointer                     |
| 0xD1 | DRIVER_IRQL_NOT_LESS_OR_EQUAL       | Bad pointer at DISPATCH_LEVEL or above              |
| 0x3B | SYSTEM_SERVICE_EXCEPTION            | Kernel exception during syscall                     |

For `0xBE`, Param 1 is the virtual address of the attempted write and Param 2 is the
PTE contents. If bit 1 (R/W) of the PTE is 0, the page is not writable.
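
The bit test on Param 2 can be done with shell arithmetic. This is a minimal sketch assuming the standard x86-64 PTE layout (bit 0 = present, bit 1 = read/write); the function name `pte_access` and the sample value are illustrative, not taken from a real crash.

```bash
# Hypothetical helper: decode the present and writable bits of a PTE value
# (Param 2 of bugcheck 0xBE). Accepts decimal or 0x-prefixed hex.
pte_access() {
  local pte=$(( $1 ))
  local present=$(( pte & 1 ))            # bit 0: page is present
  local writable=$(( (pte >> 1) & 1 ))    # bit 1: page is writable
  echo "present=${present} writable=${writable}"
}

# Usage sketch:
#   pte_access 0x121    # prints "present=1 writable=0" (read-only page)
```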

```bash
# List minidump files (one per BSOD, if small memory dumps are enabled)
ssh ... "dir C:\\Windows\\Minidump /B"

# Find and read the WER crash report for a specific bugcheck
# (Report.wer is usually UTF-16; if `type` prints garbled text, read it with Get-Content)
ssh ... "dir \"C:\\ProgramData\\Microsoft\\Windows\\WER\\ReportQueue\""
ssh ... "type \"C:\\ProgramData\\Microsoft\\Windows\\WER\\ReportQueue\\<folder>\\Report.wer\""
```

**WER folder names** encode the bugcheck code: `Kernel_be_...` = 0xBE crash.
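
Pulling the code out of a folder name is a one-liner once the pattern is fixed. The sketch below assumes the `Kernel_<hexcode>_...` pattern described above; the function name `wer_folder_code` and the sample folder name in the usage line are hypothetical.

```bash
# Hypothetical helper: extract the bugcheck code from a WER folder name,
# assuming the "Kernel_<hexcode>_..." naming pattern.
wer_folder_code() {
  local name=$1
  case "$name" in
    Kernel_*_*) ;;                 # expected shape
    *) return 1 ;;                 # not a kernel crash report folder
  esac
  local rest=${name#Kernel_}       # strip the "Kernel_" prefix
  local code=${rest%%_*}           # keep text up to the next underscore
  [[ $code =~ ^[0-9A-Fa-f]+$ ]] || return 1
  printf '0x%s\n' "$(printf '%s' "$code" | tr 'a-f' 'A-F')"
}

# Usage sketch:
#   wer_folder_code "Kernel_be_0123abcd"    # prints 0xBE
```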

---

## Step 3: Check Driver Failure Events

```bash
# Boot-start or system-start drivers that failed to load (event 7026).
# Select-Object -First 1 prints nothing when there is no match; indexing with [0] would throw.
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
Get-WinEvent -LogName System -MaxEvents 500 | Where-Object Id -eq 7026 | \
Select-Object -First 1 -ExpandProperty Message\""

# Service terminated with error (event 7023)
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
Get-WinEvent -LogName System -MaxEvents 500 | Where-Object Id -eq 7023 | \
Select-Object -First 1 -ExpandProperty Message\""

# Most recent system events, excluding Windows Update and Kernel-General noise
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
Get-WinEvent -LogName System -MaxEvents 500 | \
Where-Object { \$_.ProviderName -notmatch 'WindowsUpdate|Kernel-General' } | \
Select-Object -Property TimeCreated, ProviderName, Id, Message -First 30 | \
Format-List | Out-String\""
```

---

## Step 4: Read Driver Registry Info

Find the device class slot (display class GUID example):
```bash
# Enumerate display driver slots 0000-0006 to find the device.
# Each line ends with "\" and statements are separated by ";" so the whole
# script reaches the remote shell as a single line.
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
for(\$i=0; \$i -le 6; \$i++) { \
\$p = Get-ItemProperty ('HKLM:\\SYSTEM\\CurrentControlSet\\Control\\Class\\{4d36e968-e325-11ce-bfc1-08002be10318}\\' + \$i.ToString('0000')) -EA SilentlyContinue; \
if(\$p) { Write-Host 'Slot' \$i ':' \$p.DriverDesc } \
}\""
```

Read all values from a found slot:
```bash
ssh ... "reg query \"HKLM\\SYSTEM\\CurrentControlSet\\Control\\Class\\{4d36e968-e325-11ce-bfc1-08002be10318}\\0005\""
```

Key values to look for:
- `DriverDesc`: human-readable name
- `DriverVersion`: e.g. `32.0.101.7085`
- `MatchingDeviceId`: e.g. `PCI\VEN_8086&DEV_A721`
- `HardwareInformation.MemorySize`: VRAM size in bytes (little-endian DWORD)
- `VgaCompatible`: 0 means the driver does not register as VGA-compatible
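
`reg query` prints binary data as a plain hex string in byte order, so a little-endian value has to be byte-swapped before it reads as a number. The sketch below assumes the value shows up as an even-length hex string such as `00000080`; if the value is shown as `REG_DWORD` with a `0x` prefix, no swap is needed. The function name `le_hex_to_dec` is hypothetical.

```bash
# Hypothetical helper: convert a little-endian hex byte string to decimal.
le_hex_to_dec() {
  local hex=$1 swapped="" i
  [[ $hex =~ ^[0-9A-Fa-f]+$ ]] || return 1    # hex digits only
  (( ${#hex} % 2 == 0 )) || return 1          # whole bytes only
  # Walk the string from the last byte to the first, two digits at a time.
  for (( i = ${#hex} - 2; i >= 0; i -= 2 )); do
    swapped+=${hex:i:2}
  done
  echo $(( 16#$swapped ))
}

# Usage sketch:
#   le_hex_to_dec 00000080    # prints 2147483648 (2048 MiB)
```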

---

## Step 5: Check Crash Protection State

Windows enters crash-protection mode after 3+ BSODs from the same driver: the
driver is still loaded but the device is not fully started (no successful `IRP_MN_START_DEVICE`) → Code 43.

Crash protection is **per-boot**: a clean reboot where the driver does NOT crash
clears it. If the underlying bug is fixed, one reboot is enough to recover.

```bash
# Check if device is in problem state after reboot
ssh ... "pnputil /enum-devices /problem /class Display"

# Re-enable the device to trigger another start attempt (non-destructive).
# The ";" and trailing "\" keep both statements on one remote command line.
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
\$dev = Get-WmiObject Win32_PnPEntity | Where-Object { \$_.PNPDeviceID -like '*A721*' }; \
\$dev.Enable()\""
```

---

## Step 6: Gather System and BIOS Info

```bash
# BIOS/UEFI version (useful to verify firmware was updated)
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
(Get-WmiObject Win32_BIOS).SMBIOSBIOSVersion; (Get-WmiObject Win32_BIOS).BIOSVersion\""

# System product name (from SMBIOS; QEMU sets "Standard PC (Q35 + ICH9, 2009)")
ssh ... "powershell -OutputFormat Text -NonInteractive -Command \"\
(Get-WmiObject Win32_ComputerSystem).Model\""

# All installed drivers for a device
ssh ... "pnputil /enum-devices /instanceid \"<instance-id>\" /drivers"
```

---

## Pitfalls and Lessons Learned

**PowerShell output quirks:**
- Empty output (no error) often means a pipeline silently produced nothing;
  add `| Out-String` or use `Write-Host` explicitly.
- `Get-EventLog` is a legacy cmdlet (not available in PowerShell 7); use `Get-WinEvent` instead.
- UTF-16 text files (WER reports, CSVs) may come out garbled with `type`; read them with
  `Get-Content`, or use `Select-String`, which detects the encoding.

**PCI config space:**
- No built-in PowerShell cmdlet reads raw PCI config registers (ASLS at 0xFC,
  BDSM at 0x5C, GMCH at 0x50). Requires a kernel driver or third-party tool.
- Intel GPU driver does NOT store ASLS/BDSM values in the registry.
- Workaround: add `error_report()` logging in the QEMU/firmware source and
  check the hypervisor logs instead.

**Code 43 vs BSOD relationship:**
- Repeated BSODs (same driver) → Windows disables full init → Code 43
- Code 43 WITHOUT recent BSODs in event log = driver failing at startup, not crash protection
- Code 43 WITH ≥3 recent BSODs = likely crash protection; reboot and check again
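
The rule of thumb above can be written as a small decision helper. This is a sketch, not a definitive test: the function name `classify_code43` is hypothetical, and the "inconclusive" branch for one or two BSODs is an assumption added here, since the rule only covers zero and three or more.

```bash
# Hypothetical helper: classify a device that is in Code 43.
# $1 = number of recent bugcheck events (event ID 1001) tied to the driver.
classify_code43() {
  local bsods=$1
  if (( bsods >= 3 )); then
    echo "likely crash protection: reboot and check again"
  elif (( bsods == 0 )); then
    echo "driver failing at startup, not crash protection"
  else
    # Assumption: the rule above does not cover 1-2 BSODs.
    echo "inconclusive: 1-2 recent BSODs"
  fi
}
```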

**WER folder naming:**
- `Kernel_be_...` = 0xBE (ATTEMPTED_WRITE_TO_READONLY_MEMORY)
- `Kernel_7e_...` = 0x7E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED)
- `Kernel_1a_...` = 0x1A (MEMORY_MANAGEMENT)
- `Report.wer` contains `Sig[0].Value` = hex bugcheck code, `Sig[1-4].Value` = parameters
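
The `Sig[N].Value` lines can be filtered out of a report with `awk`. The sketch below assumes lines of the form `Sig[0].Value=be` and UTF-8 input on stdin; a UTF-16 `Report.wer` would need converting first. The function name `wer_sig_values` and the output labels are hypothetical.

```bash
# Hypothetical helper: print the bugcheck code and parameters from
# Report.wer text on stdin. Assumes "Sig[N].Value=<hex>" lines, UTF-8 input.
wer_sig_values() {
  tr -d '\r' | awk -F= '
    /^Sig\[[0-4]\]\.Value=/ {
      n = substr($1, 5, 1)                      # digit between the brackets
      label = (n == "0") ? "bugcheck" : "param" n
      print label "=" $2
    }'
}

# Usage sketch, after copying the report to the local machine:
#   iconv -f UTF-16 -t UTF-8 Report.wer | wer_sig_values
```

Lines such as `Sig[0].Name=...` are ignored, so only the code and its four parameters are printed.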

**iGPU-specific (Intel passthrough on q35/KVM):**
- 0xBE from `dxgmms2.sys` = BDSM register is 0 → driver maps physical address 0
  as stolen GPU memory → write to read-only null page
- Root cause: OVMF IgdAssignmentDxe not setting BDSM because QEMU did not write
  `etc/igd-bdsm-size` fw_cfg entry (requires QEMU patch compiled into Xen binary)
- To verify: add `error_report()` to the QEMU igd.c patch and check hypervisor logs
  for "IGD bdsm-size:" messages after domain start
