Kernel GPF in dbuf_lightweight_bp during concurrent writes to encrypted RAIDZ2 (ZFS 2.4.0, kernel 6.17.9)
Environment
- ZFS: zfs-kmod-2.4.0 (distribution kernel package)
- Kernel: 6.17.9 (PREEMPT_VOLUNTARY, x86_64)
- Pool: RAIDZ2, 4x 18TB HDD (Seagate Exos ST18000NM000J), ashift=12
- Dataset: native encryption (aes-256-gcm), recordsize=128K
- No SLOG, no L2ARC
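For anyone trying to reproduce, the environment summary above can be collected with standard tooling; `tank` and `tank/share` below are placeholder pool/dataset names, not the real ones from this system (the `->` comments show the values reported above):

```shell
#!/bin/sh
# Hypothetical names: replace "tank" / "tank/share" with the real pool/dataset.
if command -v zfs >/dev/null 2>&1; then
    zfs version                                  # -> zfs-kmod-2.4.0
    uname -r                                     # -> 6.17.9
    zpool get -H -o value ashift tank            # -> 12
    zfs get -H -o value encryption tank/share    # -> aes-256-gcm
    zfs get -H -o value recordsize tank/share    # -> 128K
    STATUS=collected
else
    STATUS=no-zfs-tools
fi
echo "$STATUS"
```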
Workload
Two independent SMB (Samba) clients writing concurrently to the same encrypted dataset:
- Client A: rclone bulk file copy (large files, sequential)
- Client B: data recovery tool writing recovered files (mixed sizes, somewhat random)
Both clients sustained heavy writes for over ten hours before the crash.
Crash Details
General protection fault in `dbuf_lightweight_bp`, triggered from a `z_wr_iss` taskq thread during zio ready processing:
[10724.925720] Oops: general protection fault, probably for non-canonical address 0x34c0768bf1ac340c: 0000 [#1] SMP NOPTI
[10724.925732] CPU: 12 UID: 0 PID: 116192 Comm: z_wr_iss Tainted: P O 6.17.9-1-pve #1 PREEMPT(voluntary)
[10724.925735] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[10724.925737] Hardware name: ASUS System Product Name/PRIME B660M-A D4, BIOS 3801 05/14/2025
[10724.925739] RIP: 0010:dbuf_lightweight_bp+0x1f/0x1b0 [zfs]
[10724.925927] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 48 83 ec 08 4c 8b 6f 38 49 8b 45 60 <80> 78 02 01 0f 84 9f 00 00 00 48 8b 47 40 45 0f b6 65 73 4c 8b 70
[10724.925931] RSP: 0018:ffffced175c17c80 EFLAGS: 00010282
[10724.925933] RAX: 34c0768bf1ac340a RBX: ffff8a801e209c00 RCX: 0000000000000000
[10724.925936] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8a801e209c00
[10724.925939] RBP: ffffced175c17ca8 R08: 0000000000000000 R09: 0000000000000000
[10724.925941] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a709f815f10
[10724.925943] R13: ffff8a709f815f10 R14: ffff8a7ffa32f980 R15: ffff8a61a36e8358
[10724.925945] FS: 0000000000000000(0000) GS:ffff8a80df986000(0000) knlGS:0000000000000000
[10724.925947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10724.925949] CR2: 00007ce5a4afe000 CR3: 00000001e41b9006 CR4: 0000000000f72ef0
[10724.925951] PKRU: 55555554
[10724.925953] Call Trace:
[10724.925955] <TASK>
[10724.925958] dbuf_lightweight_ready+0x46/0x2b0 [zfs]
[10724.926058] zio_ready+0x54/0x440 [zfs]
[10724.926161] zio_execute+0x8f/0x140 [zfs]
[10724.926268] taskq_thread+0x349/0x720 [spl]
[10724.926276] ? __pfx_default_wake_function+0x10/0x10
[10724.926280] ? __pfx_zio_execute+0x10/0x10 [zfs]
[10724.926380] ? __pfx_taskq_thread+0x10/0x10 [spl]
[10724.926387] kthread+0x108/0x220
[10724.926389] ? __pfx_kthread+0x10/0x10
[10724.926392] ret_from_fork+0x205/0x240
[10724.926395] ? __pfx_kthread+0x10/0x10
[10724.926398] ret_from_fork_asm+0x1a/0x30
[10724.926401] </TASK>
Post-crash behaviour
- Pool remained ONLINE; `zpool status` was responsive
- SMB worker processes hung (likely D-state, waiting on ZFS I/O)
- `sync` hung indefinitely (write path broken)
- Clean reboot was slow (blocked by the hung sync) but eventually completed after a ~25 minute wait
- No new data errors detected after reboot; scrub in progress
Analysis
- The non-canonical address in RAX (`0x34c0768bf1ac340a`) strongly suggests a use-after-free or stale pointer dereference -- the dirty record or parent dnode appears to have been freed or recycled while the zio was still in flight.
- `dbuf_lightweight_bp` is a zio callback registered during txg sync for lightweight dirty records. It dereferences the dbuf's `db_dnode_handle` to reach the dnode, and the faulting instruction is consistent with following a poisoned or freed pointer from that chain.
- The lightweight write path (`dbuf_dirty_lightweight()`) was originally designed for sequential, write-only workloads (primarily `zfs receive`), but concurrent random writes via the VFS/Samba path also appear to exercise this code.
- Encryption (aes-256-gcm) extends the zio pipeline with additional async stages (encrypt -> checksum -> write), potentially widening the race window between dirty-record lifetime management and zio completion callbacks.
- `zfs_dirty_data_max` was set to 34 GB at the time of the crash (auto-tuned), which may have increased dirty-record pressure and extended the race window on HDD-backed RAIDZ2, where write latency is high.
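To illustrate the last point, a back-of-envelope calculation shows how long a fully dirtied 34 GiB can keep zios in flight during a single txg sync. The 34 GiB figure is from this report; the ~400 MiB/s aggregate throughput is a hypothetical estimate for a 4-disk HDD RAIDZ2, not a measured value:

```shell
#!/bin/sh
# Rough worst-case txg sync window: dirty data ceiling / aggregate throughput.
# 34 GiB is from this report; 400 MiB/s is a hypothetical RAIDZ2 HDD estimate.
DIRTY_MIB=$((34 * 1024))
THROUGHPUT_MIB_S=400
SYNC_SECS=$((DIRTY_MIB / THROUGHPUT_MIB_S))
echo "approx. worst-case txg sync window: ~${SYNC_SECS}s"
```

A sync window on the order of a minute and a half gives in-flight zios far longer to race against dirty-record teardown than the sub-second syncs typical of SSD pools with default tuning.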
Possibly related issues
- [2.1] Fix raw receive with different indirect block size. #15073 -- `dbuf_dirty_lightweight` assertion failure
- Encryption causes kernel panics (with and without compression) within an hour of high write I/O load #10570 -- encryption + heavy writes kernel panic
- GPF for non-canonical address in `dmu_zfetch_fini` #16895 -- GPF / use-after-free in dnode cleanup
- 2.3.2 causing kernel panic and I/O hangs, 2.3.1 works on same dataset #17307 -- kernel panic in write path
Workaround
Reduced `zfs_dirty_data_max` to 2 GB and `zfs_txg_timeout` to 3 seconds, and serialised write workloads (one heavy writer at a time). No recurrence since applying these changes, but this is not confirmed as a fix -- it may simply reduce the probability of hitting the race.
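For reference, the two tunables can be applied at runtime via the ZFS module parameter files (standard sysfs paths on Linux; the values are the ones used in this workaround, and the writes are skipped if the ZFS module is not loaded):

```shell
#!/bin/sh
# Workaround tunables from this report. zfs_dirty_data_max is in bytes,
# zfs_txg_timeout in seconds. Writes are guarded so the script is a no-op
# on systems without the ZFS module loaded.
DIRTY_MAX=$((2 * 1024 * 1024 * 1024))   # 2 GiB
TXG_TIMEOUT=3
P=/sys/module/zfs/parameters
[ -w "$P/zfs_dirty_data_max" ] && echo "$DIRTY_MAX"   > "$P/zfs_dirty_data_max"
[ -w "$P/zfs_txg_timeout" ]   && echo "$TXG_TIMEOUT" > "$P/zfs_txg_timeout"
echo "requested zfs_dirty_data_max=$DIRTY_MAX zfs_txg_timeout=$TXG_TIMEOUT"
```

Settings made this way do not survive a reboot; to persist them, the same names can go in `/etc/modprobe.d/` as `options zfs zfs_dirty_data_max=... zfs_txg_timeout=...`.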