
Reboot after RAIDZ expand causes zpool import to hang #18129

@ZephireNZ

Description


System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Proxmox |
| Distribution Version | 9.1.4 |
| Kernel Version | 6.17.4-2-pve |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.3.4-pve1 |

Describe the problem you're observing

After starting a RAID-Z expansion on a zpool, everything appeared to be working fine and the expansion was ongoing. However, I rebooted before it completed, and since then zpool import hangs indefinitely.

The full command being run is zpool import -N -d /dev/disk/by-id -o cachefile=none hdd

This command hangs indefinitely (I have now left it for over 12 hours), but immediately after running it, dmesg contains a large number of I/O failures/errors followed by:

[  433.591207] WARNING: Pool 'hdd' has encountered an uncorrectable I/O failure and has been suspended.

Then I see kernel hung-task errors that suggest the RAID-Z expansion is indeed the cause:

[  615.642860] INFO: task raidz_expand:4390 blocked for more than 122 seconds.
[  615.643313]       Tainted: P           O        6.17.4-2-pve #1
[  615.643796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.644377] task:raidz_expand    state:D stack:0     pid:4390  tgid:4390  ppid:2      task_flags:0x208040 flags:0x00004000
[  615.644890] Call Trace:
[  615.645369]  <TASK>
[  615.645868]  __schedule+0x468/0x1310
[  615.646406]  schedule+0x27/0xf0
[  615.646900]  io_schedule+0x4c/0x80
[  615.647362]  cv_wait_common+0xb0/0x140 [spl]
[  615.647877]  ? __pfx_autoremove_wake_function+0x10/0x10
[  615.648342]  __cv_wait_io+0x18/0x30 [spl]
[  615.648835]  txg_wait_synced_flags+0xd8/0x130 [zfs]
[  615.649447]  txg_wait_synced+0x10/0x60 [zfs]
[  615.650082]  spa_raidz_expand_thread+0x8a9/0x1090 [zfs]
[  615.650665]  zthr_procedure+0x13a/0x150 [zfs]
[  615.651264]  ? __pfx_zthr_procedure+0x10/0x10 [zfs]
[  615.651910]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[  615.652416]  thread_generic_wrapper+0x60/0x80 [spl]
[  615.652914]  kthread+0x10b/0x220
[  615.653408]  ? __pfx_kthread+0x10/0x10
[  615.653919]  ret_from_fork+0x208/0x240
[  615.654383]  ? __pfx_kthread+0x10/0x10
[  615.654866]  ret_from_fork_asm+0x1a/0x30
[  615.655353]  </TASK>

After this, any commands like zpool status also hang indefinitely rather than returning an error.

Confusingly, it appears that I can import the pool in readonly mode with these module parameters set via modprobe: zfs_recover=1 spa_load_verify_data=0 spa_load_verify_metadata=0
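For reference, a minimal sketch of how those parameters could be set persistently via a modprobe.d file (the filename is illustrative; the values match the flags above):

```
# /etc/modprobe.d/zfs-recover.conf  (filename is an assumption)
# Recovery-mode parameters used for the readonly import described above
options zfs zfs_recover=1
options zfs spa_load_verify_data=0
options zfs spa_load_verify_metadata=0
```

They can also be passed at module load time, e.g. `modprobe zfs zfs_recover=1 spa_load_verify_data=0 spa_load_verify_metadata=0`.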

After running zpool import -d /dev/disk/by-id -o readonly=on -o cachefile=none hdd I see:

> zpool status
  pool: hdd
 state: ONLINE
  scan: scrub in progress since Sun Jan 11 00:24:02 2026
        8.31T / 15.0T scanned, 8.31T / 15.0T issued
        0B repaired, 55.37% done, no estimated completion time
expand: expansion of raidz1-0 in progress since Sun Jan 11 12:17:24 2026
        242G / 0 copied at 30.1M/s, inf% done, (copy is slow, no estimated time)
config:

        NAME                                          STATE     READ WRITE CKSUM
        hdd                                           ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD60EFPX-68C5ZN0_WD-WX00000000HY  ONLINE       0     0     0
            ata-WDC_WD60EFPX-68C5ZN0_WD-WX00000000KC  ONLINE       0     0     0
            ata-ST6000VN001-2BB186_ZR0000K6           ONLINE       0     0     0
            ata-ST6000VN001-2BB186_ZR00007H           ONLINE       0     0     0

errors: No known data errors

No errors are reported and the mounted filesystem appears to be fully functional, which suggests the disks themselves are OK?

Describe how to reproduce the problem

To be honest, I'm not sure it would be reproducible, but my steps were:

  • Have a pre-existing RAID-Z1 pool with 3 disks
  • Run zpool attach hdd raidz1-0 /dev/disk/by-id/ata-ST6000VN001-2BB186_ZR00007H
  • Restart system before expansion completes
  • zpool import now hangs indefinitely

Include any warning/errors/backtraces from the system logs

Full output from dmesg:
https://gist.github.com/ZephireNZ/6c974188c4d442a1e144cce30c8aa168

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
