
Reboot after RAIDZ expand causes zpool import to hang #18129

@ZephireNZ

Description


System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Proxmox |
| Distribution Version | 9.1.4 |
| Kernel Version | 6.17.4-2-pve |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.3.4-pve1 |

Describe the problem you're observing

After starting a RAID-Z expansion on a zpool, everything appeared to be working fine and the expansion was ongoing. However, I rebooted before it completed, and since then zpool import hangs indefinitely.

The full command being run is zpool import -N -d /dev/disk/by-id -o cachefile=none hdd

This command hangs indefinitely (I have now left it for over 12 hours), but immediately after running it, dmesg contains a large number of I/O failures/errors followed by:

[  433.591207] WARNING: Pool 'hdd' has encountered an uncorrectable I/O failure and has been suspended.

Then I see kernel hung-task errors that suggest the RAID-Z expansion is indeed the cause:

[  615.642860] INFO: task raidz_expand:4390 blocked for more than 122 seconds.
[  615.643313]       Tainted: P           O        6.17.4-2-pve #1
[  615.643796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.644377] task:raidz_expand    state:D stack:0     pid:4390  tgid:4390  ppid:2      task_flags:0x208040 flags:0x00004000
[  615.644890] Call Trace:
[  615.645369]  <TASK>
[  615.645868]  __schedule+0x468/0x1310
[  615.646406]  schedule+0x27/0xf0
[  615.646900]  io_schedule+0x4c/0x80
[  615.647362]  cv_wait_common+0xb0/0x140 [spl]
[  615.647877]  ? __pfx_autoremove_wake_function+0x10/0x10
[  615.648342]  __cv_wait_io+0x18/0x30 [spl]
[  615.648835]  txg_wait_synced_flags+0xd8/0x130 [zfs]
[  615.649447]  txg_wait_synced+0x10/0x60 [zfs]
[  615.650082]  spa_raidz_expand_thread+0x8a9/0x1090 [zfs]
[  615.650665]  zthr_procedure+0x13a/0x150 [zfs]
[  615.651264]  ? __pfx_zthr_procedure+0x10/0x10 [zfs]
[  615.651910]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[  615.652416]  thread_generic_wrapper+0x60/0x80 [spl]
[  615.652914]  kthread+0x10b/0x220
[  615.653408]  ? __pfx_kthread+0x10/0x10
[  615.653919]  ret_from_fork+0x208/0x240
[  615.654383]  ? __pfx_kthread+0x10/0x10
[  615.654866]  ret_from_fork_asm+0x1a/0x30
[  615.655353]  </TASK>

After this, any commands like zpool status also hang indefinitely rather than returning an error.

Confusingly, it appears that I can import the pool in readonly mode with these module parameters set via modprobe: zfs_recover=1 spa_load_verify_data=0 spa_load_verify_metadata=0
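For reference, a minimal sketch of how those parameters could be set persistently via a modprobe.d file (the filename is illustrative; the values match the flags above):

```
# /etc/modprobe.d/zfs-recover.conf  (filename is an assumption)
# Recovery-mode parameters used for the readonly import described above
options zfs zfs_recover=1
options zfs spa_load_verify_data=0
options zfs spa_load_verify_metadata=0
```

They can also be passed at module load time, e.g. `modprobe zfs zfs_recover=1 spa_load_verify_data=0 spa_load_verify_metadata=0`.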

After running zpool import -d /dev/disk/by-id -o readonly=on -o cachefile=none hdd I see:

> zpool status
  pool: hdd
 state: ONLINE
  scan: scrub in progress since Sun Jan 11 00:24:02 2026
        8.31T / 15.0T scanned, 8.31T / 15.0T issued
        0B repaired, 55.37% done, no estimated completion time
expand: expansion of raidz1-0 in progress since Sun Jan 11 12:17:24 2026
        242G / 0 copied at 30.1M/s, inf% done, (copy is slow, no estimated time)
config:

        NAME                                          STATE     READ WRITE CKSUM
        hdd                                           ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD60EFPX-68C5ZN0_WD-WX00000000HY  ONLINE       0     0     0
            ata-WDC_WD60EFPX-68C5ZN0_WD-WX00000000KC  ONLINE       0     0     0
            ata-ST6000VN001-2BB186_ZR0000K6           ONLINE       0     0     0
            ata-ST6000VN001-2BB186_ZR00007H           ONLINE       0     0     0

errors: No known data errors

No errors are reported and the mounted filesystem appears to be fully functional, which suggests the disks themselves are OK?

Describe how to reproduce the problem

To be honest, I'm not sure it would be reproducible, but my steps were:

  • Have a pre-existing RAID-Z1 pool with 3 disks
  • Run zpool attach hdd raidz1-0 /dev/disk/by-id/ata-ST6000VN001-2BB186_ZR00007H
  • Restart system before expansion completes
  • zpool import now hangs indefinitely

Include any warning/errors/backtraces from the system logs

Full output from dmesg:
https://gist.github.com/ZephireNZ/6c974188c4d442a1e144cce30c8aa168

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
