Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Proxmox |
| Distribution Version | 9.1.4 |
| Kernel Version | 6.17.4-2-pve |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.3.4-pve1 |
Describe the problem you're observing
I started a RAID-Z expansion on a zpool. The expansion was in progress and appeared to be working with no issues, but I rebooted before it completed, and since then `zpool import` hangs indefinitely.
The full command being run is `zpool import -N -d /dev/disk/by-id -o cachefile=none hdd`
This command hangs indefinitely (I have now left it for over 12 hours). Immediately after running it, dmesg shows a large number of failures and errors, followed by:
[ 433.591207] WARNING: Pool 'hdd' has encountered an uncorrectable I/O failure and has been suspended.
Then I see hung-task warnings for the raidz_expand thread, which suggests the RAID-Z expansion is indeed the cause:
[ 615.642860] INFO: task raidz_expand:4390 blocked for more than 122 seconds.
[ 615.643313] Tainted: P O 6.17.4-2-pve #1
[ 615.643796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 615.644377] task:raidz_expand state:D stack:0 pid:4390 tgid:4390 ppid:2 task_flags:0x208040 flags:0x00004000
[ 615.644890] Call Trace:
[ 615.645369] <TASK>
[ 615.645868] __schedule+0x468/0x1310
[ 615.646406] schedule+0x27/0xf0
[ 615.646900] io_schedule+0x4c/0x80
[ 615.647362] cv_wait_common+0xb0/0x140 [spl]
[ 615.647877] ? __pfx_autoremove_wake_function+0x10/0x10
[ 615.648342] __cv_wait_io+0x18/0x30 [spl]
[ 615.648835] txg_wait_synced_flags+0xd8/0x130 [zfs]
[ 615.649447] txg_wait_synced+0x10/0x60 [zfs]
[ 615.650082] spa_raidz_expand_thread+0x8a9/0x1090 [zfs]
[ 615.650665] zthr_procedure+0x13a/0x150 [zfs]
[ 615.651264] ? __pfx_zthr_procedure+0x10/0x10 [zfs]
[ 615.651910] ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[ 615.652416] thread_generic_wrapper+0x60/0x80 [spl]
[ 615.652914] kthread+0x10b/0x220
[ 615.653408] ? __pfx_kthread+0x10/0x10
[ 615.653919] ret_from_fork+0x208/0x240
[ 615.654383] ? __pfx_kthread+0x10/0x10
[ 615.654866] ret_from_fork_asm+0x1a/0x30
[ 615.655353] </TASK>
After this, any commands like zpool status also hang indefinitely rather than returning an error.
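The two kinds of kernel message quoted above (the pool suspension and the hung-task report) can be picked out of a long dmesg capture with a small filter. This is only a sketch: the function name `zfs_hang_lines` is hypothetical, and the patterns are taken from the wording of the messages in this report, so other kernel or OpenZFS versions may phrase them differently.

```shell
# Hypothetical helper: read a kernel log on stdin and print only the lines
# that report a suspended pool or a task blocked past the hung-task timeout.
# The patterns match the message wording shown in this report.
zfs_hang_lines() {
    grep -E 'has been suspended|blocked for more than [0-9]+ seconds'
}
```

It could be used as `dmesg | zfs_hang_lines`, or against a saved copy of the log with `zfs_hang_lines < dmesg.txt`.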
Confusingly, it appears that I can import the pool in read-only mode when these module parameters are set via modprobe: `zfs_recover=1 spa_load_verify_data=0 spa_load_verify_metadata=0`
After running `zpool import -d /dev/disk/by-id -o readonly=on -o cachefile=none hdd` I see:
> zpool status
pool: hdd
state: ONLINE
scan: scrub in progress since Sun Jan 11 00:24:02 2026
8.31T / 15.0T scanned, 8.31T / 15.0T issued
0B repaired, 55.37% done, no estimated completion time
expand: expansion of raidz1-0 in progress since Sun Jan 11 12:17:24 2026
242G / 0 copied at 30.1M/s, inf% done, (copy is slow, no estimated time)
config:
NAME STATE READ WRITE CKSUM
hdd ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ata-WDC_WD60EFPX-68C5ZN0_WD-WX00000000HY ONLINE 0 0 0
ata-WDC_WD60EFPX-68C5ZN0_WD-WX00000000KC ONLINE 0 0 0
ata-ST6000VN001-2BB186_ZR0000K6 ONLINE 0 0 0
ata-ST6000VN001-2BB186_ZR00007H ONLINE 0 0 0
errors: No known data errors
This suggests that no errors were found, and the mounted filesystem appears to be fully functional, so the disks themselves seem to be OK?
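For anyone trying the same read-only workaround, here is a minimal sketch of how the settings and the import command fit together. It assumes the parameters are set persistently through a modprobe.d `options` line; the report only says they were "enabled in modprobe", so the file path in the comment is an assumption, and both function names are hypothetical. The functions only print text and do not touch the pool.

```shell
# Hypothetical helper: print a modprobe.d "options" line for the zfs module
# from name=value arguments. The output could be placed in a file such as
# /etc/modprobe.d/zfs.conf (path is an assumption, not from the report).
zfs_modprobe_options() {
    printf 'options zfs %s\n' "$*"
}

# Hypothetical helper: print (not run) the read-only import command used in
# this report for the given pool name.
readonly_import_cmd() {
    pool=$1
    printf 'zpool import -d /dev/disk/by-id -o readonly=on -o cachefile=none %s\n' "$pool"
}
```

Calling `zfs_modprobe_options zfs_recover=1 spa_load_verify_data=0 spa_load_verify_metadata=0` prints the options line, and `readonly_import_cmd hdd` prints the import command to review before running it by hand.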
Describe how to reproduce the problem
To be honest, I'm not sure it is reproducible, but my steps were:
- Have a pre-existing RAID-Z1 pool with 3 disks
- Run `zpool attach hdd raidz1-0 /dev/disk/by-id/ata-ST6000VN001-2BB186_ZR00007H`
- Restart system before expansion completes
- `zpool import` now hangs indefinitely
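The steps above can be written out as a command sequence. This sketch only prints the commands so they can be reviewed; it does not execute anything. The function name `repro_steps` is hypothetical, the defaults are the pool, vdev and disk named in this report, and `reboot` stands in for "restart system", which is an assumption about how the restart was done.

```shell
# Hypothetical helper: print the reproduction steps as commands.
# Nothing is executed. Arguments (all optional): pool, vdev, new disk.
repro_steps() {
    pool=${1:-hdd}
    vdev=${2:-raidz1-0}
    disk=${3:-/dev/disk/by-id/ata-ST6000VN001-2BB186_ZR00007H}
    # Step 1: start the RAID-Z expansion by attaching a disk to the vdev.
    printf 'zpool attach %s %s %s\n' "$pool" "$vdev" "$disk"
    # Step 2: restart while the expansion is still running
    # ("reboot" is an assumed stand-in for the restart).
    printf 'reboot\n'
    # Step 3: the import that hangs, with the flags quoted in this report.
    printf 'zpool import -N -d /dev/disk/by-id -o cachefile=none %s\n' "$pool"
}
```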
Include any warning/errors/backtraces from the system logs
Full output from dmesg:
https://gist.github.com/ZephireNZ/6c974188c4d442a1e144cce30c8aa168