ZFS management commands hang with inaccessible block device
System information
| Type | Version/Name |
| --- | --- |
| Distribution Name | Proxmox VE |
| Distribution Version | 9.1.7 |
| Kernel Version | 6.17.13-2-pve |
| Architecture | x86_64 |
| OpenZFS Version | 2.4.1-pve1 |
Describe the problem you're observing
The zpool status and zpool clear commands hang when a pool vdev is inaccessible.
I have multiple ZFS pools in my system. Temporarily, I also have a pool that is accessed through a USB HDD dock.
This pool has a single vdev, which is a LUKS device managed by device mapper. The LUKS device is stored directly on the drive that sits in the HDD dock.
The zpool commands start misbehaving when the drive in the dock becomes inaccessible, for example because I moved it slightly by accident, put the computer to sleep, or turned off the dock.
After the connection is restored, the drive itself is accessible just fine, sudo hexdump --length 256 /dev/sdX works on it. Sometimes I have to reopen the LUKS device, but I think not always.
But after restarting the LUKS device, zpool management commands run endlessly. Querying the status of one specific pool works, but a bare zpool status with no pool name hangs, and I can't cancel it with Ctrl+C. The same happens with zpool clear, except that I did not try it on my other pools.
Files on other pools are accessible, but even an ls in the affected pool's directory hangs, as do all other processes that access it, and Ctrl+C does not help for them either.
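Since querying a specific pool by name still works here, the other pools can be checked one at a time instead of running a bare zpool status. The following is only a minimal sketch of that idea: status_each and the ZPOOL variable are hypothetical names introduced for illustration, and the pool names in the usage line are placeholders. Passing the suspended pool itself may still hang.

```shell
#!/bin/sh
# Hypothetical helper: query pools by name, one at a time.
# ZPOOL can be overridden, e.g. to stub the command out on a machine without ZFS.
ZPOOL="${ZPOOL:-zpool}"

status_each() {
    for pool in "$@"; do
        echo "== $pool =="
        # "zpool status <pool>" limits the query to the named pool.
        "$ZPOOL" status "$pool" || echo "status failed for $pool" >&2
    done
}
```

Usage, with placeholder pool names: status_each tank backup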
Similar issues were reported in the past: #14426, #14491
They were never resolved, because the reporter either stopped responding or worked around the issue.
I think this is a major issue, and I will be able to keep this setup for testing for some time if I can help with diagnosing the problem.
Describe how to reproduce the problem
I will describe it with an HDD dock, but according to the issues mentioned above it should also be reproducible by unplugging a spare pendrive.
- Insert empty HDD to dock, connect dock to computer, turn on dock
- Format HDD as LUKS device, and open it:
head --bytes 1024 /dev/random | sudo tee /tmp/lukskey > /dev/null
sudo cryptsetup luksFormat /dev/disk/by-id/disk-id /tmp/lukskey
sudo cryptsetup luksOpen /dev/disk/by-id/disk-id yellow_pool_zfs_1 --key-file /tmp/lukskey
- Create a zpool with /dev/mapper/yellow_pool_zfs_1 as its single vdev
- Write something to the pool for good measure
- Remove the HDD from the dock
- Read the file you have written to the pool
- Run zpool status, and it should hang
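The preparation steps above can be sketched as a script. This is a rough outline under assumptions, not a tested reproducer: the run and prepare_pool helpers and the DRY_RUN switch are hypothetical names, /dev/disk/by-id/disk-id is the placeholder from the steps above, the key is generated with dd from /dev/urandom rather than with head, and the test file is assumed to land under the pool's default mountpoint /yellow_pool. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the preparation steps. All values below are placeholders/assumptions.
DISK="${DISK:-/dev/disk/by-id/disk-id}"
MAPPER_NAME="${MAPPER_NAME:-yellow_pool_zfs_1}"
POOL="${POOL:-yellow_pool}"
KEY="${KEY:-/tmp/lukskey}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_pool() {
    # 1 KiB random key file
    run dd if=/dev/urandom of="$KEY" bs=1024 count=1
    # Format and open the LUKS device (destroys all data on $DISK)
    run cryptsetup luksFormat --batch-mode --key-file "$KEY" "$DISK"
    run cryptsetup open --type luks --key-file "$KEY" "$DISK" "$MAPPER_NAME"
    # Single-vdev pool on top of the mapped device
    run zpool create "$POOL" "/dev/mapper/$MAPPER_NAME"
    # Write something to the pool (assumes the default mountpoint /$POOL)
    run dd if=/dev/urandom of="/$POOL/testfile" bs=1M count=16
}
```

Run prepare_pool to see the commands, or DRY_RUN=0 prepare_pool as root to execute them. The remaining steps are manual: remove the HDD from the dock, read /yellow_pool/testfile, then run zpool status.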
Include any warning/errors/backtraces from the system logs
dmesg does not contain much that is relevant:
WARNING: Pool 'yellow_pool' has encountered an uncorrectable I/O failure and has been suspended.
Logs from /proc/spl/kstat/zfs/dbgmsg, without logs of other pools:
https://gist.github.com/mpeter50/fcfe20f8c0b96b75f7aee27eadf2ebd4
According to ps -elfL, the hanging zpool processes are in uninterruptible sleep:
$ ps -elfL | egrep zpool
4 D root 3161959 3161958 3161959 0 1 80 0 - 5179 - 19:05 pts/2 00:00:00 zpool status
4 D root 3173850 3173849 3173850 0 1 80 0 - 5179 - 19:13 ? 00:00:00 zpool status
4 D root 3174017 3174016 3174017 0 1 80 0 - 5098 - 19:13 pts/9 00:00:00 zpool clear yellow_pool yellow_pool_zfs_1
0 S apophis 3289002 3167725 3289002 0 1 80 0 - 2391 anon_p 20:37 pts/5 00:00:00 grep -E --color=auto zpool
In man ps, at "PROCESS STATE CODES", it says:
"D uninterruptible sleep (usually IO)"
The stack trace of one of them according to /proc/3161959/stack:
[<0>] __cv_timedwait_common+0x143/0x180 [spl]
[<0>] __cv_timedwait_io+0x19/0x30 [spl]
[<0>] zio_wait+0x146/0x2f0 [zfs]
[<0>] dbuf_read+0x353/0x6d0 [zfs]
[<0>] dmu_buf_hold_by_dnode+0x5c/0xa0 [zfs]
[<0>] zap_lockdir+0xa1/0x110 [zfs]
[<0>] zap_count+0x48/0x120 [zfs]
[<0>] approx_errlog_size_impl.part.0+0x78/0xe0 [zfs]
[<0>] spa_approx_errlog_size+0x17f/0x1e0 [zfs]
[<0>] spa_get_stats+0xcc/0x530 [zfs]
[<0>] zfs_ioc_pool_stats+0x3e/0xa0 [zfs]
[<0>] zfsdev_ioctl_common+0x8ae/0x970 [zfs]
[<0>] zfsdev_ioctl+0x57/0xf0 [zfs]
[<0>] __x64_sys_ioctl+0xa5/0x100
[<0>] x64_sys_call+0x1151/0x2330
[<0>] do_syscall_64+0x80/0x8f0
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
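To collect the same information for every stuck process at once, something like the following could be used. It is a minimal sketch: filter_dstate and dump_dstate_stacks are hypothetical helper names, the listing is per process rather than per thread (unlike ps -elfL), and reading /proc/PID/stack normally requires root.

```shell
#!/bin/sh
# Keep only lines whose second field (process state) starts with D.
# Expects "PID STAT COMMAND..." lines on stdin.
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Print the kernel stack of every process in uninterruptible sleep.
dump_dstate_stacks() {
    # Separate -o options with empty headers give header-less output.
    ps -e -o pid= -o stat= -o args= | filter_dstate |
    while read -r pid stat args; do
        echo "=== $pid ($stat) $args"
        cat "/proc/$pid/stack" 2>/dev/null ||
            echo "(stack not readable; try as root)"
    done
}
```

Running dump_dstate_stacks as root while the zpool commands hang should print one stack per D-state process.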