
ZFS management commands hang with inaccessible block device #18446

@mpeter50


System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Proxmox VE |
| Distribution Version | 9.1.7 |
| Kernel Version | 6.17.13-2-pve |
| Architecture | x86_64 |
| OpenZFS Version | 2.4.1-pve1 |

Describe the problem you're observing

The zpool status and zpool clear commands hang when a pool vdev is inaccessible.

I have multiple ZFS pools in my system. Temporarily, I also have a pool that is accessible through a USB HDD dock.
This pool has a single vdev, which is a LUKS device managed by device mapper. The LUKS device is stored directly on the drive that sits in the HDD dock.

When the drive in the dock becomes inaccessible (because I moved it slightly by accident, put the computer to sleep, or turned off the dock), the zpool commands start misbehaving.
After the connection is restored, the drive itself is accessible just fine: sudo hexdump --length 256 /dev/sdX works on it. Sometimes I have to reopen the LUKS device, but I think not always.
But even after reopening the LUKS device, zpool management commands run endlessly. Querying the status of a specific other pool works, but a plain zpool status hangs, and I can't cancel it with Ctrl+C. The same happens with zpool clear, although I did not try it on my other pools.
Files on other pools are accessible, but even an ls in the affected pool's directory hangs, as do all other processes, and Ctrl+C does not help for them either.

Similar issues were reported in the past: #14426, #14491
They were not resolved because the reporters either stopped responding or worked around the issue.
I think this is a major issue, and I will be able to keep this setup for testing for some time if I can help with diagnosing the problem.

Describe how to reproduce the problem

I will describe it with an HDD dock, but according to the issues mentioned above it should also be reproducible by unplugging a spare pendrive.

  1. Insert an empty HDD into the dock, connect the dock to the computer, and turn the dock on.
  2. Format the HDD as a LUKS device and open it:
       head --bytes 1024 /dev/random | sudo tee /tmp/lukskey
       sudo cryptsetup luksFormat /dev/disk/by-id/disk-id /tmp/lukskey
       sudo cryptsetup luksOpen /dev/disk/by-id/disk-id yellow_pool_zfs_1 --key-file /tmp/lukskey
  3. Create a zpool with /dev/mapper/yellow_pool_zfs_1 as its single vdev.
  4. Write something to the pool for good measure.
  5. Remove the HDD from the dock.
  6. Read the file you have written to the pool.
  7. Run zpool status; it should hang.
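
Because a hung zpool status cannot be cancelled, it may be more convenient to run the last step in the background. The following is only a sketch: probe_hang is a hypothetical helper name, not part of ZFS, and it discards the command's output because it only checks whether the command returns in time.

```shell
#!/bin/sh
# probe_hang SECONDS CMD [ARGS...]   (hypothetical helper, not a ZFS tool)
# Runs CMD in the background with its output discarded and reports whether
# it finished within SECONDS. Prints "finished" or "still running (pid N)".
probe_hang() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        # kill -0 sends no signal; it only tests whether the process exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "finished"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Check once more after the last sleep before declaring a hang.
    if kill -0 "$pid" 2>/dev/null; then
        echo "still running (pid $pid)"
        return 1
    fi
    echo "finished"
    return 0
}
```

For example, probe_hang 10 zpool status would print "still running (pid N)" if the command is stuck. The helper only detects the hang: a process in uninterruptible sleep stays around afterwards, and the terminal remains usable.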

Include any warning/errors/backtraces from the system logs

dmesg does not contain much that is relevant:

WARNING: Pool 'yellow_pool' has encountered an uncorrectable I/O failure and has been suspended.
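
The suspended state might also be visible without going through the zpool command. The sketch below assumes that this OpenZFS build exposes a per-pool state file next to the dbgmsg kstat, at /proc/spl/kstat/zfs/<pool>/state; that path is an assumption and was not checked on the affected system, and pool_state is a hypothetical helper name.

```shell
#!/bin/sh
# pool_state POOL [KSTAT_ROOT]   (hypothetical helper)
# Prints the contents of the per-pool "state" kstat file.
# ASSUMPTION: the file KSTAT_ROOT/POOL/state exists on this OpenZFS build.
pool_state() {
    pool=$1
    root=${2:-/proc/spl/kstat/zfs}
    file="$root/$pool/state"
    if [ -r "$file" ]; then
        cat "$file"
    else
        # The file is missing or unreadable, so nothing can be reported.
        echo "unknown"
        return 1
    fi
}
```

Usage would be pool_state yellow_pool. Whether this read itself can block on a suspended pool is not something the logs above settle.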

Logs from /proc/spl/kstat/zfs/dbgmsg, without logs of other pools:
https://gist.github.com/mpeter50/fcfe20f8c0b96b75f7aee27eadf2ebd4

According to ps -elfL, the hanging zpool processes are in uninterruptible sleep:

$ ps -elfL | egrep zpool
4 D root     3161959 3161958 3161959  0    1  80   0 -  5179 -      19:05 pts/2    00:00:00 zpool status
4 D root     3173850 3173849 3173850  0    1  80   0 -  5179 -      19:13 ?        00:00:00 zpool status
4 D root     3174017 3174016 3174017  0    1  80   0 -  5098 -      19:13 pts/9    00:00:00 zpool clear yellow_pool yellow_pool_zfs_1
0 S apophis  3289002 3167725 3289002  0    1  80   0 -  2391 anon_p 20:37 pts/5    00:00:00 grep -E --color=auto zpool
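
To see every blocked task rather than only zpool, the process list can be filtered on the state column. This is a small sketch; list_dstate is a hypothetical helper name, and it expects the state in the second column, as produced by ps -eo pid,stat,wchan:32,args.

```shell
#!/bin/sh
# list_dstate   (hypothetical helper)
# Reads ps-style output on stdin whose second column is the process state
# and prints the header line plus every process in state D.
list_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Intended use:
#   ps -eo pid,stat,wchan:32,args | list_dstate
```

The wchan column may help show where each task is waiting, although in the listing above it is only "-" for the zpool processes.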

In man ps, at "PROCESS STATE CODES", it says:

"D uninterruptible sleep (usually IO)"

The stack trace of one of them according to /proc/3161959/stack:

[<0>] __cv_timedwait_common+0x143/0x180 [spl]
[<0>] __cv_timedwait_io+0x19/0x30 [spl]
[<0>] zio_wait+0x146/0x2f0 [zfs]
[<0>] dbuf_read+0x353/0x6d0 [zfs]
[<0>] dmu_buf_hold_by_dnode+0x5c/0xa0 [zfs]
[<0>] zap_lockdir+0xa1/0x110 [zfs]
[<0>] zap_count+0x48/0x120 [zfs]
[<0>] approx_errlog_size_impl.part.0+0x78/0xe0 [zfs]
[<0>] spa_approx_errlog_size+0x17f/0x1e0 [zfs]
[<0>] spa_get_stats+0xcc/0x530 [zfs]
[<0>] zfs_ioc_pool_stats+0x3e/0xa0 [zfs]
[<0>] zfsdev_ioctl_common+0x8ae/0x970 [zfs]
[<0>] zfsdev_ioctl+0x57/0xf0 [zfs]
[<0>] __x64_sys_ioctl+0xa5/0x100
[<0>] x64_sys_call+0x1151/0x2330
[<0>] do_syscall_64+0x80/0x8f0
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
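
The same trace can be collected for all hung processes at once. A minimal sketch, assuming root access to /proc/<pid>/stack; dump_stacks and the PROC_ROOT override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# dump_stacks PID...   (hypothetical helper)
# Prints the kernel stack of each given PID, read from PROC_ROOT/PID/stack.
# PROC_ROOT defaults to /proc; reading the stack file usually needs root.
dump_stacks() {
    root=${PROC_ROOT:-/proc}
    for pid in "$@"; do
        echo "== $pid"
        cat "$root/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Intended use, as root:
#   dump_stacks $(pgrep -x zpool)
```

Comparing the stacks of the zpool status and zpool clear processes could show whether they are all waiting in the same place.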

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)
