-
Notifications
You must be signed in to change notification settings - Fork 91
Description
I have proxmox + linstor + linstor-proxmox plugin environment.
linstor: 1.31.1-1
proxmox: 8.4.0
linstor-proxmox: 8.1.1-1
When vm disk is linstor volume (1 replica drbd) for which local (pve node where vm is registered ) node is diskless, attempt to move such a disk to another storage when vm is powered off fails.
proxmox says:
blockdev: cannot open /dev/drbd/by-res/pm-901dd2b4/0: No such file or directory
create full clone of drive scsi0 (ext-hdd-repl1:pm-901dd2b4_10017)
failed to stat '/dev/drbd/by-res/pm-901dd2b4/0'
Use of uninitialized value $size in division (/) at /usr/share/perl5/PVE/QemuServer.pm line 8491.
NOTICE
Trying to create diskful resource (pm-c6380af5) on (cirrus1).
Use of uninitialized value $size in multiplication (*) at /usr/share/perl5/PVE/QemuServer.pm line 8084.
Use of uninitialized value $value in numeric gt (>) at /usr/share/perl5/PVE/Format.pm line 74.
Use of uninitialized value $value in sprintf at /usr/share/perl5/PVE/Format.pm line 79.
transferred 0.0 B of 0.0 B (0.00%)
qemu-img: output file is smaller than input file
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f raw -O raw /dev/drbd/by-res/pm-901dd2b4/0 zeroinit:/dev/drbd/by-res/pm-c6380af5/0' failed: exit code 1
From host logs:
May 29 17:26:42 cirrus1 Satellite[3348808]: 2025-05-29 17:26:42.380 [DeviceManager] INFO LINSTOR/Satellite/3bb565 SYSTEM - DRBD regenerated resource file: /var/lib/linstor.d/pm-901dd2b4.res
May 29 17:26:42 cirrus1 kernel: drbd pm-901dd2b4: Starting worker thread (node-id 1)
May 29 17:26:42 cirrus1 kernel: drbd pm-901dd2b4 superstorage: Starting sender thread (peer-node-id 0)
May 29 17:26:42 cirrus1 kernel: drbd pm-901dd2b4 superstorage: conn( StandAlone -> Unconnected ) [connect]
May 29 17:26:42 cirrus1 kernel: drbd pm-901dd2b4 superstorage: Starting receiver thread (peer-node-id 0)
May 29 17:26:42 cirrus1 kernel: drbd pm-901dd2b4 superstorage: conn( Unconnected -> Connecting ) [connecting]
May 29 17:26:42 cirrus1 Satellite[3348808]: 2025-05-29 17:26:42.395 [DeviceManager] INFO LINSTOR/Satellite/3bb565 SYSTEM - Resource 'pm-901dd2b4' [DRBD] adjusted.
May 29 17:26:42 cirrus1 Satellite[3348808]: 2025-05-29 17:26:42.401 [DeviceManager] INFO LINSTOR/Satellite/ SYSTEM - End DeviceManager cycle 902
May 29 17:26:42 cirrus1 Satellite[3348808]: 2025-05-29 17:26:42.401 [DeviceManager] INFO LINSTOR/Satellite/c218b0 SYSTEM - Begin DeviceManager cycle 903
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4 superstorage: Handshake to peer 0 successful: Agreed network protocol version 122
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4 superstorage: Feature flags enabled on protocol level: 0x7f TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES RESYNC_DAGTAG
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4 superstorage: Peer authenticated using 20 bytes HMAC
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4: Preparing remote state change 828630895: 0->1 role( Secondary ) conn( Connected )
May 29 17:26:43 cirrus1 kernel: drbd1078: detected capacity change from 0 to 272629760
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4/0 drbd1078: size = 130 GB (136314880 KB)
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4/0 drbd1078 superstorage: my exposed UUID: 0000000000000000
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4/0 drbd1078 superstorage: peer 1658BD60BA1EFD68:FFFFFFFFFFFFFFFF:0000000000000000:0000000000000000 bits:0 flags:20
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4/0 drbd1078: Setting exposed data uuid: 1658BD60BA1EFD68
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4 superstorage: Committing remote state change 828630895 (primary_nodes=0)
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4 superstorage: conn( Connecting -> Connected ) peer( Unknown -> Secondary ) [remote]
May 29 17:26:43 cirrus1 kernel: drbd pm-901dd2b4/0 drbd1078 superstorage: pdsk( DUnknown -> UpToDate ) repl( Off -> Established ) [remote]
May 29 17:26:43 cirrus1 pvedaemon[722348]: failed to stat '/dev/drbd/by-res/pm-901dd2b4/0'
May 29 17:26:43 cirrus1 pvedaemon[722348]: Use of uninitialized value $size in division (/) at /usr/share/perl5/PVE/QemuServer.pm line 8491.
linstor volume info:
root@cirrus1:~# linstor v l |grep pm-901dd2b4
| pm-901dd2b4 | cirrus1 | DfltDisklessStorPool | 0 | 1078 | /dev/drbd1078 | | Unused | Diskless | Established(1) |
| pm-901dd2b4 | superstorage | external_hdd_thin | 0 | 1078 | /dev/drbd1078 | 19.52 GiB | Unused | UpToDate | Established(1) |
It looks like udevd did not manage to create drbd device link(s) (/dev/drbd/by-res/XXX) "in time" (before proxmox tries to use it).
I think satellite should wait until udevd creates links for drbd device (using "udevadm settle" ?) before returning success status to controller in this situation.
For now I managed to work-around the problem adding "udevadm settle" system call in linstor-proxmox plugin code (after calling controller in activate_volume function),