
Conversation

@huww98 (Contributor) commented Jan 5, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

We used to use /etc/kubernetes/volumes/disk/d-*.conf files to record the mapping between disks without a serial number and their device paths. But this approach has multiple drawbacks:

  • leak: we may fail to remove the files we created
  • inaccurate: if the disk is detached or switches drivers, the device goes away without our knowledge. In that case, the conf file may point to a non-existent device, or even to the wrong one.

Replace the conf files with xattrs, which we are already using to calculate the number of volumes available on the node. The xattrs are attached to the device inode and go away with it, so we no longer need to worry about cleanup or stale mappings.
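
To illustrate the idea, a minimal Go sketch (not the driver's actual code): the xattr name trusted.csi-managed-disk is taken from the verification logs below, and writing trusted.* xattrs requires CAP_SYS_ADMIN.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

const diskXattrName = "trusted.csi-managed-disk" // name seen in the verification below

// tagDevice records the disk ID on the block device inode itself, so the
// record disappears together with the inode when the disk is detached.
func tagDevice(devPath, diskID string) error {
	return unix.Setxattr(devPath, diskXattrName, []byte(diskID), 0)
}

// lookupDiskID reads the recorded disk ID back. ENODATA means the device
// carries no tag (e.g. it was re-created), so a stale value is impossible.
func lookupDiskID(devPath string) (string, error) {
	buf := make([]byte, 256)
	n, err := unix.Getxattr(devPath, diskXattrName, buf)
	if err == unix.ENODATA {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	if err := tagDevice("/dev/nvme3n1", "d-2ze4p69x0n3tx1ch0xch"); err != nil {
		fmt.Println("tag failed:", err)
		return
	}
	id, _ := lookupDiskID("/dev/nvme3n1")
	fmt.Println("managed disk:", id)
}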

Old conf files are migrated to xattrs in one go, in the init container.

As a bonus, partition support is now more decoupled and should work in more scenarios: xattrs are always attached to the root block device, never to a partition.
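
For the partition handling, a hypothetical sketch of resolving a partition to its root block device through sysfs (not necessarily how the driver implements it; the sysfs layout matches the device_manager.go log lines in the verification below):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rootBlockDevice resolves a partition such as /dev/nvme3n1p1 to its
// parent disk /dev/nvme3n1; a whole disk is returned unchanged.
func rootBlockDevice(dev string) (string, error) {
	// /sys/class/block/<name> is a symlink into the device tree, e.g.
	// /sys/devices/pci0000:00/.../nvme3/nvme3n1/nvme3n1p1 for a partition.
	sysPath, err := filepath.EvalSymlinks(filepath.Join("/sys/class/block", filepath.Base(dev)))
	if err != nil {
		return "", err
	}
	// A partition's sysfs directory has a "partition" attribute, and its
	// parent directory is named after the root block device.
	if _, err := os.Stat(filepath.Join(sysPath, "partition")); err == nil {
		return "/dev/" + filepath.Base(filepath.Dir(sysPath)), nil
	}
	return dev, nil
}

func main() {
	fmt.Println(rootBlockDevice("/dev/nvme3n1p1"))
}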

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

/hold
for manual verification on disks without a serial number

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the do-not-merge/hold, kind/bug, and cncf-cla: yes labels Jan 5, 2026
@k8s-ci-robot added the size/XL label Jan 5, 2026
@huww98 (Contributor, Author) commented Jan 10, 2026

Manual verification:

Before the upgrade, two disks are attached to the node:

  • d-2ze4p69x0n3tx1ch0xch: a disk without a serial number, with a partition
  • d-2zecxehhi1afgh1rl7dh: a newly created disk.
[root@iZ2zeaaxogmsvnc525axfoZ ~]# grep '' /etc/kubernetes/volumes/disk/*.conf
/etc/kubernetes/volumes/disk/d-2ze4p69x0n3tx1ch0xch.conf:/dev/nvme3n1p1
/etc/kubernetes/volumes/disk/d-2zecxehhi1afgh1rl7dh.conf:/dev/disk/by-id/nvme-Alibaba_Cloud_Elastic_Block_Storage_2zecxehhi1afgh1rl7dh

Upgrade; init container output:

migrating disk conf: /host/etc/kubernetes/volumes/disk/d-2ze4p69x0n3tx1ch0xch.conf
device /dev/nvme3n1p1 is a partition of /dev/nvme3n1
cat: can't open '/sys/devices/pci0000:00/0000:00:09.0/nvme/nvme3/nvme3n1/serial': No such file or directory
device /dev/nvme3n1 has no serial, assigning disk ID d-2ze4p69x0n3tx1ch0xch
migrating disk conf: /host/etc/kubernetes/volumes/disk/d-2zecxehhi1afgh1rl7dh.conf
device /dev/disk/by-id/nvme-Alibaba_Cloud_Elastic_Block_Storage_2zecxehhi1afgh1rl7dh is a symlink, skip
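
Reconstructed from this log, the one-shot migration behaves roughly as below. This is an illustrative Go sketch: the function names and cleanup details are assumptions, rootBlockDevice is the helper sketched earlier, and the real migration also checks the device serial in sysfs so that only devices without a serial number are tagged.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"golang.org/x/sys/unix"
)

func migrateConf(confPath string) error {
	// The disk ID is the conf file name; its content is the device path.
	diskID := strings.TrimSuffix(filepath.Base(confPath), ".conf")
	raw, err := os.ReadFile(confPath)
	if err != nil {
		return err
	}
	dev := strings.TrimSpace(string(raw))

	// A /dev/disk/by-id/... symlink already identifies the disk: skip it.
	if fi, err := os.Lstat(dev); err == nil && fi.Mode()&os.ModeSymlink != 0 {
		return nil
	}
	// Resolve a partition to its root block device, then tag the root
	// device, since xattrs always go on the whole disk.
	root, err := rootBlockDevice(dev) // helper from the earlier sketch
	if err != nil {
		return err
	}
	return unix.Setxattr(root, "trusted.csi-managed-disk", []byte(diskID), 0)
}

func main() {
	confs, _ := filepath.Glob("/host/etc/kubernetes/volumes/disk/d-*.conf")
	for _, conf := range confs {
		fmt.Println("migrating disk conf:", conf)
		if err := migrateConf(conf); err != nil {
			fmt.Fprintln(os.Stderr, "migrate failed:", conf, err)
		}
	}
}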

Unmount the globalmount, optionally restart the CSI driver, then restart kubelet, to verify that the devMap is loaded from xattrs:

I0110 17:47:41.041626   18140 nodeserver.go:509] NodeStageVolume: Stage VolumeId: d-2ze4p69x0n3tx1ch0xch, Target Path: /var/lib/kubelet/plugins/kubernetes.io/csi/diskplugin.csi.alibabacloud.com/2d5f5866b9f40da3d6181467c60adda021b5d011f52c7c195162dd1fb370c409/globalmount, VolumeContext: map[]
I0110 17:47:41.041810   18140 cloud.go:98] GetRootBlockDevice: got disk d-2ze4p69x0n3tx1ch0xch device name /dev/nvme3n1 from devMap
I0110 17:47:41.041817   18140 bdf.go:543] NewDeviceDriver: start to get deviceNumber from device: /dev/nvme3n1
I0110 17:47:41.041841   18140 device_manager.go:404] NewDeviceDriver: get symlink dir: /sys/devices/pci0000:00/0000:00:09.0/nvme/nvme3/nvme3n1
I0110 17:47:41.041846   18140 device_manager.go:411] NewDeviceDriver: busPrefix: ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}, parentDir: nvme3, matched: false
I0110 17:47:41.041850   18140 device_manager.go:404] NewDeviceDriver: get symlink dir: /sys/devices/pci0000:00/0000:00:09.0/nvme/nvme3
I0110 17:47:41.041853   18140 device_manager.go:411] NewDeviceDriver: busPrefix: ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}, parentDir: nvme, matched: false
I0110 17:47:41.041855   18140 device_manager.go:404] NewDeviceDriver: get symlink dir: /sys/devices/pci0000:00/0000:00:09.0/nvme
I0110 17:47:41.041860   18140 device_manager.go:411] NewDeviceDriver: busPrefix: ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}, parentDir: 0000:00:09.0, matched: true
I0110 17:47:41.041876   18140 nodeserver.go:1481] "checkMountedOfRunvAndRund: check pvmMounted" device="/dev/nvme3n1" pvmMounted=false driver="nvme"
I0110 17:47:41.041884   18140 cloud.go:195] "Starting Do AttachDisk" method="/csi.v1.Node/NodeStageVolume" volumeID="d-2ze4p69x0n3tx1ch0xch"
I0110 17:47:41.115795   18140 low_latency.go:187] "got batch" type="disk" n=1 requestID="252FCE8B-FE8F-514F-A5E8-FDAC1DD8D403" duration="69.983169ms" wait="3.90955ms"
I0110 17:47:41.126638   18140 nodeserver.go:585] NodeStageVolume: Volume Successful Attached: d-2ze4p69x0n3tx1ch0xch, to Node: i-2zeaaxogmsvnc525axfo, Device: /dev/nvme3n1p1

Simulating an inaccurate devMap:

[root@iZ2zeaaxogmsvnc525axfoZ ~]# setfattr -x trusted.csi-managed-disk /dev/nvme1n1
[root@iZ2zeaaxogmsvnc525axfoZ ~]# umount /var/lib/kubelet/plugins/kubernetes.io/csi/diskplugin.csi.alibabacloud.com/2d5f5866b9f40da3d6181467c60adda021b5d011f52c7c195162dd1fb370c409/globalmount
[root@iZ2zeaaxogmsvnc525axfoZ ~]# systemctl restart kubelet.service
I0110 18:18:42.760176   21112 nodeserver.go:509] NodeStageVolume: Stage VolumeId: d-2ze4p69x0n3tx1ch0xch, Target Path: /var/lib/kubelet/plugins/kubernetes.io/csi/diskplugin.csi.alibabacloud.com/2d5f5866b9f40da3d6181467c60adda021b5d011f52c7c195162dd1fb370c409/globalmount, VolumeContext: map[]
I0110 18:18:42.760316   21112 xattr.go:117] "disk has no xattr" dev="/dev/nvme1n1" diskID="d-2ze4p69x0n3tx1ch0xch"
W0110 18:18:42.760337   21112 nodeserver.go:1463] NodeStageVolume: GetVolumeDeviceName failed: [get by link "/dev/disk/by-id/virtio-2ze4p69x0n3tx1ch0xch" failed: no such file or directory, get by link "/dev/disk/by-id/nvme-Alibaba_Cloud_Elastic_Block_Storage_2ze4p69x0n3tx1ch0xch" failed: no such file or directory, find by serial: file does not exist]
E0110 18:18:42.760488   21112 bdf.go:593] "Failed to execute xdragon-bdf command" err="fork/exec /usr/bin/nsenter: no such file or directory" volumeId="d-2ze4p69x0n3tx1ch0xch" output=""
E0110 18:18:42.760556   21112 bdf.go:593] "Failed to execute xdragon-bdf command" err="fork/exec /usr/bin/nsenter: no such file or directory" volumeId="d-2ze4p69x0n3tx1ch0xch" output=""
E0110 18:18:42.760562   21112 nodeserver.go:1468] "NodeStageVolume:  Failed to get bdf number" err="Failed to find device number for d-2ze4p69x0n3tx1ch0xch" volumeId="d-2ze4p69x0n3tx1ch0xch"
I0110 18:18:42.760568   21112 cloud.go:195] "Starting Do AttachDisk" method="/csi.v1.Node/NodeStageVolume" volumeID="d-2ze4p69x0n3tx1ch0xch"
I0110 18:18:42.843227   21112 low_latency.go:187] "got batch" type="disk" n=1 requestID="B8426F15-0733-5B9A-8EC3-7CE0B628A529" duration="78.736296ms" wait="3.909001ms"
W0110 18:18:42.843248   21112 cloud.go:257] AttachDisk: Disk (no serial) d-2ze4p69x0n3tx1ch0xch is already attached to instance i-2zeaaxogmsvnc525axfo, but device unknown, will be detached and try again
I0110 18:18:42.843254   21112 cloud.go:275] AttachDisk: Disk d-2ze4p69x0n3tx1ch0xch is already attached to instance i-2zeaaxogmsvnc525axfo, will be detached
I0110 18:18:42.984471   21112 cloud.go:287] AttachDisk: Wait for disk d-2ze4p69x0n3tx1ch0xch to be detached
I0110 18:18:43.055697   21112 batched.go:216] "polled batch" type="disk" n=1 interval="2.269µs" duration="71.198942ms" requestID="474B8D9C-0644-593F-9EA6-1ADCA2D34703"
I0110 18:18:43.055717   21112 batched.go:120] "poll response processed" type="disk" queueDepth=1 requeue=1
I0110 18:18:45.069247   21112 batched.go:216] "polled batch" type="disk" n=1 interval="1.928777808s" duration="84.740777ms" requestID="D0DF021E-E9D0-513B-BC2C-959F0B4EB787"
I0110 18:18:45.069273   21112 batched.go:120] "poll response processed" type="disk" queueDepth=0 requeue=0
I0110 18:18:45.299942   21112 cloud.go:341] AttachDisk: Waiting for Disk d-2ze4p69x0n3tx1ch0xch is Attached to instance i-2zeaaxogmsvnc525axfo with RequestId: 5CF90C09-8642-5AA4-B26C-7FB83D523E3E
I0110 18:18:47.050801   21112 batched.go:216] "polled batch" type="disk" n=1 interval="1.6845354s" duration="66.295782ms" requestID="9728B7A3-88F0-5338-9F66-D2473C54F78F"
I0110 18:18:47.050824   21112 batched.go:120] "poll response processed" type="disk" queueDepth=0 requeue=0
I0110 18:18:47.050902   21112 cloud.go:153] "found device by diff" method="/csi.v1.Node/NodeStageVolume" volumeID="d-2ze4p69x0n3tx1ch0xch" device="/dev/nvme2n1"
I0110 18:18:47.066834   21112 nodeserver.go:585] NodeStageVolume: Volume Successful Attached: d-2ze4p69x0n3tx1ch0xch, to Node: i-2zeaaxogmsvnc525axfo, Device: /dev/nvme2n1p1
I0110 18:18:47.066852   21112 util.go:574] formatAndMount: mount options : [shared]
I0110 18:18:47.076557   21112 nodeserver.go:660] "mount successful" method="/csi.v1.Node/NodeStageVolume" volumeID="d-2ze4p69x0n3tx1ch0xch" target="/var/lib/kubelet/plugins/kubernetes.io/csi/diskplugin.csi.alibabacloud.com/2d5f5866b9f40da3d6181467c60adda021b5d011f52c7c195162dd1fb370c409/globalmount" device="/dev/nvme2n1p1" mkfsOptions=[] options=["shared"]
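
The "found device by diff" step in this log can be read as: snapshot the set of block devices before the attach call, then whatever name appears afterwards is the newly attached disk. A minimal sketch under that assumption (not the driver's implementation); once found, the device can be tagged again with the xattr as in the first sketch.

package main

import (
	"fmt"
	"os"
)

// listBlockDevices snapshots the device names under /sys/class/block.
func listBlockDevices() (map[string]bool, error) {
	entries, err := os.ReadDir("/sys/class/block")
	if err != nil {
		return nil, err
	}
	set := make(map[string]bool, len(entries))
	for _, e := range entries {
		set[e.Name()] = true
	}
	return set, nil
}

func main() {
	before, err := listBlockDevices()
	if err != nil {
		panic(err)
	}
	// ... issue AttachDisk here and wait for it to complete ...
	after, err := listBlockDevices()
	if err != nil {
		panic(err)
	}
	for name := range after {
		if !before[name] {
			fmt.Println("new device:", "/dev/"+name) // the attached disk
		}
	}
}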

Commit: Move DiskXattrName and DiskXattrVirtioBlkName variables from nodeserver.go to a new file xattr.go. This is a pure code move with no functional changes.
@huww98 (Contributor, Author) commented Jan 11, 2026

/unhold

@k8s-ci-robot removed the do-not-merge/hold label Jan 11, 2026
@mowangdk (Contributor)

/lgtm
/approve

@k8s-ci-robot added the lgtm label Jan 12, 2026
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: huww98, mowangdk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jan 12, 2026
@k8s-ci-robot merged commit edf8fd3 into kubernetes-sigs:master Jan 12, 2026
14 checks passed