
nvmeof: Implement safe controller disconnect logic#6163

Merged
mergify[bot] merged 4 commits into ceph:devel from gadididi:nvmeof/add_disconnect
Mar 18, 2026

Conversation

@gadididi
Contributor

@gadididi gadididi commented Mar 8, 2026

Describe what this PR does

Summary

This PR adds proper NVMe-oF controller disconnect logic to the CSI driver. When a pod is deleted and its volume is unstaged, we now safely disconnect from NVMe-oF controllers, but only when no other volumes are using them.

Why We Need This

The Original Problem

Before this change, when NodeUnstageVolume was called, we would unmount the volume but leave the NVMe-oF connection active, so NVMe controllers stayed connected even after all pods were deleted. On the next pod creation in the same NVMe-oF subsystem, in some environments (like the e2e test in #6058), the orphaned connection got stuck for no apparent reason. The cleanest way to handle this is to disconnect from the controller(s) (host-subsystem pair(s)) when the pod being cleaned up held the last mounted volume.

Why We Can't Just Always Disconnect

In NVMe-oF multipath setups, multiple volumes share the same controllers. For example:

Subsystem: nqn.2016-06.io.spdk:cnode1
├── Namespace 1 (nvme0n1) - Used by PVC-1 and the device mounted for POD-1
└── Namespace 2 (nvme0n2) - Used by PVC-2 and the device mounted for POD-2

Both namespaces are accessed via:

├── Controller nvme0 (gateway 10.0.0.1:4420)
└── Controller nvme1 (gateway 10.0.0.2:4420)

If we disconnect the controllers when POD-1 is deleted, we break POD-2's mount, which is still using those same controllers!

The Challenge

We need to track which volumes are still in use and only disconnect controllers when:

The volume being unstaged is no longer mounted
No other volumes are using the same controllers

Solution Overview

This PR implements a reference-counting approach
(on demand, with no metadata file: it uses only the current state of the node):

  • When NodeUnstageVolume is called, we first unmount the volume
  • We then check whether this was the last mounted namespace (i.e., NVMe device) on each controller
  • Only if a controller has no other mounted namespaces do we disconnect it

FYI: the connect operation in NodeStageVolume() connects to all of the subsystem's listeners (each listener connection creates a controller, i.e., a host-listener connection for a specific subsystem).
So it is not possible for NVMe-oF namespaces in the same subsystem
to have different controllers: all NVMe namespaces in the same subsystem share the same set of controllers!

Implementation Details

New Functions:

getDeviceFromStagingPath()

  • Finds the NVMe device path for a given staging path
  • Uses findmnt with JSON output for reliable parsing
  • Handles both filesystem and block volumes

getNVMeMountedDevices()

  • Scans the entire mount table in a single findmnt call
  • Returns a map of all currently mounted NVMe devices
  • Filters for NVMe-oF CSI staging paths only
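A minimal sketch of that filtering step, under stated assumptions: the type, function name, and staging prefix below are illustrative, not the PR's exact code, and device sources are assumed to already be normalized to /dev/nvme* paths.

```go
package main

import (
	"fmt"
	"strings"
)

// mountEntry is one row of `findmnt -J --list --output SOURCE,TARGET`.
type mountEntry struct {
	Source string
	Target string
}

// nvmeStagingDevices keeps only NVMe devices whose mount target sits under a
// CSI staging prefix, mapping device path -> staging path.
func nvmeStagingDevices(entries []mountEntry) map[string]string {
	stagingPrefixes := []string{
		"/var/lib/kubelet/plugins/kubernetes.io/csi/",
	}
	devices := make(map[string]string)
	for _, e := range entries {
		// Skip anything that is not an NVMe block device.
		if !strings.HasPrefix(e.Source, "/dev/nvme") {
			continue
		}
		for _, prefix := range stagingPrefixes {
			if strings.HasPrefix(e.Target, prefix) {
				devices[e.Source] = e.Target
				break
			}
		}
	}
	return devices
}

func main() {
	mounts := []mountEntry{
		{"/dev/nvme0n1", "/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/abc/globalmount/vol1"},
		{"/dev/sda1", "/"},
		{"/dev/nvme0n2", "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-x/vol2"},
	}
	// Only the two NVMe staging mounts survive the filter.
	fmt.Println(len(nvmeStagingDevices(mounts)))
}
```

The single `findmnt -J --list` call keeps this to one process spawn per unstage, matching the log output shown below.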

DisconnectIfLastMount()

  • Core disconnect logic with reference counting
  • Gets all controllers for the device being unstaged
  • For each controller, checks whether other namespaces are still mounted (as noted above, all controllers in a subsystem share the same namespaces)
  • Only disconnects if no other namespaces are in use
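The reference-counting decision can be sketched as a pure function. Hypothetical names throughout; the real DisconnectIfLastMount additionally shells out to `nvme list-subsys` and `nvme list-ns` to build these inputs.

```go
package main

import "fmt"

// controllersToDisconnect applies the reference count: a controller is safe to
// disconnect only if none of its namespace devices is still mounted on the node.
func controllersToDisconnect(
	controllerNamespaces map[string][]string, // controller -> its namespace devices
	mounted map[string]bool, // devices still mounted after unstaging this volume
) []string {
	safe := []string{}
	for ctrl, namespaces := range controllerNamespaces {
		inUse := false
		for _, ns := range namespaces {
			if mounted[ns] {
				inUse = true
				break
			}
		}
		if !inUse {
			safe = append(safe, ctrl)
		}
	}
	return safe
}

func main() {
	// nvme0 still serves a mounted namespace: keep the connection.
	busy := controllersToDisconnect(
		map[string][]string{"nvme0": {"/dev/nvme0n1", "/dev/nvme0n3"}},
		map[string]bool{"/dev/nvme0n1": true},
	)
	// The last namespace was just unmounted: nvme0 may be disconnected.
	idle := controllersToDisconnect(
		map[string][]string{"nvme0": {"/dev/nvme0n2"}},
		map[string]bool{},
	)
	fmt.Println(busy, idle)
}
```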

parseDeviceFromFindmntJSON()

  • Parses findmnt JSON output to extract device paths
  • Handles two different mount formats (see below)
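The two SOURCE formats visible in the logs below — a plain device path for filesystem mounts ("/dev/nvme0n1") and "devtmpfs[/nvme0n2]" for raw block bind mounts — can be handled as in this sketch (hypothetical helper and struct names, shown for illustration only):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// findmntOutput mirrors the JSON shape of `findmnt -J --output SOURCE,TARGET`.
type findmntOutput struct {
	Filesystems []struct {
		Source string `json:"source"`
		Target string `json:"target"`
	} `json:"filesystems"`
}

// parseDevice extracts a /dev/nvmeXnY path from a findmnt SOURCE field,
// covering both the filesystem and the raw block mount formats.
func parseDevice(source string) (string, bool) {
	if strings.HasPrefix(source, "/dev/nvme") {
		return source, true
	}
	// "devtmpfs[/nvme0n2]" -> "/dev/nvme0n2"
	if open := strings.Index(source, "["); open != -1 && strings.HasSuffix(source, "]") {
		dev := "/dev" + source[open+1:len(source)-1]
		if strings.HasPrefix(dev, "/dev/nvme") {
			return dev, true
		}
	}
	return "", false
}

func main() {
	raw := `{"filesystems":[{"source":"/dev/nvme0n1","target":"/mnt/a"},{"source":"devtmpfs[/nvme0n2]","target":"/mnt/b"}]}`
	var out findmntOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		panic(err)
	}
	for _, fs := range out.Filesystems {
		if dev, ok := parseDevice(fs.Source); ok {
			fmt.Println(dev)
		}
	}
}
```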

Is there anything that requires special attention

Note: I did not add a locking mechanism yet, but one is required; parallel calls to NodeStageVolume() and NodeUnstageVolume() can cause issues.
I am going to add it in the next PR.

Related issues/PRs

#6058

Future concerns

Adding a locking mechanism in the next PR

Example

[root@cephnvme-devel-server-gadi nvmeof]# kubectl get pvc
NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     VOLUMEATTRIBUTESCLASS   AGE
cephcsi-nvmeof-pvc    Bound    pvc-4149616b-39b6-4d15-8e11-55643a2e6c70   64Mi       RWO            ocs-storagecluster-ceph-nvmeof   <unset>                 4h58m
cephcsi-nvmeof-pvc3   Bound    pvc-96834e31-fff5-4a99-8a75-8bb9ff378675   64Mi       RWO            ocs-storagecluster-ceph-nvmeof   <unset>                 3s
raw-block-pvc         Bound    pvc-81024dbc-dc61-438d-88d5-8852964c7fb5   64Mi       RWO            ocs-storagecluster-ceph-nvmeof   <unset>                 4m32s
[root@cephnvme-devel-server-gadi nvmeof]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nvmeof-test-pod             1/1     Running   0          4h58m
nvmeof-test-pod3            1/1     Running   0          4s
pod-with-raw-block-volume   1/1     Running   0          4m36s

Delete pod3

I0308 14:22:06.339907  594156 utils.go:350] ID: 245 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0308 14:22:06.339966  594156 utils.go:351] ID: 245 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd GRPC request: {"target_path":"/var/lib/kubelet/pods/a70454ab-177d-46c7-ab6b-c2b1c85d7a5b/volumes/kubernetes.io~csi/pvc-96834e31-fff5-4a99-8a75-8bb9ff378675/mount","volume_id":"0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd"}
I0308 14:22:06.340261  594156 mount_linux.go:402] Unmounting /var/lib/kubelet/pods/a70454ab-177d-46c7-ab6b-c2b1c85d7a5b/volumes/kubernetes.io~csi/pvc-96834e31-fff5-4a99-8a75-8bb9ff378675/mount
I0308 14:22:06.342453  594156 nodeserver.go:297] ID: 245 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd nvmeof: successfully unbound volume 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd from /var/lib/kubelet/pods/a70454ab-177d-46c7-ab6b-c2b1c85d7a5b/volumes/kubernetes.io~csi/pvc-96834e31-fff5-4a99-8a75-8bb9ff378675/mount
I0308 14:22:06.342508  594156 utils.go:357] ID: 245 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd GRPC response: {}
I0308 14:22:06.447995  594156 utils.go:350] ID: 246 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0308 14:22:06.448022  594156 utils.go:351] ID: 246 GRPC request: {}
I0308 14:22:06.448104  594156 utils.go:357] ID: 246 GRPC response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}},{"rpc":{"type":"EXPAND_VOLUME"}}]}
I0308 14:22:06.449180  594156 utils.go:350] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd GRPC call: /csi.v1.Node/NodeUnstageVolume
I0308 14:22:06.449224  594156 utils.go:351] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/f967c4f2ac08da70d983c7ff5c6f249783312f892ce4e7877f6cb15bb6031434/globalmount","volume_id":"0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd"}
I0308 14:22:06.452563  594156 cephcmds.go:165] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd command succeeded: findmnt [--mountpoint /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/f967c4f2ac08da70d983c7ff5c6f249783312f892ce4e7877f6cb15bb6031434/globalmount/0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd --output SOURCE --noheadings --first-only -J]
I0308 14:22:06.452684  594156 nodeserver.go:834] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd found filesystem volume device: /dev/nvme0n3
I0308 14:22:06.452715  594156 nodeserver.go:801] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd found device /dev/nvme0n3 for staging path /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/f967c4f2ac08da70d983c7ff5c6f249783312f892ce4e7877f6cb15bb6031434/globalmount/0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd
I0308 14:22:06.452796  594156 mount_linux.go:402] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/f967c4f2ac08da70d983c7ff5c6f249783312f892ce4e7877f6cb15bb6031434/globalmount/0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd
I0308 14:22:06.463632  594156 nodeserver.go:340] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd successfully unmounted volume (0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd) from staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/f967c4f2ac08da70d983c7ff5c6f249783312f892ce4e7877f6cb15bb6031434/globalmount/0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd)
I0308 14:22:06.463708  594156 nodeserver.go:354] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd successfully removed staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/f967c4f2ac08da70d983c7ff5c6f249783312f892ce4e7877f6cb15bb6031434/globalmount/0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd)
I0308 14:22:06.467335  594156 cephcmds.go:165] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd command succeeded: findmnt [-J --list --output SOURCE,TARGET]
I0308 14:22:06.467841  594156 nodeserver.go:834] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd found filesystem volume device: /dev/nvme0n1
I0308 14:22:06.467863  594156 nodeserver.go:883] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd Found mounted NVMe device: /dev/nvme0n1 at /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount/0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e
I0308 14:22:06.467885  594156 nodeserver.go:825] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd parsed block volume device: /dev/nvme0n2 from devtmpfs[/nvme0n2]
I0308 14:22:06.467895  594156 nodeserver.go:883] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd Found mounted NVMe device: /dev/nvme0n2 at /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10
I0308 14:22:06.467903  594156 nodeserver.go:887] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd Found 2 mounted NVMe devices total
I0308 14:22:06.473797  594156 cephcmds.go:165] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd command succeeded: nvme [list-subsys /dev/nvme0n3 -o json]
I0308 14:22:06.476624  594156 cephcmds.go:165] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd command succeeded: nvme [list-ns /dev/nvme0 -o json]
I0308 14:22:06.476716  594156 nvmeof_initiator.go:458] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd Found other mounted namespace /dev/nvme0n1 on controller nvme0
I0308 14:22:06.476728  594156 nvmeof_initiator.go:252] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd Controller nvme0 has other mounted namespaces, skipping disconnect
I0308 14:22:06.476772  594156 utils.go:357] ID: 247 Req-ID: 0001-0011-openshift-storage-0000000000000002-db4f7f3b-cce8-4bb4-8bf8-6d7283df35bd GRPC response: {}

Delete pod2:

I0308 14:26:37.800919  594156 utils.go:350] ID: 252 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0308 14:26:37.801105  594156 utils.go:351] ID: 252 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e GRPC request: {"target_path":"/var/lib/kubelet/pods/98d77833-d0c7-4f68-b546-28e22a7e92e5/volumes/kubernetes.io~csi/pvc-4149616b-39b6-4d15-8e11-55643a2e6c70/mount","volume_id":"0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e"}
I0308 14:26:37.801249  594156 mount_linux.go:402] Unmounting /var/lib/kubelet/pods/98d77833-d0c7-4f68-b546-28e22a7e92e5/volumes/kubernetes.io~csi/pvc-4149616b-39b6-4d15-8e11-55643a2e6c70/mount
I0308 14:26:37.803299  594156 nodeserver.go:297] ID: 252 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e nvmeof: successfully unbound volume 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e from /var/lib/kubelet/pods/98d77833-d0c7-4f68-b546-28e22a7e92e5/volumes/kubernetes.io~csi/pvc-4149616b-39b6-4d15-8e11-55643a2e6c70/mount
I0308 14:26:37.803395  594156 utils.go:357] ID: 252 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e GRPC response: {}
I0308 14:26:37.907448  594156 utils.go:350] ID: 253 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0308 14:26:37.907495  594156 utils.go:351] ID: 253 GRPC request: {}
I0308 14:26:37.907562  594156 utils.go:357] ID: 253 GRPC response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}},{"rpc":{"type":"EXPAND_VOLUME"}}]}
I0308 14:26:37.908607  594156 utils.go:350] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e GRPC call: /csi.v1.Node/NodeUnstageVolume
I0308 14:26:37.908659  594156 utils.go:351] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount","volume_id":"0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e"}
I0308 14:26:37.911972  594156 cephcmds.go:165] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e command succeeded: findmnt [--mountpoint /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount/0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e --output SOURCE --noheadings --first-only -J]
I0308 14:26:37.912010  594156 nodeserver.go:834] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e found filesystem volume device: /dev/nvme0n1
I0308 14:26:37.912018  594156 nodeserver.go:801] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e found device /dev/nvme0n1 for staging path /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount/0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e
I0308 14:26:37.912090  594156 mount_linux.go:402] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount/0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e
I0308 14:26:37.921283  594156 nodeserver.go:340] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e successfully unmounted volume (0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e) from staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount/0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e)
I0308 14:26:37.921374  594156 nodeserver.go:354] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e successfully removed staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/nvmeof.csi.ceph.com/d64c8b9ba97ceb85d4c4f7fc125175428c3058f314a9a9424da3da4655460538/globalmount/0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e)
I0308 14:26:37.925000  594156 cephcmds.go:165] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e command succeeded: findmnt [-J --list --output SOURCE,TARGET]
I0308 14:26:37.925391  594156 nodeserver.go:825] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e parsed block volume device: /dev/nvme0n2 from devtmpfs[/nvme0n2]
I0308 14:26:37.925408  594156 nodeserver.go:883] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e Found mounted NVMe device: /dev/nvme0n2 at /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10
I0308 14:26:37.925414  594156 nodeserver.go:887] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e Found 1 mounted NVMe devices total
I0308 14:26:37.930972  594156 cephcmds.go:165] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e command succeeded: nvme [list-subsys /dev/nvme0n1 -o json]
I0308 14:26:37.933539  594156 cephcmds.go:165] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e command succeeded: nvme [list-ns /dev/nvme0 -o json]
I0308 14:26:37.933577  594156 nvmeof_initiator.go:458] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e Found other mounted namespace /dev/nvme0n2 on controller nvme0
I0308 14:26:37.933586  594156 nvmeof_initiator.go:252] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e Controller nvme0 has other mounted namespaces, skipping disconnect
I0308 14:26:37.933617  594156 utils.go:357] ID: 254 Req-ID: 0001-0011-openshift-storage-0000000000000002-90d76f35-08a3-4076-8ca3-4ed39dd7b18e GRPC response: {}

Delete pod1

I0308 14:28:46.898913  594156 utils.go:350] ID: 257 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0308 14:28:46.898987  594156 utils.go:351] ID: 257 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 GRPC request: {"target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/e94ae8c2-d23c-4b69-bb1b-be2c3e6c3e49","volume_id":"0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10"}
I0308 14:28:46.899093  594156 mount_linux.go:402] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/e94ae8c2-d23c-4b69-bb1b-be2c3e6c3e49
I0308 14:28:46.902139  594156 nodeserver.go:297] ID: 257 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 nvmeof: successfully unbound volume 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 from /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/e94ae8c2-d23c-4b69-bb1b-be2c3e6c3e49
I0308 14:28:46.902176  594156 utils.go:357] ID: 257 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 GRPC response: {}
I0308 14:28:46.990648  594156 utils.go:350] ID: 258 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0308 14:28:46.990676  594156 utils.go:351] ID: 258 GRPC request: {}
I0308 14:28:46.990751  594156 utils.go:357] ID: 258 GRPC response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}},{"rpc":{"type":"EXPAND_VOLUME"}}]}
I0308 14:28:46.991734  594156 utils.go:350] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0308 14:28:46.991772  594156 utils.go:351] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5","volume_id":"0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10"}
I0308 14:28:46.994900  594156 cephcmds.go:165] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 command succeeded: findmnt [--mountpoint /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 --output SOURCE --noheadings --first-only -J]
I0308 14:28:46.994976  594156 nodeserver.go:825] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 parsed block volume device: /dev/nvme0n2 from devtmpfs[/nvme0n2]
I0308 14:28:46.994987  594156 nodeserver.go:801] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 found device /dev/nvme0n2 for staging path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10
I0308 14:28:46.995043  594156 mount_linux.go:402] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10
I0308 14:28:46.997507  594156 nodeserver.go:340] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 successfully unmounted volume (0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10) from staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10)
I0308 14:28:46.997558  594156 nodeserver.go:354] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 successfully removed staging path (/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-81024dbc-dc61-438d-88d5-8852964c7fb5/0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10)
I0308 14:28:47.000378  594156 cephcmds.go:165] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 command succeeded: findmnt [-J --list --output SOURCE,TARGET]
I0308 14:28:47.000696  594156 nodeserver.go:887] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 Found 0 mounted NVMe devices total
I0308 14:28:47.006121  594156 cephcmds.go:165] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 command succeeded: nvme [list-subsys /dev/nvme0n2 -o json]
I0308 14:28:47.008386  594156 cephcmds.go:165] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 command succeeded: nvme [list-ns /dev/nvme0 -o json]
I0308 14:28:47.008427  594156 nvmeof_initiator.go:257] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 Controller nvme0 has no other mounted namespaces, disconnecting
I0308 14:28:47.137969  594156 cephcmds.go:165] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 command succeeded: nvme [disconnect -d /dev/nvme0]
I0308 14:28:47.137998  594156 nvmeof_initiator.go:265] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 Successfully disconnected controller nvme0
I0308 14:28:47.138029  594156 utils.go:357] ID: 259 Req-ID: 0001-0011-openshift-storage-0000000000000002-30985858-2b93-4f70-96e0-8a12052f0e10 GRPC response: {}

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request.
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi gadididi self-assigned this Mar 8, 2026
@gadididi gadididi added the component/nvme-of Issues and PRs related to NVMe-oF. label Mar 8, 2026
@gadididi gadididi requested review from Madhu-1, Copilot and nixpanic March 8, 2026 13:21

Copilot AI left a comment


Pull request overview

This PR adds “disconnect controllers when safe” behavior to the NVMe-oF CSI node flow, so NVMe-oF controller connections are torn down during NodeUnstageVolume only when no other mounted namespaces are still using those controllers (multipath-safe cleanup).

Changes:

  • Add NVMe initiator disconnect logic (DisconnectIfLastMount) that enumerates controllers/namespaces and disconnects per-controller only when safe.
  • Add findmnt-based helpers in the node server to resolve a staging path to its NVMe device and to scan currently mounted NVMe devices, wiring this into NodeUnstageVolume and staging rollback.
  • Update NVMe-oF tests’ JSON structs to include controller Name fields from nvme list-subsys output.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
internal/nvmeof/nvmeof_initiator.go Adds controller/namespace enumeration and conditional disconnect logic for safe cleanup.
internal/nvmeof/nodeserver/nodeserver.go Adds findmnt-based device/mount discovery and triggers conditional disconnect on unstage/rollback.
internal/nvmeof/nvmeof_test.go Adjusts test structs to include the Name field in path JSON parsing.

@gadididi gadididi force-pushed the nvmeof/add_disconnect branch 3 times, most recently from 48d24a6 to 2c66f42 on March 11, 2026 11:35
@gadididi gadididi requested a review from nixpanic March 11, 2026 11:42

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

@gadididi gadididi force-pushed the nvmeof/add_disconnect branch 3 times, most recently from 02d7263 to 41c7e7a on March 12, 2026 14:42
@nixpanic nixpanic requested a review from a team March 12, 2026 17:02
Contributor

@Rakshith-R Rakshith-R left a comment


@gadididi
This design seems to involve too many calls and steps, which may hamper performance.

Have you considered having an in-memory variable (guarded by an RWMutex) that is initialized once after a pod restart to track all mounted volumes on the node, and that is then
updated in NodePublish to add a volume
and in NodeUnpublish to remove one?

@gadididi
Contributor Author

gadididi commented Mar 16, 2026

@Rakshith-R, you are right; I thought about it. I think in the first phase I want to let the nodeserver work with the disconnect operation; in the next phase I will add this cache to reduce the time per pod creation/deletion.
I will also add a locking mechanism for concurrent connect/disconnect (pod create/delete) calls.
An RWMutex would be a good solution, but it needs more adjustments (e.g., if two pod deletions run in parallel, both read the mount paths, each sees at least two active paths, and both decide not to run nvme disconnect, which is bad).
In the next PR, all these newly added functions will run once (at nodeserver init).
Does that sound good to you?

@gadididi
Contributor Author

@Mergifyio rebase

@mergify mergify bot added ci/in-progress/e2e This label acts like a guard and prevents Mergify from adding the `ok-to-test` label again. ok-to-test Label to trigger E2E tests labels Mar 18, 2026
@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.35
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/k8s-e2e-external-storage/1.34
/test ci/centos/k8s-e2e-external-storage/1.33
/test ci/centos/mini-e2e-helm/k8s-1.35
/test ci/centos/upgrade-tests-rbd
/test ci/centos/mini-e2e-helm/k8s-1.34
/test ci/centos/mini-e2e-helm/k8s-1.33
/test ci/centos/mini-e2e/k8s-1.35
/test ci/centos/mini-e2e/k8s-1.34
/test ci/centos/mini-e2e/k8s-1.33

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Mar 18, 2026
@mergify mergify bot added dequeued and removed queued labels Mar 18, 2026
@gadididi
Contributor Author

@Mergifyio queue

@mergify
Contributor

mergify bot commented Mar 18, 2026

Merge Queue Status

🛑 Queue command has been cancelled

@gadididi gadididi added queued and removed dequeued labels Mar 18, 2026
@mergify mergify bot added dequeued and removed queued labels Mar 18, 2026
@gadididi
Contributor Author

@Mergifyio queue

@mergify
Contributor

mergify bot commented Mar 18, 2026

queue

☑️ Command queue ignored because it is already running from a previous command.

@mergify mergify bot added the queued label Mar 18, 2026
@mergify mergify bot merged commit 9af58c4 into ceph:devel Mar 18, 2026
49 of 50 checks passed
@gadididi gadididi deleted the nvmeof/add_disconnect branch March 18, 2026 13:25
@mergify
Contributor

mergify bot commented Mar 18, 2026

Merge Queue Status

  • Entered queue 2026-03-18 13:25 UTC · Rule: default
  • Checks passed · in-place
  • Merged 2026-03-18 13:25 UTC · at 87638ebf34c1834132bd509f8a96bf4905981d2d

This pull request spent 9 seconds in the queue, with no time running CI.

Required conditions to merge


Labels

  • ci/in-progress/e2e — This label acts like a guard and prevents Mergify from adding the `ok-to-test` label again.
  • component/nvme-of — Issues and PRs related to NVMe-oF.
