nvmeof: Add GroupLock to coordinate stage and unstage operations and e2e tests #6210

gadididi wants to merge 3 commits into ceph:devel
Conversation
Pull request overview
Adds a cross-volume “group mutual exclusion” lock to the NVMe-oF node server to prevent NodeStageVolume and NodeUnstageVolume from running concurrently, avoiding races that can lead to premature NVMe controller disconnects.
Changes:
- Introduces a `stageUnstageLock` (`lock.GroupLock`) field on the NVMe-oF `NodeServer`.
- Wraps `NodeStageVolume()` with Group A acquire/release and `NodeUnstageVolume()` with Group B acquire/release.
- Adds the `internal/util/lock` import to support the new locking behavior.
Added a group mutual-exclusion lock to `NodeStageVolume()` and `NodeUnStageVolume()`, because these two operations cannot run together, although several calls of the same type can run concurrently. Signed-off-by: gadi-didi <gadi.didi@ibm.com>
/test ci/centos/mini-e2e/k8s-1.35/nvmeof
```go
// Acquire GroupLock - wrap the mounting + connection logic in a GroupLock
// to prevent staging and unstaging from happening at the same time,
// as they can interfere with each other.
// This allows multiple staging operations to run concurrently,
// and multiple unstaging operations to run concurrently,
// but prevents staging and unstaging from running at the same time.
ns.stageUnstageLock.AcquireGroupA()
defer ns.stageUnstageLock.ReleaseGroupA()
```
lock.GroupLock is explicitly documented as not guaranteeing fairness (potential starvation). Using it to guard long-running CSI RPC handlers means a steady stream of NodeStage calls could indefinitely delay NodeUnstage (or vice-versa), potentially stalling pod teardown. Consider switching to a fair group mutual-exclusion implementation (e.g., track waiting counts and alternate preference when the active group drains) and/or making acquisition context-aware so cancellations/timeouts don’t leave requests stuck waiting forever.
That's why I am testing with parallel goroutines doing delete/create: to verify there is no starvation.
It is not expected that this results in a problem. Starvation will only happen when a huge number of volumes are staged at the same time while other volumes are unstaged. Staging that causes unstaging to be blocked (or the other way around) is extremely unlikely to cause problematic delays.
I ran the "mixed test" with batches of 5 pods (deletion/creation). I could run it with a huge number, but of course that would increase the run time. Do you want me to do that, or leave it as is?
e2e/nvmeof_helper.go (outdated)
Test on Tentacle, Ceph v20. Signed-off-by: gadi-didi <gadi.didi@ibm.com>
Add e2e tests to validate the nvmeof NodeServer GroupLock implementation under concurrent NodeStage (Group A) and NodeUnstage (Group B) operations. The tests ensure no deadlock occurs when multiple PVCs and Pods are created and deleted simultaneously. A new helper file (nvmeof_helper.go) provides reusable functions for concurrent PVC/Pod operations with proper error tracking. Two test cases cover: 1) sequential concurrent batches (create all, then delete all); 2) mixed operations with a pre-created batch to guarantee continuous Group A/B switching. Signed-off-by: gadi-didi <gadi.didi@ibm.com>
```diff
 ROOK_VERSION=v1.18.4
 # Provide ceph image path
-ROOK_CEPH_CLUSTER_IMAGE=quay.io/ceph/ceph:v19.2.2
+ROOK_CEPH_CLUSTER_IMAGE=quay.io/ceph/ceph:v20
```
ROOK_CEPH_CLUSTER_IMAGE is now set to quay.io/ceph/ceph:v20 (major-only tag). This makes CI/e2e less reproducible because the image contents can change over time as new v20.x releases are pushed. Consider pinning to a specific v20.x.y tag (or documenting why floating within the major is required).
```diff
-ROOK_CEPH_CLUSTER_IMAGE=quay.io/ceph/ceph:v20
+ROOK_CEPH_CLUSTER_IMAGE=quay.io/ceph/ceph:v20.2.0
```
This is just for running the Jenkins e2e test with nvmeof; after review, this change will be reverted.
Describe what this PR does
This PR adds a GroupLock to prevent race conditions between stage and unstage operations that could lead to premature NVMe controller disconnects.
Added a group mutual-exclusion lock to `NodeStageVolume()` and `NodeUnStageVolume()`, because these two operations cannot run together, although several calls of the same type can run concurrently.

The Problem
Without coordination, a stage operation can connect to NVMe controllers while an unstage operation is simultaneously checking if it's safe to disconnect them. This creates a race:
Result: Stage fails with "device not found" errors.
The Solution
Add a GroupLock that allows:
- multiple NodeStageVolume calls to run concurrently,
- multiple NodeUnstageVolume calls to run concurrently,
- but never a stage and an unstage at the same time.
Three Levels of Locking
(after the PR: nvmeof: Add mount cache and locking for safe nvme disconnect will be merged)
The code will have three levels of locks working together:
- Level 1: volumeLocks (per-volume mutex). This already exists in the code.
- Level 2: stageUnstageLock (GroupLock). The current PR introduces it.
- Level 3: mountCache.mu (cache mutex). Added by the PR "nvmeof: Add mount cache and locking for safe nvme disconnect".
Lock Acquisition Order
Both NodeStageVolume and NodeUnstageVolume follow the same order:
There are unit tests for group locking here: https://github.com/ceph/ceph-csi/blob/devel/internal/util/lock/group_lock_test.go
Also, e2e tests were added.
Checklist:
- Commit titles and messages follow guidelines in the developer guide.
- Reviewed the developer guide on Submitting a Pull Request.
- Pending release notes updated with breaking and/or notable changes for the next major release.
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after an unrelated failure (please report the failure too!)