
nvmeof: add nvmeof provisioner ability #5452

Merged
mergify[bot] merged 4 commits into ceph:devel from gadididi:nvmeof/provisioner_init
Aug 18, 2025

Conversation

@gadididi
Contributor

@gadididi gadididi commented Jul 27, 2025

Describe what this PR does

This PR introduces NVMe-oF CSI driver support for Ceph-CSI, enabling RBD volumes to be exposed through NVMe-oF protocol instead of traditional kernel RBD mapping.

Components Added:

  • NVMe-oF Controller Server: Complete wrapper around RBD CSI that orchestrates NVMe-oF resource creation and cleanup
  • NVMe-oF Gateway Client: Full gRPC client implementation for managing NVMe-oF subsystems, namespaces, listeners, and hosts
  • Clean Domain Architecture: Separate structs for CSI operations vs protobuf wire protocol with proper abstraction layers
  • Comprehensive Metadata Management: Stores all NVMe-oF metadata (subsystem NQN, namespace ID, host NQN, listener details, gateway addresses) in RBD image metadata for reliable cleanup
  • Integration Tests: Docker-based tests for validating NVMe-oF gateway operations with configurable environment

Complete Workflow:

CreateVolume:

  1. Validate request with comprehensive parameter validation
  2. Create RBD volume through backend CSI driver
  3. Setup NVMe-oF resources:
    • Create/ensure subsystem exists
    • Create namespace in subsystem
    • Create listeners on specified addresses
    • Add host to subsystem
  4. Store metadata in both VolumeContext (for NodeServer) and RBD metadata (for cleanup)

DeleteVolume:

  1. Retrieve stored metadata from RBD image
  2. Cleanup NVMe-oF resources in proper order:
    • Remove host from subsystem
    • Delete namespace
    • Delete listeners
    • Delete subsystem (if empty)
  3. Delete RBD volume

Is there anything that requires special attention

⚠️ Prototype Status - Controller Operations Only ⚠️

This is a working prototype implementing the controller-side operations. The purpose of this PR is to:

  • Demonstrate complete NVMe-oF CSI controller integration
  • Validate the clean architecture approach with proper separation of concerns
  • Get feedback on metadata management and cleanup strategies

What's Working:

  • Complete CreateVolume/DeleteVolume operations with full NVMe-oF setup
  • Production-ready NVMe-oF Gateway integration (subsystem, namespace, listener, host management)
  • Robust metadata storage/retrieval with dual storage (VolumeContext + RBD metadata)
  • Clean separation of management vs data plane (gateway addresses vs listener addresses)
  • Integration tests with configurable environment variables
  • Proper resource lifecycle management with idempotent operations

What's Missing/TODO:

  • NodeServer implementation (NodeStageVolume, NodePublishVolume, NodeUnpublishVolume)
  • NVMe-oF initiator operations (nvme connect/disconnect on nodes)
  • Multi-gateway support (HA)
  • Multi-listeners support
  • mTLS for gRPC
  • ❌ In-band authentication
  • Advanced features (volume expansion?, snapshots?, cloning?)

Key Design Decisions Needing Review:

  • Architecture Pattern: Controller-only setup vs full CSI implementation
  • Metadata Strategy: Dual storage (VolumeContext + RBD metadata) for reliability
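
To make the metadata strategy concrete, here is a minimal sketch of a round trip between a struct and the string key/value pairs that image metadata and VolumeContext both use. The `volumeMetadata` type, the key names, and both functions are assumptions made for illustration; the PR's actual keys and types may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical metadata keys; the real key names are not given in the description.
const (
	keySubsystemNQN = "nvmeof.subsystemNQN"
	keyNamespaceID  = "nvmeof.namespaceID"
	keyHostNQN      = "nvmeof.hostNQN"
	keyListenerAddr = "nvmeof.listenerAddress"
	keyListenerPort = "nvmeof.listenerPort"
	keyGatewayAddr  = "nvmeof.gatewayAddress"
)

// volumeMetadata holds what DeleteVolume would need to find the resources again.
type volumeMetadata struct {
	SubsystemNQN string
	NamespaceID  uint32
	HostNQN      string
	ListenerAddr string
	ListenerPort uint32
	GatewayAddr  string
}

// toMap flattens the metadata into string pairs.
func (m volumeMetadata) toMap() map[string]string {
	return map[string]string{
		keySubsystemNQN: m.SubsystemNQN,
		keyNamespaceID:  strconv.FormatUint(uint64(m.NamespaceID), 10),
		keyHostNQN:      m.HostNQN,
		keyListenerAddr: m.ListenerAddr,
		keyListenerPort: strconv.FormatUint(uint64(m.ListenerPort), 10),
		keyGatewayAddr:  m.GatewayAddr,
	}
}

// metadataFromMap rebuilds the struct and fails if any key is missing or malformed.
func metadataFromMap(kv map[string]string) (volumeMetadata, error) {
	keys := []string{keySubsystemNQN, keyNamespaceID, keyHostNQN,
		keyListenerAddr, keyListenerPort, keyGatewayAddr}
	for _, k := range keys {
		if kv[k] == "" {
			return volumeMetadata{}, fmt.Errorf("missing metadata key %q", k)
		}
	}
	nsid, err := strconv.ParseUint(kv[keyNamespaceID], 10, 32)
	if err != nil {
		return volumeMetadata{}, fmt.Errorf("invalid namespace ID: %w", err)
	}
	port, err := strconv.ParseUint(kv[keyListenerPort], 10, 16)
	if err != nil {
		return volumeMetadata{}, fmt.Errorf("invalid listener port: %w", err)
	}
	return volumeMetadata{
		SubsystemNQN: kv[keySubsystemNQN],
		NamespaceID:  uint32(nsid),
		HostNQN:      kv[keyHostNQN],
		ListenerAddr: kv[keyListenerAddr],
		ListenerPort: uint32(port),
		GatewayAddr:  kv[keyGatewayAddr],
	}, nil
}

func main() {
	in := volumeMetadata{
		SubsystemNQN: "nqn.2016-06.io.ceph:subsystem.production",
		NamespaceID:  1,
		HostNQN:      "nqn.2016-06.io.ceph:host.k8s-node1",
		ListenerAddr: "10.242.64.32",
		ListenerPort: 4420,
		GatewayAddr:  "10.242.64.100",
	}
	out, err := metadataFromMap(in.toMap())
	fmt.Println(out == in, err)
}
```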

Configuration Model:

StorageClass parameters support flexible deployment scenarios:

parameters:
  subsystemNQN: "nqn.2016-06.io.ceph:subsystem.production"
  hostNQN: "nqn.2016-06.io.ceph:host.k8s-node1"
  nvmeofGatewayAddress: "10.242.64.100"    # Management network
  nvmeofGatewayPort: "5500"                # gRPC port
  listenerIpAddress: "10.242.64.32"       # Data network  
  listenerPort: "4420"                    # NVMe-oF port
  listenerHostname: "ceph-nvmeof-gw1"
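
A minimal sketch of how such parameters might be validated on the controller side follows. The parameter keys come from the example above; the `gatewayParams` type, the `parseParams` function, and the assumption that every key is required are illustrative only and not confirmed by the PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// gatewayParams is a hypothetical typed view of the StorageClass parameters.
type gatewayParams struct {
	SubsystemNQN     string
	HostNQN          string
	GatewayAddress   string // management network, used for gRPC
	GatewayPort      uint16
	ListenerIP       net.IP // data network, used for NVMe-oF traffic
	ListenerPort     uint16
	ListenerHostname string
}

// parsePort converts a decimal string into a non-zero TCP port.
func parsePort(name, value string) (uint16, error) {
	p, err := strconv.ParseUint(value, 10, 16)
	if err != nil || p == 0 {
		return 0, fmt.Errorf("parameter %q: invalid port %q", name, value)
	}
	return uint16(p), nil
}

// parseParams checks presence and basic syntax of each parameter.
// Treating all keys as required is an assumption of this sketch.
func parseParams(p map[string]string) (*gatewayParams, error) {
	required := []string{"subsystemNQN", "hostNQN", "nvmeofGatewayAddress",
		"nvmeofGatewayPort", "listenerIpAddress", "listenerPort", "listenerHostname"}
	for _, k := range required {
		if p[k] == "" {
			return nil, fmt.Errorf("missing required parameter %q", k)
		}
	}
	gwPort, err := parsePort("nvmeofGatewayPort", p["nvmeofGatewayPort"])
	if err != nil {
		return nil, err
	}
	lsPort, err := parsePort("listenerPort", p["listenerPort"])
	if err != nil {
		return nil, err
	}
	ip := net.ParseIP(p["listenerIpAddress"])
	if ip == nil {
		return nil, fmt.Errorf("parameter %q: invalid IP %q", "listenerIpAddress", p["listenerIpAddress"])
	}
	return &gatewayParams{
		SubsystemNQN:     p["subsystemNQN"],
		HostNQN:          p["hostNQN"],
		GatewayAddress:   p["nvmeofGatewayAddress"],
		GatewayPort:      gwPort,
		ListenerIP:       ip,
		ListenerPort:     lsPort,
		ListenerHostname: p["listenerHostname"],
	}, nil
}

func main() {
	cfg, err := parseParams(map[string]string{
		"subsystemNQN":         "nqn.2016-06.io.ceph:subsystem.production",
		"hostNQN":              "nqn.2016-06.io.ceph:host.k8s-node1",
		"nvmeofGatewayAddress": "10.242.64.100",
		"nvmeofGatewayPort":    "5500",
		"listenerIpAddress":    "10.242.64.32",
		"listenerPort":         "4420",
		"listenerHostname":     "ceph-nvmeof-gw1",
	})
	if err != nil {
		fmt.Println("invalid parameters:", err)
		return
	}
	fmt.Printf("gateway %s:%d, listener %s:%d\n",
		cfg.GatewayAddress, cfg.GatewayPort, cfg.ListenerIP, cfg.ListenerPort)
}
```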

Related issues/PRs/discussions

  1. Add support for NVMe-oF volumes backed by RBD #5370
  2. https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/nvme-of.md

Future concerns

List items that are not part of the PR and do not impact its
functionality, but are work items that can be taken up subsequently.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes
    for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi gadididi force-pushed the nvmeof/provisioner_init branch 2 times, most recently from 2e379c6 to 5e9e77e Compare July 27, 2025 10:00
@gadididi gadididi changed the title from "add nvmeof ability" to "nvmeof: add nvmeof provisioner ability" Jul 27, 2025
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch 2 times, most recently from e62c552 to 9b99304 Compare July 27, 2025 13:03
@gadididi gadididi marked this pull request as ready for review July 27, 2025 13:03
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch 2 times, most recently from eb9249a to 9eb0ef3 Compare July 28, 2025 06:55
@gadididi gadididi marked this pull request as draft July 28, 2025 09:02
@nixpanic nixpanic added the component/nvme-of Issues and PRs related to NVMe-oF. label Jul 29, 2025
Comment thread internal/nvmeof/controller/controllerserver.go
Comment thread internal/nvmeof/nvmeof.go Outdated
Comment thread internal/nvmeof/nvmeof.go
Comment thread internal/nvmeof/tests/Dockerfile.nvmeof-test
@nixpanic nixpanic requested a review from yati1998 July 29, 2025 07:44
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch from d238ce2 to eb0f341 Compare July 29, 2025 11:23
@gadididi gadididi marked this pull request as ready for review July 29, 2025 11:55
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch 2 times, most recently from adb83c8 to 4388ee8 Compare July 29, 2025 14:41
Comment thread internal/nvmeof/controller/controllerserver.go
Comment thread internal/nvmeof/nvmeof.go
Comment thread internal/nvmeof/nvmeof.go Outdated
Comment thread internal/nvmeof/nvmeof.go
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/nvmeof.go Outdated
Comment thread internal/nvmeof/tests/Dockerfile.nvmeof-test Outdated
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch from fdd1ae6 to 5008752 Compare July 31, 2025 06:49
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/nvmeof.go
Comment thread internal/nvmeof/nvmeof.go Outdated
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch from 99f391e to a1e944f Compare August 3, 2025 09:35
@gadididi gadididi requested a review from nixpanic August 3, 2025 09:38
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch from 677e8b9 to c7a825a Compare August 7, 2025 05:02
Comment thread internal/nvmeof/nvmeof.go Outdated
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch 3 times, most recently from 0c7f622 to 88bc54c Compare August 7, 2025 11:32
Member

@nixpanic nixpanic left a comment


No major changes needed from my point of view. There are quite a few TODO's listed, those are all improvements we can work out later.

For an initial proof-of-concept this looks good to me. With your additional work-in-progress branch(es) I have been able to get this up and running 🥳

$ kubectl describe pvc
Name:          nvmeof-test-pvc
Namespace:     default
StorageClass:  ocs-storagecluster-ceph-nvmeof
Status:        Bound
Volume:        pvc-3a0994d7-6457-4966-977e-dc756b871af7
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: nvmeof.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: nvmeof.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                 Age   From                                                                                                    Message
  ----    ------                 ----  ----                                                                                                    -------
  Normal  Provisioning           9s    nvmeof.csi.ceph.com_csi-nvmeofplugin-provisioner-745f6d9947-f56rd_3479ebb6-df20-477d-a7bb-737129b80352  External provisioner is provisioning volume for claim "default/nvmeof-test-pvc"
  Normal  ExternalProvisioning   9s    persistentvolume-controller                                                                             Waiting for a volume to be created either by the external provisioner 'nvmeof.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  ProvisioningSucceeded  8s    nvmeof.csi.ceph.com_csi-nvmeofplugin-provisioner-745f6d9947-f56rd_3479ebb6-df20-477d-a7bb-737129b80352  Successfully provisioned volume pvc-3a0994d7-6457-4966-977e-dc756b871af7

Comment thread internal/nvmeof/nvmeof.go Outdated
@yati1998
Contributor

yati1998 commented Aug 7, 2025

I went through the code; it looks good to me based on the demo we had. The missing parts already have TODO tags, so we are good to handle them in an upcoming PR.

@nixpanic nixpanic force-pushed the nvmeof/provisioner_init branch from 88bc54c to 1779e08 Compare August 8, 2025 07:45
Collaborator

@Madhu-1 Madhu-1 left a comment


@nixpanic the Plan is to get this PR merged and later remove the pb files and integration test file when ceph PR is merged?

Comment thread internal/nvmeof/controller/controllerserver.go Outdated
}

// connectGatewayForDelete creates gateway connection using stored management address.
func connectGatewayForDelete(
Collaborator

Sounds good, it's good to have a single common function for connecting.

Comment thread internal/nvmeof/nvmeof.go Outdated
Comment thread internal/nvmeof/nvmeof.go
if err != nil {
return 0, fmt.Errorf("failed to create namespace for %s/%s: %w", poolName, imageName, err)
}
if resp.GetStatus() != 0 {
Collaborator

Any plan to open a tracker to improve the documentation for consumers? Otherwise it will become hard to understand the inputs and outputs.

Comment thread internal/nvmeof/nvmeof.go
if err != nil {
return fmt.Errorf("failed to delete listener %s from subsystem %s: %w", listenerInfo.Address, subsystemNQN, err)
}
if resp.GetStatus() == 0 {
Collaborator

It would be nice to return the error early and avoid multiple ifs by checking resp.GetStatus() != 0 && resp.GetStatus() != int32(syscall.ENOENT). The same is required in the other Delete calls as well.

Contributor Author

@Madhu-1, I can do it, but I want a specific check to handle the "does not exist" case.

What I can do is something like this:

	resp, err := gw.client.DeleteListener(ctx, req)
	if err != nil {
		return fmt.Errorf("failed to delete listener %s from subsystem %s: %w", listenerInfo.Address, subsystemNQN, err)
	}

	// Early return for error cases (anything other than success or not-found)
	if resp.GetStatus() != 0 && resp.GetStatus() != int32(syscall.ENOENT) {
		return fmt.Errorf("gateway DeleteListener returned error (status=%d): %s", resp.GetStatus(), resp.GetErrorMessage())
	}

	// Not-found case
	if resp.GetStatus() == int32(syscall.ENOENT) {
		log.DebugLog(ctx, "Listener %s already deleted from subsystem %s (not found)", listenerInfo.Address, subsystemNQN)

		return nil
	}

	// Success case
	log.DebugLog(ctx, "Listener deleted successfully: %s from subsystem %s", listenerInfo.Address, subsystemNQN)

	return nil

but maybe it makes it more complicated, what do you think?

Collaborator

I assume this is for logging. IMO we don't need to log whether it is success or NotFound, because for CSI both mean the deletion was successful. If really required, we can log the status code at the end, which helps with debugging. It's not worth adding an if statement just to log different messages.

Contributor Author

I'd like to keep the distinction because logging "deleted successfully" when it was already gone feels inaccurate. The extra if helps with debugging to know if we actually deleted something vs. it was already missing.

@Madhu-1
Collaborator

Madhu-1 commented Aug 11, 2025

@nixpanic the Plan is to get this PR merged and later remove the pb files and integration test file when ceph PR is merged?

Based on the response, I will LGTM. There is no need to address the new review comments if we are planning to merge it; they can go in a follow-up PR.

@gadididi gadididi force-pushed the nvmeof/provisioner_init branch 2 times, most recently from 4cff334 to 5ab3ff9 Compare August 13, 2025 06:38
@gadididi
Contributor Author

Hi, @Madhu-1 , @nixpanic ,

I pushed 2 new commits about generated files.

Update: Using upstream protobuf files from ceph-nvmeof

I've updated the PR to use the generated protobuf files from upstream ceph-nvmeof instead of maintaining local copies in our repository.

What changed:

  • Added dependency on github.com/ceph/ceph-nvmeof/lib/go/nvmeof which now contains the generated gateway.pb.go and gateway_grpc.pb.go files
  • Removed our locally generated protobuf files
  • Vendored the upstream files into vendor/github.com/ceph/ceph-nvmeof/lib/go/nvmeof/
  • Updated imports to use the upstream package: import pb "github.com/ceph/ceph-nvmeof/lib/go/nvmeof"

Commands I ran:

go get github.com/ceph/ceph-nvmeof@devel
go mod tidy
go mod vendor

Note about Go version change:
The Go version in go.mod was automatically updated from 1.24.0 to 1.24.4 because the upstream ceph-nvmeof module requires Go 1.24.4.

@gadididi gadididi requested review from Madhu-1 and nixpanic August 13, 2025 06:42
Madhu-1 previously approved these changes Aug 13, 2025
Collaborator

@Madhu-1 Madhu-1 left a comment


LGTM. Let's get it merged to devel once we have the 3.15.0 release.

@Madhu-1 Madhu-1 added the DNM DO NOT MERGE label Aug 13, 2025
@Madhu-1
Collaborator

Madhu-1 commented Aug 13, 2025

Added DNM to avoid accidental merge, please remove it once we have new release

@Madhu-1
Collaborator

Madhu-1 commented Aug 13, 2025

Thank you @gadididi for all the work 🎉

@Madhu-1
Collaborator

Madhu-1 commented Aug 13, 2025

I would suggest dropping commit 24951fb, as it removes the proto file from the internal folder, and removing the pb file from commit 1c430a5.

@gadididi
Contributor Author

gadididi commented Aug 13, 2025

thank you! 🙂
Yes, Niels told me to wait until there is a new release.
I will do an interactive rebase and rearrange the commits (squash 24951fb and 1c430a5).

create prototype of nvmeof csi-driver.
create\delete volume were implemented.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
nvmeof tests were added. create\delete subsystem, listener and host.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
added "nvmeof" tag option for commits\PR due to new csi driver (nvmeof-csi)

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
to vendor folder by run "go mod vendor"

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
@gadididi gadididi force-pushed the nvmeof/provisioner_init branch from 5ab3ff9 to cb8a4d3 Compare August 13, 2025 07:02
@mergify mergify bot dismissed Madhu-1’s stale review August 13, 2025 07:02

Pull request has been modified.

@nixpanic nixpanic requested a review from Madhu-1 August 13, 2025 07:46
@nixpanic nixpanic removed the DNM DO NOT MERGE label Aug 18, 2025
mergify bot added a commit that referenced this pull request Aug 18, 2025
@nixpanic
Member

@Mergifyio rebase

release-v3.15 has been branched, this can now get merged in the devel branch

@mergify
Contributor

mergify bot commented Aug 18, 2025

rebase

☑️ Nothing to do, the required conditions are not met

Details
  • queue-position = -1 [📌 rebase requirement]
  • -closed [📌 rebase requirement]
  • -conflict [📌 rebase requirement]
  • any of:
    • #commits > 1 [📌 rebase requirement]
    • #commits-behind > 0 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]

@mergify mergify bot merged commit 147ba56 into ceph:devel Aug 18, 2025
15 checks passed
@gadididi gadididi deleted the nvmeof/provisioner_init branch September 28, 2025 11:25

Labels

ci/skip/e2e: skip running e2e CI jobs
component/nvme-of: Issues and PRs related to NVMe-oF.


4 participants