nvmeof: add nvmeof provisioner ability #5452
Conversation
force-pushed from 2e379c6 to 5e9e77e
force-pushed from e62c552 to 9b99304
force-pushed from eb9249a to 9eb0ef3
force-pushed from d238ce2 to eb0f341
force-pushed from adb83c8 to 4388ee8
force-pushed from fdd1ae6 to 5008752
force-pushed from 99f391e to a1e944f
force-pushed from 677e8b9 to c7a825a
force-pushed from 0c7f622 to 88bc54c
nixpanic
left a comment
No major changes needed from my point of view. There are quite a few TODOs listed; those are all improvements we can work out later.
For an initial proof-of-concept this looks good to me. With your additional work-in-progress branch(es) I have been able to get this up and running 🥳
$ kubectl describe pvc
Name: nvmeof-test-pvc
Namespace: default
StorageClass: ocs-storagecluster-ceph-nvmeof
Status: Bound
Volume: pvc-3a0994d7-6457-4966-977e-dc756b871af7
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: nvmeof.csi.ceph.com
volume.kubernetes.io/storage-provisioner: nvmeof.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 9s nvmeof.csi.ceph.com_csi-nvmeofplugin-provisioner-745f6d9947-f56rd_3479ebb6-df20-477d-a7bb-737129b80352 External provisioner is provisioning volume for claim "default/nvmeof-test-pvc"
Normal ExternalProvisioning 9s persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'nvmeof.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
Normal ProvisioningSucceeded 8s nvmeof.csi.ceph.com_csi-nvmeofplugin-provisioner-745f6d9947-f56rd_3479ebb6-df20-477d-a7bb-737129b80352 Successfully provisioned volume pvc-3a0994d7-6457-4966-977e-dc756b871af7
I went through the code; looks good to me from the demo we had. The missing parts already have TODO tags, so we are good to have them in an upcoming PR.
force-pushed from 88bc54c to 1779e08
}

// connectGatewayForDelete creates gateway connection using stored management address.
func connectGatewayForDelete(
Sounds good, it's good to have a single common function for connect.
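As a rough illustration of what such a shared connect helper could centralize, here is a minimal sketch of address resolution for the gateway connection. The function and parameter names are hypothetical, not the PR's actual identifiers:

```go
package main

import (
	"errors"
	"fmt"
)

// resolveGatewayAddress sketches the address resolution a common connect
// helper could use for both create and delete paths: prefer the management
// address stored during CreateVolume, and fall back to the StorageClass
// parameter when no stored address is available.
func resolveGatewayAddress(storedAddr, scParam string) (string, error) {
	if storedAddr != "" {
		return storedAddr, nil
	}
	if scParam != "" {
		return scParam, nil
	}
	return "", errors.New("no gateway management address available")
}

func main() {
	// No stored address, so the StorageClass parameter is used.
	addr, err := resolveGatewayAddress("", "gateway.example:5500")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr)
}
```

The real helper would additionally dial the gateway gRPC endpoint with the resolved address; this sketch only shows the shared selection logic.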
if err != nil {
	return 0, fmt.Errorf("failed to create namespace for %s/%s: %w", poolName, imageName, err)
}
if resp.GetStatus() != 0 {
Any plan to open a tracker to improve the documentation for consumers? Otherwise it will become hard to understand the inputs and outputs.
if err != nil {
	return fmt.Errorf("failed to delete listener %s from subsystem %s: %w", listenerInfo.Address, subsystemNQN, err)
}
if resp.GetStatus() == 0 {
It would be nice to return the error early and avoid multiple ifs if we check resp.GetStatus() != 0 && resp.GetStatus() != int32(syscall.ENOENT); the same is required in the other Delete calls as well.
@Madhu-1, I can do it, but I want a specific check to handle the "not exist" case.
What I can do is something like this:
resp, err := gw.client.DeleteListener(ctx, req)
if err != nil {
	return fmt.Errorf("failed to delete listener %s from subsystem %s: %w", listenerInfo.Address, subsystemNQN, err)
}
// Early return for error cases (anything other than success or not-found)
if resp.GetStatus() != 0 && resp.GetStatus() != int32(syscall.ENOENT) {
	return fmt.Errorf("gateway DeleteListener returned error (status=%d): %s", resp.GetStatus(), resp.GetErrorMessage())
}
// Not-found case
if resp.GetStatus() == int32(syscall.ENOENT) {
	log.DebugLog(ctx, "Listener %s already deleted from subsystem %s (not found)", listenerInfo.Address, subsystemNQN)
	return nil
}
// Handle success
log.DebugLog(ctx, "Listener deleted successfully: %s from subsystem %s", listenerInfo.Address, subsystemNQN)
return nil

but maybe it makes it more complicated, what do you think?
I assume this is for logging. IMO we don't need to log whether it was a success or NotFound, because for CSI it is the same: the deletion is successful. If really required, we can log the status code at the end, which helps with debugging. It's not worth adding an if statement just to log different error messages.
I'd like to keep the distinction because logging "deleted successfully" when it was already gone feels inaccurate. The extra if helps with debugging to know if we actually deleted something vs. it was already missing.
Based on the response, I will LGTM; no need to address the new review comments if we are planning to merge it. It can be a follow-up PR.
force-pushed from 4cff334 to 5ab3ff9
I pushed 2 new commits about generated files.

Update: Using upstream protobuf files from ceph-nvmeof

I've updated the PR to use the generated protobuf files from upstream.

What changed:

Commands I ran:

go get github.com/ceph/ceph-nvmeof@devel
go mod tidy
go mod vendor

Note about Go version change:
Madhu-1
left a comment
LGTM. Let's get it merged to devel once we have the 3.15.0 release.
Added DNM to avoid an accidental merge; please remove it once we have the new release.
Thank you @gadididi for all the work 🎉
Create prototype of nvmeof csi-driver. Create/delete volume were implemented. Signed-off-by: gadi-didi <gadi.didi@ibm.com>
nvmeof tests were added: create/delete subsystem, listener and host. Signed-off-by: gadi-didi <gadi.didi@ibm.com>
added "nvmeof" tag option for commits\PR due to new csi driver (nvmeof-csi) Signed-off-by: gadi-didi <gadi.didi@ibm.com>
to vendor folder by running "go mod vendor". Signed-off-by: gadi-didi <gadi.didi@ibm.com>
force-pushed from 5ab3ff9 to cb8a4d3
@Mergifyio rebase
☑️ Nothing to do, the required conditions are not met
Describe what this PR does
This PR introduces NVMe-oF CSI driver support for Ceph-CSI, enabling RBD volumes to be exposed through the NVMe-oF protocol instead of traditional kernel RBD mapping.
Components Added:
Complete Workflow:
CreateVolume:
DeleteVolume:
Is there anything that requires special attention
This is a working prototype implementing the controller-side operations. The purpose of this PR is to:
What's Working:
What's Missing/TODO:
Key Design Decisions Needing Review:
Configuration Model:
StorageClass parameters support flexible deployment scenarios:
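As an illustration only, a StorageClass for this driver might look like the sketch below. The provisioner and StorageClass names come from the demo output earlier in this conversation, but the parameter keys are placeholders, not the PR's actual ones — check the driver's examples for the real keys:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-storagecluster-ceph-nvmeof
provisioner: nvmeof.csi.ceph.com
parameters:
  # Placeholder keys for illustration; not the PR's actual parameters.
  clusterID: <cluster-id>
  pool: <rbd-pool>
  gatewayAddress: <gateway-mgmt-host:port>
reclaimPolicy: Delete
```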
Related issues/PRs/discussions
Future concerns
List items that are not part of the PR and do not impact its functionality, but are work items that can be taken up subsequently.
Checklist:
- Commit messages follow the guidelines in the developer guide.
- Reviewed the developer guide on Submitting a Pull Request.
- Pending release notes updated with breaking and/or notable changes for the next major release.
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after unrelated failure (please report the failure too!)