-
Notifications
You must be signed in to change notification settings - Fork 105
make delete volume idempotent when node is removed from cluster #413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
1671fcc to
9b11bec
Compare
Abhinandan-Purkait
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
tiagolobocastro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about existing volumes
|
|
||
| _, err := volbuilder.NewKubeclient().WithNamespace(LvmNamespace).Update(vol) | ||
| if err != nil { | ||
| klog.Infof("Finalizer removed successfully for %s", vol.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this as info or is debug fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will not reach this code if node is present. Would be better to have it in log by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we already have the message from above where we call it if the node is not present?
Oh also this check seems reversed, if err != nil then we didn't remove the finalizer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, err != nil means we fail to remove finalizer, we return err. If not, we move on to check if lvmvolume CR is deleted.
| "failed to handle delete volume request for {%s}", volumeID) | ||
| } | ||
| } | ||
| present := isNodePresent(vol.Spec.OwnerNodeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the finalizer is already removed? Then we don't need to run this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalizer is removed from the node-agent when lvremove succeeds. Here we will validate if node is not present. If not, finalizer will not be removed. so will remove finalizer from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the lvm volume has additional finalisers (eg: added by user) but we've already removed ours, on a previous csi call, then we don't need to do this again
We only need to do this stage if our finaliser is present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now this removes all Finalizers. I am using the function which is called by node-agent once lvremove is done.
https://github.com/openebs/lvm-localpv/blob/develop/pkg/lvm/volume.go#L217C1-L223C2
lvmvolume CR is managed by localpv-lvm. Users may not add any finalizers onto it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this documented somewhere that users may not add finalizers?
It seems we just take the hammer there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We remove finaliser here only if node is not present. In the normal flow the finaliser will be removed using lvm.RemoveVolFinalizer(vol); by node-agent after doing lvremove
In the previous step we just mark the deletion timestamp by executing delete on the CR.
|
|
||
| _, err := volbuilder.NewKubeclient().WithNamespace(LvmNamespace).Update(vol) | ||
| if err != nil { | ||
| klog.Infof("Finalizer removed successfully for %s", vol.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we already have the message from above where we call it if the node is not present?
Oh also this check seems reversed, if err != nil then we didn't remove the finalizer?
| "failed to handle delete volume request for {%s}", volumeID) | ||
| } | ||
| } | ||
| present := isNodePresent(vol.Spec.OwnerNodeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the lvm volume has additional finalisers (eg: added by user) but we've already removed ours, on a previous csi call, then we don't need to do this again
We only need to do this stage if our finaliser is present
pkg/driver/controller.go
Outdated
| if !present { | ||
| klog.Infof("Removing finalizer as node %s is not present in cluster", vol.Spec.OwnerNodeID) | ||
| if err = lvm.RemoveVolFinalizer(vol); err != nil { | ||
| return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we return err here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fail the DeleteVolume call? I guess that will be better.
da04e2d to
d766fd7
Compare
…e volume is not present Signed-off-by: Abhilash Shetty <[email protected]>
d766fd7 to
43a5f2c
Compare
| "failed to handle delete volume request for {%s}", volumeID) | ||
| } | ||
| } | ||
| present := isNodePresent(vol.Spec.OwnerNodeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this documented somewhere that users may not add finalizers?
It seems we just take the hammer there
| } | ||
| time.Sleep(time.Duration(100 * time.Millisecond)) | ||
| } | ||
| return status.Errorf(codes.FailedPrecondition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this error code right? Shouldn't this be a timeout error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try the Timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think TimeOut is generated by gRPC framework. Its not available in the library.
var strToCode = map[string]Code{
"OK": OK,
"CANCELLED":/* [sic] */ Canceled,
"UNKNOWN": Unknown,
"INVALID_ARGUMENT": InvalidArgument,
"DEADLINE_EXCEEDED": DeadlineExceeded,
"NOT_FOUND": NotFound,
"ALREADY_EXISTS": AlreadyExists,
"PERMISSION_DENIED": PermissionDenied,
"RESOURCE_EXHAUSTED": ResourceExhausted,
"FAILED_PRECONDITION": FailedPrecondition,
"ABORTED": Aborted,
"OUT_OF_RANGE": OutOfRange,
"UNIMPLEMENTED": Unimplemented,
"INTERNAL": Internal,
"UNAVAILABLE": Unavailable,
"DATA_LOSS": DataLoss,
"UNAUTHENTICATED": Unauthenticated,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEADLINE_EXCEEDED
When a Kubernetes node is removed from the cluster while it still has existing PVCs/LVMVolumes, those volumes become orphaned. If a PVC is deleted in this state, the LocalPV-LVM CSI controller marks the corresponding LVMVolume CR for deletion by adding a deletion timestamp.
Normally, the node-agent running on the worker node is responsible for cleaning up the logical volume (LV) and removing the finalizer from the CR. However, in this scenario, the node no longer exists, so there is no agent running to perform the cleanup.
This fix leverages the condition of the Kubernetes Node being absent: in such cases, the CSI controller itself removes the finalizer and cleans up the LVMVolume CR.
Fixes: #407