-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI node plugin fix for unstaging volumes #3178
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Beorn Facchini <[email protected]>
Similarly, this early return in swarmkit/agent/csi/plugin/plugin.go Lines 366 to 373 in ea1a7ce
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dperny you're more familiar with this code; PTAL 🤗
if v, ok := np.volumeMap[req.ID]; ok && v.isPublished { | ||
return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super-familiar with this code, but should this actually proceed if the volume was not found in np.volumeMap
?
i.e., the volume should be present if it was successfully staged. If a failure happened during staging, it wouldn't be added. Looking at NodeStageVolume
further up;
swarmkit/agent/csi/plugin/plugin.go
Lines 231 to 239 in e8ecf83
if err != nil { | |
return err | |
} | |
v := &volumePublishStatus{ | |
stagingPath: stagingTarget, | |
} | |
np.volumeMap[req.ID] = v |
Perhaps this should be something like;
if v, ok := np.volumeMap[req.ID]; !ok || v.isPublished {
// volume not found or is still published
return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}
Or if it would be useful to have distinct errors for each situation
v, ok := np.volumeMap[req.ID
if !ok {
return status.Errorf(codes.FailedPrecondition, "Volume %s not found", req.ID)
}
if v.isPublished {
return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the assumptions around state for unstaging would always hold true if np.volumeMap
was made persistent:
swarmkit/agent/csi/plugin/plugin.go
Lines 60 to 63 in ea1a7ce
// volumeMap is the map from volume ID to Volume. Will place a volume once it is staged, | |
// remove it from the map for unstage. | |
// TODO: Make this map persistent if the swarm node goes down | |
volumeMap map[string]*volumePublishStatus |
I recall during testing it is possible that a volume had been staged before the node daemon restarted and np.volumeMap
was empty. In the method NodeUnpublishVolume
it always unpublishes the volume irrespective of np.volumeMap
.
- What I did
While running a CSI volume plugin that supports staging, I created a new swarm service that initiated an attempt to unpublish a cluster volume. The node agent calls
NodeUnpublishVolume
which returns no errors thenNodeUnstageVolume
which also returns no errors. However, there was no call to the underlying plugin driver forNodeUnstageVolume
.- How I did it
Fixed the unpublish check to only return in the failing condition that the volume was not unpublished.
- How to test it
Create a cluster volume using CSI volume plugin then trigger a publish and unpublish by creating a swarm service and restarting the service.
- Description for the changelog