Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI node plugin fix for unstaging volumes #3178

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

beornf
Copy link

@beornf beornf commented Jun 12, 2024

- What I did
While running a CSI volume plugin that supports staging, I created a new swarm service that initiated an attempt to unpublish a cluster volume. The node agent calls NodeUnpublishVolume which returns no errors then NodeUnstageVolume which also returns no errors. However, there was no call to the underlying plugin driver for NodeUnstageVolume.

- How I did it
Fixed the unpublish check to only return in the failing condition that the volume was not unpublished.

- How to test it
Create a cluster volume using CSI volume plugin then trigger a publish and unpublish by creating a swarm service and restarting the service.

- Description for the changelog

@beornf
Copy link
Author

beornf commented Jun 12, 2024

Similarly, this early return in NodeUnpublishVolume skips the log which would be helpful in debugging:

if v, ok := np.volumeMap[req.ID]; ok {
v.publishedPath = ""
v.isPublished = false
return nil
}
log.G(ctx).Info("volume unpublished")
return nil

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dperny you're more familiar with this code; PTAL 🤗

Comment on lines +266 to 269
if v, ok := np.volumeMap[req.ID]; ok && v.isPublished {
return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super-familiar with this code, but should this actually proceed if the volume was not found in np.volumeMap ?

i.e., the volume should be present if it was successfully staged. If a failure happened during staging, it wouldn't be added. Looking at NodeStageVolume further up;

if err != nil {
return err
}
v := &volumePublishStatus{
stagingPath: stagingTarget,
}
np.volumeMap[req.ID] = v

Perhaps this should be something like;

if v, ok := np.volumeMap[req.ID]; !ok || v.isPublished {
	// volume not found or is still published
	return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}

Or if it would be useful to have distinct errors for each situation

v, ok := np.volumeMap[req.ID
if !ok {
	return status.Errorf(codes.FailedPrecondition, "Volume %s not found", req.ID)
}
if v.isPublished {
	return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}

Copy link
Author

@beornf beornf Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the assumptions around state for unstaging would always hold true if np.volumeMap was made persistent:

// volumeMap is the map from volume ID to Volume. Will place a volume once it is staged,
// remove it from the map for unstage.
// TODO: Make this map persistent if the swarm node goes down
volumeMap map[string]*volumePublishStatus

I recall during testing it is possible that a volume had been staged before the node daemon restarted and np.volumeMap was empty. In the method NodeUnpublishVolume it always unpublishes the volume irrespective of np.volumeMap.

@olljanat
Copy link
Contributor

@beornf New test case for this would nice as CSI logic is still quite new which bugs like this exist. You can find examples from my PRs #3116 and #3123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants