Skip to content

Fix silent volume resize failures and add unit tests for ControllerExpandVolume method #46

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 11, 2025

Conversation

Praveen005
Copy link
Contributor

@Praveen005 Praveen005 commented Feb 3, 2025

This PR aims to

  1. fix Fix silent volume resize failure #44

  2. Now error returned from ResizeVolume method is being logged and returned to the caller.

  3. I have also added unit tests for ControllerExpandVolume method. The test coverage is nearly 100% except for the obvious parts that can't be reached. Details below:

Screenshot 2025-02-06 at 2 34 15 AM

@Praveen005 Praveen005 self-assigned this Feb 4, 2025
@Praveen005 Praveen005 changed the title [WIP] Fix: Improve error handling and add unit tests for ResizeVolume method Fix: Improve error handling for ResizeVolume and add unit tests for ControllerExpandVolume method Feb 5, 2025
@Praveen005 Praveen005 marked this pull request as ready for review February 5, 2025 20:50
@Praveen005 Praveen005 changed the title Fix: Improve error handling for ResizeVolume and add unit tests for ControllerExpandVolume method Fix silent volume resize failures and add unit tests for ControllerExpandVolume method Feb 5, 2025
initialVolume *civogo.Volume
expectedError error
expectedSizeGB int64
}{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there are no test cases for the error handling that was added this time (around _, err := d.CivoClient.ResizeVolume). Would it be difficult to add them? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hlts2 cc @rytswd, if we see the ResizeVolume implementation, it errors out only when the volume is missing:

// ResizeVolume implemented in a fake way for automated tests
func (c *FakeClient) ResizeVolume(id string, size int) (*SimpleResponse, error) {
	for i, volume := range c.Volumes {
		if volume.ID == id {
			c.Volumes[i].SizeGigabytes = size
			return &SimpleResponse{Result: "success"}, nil
		}
	}

	err := fmt.Errorf("unable to find volume %s, zero matches", id)
	return nil, ZeroMatchesError.wrap(err)
}

And for such cases it would be caught even before ResizeVolume method is called:

https://github.com/civo/civo-csi/blob/master/pkg/driver/controller_server.go#L414-L418

// Get the volume from the Civo API
	volume, err := d.CivoClient.GetVolume(volID)
	if err != nil {
		return nil, status.Errorf(codes.Internal, "ControllerExpandVolume could not retrieve existing volume: %v", err)
}
	

And this particular case has been catered in the following test case:

{
			name:     "Failed to find the volume",
			volumeID: "vol-123",
			capacityRange: &csi.CapacityRange{
				RequiredBytes: 20 * driver.BytesInGigabyte,
			},
			initialVolume: &civogo.Volume{
				ID:            "vol-1234",
				SizeGigabytes: 10,
				Status:        "available",
			},
			expectedError:  status.Errorf(codes.Internal, "ControllerExpandVolume could not retrieve existing volume: ZeroMatchesError: unable to get volume vol-123"),
			expectedSizeGB: 0,
},

Kindly let me know if I am missing or overlooking something.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my take is 3 fold:

  • We should consider race condition and concurrent requests, which could cause these rather error case to appear
  • If the code is absolutely unreachable, we may not need that code path at all
  • If the code path is only to prevent some breaking changes from some external dependency, and we know that the code will not be hit ever by the runtime dependencies, we should be able to make it clear with panic

I suppose this one is the first one -- but as I haven't dug into the code, I could be wrong. In either case, if there is any code paths that we don't have test cases, that's a question mark for me. We should at least put clear comment on "why" they are there (and not "what" or "how" they do).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rytswd, thank you!

The code path under question handles situation when ResizeVolume fails due to reasons not covered by our pre-checks (e.g. retry errors from the API, or other errors coming from further upstream) and hence necessary. However, our fake client does not simulate this behaviour.

I agree that the purpose should be clearly documented. I will add a why comment above it stating, it handles unexpected failures from ResizeVolume (e.g. retry errors or other upstream issues) that are not caught by the pre-checks.

As for race condition, the actual ResizeVolume implementation in API uses retry.RetryOnConflict, which handles race conditions and concurrent updates.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks, I understand the situation better now. The fake client should certainly match the actual API behaviours -- and for this particular situation, we can probably keep the comment on. But let's create a separate ticket to ensure that the fake client gets the update it needs for having more test coverage.

@Praveen005 Praveen005 requested a review from hlts2 February 6, 2025 10:04
Copy link

@rytswd rytswd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'm giving my approval, but please make sure that the follow-up ticket is created before closing this one 👍

@Praveen005
Copy link
Contributor Author

Thank you @rytswd! I will create a ticket for the same in civogo.

@Praveen005 Praveen005 merged commit e220d0f into master Feb 11, 2025
6 checks passed
@Praveen005 Praveen005 deleted the VolResizeErrorHandling branch February 11, 2025 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix silent volume resize failure
3 participants