Fix silent volume resize failures and add unit tests for ControllerExpandVolume method #46

Merged
merged 5 commits into from
Feb 11, 2025
10 changes: 9 additions & 1 deletion pkg/driver/controller_server.go
@@ -444,7 +444,15 @@ func (d *Driver) ControllerExpandVolume(ctx context.Context, req *csi.Controller
 	}

 	log.Info().Int64("size_gb", desiredSize).Str("volume_id", volID).Msg("Volume resize request sent")
-	d.CivoClient.ResizeVolume(volID, int(desiredSize))
+	_, err = d.CivoClient.ResizeVolume(volID, int(desiredSize))
+	// Handles unexpected errors (e.g., API retry error or other upstream errors).
+	if err != nil {
+		log.Error().
+			Err(err).
+			Str("VolumeID", volID).
+			Msg("Failed to resize volume in Civo API")
+		return nil, status.Errorf(codes.Internal, "cannot resize volume %s: %s", volID, err.Error())
+	}

 	// Resizes can take a while, double the number of normal retries
 	available, err := d.waitForVolumeStatus(volume, "available", CivoVolumeAvailableRetries*2)
149 changes: 149 additions & 0 deletions pkg/driver/controller_server_test.go
@@ -8,6 +8,8 @@ import (
 	"github.com/civo/civogo"
 	"github.com/container-storage-interface/spec/lib/go/csi"
 	"github.com/stretchr/testify/assert"
+	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/status"
 )

 func TestCreateVolume(t *testing.T) {
@@ -290,3 +292,150 @@ func TestGetCapacity(t *testing.T) {
 	})

 }

+func TestControllerExpandVolume(t *testing.T) {
+
+	tests := []struct {
+		name           string
+		volumeID       string
+		capacityRange  *csi.CapacityRange
+		initialVolume  *civogo.Volume
+		expectedError  error
+		expectedSizeGB int64
+	}{
Member

It seems there are no test cases for the error handling that was added this time (around _, err = d.CivoClient.ResizeVolume). Would it be difficult to add them? 🤔

Contributor Author

Hi @hlts2 cc @rytswd, if we look at the fake client's ResizeVolume implementation, it returns an error only when the volume is missing:

// ResizeVolume implemented in a fake way for automated tests
func (c *FakeClient) ResizeVolume(id string, size int) (*SimpleResponse, error) {
	for i, volume := range c.Volumes {
		if volume.ID == id {
			c.Volumes[i].SizeGigabytes = size
			return &SimpleResponse{Result: "success"}, nil
		}
	}

	err := fmt.Errorf("unable to find volume %s, zero matches", id)
	return nil, ZeroMatchesError.wrap(err)
}

In such cases, the error would be caught by the GetVolume check before ResizeVolume is even called:

https://github.com/civo/civo-csi/blob/master/pkg/driver/controller_server.go#L414-L418

	// Get the volume from the Civo API
	volume, err := d.CivoClient.GetVolume(volID)
	if err != nil {
		return nil, status.Errorf(codes.Internal, "ControllerExpandVolume could not retrieve existing volume: %v", err)
	}

This particular case is covered by the following test case:

		{
			name:     "Failed to find the volume",
			volumeID: "vol-123",
			capacityRange: &csi.CapacityRange{
				RequiredBytes: 20 * driver.BytesInGigabyte,
			},
			initialVolume: &civogo.Volume{
				ID:            "vol-1234",
				SizeGigabytes: 10,
				Status:        "available",
			},
			expectedError:  status.Errorf(codes.Internal, "ControllerExpandVolume could not retrieve existing volume: ZeroMatchesError: unable to get volume vol-123"),
			expectedSizeGB: 0,
		},

Kindly let me know if I am missing or overlooking something.

So my take is threefold:

  • We should consider race conditions and concurrent requests, which could cause these error cases to appear
  • If the code is absolutely unreachable, we may not need that code path at all
  • If the code path exists only to guard against breaking changes from some external dependency, and we know the code will never be hit by the runtime dependencies, we should be able to make that clear with a panic

I suppose this one is the first case, but as I haven't dug into the code, I could be wrong. In either case, if there are any code paths for which we don't have test cases, that's a question mark for me. We should at least put a clear comment on "why" they are there (and not "what" or "how" they do it).

Contributor Author

Hi @rytswd, thank you!

The code path in question handles the situation where ResizeVolume fails for reasons not covered by our pre-checks (e.g. retry errors from the API, or other errors coming from further upstream), and hence it is necessary. However, our fake client does not simulate this behaviour.

I agree that the purpose should be clearly documented. I will add a "why" comment above it stating that it handles unexpected failures from ResizeVolume (e.g. retry errors or other upstream issues) that are not caught by the pre-checks.

As for race conditions, the actual ResizeVolume implementation in the API uses retry.RetryOnConflict, which handles race conditions and concurrent updates.
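
For readers unfamiliar with the pattern, retry-on-conflict can be sketched with the standard library alone. This is only an illustration of the general idea: it is not the Civo API's code and not the client-go implementation, and errConflict and retryOnConflict are hypothetical names. Once the attempts are exhausted the error still reaches the caller, which is the kind of failure the new check in ControllerExpandVolume is meant to surface.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical sentinel standing in for an
// optimistic-concurrency conflict reported by a backing store.
var errConflict = errors.New("conflict: object was modified")

// retryOnConflict calls fn up to attempts times, retrying only while fn
// reports errConflict. Success or any other error is returned immediately.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	// Every attempt conflicted, so the caller has to deal with the error.
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryOnConflict(5, func() error {
		calls++
		if calls < 3 {
			return errConflict // simulate two concurrent-update conflicts
		}
		return nil
	})
	fmt.Println(calls, err)
}
```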

OK thanks, I understand the situation better now. The fake client should certainly match the actual API behaviours -- and for this particular situation, we can probably keep the comment on. But let's create a separate ticket to ensure that the fake client gets the update it needs for having more test coverage.
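
As a possible starting point for that follow-up ticket, here is a minimal sketch of how a test double could inject a ResizeVolume failure. Everything in it is an assumption made for illustration: volumeResizer, failingResizer and expandVolume are hypothetical names, the interface is narrowed to a single error return (the real client method also returns a *SimpleResponse), and plain fmt.Errorf stands in for the gRPC status error so the example needs only the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// errUpstream is a made-up upstream failure used by the example.
var errUpstream = errors.New("retry limit exceeded")

// volumeResizer is a hypothetical, narrowed-down view of the client: just
// the one call whose failure we want to simulate.
type volumeResizer interface {
	ResizeVolume(id string, size int) error
}

// failingResizer always fails, mimicking an upstream error that the
// pre-checks cannot anticipate.
type failingResizer struct{ err error }

func (f failingResizer) ResizeVolume(string, int) error { return f.err }

// expandVolume mirrors the shape of the new error handling: the resize
// error is no longer dropped but wrapped and returned to the caller.
func expandVolume(c volumeResizer, id string, sizeGB int) error {
	if err := c.ResizeVolume(id, sizeGB); err != nil {
		return fmt.Errorf("cannot resize volume %s: %w", id, err)
	}
	return nil
}

func main() {
	err := expandVolume(failingResizer{err: errUpstream}, "vol-123", 20)
	fmt.Println(err)
	fmt.Println(errors.Is(err, errUpstream))
}
```

A fake client that accepted a configurable error in this way would let the table-driven test below gain a case for the new code path without touching the real API.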

+		{
+			name:     "Successfully expand volume",
+			volumeID: "vol-123",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 20 * driver.BytesInGigabyte,
+			},
+			initialVolume: &civogo.Volume{
+				ID:            "vol-123",
+				SizeGigabytes: 10,
+				Status:        "available",
+			},
+			expectedError:  nil,
+			expectedSizeGB: 20,
+		},
+		{
+			name:     "Desired size not an exact multiple of BytesInGigabyte",
+			volumeID: "vol-123",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 20*driver.BytesInGigabyte + 1, // 20 GB + 1 byte
+			},
+			initialVolume: &civogo.Volume{
+				ID:            "vol-123",
+				SizeGigabytes: 10,
+				Status:        "available",
+			},
+			expectedError:  nil,
+			expectedSizeGB: 21, // Desired size should be rounded up to 21 GB
+		},
+		{
+			name:     "Volume ID is missing",
+			volumeID: "",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 20 * driver.BytesInGigabyte,
+			},
+			initialVolume:  nil,
+			expectedError:  status.Error(codes.InvalidArgument, "must provide a VolumeId to ControllerExpandVolume"),
+			expectedSizeGB: 0,
+		},
+		{
+			name:          "Capacity range is missing",
+			volumeID:      "vol-123",
+			capacityRange: nil,
+			initialVolume: &civogo.Volume{
+				ID:            "vol-123",
+				SizeGigabytes: 10,
+				Status:        "available",
+			},
+			expectedError:  status.Error(codes.InvalidArgument, "must provide a capacity range to ControllerExpandVolume"),
+			expectedSizeGB: 0,
+		},
+		{
+			name:     "Volume is already resizing",
+			volumeID: "vol-123",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 20 * driver.BytesInGigabyte,
+			},
+			initialVolume: &civogo.Volume{
+				ID:            "vol-123",
+				SizeGigabytes: 10,
+				Status:        "resizing",
+			},
+			expectedError:  status.Error(codes.Aborted, "volume is already being resized"),
+			expectedSizeGB: 0,
+		},
+		{
+			name:     "Volume is not available for expansion",
+			volumeID: "vol-123",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 20 * driver.BytesInGigabyte,
+			},
+			initialVolume: &civogo.Volume{
+				ID:            "vol-123",
+				SizeGigabytes: 10,
+				Status:        "attached",
+			},
+			expectedError:  status.Error(codes.FailedPrecondition, "volume is not in an availble state for OFFLINE expansion"),
+			expectedSizeGB: 0,
+		},
+		{
+			name:     "Desired size is smaller than current size",
+			volumeID: "vol-123",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 5 * driver.BytesInGigabyte,
+			},
+			initialVolume: &civogo.Volume{
+				ID:            "vol-123",
+				SizeGigabytes: 10,
+				Status:        "available",
+			},
+			expectedError:  nil,
+			expectedSizeGB: 10,
+		},
+		{
+			name:     "Failed to find the volume",
+			volumeID: "vol-123",
+			capacityRange: &csi.CapacityRange{
+				RequiredBytes: 20 * driver.BytesInGigabyte,
+			},
+			initialVolume: &civogo.Volume{
+				ID:            "vol-1234",
+				SizeGigabytes: 10,
+				Status:        "available",
+			},
+			expectedError:  status.Errorf(codes.Internal, "ControllerExpandVolume could not retrieve existing volume: ZeroMatchesError: unable to get volume vol-123"),
+			expectedSizeGB: 0,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+
+			fc, _ := civogo.NewFakeClient()
+			d, _ := driver.NewTestDriver(fc)
+
+			// Populate the fake client with the initial volume
+			if tt.initialVolume != nil {
+				fc.Volumes = []civogo.Volume{*tt.initialVolume}
+			}
+
+			// Call the method under test
+			resp, err := d.ControllerExpandVolume(context.Background(), &csi.ControllerExpandVolumeRequest{
+				VolumeId:      tt.volumeID,
+				CapacityRange: tt.capacityRange,
+			})
+
+			// Assert the expected error
+			if tt.expectedError != nil {
+				assert.Equal(t, tt.expectedError, err)
+			} else {
+				assert.Nil(t, err)
+				assert.Equal(t, tt.expectedSizeGB*driver.BytesInGigabyte, resp.CapacityBytes)
+				assert.True(t, resp.NodeExpansionRequired)
+			}
+		})
+	}
+}
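
The "not an exact multiple" case above expects 20 GB plus one byte to become 21 GB. The driver's own rounding code is not part of this diff, so the following is only a sketch of one common way to get that result with integer ceiling division. roundUpGB is a hypothetical helper, and bytesInGigabyte is assumed here to be 1024^3; the driver's actual BytesInGigabyte value is not shown in the diff.

```go
package main

import "fmt"

// bytesInGigabyte is an assumed value (1 GiB); the driver's own
// BytesInGigabyte constant does not appear in this diff.
const bytesInGigabyte int64 = 1024 * 1024 * 1024

// roundUpGB returns the smallest whole number of gigabytes that can hold
// requiredBytes, using integer ceiling division.
func roundUpGB(requiredBytes int64) int64 {
	return (requiredBytes + bytesInGigabyte - 1) / bytesInGigabyte
}

func main() {
	fmt.Println(roundUpGB(20 * bytesInGigabyte))   // 20
	fmt.Println(roundUpGB(20*bytesInGigabyte + 1)) // 21
}
```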