ComputeDelay called with maxDelay equal to remainingTime returns a '0' delay, leading to spinning

I'm trying to use the new "test/synctest" package to test my use of waiters in my own code using AWS SDKs.

My code.

Code under test:

<details>
  <summary>Click to expand</summary>
 
```go
package deleter

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

type ec2util struct {
	client ec2Client
}

func (e *ec2util) DeleteInstance(ctx context.Context, r string) error {
	_, err := e.client.TerminateInstances(ctx, &ec2.TerminateInstancesInput{
		InstanceIds: []string{r},
		DryRun:      aws.Bool(false),
	})
	if err != nil {
		return fmt.Errorf("terminating instance %s: %w", r, err)
	}

	// Wait for instance termination to complete
	waiter := ec2.NewInstanceTerminatedWaiter(e.client)
	err = waiter.Wait(ctx, &ec2.DescribeInstancesInput{
		InstanceIds: []string{r},
	}, time.Minute*10)
	if err != nil {
		return fmt.Errorf("waiting for instance termination: %w", err)
	}

	return nil
}

type ec2Client interface {
	DescribeInstances(ctx context.Context, params *ec2.DescribeInstancesInput, optFns ...func(*ec2.Options)) (*ec2.DescribeInstancesOutput, error)
	TerminateInstances(ctx context.Context, params *ec2.TerminateInstancesInput, optFns ...func(*ec2.Options)) (*ec2.TerminateInstancesOutput, error)
}

```
  
</details>

Test:

<details>
  <summary>Click to expand</summary>
 
```go
package deleter

import (
	"context"
	"testing"
	"testing/synctest"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// TestEC2Instance_Delete_WaiterTimeout verifies that DeleteInstance
// returns an error when the waiter times out using synctest.
func TestEC2Instance_Delete_WaiterTimeout(t *testing.T) {
	synctest.Test(t, func(t *testing.T) {
		fake := &fakeEC2Client{}

		provider := &ec2util{client: fake}
		ctx := t.Context()

		// DeleteResource should fail with timeout - the waiter will poll repeatedly
		// but the fake never returns terminated state, so it times out.
		err := provider.DeleteInstance(ctx, "i-test")
		if err == nil {
			t.Fatal("expected timeout error, got nil")
		}
	})
}

type fakeEC2Client struct {
}

func (f *fakeEC2Client) DescribeInstances(ctx context.Context, params *ec2.DescribeInstancesInput, optFns ...func(*ec2.Options)) (*ec2.DescribeInstancesOutput, error) {
	// pretend like any instance asked about is in shutting-down state
	var matchingInstances []ec2types.Instance

	for _, id := range params.InstanceIds {
		matchingInstances = append(matchingInstances, ec2types.Instance{
			InstanceId: aws.String(id),
			State: &ec2types.InstanceState{
				Name: ec2types.InstanceStateNameShuttingDown,
			},
		})
	}

	// Wrap instances in a reservation (EC2 API structure)
	var reservations []ec2types.Reservation
	reservations = append(reservations, ec2types.Reservation{
		Instances: matchingInstances,
	})

	return &ec2.DescribeInstancesOutput{
		Reservations: reservations,
	}, nil
}

func (f *fakeEC2Client) TerminateInstances(ctx context.Context, params *ec2.TerminateInstancesInput, optFns ...func(*ec2.Options)) (*ec2.TerminateInstancesOutput, error) {
	// pretend to shut down the instances
	var stateChanges []ec2types.InstanceStateChange

	for _, id := range params.InstanceIds {
		stateChanges = append(stateChanges, ec2types.InstanceStateChange{
			InstanceId: aws.String(id),
			PreviousState: &ec2types.InstanceState{
				Name: ec2types.InstanceStateNameRunning,
			},
			CurrentState: &ec2types.InstanceState{
				Name: ec2types.InstanceStateNameShuttingDown,
			},
		})
	}

	return &ec2.TerminateInstancesOutput{
		TerminatingInstances: stateChanges,
	}, nil
}

```
  
</details>

The test `TestEC2Instance_Delete_WaiterTimeout` should fail quickly, as each retry in the waiter advance our fake-clock a little bit at a time.

However this test actually ends up spinning forever.

Inside the waiter, on about the 12th attempt (and every subsequent attempt) in the waiter  ComputeDelay starts returning a delay of zero. Since the waiter stops sleeping, time stops advancing, the test sits an spins forever. (The delay within the waiter has a random element to it, so the exact attempt that starts getting stuck is not precise)

Specifically, a call to `waiter.ComputeDelay` with arguments:

- attempt: 12
- minDelay: 15s
- maxDelay: 120s
- remainingTime: 15s

Seems to always return a zero delay, when the delay should probably be "15s".

Think the problem is this bit of code:

```go
	// check if this is the last attempt possible and compute delay accordingly
	if remainingTime-delay <= minDelay {
		delay = remainingTime - minDelay
	}
```

I think this code is trying to make sure to always make room for one more attempt after the current attempt? But if the min-delay is significantly larger than the length of time it takes to perform a retry, once we get near the end of our allotted time-window we will dramatically increase the rate of attempts. I'm not 100% sure here.

In the synctest case, the length of time it takes to perform a retry is zero, so we never make progress once our remainingTime is less-than or equal-to the "minDelay".


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ComputeDelay called with maxDelay equal to remainingTime returns a '0' delay, leading to spinning #621

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ComputeDelay called with maxDelay equal to remainingTime returns a '0' delay, leading to spinning #621

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions