I'm trying to use the new "test/synctest" package to test my use of waiters in my own code using AWS SDKs.
My code.
Code under test:
Click to expand
package deleter
import (
"context"
"fmt"
"time"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/service/ec2"
)
type ec2util struct {
client ec2Client
}
func (e *ec2util) DeleteInstance(ctx context.Context, r string) error {
_, err := e.client.TerminateInstances(ctx, &ec2.TerminateInstancesInput{
InstanceIds: []string{r},
DryRun: aws.Bool(false),
})
if err != nil {
return fmt.Errorf("terminating instance %s: %w", r, err)
}
// Wait for instance termination to complete
waiter := ec2.NewInstanceTerminatedWaiter(e.client)
err = waiter.Wait(ctx, &ec2.DescribeInstancesInput{
InstanceIds: []string{r},
}, time.Minute*10)
if err != nil {
return fmt.Errorf("waiting for instance termination: %w", err)
}
return nil
}
type ec2Client interface {
DescribeInstances(ctx context.Context, params *ec2.DescribeInstancesInput, optFns ...func(*ec2.Options)) (*ec2.DescribeInstancesOutput, error)
TerminateInstances(ctx context.Context, params *ec2.TerminateInstancesInput, optFns ...func(*ec2.Options)) (*ec2.TerminateInstancesOutput, error)
}
Test:
Click to expand
package deleter
import (
"context"
"testing"
"testing/synctest"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/service/ec2"
ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)
// TestEC2Instance_Delete_WaiterTimeout verifies that DeleteInstance
// returns an error when the waiter times out using synctest.
func TestEC2Instance_Delete_WaiterTimeout(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
fake := &fakeEC2Client{}
provider := &ec2util{client: fake}
ctx := t.Context()
// DeleteResource should fail with timeout - the waiter will poll repeatedly
// but the fake never returns terminated state, so it times out.
err := provider.DeleteInstance(ctx, "i-test")
if err == nil {
t.Fatal("expected timeout error, got nil")
}
})
}
type fakeEC2Client struct {
}
func (f *fakeEC2Client) DescribeInstances(ctx context.Context, params *ec2.DescribeInstancesInput, optFns ...func(*ec2.Options)) (*ec2.DescribeInstancesOutput, error) {
// pretend like any instance asked about is in shutting-down state
var matchingInstances []ec2types.Instance
for _, id := range params.InstanceIds {
matchingInstances = append(matchingInstances, ec2types.Instance{
InstanceId: aws.String(id),
State: &ec2types.InstanceState{
Name: ec2types.InstanceStateNameShuttingDown,
},
})
}
// Wrap instances in a reservation (EC2 API structure)
var reservations []ec2types.Reservation
reservations = append(reservations, ec2types.Reservation{
Instances: matchingInstances,
})
return &ec2.DescribeInstancesOutput{
Reservations: reservations,
}, nil
}
func (f *fakeEC2Client) TerminateInstances(ctx context.Context, params *ec2.TerminateInstancesInput, optFns ...func(*ec2.Options)) (*ec2.TerminateInstancesOutput, error) {
// pretend to shut down the instances
var stateChanges []ec2types.InstanceStateChange
for _, id := range params.InstanceIds {
stateChanges = append(stateChanges, ec2types.InstanceStateChange{
InstanceId: aws.String(id),
PreviousState: &ec2types.InstanceState{
Name: ec2types.InstanceStateNameRunning,
},
CurrentState: &ec2types.InstanceState{
Name: ec2types.InstanceStateNameShuttingDown,
},
})
}
return &ec2.TerminateInstancesOutput{
TerminatingInstances: stateChanges,
}, nil
}
The test TestEC2Instance_Delete_WaiterTimeout should fail quickly, as each retry in the waiter advance our fake-clock a little bit at a time.
However this test actually ends up spinning forever.
Inside the waiter, on about the 12th attempt (and every subsequent attempt) in the waiter ComputeDelay starts returning a delay of zero. Since the waiter stops sleeping, time stops advancing, the test sits an spins forever. (The delay within the waiter has a random element to it, so the exact attempt that starts getting stuck is not precise)
Specifically, a call to waiter.ComputeDelay with arguments:
- attempt: 12
- minDelay: 15s
- maxDelay: 120s
- remainingTime: 15s
Seems to always return a zero delay, when the delay should probably be "15s".
Think the problem is this bit of code:
// check if this is the last attempt possible and compute delay accordingly
if remainingTime-delay <= minDelay {
delay = remainingTime - minDelay
}
I think this code is trying to make sure to always make room for one more attempt after the current attempt? But if the min-delay is significantly larger than the length of time it takes to perform a retry, once we get near the end of our allotted time-window we will dramatically increase the rate of attempts. I'm not 100% sure here.
In the synctest case, the length of time it takes to perform a retry is zero, so we never make progress once our remainingTime is less-than or equal-to the "minDelay".
I'm trying to use the new "test/synctest" package to test my use of waiters in my own code using AWS SDKs.
My code.
Code under test:
Click to expand
Test:
Click to expand
The test
TestEC2Instance_Delete_WaiterTimeoutshould fail quickly, as each retry in the waiter advance our fake-clock a little bit at a time.However this test actually ends up spinning forever.
Inside the waiter, on about the 12th attempt (and every subsequent attempt) in the waiter ComputeDelay starts returning a delay of zero. Since the waiter stops sleeping, time stops advancing, the test sits an spins forever. (The delay within the waiter has a random element to it, so the exact attempt that starts getting stuck is not precise)
Specifically, a call to
waiter.ComputeDelaywith arguments:Seems to always return a zero delay, when the delay should probably be "15s".
Think the problem is this bit of code:
I think this code is trying to make sure to always make room for one more attempt after the current attempt? But if the min-delay is significantly larger than the length of time it takes to perform a retry, once we get near the end of our allotted time-window we will dramatically increase the rate of attempts. I'm not 100% sure here.
In the synctest case, the length of time it takes to perform a retry is zero, so we never make progress once our remainingTime is less-than or equal-to the "minDelay".