Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding attempt count to error message #5834

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

RanVaknin
Copy link

Motivation and Context

Improve debugging visibility by making attempt count immediately visible in exception messages. Currently, attempt information is only available through suppressed exceptions, requiring additional code to access. This change makes attempts visible directly in the exception message, matching behavior in other AWS SDKs.

Modifications

  • Added attempt count tracking and message formatting in SdkException
  • Added SdkDiagnostics class to handle the formatting of the Diagnostics clause in error messages.
  • Modified AwsServiceException to include attempts in AWS error details
  • Modified RetryableStageHelper to track and propagate attempt counts during retries

Testing

  • Added new functional test ExceptionAttemptMessageBehaviorTest testing with different retry scenarios
  • Modified existing tests in AwsServiceExceptionTest and SdkExceptionSerializationTest to verify attempt count behavior
  • Added test cases in SdkExceptionMessageTest for new attempt count functionality

Visualization of changes:

Before change:

// non-retryable scenario
try {
    // request to a non-existent bucket
    s3Client.listObjects(g -> g.bucket("some-non-existent-bucket"));
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// NoSuchBucketException: The specified bucket does not exist (Service: S3, Status Code: 404, Request ID: abc123, Extended Request ID: xyzdef123456)


// retryable scenario
try {
    // request that might cause a retryable error (e.g., throttling)
    dynamoDbClient.getItem(...);
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// ThrottlingException: Rate of requests exceeds the allowed throughput (Service: DynamoDB, Status Code: 429, Request ID: abc123, Extended Request ID: xyzdef123456)
// Note: attempts only visible through e.getSuppressed()

After change:

// non-retryable scenario
try {
    // request to a non-existent bucket
    s3Client.listObjects(g -> g.bucket("some-non-existent-bucket"));
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// NoSuchBucketException: The specified bucket does not exist (Service: S3, Status Code: 404, Request ID: abc123, Extended Request ID: xyzdef123456) (SDK Diagnostics: numAttempts = 1)


// retryable scenario
try {
    // request that might cause a retryable error (e.g., throttling)
    dynamoDbClient.getItem(...);
} catch (AwsServiceException e) {
    System.out.println(e);
}
// outputs:
// ThrottlingException: Rate of requests exceeds the allowed throughput (Service: DynamoDB, Status Code: 429, Request ID: abc123, Extended Request ID: xyzdef123456) (SDK Diagnostics: numAttempts = 4)
// Note: attempts now visible directly in the exception message

Potential Impact:

Test or error handling code that relies on exact exception message matching might break as messages now include attempt count. We recommend using partial message matching for more resilient tests and error handling.

// may break
assertEquals("Resource not found", e.getMessage());
if (e.getMessage().equals("Resource not found")) { ... }

// more resilient approach
assertTrue(e.getMessage().contains("Resource not found"));
try {
    client.someOperation();
} catch (AwsServiceException e) {
	// may break
    if (e.getMessage().equals("Resource not found")) {
        handleNotFound();
    }
	// more resilient
    if (e.getMessage().contains("Resource not found")) {
        handleNotFound();
    }
}

@RanVaknin RanVaknin requested a review from a team as a code owner January 28, 2025 21:07
}

@Test
public void create_WithoutAttemptCount_UsesDefaultValue() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's me that is confused, but test name says WithoutAttemptCount but we set numAttempt to 6?

@@ -115,7 +115,7 @@ public void successfulExecutionCallsResponseHandler() throws Exception {

@Test
public void failedExecutionCallsErrorResponseHandler() throws Exception {
SdkServiceException exception = SdkServiceException.builder().message("Uh oh!").statusCode(500).build();
SdkServiceException exception = (SdkServiceException) SdkServiceException.builder().message("Uh oh!").statusCode(500).numAttempts(1).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to cast here? Does the type return when calling .build() change? It should not change as it is a public API

import software.amazon.awssdk.annotations.SdkProtectedApi;

@SdkProtectedApi
public class SdkDiagnostics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coul you implement the ToCopyableBuilder interface here? with the build extending the Builder interface? You can look at EndpointDiscoveryEndpoint for an example

}

public SdkDiagnostics.Builder toBuilder() {
return new SdkDiagnostics.BuilderImpl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the builder instance returned should have the numAttempt of this instance

* @param numAttempts The attempt count
* @return This method for object chaining
*/
Builder numAttempts(Integer numAttempts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add default and throw new UnsupportedOperationException for this method and numAttempts()? As this is a public API, there is a small chance that users might extends this interface, and it would break them at compile time since they are not implementing these methods

@@ -122,7 +122,7 @@ public void marshallerThrowsReportedThroughFuture() throws Exception {
public void responseHandlerThrowsReportedThroughFuture() throws Exception {
final SdkClientException e = SdkClientException.create("Could not handle response");
when(responseHandler.handle(any(SdkHttpFullResponse.class), any(ExecutionAttributes.class))).thenThrow(e);
doVerify(() -> clientHandler.execute(executionParams), e);
doVerify(() -> clientHandler.execute(executionParams), thrown -> thrown.getCause() instanceof SdkClientException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here did the thrown exception type changed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants