Commit f14f8aa

Update retry guidance and example
This updates the suggested retry strategy interface to accept info about the operation being invoked in the initial token acquisition method. This also updates the example strategy to match the new behaviors that are being rolled out for the AWS standard retry strategy, which includes checking for a long-polling trait on the operation.
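
The cost and delay rules that this change introduces can be sketched in isolation. The following is a minimal, hypothetical illustration rather than the code from the guide: the class name `RetrySketch`, its method names, and the explicit `jitter` parameter (which stands in for `Math.random()` so that results are reproducible) are assumptions made for this example. The constants are copied from the updated example strategy, and the 20-second cap is taken from its comments.

```java
import java.time.Duration;

/**
 * Hypothetical standalone sketch of the token cost and delay rules.
 * None of these names are part of the retry guidance itself.
 */
final class RetrySketch {
    // Values copied from the updated example strategy.
    static final int RETRY_COST = 14;
    static final int THROTTLING_RETRY_COST = 5;
    // The example caps backoff at 20 seconds.
    static final double MAX_BACKOFF_SECONDS = 20;

    private int tokens;

    RetrySketch(int initialTokens) {
        this.tokens = initialTokens;
    }

    /** Throttling errors cost fewer tokens than other retryable errors. */
    static int retryCost(boolean isThrottle) {
        return isThrottle ? THROTTLING_RETRY_COST : RETRY_COST;
    }

    /** Deducts the cost and returns true, or leaves the bucket unchanged and returns false. */
    synchronized boolean tryConsume(int cost) {
        if (tokens < cost) {
            return false;
        }
        tokens -= cost;
        return true;
    }

    synchronized int tokens() {
        return tokens;
    }

    /**
     * Exponential backoff in seconds, scaled by 0.05 for non-throttling errors,
     * capped, then multiplied by a jitter factor and converted to milliseconds.
     *
     * @param jitter a factor in [0, 1]; the example strategy uses Math.random().
     */
    static Duration computeDelay(int attempts, boolean isThrottle, double jitter) {
        double backoff = Math.pow(2, attempts);
        if (!isThrottle) {
            backoff = backoff * 0.05;
        }
        backoff = Math.min(backoff, MAX_BACKOFF_SECONDS);
        return Duration.ofMillis((long) (jitter * backoff * 1000));
    }
}
```

With these constants, a bucket holding 20 tokens can pay for one non-throttling retry (leaving 6), cannot pay for a second, but can still pay for a throttling retry. The sketch leaves out the long-polling branch, in which the example strategy sleeps for the computed delay before reporting that the bucket is exhausted.
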
1 parent 4491592 commit f14f8aa

2 files changed

Lines changed: 86 additions & 32 deletions

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+{
+  "type": "documentation",
+  "description": "Updated the example retry strategy in client guidance and updated the initial token method to take information about the operation.",
+  "pull_requests": [
+    "[#3000](https://github.com/smithy-lang/smithy/pull/3000)"
+  ]
+}

docs/source-2.0/guides/client-guidance/retries.md

Lines changed: 79 additions & 32 deletions
@@ -68,7 +68,7 @@ public interface RetryStrategy {
      *
      * @throws TokenAcquisitionFailedException if a token cannot be acquired.
      */
-    RetryToken acquireInitialToken();
+    RetryToken acquireInitialToken(ApiOperation<?, ?> operation);
 
     /**
      * Invoked before each subsequent (non-first) request attempt.
@@ -100,6 +100,13 @@ client. Be careful to ensure that access to that state is synchronized in order
 to prevent race conditions.
 :::
 
+:::{admonition} TODO - Define ApiOperation
+:class: note
+
+`ApiOperation` will be defined later in a separate document. At a minimum, it
+should contain the operation's ID.
+:::
+
 #### Using retry strategies
 
 An initial retry token should be acquired at the beginning of a request, before
@@ -129,12 +136,12 @@ The following is a simplified example of what it looks like to use the
      *
      * @return a successful result.
      */
-    public Result request(SerializedRequest serializedRequest) {
+    public Result request(ApiOperation<?, ?> operation, SerializedRequest serializedRequest) {
         // First acquire the initial retry token. If a token cannot be acquired,
         // make only one attempt without retries.
         RetryToken retryToken;
         try {
-            retryToken = this.retryStrategy.acquireInitialToken();
+            retryToken = this.retryStrategy.acquireInitialToken(operation);
         } catch (TokenAcquisitionFailedException e) {
             return send(serializedRequest);
         }
@@ -413,25 +420,34 @@ demonstrate some of the potential needs of a retry system.
 ### Example retry strategy
 
 The following is an example retry strategy that implements exponential backoff
-with jitter alongside a token bucket. This strategy adds extra cost for timeout
-errors since they may indicate a more degraded service.
+with jitter alongside a token bucket. This strategy has a reduced cost for
+throttling errors as they indicate that the service is actively managing
+retries.
 
 Aside from delay, the retry token also tracks the number of attempts that have
-been made. This is necessary because this strategy imposes a maximum attempt
-count, and also because the delay is calculated in part based on how many
-attempts have been made.
+been made as well as if the operation is a long-polling operation. The attempt
+count is necessary because this strategy imposes a maximum attempt count, and
+also because the delay is calculated in part based on how many attempts have
+been made.
+
+For long-polling operations, the strategy will sleep if the bucket is found
+to be empty.
 
 ```java
-public record AwsStandardRetryToken(int attempts, Duration delay) implements RetryToken {
+public record AwsStandardRetryToken(
+    int attempts,
+    Duration delay,
+    boolean isLongPoll
+) implements RetryToken {
 }
 ```
 
 ```java
 public final class AwsStandardRetryStrategy implements RetryStrategy {
     // These values are not prescriptive. They are static in this example for the
     // sake of simplicity, but making them configurable is ideal.
-    private static final int RETRY_COST = 5;
-    private static final int TIMEOUT_COST = 10;
+    private static final int RETRY_COST = 14;
+    private static final int THROTTLING_RETRY_COST = 5;
     private static final int SUCCESS_REFUND = 1;
 
     private static final int MAX_ATTEMPTS = 5;
@@ -449,13 +465,14 @@ public final class AwsStandardRetryStrategy implements RetryStrategy {
     private final Object tokensLock = new Object();
 
     @Override
-    public RetryToken acquireInitialToken() {
+    public RetryToken acquireInitialToken(ApiOperation<?, ?> operation) {
         // This returns successfully even if the token bucket is empty. This is
         // because an initial attempt will always be performed anyway, and
         // returning successfully here will ensure that the retry strategy is
         // checked if that initial attempt fails. By that point, the token bucket
         // may no longer be empty.
-        return new AwsStandardRetryToken(0, null);
+        boolean isLongPoll = operation.schema().hasTrait(TraitKey.get(LongPollTrait.class));
+        return new AwsStandardRetryToken(0, null, isLongPoll);
     }
 
     @Override
@@ -477,19 +494,19 @@ public final class AwsStandardRetryStrategy implements RetryStrategy {
             // If the exception thrown by the operation includes retryability
             // information, use that to inform retry behavior.
             case RetryInfo retryInfo when retryInfo.isRetrySafe() != RetrySafety.NO -> {
-                // Attempt to consume tokens from the token bucket to "pay"
-                // for the retry.
-                consumeTokens(retryInfo.isTimeout());
-                yield backoff(standardToken, retryInfo.retryAfter());
+                var resultToken = backoff(standardToken, retryInfo.retryAfter(), retryInfo.isThrottle());
+                payForRetry(retryInfo.isThrottle(), resultToken);
+                yield resultToken;
             }
 
             // If the exception does not have retry info, but does have more
             // general error info, that can also be used. This assumes that
             // a server error is likely retryable and that a client error
             // likely is not.
             case ErrorInfo errorInfo when errorInfo.fault() == ErrorFault.SERVER -> {
-                consumeTokens(false);
-                yield backoff(standardToken);
+                var resultToken = backoff(standardToken);
+                payForRetry(false, resultToken);
+                yield resultToken;
             }
             default -> throw new TokenAcquisitionFailedException("Exception not retryable.");
         };
@@ -498,21 +515,41 @@ public final class AwsStandardRetryStrategy implements RetryStrategy {
     /**
      * Consumes tokens to "pay" for a retry.
      *
-     * @param isTimeout whether the retry is in response to a timeout error,
-     *                  which will require more tokens.
+     * @param isThrottle whether the retry is in response to a throttling error,
+     *                   which will require fewer tokens.
+     * @param resultCandidate The candidate RetryToken that needs to be paid for.
      *
      * @throws TokenAcquisitionFailedException if there are not enough tokens
      *                                         in the bucket to pay for the retry.
      */
-    private void consumeTokens(boolean isTimeout) {
-        synchronized (tokensLock) {
-            int cost = isTimeout ? TIMEOUT_COST : RETRY_COST;
+    private void payForRetry(boolean isThrottle, AwsStandardRetryToken resultCandidate) {
+        int cost = isThrottle ? THROTTLING_RETRY_COST : RETRY_COST;
+        if (!consumeTokens(cost)) {
+            if (resultCandidate.isLongPoll) {
+                try {
+                    Thread.sleep(resultCandidate.delay());
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                }
+            }
+            throw new TokenAcquisitionFailedException("Token bucket exhausted.");
+        }
+    }
 
+    /**
+     * Attempts to consume a specified amount of tokens.
+     *
+     * @param cost The amount of tokens to attempt to consume.
+     * @return Returns whether the tokens were able to be consumed.
+     */
+    private boolean consumeTokens(int cost) {
+        synchronized (tokensLock) {
             if (this.tokens < cost) {
-                throw new TokenAcquisitionFailedException("Token bucket exhausted.");
+                return false;
             }
 
             this.tokens -= cost;
+            return true;
         }
     }
 
@@ -522,41 +559,51 @@ public final class AwsStandardRetryStrategy implements RetryStrategy {
      * @param token the previous token.
      */
     private AwsStandardRetryToken backoff(AwsStandardRetryToken token) {
-        return new AwsStandardRetryToken(token.attempts + 1, computeDelay(token.attempts));
+        return new AwsStandardRetryToken(
+            token.attempts + 1, computeDelay(token.attempts, false), token.isLongPoll);
     }
 
     /**
      * Computes a backoff with exponential backoff and jitter, capped at 20 seconds.
      *
      * @param token the previous token.
+     * @param isThrottle whether the triggering error was a throttle.
      * @param suggested the delay suggested by the service, which will serve as
      *                  the minimum delay.
      */
-    private AwsStandardRetryToken backoff(AwsStandardRetryToken token, Duration suggested) {
+    private AwsStandardRetryToken backoff(AwsStandardRetryToken token, Duration suggested, boolean isThrottle) {
         // Compute the backoff as normal. If it is longer than the suggested
         // backoff from the service, use it. Otherwise, use the suggested
         // backoff.
-        Duration computedDelay = computeDelay(token.attempts);
-        Duration finalDelay = computedDelay.toMillis() < suggested.toMillis() ? suggested : computedDelay;
-        return new AwsStandardRetryToken(token.attempts + 1, finalDelay);
+        Duration finalDelay = computeDelay(token.attempts, isThrottle);
+        if (suggested != null && finalDelay.toMillis() < suggested.toMillis()) {
+            finalDelay = suggested;
+        }
+        return new AwsStandardRetryToken(token.attempts + 1, finalDelay, token.isLongPoll);
     }
 
     /**
      * Computes the delay with exponential backoff and jitter, capped at 20 seconds.
      *
      * @param attempts the number of attempts made so far.
+     * @param isThrottle whether the triggering error was a throttle.
      * @return the computed delay duration.
      */
-    private Duration computeDelay(int attempts) {
+    private Duration computeDelay(int attempts, boolean isThrottle) {
         // First compute the exponential backoff.
         double backoff = Math.pow(2, attempts);
 
+        // Try to recover faster from non-throttling errors.
+        if (!isThrottle) {
+            backoff = backoff * 0.05;
+        }
+
         // Next, cap it at 20 seconds.
         backoff = Math.min(backoff, MAX_BACKOFF);
 
         // Finally, add jitter and expand to milliseconds.
         double backoffMillis = Math.random() * backoff * 1000;
-        return Duration.ofMilliseconds((long) backoffMillis);
+        return Duration.ofMillis((long) backoffMillis);
     }
 
     @Override
