Add Retry Strategy Implementations and Tests#1

Open
nishkarsh-db wants to merge 3 commits into Unified_Retry_1 from Unified_Retry_2

@nishkarsh-db
Owner

Description

Introduces the retry strategy implementations that consume the foundation components from PR1:

  • IRetryStrategy interface: Defines contract for retry decision logic with two methods:

    • shouldRetryAfter() for HTTP responses - returns retry delay and checks timeout budgets
    • shouldRetryAfter() for exceptions - returns retry delay for network errors
  • IdempotentRetryStrategy: Aggressive retry strategy for safe operations

    • Retries all HTTP error codes except specific client errors (400, 401, 403, 404, etc.)
    • Uses Retry-After header when present, otherwise exponential backoff
    • Retries all exceptions except non-retriable runtime exceptions (IllegalArgumentException, etc.)
  • NonIdempotentRetryStrategy: Conservative retry strategy for unsafe operations

    • Only retries 503 Service Unavailable and 429 Too Many Requests
    • Requires Retry-After header to retry (will not retry without it)
    • Only retries specific network exceptions (ConnectException, UnknownHostException, etc.)
  • API Retriable Codes Integration:

    • Both strategies check connectionContext.getApiRetriableHttpCodes()
    • Custom status codes (e.g., 404, 400) can be retried with independent timeout budget
    • Passes isApiRetriableCode flag to RetryTimeoutManager for correct budget deduction
  • RetryUtils enhancements:

    • calculateExponentialBackoff() - exponential backoff with jitter (1s to 10s)
    • extractRetryAfterHeader() - parses Retry-After header from responses
    • getRetryStrategy() - returns appropriate strategy based on RequestType
    • throwDatabricksHttpException() - converts exceptions to standardized format
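
To make the "independent timeout budget" for API retriable codes concrete, here is a minimal sketch of how a flag could route a retry delay to one of two budgets. Everything in it is an assumption for illustration: the class `RetryBudgetSketch`, the method `tryConsume`, and the two-budget simplification are hypothetical, and the real RetryTimeoutManager from PR1 (which also tracks 503/429 budgets) may work differently.

```java
/** Hypothetical two-budget tracker; not the PR's RetryTimeoutManager. */
final class RetryBudgetSketch {
  private long standardBudgetMillis; // budget for standard retriable codes (assumed)
  private long apiBudgetMillis;      // separate budget for API retriable codes (assumed)

  RetryBudgetSketch(long standardBudgetMillis, long apiBudgetMillis) {
    this.standardBudgetMillis = standardBudgetMillis;
    this.apiBudgetMillis = apiBudgetMillis;
  }

  /**
   * Deducts the delay from the budget selected by the flag.
   * Returns false, leaving both budgets untouched, when the selected budget cannot cover it.
   */
  boolean tryConsume(int delayMillis, boolean isApiRetriableCode) {
    if (isApiRetriableCode) {
      if (delayMillis > apiBudgetMillis) {
        return false;
      }
      apiBudgetMillis -= delayMillis;
      return true;
    }
    if (delayMillis > standardBudgetMillis) {
      return false;
    }
    standardBudgetMillis -= delayMillis;
    return true;
  }
}
```

With budgets of 120000 ms (standard) and 5000 ms (API), two 3000 ms retries on an API retriable code exhaust the API budget on the second attempt, while a 3000 ms retry on a standard code still succeeds.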

Testing

All tests pass (16 tests): [INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0

IdempotentRetryStrategyTest (8 tests):

  • Retriable response with/without Retry-After header
  • Non-retriable response (2xx success, 400 client error)
  • Retriable/non-retriable exceptions
  • Timeout exhaustion for responses and exceptions

NonIdempotentRetryStrategyTest (8 tests):

  • Retriable response with Retry-After header
  • No retry without Retry-After header (conservative approach)
  • Only 503/429 are retriable (not 500)
  • Network exceptions retriable, other exceptions not retriable
  • Timeout exhaustion scenarios

Notes for Reviewer

  • No breaking changes - builds on PR1 foundation
  • Strategy selection happens in PR3 when integrated into HTTP client
  • Default timeouts are 120s (from PR1) for other errors and exceptions
  • 503/429 timeouts remain configurable via connection context (TemporarilyUnavailableRetryTimeout, RateLimitRetryTimeout)
  • API retriable codes use separate timeout budget (ApiRetryTimeout) and operate independently from standard codes

nishkarsh-db and others added 3 commits February 19, 2026 15:42
- Add IRetryStrategy interface defining retry decision contract
- Add IdempotentRetryStrategy for safe-to-retry operations (GET, metadata)
- Add NonIdempotentRetryStrategy for conservative retry (POST, mutations)
- Add comprehensive test coverage (16 tests total: 8 per strategy)
- Expand RetryUtils with helper methods for strategies

IdempotentRetryStrategy retries on 5xx, 429, 408, and network errors.
NonIdempotentRetryStrategy only retries 503/429 with Retry-After headers.
Both integrate with RetryTimeoutManager for bounded retry attempts.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Check if status code is in ApiRetriableHttpCodes from connection context
- If API retriable code OR retriable by standard logic, proceed with retry
- Pass isApiRetriableCode boolean to RetryTimeoutManager.evaluateRetryTimeoutForResponse()
- Timeout manager uses flag to deduct from correct budget (API codes vs standard)
- Allows custom status codes (e.g., 404, 400) to have independent retry budget

Test updates:
- Add lenient mock for getApiRetriableHttpCodes() in setUp
- Update evaluateRetryTimeoutForResponse mocks to include boolean parameter
- All 16 tests pass (8 IdempotentRetryStrategy + 8 NonIdempotentRetryStrategy)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Move common retry logic documentation to IRetryStrategy interface
- Simplify IdempotentRetryStrategy docs to focus on implementation specifics
- Simplify NonIdempotentRetryStrategy docs to focus on implementation specifics
- Each concrete class now documents only what makes it unique:
  * Which status codes/exceptions are retriable
  * Retry delay calculation approach (Retry-After vs exponential backoff)
  * Special requirements (e.g., Retry-After header requirement for non-idempotent)
- All 16 tests still pass

Co-authored-by: Cursor <cursoragent@cursor.com>
* </ul>
*
* <p><b>Critical Requirement:</b> Retry-After header MUST be present. Will NOT retry without it
* (except for API retriable codes which may use exponential backoff).

don't think we're actually following this?


There are no tests that cover API retriable codes for either strategy

private static final int MIN_BACKOFF_INTERVAL_MILLISECONDS = 1000; // 1s
private static final int MAX_BACKOFF_INTERVAL_MILLISECONDS = 10000; // 10s
private static final String RETRY_AFTER_HEADER = "Retry-After";
private static final Random RANDOM = new Random();

Multiple JDBC connections retrying concurrently will contend on the shared Random instance. java.util.Random is thread-safe, so the values stay valid, but the contention can degrade performance.

Use ThreadLocalRandom.current().nextDouble() instead.
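
A minimal sketch of what that suggestion could look like. The two constants mirror the snippet above; the class name, the doubling schedule, and the jitter formula are assumptions, since the body of the PR's calculateExponentialBackoff() is not shown here.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only; the PR's actual backoff formula may differ. */
final class BackoffSketch {
  private static final int MIN_BACKOFF_INTERVAL_MILLISECONDS = 1000; // 1s
  private static final int MAX_BACKOFF_INTERVAL_MILLISECONDS = 10000; // 10s

  /** Returns a delay in [MIN, MAX] ms; executionAttempt is assumed to start at 1. */
  static int calculateExponentialBackoff(int executionAttempt) {
    // Clamp the exponent so the shift below cannot overflow.
    int exponent = Math.min(Math.max(executionAttempt - 1, 0), 16);
    // Upper bound doubles per attempt: 1s, 2s, 4s, 8s, then capped at 10s.
    long ceiling =
        Math.min(
            (long) MAX_BACKOFF_INTERVAL_MILLISECONDS,
            (long) MIN_BACKOFF_INTERVAL_MILLISECONDS << exponent);
    // Per-thread generator: no shared seed, so concurrent callers do not contend.
    double jitter = ThreadLocalRandom.current().nextDouble();
    return (int)
        (MIN_BACKOFF_INTERVAL_MILLISECONDS
            + jitter * (ceiling - MIN_BACKOFF_INTERVAL_MILLISECONDS));
  }
}
```

ThreadLocalRandom needs no static field at all, so the shared RANDOM constant could be dropped if this approach is taken.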

if (response.containsHeader(RETRY_AFTER_HEADER)) {
try {
int retryAfterSeconds =
Integer.parseInt(response.getFirstHeader(RETRY_AFTER_HEADER).getValue().trim());

Should we add a safeguard to ensure that whatever we parse is > 0? We do not want to subtract time or retry instantly, so maybe use max(some_default, retryAfter).
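
A rough sketch of that guard, assuming a floor of 1 second (the value of the default is a placeholder, not something the PR defines). The class and method names are hypothetical. Like the snippet under review, it handles only the delta-seconds form of Retry-After; the HTTP-date form is treated as invalid.

```java
import java.util.Optional;

/** Illustrative parser with a lower bound on the server-supplied delay. */
final class RetryAfterParser {
  // Assumed floor; stands in for "some_default" from the comment above.
  private static final int MIN_RETRY_AFTER_SECONDS = 1;

  /** Returns the delay in milliseconds, or empty when the value is missing or not an integer. */
  static Optional<Integer> parseRetryAfterMillis(String headerValue) {
    if (headerValue == null) {
      return Optional.empty();
    }
    try {
      int seconds = Integer.parseInt(headerValue.trim());
      // Zero or negative values are raised to the floor so we never retry instantly.
      int clamped = Math.max(MIN_RETRY_AFTER_SECONDS, seconds);
      // multiplyExact throws instead of silently overflowing on huge values.
      return Optional.of(Math.multiplyExact(clamped, 1000));
    } catch (NumberFormatException | ArithmeticException e) {
      return Optional.empty();
    }
  }
}
```

Whether a non-positive value should be clamped, as here, or rejected outright is a design choice; clamping follows the max(...) suggestion.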

* @return Optional containing the retry interval in milliseconds, or empty if header is missing
* or invalid
*/
public static Optional<Integer> extractRetryAfterHeader(Map<String, String> headers) {

can we dedup the logic for these two methods and make one use the other?
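
One possible shape for the dedup, sketched under assumptions: `HeaderSource` is a hypothetical stand-in for the HTTP client's response type, and the private `parseSecondsToMillis` helper is invented for illustration. Both overloads only locate the raw value and hand it to the single parser.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the HTTP response type used in the PR. */
interface HeaderSource {
  /** Returns the first value of the named header, or null when it is absent. */
  String firstHeaderValue(String name);
}

final class RetryAfterExtractor {
  private static final String RETRY_AFTER_HEADER = "Retry-After";

  static Optional<Integer> extractRetryAfterHeader(HeaderSource response) {
    return parseSecondsToMillis(response.firstHeaderValue(RETRY_AFTER_HEADER));
  }

  static Optional<Integer> extractRetryAfterHeader(Map<String, String> headers) {
    return parseSecondsToMillis(headers == null ? null : headers.get(RETRY_AFTER_HEADER));
  }

  /** Single place where the header value is validated and converted. */
  private static Optional<Integer> parseSecondsToMillis(String raw) {
    if (raw == null) {
      return Optional.empty();
    }
    try {
      return Optional.of(Math.multiplyExact(Integer.parseInt(raw.trim()), 1000));
    } catch (NumberFormatException | ArithmeticException e) {
      return Optional.empty();
    }
  }
}
```

Any later fix to parsing, such as a lower bound on the delay, would then apply to both entry points at once.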


there is some duplication between the two strategies, is there scope to perhaps define an abstract base class?
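
A hedged sketch of how such a base class might look, using the rules from the PR description. All names and signatures are hypothetical; the non-retriable set is only the subset the description spells out ("400, 401, 403, 404, etc."); timeout budgets, API retriable codes, and jitter are left out to keep it short.

```java
import java.util.Optional;
import java.util.Set;

/** Shared flow lives here; subclasses supply only what differs. */
abstract class BaseRetrySketch {
  /** Returns the delay in ms before the next attempt, or empty to stop retrying. */
  final Optional<Integer> shouldRetryAfter(
      int statusCode, Optional<Integer> retryAfterMillis, int attempt) {
    if (!isStatusRetriable(statusCode)) {
      return Optional.empty();
    }
    return chooseDelay(retryAfterMillis, attempt);
  }

  protected abstract boolean isStatusRetriable(int statusCode);

  protected abstract Optional<Integer> chooseDelay(
      Optional<Integer> retryAfterMillis, int attempt);

  /** Capped exponential backoff (1s doubling up to 10s), jitter omitted for determinism. */
  protected static int backoffMillis(int attempt) {
    int exponent = Math.min(Math.max(attempt - 1, 0), 10);
    return (int) Math.min(10_000L, 1_000L << exponent);
  }
}

final class IdempotentSketch extends BaseRetrySketch {
  // Assumed subset of the non-retriable client errors.
  private static final Set<Integer> NON_RETRIABLE = Set.of(400, 401, 403, 404);

  @Override
  protected boolean isStatusRetriable(int statusCode) {
    return statusCode >= 400 && !NON_RETRIABLE.contains(statusCode);
  }

  @Override
  protected Optional<Integer> chooseDelay(Optional<Integer> retryAfterMillis, int attempt) {
    // Prefer the server's hint, otherwise fall back to backoff.
    return Optional.of(retryAfterMillis.orElseGet(() -> backoffMillis(attempt)));
  }
}

final class NonIdempotentSketch extends BaseRetrySketch {
  @Override
  protected boolean isStatusRetriable(int statusCode) {
    return statusCode == 503 || statusCode == 429;
  }

  @Override
  protected Optional<Integer> chooseDelay(Optional<Integer> retryAfterMillis, int attempt) {
    // No Retry-After means no retry.
    return retryAfterMillis;
  }
}
```

The exception-handling path could follow the same pattern with an abstract isExceptionRetriable hook.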


int retryAfter = RetryUtils.calculateExponentialBackoff(executionAttempt);
if (!retryTimeoutManager.evaluateRetryTimeoutForException(retryAfter)) {
LOGGER.error(

I don't think this should be logged at error level. Same for the other strategy.

return Optional.empty();
}

String retryAfterValue = headers.get(RETRY_AFTER_HEADER);

Also, this lookup is case-sensitive, which it should not be: HTTP header names are case-insensitive.
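
For illustration, a small case-insensitive lookup over a plain Map<String, String>. The helper name `findHeader` is hypothetical. A linear scan is assumed to be acceptable because header maps are small.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative helper; not part of the PR. */
final class HeaderLookup {
  /** Finds a header value regardless of the casing used for the key. */
  static Optional<String> findHeader(Map<String, String> headers, String name) {
    if (headers == null || name == null) {
      return Optional.empty();
    }
    for (Map.Entry<String, String> entry : headers.entrySet()) {
      String key = entry.getKey();
      if (key != null && key.equalsIgnoreCase(name)) {
        return Optional.ofNullable(entry.getValue());
      }
    }
    return Optional.empty();
  }
}
```

An alternative is to copy the headers into a TreeMap built with String.CASE_INSENSITIVE_ORDER, which makes every later get() case-insensitive at the cost of one copy.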

return isRetriable;
}

private boolean isExceptionRetrieable(Exception e) {

typo

return isRetriable;
}

private boolean isExceptionRetrieable(Exception e) {

typo

@nishkarsh-db nishkarsh-db force-pushed the Unified_Retry_1 branch 2 times, most recently from 32e2032 to 98a6f6d Compare February 23, 2026 09:35