
feat(csharp): convert cloudfetch retry from limit by # of retries to limit by time#375

Merged
eric-wang-1990 merged 12 commits into main from fix/cloudfetch-transport-error-retry
Mar 29, 2026
Conversation

Collaborator

@eric-wang-1990 eric-wang-1990 commented Mar 28, 2026

Summary

  • Replace fixed 3-attempt retry with 5-minute time-budget approach using exponential backoff with jitter (matching RetryHttpHandler's pattern)
  • Retry all exceptions (502 Bad Gateway from proxy, connection drops, TCP resets, HttpClient.Timeout) — catch-all, not just transport errors
  • Add new adbc.databricks.cloudfetch.retry_timeout_seconds parameter (default 300s / 5 min)
  • Keep adbc.databricks.cloudfetch.max_retries parameter for PowerBI backward compatibility (default -1 = not set). When set, retry loop exits if either max retries or timeout is reached first
  • Add detailed retry tracing (attempt count, backoff delay, timeout budget remaining)

Problem

CloudFetch downloads used a fixed 3-retry loop with linear backoff (~1.5s total). When a proxy returns 502 or a firewall blocks cloud storage, all three retries are exhausted within seconds — not enough time for transient issues to resolve.

Solution

Switch to a time-budget approach (same as RetryHttpHandler from #373): keep retrying with exponential backoff + jitter until the 5-minute budget is exhausted. This gives transient issues (proxy errors, firewall rules, connection drops) enough time to resolve.

| | Before | After |
|---|---|---|
| Retry limit | 3 attempts | 5-minute time budget (+ optional max_retries) |
| Backoff | Linear (500ms, 1s, 1.5s) | Exponential with jitter (500ms → 1s → 2s → ... → 32s cap) |
| Total retry window | ~3 seconds | Up to 5 minutes |
| Error classification | Retry all exceptions | Retry all exceptions (same) |
| max_retries param | Required (default 3) | Optional (default -1 = not set) |
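The before/after behavior above can be sketched as a single retry loop. This is an illustrative sketch, not the driver's actual code: the delegate name `downloadOnceAsync`, the jitter range, and the parameter defaults are assumptions; only the 500ms→32s exponential shape, the 300s default budget, and the `-1 = not set` max_retries semantics come from the PR description.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of a time-budget retry loop with exponential backoff + jitter.
// maxRetries: -1 = not set (timeout only); >= 0 = also cap total attempts.
static async Task<byte[]> DownloadWithRetryAsync(
    Func<Task<byte[]>> downloadOnceAsync,
    int retryTimeoutSeconds = 300,   // adbc.databricks.cloudfetch.retry_timeout_seconds
    int maxRetries = -1,             // adbc.databricks.cloudfetch.max_retries
    CancellationToken cancellationToken = default)
{
    const int InitialBackoffMs = 500;
    const int MaxBackoffMs = 32_000;
    long retryTimeoutMs = (long)retryTimeoutSeconds * 1000; // long: avoid int overflow
    long cumulativeBackoffMs = 0;                           // budget tracks backoff sleep
    long currentBackoffMs = InitialBackoffMs;
    int attemptCount = 0;
    Exception lastException = null;
    var random = new Random(); // netstandard2.0 has no Random.Shared

    while (true)
    {
        attemptCount++;
        try
        {
            return await downloadOnceAsync().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            lastException = ex;
            bool retriesExhausted = maxRetries >= 0 && attemptCount >= maxRetries;
            bool budgetExhausted = cumulativeBackoffMs >= retryTimeoutMs;
            if (retriesExhausted || budgetExhausted)
            {
                // Note: no inner exception and no raw URL, so pre-signed
                // URL credentials cannot leak into logs/telemetry.
                throw new InvalidOperationException(
                    $"Download failed after {attemptCount} attempts. " +
                    $"Last error: {lastException?.GetType().Name ?? "unknown"}.");
            }
        }

        // Exponential backoff with jitter, capped at 32s.
        int delayMs = (int)Math.Min(currentBackoffMs, MaxBackoffMs) + random.Next(0, 250);
        await Task.Delay(delayMs, cancellationToken).ConfigureAwait(false);
        cumulativeBackoffMs += delayMs;
        currentBackoffMs = Math.Min(currentBackoffMs * 2, MaxBackoffMs);
    }
}
```

Per-attempt duration is bounded separately (e.g. by `HttpClient.Timeout`), which is why the budget here only sums backoff sleep, matching the RetryHttpHandler convention discussed later in this thread.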

Test plan

  • All 9 CloudFetch downloader unit tests pass
  • Full build succeeds (netstandard2.0 + net8.0)
  • Manual test: block cloud storage via MITM firewall, verify retries continue for 5 min
  • Manual test: unblock within 5 min, verify download resumes successfully

This pull request was AI-assisted by Isaac.

- Extract shared TransportErrorHelper from RetryHttpHandler for reuse
- Replace fixed 3-attempt retry with 5-minute time-budget approach
  using exponential backoff with jitter (matching RetryHttpHandler)
- Retry all transient errors (502, connection drops, timeouts) not
  just transport errors
- Replace adbc.databricks.cloudfetch.max_retries parameter with
  adbc.databricks.cloudfetch.retry_timeout_seconds (default 300s)
- Add detailed retry tracing (attempt count, backoff, timeout budget)

Co-authored-by: Isaac
Keep the PR scope minimal: only the CloudFetch retry-count-to-time-budget change.
RetryHttpHandler remains unchanged from main.

Co-authored-by: Isaac
…loudFetch

Restore adbc.databricks.cloudfetch.max_retries parameter (needed by PowerBI)
as an optional guard alongside the new time-budget retry approach. Default is
0 (no limit), meaning only the 5-minute timeout applies. When set, the retry
loop exits if either the max retry count or the timeout is reached first.

Co-authored-by: Isaac
Change DefaultMaxRetries from 0 to -1 so that -1 clearly means "not set"
and 0 can mean "no retries at all" if needed. Guard checks updated from
> 0 to >= 0 accordingly.

Co-authored-by: Isaac
…loader

Remove CalculateBackoffWithJitter method and MaxBackoffMs constant,
inline the single-use logic at the call site.

Co-authored-by: Isaac
@eric-wang-1990 eric-wang-1990 changed the title from "feat(csharp): add CloudFetch download retry resilience" to "feat(csharp): convert cloudfetch retry from limit by # of retries to limit by time" Mar 28, 2026

Copilot AI left a comment


Pull request overview

This PR updates the C# CloudFetch downloader retry behavior from a small fixed retry count to a time-budgeted retry loop with exponential backoff + jitter, and adds a new connection parameter to configure the retry time budget.

Changes:

  • Introduce adbc.databricks.cloudfetch.retry_timeout_seconds (default 300s) and wire it into CloudFetchDownloader.
  • Replace the fixed retry loop with a budget-based loop that logs richer retry telemetry and supports an optional max_retries cap for backward compatibility.
  • Update CloudFetch downloader tests to pass the new constructor parameters and adjust expectations/timings.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| csharp/src/Reader/CloudFetch/CloudFetchDownloader.cs | Implements the new time-budget retry loop, adds retry telemetry, and plumbs through RetryTimeoutSeconds. |
| csharp/src/Reader/CloudFetch/CloudFetchConfiguration.cs | Adds defaults and parsing for the new retry timeout parameter and changes the MaxRetries default/handling. |
| csharp/src/DatabricksParameters.cs | Adds the new parameter constant and updates parameter documentation for CloudFetch retry settings. |
| csharp/test/E2E/CloudFetch/CloudFetchDownloaderTest.cs | Updates tests to use the new retry configuration and adjusts assertions/waits for the new behavior. |


- Fix maxRetries semantics: -1=unlimited, 0=no retries, >0=max retries.
  Move check to catch block so first attempt always runs.
- Use real elapsed time (stopwatch) for retry budget instead of summing
  backoff delays, so slow/hanging attempts count against the budget.
- Use long for retryTimeoutMs to prevent int overflow on large values.
- Remove raw exception messages from trace events to avoid leaking
  pre-signed URL credentials in telemetry.
- Throw on invalid max_retries property values instead of silent fallback.
- Replace flaky Task.Delay(3000) in test with deterministic polling.

Co-authored-by: Isaac
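The "deterministic polling" replacement for a flaky `Task.Delay(3000)` mentioned in that commit can be sketched as a small helper that polls a condition with a short interval and an overall deadline. `WaitUntilAsync` is an illustrative name, not the test suite's actual API.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Poll `condition` every `pollIntervalMs` until it holds or `timeout` elapses.
// Returns true as soon as the condition holds; false if the deadline passes.
static async Task<bool> WaitUntilAsync(
    Func<bool> condition,
    TimeSpan timeout,
    int pollIntervalMs = 50)
{
    var stopwatch = Stopwatch.StartNew();
    while (stopwatch.Elapsed < timeout)
    {
        if (condition()) return true;
        await Task.Delay(pollIntervalMs).ConfigureAwait(false);
    }
    return condition(); // one final check at the deadline
}
```

Unlike a fixed delay, this finishes as soon as the condition becomes true and only waits the full timeout in the failure case.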
Keep consistent with RetryHttpHandler which also tracks cumulative
backoff sleep time rather than real elapsed time. Both rely on
per-request timeouts (HttpClient.Timeout / per-request CTS) to
bound individual attempt duration.

Co-authored-by: Isaac

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Collaborator

@msrathore-db msrathore-db left a comment


LGTM

- Use Random.Shared instead of new Random() per retry to avoid
  identical jitter values across parallel downloads.
- Use long math for backoff doubling to prevent int overflow.
- Fix maxRetries check from > to >= for backward compatibility
  (max_retries=3 means 3 total attempts, not 4).

Co-authored-by: Isaac
0 = no limit (use timeout only), >0 = total attempts including first try.
Matches old for-loop semantics where maxRetries=3 meant 3 total attempts.
Removes -1 sentinel value, check is now before try block like the old code.

Co-authored-by: Isaac
Random.Shared requires .NET 6+, not available in netstandard2.0.

Co-authored-by: Isaac
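Since `Random.Shared` is .NET 6+ only, a netstandard2.0-compatible way to avoid identical jitter across parallel downloads is a single lock-protected `Random` shared by all callers. This is a sketch of one such alternative, not necessarily what the driver settled on.

```csharp
using System;

// Thread-safe jitter source for netstandard2.0 targets.
// A single shared Random avoids the classic bug where several Random
// instances created in the same tick share a seed and emit identical values.
internal static class Jitter
{
    private static readonly Random s_random = new Random();
    private static readonly object s_lock = new object();

    // Returns a jitter delay in [0, maxJitterMs).
    public static int NextMs(int maxJitterMs)
    {
        lock (s_lock)
        {
            return s_random.Next(0, maxJitterMs);
        }
    }
}
```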

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



…time, backoff clamp

- Replace SanitizeUrl(url) calls with existing sanitizedUrl variable
- Use stopwatch.Elapsed in exception message for accurate elapsed time
- Clamp initial currentBackoffMs to 32s cap to prevent overflow

Co-authored-by: Isaac
@eric-wang-1990 eric-wang-1990 requested a review from Copilot March 29, 2026 03:11
@eric-wang-1990 eric-wang-1990 added this pull request to the merge queue Mar 29, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 29, 2026
…limit by time (#375)


Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment on lines +647 to +648
$"Failed to download file from {sanitizedUrl} after {attemptCount} attempts over {stopwatch.Elapsed.TotalSeconds:F1}s ({retryLimits}).",
lastException);

Copilot AI Mar 29, 2026


The thrown InvalidOperationException now includes lastException as an inner exception. Some inner exceptions (e.g., HttpRequestException) can include the original request URI in their message, which may contain pre-signed URL query parameters. Since this exception is also passed to tracing via activity?.AddException(ex, ...), the inner exception message can end up in logs/telemetry. Consider not attaching the raw exception as the inner exception, or wrapping it in a redacted exception type/message before attaching so pre-signed URL credentials can’t leak.

Suggested change
$"Failed to download file from {sanitizedUrl} after {attemptCount} attempts over {stopwatch.Elapsed.TotalSeconds:F1}s ({retryLimits}).",
lastException);
$"Failed to download file from {sanitizedUrl} after {attemptCount} attempts over {stopwatch.Elapsed.TotalSeconds:F1}s ({retryLimits}).");

Collaborator Author


✅ Done - Removed lastException as inner exception to prevent pre-signed URL leakage. Instead, appended the exception type name to the message: Last error: {lastException?.GetType().Name ?? "unknown"}. This gives operators enough context for diagnosis without exposing credentials.
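The `sanitizedUrl` referenced in this exchange comes from a `SanitizeUrl` helper mentioned earlier in the PR. A sketch of that idea (illustrative only; the driver's actual helper may differ) is to drop the query string, which is where a pre-signed URL carries its SAS token or signature:

```csharp
using System;

// Strip credentials from a pre-signed URL before it reaches logs/telemetry.
static string SanitizeUrl(string url)
{
    if (Uri.TryCreate(url, UriKind.Absolute, out var uri))
    {
        // Keep scheme, host, and path; drop the query (sig/SAS token).
        return uri.GetLeftPart(UriPartial.Path);
    }
    // If parsing fails, fall back to truncating at the first '?'.
    int q = url.IndexOf('?');
    return q >= 0 ? url.Substring(0, q) : url;
}
```

Combined with dropping the inner exception, this keeps both the message and the telemetry free of signed query parameters.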

@eric-wang-1990 eric-wang-1990 removed this pull request from the merge queue due to a manual request Mar 29, 2026
Drop lastException as inner exception from the retry-exhausted
InvalidOperationException. HttpRequestException.Message can contain
the full pre-signed URL with credentials/SAS tokens. Instead, append
the exception type name to the message for operator diagnosis.

Co-authored-by: Isaac
@eric-wang-1990 eric-wang-1990 added this pull request to the merge queue Mar 29, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 29, 2026
@eric-wang-1990 eric-wang-1990 added this pull request to the merge queue Mar 29, 2026
Merged via the queue into main with commit adf4853 Mar 29, 2026
16 checks passed
@eric-wang-1990 eric-wang-1990 deleted the fix/cloudfetch-transport-error-retry branch March 29, 2026 05:56
