@@ -5,13 +5,25 @@
namespace Microsoft.Azure.Cosmos.ChangeFeed.Bootstrapping
{
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.ChangeFeed.FeedManagement;
using Microsoft.Azure.Cosmos.ChangeFeed.LeaseManagement;
using Microsoft.Azure.Cosmos.Core.Trace;

internal sealed class BootstrapperCore : Bootstrapper
{
/// <summary>
/// Maximum number of times <see cref="InitializeAsync"/> will retry when
/// <see cref="PartitionSynchronizer.CreateMissingLeasesAsync"/> fails with a
/// regional error (e.g., <see cref="CosmosException"/> with 503 or
/// <see cref="HttpRequestException"/>). The retry is useful because
/// <see cref="MetadataRequestThrottleRetryPolicy"/> marks the failing
/// endpoint unavailable before propagating the error, so the next attempt
/// will be routed to a different region.
/// </summary>
internal const int MaxInitializationRetries = 3;

internal static readonly TimeSpan DefaultSleepTime = TimeSpan.FromSeconds(15);
internal static readonly TimeSpan DefaultLockTime = TimeSpan.FromSeconds(30);

@@ -50,6 +62,8 @@ public BootstrapperCore(PartitionSynchronizer synchronizer, DocumentServiceLease

public override async Task InitializeAsync()
{
int retryCount = 0;

while (true)
{
bool initialized = await this.leaseStore.IsInitializedAsync().ConfigureAwait(false);
@@ -73,6 +87,39 @@ public override async Task InitializeAsync()
await this.synchronizer.CreateMissingLeasesAsync().ConfigureAwait(false);
await this.leaseStore.MarkInitializedAsync().ConfigureAwait(false);
}
catch (CosmosException ex) when (retryCount < MaxInitializationRetries)
🟡 Recommendation — Overbroad Exception Filter: Catches all CosmosException types including non-transient errors

The when guard only checks retryCount, but the XML doc and comment explicitly say this is for "regional error (e.g., CosmosException with 503 or HttpRequestException)". The code doesn't enforce this — a 400 BadRequest, 404 NotFound, or 409 Conflict will also be caught and retried 3 times with 15-second delays (up to 45 seconds wasted on a deterministic failure).

This is inconsistent with MetadataRequestThrottleRetryPolicy.ShouldRetryInternalAsync() in this same PR, which carefully discriminates by status code (503, 500, 410/LeaseNotFound, 403/DatabaseAccountNotFound).

Concrete failure scenario: If the lease container is deleted mid-initialization, CreateMissingLeasesAsync() throws a 404 CosmosException. This catch retries 3× against the same endpoint with the same result, adding ~45 seconds of latency before finally propagating.

Suggestion: Add a status code filter to align with the policy:

catch (CosmosException ex) when (retryCount < MaxInitializationRetries
    && (ex.StatusCode == HttpStatusCode.ServiceUnavailable
        || ex.StatusCode == HttpStatusCode.InternalServerError
        || (ex.StatusCode == HttpStatusCode.Gone
            && ex.SubStatusCode == (int)SubStatusCodes.LeaseNotFound)
        || (ex.StatusCode == HttpStatusCode.Forbidden
            && ex.SubStatusCode == (int)SubStatusCodes.DatabaseAccountNotFound)))


{
// MetadataRequestThrottleRetryPolicy has already marked the
// failing endpoint unavailable, so the next iteration will
// route to a different region.
retryCount++;
DefaultTrace.TraceWarning(
"BootstrapperCore: Regional failure during initialization "
+ "(StatusCode: {0}, SubStatusCode: {1}). "
+ "Attempt {2} of {3}. Retrying after {4}.",
ex.StatusCode,
ex.SubStatusCode,
retryCount,
MaxInitializationRetries,
this.sleepTime);

await Task.Delay(this.sleepTime).ConfigureAwait(false);
🟢 Suggestion — Task.Delay without CancellationToken

The review rules require verifying cancellation token propagation on all async methods. Neither Task.Delay(this.sleepTime) call in the new catch blocks accepts a cancellation token, so a caller cannot cancel during the 15-second retry delay.

The InitializeAsync() method itself doesn't take a CancellationToken parameter, so there is no token to pass here; this is a pre-existing limitation of the Bootstrapper interface, not something introduced by this PR.

Consider adding CancellationToken support to InitializeAsync() in a follow-up if the team wants cancellable bootstrap initialization.
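
If the team does pick this up, the change might take the shape below. This is a sketch only, not code from the PR: the CancellationToken parameter on InitializeAsync is an assumed interface change, and the lease-store initialization and locking logic is elided for brevity.

public override async Task InitializeAsync(CancellationToken cancellationToken)
{
    int retryCount = 0;

    while (true)
    {
        // Hypothetical: surface caller cancellation before each attempt.
        cancellationToken.ThrowIfCancellationRequested();

        try
        {
            await this.synchronizer.CreateMissingLeasesAsync().ConfigureAwait(false);
            await this.leaseStore.MarkInitializedAsync().ConfigureAwait(false);
            return;
        }
        catch (CosmosException) when (retryCount < MaxInitializationRetries)
        {
            retryCount++;
            // Passing the token lets the caller abandon the 15-second wait
            // instead of blocking until the delay elapses.
            await Task.Delay(this.sleepTime, cancellationToken).ConfigureAwait(false);
        }
    }
}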


continue;
}
catch (HttpRequestException ex) when (retryCount < MaxInitializationRetries)
{
retryCount++;
DefaultTrace.TraceWarning(
"BootstrapperCore: HttpRequestException during initialization: {0}. "
+ "Attempt {1} of {2}. Retrying after {3}.",
ex.Message,
retryCount,
MaxInitializationRetries,
this.sleepTime);

await Task.Delay(this.sleepTime).ConfigureAwait(false);
continue;
}
finally
{
if (isLockAcquired)
15 changes: 15 additions & 0 deletions Microsoft.Azure.Cosmos/src/Handler/AbstractRetryHandler.cs
@@ -105,6 +105,21 @@ private static async Task<ResponseMessage> ExecuteHttpRequestAsync(
throw;
}
}
catch (CosmosException cosmosException)
🟡 Recommendation — Missing test coverage for the most critical change in this PR

This new catch (CosmosException) block is the key change enabling the entire operation-level cross-region retry for metadata failures. Without it, CosmosException escapes to SendAsync's outer catch at line 38 and is silently converted to a ResponseMessage without consulting ClientRetryPolicy.

However, there is no test in RetryHandlerTests.cs (or elsewhere) that verifies this path works. The existing RetryHandlerDoesNotRetryOnException test throws a generic Exception, and RetryHandlerHttpClientExceptionRefreshesLocations covers HttpRequestException — but neither exercises the CosmosException catch.

Suggestion: Add a test in RetryHandlerTests.cs following the RetryHandlerHttpClientExceptionRefreshesLocations pattern:

  1. Have the inner handler throw a CosmosException (503)
  2. Configure the mock retry policy to return RetryAfter on the first ShouldRetryAsync(Exception) call
  3. Verify the retry policy is consulted and the handler retries the operation
  4. Add a complementary test verifying that when the policy returns NoRetry, the CosmosException re-throws (not silently converted)

This is especially important because NamedCacheRetryHandler also extends AbstractRetryHandler — its InvalidPartitionExceptionRetryPolicy will now see CosmosException too, and should be verified to handle it safely (it returns NoRetry, causing re-throw, which is correct).
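
As a rough shape for the retry-then-rethrow contract, the sketch below exercises the catch-block semantics directly instead of wiring up the full handler pipeline. Assumptions: MSTest plus Moq, and the SDK-internal ShouldRetryResult / IDocumentClientRetryPolicy types being visible to the test project via InternalsVisibleTo; the local retry loop stands in for ExecuteHttpRequestAsync rather than invoking it.

[TestMethod]
public async Task CosmosExceptionConsultsRetryPolicyBeforeRethrow()
{
    var mockPolicy = new Mock<IDocumentClientRetryPolicy>();
    mockPolicy
        .SetupSequence(p => p.ShouldRetryAsync(It.IsAny<Exception>(), It.IsAny<CancellationToken>()))
        .ReturnsAsync(ShouldRetryResult.RetryAfter(TimeSpan.Zero)) // first failure: retry
        .ReturnsAsync(ShouldRetryResult.NoRetry());                // second failure: rethrow

    int attempts = 0;

    // Minimal stand-in for the retry loop in ExecuteHttpRequestAsync.
    async Task RunAsync()
    {
        while (true)
        {
            try
            {
                attempts++;
                throw new CosmosException(
                    message: "Simulated regional failure",
                    statusCode: HttpStatusCode.ServiceUnavailable,
                    subStatusCode: 0,
                    activityId: Guid.NewGuid().ToString(),
                    requestCharge: 0);
            }
            catch (CosmosException ce)
            {
                ShouldRetryResult result = await mockPolicy.Object
                    .ShouldRetryAsync(ce, CancellationToken.None);
                if (!result.ShouldRetry)
                {
                    throw; // must re-throw, not be converted to a ResponseMessage
                }
            }
        }
    }

    await Assert.ThrowsExceptionAsync<CosmosException>(RunAsync);
    Assert.AreEqual(2, attempts); // attempted twice: one retry, then propagated
}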


{
// Metadata requests (e.g., pkranges) that fail with a regional
// error throw CosmosException from within the pipeline. Without
// this catch, the exception escapes the retry loop and is caught
// by the outer catch in SendAsync, which converts it to a
// ResponseMessage without consulting the retry policy.
// By catching it here, ClientRetryPolicy can evaluate the
// failure and retry the entire operation in another region.
result = await callShouldRetryException(cosmosException, cancellationToken);
if (!result.ShouldRetry)
{
throw;
}
}

TimeSpan backoffTime = result.BackoffTime;
if (backoffTime != TimeSpan.Zero)
130 changes: 106 additions & 24 deletions Microsoft.Azure.Cosmos/src/MetadataRequestThrottleRetryPolicy.cs
@@ -6,6 +6,7 @@ namespace Microsoft.Azure.Cosmos
{
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Core.Trace;
@@ -14,6 +15,9 @@ namespace Microsoft.Azure.Cosmos

/// <summary>
/// Metadata Request Throttle Retry Policy is a combination of endpoint change retry + throttling retry.
/// On regional failures the policy marks the endpoint unavailable and retries on the next
/// preferred region. Once all regions have been attempted, the exception propagates to the
/// operation-level retry policy (e.g. <see cref="ClientRetryPolicy"/>) for cross-region failover.
/// </summary>
internal sealed class MetadataRequestThrottleRetryPolicy : IDocumentClientRetryPolicy
{
@@ -43,8 +47,8 @@ internal sealed class MetadataRequestThrottleRetryPolicy : IDocumentClientRetryP
private readonly int maxUnavailableEndpointRetryCount;

/// <summary>
/// An instance of <see cref="Uri"/> containing the location endpoint where the partition key
/// range http request will be sent over.
/// An instance of <see cref="MetadataRetryContext"/> containing the location index
/// and preferred-location flag used to route the next retry attempt.
/// </summary>
private MetadataRetryContext retryContext;

@@ -53,6 +57,13 @@ internal sealed class MetadataRequestThrottleRetryPolicy : IDocumentClientRetryP
/// </summary>
private int unavailableEndpointRetryCount;

/// <summary>
/// The resolved location endpoint for the current attempt. Used to mark
/// the endpoint as unavailable in the <see cref="IGlobalEndpointManager"/> when
/// a regional failure is detected.
/// </summary>
private Uri locationEndpoint;
💬 Observation — Design: Marking endpoint unavailable affects ALL reads, not just metadata

MarkEndpointUnavailableForRead() signals the LocationCache to deprioritize this endpoint for all subsequent read operations (queries, point reads, change feed), not just metadata requests. The endpoint moves to the end of the preferred list and self-heals after the 5-minute TTL (DefaultUnavailableLocationsExpirationTimeInSeconds = 300).

This is the correct trade-off: if a region is failing for metadata reads (partition key range lookups), it's very likely failing for data reads too. The impact is soft (deprioritized, not blocked) and self-healing. This aligns with how ClientRetryPolicy.ShouldRetryOnEndpointFailureAsync already works.

Also confirmed: double-marking (from both this policy and ClientRetryPolicy when the exception propagates) is benign — LocationCache.MarkEndpointUnavailable uses ConcurrentDictionary.AddOrUpdate, which is idempotent (just refreshes the TTL timestamp).
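
For readers less familiar with that mechanism, the mark-and-expire behavior amounts to something like the simplified sketch below. This is not the actual LocationCache code; the 300-second TTL mirrors DefaultUnavailableLocationsExpirationTimeInSeconds.

using System;
using System.Collections.Concurrent;

// Simplified sketch of the self-healing "mark unavailable" behavior.
internal sealed class UnavailableEndpointTracker
{
    private static readonly TimeSpan UnavailabilityTtl = TimeSpan.FromSeconds(300);

    private readonly ConcurrentDictionary<Uri, DateTime> unavailableSince =
        new ConcurrentDictionary<Uri, DateTime>();

    // AddOrUpdate makes double-marking benign: a second mark from another
    // policy simply refreshes the timestamp.
    public void MarkUnavailable(Uri endpoint)
    {
        this.unavailableSince.AddOrUpdate(
            endpoint,
            addValueFactory: key => DateTime.UtcNow,
            updateValueFactory: (key, existing) => DateTime.UtcNow);
    }

    // Endpoints self-heal: once the TTL elapses they become eligible again
    // without any explicit reset.
    public bool IsAvailable(Uri endpoint)
    {
        return !this.unavailableSince.TryGetValue(endpoint, out DateTime markedAt)
            || DateTime.UtcNow - markedAt > UnavailabilityTtl;
    }
}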


/// <summary>
/// The request being sent to the service.
/// </summary>
@@ -124,17 +135,36 @@ public Task<ShouldRetryResult> ShouldRetryAsync(
clientException.GetSubStatus(),
exception, cancellationToken);
}
else

if (exception is HttpRequestException)
{
DefaultTrace.TraceInformation("MetadataRequestThrottleRetryPolicy: Evaluating retry for Exception of type: {0}, Message: {1}, ResourceType {2}, CollectionName {3}, ResourceID {4}",
exception.GetType().Name,
exception.Message,
this.request.ResourceType,
this.request.CollectionName,
this.request.ResourceId);
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: HttpRequestException received. Marking endpoint {0} unavailable. ResourceType {1}, CollectionName {2}, ResourceID {3}.",
this.locationEndpoint,
this.request?.ResourceType,
this.request?.CollectionName,
this.request?.ResourceId);

return Task.FromResult(this.HandleRegionalFailure());
}

if (exception is OperationCanceledException && !cancellationToken.IsCancellationRequested)
🟢 Suggestion — Novel pattern: treating non-user OperationCanceledException as a regional failure

This is a new behavioral pattern not present in any sibling retry policy. ClientRetryPolicy.ShouldRetryAsync(Exception) does not treat OperationCanceledException as a regional failure requiring endpoint marking and cross-region retry — it only marks per-partition endpoints and falls through to the throttling policy (which returns NoRetry).

The rationale here (transport timeout = region issue) is reasonable for metadata requests, since a timeout on a metadata read is a strong signal of regional unavailability. But this creates a behavioral divergence between the metadata-level and operation-level policies interpreting the same exception type.

Questions to confirm:

  1. Is this intentional? If so, consider adding a comment explaining why metadata-level OCE handling differs from ClientRetryPolicy.
  2. Should ClientRetryPolicy be aligned in a follow-up PR to also treat non-user OCE as a regional failure for metadata requests?


{
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: Non-user OperationCanceledException received. Marking endpoint {0} unavailable. ResourceType {1}, CollectionName {2}, ResourceID {3}.",
this.locationEndpoint,
this.request?.ResourceType,
this.request?.CollectionName,
this.request?.ResourceId);

return Task.FromResult(this.HandleRegionalFailure());
}

DefaultTrace.TraceInformation("MetadataRequestThrottleRetryPolicy: Evaluating retry for Exception of type: {0}, Message: {1}, ResourceType {2}, CollectionName {3}, ResourceID {4}",
exception.GetType().Name,
exception.Message,
this.request.ResourceType,
this.request.CollectionName,
this.request.ResourceId);

return this.throttlingRetryPolicy.ShouldRetryAsync(exception, cancellationToken);
}

@@ -154,10 +184,17 @@ private Task<ShouldRetryResult> ShouldRetryInternalAsync(
|| (statusCode == HttpStatusCode.Gone && subStatus == SubStatusCodes.LeaseNotFound)
|| (statusCode == HttpStatusCode.Forbidden && subStatus == SubStatusCodes.DatabaseAccountNotFound))
{
if (this.IncrementRetryIndexOnUnavailableEndpointForMetadataRead())
{
return Task.FromResult(ShouldRetryResult.RetryAfter(TimeSpan.Zero));
}
DefaultTrace.TraceWarning(
"MetadataRequestThrottleRetryPolicy: Regional failure detected (StatusCode: {0}, SubStatusCode: {1}). "
+ "Marking endpoint {2} unavailable. ResourceType {3}, CollectionName {4}, ResourceID {5}.",
statusCode,
subStatus,
this.locationEndpoint,
this.request?.ResourceType,
this.request?.CollectionName,
this.request?.ResourceId);

return Task.FromResult(this.HandleRegionalFailure());
}

return this.throttlingRetryPolicy.ShouldRetryAsync(exception, cancellationToken);
@@ -196,10 +233,17 @@ private Task<ShouldRetryResult> ShouldRetryInternalAsync(
|| (statusCode == HttpStatusCode.Gone && subStatus == SubStatusCodes.LeaseNotFound)
|| (statusCode == HttpStatusCode.Forbidden && subStatus == SubStatusCodes.DatabaseAccountNotFound))
{
if (this.IncrementRetryIndexOnUnavailableEndpointForMetadataRead())
{
return Task.FromResult(ShouldRetryResult.RetryAfter(TimeSpan.Zero));
}
DefaultTrace.TraceWarning(
"MetadataRequestThrottleRetryPolicy: Regional failure detected in response (StatusCode: {0}, SubStatusCode: {1}). "
+ "Marking endpoint {2} unavailable. ResourceType {3}, CollectionName {4}, ResourceID {5}.",
statusCode,
subStatus,
this.locationEndpoint,
this.request?.ResourceType,
this.request?.CollectionName,
this.request?.ResourceId);

return Task.FromResult(this.HandleRegionalFailure());
}

return this.throttlingRetryPolicy.ShouldRetryAsync(responseMessage, cancellationToken);
@@ -219,21 +263,61 @@ public void OnBeforeSendRequest(DocumentServiceRequest request)
this.retryContext.RetryLocationIndex,
this.retryContext.RetryRequestOnPreferredLocations);

Uri metadataLocationEndpoint = this.globalEndpointManager.ResolveServiceEndpoint(request);
this.locationEndpoint = this.globalEndpointManager.ResolveServiceEndpoint(request);

DefaultTrace.TraceInformation("MetadataRequestThrottleRetryPolicy: Routing the metadata request to: {0} for operation type: {1} and resource type: {2} for collection: {3} with collection rid {4}.",
metadataLocationEndpoint,
this.locationEndpoint,
request.OperationType,
request.ResourceType,
request.CollectionName,
request.ResourceId);
request.RequestContext.RouteToLocation(metadataLocationEndpoint);
request.RequestContext.RouteToLocation(this.locationEndpoint);
}

/// <summary>
/// Marks the current endpoint as unavailable and attempts to increment the
/// retry location index so the next attempt targets a different region.
/// </summary>
/// <returns>
/// <see cref="ShouldRetryResult"/> with <c>ShouldRetry = true</c> if there are still
/// regions left to try; <see cref="ShouldRetryResult"/> with <c>ShouldRetry = false</c> otherwise,
/// allowing the exception to propagate to the operation-level retry policy.
/// </returns>
private ShouldRetryResult HandleRegionalFailure()
💬 Observation — Clean consolidation with a subtle behavioral change

HandleRegionalFailure() cleanly consolidates the "mark + increment" pattern and replaces 4 separate code blocks. The structural improvement is welcome.

One subtle behavioral change worth noting: previously, when IncrementRetryIndexOnUnavailableEndpointForMetadataRead() returned false (retries exhausted), the code fell through to this.throttlingRetryPolicy.ShouldRetryAsync(exception, cancellationToken). Now it returns ShouldRetryResult.NoRetry() directly.

This is functionally equivalent today because ResourceThrottleRetryPolicy only retries 429s and returns NoRetry for the status codes handled here (503, 500, 410/LeaseNotFound, 403/DatabaseAccountNotFound). However, it changes the implicit contract: the throttling policy is no longer consulted after region exhaustion. If ResourceThrottleRetryPolicy ever gains broader retry logic in the future, this path would be missed.

Not blocking — just calling it out for awareness.


{
this.MarkEndpointUnavailable();

if (this.IncrementRetryIndexOnUnavailableEndpointForMetadataRead())
{
return ShouldRetryResult.RetryAfter(TimeSpan.Zero);
}

return ShouldRetryResult.NoRetry();
}

/// <summary>
/// Marks the current <see cref="locationEndpoint"/> as unavailable for reads
/// in the <see cref="IGlobalEndpointManager"/>. This acts as a hint to the
/// <see cref="LocationCache"/> so that all subsequent calls to
/// <see cref="IGlobalEndpointManager.ResolveServiceEndpoint"/> will prefer
/// other regions.
/// </summary>
private void MarkEndpointUnavailable()
{
if (this.locationEndpoint != null)
{
DefaultTrace.TraceWarning(
"MetadataRequestThrottleRetryPolicy: Marking endpoint {0} unavailable for reads.",
this.locationEndpoint);

this.globalEndpointManager.MarkEndpointUnavailableForRead(this.locationEndpoint);
}
}

/// <summary>
/// Increments the location index when a unavailable endpoint exception ocurrs, for any future read requests.
/// Increments the location index when an unavailable endpoint exception occurs, for any future read requests.
/// </summary>
/// <returns>A boolean flag indicating if the operation was successful.</returns>
/// <returns>A boolean flag indicating if there are still regions left to try.</returns>
private bool IncrementRetryIndexOnUnavailableEndpointForMetadataRead()
{
if (this.unavailableEndpointRetryCount++ >= this.maxUnavailableEndpointRetryCount)
Expand All @@ -242,8 +326,6 @@ private bool IncrementRetryIndexOnUnavailableEndpointForMetadataRead()
return false;
}

// Retrying on second PreferredLocations.
// RetryCount is used as zero-based index.
DefaultTrace.TraceWarning("MetadataRequestThrottleRetryPolicy: Incrementing the metadata retry location index to: {0}.", this.unavailableEndpointRetryCount);
this.retryContext = new MetadataRetryContext()
{