Skip to content

[Internal] Release: Adds Cherry-Pick Metadata PR Fix to release branch#5125

Merged
microsoft-github-policy-service[bot] merged 2 commits intoreleases/3.48.1from
users/nalutripician/3481metadataCherryPick
Apr 12, 2025
Merged

[Internal] Release: Adds Cherry-Pick Metadata PR Fix to release branch#5125
microsoft-github-policy-service[bot] merged 2 commits intoreleases/3.48.1from
users/nalutripician/3481metadataCherryPick

Conversation

@NaluTripician
Copy link
Copy Markdown
Contributor

Pull Request Template

Description

Context

Currently, there is a bug in the SDK where upon a cold start of the SDK and rare edge cases involving online/offline-ing regions, where only query requests are made, the SDK will not retry certain status code responses from metadata requests causing the entire request to fail. The correct behavior would be for the SDK to do cross region retries on these metadata requests.

This pulls request includes several updates to enhance error handling and retry logic in the Cosmos DB SDK. The changes mainly focus on extending support for additional server error types and improving retry policies for various scenarios.

Improvements to retry logic:

ClientRetryPolicy.cs: Enhanced retry logic to handle InternalServerError , DatabaseAccountNotFound, and LeaseNotFound status codes. *
MetadataRequestThrottleRetryPolicy.cs: Refactored retry policy to handle additional status codes and renamed methods and constants to reflect the broader scope of endpoint unavailability.

FaultInjection enhancements to error handling testing:

FaultInjectionRuleBuilder.cs: Added support for additional server error types such as DatabaseAccountNotFound, ServiceUnavailable, InternalServerError, and LeaseNotFound in the for metadata requests.
*
FaultInjectionServerErrorType.cs: Updated the FaultInjectionServerErrorType enum to include LeaseNotFound and corrected the status code for
DatabaseAccountNotFound.
*
FaultInjectionServerErrorResultInternal.cs: Added handling for LeaseNotFound and updated the status code for DatabaseAccountNotFound in the GetInjectedServerError method. *

Testing updates:

CosmosItemIntegrationTests.cs: Added a new test method
MetadataEndpointUnavailableCrossRegionalRetryTest to validate the retry logic for various server error types.
*
ClientRetryPolicyTests.cs: Extended existing tests to cover additional status codes and substatus codes.

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)

Closing issues

To automatically close an issue: closes #4710


Pull Request Template

Description

Please include a summary of the change and which issue is fixed. Include samples if adding new API, and include relevant motivation and context. List any dependencies that are required for this change.

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [] This change requires a documentation update

Closing issues

To automatically close an issue: closes #IssueNumber

…retried with a client cold start with only query requests (#5108)

# Pull Request Template

## Description

### Context 

Currently, there is a bug in the SDK where upon a cold start of the SDK
and rare edge cases involving online/offline-ing regions, where only
query requests are made, the SDK will not retry certain status code
responses from metadata requests causing the entire request to fail. The
correct behavior would be for the SDK to do cross region retries on
these metadata requests.

This pulls request includes several updates to enhance error handling
and retry logic in the Cosmos DB SDK. The changes mainly focus on
extending support for additional server error types and improving retry
policies for various scenarios.

### Improvements to retry logic:

*
[`ClientRetryPolicy.cs`](diffhunk://#diff-2b056512ca285b1d95e025e31f60345059fa92d958becc38f90a6fb54ce1bbb4R331-R341):
Enhanced retry logic to handle `InternalServerError` ,
`DatabaseAccountNotFound`, and `LeaseNotFound` status codes.
*
[`MetadataRequestThrottleRetryPolicy.cs`](diffhunk://#diff-a5ed5985909c3dcb6e4ce186cdd662d590dac5297ea14e68560c7d1eca307be4L26-R28):
Refactored retry policy to handle additional status codes and renamed
methods and constants to reflect the broader scope of endpoint
unavailability.

### FaultInjection enhancements to error handling testing:

*
[`FaultInjectionRuleBuilder.cs`](diffhunk://#diff-d827164a4a6a0d8737e6598f8132c915ef48a1fc01daaa6422706f770dada5d5L152-R156):
Added support for additional server error types such as
`DatabaseAccountNotFound`, `ServiceUnavailable`, `InternalServerError`,
and `LeaseNotFound` in the for metadata requests.
*
[`FaultInjectionServerErrorType.cs`](diffhunk://#diff-0c89faa9a48c428a7a98662d995474e34295618ac60e677ad9762fd048f33601L75-R82):
Updated the `FaultInjectionServerErrorType` enum to include
`LeaseNotFound` and corrected the status code for
`DatabaseAccountNotFound`.
*
[`FaultInjectionServerErrorResultInternal.cs`](diffhunk://#diff-1ae8256c6d505a8f3b0a350978a0cc9a08f6234f7328f28f1db14302ee691d72L473-R473):
Added handling for `LeaseNotFound` and updated the status code for
`DatabaseAccountNotFound` in the `GetInjectedServerError` method.
* 
### Testing updates:

*
[`CosmosItemIntegrationTests.cs`](diffhunk://#diff-16d429adf686a32936696d2014afab3fc8faf91f10c880850fb8b30f8b96bb33R153-R214):
Added a new test method
`MetadataEndpointUnavailableCrossRegionalRetryTest` to validate the
retry logic for various server error types.
*
[`ClientRetryPolicyTests.cs`](diffhunk://#diff-d3fdfdc5d4f4d8af2c2cc463d928285680e7695861422ef2e3330d1a956807e1L167-R176):
Extended existing tests to cover additional status codes and substatus
codes.

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #4710

---------

Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good!

@microsoft-github-policy-service microsoft-github-policy-service Bot enabled auto-merge (squash) April 12, 2025 04:09
@NaluTripician NaluTripician changed the title Cherry-Pick Metadata PR Fix to release branch [Internal] Release: Adds Cherry-Pick Metadata PR Fix to release branch Apr 12, 2025
…5124)

Hotfix will include crucial bugfix:
#5108

Please delete options that are not relevant.

To automatically close an issue: closes #IssueNumber
Copy link
Copy Markdown
Member

@kundadebdatta kundadebdatta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@microsoft-github-policy-service microsoft-github-policy-service Bot merged commit ad0583e into releases/3.48.1 Apr 12, 2025
26 checks passed
@microsoft-github-policy-service microsoft-github-policy-service Bot deleted the users/nalutripician/3481metadataCherryPick branch April 12, 2025 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge Enables automation to merge PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants