[Internal] PPAF: Adds Dynamic Enablement of PPAF #5310
Merged
FabianMeiswinkel merged 29 commits into master on Aug 13, 2025
ensures methods return false when PPAF is disabled
Pull Request Overview
This pull request introduces dynamic enablement of Partition-level Failover (PPAF) in the Azure Cosmos SDK, allowing the SDK to enable/disable PPAF at runtime based on the database account configuration without requiring a client restart. The changes include a default cross-region hedging strategy for PPAF, thread-safe dynamic configuration updates, and comprehensive test coverage.
- Adds dynamic PPAF enablement/disablement based on database account properties retrieved during background refresh
- Introduces SDK default cross-region hedging strategy specifically for PPAF scenarios
- Updates constructor signatures across multiple components to support the new GlobalPartitionEndpointManager architecture
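The first bullet describes PPAF being switched on or off from account properties read during a background refresh. Below is a minimal C# sketch of that change-detection idea. It assumes a single background refresh loop that reads an `enablePartitionLevelFailover` flag and notifies subscribers only when the value differs from the last one seen. `AccountPropertiesWatcher`, `OnAccountRefreshed` and `PartitionLevelFailoverConfigChanged` are hypothetical names. The real event is added in GlobalEndpointManager.cs and may have a different shape.

```csharp
using System;

// Hypothetical sketch, not the SDK's GlobalEndpointManager.
// Assumes OnAccountRefreshed is called from a single background refresh loop.
public sealed class AccountPropertiesWatcher
{
    // null until the first refresh completes.
    private bool? lastKnownPpafSetting;

    // Raised only when the account-level PPAF setting changes.
    // The first refresh always raises it so subscribers learn the initial value.
    public event Action<bool> PartitionLevelFailoverConfigChanged;

    public void OnAccountRefreshed(bool enablePartitionLevelFailover)
    {
        if (this.lastKnownPpafSetting == enablePartitionLevelFailover)
        {
            // Unchanged: skip reconfiguring the client on every refresh.
            return;
        }

        this.lastKnownPpafSetting = enablePartitionLevelFailover;
        this.PartitionLevelFailoverConfigChanged?.Invoke(enablePartitionLevelFailover);
    }
}
```

Notifying only on a change means that handlers which rebuild the hedging strategy or toggle partition-level routing run once per transition, not once per refresh.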
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| CosmosItemIntegrationTests.cs | Adds comprehensive integration test for dynamic PPAF override behavior with fault injection |
| UserAgentContainer.cs | Updates feature appending logic to handle dynamic feature changes |
| DocumentClient.cs | Implements dynamic PPAF configuration updates and event handling for account property changes |
| GlobalEndpointManager.cs | Adds event for PPAF configuration changes and detection logic during account refresh |
| AvailabilityStrategy.cs | Introduces SDK default cross-region hedging strategy method for PPAF |
| CrossRegionHedgingAvailabilityStrategy.cs | Adds internal flag to identify SDK default strategies |
| GlobalPartitionEndpointManagerCore.cs | Implements thread-safe PPAF/PPCB enablement with atomic operations |
| GlobalPartitionEndpointManager.cs | Defines abstract methods for dynamic PPAF/PPCB configuration |
| GlobalPartitionEndpointManagerNoOp.cs | Implements no-op versions of new PPAF/PPCB methods |
| GatewayStoreModel.cs, GatewayStoreClient.cs, ThinClientStoreClient.cs | Updates constructors to use GlobalPartitionEndpointManager instead of boolean flags |
| Multiple test files | Updates test constructors to accommodate new GlobalPartitionEndpointManager parameter requirements |
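The GlobalPartitionEndpointManagerCore.cs row refers to thread-safe PPAF/PPCB enablement using atomic operations. The sketch below shows one common way to do this, not the SDK's code, and `DynamicFeatureFlag` and `TrySet` are hypothetical names. The flag is stored as an `int` because `Interlocked` works on integers. A background refresh updates it with `Interlocked.Exchange`, and request threads read it with `Volatile.Read`.

```csharp
using System.Threading;

// Hypothetical sketch, not the SDK's GlobalPartitionEndpointManagerCore.
public sealed class DynamicFeatureFlag
{
    // 0 = disabled, 1 = enabled. Interlocked operates on int, not bool.
    private int isEnabled;

    public DynamicFeatureFlag(bool initialValue)
    {
        this.isEnabled = initialValue ? 1 : 0;
    }

    // Volatile read so request threads see the most recently published value.
    public bool IsEnabled => Volatile.Read(ref this.isEnabled) == 1;

    // Atomically stores the new value and returns true only if it changed.
    // The caller can then run one-time side effects exactly once per transition.
    public bool TrySet(bool enable)
    {
        int newValue = enable ? 1 : 0;
        int previous = Interlocked.Exchange(ref this.isEnabled, newValue);
        return previous != newValue;
    }
}
```

If several refreshes race to apply the same value, the `previous != newValue` check lets exactly one of them observe the transition.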
Comments suppressed due to low confidence (1)
Microsoft.Azure.Cosmos/src/UserAgentContainer.cs:56
- IndexOf returns -1 if the character is not found. In that case, Substring(-1 + 1) does not throw; it silently returns the entire suffix instead of the text after the separator. The Contains check above protects against this, but the logic could be more robust.
? this.Suffix.Substring(this.Suffix.IndexOf('|') + 1)
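A more defensive approach is to check the `IndexOf` result directly, so the separator lookup and the guard cannot drift apart. Below is a minimal C# sketch. `SuffixParser.GetTextAfterSeparator` is a hypothetical helper, not code from UserAgentContainer.cs. It assumes the suffix uses '|' to separate an SDK-managed prefix from the remaining text.

```csharp
using System;

// Hypothetical helper, not part of the SDK.
public static class SuffixParser
{
    // Returns the text after the first separator, the whole suffix when no
    // separator is present, or an empty string for null or empty input.
    public static string GetTextAfterSeparator(string suffix, char separator = '|')
    {
        if (string.IsNullOrEmpty(suffix))
        {
            return string.Empty;
        }

        // IndexOf returns -1 when the separator is missing, so branch on it explicitly.
        int index = suffix.IndexOf(separator);
        return index < 0 ? suffix : suffix.Substring(index + 1);
    }
}
```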
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ananth7592 previously approved these changes on Aug 8, 2025
FabianMeiswinkel previously approved these changes on Aug 11, 2025
FabianMeiswinkel (Member) left a comment:
LGTM except for one small question
da3ff36
FabianMeiswinkel requested changes on Aug 12, 2025
FabianMeiswinkel (Member) left a comment:
LGTM except for the regex compilation comment
ananth7592 approved these changes on Aug 12, 2025
ananth7592 added a commit that referenced this pull request on May 1, 2026
…ind PPAF

When ExcludeRegions filters out all preferred read regions and PPAF (Partition Level Failover) is enabled, GetApplicableEndpoints now falls back to WriteEndpoints[0] (dynamic; tracks the current write region) instead of this.defaultEndpoint (static; a region-agnostic URI set once at initialization). The fix is gated behind isPartitionLevelFailoverEnabled (a Func<bool>) wired from ConnectionPolicy.EnablePartitionLevelFailover through GlobalEndpointManager, which supports dynamic enablement per PR #5310. When PPAF is disabled, the original behavior (defaultEndpoint fallback) is preserved.

Fixes #5821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
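The fallback rule in this commit message can be illustrated with a simplified C# model. `EndpointFallbackSketch.SelectReadEndpoints` and its parameters are hypothetical and do not match the SDK's GetApplicableEndpoints signature. Excluded endpoints stand in for ExcludeRegions. The key point is that the `Func<bool>` gate is evaluated on every call, so the choice between WriteEndpoints[0] and the static default endpoint tracks the current PPAF setting without a client restart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of the fallback; not the SDK's implementation.
public static class EndpointFallbackSketch
{
    public static IReadOnlyList<Uri> SelectReadEndpoints(
        IReadOnlyList<Uri> preferredReadEndpoints,
        ISet<Uri> excludedEndpoints,
        IReadOnlyList<Uri> writeEndpoints,
        Uri defaultEndpoint,
        Func<bool> isPartitionLevelFailoverEnabled)
    {
        List<Uri> applicable = preferredReadEndpoints
            .Where(endpoint => !excludedEndpoints.Contains(endpoint))
            .ToList();

        if (applicable.Count > 0)
        {
            return applicable;
        }

        // Every preferred endpoint was excluded. The delegate is read at call
        // time, so a runtime change to the PPAF setting applies immediately.
        if (isPartitionLevelFailoverEnabled() && writeEndpoints.Count > 0)
        {
            // Follows whichever region is currently the write region.
            return new List<Uri> { writeEndpoints[0] };
        }

        // PPAF disabled: keep the original static fallback.
        return new List<Uri> { defaultEndpoint };
    }
}
```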
Pull Request Template
Description
This pull request enhances the partition-level failover (PPAF) functionality in the Azure Cosmos SDK. The changes add a default cross-region hedging strategy, enable or disable PPAF dynamically based on the database account configuration without restarting the SDK client, and add new tests that validate these behaviors. The most important changes, grouped by theme, are:
Enhancements to Availability Strategy:
- Added `SDKDefaultCrossRegionHedgingStrategy` in `AvailabilityStrategy.cs` to provide a default hedging strategy for cross-region failover, including support for write requests on multi-region accounts.
- Added `IsSDKDefaultStrategy` in `CrossRegionHedgingAvailabilityStrategy` to differentiate SDK default strategies from custom ones, and updated the constructor to accept this flag. [1] [2] [3]

Dynamic PPAF Enablement:
- Updated `GlobalEndpointManager.cs` to dynamically enable or disable PPAF based on the `enablePartitionLevelFailover` flag retrieved from the database account properties. Added logic to reset the availability strategy to null if PPAF is disabled and no custom strategy is set. [1] [2]

Default Hedging Thresholds:
- Changed `DefaultHedgingThresholdInMilliseconds` and `DefaultHedgingThresholdStepInMilliseconds` in `DocumentClient.cs` from private to internal for broader accessibility within the SDK.
- Updated `InitializePartitionLevelFailoverWithDefaultHedging` to use the new `SDKDefaultCrossRegionHedgingStrategy`.

Tests for PPAF Functionality:
- Added `ReadItemAsync_WithPPAFDynamicOverride` in `CosmosItemIntegrationTests.cs` to validate dynamic PPAF enablement, hedging behavior, and fallback when PPAF is disabled. This includes fault injection and diagnostics validation.

End to End Validation:
CreateItemAsync: Strong Consistency Account with Direct Mode:
CreateItemAsync: Strong Consistency Account with Gateway Mode:
CreateItemAsync: Session Consistency Account with Direct Mode:
CreateItemAsync: Session Consistency Account with Gateway Mode:
[Note: The orange graph shows the number of requests processed in the North Central US region, whereas the blue graph shows the number of requests processed in the Central US region.]
Closing issues
Closes #5304