CrossRegionalHedging: Adds Initial Design for hedging Metadata Requests#5920
Draft
kundadebdatta wants to merge 3 commits into
Draft
CrossRegionalHedging: Adds Initial Design for hedging Metadata Requests#5920kundadebdatta wants to merge 3 commits into
kundadebdatta wants to merge 3 commits into
Conversation
Addresses the findings produced by the PR Deep Reviewer on PR #5920 (excluding Finding #1, which assumed PR #5844 `MetadataDetachedExecutor` would merge into main; per author guidance, this design proceeds without it). Structural changes - §5.3 `ExecuteAsync` rewritten: per-branch CancellationTokenSources (no shared linkedCts) so the loser`s OperationCanceledException is contained inside `BackgroundCleanupAsync` and cannot reach `MetadataRequestThrottleRetryPolicy` (Finding #2 — protects healthy secondary from spurious `MarkEndpointUnavailableForRead` post-PR #5780). - §5.3: primary fault before threshold no longer bypasses hedge — adds `primaryTask.Status == RanToCompletion` guard so fast-fail-on-degraded primary triggers the hedge (Finding #3). - §5.3: wait-for-winner is now a loop that filters transient/faulted completions; a fast 503 from the hedge can no longer beat a healthy 200 from the primary (Finding #4). - §5.3 + §5.4 + §5.5: `as` cast (not hard cast) for `MetadataRequestThrottleRetryPolicy`; wrapped/test-double policies no longer throw `InvalidCastException` (Finding #7). - §5.3: added `BackgroundCleanupAsync` that awaits the loser, disposes its `DocumentServiceResponse` body (handle-leak fix), records outcome via volatile field, and disposes the loser CTS (Finding #11). Correctness/factual fixes - §5.10 + §5.6: corrected — `ClientCollectionCache` uses `AsyncCache` (not `AsyncCacheNonBlocking`); base-class abstract signature change must be defaulted for subclass compat; forbid inferring cold-start from `previousValue == null` inside the factory (Finding #5). - §5.2 + §6.1: added `HasHedgedThisOperation` flag (set via `Interlocked.Exchange`); fixes the broken §6.1 claim that retries wouldn`t re-hedge because the cache had a `previousValue` (false — cache is only populated when the loop exits) (Finding #8). - §5.2: `ConcurrentDictionary<Uri, byte>` replaces `HashSet<Uri>`; volatile `LoserOutcome` field for cross-thread updates (Finding #6). - §5.9: added `HttpTimeoutPolicy.FirstAttemptTimeout` accessor design — `TimeoutsAndDelays` is private today (Finding #9). - §5.7.4: sketched the per-index resolve loop for `IncrementRetryIndexOnUnavailableEndpointForMetadataRead` — today it`s a 1-line counter that never resolves an endpoint (Finding #10). New sections - §5.7 (4 subsections): coordination with PR #5780, structural invariant that hedge-loser OCE never reaches retry policy, shared `RetryUtility.IsRegionalFailure` helper, attempted-endpoints skip loop. - §5.12: net472 stack-unwind discipline (`SendOneAsync` middle-layer seam + `ExceptionDispatchInfo`) — adopts the PR #5870 lesson (Finding #12). - §5.13: per-auth-mode handling in `CloneForHedge`; hedge-401/403 guard for RBAC-role-assignment-missing-in-secondary case (Finding #13). - §7.1: wiring step for `isHedgingDisabledByGateway` from `DocumentClient` into the cache constructors via `Func<bool>` (Finding #15 bundle). - §9.1: `EventSource`/`Meter` counters for fire-rate, win-rate, budget-exhaustion, late-loser, hedge-fired-elapsed-ms (Finding #15 bundle). API/rollout - §5.1: `EnableMetadataHedgingForColdStart` becomes tri-state `bool?`; `MetadataHedgingOptions` promoted to public so customers can tune `PerClientConcurrencyBudget` for high-container-cardinality startups (Finding #14). - §12: Phase 3 no longer removes the opt-in (binary break avoided); only the phase default changes (Finding #14). Smaller items (Finding #15 bundle) - §5.3: drop `closest secondary` framing (SDK has no proximity measure); use `Wait(TimeSpan.Zero)` instead of `WaitAsync(TimeSpan.Zero)` (no Task allocation); add `EvaluateEligibility`-vs-budget-check note. - §5.4: defaulted `isColdStart = false` on the abstract method to avoid breaking subclass overrides (e.g., encryption-mirrored caches). - §6: added eligibility rules 8 (`ExcludeRegions` hard filter), 9 (`HasHedgedThisOperation`), 10 (single-master account guard). - §10: reconciled `Both branches fault` with §5.3 (consistent `ExceptionDispatchInfo` semantics). - §11: tests added for loser-cancellation-doesn`t-poison-secondary, loser-disposal, no-re-hedge-across-retries, cross-policy-type, net472 SO regression; mirrors PR #5787 `senderCallCount` assertions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…uggestions) to metadata-hedging design Critical: - B1 §5.13: CloneForHedge reuses primary Authorization + x-ms-date verbatim; master-key auth-table row updated. - B2 §5.13 + §5.3 + §5.2: per-branch IsAcceptableWinner(resp, branch) helper + HedgeBranch enum; 401/plain-403 from the hedge branch are rejected so a fast hedge-401 cannot beat a slow primary-200. HedgeOutcome diagnostic field added with Volatile read/write. - B3 §5.4: introduce protected virtual GetByNameAsync(..., isColdStart, ct) overload on CollectionCache; existing protected abstract is unchanged so encryption-mirrored subclasses stay source-compatible (hedge-disabled). - B4 §1/§5.7/§6.1/§10: enumerate InternalServerError (500) everywhere alongside 503; new §10 edge case for primary 500/503 before threshold documenting the §5.7.1 asymmetry. Recommended: - R1 §5.7.1.1: paragraph on the lost primary regional-failure signal when the hedge wins (the next request closes the LocationCache gap; telemetry exposes the window). - R2 §5.7.1.2: race-window walkthrough (t=1.4s primary fails / t=1.5s timer fires); HasHedgedThisOperation Interlocked.CompareExchange is the serialization point. - R3 §5.3 + §5.8: CTS allocation moved inside outer try; loserCts ownership transferred to BackgroundCleanupAsync via local-ref null-out; outer finally disposes all three CTSs with ?.Dispose() null-guards. - R4 §5.5: thread the caller CancellationToken through ExecutePartitionKeyRangeReadChangeFeedAsync into MetadataHedgingStrategy.ExecuteAsync (was CancellationToken.None). - R5 §11: add 4 regression tests (hedge-401/403 per-branch overlay, mid-flight kill-switch flip, primary-500-hedge-wins-then-next-request-marks- unavailable, session-token RID-change cleanup). - R6 §9.1: full telemetry rewrite. Meter renamed to Azure.Cosmos.Client.MetadataHedging; instruments use the azure.cosmosdb.client.metadata_hedging.* prefix; hedge_wins counter replaces the (0..1) histogram; Metrics (§9.1.1) and EventSource (§9.1.2) cleanly separated. Suggestions: - S1 §5.7.4: pseudocode uses this.request; writes this.retryContext.RetryLocationIndex per return-true path; side-effect invariant documented. - S2 §5.2/§5.3/§5.7.4: AttemptedEndpoints switched to ConcurrentDictionary<string, byte> keyed on Uri.AbsoluteUri. - S3 §5.2: WinningEndpoint/WinningRegion converted to backing fields + RecordWinner using Interlocked.CompareExchange (first-publication wins). - S4 §5.6: full walk-through table for isColdStart = !forceRefresh (cache-hit / cache-miss / force-refresh). Stats: 1133 -> 1379 lines (+335 / -89). All 28 section headers intact. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request Template
Description
Type of change
Please delete options that are not relevant.
Closing issues
To automatically close an issue: closes #5917
Changelog
### Unreleasedinchangelog.mdfor the user-facing impact of this change.
documentation-only, test-only, CI / build-only, or a pure internal refactor
with no observable customer impact.
If the second box is checked, briefly justify here: