
CrossRegionalHedging: Adds Metadata Hedging Support #5923

Draft
kundadebdatta wants to merge 32 commits into main from users/kundadebdatta/5917_implement_metadata_hedging

Conversation

@kundadebdatta
Member

Pull Request Template

Description

Please include a summary of the change and which issue is fixed. Include samples if adding new API, and include relevant motivation and context. List any dependencies that are required for this change.

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [] This change requires a documentation update

Closing issues

To automatically close an issue: closes #IssueNumber

Changelog

  • I have added a changelog entry under ### Unreleased in changelog.md
    for the user-facing impact of this change.
  • No changelog entry is required because this PR is one of:
    documentation-only, test-only, CI / build-only, or a pure internal refactor
    with no observable customer impact.

If the second box is checked, briefly justify here:

NaluTripician and others added 30 commits May 6, 2026 12:54
Introduces an internal MetadataDetachedExecutor that runs metadata-cache reads on a
detached, internally-bounded CancellationToken and observes the caller's CancellationToken
only on the response path. The retry-policy decision is therefore never preempted by
caller-cancel, fixing the cross-region-failover preemption bug from issue #5805.

ConfigurationManager exposes a configurable hard deadline
(AZURE_COSMOS_METADATA_DETACHED_HARD_DEADLINE_SECONDS, default 5 min) so the detached
attempt cannot leak background work indefinitely. A defensive 50-attempt cap guards
against a misbehaving retry policy returning ShouldRetry=true with zero backoff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
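
The detached-execution idea described in this commit can be sketched in a few lines. This is a minimal illustration in Python `asyncio`, not the SDK's C# `MetadataDetachedExecutor`; the names `execute_detached`, `CallerCancelled`, and the `asyncio.Event` used as a stand-in for the caller's CancellationToken are all hypothetical. It shows only the core contract: the operation is bounded by an internal deadline, and the caller's cancellation ends the caller's wait without stopping the operation.

```python
import asyncio


class CallerCancelled(Exception):
    """Raised to the caller when it stops waiting (stand-in for an OCE)."""


async def execute_detached(operation, caller_cancelled, hard_deadline_s):
    # The operation is bounded only by the internal hard deadline,
    # never by the caller's cancellation signal.
    detached = asyncio.ensure_future(asyncio.wait_for(operation(), hard_deadline_s))
    # Retrieve any late failure so abandoned work does not log a warning.
    detached.add_done_callback(lambda t: t.cancelled() or t.exception())
    cancel_wait = asyncio.ensure_future(caller_cancelled.wait())
    try:
        await asyncio.wait({detached, cancel_wait},
                           return_when=asyncio.FIRST_COMPLETED)
        if detached.done():
            return detached.result()
        # Response path: the caller gave up, the detached work keeps running.
        raise CallerCancelled("caller stopped waiting; detached work continues")
    finally:
        cancel_wait.cancel()
```

The sketch omits the retry loop and the entry-side cancellation gate that later commits mention; it is meant only to make the "observe the caller's token on the response path" wording concrete.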
…tByRid/GetByName

Repoints both metadata-cache-feeder factories from TaskHelper.InlineIfPossible (which
delegates to BackoffRetryUtility, the source of the caller-cancel preemption) to
MetadataDetachedExecutor. TaskHelper.RunInlineIfNeededAsync still wraps for NETFX
SynchronizationContext safety. Caller CancellationToken is preserved at the entry-side
ThrowIfCancellationRequested() gate and observed by the executor only on the response path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pins behavior of detached-cancellation execution model: success path, transient retry, primary-fix scenario where cross-region retry executes on detached token after caller-cancel mid-flight, caller OCE surfacing while detached task continues, already-cancelled caller token, CancellationToken.None fast path, policy NoRetry/ExceptionToThrow/throws, internal-deadline bound, hard attempt cap, first-attempt OCE consults policy, null-arg validation, non-positive deadline, backoff honored, SyncContext smoke test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p per fresh-eyes review

Fresh-eyes review (.coding-harness/review-feedback-1.json) flagged:

R1.1 (major): GetMetadataDetachedHardDeadline returned an unbounded TimeSpan.
An envvar value larger than ~uint.MaxValue-1 ms (~49.7 days) would make new
CancellationTokenSource(TimeSpan) throw ArgumentOutOfRangeException, breaking
every metadata read. Added MaxMetadataDetachedHardDeadlineInSeconds=86400
(24h) clamp; new test verifies a 60-day envvar value is clamped and the
resulting TimeSpan constructs a CancellationTokenSource without throwing.

R1.3 (minor): the attempt-cap throw discarded last-failure context. Hoisted
lastCapturedException above the loop; cap path now passes its SourceException
as InnerException and traces type+message so the cap and the underlying
failure are both diagnosable.

R1.5 (nit): doc comment referenced a literal '5 minutes' default; now points
at ConfigurationManager.DefaultMetadataDetachedHardDeadlineInSeconds so the
doc cannot drift from the constant.

R1.2 (nit): renamed ExecuteAsync_CancellationTokenNone_FastPath_NoCallerOcePropagation
to ExecuteAsync_CancellationTokenNone_SucceedsAndOperationReceivesNonCanceledToken
so the test name matches what it actually proves; the fast-path
micro-optimization is not directly observable from outside the executor.

Tests: 19/19 MetadataDetachedExecutor pass (was 17; +2 clamp tests);
60/60 in CollectionCache/ClientRetry/ConfigurationManager regression slice.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
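
The clamp in R1.1 can be illustrated as follows. This is a sketch in Python rather than the SDK's C#; the environment-variable name, the 300 s default, and the 86400 s ceiling come from the commit messages, while the function name and the fallback behavior for unparsable or non-positive values are assumptions made for the example.

```python
import os

ENV_VAR = "AZURE_COSMOS_METADATA_DETACHED_HARD_DEADLINE_SECONDS"
DEFAULT_DEADLINE_S = 300    # 5 min default, per the commit message
MAX_DEADLINE_S = 86_400     # 24 h clamp, per the commit message


def get_hard_deadline_seconds(environ=os.environ):
    raw = environ.get(ENV_VAR)
    try:
        value = int(raw) if raw is not None else DEFAULT_DEADLINE_S
    except ValueError:
        value = DEFAULT_DEADLINE_S  # assumption: unparsable input uses the default
    if value <= 0:
        value = DEFAULT_DEADLINE_S  # assumption: non-positive input uses the default
    # Clamp so an oversized value can never overflow the timer.
    return min(value, MAX_DEADLINE_S)
```

With this shape, a 60-day value (5,184,000 s) is reduced to 86,400 s before any timer is constructed.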
Addresses iteration-2 deep review findings on PR #5844:

- R2.4 (Correctness): Surface underlying exception when internal deadline trips during the operation lambda, not just during Task.Delay backoff. Adds a top-of-loop OCE-due-to-detached-token guard that surfaces the prior captured exception, preserving the design contract that callers see the failure mode that drove the retry (not a hard-deadline artifact).

- R2.5 (Documentation): Reword the AsyncCache caveat doc-comment to accurately describe in-flight reuse semantics. Concurrent callers do not share the eventual successful result after the first caller cancels; AsyncCache discards the OCE-faulted lazy and the second caller starts a fresh detached attempt. The real benefit is side-effect accrual (LocationCache region marking, session clearing), not result reuse.

- R2.9 (Style): Change ConfigurationManager.GetMetadataDetachedHardDeadline accessibility from public to internal for consistency with the internal-static class containing it.

- R2.10 (Concurrency): Add Task.Yield() when the retry policy returns BackoffTime <= TimeSpan.Zero, bounding CPU and giving the threadpool a chance to schedule other work. Limits amplification of a misbehaving policy that returns ShouldRetry=true with zero backoff.

- R2.11 (Documentation): Add comment explaining ContinueWith inline-completion ordering for disposeWhenDone, warning future maintainers not to read detachedCts after the registration.

- R2.13 (Testing): Add [DoNotParallelize] to MetadataDetachedExecutorTests so the env-var clamp tests are isolated from MSTest class-level parallelism.

- R2.1 (Diagnostics): Document the post-cancel trace/stats mutation as a known limitation in the executor's doc-comment. The full fix (isolate detached task into a child trace tree, merge only on success) is a follow-up tracked separately to keep this fix scoped.

Adds regression test ExecuteAsync_DeadlineTripsDuringOperation_SurfacesUnderlyingException pinning the R2.4 contract: when the deadline trips during operation execution with prior failures, the underlying DocumentClientException surfaces, not the deadline OCE.

Test results: 20/20 MetadataDetachedExecutor tests pass; 60/60 regression slice (CollectionCache | ClientRetry | ConfigurationManager) pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
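
A rough sketch of the loop behavior described in R1.3 and R2.10, written in Python `asyncio` as an illustration only. `run_with_retries`, `AttemptCapExceeded`, and the `should_retry(exc) -> (retry, backoff_seconds)` callback are hypothetical names; `asyncio.sleep(0)` plays the role that `Task.Yield()` plays in the commit.

```python
import asyncio


class AttemptCapExceeded(Exception):
    """Raised when the defensive attempt cap is hit."""


async def run_with_retries(operation, should_retry, max_attempts=50):
    last_error = None
    for _ in range(max_attempts):
        try:
            return await operation()
        except Exception as exc:
            last_error = exc
            retry, backoff_s = should_retry(exc)
            if not retry:
                raise
            # A zero (or negative) backoff still yields to the event loop,
            # so a misbehaving policy cannot spin without giving up control.
            await asyncio.sleep(max(backoff_s, 0))
    # Chain the last failure so the cap and its cause are both visible.
    raise AttemptCapExceeded(f"gave up after {max_attempts} attempts") from last_error
```

Chaining with `from last_error` mirrors the intent of passing the last captured exception as the inner exception: whoever reads the failure sees both the cap and what was actually going wrong.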
…er third deep-review pass

Addresses iteration-3 deep review findings on PR #5844 (merge_recommendation: ready, 0 blocking):

- R3.1 (Recommendation): Add comment in ExecuteRetryLoopAsync explaining the asymmetry of the OCE-during-operation guard's third filter clause. When previousException is itself OCE, the filter intentionally falls through to the general catch path because swapping one OCE for another offers no diagnostic gain and the general path correctly funnels through policy/hard-cap/backoff-catch termination.

- R3.2 (Recommendation): Add 'Retry-policy invariant' paragraph to executor's <summary> documenting that the supplied IDocumentClientRetryPolicy MUST be a per-call instance because ShouldRetryAsync is intentionally NOT invoked on either OCE termination path. A future refactor that caches policies must preserve this invariant or move OCE termination paths through ShouldRetryAsync.

- R3.3 (Suggestion): Bump ExecuteAsync_DeadlineTripsDuringOperation_SurfacesUnderlyingException internal-deadline from 200ms to 2s to remove CI flakiness risk on saturated runners. The test still verifies the same R2.4 contract; only the wall-clock generosity changes.

Skipped (non-blocking, deferred or rejected):

- R3.4 (Suggestion): Split tests into two classes for parallelism — over-engineering for zero observed flakes; class-level [DoNotParallelize] is acceptable for a 20-test class that completes in <2s.

- R3.5 (Observation): AsyncCache fall-through coalescer — tracked as follow-up work item.

- R3.6 (Observation): Split <summary> doc-comment into <remarks> — cosmetic; current structure renders correctly in IDE tooltips.

Test results: 20/20 MetadataDetachedExecutor tests pass (~2s); build clean (0 errors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ation

- Replace object.ReferenceEquals(Exception, Exception) with reference
  equality '==' on typed locals to avoid the CDX1000 analyzer error
  (boxing Exception to object on the metadata hot path).
- Re-derive MaxAttemptsHardCap against the actual SDK retry policies:
  the dominant per-call retry ceiling is ClientRetryPolicy.MaxRetryCount
  = 120 (cross-region failover counter), not the previously-claimed
  '5 preferred regions x 10 in-region retries = 50'. Bump the cap to 200
  (120 + ~80 headroom for stacked throttling/session/serviceUnavailable
  retries) and rewrite the doc comment to cite the real source constants.
- Update the matching DefaultMetadataDetachedHardDeadlineInSeconds doc
  comment to derive 300 s from the per-region 1+5+65 s timeout ladder
  and a typical ~3-5 region failover sweep, rather than the wrong
  '5 x 36 s' rationale.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
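
The numbers in this commit can be checked with simple arithmetic. The sketch below assumes that a "failover sweep" means running the full per-region timeout ladder once in each region; that reading is an interpretation of the commit message, not something it states outright.

```python
PER_REGION_LADDER_S = (1, 5, 65)           # timeout ladder quoted in the commit
per_region_s = sum(PER_REGION_LADDER_S)    # 71 seconds per region

# Worst-case sweep time for a 3-, 4-, and 5-region failover.
sweep_s = {regions: regions * per_region_s for regions in (3, 4, 5)}
# 3 regions -> 213 s, 4 regions -> 284 s, 5 regions -> 355 s

client_retry_max = 120                     # ClientRetryPolicy.MaxRetryCount, per the commit
headroom = 80                              # approximate headroom, per the commit
max_attempts_cap = client_retry_max + headroom
```

Under that assumption the 300 s default sits between the 4-region and 5-region sweeps, and the attempt cap comes out at 200.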
…nd policy distinction

Tighten DefaultMetadataDetachedHardDeadlineInSeconds doc comment to:
- explicitly cite the wrapped call site (ClientCollectionCache.ReadCollectionAsync)
  and the GetTimeoutPolicy branch that routes it to the HotPath policy.
- name the slower HttpTimeoutPolicyControlPlaneRead ladder (5+10+20 = 35 s/region)
  used by GatewayAccountReader, with a note that the executor does not wrap that
  path today. Keeps the comment self-correcting if the executor's surface ever
  expands to account reads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hed CT

Adds two mock-based regression tests asserting that ClientCollectionCache.GetByRidAsync and GetByNameAsync route through MetadataDetachedExecutor and pass the executor-owned detached CancellationToken (NOT the caller's token) into the inner ReadCollectionAsync lambda.

Addresses SDK review agent feedback on PR #5844: a regression that reverts either lambda to the caller's CancellationToken would silently reintroduce the cross-region failover preemption bug (issue #5805); the existing MetadataDetachedExecutorTests would still pass because they exercise the executor directly with synthetic operations.

Mechanism: hold the first storeModel.ProcessMessageAsync call on a gate, cancel the caller mid-flight, release the gate so the in-flight attempt fails transiently, and assert the retry policy drives a second ProcessMessageAsync invocation. Verified by temporarily reverting the lambda to caller-token passthrough -- both tests fail with timeout-on-second-invocation. Restored to detached wiring -- both pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous comment incorrectly attributed CDX1000 to boxing. Exception
is a reference type so no boxing occurs in either form. The
DontConvertExceptionToObject analyzer flags type-information loss when
typed Exception references are converted to object — that is the actual
concern the comment now describes.

Addresses @xinlian12 review comment 3204019954 on PR #5844.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a new Unreleased Preview section (per the pattern established in PR #5815) with a Fixed entry for PR #5844. Customer-facing description focuses on the symptom (premature OperationCanceledException preempting cross-region failover during metadata-cache reads) rather than the implementation detail (the new MetadataDetachedExecutor).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Executor for Java parity

Generalizes MetadataDetachedExecutor with a no-retry-loop ExecuteDetachedAsync
overload and wires QueryPlanRetriever.GetQueryPlanThroughGatewayAsync through
it. The internal RequestInvokerHandler pipeline keeps its own retry semantics;
the wrap only ensures those decisions cannot be preempted by caller
CancellationToken, mirroring Java's error-signal-only retryWhen contract.

- MetadataDetachedExecutor.cs: add ExecuteDetachedAsync; refactor ExecuteAsync
  to compose on top of it; expanded XML doc covering both overloads and the
  Java alignment matrix (PKRange + GatewayAccountReader already aligned).
- QueryPlanRetriever.GetQueryPlanThroughGatewayAsync: route the gateway call
  through ExecuteDetachedAsync so caller CT is observed only on the response
  path. GetQueryPlanWithServiceInteropAsync is unchanged (local CPU work, not
  a metadata retry path).
- MetadataDetachedExecutorTests: 9 new tests covering ExecuteDetachedAsync
  invariants (success, no outer retry, mid-flight cancel detaches, internal
  deadline, sync factory throw, null operation/task validation, fast path).
- changelog.md: expand Unreleased Preview entry to mention query-plan path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ian/metadata-detached

# Conflicts:
#	.gitignore
#	changelog.md
Resolves changelog.md conflict: keeps PR 5844 entry in Unreleased; drops 5870 entry now released in 3.60.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses the findings produced by the PR Deep Reviewer on PR #5920
(excluding Finding #1, which assumed PR #5844 `MetadataDetachedExecutor`
would merge into main; per author guidance, this design proceeds without it).

Structural changes
- §5.3 `ExecuteAsync` rewritten: per-branch CancellationTokenSources
  (no shared linkedCts) so the loser's OperationCanceledException is
  contained inside `BackgroundCleanupAsync` and cannot reach
  `MetadataRequestThrottleRetryPolicy` (Finding #2 — protects healthy
  secondary from spurious `MarkEndpointUnavailableForRead` post-PR #5780).
- §5.3: primary fault before threshold no longer bypasses hedge — adds
  `primaryTask.Status == RanToCompletion` guard so fast-fail-on-degraded
  primary triggers the hedge (Finding #3).
- §5.3: wait-for-winner is now a loop that filters transient/faulted
  completions; a fast 503 from the hedge can no longer beat a healthy
  200 from the primary (Finding #4).
- §5.3 + §5.4 + §5.5: `as` cast (not hard cast) for
  `MetadataRequestThrottleRetryPolicy`; wrapped/test-double policies
  no longer throw `InvalidCastException` (Finding #7).
- §5.3: added `BackgroundCleanupAsync` that awaits the loser, disposes
  its `DocumentServiceResponse` body (handle-leak fix), records outcome
  via volatile field, and disposes the loser CTS (Finding #11).

Correctness/factual fixes
- §5.10 + §5.6: corrected — `ClientCollectionCache` uses `AsyncCache`
  (not `AsyncCacheNonBlocking`); base-class abstract signature change
  must be defaulted for subclass compat; forbid inferring cold-start from
  `previousValue == null` inside the factory (Finding #5).
- §5.2 + §6.1: added `HasHedgedThisOperation` flag (set via
  `Interlocked.Exchange`); fixes the broken §6.1 claim that retries
  wouldn't re-hedge because the cache had a `previousValue` (false:
  cache is only populated when the loop exits) (Finding #8).
- §5.2: `ConcurrentDictionary<Uri, byte>` replaces `HashSet<Uri>`;
  volatile `LoserOutcome` field for cross-thread updates (Finding #6).
- §5.9: added `HttpTimeoutPolicy.FirstAttemptTimeout` accessor design
  — `TimeoutsAndDelays` is private today (Finding #9).
- §5.7.4: sketched the per-index resolve loop for
  `IncrementRetryIndexOnUnavailableEndpointForMetadataRead` — today
  it's a 1-line counter that never resolves an endpoint (Finding #10).

New sections
- §5.7 (4 subsections): coordination with PR #5780, structural invariant
  that hedge-loser OCE never reaches retry policy, shared
  `RetryUtility.IsRegionalFailure` helper, attempted-endpoints skip loop.
- §5.12: net472 stack-unwind discipline (`SendOneAsync` middle-layer
  seam + `ExceptionDispatchInfo`) — adopts the PR #5870 lesson
  (Finding #12).
- §5.13: per-auth-mode handling in `CloneForHedge`; hedge-401/403
  guard for RBAC-role-assignment-missing-in-secondary case (Finding #13).
- §7.1: wiring step for `isHedgingDisabledByGateway` from
  `DocumentClient` into the cache constructors via `Func<bool>`
  (Finding #15 bundle).
- §9.1: `EventSource`/`Meter` counters for fire-rate, win-rate,
  budget-exhaustion, late-loser, hedge-fired-elapsed-ms (Finding #15 bundle).

API/rollout
- §5.1: `EnableMetadataHedgingForColdStart` becomes tri-state `bool?`;
  `MetadataHedgingOptions` promoted to public so customers can tune
  `PerClientConcurrencyBudget` for high-container-cardinality startups
  (Finding #14).
- §12: Phase 3 no longer removes the opt-in (binary break avoided);
  only the phase default changes (Finding #14).

Smaller items (Finding #15 bundle)
- §5.3: drop `closest secondary` framing (SDK has no proximity measure);
  use `Wait(TimeSpan.Zero)` instead of `WaitAsync(TimeSpan.Zero)`
  (no Task allocation); add `EvaluateEligibility`-vs-budget-check note.
- §5.4: defaulted `isColdStart = false` on the abstract method to avoid
  breaking subclass overrides (e.g., encryption-mirrored caches).
- §6: added eligibility rules 8 (`ExcludeRegions` hard filter), 9
  (`HasHedgedThisOperation`), 10 (single-master account guard).
- §10: reconciled `Both branches fault` with §5.3 (consistent
  `ExceptionDispatchInfo` semantics).
- §11: tests added for loser-cancellation-doesn't-poison-secondary,
  loser-disposal, no-re-hedge-across-retries, cross-policy-type,
  net472 SO regression; mirrors PR #5787 `senderCallCount` assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
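
The §5.3 changes listed above (fast-failing primary still triggers the hedge, wait-for-winner loop that filters transient completions, loser cancellation contained locally) can be sketched roughly as below. This is a Python `asyncio` illustration, not the design's C#; `hedge`, `_is_winner`, and the `TRANSIENT` status set are hypothetical, and the loser is awaited inline here instead of being handed to a background cleanup task as the design describes.

```python
import asyncio

TRANSIENT = {500, 503}  # assumption: statuses that never count as a winner


def _is_winner(task):
    return task.exception() is None and task.result() not in TRANSIENT


async def hedge(primary, secondary, threshold_s):
    """primary/secondary: zero-arg coroutine functions returning a status code."""
    primary_task = asyncio.ensure_future(primary())
    await asyncio.wait({primary_task}, timeout=threshold_s)
    if primary_task.done() and _is_winner(primary_task):
        return primary_task.result()
    # Primary is slow, or failed fast before the threshold: fire the hedge.
    pending = {asyncio.ensure_future(secondary())}
    last = None
    if primary_task.done():
        last = primary_task
    else:
        pending.add(primary_task)
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if _is_winner(task):
                for loser in pending:
                    loser.cancel()
                if pending:
                    # The loser's cancellation is absorbed here, not surfaced.
                    await asyncio.wait(pending)
                return task.result()
            last = task
    return last.result()  # both branches failed: surface the last outcome
```

In this shape a fast 503 from the hedge is skipped by the loop, so a slower 200 from the primary can still win.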
…uggestions) to metadata-hedging design

Critical:
- B1 §5.13: CloneForHedge reuses primary Authorization + x-ms-date verbatim;
  master-key auth-table row updated.
- B2 §5.13 + §5.3 + §5.2: per-branch IsAcceptableWinner(resp, branch) helper +
  HedgeBranch enum; 401/plain-403 from the hedge branch are rejected so a fast
  hedge-401 cannot beat a slow primary-200. HedgeOutcome diagnostic field added
  with Volatile read/write.
- B3 §5.4: introduce protected virtual GetByNameAsync(..., isColdStart, ct)
  overload on CollectionCache; existing protected abstract is unchanged so
  encryption-mirrored subclasses stay source-compatible (hedge-disabled).
- B4 §1/§5.7/§6.1/§10: enumerate InternalServerError (500) everywhere
  alongside 503; new §10 edge case for primary 500/503 before threshold
  documenting the §5.7.1 asymmetry.

Recommended:
- R1 §5.7.1.1: paragraph on the lost primary regional-failure signal when the
  hedge wins (the next request closes the LocationCache gap; telemetry exposes
  the window).
- R2 §5.7.1.2: race-window walkthrough (t=1.4s primary fails / t=1.5s timer
  fires); HasHedgedThisOperation Interlocked.CompareExchange is the
  serialization point.
- R3 §5.3 + §5.8: CTS allocation moved inside outer try; loserCts ownership
  transferred to BackgroundCleanupAsync via local-ref null-out; outer finally
  disposes all three CTSs with ?.Dispose() null-guards.
- R4 §5.5: thread the caller CancellationToken through
  ExecutePartitionKeyRangeReadChangeFeedAsync into
  MetadataHedgingStrategy.ExecuteAsync (was CancellationToken.None).
- R5 §11: add 4 regression tests (hedge-401/403 per-branch overlay,
  mid-flight kill-switch flip, primary-500-hedge-wins-then-next-request-marks-
  unavailable, session-token RID-change cleanup).
- R6 §9.1: full telemetry rewrite. Meter renamed to
  Azure.Cosmos.Client.MetadataHedging; instruments use the
  azure.cosmosdb.client.metadata_hedging.* prefix; hedge_wins counter replaces
  the (0..1) histogram; Metrics (§9.1.1) and EventSource (§9.1.2) cleanly
  separated.

Suggestions:
- S1 §5.7.4: pseudocode uses this.request; writes
  this.retryContext.RetryLocationIndex per return-true path; side-effect
  invariant documented.
- S2 §5.2/§5.3/§5.7.4: AttemptedEndpoints switched to
  ConcurrentDictionary<string, byte> keyed on Uri.AbsoluteUri.
- S3 §5.2: WinningEndpoint/WinningRegion converted to backing fields +
  RecordWinner using Interlocked.CompareExchange (first-publication wins).
- S4 §5.6: full walk-through table for isColdStart = !forceRefresh
  (cache-hit / cache-miss / force-refresh).

Stats: 1133 -> 1379 lines (+335 / -89). All 28 section headers intact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
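
The per-branch winner check in B2 can be pictured with a small predicate. The sketch is in Python for illustration; the names echo `IsAcceptableWinner` and `HedgeBranch` from the commit message, but the status sets are assumptions, and the sketch ignores sub-status codes, so it does not capture the "plain-403" distinction.

```python
from enum import Enum


class HedgeBranch(Enum):
    PRIMARY = "primary"
    HEDGE = "hedge"


TRANSIENT_STATUSES = {500, 503}     # assumption: never a winner on either branch
HEDGE_ONLY_REJECTED = {401, 403}    # rejected only when they come from the hedge


def is_acceptable_winner(status, branch):
    if status in TRANSIENT_STATUSES:
        return False
    if branch is HedgeBranch.HEDGE and status in HEDGE_ONLY_REJECTED:
        return False
    return True
```

The asymmetry is deliberate in this sketch: an auth failure from the primary is a definitive answer, while the same status from the hedge may only reflect the secondary region's configuration.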
…Policy.FirstAttemptTimeout helpers

Stage 0 of staged implementation plan (see copilot_plan.md). Introduces shared seams used by subsequent stages with no production behavior change:

- RetryUtility.IsRegionalFailure: single source of truth for the 503/500/410+LeaseNotFound/403+DatabaseAccountNotFound/HttpRequestException/non-user-OCE failure-class set shared by MetadataRequestThrottleRetryPolicy and the upcoming cold-start metadata hedging strategy (design doc PPAF_Metadata_Hedging_ColdStart_Design.md section 5.7.2).
- HttpTimeoutPolicy.FirstAttemptTimeout: virtual accessor (default-implemented via GetTimeoutEnumerator) used by the hedge strategy to derive its default threshold and to enforce the invariant hedgeThreshold > firstAttemptTimeout in unit tests (section 5.9 / section 8).

Not yet consumed by any caller; subsequent stages wire these in.
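
As a rough picture of what a single-source-of-truth classifier like this looks like, here is a Python sketch of the failure-class set named above. It is not the SDK's `RetryUtility.IsRegionalFailure`: sub-statuses are represented by their symbolic names as strings, `ConnectionError` stands in for HttpRequestException, `asyncio.CancelledError` stands in for OperationCanceledException, and the `user_cancelled` flag is a hypothetical way to express "non-user OCE".

```python
import asyncio


def is_regional_failure(status=None, sub_status=None,
                        exception=None, user_cancelled=False):
    if exception is not None:
        if isinstance(exception, asyncio.CancelledError):
            # Only cancellations the user did not request count as regional.
            return not user_cancelled
        return isinstance(exception, ConnectionError)
    if status in (500, 503):
        return True
    if status == 410:
        return sub_status == "LeaseNotFound"
    if status == 403:
        return sub_status == "DatabaseAccountNotFound"
    return False
```

Keeping the set in one function means the retry policy and the hedging strategy cannot drift apart on what counts as a regional failure.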
…ll unit coverage

Stage 1 of the staged metadata-hedging implementation plan (copilot_plan.md). Introduces the cross-region hedging strategy used during cold-start population of the Collection and PartitionKeyRange caches. No caller invokes the strategy yet; Stage 3 wires ClientCollectionCache and Stage 4 wires PartitionKeyRangeCache.

Source: MetadataHedgingStrategy (eligibility + ExecuteAsync with loser-cancellation invariant, per-branch 401/403 overlay, background-cleanup ownership transfer, net472 Task.Yield stack discipline), MetadataHedgingContext, MetadataHedgeDiagnostics, MetadataHedgingResult, MetadataHedgeEligibility, MetadataHedgeSkipReason, HedgeBranch, MetadataHedgingOptions. IGlobalEndpointManager exposes GetApplicableEndpoints. CosmosClientOptions adds two internal opt-in properties (EnableMetadataHedgingForColdStart, MetadataHedgingOptions) — kept internal in Stage 1; promoted to public in Stage 6 with API contracts update.

Tests: 22 new MSTest cases covering the full eligibility matrix, loser-cancellation invariant, hedge 401/403 per-branch rejection, budget-exhausted fallback, late-loser disposal via TrackingStream, and a 50-concurrent-hedge regression for the net472 stack-unwind discipline. All 22 pass; Stage 0 RetryUtilityTests (12) still pass.

No customer-visible API change yet (opt-in surface internal). No changelog entry required for this stage.
… hedge context

Stage 2 of the staged metadata-hedging implementation plan (copilot_plan.md). Coordinates the metadata retry policy with a hedge's AttemptedEndpoints set so retries after a hedged operation do not re-target a region the hedge just used.

Changes: (a) replaces the policy's inline 4-case regional-failure switch with RetryUtility.IsRegionalFailure (Stage 0 helper, design 5.7.2) to keep the single-source-of-truth classification; (b) adds AttachHedgeContext(MetadataHedgingContext) — safe no-op when null; (c) rewrites IncrementRetryIndexOnUnavailableEndpointForMetadataRead as a bounded probe loop (design 5.7.4) that mutates retryContext.RetryLocationIndex in place on every return-true path, advances past any preferred-location index whose resolved endpoint is already in hedgeContext.AttemptedEndpoints, and terminates when all preferred regions are exhausted. With no hedge attached, the loop collapses to the legacy monotonic-counter behavior and skips the ResolveServiceEndpoint probe entirely.

Tests: 5 new MSTest cases (hedge attempts A+B then retry lands on C; all-preferred-regions-attempted terminates; no-hedge legacy behavior; full status-code matrix vs RetryUtility.IsRegionalFailure; AttachHedgeContext(null) is safe). Existing policy tests (2) still pass. Stage 0 + Stage 1 tests (34) still pass.

No customer-visible API change yet (no caller of AttachHedgeContext lands until Stage 3 wires ClientCollectionCache). No changelog entry required for this stage.
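
The bounded probe loop from change (c) can be sketched as a pure function. This Python version is illustrative only: `next_retry_index` is a hypothetical name, endpoints are plain strings, and the legacy branch simply returns the incremented counter without any bounds check, which is an assumption about what "monotonic counter" means here.

```python
def next_retry_index(preferred_endpoints, current_index, attempted=None):
    candidate = current_index + 1
    if attempted is None:
        # No hedge context attached: legacy behavior, no endpoint probing.
        return candidate
    # Skip every preferred region the hedge has already tried.
    while candidate < len(preferred_endpoints):
        if preferred_endpoints[candidate] not in attempted:
            return candidate
        candidate += 1
    return None  # all remaining preferred regions were already attempted
```

For example, with preferred regions A, B, C and a hedge that already tried A and B, a retry starting from index 0 lands on index 2 (region C); if all three were tried, the loop terminates with no target.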
…kundadebdatta/5917_implement_metadata_hedging"

This reverts commit 771c783, reversing
changes made to 9839cd8.
Removes the internal MetadataHedgingOptions from the client options surface, leaving only the EnableMetadataHedgingForColdStart enable/disable hook. Makes enablement follow PPAF: when EnableMetadataHedgingForColdStart is null it follows ConnectionPolicy.EnablePartitionLevelFailover; true forces on even when PPAF is off; false is a kill-switch. Adds a two-arg ResolveOptIn(customerOptIn, isPpafEnabled) resolver and updates the eligibility gate. Updates and extends unit tests, including a warm-cache-hit test confirming a warmed-up client never hedges or re-sends. All members are internal (no customer-observable change).
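
The tri-state resolution described in this commit reduces to a small truth table. The Python sketch below mirrors the described `ResolveOptIn(customerOptIn, isPpafEnabled)` semantics; the function and parameter names are transliterations for illustration, with `None` standing in for a null `bool?`.

```python
def resolve_opt_in(customer_opt_in, is_ppaf_enabled):
    if customer_opt_in is None:
        # Unset: follow the partition-level failover (PPAF) setting.
        return is_ppaf_enabled
    # Explicit True forces hedging on; explicit False is a kill-switch.
    return customer_opt_in
```

So `resolve_opt_in(None, True)` enables hedging, `resolve_opt_in(True, False)` enables it even with PPAF off, and `resolve_opt_in(False, True)` disables it regardless of PPAF.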

2 participants