diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index f15c482051..3116d67225 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -62,4 +62,13 @@ Purpose: quick, actionable context so an AI coding assistant can be immediately
 - In VS Code Copilot Chat: `@MsdataDirectSyncAgent sync msdata/direct`.
 - In the Copilot CLI: describe the task naturally (e.g., "sync the msdata/direct branch with master").
+- **OpenSpec — Spec-Driven Development**:
+  - The SDK uses [OpenSpec](https://github.com/openspec-dev/openspec) for spec-driven development. Specs live in `openspec/specs/` and capture behavioral contracts for major feature areas.
+  - **Read `openspec/README.md`** for the full developer guide, workflow instructions, and best practices.
+  - Active changes (in-progress work) live in `openspec/changes/`. Archived changes in `openspec/changes/archive/`.
+  - Configuration and project context: `openspec/config.yaml`.
+  - Slash commands: `/opsx:propose` (create change), `/opsx:apply` (implement), `/opsx:explore` (investigate), `/opsx:archive` (complete).
+  - When making changes, check if an existing spec in `openspec/specs/` covers the affected behavior — if so, update the spec alongside the code change.
+  - When proposing a new feature or significant behavioral change, use `/opsx:propose` to create structured artifacts (proposal, design, tasks) before implementing.
+
 If anything here is unclear or you want the file to include additional examples (specific files, common refactor targets, or typical PR reviewers), tell me what to add and I will iterate.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e3cb1edc66..2a9b006d83 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -93,6 +93,7 @@ When evaluating adding new tests, please search in the existing test files if th
 1. Create a branch for your contribution (if you are an external contributor, on your own fork).
 1. Make sure your work is adding [tests](#tests) as required (either unit and/or emulator tests depending on the scope of the work).
+1. If your change affects behavior covered by an [OpenSpec spec](#spec-driven-development), update the relevant spec alongside your code changes.
 1. Send a Pull Request to the master branch once your work is ready to be reviewed.
 1. The CI pipeline will start any required tests. If you are an external contributor, a team member will start the verification once we confirm the nature of the contribution through a `/azp run` comment in your Pull Request.
 1. Look for review comments and attempt to answer/address them to the best of your ability.
@@ -136,3 +137,11 @@ Or all through `Re-run failed checks` on the top right corner:
 - [General .NET SDK Troubleshooting](https://docs.microsoft.com/azure/cosmos-db/sql/troubleshoot-dot-net-sdk)
 - [Timeout troubleshooting](https://docs.microsoft.com/azure/cosmos-db/sql/troubleshoot-dot-net-sdk-request-timeout?tabs=cpu-new)
 - [Service unavailable troubleshooting](https://docs.microsoft.com/azure/cosmos-db/sql/troubleshoot-service-unavailable)
+
+## Spec-Driven Development
+
+This repository uses [OpenSpec](https://github.com/openspec-dev/openspec) for spec-driven development. Behavioral specifications for major SDK feature areas live in `openspec/specs/`.
+
+When contributing changes that affect documented behavior, check if an existing spec covers the area and update it as part of your PR. For new features or significant changes, consider using the OpenSpec workflow to propose, design, and implement changes with AI assistance.
+
+See [`openspec/README.md`](openspec/README.md) for the full developer guide, workflow instructions, and best practices.
diff --git a/openspec/README.md b/openspec/README.md
new file mode 100644
index 0000000000..ec528c1c3f
--- /dev/null
+++ b/openspec/README.md
@@ -0,0 +1,99 @@
+# OpenSpec — Spec-Driven Development for the Azure Cosmos DB .NET SDK
+
+This directory contains [OpenSpec](https://github.com/openspec-dev/openspec) artifacts for the Azure Cosmos DB .NET v3 SDK. OpenSpec provides a structured, AI-assisted workflow for proposing, specifying, designing, and implementing changes.
+
+## Why OpenSpec?
+
+The Cosmos DB .NET SDK is a large, complex codebase (~1,400+ source files) with many interdependent subsystems — retry policies, handler pipelines, cross-region routing, change feed processing, query execution, and more. OpenSpec helps by:
+
+1. **Capturing behavioral contracts** — Specs define *what* a feature should do (invariants, edge cases, error handling), not *how* it's implemented. This makes them durable even as implementation evolves.
+2. **Guiding AI-assisted development** — AI agents use specs as context when proposing and implementing changes, leading to more accurate code generation.
+3. **Reducing tribal knowledge** — Complex features like PPAF, cross-region hedging, and the handler pipeline have subtle invariants that are easy to break. Specs make these invariants explicit and reviewable.
+
+## Directory Structure
+
+```
+openspec/
+├── config.yaml              # Project context and artifact rules
+├── README.md                # This file
+├── specs/                   # Main spec catalog (living documentation)
+│   ├── README.md            # Spec index organized by area
+│   ├── retry-and-failover/
+│   │   └── spec.md
+│   └── ...
+├── changes/                 # Active changes (in-progress work)
+│   └── archive/             # Completed changes
+```
+
+| Concept | Location | Purpose |
+|---------|----------|---------|
+| **Specs** | `openspec/specs//spec.md` | Living behavioral contracts for major feature areas. |
+| **Changes** | `openspec/changes//` | In-progress work with proposal, design, and task artifacts. |
+| **Archive** | `openspec/changes/archive/` | Completed changes with full context preserved. |
+| **Config** | `openspec/config.yaml` | Project context and per-artifact rules that guide AI. |
+
+## Workflow
+
+```
+Propose ──▶ Specs ──▶ Design ──▶ Tasks ──▶ Apply ──▶ Archive
+```
+
+| Command | Purpose |
+|---------|---------|
+| `/opsx:propose ` | Create a new change with proposal, design, and task artifacts |
+| `/opsx:apply [name]` | Implement tasks from a change |
+| `/opsx:explore [topic]` | Investigate ideas or problems without making code changes |
+| `/opsx:archive [name]` | Archive a completed change |
+
+## Writing Good Specs
+
+Specs capture **behavioral contracts** using [EARS notation](https://en.wikipedia.org/wiki/Easy_Approach_to_Requirements_Syntax) (WHEN/THEN/SHALL). They should answer: "What do I need to know to safely modify this feature?"
+
+### What a spec should include
+
+1. **Purpose** — One-paragraph summary of what the feature does
+2. **Public API surface** — Key types, methods, and their contracts (C# code blocks)
+3. **Requirements** — Behavioral requirements using EARS notation (`WHEN , THEN the SDK SHALL `)
+4. **Reference tables** — Status code tables, configuration defaults, parameter matrices for dense reference data
+5. **Edge cases** — Non-obvious behaviors, race conditions, failure modes
+6. **Interactions** — How this feature relates to other SDK components (cross-spec links)
+7. **References** — Links to source files and existing design docs
+
+### What a spec should NOT include
+
+- Implementation details (specific variable names, internal algorithms)
+- Performance benchmarks (these change; use test projects instead)
+- Step-by-step code walkthroughs (that's what `docs/SdkDesign.md` is for)
+
+## When to Create or Update Specs
+
+**Create a new spec when:**
+- Adding a new major feature to the SDK
+- An area has complex invariants that are easy to break
+- The same behavioral rules are explained in multiple PR reviews
+
+**Update an existing spec when:**
+- Your PR changes behavior covered by a spec
+- A bug fix reveals an invariant that wasn't captured
+- A design doc in `docs/` gets updated
+
+**Don't need a spec for:**
+- Pure refactoring with no behavioral change
+- Test-only changes, documentation updates, dependency bumps
+
+## Best Practices
+
+| ✅ Do | ❌ Don't |
+|-------|---------|
+| Be specific about invariants (status codes, timeouts) | Copy implementation details into specs |
+| Use EARS notation for requirements | Create a spec per class (group by feature area) |
+| Include cross-spec "Interactions" sections | Let specs go stale |
+| Reference source files by path | Duplicate content from `docs/` |
+| Update specs as part of behavioral change PRs | Skip `/opsx:explore` for complex changes |
+| Review spec diffs in PRs like code | Archive before the PR is merged |
+
+## Related Documentation
+
+- [Spec Index](specs/README.md) — All specs organized by area
+- [SdkDesignGuidelines.md](../SdkDesignGuidelines.md) — Public API contract rules
+- [docs/SdkDesign.md](../docs/SdkDesign.md) — SDK architecture overview
\ No newline at end of file
diff --git a/openspec/changes/diagnostics-compaction/design.md b/openspec/changes/diagnostics-compaction/design.md
new file mode 100644
index 0000000000..d738f4045c
--- /dev/null
+++ b/openspec/changes/diagnostics-compaction/design.md
@@ -0,0 +1,105 @@
+# Diagnostics Compaction — Design
+
+## Summary Compaction Algorithm
+
+### Data Collection
+
+Walk the `ITrace` tree (same traversal as `SummaryDiagnostics.CollectSummaryFromTraceTree()`) to collect all `StoreResponseStatistics` and `HttpResponseStatistics` entries from every `ClientSideRequestStatisticsTraceDatum` in the trace hierarchy.
+
+### Region Grouping
+
+Group collected entries by `Region` (string). Entries with a null/empty region are grouped under `"Unknown"`.
+
+### Per-Region Summary
+
+For each region group (ordered chronologically by request start time):
+
+1. **First**: Full details of the chronologically first request
+2. **Last**: Full details of the chronologically last request (omitted if only 1 request)
+3. **Middle entries** (all except first and last): Group by `(StatusCode, SubStatusCode)`:
+   - **Count**: Number of requests in this group
+   - **TotalRequestCharge**: Sum of RU charges
+   - **MinDurationMs / MaxDurationMs / P50DurationMs / AvgDurationMs**: Latency statistics
+
+### Size Enforcement
+
+1. Serialize the summary JSON
+2. If `serializedBytes <= MaxDiagnosticsSummarySizeBytes` → return as-is
+3. If `serializedBytes > MaxDiagnosticsSummarySizeBytes` → return truncated output
+
+### Handling Both Direct and Gateway Requests
+
+Both `StoreResponseStatistics` (direct mode) and `HttpResponseStatistics` (gateway mode) are collected and treated uniformly in the summary. The aggregated groups include entries from both transport paths. An optional `"TransportType"` field (`"Direct"` / `"Gateway"`) can be included in aggregated groups if needed to distinguish.
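The region grouping and per-region summary steps in the design can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the SDK's C# implementation: the `Entry` record and the `summarize` function are hypothetical, and using the standard median for `P50DurationMs` is an assumption, since the design does not say how the percentile is computed. Size enforcement is left out because the truncated output format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Entry:
    """Hypothetical flattened view of one collected statistics entry."""
    region: Optional[str]
    start_ms: int          # request start time, used for chronological ordering
    status_code: int
    sub_status_code: int
    request_charge: float
    duration_ms: float


def summarize(entries):
    # Region grouping: null/empty regions fall under "Unknown".
    by_region = defaultdict(list)
    for entry in entries:
        by_region[entry.region or "Unknown"].append(entry)

    summary = {}
    for region, group in by_region.items():
        group.sort(key=lambda e: e.start_ms)
        first = group[0]
        last = group[-1] if len(group) > 1 else None  # omitted for a single request
        middle = group[1:-1]

        # Middle entries are bucketed by (StatusCode, SubStatusCode).
        buckets = defaultdict(list)
        for entry in middle:
            buckets[(entry.status_code, entry.sub_status_code)].append(entry)

        aggregated = []
        for (status, sub_status), items in buckets.items():
            durations = [item.duration_ms for item in items]
            aggregated.append({
                "StatusCode": status,
                "SubStatusCode": sub_status,
                "Count": len(items),
                "TotalRequestCharge": sum(item.request_charge for item in items),
                "MinDurationMs": min(durations),
                "MaxDurationMs": max(durations),
                "P50DurationMs": median(durations),  # assumption: plain median
                "AvgDurationMs": sum(durations) / len(durations),
            })

        summary[region] = {"First": first, "Last": last, "AggregatedGroups": aggregated}
    return summary
```

Because direct and gateway entries are treated uniformly, both can be mapped onto the same record shape before grouping; a transport field would only be needed if the deferred `TransportType` distinction were adopted.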
+
+## Request Flow
+
+```mermaid
+flowchart TD
+    A["ToString(DiagnosticsVerbosity)"] --> B{Verbosity?}
+    B -->|Detailed| C["Existing TraceJsonWriter path"]
+    B -->|Summary| D["DiagnosticsSummaryWriter"]
+    D --> E["Walk ITrace tree"]
+    E --> F["Collect StoreResponseStatistics\n+ HttpResponseStatistics"]
+    F --> G["Group by Region"]
+    G --> H["Per region:\nFirst + Last + Aggregated Middle"]
+    H --> I["Serialize to JSON"]
+    I --> J{Size <= Max?}
+    J -->|Yes| K["Return summary JSON"]
+    J -->|No| L["Return truncated JSON"]
+    C --> M["Return full trace JSON"]
+```
+
+## Files to Create
+
+| File | Description |
+|------|-------------|
+| `Microsoft.Azure.Cosmos/src/Diagnostics/DiagnosticsVerbosity.cs` | `DiagnosticsVerbosity` enum |
+| `Microsoft.Azure.Cosmos/src/Diagnostics/DiagnosticsSummaryWriter.cs` | Summary computation and JSON serialization logic |
+
+## Files to Modify
+
+| File | Change |
+|------|--------|
+| `CosmosClientOptions.cs` | Add `DiagnosticsVerbosity` and `MaxDiagnosticsSummarySizeBytes` properties with validation |
+| `CosmosDiagnostics.cs` | Add `ToString(DiagnosticsVerbosity)` abstract overload |
+| `CosmosTraceDiagnostics.cs` | Implement `ToString(DiagnosticsVerbosity)` overload; delegate to `DiagnosticsSummaryWriter` when verbosity is `Summary` |
+| `TraceWriter.TraceJsonWriter.cs` | Add summary serialization path that delegates to `DiagnosticsSummaryWriter` when verbosity is `Summary` |
+| `SummaryDiagnostics.cs` | Extend `CollectSummaryFromTraceTree()` to support region-grouped collection with ordering |
+| `ClientSideRequestStatisticsTraceDatum.cs` | Ensure `StoreResponseStatistics` and `HttpResponseStatistics` lists are accessible for summary computation |
+
+## Contract/Baseline Updates
+
+| File | Change |
+|------|--------|
+| `ContractEnforcementTests.cs` baseline | Update public API contract for new enum and properties |
+
+## Alternatives Considered
+
+### Alternative 1: Emit summary alongside truncated trace tree
+Instead of replacing the full trace, emit the summary _alongside_ the first + last children of the trace tree.
+
+**Pros:** Preserves some trace structure for tooling that parses it.
+**Cons:** Larger output size; complex to implement; defeats the purpose of compaction.
+**Decision:** Rejected — summary replaces the full trace. The `First` and `Last` entries in each region summary provide the detailed bookends.
+
+### Alternative 2: Per-request verbosity via RequestOptions
+Add a `DiagnosticsVerbosity` property to `RequestOptions` for per-request control.
+
+**Pros:** More granular control.
+**Cons:** Verbosity is a serialization concern, not a request concern. The `ToString(DiagnosticsVerbosity)` overload provides the same flexibility without complicating `RequestOptions`.
+**Decision:** Deferred. Can be added later if needed.
+
+### Alternative 3: Transport type distinction in aggregated groups
+Include a `TransportType` field (`"Direct"` / `"Gateway"`) in each aggregated group.
+
+**Pros:** Helps distinguish transport-specific issues.
+**Cons:** Increases output size; `StatusCode/SubStatusCode` is usually sufficient.
+**Decision:** Deferred. Can add later if customer feedback warrants it.
+
+## Key References
+
+- `Microsoft.Azure.Cosmos/src/Diagnostics/CosmosTraceDiagnostics.cs` — concrete diagnostics implementation
+- `Microsoft.Azure.Cosmos/src/Tracing/TraceWriter.TraceJsonWriter.cs` — current trace serialization
+- `Microsoft.Azure.Cosmos/src/Diagnostics/SummaryDiagnostics.cs` — existing summary aggregation (foundation)
+- `Microsoft.Azure.Cosmos/src/Tracing/TraceData/ClientSideRequestStatisticsTraceDatum.cs` — stats data
+- `docs/SdkDesign.md` — SDK architecture overview
diff --git a/openspec/changes/diagnostics-compaction/proposal.md b/openspec/changes/diagnostics-compaction/proposal.md
new file mode 100644
index 0000000000..b0db031a50
--- /dev/null
+++ b/openspec/changes/diagnostics-compaction/proposal.md
@@ -0,0 +1,71 @@
+# Diagnostics Compaction — Proposal
+
+## Problem
+
+`CosmosDiagnostics.ToString()` produces a JSON trace that grows **unboundedly** with retries. Each retry attempt creates a new child `ITrace` node containing a full `ClientSideRequestStatisticsTraceDatum` with complete `StoreResponseStatistics` and `HttpResponseStatistics` entries. In pathological scenarios (sustained 429 throttling, transient failures, cross-region failovers), a single operation's diagnostics can grow to hundreds of KB.
+
+**Impact:**
+- **Log truncation** — monitoring systems (Application Insights, Azure Monitor, etc.) silently drop oversized log entries
+- **Memory pressure** — large diagnostic strings increase GC overhead, especially at high throughput
+- **Readability** — operators cannot quickly extract signal from noise when hundreds of identical retry entries are listed
+
+**Example scenario:** A point read that encounters 50 retries due to 429 throttling in West US 2, then fails over to East US 2 with 10 more retries, produces ~60 full `StoreResponseStatistics` entries in the trace tree. With summary mode, this compacts to: first request + last request + 1 aggregated group per region.
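The arithmetic behind the example scenario can be checked with a small sketch. The function name and the simplification that each retry contributes exactly one entry are assumptions for illustration (the proposal itself says "~60"); the counting rule is the one stated above: first request, last request, and one aggregated group per distinct status in the middle.

```python
def summarized_item_count(requests_per_region, status_groups_per_region=1):
    """Count items emitted in summary mode under the proposal's rule."""
    total = 0
    for count in requests_per_region.values():
        if count == 0:
            continue
        total += 1                      # First
        if count > 1:
            total += 1                  # Last (omitted for a single request)
        if count > 2:
            total += status_groups_per_region  # aggregated middle group(s)
    return total


# Assumed mapping of the scenario: one entry per retry in each region.
scenario = {"West US 2": 50, "East US 2": 10}
detailed_entries = sum(scenario.values())
summary_items = summarized_item_count(scenario)
print(detailed_entries, summary_items)
```

Under these assumptions the 60 detailed entries collapse to 6 summary items (3 per region), and the summary size no longer depends on the retry count, only on the number of regions and distinct status codes.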
+
+## Proposed Approach
+
+Introduce a **`DiagnosticsVerbosity`** concept (modeled after [Azure/azure-sdk-for-rust#3592](https://github.com/Azure/azure-sdk-for-rust/pull/3592)) that controls how `CosmosDiagnostics.ToString()` serializes trace data:
+
+| Mode | Behavior | Use Case |
+|------|----------|----------|
+| **Detailed** (default) | Current behavior — full trace tree output | Debugging, development |
+| **Summary** | Region-grouped compaction with first/last + aggregated middle | Production logging, size-constrained environments |
+
+**Key design principle:** The in-memory representation (`ITrace` tree, `ClientSideRequestStatisticsTraceDatum`) stays **unchanged**. Compaction only happens at **serialization time** in the `TraceJsonWriter` path. This preserves full programmatic access to diagnostics data while reducing serialized output size.
+
+## SDK Area
+
+- **Primary:** Diagnostics
+- **Secondary:** Client-config (new options properties)
+
+## Preview vs GA
+
+The `DiagnosticsVerbosity` enum and related options should ship as **GA** (non-preview) since it's an additive, backward-compatible feature with no impact when not opted into.
+
+## Backward Compatibility
+
+- **Default is `Detailed`** — no behavioral change for existing users
+- **No breaking changes** — `ToString()` output format only changes when `Summary` is explicitly opted into
+- **Programmatic API unchanged** — `GetContactedRegions()`, `GetFailedRequestCount()`, etc. continue to work from the full in-memory trace regardless of verbosity
+
+## Rollout Strategy
+
+1. Ship with `Detailed` as default in initial release
+2. Document `Summary` mode in SDK documentation and changelog
+3. Consider making `Summary` the default in a future major version after customer feedback
+
+## Non-Goals
+
+- Changing the in-memory `ITrace` tree structure
+- Modifying the `Detailed` mode output format
+- Adding new programmatic APIs beyond `ToString(DiagnosticsVerbosity)` overload
+- Per-request verbosity override via `RequestOptions` (can be added later)
+
+## Resolved Questions
+
+1. **Should `AggregatedGroups` include an `AvgDurationMs` field?** The Rust SDK only includes min/max/P50. Adding avg is cheap to compute but adds to the output size. _Decision: Include avg. It's a single field and provides useful signal._
+
+2. **Should the summary include the `children` trace tree at all?** Currently proposed as replacing the entire trace output. An alternative is to emit the summary _alongside_ a truncated trace tree (e.g., first + last children only). _Decision: Summary replaces the full trace. The `First` and `Last` entries in each region summary provide the detailed bookends._
+
+3. **Gateway vs Direct distinction in aggregated groups.** Should each `AggregatedGroup` indicate whether it's from Direct or Gateway transport? _Decision: Defer. The `StatusCode/SubStatusCode` combination is usually sufficient. Can add a `TransportType` field later if needed._
+
+4. **Caching.** The Rust SDK caches serialized JSON per verbosity level via `OnceLock`. Should the .NET SDK cache the summary JSON? _Decision: Yes, use `Lazy` or similar. `ToString()` may be called multiple times (logging, telemetry, etc.)._
+
+5. **Thread safety.** `CosmosDiagnostics.Verbosity` as a settable property on a potentially shared object needs consideration. _Decision: Use the `ToString(DiagnosticsVerbosity)` overload which avoids mutating state entirely. The property is set once from `CosmosClientOptions` during response creation and read during serialization._
+
+## References
+
+- **Rust SDK PR:** [Azure/azure-sdk-for-rust#3592](https://github.com/Azure/azure-sdk-for-rust/pull/3592) — `DiagnosticsContext` with `Summary` and `Detailed` modes
+- **Current .NET diagnostics:** `Microsoft.Azure.Cosmos/src/Diagnostics/` and `Microsoft.Azure.Cosmos/src/Tracing/`
+- **Existing summary:** `SummaryDiagnostics.cs` — aggregates `(StatusCode, SubStatusCode)` counts (foundation to build on)
+- **Trace tree:** `ITrace` → `Trace` with recursive children and `ClientSideRequestStatisticsTraceDatum` data
+- **Related spec:** `openspec/specs/diagnostics-and-observability/spec.md`
diff --git a/openspec/changes/diagnostics-compaction/tasks.md b/openspec/changes/diagnostics-compaction/tasks.md
new file mode 100644
index 0000000000..6048a7aac1
--- /dev/null
+++ b/openspec/changes/diagnostics-compaction/tasks.md
@@ -0,0 +1,88 @@
+# Diagnostics Compaction — Tasks
+
+## Task 1: DiagnosticsVerbosity Enum & Options Plumbing
+
+**Scope:** Create the enum, add `DiagnosticsVerbosity` and `MaxDiagnosticsSummarySizeBytes` properties to `CosmosClientOptions`, add `ToString(DiagnosticsVerbosity)` abstract overload to `CosmosDiagnostics`, add environment variable support.
+
+**Acceptance:** `ToString(verbosity)` overloads compile and delegate correctly. Parameterless `ToString()` is unchanged (always `Detailed`). No behavioral change yet.
+
+**Spec requirements:** Diagnostics Verbosity (default verbosity, parameterless ToString, environment variable configuration, code-level override, verbosity precedence)
+
+## Task 2: Summary Computation Engine
+
+**Scope:** Implement `DiagnosticsSummaryWriter` — the core logic that walks the trace tree, collects stats, groups by region, computes first/last/aggregated groups, and produces the summary JSON structure.
+
+**Acceptance:** Given an `ITrace` tree, produces the correct summary JSON. Unit-testable in isolation.
+
+**Spec requirements:** Summary mode region grouping, first/last preservation, single request region, aggregated groups, mixed Direct and Gateway, region ordering
+
+## Task 3: Summary Serialization Integration
+
+**Scope:** Implement `CosmosTraceDiagnostics.ToString(DiagnosticsVerbosity)`. When `Summary`, delegate to `DiagnosticsSummaryWriter`. Implement size enforcement and truncated output fallback. Implement caching. Parameterless `ToString()` remains unchanged.
+
+**Acceptance:** `ToString(DiagnosticsVerbosity.Summary)` returns compact summary JSON. `ToString()` (parameterless) continues to return full `Detailed` trace.
+
+**Spec requirements:** In-memory trace tree unchanged, size enforcement, size under limit, summary mode caching, Summary JSON Format, truncated output format
+
+## Task 4: Contract Updates & Public API Validation
+
+**Scope:** Update `ContractEnforcementTests` baselines for new public API surface. Ensure the new enum and properties appear in contracts.
+
+**Acceptance:** All contract tests pass. Public API is correctly documented.
+
+## Task 5: Unit Tests
+
+**Scope:** Comprehensive unit tests for the summary engine.
+
+| Test | Description | Spec Requirement |
+|------|-------------|------------------|
+| `DiagnosticsVerbosity_DefaultIsDetailed` | Verify enum default | Default verbosity is Detailed |
+| `CosmosClientOptions_DiagnosticsVerbosity_DefaultValue` | Verify options default | Default verbosity is Detailed |
+| `CosmosClientOptions_MaxSummarySizeBytes_Validation` | Min 4096 enforced | MaxDiagnosticsSummarySizeBytes minimum validation |
+| `CosmosClientOptions_DiagnosticsVerbosity_EnvVarFallback` | Env var populates options | Environment variable configuration |
+| `CosmosClientOptions_DiagnosticsVerbosity_CodeOverridesEnvVar` | Code takes precedence | Code-level value overrides env var |
+| `ToString_Overload_UsesSummary_WhenExplicit` | `ToString(Summary)` produces summary | Verbosity precedence |
+| `Summary_SingleRegion_SingleRequest` | No deduplication, first only | Single request region |
+| `Summary_SingleRegion_TwoRequests` | First + last, no middle | First/last preservation |
+| `Summary_SingleRegion_ManyRetries_429` | First + last + 1 aggregated group | Aggregated groups |
+| `Summary_MultiRegion_Failover` | Separate region summaries | Region grouping |
+| `Summary_MixedStatusCodes` | Multiple aggregated groups per region | Aggregated groups |
+| `Summary_DirectAndGateway_Combined` | Both transport types in summary | Mixed Direct and Gateway |
+| `Summary_P50_OddCount` | Percentile on odd-sized collection | Aggregated groups |
+| `Summary_P50_EvenCount` | Percentile on even-sized collection | Aggregated groups |
+| `Summary_P50_SingleItem` | Percentile with 1 item | Aggregated groups |
+| `Summary_SizeEnforcement_UnderLimit` | Summary fits within max size | Size under limit |
+| `Summary_SizeEnforcement_OverLimit_Truncated` | Falls back to truncated output | Size enforcement |
+| `Summary_EmptyTrace` | No requests produces minimal output | Region grouping |
+| `Summary_RegionOrdering_Deterministic` | Regions sorted alphabetically | Region ordering |
+| `Detailed_Mode_Unchanged` | Existing detailed output is byte-for-byte identical | Parameterless ToString |
+| `ToString_Parameterless_AlwaysDetailed` | Parameterless always returns Detailed | Parameterless ToString |
+
+## Task 6: Integration Tests (Emulator)
+
+| Test | Description | Spec Requirement |
+|------|-------------|------------------|
+| `ReadItem_SummaryMode_ProducesValidJson` | Real read → summary JSON parses correctly | Summary JSON Format |
+| `ReadItem_SummaryMode_SizeWithinLimit` | Summary output ≤ configured max bytes | Size under limit |
+| `QueryItems_SummaryMode_MultipleRequests` | Query with continuations → summary compacts | Aggregated groups |
+| `BulkOperations_SummaryMode_HighRetryCount` | Simulate throttling → verify compaction | Aggregated groups |
+| `CrossRegion_SummaryMode_RegionGroups` | Multi-region → separate region summaries | Region grouping |
+
+## Task 7: Baseline / Golden-File Tests
+
+**Scope:** Create baseline JSON files for summary mode output (similar to existing `EndToEndTraceWriterBaselineTests`). Verify serialization stability across code changes.
+
+**Spec requirements:** Summary JSON Format, truncated output format
+
+## Task 8: Changelog & Documentation
+
+**Scope:** Update `changelog.md` with the new feature. Update `.github/copilot-instructions.md` if diagnostics verbosity affects AI assistant behavior.
+
+## Expected Size Reductions
+
+| Scenario | Detailed Size | Expected Summary Size | Reduction |
+|----------|--------------|----------------------|-----------|
+| 1 request, no retries | ~2 KB | ~1 KB | ~50% |
+| 10 retries, same region | ~20 KB | ~2 KB | ~90% |
+| 50 retries, 2 regions | ~100 KB | ~3 KB | ~97% |
+| 100 retries, 3 regions | ~200 KB | ~4 KB | ~98% |
diff --git a/openspec/config.yaml b/openspec/config.yaml
index 392946c67c..f01a19bf9b 100644
--- a/openspec/config.yaml
+++ b/openspec/config.yaml
@@ -1,20 +1,96 @@
 schema: spec-driven
-# Project context (optional)
-# This is shown to AI when creating artifacts.
-# Add your tech stack, conventions, style guides, domain knowledge, etc.
-# Example:
-#   context: |
-#     Tech stack: TypeScript, React, Node.js
-#     We use conventional commits
-#     Domain: e-commerce platform
-
-# Per-artifact rules (optional)
-# Add custom rules for specific artifacts.
-# Example:
-#   rules:
-#     proposal:
-#       - Keep proposals under 500 words
-#       - Always include a "Non-goals" section
-#     tasks:
-#       - Break tasks into chunks of max 2 hours
+context: |
+  This is the Azure Cosmos DB .NET v3 SDK — a thick client SDK for Azure Cosmos DB.
+  Tech stack: C# / .NET (LangVersion 10.0), Azure Pipelines CI.
+
+  Key architecture layers:
+  - Public APIs (CosmosClient, Container, Database)
+  - Handler pipeline (chain-of-responsibility pattern for request processing)
+  - Retry policies (cross-region, throttle, PPAF, timeout)
+  - Transport (Gateway HTTP mode, Direct TCP mode)
+  - Serialization (Newtonsoft JSON.NET default, System.Text.Json optional)
+
+  Major components:
+  - Microsoft.Azure.Cosmos/ — core SDK client (production code)
+  - Microsoft.Azure.Cosmos.Encryption/ — client-side encryption extension
+  - Microsoft.Azure.Cosmos.Encryption.Custom/ — custom encryption extension
+  - Microsoft.Azure.Cosmos.Samples/ — runnable examples and usage patterns
+  - docs/ — general reference documentation (SdkDesign.md, etc.)
+
+  Build & test:
+  - Build: dotnet build Microsoft.Azure.Cosmos.sln
+  - Unit tests: dotnet test in test project folders under **/tests/
+  - Integration tests: require Windows Cosmos DB Emulator (see templates/emulator-setup.yml)
+  - CI pipelines: azure-pipelines-*.yml files at repo root
+
+  Versioning & feature flags:
+  - Versions managed in Directory.Build.props (do NOT change without explicit instruction)
+  - Preview features gated by PREVIEW define constant (IsPreview=true)
+  - Strong-name signed assemblies (35MSSharedLib1024.snk)
+
+  Conventions:
+  - Follow Azure SDK .NET guidelines where compatible with v3 API
+  - Consistency with existing V3 public API takes priority over central SDK guidelines
+  - Public contract changes require API review (see SdkDesignGuidelines.md)
+  - All public APIs must support unit testing and mocking
+  - Request options should not be sealed
+  - Never push directly to master — always use feature branches and PRs
+  - PR title format: [Internal] Category: (Adds|Fixes|Refactors|Removes) Description
+  - Use [Internal] prefix only for changes with NO customer-facing impact
+  - Existing design docs live in docs/ — specs should reference them, not duplicate
+
+  Architecture reference: docs/SdkDesign.md
+  Design guidelines: SdkDesignGuidelines.md
+
+  SDK areas: routing, retry, query, changefeed, bulk, serialization,
+    encryption, telemetry, transport, diagnostics, client-config
+
+  Testing:
+  - Unit tests: Microsoft.Azure.Cosmos.Tests (no external dependencies)
+  - Integration tests: Microsoft.Azure.Cosmos.EmulatorTests (requires Cosmos DB Emulator)
+  - Performance tests: Microsoft.Azure.Cosmos.Performance.Tests (micro benchmarks)
+
+rules:
+  proposal:
+    - Include which SDK area this change affects (routing, retry, query, changefeed, bulk, serialization, encryption, telemetry, transport, diagnostics, client-config)
+    - State whether this is a preview or GA feature
+    - Reference existing design docs in docs/ when applicable
+    - Include backward compatibility impact analysis
+    - Consider both Gateway and Direct mode implications
+    - Link related GitHub issues with issue numbers
+    - Include a Non-goals section to clarify scope boundaries
+    - Note any service-side dependencies or minimum service version requirements
+
+  specs:
+    - Use EARS notation for behavioral requirements (WHEN , THEN the SDK SHALL )
+    - Capture behavioral contracts and invariants, not implementation details
+    - Include full C# public API surface with XML doc comments
+    - Show a complete usage example a customer would write
+    - Include error handling and edge cases
+    - Make every acceptance criterion testable — each must map to at least one test case
+    - Specify error handling behavior using CosmosException (status codes, sub-status codes)
+    - Note backward compatibility impact and any breaking changes with migration guidance
+    - Include configuration precedence (per-request RequestOptions > client-wide CosmosClientOptions > environment variable > default)
+    - Note differences between GA and Preview behavior where applicable
+
+  design:
+    - Reference docs/SdkDesign.md for component architecture context
+    - Link to specific source files using relative paths (e.g., ../../Microsoft.Azure.Cosmos/src/Path/File.cs)
+    - Reference key source files and classes by path
+    - Include Mermaid diagrams for request flow through SDK layers
+    - List all new and modified files with their responsibilities in a table
+    - Document alternatives considered with pros, cons, and rejection rationale
+    - Consider thread safety, async patterns, and performance considerations
+
+  tasks:
+    - Keep tasks focused and independently testable
+    - Separate tasks into implementation, contract updates, and documentation
+    - Include test requirements (unit and/or emulator tests)
+    - Include unit test tasks with specific test scenarios listed
+    - Include integration test tasks that specify emulator requirements
+    - Include baseline/golden-file test tasks for serialization stability (follow EndToEndTraceWriterBaselineTests pattern)
+    - Include a task to update ContractEnforcementTests baseline for any public API changes
+    - Include a task to update changelog.md
+    - Include a task to update .github/copilot-instructions.md if the feature affects AI assistant behavior
+    - Note if changes affect public API contract
diff --git a/openspec/specs/README.md b/openspec/specs/README.md
new file mode 100644
index 0000000000..377dbfd565
--- /dev/null
+++ b/openspec/specs/README.md
@@ -0,0 +1,61 @@
+# Azure Cosmos DB .NET SDK — Feature Specifications
+
+Behavioral specifications for all major features of the Azure Cosmos DB .NET SDK v3. Each spec uses [EARS notation](https://en.wikipedia.org/wiki/Easy_Approach_to_Requirements_Syntax) (WHEN/THEN/SHALL) and includes public API surface, reference tables, and cross-spec interaction links.
+
+## How to Use These Specs
+
+- **Before implementing a feature**: Read the relevant spec to understand requirements and edge cases
+- **During code review**: Validate PRs against the spec's requirements
+- **For AI agents**: Reference specs as authoritative context for implementation, testing, and review tasks
+- **For onboarding**: Use specs to understand how each SDK feature is supposed to behave
+
+## Index by Area
+
+### Data Operations
+
+| Spec | Description |
+|------|-------------|
+| [CRUD Operations](crud-operations/spec.md) | Point reads, creates, upserts, replaces, deletes, ReadMany |
+| [Patch Operations](patch-operations/spec.md) | Partial document modifications (add, remove, replace, set, increment, move) |
+| [Query and LINQ](query-and-linq/spec.md) | SQL queries, LINQ, cross-partition, pagination, FeedIterator |
+| [Change Feed](change-feed/spec.md) | Change feed iterator, processor, estimator, modes, start positions |
+| [Batch and Transactional](batch-and-transactional/spec.md) | TransactionalBatch (atomic) and bulk execution (throughput-optimized) |
+| [Distributed Transactions](distributed-transactions/spec.md) | Cross-partition transactional operations (evolving) |
+
+### Routing & Availability
+
+| Spec | Description |
+|------|-------------|
+| [Retry and Failover](retry-and-failover/spec.md) | Throttling retry (429), region failover, PPAF, Gone (410) handling |
+| [Cross-Region Hedging](cross-region-hedging/spec.md) | AvailabilityStrategy, threshold-based hedging, response selection |
+| [Partition Keys](partition-keys/spec.md) | Partition keys, hierarchical keys, FeedRange, partition routing |
+
+### Transport & Configuration
+
+| Spec | Description |
+|------|-------------|
+| [Transport and Connectivity](transport-and-connectivity/spec.md) | Gateway vs Direct mode, TCP tuning, endpoint discovery |
+| [Client and Configuration](client-and-configuration/spec.md) | CosmosClient lifecycle, authentication, custom handlers, builder |
+| [Consistency and Session](consistency-and-session/spec.md) | Five consistency levels, session token management |
+| [Handler Pipeline](handler-pipeline/spec.md) | Chain-of-responsibility request pipeline, handler ordering |
+
+### Serialization & Diagnostics
+
+| Spec | Description |
+|------|-------------|
+| [Serialization](serialization/spec.md) | CosmosSerializer, JSON.NET, System.Text.Json, LINQ serialization |
+| [Diagnostics and Observability](diagnostics-and-observability/spec.md) | CosmosDiagnostics, trace tree, OpenTelemetry, metrics, DiagnosticsVerbosity (Summary/Detailed) |
+
+### Security & Management
+
+| Spec | Description |
+|------|-------------|
+| [Client-Side Encryption](client-side-encryption/spec.md) | Encryption keys, policies, transparent encrypt/decrypt |
+| [Container and Database Management](container-and-database-management/spec.md) | Database/container CRUD, throughput, indexing |
+
+## Related Documentation
+
+- [OpenSpec README](../README.md) — Developer guide and workflow
+- [SdkDesignGuidelines.md](../../SdkDesignGuidelines.md) — Public API contract rules
+-
[docs/SdkDesign.md](../../docs/SdkDesign.md) — SDK architecture overview +- [openspec/config.yaml](../config.yaml) — OpenSpec project configuration and rules \ No newline at end of file diff --git a/openspec/specs/batch-and-transactional/spec.md b/openspec/specs/batch-and-transactional/spec.md new file mode 100644 index 0000000000..92bb1809ac --- /dev/null +++ b/openspec/specs/batch-and-transactional/spec.md @@ -0,0 +1,151 @@ +# Batch and Transactional Operations + +## Purpose + +The Azure Cosmos DB .NET SDK provides two batching mechanisms: `TransactionalBatch` for atomic multi-operation transactions within a single partition key, and `AllowBulkExecution` for automatic throughput-optimized batching of individual operations. TransactionalBatch guarantees all-or-nothing semantics; bulk execution optimizes throughput at the cost of latency. + +## Public API Surface + +### TransactionalBatch (Atomic) + +```csharp +TransactionalBatch batch = container.CreateTransactionalBatch(new PartitionKey("pk-value")) + .CreateItem(item1) + .CreateItem(item2) + .ReplaceItem("id3", updatedItem) + .DeleteItem("id4") + .ReadItem("id5"); + +using TransactionalBatchResponse response = await batch.ExecuteAsync(); +``` + +### Batch Operations + +| Method | Parameters | Purpose | +|--------|-----------|---------| +| `CreateItem` | `T item, TransactionalBatchItemRequestOptions` | Add create operation | +| `CreateItemStream` | `Stream payload, TransactionalBatchItemRequestOptions` | Create (stream) | +| `ReadItem` | `string id, TransactionalBatchItemRequestOptions` | Add read operation | +| `ReplaceItem` | `string id, T item, TransactionalBatchItemRequestOptions` | Add replace | +| `ReplaceItemStream` | `string id, Stream payload, TransactionalBatchItemRequestOptions` | Replace (stream) | +| `UpsertItem` | `T item, TransactionalBatchItemRequestOptions` | Add upsert | +| `UpsertItemStream` | `Stream payload, TransactionalBatchItemRequestOptions` | Upsert (stream) | +| `DeleteItem` | `string 
id, TransactionalBatchItemRequestOptions` | Add delete | +| `PatchItem` | `string id, IReadOnlyList<PatchOperation>, TransactionalBatchPatchItemRequestOptions` | Add patch | + +### Bulk Execution + +```csharp +CosmosClientOptions options = new CosmosClientOptions { AllowBulkExecution = true }; +CosmosClient client = new CosmosClient("connection-string", options); + +// Individual operations are automatically batched by the SDK +List<Task> tasks = items.Select(item => + (Task)container.CreateItemAsync(item, new PartitionKey(item.Pk))).ToList(); +await Task.WhenAll(tasks); +``` + +## Requirements + +### Requirement: TransactionalBatch Atomicity + +The SDK SHALL execute all operations in a TransactionalBatch atomically. + +#### All-or-nothing execution + +**When** `TransactionalBatch.ExecuteAsync()` is called, **if** ANY operation fails, the SDK SHALL roll back the ENTIRE batch. Zero operations SHALL be committed. + +#### Same partition key constraint + +**When** creating a TransactionalBatch, all items in the batch SHALL share the same partition key (specified at `CreateTransactionalBatch`). + +#### Ordered execution + +**When** a TransactionalBatch is executed, the SDK SHALL execute operations in submission order (`x-ms-cosmos-batch-ordered: true`). + +#### No exceptions on failure + +**When** a TransactionalBatch fails, `ExecuteAsync` SHALL NOT throw exceptions. The caller SHALL check `response.IsSuccessStatusCode` to determine success or failure. + +#### Failed dependency status + +**When** one operation causes the batch to fail, subsequent operations SHALL return status code 424 (Failed Dependency). The failing operation SHALL return the actual error code. + +### Requirement: TransactionalBatch Limits + +The SDK SHALL enforce server-side limits on batch operations. 
+ +| Limit | Value | +|-------|-------| +| Max operations per batch | 100 (server-enforced) | +| Max payload size | 2 MB (server-enforced) | + +### Requirement: TransactionalBatch Response + +The SDK SHALL provide per-operation results in the batch response. + +#### Response structure + +**When** a TransactionalBatch completes, the response SHALL implement `IReadOnlyList<TransactionalBatchOperationResult>` and `IDisposable`, providing per-operation `StatusCode`, `ETag`, and typed results via `GetOperationResultAtIndex<T>`. + +#### Multi-Status promotion + +**If** the response is 207 (Multi-Status), the SDK SHALL promote the status to the first failing operation's error code. + +### Requirement: Per-Operation Request Options + +The SDK SHALL support per-operation configuration within a batch. + +| Option | Type | Effect | +|--------|------|--------| +| `IfMatchEtag` | `string` | Conditional operation (412 on mismatch) | +| `IfNoneMatchEtag` | `string` | Conditional read | +| `EnableContentResponseOnWrite` | `bool?` | Skip response payload | +| `IndexingDirective` | `IndexingDirective?` | Include/Exclude indexing | + +### Requirement: Bulk Execution + +The SDK SHALL support automatic throughput-optimized batching via `AllowBulkExecution`. + +#### Automatic batching + +**Where** `CosmosClientOptions.AllowBulkExecution = true`, **when** individual item operations are called, the SDK SHALL automatically group operations by partition key range and send them as server batches. + +#### Non-atomic execution + +**While** bulk execution is enabled, individual operations SHALL be independent. Some MAY succeed while others fail (unlike TransactionalBatch). + +#### Per-partition-key-range streaming + +**While** bulk execution is enabled, each partition key range SHALL have its own `BatchAsyncStreamer` that accumulates operations and flushes when full or when a timer expires. + +#### Individual options respected + +**While** bulk execution is enabled, each operation's `ItemRequestOptions` (ETags, consistency, etc.) 
SHALL be honored within the batch. + +### TransactionalBatch vs Bulk Execution + +| Aspect | TransactionalBatch | Bulk Execution | +|--------|-------------------|----------------| +| Atomicity | All-or-nothing | Independent operations | +| Max operations | 100 | Unlimited (auto-batched) | +| Partition key | All same PK | Any/multiple PKs | +| Latency | Low (single round-trip) | Higher (batching delay) | +| Throughput | Single request | Optimized (parallel batches) | +| API | Explicit builder pattern | Implicit (normal CRUD calls) | +| Ordering | Ordered execution | Unordered | +| Configuration | Per-operation | `CosmosClientOptions.AllowBulkExecution = true` | + +## Interactions + +- **Partition Keys**: All TransactionalBatch items must share the same partition key. See `partition-keys` spec. +- **CRUD Operations**: Batch uses the same operation semantics as individual CRUD. See `crud-operations` spec. +- **Retry Policies**: Batch requests are retried at the batch level (not per-operation). See `retry-and-failover` spec. +- **Serialization**: Typed batch operations use the container's serializer. Internal wire format uses HybridRow binary serialization. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/Batch/TransactionalBatch.cs` +- Source: `Microsoft.Azure.Cosmos/src/Batch/BatchCore.cs` +- Source: `Microsoft.Azure.Cosmos/src/Batch/BatchAsyncContainerExecutor.cs` +- Source: `Microsoft.Azure.Cosmos/src/Batch/TransactionalBatchResponse.cs` \ No newline at end of file diff --git a/openspec/specs/change-feed/spec.md b/openspec/specs/change-feed/spec.md new file mode 100644 index 0000000000..73de797232 --- /dev/null +++ b/openspec/specs/change-feed/spec.md @@ -0,0 +1,195 @@ +# Change Feed + +## Purpose + +The Azure Cosmos DB change feed provides an ordered log of changes (creates, updates, and optionally deletes) to items in a container. 
The .NET SDK supports two consumption patterns: a low-level `FeedIterator`-based pull model for fine-grained control, and a high-level `ChangeFeedProcessor` framework for distributed, resilient consumption with automatic lease management and partition balancing. + +## Public API Surface + +### Change Feed Modes + +| Mode | Header | Returns | Requirements | +|------|--------|---------|-------------| +| **Incremental** (LatestVersion) | `A-IM: Incremental` | Latest state of items (creates + updates) | Default; no special container config | +| **AllVersionsAndDeletes** (FullFidelity) | `A-IM: FullFidelityFeed` | All intermediate versions + deletes within retention window | Container must have `ChangeFeedPolicy` with retention; forces Gateway mode | + +### ChangeFeedStartFrom Options + +| Factory Method | Behavior | +|---------------|----------| +| `ChangeFeedStartFrom.Beginning()` | Start from container creation; catch all historical changes | +| `ChangeFeedStartFrom.Now()` | Start from current instant; only future changes | +| `ChangeFeedStartFrom.Time(DateTime utcTime)` | Start from specific UTC timestamp (exclusive); `DateTime.Kind` must be `Utc` | +| `ChangeFeedStartFrom.ContinuationToken(string)` | Resume from a saved checkpoint token | + +All options except `ContinuationToken` accept an optional `FeedRange` parameter for partition-specific reading. 
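The start positions in the table above can be combined with a `FeedRange` roughly as follows. This is a minimal sketch, assuming the `Microsoft.Azure.Cosmos` package is referenced; the class name `StartFromExamples` and the partition key value `"pk-value"` are placeholders for illustration, not part of the SDK.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class StartFromExamples
{
    // Builds one start position per factory method listed in the table above.
    public static ChangeFeedStartFrom[] BuildAll(string savedContinuation)
    {
        // Scope reads to a single logical partition (placeholder value).
        FeedRange range = FeedRange.FromPartitionKey(new PartitionKey("pk-value"));

        // DateTime.UtcNow has Kind == Utc, which Time(...) requires per the table.
        DateTime oneHourAgoUtc = DateTime.UtcNow.AddHours(-1);

        return new ChangeFeedStartFrom[]
        {
            ChangeFeedStartFrom.Beginning(),
            ChangeFeedStartFrom.Beginning(range),
            ChangeFeedStartFrom.Now(range),
            ChangeFeedStartFrom.Time(oneHourAgoUtc, range),
            // ContinuationToken is the one option without a FeedRange parameter.
            ChangeFeedStartFrom.ContinuationToken(savedContinuation),
        };
    }
}
```

Any of the returned values can be passed as the first argument of `GetChangeFeedIterator`.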
+ +### FeedIterator-Based Consumption + +```csharp +FeedIterator<T> iterator = container.GetChangeFeedIterator<T>( + ChangeFeedStartFrom.Now(), + ChangeFeedMode.LatestVersion); + +while (iterator.HasMoreResults) +{ + FeedResponse<T> response = await iterator.ReadNextAsync(); + if (response.StatusCode == HttpStatusCode.NotModified) + { + string token = response.Headers.ContinuationToken; + await Task.Delay(pollInterval); + continue; + } + foreach (T item in response) { /* process change */ } +} +``` + +### ChangeFeedProcessor + +```csharp +ChangeFeedProcessor processor = container + .GetChangeFeedProcessorBuilder<T>("processorName", HandleChangesAsync) + .WithInstanceName("host-1") + .WithLeaseContainer(leaseContainer) + .WithPollInterval(TimeSpan.FromSeconds(5)) + .WithStartFromBeginning() + .WithErrorNotification(HandleErrorAsync) + .Build(); + +await processor.StartAsync(); +await processor.StopAsync(); +``` + +## Requirements + +### Requirement: Change Feed Modes + +The SDK SHALL support two change feed modes with distinct behavior. + +#### Incremental mode + +**When** `ChangeFeedMode.LatestVersion` is used, the SDK SHALL return only the latest version of each item. Intermediate updates between polls SHALL be collapsed into the final state. + +#### AllVersionsAndDeletes mode + +**When** `ChangeFeedMode.AllVersionsAndDeletes` is used, the SDK SHALL return all intermediate versions and deletes within the configured retention window. + +#### AllVersionsAndDeletes retention boundary + +**If** a read in AllVersionsAndDeletes mode reaches beyond the retention window, the SDK SHALL return 400 Bad Request. + +#### AllVersionsAndDeletes forces Gateway mode + +**When** AllVersionsAndDeletes mode is used, the SDK SHALL force Gateway mode for split-handling logic, regardless of `CosmosClientOptions.ConnectionMode`. + +### Requirement: FeedIterator Semantics + +The SDK SHALL provide change feed results through the FeedIterator pattern with specific guarantees. 
+ +#### 304 Not Modified + +**When** no changes exist since the last checkpoint, the SDK SHALL return a response with status code 304 (Not Modified) and an empty result set. Continuation tokens SHALL still be available in the response headers. + +#### Transactional grouping + +**When** items are committed in the same transaction, the SDK SHALL return them together in the same page, even if this exceeds `PageSizeHint`. + +#### Ordering guarantee + +**When** reading changes within a single partition, the SDK SHALL return changes ordered by logical sequence number (LSN). No ordering guarantee SHALL be provided across partitions. + +#### PageSizeHint semantics + +**Where** `ChangeFeedRequestOptions.PageSizeHint` is set, the SDK SHALL treat it as a hint, not a guarantee. Pages MAY contain fewer or more items than requested. + +### Requirement: ChangeFeedProcessor + +The SDK SHALL provide a high-level processor framework for distributed change feed consumption. + +#### Lease container requirement + +**When** building a ChangeFeedProcessor, a lease container SHALL be required. The lease container partition key SHOULD be `/id`. + +#### Instance name requirement + +**When** building a ChangeFeedProcessor, an instance name SHALL be required. It SHALL be unique per instance in a distributed processor cluster. + +#### Automatic partition balancing + +**When** instances are added or removed from a processor cluster, the SDK SHALL automatically rebalance partition ownership evenly across instances. + +#### Auto-checkpointing + +**When** using the default `ChangeFeedHandler` delegate, the SDK SHALL auto-checkpoint after successful completion of each batch. For explicit checkpointing, `ChangeFeedHandlerWithManualCheckpoint` SHALL be used. + +#### Error handling + +**When** an unhandled exception occurs in the user delegate, the SDK SHALL pause processing for that partition, invoke the `WithErrorNotification` callback, and retry after `PollInterval`. 
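The checkpointing and error-notification requirements above can be sketched as a pair of delegates. This is an illustrative outline, assuming the `Microsoft.Azure.Cosmos` package is referenced; the class name `ProcessorHandlers`, the processor name, and the instance name are placeholders, and the logging is only an example of what an error callback might do.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ProcessorHandlers
{
    // Shape of Container.ChangeFeedHandlerWithManualCheckpoint<T>.
    public static async Task HandleChangesAsync<T>(
        ChangeFeedProcessorContext context,
        IReadOnlyCollection<T> changes,
        Func<Task> checkpointAsync,
        CancellationToken cancellationToken)
    {
        foreach (T change in changes)
        {
            cancellationToken.ThrowIfCancellationRequested();
            // Process the change here. If this throws, the checkpoint below
            // is never reached and the error-handling requirement applies.
        }

        // Checkpoint only after the whole batch was processed.
        await checkpointAsync();
    }

    // Shape of Container.ChangeFeedMonitorErrorDelegate.
    public static Task HandleErrorAsync(string leaseToken, Exception exception)
    {
        Console.Error.WriteLine($"Lease {leaseToken}: {exception.Message}");
        return Task.CompletedTask;
    }

    // Wires both delegates into a processor (placeholder names).
    public static ChangeFeedProcessor Build<T>(Container monitored, Container leases)
    {
        return monitored
            .GetChangeFeedProcessorBuilderWithManualCheckpoint<T>("processorName", HandleChangesAsync<T>)
            .WithInstanceName("host-1")
            .WithLeaseContainer(leases)
            .WithErrorNotification(HandleErrorAsync)
            .Build();
    }
}
```

With the default `ChangeFeedHandler` delegate the explicit `checkpointAsync()` call disappears, since the SDK checkpoints after each successful batch.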
+ +#### Lease expiration + +**If** a lease is not renewed within `LeaseExpirationInterval` (default 60s), the SDK SHALL treat it as expired and redistribute it to other instances. + +#### StartFromBeginning with AllVersionsAndDeletes + +**When** `WithStartFromBeginning()` is used with `AllVersionsAndDeletes` mode, the SDK SHALL prohibit this combination. + +### Requirement: Lease Management + +The SDK SHALL manage leases with optimistic concurrency. + +#### Lease storage + +**When** leases are created, the SDK SHALL store them as documents in the lease container with optimistic concurrency via ETags. Each lease SHALL track: partition range, owner instance, continuation token, and last timestamp. + +#### Lease renewal + +**While** a processor is running, the SDK SHALL renew held leases every `LeaseRenewInterval` (default 17s). + +#### Lease acquisition + +**While** a processor is running, the SDK SHALL check for unowned leases every `LeaseAcquireInterval` (default 13s). + +#### Partition split handling + +**When** a physical partition splits, the SDK SHALL automatically create new leases for the split ranges. 
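The lease timings named in these requirements can also be set explicitly on the builder. The sketch below simply restates the documented defaults, assuming the `Microsoft.Azure.Cosmos` package is referenced; `LeaseTimingExample` is a placeholder name, and arguments are passed positionally so the example does not depend on parameter names.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class LeaseTimingExample
{
    // Applies the default timings from this spec to an existing builder.
    public static ChangeFeedProcessorBuilder ApplyDocumentedDefaults(ChangeFeedProcessorBuilder builder)
    {
        return builder
            .WithPollInterval(TimeSpan.FromSeconds(5))
            .WithLeaseConfiguration(
                TimeSpan.FromSeconds(13),   // acquire: how often to look for unowned leases
                TimeSpan.FromSeconds(60),   // expiration: lease validity window
                TimeSpan.FromSeconds(17));  // renew: how often held leases are refreshed
    }
}
```

When changing these values, keeping the renew interval well below the expiration interval seems prudent, since a lease that is not renewed within the expiration window is treated as expired and redistributed.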
+ +## Configuration + +### ChangeFeedRequestOptions + +| Property | Type | Default | Notes | +|----------|------|---------|-------| +| `PageSizeHint` | `int?` | `null` (server default) | Batch size hint; transaction-aware | + +### ChangeFeedProcessor Timing Options + +| Parameter | Default | Purpose | +|-----------|---------|---------| +| `PollInterval` | 5 seconds | Delay between empty polls | +| `LeaseAcquireInterval` | 13 seconds | How often to check for unowned leases | +| `LeaseExpirationInterval` | 60 seconds | Lease validity window | +| `LeaseRenewInterval` | 17 seconds | How often to refresh held leases | + +## Error Handling + +| Exception | Trigger | Recovery | +|-----------|---------|----------| +| `MalformedChangeFeedContinuationTokenException` | Invalid/corrupted continuation token | Restart from `Beginning` or `Now` | +| `LeaseLostException` | Lease expired or stolen during processing | Automatic — partition reassigned | +| `FeedRangeGoneException` | Partition split/merge during iteration | Automatic — ranges refreshed | +| Delegate exceptions | Unhandled exception in user code | Partition paused; retried after poll interval | + +## Interactions + +- **Handler Pipeline**: Change feed requests flow through the full pipeline with `IsPartitionKeyRangeHandlerRequired = true`. See `handler-pipeline` spec. +- **Retry Policies**: Change feed page fetches are retried per `retry-and-failover` spec. +- **Partition Keys**: Change feed can be scoped to a `FeedRange` (physical partition). See `partition-keys` spec. +- **Serialization**: `FeedIterator` uses the container's serializer. See `serialization` spec. 
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/ChangeFeed/` +- Source: `Microsoft.Azure.Cosmos/src/ChangeFeedProcessor/` +- Source: `Microsoft.Azure.Cosmos/src/ChangeFeed/ChangeFeedMode.cs` +- Source: `Microsoft.Azure.Cosmos/src/ChangeFeed/ChangeFeedStartFrom.cs` \ No newline at end of file diff --git a/openspec/specs/client-and-configuration/spec.md b/openspec/specs/client-and-configuration/spec.md new file mode 100644 index 0000000000..7ca3c7b3b9 --- /dev/null +++ b/openspec/specs/client-and-configuration/spec.md @@ -0,0 +1,159 @@ +# Client and Configuration + +## Purpose + +`CosmosClient` is the entry point for all interactions with Azure Cosmos DB. It manages connections, caches, and configuration. The SDK is designed for a single long-lived `CosmosClient` instance per application (singleton pattern) to maximize connection pooling and cache reuse. Configuration is immutable after construction. + +## Public API Surface + +### CosmosClient Constructors + +```csharp +// Connection string +public CosmosClient(string connectionString, CosmosClientOptions clientOptions = null) + +// Endpoint + key/resource token +public CosmosClient(string accountEndpoint, string authKeyOrResourceToken, CosmosClientOptions clientOptions = null) + +// Endpoint + rotatable credential +public CosmosClient(string accountEndpoint, AzureKeyCredential authKeyOrResourceTokenCredential, CosmosClientOptions clientOptions = null) + +// Endpoint + AAD token +public CosmosClient(string accountEndpoint, TokenCredential tokenCredential, CosmosClientOptions clientOptions = null) +``` + +### CosmosClientBuilder (Fluent API) + +```csharp +CosmosClient client = new CosmosClientBuilder("connection-string") + .WithApplicationPreferredRegions(new List<string> { "East US", "West US" }) + .WithConnectionModeDirect() + .WithThrottlingRetryOptions(TimeSpan.FromSeconds(30), 9) // max cumulative wait, max attempts + .WithBulkExecution(true) + .Build(); + +// Or with pre-warming: +CosmosClient client = await new 
CosmosClientBuilder("connection-string") + .BuildAndInitializeAsync(new[] { ("myDb", "myContainer") }); +``` + +### Resource References + +```csharp +Database db = cosmosClient.GetDatabase("myDb"); // No network call +Container container = cosmosClient.GetContainer("myDb", "myContainer"); // No network call +``` + +These return proxy references — they do NOT validate existence. Use `CreateDatabaseIfNotExistsAsync` / `CreateContainerIfNotExistsAsync` to ensure resources exist. + +## Requirements + +### Requirement: Client Lifecycle + +The SDK SHALL manage `CosmosClient` as a thread-safe, long-lived singleton. + +#### Thread safety + +**When** multiple threads access a `CosmosClient` instance concurrently, the SDK SHALL handle all operations safely without requiring external synchronization. + +#### Singleton pattern + +**When** creating a `CosmosClient`, the SDK SHALL optimize for a single instance per application lifetime to maximize connection pooling and cache reuse. + +#### No network validation at construction + +**When** a `CosmosClient` is constructed, the SDK SHALL NOT perform any network calls. Connectivity issues SHALL surface on the first operation. + +#### Immutable after construction + +**When** a `CosmosClient` is constructed, the SDK SHALL treat `ClientOptions` as read-only. Modifications after construction SHALL NOT be possible. + +#### Disposal behavior + +**When** a `CosmosClient` is disposed, all subsequent operations SHALL throw errors. The SDK SHALL track disposal via `DisposedDateTimeUtc`. + +### Requirement: Connection Modes + +The SDK SHALL support two connection modes with distinct characteristics. 
+ +| Aspect | Gateway (`ConnectionMode.Gateway`) | Direct (`ConnectionMode.Direct`) - Default | +|--------|-----------------------------------|-------------------------------------------| +| Protocol | HTTPS (port 443) | TCP/SSL (multiple ports) | +| Routing | Via gateway proxy | Direct to data nodes | +| Throughput | Lower | Higher | +| Latency | Higher | Lower | +| Firewall | Simple (one endpoint) | Complex (multiple ports) | +| Key options | `GatewayModeMaxConnectionLimit`, `WebProxy` | `MaxRequestsPerTcpConnection`, `MaxTcpConnectionsPerEndpoint`, `IdleTcpConnectionTimeout` | + +### Requirement: Region Configuration + +The SDK SHALL support configuring preferred regions for request routing. + +#### ApplicationRegion + +**Where** `CosmosClientOptions.ApplicationRegion` is set (single string), **when** the client initializes, the SDK SHALL generate a proximity-ordered fallback list. This setting SHALL be mutually exclusive with `ApplicationPreferredRegions`. + +#### ApplicationPreferredRegions + +**Where** `CosmosClientOptions.ApplicationPreferredRegions` is set (ordered list), **when** requests are routed, the SDK SHALL follow the explicit failover order. Invalid regions SHALL be silently ignored but used if later added to the account. + +#### LimitToEndpoint + +**Where** `CosmosClientOptions.LimitToEndpoint = true`, **when** the client initializes, the SDK SHALL disable region auto-discovery. This setting SHALL be incompatible with `ApplicationRegion`/`ApplicationPreferredRegions`. + +### Requirement: Proxy Reference Semantics + +The SDK SHALL return lightweight proxy references for database and container access. + +**When** `GetDatabase()` or `GetContainer()` is called, the SDK SHALL return a reference without making network calls. Operations on non-existent resources SHALL return 404. 
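The proxy-reference behavior above means a missing resource only shows up on the first real operation. A minimal sketch of handling that, assuming the `Microsoft.Azure.Cosmos` package is referenced; `ProxyReferenceExample` and its method names are placeholders for illustration.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ProxyReferenceExample
{
    // Returns true if the container exists, false if the service answers 404.
    public static async Task<bool> ContainerExistsAsync(
        CosmosClient client, string databaseId, string containerId)
    {
        // No network call: this only builds a local reference.
        Container container = client.GetContainer(databaseId, containerId);
        try
        {
            // First network call; a missing database or container surfaces here.
            await container.ReadContainerAsync();
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return false;
        }
    }

    // Creates the database and container when they do not exist yet.
    public static async Task<Container> EnsureContainerAsync(
        CosmosClient client, string databaseId, string containerId, string partitionKeyPath)
    {
        DatabaseResponse db = await client.CreateDatabaseIfNotExistsAsync(databaseId);
        ContainerResponse created =
            await db.Database.CreateContainerIfNotExistsAsync(containerId, partitionKeyPath);
        return created.Container;
    }
}
```

Calling `EnsureContainerAsync` once at startup and reusing the returned `Container` avoids paying the existence check on every request.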
+ +## Configuration + +### CosmosClientOptions Key Properties + +| Property | Type | Default | Notes | +|----------|------|---------|-------| +| `ConnectionMode` | `ConnectionMode` | `Direct` | Gateway or Direct | +| `ApplicationRegion` | `string` | `null` | Single preferred region | +| `ApplicationPreferredRegions` | `IReadOnlyList<string>` | `null` | Ordered region list | +| `LimitToEndpoint` | `bool` | `false` | Disable region discovery | +| `ConsistencyLevel` | `ConsistencyLevel?` | `null` | Can only weaken account default | +| `MaxRetryAttemptsOnRateLimitedRequests` | `int?` | 9 | HTTP 429 retry attempts | +| `MaxRetryWaitTimeOnRateLimitedRequests` | `TimeSpan?` | 30 seconds | Max cumulative retry wait | +| `AllowBulkExecution` | `bool` | `false` | Automatic request batching | +| `EnableContentResponseOnWrite` | `bool?` | `null` | Skip response payload on writes | +| `RequestTimeout` | `TimeSpan` | 6 seconds | Per-request timeout | +| `GatewayModeMaxConnectionLimit` | `int` | 50 | Gateway HTTP connection pool | +| `MaxRequestsPerTcpConnection` | `int?` | 30 | Direct: concurrent requests per TCP connection | +| `MaxTcpConnectionsPerEndpoint` | `int?` | 65,535 | Direct: max TCP connections per backend | +| `IdleTcpConnectionTimeout` | `TimeSpan?` | indefinite | Direct: close idle connections (min 10 min) | +| `OpenTcpConnectionTimeout` | `TimeSpan?` | 5 seconds | Direct: TCP establishment timeout | +| `EnableTcpConnectionEndpointRediscovery` | `bool` | `true` | Direct: refresh addresses on TCP reset | +| `AvailabilityStrategy` | `AvailabilityStrategy` | `null` | Cross-region hedging | +| `CustomHandlers` | `Collection<RequestHandler>` | empty | Pipeline interceptors | +| `ApplicationName` | `string` | `null` | User-agent suffix | + +### Serializer Configuration (Mutually Exclusive) + +Only ONE of these can be set: +- `SerializerOptions` — `CosmosSerializationOptions` (Newtonsoft.Json config) +- `Serializer` — `CosmosSerializer` (custom implementation) +- 
`UseSystemTextJsonSerializerWithOptions` — `JsonSerializerOptions` (System.Text.Json) + +See `serialization` spec for details. + +## Interactions + +- **Handler Pipeline**: Client constructs the handler pipeline at initialization. See `handler-pipeline` spec. +- **Retry Policies**: `MaxRetryAttemptsOnRateLimitedRequests` and `MaxRetryWaitTimeOnRateLimitedRequests` configure `ResourceThrottleRetryPolicy`. See `retry-and-failover` spec. +- **Hedging**: `AvailabilityStrategy` configures cross-region hedging. See `cross-region-hedging` spec. +- **Serialization**: Serializer configuration affects all typed APIs. See `serialization` spec. +- **Transport**: Connection mode and TCP settings affect transport behavior. See `transport-and-connectivity` spec. +- **Consistency**: `ConsistencyLevel` affects read guarantees. See `consistency-and-session` spec. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/CosmosClient.cs` +- Source: `Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/Fluent/CosmosClientBuilder.cs` +- Source: `Microsoft.Azure.Cosmos/src/ConnectionMode.cs` \ No newline at end of file diff --git a/openspec/specs/client-side-encryption/spec.md b/openspec/specs/client-side-encryption/spec.md new file mode 100644 index 0000000000..5ebb759706 --- /dev/null +++ b/openspec/specs/client-side-encryption/spec.md @@ -0,0 +1,145 @@ +# Client-Side Encryption + +## Purpose + +Client-side encryption enables encrypting sensitive item properties before they are sent to the Cosmos DB service, ensuring data is encrypted at rest and in transit with customer-managed keys. The encryption extensions are delivered as separate NuGet packages (`Microsoft.Azure.Cosmos.Encryption` and `Microsoft.Azure.Cosmos.Encryption.Custom`) that wrap the core SDK client. 
+ +## Public API Surface + +### Client Encryption Key Management + +```csharp +// Create a client encryption key +ClientEncryptionKeyResponse response = await database.CreateClientEncryptionKeyAsync( + new ClientEncryptionKeyProperties( + id: "myKey", + encryptionAlgorithm: DataEncryptionAlgorithm.AeadAes256CbcHmacSha256, + wrappedDataEncryptionKey: wrappedKeyBytes, + encryptionKeyWrapMetadata: new EncryptionKeyWrapMetadata( + type: "akv", + name: "myKeyVaultKey", + value: "https://myvault.vault.azure.net/keys/myKey/version"))); + +// Read key properties +ClientEncryptionKeyProperties keyProps = await database.GetClientEncryptionKey("myKey").ReadAsync(); + +// Rewrap (rotate) key +await database.GetClientEncryptionKey("myKey").ReplaceAsync(updatedProperties); +``` + +### Encryption Policy Configuration + +```csharp +ContainerProperties containerProps = new ContainerProperties("myContainer", "/pk") +{ + ClientEncryptionPolicy = new ClientEncryptionPolicy( + new List<ClientEncryptionIncludedPath> + { + new ClientEncryptionIncludedPath + { + Path = "/sensitiveProperty", + ClientEncryptionKeyId = "myKey", + EncryptionType = EncryptionType.Deterministic, + EncryptionAlgorithm = DataEncryptionAlgorithm.AeadAes256CbcHmacSha256 + } + }) +}; +``` + +### Extension Package Registration + +```csharp +// Microsoft.Azure.Cosmos.Encryption (Azure Key Vault) +CosmosClient encryptionClient = cosmosClient.WithEncryption( + keyEncryptionKeyResolver, + KeyEncryptionKeyResolverName.AzureKeyVault); + +// Microsoft.Azure.Cosmos.Encryption.Custom (custom provider) +CosmosClient customEncClient = cosmosClient.WithEncryption( + new MyCustomEncryptionKeyWrapProvider()); +``` + +## Requirements + +### Requirement: Client Encryption Key Management + +The SDK SHALL support creating and managing client encryption keys (CEKs) in the Cosmos DB account. 
+ +#### Create encryption key + +**When** `database.CreateClientEncryptionKeyAsync(properties)` is called with valid key properties, the SDK SHALL create a client encryption key in the database with the key material wrapped using the specified key wrap provider. + +#### Read encryption key + +**When** `database.GetClientEncryptionKey(keyId).ReadAsync()` is called for an existing client encryption key, the SDK SHALL return the key properties (metadata only, not raw key material). + +#### Replace (rewrap) encryption key + +**When** `database.GetClientEncryptionKey(keyId).ReplaceAsync(updatedProperties)` is called, the SDK SHALL rewrap the data encryption key with the new key wrap metadata, enabling key rotation without re-encrypting data. + +### Requirement: Encryption Policy + +The SDK SHALL support defining encryption policies on containers to specify which properties are encrypted. + +#### Define encryption policy + +**Where** `ContainerProperties.ClientEncryptionPolicy` is configured with `ClientEncryptionIncludedPath` entries, **when** the container is created, the SDK SHALL encrypt the specified property paths on write and decrypt them on read. + +#### Deterministic encryption + +**Where** `EncryptionType = "Deterministic"` is set for a path, **when** the same value is encrypted multiple times, the SDK SHALL produce the same ciphertext, enabling equality queries on the encrypted property. + +#### Randomized encryption + +**Where** `EncryptionType = "Randomized"` is set for a path, **when** the same value is encrypted multiple times, the SDK SHALL produce different ciphertext each time. Equality queries on the encrypted property SHALL NOT be supported. + +### Requirement: Key Wrap Providers + +The SDK SHALL support pluggable key wrap providers for wrapping/unwrapping data encryption keys. 
+ +#### Azure Key Vault provider + +**Where** `EncryptionKeyWrapMetadata` is configured with type `"akv"` and an Azure Key Vault key URL, **when** the encryption key is used, the SDK SHALL use Azure Key Vault to wrap/unwrap the data encryption key. + +#### Custom key wrap provider + +**Where** a custom `EncryptionKeyWrapProvider` implementation is registered, the SDK SHALL use the custom provider for all key wrap/unwrap operations. + +### Requirement: Transparent Encryption/Decryption + +The SDK SHALL transparently encrypt and decrypt properties without requiring application code changes for standard CRUD operations. + +#### Transparent encryption on write + +**While** a container has an encryption policy configured, **when** `Container.CreateItemAsync(item)` is called, the SDK SHALL automatically encrypt the specified properties before sending to the service. + +#### Transparent decryption on read + +**While** a container has encrypted properties, **when** `Container.ReadItemAsync(id, pk)` is called, the SDK SHALL automatically decrypt the encrypted properties in the returned item. + +### Requirement: Encryption with Cosmos Client Extensions + +The SDK SHALL provide encryption through separate extension packages. + +#### Microsoft.Azure.Cosmos.Encryption package + +**Where** the `Microsoft.Azure.Cosmos.Encryption` NuGet package is referenced, **when** `cosmosClient.WithEncryption(keyEncryptionKeyResolver, KeyEncryptionKeyResolverName.AzureKeyVault)` is called, the SDK SHALL configure the client for client-side encryption with Azure Key Vault. + +#### Microsoft.Azure.Cosmos.Encryption.Custom package + +**Where** the `Microsoft.Azure.Cosmos.Encryption.Custom` NuGet package is referenced, **when** a custom `EncryptionKeyWrapProvider` is configured, the SDK SHALL support encryption with custom key management. + +## Interactions + +- **CRUD Operations**: Encryption is transparent for typed CRUD APIs. See `crud-operations` spec. 
+- **Query**: Deterministic encryption supports equality filters in queries. Randomized encryption does not. See `query-and-linq` spec. +- **Serialization**: Encryption wraps the configured serializer — items are serialized first, then encrypted properties are replaced with ciphertext. See `serialization` spec. +- **Change Feed**: Change feed results are automatically decrypted if the encryption client is used. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/Resource/ClientEncryptionKey/ClientEncryptionKey.cs` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Settings/ClientEncryptionPolicy.cs` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Settings/ClientEncryptionIncludedPath.cs` +- Source: `Microsoft.Azure.Cosmos.Encryption/src/` — Azure Key Vault encryption extension +- Source: `Microsoft.Azure.Cosmos.Encryption.Custom/src/` — Custom encryption extension \ No newline at end of file diff --git a/openspec/specs/consistency-and-session/spec.md b/openspec/specs/consistency-and-session/spec.md new file mode 100644 index 0000000000..60b4bffa90 --- /dev/null +++ b/openspec/specs/consistency-and-session/spec.md @@ -0,0 +1,132 @@ +# Consistency and Session Management + +## Purpose + +The Azure Cosmos DB .NET SDK supports five consistency levels and automatically manages session tokens to ensure read-your-writes guarantees when using session consistency. Consistency can be configured at the account, client, or per-request level, with each level only able to weaken (never strengthen) the parent level's guarantee. 
+ +## Public API Surface + +### Consistency Level Configuration + +```csharp +// Client-level override (weakens account-level) +CosmosClientOptions options = new CosmosClientOptions +{ + ConsistencyLevel = ConsistencyLevel.Session +}; + +// Per-request override +ItemRequestOptions requestOptions = new ItemRequestOptions +{ + ConsistencyLevel = ConsistencyLevel.Eventual +}; +``` + +### Session Token Management + +```csharp +// Automatic: SDK manages session tokens internally + +// Manual: Pass session token from one client to another +ItemResponse writeResponse = await client1Container.CreateItemAsync(item, pk); +string sessionToken = writeResponse.Headers.Session; + +ItemRequestOptions readOptions = new ItemRequestOptions +{ + SessionToken = sessionToken +}; +ItemResponse readResponse = await client2Container.ReadItemAsync(id, pk, readOptions); +``` + +## Requirements + +### Requirement: Consistency Level Configuration + +The SDK SHALL support overriding the account-level consistency at the client and request levels. + +#### Client-level consistency override + +**Where** `CosmosClientOptions.ConsistencyLevel` is set, **when** requests are made through this client, the SDK SHALL use the specified consistency level by default. This MAY only weaken the account-level consistency, never strengthen it. + +#### Per-request consistency override + +**Where** `RequestOptions.ConsistencyLevel` is set on a specific request, **when** that request is made, the SDK SHALL use the per-request consistency level, overriding the client-level setting. + +#### No override (account default) + +**Where** neither client-level nor request-level consistency is set, **when** requests are made, the SDK SHALL use the account's default consistency level. + +#### Configuration precedence + +Per-request `RequestOptions.ConsistencyLevel` > client-wide `CosmosClientOptions.ConsistencyLevel` > account default. 
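The precedence rule above, together with the weaken-only constraint, can be modeled in a few lines. This is a sketch under stated assumptions: `ConsistencySketch` is a local stand-in enum ordered from strongest to weakest (it is not the SDK's `ConsistencyLevel`), and `ConsistencyResolver` is a hypothetical helper, not an SDK type.

```csharp
using System;

// Local stand-in, ordered strongest (0) to weakest (4). Not the SDK enum.
public enum ConsistencySketch
{
    Strong = 0,
    BoundedStaleness = 1,
    Session = 2,
    ConsistentPrefix = 3,
    Eventual = 4,
}

public static class ConsistencyResolver
{
    // Per-request value wins over the client-wide value, which wins over the account default.
    public static ConsistencySketch Resolve(
        ConsistencySketch accountDefault,
        ConsistencySketch? clientLevel,
        ConsistencySketch? requestLevel)
    {
        return requestLevel ?? clientLevel ?? accountDefault;
    }

    // An override is a valid weakening when it is no stronger than the account default.
    public static bool IsValidWeakening(ConsistencySketch accountDefault, ConsistencySketch requested)
    {
        return requested >= accountDefault;
    }
}
```

For example, with an account default of `Strong`, a client-level `Session`, and no per-request value, `Resolve` yields `Session`, and `IsValidWeakening(Strong, Session)` is true while `IsValidWeakening(Eventual, Strong)` is false.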
+ +### Requirement: Consistency Levels + +The SDK SHALL support all five Azure Cosmos DB consistency levels. + +| Level | Guarantee | +|-------|-----------| +| `Strong` | Linearizable reads — always returns the most recent committed write | +| `BoundedStaleness` | Reads lag by at most K versions or T seconds | +| `Session` | Read-your-writes and monotonic reads within a session (default) | +| `ConsistentPrefix` | Reads never see out-of-order writes | +| `Eventual` | Reads eventually converge to latest write | + +### Requirement: Session Token Management + +The SDK SHALL automatically manage session tokens to maintain session consistency guarantees. + +#### Automatic session token capture + +**When** a write operation completes, the SDK SHALL automatically capture the session token from the response and associate it with the container and partition key range. + +#### Automatic session token propagation + +**While** a session token has been captured from a previous write, **when** a subsequent read request is made to the same container, the SDK SHALL automatically include the stored session token in the request header. + +#### Session token per partition + +**When** writes are performed to multiple partitions, the SDK SHALL maintain a separate session token for each partition key range and ensure session consistency is maintained independently per partition. + +#### Manual session token override + +**Where** `RequestOptions.SessionToken` is explicitly set, **when** the request is made, the SDK SHALL use the provided session token instead of the SDK-managed token. + +### Requirement: Session Container + +The SDK SHALL maintain an internal session container for token storage. + +#### Token storage lifecycle + +**While** a `CosmosClient` instance is active, the SDK SHALL store session tokens in memory for the lifetime of the client. 
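The capture, propagation, per-partition, and manual-override scenarios above can be pictured as a small in-memory map. The sketch below is a simplified model, not the SDK's `SessionContainer`: `SessionTokenStoreSketch` is a hypothetical name, tokens are treated as opaque strings, and the newest captured token simply overwrites the previous one (real session tokens carry version information that would need merging).

```csharp
using System.Collections.Concurrent;

// Simplified, hypothetical model of per-partition session token storage.
public sealed class SessionTokenStoreSketch
{
    private readonly ConcurrentDictionary<(string Container, string PartitionKeyRangeId), string> tokens =
        new ConcurrentDictionary<(string Container, string PartitionKeyRangeId), string>();

    // Called after a write completes: remember the token for that container and partition key range.
    public void Capture(string container, string partitionKeyRangeId, string sessionToken)
    {
        this.tokens[(container, partitionKeyRangeId)] = sessionToken;
    }

    // Called before a read: an explicitly supplied token replaces the stored one.
    // Returns null when no token is known for the partition.
    public string Resolve(string container, string partitionKeyRangeId, string explicitToken = null)
    {
        if (explicitToken != null)
        {
            return explicitToken;
        }

        return this.tokens.TryGetValue((container, partitionKeyRangeId), out string stored) ? stored : null;
    }
}
```

Because the map lives on the instance, its contents last exactly as long as the owning object, which mirrors the "lifetime of the client" rule.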
+ +#### Cross-client session continuity + +**When** a session token obtained from one client's response is passed to another client via `RequestOptions.SessionToken`, the SDK SHALL enable the second client to achieve session-consistent reads relative to the first client's writes. + +### Requirement: Consistency Weakening Validation + +The SDK SHALL only allow weakening the account-level consistency, not strengthening it. + +#### Weaken from Strong to Session + +**Where** `CosmosClientOptions.ConsistencyLevel = ConsistencyLevel.Session` is set on an account with Strong consistency, the SDK SHALL use Session consistency as a valid weakening. + +#### Attempt to strengthen + +**If** `CosmosClientOptions.ConsistencyLevel = ConsistencyLevel.Strong` is set on an account with Eventual consistency, **then** the service SHALL reject the request with a 400 (Bad Request) error. + +## Interactions + +- **Retry Policies**: Session token mismatches (404/1002) trigger `ResetSessionTokenRetryPolicy` retries. See `retry-and-failover` spec. +- **Client Configuration**: Consistency is configured via `CosmosClientOptions`. See `client-and-configuration` spec. +- **Cross-Region Hedging**: Session tokens are propagated to hedged requests. See `cross-region-hedging` spec. +- **Transport**: Session token headers are attached in both Gateway and Direct mode. See `transport-and-connectivity` spec. 
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/RequestOptions/RequestOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/SessionContainer.cs` +- Source: `Microsoft.Azure.Cosmos/src/SessionRetryOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/GatewayStoreModel.cs` \ No newline at end of file diff --git a/openspec/specs/container-and-database-management/spec.md b/openspec/specs/container-and-database-management/spec.md new file mode 100644 index 0000000000..2d77eb2de5 --- /dev/null +++ b/openspec/specs/container-and-database-management/spec.md @@ -0,0 +1,119 @@ +# Container and Database Management + +## Purpose + +The Azure Cosmos DB .NET SDK provides APIs for managing databases and containers (the two top-level resource types in the Cosmos DB resource hierarchy). Databases are units of management for containers; containers are the units of scalability for throughput and storage. This spec covers the CRUD operations for these resources and their configuration properties. 
+ +## Public API Surface + +### Database Operations + +| Method | Returns | Purpose | +|--------|---------|---------| +| `CosmosClient.CreateDatabaseAsync` | `DatabaseResponse` | Create a database | +| `CosmosClient.CreateDatabaseIfNotExistsAsync` | `DatabaseResponse` | Create if not exists (200 if exists, 201 if created) | +| `Database.ReadAsync` | `DatabaseResponse` | Read database properties | +| `Database.DeleteAsync` | `DatabaseResponse` | Delete database and all contents | +| `Database.ReadThroughputAsync` | `ThroughputResponse` | Read provisioned throughput | +| `Database.ReplaceThroughputAsync` | `ThroughputResponse` | Update provisioned throughput | +| `CosmosClient.GetDatabase` | `Database` | Get proxy reference (no network call) | + +### Container Operations + +| Method | Returns | Purpose | +|--------|---------|---------| +| `Database.CreateContainerAsync` | `ContainerResponse` | Create a container | +| `Database.CreateContainerIfNotExistsAsync` | `ContainerResponse` | Create if not exists | +| `Container.ReadContainerAsync` | `ContainerResponse` | Read container properties | +| `Container.ReplaceContainerAsync` | `ContainerResponse` | Update container settings | +| `Container.DeleteContainerAsync` | `ContainerResponse` | Delete container and all items | +| `Container.ReadThroughputAsync` | `ThroughputResponse` | Read provisioned throughput | +| `Container.ReplaceThroughputAsync` | `ThroughputResponse` | Update provisioned throughput | +| `Database.GetContainer` | `Container` | Get proxy reference (no network call) | + +### ContainerProperties Key Settings + +| Property | Type | Notes | +|----------|------|-------| +| `Id` | `string` | Container name | +| `PartitionKeyPath` | `string` | Single partition key path (e.g., `"/userId"`) | +| `PartitionKeyPaths` | `IReadOnlyList` | Hierarchical partition key paths | +| `IndexingPolicy` | `IndexingPolicy` | Indexing configuration | +| `DefaultTimeToLive` | `int?` | TTL in seconds (-1 = no expiry, null = 
disabled) | +| `UniqueKeyPolicy` | `UniqueKeyPolicy` | Unique constraints | +| `ConflictResolutionPolicy` | `ConflictResolutionPolicy` | Multi-region conflict handling | +| `ChangeFeedPolicy` | `ChangeFeedPolicy` | Change feed retention (required for AllVersionsAndDeletes) | +| `ComputedProperties` | `Collection<ComputedProperty>` | Server-computed properties | + +### IndexingPolicy + +| Property | Type | Notes | +|----------|------|-------| +| `Automatic` | `bool` | Auto-index all properties (default: `true`) | +| `IndexingMode` | `IndexingMode` | `Consistent` (default), `Lazy`, `None` | +| `IncludedPaths` | `Collection<IncludedPath>` | Paths to index | +| `ExcludedPaths` | `Collection<ExcludedPath>` | Paths to exclude | +| `CompositeIndexes` | `Collection<Collection<CompositePath>>` | Multi-property indexes for ORDER BY | +| `SpatialIndexes` | `Collection<SpatialPath>` | Geospatial indexes | +| `VectorIndexes` | `Collection<VectorIndexPath>` | Vector similarity search indexes (Preview) | + +## Requirements + +### Requirement: Proxy Reference Semantics + +The SDK SHALL return lightweight proxy references that do not validate resource existence. + +**When** `GetDatabase()` or `GetContainer()` is called, the SDK SHALL return a reference without making network calls. Operations on non-existent resources SHALL return 404. + +### Requirement: CreateIfNotExists Idempotency + +The SDK SHALL support idempotent create operations. + +**When** `CreateDatabaseIfNotExistsAsync` or `CreateContainerIfNotExistsAsync` is called, the SDK SHALL return 200 with the existing resource if it already exists, or 201 if newly created. + +### Requirement: Partition Key Immutability + +The SDK SHALL enforce that partition keys cannot be changed after container creation. + +**When** a container is created with a partition key definition, the SDK SHALL NOT allow the partition key to be changed via `ReplaceContainerAsync` or any other operation. + +### Requirement: Indexing Policy Management + +The SDK SHALL support configuring indexing policies on containers.
+ +#### Background re-indexing + +**When** indexing policy changes are applied via `ReplaceContainerAsync`, the SDK SHALL allow the service to trigger background re-indexing as needed. + +#### Vector indexes + +**Where** `IndexingPolicy.VectorIndexes` are configured (Preview), the SDK SHALL support creating vector similarity search indexes with types: Flat, QuantizedFlat, DiskANN. + +### Requirement: Time-to-Live (TTL) + +The SDK SHALL support configuring item expiration via TTL. + +#### Container-level TTL + +**Where** `ContainerProperties.DefaultTimeToLive` is set to a positive value, **when** items are created without an explicit TTL, the SDK SHALL expire items after the specified duration. + +#### Enable without default expiry + +**Where** `ContainerProperties.DefaultTimeToLive = -1`, the SDK SHALL enable TTL without a default expiry. Individual items SHALL set their own TTL via the `ttl` property. + +#### Item-level TTL override + +**While** a container has default TTL enabled, **when** an item is created with `ttl = -1`, the SDK SHALL ensure that item never expires. + +### Requirement: Throughput Management + +The SDK SHALL support provisioning throughput at database or container level. + +**When** `ReadThroughputAsync` or `ReplaceThroughputAsync` is called, the SDK SHALL manage throughput using `ThroughputProperties`, supporting both manual (`CreateManualThroughput()`) and autoscale (`CreateAutoscaleThroughput()`) modes. 
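The TTL scenarios in the Time-to-Live requirement above combine a container default with an optional per-item value. The sketch below is one way to model that combination. `TtlSketch` is a hypothetical helper, and the rule that an item-level `ttl` has no effect when the container TTL is `null` is an assumption taken from the service's documented behavior rather than from this spec. Expiry itself is performed by the service, not by client code.

```csharp
// Hypothetical model of how container-level and item-level TTL combine.
public static class TtlSketch
{
    // Returns the effective lifetime in seconds, or null when the item never expires.
    public static int? EffectiveTtlSeconds(int? containerDefaultTtl, int? itemTtl)
    {
        // TTL disabled on the container: item-level values are ignored (assumption, see lead-in).
        if (containerDefaultTtl is null)
        {
            return null;
        }

        // An explicit item value overrides the container default; -1 means "never expire".
        if (itemTtl is int explicitTtl)
        {
            return explicitTtl == -1 ? (int?)null : explicitTtl;
        }

        // No item value: fall back to the container default; -1 means "enabled, no default expiry".
        return containerDefaultTtl == -1 ? (int?)null : containerDefaultTtl;
    }
}
```

For instance, a container default of 3600 with no item value gives 3600 seconds, while the same container with an item `ttl` of -1 gives an item that never expires.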
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/Resource/Database/` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Container/` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Settings/ContainerProperties.cs` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Settings/IndexingPolicy.cs` \ No newline at end of file diff --git a/openspec/specs/cross-region-hedging/spec.md b/openspec/specs/cross-region-hedging/spec.md new file mode 100644 index 0000000000..895a33a650 --- /dev/null +++ b/openspec/specs/cross-region-hedging/spec.md @@ -0,0 +1,223 @@ +# Cross-Region Hedging + +## Purpose + +The cross-region hedging availability strategy allows the SDK to send redundant parallel requests to multiple regions when the primary region is slow. This reduces tail latency and improves availability during regional degradation. When configured, the SDK fires hedge requests to secondary regions after a configurable threshold, returning the first final response. + +## Public API Surface + +### AvailabilityStrategy Factory + +```csharp +public abstract class AvailabilityStrategy +{ + // Create a cross-region hedging strategy + public static AvailabilityStrategy CrossRegionHedgingStrategy( + TimeSpan threshold, + TimeSpan? 
thresholdStep = null, + bool enableMultiWriteRegionHedge = false); + + // Disable hedging for a specific request (overrides client-level) + public static AvailabilityStrategy DisabledStrategy(); +} +``` + +### Configuration + +**Client-level:** +```csharp +CosmosClientOptions options = new CosmosClientOptions +{ + ApplicationPreferredRegions = new List<string> { "East US", "West US", "Central US" }, + AvailabilityStrategy = AvailabilityStrategy.CrossRegionHedgingStrategy( + threshold: TimeSpan.FromMilliseconds(1500), + thresholdStep: TimeSpan.FromMilliseconds(1000)) +}; +``` + +**Request-level override:** +```csharp +ItemRequestOptions requestOptions = new ItemRequestOptions +{ + AvailabilityStrategy = AvailabilityStrategy.DisabledStrategy() // Disable for this request +}; +``` + +## Requirements + +### Requirement: Eligibility + +The SDK **SHALL** disable hedging and send the request directly when any disqualifying condition is met. + +#### Non-Document Resource Type + +**When** the operation targets a resource type other than `ResourceType.Document` (e.g., Database, Container, StoredProcedure, or other metadata resources), the SDK **SHALL NOT** hedge the request. + +#### Write Operations Without Multi-Master + +**When** the operation is a write, the SDK **SHALL** hedge the request only **if** `enableMultiWriteRegionHedge = true` **AND** the account supports multiple write locations. **If** either condition is not met, the SDK **SHALL** send the write to the primary region without hedging. Read operations **SHALL** always be eligible for hedging. + +#### Single Region + +**When** `GlobalEndpointManager.ReadEndpoints.Count == 1`, the SDK **SHALL NOT** hedge the request, as there is no secondary region available. + +#### No Preferred Regions + +**When** neither `ApplicationRegion` nor `ApplicationPreferredRegions` is configured, the SDK **SHALL NOT** hedge the request, as it cannot determine target regions.
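The four eligibility conditions above amount to a single predicate. The sketch below is a hypothetical model of it: `HedgeRequestSketch` and `HedgeEligibility` are illustration-only names, and the record's fields stand in for information the SDK would read from the request, the account, and the client options.

```csharp
// Hypothetical snapshot of the inputs the eligibility check depends on.
public sealed record HedgeRequestSketch(
    bool IsDocumentResource,
    bool IsWrite,
    bool EnableMultiWriteRegionHedge,
    bool AccountHasMultipleWriteLocations,
    int ReadEndpointCount,
    bool HasPreferredRegions);

public static class HedgeEligibility
{
    public static bool ShouldHedge(HedgeRequestSketch request)
    {
        // Non-document resources (databases, containers, and so on) are never hedged.
        if (!request.IsDocumentResource)
        {
            return false;
        }

        // Writes are hedged only when opted in AND the account has multiple write locations.
        if (request.IsWrite
            && !(request.EnableMultiWriteRegionHedge && request.AccountHasMultipleWriteLocations))
        {
            return false;
        }

        // A single read endpoint leaves no secondary region to hedge to.
        // The spec states the Count == 1 case; treating 0 the same way is an assumption.
        if (request.ReadEndpointCount <= 1)
        {
            return false;
        }

        // Without preferred regions the target regions cannot be determined.
        return request.HasPreferredRegions;
    }
}
```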
+ +### Requirement: Resolution Priority + +The SDK **SHALL** resolve the availability strategy by applying request-level configuration over client-level configuration. + +#### Request-Level Override + +**When** `RequestOptions.AvailabilityStrategy` is non-null, the SDK **SHALL** use it as the effective strategy, ignoring `CosmosClientOptions.AvailabilityStrategy`. + +#### Client-Level Fallback + +**When** `RequestOptions.AvailabilityStrategy` is null, the SDK **SHALL** fall back to `CosmosClientOptions.AvailabilityStrategy`. + +#### Disabled Strategy Override + +**When** `DisabledStrategy()` is set on a request, the SDK **SHALL** disable hedging for that request even if client-level hedging is configured. + +### Requirement: Request Timing + +The SDK **SHALL** fire hedge requests according to the configured threshold schedule. + +#### Hedge Scheduling + +**Given** `N` regions in `ApplicationPreferredRegions`, a configured `threshold`, and a configured `thresholdStep`, and with `T` denoting the time elapsed since the primary request was sent, the SDK **SHALL** fire requests as follows: + +| Request | Fired At | Target Region | Retry Scope | +|---------|----------|--------------|-------------| +| Primary (0) | T = 0 | 1st preferred region | Full cross-region retries | +| Hedge 1 | T = threshold | 2nd preferred region | Local retries only | +| Hedge 2 | T = threshold + thresholdStep | 3rd preferred region | Local retries only | +| Hedge N-1 | T = threshold + (N-2) x thresholdStep | Nth preferred region | Local retries only | + +**Example** (3 regions, threshold=100ms, step=50ms): +- T=0ms: Send to East US (primary) +- T=100ms: No response - send hedge to West US +- T=150ms: No response - send hedge to Central US +- First final response wins + +#### Primary vs Hedge Differences + +The SDK **SHALL** differentiate primary and hedge requests as follows: + +| Aspect | Primary (request 0) | Hedge (request 1+) | +|--------|---------------------|---------------------| +| `ExcludeRegions` | None | All regions except target | +| Cross-region retry | Full
`ClientRetryPolicy` retry | Local retries only | +| Handler pipeline | Independent instance | Independent instance | +| Cancellation | Cancelled when any hedge returns final | Cancelled when any other returns final | + +### Requirement: Cancellation + +The SDK **SHALL** cancel all remaining in-flight requests as soon as a final response is received. + +#### Linked Cancellation Token + +**While** hedged requests are in flight, the SDK **SHALL** link all requests (primary + hedges) through a shared `CancellationTokenSource`. + +#### Final Response Triggers Cancellation + +**When** any request returns a **final** response, the SDK **SHALL** call `hedgeRequestsCancellationTokenSource.Cancel()` so that all other in-flight requests observe cancellation immediately through the linked token. + +#### Request Body Cloning + +**Before** dispatching hedge requests, the SDK **SHALL** clone the request `Content` stream to a `CloneableStream` once. All hedges **SHALL** share this clone. + +#### Cloned Request Disposal + +**After** each hedged request completes execution, the SDK **SHALL** dispose its cloned request in a `finally` block. + +### Requirement: Final vs Non-Final Response Classification + +The SDK **SHALL** classify every response as either final or non-final to determine whether to return it or continue waiting. 
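The status-code lists in this requirement can be condensed into one predicate. The sketch below is a hypothetical helper (`HedgeResponseClassifier` is not an SDK type). Codes that the requirement does not list, and 404 responses with a substatus other than 0, are treated as non-final here; that default is an assumption of the sketch, not something the spec states.

```csharp
// Hypothetical classifier mirroring the final / non-final status-code lists.
public static class HedgeResponseClassifier
{
    public static bool IsFinal(int statusCode, int subStatusCode = 0)
    {
        // All 1xx, 2xx, and 3xx responses are final.
        if (statusCode < 400)
        {
            return true;
        }

        return statusCode switch
        {
            400 or 401 or 405 or 409 or 412 or 413 => true,
            // 404/0 means the document is truly absent; 404/1002 (session not available) is transient.
            404 => subStatusCode == 0,
            // 408, 429, 500, 503, and anything unlisted: keep waiting (assumption for unlisted codes).
            _ => false,
        };
    }
}
```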
+ +#### Final Responses + +**When** a response has any of the following status codes, the SDK **SHALL** treat it as final, return it immediately, and cancel all other in-flight requests: + +- All 1xx, 2xx, 3xx status codes +- 400 (Bad Request) +- 401 (Unauthorized) +- 404/0 (Not Found - document truly absent) +- 405 (Method Not Allowed) +- 409 (Conflict) +- 412 (Precondition Failed) +- 413 (Request Entity Too Large) + +#### Non-Final Responses + +**When** a response has any of the following status codes, the SDK **SHALL** treat it as non-final (transient) and continue waiting for other responses: + +- 408 (Request Timeout) +- 404/1002 (Session Not Available) +- 429 (Too Many Requests) +- 500 (Internal Server Error) +- 503 (Service Unavailable) + +#### Accelerated Hedge on Non-Final + +**When** a non-final result arrives **and** there are still pending requests, the SDK **SHALL** skip any remaining threshold wait and immediately fire the next hedge. + +### Requirement: Thread Safety + +The SDK **SHALL** ensure safe concurrent execution of hedged requests. + +#### Independent Handler Pipelines + +**While** processing hedged requests, the SDK **SHALL** route each hedged request through its own independent handler pipeline instance. + +#### Cancellation as Synchronization + +The SDK **SHALL** use the linked `CancellationTokenSource` as the sole synchronization mechanism between parallel requests. + +#### First-Final-Wins Selection + +**When** multiple requests complete, the SDK **SHALL** use `Task.WhenAny` semantics - the first task to complete with a final result wins. + +#### Linked CTS Token Distribution (Race Condition Fix #5613) + +The SDK **SHALL** pass the linked CTS token (not the application's original token) to all requests, preventing abandoned tasks from accessing disposed request objects. 
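The first-final-wins selection and the linked-token rule can be sketched with plain tasks. This is a simplified model, not the SDK's implementation: `FirstFinalWinsSketch` is a hypothetical name, all attempts are started at once (the real strategy staggers them by threshold), and disposal of the linked source is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class FirstFinalWinsSketch
{
    // Runs all attempts and returns the first result classified as final.
    // If every attempt is non-final, the last result observed is returned.
    public static async Task<T> RaceAsync<T>(
        IReadOnlyList<Func<CancellationToken, Task<T>>> attempts,
        Func<T, bool> isFinal,
        CancellationToken applicationToken)
    {
        // Every attempt observes the linked token, never the caller's token directly.
        CancellationTokenSource linked = CancellationTokenSource.CreateLinkedTokenSource(applicationToken);

        List<Task<T>> inFlight = new List<Task<T>>();
        foreach (Func<CancellationToken, Task<T>> attempt in attempts)
        {
            inFlight.Add(attempt(linked.Token));
        }

        T last = default;
        try
        {
            while (inFlight.Count > 0)
            {
                Task<T> completed = await Task.WhenAny(inFlight);
                inFlight.Remove(completed);
                last = await completed;
                if (isFinal(last))
                {
                    return last;
                }
            }

            return last;
        }
        finally
        {
            // Signal every remaining attempt to stop once a winner is chosen.
            linked.Cancel();
        }
    }
}
```

Handing out the linked token rather than the application token is what lets the winner stop the losers without the caller's token ever being cancelled.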
+ +## Configuration + +### Parameters + +| Parameter | Type | Validation | Default | +|-----------|------|-----------|---------| +| `threshold` | `TimeSpan` | Must be > `TimeSpan.Zero` | Required | +| `thresholdStep` | `TimeSpan?` | Must be > `TimeSpan.Zero` if provided | `null` (no additional hedges) | +| `enableMultiWriteRegionHedge` | `bool` | - | `false` | + +### SDK Default Strategy (Internal) + +For PPAF-enabled clients, the SDK applies a default hedging strategy: +- Threshold: `min(1000ms, RequestTimeout / 2)` +- ThresholdStep: 500ms +- This is internal and not configurable by customers. + +## Edge Cases + +1. **ThresholdStep = null**: Only the primary and one hedge request are sent (at `threshold`). No additional hedges. +2. **All responses non-final**: The SDK returns the last response received after all regions are exhausted. +3. **Request body cloning**: The request `Content` stream is cloned to a `CloneableStream` once. All hedges share this clone. +4. **Disposed client**: `RequestInvokerHandler` validates client state before hedging. Disposed clients return error immediately. + +## Interactions + +- **Handler Pipeline**: Hedging executes in `RequestInvokerHandler` (position #1). Each hedge traverses its own pipeline. See `handler-pipeline` spec. +- **Retry Policies**: Primary requests get full `ClientRetryPolicy` behavior (cross-region retries). Hedged requests use `ExcludeRegions` to restrict to local retries only. See `retry-and-failover` spec. +- **PPAF**: SDK-default hedging strategy is applied automatically for PPAF-enabled clients with reduced thresholds. 
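The schedule from the Request Timing requirement, the parameter validation above, and the `thresholdStep = null` edge case can be combined into one small function. `HedgeScheduleSketch` is a hypothetical helper that only computes fire-time offsets; it does not send requests and is not the SDK's scheduling code.

```csharp
using System;
using System.Collections.Generic;

public static class HedgeScheduleSketch
{
    // Returns fire-time offsets from the start of the operation:
    // index 0 is the primary request, index i >= 1 is hedge i.
    public static IReadOnlyList<TimeSpan> Compute(int regionCount, TimeSpan threshold, TimeSpan? thresholdStep = null)
    {
        if (threshold <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(threshold));
        }

        if (thresholdStep.HasValue && thresholdStep.Value <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(thresholdStep));
        }

        List<TimeSpan> offsets = new List<TimeSpan>();
        if (regionCount >= 1)
        {
            offsets.Add(TimeSpan.Zero); // primary
        }

        if (regionCount >= 2)
        {
            offsets.Add(threshold); // hedge 1
        }

        // Further hedges exist only when a step is configured.
        if (thresholdStep.HasValue)
        {
            for (int hedge = 2; hedge < regionCount; hedge++)
            {
                offsets.Add(TimeSpan.FromTicks(threshold.Ticks + ((hedge - 1) * thresholdStep.Value.Ticks)));
            }
        }

        return offsets;
    }
}
```

With three regions, a 100 ms threshold, and a 50 ms step, the offsets are 0 ms, 100 ms, and 150 ms, matching the worked example in the Hedge Scheduling section. With a `null` step the same call yields only 0 ms and 100 ms.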
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/Routing/AvailabilityStrategy/AvailabilityStrategy.cs` +- Source: `Microsoft.Azure.Cosmos/src/Routing/AvailabilityStrategy/CrossRegionHedgingAvailabilityStrategy.cs` +- Source: `Microsoft.Azure.Cosmos/src/Handler/RequestInvokerHandler.cs` +- Design: `docs/Cross Region Request Hedging.md` +- Fix: PR #5613 (race condition in hedge cancellation) \ No newline at end of file diff --git a/openspec/specs/crud-operations/spec.md b/openspec/specs/crud-operations/spec.md new file mode 100644 index 0000000000..378d6f6077 --- /dev/null +++ b/openspec/specs/crud-operations/spec.md @@ -0,0 +1,114 @@ +# CRUD Operations + +## Purpose + +The Azure Cosmos DB .NET SDK provides item-level CRUD (Create, Read, Replace, Upsert, Delete) operations on containers. Every data-plane interaction with items flows through these APIs, which exist in two variants: a typed API that serializes/deserializes to `T` and a stream API that works with raw `Stream` payloads. 
+ +## Public API Surface + +### Container Item Methods + +| Method | Returns | Success Code | Purpose | +|--------|---------|-------------|---------| +| `CreateItemAsync<T>` | `ItemResponse<T>` | 201 | Create a new item | +| `CreateItemStreamAsync` | `ResponseMessage` | 201 | Create (stream) | +| `ReadItemAsync<T>` | `ItemResponse<T>` | 200 | Read by id + partition key | +| `ReadItemStreamAsync` | `ResponseMessage` | 200 | Read (stream) | +| `ReplaceItemAsync<T>` | `ItemResponse<T>` | 200 | Full item replacement | +| `ReplaceItemStreamAsync` | `ResponseMessage` | 200 | Replace (stream) | +| `UpsertItemAsync<T>` | `ItemResponse<T>` | 200 or 201 | Create-or-replace | +| `UpsertItemStreamAsync` | `ResponseMessage` | 200 or 201 | Upsert (stream) | +| `DeleteItemAsync<T>` | `ItemResponse<T>` | 204 | Delete by id + partition key | +| `DeleteItemStreamAsync` | `ResponseMessage` | 204 | Delete (stream) | +| `ReadManyItemsAsync<T>` | `FeedResponse<T>` | 200 | Batch read multiple items | +| `ReadManyItemsStreamAsync` | `ResponseMessage` | 200 | Batch read (stream) | + +### Typed vs Stream API + +| Aspect | Typed (`ItemResponse<T>`) | Stream (`ResponseMessage`) | +|--------|--------------------------|---------------------------| +| Error handling | Throws `CosmosException` on failure | Returns status code; caller checks `IsSuccessStatusCode` | +| Partition key | Optional for writes (auto-extracted from item) | Mandatory parameter | +| Deserialization | Automatic via container serializer | None; raw stream | +| Use case | Application code | Performance-critical paths, proxying | + +## Requirements + +### Requirement: Partition Key Handling + +1. **When** the caller invokes a Read or Delete operation, the SDK **shall** require a partition key parameter. +2. **When** the caller invokes a typed Create, Replace, or Upsert operation with a `null` partition key, the SDK **shall** extract the partition key from the serialized item using the container's partition key path definition. +3.
**When** the caller invokes a stream Create, Replace, or Upsert operation, the SDK **shall** require a partition key parameter (the SDK cannot extract from an opaque stream without the caller's serializer context). +4. **Where** auto-extraction fails due to a stale partition key definition cache, the SDK **shall** refresh the cache and retry extraction via `PartitionKeyMismatchRetryPolicy`. + +### Requirement: Conditional Operations (ETags) + +| Operation | `IfMatchEtag` | `IfNoneMatchEtag` | +|-----------|--------------|-------------------| +| Create | Ignored | N/A | +| Read | N/A | Supported (returns 304 if match) | +| Replace | Supported (412 on mismatch) | N/A | +| Upsert | Respected only during replace phase | Respected only during replace phase | +| Delete | Supported (412 on mismatch) | N/A | + +1. **When** a Replace or Delete request includes `IfMatchEtag` and the server-side ETag does not match, the SDK **shall** return status code 412 (Precondition Failed). +2. **When** a Read request includes `IfNoneMatchEtag` and the server-side ETag matches, the SDK **shall** return status code 304 (Not Modified). +3. **When** an Upsert request includes `IfMatchEtag` or `IfNoneMatchEtag`, the SDK **shall** apply the condition only during the replace phase of the upsert. +4. **When** a Create request includes `IfMatchEtag`, the SDK **shall** ignore it. + +### Requirement: Response Behavior + +1. **When** `EnableContentResponseOnWrite` is set to `false` on `ItemRequestOptions`, write operations (Create, Replace, Upsert) **shall** return `null` for `Resource`, reducing network payload. Delete **shall** always return `null` for `Resource`. +2. **When** an Upsert operation succeeds, the SDK **shall** return status code 201 if the item was created or 200 if the item was replaced. Callers MUST check `StatusCode` to distinguish the outcome. +3. 
**When** a Delete operation succeeds, `Resource` **shall** always be `null` and the content stream **shall** be empty, regardless of options. +4. **When** a ReadMany operation is invoked, the SDK **shall** return partial results. Items not found **shall** be silently omitted — the operation **shall not** fail if some items are missing. + +### Requirement: Error Handling + +| Code | Substatus | Meaning | Operations | +|------|-----------|---------|-----------| +| 400 | — | Bad request (invalid PK, malformed item) | All | +| 404 | 0 | Item not found | Read, Replace, Delete | +| 409 | 0 | Conflict (item already exists) | Create | +| 412 | 0 | Precondition failed (ETag mismatch) | Replace, Delete, Upsert (replace phase) | +| 413 | — | Item exceeds size limit (2 MB) | Create, Replace, Upsert | +| 429 | — | Rate limited | All (handled by retry policy) | + +1. **When** the request payload is invalid (bad partition key, malformed item), the SDK **shall** return status code 400. +2. **When** a Read, Replace, or Delete targets an item that does not exist, the SDK **shall** return status code 404. +3. **When** a Create targets an item whose id already exists in the partition, the SDK **shall** return status code 409. +4. **When** a write operation exceeds the 2 MB item size limit, the SDK **shall** return status code 413. +5. **When** the service returns 429 (rate limited), the SDK **shall** retry the request according to the configured retry policy. + +### Requirement: Server-Side Triggers + +1. **When** `ItemRequestOptions.PreTriggers` or `PostTriggers` are specified on a write operation, the SDK **shall** include the trigger names in the request so they execute server-side. +2. **Where** triggers are configured, they **shall** execute within the same transaction as the operation. 
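The auto-extraction rule in the Partition Key Handling requirement (typed writes with a `null` partition key) can be pictured as a path walk over the serialized item. The sketch below uses `System.Text.Json` for that walk. `PartitionKeyExtractionSketch` is a hypothetical helper, it handles a single path only, and returning `null` for a missing path is a simplification of this sketch rather than the SDK's behavior.

```csharp
using System;
using System.Text.Json;

public static class PartitionKeyExtractionSketch
{
    // Walks a path such as "/address/zip" through the serialized item.
    // Returns the value as text, or null when the path is absent.
    public static string ExtractAsText(string itemJson, string partitionKeyPath)
    {
        using JsonDocument document = JsonDocument.Parse(itemJson);
        JsonElement current = document.RootElement;

        foreach (string segment in partitionKeyPath.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            if (current.ValueKind != JsonValueKind.Object
                || !current.TryGetProperty(segment, out JsonElement next))
            {
                return null;
            }

            current = next;
        }

        return current.ValueKind == JsonValueKind.String ? current.GetString() : current.GetRawText();
    }
}
```

This also shows why the stream API cannot do the same thing on the caller's behalf: the walk needs the item in a known serialized shape, which an opaque stream does not guarantee.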
+ +## Configuration + +### ItemRequestOptions + +| Property | Type | Effect | +|----------|------|--------| +| `IfMatchEtag` | `string` | Conditional write; 412 if ETag doesn't match | +| `IfNoneMatchEtag` | `string` | Conditional read; 304 if ETag matches | +| `EnableContentResponseOnWrite` | `bool?` | `false` = null Resource on writes | +| `IndexingDirective` | `IndexingDirective?` | Include/Exclude from indexing | +| `ConsistencyLevel` | `ConsistencyLevel?` | Override account default | +| `SessionToken` | `string` | For session consistency | +| `PreTriggers` / `PostTriggers` | `IEnumerable` | Server-side trigger names | + +## Interactions + +- **Handler Pipeline**: All CRUD operations flow through the full handler pipeline (`RequestInvokerHandler` → ... → `TransportHandler`). See `handler-pipeline` spec. +- **Retry Policies**: Transient failures (429, 503, etc.) are retried per the `retry-and-failover` spec. +- **Serialization**: Typed APIs use the container's `CosmosSerializer`. See `serialization` spec. +- **Partition Keys**: Routing depends on partition key. See `partition-keys` spec. 
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/Resource/Container/ContainerCore.Items.cs` +- Source: `Microsoft.Azure.Cosmos/src/RequestOptions/ItemRequestOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Response.cs` +- Design: `docs/SdkDesign.md` \ No newline at end of file diff --git a/openspec/specs/diagnostics-and-observability/spec.md b/openspec/specs/diagnostics-and-observability/spec.md new file mode 100644 index 0000000000..9b95018965 --- /dev/null +++ b/openspec/specs/diagnostics-and-observability/spec.md @@ -0,0 +1,399 @@ +# Diagnostics and Observability + +## Purpose + +The Azure Cosmos DB .NET SDK provides comprehensive diagnostics and observability through three layers: `CosmosDiagnostics` for per-request diagnostic information, a hierarchical `Trace` system for structured request telemetry, and OpenTelemetry integration for distributed tracing and metrics. Together these enable debugging latency issues, understanding retry behavior, and monitoring SDK health in production. 
+ +## Public API Surface + +### CosmosDiagnostics + +Every response exposes diagnostics: + +```csharp +// MyItem is a placeholder for the caller's item type. +ItemResponse<MyItem> response = await container.ReadItemAsync<MyItem>(id, pk); +CosmosDiagnostics diagnostics = response.Diagnostics; + +TimeSpan elapsed = diagnostics.GetClientElapsedTime(); +IReadOnlyList<(string regionName, Uri uri)> regions = diagnostics.GetContactedRegions(); +string fullDiagnostics = diagnostics.ToString(); // JSON, lazily materialized +string summaryDiagnostics = diagnostics.ToString(DiagnosticsVerbosity.Summary); // Compacted summary +``` + +| Method | Returns | Notes | +|--------|---------|-------| +| `GetClientElapsedTime()` | `TimeSpan` | End-to-end client-side elapsed time | +| `GetStartTimeUtc()` | `DateTime?` | Request start time in UTC | +| `GetContactedRegions()` | `IReadOnlyList<(string, Uri)>` | Unique regions contacted (no ordering guarantee) | +| `GetFailedRequestCount()` | `int` | Count of failed sub-requests (retries) | +| `GetQueryMetrics()` | `ServerSideCumulativeMetrics` | Server-side query metrics (query operations only; null for others) | +| `ToString()` | `string` | Full JSON diagnostic tree, lazily materialized on first call | +| `ToString(DiagnosticsVerbosity)` | `string` | Diagnostics at the requested verbosity level | + +### DiagnosticsVerbosity Enum + +```csharp +namespace Microsoft.Azure.Cosmos +{ + /// <summary> + /// Controls the level of detail in CosmosDiagnostics serialized output. + /// </summary> + public enum DiagnosticsVerbosity + { + /// <summary> + /// Full diagnostic output with all individual request traces. + /// This is the default and preserves backward compatibility. + /// </summary> + Detailed = 0, + + /// <summary> + /// Compacted diagnostic output optimized for log size constraints. + /// Groups requests by region. Keeps first and last request in full detail. + /// Deduplicates middle requests by (StatusCode, SubStatusCode) with + /// aggregate statistics (count, total RU, min/max/P50 latency). + /// Respects MaxDiagnosticsSummarySizeBytes limit. 
+ /// </summary> + Summary = 1, + } +} +``` + +### CosmosClientOptions — Diagnostics Verbosity + +```csharp +public class CosmosClientOptions +{ + /// <summary> + /// Gets or sets the default verbosity for CosmosDiagnostics serialization. + /// Default: <see cref="DiagnosticsVerbosity.Detailed"/>. + /// Can also be set via the AZURE_COSMOS_DIAGNOSTICS_VERBOSITY environment variable. + /// </summary> + public DiagnosticsVerbosity DiagnosticsVerbosity { get; set; } = DiagnosticsVerbosity.Detailed; + + /// <summary> + /// Gets or sets the maximum size in bytes for Summary mode diagnostic output. + /// If the summary output exceeds this limit, a truncated indicator is returned. + /// Default: 8192 (8 KB). Minimum: 4096 (4 KB). + /// Can also be set via the AZURE_COSMOS_DIAGNOSTICS_MAX_SUMMARY_SIZE environment variable. + /// </summary> + public int MaxDiagnosticsSummarySizeBytes { get; set; } = 8192; +} +``` + +### OpenTelemetry Activity Sources + +| Source Name | Purpose | +|-------------|---------| +| `Azure.Cosmos.Operation` | Operation-level activities (spans) with semantic conventions | +| `Azure-Cosmos-Operation-Request-Diagnostics` | EventSource for events with request diagnostics JSON | + +### OpenTelemetry Meters + +| Meter Name | Purpose | +|------------|---------| +| `Azure.Cosmos.Client.Operation` | Operation-level metrics (duration, payload sizes) | +| `Azure.Cosmos.Client.Request` | Network-level request metrics | + +## Requirements + +### Requirement: Diagnostics Lifecycle + +**Scenario: Lazy materialization of diagnostics** +When `ToString()` is called on a `CosmosDiagnostics` instance, +the system shall serialize the trace tree to JSON only at that point, avoiding allocation overhead for operations where diagnostics are not inspected. + +**Scenario: Diagnostics always attached to responses** +While the SDK is processing any operation, +`ResponseMessage.Diagnostics` shall always be available, defaulting to `NoOpTrace` if no trace was captured. 
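
The lazy-materialization scenario above can be pictured with a small stand-in. This is a minimal sketch under stated assumptions, not the SDK's implementation: `LazyDiagnostics` and its `SerializationCount` counter are hypothetical names introduced for illustration, and the real `CosmosDiagnostics` may defer and cache its output differently. The sketch only shows serialization being postponed until the first `ToString()` call and reused afterwards.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a diagnostics object; not the SDK's CosmosDiagnostics.
public sealed class LazyDiagnostics
{
    private readonly Lazy<string> serialized;
    private int serializationCount;

    public LazyDiagnostics(Func<string> serializeTraceTree)
    {
        // ExecutionAndPublication runs the factory at most once, even with concurrent callers.
        this.serialized = new Lazy<string>(
            () =>
            {
                Interlocked.Increment(ref this.serializationCount);
                return serializeTraceTree();
            },
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Number of times the (assumed expensive) serializer has actually run.
    public int SerializationCount => Volatile.Read(ref this.serializationCount);

    // Serialization happens here, on first use, and the result is reused afterwards.
    public override string ToString() => this.serialized.Value;
}

public static class LazyDiagnosticsDemo
{
    public static void Main()
    {
        LazyDiagnostics diagnostics = new LazyDiagnostics(() => "{\"name\":\"ReadItemAsync\"}");
        Console.WriteLine(diagnostics.SerializationCount); // 0: nothing serialized yet

        string first = diagnostics.ToString();
        string second = diagnostics.ToString();
        Console.WriteLine(diagnostics.SerializationCount); // 1: serialized once
        Console.WriteLine(ReferenceEquals(first, second)); // True: same cached string
    }
}
```

An operation whose diagnostics are never inspected pays only for constructing the wrapper, which is the allocation saving the scenario describes.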
+ +**Scenario: Immutability after materialization** +When the trace tree is serialized, +the system shall freeze the tree (walked state set recursively) before serialization to ensure consistency. + +**Scenario: Thread-safe region tracking** +While multiple threads are recording diagnostics concurrently, +`TraceSummary` shall use `HashSet` under lock for region deduplication and `Interlocked.Increment` for failure counting to guarantee thread safety. + +### Requirement: OpenTelemetry Distributed Tracing + +**Scenario: Distributed tracing disabled by default in GA** +When the SDK is built as a GA release, +`CosmosClientTelemetryOptions.DisableDistributedTracing` shall default to `true`. + +**Scenario: Distributed tracing enabled by default in Preview** +When the SDK is built as a Preview release, +`CosmosClientTelemetryOptions.DisableDistributedTracing` shall default to `false`. + +**Scenario: No overhead when tracing is disabled** +While distributed tracing is disabled, +the system shall not create any `Activity` objects. + +**Scenario: Semantic conventions compliance** +When distributed tracing is enabled, +activities shall follow [OpenTelemetry Cosmos DB semantic conventions](https://opentelemetry.io/docs/specs/semconv/database/cosmosdb/). + +**Scenario: Activity kind selection** +When an operation uses Gateway mode, +the activity kind shall be `ActivityKind.Internal`. +When an operation uses Direct mode, +the activity kind shall be `ActivityKind.Client`. 
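
Activities from the operation-level source can be observed without an OpenTelemetry exporter by registering a plain `System.Diagnostics.ActivityListener`. The following is a hedged sketch: the source name `Azure.Cosmos.Operation` comes from the table earlier in this spec, while `ActivityListenerSketch`, `Listen`, and the stand-in `ActivitySource` in `Main` are illustrative assumptions used so the example runs without a Cosmos account. With the real SDK, activities would only appear once distributed tracing is enabled.

```csharp
using System;
using System.Diagnostics;

public static class ActivityListenerSketch
{
    public const string OperationSourceName = "Azure.Cosmos.Operation";

    // Registers a listener that samples every activity from the operation source.
    public static ActivityListener Listen(Action<Activity> onStopped)
    {
        ActivityListener listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == OperationSourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
            ActivityStopped = onStopped,
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }

    public static void Main()
    {
        using ActivityListener listener = Listen(activity =>
            Console.WriteLine($"{activity.OperationName} kind={activity.Kind} elapsed={activity.Duration.TotalMilliseconds} ms"));

        // Stand-in source that mimics the SDK's source name for demonstration only.
        using ActivitySource standIn = new ActivitySource(OperationSourceName);
        using (Activity activity = standIn.StartActivity("read_item", ActivityKind.Client))
        {
            // StartActivity returns null when no listener samples the activity.
            activity?.SetTag("db.system.name", "cosmosdb");
        }
    }
}
```

Printing `activity.Kind` is a quick way to check the activity kind selection scenario against a Gateway-mode and a Direct-mode client.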
+ +### Requirement: Telemetry Events + +When distributed tracing is enabled, three event types are emitted: + +| Event | Level | Trigger | Payload | +|-------|-------|---------|---------| +| **FailedRequest** | Error | Non-success status code (≥300, excluding 404/0, 304/0, 409/0, 412/0) | Full diagnostics JSON | +| **LatencyOverThreshold** | Warning | Latency exceeds threshold OR RU/payload exceeds configured limits | Full diagnostics JSON | +| **Exception** | Error | Any exception during operation | Exception diagnostics | + +**Scenario: Failed request event emission** +When an operation returns a non-success status code (≥300, excluding 404/0, 304/0, 409/0, 412/0), +the system shall emit a **FailedRequest** event at Error level with the full diagnostics JSON as payload. + +**Scenario: Latency over threshold event emission** +When an operation's latency exceeds the configured threshold, or the RU charge or payload size exceeds configured limits, +the system shall emit a **LatencyOverThreshold** event at Warning level with the full diagnostics JSON as payload. + +**Scenario: Exception event emission** +When any exception occurs during an operation, +the system shall emit an **Exception** event at Error level with the exception diagnostics as payload. + +### Requirement: Diagnostics Verbosity + +**Scenario: Default verbosity is Detailed** +If no explicit verbosity is configured, +the system shall default to `DiagnosticsVerbosity.Detailed`, preserving full backward-compatible trace output. + +**Scenario: Parameterless ToString always returns Detailed** +When `ToString()` (parameterless) is called on a `CosmosDiagnostics` instance, +the system shall always return the full `Detailed` trace output, regardless of `CosmosClientOptions.DiagnosticsVerbosity` setting. + +**Scenario: In-memory trace tree unchanged by verbosity** +When any verbosity mode is selected, +the system shall leave the in-memory `ITrace` tree and `ClientSideRequestStatisticsTraceDatum` data unchanged. 
Compaction shall only occur at serialization time. + +**Scenario: Summary mode region grouping** +When `ToString(DiagnosticsVerbosity.Summary)` is called, +the system shall group all `StoreResponseStatistics` and `HttpResponseStatistics` entries by region. Entries with a null or empty region shall be grouped under `"Unknown"`. + +**Scenario: Summary mode first/last preservation** +When a region group contains two or more requests, +the system shall preserve full details of the chronologically first and last request in that region. + +**Scenario: Summary mode single request region** +When a region group contains exactly one request, +the system shall include only the first request with full details and omit the last request. + +**Scenario: Summary mode aggregated groups** +When a region group contains more than two requests, +the system shall aggregate the middle entries (all except first and last) by `(StatusCode, SubStatusCode)`, providing `Count`, `TotalRequestCharge`, `MinDurationMs`, `MaxDurationMs`, `P50DurationMs`, and `AvgDurationMs` for each group. + +**Scenario: Summary mode with mixed Direct and Gateway requests** +When the trace tree contains both `StoreResponseStatistics` (Direct mode) and `HttpResponseStatistics` (Gateway mode), +the system shall collect and treat both uniformly in the summary. Both transport types shall appear in the same region groups and aggregated groups. + +**Scenario: Summary mode size enforcement** +When the serialized summary JSON exceeds `MaxDiagnosticsSummarySizeBytes`, +the system shall fall back to a minimal truncated output containing `TotalDurationMs`, `TotalRequestCount`, `Truncated: true`, and a message directing users to use Detailed mode. + +**Scenario: Summary mode size under limit** +When the serialized summary JSON fits within `MaxDiagnosticsSummarySizeBytes`, +the system shall return the full summary JSON as-is. 
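
The Summary-mode scenarios above (region grouping, first/last preservation, and aggregation of middle requests) can be sketched as a single LINQ pass. This is an illustration under assumptions, not the SDK's `DiagnosticsSummaryWriter`: the record types and `SummarySketch` are hypothetical, P50 is taken as the lower median because the spec does not define the percentile method, regions are sorted with ordinal comparison, and size enforcement is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of one StoreResponseStatistics or HttpResponseStatistics entry.
public sealed record RequestStat(
    DateTime StartUtc, string Region, int StatusCode, int SubStatusCode,
    double RequestCharge, double DurationMs);

public sealed record AggregatedGroup(
    int StatusCode, int SubStatusCode, int Count, double TotalRequestCharge,
    double MinDurationMs, double MaxDurationMs, double P50DurationMs, double AvgDurationMs);

public sealed record RegionSummary(
    string Region, int RequestCount, RequestStat First, RequestStat Last,
    IReadOnlyList<AggregatedGroup> AggregatedGroups);

public static class SummarySketch
{
    public static IReadOnlyList<RegionSummary> Summarize(IEnumerable<RequestStat> stats)
    {
        return stats
            // Null or empty regions fall into the "Unknown" bucket.
            .GroupBy(s => string.IsNullOrEmpty(s.Region) ? "Unknown" : s.Region)
            // Deterministic region ordering (ordinal comparison is an assumption).
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g =>
            {
                List<RequestStat> ordered = g.OrderBy(s => s.StartUtc).ToList();
                RequestStat first = ordered[0];
                // A single-request region has no separate "last" entry.
                RequestStat last = ordered.Count > 1 ? ordered[^1] : null;

                // Middle entries: everything except the first and last request.
                List<AggregatedGroup> groups = ordered
                    .Skip(1)
                    .Take(Math.Max(0, ordered.Count - 2))
                    .GroupBy(s => (s.StatusCode, s.SubStatusCode))
                    .Select(m =>
                    {
                        double[] durations = m.Select(s => s.DurationMs).OrderBy(d => d).ToArray();
                        return new AggregatedGroup(
                            m.Key.StatusCode,
                            m.Key.SubStatusCode,
                            durations.Length,
                            m.Sum(s => s.RequestCharge),
                            durations[0],
                            durations[^1],
                            durations[(durations.Length - 1) / 2], // lower median (assumed P50 definition)
                            durations.Average());
                    })
                    .ToList();

                return new RegionSummary(g.Key, ordered.Count, first, last, groups);
            })
            .ToList();
    }
}
```

Because the sketch builds a separate result from its input, the in-memory statistics stay untouched, in line with the requirement that compaction happen only at serialization time.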
+ +**Scenario: MaxDiagnosticsSummarySizeBytes minimum validation** +When `CosmosClientOptions.MaxDiagnosticsSummarySizeBytes` is set below 4096, +the system shall reject the value (minimum: 4096 bytes). + +**Scenario: Verbosity precedence** +When determining verbosity for serialization, +the system shall apply this precedence (highest to lowest): +1. Explicit `ToString(DiagnosticsVerbosity)` parameter +2. `CosmosClientOptions.DiagnosticsVerbosity` (set in code or populated from env var) +3. Default: `DiagnosticsVerbosity.Detailed` + +**Scenario: Environment variable configuration** +When the `AZURE_COSMOS_DIAGNOSTICS_VERBOSITY` environment variable is set and no code-level value overrides it, +the system shall use the environment variable value to populate `CosmosClientOptions.DiagnosticsVerbosity`. Valid values: `"Detailed"`, `"Summary"`. + +**Scenario: Code-level value overrides environment variable** +When `CosmosClientOptions.DiagnosticsVerbosity` is explicitly set in code, +the system shall use the code-set value regardless of the `AZURE_COSMOS_DIAGNOSTICS_VERBOSITY` environment variable. + +**Scenario: Summary mode region ordering** +When multiple regions are present in the summary output, +the system shall order region groups deterministically (alphabetically by region name). + +**Scenario: Summary mode caching** +When `ToString(DiagnosticsVerbosity)` is called multiple times with the same verbosity on the same `CosmosDiagnostics` instance, +the system shall cache and return the same serialized string to avoid redundant computation. 
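
The precedence scenario above reduces to a small resolution function. The sketch below is an assumption-laden illustration rather than SDK code: `Verbosity` mirrors `DiagnosticsVerbosity`, `VerbosityResolver.Resolve` is a hypothetical helper, a nullable argument stands in for "explicitly set in code", and treating unrecognized or differently-cased environment values as unset is a guess the spec does not confirm.

```csharp
using System;

// Mirrors the DiagnosticsVerbosity enum for a self-contained example.
public enum Verbosity
{
    Detailed = 0,
    Summary = 1,
}

public static class VerbosityResolver
{
    public const string EnvVarName = "AZURE_COSMOS_DIAGNOSTICS_VERBOSITY";

    // explicitArgument: value passed to ToString(DiagnosticsVerbosity), if any.
    // setInCode: value assigned on the client options in code, if any.
    // envValue: raw environment variable value, or null when unset.
    public static Verbosity Resolve(Verbosity? explicitArgument, Verbosity? setInCode, string envValue)
    {
        if (explicitArgument.HasValue)
        {
            return explicitArgument.Value; // 1. explicit parameter wins
        }

        if (setInCode.HasValue)
        {
            return setInCode.Value; // 2. code-level option overrides the environment
        }

        switch (envValue)
        {
            case "Summary":
                return Verbosity.Summary; // 2. option populated from the environment
            case "Detailed":
                return Verbosity.Detailed;
            default:
                return Verbosity.Detailed; // 3. default (invalid values ignored: an assumption)
        }
    }

    public static void Main()
    {
        string env = Environment.GetEnvironmentVariable(EnvVarName);
        Console.WriteLine(Resolve(null, null, env));
    }
}
```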
+ +### Requirement: Summary JSON Format + +The summary mode output shall conform to this structure: + +```json +{ + "Summary": { + "DiagnosticsVerbosity": "Summary", + "TotalDurationMs": 1234.5, + "TotalRequestCharge": 245.5, + "TotalRequestCount": 60, + "RegionsSummary": [ + { + "Region": "West US 2", + "RequestCount": 50, + "TotalRequestCharge": 200.0, + "First": { + "StatusCode": 429, + "SubStatusCode": 3200, + "RequestCharge": 0.0, + "DurationMs": 5, + "Region": "West US 2", + "Endpoint": "https://account-westus2.documents.azure.com", + "RequestStartTimeUtc": "2026-02-26T21:00:00.000Z", + "OperationType": "Read", + "ResourceType": "Document" + }, + "Last": { }, + "AggregatedGroups": [ + { + "StatusCode": 429, + "SubStatusCode": 3200, + "Count": 48, + "TotalRequestCharge": 0.0, + "MinDurationMs": 3, + "MaxDurationMs": 45, + "P50DurationMs": 12, + "AvgDurationMs": 15.3 + } + ] + } + ] + } +} +``` + +**Scenario: Truncated output format** +When summary output is truncated due to size limits, +the system shall emit: + +```json +{ + "Summary": { + "DiagnosticsVerbosity": "Summary", + "TotalDurationMs": 1234.5, + "TotalRequestCount": 60, + "Truncated": true, + "Message": "Summary output truncated to fit size limit. Set DiagnosticsVerbosity to Detailed for full diagnostics." + } +} +``` + +### Requirement: Threshold-Based Diagnostics + +```csharp +CosmosClientTelemetryOptions telemetryOptions = new() +{ + DisableDistributedTracing = false, + CosmosThresholdOptions = new CosmosThresholdOptions + { + PointOperationLatencyThreshold = TimeSpan.FromSeconds(1), // Default: 1s + NonPointOperationLatencyThreshold = TimeSpan.FromSeconds(3), // Default: 3s + RequestChargeThreshold = 100.0, // Optional: RU threshold + PayloadSizeThresholdInBytes = 1024 * 1024 // Optional: 1MB threshold + } +}; +``` + +**Scenario: Default point operation latency threshold** +If no custom threshold is configured for point operations, +the system shall use a default latency threshold of 1 second. 
+ +**Scenario: Default non-point operation latency threshold** +If no custom threshold is configured for non-point operations, +the system shall use a default latency threshold of 3 seconds. + +**Scenario: Per-request threshold override** +When `RequestOptions.CosmosThresholdOptions` is set on an individual request, +the system shall use the per-request thresholds instead of the client-level thresholds for that operation. + +## Configuration + +### CosmosClientOptions — Diagnostics Verbosity + +| Property | Type | Default | Notes | +|----------|------|---------|-------| +| `DiagnosticsVerbosity` | `DiagnosticsVerbosity` | `Detailed` | Controls serialization detail level | +| `MaxDiagnosticsSummarySizeBytes` | `int` | `8192` (8 KB) | Max bytes for summary output. Minimum: 4096 | + +### Diagnostics Verbosity Environment Variables + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `AZURE_COSMOS_DIAGNOSTICS_VERBOSITY` | `string` | `"Detailed"` | Default verbosity when not set in code. Values: `"Detailed"`, `"Summary"` | +| `AZURE_COSMOS_DIAGNOSTICS_MAX_SUMMARY_SIZE` | `int` | `8192` | Max bytes for summary output. 
Minimum: 4096 | + +### CosmosClientTelemetryOptions + +| Property | Type | Default | Notes | +|----------|------|---------|-------| +| `DisableDistributedTracing` | `bool` | `true` (GA) / `false` (Preview) | Master switch for OpenTelemetry | +| `DisableSendingMetricsToService` | `bool` | `true` | Opt-in for Microsoft telemetry collection | +| `CosmosThresholdOptions` | `CosmosThresholdOptions` | defaults | Latency/RU/size thresholds for events | +| `QueryTextMode` | `QueryTextMode` | `None` | `None`, `Parameterized`, `All` — include queries in traces | +| `IsClientMetricsEnabled` | `bool` | `false` | Client-side metrics (Preview) | + +### Activity Attributes + +Key attributes on OpenTelemetry activities: + +| Attribute | Value | +|-----------|-------| +| `db.system.name` | `"cosmosdb"` | +| `db.operation.name` | Operation type (e.g., `read_item`, `query_items`) | +| `db.namespace` | Database name | +| `db.collection.name` | Container name | +| `db.response.status_code` | HTTP status code | +| `db.cosmosdb.sub_status_code` | Cosmos sub-status code | +| `db.cosmosdb.consistency_level` | Effective consistency level | +| `network.protocol.name` | `"https"` or `"rntbd"` | +| `cloud.region` | Contacted regions | + +## Trace Hierarchy + +The SDK captures diagnostics in a tree of `ITrace` nodes: + +``` +Root Trace (operation name) +├── Child: Authorization +├── Child: Transport +│ ├── Data: ClientSideRequestStatistics +│ └── Data: RegionsContacted +├── Child: Retry (if retried) +│ └── Child: Transport (retry attempt) +└── Data: CpuHistory (system usage) +``` + +- **Root trace**: Created per operation via `Trace.GetRootTrace()` +- **Children**: Added via `trace.StartChild()` — share the parent's `TraceSummary` +- **Data**: Key-value pairs added via `trace.AddDatum()` (latencies, statistics, metadata) +- **Summary**: Thread-safe aggregate of regions contacted and failure count across all nodes + +## Interactions + +- **Handler Pipeline**: `DiagnosticsHandler` (#3) captures 
CPU/system usage. `TelemetryHandler` (#4) collects operation telemetry. See `handler-pipeline` spec. +- **Retry Policies**: Each retry attempt adds a child trace node, making retries visible in diagnostics. See `retry-and-failover` spec. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/Diagnostics/CosmosDiagnostics.cs` +- Source: `Microsoft.Azure.Cosmos/src/Diagnostics/DiagnosticsVerbosity.cs` +- Source: `Microsoft.Azure.Cosmos/src/Diagnostics/DiagnosticsSummaryWriter.cs` +- Source: `Microsoft.Azure.Cosmos/src/Tracing/Trace.cs` +- Source: `Microsoft.Azure.Cosmos/src/Telemetry/OpenTelemetry/` +- Source: `Microsoft.Azure.Cosmos/src/CosmosClientTelemetryOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/CosmosThresholdOptions.cs` +- Design: `docs/observability.md` +- Cross-SDK reference: [Azure/azure-sdk-for-rust#3592](https://github.com/Azure/azure-sdk-for-rust/pull/3592) — Rust SDK `DiagnosticsContext` with `Summary` and `Detailed` modes \ No newline at end of file diff --git a/openspec/specs/distributed-transactions/spec.md b/openspec/specs/distributed-transactions/spec.md new file mode 100644 index 0000000000..6c2f39bae9 --- /dev/null +++ b/openspec/specs/distributed-transactions/spec.md @@ -0,0 +1,56 @@ +# Distributed Transactions + +> **Status**: This spec is evolving alongside active development. The distributed transactions feature is under active implementation. Update this spec as the design solidifies. + +## Purpose + +Distributed transactions extend the Cosmos DB .NET SDK to support cross-partition transactional operations. Unlike `TransactionalBatch` which is scoped to a single partition key, distributed transactions coordinate atomic operations across multiple partitions. 
+ +## Current State + +The distributed transactions feature is being actively developed with the following components: + +### Key Source Files + +- `Microsoft.Azure.Cosmos/src/DistributedTransaction/` — Core DTS implementation +- Includes: `DistributedTransaction.cs`, `DistributedWriteTransaction.cs`, `DistributedTransactionCommitter.cs` + +### Known Implementation Details + +1. **DTS routing**: Centralized request routing with constants for operation types and resource types. +2. **Operation type serialization**: Custom serialization for DTS-specific operation types. +3. **Partition key serialization**: Support for partition key serialization across transaction boundaries. +4. **Direct package integration**: Requires specific `Microsoft.Azure.Cosmos.Direct` package versions for DTS support. + +## Requirements (Preliminary) + +### Requirement: Cross-Partition Atomicity + +The SDK SHALL support atomic operations across multiple partition key ranges. + +**When** a distributed transaction is committed, all operations SHALL commit or roll back together, even across partition key ranges. + +### Requirement: Coordination Protocol + +The SDK SHALL coordinate distributed transactions via a two-phase commit-like protocol. + +**When** a distributed transaction is executed, the SDK SHALL coordinate with the service using a two-phase commit-like protocol. + +### Requirement: Resource Type Scope + +The SDK SHALL scope distributed transactions to document operations. + +**When** operations are added to a distributed transaction, only `ResourceType.Document` items SHALL be supported. + +## Open Questions + +- What are the size and operation count limits? +- What is the latency overhead compared to single-partition batch? +- How does DTS interact with availability strategies (hedging)? +- What retry semantics apply to distributed transactions? +- What consistency levels are supported? 
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/DistributedTransaction/` +- Related PRs: #5624, #5607, #5619, #5615, #5576 \ No newline at end of file diff --git a/openspec/specs/handler-pipeline/spec.md b/openspec/specs/handler-pipeline/spec.md new file mode 100644 index 0000000000..80a80ca285 --- /dev/null +++ b/openspec/specs/handler-pipeline/spec.md @@ -0,0 +1,171 @@ +# Handler Pipeline + +## Purpose + +The Azure Cosmos DB .NET SDK processes all requests through a chain-of-responsibility handler pipeline. Each handler in the chain has a single responsibility (diagnostics, telemetry, retries, routing, transport) and processes a `RequestMessage`, passing it to the next handler via `InnerHandler`. Understanding the pipeline ordering and handler responsibilities is critical because the order determines which cross-cutting concerns apply to a request and in what sequence. + +## Public API Surface + +### RequestHandler Base Class + +```csharp +public abstract class RequestHandler +{ + public RequestHandler InnerHandler { get; set; } + public virtual Task<ResponseMessage> SendAsync(RequestMessage request, CancellationToken cancellationToken); +} +``` + +### Custom Handler Injection + +```csharp +CosmosClientOptions options = new CosmosClientOptions +{ + CustomHandlers = { new MyLoggingHandler(), new MyMetricsHandler() } +}; +``` + +Custom handlers are inserted after `RequestInvokerHandler` and before `DiagnosticsHandler`. They must inherit from `RequestHandler` and must have `InnerHandler == null` at registration time (the SDK links them). 
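
A handler such as `MyLoggingHandler` from the snippet above might look like the following. This is a minimal sketch, assuming the `Microsoft.Azure.Cosmos` NuGet package is referenced; the log format is illustrative, and the class keeps no fields so that it stays stateless across requests.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Illustrative custom handler: logs method, URI, status code, and elapsed time per request.
public sealed class MyLoggingHandler : RequestHandler
{
    public override async Task<ResponseMessage> SendAsync(
        RequestMessage request,
        CancellationToken cancellationToken)
    {
        // Local variables only: no request-specific state is stored on the handler.
        Stopwatch stopwatch = Stopwatch.StartNew();

        // base.SendAsync forwards the request to InnerHandler, which the SDK links at client creation.
        ResponseMessage response = await base.SendAsync(request, cancellationToken);

        Console.WriteLine(
            $"{request.Method} {request.RequestUri} -> {(int)response.StatusCode} in {stopwatch.ElapsedMilliseconds} ms");
        return response;
    }
}
```

Because the handler sits above `RetryHandler` in the pipeline, one log line covers the whole operation including any retries, not each individual network attempt.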
+ +## Pipeline Ordering + +``` +RequestInvokerHandler (#1 - entry point, validates, applies options) + | +Custom Handlers (#2 - user-provided, in registration order) + | +DiagnosticsHandler (#3 - captures CPU/system usage) + | +TelemetryHandler (#4 - collects operation telemetry) + | +RetryHandler (#5 - cross-region + throttle retries) + | +RouterHandler (#6 - routes to point or feed pipeline) + |-- Point operations ----> TransportHandler + +-- Feed operations -----> NamedCacheRetryHandler + | + PartitionKeyRangeHandler + | + TransportHandler +``` + +## Handler Specifications + +### 1. RequestInvokerHandler (Entry Point) + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/RequestInvokerHandler.cs` +- **Responsibilities**: + - Validates client state (returns error if client is disposed) + - Applies request-level and client-level options (consistency level, priority, throughput bucket) + - Handles binary encoding negotiation (serialization format headers) + - Executes availability strategy (hedging) if configured — hedged requests each traverse their own independent pipeline +- **RequestMessage**: Reads `RequestOptions`, writes consistency/priority/serialization headers +- **ResponseMessage**: Converts to cloneable stream if binary encoding enabled; adds excluded regions to diagnostics + +### 2. Custom Handlers (User-Provided) + +- **Injection**: `CosmosClientOptions.CustomHandlers` +- **Constraints**: Must be stateless. Must inherit `RequestHandler`. `InnerHandler` must be `null` at registration. +- **Position**: After `RequestInvokerHandler`, before `DiagnosticsHandler` +- **Use cases**: Logging, request modification, metrics collection, request signing + +### 3. 
DiagnosticsHandler + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/DiagnosticsHandler.cs` +- **Responsibilities**: Captures CPU and system usage statistics during request execution (best-effort — failures don't break the request) +- **Behavior**: Adds `CpuHistoryTraceDatum` to the request's `Trace` on completion + +### 4. TelemetryHandler + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/TelemetryHandler.cs` +- **Responsibilities**: Collects operation telemetry (latency, payload size, regions contacted, request charge, status codes) for service monitoring +- **Behavior**: Non-blocking, exception-safe. Filters by resource type via `ClientTelemetryOptions.AllowedResourceTypes`. Runs collection asynchronously on background thread. + +### 5. RetryHandler + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/RetryHandler.cs` +- **Responsibilities**: Cross-region retries and throttle retries. Uses a stack of pluggable retry policies. +- **Retry policy stack** (in order of application): + 1. `ResetSessionTokenRetryPolicy` — session token mismatch + 2. `ClientRetryPolicy` — cross-region failover (DNS failures, 410, 503, 403/3, 403/1008, 404/1002) + 3. `ResourceThrottleRetryPolicy` — rate limiting (429) +- **Behavior**: Loops on `ShouldRetryAsync` while response is non-success. Respects backoff delay and `CancellationToken`. See `retry-and-failover` spec for detailed retry logic. + +### 6. RouterHandler + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/RouterHandler.cs` +- **Responsibilities**: Routes requests to one of two sub-pipelines based on operation type. +- **Decision**: Reads `request.IsPartitionKeyRangeHandlerRequired`: + - **`false`** (point operations) -> `TransportHandler` directly + - **`true`** (feed operations: queries, change feed, read feed) -> `NamedCacheRetryHandler` -> `PartitionKeyRangeHandler` -> `TransportHandler` + +### 7. 
NamedCacheRetryHandler (Feed Pipeline Only) + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/NamedCacheRetryHandler.cs` +- **Responsibilities**: Retries feed operations when partition key range cache becomes stale due to partition splits or container recreation. + +### 8. PartitionKeyRangeHandler (Feed Pipeline Only) + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/PartitionKeyRangeHandler.cs` +- **Responsibilities**: Resolves the target physical partition(s) for feed operations. Distributes cross-partition queries across partition key ranges. + +### 9. TransportHandler (Leaf) + +- **File**: `Microsoft.Azure.Cosmos/src/Handler/TransportHandler.cs` +- **Responsibilities**: + - Converts `RequestMessage` to `DocumentServiceRequest` + - Obtains authorization token + - Invokes transport layer: `GatewayStoreModel` (HTTP/Gateway mode) or `ServerStoreModel` (TCP/Direct mode) + - Converts exceptions to `ResponseMessage` + - Captures client-side request statistics + +## Requirements + +### Requirement: Handler Statelessness + +The SDK SHALL ensure all handlers are stateless. + +**When** a handler processes a request, the SDK SHALL NOT store request-specific state between calls. Each invocation of `SendAsync` SHALL be independent. + +### Requirement: Pipeline Immutability + +The SDK SHALL ensure the pipeline is immutable after client creation. + +**When** `CosmosClient` is constructed, the SDK SHALL link all handlers during construction. **After** construction, the pipeline SHALL NOT be modifiable. + +### Requirement: Custom Handler Ordering + +The SDK SHALL execute custom handlers before SDK handlers (except `RequestInvokerHandler`). + +**When** custom handlers are registered via `CosmosClientOptions.CustomHandlers`, the SDK SHALL insert them after `RequestInvokerHandler` (position #1) and before `DiagnosticsHandler` (position #3). Custom handlers SHALL see the raw request before retries, diagnostics, or telemetry processing. 
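
The ordering requirement above can be illustrated with a toy chain that links handlers once and then records the order in which a request visits them. Everything here is a stand-in: `StepHandler` and `PipelineSketch` are hypothetical types, not the SDK's `RequestHandler` or `ClientPipelineBuilder`, and only the point-operation path is modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a pipeline handler; records its name and forwards to Inner.
public class StepHandler
{
    public StepHandler(string name)
    {
        this.Name = name;
    }

    public string Name { get; }

    public StepHandler Inner { get; set; }

    public virtual void Send(List<string> visited)
    {
        visited.Add(this.Name);
        this.Inner?.Send(visited);
    }
}

public static class PipelineSketch
{
    // Links: RequestInvokerHandler -> custom handlers -> SDK handlers (point-operation path only).
    public static StepHandler Build(IReadOnlyList<StepHandler> customHandlers)
    {
        // Mirrors the rule that custom handlers must be unlinked at registration time.
        if (customHandlers.Any(h => h.Inner != null))
        {
            throw new ArgumentException("Custom handlers must not be linked before registration.", nameof(customHandlers));
        }

        List<StepHandler> ordered = new List<StepHandler> { new StepHandler("RequestInvokerHandler") };
        ordered.AddRange(customHandlers);
        ordered.AddRange(new[]
        {
            "DiagnosticsHandler", "TelemetryHandler", "RetryHandler", "RouterHandler", "TransportHandler",
        }.Select(name => new StepHandler(name)));

        // Link each handler to its successor once, during construction.
        for (int i = 0; i < ordered.Count - 1; i++)
        {
            ordered[i].Inner = ordered[i + 1];
        }

        return ordered[0];
    }

    public static void Main()
    {
        StepHandler root = Build(new[] { new StepHandler("MyLoggingHandler") });
        List<string> visited = new List<string>();
        root.Send(visited);
        Console.WriteLine(string.Join(" -> ", visited));
    }
}
```

Linking happens only inside `Build`, which is one way to read the pipeline immutability requirement: after construction nothing in the sketch reassigns `Inner`.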
+ +### Requirement: Independent Pipeline Per Hedge + +The SDK SHALL provide each hedged request its own independent pipeline. + +**When** an availability strategy (hedging) is active, the SDK SHALL execute each parallel request through its own handler pipeline instance, including independent retry behavior. + +### Requirement: Feed Operation Routing + +The SDK SHALL route feed operations through additional handlers. + +**When** `request.IsPartitionKeyRangeHandlerRequired` is `true` (queries, change feed, read feed), the SDK SHALL route the request through `NamedCacheRetryHandler` and `PartitionKeyRangeHandler` before reaching `TransportHandler`. Point operations SHALL go directly to `TransportHandler`. + +### Requirement: Exception Conversion + +The SDK SHALL convert handler exceptions to `ResponseMessage`. + +**When** a handler throws an unhandled exception, the SDK SHALL catch the exception and wrap it in a `ResponseMessage` with an appropriate status code. Handlers SHALL NOT propagate unhandled exceptions up the pipeline. + +## Interactions + +- **Retry Policies**: `RetryHandler` delegates to policies defined in `retry-and-failover` spec. +- **Availability Strategy**: `RequestInvokerHandler` executes hedging from `cross-region-hedging` spec. +- **Transport**: `TransportHandler` connects to Gateway or Direct mode based on `CosmosClientOptions.ConnectionMode`. +- **Diagnostics**: Each handler can add data to `RequestMessage.Trace` for diagnostic output. 
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/Handler/` (all handler files) +- Source: `Microsoft.Azure.Cosmos/src/ClientPipelineBuilder.cs` +- Design: `docs/SdkDesign.md` (Handler Pipeline section) \ No newline at end of file diff --git a/openspec/specs/partition-keys/spec.md b/openspec/specs/partition-keys/spec.md new file mode 100644 index 0000000000..076a55f476 --- /dev/null +++ b/openspec/specs/partition-keys/spec.md @@ -0,0 +1,179 @@ +# Partition Keys + +## Purpose + +Every Azure Cosmos DB container is partitioned by a partition key, which determines how data is distributed across physical partitions. The .NET SDK provides the `PartitionKey` struct for single-key values and `PartitionKeyBuilder` for hierarchical (multi-hash) partition keys. Understanding partition key semantics is critical because every point operation requires a partition key for routing, and the SDK's auto-extraction behavior differs between typed and stream APIs. + +## Public API Surface + +### PartitionKey Struct + +```csharp +public readonly struct PartitionKey +{ + public PartitionKey(string partitionKeyValue); + public PartitionKey(bool partitionKeyValue); + public PartitionKey(double partitionKeyValue); + + public static readonly PartitionKey None; // No partition key value + public static readonly PartitionKey Null; // Explicit null value + + public bool IsNone { get; } + public override string ToString(); + public static bool TryParseJsonString(string json, out PartitionKey partitionKey); +} +``` + +### PartitionKeyBuilder (Hierarchical Partition Keys) + +```csharp +public sealed class PartitionKeyBuilder +{ + public PartitionKeyBuilder Add(string val); + public PartitionKeyBuilder Add(double val); + public PartitionKeyBuilder Add(bool val); + public PartitionKeyBuilder AddNullValue(); + public PartitionKeyBuilder AddNoneType(); + public PartitionKey Build(); // Throws ArgumentException if no values added +} +``` + +### ContainerProperties Partition Key Access + 
+```csharp +string path = containerProperties.PartitionKeyPath; // e.g., "/userId" +IReadOnlyList<string> paths = containerProperties.PartitionKeyPaths; // e.g., ["/tenantId", "/userId"] +``` + +## Requirements + +### Requirement: PartitionKey.None vs PartitionKey.Null + +The SDK SHALL distinguish between `PartitionKey.None` and `PartitionKey.Null`. + +| Aspect | `PartitionKey.None` | `PartitionKey.Null` | +|--------|---------------------|---------------------| +| `IsNone` | `true` | `false` | +| Meaning | No partition key value provided or applicable | Explicit `null` value | +| Usage | Legacy or schema-flexible containers | Any container — `null` is a valid PK value | +| Multi-hash support | Not allowed | Allowed per component | +| Construction | `PartitionKey.None` | `new PartitionKey((string)null)` | +| ToString | `"None"` | `"null"` (JSON) | + +### Requirement: Partition Key Per Operation + +The SDK SHALL enforce partition key requirements per operation type. + +| Operation | Partition Key | Behavior | +|-----------|--------------|----------| +| `CreateItemAsync` | Optional | Auto-extracted from item if null | +| `CreateItemStreamAsync` | Required | Cannot auto-extract from opaque stream | +| `ReadItemAsync` | Required | Routing parameter | +| `ReplaceItemAsync` | Optional | Auto-extracted from item if null | +| `DeleteItemAsync` | Required | Routing parameter | +| `UpsertItemAsync` | Optional | Auto-extracted from item if null | +| `CreateTransactionalBatch` | Required | All items in batch must share same PK | +| Query (via `QueryRequestOptions`) | Optional | Null = cross-partition query | + +### Requirement: Auto-Extraction from Documents + +The SDK SHALL automatically extract partition key values from typed items. + +#### Typed write auto-extraction + +**When** a typed write operation receives `partitionKey=null`, the SDK SHALL serialize the item, navigate the JSON tree using the container's partition key path(s), and extract the value(s). 
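
The "navigate the JSON tree" step can be sketched with `System.Text.Json`. This is an illustration only, not the SDK's `DocumentAnalyzer`: `PartitionKeyExtractionSketch.TryExtract` is a hypothetical helper, it handles simple slash-separated paths such as `/tenant/id` without quoted or escaped segments, and it reports a missing path by returning `false` instead of modelling the SDK's undefined value.

```csharp
using System;
using System.Text.Json;

public static class PartitionKeyExtractionSketch
{
    // Walks itemJson along partitionKeyPath (e.g. "/tenant/id") and returns the value found there.
    public static bool TryExtract(string itemJson, string partitionKeyPath, out JsonElement value)
    {
        using JsonDocument document = JsonDocument.Parse(itemJson);
        JsonElement current = document.RootElement;

        foreach (string segment in partitionKeyPath.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            // A non-object node or a missing property means the path is absent from the item.
            if (current.ValueKind != JsonValueKind.Object
                || !current.TryGetProperty(segment, out JsonElement next))
            {
                value = default;
                return false;
            }

            current = next;
        }

        // Clone so the element remains valid after the JsonDocument is disposed.
        value = current.Clone();
        return true;
    }

    public static void Main()
    {
        string json = "{\"id\":\"1\",\"tenant\":{\"id\":\"contoso\"}}";
        if (TryExtract(json, "/tenant/id", out JsonElement found))
        {
            Console.WriteLine(found.GetString()); // contoso
        }
    }
}
```

For a hierarchical key, the same helper would be called once per path in definition order, collecting one value per level.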
+ +#### Hierarchical key extraction + +**When** a container uses hierarchical partition keys, the SDK SHALL extract each path level's corresponding value from the document. + +#### Missing path handling + +**If** a path is missing from the document, the SDK SHALL extract the value as `Undefined`, which maps to `PartitionKey.None` semantics. + +#### Stale cache retry + +**If** extraction fails due to a stale partition key definition cache, the SDK SHALL retry with a refreshed cache via `PartitionKeyMismatchRetryPolicy`. + +### Requirement: Hierarchical (Multi-Hash) Partition Keys + +The SDK SHALL support hierarchical partition keys with `PartitionKeyBuilder`. + +#### Version and kind requirements + +**When** hierarchical partition keys are used, the container SHALL require `PartitionKeyDefinitionVersion.V2` and `PartitionKind.MultiHash`. + +#### Ordered path components + +**When** building a hierarchical partition key via `PartitionKeyBuilder`, `Add()` calls SHALL correspond to paths in definition order. + +#### Complete key for point operations + +**When** a point operation is performed with a hierarchical partition key, all path components SHALL be provided. Incomplete keys SHALL return 400 Bad Request. + +#### Prefix routing for queries + +**When** a query operation provides only the first N components of an M-level hierarchical key, the SDK SHALL route to partitions matching that prefix. + +#### Empty builder validation + +**When** `PartitionKeyBuilder.Build()` is called with no values added, the SDK SHALL throw `ArgumentException`. + +### Requirement: Partition Key Immutability + +The SDK SHALL enforce that an item's partition key value is immutable. + +**When** a Replace or Upsert operation is performed, the SDK SHALL NOT allow changing the item's partition key value. To change an item's partition key, the item SHALL be deleted and recreated with the desired key. 
+ +### Requirement: Supported Value Types + +The SDK SHALL support the following partition key value types. + +| Type | Constructor | Notes | +|------|------------|-------| +| `string` | `new PartitionKey("value")` | Most common | +| `bool` | `new PartitionKey(true)` | Boolean partition keys | +| `double` | `new PartitionKey(42.0)` | All numeric types as double | +| `null` | `new PartitionKey((string)null)` | Creates `PartitionKey.Null` | + +### Requirement: Equality and Hashing + +The SDK SHALL support value-based equality for `PartitionKey`. + +**When** two `PartitionKey` instances have the same value, the SDK SHALL consider them equal via `==`, `!=`, `Equals()`, and `GetHashCode()`, regardless of how they were constructed. + +## Configuration + +### Container Creation + +```csharp +// Single partition key +ContainerProperties single = new ContainerProperties(id: "myContainer", partitionKeyPath: "/userId"); + +// Hierarchical partition keys +ContainerProperties hierarchical = new ContainerProperties( + id: "myContainer", + partitionKeyPaths: new[] { "/tenantId", "/userId" }); +``` + +### PartitionKeyDefinition Properties + +| Property | Type | Values | +|----------|------|--------| +| `Paths` | `Collection<string>` | 1-3 paths (e.g., `["/tenantId", "/userId"]`) | +| `Kind` | `PartitionKind` | `Hash` (single), `MultiHash` (hierarchical) | +| `Version` | `PartitionKeyDefinitionVersion` | `V1` (default single), `V2` (required for MultiHash) | + +## Interactions + +- **CRUD Operations**: Every point operation routes by partition key. See `crud-operations` spec. +- **Query**: `QueryRequestOptions.PartitionKey` controls single-partition vs cross-partition routing. See `query-and-linq` spec. +- **Handler Pipeline**: `PartitionKeyRangeHandler` resolves partition key ranges for feed operations. See `handler-pipeline` spec. +- **Batch**: All items in a `TransactionalBatch` must share the same partition key.
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/PartitionKey.cs` +- Source: `Microsoft.Azure.Cosmos/src/PartitionKeyBuilder.cs` +- Source: `Microsoft.Azure.Cosmos/src/Resource/Settings/ContainerProperties.cs` +- Source: `Microsoft.Azure.Cosmos/src/Routing/DocumentAnalyzer.cs` \ No newline at end of file diff --git a/openspec/specs/patch-operations/spec.md b/openspec/specs/patch-operations/spec.md new file mode 100644 index 0000000000..7124c8909e --- /dev/null +++ b/openspec/specs/patch-operations/spec.md @@ -0,0 +1,105 @@ +# Patch Operations + +## Purpose + +The Azure Cosmos DB .NET SDK supports partial document updates via JSON patch operations. Instead of replacing an entire item, patch allows modifying specific properties atomically with lower RU cost. The SDK provides a `PatchOperation` builder and supports conditional patches via filter predicates. + +## Public API Surface + +### Container Patch Methods + +```csharp +// Typed +Task<ItemResponse<T>> PatchItemAsync<T>( + string id, PartitionKey partitionKey, + IReadOnlyList<PatchOperation> patchOperations, + PatchItemRequestOptions requestOptions = null, + CancellationToken cancellationToken = default); + +// Stream +Task<ResponseMessage> PatchItemStreamAsync( + string id, PartitionKey partitionKey, + IReadOnlyList<PatchOperation> patchOperations, + PatchItemRequestOptions requestOptions = null, + CancellationToken cancellationToken = default); +``` + +### PatchOperation Types + +| Factory Method | Parameters | Purpose | +|---------------|-----------|---------| +| `PatchOperation.Add(path, value)` | JSON path, value | Add property (or array element) | +| `PatchOperation.Remove(path)` | JSON path | Remove property | +| `PatchOperation.Replace(path, value)` | JSON path, value | Replace existing property value | +| `PatchOperation.Set(path, value)` | JSON path, value | Set property (create if not exists) | +| `PatchOperation.Increment(path, value)` | JSON path, long/double | Atomic increment/decrement | +| `PatchOperation.Move(from, path)` | Source path, target path | 
Move property to new location | + +### PatchItemRequestOptions + +| Property | Type | Effect | +|----------|------|--------| +| `FilterPredicate` | `string` | SQL WHERE clause for conditional patch (e.g., `"from c where c.status = 'active'"`) | +| `IfMatchEtag` | `string` | Conditional patch; 412 if ETag mismatch | +| `EnableContentResponseOnWrite` | `bool?` | Skip response payload | + +## Requirements + +### Requirement: Atomic Execution + +The SDK SHALL execute all patch operations in a single call atomically. + +**When** `PatchItemAsync` is called with multiple `PatchOperation` entries, all operations SHALL succeed or all SHALL fail. No partial application SHALL occur. + +### Requirement: Set vs Add vs Replace Semantics + +The SDK SHALL differentiate Set, Add, and Replace operations. + +#### Set (upsert semantics) + +**When** `PatchOperation.Set(path, value)` is used, the SDK SHALL create the property if it does not exist, or replace it if it does. + +#### Replace (strict) + +**When** `PatchOperation.Replace(path, value)` is used, the SDK SHALL fail if the property does not exist. + +#### Add + +**When** `PatchOperation.Add(path, value)` is used, the SDK SHALL add the property or array element at the specified path. + +### Requirement: Atomic Increment + +The SDK SHALL support server-side atomic increment/decrement. + +**When** `PatchOperation.Increment(path, value)` is used, the SDK SHALL perform a server-side atomic increment without read-modify-write race conditions. + +### Requirement: Conditional Patches + +The SDK SHALL support conditional patches via filter predicates. + +**Where** `PatchItemRequestOptions.FilterPredicate` is set to a SQL condition, **if** the condition evaluates to false against the current item, the SDK SHALL return 412 (Precondition Failed) without modifying the item. + +### Requirement: Path Syntax + +The SDK SHALL use JSON pointer-like path syntax for patch operations (e.g., `/address/city`, `/tags/0`). 
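The requirements above can be combined in a single call. The following is a minimal sketch rather than a recommended pattern: the helper name `TryRecordVisitAsync`, the property paths, and the filter text are illustrative assumptions, while the `PatchOperation`, `PatchItemRequestOptions`, and `PatchItemAsync` calls are the ones listed in the API surface.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class PatchSketch
{
    // Hypothetical helper: returns false when the filter predicate
    // rejects the patch (HTTP 412 Precondition Failed).
    public static async Task<bool> TryRecordVisitAsync(Container container, string id, string userId)
    {
        // Both operations are applied atomically: all succeed or none do.
        PatchOperation[] operations =
        {
            PatchOperation.Set("/address/city", "Seattle"), // create or overwrite
            PatchOperation.Increment("/visits", 1),         // server-side increment
        };

        PatchItemRequestOptions options = new PatchItemRequestOptions
        {
            FilterPredicate = "from c where c.status = 'active'",
        };

        try
        {
            await container.PatchItemAsync<object>(id, new PartitionKey(userId), operations, options);
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
        {
            return false;
        }
    }
}
```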
+ +### Requirement: Immutable Fields + +The SDK SHALL prevent modification of immutable fields via patch. + +**When** a patch operation targets the `id` or partition key properties, the SDK SHALL reject the operation. + +### Requirement: Operation Limit + +The SDK SHALL enforce a maximum of 10 patch operations per call. + +## Interactions + +- **CRUD Operations**: Patch is an alternative to Replace for partial updates. See `crud-operations` spec. +- **Batch**: Patch can be included in `TransactionalBatch` via `batch.PatchItem()`. See `batch-and-transactional` spec. +- **Serialization**: Patch values are serialized using the configured serializer. See `serialization` spec. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/Patch/PatchOperation.cs` +- Source: `Microsoft.Azure.Cosmos/src/RequestOptions/PatchItemRequestOptions.cs` \ No newline at end of file diff --git a/openspec/specs/query-and-linq/spec.md b/openspec/specs/query-and-linq/spec.md new file mode 100644 index 0000000000..35b08e4e24 --- /dev/null +++ b/openspec/specs/query-and-linq/spec.md @@ -0,0 +1,212 @@ +# Query and LINQ + +## Purpose + +The Azure Cosmos DB .NET SDK provides SQL query execution and LINQ-to-SQL translation for reading items from containers. Queries return results through the `FeedIterator` pattern, which supports asynchronous pagination with continuation tokens. The SDK handles cross-partition fan-out, query plan generation, and distributed execution transparently. 
+ +## Public API Surface + +### Container Query Methods + +| Method | Parameters | Returns | Purpose | +|--------|-----------|---------|---------| +| `GetItemQueryIterator<T>` | `QueryDefinition`, continuation, options | `FeedIterator<T>` | Typed query with deserialization | +| `GetItemQueryIterator<T>` | `string queryText`, continuation, options | `FeedIterator<T>` | Typed query with inline SQL | +| `GetItemQueryStreamIterator` | `QueryDefinition`, continuation, options | `FeedIterator` | Stream query; raw JSON response | +| `GetItemQueryStreamIterator` | `string queryText`, continuation, options | `FeedIterator` | Stream query with inline SQL | +| `GetItemQueryIterator<T>` | `FeedRange`, `QueryDefinition`, continuation, options | `FeedIterator<T>` | Partition-scoped typed query | +| `GetItemQueryStreamIterator` | `FeedRange`, `QueryDefinition`, continuation, options | `FeedIterator` | Partition-scoped stream query | +| `GetItemLinqQueryable<T>` | `allowSynchronousQueryExecution`, continuation, options, linqSerializerOptions | `IOrderedQueryable<T>` | LINQ provider entry point | +| `GetFeedRangesAsync` | `CancellationToken` | `Task<IReadOnlyList<FeedRange>>` | Get partition ranges for parallel queries | + +### FeedIterator Pattern + +```csharp +// Typed +public abstract class FeedIterator<T> : IDisposable +{ + public abstract bool HasMoreResults { get; } + public abstract Task<FeedResponse<T>> ReadNextAsync(CancellationToken cancellationToken = default); +} + +// Stream +public abstract class FeedIterator : IDisposable +{ + public abstract bool HasMoreResults { get; } + public abstract Task<ResponseMessage> ReadNextAsync(CancellationToken cancellationToken = default); +} +``` + +### QueryDefinition + +```csharp +public class QueryDefinition +{ + public string QueryText { get; } + public QueryDefinition WithParameter(string name, object value); + public QueryDefinition WithParameterStream(string name, Stream value); + public IReadOnlyList<(string Name, object Value)> GetQueryParameters(); +} +``` + +## Requirements + +### Requirement: FeedIterator 
Lifecycle + +The SDK SHALL manage query results through the FeedIterator pattern with specific lifecycle guarantees. + +#### HasMoreResults initial state + +**When** a FeedIterator is created, `HasMoreResults` SHALL be `true` and SHALL remain `true` until the server returns a `null` continuation token or 304 Not Modified. + +#### Exhaustion semantics + +**When** `HasMoreResults` becomes `false`, the query SHALL be exhausted with no more pages available. + +#### Exception resilience + +**When** `ReadNextAsync()` throws an exception, `HasMoreResults` SHALL remain `true`. The caller SHALL decide whether to retry. + +#### Disposal requirement + +**When** a FeedIterator is no longer needed, it SHALL be disposed to avoid resource leaks. The SDK SHALL implement `IDisposable`. + +#### Empty pages + +**When** a page is returned with 0 items but a non-null continuation token, the SDK SHALL treat this as valid behavior. The caller SHALL continue iterating. + +### Requirement: Parameterized Queries + +The SDK SHALL support parameterized queries for safe SQL execution. + +#### Parameter replacement + +**When** `WithParameter` is called with an existing parameter name, the SDK SHALL replace the previous value. + +#### SQL injection prevention + +**When** parameter values are provided, the SDK SHALL NOT parse them as SQL. Parameters SHALL prevent SQL injection by design. + +#### Supported parameter types + +**When** parameters are added, the SDK SHALL support primitives, objects, arrays, and `Stream` (via `WithParameterStream`). + +### Requirement: Cross-Partition vs Single-Partition Queries + +The SDK SHALL support both single-partition and cross-partition query execution. + +#### Single-partition routing + +**Where** `QueryRequestOptions.PartitionKey` is set, **when** a query is executed, the SDK SHALL target a single partition for faster execution and lower RU cost. 
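The lifecycle, parameterization, and single-partition routing rules above fit together as in the sketch below. `QuerySketch`, `QueryByStatusAsync`, the query text, and the page size are assumptions made for illustration; the SDK types and members are the ones named in this spec.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class QuerySketch
{
    // Hypothetical helper: drains a parameterized, single-partition query.
    public static async Task<List<T>> QueryByStatusAsync<T>(Container container, string userId, string status)
    {
        // The parameter value is sent separately and is never parsed as SQL.
        QueryDefinition query = new QueryDefinition("SELECT * FROM c WHERE c.status = @status")
            .WithParameter("@status", status);

        QueryRequestOptions options = new QueryRequestOptions
        {
            PartitionKey = new PartitionKey(userId), // single-partition routing
            MaxItemCount = 100,                      // page size hint
        };

        List<T> results = new List<T>();

        // Dispose the iterator when done to avoid resource leaks.
        using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(query, requestOptions: options);
        while (iterator.HasMoreResults)
        {
            // A page may be empty while HasMoreResults is still true.
            FeedResponse<T> page = await iterator.ReadNextAsync();
            results.AddRange(page);
        }

        return results;
    }
}
```

The options object is read when the iterator is created, so changing `options` inside the loop would not affect the running query.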
+ +#### Cross-partition fan-out + +**Where** `QueryRequestOptions.PartitionKey` is null, **when** a query is executed, the SDK SHALL execute a cross-partition query, fanning out to all physical partitions and merging results. + +#### Null query as read feed + +**When** `QueryDefinition` or `queryText` is `null`, the SDK SHALL treat this as a read feed, returning all items with no WHERE clause. + +#### Cross-partition ordering + +**When** a cross-partition query uses `ORDER BY`, the SDK SHALL perform server-side sorting, which MAY increase RU cost. No implicit ordering SHALL be guaranteed across partitions. + +### Requirement: FeedRange-Based Parallelism + +The SDK SHALL support partition-scoped parallel queries via FeedRange. + +#### FeedRange per physical partition + +**When** `GetFeedRangesAsync()` is called, the SDK SHALL return one `FeedRange` per physical partition. Ranges SHALL be mutually exclusive. + +#### Independent parallel queries + +**When** a query is scoped to a `FeedRange`, the SDK SHALL execute it independently against that partition. + +#### Range-specific continuation tokens + +**When** continuation tokens are generated for FeedRange queries, they SHALL be range-specific and not interchangeable between ranges. + +#### Transparent split handling + +**When** a physical partition splits during iteration, the SDK SHALL handle it transparently. + +### Requirement: Continuation Tokens + +The SDK SHALL manage opaque, version-bound continuation tokens for query resumption. + +#### Opaque tokens + +**When** a continuation token is returned, callers SHALL NOT parse or construct tokens manually. The SDK SHALL treat them as opaque. + +#### Version and container binding + +**When** a continuation token is used, the SDK SHALL validate it is compatible with the current container and SDK version. Tokens from different containers or SDK versions SHALL be invalid. 
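A caller that pages through results across requests only stores and replays the token. The sketch below assumes a hypothetical helper named `ReadPageAsync`; it reads one page and hands back the token unchanged, without inspecting it.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ContinuationSketch
{
    // Hypothetical helper: pass null for the first page, then pass back the
    // token returned by the previous call. A null token in the result means
    // the query is exhausted.
    public static async Task<(List<T> Items, string ContinuationToken)> ReadPageAsync<T>(
        Container container,
        QueryDefinition query,
        string continuationToken)
    {
        using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(query, continuationToken);
        FeedResponse<T> page = await iterator.ReadNextAsync();

        // The token is treated as an opaque string and never parsed.
        return (new List<T>(page), page.ContinuationToken);
    }
}
```

Since tokens are bound to the container and SDK version, a stored token should be resumed against the same container with the same query.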
+ +#### Token size control + +**Where** `QueryRequestOptions.ResponseContinuationTokenLimitInKb` is set, the SDK SHALL limit the maximum token size accordingly. + +#### Options snapshot at creation + +**When** a FeedIterator is created, `QueryRequestOptions` SHALL be copied at creation time. Modifying options after creation SHALL have no effect. + +### Requirement: LINQ Provider + +The SDK SHALL support LINQ-to-SQL translation for type-safe queries. + +#### Queryable interface + +**When** `GetItemLinqQueryable()` is called, the SDK SHALL return an `IOrderedQueryable` backed by `CosmosLinqQuery`. + +#### Supported operators + +**When** LINQ expressions are built, the SDK SHALL support: `Where`, `Select`, `OrderBy`, `OrderByDescending`, `ThenBy`, `Take`, `Skip`, `Distinct`, `Count`, `Sum`, `Average`, `Min`, `Max`, `Join`, `GroupJoin`, `OfType`. + +#### Lazy expression building + +**When** LINQ expressions are composed, the SDK SHALL NOT execute them until materialization (e.g., `ToFeedIterator()` or enumeration). + +#### Async execution (recommended) + +**When** executing LINQ queries, callers SHOULD call `.ToFeedIterator()` to get a `FeedIterator` and iterate with `ReadNextAsync()`. + +#### Synchronous execution + +**Where** `allowSynchronousQueryExecution=true`, **when** `.ToList()` or direct enumeration is used, the SDK SHALL execute the query synchronously, blocking the calling thread. + +#### Non-translatable expressions + +**When** a LINQ expression contains non-translatable methods (custom methods, `ToString()`, etc.), the SDK SHALL fail at query execution time, not at expression-build time. 
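The recommended asynchronous path looks roughly like this. The `Order` type, its property names, and the filter are hypothetical and exist only for the sketch; `ToFeedIterator()` comes from the `Microsoft.Azure.Cosmos.Linq` namespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

// Hypothetical item type used only for this sketch.
public class Order
{
    public string id { get; set; }
    public string status { get; set; }
    public double total { get; set; }
}

public static class LinqSketch
{
    public static async Task<List<Order>> LargeActiveOrdersAsync(Container container)
    {
        // Composing the expression does not send any request.
        IQueryable<Order> queryable = container.GetItemLinqQueryable<Order>()
            .Where(o => o.status == "active" && o.total > 100)
            .OrderByDescending(o => o.total);

        List<Order> results = new List<Order>();

        // Translation to SQL and execution happen once the iterator is read.
        using FeedIterator<Order> iterator = queryable.ToFeedIterator();
        while (iterator.HasMoreResults)
        {
            results.AddRange(await iterator.ReadNextAsync());
        }

        return results;
    }
}
```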
+ +## Configuration + +### QueryRequestOptions + +| Property | Type | Effect | +|----------|------|--------| +| `MaxItemCount` | `int?` | Page size hint; -1 = dynamic; 0 is invalid | +| `MaxConcurrency` | `int?` | Parallelism for cross-partition; -1 = auto | +| `MaxBufferedItemCount` | `int?` | Client-side buffer during parallel execution | +| `PartitionKey` | `PartitionKey?` | Single-partition routing (null = cross-partition) | +| `EnableScanInQuery` | `bool?` | Allow scans when indexes do not cover query | +| `EnableOptimisticDirectExecution` | `bool` | Try direct execution before query plan | +| `PopulateIndexMetrics` | `bool?` | Return index usage stats | +| `ConsistencyLevel` | `ConsistencyLevel?` | Override account default | +| `SessionToken` | `string` | Session consistency token | +| `ResponseContinuationTokenLimitInKb` | `int?` | Max continuation token size | + +## Interactions + +- **Handler Pipeline**: Query requests flow through the full handler pipeline. For cross-partition queries, `PartitionKeyRangeHandler` distributes across partitions. See `handler-pipeline` spec. +- **Partition Keys**: Single-partition queries require a partition key in `QueryRequestOptions`. See `partition-keys` spec. +- **Serialization**: `FeedIterator` uses the container's serializer for deserialization. See `serialization` spec. +- **Retry**: Query page fetches are retried per `retry-and-failover` spec policies. 
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/Resource/Container/Container.cs` +- Source: `Microsoft.Azure.Cosmos/src/Query/v3Query/QueryDefinition.cs` +- Source: `Microsoft.Azure.Cosmos/src/Resource/FeedIterators/FeedIterator.cs` +- Source: `Microsoft.Azure.Cosmos/src/Linq/CosmosLinqQuery.cs` +- Source: `Microsoft.Azure.Cosmos/src/RequestOptions/QueryRequestOptions.cs` \ No newline at end of file diff --git a/openspec/specs/retry-and-failover/spec.md b/openspec/specs/retry-and-failover/spec.md new file mode 100644 index 0000000000..e7480a3045 --- /dev/null +++ b/openspec/specs/retry-and-failover/spec.md @@ -0,0 +1,324 @@ +# Retry and Failover + +## Purpose + +The Azure Cosmos DB .NET SDK implements multiple retry policies to handle transient failures, throttling, region failovers, and partition-level unavailability. These policies are layered — each handles a specific class of errors — and are orchestrated by the `RetryHandler` in the handler pipeline. Correct retry behavior is critical for SDK reliability; bugs in retry logic can cause outages or data loss. + +## Retry Policy Stack + +The `RetryHandler` applies policies in this order: + +1. **`ResetSessionTokenRetryPolicy`** — Session token mismatch +2. **`ClientRetryPolicy`** — Cross-region failover and partition failover (PPAF/PPCB) +3. **`ResourceThrottleRetryPolicy`** — Rate limiting (HTTP 429) + +Additionally: +- **`MetadataRequestThrottleRetryPolicy`** — For metadata/control-plane requests +- **`HttpTimeoutPolicy` variants** — Transport-level timeout and retry at the HTTP layer +- **`PartitionKeyMismatchRetryPolicy`** — Stale partition key definition cache + +## Requirements + +### Requirement: ClientRetryPolicy Status Code Handling + +The `ClientRetryPolicy` SHALL evaluate the response status code and substatus to determine whether a request is eligible for retry and which failover action to take. + +**Status code reference table:** + +| Status Code | Substatus | Condition | Retry? 
| Behavior | +|-------------|-----------|-----------|--------|----------| +| 403 | 3 (WriteForbidden) | Write request | ✅ | Endpoint not writable; mark partition for failover, refresh cache | +| 403 | 51 (DatabaseAccountNotFound) | Read or multi-master write | ✅ | Regional endpoint unavailable; fail over to preferred locations | +| 403 | 1008 | Region being added/removed | ✅ | Region marked unavailable for 5 min; retry on next region | +| 404 | 1002 (ReadSessionNotAvailable) | Session consistency | ✅ | Session token mismatch; retry on write region (single-master) or next preferred region (multi-master) | +| 408 | Any | PPAF/PPCB eligible | ✅ | Request timeout; mark partition unavailable, retry next region | +| 410 | 1022 (LeaseNotFound) | Always | ✅ | Partition recreated/moved; retry on next region | +| 429 | 3092 (SystemResourceUnavailable) | Multi-master write only | ⚠️ | Treated as 503; partition marked unavailable | +| 429 | Other | Always | ✅ | Delegates to `ResourceThrottleRetryPolicy` | +| 500 | Any | Read requests only | ✅ | Internal server error on reads; retry next region | +| 503 | Any | Always | ✅ | Service unavailable; PPAF/PPCB marks partition, retry if regions available | +| HttpRequestException | — | Gateway/DNS failure | ✅ | Mark partition unavailable; retry next region | +| OperationCanceledException | — | Cancellation | ⚠️ | PPAF/PPCB marks partition for future requests; current request may not retry | + +#### WriteForbidden (403/3) + +**When** the SDK receives HTTP 403 with substatus 3 (WriteForbidden) on a write request, **then** the SDK **SHALL** mark the partition for failover, refresh the location cache, and retry the request on the next available region. + +#### DatabaseAccountNotFound (403/51) + +**When** the SDK receives HTTP 403 with substatus 51 (DatabaseAccountNotFound) on a read request or a multi-master write request, **then** the SDK **SHALL** fail over to the next preferred location and retry the request. 
+ +#### Region Add/Remove (403/1008) + +**When** the SDK receives HTTP 403 with substatus 1008, **then** the SDK **SHALL** mark the region as unavailable for 5 minutes and retry the request on the next available region. + +#### ReadSessionNotAvailable (404/1002) + +**When** the SDK receives HTTP 404 with substatus 1002 (ReadSessionNotAvailable) under session consistency, **then** the SDK **SHALL** retry on the write region (single-master) or the next preferred region (multi-master). + +#### RequestTimeout (408) + +**When** the SDK receives HTTP 408 (RequestTimeout) **and** the request is PPAF/PPCB eligible, **then** the SDK **SHALL** mark the partition as unavailable and retry the request on the next available region. + +#### LeaseNotFound (410/1022) + +**When** the SDK receives HTTP 410 with substatus 1022 (LeaseNotFound), **then** the SDK **SHALL** retry the request on the next available region regardless of request type. + +#### SystemResourceUnavailable (429/3092) + +**When** the SDK receives HTTP 429 with substatus 3092 (SystemResourceUnavailable) on a multi-master write request, **then** the SDK **SHALL** treat the response as HTTP 503 and mark the partition as unavailable. + +#### Throttling (429 — Other Substatus) + +**When** the SDK receives HTTP 429 with any substatus other than 3092, **then** the SDK **SHALL** delegate retry handling to the `ResourceThrottleRetryPolicy`. + +#### InternalServerError (500) + +**When** the SDK receives HTTP 500 on a read request, **then** the SDK **SHALL** retry the request on the next available region. + +**If** the SDK receives HTTP 500 on a write request, **then** the SDK **SHALL NOT** retry the request. + +#### ServiceUnavailable (503) + +**When** the SDK receives HTTP 503, **then** the SDK **SHALL** mark the partition via PPAF/PPCB and retry the request if additional regions are available. 
+ +#### HttpRequestException (Gateway/DNS Failure) + +**When** the SDK catches an `HttpRequestException` (e.g., gateway or DNS failure), **then** the SDK **SHALL** mark the partition as unavailable and retry the request on the next available region. + +#### OperationCanceledException + +**When** the SDK catches an `OperationCanceledException`, **then** the SDK **SHALL** mark the partition via PPAF/PPCB for future requests. The current request **may** not be retried. + +### Requirement: ClientRetryPolicy Limits + +The `ClientRetryPolicy` SHALL enforce upper bounds on retry attempts to prevent unbounded retry loops. + +**Retry limit reference table:** + +| Parameter | Value | +|-----------|-------| +| Max failover retries | 120 | +| Retry interval | 1 second between failover retries | +| Max session token retries (single-master) | 1 | +| Max session token retries (multi-master) | Number of configured endpoints | +| Max service unavailable retries | 1 | + +#### Max Failover Retries + +**While** performing cross-region failover retries, the SDK **SHALL** retry at most 120 times with a 1-second interval between retries. + +#### Session Token Retry Limit — Single-Master + +**Where** the account is configured as single-master, **when** a session token mismatch occurs, **then** the SDK **SHALL** retry at most 1 time on the write (primary) region. + +#### Session Token Retry Limit — Multi-Master + +**Where** the account is configured as multi-master, **when** a session token mismatch occurs, **then** the SDK **SHALL** retry on each configured endpoint in order, up to `endpoints.Count` times. + +#### Service Unavailable Retry Limit + +**When** the SDK encounters a service unavailable (503) response, **then** the SDK **SHALL** retry at most 1 time before propagating the failure. + +### Requirement: ResourceThrottleRetryPolicy + +The `ResourceThrottleRetryPolicy` SHALL handle HTTP 429 (rate-limited) responses by retrying with server-specified delays up to configurable limits. 
+ +**Throttle retry parameter reference table:** + +| Parameter | Default | +|-----------|---------| +| Max retry attempts | 9 | +| Max cumulative wait time | 60 seconds | +| Retry delay source | `x-ms-retry-after` response header | +| Fallback delay (if header = 0) | 5 seconds | +| Backoff strategy | Configurable factor (default = 1, no escalation) | + +#### Retry Decision + +**When** the SDK receives HTTP 429, **then** the SDK **SHALL** retry **if** `currentAttempt < maxAttempts AND cumulativeDelay + nextDelay ≤ maxWaitTime`. + +**If** the cumulative delay would exceed `maxWaitTime` or the current attempt has reached `maxAttempts`, **then** the SDK **SHALL NOT** retry and **SHALL** propagate the 429 response to the caller. + +#### Retry Delay Selection + +**When** the SDK retries a throttled request, **then** the SDK **SHALL** use the delay specified in the `x-ms-retry-after` response header. + +**If** the `x-ms-retry-after` header value is 0, **then** the SDK **SHALL** use a fallback delay of 5 seconds. + +#### Backoff Escalation + +**Where** a backoff factor is configured, **then** the SDK **SHALL** multiply the retry delay by the configured factor on each successive attempt. The default factor is 1 (no escalation). + +### Requirement: Per-Partition Automatic Failover (PPAF) + +The SDK **SHALL** support partition-level failover for single-master write accounts so that when a specific partition is unavailable, write requests are routed to read regions instead of failing entirely. 
+ +**PPAF trigger reference table:** + +| Trigger | Status Code | Behavior | +|---------|------------|----------| +| Write 503 | ServiceUnavailable | Mark partition unavailable; retry on read region | +| Write 408 | RequestTimeout | Mark partition unavailable; retry on read region | +| Eligibility | `!canUseMultipleWriteLocations && !request.IsReadOnly && PPAFEnabled` | | + +#### PPAF Eligibility + +**Where** the account does not use multiple write locations (`!canUseMultipleWriteLocations`) **and** the request is not read-only (`!request.IsReadOnly`) **and** PPAF is enabled, **then** the SDK **SHALL** evaluate the request for partition-level failover. + +#### Write ServiceUnavailable under PPAF + +**When** a PPAF-eligible write request receives HTTP 503 (ServiceUnavailable), **then** the SDK **SHALL** mark the partition as unavailable and retry the request on a read region. + +#### Write RequestTimeout under PPAF + +**When** a PPAF-eligible write request receives HTTP 408 (RequestTimeout), **then** the SDK **SHALL** mark the partition as unavailable and retry the request on a read region. + +#### Partition Unavailability Tracking + +**When** a partition is marked as unavailable under PPAF, **then** the SDK **SHALL** record the unavailability for a duration of 5 seconds (configurable via `AZURE_COSMOS_PPAF_ALLOWED_PARTITION_UNAVAILABILITY_DURATION_IN_SECONDS`). + +The unavailability state **SHALL** be stored in a `ConcurrentDictionary`. + +A background task **SHALL** periodically retry failed partitions to detect recovery. + +### Requirement: Per-Partition Circuit Breaker (PPCB) + +The SDK **SHALL** extend PPAF logic to multi-master accounts and read operations using circuit breaker thresholds. 
+ +**PPCB trigger reference table:** + +| Trigger | Applies To | Behavior | +|---------|-----------|----------| +| Read 503/500/408 | Read requests | Increment failure counter; trip circuit after threshold | +| Write 503/408 (multi-master) | Multi-master writes | Same circuit breaker logic | +| Failback interval | — | 5 minutes (configurable via `AZURE_COSMOS_PPCB_STALE_PARTITION_UNAVAILABILITY_REFRESH_INTERVAL_IN_SECONDS`) | + +#### Read Failure Circuit Breaker + +**When** a read request receives HTTP 503, 500, or 408, **then** the SDK **SHALL** increment the failure counter for the affected partition. **If** the failure count exceeds the configured threshold, **then** the SDK **SHALL** trip the circuit breaker and route subsequent requests to an alternate region. + +#### Multi-Master Write Circuit Breaker + +**When** a multi-master write request receives HTTP 503 or 408, **then** the SDK **SHALL** apply the same circuit breaker logic as read failure handling. + +#### PPCB Failback + +**While** a partition circuit breaker is tripped, the SDK **SHALL** attempt failback after a 5-minute interval (configurable via `AZURE_COSMOS_PPCB_STALE_PARTITION_UNAVAILABILITY_REFRESH_INTERVAL_IN_SECONDS`). + +### Requirement: Session Token Retry (404/1002) + +The SDK **SHALL** retry requests that fail with HTTP 404 substatus 1002 (ReadSessionNotAvailable) using a strategy determined by the account's write configuration. 
+ +**Session token retry strategy reference table:** + +| Config | Strategy | Max Retries | +|--------|----------|-------------| +| Multi-master | Retry on each configured endpoint in order | `endpoints.Count` | +| Single-master | Retry on write (primary) region only | 1 | +| Single-master (after 1st retry) | Add `x-ms-should-process-only-in-hub-region=true` header | Subsequent attempts | + +#### Multi-Master Session Token Retry + +**Where** the account is configured as multi-master, **when** the SDK receives HTTP 404 with substatus 1002, **then** the SDK **SHALL** retry the request on each configured endpoint in order, up to `endpoints.Count` retries. + +#### Single-Master Session Token Retry + +**Where** the account is configured as single-master, **when** the SDK receives HTTP 404 with substatus 1002, **then** the SDK **SHALL** retry the request on the write (primary) region only, with a maximum of 1 retry. + +#### Single-Master Hub Region Header + +**Where** the account is configured as single-master, **when** the first session token retry has been attempted, **then** the SDK **SHALL** add the header `x-ms-should-process-only-in-hub-region=true` to subsequent retry attempts. + +### Requirement: HTTP Timeout Policies + +The SDK **SHALL** wrap HTTP requests in timeout policies that control per-attempt timeouts and retry behavior. 
+ +**HTTP timeout policy reference table:** + +| Policy | Attempt Timeouts | Use Case | +|--------|-----------------|----------| +| `HttpTimeoutPolicyDefault` | 65s / 65s / 65s (1s delay between) | Gateway mode data-plane | +| `HttpTimeoutPolicyForPartitionFailover` | 6s / 6s / 10s | PPAF-enabled requests | +| `HttpTimeoutPolicyControlPlaneRetriableHotPath` | 0.5s / 5s / 65s (1s delay) | Metadata reads on hot path (query plan, addresses) | +| `HttpTimeoutPolicyControlPlaneRead` | 5s / 10s / 65s | Metadata reads (initialization, account info) | +| `HttpTimeoutPolicyNoRetry` | 65s (no retry) | Client telemetry | + +#### Default Gateway Timeout + +**Where** a request uses gateway mode for data-plane operations, **then** the SDK **SHALL** apply `HttpTimeoutPolicyDefault` with attempt timeouts of 65s / 65s / 65s and a 1-second delay between attempts. + +#### Partition Failover Timeout + +**Where** a request is PPAF-enabled, **then** the SDK **SHALL** apply `HttpTimeoutPolicyForPartitionFailover` with attempt timeouts of 6s / 6s / 10s. + +#### Control Plane Hot Path Timeout + +**Where** a request is a metadata read on the hot path (e.g., query plan, addresses), **then** the SDK **SHALL** apply `HttpTimeoutPolicyControlPlaneRetriableHotPath` with attempt timeouts of 0.5s / 5s / 65s and a 1-second delay between attempts. + +#### Control Plane Read Timeout + +**Where** a request is a metadata read for initialization or account info, **then** the SDK **SHALL** apply `HttpTimeoutPolicyControlPlaneRead` with attempt timeouts of 5s / 10s / 65s. + +#### No Retry Timeout + +**Where** a request is a client telemetry call, **then** the SDK **SHALL** apply `HttpTimeoutPolicyNoRetry` with a single 65-second timeout and no retry. + +### Requirement: CancellationToken Handling + +The SDK **SHALL** respect `CancellationToken` propagation through retry loops while maintaining partition health tracking. 
+ +#### Retry Loop Cancellation + +**When** a `CancellationToken` passed to a public API is cancelled, **then** the SDK **SHALL** stop any active retry loop at the earliest opportunity. + +#### ResourceThrottleRetryPolicy Cancellation + +**While** `ResourceThrottleRetryPolicy` accepts a `CancellationToken`, the SDK **SHALL NOT** actively check it during retry wait intervals — cancellation **SHALL** only be observed at exception time. + +#### OperationCanceledException Partition Marking + +**When** `ClientRetryPolicy` catches an `OperationCanceledException`, **then** the SDK **SHALL** convert it to a PPAF/PPCB partition marking for future requests. + +### Requirement: Region Failover Logic + +The SDK **SHALL** maintain a `LocationCache` that tracks available read and write endpoints and governs region failover behavior. + +#### Unavailable Region Handling + +**When** a region is marked as unavailable, **then** the SDK **SHALL** move it to the end of the preference list. The unavailability marking **SHALL** expire after 5 minutes. + +#### Endpoint Resolution Order + +**When** resolving an endpoint for a request, the SDK **SHALL** use the following order: Preferred Locations → Available Write/Read Endpoints → Account-level fallback. + +#### Account Information Refresh on Failover + +**When** the SDK encounters a failover-triggering error (403/3, 403/51, or 503), **then** `GlobalEndpointManager` **SHALL** refresh account information to update the available regions. 
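The "Unavailable Region Handling" rule in this requirement can be modeled in a few lines. This is a simplified standalone sketch, not the `LocationCache` implementation: the class name, method names, and the use of plain region-name strings are assumptions, and the caller supplies the clock so the expiry can be exercised deterministically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the preference list; not the SDK's LocationCache.
public sealed class RegionPreferenceSketch
{
    private static readonly TimeSpan Expiry = TimeSpan.FromMinutes(5);
    private readonly List<string> preferredRegions;
    private readonly Dictionary<string, DateTime> markedAtByRegion = new Dictionary<string, DateTime>();

    public RegionPreferenceSketch(IEnumerable<string> preferredRegions)
    {
        this.preferredRegions = preferredRegions.ToList();
    }

    public void MarkUnavailable(string region, DateTime utcNow)
    {
        this.markedAtByRegion[region] = utcNow;
    }

    // Healthy regions keep their preferred order; regions marked unavailable
    // within the last 5 minutes are moved to the end of the list.
    public IReadOnlyList<string> GetOrderedRegions(DateTime utcNow)
    {
        bool IsUnavailable(string region) =>
            this.markedAtByRegion.TryGetValue(region, out DateTime markedAt)
            && utcNow - markedAt < Expiry;

        return this.preferredRegions.Where(r => !IsUnavailable(r))
            .Concat(this.preferredRegions.Where(IsUnavailable))
            .ToList();
    }
}
```

Keeping a demoted region at the end of the list, rather than removing it, means it is still available as a last resort when every other region fails.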
+ +## Configuration + +| Parameter | Configurable Via | Default | +|-----------|-----------------|---------| +| Max throttle retries | `CosmosClientOptions.MaxRetryAttemptsOnRateLimitedRequests` | 9 | +| Max throttle wait | `CosmosClientOptions.MaxRetryWaitTimeOnRateLimitedRequests` | 60 seconds | +| Unavailable region expiry | Internal | 5 minutes (300s) | +| PPAF partition unavailability | Environment variable | 5 seconds | +| PPCB failback interval | Environment variable | 5 minutes | + +## Interactions + +- **Handler Pipeline**: `RetryHandler` sits in position #5 in the pipeline. See `handler-pipeline` spec. +- **Hedging**: Hedged requests have independent retry behavior. Primary requests can cross-region retry; hedged requests are local-retry only. See `cross-region-hedging` spec. +- **PPAF/PPCB**: Partition-level failover runs within `ClientRetryPolicy` and coordinates with `GlobalPartitionEndpointManager`. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/ClientRetryPolicy.cs` +- Source: `Microsoft.Azure.Cosmos/src/ResourceThrottleRetryPolicy.cs` +- Source: `Microsoft.Azure.Cosmos/src/Handler/RetryHandler.cs` +- Source: `Microsoft.Azure.Cosmos/src/Routing/GlobalEndpointManager.cs` +- Source: `Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs` +- Source: `Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicy*.cs` +- Design: `docs/SdkDesign.md` (Retry sections) +- Design: `docs/PerPartitionAutomaticFailoverDesign.md` \ No newline at end of file diff --git a/openspec/specs/serialization/spec.md b/openspec/specs/serialization/spec.md new file mode 100644 index 0000000000..7815b32545 --- /dev/null +++ b/openspec/specs/serialization/spec.md @@ -0,0 +1,161 @@ +# Serialization + +## Purpose + +The Azure Cosmos DB .NET SDK serializes and deserializes documents for all typed API operations. It supports three serialization backends: Newtonsoft JSON.NET (default), System.Text.Json, and custom implementations via the `CosmosSerializer` abstract class. 
The serializer choice affects typed CRUD operations, query result deserialization, and LINQ-to-SQL property name translation. Stream APIs bypass serialization entirely. + +## Public API Surface + +### CosmosSerializer (Abstract Base) + +```csharp +public abstract class CosmosSerializer +{ + public abstract T FromStream<T>(Stream stream); // Deserialize; MUST dispose stream + public abstract Stream ToStream<T>(T input); // Serialize; returns readable stream +} +``` + +### CosmosLinqSerializer (LINQ-Aware Extension) + +```csharp +public abstract class CosmosLinqSerializer : CosmosSerializer +{ + public abstract string SerializeMemberName(MemberInfo memberInfo); // Property name for LINQ queries +} +``` + +### Registration (Mutually Exclusive) + +```csharp +// Option 1: Newtonsoft.Json configuration +CosmosClientOptions options = new() +{ + SerializerOptions = new CosmosSerializationOptions + { + IgnoreNullValues = true, + PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase + } +}; + +// Option 2: Custom serializer +CosmosClientOptions options = new() { Serializer = new MyCustomSerializer() }; + +// Option 3: System.Text.Json +CosmosClientOptions options = new() +{ + UseSystemTextJsonSerializerWithOptions = new JsonSerializerOptions + { + DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull, + PropertyNamingPolicy = JsonNamingPolicy.CamelCase + } +}; +``` + +## Requirements + +### Requirement: Serializer Contract + +**EARS-SC-1 (FromStream dispose):** When `FromStream` is called on a serializer implementation, the serializer shall dispose the input stream before returning, including on exceptions. If the stream is still readable after `FromStream` returns, the SDK shall throw `InvalidOperationException`. + +**EARS-SC-2 (ToStream readable):** When `ToStream` is called on a serializer implementation, the serializer shall return a stream with `CanRead = true` and position at 0.
If the returned stream is null or not readable, the SDK shall throw `InvalidOperationException`. + +**EARS-SC-3 (Stream pass-through):** When `T` is `Stream`, the SDK shall return the input stream directly without deserialization. + +**EARS-SC-4 (Empty stream):** When the input stream is empty and seekable, the SDK shall return `default(T)`. + +### Requirement: Serializer Routing + +The SDK maintains separate serializers for different type categories: + +| Type Category | Serializer Used | Examples | +|--------------|-----------------|---------| +| User types | Custom/configured serializer | Application POCOs, `dynamic`, `Document` | +| SDK internal types | Always default (JSON.NET) | `DatabaseProperties`, `ContainerProperties`, `ThroughputProperties` | +| `PatchSpec` | `PatchOperationsSerializer` | Patch operation payloads | +| `SqlQuerySpec` | `SqlQuerySpecSerializer` | Query definitions with parameters | + +**EARS-SR-1 (User type routing):** When a typed API operation targets a user-defined type (application POCOs, `dynamic`, `Document`), the SDK shall use the custom or configured serializer. + +**EARS-SR-2 (Internal type routing):** When a typed API operation targets an SDK internal type (`DatabaseProperties`, `ContainerProperties`, `ThroughputProperties`), the SDK shall always use the default JSON.NET serializer, regardless of any custom serializer configured by the user. This ensures SDK resource management works correctly regardless of custom serializer behavior. + +**EARS-SR-3 (PatchSpec routing):** When serializing patch operation payloads, the SDK shall use `PatchOperationsSerializer`. + +**EARS-SR-4 (SqlQuerySpec routing):** When serializing query definitions with parameters, the SDK shall use `SqlQuerySpecSerializer`. 
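The serializer contract (EARS-SC-1 through EARS-SC-4) can be illustrated with a small custom `CosmosSerializer`. This is a sketch under assumptions, not the SDK's built-in System.Text.Json serializer: the class name `SketchStjSerializer` is hypothetical, and the stream overloads of `JsonSerializer` used here assume .NET 6 or later. Per EARS-SR-1 and EARS-SR-2, a serializer registered this way would only see user types; SDK internal types would still go through the default serializer.

```csharp
using System.IO;
using System.Text.Json;
using Microsoft.Azure.Cosmos;

// Hypothetical example type; illustrates the contract only.
public sealed class SketchStjSerializer : CosmosSerializer
{
    private readonly JsonSerializerOptions options;

    public SketchStjSerializer(JsonSerializerOptions options = null)
    {
        this.options = options ?? new JsonSerializerOptions();
    }

    public override T FromStream<T>(Stream stream)
    {
        // Mirrors EARS-SC-3: hand the stream back untouched when T is Stream.
        if (typeof(Stream).IsAssignableFrom(typeof(T)))
        {
            return (T)(object)stream;
        }

        // EARS-SC-1: the using statement disposes the stream on success and on exceptions.
        using (stream)
        {
            // Mirrors EARS-SC-4: an empty, seekable stream yields default(T).
            if (stream.CanSeek && stream.Length == 0)
            {
                return default;
            }

            return JsonSerializer.Deserialize<T>(stream, this.options);
        }
    }

    public override Stream ToStream<T>(T input)
    {
        // EARS-SC-2: return a readable stream rewound to position 0.
        MemoryStream payload = new MemoryStream();
        JsonSerializer.Serialize(payload, input, this.options);
        payload.Position = 0;
        return payload;
    }
}
```

Such a class would be registered through `CosmosClientOptions.Serializer`, as in Option 2 of the registration snippet. Because it derives from `CosmosSerializer` rather than `CosmosLinqSerializer`, it provides no `SerializeMemberName` for LINQ property name translation.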
+ +### Requirement: Serialization Configuration + +#### CosmosSerializationOptions (Newtonsoft.Json) + +| Property | Type | Default | Maps To | +|----------|------|---------|---------| +| `IgnoreNullValues` | `bool` | `false` | `NullValueHandling.Ignore` / `.Include` | +| `Indented` | `bool` | `false` | `Formatting.Indented` / `.None` | +| `PropertyNamingPolicy` | `CosmosPropertyNamingPolicy` | `Default` | `CamelCasePropertyNamesContractResolver` | + +#### CosmosPropertyNamingPolicy + +| Value | Effect | +|-------|--------| +| `Default` | No transformation — property names used as-is | +| `CamelCase` | First letter lowercased: `PropertyName` → `propertyName` | + +**EARS-CF-1 (Null value handling):** When `IgnoreNullValues` is set to `true`, the Newtonsoft.Json serializer shall use `NullValueHandling.Ignore`; when set to `false`, it shall use `NullValueHandling.Include`. + +**EARS-CF-2 (Indentation):** When `Indented` is set to `true`, the Newtonsoft.Json serializer shall use `Formatting.Indented`; when set to `false`, it shall use `Formatting.None`. + +**EARS-CF-3 (Camel case naming):** When `PropertyNamingPolicy` is set to `CamelCase`, the Newtonsoft.Json serializer shall use `CamelCasePropertyNamesContractResolver`, lowercasing the first letter of property names (`PropertyName` → `propertyName`). When set to `Default`, property names shall be used as-is. + +### Requirement: LINQ Property Name Translation + +**EARS-LQ-1 (Member name resolution):** When the SDK translates a LINQ expression to SQL, it shall call `CosmosLinqSerializer.SerializeMemberName(MemberInfo)` to determine the JSON property name for `SELECT`, `WHERE`, and `ORDER BY` clauses. + +**EARS-LQ-2 (System.Text.Json attributes):** When the System.Text.Json serializer is configured, the LINQ translator shall respect `[JsonPropertyName]` attributes and `JsonSerializerOptions.PropertyNamingPolicy`. 
+ +**EARS-LQ-3 (Newtonsoft.Json attributes):** When the Newtonsoft.Json serializer is configured, the LINQ translator shall respect `[JsonProperty]` attributes and contract resolver naming. + +**EARS-LQ-4 (Custom LINQ serializer naming policy):** When a custom `CosmosLinqSerializer` implementation is registered, the SDK shall validate that `PropertyNamingPolicy` is set to `Default` and reject other values. + +### Requirement: Serialization Scope + +**Typed APIs (use serializer)**: +- `CreateItemAsync`, `ReadItemAsync`, `ReplaceItemAsync`, `UpsertItemAsync` +- `GetItemQueryIterator`, `GetItemLinqQueryable` +- `TransactionalBatch` typed operations +- Stored procedure parameter serialization + +**Stream APIs (bypass serializer)**: +- `CreateItemStreamAsync`, `ReadItemStreamAsync`, `ReplaceItemStreamAsync` +- `GetItemQueryStreamIterator`, `GetChangeFeedStreamIterator` + +**EARS-SS-1 (Typed API serialization):** When a typed API operation (`CreateItemAsync`, `ReadItemAsync`, `ReplaceItemAsync`, `UpsertItemAsync`, `GetItemQueryIterator`, `GetItemLinqQueryable`, `TransactionalBatch` typed operations, stored procedure parameter serialization) is invoked, the SDK shall use the configured serializer to serialize and/or deserialize the payload. + +**EARS-SS-2 (Stream API bypass):** When a stream API operation (`CreateItemStreamAsync`, `ReadItemStreamAsync`, `ReplaceItemStreamAsync`, `GetItemQueryStreamIterator`, `GetChangeFeedStreamIterator`) is invoked, the SDK shall bypass the serializer entirely and pass the raw stream. + +### Requirement: Partition Key Extraction + +**EARS-PK-1 (Auto-extraction serialization):** When auto-extracting a partition key from a typed item (e.g., `CreateItemAsync(item, partitionKey: null)`), the SDK shall first serialize the item using the configured serializer. 
+ +**EARS-PK-2 (JToken extraction):** When extracting the partition key value after serialization, the SDK shall always use `JToken.FromObject()` (Newtonsoft) on the deserialized object, regardless of the configured serializer. + +**EARS-PK-3 (Cross-serializer compatibility):** Because partition key extraction always uses `JToken.FromObject()`, partition key paths shall be deserializable via `JToken.FromObject()` even when using a custom serializer. + +### Requirement: Max Depth Protection + +**EARS-MD-1 (Depth limit):** The default JSON.NET serializer shall set `MaxDepth = 64` to prevent denial-of-service attacks via deeply nested JSON (GHSA-5crp-9r3c-p9vr). + +## Interactions + +- **CRUD Operations**: All typed CRUD operations use the configured serializer. See `crud-operations` spec. +- **Query**: `FeedIterator<T>` deserializes results; LINQ uses `SerializeMemberName` for property translation. See `query-and-linq` spec. +- **Client Configuration**: Serializer is configured via `CosmosClientOptions`. See `client-and-configuration` spec. +- **Partition Keys**: Auto-extraction depends on serialization. See `partition-keys` spec.
+ +## References + +- Source: `Microsoft.Azure.Cosmos/src/Serializer/CosmosSerializer.cs` +- Source: `Microsoft.Azure.Cosmos/src/Serializer/CosmosLinqSerializer.cs` +- Source: `Microsoft.Azure.Cosmos/src/Serializer/CosmosJsonDotNetSerializer.cs` +- Source: `Microsoft.Azure.Cosmos/src/Serializer/CosmosSystemTextJsonSerializer.cs` +- Source: `Microsoft.Azure.Cosmos/src/Serializer/CosmosSerializerCore.cs` +- Source: `Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs` \ No newline at end of file diff --git a/openspec/specs/transport-and-connectivity/spec.md b/openspec/specs/transport-and-connectivity/spec.md new file mode 100644 index 0000000000..f59cc7f51e --- /dev/null +++ b/openspec/specs/transport-and-connectivity/spec.md @@ -0,0 +1,155 @@ +# Transport and Connectivity + +## Purpose + +The Azure Cosmos DB .NET SDK supports two transport modes — Gateway (HTTPS) and Direct (TCP) — with configurable connection management, pooling, and endpoint discovery. The transport layer is the bottom of the handler pipeline stack and is responsible for converting `RequestMessage` objects into network calls to the Cosmos DB service. 
+ +## Public API Surface + +### Connection Mode Configuration + +```csharp +// Gateway mode (HTTPS through Cosmos DB gateway proxy) +CosmosClientOptions gatewayOptions = new CosmosClientOptions +{ + ConnectionMode = ConnectionMode.Gateway, + GatewayModeMaxConnectionLimit = 100 +}; + +// Direct mode (TCP to backend replicas — default) +CosmosClientOptions directOptions = new CosmosClientOptions +{ + ConnectionMode = ConnectionMode.Direct, + IdleTcpConnectionTimeout = TimeSpan.FromMinutes(20), + OpenTcpConnectionTimeout = TimeSpan.FromSeconds(10), + MaxRequestsPerTcpConnection = 50, + MaxTcpConnectionsPerEndpoint = 32 +}; +``` + +### Endpoint and Region Configuration + +```csharp +CosmosClientOptions options = new CosmosClientOptions +{ + ApplicationPreferredRegions = new List<string> { "East US", "West US" }, + // OR: ApplicationRegion = "East US", + LimitToEndpoint = false, // default — enables region discovery + EnableTcpConnectionEndpointRediscovery = true, // default + AccountInitializationCustomEndpoints = new HashSet<Uri> + { + new Uri("https://fallback-endpoint.documents.azure.com:443/") + } +}; +``` + +## Requirements + +### Requirement: Connection Mode Selection + +The SDK SHALL support two connection modes with distinct transport characteristics. + +#### Gateway mode + +**Where** `CosmosClientOptions.ConnectionMode = ConnectionMode.Gateway`, **when** requests are made, the SDK SHALL route all requests through the Cosmos DB gateway proxy via HTTPS (port 443) and force the protocol to HTTPS regardless of other settings. + +#### Direct mode (default) + +**Where** `CosmosClientOptions.ConnectionMode = ConnectionMode.Direct` (or not specified), **when** requests are made, the SDK SHALL use direct TCP connections to backend replicas for data-plane requests and use the gateway for metadata requests. + +### Requirement: Gateway Mode Configuration + +The SDK SHALL support configuring HTTP connection behavior for Gateway mode.
+ +#### Max connection limit + +**Where** `CosmosClientOptions.GatewayModeMaxConnectionLimit` is set (default: 50), **when** multiple concurrent requests are made in Gateway mode, the SDK SHALL maintain at most that many simultaneous HTTP connections. + +#### Custom HttpClient + +**Where** `CosmosClientOptions.HttpClientFactory` is set, **when** the SDK creates HTTP connections, the SDK SHALL use the provided HttpClient factory and ignore `GatewayModeMaxConnectionLimit` and `WebProxy` settings. + +#### Web proxy + +**Where** `CosmosClientOptions.WebProxy` is set, **when** requests are made in Gateway mode, the SDK SHALL route HTTP traffic through the specified proxy. + +### Requirement: Direct Mode TCP Configuration + +The SDK SHALL support fine-grained TCP connection tuning for Direct mode. + +| Property | Default | Minimum | Description | +|----------|---------|---------|-------------| +| `IdleTcpConnectionTimeout` | — | 10 minutes | Close idle TCP connections after this duration | +| `OpenTcpConnectionTimeout` | 5 seconds | — | Timeout for establishing new TCP connections | +| `MaxRequestsPerTcpConnection` | 30 | — | Max multiplexed requests per TCP connection | +| `MaxTcpConnectionsPerEndpoint` | — | — | Max TCP connections to a single backend node | +| `PortReuseMode` | — | — | TCP port reuse strategy | + +### Requirement: Request Timeout + +The SDK SHALL enforce a configurable timeout for individual requests. + +#### Default timeout + +**While** no timeout is configured, **when** a request is made, the SDK SHALL time out after 6 seconds (default `RequestTimeout`). + +#### Custom timeout + +**Where** `CosmosClientOptions.RequestTimeout` is set, **when** a request exceeds the configured duration, the SDK SHALL throw a `CosmosException` with status 408 (RequestTimeout). + +### Requirement: Endpoint Discovery + +The SDK SHALL automatically discover and connect to available service endpoints. 
+ +#### Automatic region discovery + +**Where** `CosmosClientOptions.LimitToEndpoint = false` (default), **when** the client is initialized, the SDK SHALL query the account for available regions and populate the endpoint cache. + +#### Limit to single endpoint + +**Where** `CosmosClientOptions.LimitToEndpoint = true`, **when** the client is initialized, the SDK SHALL use only the provided endpoint URI, perform no region discovery, and require that `ApplicationRegion` and `ApplicationPreferredRegions` are not set. + +#### TCP connection endpoint rediscovery + +**Where** `CosmosClientOptions.EnableTcpConnectionEndpointRediscovery = true` (default), **when** a TCP connection is reset, the SDK SHALL refresh the endpoint address cache and establish connections to newly discovered endpoints. + +### Requirement: Custom Initialization Endpoints + +The SDK SHALL support custom endpoints for account initialization in geo-failover scenarios. + +#### Custom init endpoints + +**Where** `CosmosClientOptions.AccountInitializationCustomEndpoints` contains fallback URIs, **if** the primary account endpoint is unreachable during initialization, **then** the SDK SHALL attempt to initialize from the custom endpoints. + +### Requirement: SSL/TLS Configuration + +The SDK SHALL support custom certificate validation for Direct mode. + +#### Custom certificate validation + +**Where** `CosmosClientOptions.ServerCertificateCustomValidationCallback` is set, **when** a TLS connection is established, the SDK SHALL invoke the custom callback for certificate validation. + +### Requirement: Mutual Exclusivity of Settings + +The SDK SHALL enforce that conflicting connection settings cannot be used together. 
+ +| Conflict | Behavior | +|----------|----------| +| `WebProxy` + `HttpClientFactory` | `ArgumentException` at client construction | +| `ApplicationRegion` + `ApplicationPreferredRegions` | `ArgumentException` at client construction | +| `LimitToEndpoint = true` + either region setting | `ArgumentException` at client construction | + +## Interactions + +- **Handler Pipeline**: `TransportHandler` is the leaf handler that invokes `GatewayStoreModel` (Gateway) or `ServerStoreModel` (Direct). See `handler-pipeline` spec. +- **Retry Policies**: Transport-level errors (timeouts, connection resets) trigger retry policies. See `retry-and-failover` spec. +- **Cross-Region Hedging**: Hedged requests may target different regional endpoints. See `cross-region-hedging` spec. +- **Diagnostics**: Transport-level statistics are captured in `ClientSideRequestStatisticsTraceDatum`. See `diagnostics-and-observability` spec. + +## References + +- Source: `Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs` +- Source: `Microsoft.Azure.Cosmos/src/GatewayStoreModel.cs` +- Source: `Microsoft.Azure.Cosmos/src/Handler/TransportHandler.cs` +- Source: `Microsoft.Azure.Cosmos/src/Routing/GlobalEndpointManager.cs` +- Source: `Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs` \ No newline at end of file