[Internal] Distributed Transactions: Adds openspec proposal for SDK implementation of Distributed Transact…#5781

Draft
Meghana-Palaparthi wants to merge 2 commits into main from users/Meghana-Palaparthi/DTS_openspec

@Meghana-Palaparthi
Contributor

Pull Request Template

Description

This pull request introduces a comprehensive design proposal and technical specification for adding distributed transactions to Azure Cosmos DB, enabling atomic writes and consistent snapshot reads across multiple partitions and containers within a single database account. The changes define the context, goals, wire protocol, SDK API surface, and detailed decisions for implementing distributed transactions, including both write and read operations, via a unified endpoint.

The most important changes are:

Design and Context:

  • Added a detailed design document (design.md) outlining the current limitations of Cosmos DB transactions, the new distributed transaction capabilities, and the rationale behind key architectural decisions. This includes a breakdown of SDK routing, server-side support, and the wire contract for distributed transactions.
  • Provided a proposal (proposal.md) explaining the motivation ("Why"), the specific API and wire protocol changes ("What Changes"), new and modified capabilities, and the expected impact on the SDK and service.

API and Capability Additions:

  • Introduced new factory methods on CosmosClient: CreateDistributedWriteTransaction() and CreateDistributedReadTransaction(), enabling clients to build and commit cross-partition, cross-container transactions.
  • Defined new types: DistributedWriteTransaction, DistributedReadTransaction, and a shared DistributedTransactionResponse for both transaction types, with per-operation ETag and session token handling.

Wire Protocol and Endpoint:

  • Specified a single HTTP endpoint (POST /operations/dtc) for both write and read distributed transactions, with the operationType in the request body distinguishing the operation. Detailed the request/response headers, body schema, and status code handling, including idempotency and retry semantics.
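As a rough illustration of the single-endpoint design, the sketch below builds one request envelope and varies only `operationType`. Only the path `POST /operations/dtc` and the `operationType` field come from this description. Every other field name, the operation-type values (`"Write"`, `"Read"`), and the helper `build_dtc_request` are hypothetical placeholders, not the schema defined in design.md.

```python
import json

DTC_PATH = "/operations/dtc"  # endpoint path named in the PR description


def build_dtc_request(operation_type, operations):
    """Return (method, path, body) for a distributed transaction request.

    Field names other than "operationType" are illustrative assumptions.
    """
    if operation_type not in ("Write", "Read"):  # hypothetical values
        raise ValueError(f"unknown operation type: {operation_type}")
    body = {"operationType": operation_type, "operations": list(operations)}
    return "POST", DTC_PATH, json.dumps(body)


# Usage: a read transaction and a write transaction share the same endpoint.
method, path, body = build_dtc_request(
    "Read", [{"container": "orders", "id": "1", "partitionKey": "pk1"}]
)
```

Routing both transaction types through one path means the Gateway dispatches on the body rather than the URL, so the SDK needs only one request builder.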

Session Consistency and Idempotency:

  • Outlined the handling of session tokens and idempotency tokens to ensure session consistency and safe retries for write transactions, with clear SDK-side logic for merging tokens and managing retries.
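The idempotency part of this can be pictured with a minimal sketch: the token is generated once per transaction and reused on every retry, so the service can recognize a repeated commit. The names `commit_with_retry`, `send`, and `TransientError` are hypothetical and do not appear in the spec. The actual retry policy is the one defined in design.md.

```python
import uuid


class TransientError(Exception):
    """Stand-in for a retryable failure such as a request timeout."""


def commit_with_retry(send, operations, max_attempts=3):
    """Call send(operations, token) until it succeeds, reusing one token.

    `send` is a hypothetical transport callable. The token is created once,
    outside the loop, so every attempt carries the same value.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    token = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(operations, token)
        except TransientError as err:
            last_error = err
    raise last_error
```

Generating the token inside the loop would defeat the purpose, because each attempt would then look like a new transaction.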

Specification Metadata:

  • Added a minimal .openspec.yaml file to register the spec-driven schema and creation date for the distributed transactions feature.

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)
  • [✓] New feature (non-breaking change which adds functionality)
  • [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [] This change requires a documentation update

Closing issues

To automatically close an issue: closes #IssueNumber

@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).

@ananth7592
Member

AI Spec Review -- PR #5781

Distributed Transactions OpenSpec

This spec is thorough and well-structured, covering wire protocol, retry semantics, session consistency, and API surface. However, there are 3 critical internal contradictions between design.md and the spec files that must be resolved before this can serve as an implementation guide.

| Severity | Count |
|----------|-------|
| Blocking | 3     |

Top Issues

  1. 207 vs 'never returned' contradiction -- design.md says 207 is never returned, but distributed-read-transaction/spec.md requires returning 207 when an item is not found
  2. 424 vs 453 status code -- Write spec says rolled-back ops get 424, design says 453/sub-status 5415
  3. 503 not in retry table -- Read spec references 503 but it is absent from the design retry table

See inline comments for details.


AI-generated review -- may be incorrect. Agree? Resolve the conversation. Disagree? Reply with your reasoning.

- **200** → `DistributedTransactionResponse.IsSuccessStatusCode = true`; `operationResponses` contains per-operation results as returned from the prepare phase (Create → 201, Replace/Upsert → 200, Delete → 204).
- **452** → `DistributedTransactionResponse.IsSuccessStatusCode = false`; `operationResponses` contains: **453 / sub-status 5415** (`DtcOperationRolledBack`) for any operation that voted Yes and was rolled back, and the **original error code** (e.g. 409, 412) for whichever operation(s) voted No and caused the abort.

`207 Multi-Status` is never returned by the Gateway for distributed transactions.

🔴 Blocking · Correctness: Internal Contradiction

207 Multi-Status — 'never returned' contradicts the read transaction spec

This line states:

207 Multi-Status is never returned by the Gateway for distributed transactions.

But specs/distributed-read-transaction/spec.md (Scenario: One item not found) requires:

the SDK SHALL return a DistributedTransactionResponse with StatusCode = 207

And tasks.md references '207 mixed' as a unit test case.

These directly contradict each other. If the Gateway truly never returns 207, then the SDK must synthesize it from a 200 envelope with per-operation 404s — but that promotion logic isn't specified anywhere. If the Gateway does return 207, this line is wrong.

Suggestion: Clarify whether 207 is a third terminal envelope status from the server, or an SDK-synthesized status code. If SDK-synthesized, specify the promotion rule (e.g., 'if envelope is 200 but any per-operation result is non-2xx, the SDK promotes the envelope StatusCode to 207').
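If the SDK-synthesized option is chosen, the promotion rule suggested above could look roughly like the sketch below. This is an assumption about one possible resolution, not behavior the spec currently defines, and the function name `promote_envelope_status` is hypothetical.

```python
def promote_envelope_status(envelope_status, operation_statuses):
    """Apply the promotion rule suggested in this comment.

    A 200 envelope becomes 207 when any per-operation status is non-2xx.
    Every other envelope status is returned unchanged.
    """
    has_failure = any(not 200 <= status < 300 for status in operation_statuses)
    if envelope_status == 200 and has_failure:
        return 207
    return envelope_status
```

With this rule, a read where one item is missing (per-operation 404 inside a 200 envelope) surfaces as 207, while an aborted write keeps its 452 envelope.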


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.


Each operation in a `DistributedWriteTransaction` SHALL support an `IfMatchEtag` precondition. If the document's current ETag does not match, the entire transaction SHALL be rolled back.

#### Scenario: ETag precondition fails on one operation

🔴 Blocking · Correctness: Internal Contradiction

Rolled-back operation status code: 424 here vs 453 in design.md

This scenario says rolled-back operations carry StatusCode = 424 Failed Dependency. But design.md (Decision 11) says they carry 453 / sub-status 5415 (DtcOperationRolledBack), and the wire contract table also uses 453.

424 is a standard WebDAV status code. 453 is a custom Cosmos DB code. These are fundamentally different values — an implementer cannot satisfy both.

Suggestion: Align on whichever status code the server actually returns. Update either this spec or the design doc. If the server returns 453, update this spec to use StatusCode = 453 with SubStatusCode = 5415 (DtcOperationRolledBack).
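To show why the two documents cannot both be satisfied, here is a sketch of how a caller might separate the operations that caused an abort from those that were merely rolled back. It assumes the design.md values (453 with sub-status 5415). The helper name `split_aborted_operations` and the `(status, sub_status)` tuple shape are hypothetical.

```python
# DtcOperationRolledBack as described in design.md (Decision 11).
ROLLED_BACK = (453, 5415)


def split_aborted_operations(results):
    """Split per-operation (status, sub_status) pairs of an aborted transaction.

    Returns (cause_indexes, rolled_back_indexes), where causes are the
    operations that carry their original error code (e.g. 409, 412).
    """
    causes, rolled_back = [], []
    for index, result in enumerate(results):
        if tuple(result) == ROLLED_BACK:
            rolled_back.append(index)
        else:
            causes.append(index)
    return causes, rolled_back
```

A client written against the 424 wording would compare against a different constant and misclassify every rolled-back operation as a cause, which is why the spec and the design doc need to name the same code.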


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.


#### Scenario: Network failure triggers automatic retry

- **WHEN** `CommitTransactionAsync` experiences a transient failure (408 Request Timeout or 503 Service Unavailable)

🔴 Blocking · Correctness: Internal Contradiction

503 Service Unavailable is not in the design.md retry table

This scenario references '408 Request Timeout or 503 Service Unavailable' as retryable failures. However, the comprehensive retry table in design.md does not include 503 at all. The table lists 408, 449/5352, 429/3200, and 500 with sub-statuses 5411-5413 — but no 503.

Either 503 is a valid Gateway response that's missing from the retry table, or this scenario cites a non-existent response code. An implementer following the design retry table would not implement 503 retry; an implementer following this spec would.

Suggestion: Add 503 to the design.md retry table if the Gateway can return it, or replace 503 here with the actual retryable codes from the table (408, 449/5352, 429/3200, 500/5411-5413).
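The ambiguity is easiest to see when the retry table is written down as a lookup. The sketch below encodes only the codes this comment attributes to design.md and leaves 503 behind an explicit flag, since whether it is retryable is the open question. Treating 408 as retryable for any sub-status is an assumption, and the names `RETRYABLE`, `is_retryable`, and `retry_503` are hypothetical.

```python
# (status, sub_status) pairs; a sub_status of None matches any sub-status.
RETRYABLE = {
    (408, None),  # assumption: any sub-status
    (449, 5352),
    (429, 3200),
    (500, 5411),
    (500, 5412),
    (500, 5413),
}


def is_retryable(status, sub_status=0, retry_503=False):
    """Return True if the response should be retried under the table above."""
    if status == 503:
        # Not in the design.md table; behavior depends on how this is resolved.
        return retry_503
    return (status, None) in RETRYABLE or (status, sub_status) in RETRYABLE
```

An implementer following design.md ships the default (`retry_503=False`), while one following the scenario ships the opposite, so the two documents produce observably different clients.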


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).
