[Internal] Distributed Transactions: Adds openspec proposal for SDK implementation of Distributed Transact…#5781
Conversation
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
AI Spec Review -- PR #5781Distributed Transactions OpenSpec This spec is thorough and well-structured, covering wire protocol, retry semantics, session consistency, and API surface. However, there are 3 critical internal contradictions between
Top Issues
See inline comments for details. AI-generated review -- may be incorrect. Agree? Resolve the conversation. Disagree? Reply with your reasoning. |
| - **200** → `DistributedTransactionResponse.IsSuccessStatusCode = true`; `operationResponses` contains per-operation results as returned from the prepare phase (Create → 201, Replace/Upsert → 200, Delete → 204). | ||
| - **452** → `DistributedTransactionResponse.IsSuccessStatusCode = false`; `operationResponses` contains: **453 / sub-status 5415** (`DtcOperationRolledBack`) for any operation that voted Yes and was rolled back, and the **original error code** (e.g. 409, 412) for whichever operation(s) voted No and caused the abort. | ||
|
|
||
| `207 Multi-Status` is never returned by the Gateway for distributed transactions. |
There was a problem hiding this comment.
🔴 Blocking · Correctness: Internal Contradiction
207 Multi-Status — 'never returned' contradicts the read transaction spec
This line states:
207 Multi-Statusis never returned by the Gateway for distributed transactions.
But specs/distributed-read-transaction/spec.md (Scenario: One item not found) requires:
the SDK SHALL return a
DistributedTransactionResponsewithStatusCode = 207
And tasks.md references '207 mixed' as a unit test case.
These directly contradict each other. If the Gateway truly never returns 207, then the SDK must synthesize it from a 200 envelope with per-operation 404s — but that promotion logic isn't specified anywhere. If the Gateway does return 207, this line is wrong.
Suggestion: Clarify whether 207 is a third terminal envelope status from the server, or an SDK-synthesized status code. If SDK-synthesized, specify the promotion rule (e.g., 'if envelope is 200 but any per-operation result is non-2xx, the SDK promotes the envelope StatusCode to 207').
|
|
||
| Each operation in a `DistributedWriteTransaction` SHALL support an `IfMatchEtag` precondition. If the document's current ETag does not match, the entire transaction SHALL be rolled back. | ||
|
|
||
| #### Scenario: ETag precondition fails on one operation |
There was a problem hiding this comment.
🔴 Blocking · Correctness: Internal Contradiction
Rolled-back operation status code: 424 here vs 453 in design.md
This scenario says rolled-back operations carry StatusCode = 424 Failed Dependency. But design.md (Decision 11) says they carry 453 / sub-status 5415 (DtcOperationRolledBack), and the wire contract table also uses 453.
424 is a standard WebDAV status code. 453 is a custom Cosmos DB code. These are fundamentally different values — an implementer cannot satisfy both.
Suggestion: Align on whichever status code the server actually returns. Update either this spec or the design doc. If the server returns 453, update this spec to use StatusCode = 453 with SubStatusCode = 5415 (DtcOperationRolledBack).
|
|
||
| #### Scenario: Network failure triggers automatic retry | ||
|
|
||
| - **WHEN** `CommitTransactionAsync` experiences a transient failure (408 Request Timeout or 503 Service Unavailable) |
There was a problem hiding this comment.
🔴 Blocking · Correctness: Internal Contradiction
503 Service Unavailable is not in the design.md retry table
This scenario references '408 Request Timeout or 503 Service Unavailable' as retryable failures. However, the comprehensive retry table in design.md does not include 503 at all. The table lists 408, 449/5352, 429/3200, and 500 with sub-statuses 5411-5413 — but no 503.
Either 503 is a valid Gateway response that's missing from the retry table, or this scenario cites a non-existent response code. An implementer following the design retry table would not implement 503 retry; an implementer following this spec would.
Suggestion: Add 503 to the design.md retry table if the Gateway can return it, or replace 503 here with the actual retryable codes from the table (408, 449, 500/5411-5413).
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
Pull Request Template
Description
This pull request introduces a comprehensive design proposal and technical specification for adding distributed transactions to Azure Cosmos DB, enabling atomic writes and consistent snapshot reads across multiple partitions and containers within a single database account. The changes define the context, goals, wire protocol, SDK API surface, and detailed decisions for implementing distributed transactions, including both write and read operations, via a unified endpoint.
The most important changes are:
Design and Context:
design.md) outlining the current limitations of Cosmos DB transactions, the new distributed transaction capabilities, and the rationale behind key architectural decisions. This includes a breakdown of SDK routing, server-side support, and the wire contract for distributed transactions.proposal.md) explaining the motivation ("Why"), the specific API and wire protocol changes ("What Changes"), new and modified capabilities, and the expected impact on the SDK and service.API and Capability Additions:
CosmosClient:CreateDistributedWriteTransaction()andCreateDistributedReadTransaction(), enabling clients to build and commit cross-partition, cross-container transactions.DistributedWriteTransaction,DistributedReadTransaction, and a sharedDistributedTransactionResponsefor both transaction types, with per-operation ETag and session token handling.Wire Protocol and Endpoint:
POST /operations/dtc) for both write and read distributed transactions, with theoperationTypein the request body distinguishing the operation. Detailed the request/response headers, body schema, and status code handling, including idempotency and retry semantics.Session Consistency and Idempotency:
Specification Metadata:
.openspec.yamlfile to register the spec-driven schema and creation date for the distributed transactions feature.Type of change
Please delete options that are not relevant.
Closing issues
To automatically close an issue: closes #IssueNumber