
bug: Race condition: Context cancellation during CommitTx causes false error when commit succeeds #1978

@kprokopenko

Bug Report

YDB GO SDK version:

v3.122.0

Environment

Current behavior:

When a user cancels the context passed to a retry operation (e.g., DoTx) that contains an interactive transaction, a race condition can occur during the CommitTx call:

  1. The commit request is sent to the server.
  2. The context is canceled (explicitly or by deadline expiration) after the request has been sent but before the response is received.
  3. The SDK returns an error indicating that the commit failed (typically context canceled or context deadline exceeded).
  4. The commit nevertheless succeeds on the server side.

The result is misleading: the user sees an error although the transaction was committed, and the error alone gives no way to tell whether the commit succeeded or failed.

Expected behavior:

The SDK should either:

  1. Handle this race condition gracefully by checking if the commit actually succeeded before returning an error, or
  2. Provide clear documentation/guidance on how to handle this scenario, or
  3. Implement a mechanism to distinguish between "commit failed" and "context canceled but commit may have succeeded"

Steps to reproduce:

  1. Create a context with a timeout or cancellation mechanism
  2. Initiate an interactive transaction using DoTx with this context
  3. Within the transaction, perform some operations
  4. Return nil from the operation so that DoTx issues the CommitTx call automatically
  5. Cancel the context (or let the timeout expire) after the commit request has been sent but before the response is received
  6. Observe that the SDK returns an error, even though the commit may have succeeded on the server

Related code:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

err := db.Query().DoTx(ctx, func(ctx context.Context, tx query.TxActor) error {
    // ... perform operations ...
    
    // If the context is canceled (e.g., the timeout expires) after DoTx
    // has sent the commit request that follows this return, DoTx returns
    // an error even though the commit may have succeeded.
    return nil
})

Related implementation files:

  • internal/query/client.go: doTx function (lines 272-306)
  • internal/query/transaction.go: CommitTx method (lines 327-361)
  • retry/retry.go: Retry loop that checks context cancellation

Other information:

While this situation was relatively rare in the C++ SDK, it becomes much more common when deadlines are propagated implicitly to all calls. Users may not anticipate this race condition and might expect the SDK to handle it automatically.

Labels: bug