
[#31530] YSQL: Stop object-lock RPCs from leaking read-time state into fast-path DML#31531

Open
ellabaron-code wants to merge 2 commits into
yugabyte:masterfrom
Shopify:fix-acquire-object-lock-read-time-leak

@ellabaron-code
Collaborator

@ellabaron-code ellabaron-code commented May 8, 2026

Background

For object locks (table locks, row locks during DML) there are two
paths between a PG client session and its local tserver:

  1. Shared-memory IPC fast path. pggate writes the lock entry into a
    shared memory region the tserver maps. No RPC.
  2. RPC fallback. pggate sends a PgAcquireObjectLockRequestPB to the
    local tserver's PgClientService.

The fast path is taken whenever shared memory is healthy and the lock
mode is fastpath-eligible (<= ROW EXCLUSIVE). Otherwise we fall back to
the RPC path. On macOS, shared-memory IPC currently fails, so every
fastpath-eligible object lock takes the RPC path.
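The routing rule above can be sketched as a small decision function. This is a hypothetical model, not pggate code. LockMode, LockPath, IsFastPathEligible, and ChooseLockPath are illustrative names. The enum lists PostgreSQL's table-level lock modes from weakest to strongest, so the "<= ROW EXCLUSIVE" comparison has the intended meaning.

```cpp
#include <cassert>

// Hypothetical model of PostgreSQL's table-level lock modes, ordered from
// weakest to strongest so that `mode <= kRowExclusive` is meaningful.
enum class LockMode {
  kAccessShare,
  kRowShare,
  kRowExclusive,
  kShareUpdateExclusive,
  kShare,
  kShareRowExclusive,
  kExclusive,
  kAccessExclusive,
};

enum class LockPath { kSharedMemoryFastPath, kRpcFallback };

// Modes up to and including ROW EXCLUSIVE are fastpath-eligible.
constexpr bool IsFastPathEligible(LockMode mode) {
  return mode <= LockMode::kRowExclusive;
}

// Fast path only when shared memory is healthy AND the mode is eligible.
// Everything else falls back to an RPC to the local PgClientService.
constexpr LockPath ChooseLockPath(LockMode mode, bool shared_memory_healthy) {
  return (shared_memory_healthy && IsFastPathEligible(mode))
             ? LockPath::kSharedMemoryFastPath
             : LockPath::kRpcFallback;
}
```

On macOS, where shared-memory IPC currently fails, shared_memory_healthy is effectively false. As a result, even a ROW EXCLUSIVE lock is routed to the RPC fallback.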

The bug

The RPC fallback (PgTxnManager::AcquireObjectLock) shares two helpers
with regular statement execution: CalculateIsolation and
SetupPerformOptions. Any call into these helpers is assumed to be
preparing options for a statement that reads or writes, so they mutate
per-transaction state derived from yb_read_after_commit_visibility:

  • CalculateIsolation unconditionally recomputes need_defer_read_point_.
    With yb_read_after_commit_visibility = 'deferred', this evaluates to true.
  • SetupPerformOptions inspects need_defer_read_point_ and the relaxed
    GUC and sets defer_read_point or clamp_uncertainty_window on the
    outgoing PgPerformOptionsPB.

When AcquireObjectLock invokes these helpers, both pieces of state end
up on the lock RPC's options. The tserver, processing the lock RPC,
reads those flags and picks a (deferred or clamped) read time during
lock acquisition. The picked time is stored in the session's read
point. The fast-path UPDATE/INSERT/DELETE that immediately follows
inherits that pre-picked read time and never asks the storage layer to
pick its own.
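A toy model makes the leak easier to follow. Everything in it is a hypothetical simplification:

- PerformOptions stands in for PgPerformOptionsPB.
- Session stands in for the session read point.
- TserverHandle stands in for tserver lock handling.
- ExecuteFastPathWrite stands in for the storage layer.
- AcquireObjectLockPreFix stands in for the pre-fix PgTxnManager::AcquireObjectLock.
- The counter stands in for the picked_read_time_on_docdb metric.

The model reproduces only the sequence described above. The lock RPC carries defer/clamp flags, a read time is picked while the lock is processed, and the following fast-path write reuses that read time.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins; not the real protobuf or pggate types.
struct PerformOptions {
  bool defer_read_point = false;
  bool clamp_uncertainty_window = false;
};

// kStrict is the assumed default setting (neither relaxed nor deferred).
enum class Visibility { kStrict, kRelaxed, kDeferred };

struct Session {
  bool need_defer_read_point = false;         // per-transaction flag
  std::optional<std::uint64_t> read_time;     // session read point
  int picked_read_time_on_docdb = 0;          // models the per-tablet metric
};

// Pre-fix CalculateIsolation: unconditionally recomputes the flag.
void CalculateIsolation(Session& s, Visibility v) {
  s.need_defer_read_point = (v == Visibility::kDeferred);
}

// Pre-fix SetupPerformOptions: copies read-time state onto any outgoing op.
PerformOptions SetupPerformOptions(const Session& s, Visibility v) {
  PerformOptions opts;
  opts.defer_read_point = s.need_defer_read_point;
  opts.clamp_uncertainty_window = (v == Visibility::kRelaxed);
  return opts;
}

// Tserver side: a defer/clamp flag on the op causes a read time to be picked
// now and stored in the session read point.
void TserverHandle(Session& s, const PerformOptions& opts, std::uint64_t now) {
  if (!s.read_time && (opts.defer_read_point || opts.clamp_uncertainty_window)) {
    s.read_time = now;
  }
}

// Storage layer: picks a read time only if none was pre-picked.
void ExecuteFastPathWrite(Session& s, std::uint64_t now) {
  if (!s.read_time) {
    s.read_time = now;
    ++s.picked_read_time_on_docdb;
  }
}

// Pre-fix RPC fallback for a fastpath-eligible lock.
void AcquireObjectLockPreFix(Session& s, Visibility v, std::uint64_t now) {
  CalculateIsolation(s, v);
  TserverHandle(s, SetupPerformOptions(s, v), now);
}
```

With the deferred or relaxed setting, a lock followed by ExecuteFastPathWrite leaves the counter at 0, which matches the test symptom. With the default setting, the write picks its own read time and the counter reaches 1.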

This is a correctness issue, not just a test-visible one. Fast-path
writes skip conflict resolution on regulardb and rely on the storage
layer picking the right read time at execution. When lock acquisition
pre-picks a stale (deferred or clamped) read time, the write runs at
the wrong snapshot — the visibility guarantees the session asked for
via yb_read_after_commit_visibility are silently violated.

In tests, the symptom is picked_read_time_on_docdb (a per-tablet metric
incremented when the storage layer picks the read time) staying at 0,
which fails these tests on macOS:

PgReadTimeTest.CheckDeferredReadAfterCommitVisibility
PgReadTimeTest.CheckRelaxedReadAfterCommitVisibility

Linux passes only because shared-memory IPC succeeds there. For
fastpath-eligible locks, the RPC path (and therefore
CalculateIsolation/SetupPerformOptions) is never invoked during lock
acquisition. The bug is latent on Linux and would surface the moment
shared-memory acquisition isn't available (e.g., a higher-mode lock
that's not fastpath-eligible, or any future regression in the
shared-memory channel).

The fix

Object lock RPCs perform no reads or writes; their options must not
carry read-time manipulations meant for DML. Decouple lock acquisition
from DML option setup:

  • Extract SetupReadTimeOptions out of SetupPerformOptions. It owns
    every read-time mutator on PgPerformOptionsPB: snapshot read time,
    clamp/defer, parallel-mode ENSURE_READ_TIME_IS_SET,
    read_time_manipulation, read_time_for_follower_reads, and the
    RESET-clear path. SetupPerformOptions invokes it only when the op
    is not a local-fastpath object lock. read_time_serial_no still
    flows through SetupPerformOptions for lock RPCs — they need it to
    line up with the subsequent statement.

  • AcquireObjectLock no longer calls CalculateIsolation. Lock
    acquisition picks no read time, does no DML, and does not change
    the isolation level; the statement that follows runs its own
    CalculateIsolation.
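The restructured flow can be sketched in the same hypothetical terms. Only the shape is meant to follow the description above:

- Read-time mutators live in SetupReadTimeOptions.
- SetupPerformOptions skips SetupReadTimeOptions for local-fastpath object-lock ops but still copies read_time_serial_no.
- The lock path no longer calls CalculateIsolation.

The types, the plain bool is_local_object_lock_op parameter, PrepareStatement, and the tserver/storage stand-ins are illustrative assumptions. They are not the actual PgTxnManager signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins; not the real PgPerformOptionsPB or PgTxnManager.
struct PerformOptions {
  std::uint64_t read_time_serial_no = 0;
  bool defer_read_point = false;
  bool clamp_uncertainty_window = false;
};

enum class Visibility { kStrict, kRelaxed, kDeferred };

struct Session {
  Visibility visibility = Visibility::kDeferred;
  bool need_defer_read_point = false;
  std::uint64_t read_time_serial_no = 1;
  std::optional<std::uint64_t> read_time;
  int picked_read_time_on_docdb = 0;
};

void CalculateIsolation(Session& s) {
  s.need_defer_read_point = (s.visibility == Visibility::kDeferred);
}

// Owns the read-time mutators (only defer/clamp are modeled here).
void SetupReadTimeOptions(const Session& s, PerformOptions& opts) {
  opts.defer_read_point = s.need_defer_read_point;
  opts.clamp_uncertainty_window = (s.visibility == Visibility::kRelaxed);
}

PerformOptions SetupPerformOptions(const Session& s, bool is_local_object_lock_op) {
  PerformOptions opts;
  // Still sent on lock RPCs so they line up with the following statement.
  opts.read_time_serial_no = s.read_time_serial_no;
  if (!is_local_object_lock_op) {
    SetupReadTimeOptions(s, opts);
  }
  return opts;
}

void TserverHandle(Session& s, const PerformOptions& opts, std::uint64_t now) {
  if (!s.read_time && (opts.defer_read_point || opts.clamp_uncertainty_window)) {
    s.read_time = now;
  }
}

void ExecuteFastPathWrite(Session& s, std::uint64_t now) {
  if (!s.read_time) {
    s.read_time = now;
    ++s.picked_read_time_on_docdb;
  }
}

// Post-fix RPC fallback for a fastpath-eligible lock: no CalculateIsolation,
// and SetupReadTimeOptions is skipped.
PerformOptions AcquireObjectLock(Session& s, std::uint64_t now) {
  PerformOptions opts = SetupPerformOptions(s, /*is_local_object_lock_op=*/true);
  TserverHandle(s, opts, now);
  return opts;
}

// A regular statement still runs its own CalculateIsolation.
PerformOptions PrepareStatement(Session& s) {
  CalculateIsolation(s);
  return SetupPerformOptions(s, /*is_local_object_lock_op=*/false);
}
```

In this model, the lock RPC leaves the session read point unset, so the subsequent fast-path write picks its own read time. A statement prepared through PrepareStatement still carries the deferred flag.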

The discriminator is IsLocalObjectLockOp(mode <= ROW EXCLUSIVE).
Higher-mode locks still take the RPC path on Linux and macOS, but
is_local_object_lock_op is false for them, so they continue to run
SetupReadTimeOptions and CalculateIsolation exactly as before — the
heavyweight DDL/DML that follows depends on it.
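The set of modes the discriminator affects can be checked mechanically. This sketch is hypothetical. LockMode again lists PostgreSQL's table-level lock modes from weakest to strongest. IsLocalObjectLockOpFor and RunsSetupReadTimeOptionsOnLockRpc are illustrative helpers, not YugabyteDB functions; in the real code, IsLocalObjectLockOp is a flag passed along rather than a function of the mode.

```cpp
#include <cassert>

// Hypothetical: PostgreSQL table-level lock modes, weakest to strongest.
enum class LockMode {
  kAccessShare,
  kRowShare,
  kRowExclusive,
  kShareUpdateExclusive,
  kShare,
  kShareRowExclusive,
  kExclusive,
  kAccessExclusive,
};

// Illustrative version of the discriminator: mode <= ROW EXCLUSIVE.
constexpr bool IsLocalObjectLockOpFor(LockMode mode) {
  return mode <= LockMode::kRowExclusive;
}

// After the fix, a lock RPC still runs SetupReadTimeOptions only when it is
// not a local-fastpath object-lock op, i.e. for the stronger modes.
constexpr bool RunsSetupReadTimeOptionsOnLockRpc(LockMode mode) {
  return !IsLocalObjectLockOpFor(mode);
}
```

Under this model, only ACCESS SHARE, ROW SHARE, and ROW EXCLUSIVE change behavior. SHARE UPDATE EXCLUSIVE and every stronger mode keep the pre-fix option setup.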

Test

Verified on macOS that both
PgReadTimeTest.Check{Deferred,Relaxed}ReadAfterCommitVisibility now
pass. No behavior change on the Linux shared-memory fast path, since
SetupPerformOptions and CalculateIsolation are not invoked there.

Resolves #31530.

@netlify

netlify Bot commented May 8, 2026

Deploy Preview for infallible-bardeen-164bc9 ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 8e53891
🔍 Latest deploy log https://app.netlify.com/projects/infallible-bardeen-164bc9/deploys/6a02ab2a9866b7000861738c
😎 Deploy Preview https://deploy-preview-31531--infallible-bardeen-164bc9.netlify.app

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the PgTxnManager to prevent deferred read points from leaking into subsequent fast-path writes during local-fastpath object lock acquisition. It introduces an is_local_object_lock_op flag to the CalculateIsolation and SetupPerformOptions methods to skip recomputing need_defer_read_point_ and the clamp/defer setup when acquiring locks. I have no feedback to provide.

@pao214 pao214 requested review from basavaraj29 and pao214 May 8, 2026 22:59
Comment thread src/yb/yql/pggate/pg_txn_manager.cc Outdated
|| yb_read_after_commit_visibility == YB_DEFERRED_READ_AFTER_COMMIT_VISIBILITY)
&& docdb_isolation != IsolationLevel::SERIALIZABLE_ISOLATION;
//
// Skip recomputing need_defer_read_point_ when called from local-fastpath object lock
Contributor


Can you explain the purpose of calling CalculateIsolation after this when is_local_object_lock_op is true?

cc @basavaraj29

Collaborator Author


Changed the code to no longer call CalculateIsolation() from AcquireObjectLock().

Comment thread src/yb/yql/pggate/pg_txn_manager.cc Outdated
Contributor

@pao214 pao214 left a comment


Added @basavaraj29 for review.
Added a couple comments. Plan to keep the discussion conversational. Let me know if anything is unclear.

@pao214
Contributor

pao214 commented May 12, 2026

commit summary review:

A YSQL backend (pggate) talks only to its colocated tserver.

  1. What is the motivation behind adding this sentence to the commit summary? Is it because local lock ops do not need to pick a read time?
  2. As a general rule, let's try to keep the commit summary relevant to the issue. It's great that the commit summary is clear, but it must also be relevant.
  3. Can we replace "colocated tserver" with "pg client session"? That is more accurate, since the colocated tserver also has a storage layer, and this statement is not talking about the storage layer.

Those helpers were originally designed for DML

Can you please check the accuracy of this? DDLs also call these functions.

and never asks the storage layer to pick its own.

It is also important for the storage layer to pick the read time, since fast-path writes do not perform conflict resolution on regulardb. I think it is worth making it clear that this is NOT a test-only fix; there is an actual correctness issue here.

@ellabaron-code ellabaron-code requested a review from pao214 May 12, 2026 04:11
…th DML

The fix
-------
Object lock RPCs perform no reads or writes; their options must not
carry read-time manipulations meant for DML. Plumb the existing
IsLocalObjectLockOp flag (already a parameter to CalculateIsolation)
into SetupPerformOptions and use it to gate two places:

  * CalculateIsolation: skip the assignment to need_defer_read_point_
    when the call is on the local-fastpath object-lock path. The DML
    that follows a fastpath-eligible lock is a NON_TRANSACTIONAL write
    that short-circuits before CalculateIsolation runs again, so a
    leaked true value would survive into the Perform. Leaving the flag
    unmodified avoids that.

  * SetupPerformOptions: skip the entire clamp/defer block when called
    from AcquireObjectLock. This covers the relaxed case (a direct GUC
    read inside SetupPerformOptions, with no need_defer_read_point_
    intermediary) and is defense-in-depth for the deferred case.

Both gates use the same signal: IsLocalObjectLockOp(mode <= ROW EXCLUSIVE).
Higher-mode locks are followed by heavyweight DDL/DML that re-runs
CalculateIsolation properly, so they don't need the gate.
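For comparison with the restructuring adopted later, the gating approach of this first commit can be sketched the same way. As before, the types and the plain bool parameter are hypothetical simplifications of the real signatures. The point is that a single flag suppresses two things: the flag assignment in CalculateIsolation and the clamp/defer block in SetupPerformOptions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the session state and PgPerformOptionsPB.
enum class Visibility { kStrict, kRelaxed, kDeferred };

struct Session {
  Visibility visibility = Visibility::kStrict;
  bool need_defer_read_point = false;
};

struct PerformOptions {
  bool defer_read_point = false;
  bool clamp_uncertainty_window = false;
};

void CalculateIsolation(Session& s, bool is_local_object_lock_op) {
  // Gate 1: leave need_defer_read_point untouched on the lock path, so a true
  // value computed here cannot survive into a later NON_TRANSACTIONAL write.
  if (!is_local_object_lock_op) {
    s.need_defer_read_point = (s.visibility == Visibility::kDeferred);
  }
}

PerformOptions SetupPerformOptions(const Session& s, bool is_local_object_lock_op) {
  PerformOptions opts;
  // Gate 2: skip the whole clamp/defer block on the lock path. The relaxed
  // case reads the setting directly here, so gate 1 alone would not cover it.
  if (!is_local_object_lock_op) {
    opts.defer_read_point = s.need_defer_read_point;
    opts.clamp_uncertainty_window = (s.visibility == Visibility::kRelaxed);
  }
  return opts;
}
```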

Test
----
Verified on macOS that both
PgReadTimeTest.Check{Deferred,Relaxed}ReadAfterCommitVisibility now
pass. No behavior change on the Linux shared-memory fast path, since
SetupPerformOptions and CalculateIsolation are not invoked there.

Address pao214's review feedback on the prior commit. That commit fixed
the symptom (leaked defer/clamp on lock RPCs) by gating individual
mutators with IsLocalObjectLockOp. pao214 pointed out the cleaner shape:
object lock RPCs are not reads or writes, so they shouldn't be running
DML-shaped option setup in the first place.

Restructure rather than gate:

  * Extract SetupReadTimeOptions out of SetupPerformOptions. It owns
    every read-time mutator: snapshot_read_time, defer/clamp, parallel-
    mode ENSURE_READ_TIME_IS_SET, read_time_manipulation, follower-reads,
    and the RESET-clear path.

  * SetupPerformOptions calls SetupReadTimeOptions only when the call is
    not a local-fastpath object-lock op. read_time_serial_no continues
    to flow through unconditionally — the lock RPC still needs it to
    line up with the subsequent statement.

  * AcquireObjectLock no longer calls CalculateIsolation. Lock
    acquisition picks no read time, does no DML, and does not change
    isolation level; the statement that follows runs its own
    CalculateIsolation. Remove the now-unused is_local_object_lock_op
    parameter from CalculateIsolation and drop the three internal
    gates the prior commit added there.

  * ReleaseSessionObjectLock drops its IsLocalObjectLockOp(false)
    argument to match the new CalculateIsolation signature.

No behavior change on the Linux shared-memory fast path (the RPC path
still isn't exercised there for fastpath-eligible locks). Higher-mode
locks that do hit the RPC path are unaffected: is_local_object_lock_op
is false for them, so SetupReadTimeOptions still runs as before.

PgReadTimeTest.Check{Deferred,Relaxed}ReadAfterCommitVisibility pass on
both macOS and Linux.
@ellabaron-code ellabaron-code force-pushed the fix-acquire-object-lock-read-time-leak branch from 346e78f to 8e53891 Compare May 12, 2026 04:23
@ellabaron-code
Collaborator Author

Changed the fix as you recommended. Fixed the commit message for the first commit and updated the PR description.

Please let me know if you see any problems with the fix.


Development

Successfully merging this pull request may close these issues.

[YSQL] Object-lock RPC fallback leaks read-time state into fast-path DML

3 participants