[#31530] YSQL: Stop object-lock RPCs from leaking read-time state into fast-path DML#31531
[#31530] YSQL: Stop object-lock RPCs from leaking read-time state into fast-path DML#31531ellabaron-code wants to merge 2 commits into
Conversation
✅ Deploy Preview for infallible-bardeen-164bc9 ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Code Review
This pull request modifies the PgTxnManager to prevent deferred read points from leaking into subsequent fast-path writes during local-fastpath object lock acquisition. It introduces an is_local_object_lock_op flag to the CalculateIsolation and SetupPerformOptions methods to skip recomputing need_defer_read_point_ and the clamp/defer setup when acquiring locks. I have no feedback to provide.
| || yb_read_after_commit_visibility == YB_DEFERRED_READ_AFTER_COMMIT_VISIBILITY) | ||
| && docdb_isolation != IsolationLevel::SERIALIZABLE_ISOLATION; | ||
| // | ||
| // Skip recomputing need_defer_read_point_ when called from local-fastpath object lock |
There was a problem hiding this comment.
can you explain the purpose of calling CalculateIsolation after this when is_local_object_lock_op is true?
cc @basavaraj29
There was a problem hiding this comment.
Changed the code to no longer call CalculateIsolation() from AcquireObjectLock().
pao214
left a comment
There was a problem hiding this comment.
Added @basavaraj29 for review.
Added a couple comments. Plan to keep the discussion conversational. Let me know if anything is unclear.
|
commit summary review:
Can you please check the accuracy of this? DDLs also call these functions.
it is also important for the storage layer to pick the read time since fast path writes do not perform conflict resolution on regulardb. i think it is worth making it clear that this is NOT a test only fix. there is an actual correctness issue from this |
…th DML
Background
----------
For object locks (table locks, row locks during DML) there are two
paths between a PG client session and its local tserver:
1. Shared-memory IPC fast path. pggate writes the lock entry into a
shared memory region the tserver maps. No RPC.
2. RPC fallback. pggate sends a PgAcquireObjectLockRequestPB to the
local tserver's PgClientService.
The fast path is taken whenever shared memory is healthy and the lock
mode is fastpath-eligible (<= ROW EXCLUSIVE). Otherwise we fall back to
the RPC path. On macOS, shared-memory IPC currently fails, so every
fastpath-eligible object lock takes the RPC path.
The bug
-------
The RPC fallback (PgTxnManager::AcquireObjectLock) shares two helpers
with regular statement execution: CalculateIsolation and
SetupPerformOptions. Any call into these helpers is assumed to be
preparing options for a statement that reads or writes, so they mutate
per-transaction state derived from yb_read_after_commit_visibility:
* CalculateIsolation unconditionally recomputes need_defer_read_point_.
With yb_read_after_commit_visibility = 'deferred', this lands true.
* SetupPerformOptions inspects need_defer_read_point_ and the relaxed
GUC and sets defer_read_point or clamp_uncertainty_window on the
outgoing PgPerformOptionsPB.
When AcquireObjectLock invokes these helpers, both pieces of state end
up on the *lock RPC's* options. The tserver, processing the lock RPC,
reads those flags and picks a (deferred or clamped) read time during
lock acquisition. The picked time is stored in the session's read
point. The fast-path UPDATE/INSERT/DELETE that immediately follows
inherits that pre-picked read time and never asks the storage layer to
pick its own.
This is a correctness issue, not just a test-visible one. Fast-path
writes skip conflict resolution on regulardb and rely on the storage
layer picking the right read time at execution. When lock acquisition
pre-picks a stale (deferred or clamped) read time, the write runs at
the wrong snapshot — the visibility guarantees the session asked for
via yb_read_after_commit_visibility are silently violated.
The symptom in test is picked_read_time_on_docdb (a per-tablet metric
incremented when the storage layer picks the read time) staying at 0,
which fails these tests on macOS:
PgReadTimeTest.CheckDeferredReadAfterCommitVisibility
PgReadTimeTest.CheckRelaxedReadAfterCommitVisibility
Linux passes only because shared-memory IPC succeeds — the RPC path,
and therefore CalculateIsolation/SetupPerformOptions, is never invoked
during lock acquisition. The bug is latent on Linux and would surface
the moment shared-memory acquisition isn't available (e.g., a higher
mode lock that's not fastpath-eligible, or any future regression in
the shared-memory channel).
The fix
-------
Object lock RPCs perform no reads or writes; their options must not
carry read-time manipulations meant for DML. Plumb the existing
IsLocalObjectLockOp flag (already a parameter to CalculateIsolation)
into SetupPerformOptions and use it to gate two places:
* CalculateIsolation: skip the assignment to need_defer_read_point_
when the call is on the local-fastpath object-lock path. The DML
that follows a fastpath-eligible lock is a NON_TRANSACTIONAL write
that short-circuits before CalculateIsolation runs again, so a
leaked true would survive into the Perform; not mutating the flag
avoids that.
* SetupPerformOptions: skip the entire clamp/defer block when called
from AcquireObjectLock. This covers the relaxed case (a direct GUC
read inside SetupPerformOptions, with no need_defer_read_point_
intermediary) and is defense-in-depth for the deferred case.
Both gates use the same signal: IsLocalObjectLockOp(mode <= ROW EXCLUSIVE).
Higher-mode locks are followed by heavyweight DDL/DML that re-runs
CalculateIsolation properly, so they don't need the gate.
Test
----
Verified on macOS that both PgReadTimeTest.Check{Deferred,Relaxed}
ReadAfterCommitVisibility now pass. No behavior change on the Linux
shared-memory fast path since SetupPerformOptions and CalculateIsolation
are not invoked there.
Address pao214's review feedback on the prior commit. That commit fixed
the symptom (leaked defer/clamp on lock RPCs) by gating individual
mutators with IsLocalObjectLockOp. pao214 pointed out the cleaner shape:
object lock RPCs are not reads or writes, so they shouldn't be running
DML-shaped option setup in the first place.
Restructure rather than gate:
* Extract SetupReadTimeOptions out of SetupPerformOptions. It owns
every read-time mutator: snapshot_read_time, defer/clamp, parallel-
mode ENSURE_READ_TIME_IS_SET, read_time_manipulation, follower-reads,
and the RESET-clear path.
* SetupPerformOptions calls SetupReadTimeOptions only when the call is
not a local-fastpath object-lock op. read_time_serial_no continues
to flow through unconditionally — the lock RPC still needs it to
line up with the subsequent statement.
* AcquireObjectLock no longer calls CalculateIsolation. Lock
acquisition picks no read time, does no DML, and does not change
isolation level; the statement that follows runs its own
CalculateIsolation. Remove the now-unused is_local_object_lock_op
parameter from CalculateIsolation and drop the three internal
gates the prior commit added there.
* ReleaseSessionObjectLock drops its IsLocalObjectLockOp(false)
argument to match the new CalculateIsolation signature.
No behavior change on the Linux shared-memory fast path (the RPC path
still isn't exercised there for fastpath-eligible locks). Higher-mode
locks that do hit the RPC path are unaffected: is_local_object_lock_op
is false for them, so SetupReadTimeOptions still runs as before.
PgReadTimeTest.Check{Deferred,Relaxed}ReadAfterCommitVisibility pass on
both macOS and Linux.
346e78f to
8e53891
Compare
|
Changed the fix as you recommended. Fixed commit message for the first commit as well as updated PR description. Please, let me know if you see any problems with the fix. |
Background
For object locks (table locks, row locks during DML) there are two
paths between a PG client session and its local tserver:
shared memory region the tserver maps. No RPC.
local tserver's PgClientService.
The fast path is taken whenever shared memory is healthy and the lock
mode is fastpath-eligible (<= ROW EXCLUSIVE). Otherwise we fall back to
the RPC path. On macOS, shared-memory IPC currently fails, so every
fastpath-eligible object lock takes the RPC path.
The bug
The RPC fallback (PgTxnManager::AcquireObjectLock) shares two helpers
with regular statement execution: CalculateIsolation and
SetupPerformOptions. Any call into these helpers is assumed to be
preparing options for a statement that reads or writes, so they mutate
per-transaction state derived from yb_read_after_commit_visibility:
With yb_read_after_commit_visibility = 'deferred', this lands true.
GUC and sets defer_read_point or clamp_uncertainty_window on the
outgoing PgPerformOptionsPB.
When AcquireObjectLock invokes these helpers, both pieces of state end
up on the lock RPC's options. The tserver, processing the lock RPC,
reads those flags and picks a (deferred or clamped) read time during
lock acquisition. The picked time is stored in the session's read
point. The fast-path UPDATE/INSERT/DELETE that immediately follows
inherits that pre-picked read time and never asks the storage layer to
pick its own.
This is a correctness issue, not just a test-visible one. Fast-path
writes skip conflict resolution on regulardb and rely on the storage
layer picking the right read time at execution. When lock acquisition
pre-picks a stale (deferred or clamped) read time, the write runs at
the wrong snapshot — the visibility guarantees the session asked for
via yb_read_after_commit_visibility are silently violated.
The symptom in test is picked_read_time_on_docdb (a per-tablet metric
incremented when the storage layer picks the read time) staying at 0,
which fails these tests on macOS:
PgReadTimeTest.CheckDeferredReadAfterCommitVisibility
PgReadTimeTest.CheckRelaxedReadAfterCommitVisibility
Linux passes only because shared-memory IPC succeeds — the RPC path,
and therefore CalculateIsolation/SetupPerformOptions, is never invoked
during lock acquisition. The bug is latent on Linux and would surface
the moment shared-memory acquisition isn't available (e.g., a higher
mode lock that's not fastpath-eligible, or any future regression in
the shared-memory channel).
The fix
Object lock RPCs perform no reads or writes; their options must not
carry read-time manipulations meant for DML. Decouple lock acquisition
from DML option setup:
Extract SetupReadTimeOptions out of SetupPerformOptions. It owns
every read-time mutator on PgPerformOptionsPB: snapshot read time,
clamp/defer, parallel-mode ENSURE_READ_TIME_IS_SET,
read_time_manipulation, read_time_for_follower_reads, and the
RESET-clear path. SetupPerformOptions invokes it only when the op
is not a local-fastpath object lock. read_time_serial_no still
flows through SetupPerformOptions for lock RPCs — they need it to
line up with the subsequent statement.
AcquireObjectLock no longer calls CalculateIsolation. Lock
acquisition picks no read time, does no DML, and does not change
the isolation level; the statement that follows runs its own
CalculateIsolation.
The discriminator is IsLocalObjectLockOp(mode <= ROW EXCLUSIVE).
Higher-mode locks still take the RPC path on Linux and macOS, but
is_local_object_lock_op is false for them, so they continue to run
SetupReadTimeOptions and CalculateIsolation exactly as before — the
heavyweight DDL/DML that follows depends on it.
Test
Verified on macOS that both PgReadTimeTest.Check{Deferred,Relaxed}
ReadAfterCommitVisibility now pass. No behavior change on the Linux
shared-memory fast path since SetupPerformOptions and CalculateIsolation
are not invoked there.
Resolves #31530.