Description
Context
We use SELECT ... FOR UPDATE SKIP LOCKED under high throughput. The system must sustain ~700 ops/sec.
SQL pattern
BEGIN;
SELECT id FROM items WHERE ... ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED;  -- returned id is bound as $1 below
DELETE FROM items WHERE id = $1;
INSERT INTO claimed (...) VALUES (...);
COMMIT;
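The claim semantics of this pattern can be modeled in plain Python (an illustrative sketch, not YugabyteDB code): a worker scans candidates in id order, skips any row another transaction holds, and locks the first free one.

```python
def skip_locked_claim(candidate_ids, locked_ids):
    """Model of SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1:
    scan candidates in order, skip rows already locked, lock and
    return the first free one (None if every candidate is locked)."""
    for row_id in sorted(candidate_ids):
        if row_id in locked_ids:
            continue                 # SKIP LOCKED: move on without blocking
        locked_ids.add(row_id)       # FOR UPDATE: take the row lock
        return row_id
    return None

locked = {1, 2, 3}
claimed = skip_locked_claim(range(1, 10), locked)
# → 4: the first id not already held by another worker
```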
The problem
Under high contention, hundreds of concurrent workers compete for the same rows on the same tablet. We've identified two bottlenecks in the SKIP LOCKED path:
1. One RPC per candidate row
YugabyteDB disables batching for SKIP LOCKED — each candidate row is a separate RPC from YSQL to the TServer. With N concurrent lockers, a new transaction skips ~N/2 rows on average before finding a free one. At 200 concurrent lockers, that's ~100 RPCs per operation.
This is because SKIP LOCKED needs an immediate per-row answer ("locked or free?") to decide whether to skip, so the existing batch lock path is bypassed.
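The ~N/2 figure follows from a simple model (my sketch, not a benchmark from this issue): if N workers each claim the first free row in id order, the k-th claimer must skip past the k-1 rows already taken, so the average claimer issues about N/2 per-row RPCs.

```python
def average_skips(n_workers):
    """Each worker scans from row 0 and claims the first unclaimed row.
    Every locked row it passes costs one skip -- one RPC in the
    unbatched SKIP LOCKED path."""
    claimed = set()
    skips = []
    for _ in range(n_workers):
        pos = 0
        while pos in claimed:
            pos += 1
        skips.append(pos)            # this worker skipped `pos` locked rows
        claimed.add(pos)
    return sum(skips) / n_workers

print(average_skips(200))  # → 99.5, i.e. ~100 skip RPCs per operation
```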
2. L0 SST file scan cost
For each candidate row, the TServer scans every L0 SST file in the Intents RocksDB to check for conflicting intents. DocDB uses num_levels = 1 with universal compaction, so all files are in L0 with overlapping key ranges. With N active lockers churning intents, L0 files pile up faster than compaction can merge them — and every file must be checked to confirm a row is free.
In MySQL/InnoDB, the equivalent operation is an O(1) lookup in an in-memory lock hash table.
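The cost gap can be illustrated with two toy lock checks (illustrative only; neither is actual DocDB or InnoDB code): one scans every overlapping L0 file for an intent, the other consults a single in-memory hash.

```python
class L0IntentCheck:
    """DocDB-style model: with num_levels = 1 under universal compaction,
    every L0 file's key range can overlap, so each per-row check must
    consult every file."""
    def __init__(self, files):
        self.files = files           # list of per-file intent key sets

    def is_locked(self, key):
        return any(key in f for f in self.files)  # O(number of L0 files)

class LockHashCheck:
    """InnoDB-style model: all active row locks in one in-memory hash."""
    def __init__(self):
        self.locks = set()

    def is_locked(self, key):
        return key in self.locks     # O(1) expected, regardless of churn

files = [{"row1"}, {"row7"}, set(), {"row3"}]
assert L0IntentCheck(files).is_locked("row7")
```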
Optimizations we're considering
A. Batch SKIP LOCKED (prototype PR)
Instead of one RPC per candidate, YSQL collects a batch of candidate ybctids and sends them in a single RPC. The TServer iterates locally, skipping locked rows and locking the first free one.
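The server-side half of that idea can be sketched as follows (hypothetical shape; the actual PR's code differs): one RPC carries the whole candidate batch, and the skip loop becomes local lookups instead of round trips.

```python
def batch_skip_locked(candidate_ybctids, locked, rpc_counter):
    """Model of a batched SKIP LOCKED RPC handler: the TServer receives
    all candidates at once, skips locked rows locally, and locks the
    first free one."""
    rpc_counter["rpcs"] += 1         # the whole batch costs one RPC
    for ybctid in candidate_ybctids:
        if ybctid in locked:
            continue                 # local skip, no extra RPC
        locked.add(ybctid)
        return ybctid
    return None                      # every candidate was locked

counter = {"rpcs": 0}
locked = {b"k1", b"k2"}
winner = batch_skip_locked([b"k1", b"k2", b"k3", b"k4"], locked, counter)
# winner == b"k3" after a single RPC, vs. 3 RPCs in the unbatched path
```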
B. Pessimistic lock resolution
When SKIP LOCKED encounters a row with a lock intent, it currently enters the full conflict resolution path — collecting all conflicts, then making an RPC to the transaction status tablet to resolve each one. For SKIP LOCKED, this is unnecessary: if the intent belongs to a locally-known pending transaction, we already know the row is locked and can skip it immediately. This prototype short-circuits that path, avoiding the status tablet RPC for the common case.
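The short-circuit can be sketched like this (hypothetical structure; names such as `known_pending` are mine): only intents whose owner the TServer cannot classify locally fall through to the status-tablet RPC.

```python
def resolve_via_status_tablet(txn_id):
    # Stand-in for the real status-tablet lookup; assume still pending.
    return True

def is_row_locked(intent_txn_id, known_pending, status_rpcs):
    """Model of the SKIP LOCKED intent check.
    Fast path: the intent's owner is a locally-known pending txn, so the
    row is locked -- skip it with no status-tablet RPC.
    Slow path: unknown owner, ask the transaction status tablet."""
    if intent_txn_id in known_pending:
        return True                  # short-circuit: no RPC needed
    status_rpcs["count"] += 1        # full conflict resolution path
    return resolve_via_status_tablet(intent_txn_id)

rpcs = {"count": 0}
assert is_row_locked("txn-a", known_pending={"txn-a"}, status_rpcs=rpcs)
assert rpcs["count"] == 0            # common case avoids the RPC entirely
```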
C. Intents RocksDB bloom filter
A bloom filter on the Intents RocksDB would short-circuit the L0 file scan when no intents exist for a key. For rows that are not locked (the common case once you skip past the first N), this turns the check from O(L0 files) to O(1).
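How the filter would sit in front of the scan can be modeled with a toy Bloom filter (not RocksDB's implementation): a negative answer proves no file can hold an intent for the key, so the per-file scan is skipped entirely, while a positive answer just falls back to the scan.

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: k hash probes into a bit array. A False answer
    is definitive; a True answer may be a false positive."""
    def __init__(self, bits=1024, k=3):
        self.bits, self.k = bits, k
        self.array = bytearray(bits // 8)

    def _probes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(key + bytes([i])).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, key):
        for p in self._probes(key):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._probes(key))

def intent_exists(key, bloom, l0_files, scans):
    """Bloom-guarded intent check: only pay the full L0 scan when the
    filter cannot rule the key out."""
    if not bloom.might_contain(key):
        return False                 # O(1): no file can hold an intent
    scans["files"] += len(l0_files)  # only now pay the O(L0 files) scan
    return any(key in f for f in l0_files)
```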
D. Lock count hint
If the TServer tracks the number of active lockers per tablet, it could skip the first N rows without checking intents — jumping directly to the (N+1)th row as the first real candidate.
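A sketch of the hint (assumption: the count may be stale, so rows from the landing point on are still verified against intents): the scan simply starts at offset N instead of row 0.

```python
def claim_with_lock_count_hint(rows, locked, active_locker_count):
    """Model of the lock-count hint: with N active lockers, the first N
    rows are almost certainly locked, so start scanning at index N.
    Rows are still checked from there on, since the hint can be stale."""
    checks = 0
    for row in rows[active_locker_count:]:  # skip first N without checking
        checks += 1
        if row not in locked:
            locked.add(row)
            return row, checks
    return None, checks

rows = list(range(100))
locked = set(range(40))              # 40 active lockers hold rows 0..39
row, checks = claim_with_lock_count_hint(rows, locked, 40)
# → row 40 after a single intent check, instead of 41 checks from row 0
```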