Skip to content

[YSQL] SKIP LOCKED performance under high contention #31347

@emilienoel

Description

@emilienoel

Description

Context

We use SELECT ... FOR UPDATE SKIP LOCKED under high throughput. The system must sustain ~700 ops/sec.

SQL pattern

BEGIN;
SELECT id FROM items WHERE ... ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED;
DELETE FROM items WHERE id = $1;
INSERT INTO claimed (...) VALUES (...);
COMMIT;

The problem

Under high contention, hundreds of concurrent workers compete for the same rows on the same tablet. We've identified two bottlenecks in the SKIP LOCKED path:

1. One RPC per candidate row

YugabyteDB disables batching for SKIP LOCKED — each candidate row is a separate RPC from YSQL to the TServer. With N concurrent lockers, a new transaction skips ~N/2 rows on average before finding a free one. At 200 concurrent lockers, that's ~100 RPCs per operation.

This is because SKIP LOCKED needs an immediate per-row answer ("locked or free?") to decide whether to skip, so the existing batch lock path is bypassed.

2. L0 SST file scan cost

For each candidate row, the TServer scans every L0 SST file in the Intents RocksDB to check for conflicting intents. DocDB uses num_levels = 1 with universal compaction, so all files are in L0 with overlapping key ranges. With N active lockers churning intents, L0 files pile up faster than compaction can merge them — and every file must be checked to confirm a row is free.

In MySQL/InnoDB, the equivalent operation is an O(1) lookup in an in-memory lock hash table.

Optimizations we're considering

A. Batch SKIP LOCKED (prototype PR)

Instead of one RPC per candidate, YSQL collects a batch of candidate ybctids and sends them in a single RPC. The TServer iterates locally, skipping locked rows and locking the first free one.

B. Pessimistic lock resolution

When SKIP LOCKED encounters a row with a lock intent, it currently enters the full conflict resolution path — collecting all conflicts, then making an RPC to the transaction status tablet to resolve each one. For SKIP LOCKED, this is unnecessary: if the intent belongs to a locally-known pending transaction, we already know the row is locked and can skip it immediately. This prototype short-circuits that path, avoiding the status tablet RPC for the common case.

C. Intents RocksDB bloom filter

A bloom filter on the Intents RocksDB would short-circuit the L0 file scan when no intents exist for a key. For rows that are not locked (the common case once you skip past the first N), this turns the check from O(L0 files) to O(1).

D. Lock count hint

If the TServer tracks the number of active lockers per tablet, it could skip the first N rows without checking intents — jumping directly to the (N+1)th row as the first real candidate.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/new-featureThis is a request for a completely new feature

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions