
session: Prepare on one shard per node only #1320


Merged
merged 12 commits into scylladb:main from prepare-on-one-shard-per-node-only on May 7, 2025

Conversation

wprzytula
Collaborator

@wprzytula wprzytula commented Apr 12, 2025

Overview

This PR comes with two main enhancements / optimisations:

  1. Prepare on every node, not on every shard. This corresponds to Don't prepare the statement on all connections #1290.
  2. Wait only for one prepare attempt to succeed, not for all prepare attempts. This is inspired by cpp-driver's logic. (dropped)

Below, I describe both changes in detail.

Prepare on every node, not on every shard

Just as other drivers do, the Rust driver now prepares statements on a single connection to every node, not on a single connection to every shard. This benefits both the driver and the cluster, as it lowers the overhead of handling duplicated prepare requests on both sides.

Problem: unlucky random choice

As mentioned in #1290 (comment), we might be unlucky enough to end up with all randomly chosen connections being non-working. For example, the targeted shards might be overloaded.

Solution

Fallback logic is introduced. Once we fail to prepare the statement on any node through the randomly chosen connections, the preparation is re-attempted on all connections (which is essentially how it used to work before this PR).
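A minimal sketch of that flow in Rust; `prepare_on()` and `iter_working_connections_to_nodes()` are hypothetical stand-ins for the driver's internals (the `_to_shards` variant's name comes from the commits below), and real signatures differ:

```rust
// Minimal sketch of the fallback flow; `prepare_on()` and the two
// iterator methods are hypothetical stand-ins for driver internals.
async fn prepare_with_fallback(
    state: &ClusterState,
    statement: &Statement,
) -> Result<PreparedStatement, PrepareError> {
    // Fast path: a single random connection per node.
    match prepare_on(state.iter_working_connections_to_nodes(), statement).await {
        Ok(prepared) => Ok(prepared),
        // Fallback: one connection per shard - essentially the pre-PR
        // behaviour - in case the random per-node choices were unlucky.
        Err(_) => prepare_on(state.iter_working_connections_to_shards(), statement).await,
    }
}
```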

Wait only for one prepare attempt to succeed (dropped)

Having taken a look at cpp-driver's handling of statement preparation, I noticed that cpp-driver waits only for the first preparation to succeed. Preparation on all remaining nodes is done in the background, which decreases the latency of the prepare operation from the driver user's PoV. To understand why, consider a node that is stuck, overloaded, or temporarily unavailable. If we wait for prepare responses from all nodes, we are limited by the slowest response. This is not what ScyllaDB is designed for - it should be available and fast even if some nodes happen to be unavailable or overloaded.

I decided to implement this behaviour in the Rust driver. This is achieved by spawning a tokio worker task which prepares the statement on all nodes. It feeds all results, successes or failures, into a channel and this way signals the parent task. The parent task finishes early once it receives a successful response (a deserialized PreparedStatement). Meanwhile, the worker task keeps handling the remaining preparations in the background.
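A rough sketch of that (later dropped) scheme, assuming a hypothetical per-connection `prepare()` call; for brevity, the sketch spawns one task per connection instead of the single worker task the PR used:

```rust
use tokio::sync::mpsc;

// Sketch of the early-return scheme (later dropped from this PR).
// `conn.prepare()` is a hypothetical stand-in for the driver's
// per-connection preparation call.
async fn prepare_first_success(
    connections: Vec<Connection>,
    statement: Statement,
) -> Result<PreparedStatement, PrepareError> {
    let (tx, mut rx) = mpsc::unbounded_channel();
    for conn in connections {
        let tx = tx.clone();
        let statement = statement.clone();
        // Each task feeds its result, success or failure, into the
        // channel, and keeps running in the background even after the
        // parent has returned.
        tokio::spawn(async move {
            let _ = tx.send(conn.prepare(&statement).await);
        });
    }
    // Drop the original sender so that `recv()` yields `None` once
    // every task has reported.
    drop(tx);

    let mut last_error = None;
    while let Some(result) = rx.recv().await {
        match result {
            // Finish early on the first success.
            Ok(prepared) => return Ok(prepared),
            Err(err) => last_error = Some(err),
        }
    }
    // All attempts failed; surface the last error.
    Err(last_error.expect("no connections to prepare on"))
}
```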

The change brings two main benefits:

  1. it reduces the driver's latency upon statement preparation;
  2. it prevents a single stuck node from freezing the driver upon preparation (note that there are no preparation timeouts!).

Tests

TODO: waiting for #1246 to be merged. Then I'll add a new feature to the proxy so that it is aware of which shard was targeted by the intercepted frame. With that feature available, I'll write tests.

Fixes: #1290

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@wprzytula wprzytula added the performance Improves performance of existing features label Apr 12, 2025
@wprzytula wprzytula added this to the 1.2.0 milestone Apr 12, 2025
@wprzytula wprzytula requested review from Lorak-mmk and muzarski April 12, 2025 15:01
@wprzytula wprzytula self-assigned this Apr 12, 2025

github-actions bot commented Apr 12, 2025

cargo semver-checks found no API-breaking changes in this PR.
Checked commit: 00d31ab

Contributor

@muzarski muzarski left a comment


Looks really good. I just wonder - maybe a better idea than spawning a separate task would be to introduce a configurable timeout for each attempt (see my other comment).

@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from b318d22 to c747a25 on April 16, 2025 08:48
@wprzytula
Collaborator Author

wprzytula commented Apr 16, 2025

v1.0.1: Addressed @muzarski's nits.

@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from c747a25 to 5a5a8db on April 16, 2025 08:49
@wprzytula
Collaborator Author

v1.0.2: rebased on main.

@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from 5a5a8db to bd62bba on April 19, 2025 14:30
@wprzytula
Collaborator Author

v1.1:

  • replaced internal calls to Session::prepare() with Session::prepare_nongeneric();
  • added verification of prepared id equality in the worker task.

@wprzytula wprzytula requested a review from muzarski April 19, 2025 14:32
@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from bd62bba to 87aee23 on April 22, 2025 08:51
@wprzytula
Collaborator Author

v1.1: rebased on main. Incorporated changes from #1329.

Collaborator

@Lorak-mmk Lorak-mmk left a comment


The change to prepare on single shard per node is fine.

I'm not sure about the second change (waiting only for the first response).
I do agree that indefinite waiting (if really possible) is a problem, but the approach taken does not convince me.

  1. This is a totally separate change from the one we discussed and planned, so it should be a separate PR. I would like not to block the change we planned, so please split these changes into 2 PRs. I understand we have a habit of making small side improvements in PRs, but this is not one of them; it is another feature, comparable in size to the original one that prompted this PR.
  2. The rationale that cpp-driver does this too is not enough. cpp-driver can't really be called a good driver, and it has many, many questionable design decisions. What do other drivers do? If the Python, Java, and gocql drivers also do this, that is a much stronger argument.
  3. You did not fix the indefinite-waiting problem (by the way, is it really indefinite? Won't the connection get broken?). The way to fix it is to utilize client-side timeouts, probably using the timeout from the global execution profile (a minimal sketch follows this list). If we do this, then one of the 2 stated benefits of this change ceases to exist.
  4. Previously, an indefinite wait would cause the Future returned from prepare() to never resolve.
    • It was easily noticeable by the user. Now the user will not know about such an issue.
    • Such a Future could be dropped by the user, and the only remnant of this would be some entry in ResponseHandlerMap. Now, for each such case, we will get a stale Tokio task running indefinitely in the background.
  5. The other stated benefit is the latency of preparation. This does not convince me. I don't remember the latency of preparation ever being brought up as an issue by anyone, so why are we working on it?
  6. This is probably a rare case, but please note that the following is possible after your change:
    • User prepared a statement
    • prepare() returns after receiving one response
    • User executes this prepared statement
    • It lands on a different node than the one that responded first
    • The node has not yet processed the prepare request, so it returns UNPREPARED
    • We now incur an unnecessary round trip and extra network calls.
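A minimal sketch of the client-side timeout idea from point 3, assuming the per-attempt timeout comes from the execution profile (`Connection::prepare()` is again a hypothetical stand-in):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Sketch: bound a single prepare attempt with a client-side timeout.
async fn prepare_attempt_with_timeout(
    conn: &Connection,
    statement: &Statement,
    request_timeout: Duration, // would come from the execution profile
) -> Result<PreparedStatement, PrepareError> {
    match timeout(request_timeout, conn.prepare(statement)).await {
        // The attempt completed (successfully or not) in time.
        Ok(result) => result,
        // The deadline elapsed first; map it to a driver error.
        Err(_elapsed) => todo!("map the elapsed timeout to a PrepareError"),
    }
}
```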

The complex logic of the preparation_worker is not a problem to me. It's encapsulated, it's (I hope you agree) well-documented, and it uses tokio abstractions to be more high-level.

No matter how well implemented and encapsulated it is, it is still additional code, additional complexity. It increases the amount of stuff that contributors need to read and understand.
We should always have a good reason to introduce additional code, in order to minimize future maintenance burden.

@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from 87aee23 to 40c507f on April 24, 2025 12:35
@wprzytula
Collaborator Author

v2.0:

  • dropped the "wait for only first successful response" part of this PR.

@wprzytula
Collaborator Author

Opened #1332.

@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from 40c507f to 0236ffe on April 25, 2025 06:45
@wprzytula
Collaborator Author

v2.0.1: fixed outdated comments.

@wprzytula wprzytula requested review from Lorak-mmk and muzarski April 25, 2025 07:19
@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from 0236ffe to 0021871 on May 7, 2025 07:03
@wprzytula
Collaborator Author

v2.1:

  • squashed the &mut dyn Iterator commit into the fallback commit,
  • extracted more commits to make review smoother,
  • improved readability of the iter_working_connections*() methods,
  • brought back the accidentally removed logic that collects tracing IDs from prepared statements on all connections.

@wprzytula wprzytula requested a review from Lorak-mmk May 7, 2025 07:19
@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from 0021871 to b209464 on May 7, 2025 08:38
wprzytula added 12 commits May 7, 2025 10:38
`Session::prepare()` is split into a thin generic `prepare()` function
and a non-generic `prepare_nongeneric()` function that contains all the
logic. This prevents monomorphisation of the whole logic just because
one passes several types that implement `Into<Statement>` to
`Session::prepare()`.
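A sketch of the split (signatures simplified; the `&Statement` parameter reflects the next commit):

```rust
// The generic shim is tiny, so monomorphising it once per
// `Into<Statement>` type is cheap; the heavy body compiles only once.
impl Session {
    pub async fn prepare(
        &self,
        statement: impl Into<Statement>,
    ) -> Result<PreparedStatement, PrepareError> {
        let statement: Statement = statement.into();
        self.prepare_nongeneric(&statement).await
    }

    async fn prepare_nongeneric(
        &self,
        statement: &Statement,
    ) -> Result<PreparedStatement, PrepareError> {
        // ...all the actual preparation logic lives here...
        todo!()
    }
}
```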
There's no need for `Session::prepare_nongeneric()` to accept an owned
`Statement`. It's thus made to accept `&Statement`, which elides a clone
in at least one call site (`Session::prepare_batch()`).
`Session::extract_partitioner_name()` was needlessly a method. It does
not require `Self` to perform its work.
Therefore, I made it an associated function.
`ClusterState::iter_working_connections` gets a `_to_shards` suffix
to stress that it returns an iterator over connections to
**all shards**. This differs from the semantics of the function
that is going to be implemented in subsequent commits, hence the name
change.
…lity

The convoluted iterator logic in that method has caused mental trouble
many times. It is now split into clearer parts and documented with
comments. I hope this helps future readers and reviewers.
This is analogous to iter_working_connections_to_shards(), but it
returns only one (random) connection for each node, not for each shard.
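A sketch of the per-node selection, with hypothetical `Node`/`Connection` pool types:

```rust
use rand::seq::SliceRandom;

// Sketch: yield one random working connection per node instead of one
// per shard. `Node` and `Connection` are hypothetical stand-ins for
// the driver's internal pool types.
fn one_working_connection_per_node<'a>(
    nodes: &'a [Node],
) -> impl Iterator<Item = &'a Connection> + 'a {
    nodes.iter().filter_map(|node| {
        // Keep only the connections that are currently usable...
        let working: Vec<&Connection> = node
            .connections
            .iter()
            .filter(|conn| conn.is_working())
            .collect();
        // ...and pick one at random; `None` (a node with no working
        // connections) is simply skipped by `filter_map`.
        working.choose(&mut rand::thread_rng()).copied()
    })
}
```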
Scylla prepares the statement on every shard when processing a PREPARE
request:
https://github.com/scylladb/scylladb/blob/8f0d0daf53397aa68312571ab9d01d8b75cd1770/transport/server.cc#L1104-L1114.

At the same time, the driver unnecessarily tries to prepare the
statement on all connections, which means connections to every shard,
so possibly multiple connections to every node.

This commit makes the driver prepare the statement only on a single
(random) connection to every node.

There is one catch: with the new logic, we might sometimes fail
preparation even though we could succeed if tried on different shards
(some shards might be, for example, overloaded). This issue is solved
in further commits.
The docstring now mentions that the statement is prepared on all nodes.
This prepares for the fallback logic introduced in the next commit.
Before this commit, we could unnecessarily return an error from
prepare: say we have a 1-node cluster with a broken connection, and we
don't retry on another connection - we would return an error to the
user despite possibly being able to prepare successfully elsewhere.

This commit introduces fallback logic to `Session::prepare()`:
if preparation on a single (random) connection to every node fails,
the whole preparation is retried, this time on a single connection to
every shard.
@wprzytula
Collaborator Author

v2.1.1: addressed @Lorak-mmk's comments.

@wprzytula wprzytula force-pushed the prepare-on-one-shard-per-node-only branch from b209464 to 00d31ab on May 7, 2025 08:39
@wprzytula
Collaborator Author

Rebased on main.

@wprzytula wprzytula requested review from Lorak-mmk and muzarski May 7, 2025 08:39
@wprzytula wprzytula merged commit 506027c into scylladb:main May 7, 2025
12 checks passed
@wprzytula wprzytula deleted the prepare-on-one-shard-per-node-only branch May 7, 2025 08:45
@wprzytula wprzytula mentioned this pull request May 26, 2025