session: Prepare on one shard per node only #1320

Open · wants to merge 11 commits into main
Conversation

wprzytula (Collaborator)
Overview

This PR comes with two main enhancements / optimisations:

  1. Prepare on every node, not on every shard. This corresponds to Don't prepare the statement on all connections #1290.
  2. Wait only for one prepare attempt to succeed, not for all prepare attempts. This is inspired by cpp-driver's logic.

Below, I describe both changes in detail.

Prepare on every node, not on every shard

Just as other drivers do, the Rust driver now prepares statements on a single connection to every node, not on a single connection to every shard. This brings performance benefits for both the driver and the cluster, as it lowers the overhead of handling duplicated prepare requests on both sides.

Problem: unlucky random choice

As mentioned in #1290 (comment), we might be unlucky enough to end up with all randomly chosen connections being non-working. For example, the targeted shards might be overloaded.

Solution

Fallback logic is introduced. Once we fail to prepare the statement on any node through the randomly chosen connections, the preparation is re-attempted on all connections (which is essentially how it used to work before this PR).
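
Below is a rough, self-contained sketch of the fallback shape; all names are purely illustrative and are not the driver's actual API:

```rust
#[derive(Debug)]
struct PreparedStatement;
#[derive(Debug)]
struct PrepareError;

async fn prepare_on_one_connection_per_node(stmt: &str) -> Result<PreparedStatement, PrepareError> {
    // Placeholder: in the driver, this goes through one randomly chosen
    // connection per node, which may all turn out to be unlucky picks.
    let _ = stmt;
    Err(PrepareError)
}

async fn prepare_on_all_connections(stmt: &str) -> Result<PreparedStatement, PrepareError> {
    // Placeholder: preparation on every connection, i.e. on every shard.
    let _ = stmt;
    Ok(PreparedStatement)
}

async fn prepare_with_fallback(stmt: &str) -> Result<PreparedStatement, PrepareError> {
    match prepare_on_one_connection_per_node(stmt).await {
        Ok(prepared) => Ok(prepared),
        // Fallback: if the per-node attempt fails, retry on all connections,
        // which is essentially the pre-PR behaviour.
        Err(_) => prepare_on_all_connections(stmt).await,
    }
}
```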

Wait only for one prepare attempt to succeed

Having taken a look at cpp-driver's handling of statement preparation, I noticed that the cpp-driver waits only for the first preparation to succeed. Preparation on all remaining nodes is done in the background, which decreases latency of the prepare operation from the driver user's PoV. To understand why, consider a node that is stuck/overloaded/temporarily unavailable. If we wait for prepare responses from all nodes, we are limited by the slowest response. This is not what ScyllaDB is designed for - it should be available and fast even if some nodes happen to be unavailable or overloaded.

I decided to implement this behaviour in the Rust driver. This is achieved by spawning a tokio worker task which prepares the statement on all nodes. It feeds all results, successes or failures, into a channel and this way signals the parent task. The parent task finishes early once it receives a successful response (a deserialized PreparedStatement). Meanwhile, the worker task keeps handling the remaining preparations in the background.

The change brings two main benefits:
1. it reduces the driver's latency of statement preparation;
2. it prevents a single stuck node from freezing the driver during preparation (note that there are no preparation timeouts!).
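
A minimal, self-contained sketch of this spawn-and-channel pattern follows (the types and function names are illustrative, not the PR's actual code; the real worker also prepares on nodes concurrently rather than sequentially):

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
struct PreparedStatement;
#[derive(Debug)]
struct PrepareError;

// Placeholder for the real per-node prepare request.
async fn prepare_on_node(node_id: usize) -> Result<PreparedStatement, PrepareError> {
    if node_id % 2 == 0 { Ok(PreparedStatement) } else { Err(PrepareError) }
}

async fn prepare_first_success(node_ids: Vec<usize>) -> Result<PreparedStatement, PrepareError> {
    let (tx, mut rx) = mpsc::unbounded_channel();

    // Worker task: prepare on all nodes and feed every result into the channel.
    // It keeps running in the background even after the parent returns.
    tokio::spawn(async move {
        for node_id in node_ids {
            // Ignore send errors: the parent may have already finished.
            let _ = tx.send(prepare_on_node(node_id).await);
        }
    });

    // Parent task: return as soon as the first successful preparation arrives.
    let mut last_error = PrepareError;
    while let Some(result) = rx.recv().await {
        match result {
            Ok(prepared) => return Ok(prepared),
            Err(err) => last_error = err,
        }
    }
    // The channel closed without a single success.
    Err(last_error)
}

#[tokio::main]
async fn main() {
    println!("{:?}", prepare_first_success(vec![0, 1, 2]).await);
}
```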

Tests

TODO. Waiting for #1246 to be merged. Then I'll add a new feature to the proxy that makes it aware of which shard the intercepted frame targeted. With that feature available, I'll write tests.

Fixes: #1290

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

`Session::prepare()` is split into a thin generic `prepare()` function
and a non-generic `prepare_nongeneric()` function that contains all the logic.
This prevents monomorphisation of the whole logic just because
one passes several types that implement `Into<Statement>` to
`Session::prepare()`.
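
A minimal, self-contained sketch of this split (simplified, illustrative signatures; not the driver's actual code):

```rust
struct Statement {
    contents: String,
}

impl From<&str> for Statement {
    fn from(s: &str) -> Self {
        Statement { contents: s.to_owned() }
    }
}

// The generic wrapper is tiny, so monomorphising it once per caller type is cheap.
fn prepare(statement: impl Into<Statement>) -> String {
    prepare_nongeneric(statement.into())
}

// All the heavy, non-generic logic lives here and is compiled exactly once.
fn prepare_nongeneric(statement: Statement) -> String {
    format!("PREPARED: {}", statement.contents)
}

fn main() {
    println!("{}", prepare("SELECT * FROM ks.t"));
}
```
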
`ClusterState::iter_working_connections` gets the `_to_shards` suffix
to stress that it returns an iterator over connections to
**all shards**. This is different from the semantics of the function
that is going to be implemented in subsequent commits, hence the name
change.
This is analogous to iter_working_connections_to_shards(), but it
returns only one (random) connection for each node, not for each shard.
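
A rough, self-contained sketch of the per-node variant (the types are illustrative and the real `ClusterState` is more involved; this sketch uses the `rand` crate):

```rust
use rand::seq::SliceRandom;

struct Connection {
    // ... socket, shard info, etc. ...
}

struct Node {
    working_connections: Vec<Connection>,
}

fn iter_working_connections_to_nodes(nodes: &[Node]) -> impl Iterator<Item = &Connection> {
    nodes.iter().filter_map(|node| {
        // One random working connection per node; nodes without any
        // working connections are simply skipped.
        node.working_connections.choose(&mut rand::thread_rng())
    })
}
```
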
Scylla prepares the statement on every shard when processing a PREPARE
request:
https://github.com/scylladb/scylladb/blob/8f0d0daf53397aa68312571ab9d01d8b75cd1770/transport/server.cc#L1104-L1114.

At the same time, the driver unnecessarily tries to prepare the statement on
all connections, which means connections to every shard, so possibly
multiple connections to every node.

This commit makes the driver prepare the statement only on a single
(random) connection to every node.

There is one catch: with the new logic, we might sometimes fail
preparation even though it could succeed if attempted on different shards
(some shards might be, for example, overloaded). This issue is solved
in further commits.
The docstring now mentions that the statement is prepared on all nodes.

I also added a comment that considers a possible optimisation achieved by
altering the behaviour of `Session::prepare()`. I'm going to introduce it
in the next commits.
cpp-driver, for instance, only waits for the first preparation attempt
to succeed. This commit follows that approach, which brings two main
benefits:
1. it reduces the driver's latency of statement preparation;
2. it prevents a single stuck node from freezing the driver during
   preparation - **there are no preparation timeouts!**

This is implemented by spawning a tokio worker task which prepares the
statement on all nodes. It feeds all results, successes or failures,
into a channel and this way signals the parent task. The parent task
finishes early once it receives a successful response (a deserialized
`PreparedStatement`). Meanwhile, the worker task keeps handling the
remaining preparations in the background.
This is a step towards implementing the fallback logic, which will solve
the already mentioned issue arising from attempting preparation only on
a subset of connections (recall that we might be unlucky and randomly
choose a defunct connection or an overloaded shard).
This will aid readability of the following commit.
`prepare_on_all()` extracts all the logic that attempts preparation on
either all nodes or all shards. This will allow us to implement the
fallback logic in the next commit.

This commit is best viewed with whitespace changes hidden.
Before this commit, we could unnecessarily return an error from prepare:
say we have a 1-node cluster with a broken connection, and we don't retry
on another connection - we would return an error to the user despite
possibly being able to prepare.

This commit introduces fallback logic to `Session::prepare()`:
if preparation on a single (random) connection to every node fails,
the whole preparation is retried, this time on a single connection to
every shard.

I'm a bit unhappy that this requires us to clone `statement` on the
happy path (i.e., if the on-all-nodes preparation attempt succeeds),
but I'm quite convinced it's negligible overhead.
@wprzytula wprzytula added the performance Improves performance of existing features label Apr 12, 2025
@wprzytula wprzytula added this to the 1.2.0 milestone Apr 12, 2025
@wprzytula wprzytula requested review from Lorak-mmk and muzarski April 12, 2025 15:01
@wprzytula wprzytula self-assigned this Apr 12, 2025

cargo semver-checks found no API-breaking changes in this PR.
Checked commit: b318d22
