session: Prepare on one shard per node only #1320
Open
wprzytula wants to merge 11 commits into scylladb:main from wprzytula:prepare-on-one-shard-per-node-only
+183 −37
`Session::prepare()` is split into a thin generic `prepare()` function and a non-generic `prepare_nongeneric()` function that contains all the logic. This prevents monomorphisation of the whole logic just because callers pass several different types implementing `Into<Statement>` to `Session::prepare()`.
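A minimal sketch of the thin-wrapper pattern described above. `Statement`, `PreparedStatement`, and `Session` here are hypothetical stand-ins, not the driver's real definitions, and the body of `prepare_nongeneric()` is placeholder logic. The point is structural: only the one-line `prepare()` shim is monomorphised per argument type, while `prepare_nongeneric()` is compiled once.

```rust
/// Hypothetical stand-in for the driver's unprepared statement type.
#[derive(Debug, Clone)]
pub struct Statement {
    pub contents: String,
}

impl From<&str> for Statement {
    fn from(s: &str) -> Self {
        Statement { contents: s.to_owned() }
    }
}

impl From<String> for Statement {
    fn from(s: String) -> Self {
        Statement { contents: s }
    }
}

/// Hypothetical stand-in for the prepared statement returned to the user.
#[derive(Debug)]
pub struct PreparedStatement {
    pub id: u64,
    pub contents: String,
}

pub struct Session;

impl Session {
    /// Thin generic shim: instantiated once per argument type, but it only
    /// performs the conversion and delegates immediately.
    pub fn prepare(&self, statement: impl Into<Statement>) -> Result<PreparedStatement, String> {
        self.prepare_nongeneric(statement.into())
    }

    /// All the real logic lives here and is compiled exactly once,
    /// no matter how many `Into<Statement>` types callers use.
    fn prepare_nongeneric(&self, statement: Statement) -> Result<PreparedStatement, String> {
        if statement.contents.trim().is_empty() {
            return Err("empty statement".to_owned());
        }
        // Placeholder for the actual PREPARE round trip (hypothetical).
        let id = statement.contents.len() as u64;
        Ok(PreparedStatement { id, contents: statement.contents })
    }
}
```

Calling `session.prepare("...")` and `session.prepare(String::from("..."))` instantiates two copies of the shim but shares a single copy of `prepare_nongeneric()`.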
`ClusterState::iter_working_connections` gets a `_to_shards` suffix to stress that it returns an iterator over connections to **all shards**. This differs from the semantics of the function that is going to be implemented in subsequent commits, hence the name change.
This is analogous to `iter_working_connections_to_shards()`, but it returns only one (random) connection per node, not one per shard.
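To illustrate the difference between the two iterators, here is a simplified sketch over a flat list of hypothetical `Connection { node, shard }` values. The names `all_shard_connections` and `one_connection_per_node` are made up for this example, the "is this connection working" check is omitted, and the random pick uses `std`'s `RandomState` as a cheap randomness source rather than whatever the driver actually uses.

```rust
use std::collections::hash_map::RandomState;
use std::collections::BTreeMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical, simplified connection handle: which node and shard it targets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Connection {
    pub node: u32,
    pub shard: u32,
}

/// Cheap std-only pseudo-random index in `0..len`; `len` must be non-zero.
fn random_index(len: usize) -> usize {
    // Each `RandomState::new()` is created with different keys, so hashing
    // the same input still yields an unpredictable value.
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_usize(len);
    (hasher.finish() % len as u64) as usize
}

/// All-shards flavour: every connection, possibly several per node.
pub fn all_shard_connections(conns: &[Connection]) -> impl Iterator<Item = Connection> + '_ {
    conns.iter().copied()
}

/// Per-node flavour: one randomly chosen connection for each node,
/// yielded in ascending node order.
pub fn one_connection_per_node(conns: &[Connection]) -> impl Iterator<Item = Connection> {
    let mut by_node: BTreeMap<u32, Vec<Connection>> = BTreeMap::new();
    for conn in conns {
        by_node.entry(conn.node).or_default().push(*conn);
    }
    by_node
        .into_values()
        .map(|node_conns| node_conns[random_index(node_conns.len())])
}
```

For 3 nodes with 4 shards each, the first iterator yields 12 connections (and hence 12 PREPARE requests), while the second yields 3, one per node. Since Scylla prepares the statement on every shard server-side anyway, the extra 9 requests are pure duplication.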
Scylla prepares the statement on every shard when processing a PREPARE request: https://github.com/scylladb/scylladb/blob/8f0d0daf53397aa68312571ab9d01d8b75cd1770/transport/server.cc#L1104-L1114. Meanwhile, the driver unnecessarily tries to prepare the statement on all connections, i.e. on connections to every shard, which possibly means multiple connections to every node. This commit makes the driver prepare the statement on only a single (random) connection to every node. There is one catch: with the new logic, preparation might sometimes fail even though it could have succeeded if tried on different shards (some shards might, for example, be overloaded). This issue is solved in further commits.
The docstring now mentions that the statement is prepared on all nodes. I also added a comment considering a possible optimisation that alters the behaviour of `Session::prepare()`; I'm going to introduce it in the next commits.
cpp-driver, for instance, only waits for the first preparation attempt to succeed. This commit follows that approach, which brings two main benefits: 1. it reduces the driver's latency upon statement preparation; 2. it prevents a situation in which one stuck node freezes the driver upon preparation, as **there are no preparation timeouts!** (sic!). This is implemented by spawning a tokio worker task which prepares the statement on all nodes. It feeds all results, successes or failures, into a channel, thereby signalling the parent task. The parent task finishes early once it receives a successful response (a deserialized `PreparedStatement`). Meanwhile, the worker task keeps handling the remaining preparations in the background.
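The commit uses a tokio task and a channel; the sketch below reproduces the same "first success wins" idea with `std::thread` and `std::sync::mpsc` instead, so that it is self-contained. `prepare_on_node` is a hypothetical stand-in for the per-node network round trip, and results are plain `String`s rather than `PreparedStatement`s.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical per-node outcome: a prepared-statement handle or an error text.
pub type NodeResult = Result<String, String>;

/// Prepares on every node concurrently and returns as soon as the first
/// success arrives; the remaining attempts keep running in the background.
/// If every node fails, all collected errors are returned.
pub fn prepare_first_success<F>(nodes: Vec<u32>, prepare_on_node: F) -> Result<String, Vec<String>>
where
    F: Fn(u32) -> NodeResult + Send + Sync + 'static,
{
    let prepare_on_node = Arc::new(prepare_on_node);
    let (tx, rx) = mpsc::channel::<NodeResult>();
    for node in nodes {
        let tx = tx.clone();
        let prepare_on_node = Arc::clone(&prepare_on_node);
        // Detached worker: its JoinHandle is dropped, so nobody waits for it.
        thread::spawn(move || {
            // The receiver may already be gone after an early return; ignore that.
            let _ = tx.send((*prepare_on_node)(node));
        });
    }
    // Drop our own sender so `recv()` fails once every worker has reported.
    drop(tx);

    let mut errors = Vec::new();
    while let Ok(result) = rx.recv() {
        match result {
            Ok(prepared) => return Ok(prepared),
            Err(err) => errors.push(err),
        }
    }
    Err(errors)
}
```

The caller returns as soon as any node reports success, so a stuck node only delays its own detached thread. If every node fails, the loop ends once all senders are dropped and the collected errors are returned.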
This is a step towards implementing the fallback logic, which will solve the already mentioned issue arising from attempting preparation only on a subset of connections (to recall, we might be unlucky and randomly choose a defunct connection or an overloaded shard).
This will aid the readability of the following commit.
`prepare_on_all()` extracts all the logic for attempting preparation on either all nodes or all shards. This will allow us to implement the fallback logic in the next commit. This commit is best viewed with whitespace changes hidden.
Before this commit, we could unnecessarily return an error from `prepare()`: suppose we have a 1-node cluster, the randomly chosen connection is broken, and we don't retry on another one; we would then return an error to the user despite possibly being able to prepare. This commit introduces fallback logic to `Session::prepare()`: if preparation on a single (random) connection to every node fails, the whole preparation is retried, this time on a single connection to every shard. I'm a bit unhappy that this requires us to clone `statement` on the happy path (i.e., if the on-all-nodes preparation attempt succeeds), but I'm quite convinced the overhead is negligible.
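A sketch of the fallback control flow, assuming that `prepare_on_all()` can be parameterised by granularity. The `Targets` enum, the closure-based `prepare_on_all` argument, the tuple-struct `Statement`, and the `String` result type are all hypothetical simplifications, not the driver's actual signatures. Note the `statement.clone()` on the first attempt, which is the happy-path cost the commit message mentions.

```rust
/// Hypothetical granularity selector for the two preparation strategies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Targets {
    OneConnectionPerNode,
    OneConnectionPerShard,
}

/// Hypothetical stand-in for the driver's statement type.
#[derive(Debug, Clone)]
pub struct Statement(pub String);

/// Tries one random connection per node first; only if that whole attempt
/// fails is preparation retried on one connection per shard.
/// `prepare_on_all` is a caller-supplied stand-in for the real helper.
pub fn prepare_with_fallback<F>(statement: Statement, mut prepare_on_all: F) -> Result<String, String>
where
    F: FnMut(Statement, Targets) -> Result<String, String>,
{
    // The first attempt consumes a clone so the original is still available
    // for the fallback; this clone is paid even when the first attempt succeeds.
    match prepare_on_all(statement.clone(), Targets::OneConnectionPerNode) {
        Ok(prepared) => Ok(prepared),
        Err(_) => prepare_on_all(statement, Targets::OneConnectionPerShard),
    }
}
```

When the per-node attempt succeeds, `prepare_on_all` is called once; when it fails, it is called exactly twice, the second time on every shard.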
Overview
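This PR comes with two main enhancements / optimisations:
1. preparing statements on every node, not on every shard;
2. waiting only for one prepare attempt to succeed.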
Below, I describe both changes in detail.
Prepare on every node, not on every shard
Just as other drivers do, now the Rust driver also prepares statements on a single connection to every node, not on a single connection to every shard. This brings performance benefits for both driver and cluster, as it lowers the overhead of handling duplicated prepare requests on both sides.
Problem: unlucky random choice
As mentioned in #1290 (comment), we might be unlucky enough to end up with all randomly chosen connections being non-working. For example, the targeted shards might be overloaded.
Solution
Fallback logic is introduced. If preparing the statement through the randomly chosen connections fails on all nodes, the preparation is re-attempted on all connections (which is essentially how it used to work before this PR).
Wait only for one prepare attempt to succeed
Having taken a look at cpp-driver's handling of statement preparation, I noticed that the cpp-driver waits only for the first preparation to succeed. Preparation on all remaining nodes is done in the background, which decreases latency of the prepare operation from the driver user's PoV. To understand why, consider a node that is stuck/overloaded/temporarily unavailable. If we wait for prepare responses from all nodes, we are limited by the slowest response. This is not what ScyllaDB is designed for - it should be available and fast even if some nodes happen to be unavailable or overloaded.
I decided to implement this behaviour in the Rust driver. This is achieved by spawning a tokio worker task which prepares the statement on all nodes. It feeds all results, successes or failures, into a channel, thereby signalling the parent task. The parent task finishes early once it receives a successful response (a deserialized `PreparedStatement`). Meanwhile, the worker task keeps handling the remaining preparations in the background. The change brings two main benefits:
1. it reduces the driver's latency upon statement preparation;
2. it prevents a situation in which one stuck node freezes the driver upon preparation, as there are no preparation timeouts! (sic!).
Tests
TODO. Waiting for #1246 to be merged. Then I'll add a new feature to the proxy that makes it aware of which shard was targeted by the intercepted frame. With that feature available, I'll write tests.
Fixes: #1290
Pre-review checklist
[ ] I have provided docstrings for the public items that I want to introduce.
[ ] I have adjusted the documentation in `./docs/source/`.
[ ] `Fixes:` annotations to PR description.