-
Notifications
You must be signed in to change notification settings - Fork 70
Panic: invalid number of shards during connection pooling #610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It'd be great to explain the problem, the impact and the solution. |
|
I don't think you need to reserve to only returning an error in such case. |
I wanted to discuss this in more details with you. A few options were discussed on Monday. Returning errors instead of panics was mentioned as a few necessary step as far as I remember. |
baaf1a8 to
ffd3eb7
Compare
- Implement connection migration during shard count changes - Preserve existing connections during shard count increases - Close excess connections when shard count decreases - Add comprehensive unit tests for topology change scenarios Fixes: "invalid number of shards" panic during connection pooling Co-authored-by: Dmitry Kropachev <[email protected]>
ffd3eb7 to
26a1b0e
Compare
@mykaul Please check now. |
dkropachev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, couple of minor changes
| pool.initConnPicker(conn) | ||
| pool.connPicker.Put(conn) | ||
| if err := pool.connPicker.Put(conn); err != nil { | ||
| return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to close connection if put had failed, look at the Put code, probably connection is being closed there too, so you need to extract all connection closing code out to here, if it is possible.
| p.logger.Printf("scylla: %s shard count changed from %d to %d, rebuilding connection pool", | ||
| p.address, p.nrShards, nrShards) | ||
| } | ||
| p.handleShardTopologyChange(conn, nrShards) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| p.handleShardTopologyChange(conn, nrShards) | |
| p.handleShardCountChange(conn, nrShards) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also before do so, please check if new shard count is not a zero, if it is zero you need to drop the connection.
We also need to consider that it could persist, which may lead to connection storm.
Let's address it in here - #614
| for i, conn := range oldConns { | ||
| if conn != nil && i < newShardCount { | ||
| newConns[i] = conn | ||
| migratedCount++ | ||
| } else if conn != nil { | ||
| toClose = append(toClose, conn) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for i, conn := range oldConns { | |
| if conn != nil && i < newShardCount { | |
| newConns[i] = conn | |
| migratedCount++ | |
| } else if conn != nil { | |
| toClose = append(toClose, conn) | |
| } | |
| } | |
| for i, conn := range oldConns { | |
| if conn == nil { | |
| continue | |
| } | |
| if i < newShardCount { | |
| newConns[i] = conn | |
| migratedCount++ | |
| } else { | |
| toClose = append(toClose, conn) | |
| } | |
| } |
|
Closed in favor of #619 |
Handle Dynamic Shard Topology Changes in ScyllaDB Connection Picker
Addresses #605.
Problem
The ScyllaDB connection picker (
scyllaConnPicker) had several critical issues when handling dynamic shard topology changes during cluster operations:Impact
Solution
Implemented comprehensive shard topology change handling in
scyllaConnPicker.Put().Original code:
New implementation:
Connection migration (sketch):
Key Changes:
handleShardTopologyChange()intelligently migrates connections during topology transitionsTesting