feat: support connection lifetime for single client #727
Conversation
|
@rueian Here is a draft. Could you check it against the additional points you raised in the discussion? There are no additional tests yet.
|
}
atomic.AddInt32(&p.waits, -1)
atomic.AddInt32(&p.blcksig, -1)
p.StopTimer()
Oops. I will fix it...
}
}
p.cond.L.Unlock()
v.StopTimer()
If the timer is not stopped successfully, we need to acquire another connection.
Ah, that's right. Thanks!
r2ps bool // identify this pipe is used for resp2 pubsub or not
noNoDelay bool
lftm time.Duration // lifetime
lftmMu sync.Mutex // guards lifetime timer
Do we really need the mutex and the bool flag?
I reviewed it again; we don't need the bool flag.
I thought that Timer.Reset and Timer.Stop needed a mutex on Go <= 1.22. Maybe I've got that wrong.
The source looks like it is thread-safe https://cs.opensource.google/go/go/+/refs/tags/go1.22.0:src/runtime/time.go;l=314.
Thanks, you're right. It looks thread-safe. I will remove it.
I misread the sentence "This cannot be done concurrent to other receives from the Timer's channel or other calls to the Timer's Stop method." in https://pkg.go.dev/time@go1.21.13#Timer.Stop . Sorry.
And we are using the AfterFunc timer which has no channel associated.
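To illustrate the conclusion of this thread, here is a small sketch that calls `Reset` and `Stop` on a single `time.AfterFunc` timer from several goroutines without an extra mutex. The function name `hammer` and the goroutine counts are made up for the example. It follows the reviewer's reading of the runtime source and is not a guarantee taken from the `time` package documentation; running it with `go run -race` is one way to check for data races on a given Go version.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// hammer calls Reset and Stop on one AfterFunc timer from many goroutines
// without any additional locking, and returns how often the callback fired.
// The duration is one hour, so the callback is not expected to run at all.
func hammer(goroutines, rounds int) int32 {
	var fired int32
	t := time.AfterFunc(time.Hour, func() { atomic.AddInt32(&fired, 1) })
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				if (g+i)%2 == 0 {
					t.Reset(time.Hour)
				} else {
					t.Stop()
				}
			}
		}(g)
	}
	wg.Wait()
	t.Stop() // make sure nothing is left pending
	return atomic.LoadInt32(&fired)
}

func main() {
	fmt.Println("callback fired:", hammer(8, 1000))
}
```

Because an `AfterFunc` timer has no channel, the documented caveat about draining the channel around `Stop` and `Reset` does not come into play here.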
31ffe91 to 88c8d7e
|
What remains is implementing the retry in the single client. |
I think we should use your original proposal, which has nothing to do with the retry handler:
retry:
    resp = c.conn.Do(ctx, cmd)
    if resp.Error() == errConnExpired {
        goto retry
    }
    if c.retry && cmd.IsReadOnly() && c.isRetryable(resp.Error(), ctx) {
    ...
Because whenever an errConnExpired occurs, we know the connection was closed by ourselves, so it should be safe to retry immediately. |
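The proposed retry can be tried out in isolation with a stand-in connection. In this sketch, `fakeConn`, its `Do` signature, and the local `errConnExpired` value are assumptions made for the example; only the shape of the `goto retry` loop mirrors the snippet above.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnExpired is a local stand-in for the library's sentinel error.
var errConnExpired = errors.New("connection expired")

// fakeConn is a hypothetical connection that reports expiry a fixed number
// of times before it starts answering.
type fakeConn struct {
	expiredLeft int
	calls       int
}

func (c *fakeConn) Do(cmd string) (string, error) {
	c.calls++
	if c.expiredLeft > 0 {
		c.expiredLeft--
		return "", errConnExpired
	}
	return "OK:" + cmd, nil
}

// do retries immediately on errConnExpired. That error means the client
// closed the connection itself, so the command was not sent on it.
func do(c *fakeConn, cmd string) (string, error) {
	var (
		resp string
		err  error
	)
retry:
	resp, err = c.Do(cmd)
	if err == errConnExpired {
		goto retry
	}
	return resp, err
}

func main() {
	c := &fakeConn{expiredLeft: 2}
	resp, err := do(c, "GET k")
	fmt.Println(resp, err, "calls:", c.calls)
}
```

Keeping this check ahead of the regular retry condition means the expiry error is handled even when user-facing retries are disabled.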
|
@rueian Thanks. Right, we know the cause of the error, and it wouldn't be good to expose errConnExpired to callers when retry is disabled either. The retry logic is almost done; I just need to add tests for it. |
Co-authored-by: Rueian <rueiancsie@gmail.com>
resps = c.conn.DoMulti(ctx, multi...).s
if c.hasConnLftm {
    for _, resp := range resps {
        if resp.Error() == errConnExpired {
Is it possible that errConnExpired happens in the middle of DoMulti? I am not sure, but if it is possible, then we should not retry the preceding requests that did not receive the error.
Ah, I think it's unlikely. All responses should carry the same error once p.state changes.
I will change it as follows.
if resps[0].Error() == errConnExpired {
    goto retry
}
ok, could you leave a comment in the code to explain why it won't happen?
While checking how connection lifetime behaves under concurrent use, I ran into the error read tcp [::1]:35190->[::1]:6379: use of closed network connection from singleClient.DoMulti. I think the error is probably caused by pipe.Close(), but the investigation isn't going well. Could you advise me on this? @rueian
@rueian Thanks! As you said, it looks like _backgroundRead returns that error.
I will change those lines to return errConnExpired when the connection has expired. That means errConnExpired can occur in the middle of DoMulti, so we may need to check the error of every response.
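One way to picture the concern raised in this thread: if expiry can strike partway through a batch, only the commands from the first expired response onward should be sent again. The sketch below is an illustration under that assumption, not the approach merged in this PR; `fakeConn`, `result`, and `doMulti` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnExpired is a local stand-in for the library's sentinel error.
var errConnExpired = errors.New("connection expired")

type result struct {
	val string
	err error
}

// fakeConn is a hypothetical connection that, on its first call only, expires
// at position failFrom of the batch. sent records commands that were executed.
type fakeConn struct {
	failFrom int
	sent     []string
}

func (c *fakeConn) DoMulti(cmds ...string) []result {
	out := make([]result, len(cmds))
	for i, cmd := range cmds {
		if c.failFrom >= 0 && i >= c.failFrom {
			out[i] = result{err: errConnExpired}
			continue
		}
		c.sent = append(c.sent, cmd)
		out[i] = result{val: "OK:" + cmd}
	}
	c.failFrom = -1 // later calls behave like a fresh connection
	return out
}

// doMulti checks every response and re-sends only the tail that starts at the
// first errConnExpired, so commands that already succeeded are not repeated.
func doMulti(c *fakeConn, cmds ...string) []result {
	resps := make([]result, len(cmds))
	start := 0
	for start < len(cmds) {
		part := c.DoMulti(cmds[start:]...)
		copy(resps[start:], part)
		next := len(cmds)
		for i, r := range part {
			if r.err == errConnExpired {
				next = start + i
				break
			}
		}
		start = next
	}
	return resps
}

func main() {
	c := &fakeConn{failFrom: 1}
	resps := doMulti(c, "INCR a", "INCR b", "INCR c")
	fmt.Println(len(resps), c.sent)
}
```

For write commands such as INCR, repeating an already applied command would change the result, which is why the position of the first expired response matters.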
Sorry for the late reply. I haven't had any time to spare... I think I can work on this problem starting this week. Anyway, I will merge in any updates to the connection pools. @rueian
I have a feeling that this feature is probably only for read replicas, right? It seems write commands don't work with this approach. Sometimes the incremented value is 10001, 10002, and so on when the loop runs 10000 times.
Sorry for the late reply. I haven't had any time to spare... I think I can work on this problem starting this week.
No worries.
Sometimes the incremented value is 10001, 10002, and so on when the loop runs 10000 times.
What do you mean by this? I think this can be a general feature for those who want a limited lifetime on each connection for any reason.
When I simply incremented a key 10000 times with the connection lifetime option enabled, the key's value ended up over 10000. But my implementation is not yet correct with respect to the error handling in _backgroundRead, so I will run the count again after fixing that.
Signed-off-by: Rueian <rueiancsie@gmail.com>
|
Hi @terut, would you mind adding the retry logic to the cluster client and sentinel client in a follow-up PR? |
|
@rueian Okay, I'll take care of it. Sorry to have kept you waiting so long. |
|
Thanks @terut! |
|
@rueian Thank you for all your help over such a long time 😄 |
|
You’re welcome! I know some users really need this feature, so it’s great that we have it. However, I hope we can have your follow-up PR for adding retries to cluster and sentinel soon, because the next release cycle is next week. If we don't have the PR merged, we probably can't include this new feature in the next release. |
* feat: add connection lifetime option to single client
* Remove mutex and timer flag for connection lifetime timer
* Retry wire accquition when failed to stop connection lifetime timer
* Add timer test to pipe
* Add test for reseting timer and stopping timer when using pool
* Remove p.StopTimer() from p.Close()
Co-authored-by: Rueian <rueiancsie@gmail.com>
* Forced to retry when errConnExpired
* Remove hasConnLftm and check resps[0] to retry for multi cmds
* Recover connection lifetime error in the middle of calls
* Fix the handling of connection lifetime error of DoMultiCache
* perf: apply fieldaligments
Signed-off-by: Rueian <rueiancsie@gmail.com>
---------
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
Background
Recently I noticed that requests become unbalanced after a replica failover on GCP Memorystore for Redis if connections are kept open. So I am considering a connection lifetime that forces reconnection to the Redis endpoint, because existing connections are not rerouted when a node is reintroduced.
Here is the documentation on the architecture and connection balance management.
https://cloud.google.com/memorystore/docs/redis/about-read-replicas#architecture
https://cloud.google.com/memorystore/docs/redis/about-read-replicas#connection_balance_management
Ref: #725
Solution
Support a connection lifetime for the single client so that it reconnects to the fixed read endpoint.