Added SharedReplaceableClient and made worker client replaceable between retries #1000
Conversation
struct SharedClientData<C>
where
    C: Clone + Send + Sync,
{
    client: RwLock<C>,
    generation: AtomicU32,
}
This seems more or less the same as https://docs.rs/arc-swap/latest/arc_swap/
Any reason not to just use that?
We need both (A) mutable access to the obtained value and (B) to avoid unnecessary clones if the value hasn't changed, and as far as I can tell, there's no easy way to do both with ArcSwap.
You don't actually need mutable access though, no? The only place the write portion of the lock is used is when you're replacing the value, which is the swap part of ArcSwap. Either way, in fetch(), the entire client is cloned.
I mean mutable access to the local clone. In arc_swap, fetch-if-updated is implemented through the arc_swap::cache::Cache type, and that type only provides immutable references to the inner value, which would prevent us from calling client methods.
I'm not sure how Cache plays into it - the main things you'd use are either https://docs.rs/arc-swap/latest/arc_swap/type.ArcSwap.html#method.load-1 or https://docs.rs/arc-swap/latest/arc_swap/type.ArcSwap.html#method.load_full
In this case, since we literally always just immediately clone the client after taking it out of the lock, you can just use load_full and it directly gives you a cloned, ownable client. Seems like that'd work fine?
load_full doesn't allow checking whether the value changed before making a clone, and we'd have to call it before every client call. Considering that "client was not replaced" is the >99.9% case, I think it's worth optimizing for: the PR implementation does one atomic load and no cloning on the hot path.
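To make the trade-off concrete, here is a minimal sketch of the generation-check pattern being described (names and exact signatures are illustrative, not the PR's actual code, apart from get_mut_refreshing which is mentioned below): shared state holds the client behind an RwLock plus an AtomicU32 generation counter, and each holder keeps a local clone tagged with the generation it was cloned at. The hot path is a single atomic load; only when the generation has advanced does it take the read lock and re-clone.

```rust
use std::sync::{
    atomic::{AtomicU32, Ordering},
    Arc, RwLock,
};

// Shared state: the current client plus a generation counter that is
// bumped every time the client is replaced.
struct SharedClientData<C> {
    client: RwLock<C>,
    generation: AtomicU32,
}

// Per-holder handle: caches a local clone of the client and the
// generation that clone was taken at.
struct SharedReplaceableClient<C> {
    shared: Arc<SharedClientData<C>>,
    local: C,
    seen_generation: u32,
}

impl<C: Clone> SharedReplaceableClient<C> {
    fn new(client: C) -> Self {
        let shared = Arc::new(SharedClientData {
            client: RwLock::new(client.clone()),
            generation: AtomicU32::new(0),
        });
        Self { shared, local: client, seen_generation: 0 }
    }

    // Replace the client for all holders: write-lock, swap, bump generation.
    fn replace(&self, new_client: C) {
        let mut guard = self.shared.client.write().unwrap();
        *guard = new_client;
        // Release pairs with the Acquire load in get_mut_refreshing.
        self.shared.generation.fetch_add(1, Ordering::Release);
    }

    // Hot path: one atomic load; re-clone only if the client was replaced.
    fn get_mut_refreshing(&mut self) -> &mut C {
        let current = self.shared.generation.load(Ordering::Acquire);
        if current != self.seen_generation {
            self.local = self.shared.client.read().unwrap().clone();
            self.seen_generation = current;
        }
        &mut self.local
    }
}
```

Returning `&mut C` from the cached clone is the part ArcSwap's Cache can't express, since Cache only hands out shared references to the stored value.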
Aaaah, ok sorry I see what you're getting at now, with the whole pathway that goes through refresh_inner (which, side note, might rename that to get_mut_refreshing or something that indicates the action is getting with a side effect of maybe refreshing).
Yeah, that makes sense. And now I see why you were talking about Cache since it has that functionality, but doesn't do mut refs.
Overall, then, I think this makes sense and works for me. It might be good to have a unit test that does some loading/replacing with multiple threads going at once, to ensure nothing unexpected happens under that kind of stress.
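A stress test along the suggested lines might look something like this (a self-contained sketch against a minimal stand-in for the shared state, not the PR's actual test): one writer thread repeatedly replaces the value and bumps the generation, while reader threads emulate the hot path and check that the values they observe never go backwards.

```rust
use std::sync::{
    atomic::{AtomicU32, Ordering},
    Arc, RwLock,
};
use std::thread;

// Minimal stand-in for the shared state under test: a u64 plays the
// role of the client, so monotonicity of observed values is checkable.
struct SharedClientData {
    client: RwLock<u64>,
    generation: AtomicU32,
}

fn stress(readers: usize, replacements: u32) -> bool {
    let shared = Arc::new(SharedClientData {
        client: RwLock::new(0),
        generation: AtomicU32::new(0),
    });

    // Writer: replace the "client" many times, bumping the generation
    // after each write, as replace() does.
    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for i in 1..=u64::from(replacements) {
                *shared.client.write().unwrap() = i;
                shared.generation.fetch_add(1, Ordering::Release);
            }
        })
    };

    // Readers: emulate the hot path — poll the generation, re-read only
    // when it advances, and verify observed values never decrease.
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut seen_gen = 0u32;
                let mut local = 0u64;
                let mut monotonic = true;
                while seen_gen < replacements {
                    let g = shared.generation.load(Ordering::Acquire);
                    if g != seen_gen {
                        let v = *shared.client.read().unwrap();
                        if v < local {
                            monotonic = false;
                        }
                        local = v;
                        seen_gen = g;
                    }
                }
                monotonic
            })
        })
        .collect();

    writer.join().unwrap();
    handles.into_iter().all(|h| h.join().unwrap())
}
```

Running this under a thread sanitizer or with `cargo test --release` in a loop is a cheap way to shake out ordering bugs between the generation counter and the lock.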
What was changed
- Added the SharedReplaceableClient type, a client wrapper that allows replacing the underlying client post-creation. When wrapped inside RetryClient, it allows replacing the client between retries of the same RPC call.
- Changed WorkerClientBag to use SharedReplaceableClient so that replacing the worker client works as expected for polling.
- Extended test-utils to make it easier to spawn additional server instances inside tests.

Why?
Fixes an issue where workers would get stuck in a retry loop when polling, and replacing the client would have no effect.
Checklist
Closes [Bug] Replacing a client on a worker that is failing to poll doesn't start using the new client #976
How was this tested:
- client/src/replaceable.rs
- replace_client_works_after_polling_failure in tests/integ_tests/polling_tests.rs