Reduce intra cluster conflicts #5371
base: main
Conversation
Force-pushed from b727716 to 04000f7
Noting the need for a liveness check at the start (so the 'leader' is always a live node as of the invocation), and also to consider maintenance mode (receipt of that message from a 'leader' specifically). A minimal sketch of such a check is below.
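To make the liveness-check idea concrete, here is a hypothetical sketch of choosing a live 'leader' from the shard workers; choose_live_leader/1 is an illustrative helper, the #shard{} record is assumed to come from mem3.hrl, and the real patch would likely use mem3/rexi facilities rather than plain nodes().

%% Hypothetical illustration only; assumes -include_lib("mem3/include/mem3.hrl")
%% so that the #shard{} record is available.
choose_live_leader(Workers) ->
    Live = [node() | nodes()],
    case [W || #shard{node = N} = W <- Workers, lists:member(N, Live)] of
        [Leader | _] -> {ok, Leader};   % first replica on a currently live node
        [] -> no_live_leader            % caller falls back to plain parallel writes
    end.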
Force-pushed from 577dc25 to e0807f5
src/fabric/src/fabric_doc_update.erl
Outdated
append_update_replies(
    Rest1, Rest2, W, dict:append_list(Doc, lists:duplicate(W, conflict), Dict0)
);
append_update_replies([Doc | Rest1], [Reply | Rest2], W, Dict0) ->
What if we hit a conflict on the 2nd reply of a W=3 request? Would we still want to do a lists:duplicate(W, conflict) set of fake replies?
Hmmm, good question. The 2nd reply only happens if we started the followers, and if we do that we're doing them all in parallel. So whether we send conflict based on just the 2nd reply, or conflict|accepted if we got 201/409 from both, there's still the same chance of a stored conflict; it's only the status code of the response that would be wrong (we'd send 409 instead of 202, but that can happen anyway).
I had considered doing this in tiers: that is, if we hear that the 'leader' shard is down (the rexi DOWN, or the rexi EXIT for maintenance mode) we could recalculate, deem one of the N-1 remaining shards the leader, and apply the same logic. I didn't do it because I didn't like the latency implications, but it leads to the situation above.
I agree it would be clearer if the fake reply thing only triggers if the reply is from a leader. It won't make things much better but it'll be easier to understand later.
Overall it seems unusual that on a 2nd conflict reply we'd end up with 4 total replies on an N=3 db: the original 1 plus the (W=3) conflict ones. Just worried that has a chance to mess up some of our logic, since we're, in general, careful about getting just the right number of replies and deciding what to do based on that.
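To make that concern concrete, here is a toy illustration (not the real module code) of how appending lists:duplicate(W, conflict) after one genuine reply leaves four recorded replies for a single doc on an N=3, W=3 database:

%% Toy illustration only, using the stdlib dict module as the PR does.
reply_count_example() ->
    W = 3,
    Doc = doc1,
    %% one genuine reply has already been recorded for this doc
    Dict0 = dict:append(Doc, {accepted, rev1}, dict:new()),
    %% a conflict reply then appends W synthetic 'conflict' entries
    Dict1 = dict:append_list(Doc, lists:duplicate(W, conflict), Dict0),
    %% 1 real + 3 synthetic = 4 replies
    length(dict:fetch(Doc, Dict1)).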
src/fabric/src/fabric_doc_update.erl
Outdated
    Rest1, Rest2, W, dict:append_list(Doc, lists:duplicate(W, conflict), Dict0)
);
append_update_replies([Doc | Rest1], [Reply | Rest2], W, Dict0) ->
    append_update_replies(Rest1, Rest2, W, dict:append(Doc, Reply, Dict0)).
Tiny style nit: DocRest vs ReplyRest might be clearer, and we're already using those names in the remove_conflicts function, so it would be more consistent, too.
sure
I didn't change append_update_replies, so Rest1 / Rest2 are in the original.
Ah, good point. A clean-up like that (a rename for clarity) would go along with a separate #acc{} record PR ;-)
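For reference, a sketch of what the suggested rename could look like in that clause; the empty-list base clause is assumed here only to keep the sketch self-contained, and the behaviour is unchanged from the clause quoted above.

%% Sketch of the suggested rename only; same recursion, clearer names.
append_update_replies([], [], _W, Dict0) ->
    Dict0;
append_update_replies([Doc | DocRest], [Reply | ReplyRest], W, Dict0) ->
    append_update_replies(DocRest, ReplyRest, W, dict:append(Doc, Reply, Dict0)).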
A very nice improvement!
I haven't looked at it in depth yet, so I had only a few style suggestions and questions.
Definitely like introducing the acc record. To make the change easier to review, would it be possible to split it out as a separate PR and merge that first? It's a great improvement on its own, and it would keep the subsequent PR limited to just the leader/follower changes.
I put the
Force-pushed from 79f92d7 to ef74641
That would look nice when merging; however, since a bunch of notes and comments relate to just the acc change, it may still be nicer to do a preliminary cleanup / #acc{} record PR, keep the overall discussion about it in that PR, and then move on to the main one.
Force-pushed from ef74641 to e30f426
It's hard enough to get CI to run for one pull request. I'll do it this time, but I really don't like this idea at all. I separated out the refactoring from the functional change within the PR, and it makes little sense to commit one without the other.
#5385 for the acc record.
Force-pushed from 7871979 to e482d53
I think it's worthwhile. At least in this case the test failures do seem related to bulk_docs.
I keep an eye on flaky failures in the CI, and this is not a test that usually fails.
For sure, that indicates a bug here. It took 8 runs locally to get an internal_server_error, so this will be fun to track down, but it needs to be found.
Force-pushed from e482d53 to 2b006b8
Force-pushed from fdf909c to 5c70ed7
Force-pushed from 3e22ba1 to 50d5248
This should prevent spurious intra-cluster conflicts most of the time. It is not true consistency, however; spurious conflicts are still possible whenever the nodes in the cluster disagree on the current live set of other nodes.
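A rough sketch of the leader-first flow discussed in this thread, reduced to a higher-order function; SendFun and the shape of its return values are assumptions for illustration, while the real logic lives in fabric_doc_update.erl and fans out via rexi.

%% SendFun(Replica) is assumed to perform one replica write and return
%% {ok, Reply} or conflict.
leader_first_update(SendFun, Leader, Followers) ->
    case SendFun(Leader) of
        {ok, LeaderReply} ->
            %% The leader accepted, so write the followers as well
            %% (the real code does this in parallel via rexi).
            FollowerReplies = [SendFun(F) || F <- Followers],
            {ok, [LeaderReply | FollowerReplies]};
        conflict ->
            %% The leader reported a conflict: stop here, so the followers
            %% never store a conflicting revision.
            {error, conflict}
    end.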
Force-pushed from 50d5248 to 5869098
Overview
CouchDB issues all write requests in parallel without coordination, applying a quorum to the results of those independent actions. When a document is updated concurrently, this can lead to the introduction of a stored conflict if two different writes reach separate nodes first. This is undesirable.
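As a toy illustration of counting independent per-replica results against W (not the fabric implementation; the reply atoms are assumed for the example): with N=3 replicas written in parallel and W=2, a reply set such as [ok, conflict, ok] still meets quorum even though one replica has already stored a conflicting revision.

%% Toy quorum tally over per-replica replies.
meets_quorum(Replies, W) ->
    length([R || R <- Replies, R =:= ok]) >= W.
%% meets_quorum([ok, conflict, ok], 2) returns true, yet the conflicting
%% revision written on the second replica remains stored.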
This patch changes fabric_doc_update in the following ways:
Testing recommendations
There is some existing coverage in the module itself but more testing is needed before this can be merged.
Related Issues or Pull Requests
Checklist
- Any new configurable parameters are documented in rel/overlay/etc/default.ini
- Documentation changes were made in the src/docs folder