-
Notifications
You must be signed in to change notification settings - Fork 685
ct: Replicate with term #28024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
ct: Replicate with term #28024
Conversation
43f096d
to
b2d0a43
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the Raft replication API to consolidate term-based replication by moving the expected_term
parameter from dedicated method overloads into the replicate_options
struct. This eliminates the need for separate replicate
and replicate_in_stages
overloads that accepted a term parameter.
Key changes:
- Added
expected_term
field toreplicate_options
struct with a new constructor - Removed term-based overloads from consensus methods
- Updated all call sites to pass expected term via
replicate_options
instead of as a separate parameter
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
src/v/raft/replicate.h | Added expected_term field and new constructor to replicate_options |
src/v/raft/consensus.h | Removed term-based method overloads, updated documentation |
src/v/raft/consensus.cc | Removed term-based overload implementations |
src/v/raft/replicate_batcher.h | Removed expected_term parameter from method signatures |
src/v/raft/replicate_batcher.cc | Updated to use expected_term from replicate_options |
Multiple test and implementation files | Updated call sites to use new API pattern |
src/v/cluster/rm_stm.cc
Outdated
if (!opts.expected_term.has_value()) { | ||
opts.expected_term = _insync_term; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we do a term check here if it is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer have the check in the old place. The fewer surprises the better.
if (!opts.expected_term.has_value()) { | ||
opts.expected_term = _insync_term; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else verify term
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to propagate it and let rm_stm to validate normally and maybe step down there. I want to retain all the logging that happens prior to step down.
, batches(std::move(batches)) | ||
, batch_id(batch_id) | ||
, opts(update_replicate_options(opts)) | ||
, opts(update_replicate_options(opts, this->partition->term())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want the term returned in the fence. Otherwise you need to check the term here against the fence term
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/v/cluster/rm_stm.cc
Outdated
auto unit = co_await _state_lock.hold_read_lock(); | ||
if (bid.is_transactional) { | ||
co_return co_await transactional_replicate(bid, std::move(batch)); | ||
co_return co_await transactional_replicate(bid, std::move(batch), opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a behavioral change here. transactions didn't honor replicate_options.timeout prior to this patch and now they do, since opts are passed along (one can argue they should but I think that should be out of this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed that, now I'm passing the optional there so there is nothing to get this timeout from
src/v/cluster/rm_stm.cc
Outdated
|
||
auto r = co_await _raft->replicate( | ||
synced_term, std::move(batch), make_replicate_options()); | ||
auto r = co_await _raft->replicate(std::move(batch), opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a behavioral change here.. make_replicate_options() sets force_flush = true (to bypass write caching overrides). Would be better to use make_replicate_options() in all transactional code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Retry command for Build#74075please wait until all jobs are finished before running the slash command
|
CI test resultstest results on build#74075
test results on build#74099
|
This commit adds a plumbing for the expected term. Previously, the expected term was passed directly into 'raft::replicate' and 'raft::replicate_in_stages' (2x overloads of each method existed). This commit adds expected_term to replicate_options. The field is supposed to be used to replace the 'replicate' method overloads that take expected term. Since now we have expected_term field in the replicate_options we no longer need to pass this value directly. This commit removes overloads that accept exected term and propagates the value all the way up to the batcher and updates use cases. Signed-off-by: Evgeny Lazin <[email protected]> Signed-off-by: Evgeny Lazin <[email protected]>
Signed-off-by: Evgeny Lazin <[email protected]>
b2d0a43
to
54966ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Will let the replication bless with an approval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also LGTM let's wait on @bharathv / @mmaslankaprv to approve. It seems like the primary change is in the second commit? If so, and if you don't get timely approval from replication team then please just resubmit the first commit as a separate PR for replication team review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
This PR adds plumbing required to pass expected term using the
raft::replicate_options
struct instead of using a dedicatedreplicate
orreplicate_in_stages
overload. It uses this field to replicate placeholder batches with specified term.The alternative is to keep
replicate
andrepicate_in_stages
overloads. I don't like this because the same parameter can now be passed using two different means (vial parameter or via thereplicate_options
struct).Backports Required
Release Notes