Commit 8ce543a

dumbbell authored and mergify[bot] committed
rabbit_quorum_queue: Wait for member add in add_member/4
[Why]
The `ra:add_member/3` call returns before the change is committed. This is fine for the addition itself, but any follow-up change to the cluster might be rejected with the `cluster_change_not_permitted` error.

[How]
Instead of changing other places to wait or retry their cluster membership change, this patch waits for the current add to be applied before proceeding and returning.

This fixes some transient failures in CI where such follow-up changes are rejected and not retried, leaving the cluster in an unexpected state for the testcase. An example is `quorum_queue_SUITE:force_shrink_member_to_current_member/1`.

(cherry picked from commit 99d8e90)
1 parent 83da9fe commit 8ce543a

1 file changed, +14 -1 lines changed

deps/rabbit/src/rabbit_quorum_queue.erl

@@ -1346,14 +1346,27 @@ add_member(Q, Node, Membership, Timeout) when ?amqqueue_is_quorum(Q) ->
                       maps:get(id, Conf)
               end,
             case ra:add_member(Members, ServerIdSpec, Timeout) of
-                {ok, _, Leader} ->
+                {ok, {RaIndex, RaTerm}, Leader} ->
                     Fun = fun(Q1) ->
                                   Q2 = update_type_state(
                                          Q1, fun(#{nodes := Nodes} = Ts) ->
                                                      Ts#{nodes => [Node | Nodes]}
                                              end),
                                   amqqueue:set_pid(Q2, Leader)
                           end,
+                    %% The `ra:add_member/3` call above returns before the
+                    %% change is committed. This is ok for that addition but
+                    %% any follow-up changes to the cluster might be rejected
+                    %% with the `cluster_change_not_permitted` error.
+                    %%
+                    %% Instead of changing other places to wait or retry their
+                    %% cluster membership change, we wait for the current add
+                    %% to be applied using a conditional leader query before
+                    %% proceeding and returning.
+                    {ok, _, _} = ra:leader_query(
+                                   Leader,
+                                   {erlang, is_list, []},
+                                   #{condition => {applied, {RaIndex, RaTerm}}}),
                     _ = rabbit_amqqueue:update(QName, Fun),
                     rabbit_log:info("Added a replica of quorum ~ts on node ~ts", [rabbit_misc:rs(QName), Node]),
                     ok;
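
The add-then-wait pattern in this hunk can be sketched in isolation. The module below is a minimal illustration, not code from the patch: the module name `add_then_wait_sketch`, the function `add_then_wait/2`, and the two injected funs are all hypothetical. `AddFun` stands in for the `ra:add_member/3` call and `WaitFun` for the conditional `ra:leader_query/3` call, so the sketch runs without a Ra cluster. It assumes only the reply shapes visible in the diff (`{ok, {Index, Term}, Leader}` for the add and `{ok, _, _}` for the query).

```erlang
-module(add_then_wait_sketch).
-export([add_then_wait/2, demo/0]).

%% AddFun  :: fun(() -> {ok, {Index, Term}, Leader} | term())
%% WaitFun :: fun((Leader, {Index, Term}) -> {ok, term(), term()} | term())
%% Both funs are hypothetical stand-ins for the Ra calls used in the patch.
add_then_wait(AddFun, WaitFun) ->
    case AddFun() of
        {ok, {_Index, _Term} = IdxTerm, Leader} ->
            %% The add was accepted but may not be applied yet. Block until
            %% the leader reports that IdxTerm has been applied, so that a
            %% follow-up membership change is not attempted too early.
            case WaitFun(Leader, IdxTerm) of
                {ok, _, _} -> {ok, Leader};
                Other -> {error, {not_applied, Other}}
            end;
        Other ->
            %% Propagate a failed add (error or timeout) unchanged.
            Other
    end.

%% Runs the pattern with stubbed funs that always succeed.
demo() ->
    Add = fun() -> {ok, {42, 3}, leader_a} end,
    Wait = fun(leader_a, {42, 3}) -> {ok, true, leader_a} end,
    add_then_wait(Add, Wait).
```

One difference from the patch: the patch asserts `{ok, _, _} = ra:leader_query(...)` and so crashes if the wait fails, whereas this sketch returns an error tuple. Which behaviour is preferable depends on how the caller handles failures.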