Skip to content

KAFKA-18966: Change controller_num_nodes_override to override_num_isolated_controllers #19194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

CalvinConfluent
Copy link
Contributor

Bounce the broker that hosting the controller can cause unnecessary downtime for the partition. Avoid creating only 1 controller node in the combined kraft mode. For more details please check https://issues.apache.org/jira/browse/KAFKA-18966

@github-actions github-actions bot added triage PRs from the community tests Test fixes (including flaky tests) small Small PRs labels Mar 12, 2025
@chia7712 chia7712 changed the title kafka-18966: don't honor the controller_num_nodes_override in combined kraft test KAFKA-18966: don't honor the controller_num_nodes_override in combined kraft test Mar 13, 2025
@chia7712
Copy link
Member

@CalvinConfluent Could you please share the tests to me? I'd like to test it on my local to ensure it does reduce the test time.

@CalvinConfluent
Copy link
Contributor Author

@chia7712 Thanks for checking. It is the transactions_test.py where Combined_kraft and hard_bounce are used. The transaction timeout is 40s, in my environment, the controller restart time(broker restart + controller starting handle ISR updates) is very close to 40s which makes the test flaky.
I may update the PR a bit.

@CalvinConfluent CalvinConfluent changed the title KAFKA-18966: don't honor the controller_num_nodes_override in combined kraft test KAFKA-18966: Change controller_num_nodes_override to override_num_isolated_controllers Mar 13, 2025
@@ -330,7 +327,7 @@ def __init__(self, context, num_nodes, zk, security_protocol=SecurityConfig.PLAI
if self.quorum_info.has_brokers:
num_nodes_broker_role = num_nodes
if self.quorum_info.has_controllers:
self.num_nodes_controller_role = self.num_kraft_controllers(num_nodes_broker_role, controller_num_nodes_override)
self.num_nodes_controller_role = self.num_kraft_controllers(num_nodes_broker_role, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me, can assigning the value to 0 address the "Propose to set the controller node to at least 3"? It seems to me, it still depend on the num_nodes_broker_role, right?

        if num_nodes_broker_role < 3:
            return 1
        if num_nodes_broker_role < 5:
            return 3
        return 5

https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/kafka/kafka.py#L495

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the available nodes are less than 3, it only assigns 1 controller node. I can update the description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be mistaken, but I understand this pull request to be intended to increase the number of controller nodes, thereby preventing a complete controller service outage during a rolling bounce. Is that correct?

@github-actions github-actions bot removed the triage PRs from the community label Mar 17, 2025
@CalvinConfluent
Copy link
Contributor Author

@chia7712 Can you help take another look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
small Small PRs tests Test fixes (including flaky tests)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants