Commit a61a37f

KAFKA-19452: Fix flaky test LogRecoveryTest.testHWCheckpointWithFailuresMultipleLogSegments (apache#20121)

The `testHWCheckpointWithFailuresMultipleLogSegments` test in `LogRecoveryTest` was failing intermittently due to a race condition during its failure simulation.

In successful runs, the follower broker would restart and rejoin the In-Sync Replica (ISR) set before the old leader's failure was fully processed. This allowed for a clean and timely leader election to the now in-sync follower.

However, in the failing runs, the follower did not rejoin the ISR before the leader election was triggered. With no replicas in the ISR and unclean leader election disabled by default for the test, the controller correctly refused to elect a new leader, causing the test to time out.

This commit fixes the flakiness by overriding the controller configuration for this test to explicitly enable unclean leader election. This allows the out-of-sync replica to be promoted to leader, making the test deterministic and stable.

Reviewers: Jun Rao <junrao@gmail.com>

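
The election outcome described above can be illustrated with a small, simplified model. This is a sketch under stated assumptions, not Kafka's controller code: `electLeader`, its parameters, and the broker IDs 0 and 1 are hypothetical names chosen for illustration. It only assumes the rule that the commit message implies: prefer a live in-sync replica, and fall back to a live out-of-sync replica only when unclean election is enabled.

```scala
// Simplified, hypothetical model of the election rule; not Kafka's controller code.
object LeaderElectionSketch {

  /** Picks a leader among the assigned replicas that are currently alive.
    * A live in-sync replica is preferred. A live out-of-sync replica is
    * chosen only when unclean leader election is enabled.
    */
  def electLeader(
      assignedReplicas: Seq[Int],
      isr: Set[Int],
      liveBrokers: Set[Int],
      uncleanElectionEnabled: Boolean
  ): Option[Int] = {
    val live = assignedReplicas.filter(liveBrokers.contains)
    live.find(isr.contains).orElse {
      if (uncleanElectionEnabled) live.headOption else None
    }
  }

  def main(args: Array[String]): Unit = {
    val replicas = Seq(0, 1) // hypothetical broker IDs: 0 = old leader, 1 = follower

    // Passing run: the follower rejoined the ISR before the leader failed.
    println(electLeader(replicas, isr = Set(0, 1), liveBrokers = Set(1), uncleanElectionEnabled = false))

    // Failing run: the follower is alive but not in the ISR, unclean election disabled.
    println(electLeader(replicas, isr = Set(0), liveBrokers = Set(1), uncleanElectionEnabled = false))

    // Same state with unclean election enabled, as this commit configures.
    println(electLeader(replicas, isr = Set(0), liveBrokers = Set(1), uncleanElectionEnabled = true))
  }
}
```

In this model the second call returns `None`, which corresponds to the controller refusing to elect a leader, and the third returns `Some(1)`, which corresponds to the out-of-sync follower being promoted.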
1 parent 8733798 commit a61a37f

1 file changed

Lines changed: 10 additions & 0 deletions

core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala

@@ -64,6 +64,16 @@ class LogRecoveryTest extends QuorumTestHarness {
   def hwFile2 = new OffsetCheckpointFile(new File(configProps2.logDirs.get(0), ReplicaManager.HighWatermarkFilename), null)
   var servers = Seq.empty[KafkaBroker]

+  // testHWCheckpointWithFailuresMultipleLogSegments simulates broker failures that can leave the only available replica out of the
+  // ISR. By enabling unclean leader election, we ensure that the test can proceed and elect
+  // the out-of-sync replica as the new leader, which is necessary to validate the log
+  // recovery and high-watermark checkpointing logic under these specific failure conditions.
+  override def kraftControllerConfigs(testInfo: TestInfo): Seq[Properties] = {
+    val properties = new Properties()
+    properties.put(ReplicationConfigs.UNCLEAN_LEADER_ELECTION_ENABLE_CONFIG, "true")
+    Seq(properties)
+  }
+
   // Some tests restart the brokers then produce more data. But since test brokers use random ports, we need
   // to use a new producer that knows the new ports
   def updateProducer(): Unit = {
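
For readers without the Kafka test harness on the classpath, the override in the hunk above can be approximated with plain `java.util.Properties`. This is a minimal sketch, not the harness API: `ControllerOverrideSketch` and `controllerOverrides` are hypothetical names, and the literal key `unclean.leader.election.enable` is assumed to be the value behind `ReplicationConfigs.UNCLEAN_LEADER_ELECTION_ENABLE_CONFIG`.

```scala
import java.util.Properties

// Hypothetical stand-in for the harness hook; it does not depend on Kafka.
object ControllerOverrideSketch {

  // Assumption: the literal key behind ReplicationConfigs.UNCLEAN_LEADER_ELECTION_ENABLE_CONFIG.
  val UncleanLeaderElectionEnable = "unclean.leader.election.enable"

  // Mirrors the shape of the override in the diff: a Seq holding one Properties object.
  def controllerOverrides(): Seq[Properties] = {
    val properties = new Properties()
    properties.put(UncleanLeaderElectionEnable, "true")
    Seq(properties)
  }

  def main(args: Array[String]): Unit = {
    val overrides = controllerOverrides()
    // Reads the flag back as the string that was stored.
    println(overrides.head.getProperty(UncleanLeaderElectionEnable))
  }
}
```

The value is stored as the string `"true"` rather than a Boolean, matching the diff, so `getProperty` can read it back directly.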
