[BUG] KRaft metadata cluster goes down because a committed record batch exceeds the limit #2057

Open
@lifepuzzlefun

Description

Version & Environment

master

What went wrong?

The Raft quorum has no available leader: each newly elected leader steps down because a commit fails.

[2024-10-09 18:39:02,084] INFO [RaftManager id=2] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:02,085] INFO [RaftManager id=2] Completed transition to CandidateState(localId=2, localDirectoryId=ZopiWufpyFjYOJ-CzWKiMQ,epoch=951, retries=1, voteStates={1=UNRECORDED, 2=GRANTED, 3=UNRECORDED}, highWatermark=Optional[LogOffsetMetadata(offset=3186751955, metadata=Optional.empty)], electionTimeoutMs=1495) from FollowerState(fetchTimeoutMs=2000, epoch=950, leaderId=1, voters=[1, 2, 3], highWatermark=Optional[LogOffsetMetadata(offset=3186751955, metadata=Optional.empty)], fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState)
[2024-10-09 18:39:02,087] INFO [RaftManager id=2] Completed transition to Leader(localId=2, epoch=951, epochStartOffset=3186751957, highWatermark=Optional.empty, voterStates={1=ReplicaState(nodeId=1, endOffset=Optional.empty, lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=false), 2=ReplicaState(nodeId=2, endOffset=Optional.empty, lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=true), 3=ReplicaState(nodeId=3, endOffset=Optional.empty, lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=false)}) from CandidateState(localId=2, localDirectoryId=ZopiWufpyFjYOJ-CzWKiMQ,epoch=951, retries=1, voteStates={1=GRANTED, 2=GRANTED, 3=UNRECORDED}, highWatermark=Optional[LogOffsetMetadata(offset=3186751955, metadata=Optional.empty)], electionTimeoutMs=1495) (org.apache.kafka.raft.QuorumState)
[2024-10-09 18:39:02,090] INFO [RaftManager id=2] High watermark set to LogOffsetMetadata(offset=3186751958, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=260993522)]) for the first time for epoch 951 based on indexOfHw 1 and voters [ReplicaState(nodeId=1, endOffset=Optional[LogOffsetMetadata(offset=3186751958, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=260993522)])], lastFetchTimestamp=1728470342090, lastCaughtUpTimestamp=1728470342090, hasAcknowledgedLeader=true), ReplicaState(nodeId=2, endOffset=Optional[LogOffsetMetadata(offset=3186751958, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=260993522)])], lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=true), ReplicaState(nodeId=3, endOffset=Optional[LogOffsetMetadata(offset=3186751957, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=260993416)])], lastFetchTimestamp=1728470342089, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=true)] (org.apache.kafka.raft.LeaderState)
[2024-10-09 18:39:05,994] ERROR Encountered quorum controller fault: commitStreamSetObject: event failed with IllegalStateException (treated as UnknownServerException) at epoch 951 in 12120 microseconds. Renouncing leadership and reverting to the last committed offset 3186752114. (org.apache.kafka.server.fault.LoggingFaultHandler)
java.lang.IllegalStateException: Attempted to atomically commit 38457 records, but maxRecordsPerBatch is 25000
	at org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:1034)
	at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:936)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:131)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:214)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:185)
	at java.base/java.lang.Thread.run(Thread.java:833)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Received user request to resign from the current epoch 951 (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Failed to handle fetch from 3 at 3186752115 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Failed to handle fetch from 1 at 3186752115 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Failed to handle fetch from 1001 at 3186752115 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Failed to handle fetch from 1004 at 3186752115 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Failed to handle fetch from 1003 at 3186752115 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Failed to handle fetch from 1002 at 3186752115 due to NOT_LEADER_OR_FOLLOWER (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:05,994] INFO [RaftManager id=2] Completed transition to ResignedState(localId=2, epoch=951, voters=[1, 2, 3], electionTimeoutMs=1366, unackedVoters=[1, 3], preferredSuccessors=[1, 3]) from Leader(localId=2, epoch=951, epochStartOffset=3186751957, highWatermark=Optional[LogOffsetMetadata(offset=3186752115, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=261000559)])], voterStates={1=ReplicaState(nodeId=1, endOffset=Optional[LogOffsetMetadata(offset=3186752115, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=261000559)])], lastFetchTimestamp=1728470345961, lastCaughtUpTimestamp=1728470345961, hasAcknowledgedLeader=true), 2=ReplicaState(nodeId=2, endOffset=Optional[LogOffsetMetadata(offset=3186752115, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=261000559)])], lastFetchTimestamp=-1, lastCaughtUpTimestamp=-1, hasAcknowledgedLeader=true), 3=ReplicaState(nodeId=3, endOffset=Optional[LogOffsetMetadata(offset=3186752115, metadata=Optional[(segmentBaseOffset=3182036153,relativePositionInSegment=261000559)])], lastFetchTimestamp=1728470345961, lastCaughtUpTimestamp=1728470345961, hasAcknowledgedLeader=true)}) (org.apache.kafka.raft.QuorumState)
[2024-10-09 18:39:05,998] INFO [RaftManager id=2] Completed transition to Unattached(epoch=952, voters=[1, 2, 3], electionTimeoutMs=1238) from ResignedState(localId=2, epoch=951, voters=[1, 2, 3], electionTimeoutMs=1366, unackedVoters=[], preferredSuccessors=[1, 3]) (org.apache.kafka.raft.QuorumState)
[2024-10-09 18:39:05,998] INFO [RaftManager id=2] Completed transition to Voted(epoch=952, votedKey=ReplicaKey(id=1, directoryId=Optional.empty), voters=[1, 2, 3], electionTimeoutMs=1223, highWatermark=Optional.empty) from Unattached(epoch=952, voters=[1, 2, 3], electionTimeoutMs=1238) (org.apache.kafka.raft.QuorumState)
[2024-10-09 18:39:05,998] INFO [RaftManager id=2] Vote request VoteRequestData(clusterId='DMSoJVXo9Q', topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, candidateEpoch=952, candidateId=1, lastOffsetEpoch=951, lastOffset=3186752115)])]) with epoch 952 is granted (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:06,002] INFO [RaftManager id=2] Completed transition to FollowerState(fetchTimeoutMs=2000, epoch=952, leaderId=1, voters=[1, 2, 3], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from Voted(epoch=952, votedKey=ReplicaKey(id=1, directoryId=Optional.empty), voters=[1, 2, 3], electionTimeoutMs=1223, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState)
[2024-10-09 18:39:06,100] INFO [RaftManager id=2] High watermark set to Optional[LogOffsetMetadata(offset=3186752116, metadata=Optional.empty)] for the first time for epoch 952 (org.apache.kafka.raft.FollowerState)
[2024-10-09 18:39:07,177] INFO [RaftManager id=2] Become candidate due to fetch timeout (org.apache.kafka.raft.KafkaRaftClient)
[2024-10-09 18:39:07,179] INFO [RaftManager id=2] Completed transition to CandidateState(localId=2, localDirectoryId=ZopiWufpyFjYOJ-CzWKiMQ,epoch=953, retries=1, voteStates={1=UNRECORDED, 2=GRANTED, 3=UNRECORDED}, highWatermark=Optional[LogOffsetMetadata(offset=3186752217, metadata=Optional.empty)], electionTimeoutMs=1295) from FollowerState(fetchTimeoutMs=2000, epoch=952, leaderId=1, voters=[1, 2, 3], highWatermark=Optional[LogOffsetMetadata(offset=3186752217, metadata=Optional.empty)], fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState)
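To make the failure in the log easier to follow, here is a minimal sketch of the kind of size guard that rejects an oversized atomic append. It is not the actual `QuorumController.appendRecords` code: the class `AtomicBatchGuard` and the method `checkAtomicAppend` are hypothetical names, and only the two numbers and the message text are taken from the exception above.

```java
final class AtomicBatchGuard {
    // Limit reported in the exception message in the log above.
    static final int MAX_RECORDS_PER_BATCH = 25_000;

    /**
     * Hypothetical stand-in for the controller's check: an atomic append
     * must fit into a single batch, so anything larger is rejected.
     */
    static void checkAtomicAppend(int recordCount, int maxRecordsPerBatch) {
        if (recordCount > maxRecordsPerBatch) {
            throw new IllegalStateException("Attempted to atomically commit " + recordCount
                + " records, but maxRecordsPerBatch is " + maxRecordsPerBatch);
        }
    }

    public static void main(String[] args) {
        try {
            // Record count from the failed commitStreamSetObject event.
            checkAtomicAppend(38_457, MAX_RECORDS_PER_BATCH);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the log, this exception is handled as a controller fault, so the leader renounces leadership. If the same oversized request is retried against the next leader, the election loop seen in the log would repeat.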

What should have happened instead?

How to reproduce the issue?

Create more than 50,000 (5w+) partitions and delete them with a concurrency of 100, in a cluster that has only 5 nodes.
Stop one node and start it again.

As a result, a single node holds a large number of partitions. Once an upload is triggered, there may be a lot of StreamObjects and Streams to upload.
I think this happens on commitStreamSetObject (commitSSO), and it may leave the whole cluster unable to function.
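One possible direction, sketched below under assumptions, is to split a large commit into several requests that each stay under the limit. `CommitChunker` and `chunk` are hypothetical names, not AutoMQ or Kafka APIs. The sketch also assumes the items can be committed independently, which this issue does not establish: splitting gives up the atomicity of the original single commit.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitChunker {
    /** Splits items into consecutive chunks of at most maxPerChunk elements. */
    static <T> List<List<T>> chunk(List<T> items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Placeholder integers stand in for the 38457 records from the log.
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 38_457; i++) {
            records.add(i);
        }
        List<List<Integer>> chunks = chunk(records, 25_000);
        // prints "2 chunks: 25000 + 13457"
        System.out.println(chunks.size() + " chunks: "
            + chunks.get(0).size() + " + " + chunks.get(1).size());
    }
}
```

Whether chunking at the request level is safe depends on how many metadata records each StreamObject produces and on whether partial commits are acceptable, so this is only an illustration of the size arithmetic.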

Additional information

Labels: Stale, bug