@squah-confluent
Contributor

...so that we capture stack traces for these errors.

github-actions bot added the triage (PRs from the community), group-coordinator, and small (Small PRs) labels on Nov 2, 2025
@squah-confluent
Contributor Author

squah-confluent commented Nov 2, 2025

I've been investigating some errors in group coordinator loading/unloading, and the lack of stack traces is getting in the way. For example:

[GroupCoordinator id=2] Failed to unload metadata for __consumer_offsets-4 with epoch OptionalInt[5] due to java.lang.IllegalStateException.
...
[GroupCoordinator id=2] Failed to load metadata from __consumer_offsets-4 with epoch 10 due to java.lang.RuntimeException: Replaying record CoordinatorRecord(key=ConsumerGroupCurrentMemberAssignmentKey(groupId='...', memberId='ZxHk7W53S_aHFdpxYc-_Jw'), value=ApiMessageAndVersion(ConsumerGroupCurrentMemberAssignmentValue(memberEpoch=854659, previousMemberEpoch=854633, state=0, assignedPartitions=[TopicPartitions(topicId=9lL1aTMuSC22QAXsHgzhew, partitions=[1, 2]), TopicPartitions(topicId=RHKM682KQYyOfF1XsOSF1A, partitions=[0]), TopicPartitions(topicId=rKx9q1JmS1uP-ug_cj56ug, partitions=[0]), TopicPartitions(topicId=I7EtFwesTRubnj-VHClqbQ, partitions=[2]), TopicPartitions(topicId=ydAln6IUTZe-od9UUkn3rg, partitions=[2])], partitionsPendingRevocation=[]) at version 0)) from __consumer_offsets-4 at offset 3889549 with producer id -1 and producer epoch -1 failed..

EDIT: The first error is due to https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-19857.
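
To illustrate why the log lines above are hard to act on, here is a minimal sketch of the difference between logging only `ex.toString()` and having the `Throwable` itself available. It uses only the JDK; `StackTraceDemo`, `replayRecord`, `capture`, and `fullTrace` are hypothetical names made up for this example and are not part of the Kafka code base.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceDemo {
    // Hypothetical stand-in for a step that fails while loading or unloading.
    static void replayRecord() {
        throw new IllegalStateException();
    }

    // Catches the failure and hands back the Throwable for inspection.
    static Throwable capture() {
        try {
            replayRecord();
            return null;
        } catch (Throwable ex) {
            return ex;
        }
    }

    // Renders the stack trace as text, including the frames that
    // ex.toString() leaves out.
    static String fullTrace(Throwable ex) {
        StringWriter out = new StringWriter();
        ex.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    public static void main(String[] args) {
        Throwable ex = capture();
        // Only the class name (plus the message, if there is one).
        System.out.println("toString only: " + ex);
        // Class name, message, and every frame down to the throw site.
        System.out.println("full trace:");
        System.out.print(fullTrace(ex));
    }
}
```

With only `toString()`, an exception that has no message collapses to its class name, which matches the shape of the first log line above; the frames that point to the throw site appear only when the `Throwable` itself is rendered.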

Member

chia7712 left a comment


@squah-confluent thanks for this patch

 } catch (Throwable ex) {
     log.error("Failed to load metadata from {} with epoch {} due to {}.",
-        tp, epoch, ex.toString());
+        tp, epoch, ex.toString(), ex);
Member


Because the whole exception ex is now logged, changing ex.toString() to ex.getMessage() would eliminate a lot of redundant information in the log.
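
As a rough illustration of this suggestion, the sketch below compares what the final `{}` placeholder would receive from `ex.toString()` and from `ex.getMessage()`. It uses only the JDK and plain string concatenation instead of a logger; `MessageVsToStringDemo`, `viaToString`, and `viaGetMessage` are hypothetical names for this example, and the shortened exception message is made up.

```java
class MessageVsToStringDemo {
    // Mimics the tail of the log line when ex.toString() fills the placeholder.
    static String viaToString(Throwable ex) {
        return "due to " + ex.toString() + ".";
    }

    // Mimics the tail of the log line when ex.getMessage() fills the placeholder.
    static String viaGetMessage(Throwable ex) {
        return "due to " + ex.getMessage() + ".";
    }

    public static void main(String[] args) {
        Throwable withMessage = new RuntimeException("Replaying record failed");
        Throwable withoutMessage = new IllegalStateException();

        // toString() is the class name, followed by ": message" when a message exists.
        System.out.println(viaToString(withMessage));
        System.out.println(viaToString(withoutMessage));

        // getMessage() is the message alone, or null when none was supplied.
        System.out.println(viaGetMessage(withMessage));
        System.out.println(viaGetMessage(withoutMessage));
    }
}
```

One trade-off to keep in mind: for an exception created without a message, `getMessage()` returns `null`, so the summary line itself would read "due to null." and the exception class would then be visible only in the stack trace that follows it.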

Contributor Author


Thanks for reviewing! I've updated the PR to use getMessage instead.

     // already make an effort to catch exceptions in the unload method.
     log.error("Failed to unload metadata for {} with epoch {} due to {}.",
-        tp, partitionEpoch, ex.toString());
+        tp, partitionEpoch, ex.toString(), ex);
Member


ditto

github-actions bot removed the triage (PRs from the community) label on Nov 4, 2025