KAFKA-20312: Handle null leader during OffsetFetcher regroup safely by nileshkumar3 · Pull Request #21760 · apache/kafka

nileshkumar3 · 2026-03-15T03:40:21Z

Description:

This PR fixes a potential NullPointerException in OffsetFetcherUtils.regroupPartitionMapByNode, which can occur when partitions are regrouped by leader during offset reset or list-offsets handling.

Background

Partitions are grouped by leader via metadata.fetch().leaderFor(tp). If metadata changes between the initial leader lookup and the regroup step (e.g. after a leadership change or because of stale metadata), leaderFor(tp) can return null. The previous implementation used Collectors.groupingBy with leaderFor(...) as the classifier, and groupingBy throws an NPE when the classifier returns null.
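The failure mode can be reproduced with plain JDK collections. This is a minimal sketch, assuming a HashMap of partition name to leader name stands in for metadata.fetch().leaderFor(tp); the class and method names are hypothetical. A partition mapped to null makes Collectors.groupingBy throw, because it rejects null keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingByNullKey {
    // Returns true if groupingBy rejects a null classifier result with an NPE.
    static boolean groupingByThrowsOnNullKey() {
        // Hypothetical leader table: "topic-1" currently has no known leader.
        Map<String, String> leaders = new HashMap<>();
        leaders.put("topic-0", "broker-1");
        leaders.put("topic-1", null);
        try {
            Map<String, List<String>> grouped = List.of("topic-0", "topic-1").stream()
                    .collect(Collectors.groupingBy(leaders::get)); // classifier yields null for topic-1
            return grouped == null; // not reached for this input
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(groupingByThrowsOnNullKey()); // prints "true"
    }
}
```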

Fix

OffsetFetcherUtils.regroupPartitionMapByNode
Replaced the stream-based grouping with a loop that skips partitions whose leader is null and adds them to a caller-provided partitionsToRetry set. The helper does not trigger a metadata refresh; callers are responsible for retrying and for requesting metadata updates.
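The loop-based grouping can be sketched generically. This is an illustration under assumptions, not the patch itself: the hypothetical regroupByLeader takes a Function in place of metadata.fetch().leaderFor(tp), and the type parameters P and N stand in for TopicPartition and Node.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class RegroupSketch {
    // Groups partitionMap entries by leader. Partitions whose leader is null are
    // skipped and recorded in partitionsToRetry; no metadata refresh happens here.
    static <P, N, T> Map<N, Map<P, T>> regroupByLeader(
            Map<P, T> partitionMap,
            Function<P, N> leaderFor,
            Set<P> partitionsToRetry) {
        Map<N, Map<P, T>> byLeader = new HashMap<>();
        for (Map.Entry<P, T> entry : partitionMap.entrySet()) {
            N leader = leaderFor.apply(entry.getKey());
            if (leader == null) {
                partitionsToRetry.add(entry.getKey()); // caller decides how to retry
                continue;
            }
            byLeader.computeIfAbsent(leader, k -> new HashMap<>())
                    .put(entry.getKey(), entry.getValue());
        }
        return byLeader;
    }

    public static void main(String[] args) {
        Map<String, Long> timestamps = new LinkedHashMap<>();
        timestamps.put("t-0", -1L);
        timestamps.put("t-1", -1L);
        timestamps.put("t-2", -2L);
        Map<String, String> leaders = new HashMap<>();
        leaders.put("t-0", "node-1");
        leaders.put("t-2", "node-1"); // "t-1" has no entry, so its leader is null
        Set<String> retry = new HashSet<>();
        System.out.println(regroupByLeader(timestamps, leaders::get, retry));
        System.out.println(retry);
    }
}
```

Keeping the retry set as an out-parameter leaves the helper free of backoff and metadata policy, which matches the split of responsibilities described above.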

Callers

OffsetFetcher (classic consumer): passes partitionsToRetry into the helper; in resetPositionsAsync, when the set is non-empty, calls setNextAllowedRetry(partitionsToRetry, now + retryBackoffMs) and metadata.requestUpdate(false).
OffsetsRequestManager (new consumer): passes a local retry set into the helper, then adds skipped partitions to state.remainingToSearch (with timestamp) and calls metadata.requestUpdate(false) when the set is non-empty.
This keeps existing retry semantics and avoids the NPE.
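The classic-consumer caller step can be pictured as follows. This is a minimal sketch with hypothetical stand-ins: FakeMetadata only counts requestUpdate calls (the meaning of the boolean in the real Metadata class is not modeled), and ResetState is an illustrative per-partition backoff table, not Kafka's SubscriptionState.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class RetryHandlingSketch {
    // Hypothetical metadata stand-in: records how many updates were requested.
    static final class FakeMetadata {
        int updateRequests = 0;
        void requestUpdate(boolean flag) { updateRequests++; } // flag is not modeled
    }

    // Hypothetical backoff bookkeeping keyed by partition name.
    static final class ResetState {
        final Map<String, Long> nextAllowedRetryMs = new HashMap<>();
        void setNextAllowedRetry(Set<String> partitions, long atMs) {
            for (String tp : partitions) nextAllowedRetryMs.put(tp, atMs);
        }
        boolean canRetry(String tp, long nowMs) {
            return nowMs >= nextAllowedRetryMs.getOrDefault(tp, 0L);
        }
    }

    // After regrouping: back off skipped partitions and ask for fresh metadata.
    static void onRegroupDone(Set<String> partitionsToRetry, long nowMs, long retryBackoffMs,
                              ResetState state, FakeMetadata metadata) {
        if (partitionsToRetry.isEmpty()) return; // nothing skipped: no backoff, no refresh
        state.setNextAllowedRetry(partitionsToRetry, nowMs + retryBackoffMs);
        metadata.requestUpdate(false);
    }

    public static void main(String[] args) {
        ResetState state = new ResetState();
        FakeMetadata metadata = new FakeMetadata();
        onRegroupDone(Set.of("t-1"), 1_000L, 100L, state, metadata);
        System.out.println(state.canRetry("t-1", 1_050L)); // false: still backing off
        System.out.println(state.canRetry("t-1", 1_100L)); // true
        System.out.println(metadata.updateRequests);       // 1
    }
}
```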

Tests

OffsetFetcherTest.testResetPositionsMetadataRefreshWhenLeaderBecomesUnknownDuringRegroup
Simulates leaderFor(tp) returning null during regroup: the first metadata.fetch() call is stubbed to return a cluster without the partition, and later calls use the real method. Asserts that no exception is thrown, that the partition stays pending reset, and that after the backoff a second attempt with valid metadata resets the offset successfully.
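The "stub the first fetch, then use the real method" pattern can be mimicked without a mocking library. This is a hypothetical sketch: firstThenReal and attemptReset are illustrative names, and a Map of partition to leader stands in for the Cluster returned by metadata.fetch().

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class FirstFetchStubSketch {
    // Returns `first` on the first call and delegates to `real` on every later call.
    static <T> Supplier<T> firstThenReal(T first, Supplier<T> real) {
        AtomicBoolean stubUsed = new AtomicBoolean(false);
        return () -> stubUsed.compareAndSet(false, true) ? first : real.get();
    }

    // Hypothetical single attempt: the leader if known, or null if the partition must wait.
    static String attemptReset(Supplier<Map<String, String>> fetch, String tp) {
        return fetch.get().get(tp);
    }

    public static void main(String[] args) {
        Supplier<Map<String, String>> fetch =
                firstThenReal(Map.of(), () -> Map.of("t-0", "node-1"));
        System.out.println(attemptReset(fetch, "t-0")); // null: leader unknown, stays pending
        System.out.println(attemptReset(fetch, "t-0")); // node-1
    }
}
```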

OffsetsRequestManagerTest.testFetchOffsetsRegroupSkipsNullLeaderPartition_NoNPE
Simulates the same scenario in the fetch-offsets path: currentLeader reports a leader, but metadata.fetch() returns a cluster where one partition has no leader. Asserts that no NPE is thrown, that only one request is sent (for the partition with a leader), and that the skipped partition is retried after the metadata update and completes successfully.
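The new-consumer flow exercised by this test can be approximated as below. This is a sketch under assumptions: RemainingToSearchSketch, buildRequests, and metadataUpdateRequests are hypothetical, and the remainingToSearch map only mirrors the idea of keeping each skipped partition with its timestamp; it is not the OffsetsRequestManager code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class RemainingToSearchSketch {
    // Partitions still needing a lookup, each with the timestamp to search for.
    final Map<String, Long> remainingToSearch = new HashMap<>();
    int metadataUpdateRequests = 0;

    // One request per leader; leaderless partitions go back to remainingToSearch.
    Map<String, Map<String, Long>> buildRequests(Map<String, Long> timestamps,
                                                 Function<String, String> leaderFor) {
        Map<String, Map<String, Long>> requests = new HashMap<>();
        boolean skippedAny = false;
        for (Map.Entry<String, Long> e : timestamps.entrySet()) {
            String leader = leaderFor.apply(e.getKey());
            if (leader == null) {
                remainingToSearch.put(e.getKey(), e.getValue());
                skippedAny = true;
            } else {
                requests.computeIfAbsent(leader, k -> new HashMap<>()).put(e.getKey(), e.getValue());
            }
        }
        if (skippedAny) metadataUpdateRequests++; // stands in for metadata.requestUpdate(false)
        return requests;
    }

    public static void main(String[] args) {
        RemainingToSearchSketch state = new RemainingToSearchSketch();
        Map<String, Long> ts = Map.of("t-0", 10L, "t-1", 20L);
        Map<String, String> staleLeaders = Map.of("t-0", "node-1");
        System.out.println(state.buildRequests(ts, staleLeaders::get).size()); // 1
        System.out.println(state.remainingToSearch);                          // {t-1=20}
        // After a (simulated) metadata update, retry only the skipped partition.
        Map<String, Long> retry = new HashMap<>(state.remainingToSearch);
        state.remainingToSearch.clear();
        Map<String, String> freshLeaders = Map.of("t-0", "node-1", "t-1", "node-2");
        System.out.println(state.buildRequests(retry, freshLeaders::get));    // {node-2={t-1=20}}
    }
}
```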

Commit e1d73ea: KAFKA-20312: Avoid NPE in OffsetFetcherUtils when leader becomes unknown during regroup

github-actions bot added the triage, consumer, and clients labels on Mar 15, 2026.

nileshkumar3 wants to merge 1 commit into apache:trunk from nileshkumar3:KAFKA-20312-fix-offsetfetcher-null-leader-regroup