Skip to content

Conversation

@yzeng1618
Copy link
Contributor

Purpose of this pull request

image Fix a potential hang after Flink restart/restore when using FakeSource: once split discovery is finished, some readers may receive no splits (e.g. all splits were already assigned before restore). Previously, FakeSourceSplitEnumerator only called signalNoMoreSplits when it actually assigned splits, which could leave readers waiting forever.

Does this PR introduce any user-facing change?

Yes (bug fix). Previously jobs could hang after restart/restore; now readers can finish correctly.

How was this patch tested?

Added unit tests: FakeSourceSplitEnumeratorTest.java

Check list

@dybyte
Copy link
Contributor

dybyte commented Jan 5, 2026

This seems duplicate with #10208 .

@yzeng1618
Copy link
Contributor Author

yzeng1618 commented Jan 5, 2026

This seems duplicate with #10208 .

Thanks for the heads-up! I checked #10208.

#10208 adds a framework-level re-signal: if a reader re-registers after failover and NoMoreSplitsEvent was already signaled before, the engine/flink translation will re-signal it to that reader.

This PR (#10275) fixes a different issue in FakeSourceSplitEnumerator: it previously called signalNoMoreSplits() only when it actually assigned splits. After restore (or late registration), a reader may receive zero splits, so it never gets the end-of-input signal and can wait forever (see the added unit test signalNoMoreSplitsAfterRestoreWhenNoPendingSplits).

So they’re complementary: #10275 ensures FakeSource always produces signalNoMoreSplits for all registered readers after discovery (even when no split is assigned), while #10208 ensures previously signaled NoMoreSplits can be re-delivered after failover.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants