@yzeng1618
Contributor

Fixes #10286

Purpose of this pull request

Fix a correctness issue in the connector-hbase source: when multiple splits (regions) are assigned to a reader, HbaseSourceReader could effectively scan only the first split, because the ResultScanner exhausted by that split was reused for the subsequent splits.
This PR creates a new ResultScanner for each split, closes it once that split has been scanned, and adds a unit test to prevent regression.
A @VisibleForTesting constructor is added only to allow injecting a mocked HbaseClient in tests; the production behavior is unchanged.
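
To illustrate the failure mode and the fix described above, here is a minimal, dependency-free sketch. It does not use the HBase or SeaTunnel APIs: RowScanner, SplitClient, inMemoryClient and both read methods are hypothetical stand-ins invented for this example, and the actual HbaseSourceReader code may be structured differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PerSplitScanSketch {

    /** Hypothetical stand-in for a scanner: a one-shot, closeable cursor over rows. */
    interface RowScanner extends AutoCloseable, Iterator<String> {
        @Override
        void close();
    }

    /** Hypothetical stand-in for the client: opens a fresh scanner for one split. */
    interface SplitClient {
        RowScanner open(String splitId);
    }

    /** Buggy shape: the scanner is opened once and reused, so later splits find it exhausted. */
    static List<String> readReusingScanner(SplitClient client, List<String> splits) {
        List<String> rows = new ArrayList<>();
        RowScanner scanner = null;
        for (String split : splits) {
            if (scanner == null) {
                scanner = client.open(split); // only the first split ever gets a scanner
            }
            while (scanner.hasNext()) {
                rows.add(scanner.next());
            }
        }
        if (scanner != null) {
            scanner.close();
        }
        return rows;
    }

    /** Fixed shape: a new scanner per split, closed as soon as that split is done. */
    static List<String> readPerSplit(SplitClient client, List<String> splits) {
        List<String> rows = new ArrayList<>();
        for (String split : splits) {
            try (RowScanner scanner = client.open(split)) {
                while (scanner.hasNext()) {
                    rows.add(scanner.next());
                }
            }
        }
        return rows;
    }

    /** In-memory client used only for this illustration. */
    static SplitClient inMemoryClient(Map<String, List<String>> data) {
        return splitId -> {
            Iterator<String> it = data.get(splitId).iterator();
            return new RowScanner() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public String next() { return it.next(); }
                @Override public void close() { /* nothing to release in memory */ }
            };
        };
    }

    public static void main(String[] args) {
        SplitClient client = inMemoryClient(Map.of(
                "region-1", List.of("r1", "r2"),
                "region-2", List.of("r3", "r4", "r5")));
        List<String> splits = List.of("region-1", "region-2");
        System.out.println(readReusingScanner(client, splits).size()); // prints 2
        System.out.println(readPerSplit(client, splits).size());       // prints 5
    }
}
```

In this sketch the reused scanner silently yields nothing for the second split, which mirrors the "job succeeds but reads too few rows" symptom; try-with-resources is one way to guarantee the per-split scanner is closed even if reading throws.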

Does this PR introduce any user-facing change?

Yes. The HBase source now reads rows from all assigned splits/regions, so the read count covers every one of them. Previously, the job could finish successfully while reading only a fraction of the rows.

How was this patch tested?

Added unit test: HbaseSourceReaderTest
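
The body of HbaseSourceReaderTest is not shown in this conversation. As a rough idea of what a regression check for this kind of bug can assert, the sketch below uses a hand-written counting fake instead of a mocking library. CountingSource, CountingScanner and scanAll are hypothetical names made up for this example and are not taken from the PR.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ScannerLifecycleCheck {

    /** Hypothetical fake: records how many scanners were opened and closed. */
    static final class CountingSource {
        final AtomicInteger opened = new AtomicInteger();
        final AtomicInteger closed = new AtomicInteger();

        /** Returns a fresh one-shot scanner over the given rows and counts the open. */
        CountingScanner open(List<String> rows) {
            opened.incrementAndGet();
            return new CountingScanner(rows.iterator(), closed);
        }
    }

    /** Hypothetical scanner that reports its close() to the shared counter. */
    static final class CountingScanner implements AutoCloseable {
        private final Iterator<String> rows;
        private final AtomicInteger closed;

        CountingScanner(Iterator<String> rows, AtomicInteger closed) {
            this.rows = rows;
            this.closed = closed;
        }

        boolean hasNext() { return rows.hasNext(); }
        String next() { return rows.next(); }

        @Override
        public void close() { closed.incrementAndGet(); }
    }

    /** Logic under test in this sketch: one scanner per split, closed after use. */
    static int scanAll(CountingSource source, List<List<String>> splits) {
        int count = 0;
        for (List<String> split : splits) {
            try (CountingScanner scanner = source.open(split)) {
                while (scanner.hasNext()) {
                    scanner.next();
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource();
        // Three splits, one of them empty, holding three rows in total.
        int rows = scanAll(source,
                List.of(List.of("a", "b"), List.of("c"), List.<String>of()));

        if (rows != 3) throw new AssertionError("expected rows from every split");
        if (source.opened.get() != 3) throw new AssertionError("expected one scanner per split");
        if (source.closed.get() != 3) throw new AssertionError("expected every scanner to be closed");
        System.out.println("ok");
    }
}
```

Asserting on the number of scanners opened and closed, in addition to the row count, catches both the original symptom (rows missing from later splits) and a leak where per-split scanners are never released.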

@yzeng1618
Contributor Author

Test results after the fix
[Screenshot: WeCom screenshot 17676849392798]

Contributor

@chl-wxp left a comment

LGTM

Contributor

@corgy-w left a comment

+1

@corgy-w merged commit d393d2a into apache:dev on Jan 7, 2026
3 checks passed
Development

Successfully merging this pull request may close these issues.

[Bug][Connector-V2][Hbase] HBase source only scans the first split/region, read count << HBase count

4 participants