
HBASE-29356 Incorrect split behavior when region information is missing #7035


Open
wants to merge 1 commit into
base: master
from

Conversation

qqvpp

@qqvpp qqvpp commented May 26, 2025

Description
We have identified a bug in the SimpleRegionNormalizer logic that leads to incorrect region splits when region size information is missing. If the size cannot be determined for one or more regions (e.g. due to unavailable metrics from RegionServers), the average region size calculation becomes incorrect. This results in a scenario where all regions may be considered too large and get split unintentionally.

Observed Behavior:

When region size data is not available (e.g., getRegionSizeMB() returns -1), the average calculation does not exclude the affected regions, so the computed average comes out too low. Regions with a valid size then appear excessively large compared to the average, which results in multiple unnecessary splits.
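To make the skew concrete, here is a minimal sketch of the arithmetic. It is not the code in SimpleRegionNormalizer: the class and method names are hypothetical, unknown sizes are assumed to contribute nothing to the total, and "too large" is assumed to mean more than twice the average.

```java
// Illustration only: hypothetical names, not HBase code.
final class AverageSkewDemo {
  static final long UNKNOWN = -1L; // sentinel for "size not reported"

  /** Naive average: unknown regions add nothing to the total but still count in the divisor. */
  static double naiveAverage(long[] sizesMb) {
    long total = 0;
    for (long size : sizesMb) {
      if (size != UNKNOWN) {
        total += size;
      }
    }
    return total / (double) sizesMb.length;
  }

  /** Assumed split rule: a region is a split candidate above twice the average. */
  static boolean looksTooLarge(long sizeMb, double avgMb) {
    return sizeMb > 2 * avgMb;
  }
}
```

With two 100 MB regions and six regions of unknown size, `naiveAverage` returns 25.0, so both known regions exceed twice the average and would be flagged even though they are the same size as each other.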

Expected Behavior:

If region size is unknown for some regions, those regions should be skipped during normalization. The average region size should be computed only from the regions for which the size is known. No region should be split or merged unless its size is known.
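The expected behavior can be sketched as follows. This is an assumption-laden illustration rather than the patch itself: `KnownSizeNormalizerSketch`, `averageOfKnown`, and `splitCandidates` are hypothetical names, sizes are plain `long` values with -1 meaning unknown, and the split rule is assumed to be "more than twice the average".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

// Illustration only: hypothetical names, not HBase code.
final class KnownSizeNormalizerSketch {
  static final long UNKNOWN = -1L; // sentinel for "size not reported"

  /** Average over regions with a known size; empty when no size is known at all. */
  static OptionalDouble averageOfKnown(long[] sizesMb) {
    long total = 0;
    int known = 0;
    for (long size : sizesMb) {
      if (size != UNKNOWN) {
        total += size;
        known++;
      }
    }
    return known == 0 ? OptionalDouble.empty() : OptionalDouble.of(total / (double) known);
  }

  /** Indexes of regions to split; regions with unknown size are never candidates. */
  static List<Integer> splitCandidates(long[] sizesMb) {
    List<Integer> candidates = new ArrayList<>();
    OptionalDouble avg = averageOfKnown(sizesMb);
    if (!avg.isPresent()) {
      return candidates; // nothing is known, so nothing is planned
    }
    for (int i = 0; i < sizesMb.length; i++) {
      if (sizesMb[i] == UNKNOWN) {
        continue; // skip regions whose size is unknown
      }
      if (sizesMb[i] > 2 * avg.getAsDouble()) {
        candidates.add(i);
      }
    }
    return candidates;
  }
}
```

Returning an empty result when no size is known is one possible design; the review discussion later in this thread prefers throwing in that case.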

Patch:

  • Excludes regions with unknown size from the average size computation.
  • Prevents split and merge operations on regions with unknown size.
  • Adds unit tests for scenarios with partial or total absence of size data.

Patch author: Milan Vymazal

Tests:

  • testSplitOfLargeRegionIfOneIsNotKnow verifies correct behavior when one region has unknown size.
  • testSplitOfAllUnknownSize ensures that no split happens if size data is missing for all regions.

Reproduction:

Unfortunately, we are unable to reliably reproduce this bug in a live environment, since we cannot easily simulate the condition where RegionServer metrics are missing. However, we have confirmed the behavior through code analysis and the added unit tests.
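One way to simulate missing metrics without a cluster is to put a stub in front of the size lookup. The sketch below is not how the PR's tests are written; it uses a plain map and a hypothetical `MissingMetricsStub` class, with region names as strings and -1 assumed as the "unknown" sentinel.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Illustration only: hypothetical stand-in for a metrics source.
final class MissingMetricsStub {
  static final long UNKNOWN = -1L; // sentinel for "size not reported"

  /** Builds a lookup that reports UNKNOWN for any region absent from the map. */
  static ToLongFunction<String> sizeLookup(Map<String, Long> reportedSizesMb) {
    Map<String, Long> copy = new HashMap<>(reportedSizesMb);
    return region -> copy.getOrDefault(region, UNKNOWN);
  }

  /** Counts how many of the given regions have a known size under the lookup. */
  static int countKnown(Iterable<String> regions, ToLongFunction<String> lookup) {
    int known = 0;
    for (String region : regions) {
      if (lookup.applyAsLong(region) != UNKNOWN) {
        known++;
      }
    }
    return known;
  }
}
```

Leaving a region out of the map models a RegionServer that has not reported metrics for it, so both the partial and the total absence of size data can be covered by varying the map contents.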

@Apache9 Apache9 requested review from Copilot and ndimiduk May 26, 2025 14:03

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes the incorrect region split behavior when region size information is missing by skipping regions with unknown size during average size computation and preventing operations on them.

  • Updated test mocks and added unit tests to verify that regions with unknown size are skipped.
  • Modified the region size averaging logic in SimpleRegionNormalizer to account only for regions with known sizes.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| ---- | ----------- |
| hbase-server/src/test/java/org/apache/hadoop/hbase/master/normalizer/TestSimpleRegionNormalizer.java | Updated test mocks and added new test methods to handle regions with unknown sizes |
| hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java | Revised average region size calculation and added checks to skip regions with an unknown size |

Comments suppressed due to low confidence (1)

hbase-server/src/test/java/org/apache/hadoop/hbase/master/normalizer/TestSimpleRegionNormalizer.java:778

  • [nitpick] The test method name 'testSplitOfLargeRegionIfOneIsNotKnow' is unclear; it likely should be 'testSplitOfLargeRegionIfOneIsNotKnown' for better readability.
@Test public void testSplitOfLargeRegionIfOneIsNotKnow() {

@qqvpp qqvpp force-pushed the HBASE-29356-master branch from 467bd83 to 09e52d7 Compare May 27, 2025 04:28

@Apache9
Contributor

Apache9 commented Jun 1, 2025

Ping @ndimiduk.

Please take a look at this one?

Seems reasonable.

@ndimiduk
Member

ndimiduk commented Jun 2, 2025

Heya @qqvpp thanks for the contribution. Can you please create for yourself a Jira account? We use Jira for project tracking.

https://selfserve.apache.org/

  if (targetRegionCount > 0) {
-   avgRegionSize = totalSizeMb / (double) targetRegionCount;
+   avgRegionSize =
+     totalSizeMb / (double) targetRegionCount - (regionCount - regionCountKnownSize);
Member

If you're going to adjust the denominator here, I think that you also need to protect against a value <= 0. In that case, you can throw, like we do on entry into the method.
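Read literally, the added line quoted above divides by targetRegionCount first and then subtracts the number of unknown regions, because division binds tighter than subtraction in Java. The sketch below contrasts that with adjusting the denominator and guarding it, as the comment suggests. The class and method names are hypothetical, and the choice of IllegalStateException is an assumption, since the thread does not show what the method throws on entry.

```java
// Illustration only: hypothetical names, not the merged patch.
final class DenominatorSketch {
  /** The expression exactly as quoted: (total / target) minus the unknown count. */
  static double asQuoted(long totalSizeMb, int targetRegionCount, int regionCount,
      int regionCountKnownSize) {
    return totalSizeMb / (double) targetRegionCount - (regionCount - regionCountKnownSize);
  }

  /** Subtracts the unknown count from the denominator and rejects values <= 0. */
  static double adjustedDenominator(long totalSizeMb, int targetRegionCount, int regionCount,
      int regionCountKnownSize) {
    int denominator = targetRegionCount - (regionCount - regionCountKnownSize);
    if (denominator <= 0) {
      // Assumed exception type; the real method may throw something else.
      throw new IllegalStateException("non-positive region count for average: " + denominator);
    }
    return totalSizeMb / (double) denominator;
  }
}
```

For a total of 300 MB, a target of 4 regions, and 3 of 4 regions with known size, `asQuoted` yields 74.0 while `adjustedDenominator` yields 100.0.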

Author

I added a condition to verify the denominator and a test.

  } else {
-   avgRegionSize = totalSizeMb / (double) regionCount;
+   avgRegionSize = totalSizeMb / (double) regionCountKnownSize;
Member

I think that you also need to protect against a 0 value here, in the same way.

Author

I added the condition, including the test modification.

@qqvpp qqvpp force-pushed the HBASE-29356-master branch from 09e52d7 to b1dd462 Compare June 3, 2025 11:32
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| ---- | --------- | ------- | ------- | ------- |
| +0 🆗 | reexec | 0m 43s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 5m 6s | | master passed |
| +1 💚 | compile | 5m 1s | | master passed |
| +1 💚 | checkstyle | 0m 48s | | master passed |
| +1 💚 | spotbugs | 2m 20s | | master passed |
| +1 💚 | spotless | 1m 6s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 31s | | the patch passed |
| +1 💚 | compile | 4m 21s | | the patch passed |
| +1 💚 | javac | 4m 21s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 0s | | the patch passed |
| +1 💚 | spotbugs | 2m 30s | | the patch passed |
| +1 💚 | hadoopcheck | 15m 51s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 1m 7s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 15s | | The patch does not generate ASF License warnings. |
| | | 55m 7s | | |

| Subsystem | Report/Notes |
| --------- | ------------ |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7035/3/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #7035 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 8d206b3bdead 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / b1dd462 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7035/3/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| ---- | --------- | ------- | ------- | ------- |
| +0 🆗 | reexec | 0m 28s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 17s | | master passed |
| +1 💚 | compile | 0m 59s | | master passed |
| +1 💚 | javadoc | 0m 28s | | master passed |
| +1 💚 | shadedjars | 6m 8s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 12s | | the patch passed |
| +1 💚 | compile | 0m 58s | | the patch passed |
| +1 💚 | javac | 0m 58s | | the patch passed |
| +1 💚 | javadoc | 0m 27s | | the patch passed |
| +1 💚 | shadedjars | 6m 2s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 211m 11s | | hbase-server in the patch passed. |
| | | 238m 1s | | |

| Subsystem | Report/Notes |
| --------- | ------------ |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7035/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #7035 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux b462523c9ea5 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / b1dd462 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7035/3/testReport/ |
| Max. process+thread count | 5374 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7035/3/console |
| versions | git=2.34.1 maven=3.9.8 |

This message was automatically generated.


4 participants