HBASE-29272 When Spark reads an HBase snapshot, it always read empty … #6947

Open
wants to merge 4 commits into
base: master

Conversation

terrytlu
Contributor

Fix the issue where, since Spark 3.2.0, reading an HBase snapshot from Spark always returns empty results, even when the snapshot actually contains data.
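
To make the failure mode concrete, here is a minimal sketch. It assumes the empty result comes from every snapshot split reporting a length of 0 while the reader skips zero-length splits (Spark exposes this behavior through `spark.hadoopRDD.ignoreEmptySplits`, which I believe is enabled by default since 3.2.0). `SketchSplit` and `EmptySplitFilterSketch` are hypothetical stand-ins, not HBase or Spark classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an input split; not HBase's or Hadoop's real class. */
final class SketchSplit {
  final String regionName;
  final long length; // number of bytes the split claims to cover

  SketchSplit(String regionName, long length) {
    this.regionName = regionName;
    this.length = length;
  }
}

final class EmptySplitFilterSketch {
  /** Mimics a reader that drops zero-length splits when ignoreEmptySplits is true. */
  static List<SketchSplit> plan(List<SketchSplit> splits, boolean ignoreEmptySplits) {
    List<SketchSplit> kept = new ArrayList<>();
    for (SketchSplit s : splits) {
      if (!ignoreEmptySplits || s.length > 0) {
        kept.add(s);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Both regions hold data, but each split reports a length of 0.
    List<SketchSplit> zeroLength =
        Arrays.asList(new SketchSplit("r1", 0L), new SketchSplit("r2", 0L));
    System.out.println(plan(zeroLength, true).size());  // no split is scanned
    System.out.println(plan(zeroLength, false).size()); // both splits are scanned
  }
}
```

Under these assumptions, a split that reports its real size survives the filter, which is why the review below focuses on what length each split should carry.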

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch 3 times, most recently from a96d09e to 5e35c8d Compare April 29, 2025 13:17
@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from 5e35c8d to 79c6087 Compare May 7, 2025 08:13

// constructor for mapreduce framework / Writable
public InputSplit() {
}

public InputSplit(TableDescriptor htd, RegionInfo regionInfo, List<String> locations, Scan scan,
Path restoreDir) {
this(htd, regionInfo, locations, scan, restoreDir, 1);
Contributor

This doesn't seem quite right, because SnapshotStats.getStoreFilesSize() would return 0 if the table has no data.
What do you think?

Contributor Author

Thanks for reviewing 😃. It shouldn't always be 1 here; let me try to fix it.

Contributor

Do we still want to keep this constructor? The parent class is IA.Private, which means we are free to change anything.

Contributor Author

Let me try to remove it.

SnapshotStats(final Configuration conf, final FileSystem fs, final SnapshotManifest mainfest)
throws CorruptedSnapshotException {
this.snapshot = SnapshotDescriptionUtils.readSnapshotInfo(fs, mainfest.getSnapshotDir());
;
Contributor

Remove this?

Contributor Author

done

/**
* Utility class to calculate the size of each region in a snapshot.
*/
public class SnapshotRegionSizeCalculator {
Contributor

Please add an IA (InterfaceAudience) annotation to this class.

Contributor Author

added it.
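
For readers following the thread, the idea of a per-region size calculator can be sketched as below. This is an illustration only, assuming the split length is derived by summing the store file sizes of each region; `SketchStoreFile` and `RegionSizeSketch` are hypothetical names and do not reflect the actual `SnapshotRegionSizeCalculator` code in this PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical record of one store file in a snapshot; not an HBase type. */
final class SketchStoreFile {
  final String regionName;
  final long sizeBytes;

  SketchStoreFile(String regionName, long sizeBytes) {
    this.regionName = regionName;
    this.sizeBytes = sizeBytes;
  }
}

final class RegionSizeSketch {
  /** Sums store file sizes per region. */
  static Map<String, Long> regionSizes(List<SketchStoreFile> files) {
    Map<String, Long> sizes = new HashMap<>();
    for (SketchStoreFile f : files) {
      sizes.merge(f.regionName, f.sizeBytes, Long::sum);
    }
    return sizes;
  }

  /** Length to report for a region's split; 0 when the region has no store files. */
  static long splitLength(Map<String, Long> sizes, String regionName) {
    return sizes.getOrDefault(regionName, 0L);
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = regionSizes(Arrays.asList(
        new SketchStoreFile("r1", 100L),
        new SketchStoreFile("r1", 50L),
        new SketchStoreFile("r2", 7L)));
    System.out.println(splitLength(sizes, "r1"));    // sum of both r1 files
    System.out.println(splitLength(sizes, "empty")); // region without store files
  }
}
```

Reporting the real size rather than a constant means a region that truly has no data still yields a zero-length split, which matches the concern raised earlier about hard-coding 1.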

@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from 0b34b27 to e45f5fc Compare May 27, 2025 06:41
…data.

HBASE-29272 When Spark reads an HBase snapshot, it always read empty data.
@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from e45f5fc to 5f5ee39 Compare May 27, 2025 06:50

…data.

HBASE-29272 When Spark reads an HBase snapshot, it always read empty data.
@terrytlu terrytlu force-pushed the master-HBASE-29272 branch from a4bf605 to 9c3d569 Compare May 28, 2025 08:22
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 29s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 23s | | master passed |
| +1 💚 | compile | 3m 45s | | master passed |
| +1 💚 | checkstyle | 0m 49s | | master passed |
| +1 💚 | spotbugs | 2m 5s | | master passed |
| +1 💚 | spotless | 0m 51s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 6s | | the patch passed |
| +1 💚 | compile | 3m 44s | | the patch passed |
| +1 💚 | javac | 3m 44s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 47s | | the patch passed |
| +1 💚 | spotbugs | 2m 19s | | the patch passed |
| +1 💚 | hadoopcheck | 12m 14s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 45s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 18s | | The patch does not generate ASF License warnings. |
| | | 42m 42s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/9/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6947 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 1b6c803bc06b 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 9c3d569 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 85 (vs. ulimit of 30000) |
| modules | C: hbase-server hbase-mapreduce U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/9/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 26s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 14s | | master passed |
| +1 💚 | compile | 1m 17s | | master passed |
| +1 💚 | javadoc | 0m 41s | | master passed |
| +1 💚 | shadedjars | 6m 4s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 13s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 7s | | the patch passed |
| +1 💚 | compile | 1m 17s | | the patch passed |
| +1 💚 | javac | 1m 17s | | the patch passed |
| +1 💚 | javadoc | 0m 41s | | the patch passed |
| +1 💚 | shadedjars | 6m 3s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 210m 29s | | hbase-server in the patch passed. |
| +1 💚 | unit | 20m 27s | | hbase-mapreduce in the patch passed. |
| | | 258m 21s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/9/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6947 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 2b2a5b6ed9c5 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 9c3d569 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/9/testReport/ |
| Max. process+thread count | 4831 (vs. ulimit of 30000) |
| modules | C: hbase-server hbase-mapreduce U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6947/9/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@terrytlu
Contributor Author

terrytlu commented Jun 4, 2025

Hi @guluo2016 and @Apache9, could you help review this PR again? 🙏 Thanks.
