HBASE-29255: Integrate backup WAL cleanup logic with the delete command #7007
base: HBASE-28957
Conversation
Force-pushed from c033241 to bdc0de5
This looks good to me overall, aside from one nit comment.

One more thing: do you think a lot of these System.err/out.println() statements could be replaced with LOG.info/error()? I know we want to give some feedback to the user via the terminal, but it seems like many of these messages should go to the log (like the messages in BackupCommands.updateBackupTableStartTimes(), BackupCommands.deleteOldWALFiles(), etc.).
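For illustration, a minimal sketch of the replacement being suggested, using the SLF4J logger HBase already depends on (the logger field is illustrative; the message text is taken from the diff further down):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

private static final Logger LOG = LoggerFactory.getLogger(BackupCommands.class);

// Before: System.out.println("Deleting outdated WAL directory: " + dirPath);
// After: parameterized logging, so the string is only built when INFO is enabled
LOG.info("Deleting outdated WAL directory: {}", dirPath);
```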
Configuration conf = getConf() != null ? getConf() : HBaseConfiguration.create();
String backupWalDir = conf.get(CONF_CONTINUOUS_BACKUP_WAL_DIR);

if (backupWalDir == null || backupWalDir.isEmpty()) {
nit - You can use Strings.isNullOrEmpty() from org.apache.hbase.thirdparty.com.google.common.base:

- if (backupWalDir == null || backupWalDir.isEmpty()) {
+ if (Strings.isNullOrEmpty(backupWalDir)) {
Good point. We have a lot of println lines throughout the backup and restore code; let me create a new Jira to address this.
  return;
}

try (Connection conn = ConnectionFactory.createConnection(conf);
Nit: avoid a generic name like conn; use something more specific, like masterConn.
  return;
}

try (Connection conn = ConnectionFactory.createConnection(conf);
I feel this connection creation is unnecessary. The superclass already has a connection open; please verify whether you can reuse it.
True, we'll reuse that!
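A minimal sketch of that reuse, assuming the superclass exposes its open connection as a field named conn (the field name is an assumption here):

```java
// Instead of opening a second connection via ConnectionFactory.createConnection(conf),
// hand the superclass's existing connection to the backup system table helper.
try (BackupSystemTable sysTable = new BackupSystemTable(conn)) {
  // ... WAL cleanup logic ...
}
```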
// If WAL files of that day are older than cutoff time, delete them
if (dayStart + ONE_DAY_IN_MILLISECONDS - 1 < cutoffTime) {
  System.out.println("Deleting outdated WAL directory: " + dirPath);
  fs.delete(dirPath, true);
If there is an API to delete in batches, we should use it. Also, depending on the number of files you are deleting, this method can take a lot of time; maybe we can make it asynchronous here. Please give it a thought.
> If there is an api to delete in batches, we should use it.

Yeah, I checked but couldn't find any API that supports batch deletion.

> Also based on the nos of the file you are deleting this method can take lot of time. May be we can asynchronous here. Please give a thought

About going async: it's a good idea, but it might add some complexity. We'd need to track whether the delete actually finished, retry on failure, and maybe notify the user when it's done. So we should weigh whether the added complexity is worth the gain. Also, right now all our backup and restore commands (full backup, incremental, restore) are synchronous anyway, and those can take hours.

I think async is definitely a good direction; it just probably makes sense to build a proper framework around it first, so we can handle retries, tracking, and notifications across the board. What do you think?
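For context, a rough sketch of the shape such an async deletion could take with a plain bounded executor (walFs and dayDirs are hypothetical names for the backup filesystem and the list of outdated day directories); it also shows exactly the completion tracking that would still need a real policy:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.Path;

// Submit one recursive delete per day directory, then wait for all of them.
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<Boolean>> pending = new ArrayList<>();
for (Path dirPath : dayDirs) {
  pending.add(pool.submit(() -> walFs.delete(dirPath, true)));
}
for (int i = 0; i < pending.size(); i++) {
  // A real implementation would retry and surface failures to the user.
  if (!pending.get(i).get()) {
    System.err.println("Failed to delete " + dayDirs.get(i));
  }
}
pool.shutdown();
```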
Let's build a job coordinator framework with ZooKeeper. We should build that outside the scope of this ticket, of course.
Sure, let me create a Jira for that.
Good point, guys, but before going down this rabbit hole, please do some performance tests for justification. Try deleting 100, 10,000, and 1 million files in a single directory and share how much time it takes synchronously. Delete/unlink operations should be relatively quick in any filesystem, but let's see how it works with S3.
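A minimal sketch of that measurement, assuming a FileSystem handle fs and a test directory pre-populated with the target number of files (the path is hypothetical):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Time one recursive delete of a directory holding N files.
Path testDir = new Path("/backup/wal-perf-test");
long start = System.nanoTime();
boolean deleted = fs.delete(testDir, true);
long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
System.out.println("deleted=" + deleted + " in " + elapsedMs + " ms");
```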
One discussion point. One change request.
 */
public void updateContinuousBackupTableSet(Set<TableName> tablesToUpdate, long newStartTimestamp)
  throws IOException {
  try (Table table = connection.getTable(tableName)) {
Nit: add a null check for tablesToUpdate.
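One possible form of that guard, purely for illustration:

```java
if (tablesToUpdate == null || tablesToUpdate.isEmpty()) {
  return; // nothing to update
}
```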
One more minor comment, but overall LGTM.
🎊 +1 overall
This message was automatically generated.

🎊 +1 overall
This message was automatically generated.
Thanks @vinayakphegde, the patch looks good to me. However, I have the same criticism that I mentioned previously: unit tests are missing.

Since all of your helper methods are private, you cannot test them individually, so you need to set up an entire starship in your test case, call the command, and verify the output. This is end-to-end testing: you get a yes/no answer to the question of whether the function works. If the answer is yes, we're fine, but if it's no, you'll have no idea where the problem is and you'll have to debug.

Unit testing individual methods gives more detail about what's working and what's not.
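As an illustration of the kind of focused test being asked for, a sketch that exercises just the day-cutoff predicate from the diff above, assuming that check were extracted into a package-private helper (the helper and test names are hypothetical):

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class TestWalCutoffPredicate {
  private static final long ONE_DAY_IN_MILLISECONDS = 24L * 60 * 60 * 1000;

  // Hypothetical extraction of the check used when deleting old WAL directories.
  static boolean dayIsFullyBeforeCutoff(long dayStart, long cutoffTime) {
    return dayStart + ONE_DAY_IN_MILLISECONDS - 1 < cutoffTime;
  }

  @Test
  public void testCutoffBoundary() {
    long dayStart = 0L;
    // Cutoff exactly at the next day's start: the whole day is older, so delete.
    assertTrue(dayIsFullyBeforeCutoff(dayStart, ONE_DAY_IN_MILLISECONDS));
    // Cutoff still inside the day: the day is not fully older, so keep it.
    assertFalse(dayIsFullyBeforeCutoff(dayStart, ONE_DAY_IN_MILLISECONDS - 1));
  }
}
```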
/**
 * Updates the start time for continuous backups if older than cutoff timestamp.
 * @param sysTable        Backup system table
 * @param cutoffTimestamp Timestamp before which WALs are no longer needed
 */
private void updateBackupTableStartTimes(BackupSystemTable sysTable, long cutoffTimestamp)
Hey @vinayakphegde, this is the function that led me to ask for clarification on why we need to update the start times of the continuous backups. Maybe you could add another line or two to the docstring here elaborating on why we need to do this? That may make it clearer to others in the future.
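One possible elaboration, purely as a sketch; the rationale stated here is inferred from this PR's cleanup flow, not confirmed by the author:

```java
/**
 * Updates the start time for continuous backups if older than cutoff timestamp.
 * <p>
 * Once WAL files older than the cutoff have been deleted, a recorded start
 * time before that cutoff would point at WALs that no longer exist. Advancing
 * the start time keeps the backup metadata consistent with the WALs that are
 * actually still available for replay.
 * @param sysTable        Backup system table
 * @param cutoffTimestamp Timestamp before which WALs are no longer needed
 */
```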