
HADOOP-19474: [ABFS][FnsOverBlob] Listing Optimizations to avoid multiple iteration over list response. #7421

Open
wants to merge 11 commits into
base: trunk
from


anujmodi2021 (Contributor) commented Feb 21, 2025

Description of PR

On the Blob endpoint, a few things have to be handled on the client side:

  1. Parsing the XML response and converting the entries into a list of VersionedFileStatus objects.
  2. Removing duplicate entries for non-empty explicit directories, which appear because of the marker files.
  3. Triggering rename recovery for a previously failed rename, indicated by the presence of a pending JSON file.

Currently, each of the three is done in a separate iteration over the whole list. This PR brings them into one place so that a single iteration over the list response handles all three.
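The three steps above can be folded into one loop. This is a minimal sketch, not the PR's code: `ListEntry` and `Status` are hypothetical stand-ins for `BlobListResultEntrySchema` and `VersionedFileStatus`, and it assumes that a directory reported twice (marker blob plus prefix) carries the same path, that rename-pending files end with `-RenamePending.json`, and that they are collected for recovery rather than returned to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: convert, deduplicate and spot rename-pending files in one pass. */
final class SinglePassListing {

  /** Hypothetical stand-in for one parsed entry of the Blob list response. */
  static final class ListEntry {
    final String path;
    final boolean isDirectory;

    ListEntry(String path, boolean isDirectory) {
      this.path = path;
      this.isDirectory = isDirectory;
    }
  }

  /** Hypothetical stand-in for VersionedFileStatus. */
  static final class Status {
    final String path;
    final boolean isDirectory;

    Status(String path, boolean isDirectory) {
      this.path = path;
      this.isDirectory = isDirectory;
    }
  }

  /** Output of the single pass. */
  static final class Result {
    final List<Status> statuses = new ArrayList<>();
    final List<String> renamePendingFiles = new ArrayList<>();
  }

  // Assumed suffix of the rename-pending JSON file.
  static final String RENAME_PENDING_SUFFIX = "-RenamePending.json";

  static Result process(List<ListEntry> entries) {
    Result result = new Result();
    // Keyed by path so a directory listed twice is kept once;
    // LinkedHashMap preserves the order in which paths were first seen.
    Map<String, Status> byPath = new LinkedHashMap<>();
    for (ListEntry entry : entries) {
      if (entry.path.endsWith(RENAME_PENDING_SUFFIX)) {
        // Step 3 (assumed handling): remember the file for rename recovery
        // and leave it out of the returned listing.
        result.renamePendingFiles.add(entry.path);
        continue;
      }
      // Steps 1 and 2: convert the entry and drop duplicates in the same iteration.
      byPath.putIfAbsent(entry.path, new Status(entry.path, entry.isDirectory));
    }
    result.statuses.addAll(byPath.values());
    return result;
  }
}
```

`putIfAbsent` keeps the first entry seen for a path; which of the two duplicates should win is a choice the real implementation may make differently.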

How was this patch tested?

For code changes:

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 0s | | Docker mode activated. |
| -1 ❌ | patch | 0m 20s | | #7421 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. |

| Subsystem | Report/Notes |
|---|---|
| GITHUB PR | #7421 |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/1/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@hadoop-yetus

This comment was marked as outdated.

anujmodi2021 changed the title from "Hadoop 19234 followup" to "HADOOP-19474: [ABFS][FnsOverBlob] Listing Optimizations to avoid multiple iteration over list response." on Mar 3, 2025
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 20s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 11 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 26m 34s | | trunk passed |
| +1 💚 | compile | 0m 23s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | compile | 0m 22s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | checkstyle | 0m 20s | | trunk passed |
| +1 💚 | mvnsite | 0m 23s | | trunk passed |
| +1 💚 | javadoc | 0m 27s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 19s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 0m 42s | | trunk passed |
| +1 💚 | shadedclient | 24m 1s | | branch has no errors when building and testing our client artifacts. |
| -0 ⚠️ | patch | 24m 12s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 18s | | the patch passed |
| +1 💚 | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javac | 0m 18s | | the patch passed |
| +1 💚 | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | javac | 0m 16s | | the patch passed |
| +1 💚 | blanks | 0m 1s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 11s | /results-checkstyle-hadoop-tools_hadoop-azure.txt | hadoop-tools/hadoop-azure: The patch generated 17 new + 12 unchanged - 0 fixed = 29 total (was 12) |
| +1 💚 | mvnsite | 0m 18s | | the patch passed |
| -1 ❌ | javadoc | 0m 14s | /results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt | hadoop-tools_hadoop-azure-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 generated 2 new + 10 unchanged - 0 fixed = 12 total (was 10) |
| -1 ❌ | javadoc | 0m 15s | /results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt | hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 2 new + 10 unchanged - 0 fixed = 12 total (was 10) |
| +1 💚 | spotbugs | 0m 40s | | the patch passed |
| +1 💚 | shadedclient | 24m 33s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 23s | | hadoop-azure in the patch passed. |
| +1 💚 | asflicense | 0m 23s | | The patch does not generate ASF License warnings. |
| | | 84m 22s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/4/artifact/out/Dockerfile |
| GITHUB PR | #7421 |
| JIRA Issue | HADOOP-19474 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 5e009114837d 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ec1419b |
| Default Java | Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/4/testReport/ |
| Max. process+thread count | 559 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 10m 41s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 11 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 36m 45s | | trunk passed |
| +1 💚 | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | compile | 0m 57s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | checkstyle | 0m 31s | | trunk passed |
| +1 💚 | mvnsite | 0m 40s | | trunk passed |
| +1 💚 | javadoc | 0m 39s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 32s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 1m 7s | | trunk passed |
| +1 💚 | shadedclient | 35m 38s | | branch has no errors when building and testing our client artifacts. |
| -0 ⚠️ | patch | 35m 58s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 29s | | the patch passed |
| +1 💚 | compile | 0m 32s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javac | 0m 32s | | the patch passed |
| +1 💚 | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | javac | 0m 29s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 20s | /results-checkstyle-hadoop-tools_hadoop-azure.txt | hadoop-tools/hadoop-azure: The patch generated 18 new + 12 unchanged - 0 fixed = 30 total (was 12) |
| +1 💚 | mvnsite | 0m 33s | | the patch passed |
| -1 ❌ | javadoc | 0m 25s | /results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt | hadoop-tools_hadoop-azure-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 generated 2 new + 10 unchanged - 0 fixed = 12 total (was 10) |
| -1 ❌ | javadoc | 0m 27s | /results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06.txt | hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 2 new + 10 unchanged - 0 fixed = 12 total (was 10) |
| +1 💚 | spotbugs | 1m 8s | | the patch passed |
| +1 💚 | shadedclient | 37m 49s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 50s | | hadoop-azure in the patch passed. |
| +1 💚 | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | 135m 4s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/3/artifact/out/Dockerfile |
| GITHUB PR | #7421 |
| JIRA Issue | HADOOP-19474 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux b3d24f968b2c 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / b37c3d2 |
| Default Java | Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/3/testReport/ |
| Max. process+thread count | 761 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

anujmodi2021 marked this pull request as ready for review on March 4, 2025 08:47
@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 0m 54s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 11 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 41m 59s | | trunk passed |
| +1 💚 | compile | 0m 44s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | compile | 0m 38s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | checkstyle | 0m 30s | | trunk passed |
| +1 💚 | mvnsite | 0m 41s | | trunk passed |
| +1 💚 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 1m 10s | | trunk passed |
| +1 💚 | shadedclient | 39m 49s | | branch has no errors when building and testing our client artifacts. |
| -0 ⚠️ | patch | 40m 11s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 32s | | the patch passed |
| +1 💚 | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javac | 0m 36s | | the patch passed |
| +1 💚 | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | javac | 0m 29s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 21s | | the patch passed |
| +1 💚 | mvnsite | 0m 33s | | the patch passed |
| +1 💚 | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 26s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 1m 14s | | the patch passed |
| +1 💚 | shadedclient | 39m 57s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 3m 1s | | hadoop-azure in the patch passed. |
| +1 💚 | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 137m 10s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/5/artifact/out/Dockerfile |
| GITHUB PR | #7421 |
| JIRA Issue | HADOOP-19474 |
| Optional Tests | dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle |
| uname | Linux 8621c9c4dac7 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3edab0f |
| Default Java | Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/5/testReport/ |
| Max. process+thread count | 524 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7421/5/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

* limitations under the License.
*/

package org.apache.hadoop.fs.azurebfs.contracts.services;
Contributor

Is this class expected to be in the contracts folder?

throw new AbfsDriverException(e);
}
} catch (IOException e) {
LOG.error("Unable to deserialize list results", e);
Contributor

Given that we have the URI now, should we include it in the error log as well?
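A minimal sketch of this suggestion, assuming the request URI is in scope where the `IOException` is caught. `parseListResponse`, `tryParse` and the message wording are hypothetical, and `java.util.logging` stands in for the SLF4J logger used in the driver so the example runs with the JDK alone. With SLF4J, the equivalent call would be `LOG.error("Unable to deserialize list results for URI {}", uri, e)`, where the trailing `Throwable` is logged with its stack trace.

```java
import java.io.IOException;
import java.net.URI;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ListErrorLogging {
  private static final Logger LOG = Logger.getLogger(ListErrorLogging.class.getName());

  /** Builds the error message so the failing list request can be identified. */
  static String deserializationFailureMessage(URI requestUri) {
    return "Unable to deserialize list results for URI: " + requestUri;
  }

  /** Hypothetical parser that always fails, to exercise the catch block. */
  static void parseListResponse(URI requestUri) throws IOException {
    throw new IOException("Unexpected end of XML stream");
  }

  /** Returns false, logging the URI together with the stack trace, if parsing fails. */
  static boolean tryParse(URI requestUri) {
    try {
      parseListResponse(requestUri);
      return true;
    } catch (IOException e) {
      LOG.log(Level.SEVERE, deserializationFailureMessage(requestUri), e);
      return false;
    }
  }
}
```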

return listResponseData;
}

private boolean isRenamePendingJsonPathEntry(BlobListResultEntrySchema entry) {
Contributor

Missing Javadoc.

Contributor

This can be simplified to:

private boolean isRenamePendingJsonPathEntry(BlobListResultEntrySchema entry) {
    String path = entry.path() != null ? entry.path().toUri().getPath() : null;
    return path != null && !entry.path().isRoot()
        && isAtomicRenameKey(path) && path.endsWith(RenameAtomicity.SUFFIX);
}
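The simplified predicate can be exercised in isolation. This is a minimal sketch with the Javadoc requested above, not the driver's code: it assumes the entry's path has already been resolved to a `String` (in place of Hadoop's `Path`, so `"/"` stands in for `isRoot()`), assumes `RenameAtomicity.SUFFIX` is `-RenamePending.json`, and uses a hypothetical `isAtomicRenameKey` that only checks configured directory prefixes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class RenamePendingCheck {

  // Assumed value of RenameAtomicity.SUFFIX.
  static final String SUFFIX = "-RenamePending.json";

  // Hypothetical stand-in for the configured atomic-rename directories.
  private final Set<String> atomicRenameDirs;

  RenamePendingCheck(String... dirs) {
    this.atomicRenameDirs = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(dirs)));
  }

  /** Hypothetical stand-in: is the path inside one of the atomic-rename directories? */
  boolean isAtomicRenameKey(String path) {
    for (String dir : atomicRenameDirs) {
      if (path.startsWith(dir + "/")) {
        return true;
      }
    }
    return false;
  }

  /**
   * Checks whether a list entry is a rename-pending JSON file that may
   * require rename recovery.
   *
   * @param path the entry's URI path, or null if the entry has no path
   * @return true if the path is not the root, lies under an atomic-rename
   *         directory and ends with the rename-pending suffix
   */
  boolean isRenamePendingJsonPathEntry(String path) {
    return path != null
        && !"/".equals(path)
        && isAtomicRenameKey(path)
        && path.endsWith(SUFFIX);
  }
}
```

The order of the `&&` operands follows the reviewer's suggestion; testing the suffix before `isAtomicRenameKey` would be equally correct and may be cheaper, since most listed entries are not rename-pending files.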


3 participants