HADOOP-18891 hadoop distcp needs support to filter by file/directory attribute #6070

Open
wants to merge 1 commit into
base: trunk

Conversation


@AuthurWang2009 AuthurWang2009 commented Sep 14, 2023

Description of PR

In some circumstances, we need to filter files/directories by their attributes. For example, we may need to filter them out by modification time, by the isDir attribute, etc.

So, should we introduce a new method, public boolean shouldCopy(CopyListingFileStatus fileStatus)?
With this approach, filters can be written more fluently than with public abstract boolean shouldCopy(Path path) alone.

To achieve the goal:
1. Create a method named shouldCopy(CopyListingFileStatus fileStatus) in the CopyFilter abstract class, with a supportFileStatus() switch method that returns false by default.
2. Subclasses that extend the abstract class and want to use the new method should override shouldCopy(CopyListingFileStatus fileStatus) and, at the same time, make supportFileStatus() return true.
3. This change is compatible with existing use cases.
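
The three points above can be sketched roughly as follows. This is a simplified model, not the code in this patch: `FileEntry` is a hypothetical stand-in for `CopyListingFileStatus`, a `String` replaces `Path`, and `SketchCopyFilter`, `SkipTmpFilter`, `ModifiedSinceFilter`, and the `select` helper are illustrative names only. The way the caller consults `supportFileStatus()` is likewise an assumption about how the switch might be used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopyFilterSketch {
  // Assumed caller logic: consult the switch, then run the matching check.
  static List<String> select(SketchCopyFilter filter, List<FileEntry> entries) {
    List<String> kept = new ArrayList<>();
    for (FileEntry e : entries) {
      boolean keep = filter.supportFileStatus()
          ? filter.shouldCopy(e)
          : filter.shouldCopy(e.path);
      if (keep) {
        kept.add(e.path);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<FileEntry> entries = Arrays.asList(
        new FileEntry("/data/a.txt", false, 100L),
        new FileEntry("/data/b.tmp", false, 200L),
        new FileEntry("/data/dir", true, 300L));
    System.out.println(select(new SkipTmpFilter(), entries));
    System.out.println(select(new ModifiedSinceFilter(150L), entries));
  }
}

// Hypothetical stand-in for CopyListingFileStatus: only the attributes used here.
final class FileEntry {
  final String path;
  final boolean directory;
  final long modificationTime;

  FileEntry(String path, boolean directory, long modificationTime) {
    this.path = path;
    this.directory = directory;
    this.modificationTime = modificationTime;
  }
}

// Simplified model of the proposed CopyFilter contract (String replaces Path).
abstract class SketchCopyFilter {
  // Existing path-based check; every filter still implements it.
  abstract boolean shouldCopy(String path);

  // Proposed attribute-based check; falls back to the path-based decision.
  boolean shouldCopy(FileEntry entry) {
    return shouldCopy(entry.path);
  }

  // Switch method: false by default, so existing filters behave as before.
  boolean supportFileStatus() {
    return false;
  }
}

// Old-style filter: path only, no overrides of the new methods.
class SkipTmpFilter extends SketchCopyFilter {
  @Override
  boolean shouldCopy(String path) {
    return !path.endsWith(".tmp");
  }
}

// New-style filter: keeps entries modified at or after a cutoff.
class ModifiedSinceFilter extends SketchCopyFilter {
  private final long cutoff;

  ModifiedSinceFilter(long cutoff) {
    this.cutoff = cutoff;
  }

  @Override
  boolean shouldCopy(String path) {
    return true; // the path alone cannot decide
  }

  @Override
  boolean shouldCopy(FileEntry entry) {
    return entry.modificationTime >= cutoff;
  }

  @Override
  boolean supportFileStatus() {
    return true;
  }
}
```

Because the switch defaults to false, a filter written against the old path-only contract, such as `SkipTmpFilter` here, is evaluated exactly as before, which is the compatibility property claimed in point 3.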

As an implementation:
1. I first create an abstract FileStatusCopyFilter that extends CopyFilter,
2. then create a DirCopyFilter class that extends FileStatusCopyFilter,
3. and implement UniformRecordInputFormat to support DirCopyFilter.
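
A rough sketch of how such a class hierarchy could look is given below. It is an illustration under stated assumptions, not the patch itself: `Entry` and `BaseCopyFilter` are hypothetical stand-ins for `CopyListingFileStatus` and `CopyFilter`, `FileStatusFilterSketch` and `DirOnlyFilterSketch` only mirror the roles of `FileStatusCopyFilter` and `DirCopyFilter`, and the rule "keep directories only" is an assumed behaviour, since the description does not say what `DirCopyFilter` actually selects. `UniformRecordInputFormat` is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DirCopyFilterSketch {
  // Assumed caller logic: use the attribute-based check when it is supported.
  static List<String> select(BaseCopyFilter filter, List<Entry> entries) {
    List<String> kept = new ArrayList<>();
    for (Entry e : entries) {
      boolean keep = filter.supportFileStatus()
          ? filter.shouldCopy(e)
          : filter.shouldCopy(e.path);
      if (keep) {
        kept.add(e.path);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Entry> entries = Arrays.asList(
        new Entry("/src", true),
        new Entry("/src/part-0000", false),
        new Entry("/src/sub", true));
    System.out.println(select(new DirOnlyFilterSketch(), entries));
  }
}

// Hypothetical stand-in for CopyListingFileStatus.
final class Entry {
  final String path;
  final boolean directory;

  Entry(String path, boolean directory) {
    this.path = path;
    this.directory = directory;
  }
}

// Hypothetical stand-in for CopyFilter with the proposed additions.
abstract class BaseCopyFilter {
  abstract boolean shouldCopy(String path);

  boolean shouldCopy(Entry entry) {
    return shouldCopy(entry.path);
  }

  boolean supportFileStatus() {
    return false;
  }
}

// Role of FileStatusCopyFilter: turn the switch on once for all subclasses
// and make the attribute-based check the one that subclasses must implement.
abstract class FileStatusFilterSketch extends BaseCopyFilter {
  @Override
  final boolean supportFileStatus() {
    return true;
  }

  @Override
  boolean shouldCopy(String path) {
    return true; // path-only callers get a permissive answer
  }

  @Override
  abstract boolean shouldCopy(Entry entry);
}

// Role of DirCopyFilter, assuming it keeps directories and skips files.
class DirOnlyFilterSketch extends FileStatusFilterSketch {
  @Override
  boolean shouldCopy(Entry entry) {
    return entry.directory;
  }
}
```

Putting the switch in the intermediate abstract class means a concrete filter only has to express its attribute rule, at the cost of one extra level in the hierarchy.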

How was this patch tested?

Added unit tests.

1. Add distcp.filters.class=org.apache.hadoop.tools.DirCopyFilter to distcp-default.xml, or set it with -Ddistcp.filters.class=org.apache.hadoop.tools.DirCopyFilter.
2. Then run the distcp commands.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 5 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 26s trunk passed
+1 💚 compile 0m 31s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 28s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 26s trunk passed
+1 💚 mvnsite 0m 32s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 27s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 49s trunk passed
+1 💚 shadedclient 37m 9s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 22s the patch passed
+1 💚 compile 0m 22s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 22s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 19s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 17s /results-checkstyle-hadoop-tools_hadoop-distcp.txt hadoop-tools/hadoop-distcp: The patch generated 22 new + 17 unchanged - 0 fixed = 39 total (was 17)
+1 💚 mvnsite 0m 22s the patch passed
-1 ❌ javadoc 0m 21s /results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 generated 5 new + 37 unchanged - 0 fixed = 42 total (was 37)
-1 ❌ javadoc 0m 20s /results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05 with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 generated 5 new + 37 unchanged - 0 fixed = 42 total (was 37)
+1 💚 spotbugs 0m 47s the patch passed
+1 💚 shadedclient 37m 5s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 52s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
150m 6s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/1/artifact/out/Dockerfile
GITHUB PR #6070
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 409c528a0781 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 25203eb
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/1/testReport/
Max. process+thread count 535 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@AuthurWang2009
Author

By mistake, I previously submitted my pull request against a branch rather than trunk.

Compared to the previous commit, I have made the following improvements:
1. import ordering
2. javadocs and indentation policy
3. some comments to explain how the code runs

Thanks to steveloughran for the reviews.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 4s trunk passed
+1 💚 compile 0m 27s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 25s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 25s trunk passed
+1 💚 mvnsite 0m 29s trunk passed
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 25s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 43s trunk passed
+1 💚 shadedclient 20m 44s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 19s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 18s the patch passed
+1 💚 compile 0m 17s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 17s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 14s /results-checkstyle-hadoop-tools_hadoop-distcp.txt hadoop-tools/hadoop-distcp: The patch generated 10 new + 21 unchanged - 4 fixed = 31 total (was 25)
+1 💚 mvnsite 0m 18s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 37s the patch passed
+1 💚 shadedclient 20m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 13m 50s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
98m 30s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/3/artifact/out/Dockerfile
GITHUB PR #6070
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 9b0de60a82ce 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a4c93e6
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/3/testReport/
Max. process+thread count 629 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 31s trunk passed
+1 💚 compile 0m 31s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 27s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 checkstyle 0m 28s trunk passed
+1 💚 mvnsite 0m 33s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 27s trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 spotbugs 0m 52s trunk passed
+1 💚 shadedclient 37m 18s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 23s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 21s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
+1 💚 javac 0m 20s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 16s /results-checkstyle-hadoop-tools_hadoop-distcp.txt hadoop-tools/hadoop-distcp: The patch generated 10 new + 21 unchanged - 4 fixed = 31 total (was 25)
+1 💚 mvnsite 0m 23s the patch passed
-1 ❌ javadoc 0m 19s /results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 generated 4 new + 37 unchanged - 0 fixed = 41 total (was 37)
-1 ❌ javadoc 0m 19s /results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05 with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 generated 4 new + 37 unchanged - 0 fixed = 41 total (was 37)
+1 💚 spotbugs 0m 47s the patch passed
+1 💚 shadedclient 37m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 38s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
151m 24s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/2/artifact/out/Dockerfile
GITHUB PR #6070
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname Linux 7ce6a281c645 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0c5f8b4
Default Java Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/2/testReport/
Max. process+thread count 530 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6070/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
