HADOOP-18891 hadoop distcp needs support to filter by file/directory attribute #6070
base: trunk
💔 -1 overall
This message was automatically generated.
Due to my mistake, I committed my pull request against the wrong branch instead of trunk. Compared to the previous commit, I have made the following improvements: Thanks to steveloughran for your reviews.
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
Description of PR
In some circumstances, we need to filter files/directories by their attributes. For example, we may need to filter them out by modification time, by the isDir attribute, etc.
So, should we introduce a new method public boolean shouldCopy(CopyListingFileStatus fileStatus)?
With this approach, we get a more fluent way to filter than public abstract boolean shouldCopy(Path path), which only sees the path.
To achieve the goal:
1、Add a method shouldCopy(CopyListingFileStatus fileStatus) to the CopyFilter abstract class, together with a supportFileStatus() switch method that returns false by default.
2、Subclasses that extend the abstract class and want to use the new method should override shouldCopy(CopyListingFileStatus fileStatus) and, at the same time, make supportFileStatus() return true.
3、This change is compatible with existing use cases.
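A minimal sketch of the switch described above, for illustration only. It does not use Hadoop: CopyFilter here is a simplified stand-in, FileStatusStub is a hypothetical replacement for CopyListingFileStatus, and java.nio.file.Path stands in for Hadoop's Path. The default body of the status-based shouldCopy (delegating to the path-based one), the accepts helper, and the two example filters are assumptions, not code from this patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Demo entry point. Everything in this file is a stand-in, not Hadoop code.
class CopyFilterSketch {
  public static void main(String[] args) {
    CopyFilter legacy = new LogFileFilter();
    CopyFilter statusAware = new RecentFileFilter(1_000L);
    FileStatusStub oldLog = new FileStatusStub(Paths.get("/data/app.log"), false, 500L);
    FileStatusStub newData = new FileStatusStub(Paths.get("/data/part-0"), false, 2_000L);

    System.out.println(CopyFilter.accepts(legacy, oldLog));
    System.out.println(CopyFilter.accepts(legacy, newData));
    System.out.println(CopyFilter.accepts(statusAware, oldLog));
    System.out.println(CopyFilter.accepts(statusAware, newData));
  }
}

// Hypothetical stand-in for CopyListingFileStatus: only the fields used here.
final class FileStatusStub {
  final Path path;
  final boolean directory;
  final long modificationTime;

  FileStatusStub(Path path, boolean directory, long modificationTime) {
    this.path = path;
    this.directory = directory;
    this.modificationTime = modificationTime;
  }
}

// Simplified stand-in for the CopyFilter abstract class.
abstract class CopyFilter {
  // Existing path-based contract.
  public abstract boolean shouldCopy(Path path);

  // Proposed status-based method. Assumption: by default it falls back
  // to the path-based check, so old filters behave as before.
  public boolean shouldCopy(FileStatusStub fileStatus) {
    return shouldCopy(fileStatus.path);
  }

  // Switch method: false by default, as described in the PR.
  public boolean supportFileStatus() {
    return false;
  }

  // Hypothetical caller-side dispatch on the switch.
  static boolean accepts(CopyFilter filter, FileStatusStub status) {
    return filter.supportFileStatus()
        ? filter.shouldCopy(status)
        : filter.shouldCopy(status.path);
  }
}

// Old-style filter: only sees the path, skips *.log files.
class LogFileFilter extends CopyFilter {
  @Override
  public boolean shouldCopy(Path path) {
    return !path.toString().endsWith(".log");
  }
}

// New-style filter: opts in and filters by modification time.
class RecentFileFilter extends CopyFilter {
  private final long minModificationTime;

  RecentFileFilter(long minModificationTime) {
    this.minModificationTime = minModificationTime;
  }

  @Override
  public boolean shouldCopy(Path path) {
    return true; // no attribute information available on this code path
  }

  @Override
  public boolean shouldCopy(FileStatusStub fileStatus) {
    return fileStatus.modificationTime >= minModificationTime;
  }

  @Override
  public boolean supportFileStatus() {
    return true;
  }
}
```

Because supportFileStatus() returns false unless overridden, a filter written against the old path-only contract never has its status-based method consulted, which is one way to read the compatibility claim in item 3.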
As an implementation:
1、First, I create an abstract class FileStatusCopyFilter that extends CopyFilter.
2、Then I create a DirCopyFilter class that extends FileStatusCopyFilter.
3、Finally, I implement UniformRecordInputFormat to support DirCopyFilter.
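The class hierarchy in the steps above could look roughly like the following sketch. It is self-contained and does not depend on Hadoop: CopyFilter and FileStatusStub are simplified stand-ins (FileStatusStub is a hypothetical replacement for CopyListingFileStatus, and java.nio.file.Path replaces Hadoop's Path). The behaviour of DirCopyFilter shown here, keeping only directories, is an assumption based on its name, and UniformRecordInputFormat is not modelled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Demo entry point. All classes in this file are stand-ins, not Hadoop code.
class DirCopyFilterSketch {
  public static void main(String[] args) {
    CopyFilter filter = new DirCopyFilter();
    FileStatusStub dir = new FileStatusStub(Paths.get("/data/logs"), true);
    FileStatusStub file = new FileStatusStub(Paths.get("/data/logs/a.txt"), false);

    System.out.println(filter.supportFileStatus());
    System.out.println(filter.shouldCopy(dir));
    System.out.println(filter.shouldCopy(file));
  }
}

// Hypothetical stand-in for CopyListingFileStatus.
final class FileStatusStub {
  final Path path;
  final boolean directory;

  FileStatusStub(Path path, boolean directory) {
    this.path = path;
    this.directory = directory;
  }
}

// Simplified stand-in for CopyFilter with the proposed additions.
abstract class CopyFilter {
  public abstract boolean shouldCopy(Path path);

  // Assumption: default falls back to the path-based check.
  public boolean shouldCopy(FileStatusStub fileStatus) {
    return shouldCopy(fileStatus.path);
  }

  public boolean supportFileStatus() {
    return false;
  }
}

// Step 1: abstract base for filters that work on file status.
abstract class FileStatusCopyFilter extends CopyFilter {
  @Override
  public final boolean supportFileStatus() {
    return true; // every subclass opts in to the status-based method
  }

  @Override
  public boolean shouldCopy(Path path) {
    return true; // assumption: the path alone is never a reason to skip
  }

  // Subclasses must supply the attribute-based decision.
  @Override
  public abstract boolean shouldCopy(FileStatusStub fileStatus);
}

// Step 2: concrete filter. Assumption: it keeps directories only.
class DirCopyFilter extends FileStatusCopyFilter {
  @Override
  public boolean shouldCopy(FileStatusStub fileStatus) {
    return fileStatus.directory;
  }
}
```

Putting the opt-in into FileStatusCopyFilter means concrete filters such as DirCopyFilter only have to implement the status-based check and cannot forget to flip the switch.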
How was this patch tested?
Added unit tests.
1、Add distcp.filters.class=org.apache.hadoop.tools.DirCopyFilter to distcp-default.xml, or set it on the command line with -Ddistcp.filters.class=org.apache.hadoop.tools.DirCopyFilter
2、Then execute distcp commands.