HADOOP-19801. Allow to skip recursive file deletion for non-empty dir… #8306
deepujain wants to merge 7 commits into apache:trunk from …
Conversation
💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.

🎊 +1 overall
This message was automatically generated.
Thanks for looking into this feature! Let's merge my PR into yours: deepujain#1

@EnricoMi Sure. Once you're done, please let me know if anything is pending from my side. I'm not familiar with the process or timeline yet, as this is my first time contributing to the Hadoop project. I currently have a few pending PRs awaiting merge.

My JIRA id is deepujain

Great! If you are happy with my proposed changes to your PR, merge deepujain#1.

Merged your PR, thanks for the docs and rephrase. @EnricoMi
        DELETE_NON_EMPTY_DIRECTORY_ENABLED);
    conf.setBoolean(DELETE_NON_EMPTY_DIRECTORY_ENABLED, true);
    disableFilesystemCaching(conf);
    AbstractFSContract contract = createContract(conf);
you can just create a new filesystem
new S3AFileSystem() and call init on it...be sure to close the FS afterwards.
Addressed: testCapabilityWhenEnabled() now creates a new S3AFileSystem(), calls init on it with the config, and closes the FS in a try-with-resources. Removed the contract/disableFilesystemCaching path. Ready for another look.
thanks, that's great
🎊 +1 overall
This message was automatically generated.
steveloughran left a comment:

all good, one minor checkstyle change and I'm happy.
Presumably you've tested this against a store with the feature?
      LOG.debug("deleting empty directory {}", path);
      deleteObjectAtPath(path, key, false);
    } else if (deleteNonEmptyDirectoryEnabled) {
      LOG.debug("deleting non-empty directory {} with single request (endpoint supports it)", path);
yetus is complaining about line length; can you split it at the comma?
Checkstyle fix is in (split the long LOG.debug line in DeleteOperation.java). Yes: tested against a VAST endpoint with the feature; enabled the option and confirmed recursive delete uses a single delete request. Thanks for the review.
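The split in question might look like the following standalone sketch (this is not the actual DeleteOperation code; `buildMessage` stands in for the literal passed to `LOG.debug`). The point is that a long string literal broken at the operator keeps each source line under the checkstyle limit while the runtime message is unchanged:

```java
// Standalone illustration of splitting an over-long log message literal.
public class LogLineSplitDemo {

    // The literal is split across two source lines; at runtime the
    // concatenated string is identical to the original one-liner.
    static String buildMessage(String path) {
        return "deleting non-empty directory " + path
            + " with single request (endpoint supports it)";
    }

    public static void main(String[] args) {
        System.out.println(buildMessage("s3a://bucket/dir"));
    }
}
```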
🎊 +1 overall
This message was automatically generated.
JIRA id is deepujain
    public class ITestS3ADeleteNonEmptyDirectoryCapability extends
        AbstractContractDeleteTest {
Class ITestS3ADeleteNonEmptyDirectoryCapability extends AbstractContractDeleteTest, but ITestS3AContractDelete also extends AbstractContractDeleteTest, so those abstract tests are executed twice for the S3A filesystem.
Either add your capability tests to ITestS3AContractDelete, or have ITestS3ADeleteNonEmptyDirectoryCapability extend AbstractFSContractTestBase.
     */
    @Test
    public void testCapabilityDisabledByDefault() throws Throwable {
      Assertions.assertThat(getFileSystem().hasPathCapability(new Path("/"),
These tests should work with any value for fs.s3a.delete.non-empty-directory.enabled set by the user (while testing other integration tests).
Suggested change:

    Configuration conf = new Configuration(getFileSystem().getConf());
    removeBaseAndBucketOverrides(getTestBucketName(conf), conf,
        DELETE_NON_EMPTY_DIRECTORY_ENABLED);
    disableFilesystemCaching(conf);
    AbstractFSContract contract = createContract(conf);
    contract.init();
    Assertions.assertThat(contract.getTestFileSystem().hasPathCapability(new Path("/"),
This common code should be refactored out into a separate method.
@deepujain how did you test this? Did you run …? Did you …?

With 1., obviously, assertions in tests …. With 1. and 2., non-delete requests fail for me with ….

This is against VAST 5.4.3.1.
Adding this to … shows request headers: …
Maybe we also have to adjust the instrumentation: EnricoMi#2. At least we should make this debug log message correct when …
…ectory. Add fs.s3a.delete.non-empty-directory.enabled. When true, recursive delete uses a single delete request for the directory key (for S3-compatible endpoints that support it, e.g. VAST).
…em() and close FS.
Force-pushed 9f2d9dd to 27eacbb.
Addressed the current review feedback and force-pushed the branch. The optimized non-empty-directory path now sends …
🎊 +1 overall
This message was automatically generated.
     * Value: {@value}.
     */
    public static final String DELETE_NON_EMPTY_DIRECTORY_HEADER =
        "x-amz-delete-contents";
I wouldn't put a proprietary (non-AWS S3) header into the project.
Attempt #8417 is more generic and might be reusable for other purposes.
Force-pushed 27eacbb to 1426ae3.
Addressed the new review point and force-pushed. This revision removes the hardcoded proprietary delete header from S3A, adds generic request-type custom header support (fs.s3a.client.{s3,sts}.custom.headers.request.REQUEST), and updates the non-empty-directory delete docs to use request-scoped configuration when an endpoint needs extra headers. Local validation passed with `JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home/bin:$PATH /opt/homebrew/bin/mvn -Dmaven.repo.local=/tmp/codex-m2 test -pl hadoop-tools/hadoop-aws -am -Dtest=TestAwsClientConfig,TestRequestFactory -DskipTests=false` (20 tests, 0 failures).
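If the revision works as described, wiring the VAST delete header through the generic mechanism might look like the following. This is a hypothetical sketch: the exact property key (including the `DeleteObject` request-type token) and the `header:value` syntax are assumptions based on the comment above, not a confirmed API; `x-amz-delete-contents` is the proprietary header discussed earlier in this thread.

```xml
<!-- Hypothetical example of the request-scoped custom header configuration
     described in this PR revision; property name and value syntax assumed. -->
<property>
  <name>fs.s3a.client.s3.custom.headers.request.DeleteObject</name>
  <value>x-amz-delete-contents:true</value>
</property>
```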
💔 -1 overall
This message was automatically generated.
Force-pushed 1426ae3 to b530128.
Addressed the latest Yetus javadoc failure (missing public Javadocs on the new request-header API additions), then force-pushed the updated single commit. Local validation passed with `JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home /opt/homebrew/bin/mvn -Dmaven.repo.local=/tmp/codex-m2 test -pl hadoop-tools/hadoop-aws -am -Dtest=TestAwsClientConfig,TestRequestFactory -DskipTests=false` (20 tests, 0 failures). I also retried a local hadoop-aws javadoc build, but Maven still fails earlier in the reactor on cached hadoop-project-dist snapshot resolution before it reaches this module.
💔 -1 overall
This message was automatically generated.
Force-pushed b530128 to 619773b.
Follow-up push for the latest javadoc artifact: the remaining new warning was on CUSTOM_REQUEST_HEADERS_S3_PREFIX, because the added {@value} tag is not valid on that derived String constant. I removed that tag and force-pushed the updated single commit. The exact module-level javadoc command is still not reproducible locally, because standalone hadoop-aws resolution needs snapshot artifacts that are not available from apache.snapshots in this environment, but the targeted AWS tests from the earlier push still passed locally: TestAwsClientConfig and TestRequestFactory (20 tests, 0 failures).
🎊 +1 overall
This message was automatically generated.
Summary
Adds option fs.s3a.delete.non-empty-directory.enabled. When true, recursive delete of a non-empty directory sends a single delete request for the directory key (prefix) instead of listing and bulk-deleting contained objects. Only enable for S3-compatible endpoints that support deleting a non-empty directory in one request (e.g. VAST).

Change
- New constant DELETE_NON_EMPTY_DIRECTORY_ENABLED (fs.s3a.delete.non-empty-directory.enabled), default false.
- Added to S3A_DYNAMIC_CAPABILITIES for path capability probing.
- Read in initialize(), passed to DeleteOperation, and exposed via hasPathCapability(DELETE_NON_EMPTY_DIRECTORY_ENABLED).
- DeleteOperation: new flag deleteNonEmptyDirectoryEnabled. When true and the directory is non-empty, call deleteObjectAtPath(path, key, false) once instead of deleteDirectoryTree(path, key).
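Enabling the option described above might look like this in core-site.xml (the property name comes from this PR; it is not in released Hadoop, and only stores that support single-request prefix delete should enable it):

```xml
<!-- Opt-in: only for S3-compatible stores (e.g. VAST) that can delete a
     non-empty directory with a single request. Default is false. -->
<property>
  <name>fs.s3a.delete.non-empty-directory.enabled</name>
  <value>true</value>
</property>
```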
Why no new tests

New tests would require an S3-compatible endpoint that supports prefix delete (e.g. VAST); we don't run such backends in CI. The change is a guarded code path (config off by default). Manual verification: enabled the option against a VAST endpoint and confirmed recursive delete uses a single delete request.
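The guarded code path can be modeled as a small standalone sketch (hypothetical names mirroring this PR; the real logic lives in S3A's DeleteOperation and is only approximated here):

```java
// Hypothetical standalone model of the delete decision described in this PR;
// not the actual S3A code.
public class DeleteChoice {

    enum Action {
        DELETE_EMPTY_DIR_MARKER,      // deleteObjectAtPath(path, key, false) on the marker
        SINGLE_REQUEST_PREFIX_DELETE, // one delete request on the directory key (VAST-style)
        LIST_AND_BULK_DELETE          // deleteDirectoryTree(path, key): list + bulk delete
    }

    static Action choose(boolean dirIsEmpty, boolean deleteNonEmptyDirEnabled) {
        if (dirIsEmpty) {
            return Action.DELETE_EMPTY_DIR_MARKER;
        } else if (deleteNonEmptyDirEnabled) {
            return Action.SINGLE_REQUEST_PREFIX_DELETE;
        } else {
            return Action.LIST_AND_BULK_DELETE;
        }
    }

    public static void main(String[] args) {
        // Default (option off): non-empty directories still take the listing path.
        System.out.println(choose(false, false));
        // Option on: a non-empty directory becomes one delete request.
        System.out.println(choose(false, true));
    }
}
```

With the option off (the default), behavior is unchanged, which is why the change is safe to ship guarded.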
JIRA
Fixes HADOOP-19801