HADOOP-14837 : Support Read Restored Glacier Objects #6407
base: trunk
Conversation
Thanks for contributing, looks good mostly, sharing some initial feedback.
Let's get rid of the acceptor, update documentation, and add an ITest.
Test that for a glacier file, if it's READ_ALL, an error is thrown (I think that is existing behaviour?), and if it's SKIP_ALL_GLACIER then it's skipped. And test restore behaviour too if possible.
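A rough sketch of the kind of ITest checks meant here, assuming the createFiles(filterName) helper and methodPath() from the test code quoted later in this review; the file name, exception type and helper names are assumptions, not settled API:

// Sketch only. Assumes createFiles() writes a glacier-class object under
// methodPath(), and that reading an unrestored glacier object surfaces as
// an IOException subclass (the existing behaviour being pinned down here).
@Test
public void testSkipAllGlacier() throws Throwable {
  try (FileSystem fs = createFiles(S3ObjectStorageClassFilter.SKIP_ALL_GLACIER.name())) {
    Assertions.assertThat(fs.listStatus(methodPath()))
        .describedAs("listing of %s with SKIP_ALL_GLACIER", methodPath())
        .isEmpty();
  }
}

@Test
public void testReadAllFailsOnUnrestoredObject() throws Throwable {
  try (FileSystem fs = createFiles(S3ObjectStorageClassFilter.READ_ALL.name())) {
    // "file1" is a placeholder for the object created by createFiles()
    LambdaTestUtils.intercept(IOException.class, () -> {
      try (FSDataInputStream in = fs.open(new Path(methodPath(), "file1"))) {
        return in.read();
      }
    });
  }
}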
/**
 * S3ObjectStorageClassFilter will filter the S3 files based on the fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem
 * The config can have 3 values:
 * READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior for the customers and not changing the experience for them.
Instead of saying current behaviour, can you specify what that current behaviour is. It errors right?
Sure will update
most of my comments are on the basic stuff, especially those test assertions and the need to have a single factored out assertion() method.
Now, can we have the storage class as an attribute in S3AFileStatus? populated in listings and from HEAD requests? included in .toString()? that could be useful in future
@@ -18,6 +18,7 @@
package org.apache.hadoop.fs.s3a;

import org.apache.hadoop.util.Lists;
should go with the rest of the hadoop imports. it's in the wrong place in a lot of files as the move off guava lists was a search and replace without reordering
 * Accept all entries except Glacier objects if the config fs.s3a.glacier.read.restored.objects,
 * is set to SKIP_ALL_GLACIER
 */
public static class GlacierStatusAcceptor implements FileStatusAcceptor {
does this storage class come in on a v2 LIST request? guess it must...
@@ -52,6 +52,8 @@
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;

import org.apache.hadoop.fs.s3a.Listing.AcceptAllButSelfAndS3nDirs;
nit, import ordering
@@ -5213,7 +5223,7 @@ private RemoteIterator<S3ALocatedFileStatus> innerListFiles(
    RemoteIterator<S3ALocatedFileStatus> listFilesAssumingDir =
        listing.getListFilesAssumingDir(path,
            recursive,
-           acceptor,
+           acceptors,
nit: restore consistent indentation
/**
 * S3ObjectStorageClassFilter will filter the S3 files based on the fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem
be good to use <p> on lines and {@code } around formatted code so that javadocs and IDEs render better
    RestoreStatus.builder().isRestoreInProgress(false).build(),
    ObjectStorageClass.GLACIER));

Assert.assertFalse(result);
i require a meaningful error message on all failures. ideally using assertj, whose describedAs() method can do string formatting.
o = getS3ObjectWithStorageClassAndRestoreStatus(...)
Assertions.assertThat(acceptor.accept(...))
.describedAs("accept %s", o)
.isFalse()
just imagine you've seen a test failure: what information would you want in the assertion message to begin debugging it?
    RestoreStatus.builder().isRestoreInProgress(false).build(),
    ObjectStorageClass.DEEP_ARCHIVE));

Assert.assertFalse(result);
same. you could actually have an assertAcceptance(object, outcome) method instead of all this duplication
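For illustration, such a helper might look like this sketch; the accept(Path, S3Object) overload is assumed from the FileStatusAcceptor interface, and `acceptor` is the instance under test:

// Sketch: one shared assertion instead of repeated Assert.assertFalse(result).
private void assertAcceptance(S3Object object, boolean expected) {
  Assertions.assertThat(acceptor.accept(new Path("/test/" + object.key()), object))
      .describedAs("acceptance of %s", object)
      .isEqualTo(expected);
}

// each case above then collapses to:
assertAcceptance(
    getS3ObjectWithStorageClassAndRestoreStatus(
        RestoreStatus.builder().isRestoreInProgress(false).build(),
        ObjectStorageClass.DEEP_ARCHIVE),
    false);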
@@ -576,6 +580,10 @@ public void initialize(URI name, Configuration originalConf)

    s3aInternals = createS3AInternals();

    s3ObjectStorageClassFilter = Optional.ofNullable(conf.get(READ_RESTORED_GLACIER_OBJECTS))
use getTrimmed(key, ""), toUpper() and then do the matching. fail meaningfully if the value isn't recognised.
Am converting the current test into an ITest, will update assertions according to the recommendations
This is something that can be done; do we want this as part of this PR, or a separate one adding the storage class to S3AFileStatus?
💔 -1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
Force-pushed 4a9f038 to f0144a7.
🎊 +1 overall
This message was automatically generated.
Force-pushed f0144a7 to 4e9b673.
🎊 +1 overall
This message was automatically generated.
looks good mostly, but the ITest needs work. will sync with you on that
@@ -581,6 +583,12 @@ public void initialize(URI name, Configuration originalConf)

    s3aInternals = createS3AInternals();

    s3ObjectStorageClassFilter = Optional.ofNullable(conf.get(READ_RESTORED_GLACIER_OBJECTS))
would prefer conf.get(READ_RESTORED_GLACIER_OBJECTS, READ_ALL), meaning READ_ALL is the default; then you can get rid of the orElse()
@ahmarsuhail but doing it the way it is does handle case differences.
I'd go for getTrimmed(READ_RESTORED_GLACIER_OBJECTS, ""); if empty string map to empty optional, otherwise .toupper and valueof. one thing to consider: meaningful failure if the value doesn't map.
I'd change Configuration to do that case mapping if it wasn't such a critical class
or we just go for "upper case is required" and use what you've proposed. more brittle but simpler?
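Putting the two options together, the parse being converged on might look like this sketch (trim, upper-case for case tolerance, and a meaningful failure for unrecognised values; not the final code):

String value = conf.getTrimmed(READ_RESTORED_GLACIER_OBJECTS, "READ_ALL");
try {
  s3ObjectStorageClassFilter =
      S3ObjectStorageClassFilter.valueOf(value.toUpperCase(Locale.ROOT));
} catch (IllegalArgumentException e) {
  // name the key and the offending value in the failure
  throw new IllegalArgumentException(
      "Invalid value of " + READ_RESTORED_GLACIER_OBJECTS + ": " + value, e);
}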
 * <pre>
 * {@link S3ObjectStorageClassFilter} will filter the S3 files based on the {@code fs.s3a.glacier.read.restored.objects} configuration set in {@link S3AFileSystem}
 * The config can have 3 values:
 * {@code READ_ALL}: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior for the customers and not changing the experience for them.
remove "This will be done to keep the current behavior for the customers and not changing the experience for them." and add something like "Retrieval of Glacier files will fail with xxx", whatever the error currently is.
@@ -411,4 +416,8 @@ public RequestFactory getRequestFactory() {
  public boolean isCSEEnabled() {
    return isCSEEnabled;
  }

  public S3ObjectStorageClassFilter s3ObjectsStorageClassFilter() {
nit: rename to getS3ObjectStorageClassFilter(), and add javadocs for the method
<description>
  The config can have 3 values:

  * READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior (i.e failing for an unrestored glacier class file) for the customers and not changing the experience for them.
similar to above, remove "This will be done to keep the current behavior for the customers and not changing the experience for them." and add something like "Retrieval of Glacier files will fail with xxx", whatever the error currently is.
}

@Override
protected Configuration createConfiguration() {
move method to the top
newConf.set(STORAGE_CLASS, STORAGE_CLASS_GLACIER); // Create Glacier objects
skipIfStorageClassTestsDisabled(newConf);
disableFilesystemCaching(newConf);
removeBaseAndBucketOverrides(newConf, STORAGE_CLASS, FAST_UPLOAD_BUFFER);
remove lines 103 and 104, don't think they're needed.
@Parameterized.Parameters(name = "fast-upload-buffer-{0}")
public static Collection<Object[]> params() {
  return Arrays.asList(new Object[][]{
You don't need to parameterize here as we already test this in ITestS3AStorageClass. Here we just want to focus on this glacier-specific behaviour.
FileSystem fs = contract.getTestFileSystem();
Path dir = methodPath();
fs.mkdirs(dir);
you don't need to test for this here
@Override
protected Configuration createConfiguration() {
  Configuration newConf = super.createConfiguration();
  newConf.set(STORAGE_CLASS, STORAGE_CLASS_GLACIER); // Create Glacier objects
we should test for Glacier and deep archive though, as that's in your StorageClassFilterMap
Force-pushed 4e9b673 to 8a70267.
💔 -1 overall
This message was automatically generated.
Force-pushed 8a70267 to d035d15.
🎊 +1 overall
This message was automatically generated.
  this.filter = filter;
}

private static boolean isNotGlacierObject(S3Object object) {
add javadocs all the way down here, thanks
Sure will add for methods in the class
@@ -25,6 +25,7 @@
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

import org.apache.hadoop.fs.s3a.S3ObjectStorageClassFilter;
can you move it down to the rest of the org.apache imports? these guava things are in the wrong block due to the big search and replace which created them
Sure
S3Client s3Client = getFileSystem().getS3AInternals().getAmazonS3Client("test");

// Create a restore object request
RestoreObjectRequest requestRestore = RestoreObjectRequest.builder()
prefer this was in the RequestFactory interface and builder, as it'll let us do things like add audit context and anything else in future
    getFilePrefixForListObjects(), "/");
S3Object s3GlacierObject = getS3GlacierObject(s3Client, s3ListRequest);

while ((s3GlacierObject != null && s3GlacierObject.restoreStatus().isRestoreInProgress()) && retryCount < MAX_RETRIES) {
LambdaTestUtils.await() is designed to handle this.
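For example, assuming the await(timeoutMillis, intervalMillis, probe) overload, the hand-rolled loop could become something like this (timeout and interval values are placeholders):

// Sketch: poll until the restore completes instead of counting retries.
LambdaTestUtils.await(120_000, 5_000, () -> {
  S3Object o = getS3GlacierObject(s3Client, s3ListRequest);
  return o == null || !o.restoreStatus().isRestoreInProgress();
});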
skipIfStorageClassTestsDisabled(newConf);
disableFilesystemCaching(newConf);
removeBaseAndBucketOverrides(newConf, STORAGE_CLASS);
newConf.set(REJECT_OUT_OF_SPAN_OPERATIONS, "false");
or better: create an audit span
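A sketch of that alternative, assuming the createSpan() call on S3AFileSystem and that spans are Closeable; the operation name and variables here are illustrative:

// Sketch: run the raw SDK call inside an audit span rather than
// disabling out-of-span rejection for the whole filesystem.
try (AuditSpan span = getFileSystem().createSpan("restore", path.toString(), null)) {
  s3Client.restoreObject(requestRestore);
}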
enum Type { GLACIER_AND_DEEP_ARCHIVE, GLACIER }

@Parameterized.Parameters
- look at other uses of this to see how we generate useful strings for logs
- be aware the pattern is used in the method path, so it mustn't create invalid paths. it's just that text is so much better than [0]
Sure will update
Regarding the above, I will update the config read to getTrimmed and set the default config value to READ_ALL instead of "".
The valueOf method will throw an IllegalArgumentException if an invalid value is set in the config.
Force-pushed d035d15 to e110e2a.
💔 -1 overall
This message was automatically generated.
Build failed due to an OOM error; build passing locally.
Force-pushed e110e2a to c5dcd5c.
🎊 +1 overall
This message was automatically generated.
really close now: just import placement (always a PITA) and some checkstyle line length complaints. Other than that: ready to merge!
@@ -18,6 +18,7 @@
package org.apache.hadoop.fs.s3a;

import org.apache.hadoop.fs.s3a.api.S3ObjectStorageClassFilter;
new org.apache imports MUST go into that block for new classes. set your IDE up for this and life gets simpler for all
@@ -52,6 +52,7 @@
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;

import org.apache.hadoop.fs.s3a.api.S3ObjectStorageClassFilter;
same
@@ -37,7 +37,7 @@
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AInputPolicy;
import org.apache.hadoop.fs.s3a.S3AStorageStatistics;
-import org.apache.hadoop.fs.s3a.S3ObjectStorageClassFilter;
+import org.apache.hadoop.fs.s3a.api.S3ObjectStorageClassFilter;
this has to move down to L42 now, as it's in a sub package.
import org.apache.hadoop.fs.s3a.S3AFileSystem;
now, this is a brand new class. in which case the L22 import can go into the org.apache block and the S3AFileSystem one with it. the only reason it is in the wrong place in existing code is that the move to repackaged classes was a big search and replace only: no re-ordering
@@ -89,7 +92,7 @@ private FileSystem createFiles(String s3ObjectStorageClassFilter) throws Throwable
  FileSystem fs = contract.getTestFileSystem();
  Path dir = methodPath();
  fs.mkdirs(dir);
skip this for a marginal increase in performance. create() assumes there's a dir and just creates the object in the destination path without checks.
@bpahuja have you tested this with any of: S3 Express, third-party stores, GCS? Just want to make sure things work and the tests are skipped. I'll inevitably do the test runs anyway, but it'd save me time and effort, and as I don't run those regularly enough, multiple commits may be the source of regressions.
Hello @steveloughran, wanted to confirm the requirement here. Is it that you want me to ensure that this test is skipped when run in other environments? If that's the case then IMO it should work as the other integration tests in this package do, as it is inheriting the same test base. Do let me know if something other than this is needed from me :) Thanks
🎊 +1 overall
This message was automatically generated.
Hello @steveloughran, just a gentle reminder to review the PR. Thanks 😄
🎊 +1 overall
This message was automatically generated.
Hello @steveloughran, just a gentle reminder to review the PR. Thanks 😄
I'm doing nothing but helping get Hadoop 3.4.0 out the door this week. No code of my own except related to packaging; no reviews of other people except related to the release at all. Sorry.
While you are waiting for a review on your code from myself or someone else, why not get involved? There is no release engineering team doing this and it is up to all of us developers in the community to get our hands dirty. We all have different deployment environments and we all have different things we want to test. And, given this is the first public release with the AWS v2 SDK, it matters a lot that this thing ships. It will be the way we actually find real-world bugs.
Look on the hadoop common-dev list for the announcement of the next release candidate: once the announcement is made we have five days to test and vote on the RC. That is why we are under so much time pressure here.
Note I've created a project, https://github.com/apache/hadoop-release-support, to assist in qualifying. One thing which would be good would be some extra scripts we could run to help validate storage operations: operations which we can then execute against cloud storage from either the local host or a remote one. I am qualifying the Arm64 build on a Raspberry Pi 5 under my television and would like to have that testing fully automated. Anything you can do here will be very much appreciated.
And like I said: I'm unlikely to be looking at any other code right now.
Hello @steveloughran, just a gentle reminder to review the PR :) Do take a look whenever you get some time. Thanks
I am catching up now, as warned giving priority to people who helped validate the RC. The core project is a community project, and everyone gets to participate.
import software.amazon.awssdk.services.s3.model.S3Object;

import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
could you use org.apache.hadoop.util.Sets here. it's part of our attempt to isolate ourselves better from guava changes and the pain that causes downstream
[Amazon S3 Glacier (S3 Glacier)](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) is a secure and durable service for low-cost data archiving and long-term backup.
With S3 Glacier, you can store your data cost effectively for months, years, or even decades.
replace "you" with something like "it is possible to store data more cost-effectively"; this is the docs for the connector, not marketing...
}

private FileSystem createFiles(String s3ObjectStorageClassFilter) throws Throwable {
  Configuration conf = this.createConfiguration();
sorry, I meant "remove the this. prefix"; just use createConfiguration() directly
try (FileSystem fs = createFiles(S3ObjectStorageClassFilter.SKIP_ALL_GLACIER.name())) {
  Assertions.assertThat(
      fs.listStatus(methodPath()))
      .describedAs("FileStatus List of %s", methodPath()).isEmpty();
can you put the .isEmpty() on the line below; we like to try and split them up as they're very complex to read.
  });
}

private final int maxRetries = 100;
these are static/final, so make constants and upper case.
conf.set(READ_RESTORED_GLACIER_OBJECTS, s3ObjectStorageClassFilter);
// Create Glacier objects: Storage Class: DEEP_ARCHIVE/GLACIER
conf.set(STORAGE_CLASS, glacierClass);
S3AContract contract = (S3AContract) createContract(conf);
you don't need to do this. just do
fs = new S3AFileSystem()
fs.init(conf)
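(For reference, the FileSystem API spells that initialize(); a sketch of the suggestion, where the test.fs.s3a.name test-bucket key is an assumption:)

// Sketch of the suggestion above; initialize(uri, conf) is the actual
// FileSystem entry point, and "test.fs.s3a.name" (the configured test
// bucket) is an assumption here.
S3AFileSystem fs = new S3AFileSystem();
fs.initialize(URI.create(conf.getTrimmed("test.fs.s3a.name")), conf);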
Happy with production code apart from a couple of minor comments; been testing locally now.
Having just tested it, it took 2+ minutes to complete, even while skipping 6 tests. Even as part of a parallel run, that is going to hurt.
Would it be possible to abandon the elegance and structure of parameterised test suites and try and combine them? That is: create one glaciated object and then verify the different filesystem configurations do/don't find it. This will save a lot of set up and tear down overhead. I know it goes against "test one thing per method" and "parameterize", but it helps scale.
if you look at the AbstractSTestS3AHugeFiles tests you can see how, by running the test methods in name order, we can actually retain some of this isolation. however, it complicates life in other ways (we need to keep the test files out of the normal test methodpath, can't run a test on its own anyway, ...), so don't bother with this; just know that sometimes testing just gets complicated.
side issue: is this test going to run up significant charges?
@bpahuja can you address the review comments? Otherwise I'm going to forget about it. Get it done now and we can target 3.4.1
Sure @steveloughran, will prioritize this
Force-pushed cbb8580 to c1c72d9.
@steveloughran, I have pushed the changes. Thanks :)
💔 -1 overall
This message was automatically generated.
sorry, been busy and neglected this.
I've got some minor comments and then once yetus is happy we can merge.
Note that in #6789 I'm adding a case-insensitive way to resolve a list of enum values in a config; for now the work you've done is the only way to be case insensitive... pity.
now, I wonder what is going to break on stores without this feature? what happens on S3 Express stores?
I've proposed how to bind to the existing skipIfStorageClassTestsDisabled(conf) call, which will skip the test during setup.
LOG.warn("Invalid value for the config {} is set. Valid values are:" +
    "READ_ALL, SKIP_ALL_GLACIER, READ_RESTORED_GLACIER_OBJECTS. Defaulting to READ_ALL",
    READ_RESTORED_GLACIER_OBJECTS);
s3ObjectStorageClassFilter = S3ObjectStorageClassFilter.READ_ALL;
lets fall to the default. maybe pull the conf.getTrimmed() out of the try {} so its value can be printed too.
FWIW in #6789 I'm doing a getEnumSet() which is case independent too.
try {
  s3ObjectStorageClassFilter = Optional.of(conf.getTrimmed(READ_RESTORED_GLACIER_OBJECTS,
      DEFAULT_READ_RESTORED_GLACIER_OBJECTS))
      .map(String::toUpperCase)
(Locale.ROOT)
}

/**
 * Returns the filter function set as part of the enum definition
needs a trailing .
@@ -117,6 +117,8 @@ public class StoreContext implements ActiveThreadSpanSource<AuditSpan> {
  /** Is client side encryption enabled? */
  private final boolean isCSEEnabled;

  private final S3ObjectStorageClassFilter s3ObjectStorageClassFilter;
nit: javadocs
@@ -927,8 +927,34 @@ The switch to turn S3A auditing on or off.
    Should auditing of S3A requests be enabled?
  </description>
</property>
```
## <a name="glacier"></a> Glacier Object Support
can you add a newline. thanks
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.fs.s3a.S3AFileSystem;
move to the apache group
private FileSystem createFileSystem(String s3ObjectStorageClassFilter) throws Throwable {
  Configuration conf = createConfiguration();
  conf.set(READ_RESTORED_GLACIER_OBJECTS, s3ObjectStorageClassFilter);
insert the line
skipIfStorageClassTestsDisabled(conf);
and the matching import:
import static org.apache.hadoop.fs.s3a.S3ATestUtils.skipIfStorageClassTestsDisabled;
See ITestS3AStorageClass for an example.
Description of PR
Currently S3A does not distinguish Glacier and Glacier Deep Archive files, as it doesn't examine the storage class or verify whether an object is in the process of restoration from Glacier. Attempting to access an in-progress Glacier file via S3A results in an AmazonS3Exception indicating that the operation is invalid for the object's storage class.
As part of this change, users will be able to successfully read restored Glacier objects from the S3 location of the table using S3A. It will ignore any Glacier-archived files if they are in the process of being restored asynchronously. There is no change to the existing default behavior; additional configuration will be needed to enable the above-mentioned flow.
The config which controls the behavior of the S3AFileSystem with respect to Glacier storage classes is fs.s3a.glacier.read.restored.objects.
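Illustratively, it is set like any other S3A option; a minimal sketch, where the bucket is a placeholder and the value is one of the three options described below:

// Minimal sketch: enable glacier filtering on an S3A filesystem.
Configuration conf = new Configuration();
conf.set("fs.s3a.glacier.read.restored.objects", "SKIP_ALL_GLACIER");
FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
FileStatus[] listing = fs.listStatus(new Path("/data/")); // glacier objects filtered per the setting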
The config can have 3 values:

* READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This keeps the current behavior for users without changing their experience.
* SKIP_ALL_GLACIER: If this value is set, any S3 objects tagged with Glacier storage classes are ignored, and the others retrieved.
* READ_RESTORED_GLACIER_OBJECTS: If this value is set, the restored status of the Glacier object is checked: if restored, the objects are read like normal S3 objects, else they are ignored, as the objects have not been retrieved from S3 Glacier. (To check the restored status, the newly introduced RestoreStatus attribute present in the S3Object is used.) This wasn't previously possible, as ListObjects did not return any information about the restore status of an object, only its storage class.

A new FileStatusAcceptor is created which uses the RestoreStatus attribute from the S3Object and filters out or includes the Glacier objects in the listing as defined by the config. FileStatusAcceptor is an interface with 3 overloaded predicates, which filter the files based on the conditions defined in those predicates. The new RestoreStatus attribute from the ListObjects response indicates whether the object is unrestored, restoring, or restored, and also when that restore expires.

How was this patch tested?
Integration Tests (hadoop-aws)
All the integration tests are passing; the tests were run in accordance with https://hadoop.apache.org/docs/current2/hadoop-aws/tools/hadoop-aws/testing.html and executed in the region us-east-1. There were 2 failures observed which seem intermittent and unrelated to the change introduced in this CR, as the default behavior of S3AFileSystem was not changed.
Failures observed:
Manual Testing
Manual testing of the change was done with Spark v3.5
A Parquet table was created using the following in Spark-SQL.
Was able to successfully retrieve the data using the following.
The storage class of the file s3://<bucket>/data/glacier_test/parquet_glacier_test/part-00000-f9cb400e-35b2-41f7-9c39-8e34cd830fed-c000.snappy.parquet was changed from Standard to Glacier Flexible Retrieval (formerly Glacier). When trying to retrieve the data again from the same table, the following exception was observed.
I restarted the spark-sql session by setting the following config.
Trying to access the table now resulted in the following.
The spark-sql session was restarted with the following config.
Trying to access the table now gave the same result as the previous step, as the unrestored Glacier file was ignored when the table was read.
The restore for the file s3://<bucket>/data/glacier_test/parquet_glacier_test/part-00000-f9cb400e-35b2-41f7-9c39-8e34cd830fed-c000.snappy.parquet was initiated using the S3 Console. Trying to access the table now gave the same result as the previous step, as the Glacier file was still being restored and was not available.
On retrying after 5-7 minutes (as it was an expedited retrieval), the following was the result, which is as expected:
For code changes:

* If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?