HADOOP-19485. S3A: upgrade AWS SDK #7479

Open
wants to merge 5 commits into
base: trunk

steveloughran
Contributor

@steveloughran steveloughran commented Mar 7, 2025

Upgrade the AWS SDK to 2.30.27.

How was this patch tested?

Regression testing in progress

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Contributor Author

testing in progress; also writing a new, expanded and very strict doc on qualifying a release, based on the experience of recent upgrades.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 16s #7479 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7479
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7479/1/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor Author

Test failures with 2.30.27

The issue tracked in HADOOP-19272 (#7048), "S3A: AWS SDK 2.25.53 warnings logged about transfer manager not using CRT client", appears to have been fixed in the SDK, so the test that expects the warning to be logged now fails:

java.lang.AssertionError: 
[LOG output does not contain the forbidden text. Has the SDK been fixed?] 
Expecting:
 <"">
to contain:
 <"The provided S3AsyncClient is an instance of MultipartS3AsyncClient"> 
        at org.apache.hadoop.fs.s3a.impl.ITestAwsSdkWorkarounds.testNoisyLogging(ITestAwsSdkWorkarounds.java:100)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:750)

This test can be culled.
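
A later commit message in this PR says the assertion was changed "to expect nothing to be logged" instead. As a rough illustration of that inverted check, here is a self-contained sketch using java.util.logging. The class and method names (NoForbiddenLogText, capture, assertQuiet) are hypothetical; the real test lives in ITestAwsSdkWorkarounds and uses Hadoop's own test utilities, which are not reproduced here.

```java
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch only: all names here are hypothetical, not taken from the Hadoop test. */
final class NoForbiddenLogText {

  /** Warning text quoted in the assertion failure above. */
  static final String FORBIDDEN =
      "The provided S3AsyncClient is an instance of MultipartS3AsyncClient";

  /** Runs the action and returns every message sent to the logger meanwhile. */
  static String capture(Logger logger, Runnable action) {
    StringBuilder out = new StringBuilder();
    Handler handler = new Handler() {
      @Override public void publish(LogRecord record) {
        out.append(record.getMessage()).append('\n');
      }
      @Override public void flush() { }
      @Override public void close() { }
    };
    logger.addHandler(handler);
    try {
      action.run();
    } finally {
      logger.removeHandler(handler);
    }
    return out.toString();
  }

  /** Fails if the captured output still contains the old SDK warning. */
  static void assertQuiet(String output) {
    if (output.contains(FORBIDDEN)) {
      throw new AssertionError("Unexpected SDK warning in log output: " + output);
    }
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("sketch.transfer");
    logger.setUseParentHandlers(false);   // keep the console quiet for the demo
    String output = capture(logger, () -> logger.warning("unrelated message"));
    assertQuiet(output);
    System.out.println("log output is clean");
  }
}
```

With this shape, a future SDK release that reintroduces the warning would fail the check, which is the opposite of what the culled assertion tested.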

@steveloughran steveloughran force-pushed the s3/HADOOP-19485-SDK-upgrade branch from 8639127 to 9720b02 Compare March 7, 2025 10:59
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 16s #7479 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7479
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7479/2/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor Author

steveloughran commented Mar 7, 2025

ITestS3AEndpointRegion failures are probably caused by aws/aws-sdk-java-v2#5562 or a related change. Key point: the execution attribute AwsExecutionAttribute.ENDPOINT_OVERRIDDEN is now set to false when the endpoint is not overridden, rather than being left unset/null.
This is a test-only regression.
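
To illustrate why a null-versus-false change can break a test without affecting production behaviour, here is a minimal sketch. A plain Map stands in for the SDK's execution attributes; the string key and both helper methods are hypothetical and are not SDK or Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: a plain Map stands in for the SDK's execution attributes.
 * The key name and helper methods are hypothetical, not SDK APIs.
 */
final class EndpointOverrideCheck {

  /** Hypothetical stand-in for AwsExecutionAttribute.ENDPOINT_OVERRIDDEN. */
  static final String ENDPOINT_OVERRIDDEN = "EndpointOverridden";

  /** Fragile: only correct while "not overridden" is represented as unset/null. */
  static boolean overriddenIfPresent(Map<String, Boolean> attributes) {
    return attributes.get(ENDPOINT_OVERRIDDEN) != null;
  }

  /** Robust: treats null and FALSE alike as "not overridden". */
  static boolean overridden(Map<String, Boolean> attributes) {
    return Boolean.TRUE.equals(attributes.get(ENDPOINT_OVERRIDDEN));
  }

  public static void main(String[] args) {
    Map<String, Boolean> unset = new HashMap<>();           // older behaviour
    Map<String, Boolean> explicitFalse = new HashMap<>();   // behaviour described above
    explicitFalse.put(ENDPOINT_OVERRIDDEN, Boolean.FALSE);

    System.out.println(overriddenIfPresent(unset));          // false
    System.out.println(overriddenIfPresent(explicitFalse));  // true: the fragile check misfires
    System.out.println(overridden(unset));                   // false
    System.out.println(overridden(explicitFalse));           // false
  }
}
```

A test asserting on null would fail after the change, while code that only checks for true behaves the same in both cases.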

@steveloughran
Contributor Author

Testing with third-party stores shows the Content-MD5 problem is present: the store rejects the request below because the header is missing. It also shows that no SDK testing is ever performed against third-party stores, which is something to consider.

       org.apache.hadoop.fs.s3a.AWSBadRequestException: Remove S3 Files on s3a://dellecs/job-00-fork-0003/test/test: software.amazon.awssdk.services.s3.model.InvalidRequestException: Missing required header for this request: Content-MD5 (Service: S3, Status Code: 400, Request ID: 0c07c879:1953935cbee:1a45b:1d85, Extended Request ID: 85e1d41b57b608d4e58222b552dea52902e93b05a12f63f54730ae77769df8d1) (SDK Attempt Count: 1):InvalidRequest: Missing required header for this request: Content-MD5 (Service: S3, Status Code: 400, Request ID: 0c07c879:1953935cbee:1a45b:1d85, Extended Request ID: 85e1d41b57b608d4e58222b552dea52902e93b05a12f63f54730ae77769df8d1) (SDK Attempt Count: 1)
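
For reference, the Content-MD5 header that the store asks for is the Base64 encoding of the MD5 digest of the request body. The sketch below only shows how that value is computed with the JDK; the class and method names are hypothetical, and how (or whether) S3A or the SDK should be configured to send the header is not covered by this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Sketch only: hypothetical helper, not part of S3A or the AWS SDK. */
final class ContentMd5 {

  /** Returns the Content-MD5 header value for a request body. */
  static String contentMd5(byte[] body) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      // The header carries the raw 16-byte digest in Base64, not hex.
      return Base64.getEncoder().encodeToString(md5.digest(body));
    } catch (NoSuchAlgorithmException e) {
      // MD5 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException("MD5 digest unavailable", e);
    }
  }

  public static void main(String[] args) {
    byte[] emptyBody = "".getBytes(StandardCharsets.UTF_8);
    System.out.println(contentMd5(emptyBody));   // 1B2M2Y8AsgTpgAmY7PhCfg==
  }
}
```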

Upgrade to 2.30.27

Change-Id: Ic0652dc95c619559c45c9f0a153813b73a076d13

AwsSdkWorkarounds no longer needs to cut back on transfer manager logging
(HADOOP-19272).

Remove log downgrade and change assertion to expect nothing to be logged.

Change-Id: I5edcf674c1eede8327538979ddab2fe98d2e53e2

Change in state of AwsExecutionAttribute.ENDPOINT_OVERRIDDEN
attribute requires test tuning to match.

Change-Id: I80050ce9ffffa6b4f1b05dd16e83b18d2ce63678

Refresh IAM credentials a hard coded 60s before the session credentials
fully expire.

Change-Id: I2a61584cc99d761cc4b9af6a669224f309425088
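
The refresh rule in the last commit message above can be sketched as a simple time comparison. This is an illustration under stated assumptions, not the patch itself: the class name, the REFRESH_MARGIN constant and the needsRefresh method are hypothetical, and only the 60 second margin comes from the commit message.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch only: hypothetical names; just the 60s margin is from the commit message. */
final class RefreshWindow {

  /** Hard-coded margin before expiry at which credentials are refreshed. */
  static final Duration REFRESH_MARGIN = Duration.ofSeconds(60);

  /** True once "now" is within the margin of the expiry time, or past it. */
  static boolean needsRefresh(Instant expiry, Instant now) {
    return !now.isBefore(expiry.minus(REFRESH_MARGIN));
  }

  public static void main(String[] args) {
    Instant expiry = Instant.parse("2025-03-07T12:00:00Z");
    System.out.println(needsRefresh(expiry, Instant.parse("2025-03-07T11:58:59Z"))); // false
    System.out.println(needsRefresh(expiry, Instant.parse("2025-03-07T11:59:00Z"))); // true
    System.out.println(needsRefresh(expiry, Instant.parse("2025-03-07T12:00:01Z"))); // true
  }
}
```

Passing the current time in as a parameter, rather than calling Instant.now() inside the method, keeps a check like this easy to test.
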
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 15s #7479 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7479
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7479/3/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Switch is in client; commented out in test log properties;
covered in troubleshooting doc

Change-Id: If70447d8eb3d3d0e03db5c169cd1aabf844931bd
@steveloughran steveloughran force-pushed the s3/HADOOP-19485-SDK-upgrade branch from b8314e3 to 288fd09 Compare March 7, 2025 19:38
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 17s #7479 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #7479
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7479/4/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
