Conversation

@Ben-El Ben-El commented Sep 18, 2025

Closes #9512

Followup task of #9492

Changes

  • Replace the external aws s3api call with AWS SDK v2 (S3Client.putObject).
  • Keep atomic “create-if-absent” semantics using PutObjectRequest.ifNoneMatch("*") (no overwrites).
  • Log via streams.value.log (INFO for stdout-like messages).
  • On errors, surface clear messages, and map 412 to “Artifact already exists”.
  • There's no --acl public-read anymore. With Bucket Owner Enforced, ACLs are ignored and access is governed by the bucket policy.
  • Add meta-build deps under clients/spark/project/s3-upload-sdk.sbt:
    • software.amazon.awssdk:s3 and :regions (v2.33.x).
    • These are build-time only and do not enter the assembly jar (no impact on EMR6/EMR7 runtime).
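A minimal sketch of what the SDK-based upload described above might look like, assuming hypothetical bucket, key, and file values (the actual task reads these from sbt settings such as `publishBucket`):

```scala
import java.nio.file.Paths

import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, S3Exception}

// Bucket, key, and jar path here are placeholders for illustration only.
val client = S3Client.builder().region(Region.US_EAST_1).build()
val request = PutObjectRequest.builder()
  .bucket("example-bucket")
  .key("lakefs-spark-client/client.jar")
  .ifNoneMatch("*") // create-if-absent: the request fails with HTTP 412 if the key exists
  .build()

try {
  client.putObject(request, Paths.get("target/client.jar"))
} catch {
  case e: S3Exception if e.statusCode() == 412 =>
    // Map the precondition failure to a clear message, preserving the cause.
    throw new RuntimeException("Artifact already exists", e)
}
```

Note that with `ifNoneMatch("*")` the atomicity is enforced server-side by S3, so concurrent publishers cannot overwrite each other.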

Testing

On the actual bucket, verified that:

@Ben-El Ben-El added exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached labels Sep 18, 2025
@Ben-El Ben-El requested a review from arielshaqed September 18, 2025 13:24
@Ben-El Ben-El changed the title Remove unused --acl parameter from S3 upload script in build.sbt s3Upload: better logging & error handling; remove --acl (Bucket Owner Enforced) Sep 18, 2025
@Ben-El Ben-El requested a review from nopcoder September 18, 2025 14:31
@Ben-El Ben-El removed the request for review from nopcoder September 18, 2025 15:14
@arielshaqed arielshaqed left a comment

I don't understand why we cannot just use a Java S3 client - either the AWS one or the MinIO one would do. AFAIK both support "if-none-match" semantics.

The thing about running processes is that they can fail in so many interesting ways! For example, the previous code would (always) lose output. The new code can report strange exit codes (>128) if the process exits with a signal. It separates the failure notification from the error message, because the error message goes out to the process logs. It will also break easily if aws is not on the path, or if the wrong version of aws is on the path. It has an unenforced dependency when used in an action - you must install an AWS CLI. And I find it hard to predict what it will do in different circumstances.

@Ben-El Ben-El requested a review from arielshaqed September 18, 2025 19:25
@Ben-El Ben-El changed the title s3Upload: better logging & error handling; remove --acl (Bucket Owner Enforced) build(spark): switch to AWS SDK v2 for atomic S3 upload (ifNoneMatch); remove --acl Sep 18, 2025
@Ben-El Ben-El changed the title build(spark): switch to AWS SDK v2 for atomic S3 upload (ifNoneMatch); remove --acl build(spark): switch to AWS SDK v2 for atomic S3 upload (ifNoneMatch) Sep 18, 2025
@Ben-El Ben-El changed the title build(spark): switch to AWS SDK v2 for atomic S3 upload (ifNoneMatch) Build(spark): switch to AWS SDK v2 for atomic S3 upload (ifNoneMatch) Sep 18, 2025

Ben-El commented Sep 18, 2025

I don't understand why we cannot just use a Java S3 client - either the AWS one or the MinIO one would do. AFAIK both support "if-none-match" semantics.

The thing about running processes is that they can fail in so many interesting ways! For example, the previous code would (always) lose output. The new code can report strange exit codes (>128) if the process exits with a signal. It separates the failure notification from the error message, because the error message goes out to the process logs. It will also break easily if aws is not on the path, or if the wrong version of aws is on the path. It has an unenforced dependency when used in an action - you must install an AWS CLI. And I find it hard to predict what it will do in different circumstances.

You're right, updated the solution!
Thanks!

@Ben-El Ben-El changed the title Build(spark): switch to AWS SDK v2 for atomic S3 upload (ifNoneMatch) Refactor S3 upload logic: replace AWS CLI with SDK Sep 18, 2025
@Ben-El Ben-El changed the title Refactor S3 upload logic: replace AWS CLI with SDK Refactor S3 upload logic - replace AWS CLI with SDK Sep 18, 2025
@arielshaqed arielshaqed left a comment

Thanks! I highly recommend one fix: add the cause to the exceptions thrown on error. This does not need to be re-reviewed.

Note that this also removes ACL: public-read.
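Attaching the cause, as recommended above, is a matter of passing the original exception as the second constructor argument. A generic sketch (the helper name is illustrative, not from the PR):

```scala
def failBuild(message: String, cause: Throwable): Nothing =
  // Passing `cause` preserves the SDK exception's stack trace and error
  // details in the build log instead of discarding them.
  throw new RuntimeException(message, cause)

// Hypothetical usage inside the task's error handler:
// case e: S3Exception => failBuild(s"Upload failed: ${e.getMessage}", e)
```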

@Ben-El Ben-El requested a review from nopcoder September 21, 2025 08:59
// Using AWS SDK v2, fails the build on any error.
lazy val s3Upload = taskKey[Unit]("Upload JAR to S3 without override existing")
val publishBucket = settingKey[String]("Target S3 bucket for publishing the JAR")
val publishRegion = settingKey[String]("AWS region for publishing the JAR")

we can get it from the environment or another way supported by the Java SDK
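The comment refers to the SDK's default region resolution chain, which checks (in order) JVM system properties, the `AWS_REGION` environment variable, the shared config file, and instance metadata. A sketch of both options; only the class names are from the SDK, the rest is illustrative:

```scala
import software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain
import software.amazon.awssdk.services.s3.S3Client

// Resolve the region explicitly through the default chain...
val region = new DefaultAwsRegionProviderChain().getRegion
val explicitClient = S3Client.builder().region(region).build()

// ...or omit .region(...) entirely and let the client run the same chain itself.
val implicitClient = S3Client.builder().build()
```

This would make the `publishRegion` setting unnecessary when the region is already configured in the environment.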

@Ben-El Ben-El requested a review from nopcoder September 21, 2025 10:05

@nopcoder nopcoder left a comment


lgtm. remove the extra import.

@Ben-El Ben-El merged commit f3ae0bc into master Sep 21, 2025
66 of 67 checks passed
@Ben-El Ben-El deleted the remove-unused-acl-parameter-from-S3-upload branch September 21, 2025 14:13


Development

Successfully merging this pull request may close these issues.

Remove public acl flag from s3PutIfAbsent task in spark/build.sbt