Skip to content

Conversation

@Luka-J9
Copy link

@Luka-J9 Luka-J9 commented Oct 16, 2025

Currently this is a minimal change to demonstrate the issue so that the strategy can be discussed before addressed.

fixes #1245

.withCustomHeaders(
Map(
requestPayerHeader -> requestPayerHeaderValue,
storgeClassHeader -> storageClassHeaderValue,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds the header that is valid for creating the multipart copy request, but is not valid for the put requests

Docs list all valid headers here

I realize that there is also the storage class header value, but it also doesn't include all valid header values which is why I'm adding it this way. Additionally looking at the code it appears the behavior would be the same.

.withHeader(sseCSourceAlgorithmHeader, new EqualToPattern(sseCSourceAlgorithmHeaderValue))
.withHeader(sseCSourceKeyHeader, new EqualToPattern(sseCSourceKeyHeaderValue))
.withHeader(requestPayerHeader, new EqualToPattern(requestPayerHeaderValue)))
.withHeader(requestPayerHeader, new EqualToPattern(requestPayerHeaderValue))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is showing the correct outcome, but with the change in S3Stream.scala we remove the valid header of requestPayerHeaderValue. Without it we get illegal storageClassHeader

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test now passes with the proposed change

@Luka-J9 Luka-J9 marked this pull request as draft October 16, 2025 21:52
@Luka-J9 Luka-J9 changed the title [DRAFT] AWS S3: Handle Illegal Headers in Copy Part AWS S3: Handle Illegal Headers in Copy Part Oct 16, 2025
@Luka-J9
Copy link
Author

Luka-J9 commented Oct 16, 2025

@mdedetrich / @pjfanning - I made this draft PR to demonstrate the problem in order to discuss the path forward for fixing this problem. I'm going to look to see if I can create an integration test tomorrow.

I'm thinking that we have an allow list of all correct headers and filter out all the headers that start with the x-amz- that do not belong to that list?

I say starts with x-amz- as there might exist use cases with S3 compliant stores that have custom headers. This is just pure speculation however.

The main con here is that if AWS introduces new headers in the future this would require an update. Alternatively we may wish to simply remove x-amz-storage-class and create a block list, which would still need to be maintained.

Open to alternative suggestions if there are any

@mdedetrich
Copy link
Contributor

mdedetrich commented Oct 17, 2025

@mdedetrich / @pjfanning - I made this draft PR to demonstrate the problem in order to discuss the path forward for fixing this problem. I'm going to look to see if I can create an integration test tomorrow.

I'm thinking that we have an allow list of all correct headers and filter out all the headers that start with the x-amz- that do not belong to that list?

I say starts with x-amz- as there might exist use cases with S3 compliant stores that have custom headers. This is just pure speculation however.

The main con here is that if AWS introduces new headers in the future this would require an update. Alternatively we may wish to simply remove x-amz-storage-class and create a block list, which would still need to be maintained.

Open to alternative suggestions if there are any

I would say, if we do have an allowlist it should be provided in as configuration in reference.conf so people can add more values without needing us to do an additional release. The initial list can be hardcoded, but there can be additional pekko.connectors.s3.additional-upload-whitelist-headers configuration where people can also add additional headers in the case that s3 adds something new.

@pjfanning @raboof thoughts?

@raboof
Copy link
Member

raboof commented Oct 17, 2025

Sorry, I don't think I understand s3 well enough to have a meaningful opinion here

@pjfanning
Copy link
Member

I have no strong opinion about being proactive about headers. It seems to introduce its own risks. If we do introduce a new config, avoid any name that uses colour based naming, eg white or black.

@Luka-J9
Copy link
Author

Luka-J9 commented Oct 17, 2025

@mdedetrich / @pjfanning - Do I need to worry about source/binary compat? Not sure I can add the configuration handling without breaking compatibility on some level

Additionally I assume I should be handling the headers the same way for all header S3Request types and not just CopyPart?

@pjfanning
Copy link
Member

@mdedetrich / @pjfanning - Do I need to worry about source/binary compat? Not sure I can add the configuration handling without breaking compatibility on some level

Additionally I assume I should be handling the headers the same way for all header S3Request types and not just CopyPart?

@Luka-J9 it's ok to break binary compatibility on main branch as it is targeted at a 2.0.0 release. Just add a mima-filters file. You'll find examples in the src code already (.excludes file extension).
We can decide based on whether it is safe to backport to a 1.3.0 release after that. We would accept some level of change for that but it would probably need careful thought as to what is acceptable.

@mdedetrich
Copy link
Contributor

mdedetrich commented Oct 17, 2025

@mdedetrich / @pjfanning - Do I need to worry about source/binary compat? Not sure I can add the configuration handling without breaking compatibility on some level

Additionally I assume I should be handling the headers the same way for all header S3Request types and not just CopyPart?

So I am not sure how this change would be binary or even source breaking, it should be just internal behaviour without needing to change and function/method signatures. We already have a mima check that will verify if there is a binary breaking change.

Regarding adding the configuration setting, this is fine as long as its released in a version with a bumped minor version.

@InternalApi private[s3] case object HeadObject extends S3Request
@InternalApi private[s3] case object GetObject extends S3Request {

override def allowedHeaders: Set[String] = Set(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add links to where in the docs I sourced this from

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also double check these when I move this out of draft

bucket: String,
method: HttpMethod,
httpRequest: (HttpMethod, S3Settings) => HttpRequest,
headers: Seq[HttpHeader],
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly it seems this was unused

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I noticed this as well, if you want to fix this I would prefer to put it in a separate PR so we don't mix this together (its easier to bisect/revert changes if they are in their own PR's/commits)

@Luka-J9
Copy link
Author

Luka-J9 commented Oct 17, 2025

I've updated the PR with something more concrete in terms of implementation. I still need to add tests and docs, but wanted to get this out there for some early feedback about the strategy. If the approach is good I can add tests and move this out of draft for a more detailed and formal review

@Luka-J9
Copy link
Author

Luka-J9 commented Oct 20, 2025

@mdedetrich / @pjfanning - turns out the only errors when running mimaReportBinaryIssues were with one of the case classes creation methods, which only required a couple of compatibility functions to be added 🥳

If the general structure is okay - I'll write tests in the next day or two, just would like a rough 👍 before investing more time into this approach

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MultipartCopy Fails with the presence of Storage Class Header

4 participants