fix(s3): preserve question marks in copy source keys #1713
| given() | ||
| .contentType("text/plain") | ||
| .body("copy test") | ||
| .when() | ||
| .put("/" + sourceBucket + "/" + srcKey) | ||
| .then() | ||
| .statusCode(200); |
PUT stores source object at wrong key due to raw ? in URL path
REST Assured (backed by Apache HttpClient) treats ? as the query-string delimiter when building the request URI, not as a literal path character. With srcKey = "folder/file with question ?.txt", the call .put("/" + sourceBucket + "/" + srcKey) sends a request whose path is /copy-question-source-bucket/folder/file with question and whose query string is .txt. The server therefore stores the object under the key folder/file with question (without ?.txt). Both subsequent copy operations then look up folder/file with question ?.txt, find no object, and return NoSuchKey, so the test fails at the copy-source assertions instead of validating the fix. Use the already-defined encodedSrcKey variable for the PUT path so that ? is sent percent-encoded as %3F and the object is stored under the intended key.
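To make the splitting concrete, here is a minimal sketch using java.net.URI rather than REST Assured itself, so it only illustrates generic URI parsing and not the client's internals. The class KeyPathEncoding, the helper encodeKey, and the localhost base URL are hypothetical names introduced for illustration; they are not part of this PR.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;

final class KeyPathEncoding {
    /** Hypothetical helper: percent-encodes each "/"-separated segment of an object key. */
    static String encodeKey(String key) {
        return Arrays.stream(key.split("/", -1))
                // URLEncoder performs form encoding, so map its "+" (space) back to "%20".
                .map(s -> URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20"))
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        String key = "folder/file with question ?.txt";

        // Raw '?': a generic URI parser ends the path at the first '?'.
        URI raw = URI.create("http://localhost/bucket/" + key.replace(" ", "%20"));
        System.out.println("raw path  = [" + raw.getPath() + "]");
        System.out.println("raw query = [" + raw.getQuery() + "]");

        // Encoded '?': the whole key stays in the path and there is no query.
        URI encoded = URI.create("http://localhost/bucket/" + encodeKey(key));
        System.out.println("encoded path  = [" + encoded.getPath() + "]");
        System.out.println("encoded query = [" + encoded.getQuery() + "]");
    }
}
```

For the key used in this test, encodeKey produces the same string as the hard-coded encodedSrcKey, so either could be used for the PUT path.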
| private String extractVersionId(String query, String originalCopySource) { | ||
| if (!query.contains("versionId=")) { | ||
| return null; | ||
| } | ||
|
|
||
| String versionId = null; | ||
| for (String pair : query.split("&")) { | ||
| int eq = pair.indexOf('='); | ||
| if (eq <= 0) { | ||
| continue; | ||
| } | ||
| String name = pair.substring(0, eq); | ||
| String value = pair.substring(eq + 1); | ||
| String name = decodeCopySourceComponent(pair.substring(0, eq), originalCopySource); | ||
| String value = decodeCopySourceComponent(pair.substring(eq + 1), originalCopySource); | ||
| if ("versionId".equals(name)) { | ||
| versionId = value; | ||
| break; | ||
| } | ||
| } | ||
| return new ParsedCopySource(objectKey, versionId); | ||
| return versionId; |
versionId silently dropped when key contains a raw ? alongside ?versionId=
When a copy-source header contains both a raw (unencoded) ? in the key and a trailing ?versionId=X, for example /bucket/dir/file?.txt?versionId=abc, parseCopySourceObject splits at the first ?, which is the one inside the key. The query string becomes .txt?versionId=abc, which does contain versionId=. extractVersionId then splits on & and gets the single token .txt?versionId=abc; its name (everything before the first =) is .txt?versionId, which is not "versionId", so the method returns null. The fallback then decodes the entire pathAfterBucket (including ?versionId=abc) as the key, silently discarding the version selector. The recommended AWS approach is to percent-encode ? as %3F in the key portion, which the new code handles correctly; this is worth documenting in the Javadoc on parseCopySourceObject.
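The trace above can be reproduced with a small standalone sketch. It is not the PR's implementation: CopySourceSketch, versionIdFrom, and parsePathAfterBucket are hypothetical names, and the logic only mirrors the first-? split and the exact-name versionId match described in this comment, under the assumption that the part after the bucket has already been isolated.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class CopySourceSketch {
    /** Returns the value of a pair whose name is exactly "versionId", or null if none matches. */
    static String versionIdFrom(String query) {
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            if ("versionId".equals(pair.substring(0, eq))) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    /** Splits at the first '?' and returns {decodedKey, versionId}; versionId may be null. */
    static String[] parsePathAfterBucket(String pathAfterBucket) {
        int q = pathAfterBucket.indexOf('?');
        String versionId = q < 0 ? null : versionIdFrom(pathAfterBucket.substring(q + 1));
        // Fallback: without a recognised versionId, the whole remainder is treated as the key.
        String rawKey = versionId == null ? pathAfterBucket : pathAfterBucket.substring(0, q);
        // Assumption: a literal '+' is part of the key, so shield it from URLDecoder's form decoding.
        String key = URLDecoder.decode(rawKey.replace("+", "%2B"), StandardCharsets.UTF_8);
        return new String[] {key, versionId};
    }

    public static void main(String[] args) {
        String[] raw = parsePathAfterBucket("dir/file?.txt?versionId=abc");
        System.out.println(raw[0] + " | " + raw[1]);
        String[] enc = parsePathAfterBucket("dir/file%3F.txt?versionId=abc");
        System.out.println(enc[0] + " | " + enc[1]);
    }
}
```

With the raw form the version selector ends up inside the key and versionId is null; with %3F the key and versionId are separated as intended.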
| @Test | ||
| @Order(21) | ||
| void copyObjectWithQuestionMarkInSourceKeySucceeds() { | ||
| String sourceBucket = "copy-question-source-bucket"; | ||
| String destBucket = "copy-question-dest-bucket"; | ||
| String srcKey = "folder/file with question ?.txt"; | ||
| String encodedSrcKey = "folder/file%20with%20question%20%3F.txt"; | ||
|
|
||
| given().put("/" + sourceBucket).then().statusCode(200); | ||
| given().put("/" + destBucket).then().statusCode(200); | ||
|
|
||
| given() | ||
| .contentType("text/plain") | ||
| .body("copy test") | ||
| .when() | ||
| .put("/" + sourceBucket + "/" + srcKey) | ||
| .then() | ||
| .statusCode(200); | ||
|
|
||
| given() | ||
| .header("x-amz-copy-source", "/" + sourceBucket + "/" + encodedSrcKey) | ||
| .when() | ||
| .put("/" + destBucket + "/copied/encoded.txt") | ||
| .then() | ||
| .statusCode(200) | ||
| .body(containsString("CopyObjectResult")); | ||
|
|
||
| given() | ||
| .header("x-amz-copy-source", "/" + sourceBucket + "/" + srcKey) | ||
| .when() | ||
| .put("/" + destBucket + "/copied/raw.txt") | ||
| .then() | ||
| .statusCode(200) | ||
| .body(containsString("CopyObjectResult")); | ||
|
|
||
| given() | ||
| .when() | ||
| .get("/" + destBucket + "/copied/encoded.txt") | ||
| .then() | ||
| .statusCode(200) | ||
| .body(equalTo("copy test")); | ||
|
|
||
| given() | ||
| .when() | ||
| .get("/" + destBucket + "/copied/raw.txt") | ||
| .then() | ||
| .statusCode(200) | ||
| .body(equalTo("copy test")); | ||
|
|
||
| given().delete("/" + sourceBucket + "/" + srcKey); | ||
| given().delete("/" + destBucket + "/copied/encoded.txt"); | ||
| given().delete("/" + destBucket + "/copied/raw.txt"); | ||
| given().delete("/" + sourceBucket); | ||
| given().delete("/" + destBucket); | ||
| } |
UploadPartCopy path has no regression coverage for ? in source key
The PR description states the same parsing fix was applied to both CopyObject and UploadPartCopy to keep them aligned. Only CopyObject has a regression test here. Per AGENTS.md, any behavior affecting AWS compatibility should have automated test coverage. Adding a corresponding UploadPartCopy scenario (initiate multipart upload → uploadPartCopy with a percent-encoded ? in the key → complete) would verify the handleUploadPartCopy path stays correct as well.
Summary
This PR fixes S3 CopyObject and UploadPartCopy when x-amz-copy-source references a source key that contains a literal question mark.
The reported failure was that Floci returned NoSuchKey for a source object that already existed and could be resolved with HeadObject, as long as the copied key contained ? and the header used either the AWS-style encoded form (%3F) or the raw literal character.
Root cause
The copy-source parsing logic decoded the entire header value before splitting bucket, key, and optional query parameters.
That meant a header like:
/bucket/folder/file%20with%20question%20%3F.txt
was decoded first into:
/bucket/folder/file with question ?.txt
After that, the parser treated the first ? as the beginning of a query string and truncated the source key to folder/file with question. The copy path then looked up the wrong key and returned NoSuchKey.
This also made raw x-amz-copy-source values containing a literal ? behave the same way.
What changed
The parsing flow now:
- [...] x-amz-copy-source into bucket and path before URL decoding
- [...] ? section as metadata when it actually contains versionId=
- [...] versionId query is present
- [...] ?versionId=...
The same parsing fix was applied to both CopyObject and UploadPartCopy so the two operations stay aligned.
Tests
Added an integration regression test that verifies both of these succeed:
- [...] file%20with%20question%20%3F.txt
- [...] file with question ?.txt
The test also verifies the copied object content after each operation.
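As a rough illustration of why the order of decoding and splitting matters for the encoded header form, the following sketch contrasts the two orders. DecodeOrderSketch and its methods are hypothetical names, the header is assumed to start with "/", and versionId handling is left out entirely; this is not the code from the PR.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class DecodeOrderSketch {
    static String decode(String s) {
        // Assumption: a literal '+' belongs to the key, so shield it from form decoding.
        return URLDecoder.decode(s.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    /** Old order: decode the whole header, then cut at the first '?'. */
    static String keyDecodeThenSplit(String header) {
        String decoded = decode(header);
        int q = decoded.indexOf('?');
        String path = q < 0 ? decoded : decoded.substring(0, q);
        // Drop the leading "/bucket/" to leave only the key.
        return path.substring(path.indexOf('/', 1) + 1);
    }

    /** New order: separate the bucket from the rest first, decode last. */
    static String keySplitThenDecode(String header) {
        String rest = header.substring(header.indexOf('/', 1) + 1);
        return decode(rest);
    }

    public static void main(String[] args) {
        String header = "/bucket/folder/file%20with%20question%20%3F.txt";
        System.out.println("[" + keyDecodeThenSplit(header) + "]");
        System.out.println("[" + keySplitThenDecode(header) + "]");
    }
}
```

In this sketch the decode-first variant loses everything from the ? onward (and keeps the trailing space), while the split-first variant returns the full key.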
Validation
I could not execute the Maven test suite in this environment because java is not installed and JAVA_HOME is not configured, so local runtime validation is still pending.
Recommended follow-up commands: