Skip to content

fix(s3): preserve question marks in copy source keys#1713

Open
Tomoya0k wants to merge 1 commit into
floci-io:mainfrom
Tomoya0k:fix/s3-copy-source-question-mark
Open

fix(s3): preserve question marks in copy source keys#1713
Tomoya0k wants to merge 1 commit into
floci-io:mainfrom
Tomoya0k:fix/s3-copy-source-question-mark

Conversation

@Tomoya0k

@Tomoya0k Tomoya0k commented Jul 3, 2026

Copy link
Copy Markdown

Summary

This PR fixes S3 CopyObject and UploadPartCopy when x-amz-copy-source references a source key that contains a literal question mark.

The reported failure was that Floci returned NoSuchKey for a source object that already existed and could be resolved with HeadObject, as long as the copied key contained ? and the header used either the AWS-style encoded form (%3F) or the raw literal character.

Root cause

The copy-source parsing logic decoded the entire header value before splitting bucket, key, and optional query parameters.

That meant a header like:

/bucket/folder/file%20with%20question%20%3F.txt

was decoded first into:

/bucket/folder/file with question ?.txt

After that, the parser treated the first ? as the beginning of a query string and truncated the source key to folder/file with question . The copy path then looked up the wrong key and returned NoSuchKey.

This also made raw x-amz-copy-source values containing a literal ? behave the same way.

What changed

The parsing flow now:

  • splits the raw x-amz-copy-source into bucket and path before URL decoding
  • decodes bucket, object-key, and query components separately
  • only interprets a trailing ? section as metadata when it actually contains versionId=
  • preserves literal question marks in the object key when no versionId query is present
  • keeps existing support for copy-source version selection via ?versionId=...

The same parsing fix was applied to both CopyObject and UploadPartCopy so the two operations stay aligned.

Tests

Added an integration regression test that verifies both of these succeed:

  • encoded copy source: file%20with%20question%20%3F.txt
  • raw copy source: file with question ?.txt

The test also verifies the copied object content after each operation.

Validation

I could not execute the Maven test suite in this environment because java is not installed and JAVA_HOME is not configured, so local runtime validation is still pending.

Recommended follow-up commands:

.\mvnw.cmd -Dtest=S3IntegrationTest#copyObjectWithQuestionMarkInSourceKeySucceeds test
.\mvnw.cmd -Dtest=S3CopyObjectVersionedIntegrationTest test

@greptile-apps

greptile-apps Bot commented Jul 3, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes CopyObject and UploadPartCopy when the x-amz-copy-source header references a key that contains a question mark — either percent-encoded as %3F or as a raw literal. The root cause was that the copy-source header was fully URL-decoded before splitting on ?, so a decoded ? inside the key was mistakenly treated as the query-string delimiter and truncated the key.

  • S3Controller: Refactors parseCopySourceObject, handleCopyObject, and handleUploadPartCopy to split the raw header into bucket, key, and query components before decoding each piece separately via a new decodeCopySourceComponent helper; a new extractVersionId method preserves ?versionId=… support.
  • S3IntegrationTest: Adds copyObjectWithQuestionMarkInSourceKeySucceeds to regression-test both the percent-encoded (%3F) and raw-literal (?) copy-source forms; renumbers two subsequent @Order annotations accordingly.

Confidence Score: 3/5

The controller logic is sound for the primary cases, but the new regression test has a construction defect that would cause it to fail at runtime, meaning the fix goes into main without verified automated coverage.

The production parsing change in S3Controller is well-reasoned and handles the main scenario correctly. However, the test PUT request builds its URL with a raw ? character in the path, which HTTP clients treat as the query-string delimiter, so the source object is stored at a truncated key and the copy assertions would return NoSuchKey rather than prove the fix works. The author confirmed the suite could not be run locally, so this defect went undetected. Additionally, UploadPartCopy received the identical code fix but has no regression test at all.

S3IntegrationTest.java — the PUT that seeds the source object needs its URL corrected to use the percent-encoded key so the object is stored at the right path before the copy assertions run.

Important Files Changed

Filename Overview
src/main/java/io/github/hectorvent/floci/services/s3/S3Controller.java Fixes copy-source parsing by splitting bucket/key on the raw header before URL-decoding each component separately; correctly handles percent-encoded ? and raw literal ? without versionId, but silently drops versionId when the key contains a raw ? alongside a ?versionId= suffix.
src/test/java/io/github/hectorvent/floci/services/s3/S3IntegrationTest.java Adds regression test for ? in copy-source key, but the PUT that stores the source object uses a raw ? in the URL path which REST Assured/HttpClient treats as a query-string delimiter — the object is stored at the wrong key, so the copy assertions would hit NoSuchKey and the test would fail rather than validate the fix. No UploadPartCopy coverage added despite the parallel code change.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant S3Controller
    participant parseCopySourceObject
    participant extractVersionId
    participant decodeCopySourceComponent
    participant S3Service

    Client->>S3Controller: PUT /dest-bucket/dest-key
    Note right of Client: x-amz-copy-source: /src-bucket/dir%2Ffile%3F.txt
    S3Controller->>S3Controller: strip leading slash, find first slash
    S3Controller->>decodeCopySourceComponent: decode bucket component
    decodeCopySourceComponent-->>S3Controller: src-bucket
    S3Controller->>parseCopySourceObject: "raw pathAfterBucket = dir%2Ffile%3F.txt"
    parseCopySourceObject->>parseCopySourceObject: indexOf('?') returns -1
    parseCopySourceObject->>decodeCopySourceComponent: decode full path
    decodeCopySourceComponent-->>parseCopySourceObject: dir/file?.txt
    parseCopySourceObject-->>S3Controller: "ParsedCopySource(key=dir/file?.txt, versionId=null)"
    S3Controller->>S3Service: copyObject(src-bucket, dir/file?.txt, ...)
    S3Service-->>S3Controller: copied object
    S3Controller-->>Client: 200 CopyObjectResult

    Note over Client,S3Service: Versioned copy: /src-bucket/dir%2Ffile%3F.txt?versionId=abc
    S3Controller->>parseCopySourceObject: "raw pathAfterBucket = dir%2Ffile%3F.txt?versionId=abc"
    parseCopySourceObject->>parseCopySourceObject: indexOf('?') finds literal ? before versionId
    parseCopySourceObject->>extractVersionId: "query = versionId=abc"
    extractVersionId-->>parseCopySourceObject: "versionId = abc"
    parseCopySourceObject->>decodeCopySourceComponent: decode dir%2Ffile%3F.txt
    decodeCopySourceComponent-->>parseCopySourceObject: dir/file?.txt
    parseCopySourceObject-->>S3Controller: "ParsedCopySource(key=dir/file?.txt, versionId=abc)"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant S3Controller
    participant parseCopySourceObject
    participant extractVersionId
    participant decodeCopySourceComponent
    participant S3Service

    Client->>S3Controller: PUT /dest-bucket/dest-key
    Note right of Client: x-amz-copy-source: /src-bucket/dir%2Ffile%3F.txt
    S3Controller->>S3Controller: strip leading slash, find first slash
    S3Controller->>decodeCopySourceComponent: decode bucket component
    decodeCopySourceComponent-->>S3Controller: src-bucket
    S3Controller->>parseCopySourceObject: "raw pathAfterBucket = dir%2Ffile%3F.txt"
    parseCopySourceObject->>parseCopySourceObject: indexOf('?') returns -1
    parseCopySourceObject->>decodeCopySourceComponent: decode full path
    decodeCopySourceComponent-->>parseCopySourceObject: dir/file?.txt
    parseCopySourceObject-->>S3Controller: "ParsedCopySource(key=dir/file?.txt, versionId=null)"
    S3Controller->>S3Service: copyObject(src-bucket, dir/file?.txt, ...)
    S3Service-->>S3Controller: copied object
    S3Controller-->>Client: 200 CopyObjectResult

    Note over Client,S3Service: Versioned copy: /src-bucket/dir%2Ffile%3F.txt?versionId=abc
    S3Controller->>parseCopySourceObject: "raw pathAfterBucket = dir%2Ffile%3F.txt?versionId=abc"
    parseCopySourceObject->>parseCopySourceObject: indexOf('?') finds literal ? before versionId
    parseCopySourceObject->>extractVersionId: "query = versionId=abc"
    extractVersionId-->>parseCopySourceObject: "versionId = abc"
    parseCopySourceObject->>decodeCopySourceComponent: decode dir%2Ffile%3F.txt
    decodeCopySourceComponent-->>parseCopySourceObject: dir/file?.txt
    parseCopySourceObject-->>S3Controller: "ParsedCopySource(key=dir/file?.txt, versionId=abc)"
Loading

Reviews (1): Last reviewed commit: "fix(s3): preserve question marks in copy..." | Re-trigger Greptile

Comment on lines +395 to +401
given()
.contentType("text/plain")
.body("copy test")
.when()
.put("/" + sourceBucket + "/" + srcKey)
.then()
.statusCode(200);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 PUT stores source object at wrong key due to raw ? in URL path

REST Assured (backed by Apache HttpClient) treats ? as the query-string delimiter when building the request URI, not as a literal path character. The PUT call .put("/" + sourceBucket + "/" + srcKey) where srcKey = "folder/file with question ?.txt" will send a request with path /copy-question-source-bucket/folder/file with question and query string txt. The server stores the object at key folder/file with question (without ?.txt). Both subsequent copy operations then look up folder/file with question ?.txt, find no object, and would return NoSuchKey — the test would fail at the copy-source assertions, not validate the fix. Use the already-defined encodedSrcKey variable for the PUT path so the ? is sent percent-encoded as %3F and the key is stored correctly.

Comment on lines +2671 to +2689
private String extractVersionId(String query, String originalCopySource) {
if (!query.contains("versionId=")) {
return null;
}

String versionId = null;
for (String pair : query.split("&")) {
int eq = pair.indexOf('=');
if (eq <= 0) {
continue;
}
String name = pair.substring(0, eq);
String value = pair.substring(eq + 1);
String name = decodeCopySourceComponent(pair.substring(0, eq), originalCopySource);
String value = decodeCopySourceComponent(pair.substring(eq + 1), originalCopySource);
if ("versionId".equals(name)) {
versionId = value;
break;
}
}
return new ParsedCopySource(objectKey, versionId);
return versionId;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 versionId silently dropped when key contains a raw ? alongside ?versionId=

When a copy-source header contains both a raw (unencoded) ? in the key and a trailing ?versionId=X — e.g. /bucket/dir/file?.txt?versionId=abcparseCopySourceObject finds the first ? at the ? inside the key. The query string becomes txt?versionId=abc, which does contain versionId=. extractVersionId then splits on & and gets the single token txt?versionId=abc; its name (everything before the first =) is txt?versionId"versionId", so it returns null. The fallback then decodes the entire pathAfterBucket (including ?versionId=abc) as the key, silently discarding the version selector. The recommended AWS approach is to percent-encode ? as %3F in the key portion, which the new code handles correctly — worth documenting in the Javadoc on parseCopySourceObject.

Comment on lines 384 to +438
@Test
@Order(21)
void copyObjectWithQuestionMarkInSourceKeySucceeds() {
String sourceBucket = "copy-question-source-bucket";
String destBucket = "copy-question-dest-bucket";
String srcKey = "folder/file with question ?.txt";
String encodedSrcKey = "folder/file%20with%20question%20%3F.txt";

given().put("/" + sourceBucket).then().statusCode(200);
given().put("/" + destBucket).then().statusCode(200);

given()
.contentType("text/plain")
.body("copy test")
.when()
.put("/" + sourceBucket + "/" + srcKey)
.then()
.statusCode(200);

given()
.header("x-amz-copy-source", "/" + sourceBucket + "/" + encodedSrcKey)
.when()
.put("/" + destBucket + "/copied/encoded.txt")
.then()
.statusCode(200)
.body(containsString("CopyObjectResult"));

given()
.header("x-amz-copy-source", "/" + sourceBucket + "/" + srcKey)
.when()
.put("/" + destBucket + "/copied/raw.txt")
.then()
.statusCode(200)
.body(containsString("CopyObjectResult"));

given()
.when()
.get("/" + destBucket + "/copied/encoded.txt")
.then()
.statusCode(200)
.body(equalTo("copy test"));

given()
.when()
.get("/" + destBucket + "/copied/raw.txt")
.then()
.statusCode(200)
.body(equalTo("copy test"));

given().delete("/" + sourceBucket + "/" + srcKey);
given().delete("/" + destBucket + "/copied/encoded.txt");
given().delete("/" + destBucket + "/copied/raw.txt");
given().delete("/" + sourceBucket);
given().delete("/" + destBucket);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 UploadPartCopy path has no regression coverage for ? in source key

The PR description states the same parsing fix was applied to both CopyObject and UploadPartCopy to keep them aligned. Only CopyObject has a regression test here. Per AGENTS.md, any behavior affecting AWS compatibility should have automated test coverage. Adding a corresponding UploadPartCopy scenario (initiate multipart upload → uploadPartCopy with a percent-encoded ? in the key → complete) would verify the handleUploadPartCopy path stays correct as well.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant