aws_replace_quote_entities() only handles &quot;, fails round-trip for spec-conformant XML emitting &#34; #627

@harshavardhana

Summary

aws_replace_quote_entities() in source/s3_util.c:339 does a literal
byte match against the 6-byte string &quot; and copies anything else
through verbatim. As a result, ETag values returned by S3-compatible
servers that emit the equivalent numeric character reference (&#34;
decimal or &#x22; hex) end up with the literal entity text in the
response ETag header. Subsequent If-Match requests that reuse that
header value then fail with HTTP 412.
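
To make the failure mode concrete, here is a minimal sketch of the behaviour described above. It is not the aws-c-s3 source: replace_quot_only is a hypothetical stand-in that works on plain C strings rather than the library's own buffer types, and it assumes the caller supplies an output buffer of at least strlen(src) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in, NOT the aws-c-s3 implementation: replaces each
 * literal "&quot;" with '"' and copies every other byte through verbatim.
 * dst must hold at least strlen(src) + 1 bytes. Returns the output length. */
static size_t replace_quot_only(const char *src, char *dst) {
    static const char quot[] = "&quot;";
    const size_t quot_len = sizeof(quot) - 1; /* 6 bytes, excluding the NUL */
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (strncmp(src, quot, quot_len) == 0) {
            dst[out++] = '"';
            src += quot_len;
        } else {
            dst[out++] = *src++; /* "&#34;" and "&#x22;" take this branch */
        }
    }
    dst[out] = '\0';
    return out;
}
```

With this model, the input &quot;abc-1&quot; comes out as "abc-1" (quotes included), while &#34;abc-1&#34; comes out unchanged, which is the value that then lands in the ETag header.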

Per XML 1.0 §4.6, the predefined &quot; and the numeric reference
&#34; are semantically identical and both forms are spec-conformant.
A receiver should accept either.

Where it bites

source/s3_auto_ranged_put.c reads the <ETag> from the
CompleteMultipartUpload response, runs it through
aws_replace_quote_entities(), and stores the result as the response's
ETag header. If the server emitted &#34;…&#34;, the header now
literally contains the five characters &#34;, and any follow-up
conditional request (If-Match, CopyObject source If-Match, etc.)
fails to match.

Repro

Any server using Go's encoding/xml exhibits this: the stdlib
escapeText hardcodes &#34; as the escape for a double quote
(numeric refs are used uniformly to keep escaping context-independent
across attributes vs element content). Trivial repro:

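type result struct{ ETag string } // named type: encoding/xml rejects anonymous structs (no element name)
xml.NewEncoder(os.Stdout).Encode(result{ETag: `"abc-1"`})
// <result><ETag>&#34;abc-1&#34;</ETag></result>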

Pointing aws-c-s3 (or anything CRT-based — mountpoint-s3, AWS CLI v2
with CRT) at such a server, doing a multipart upload, and then issuing
a request with If-Match: <returned ETag> reproduces the 412.

Suggested fix

Two options:

  1. Minimal: extend the recognized set in
    aws_replace_quote_entities() to include &#34; and &#x22; (and
    ideally &apos;/&#39;/&#x27; for symmetry).
  2. Better: replace the ad-hoc string match with a proper XML
    entity decoder pass (the five predefined entities + numeric
    character references). The function only runs on already-extracted
    XML text, so a real decoder is appropriate.
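
As a rough illustration of option 2, the sketch below decodes the five predefined entities plus decimal and hex character references in a single pass. All names here (put_utf8, decode_one, decode_xml_text) are hypothetical and it uses plain C strings, not the aws-c-s3 buffer types. It also assumes two policy choices the issue does not specify: unrecognized or malformed references are copied through unchanged, and numeric references are emitted as UTF-8. Decoded output is never longer than the input, so dst needs strlen(src) + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write code point cp to dst as UTF-8. Returns bytes written, or 0 if cp is
 * NUL, a surrogate, or beyond U+10FFFF (nothing is written in that case). */
static size_t put_utf8(unsigned long cp, char *dst) {
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) { dst[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        dst[0] = (char)(0xC0 | (cp >> 6));
        dst[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dst[0] = (char)(0xE0 | (cp >> 12));
        dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    dst[0] = (char)(0xF0 | (cp >> 18));
    dst[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* src points at '&'. On success, writes the decoded bytes to dst, stores their
 * count in *written, and returns the number of source bytes consumed.
 * Returns 0 if src does not start a recognized reference. */
static size_t decode_one(const char *src, char *dst, size_t *written) {
    static const struct { const char *name; char ch; } named[] = {
        {"&quot;", '"'}, {"&apos;", '\''}, {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
    };
    for (size_t i = 0; i < sizeof(named) / sizeof(named[0]); ++i) {
        size_t n = strlen(named[i].name);
        if (strncmp(src, named[i].name, n) == 0) {
            dst[0] = named[i].ch;
            *written = 1;
            return n;
        }
    }
    if (src[1] != '#') return 0;

    unsigned long base = 10, cp = 0;
    const char *digits = src + 2;
    if (*digits == 'x') { base = 16; ++digits; } /* XML allows lowercase 'x' only */

    const char *p = digits;
    for (;; ++p) {
        unsigned long d;
        if (*p >= '0' && *p <= '9') d = (unsigned long)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f') d = (unsigned long)(*p - 'a') + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F') d = (unsigned long)(*p - 'A') + 10;
        else break;
        cp = cp * base + d;
        if (cp > 0x10FFFF) return 0; /* out of range; also prevents overflow */
    }
    if (p == digits || *p != ';') return 0; /* no digits or missing ';' */

    *written = put_utf8(cp, dst);
    return *written ? (size_t)(p - src) + 1 : 0;
}

/* Decode src into dst (at least strlen(src) + 1 bytes). Returns output length. */
static size_t decode_xml_text(const char *src, char *dst) {
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        size_t written = 0;
        size_t used = (*src == '&') ? decode_one(src, dst + out, &written) : 0;
        if (used != 0) { src += used; out += written; }
        else { dst[out++] = *src++; } /* ordinary byte or unrecognized reference */
    }
    dst[out] = '\0';
    return out;
}
```

Because the pass is single and left-to-right, &amp;quot; decodes to the text &quot; rather than to a double quote, which matches how an XML parser would treat it. Option 1 is the same idea with the table reduced to the quote forms.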
