fix(lambda-edge): classify ZIP-based document subtypes as binary #4900

Open
rokasta12 wants to merge 1 commit into honojs:main from rokasta12:fix/lambda-edge-binary-mime

Problem

`isContentTypeBinary` in `src/adapter/lambda-edge/handler.ts` used the regex `application/(.*json|.*xml).*`, which matches any subtype that contains the substring `json` or `xml` anywhere, including ZIP archives whose subtypes happen to embed those letters:

| Content-Type | Actual format | Currently classified |
| --- | --- | --- |
| `application/vnd.openxmlformats-officedocument.wordprocessingml.document` (.docx) | ZIP | non-binary ❌ |
| `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` (.xlsx) | ZIP | non-binary ❌ |
| `application/vnd.openxmlformats-officedocument.presentationml.presentation` (.pptx) | ZIP | non-binary ❌ |
| `application/epub+zip` (.epub) | ZIP | non-binary ❌ |
| `application/vnd.oasis.opendocument.text` (.odt) | ZIP | non-binary ❌ |

When isContentTypeBinary returns false, the response body is returned to CloudFront without base64 encoding, so anyone serving these document types from a Lambda@Edge handler delivers a corrupted file.
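To illustrate why the text path corrupts such files, here is a minimal sketch. It is not the adapter's code. The byte values are hypothetical (a ZIP signature followed by bytes that are not valid UTF-8), and it assumes the runtime provides the `TextDecoder`, `TextEncoder`, `btoa`, and `atob` globals, as recent Node.js versions do.

```typescript
// Hypothetical payload: the ZIP local-file-header signature ("PK\x03\x04")
// followed by bytes that are not valid UTF-8.
const zipBytes = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80, 0x00])

// Text path: decode as UTF-8 and re-encode. Invalid sequences are replaced
// with U+FFFD, so the original bytes cannot be recovered.
const asText = new TextDecoder().decode(zipBytes)
const textRoundTrip = new TextEncoder().encode(asText)

// Binary path: base64-encode, then decode back to bytes.
let binaryString = ''
for (let i = 0; i < zipBytes.length; i++) {
  binaryString += String.fromCharCode(zipBytes[i])
}
const asBase64 = btoa(binaryString)
const decoded = atob(asBase64)
const base64RoundTrip = new Uint8Array(decoded.length)
for (let i = 0; i < decoded.length; i++) {
  base64RoundTrip[i] = decoded.charCodeAt(i)
}

// Byte-for-byte comparison helper.
const sameBytes = (a: Uint8Array, b: Uint8Array): boolean => {
  if (a.length !== b.length) return false
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false
  }
  return true
}

const textPathIntact = sameBytes(zipBytes, textRoundTrip) // false: bytes were altered
const base64PathIntact = sameBytes(zipBytes, base64RoundTrip) // true: bytes preserved
```

Only the base64 path returns the original bytes, which is why a ZIP-based body needs to be classified as binary before it is handed back to CloudFront.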

Fix

Replace the regex with the boundary-aware pattern already used by the aws-lambda adapter (src/adapter/aws-lambda/handler.ts:675):

^text\/(?:plain|html|css|javascript|csv)|(?:\/|\+)(?:json|xml)\s*(?:;|$)

This accepts:

  • text/(plain|html|css|javascript|csv) at the start
  • /json, /xml, +json, +xml followed by ; or end-of-string

So application/atom+xml, application/ld+json, and image/svg+xml continue to be treated as text, while OOXML/EPUB/ODT subtypes are correctly classified as binary.
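The behavior described above can be checked with a short sketch. The new pattern is copied from this description. `isBinarySketch` is a hypothetical helper named for illustration, and the old pattern is only the fragment quoted in the Problem section, so the real `isContentTypeBinary` may differ in detail.

```typescript
// Fragment of the previous regex, as quoted in the Problem section.
const oldFragment = /application\/(.*json|.*xml).*/

// Boundary-aware pattern from this PR; a match means "treat as text".
const textLikePattern = /^text\/(?:plain|html|css|javascript|csv)|(?:\/|\+)(?:json|xml)\s*(?:;|$)/

// Hypothetical helper: anything that is not text-like counts as binary.
const isBinarySketch = (contentType: string): boolean => !textLikePattern.test(contentType)

const docx = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

// The old fragment matches .docx because "openxmlformats" contains "xml".
const oldTreatsDocxAsText = oldFragment.test(docx) // true

// Under the new pattern, ZIP-based documents are binary.
const docxIsBinary = isBinarySketch(docx) // true
const epubIsBinary = isBinarySketch('application/epub+zip') // true
const odtIsBinary = isBinarySketch('application/vnd.oasis.opendocument.text') // true

// Structured-suffix and plain text types stay text.
const atomIsBinary = isBinarySketch('application/atom+xml') // false
const ldJsonIsBinary = isBinarySketch('application/ld+json') // false
const svgIsBinary = isBinarySketch('image/svg+xml') // false
const jsonWithCharsetIsBinary = isBinarySketch('application/json; charset=utf-8') // false
const htmlIsBinary = isBinarySketch('text/html; charset=utf-8') // false
```

The key difference is that `json` or `xml` must now appear directly after `/` or `+` and be followed by `;` or the end of the string, so a subtype that merely contains those letters no longer matches.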

Test

Added a regression test covering the OOXML triple, application/epub+zip, and application/vnd.oasis.opendocument.text alongside the existing assertions. The new test fails on main and passes with the regex fix.

  • Add tests
  • Run tests (bun run test — passes)
  • bun run format:fix && bun run lint:fix
  • Add TSDoc/JSDoc — no API surface change

The previous regex `application/(.*json|.*xml).*` matched any subtype
containing the substrings "json" or "xml" anywhere, including
`application/vnd.openxmlformats-officedocument.*` (.docx/.xlsx/.pptx),
`application/epub+zip`, and `application/vnd.oasis.opendocument.*`.

These are ZIP archives. Treating them as text caused the body to be
returned un-base64-encoded, so CloudFront delivered corrupted files
to clients.

Replaces the regex with the boundary-aware pattern already used by the
aws-lambda adapter: it accepts `text/(plain|html|css|javascript|csv)`
at the start, or `/json`, `/xml`, `+json`, `+xml` followed by `;` or
end-of-string. This keeps `application/atom+xml`, `application/ld+json`,
and `image/svg+xml` as text while correctly rejecting OOXML/EPUB/ODT.