
[feature] response:stream-binary-resource - zero-copy binary download #6466

Open: joewiz wants to merge 2 commits into eXist-db:develop from joewiz:feature/response-stream-binary-resource

joewiz (Member) commented Jun 11, 2026

[This PR was co-authored with Claude Code. -Joe]

Summary

Adds response:stream-binary-resource($binary-resource-path, $content-type, $filename?), which streams a stored binary resource straight from the database to the servlet response output stream — without ever materializing it in the JVM heap.

response:stream-binary($binary, …) takes a fully-loaded xs:base64Binary, so the caller must first read the whole resource into memory (e.g. util:binary-doc()). For a large download that means the entire resource is materialized on the heap before a single byte goes out. The new function opens the stored BinaryDocument and copies it directly to the response via broker.readBinaryResource(Txn, BinaryDocument, OutputStream) — the same zero-copy path RESTServer already uses for binary downloads (RESTServer.java:1782–1783).

Signature

response:stream-binary-resource(
    $binary-resource-path as xs:string,
    $content-type         as xs:string,
    $filename             as xs:string?   (: optional; sets Content-Disposition :)
) as empty-sequence()

How it relates to the existing response: streamers

| function | takes | mechanism | memory |
| --- | --- | --- | --- |
| response:stream($node, $opts) | a node | serializes XML/HTML to the output stream | n/a |
| response:stream-binary($binary, $type, $f?) | an already-materialized xs:base64Binary | binary.streamBinaryTo(os) | caller must load the whole resource into heap first |
| response:stream-binary-resource($path, $type, $f?) (new) | a db path | opens the stored BinaryDocument, broker.readBinaryResource(txn, binDoc, os) | never materializes; streams disk → socket |
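The memory column comes down to two copy strategies. The sketch below illustrates them with plain JDK streams only; it is not the eXist-db implementation, and `CopyStrategies` and its method names are hypothetical. The first method mirrors the "load everything, then write" pattern, the second keeps only a fixed-size buffer live regardless of payload size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Plain-JDK illustration; none of these names are eXist-db APIs. */
final class CopyStrategies {

    /** Materializing: the whole payload becomes one byte[] on the heap before any write. */
    static long copyMaterialized(final InputStream in, final OutputStream out) throws IOException {
        final byte[] all = in.readAllBytes(); // heap use grows with the payload size
        out.write(all);
        return all.length;
    }

    /** Streaming: only an 8 KiB buffer is live, however large the payload is. */
    static long copyStreaming(final InputStream in, final OutputStream out) throws IOException {
        final byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Both methods produce the same bytes on the output; they differ only in peak heap use, which is the property the new function is meant to improve for large downloads.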

What changed

  • response/StreamBinaryResource.java (new): the function. Resolves the path (read lock + permission check, the same way util:binary-doc does), verifies it's a binary document, sets Content-Type (and Content-Disposition from $filename), and streams zero-copy within a transaction.
  • response/ResponseModule.java: registers the two arities.
  • RestBinariesTest.java: streamBinaryResourceRaw stores a binary and asserts the bytes returned over a real HTTP request are byte-identical (mirrors the existing streamBinaryRaw test).

Test

RestBinariesTest: 4/4 green (incl. the new byte-identical case). exist-core builds clean; Codacy PMD clean.
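For readers unfamiliar with the "byte-identical over a real HTTP request" style of check, here is a minimal stand-alone sketch. It assumes nothing about eXist-db's test harness: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` on loopback, and `ByteIdenticalRoundTrip` is a hypothetical name.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Stand-in for the shape of the test, not the actual RestBinariesTest code. */
final class ByteIdenticalRoundTrip {

    /** Serves payload over loopback HTTP and returns the bytes the client received. */
    static byte[] fetchOverHttp(final byte[] payload) throws IOException, InterruptedException {
        final HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/bin", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "application/octet-stream");
            exchange.sendResponseHeaders(200, 0); // 0 = length not declared, body is chunked
            try (InputStream in = new ByteArrayInputStream(payload);
                 OutputStream os = exchange.getResponseBody()) {
                in.transferTo(os); // stream the source to the response body
            }
        });
        server.start();
        try {
            final URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/bin");
            final HttpResponse<byte[]> response = HttpClient.newHttpClient()
                    .send(HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofByteArray());
            return response.body();
        } finally {
            server.stop(0);
        }
    }
}
```

A caller would compare the returned array with the stored payload using `java.util.Arrays.equals`, ideally with a payload that covers all 256 byte values so that any text-encoding step in the path would show up as a mismatch.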

Scope: this is db paths only (file: is deliberately out of scope)

The function operates on database binary resources, not local filesystem files. Nothing inherent prevents a file: variant — streaming a local file is actually simpler (no broker, lock, transaction, or BinaryDocument: just Files.newInputStream(path).transferTo(os)). It's left out on purpose, for two reasons:

  1. Security. Filesystem access from a request-handling function is a file-disclosure / path-traversal surface — an ungated response:stream-binary-resource('file:///etc/passwd', …) would stream arbitrary server files to a client. eXist treats file: access from XQuery as a DBA-only boundary, so a file: branch would need that guard and warrants its own scrutiny.
  2. Focus + consistency. Keeping this PR to the db case keeps the security-sensitive surface separate, and lets a future file: variant follow the convention that PR #6192 ("[feature] Support file: URIs in fn:collection() for filesystem directory querying") is establishing for fn:collection rather than inventing a parallel one.

If file: support were wanted, the shape would mirror #6192's fn:collection: branch on the URI scheme (db vs file:), restrict the file: branch to DBA users (a security boundary, consistent with fn:doc() / #6192), and stream via Files.newInputStream(...).transferTo(response.getOutputStream()) — zero-copy, no materialization. That fills a real gap (the EXPath file:read-binary materializes the whole file), but it belongs in its own PR.
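As a rough sketch of that shape (not part of this PR, and not eXist-db code): the `isDba` flag below is a hypothetical stand-in for whatever DBA check a real implementation would perform, and `FileStreamSketch` is an illustrative name. Only the `Files.newInputStream(...).transferTo(...)` call is taken from the description above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only; a real file: branch would use eXist-db's own permission APIs. */
final class FileStreamSketch {

    /**
     * Streams the file behind a file: URI to out and returns the byte count.
     * isDba is a hypothetical stand-in for a real DBA permission check.
     */
    static long streamFile(final URI uri, final boolean isDba, final OutputStream out) throws IOException {
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a file: URI: " + uri);
        }
        if (!isDba) {
            // Guard first: no filesystem access at all for non-DBA callers.
            throw new SecurityException("file: access is restricted to DBA users");
        }
        final Path path = Path.of(uri);
        try (InputStream in = Files.newInputStream(path)) {
            return in.transferTo(out); // buffer-by-buffer copy, no full materialization
        }
    }
}
```

The guard runs before the path is resolved, so a non-DBA caller never learns whether a given file exists.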

Context

This is the download half of the binary-streaming work surfaced by the existdb-openapi binary-transport track (eXist-db/existdb-openapi#35 / #38). Functional binary transport already works today via response:stream-binary; this is the scale primitive for large downloads. The upload counterpart (a non-materializing request:get-input-stream() so a handler can pipe a large upload to broker.storeDocument) is deliberately a separate follow-up — it depends on how the request body is consumed (single-read; the Roaster raw-body path) and needs its own investigation.

response:stream-binary($binary, ...) requires the caller to first
materialize the whole resource into the JVM heap (e.g. via
util:binary-doc()), so a large download holds the entire resource in
memory before a byte goes out.

Add response:stream-binary-resource($binary-resource-path, $content-type,
$filename?), which opens the stored BinaryDocument and copies it straight
to the servlet response output stream via broker.readBinaryResource(Txn,
...) -- the same zero-copy path RESTServer uses for binary downloads --
without ever materializing the bytes. Intended for large downloads.

A client (e.g. existdb-openapi's GET /api/db/resource/{path}) can then
replace util:binary-doc(...) => response:stream-binary(...) with a single
response:stream-binary-resource($path, ...) call.

Adds RestBinariesTest#streamBinaryResourceRaw asserting the wire result is
byte-identical over a real HTTP request. RestBinariesTest 4/4 green;
Codacy PMD clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

line-o (Member) commented Jun 11, 2026

That is BIG!

    try {
        uri = XmldbURI.xmldbUriFor(path);
    } catch (final URISyntaxException e) {
        throw new XPathException(this, "Invalid binary resource path: " + path, e);


I would like to have a fitting Error code here, too

Comment thread exist-core/src/test/java/org/exist/xquery/RestBinariesTest.java
…tent-Disposition test

Address @line-o's review on eXist-db#6466: every failure path now carries a fitting
error code instead of a bare XPathException, and the 3-arg (filename) form has
explicit Content-Disposition coverage.

- Resource is not a binary document -> XPTY0004 (a type error, per the review).
- Invalid resource path / permission denied -> FODC0002 ("Error retrieving
  resource"), the same code already used for the not-found case, so all three
  "cannot get the resource" conditions share one code.
- Transaction / IO error while streaming -> new EXXQDY0007 ("I/O error while
  streaming a binary resource to the response"), in the eXist error namespace.
- RestBinariesTest: add streamBinaryResourceWithFilename, asserting the 3-arg
  form sends Content-Disposition: inline; filename="..." with a byte-identical
  body.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

joewiz (Member, Author) commented Jun 11, 2026

[This response was co-authored with Claude Code. -Joe]

Thanks @line-o — all addressed in df06128.

Every failure path now carries a fitting error code instead of a bare XPathException:

  • "Resource is not a binary document" → XPTY0004. You read it right as a type error.
  • Invalid resource path (URISyntaxException) → FODC0002 ("Error retrieving resource") — the same code already on the not-found case just below it.
  • Permission denied → FODC0002 too. It was previously XPDY0002, which is wrong (that code means "dynamic context not assigned"); folding it under FODC0002 keeps all three "cannot get the resource" conditions on one code.
  • Transaction / IO error while streaming → new EXXQDY0007 ("I/O error while streaming a binary resource to the response") — added to the eXist error namespace, which is the EXERR-family fit you pointed at. The transaction and IO catches share it.

I left the servlet-context guard on XPDY0002, since "used outside a servlet response" genuinely is a missing-dynamic-context case — happy to change it if you'd prefer otherwise.
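The resulting mapping can be summarized as a small lookup. The enum below is only a reading aid with hypothetical names (`StreamFailure`, `code()`); in the PR itself the codes are attached to the thrown XPathException at each failure site, and that code is not reproduced here.

```java
/** Summary of the failure-to-code mapping described in this comment; names are illustrative. */
enum StreamFailure {
    INVALID_PATH("FODC0002"),        // URISyntaxException while parsing the path
    NOT_FOUND("FODC0002"),           // no resource at the path
    PERMISSION_DENIED("FODC0002"),   // previously XPDY0002
    NOT_BINARY("XPTY0004"),          // resource exists but is not a binary document
    STREAM_IO("EXXQDY0007"),         // transaction or I/O error while streaming
    NO_SERVLET_RESPONSE("XPDY0002"); // called outside a servlet response

    private final String code;

    StreamFailure(final String code) {
        this.code = code;
    }

    /** Returns the error code local name associated with this failure. */
    String code() {
        return code;
    }
}
```

Grouping the three "cannot get the resource" conditions under one code means a caller can catch FODC0002 once instead of distinguishing why retrieval failed.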

For the test: added streamBinaryResourceWithFilename to RestBinariesTest, which calls the 3-arg response:stream-binary-resource#3 with a filename and asserts the response carries Content-Disposition: inline; filename="download.bin" (and that the body is still byte-identical). Full RestBinariesTest green (5/5).
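For reference, a header value of that form could be built as sketched below. This is an assumption-laden illustration, not the PR's code: the thread does not say how (or whether) the function escapes the filename, so the quoting and the control-character rejection here are choices of this sketch, and `ContentDispositionSketch` is a hypothetical name.

```java
/** Illustrative builder for an inline Content-Disposition value; not the PR's implementation. */
final class ContentDispositionSketch {

    /** Returns inline; filename="..." with quotes and backslashes escaped. */
    static String inline(final String filename) {
        final StringBuilder sb = new StringBuilder("inline; filename=\"");
        for (int i = 0; i < filename.length(); i++) {
            final char c = filename.charAt(i);
            if (c < 0x20 || c == 0x7f) {
                // Assumption of this sketch: reject control characters (CR/LF included)
                // so a filename cannot inject extra header lines.
                throw new IllegalArgumentException("Control character in filename");
            }
            if (c == '"' || c == '\\') {
                sb.append('\\'); // quoted-string escaping
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

With this sketch, `ContentDispositionSketch.inline("download.bin")` yields the same value the new test asserts on.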

