[optimize] xmldb:store — stream a binary value instead of materializing to a heap byte[] #6467

Open
joewiz wants to merge 1 commit into eXist-db:develop from joewiz:feature/store-binary-no-materialize

@joewiz joewiz commented Jun 11, 2026

Member

[This PR was co-authored with Claude Code. -Joe]

Summary

xmldb:store / xmldb:store-as-binary stored a base64Binary item by calling BinaryValue.toJavaObject() — which reads the entire value into a heap byte[]. For a large binary (e.g. xmldb:store-as-binary($c, $n, request:get-data()) piping a multi-GB upload) that materializes the whole resource in memory before storing, risking OutOfMemoryError.

This passes the BinaryValue through to the resource instead. LocalBinaryResource.setContent already accepts a BinaryValue and keeps it, and LocalCollection.storeBinaryResource streams it through the binary cache rather than holding it on the heap.

The one-line change

XMLDBStore.java:

// before
resource.setContent(((BinaryValue) item).toJavaObject());   // whole value -> heap byte[]
// after
resource.setContent((BinaryValue) item);                    // streamed via the binary cache

Why this is correct (and what it relies on)

  • LocalBinaryResource.setContent(Object) has a case BinaryValue that stores the value as-is (no materialization); getStreamContent() then yields binaryValue.getInputStream().
  • LocalCollection.storeBinaryResource pipes that stream to broker.storeDocument(…, InputSource, …).
  • The intermediate is the binary cache, which is disk-backed by default (conf.xml: <binary-manager><cache class="org.exist.util.io.FileFilterInputStreamCache"/></binary-manager>). So the bytes live on disk, not the heap.
  • xmldb:store runs server-side and always uses a LocalCollection/LocalBinaryResource, so the remote XML:DB resource path (which expects a byte[]) is not on this function's path.
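
To make the first bullet concrete, here is a minimal, self-contained sketch of the dispatch it describes. It is not the eXist-db source: StreamableValue and SketchBinaryResource are hypothetical stand-ins for BinaryValue and LocalBinaryResource, and the real classes may differ in signatures and error handling. The point it illustrates is only that keeping the reference reads nothing at setContent time, and the stream is opened when the store path asks for it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a value that can be opened as a stream (the role BinaryValue plays). */
interface StreamableValue {
    InputStream getInputStream() throws IOException;
}

/** Hypothetical stand-in for a resource whose setContent(Object) dispatches on the argument type. */
final class SketchBinaryResource {
    private byte[] rawBytes;        // set only when the caller hands over a byte[]
    private StreamableValue value;  // set only when the caller hands over a streamable value

    void setContent(final Object content) {
        if (content instanceof StreamableValue) {
            // Keep the reference as-is; no bytes are read here.
            this.value = (StreamableValue) content;
            this.rawBytes = null;
        } else if (content instanceof byte[]) {
            // The "before" shape: the caller already materialized everything on the heap.
            this.rawBytes = (byte[]) content;
            this.value = null;
        } else {
            throw new IllegalArgumentException("unsupported content: " + String.valueOf(content));
        }
    }

    /** Opens a stream over the content; for a streamable value this is the first read. */
    InputStream getStreamContent() throws IOException {
        if (value != null) {
            return value.getInputStream();
        }
        if (rawBytes != null) {
            return new ByteArrayInputStream(rawBytes);
        }
        throw new IllegalStateException("no content set");
    }

    boolean holdsMaterializedBytes() {
        return rawBytes != null;
    }

    /** Demo helper only: drains the stream into memory so a small payload can be checked. */
    byte[] readAllContent() {
        try (InputStream in = getStreamContent()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model, passing the value object corresponds to the "after" line of the change, and passing a byte[] corresponds to the "before" line.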

Honest caveats

  • It's an optimization, not a correctness fix. Store/readback is byte-identical either way, so no test fails on develop and passes here; the heap benefit follows by construction (the byte[] is never created). The existing binary tests cover the path and are all green: XqueryApiTest exercises xmldb:store-as-binary(xs:base64Binary(…)), alongside RestBinariesTest and XmldbBinariesTest.
  • Double traversal. LocalCollection.storeBinaryResource calls getStreamLength() (a counting pass over the value) before it streams, so the value is read twice — but both reads go through the disk-backed cache, not the heap. For a large binary this trades one heap byte[] for two disk passes: slower, but it no longer OOMs.
  • Cache backing is configurable. The heap win assumes the default FileFilterInputStreamCache; an instance configured with a memory cache would still hold the value in memory.
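
The double-traversal caveat can be sketched as follows. This is an illustration under assumptions, not the LocalCollection code: TwoPassStoreSketch and its methods are hypothetical, a temp file stands in for the disk-backed cache, and the sink stands in for the store target. It shows a counting pass followed by a streaming pass, each using one fixed-size buffer, so heap use in the two passes is bounded by the buffer rather than by the size of the binary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical model of "count, then stream" over a disk-backed copy of the value. */
final class TwoPassStoreSketch {
    private static final int BUFFER_SIZE = 8192;

    /** Pass 1: read and discard, returning the byte count (the role of a length-counting pass). */
    static long countingPass(final Path spooled) {
        final byte[] buf = new byte[BUFFER_SIZE];
        long total = 0;
        try (InputStream in = Files.newInputStream(spooled)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Pass 2: re-open the spooled file and copy it to the sink, returning the bytes written. */
    static long streamingPass(final Path spooled, final OutputStream sink) {
        final byte[] buf = new byte[BUFFER_SIZE];
        long total = 0;
        try (InputStream in = Files.newInputStream(spooled)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /**
     * Spools the payload to a temp file, runs both passes, and returns {counted, streamed}.
     * The byte[] parameter exists only to keep the demo small; a real upload would arrive as a stream.
     */
    static long[] storeTwice(final byte[] payload, final OutputStream sink) {
        try {
            final Path spooled = Files.createTempFile("binary-cache-sketch", ".bin");
            try {
                Files.write(spooled, payload);
                final long counted = countingPass(spooled);
                final long streamed = streamingPass(spooled, sink);
                return new long[] {counted, streamed};
            } finally {
                Files.deleteIfExists(spooled);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both passes re-read the same spooled file, which is the trade described above: two disk reads in exchange for not holding the whole value on the heap.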

Context / scope

Part of the binary-streaming track (existdb-openapi#35 / #38). The download half shipped as response:stream-binary-resource (#6466, zero-copy via broker.readBinaryResource). This PR is the pragmatic improvement for the upload half: it removes the unconditional heap byte[] from the common xmldb:store idiom. True zero-copy upload would need a request:get-input-stream() function, so that a handler can pipe the raw request stream straight to broker.storeDocument(InputSource) with no cache at all. That is a deliberate separate follow-up: it depends on the request body being readable only once (the Roaster raw-body consumption order) and warrants its own PR.

Test

XqueryApiTest, RestBinariesTest, XmldbBinariesTest — all green. Build clean; Codacy PMD clean on the changed line (the one PMD note, NPathComplexity on evalWithCollection, is pre-existing and untouched by this one-line change).

…g to a heap byte[]

xmldb:store / xmldb:store-as-binary stored a base64Binary item by calling
BinaryValue.toJavaObject(), which reads the entire value into a heap byte[].
For a large binary -- e.g. xmldb:store-as-binary($c, $n, request:get-data())
piping a multi-GB upload -- that materializes the whole resource in memory
before storing, risking OutOfMemoryError.

Pass the BinaryValue through to the resource instead. LocalBinaryResource
keeps the BinaryValue, and LocalCollection.storeBinaryResource streams it
through the binary cache (disk-backed by default: FileFilterInputStreamCache
in conf.xml) rather than holding it on the heap.

Correctness is unchanged (byte-identical store/readback); covered by existing
binary tests (XqueryApiTest, RestBinariesTest, XmldbBinariesTest). Note the
store path calls getStreamLength() (a counting pass) before streaming, so the
value is traversed twice -- both through the disk-backed cache, not the heap.
True zero-copy (raw request stream -> broker.storeDocument, no cache) is the
separate request:get-input-stream() follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner June 11, 2026 13:10