[optimize] xmldb:store — stream a binary value instead of materializing to a heap byte[]#6467
Open
joewiz wants to merge 1 commit into
Open
[optimize] xmldb:store — stream a binary value instead of materializing to a heap byte[]#6467joewiz wants to merge 1 commit into
joewiz wants to merge 1 commit into
Conversation
…g to a heap byte[] xmldb:store / xmldb:store-as-binary stored a base64Binary item by calling BinaryValue.toJavaObject(), which reads the entire value into a heap byte[]. For a large binary -- e.g. xmldb:store-as-binary($c, $n, request:get-data()) piping a multi-GB upload -- that materializes the whole resource in memory before storing, risking OutOfMemoryError. Pass the BinaryValue through to the resource instead. LocalBinaryResource keeps the BinaryValue, and LocalCollection.storeBinaryResource streams it through the binary cache (disk-backed by default: FileFilterInputStreamCache in conf.xml) rather than holding it on the heap. Correctness is unchanged (byte-identical store/readback); covered by existing binary tests (XqueryApiTest, RestBinariesTest, XmldbBinariesTest). Note the store path calls getStreamLength() (a counting pass) before streaming, so the value is traversed twice -- both through the disk-backed cache, not the heap. True zero-copy (raw request stream -> broker.storeDocument, no cache) is the separate request:get-input-stream() follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[This PR was co-authored with Claude Code. -Joe]
Summary
xmldb:store/xmldb:store-as-binarystored abase64Binaryitem by callingBinaryValue.toJavaObject()— which reads the entire value into a heapbyte[]. For a large binary (e.g.xmldb:store-as-binary($c, $n, request:get-data())piping a multi-GB upload) that materializes the whole resource in memory before storing, riskingOutOfMemoryError.This passes the
BinaryValuethrough to the resource instead.LocalBinaryResource.setContentalready accepts aBinaryValueand keeps it, andLocalCollection.storeBinaryResourcestreams it through the binary cache rather than holding it on the heap.The one-line change
XMLDBStore.java:Why this is correct (and what it relies on)
LocalBinaryResource.setContent(Object)has acase BinaryValuethat stores the value as-is (no materialization);getStreamContent()then yieldsbinaryValue.getInputStream().LocalCollection.storeBinaryResourcepipes that stream tobroker.storeDocument(…, InputSource, …).conf.xml:<binary-manager><cache class="org.exist.util.io.FileFilterInputStreamCache"/></binary-manager>). So the bytes live on disk, not the heap.xmldb:storeruns server-side and always uses aLocalCollection/LocalBinaryResource, so the remote XML:DB resource path (which expects abyte[]) is not on this function's path.Honest caveats
developtest; the heap benefit is by construction (thebyte[]is gone). Covered by the existing binary tests (XqueryApiTestexercisesxmldb:store-as-binary(xs:base64Binary(…)), plusRestBinariesTest/XmldbBinariesTest) — all green.LocalCollection.storeBinaryResourcecallsgetStreamLength()(a counting pass over the value) before it streams, so the value is read twice — but both reads go through the disk-backed cache, not the heap. For a large binary this trades one heapbyte[]for two disk passes: slower, but it no longer OOMs.FileFilterInputStreamCache; an instance configured with a memory cache would still hold the value in memory.Context / scope
Part of the binary-streaming track (existdb-openapi#35 / #38). The download half shipped as
response:stream-binary-resource(#6466, zero-copy viabroker.readBinaryResource). This is the upload half's pragmatic improvement: remove the unconditional heapbyte[]from the commonxmldb:storeidiom. The true zero-copy upload — arequest:get-input-stream()so a handler pipes the raw request stream straight tobroker.storeDocument(InputSource)with no cache at all — is a deliberate separate follow-up; it depends on the single-read request body (the Roaster raw-body consumption order) and warrants its own PR.Test
XqueryApiTest,RestBinariesTest,XmldbBinariesTest— all green. Build clean; Codacy PMD clean on the changed line (the one PMD note,NPathComplexityonevalWithCollection, is pre-existing and untouched by this one-line change).