
feat(db): path-in-URL raw binary resource transport (#35/#38) #56

Open
joewiz wants to merge 10 commits into eXist-db:develop from joewiz:feat/db-binary-transport

joewiz commented Jun 11, 2026 (Member)

[This PR was co-authored with Claude Code. -Joe]

Summary

Adds GET/PUT /api/db/resource/{path} — a binary-safe, path-in-URL resource transport, the clean alternative to the JSON-envelope /api/db/resource (which is text/base64-only). This is the roaster-native way to move binary resources (no controller workaround, no /api/* side door), addressing #35 and superseding #38.

Stacks on #55 (db-core), which stacks on #54. Until those merge, the diff here also shows their commits; review the final "feat(db): path-in-URL raw binary resource transport" commit. Merge order: #54 → #55 → this.

How it works

  • GET streams a binary resource's raw bytes with response:stream-binary, not via the serializer, which would emit the value's base64 text and corrupt it. This is the established roaster pattern (roaster's own test app uses it; it's the same raw-bytes path eXist's REST server takes). An XML/text resource is returned as a node so roaster serializes it once with its stored mime (a pre-serialized string would be XML-escaped a second time).
  • PUT stores the raw request body via db-core (roaster hands a non-json/xml body through as raw data; mime inferred from the name), returning { stored, runPath } (201 on create, 200 on overwrite).

No exist-core or roaster change is needed — binary transport works on stock eXist today via the existing response:stream-binary. (The investigation behind this is why there is no Roaster PR: binary was never broken in roaster — roaster:response(mime, bytes) just isn't the way to return binary.)

Follow-up: zero-copy for large files

response:stream-binary still materializes the whole base64Binary. eXist-core PR eXist-db/exist#6466 adds response:stream-binary-resource($path, $content-type, $filename?) (broker-level zero-copy, no heap materialization). When that lands in a release, db:get-resource-raw's util:binary-doc(…) => response:stream-binary(…) becomes a single response:stream-binary-resource($stored, …) call. Functional transport here does not wait on it.

Testing

Self-contained Cypress suite (db-binary.cy.js) covering four cases: the raw round-trip is not base64-mangled; PUT returns 201/200 with stored/runPath; XML is served with its mime; a missing resource returns 404. The full db suite stays green (59 existing + 4 new tests). Verified byte-identical over HTTP on stock eXist (a PNG round-trips intact).
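The first case guards against the failure described under "How it works": the serializer path returns the base64 text of the bytes instead of the bytes. A minimal Node.js sketch of that check, assuming the client already holds the sent and received bodies as Buffers; classifyRoundTrip is a hypothetical helper, not code from db-binary.cy.js:

```javascript
// Hypothetical check: did a binary body come back raw, or as base64 text?
// Uses Node's global Buffer.
function classifyRoundTrip(sent, received) {
  if (Buffer.compare(sent, received) === 0) return 'intact';
  // The serializer path emits the base64 text of the bytes, not the bytes.
  if (received.toString('latin1').trim() === sent.toString('base64')) {
    return 'base64-mangled';
  }
  return 'corrupt';
}

// The 8-byte PNG file signature.
const png = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(png.toString('base64'));                                       // iVBORw0KGgo=
console.log(classifyRoundTrip(png, Buffer.from(png)));                     // intact
console.log(classifyRoundTrip(png, Buffer.from('iVBORw0KGgo=', 'ascii'))); // base64-mangled
```

Comparing bytes rather than lengths or content types is what catches the mangling: a base64 body is still a valid, non-empty response.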

Supersedes #38; closes the binary half of #35 on merge.

joewiz and others added 10 commits June 9, 2026 20:22
The /api/db endpoints spoke eXist's stored, percent-encoded form on the
wire: listings returned "caf%C3%A9.xml", so the existdb-oxygen-plugin and
other clients displayed encoded names and could not tell which form was
canonical. eXide and TEI Publisher already decode for display and encode
before storage; this brings existdb-openapi in line.

Every handler now encodes the incoming wire path once (db:to-stored)
before any doc()/collection()/xmldb:* call, and decodes every name and
path leaving the API (db:to-display).

fn:iri-to-uri is used inbound (not xmldb:encode) because it matches
eXist's own storage escaping: it percent-encodes spaces and non-ASCII but
leaves sub-delims (' & + @) and existing %XX untouched. Verified against a
live instance: xmldb:store("café.xml") -> caf%C3%A9.xml and
store("quote'name.xml") -> quote'name.xml (literal), while xmldb:store
throws outright on a raw space. iri-to-uri reproduces store's output where
store succeeds, additionally encodes the space store rejects, and is
idempotent on already-encoded paths -- so older encoded-path clients keep
working. xmldb:encode would fully RFC 3986-encode the sub-delims and so
fail to resolve names that xmldb:store left literal.

Proven end-to-end with "späce & quøte'd.xml" (space + sub-delim +
non-ASCII): store -> list shows decoded -> get by decoded path returns
content -> delete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
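The inbound rule can be mirrored client-side, for example to predict stored names in tests. A minimal JavaScript sketch, assuming the escape set that XPath fn:iri-to-uri defines (non-ASCII, control characters, space, and < > " { } | \ ^ `) and leaving % untouched so already-encoded input passes through unchanged; toStored is a hypothetical name echoing db:to-stored, not the module's implementation:

```javascript
// Hypothetical client-side mirror of db:to-stored (fn:iri-to-uri escaping).
// ASCII characters fn:iri-to-uri escapes, besides controls and non-ASCII.
const IRI_ESCAPED_ASCII = new Set(['<', '>', '"', ' ', '{', '}', '|', '\\', '^', '`']);

function toStored(wirePath) {
  let out = '';
  // for...of walks code points, so astral characters are encoded as a unit.
  for (const ch of wirePath) {
    const cp = ch.codePointAt(0);
    if (cp < 0x20 || cp > 0x7e || IRI_ESCAPED_ASCII.has(ch)) {
      // encodeURIComponent yields the character's UTF-8 bytes as %XX.
      out += encodeURIComponent(ch);
    } else {
      // Letters, digits, '/', '%', and sub-delims such as ' & + @ stay literal.
      out += ch;
    }
  }
  return out;
}

console.log(toStored('café.xml'));            // caf%C3%A9.xml
console.log(toStored("quote'name.xml"));      // quote'name.xml
console.log(toStored('caf%C3%A9.xml'));       // caf%C3%A9.xml (idempotent)
console.log(toStored("späce & quøte'd.xml")); // sp%C3%A4ce%20&%20qu%C3%B8te'd.xml
```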
The existing db.cy.js tests use ASCII names only, so they pass but never
exercise the encode-on-input / decode-on-output boundary in db.xqm. Add an
awkward-names block: store, read, list, and remove resources named
"café déjà.xml" (non-ASCII + space) and "o'brien.xml" (sub-delim apostrophe,
which xmldb:store leaves literal) — all addressed by their DECODED name. Assert
the read echoes the decoded path, content is intact, and the listing shows
decoded names (no %XX), confirming the round trip.

Verified on a live instance (existdb-openapi on an ft:fields bed): both names
store and round-trip with decoded path/content and decoded listing names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
db:to-display decoded names with xmldb:decode-uri, which form-decodes "+"
to a space (the x-www-form-urlencoded convention; eXist-db/exist#1824).
But a "+" in a stored name is always a literal "+" -- spaces are stored as
%20 -- and db:to-stored (fn:iri-to-uri) leaves "+" untouched on the encode
side, so a name like "naïve+test.xml" stored correctly but read back as
"naïve test.xml".

Protect a literal "+" as %2B before xmldb:decode-uri so it decodes back to
"+", restoring symmetry with the encode side. This mirrors what
URIUtils.decodeForURI (the core fix in eXist-db/exist#6451) does, applied
at the API layer so it is correct independent of the core build, and
forward-compatible once #6451 lands. Spaces (%20) are unaffected.

Verified end-to-end against a live instance: "naïve+test.xml" stores as
na%C3%AFve+test.xml on disk, lists and reads back as naïve+test.xml. Adds
the "+" case to the Cypress awkward-name coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
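The asymmetry and the guard can be reproduced in a few lines. This sketch models xmldb:decode-uri as form-style decoding (as described above) using decodeURIComponent; formDecode and toDisplay are hypothetical helpers, and decodeURIComponent throws on a malformed %XX sequence, which the real function may handle differently:

```javascript
// Hypothetical illustration of the decode side (db:to-display), not its code.

// Model of xmldb:decode-uri: '+' becomes a space, then %XX decodes as UTF-8.
function formDecode(s) {
  return decodeURIComponent(s.replace(/\+/g, ' '));
}

// The guard: protect a literal '+' as %2B first, so it decodes back to '+'.
// Spaces are stored as %20 and are unaffected.
function toDisplay(stored) {
  return formDecode(stored.replace(/\+/g, '%2B'));
}

console.log(formDecode('na%C3%AFve+test.xml'));           // naïve test.xml (the bug)
console.log(toDisplay('na%C3%AFve+test.xml'));            // naïve+test.xml
console.log(toDisplay('caf%C3%A9%20d%C3%A9j%C3%A0.xml')); // café déjà.xml
```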
Split modules/db.xqm into a roaster-independent core (db-core.xqm) and a
thin roaster wrapper, so the same db-resource CRUD + naming-correctness
implementation can be shared in-process by other apps (e.g. eXide) without
an HTTP hop or re-auth.

db-core.xqm holds all the logic: list / get-resource / store /
create-collection / remove-resource / remove-collection / move / copy /
properties / set-permissions / sync / modules. Functions take a wire path
plus an options map and return plain maps; failures are signalled as typed
errors in the http://exist-db.org/api/db-core/error namespace
(bad-request / not-found / forbidden / conflict / server-error) rather than
HTTP responses. db:to-stored / db:to-display (the iri-to-uri encode/decode
boundary, incl. the literal-"+" guard) now live here as db-core:to-stored /
db-core:to-display — the single home for naming correctness.

db.xqm is now purely HTTP framing: each handler unpacks the roaster
$request, calls db-core, and maps a typed error to a status via a small
error-response helper. Store's 201-vs-200 (new vs existing) is conveyed by
a `created` flag the wrapper strips before responding; create-collection
keeps its 201. No api.json or route changes — behavior-preserving.

Verified against a live eXist: all 46 db Cypress tests pass, including the
awkward-name (café / o'brien / naïve+test) encode/decode coverage from eXist-db#54,
and the 400/403/404/409 error mappings match the pre-refactor responses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
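The HTTP framing described above amounts to two small mappings. A JavaScript sketch, assuming errors arrive as { namespace, code, message } objects keyed by the local names listed above and that server-error maps to 500 (the 400/403/404/409 mappings are the ones stated above); errorResponse, storeResponse, and the response body shape are hypothetical:

```javascript
// Hypothetical sketch of the roaster wrapper's error-response helper and the
// 201-vs-200 handling of db-core's `created` flag. Shapes are illustrative.
const DB_CORE_ERR_NS = 'http://exist-db.org/api/db-core/error';

// Typed db-core error (local name) -> HTTP status; 500 for server-error is assumed.
const STATUS_BY_CODE = new Map([
  ['bad-request', 400],
  ['forbidden', 403],
  ['not-found', 404],
  ['conflict', 409],
  ['server-error', 500],
]);

function errorResponse(err) {
  // Errors from outside the db-core namespace, or unknown codes, become 500.
  const status = err.namespace === DB_CORE_ERR_NS
    ? (STATUS_BY_CODE.get(err.code) ?? 500)
    : 500;
  return { status, body: { code: err.code, message: err.message } };
}

// Store results carry `created`; the wrapper turns it into 201 (new) or
// 200 (overwrite) and strips it from the response body.
function storeResponse(result) {
  const { created, ...body } = result;
  return { status: created ? 201 : 200, body };
}
```

Keeping the status table in the wrapper is what lets db-core stay HTTP-agnostic for in-process callers such as eXide.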
…th, set-mime

Adds the audit §2 items 2–6 to db-core (the single implementation), exposed
through the thin roaster wrapper and api.json, with Cypress coverage:

- item 2 — GET /api/db/resource?meta=full flattens the resource's metadata
  (owner, group, mode, acl, size, created, last-modified) alongside the
  content, sparing a second /properties round trip. Flat (not nested) to
  match both eXide's load response and our own /properties shape. Default
  (no meta) is unchanged.
- item 3 — every list item carries a `writable` boolean (sm:has-access "w"),
  a file-browser affordance evaluated authoritatively server-side rather
  than left for the client to derive from mode bits.
- item 5 — a flat (non-recursive) listing carries start/count pagination and
  a `total`; child collections then resources, sliced by start (1-based) and
  count (default: all). Tree listings are unaffected.
- item 6 — get-resource always returns runPath.
- item 4 — set MIME via POST /api/db/permissions (xmldb:set-mime-type). A
  mime incompatible with the resource's storage class (eXist only allows an
  XML-class mime on an XML resource, a binary-class mime on a binary one)
  now surfaces as a clean 400 instead of an undeclared-500 → SENR0001.

writable + pagination are always-on (per design decision): default output is
a superset of the prior shape, never a truncation. Serialization (PR eXist-db#48) and
binary streaming (eXist-db#38/eXist-db#35) remain out of scope.

Verified on a live eXist: 54 db Cypress tests pass (46 prior + 8 new),
query.cy.js stays green (routing intact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
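The pagination rule in item 5 can be sketched as a pure function. Assumptions: collections and resources arrive as two already-ordered arrays, an out-of-range start yields an empty page, and paginate and its return shape are hypothetical:

```javascript
// Hypothetical sketch of flat-listing pagination: child collections first,
// then resources; `start` is 1-based and `count` defaults to all items.
function paginate(collections, resources, start = 1, count = Infinity) {
  const items = [...collections, ...resources];
  const from = Math.max(start, 1) - 1;  // 1-based start -> 0-based offset
  const to = from + Math.max(count, 0); // Infinity when no count is given
  return { total: items.length, items: items.slice(from, to) };
}

const page = paginate(['a/', 'b/'], ['x.xml', 'y.xml', 'z.xml'], 2, 2);
console.log(page.total, page.items); // 5 [ 'b/', 'x.xml' ]
```

Because `total` always reports the unsliced length, a default request (no start/count) returns a superset of the old shape, matching the "never a truncation" decision above.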
…ace import

Lets in-process consumers (eXide's db adapter first) import db-core by
namespace alone — no absolute-path `at` hint:

    import module namespace dbc="http://exist-db.org/api/db-core";

Uses the kuberam `<xquerySet>` mechanism (the same one semver.xq uses) to
emit an `<xquery>` registration into the generated expath-pkg.xml. Three
details made it actually resolve:

- path-preserving include (directory=${basedir}, include modules/db-core.xqm,
  outputDirectory=.) so the registration `<file>` is `modules/db-core.xqm`,
  matching the deploy path — a bare basename registration fails with
  err:XQST0059 (file not found at the package root).
- db-core's transitive dependency dbutils is registered public too, and
  db-core now imports it by namespace (dropping the relative `at "dbutils.xqm"`
  hint, which has no base URI when db-core is loaded via the public-module
  registry — that path errored with "Source for module dbutils not found").
- both db-core.xqm and dbutils.xqm are excluded from the bulk modules fileSet
  so the xquerySets place them (no double-copy).

Verified on a live eXist (full XAR build + install + restart): namespace
import resolves the whole chain (to-stored / to-display / list incl. dbutils),
the app's own endpoints still work, and all 54 db Cypress tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Browse listings now carry mime-type on resource items (collections have no
mime), so a file-browser consumer (e.g. eXide's tree) gets the content type
without a per-item follow-up call. Consistent with get-resource / properties,
which already report mime-type. Additive; covered by the db Cypress suite (54
passing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…urce

Absorbs existdb-openapi#48 (feat/db-resource-serialization) into the
db-core-centric implementation, per the "db-core is the single implementation;
close conflicting older PRs" direction. db-core's get-resource now serializes
XML resources with any W3C serialization parameters the caller supplies —
method / indent / omit-xml-declaration / encoding / media-type / item-separator
— via dbc:serialization-params + dbc:yes-no (ported from eXist-db#48). Omitted
parameters fall through to the conf.xml serializer defaults, so a request with
no params is byte-for-byte identical to before; binary resources are unaffected.

The roaster wrapper passes the request parameters straight through as db-core
options (db-core reads meta + the serialization keys, ignores the rest).
expand-xincludes is deliberately NOT handled (eXist 7.0.0-beta3 can't honor it
for node-to-string serialization; task eXist-db#37) — omitted rather than silently
expanding.

api.json documents the six params + the updated 200 description; eXist-db#48's Cypress
serialization suite is ported. 59 db tests pass on a live eXist.

With this, existdb-openapi#48 can close as superseded by the db-core work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
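A sketch of the parameter handling in JavaScript, assuming boolean-valued keys accept yes/no and true/false case-insensitively (as the Cypress case "indent tolerates true/false as well as yes/no" suggests) and that any other value is rejected; serializationParams and yesNo are hypothetical stand-ins for dbc:serialization-params and dbc:yes-no:

```javascript
// Hypothetical stand-ins for dbc:serialization-params and dbc:yes-no.
const SERIALIZATION_KEYS = [
  'method', 'indent', 'omit-xml-declaration',
  'encoding', 'media-type', 'item-separator',
];
const YES_NO_KEYS = new Set(['indent', 'omit-xml-declaration']);

// Normalize yes/no and true/false (any case) to "yes" / "no"; reject others.
function yesNo(value) {
  const v = String(value).trim().toLowerCase();
  if (v === 'yes' || v === 'true') return 'yes';
  if (v === 'no' || v === 'false') return 'no';
  throw new Error(`not a yes/no value: ${value}`);
}

// Keep only the six serialization keys actually supplied, so an omitted key
// falls through to the server defaults; other keys (meta, expand-xincludes,
// ...) are ignored.
function serializationParams(requestParams) {
  const params = {};
  for (const key of SERIALIZATION_KEYS) {
    const value = requestParams[key];
    if (value === undefined) continue;
    params[key] = YES_NO_KEYS.has(key) ? yesNo(value) : value;
  }
  return params;
}

console.log(serializationParams({ indent: 'true', meta: 'full' })); // { indent: 'yes' }
```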
eXist's sm:chown/chgrp/chmod and the mode validator throw a generic
ErrorCodes.ERROR for permission-denied, bad-owner/group, and malformed-mode
alike (PermissionsFunction.java + XPathException.java default), distinguishable
only by message text — too brittle to key on $err:code. So db-core:set-permissions
now classifies by OPERATION: a chown/chgrp/chmod failure is overwhelmingly an
authorization failure -> forbidden (403); a set-mime failure is an input error
(mime incompatible with the resource's storage class) -> bad-request (400).

Verified the side-effects still fire (mode read-back after set) and 59 db tests
pass. Noted upstream for the exist-strategy session: eXist should assign distinct
error codes so this could key on $err:code instead of per-operation heuristics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
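Classifying by operation rather than by $err:code can be sketched as follows. The step list, applySteps, and the server-error fallback for an unrecognized operation are assumptions for illustration; only the chown/chgrp/chmod to forbidden and set-mime to bad-request mapping comes from the description above:

```javascript
// Hypothetical sketch: map a failed set-permissions step to a db-core error
// code by operation, because eXist raises a generic ErrorCodes.ERROR for all.
const ERROR_BY_OPERATION = new Map([
  ['chown', 'forbidden'],
  ['chgrp', 'forbidden'],
  ['chmod', 'forbidden'],
  ['set-mime', 'bad-request'],
]);

function classifyPermissionsFailure(operation) {
  // An unrecognized operation falls back to server-error (assumption).
  return ERROR_BY_OPERATION.get(operation) ?? 'server-error';
}

// Run [operation, fn] steps in order; the first failure is re-thrown with a
// typed code derived from the operation, not from the message text.
function applySteps(steps) {
  for (const [operation, run] of steps) {
    try {
      run();
    } catch (e) {
      const typed = new Error(e.message);
      typed.code = classifyPermissionsFailure(operation);
      typed.operation = operation;
      throw typed;
    }
  }
}
```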
…t-db#38)

Adds GET/PUT /api/db/resource/{path} — a binary-safe, roaster-native resource
transport, the clean alternative to the JSON-envelope /api/db/resource (which is
text/base64-only). Closes the binary side of eXist-db#35/eXist-db#38 without a controller
workaround (Juri's concern about an /api/* side door).

- GET streams a binary resource's raw bytes with response:stream-binary (NOT via
  the serializer, which would emit base64 text and corrupt it — the established
  roaster pattern, the same path eXist's REST server uses); an XML/text resource
  is returned as a node so roaster serializes it once with its stored mime
  (returning a pre-serialized string would be XML-escaped a second time).
- PUT stores the raw request body via db-core (binary bodies arrive intact;
  mime inferred from the name), returning { stored, runPath } (201/200).

No exist-core or roaster change needed — binary transport works on stock eXist
via the existing response:stream-binary. (Zero-copy streaming of very large
stored binaries is a separate exist-core optimization, tracked with
exist-strategy.) Self-contained Cypress coverage (raw round-trip not
base64-mangled, 201/200 + stored/runPath, XML serialized with mime, 404); full
db suite stays green (59 + 4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

joewiz commented Jun 11, 2026 (Member, Author)

[This response was co-authored with Claude Code. -Joe]

@line-o, this picks up your request from #38 ("I would rather like to see us come up with a way to serve binaries from roaster efficiently"). Your instinct that "efficiently" was the operative word turned out to be exactly right, and it has shaped a two-part plan: this PR is part 1.

The key finding (download side): binary serving was never actually broken in Roaster — response:stream-binary is the working pattern (it's what Roaster's own test/app uses to stream uploads back). The corruption I'd seen was a handler calling roaster:response(…, util:binary-doc(…)), which runs the value through the serializer and emits its base64 text. So no Roaster change was needed for functional serving. This PR (#56) does it the Roaster-native way: a path-in-URL GET/PUT /api/db/resource/{path} that streams via response:stream-binary — superseding #38's controller workaround (no /api/* side door).

The real gap is the one you pointed at — heap materialization for large files, and it splits cleanly:

  • Part 1 — download (this PR + core). Functional now via response:stream-binary. For efficiency, eXist-db/exist#6466 adds response:stream-binary-resource($path, $content-type, $filename?), a zero-copy download straight from the stored BinaryDocument (broker.readBinaryResource writes to the response OutputStream; no base64Binary is materialized). When it lands, this PR's GET handler swaps its two-step util:binary-doc(…) => response:stream-binary(…) for that one call.

  • Part 2 — upload. Same shape of finding: the streaming pieces already exist; the heap blow-up is one unnecessary call. request:get-data() already hands back a streaming, disk-cacheable BinaryValue (a CachingFilterInputStream, not a heap byte[]), and LocalBinaryResource.setContent(BinaryValue) can store it without materializing — but xmldb:store-as-binary calls .toJavaObject() first (XMLDBStore.java:189), pulling the whole upload into a heap byte[]. Level 1 (the recommended upload fix): pass the BinaryValue through to setContent instead — every xmldb:store-as-binary($c, $n, request:get-data()) then streams through the (disk-capable) cache. No new API, no Roaster coupling — a ~one-line core change with a careful test (remote-resource path + multi-read + large-binary no-OOM). Level 2 — a request:get-input-stream() primitive for true zero-copy — is deferred: it needs the raw stream before the body is read, which couples to an x-roaster-raw-body opt-out (a Roaster concern I'd want to design with you), and it only saves the disk-cache copy at extreme scale.

So: efficient binary serving = a clean Roaster-native surface (this PR) + zero-copy download (eXist-db/exist#6466) + a one-line streaming-store upload fix (part 2). Notably none of it needs a Roaster change for correctness — the x-roaster-raw-body opt-out only enters at Level 2, and I'd bring that to you as its own design conversation when we get there. Wanted you to see how #38's ask is being addressed end to end rather than worked around.

// Serialization parameters on GET /api/db/resource (existdb-openapi#48,
// folded into db-core's get-resource). XML resources honor the W3C
// serialization params; omitted params defer to conf.xml defaults.
describe('GET /api/db/resource — serialization parameters (oxex / #48)', () => {

Member

oxex?

Member Author

Sorry, Juri, that was some shorthand for existdb-oxygen-plugin (based on my previous attempt to generalize hsg-project: https://github.com/joewiz/oxex). I considered naming the plugin oxex but so far have stuck with the more verbose name.

});
});

it('indent=yes pretty-prints; indent=no does not', () => {

Member

this case should be split into two

});
});

it('indent tolerates true/false as well as yes/no', () => {

Member

see above

});
});

it('omit-xml-declaration=no includes the XML declaration; =yes omits it', () => {

Member

see above

Comment thread modules/api.json
"responses": {
"200": {
"description": "Resource content with metadata",
"description": "Resource content (always includes runPath). With meta=full, the resource's metadata fields are flattened in alongside the content. XML resources are serialized using any serialization parameters supplied (method/indent/omit-xml-declaration/encoding/media-type/item-separator); omitted parameters fall through to the conf.xml serializer defaults, so a request with no serialization parameters is byte-for-byte identical to before. NOTE: eXist's `expand-xincludes` is not yet supported here — as of eXist 7.0.0-beta3 it cannot be honored for node-to-string serialization, so it is intentionally omitted rather than silently expanding.",

Member

What is the runPath property for?
Why would any metadata be returned when the resource itself is queried?
If one would have metadata alongside the content I would expect it to be part of the header.

Member

I should have asked this earlier, but I missed it. I will do some manual tests of these endpoints in order to get a better understanding of their behaviour and hope to better understand the intent.

Comment thread modules/api.json
}
}
},
"/api/db/resource/{path}": {

Member

We will end up with a path like /api/db/resource/db/apps/my-app/data/test.xml.
Can we get away with just /api/resource/{path}?

joewiz commented Jun 11, 2026 (Member, Author)

@line-o Great questions! They've prompted some serious reflection. An overhaul is in progress. ;)


2 participants