feat(db): path-in-URL raw binary resource transport (#35/#38)#56
joewiz wants to merge 10 commits into
The /api/db endpoints spoke eXist's stored, percent-encoded form on the
wire: listings returned "caf%C3%A9.xml", so the existdb-oxygen-plugin and
other clients displayed encoded names and could not tell which form was
canonical. eXide and TEI Publisher already decode for display and encode
before storage; this brings existdb-openapi in line.
Every handler now encodes the incoming wire path once (db:to-stored)
before any doc()/collection()/xmldb:* call, and decodes every name and
path leaving the API (db:to-display).
fn:iri-to-uri is used inbound (not xmldb:encode) because it matches
eXist's own storage escaping: it percent-encodes spaces and non-ASCII but
leaves sub-delims and "@" (' & + @) and existing %XX untouched. Verified against a
live instance: xmldb:store("café.xml") -> caf%C3%A9.xml and
store("quote'name.xml") -> quote'name.xml (literal), while xmldb:store
throws outright on a raw space. iri-to-uri reproduces store's output where
store succeeds, additionally encodes the space store rejects, and is
idempotent on already-encoded paths -- so older encoded-path clients keep
working. xmldb:encode would fully percent-encode the sub-delims per RFC 3986
and so fail to resolve names that xmldb:store left literal.
Proven end-to-end with "späce & quøte'd.xml" (space + sub-delim +
non-ASCII): store -> list shows decoded -> get by decoded path returns
content -> delete.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
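The inbound escaping rule described in this commit can be modelled outside eXist. What follows is a minimal JavaScript sketch, not the XQuery implementation: `iriToUri` is a hypothetical helper that approximates `fn:iri-to-uri` as the XPath Functions and Operators specification describes it (escape everything outside printable ASCII, plus a handful of excluded ASCII characters, and leave `%` alone). It is only meant to show why the encoding leaves `'`, `&`, `+` and `@` literal and why applying it twice changes nothing.

```javascript
// Hypothetical JS model of fn:iri-to-uri; not the eXist implementation.
// Printable ASCII characters (0x21-0x7E) that are escaped anyway.
const ALWAYS_ESCAPED = new Set(['<', '>', '"', '{', '}', '|', '\\', '^', '`']);

function iriToUri(path) {
  const utf8 = new TextEncoder();
  let out = '';
  for (const ch of path) {                 // iterates by code point
    const cp = ch.codePointAt(0);
    const printableAscii = cp >= 0x21 && cp <= 0x7e;
    if (printableAscii && !ALWAYS_ESCAPED.has(ch)) {
      out += ch;                           // keeps ' & + @ and an existing '%'
    } else {
      // Space, control characters and non-ASCII: percent-encode the UTF-8 bytes.
      for (const byte of utf8.encode(ch)) {
        out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
      }
    }
  }
  return out;
}

console.log(iriToUri('café.xml'));
console.log(iriToUri("quote'name.xml"));
console.log(iriToUri('a b.xml'));
console.log(iriToUri(iriToUri('café déjà.xml')));
```

Because `%` passes through untouched, a client that still sends an already-encoded path gets the same stored path as one that sends the decoded form.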
The existing db.cy.js tests use ASCII names only, so they pass but never exercise the encode-on-input / decode-on-output boundary in db.xqm. Add an awkward-names block: store, read, list, and remove resources named "café déjà.xml" (non-ASCII + space) and "o'brien.xml" (sub-delim apostrophe, which xmldb:store leaves literal) — all addressed by their DECODED name. Assert the read echoes the decoded path, content is intact, and the listing shows decoded names (no %XX), confirming the round trip. Verified on a live instance (existdb-openapi on an ft:fields bed): both names store and round-trip with decoded path/content and decoded listing names. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
db:to-display decoded names with xmldb:decode-uri, which form-decodes "+" to a space (the x-www-form-urlencoded convention; eXist-db/exist#1824). But a "+" in a stored name is always a literal "+" -- spaces are stored as %20 -- and db:to-stored (fn:iri-to-uri) leaves "+" untouched on the encode side, so a name like "naïve+test.xml" stored correctly but read back as "naïve test.xml". Protect a literal "+" as %2B before xmldb:decode-uri so it decodes back to "+", restoring symmetry with the encode side. This mirrors what URIUtils.decodeForURI (the core fix in eXist-db/exist#6451) does, applied at the API layer so it is correct independent of the core build, and forward-compatible once #6451 lands. Spaces (%20) are unaffected. Verified end-to-end against a live instance: "naïve+test.xml" stores as na%C3%AFve+test.xml on disk, lists and reads back as naïve+test.xml. Adds the "+" case to the Cypress awkward-name coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
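The asymmetry this commit fixes is easy to reproduce in isolation. The sketch below is a JavaScript model under one stated assumption: `formDecode` stands in for `xmldb:decode-uri` and treats "+" as a space, which is the behaviour the commit message describes. `toDisplay` is a hypothetical name for the guarded decode.

```javascript
// Assumed model of xmldb:decode-uri: form-style decoding, "+" becomes a space.
function formDecode(stored) {
  return decodeURIComponent(stored.replace(/\+/g, ' '));
}

// Guarded decode: protect a literal "+" as %2B first so it survives decoding.
function toDisplay(stored) {
  return formDecode(stored.replace(/\+/g, '%2B'));
}

const storedName = 'na%C3%AFve+test.xml';
console.log(formDecode(storedName)); // the "+" is lost
console.log(toDisplay(storedName));  // the "+" is preserved
```

Spaces are stored as `%20`, so the guard never has to distinguish a "real" space from a plus sign.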
Split modules/db.xqm into a roaster-independent core (db-core.xqm) and a thin roaster wrapper, so the same db-resource CRUD + naming-correctness implementation can be shared in-process by other apps (e.g. eXide) without an HTTP hop or re-auth. db-core.xqm holds all the logic: list / get-resource / store / create-collection / remove-resource / remove-collection / move / copy / properties / set-permissions / sync / modules. Functions take a wire path plus an options map and return plain maps; failures are signalled as typed errors in the http://exist-db.org/api/db-core/error namespace (bad-request / not-found / forbidden / conflict / server-error) rather than HTTP responses. db:to-stored / db:to-display (the iri-to-uri encode/decode boundary, incl. the literal-"+" guard) now live here as db-core:to-stored / db-core:to-display — the single home for naming correctness. db.xqm is now purely HTTP framing: each handler unpacks the roaster $request, calls db-core, and maps a typed error to a status via a small error-response helper. Store's 201-vs-200 (new vs existing) is conveyed by a `created` flag the wrapper strips before responding; create-collection keeps its 201. No api.json or route changes — behavior-preserving. Verified against a live eXist: all 46 db Cypress tests pass, including the awkward-name (café / o'brien / naïve+test) encode/decode coverage from eXist-db#54, and the 400/403/404/409 error mappings match the pre-refactor responses. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
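The division of labour between the core and the wrapper can be pictured with a small JavaScript model. Everything here is illustrative rather than the XQuery code: `DbCoreError`, `handleStore` and the response body shape are hypothetical, and mapping `server-error` to 500 is an assumption (the text only confirms the 400/403/404/409 mappings).

```javascript
// Hypothetical stand-in for db-core's typed errors.
class DbCoreError extends Error {
  constructor(kind, message) {
    super(message);
    this.kind = kind;
  }
}

// 400/403/404/409 come from the commit message; 500 is an assumption.
const STATUS = {
  'bad-request': 400,
  'forbidden': 403,
  'not-found': 404,
  'conflict': 409,
  'server-error': 500,
};

// Thin wrapper: call the core, strip the `created` flag, map typed errors.
function handleStore(coreStore, path, body) {
  try {
    const { created, ...payload } = coreStore(path, body);
    return { status: created ? 201 : 200, body: payload };
  } catch (e) {
    if (e instanceof DbCoreError) {
      return { status: STATUS[e.kind] ?? 500, body: { error: e.message } };
    }
    throw e; // anything untyped is not the wrapper's to interpret
  }
}
```

An in-process consumer would call the core function directly and handle the typed error itself, which is the point of keeping HTTP framing out of the core.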
…th, set-mime
Adds the audit §2 items 2–6 to db-core (the single implementation), exposed through the thin roaster wrapper and api.json, with Cypress coverage:
- item 2: GET /api/db/resource?meta=full flattens the resource's metadata (owner, group, mode, acl, size, created, last-modified) alongside the content, sparing a second /properties round trip. Flat (not nested) to match both eXide's load response and our own /properties shape. Default (no meta) is unchanged.
- item 3: every list item carries a `writable` boolean (sm:has-access "w"), a file-browser affordance evaluated authoritatively server-side rather than left for the client to derive from mode bits.
- item 5: a flat (non-recursive) listing carries start/count pagination and a `total`; child collections come first, then resources, sliced by start (1-based) and count (default: all). Tree listings are unaffected.
- item 6: get-resource always returns runPath.
- item 4: set MIME via POST /api/db/permissions (xmldb:set-mime-type). A mime incompatible with the resource's storage class (eXist only allows an XML-class mime on an XML resource, a binary-class mime on a binary one) now surfaces as a clean 400 instead of an undeclared-500 → SENR0001.
writable + pagination are always-on (per design decision): default output is a superset of the prior shape, never a truncation. Serialization (PR eXist-db#48) and binary streaming (eXist-db#38/eXist-db#35) remain out of scope.
Verified on a live eXist: 54 db Cypress tests pass (46 prior + 8 new), query.cy.js stays green (routing intact).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
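The pagination rule for flat listings (collections first, then resources, 1-based `start`, `count` defaulting to everything) can be sketched as below. This is a JavaScript model with a hypothetical function name and return shape; only the ordering, the 1-based start, the default count and the `total` field are taken from the commit message.

```javascript
// Hypothetical model of the flat-listing slice described above.
function pageListing(collections, resources, start = 1, count) {
  const all = [...collections, ...resources];        // collections, then resources
  const from = Math.max(start, 1) - 1;               // start is 1-based
  const to = count === undefined ? all.length : from + count;
  return { total: all.length, start, items: all.slice(from, to) };
}

console.log(pageListing(['a', 'b'], ['x.xml', 'y.xml', 'z.xml'], 2, 3));
console.log(pageListing(['a'], ['x.xml']));
```

With no `start` or `count` the result is the full listing plus a `total`, which is what makes the new output a superset of the old shape rather than a truncation.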
…ace import
Lets in-process consumers (eXide's db adapter first) import db-core by
namespace alone — no absolute-path `at` hint:
import module namespace dbc="http://exist-db.org/api/db-core";
Uses the kuberam `<xquerySet>` mechanism (the same one semver.xq uses) to
emit an `<xquery>` registration into the generated expath-pkg.xml. Three
details made it actually resolve:
- path-preserving include (directory=${basedir}, include modules/db-core.xqm,
outputDirectory=.) so the registration `<file>` is `modules/db-core.xqm`,
matching the deploy path — a bare basename registration fails with
err:XQST0059 (file not found at the package root).
- db-core's transitive dependency dbutils is registered public too, and
db-core now imports it by namespace (dropping the relative `at "dbutils.xqm"`
hint, which has no base URI when db-core is loaded via the public-module
registry — that path errored with "Source for module dbutils not found").
- both db-core.xqm and dbutils.xqm are excluded from the bulk modules fileSet
so the xquerySets place them (no double-copy).
Verified on a live eXist (full XAR build + install + restart): namespace
import resolves the whole chain (to-stored / to-display / list incl. dbutils),
the app's own endpoints still work, and all 54 db Cypress tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Browse listings now carry mime-type on resource items (collections have no mime), so a file-browser consumer (e.g. eXide's tree) gets the content type without a per-item follow-up call. Consistent with get-resource / properties, which already report mime-type. Additive; covered by the db Cypress suite (54 passing). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…urce Absorbs existdb-openapi#48 (feat/db-resource-serialization) into the db-core-centric implementation, per the "db-core is the single implementation; close conflicting older PRs" direction. db-core's get-resource now serializes XML resources with any W3C serialization parameters the caller supplies — method / indent / omit-xml-declaration / encoding / media-type / item-separator — via dbc:serialization-params + dbc:yes-no (ported from eXist-db#48). Omitted parameters fall through to the conf.xml serializer defaults, so a request with no params is byte-for-byte identical to before; binary resources are unaffected. The roaster wrapper passes the request parameters straight through as db-core options (db-core reads meta + the serialization keys, ignores the rest). expand-xincludes is deliberately NOT handled (eXist 7.0.0-beta3 can't honor it for node-to-string serialization; task eXist-db#37) — omitted rather than silently expanding. api.json documents the six params + the updated 200 description; eXist-db#48's Cypress serialization suite is ported. 59 db tests pass on a live eXist. With this, existdb-openapi#48 can close as superseded by the db-core work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
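How the wrapper's pass-through and the yes/no normalization fit together can be sketched in JavaScript. The function names mirror `dbc:serialization-params` and `dbc:yes-no` but are hypothetical models: applying the true/false tolerance to `omit-xml-declaration` as well as `indent`, and throwing on an unrecognized value, are assumptions not stated in the text.

```javascript
// The six parameters named in the commit message.
const SERIALIZATION_KEYS = [
  'method', 'indent', 'omit-xml-declaration',
  'encoding', 'media-type', 'item-separator',
];
// Assumption: both boolean-valued parameters accept true/false as well as yes/no.
const BOOLEAN_KEYS = new Set(['indent', 'omit-xml-declaration']);

function yesNo(value) {
  const v = String(value).toLowerCase();
  if (v === 'yes' || v === 'true') return 'yes';
  if (v === 'no' || v === 'false') return 'no';
  throw new Error(`not a yes/no value: ${value}`); // assumed failure mode
}

// Pick only the serialization keys; omitted ones are left out entirely so
// the serializer defaults apply, and unrelated keys (e.g. meta) are ignored.
function serializationParams(query) {
  const params = {};
  for (const key of SERIALIZATION_KEYS) {
    if (query[key] !== undefined) {
      params[key] = BOOLEAN_KEYS.has(key) ? yesNo(query[key]) : query[key];
    }
  }
  return params;
}

console.log(serializationParams({ indent: 'true', method: 'xml', meta: 'full' }));
```

An empty result is the interesting case: it is what keeps a request with no parameters identical to the pre-change behaviour.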
eXist's sm:chown/chgrp/chmod and the mode validator throw a generic ErrorCodes.ERROR for permission-denied, bad-owner/group, and malformed-mode alike (PermissionsFunction.java + XPathException.java default), distinguishable only by message text — too brittle to key on $err:code. So db-core:set-permissions now classifies by OPERATION: a chown/chgrp/chmod failure is overwhelmingly an authorization failure -> forbidden (403); a set-mime failure is an input error (mime incompatible with the resource's storage class) -> bad-request (400). Verified the side-effects still fire (mode read-back after set) and 59 db tests pass. Noted upstream for the exist-strategy session: eXist should assign distinct error codes so this could key on $err:code instead of per-operation heuristics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
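Classifying by operation rather than by error code amounts to a lookup on what was being attempted. A minimal JavaScript model follows; `classifyFailure` is a hypothetical name, and the fallback to `server-error` for any other operation is an assumption.

```javascript
// Hypothetical model: the failing operation, not $err:code, decides the status.
function classifyFailure(operation) {
  switch (operation) {
    case 'chown':
    case 'chgrp':
    case 'chmod':
      return { kind: 'forbidden', status: 403 };    // most likely authorization
    case 'set-mime':
      return { kind: 'bad-request', status: 400 };  // mime vs storage class
    default:
      return { kind: 'server-error', status: 500 }; // assumed fallback
  }
}

console.log(classifyFailure('chmod'));
console.log(classifyFailure('set-mime'));
```

The trade-off is explicit in the commit message: a malformed mode sent to chmod is reported as 403 under this heuristic, which is why distinct upstream error codes would be preferable.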
…t-db#38)
Adds GET/PUT /api/db/resource/{path}: a binary-safe, roaster-native resource transport, the clean alternative to the JSON-envelope /api/db/resource (which is text/base64-only). Closes the binary side of eXist-db#35/eXist-db#38 without a controller workaround (Juri's concern about an /api/* side door).
- GET streams a binary resource's raw bytes with response:stream-binary (NOT via the serializer, which would emit base64 text and corrupt it; this is the established roaster pattern, the same path eXist's REST server uses); an XML/text resource is returned as a node so roaster serializes it once with its stored mime (returning a pre-serialized string would be XML-escaped a second time).
- PUT stores the raw request body via db-core (binary bodies arrive intact; mime inferred from the name), returning { stored, runPath } (201/200).
No exist-core or roaster change needed: binary transport works on stock eXist via the existing response:stream-binary. (Zero-copy streaming of very large stored binaries is a separate exist-core optimization, tracked with exist-strategy.)
Self-contained Cypress coverage (raw round-trip not base64-mangled, 201/200 + stored/runPath, XML serialized with mime, 404); full db suite stays green (59 + 4).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
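From a client's point of view the new transport is a plain PUT followed by a plain GET on the same URL. The sketch below shows that shape in JavaScript using the standard `fetch` API. It rests on several assumptions: the base URL and any auth headers are placeholders, `rawResourceUrl` and `roundTrip` are hypothetical helpers, and percent-encoding each path segment client-side is simply ordinary URL hygiene, not something the PR specifies.

```javascript
// Build /api/db/resource/{path} from a decoded database path.
// Assumption: each segment is percent-encoded for the URL by the client.
function rawResourceUrl(base, dbPath) {
  const segments = dbPath.split('/').filter(Boolean).map(encodeURIComponent);
  return `${base}/api/db/resource/${segments.join('/')}`;
}

// PUT raw bytes, GET them back, and compare byte for byte.
// `headers` would carry whatever authentication the deployment needs.
async function roundTrip(base, dbPath, bytes, headers = {}) {
  const url = rawResourceUrl(base, dbPath);
  const put = await fetch(url, { method: 'PUT', headers, body: bytes });
  const get = await fetch(url, { headers });
  const echoed = new Uint8Array(await get.arrayBuffer());
  const identical =
    echoed.length === bytes.length && echoed.every((b, i) => b === bytes[i]);
  return { putStatus: put.status, getStatus: get.status, identical };
}

console.log(rawResourceUrl('http://localhost:8080/exist/apps/example', '/db/apps/my-app/data/test.xml'));
```

Per the commit message, a first `roundTrip` call against a live instance would be expected to report a 201 for the PUT and a second call a 200; the sketch does not assert that itself.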
[This response was co-authored with Claude Code. -Joe]
@line-o: this picks up your request from #38 ("I would rather like to see us come up with a way to serve binaries from roaster efficiently"). Your instinct that "efficiently" was the operative word turned out to be exactly right, and it has shaped a two-part plan; this PR is part 1.
The key finding (download side): binary serving was never actually broken in Roaster. The real gap is the one you pointed at, heap materialization for large files, and it splits cleanly:
So: efficient binary serving = a clean Roaster-native surface (this PR) + zero-copy download (eXist-db/exist#6466) + a one-line streaming-store upload fix (part 2). Notably, none of it needs a Roaster change for correctness.
// Serialization parameters on GET /api/db/resource (existdb-openapi#48,
// folded into db-core's get-resource). XML resources honor the W3C
// serialization params; omitted params defer to conf.xml defaults.
describe('GET /api/db/resource — serialization parameters (oxex / #48)', () => {
Sorry, Juri, that was some shorthand for existdb-oxygen-plugin (based on my previous attempt to generalize hsg-project: https://github.com/joewiz/oxex). I considered naming the plugin oxex but so far have stuck with the more verbose name.
});
});

it('indent=yes pretty-prints; indent=no does not', () => {
this case should be split into two
});
});

it('indent tolerates true/false as well as yes/no', () => {
});
});

it('omit-xml-declaration=no includes the XML declaration; =yes omits it', () => {
"responses": {
"200": {
"description": "Resource content with metadata",
"description": "Resource content (always includes runPath). With meta=full, the resource's metadata fields are flattened in alongside the content. XML resources are serialized using any serialization parameters supplied (method/indent/omit-xml-declaration/encoding/media-type/item-separator); omitted parameters fall through to the conf.xml serializer defaults, so a request with no serialization parameters is byte-for-byte identical to before. NOTE: eXist's `expand-xincludes` is not yet supported here — as of eXist 7.0.0-beta3 it cannot be honored for node-to-string serialization, so it is intentionally omitted rather than silently expanding.",
What is the runPath property for?
Why would any metadata be returned when the resource itself is queried?
If one would have metadata alongside the content I would expect it to be part of the header.
I should have asked this earlier, but I missed it. I will do some manual tests of these endpoints to get a better understanding of their behaviour and, I hope, of the intent.
}
}
},
"/api/db/resource/{path}": {
We will end up with paths like /api/db/resource/db/apps/my-app/data/test.xml.
Can we get away with just /api/resource/{path} ?
@line-o Great questions! They've prompted some serious reflection. An overhaul is in progress. ;)
[This PR was co-authored with Claude Code. -Joe]
Summary
Adds `GET/PUT /api/db/resource/{path}`: a binary-safe, path-in-URL resource transport, the clean alternative to the JSON-envelope `/api/db/resource` (which is text/base64-only). This is the roaster-native way to move binary resources (no controller workaround, no `/api/*` side door), addressing #35 and superseding #38.

How it works
- GET streams a binary resource's raw bytes with `response:stream-binary`, not via the serializer, which would emit the value's base64 text and corrupt it. This is the established roaster pattern (roaster's own test app uses it; it's the same raw-bytes path eXist's REST server takes). An XML/text resource is returned as a node so roaster serializes it once with its stored mime (a pre-serialized string would be XML-escaped a second time).
- PUT stores the raw request body via db-core (binary bodies arrive intact; mime inferred from the name), returning `{ stored, runPath }` (201 on create, 200 on overwrite).

No exist-core or roaster change is needed: binary transport works on stock eXist today via the existing `response:stream-binary`. (The investigation behind this is why there is no Roaster PR: binary was never broken in roaster; `roaster:response(mime, bytes)` just isn't the way to return binary.)

Follow-up: zero-copy for large files
`response:stream-binary` still materializes the whole `base64Binary`. eXist-core PR eXist-db/exist#6466 adds `response:stream-binary-resource($path, $content-type, $filename?)` (broker-level zero-copy, no heap materialization). When that lands in a release, `db:get-resource-raw`'s `util:binary-doc(…) => response:stream-binary(…)` becomes a single `response:stream-binary-resource($stored, …)` call. Functional transport here does not wait on it.

Testing
Self-contained Cypress suite (`db-binary.cy.js`): raw round-trip is not base64-mangled, 201/200 + `stored`/`runPath`, XML served with its mime, 404. Full db suite stays green (59 + 4). Verified byte-identical over HTTP on stock eXist (a PNG round-trips intact).

Supersedes #38; closes the binary half of #35 on merge.
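Why routing binary content through a text serializer corrupts it can be shown without a server. The sketch below is a standalone Node.js illustration, not code from the PR: it takes the 8-byte PNG signature, renders it as base64 text the way a serializer would render an `xs:base64Binary` value, and compares what would go over the wire in each case.

```javascript
// The 8-byte PNG file signature.
const png = Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// What a text serializer would emit for the binary value: base64 characters.
const asBase64Text = Buffer.from(png).toString('base64');
const bytesIfSerialized = new TextEncoder().encode(asBase64Text);

// What a raw stream sends: the bytes themselves.
const bytesIfStreamed = png;

console.log(asBase64Text);              // "iVBORw0KGgo="
console.log(bytesIfSerialized.length);  // 12
console.log(bytesIfStreamed.length);    // 8
```

A client that saves the 12 serialized bytes as `image.png` gets a text file, not an image, which is the "base64-mangled" failure the Cypress suite guards against.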