[feature] compression: add serialization-options argument to compression:zip / compression:tar #6493
…amilies
A map{...} constructor builds a homogeneously-typed MapType that recorded a
concrete keyType. MapType.get()/contains() then ran convert() on the lookup
key, casting it to keyType (e.g. integer 12 -> string "12") before delegating
to the underlying map. This conflated keys of different op:same-key families
that share a lexical value: map{"12":"x"}(12) wrongly returned "x", and
map{"2020-01-01":1}(xs:date("2020-01-01")) wrongly matched. It also conflated
within the numeric family by lossy cast: map{1:1}(1.5) wrongly matched.
Per XQuery 3.1 section 17.1, map keys compare via op:same-key, a type-group
comparison with no key casting: the numeric family interchanges, the string
family (xs:string/xs:anyURI/xs:untypedAtomic) interchanges, every other type
matches only itself.
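
As a rough illustration of the family rule described above, the following sketch models op:same-key with plain Java types. Everything in it is an assumption made for illustration: the class name SameKeySketch, the use of java.lang.Number for the numeric family, CharSequence for the string family and LocalDate as a stand-in for xs:date. It is not eXist's AbstractMapType.sameKey, and it ignores NaN and the infinities.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

// Illustrative only: a hypothetical model of op:same-key grouping,
// not eXist's AbstractMapType.sameKey. NaN and infinities are not handled.
final class SameKeySketch {

    enum Family { NUMERIC, STRING, OTHER }

    static Family familyOf(Object key) {
        if (key instanceof Number) return Family.NUMERIC;
        // Stands in for xs:string / xs:anyURI / xs:untypedAtomic.
        if (key instanceof CharSequence) return Family.STRING;
        return Family.OTHER;
    }

    static boolean sameKey(Object a, Object b) {
        Family family = familyOf(a);
        // Keys from different families never match: no casting is attempted.
        if (family != familyOf(b)) return false;
        switch (family) {
            case NUMERIC:
                // Compare by value, so 5 and 5.0 match but 1 and 1.5 do not.
                return new BigDecimal(a.toString())
                        .compareTo(new BigDecimal(b.toString())) == 0;
            case STRING:
                return a.toString().equals(b.toString());
            default:
                // Every other type matches only itself.
                return a.getClass() == b.getClass() && a.equals(b);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameKey("12", 12));   // false: string vs numeric
        System.out.println(sameKey(1, 1.5));     // false: no lossy cast
        System.out.println(sameKey(5, 5.0));     // true: numeric family
        System.out.println(sameKey("2020-01-01", LocalDate.parse("2020-01-01"))); // false
    }
}
```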
convert() is now redundant. The underlying map's comparator already implements
op:same-key (AbstractMapType.sameKey), and NumericValue/string-family hashCode
are canonical, so cross-type-within-family pairs (xs:integer(5) / xs:double(5.0),
xs:string / xs:anyURI) already hash into the same bucket. Removing convert()
makes MapType behave like SingleKeyMapType, which already compares with sameKey
directly. keyType is retained only for the getKeyType() API.
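
The point about canonical hashing can be sketched as follows. This is a hypothetical key wrapper (CanonicalKey is an invented name, not an eXist class) that normalises numeric keys so that equal values hash into the same bucket, which is the property that makes a pre-lookup convert() unnecessary. It assumes finite numeric values only.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper, for illustration only: equal numeric values share one
// canonical form, so equals() and hashCode() agree across Integer and Double.
final class CanonicalKey {
    private final Object canonical;

    CanonicalKey(Object raw) {
        if (raw instanceof Number) {
            // stripTrailingZeros() maps both 5 and 5.0 to the same BigDecimal.
            canonical = new BigDecimal(raw.toString()).stripTrailingZeros();
        } else {
            canonical = raw;
        }
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CanonicalKey
                && canonical.equals(((CanonicalKey) other).canonical);
    }

    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    public static void main(String[] args) {
        Map<CanonicalKey, String> map = new HashMap<>();
        map.put(new CanonicalKey(5), "five");
        System.out.println(map.get(new CanonicalKey(5.0))); // five
        System.out.println(map.get(new CanonicalKey("5"))); // null
    }
}
```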
XQTS misses this because qt3/qt4 place their cross-family distinctness tests on
map:entry() (a MIXED-key map, where convert() was a no-op), never on the
constructor form. New XQSuite tests (mapKeySameKey.xqm) cover the constructor
form: cross-family distinctness plus within-family positives that must still
interchange.
Coupling: this closes the non-conformant serialization "bare-string backdoor".
eXist serialization parameters are looked up by their exist: QName; a user could
also pass them as the prefixed string "exist:expand-xincludes" only because
convert() coerced the QName lookup to a string. With the fix, QName and string
are distinct keys, so only the spec-aligned QName form is honored. The affected
tests (serialize.xql, fnSerializeNewline.xqm, file module sync-serialize.xqm)
are updated: the prefixed-string cases now assert the key is ignored, and the
real functionality is retained via the QName form.
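
The "bare-string backdoor" can be pictured with a small sketch. It models the user-supplied options map as a java.util.Map and the eXist parameter name as a javax.xml.namespace.QName; the class and method names are hypothetical, and the coercing lookup only imitates the effect the old convert() had, it is not the removed code.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Illustration only: why a prefixed string key used to match a QName lookup.
final class BareStringBackdoorSketch {

    static final QName EXPAND_XINCLUDES =
            new QName("http://exist.sourceforge.net/NS/exist", "expand-xincludes", "exist");

    // Imitates the old behaviour: the QName lookup key is coerced to a string first.
    static String lookupWithCoercion(Map<Object, String> options, QName name) {
        return options.get(name.getPrefix() + ":" + name.getLocalPart());
    }

    // Models the fixed behaviour: the QName is looked up as a QName.
    static String lookupSameKey(Map<Object, String> options, QName name) {
        return options.get(name);
    }

    public static void main(String[] args) {
        Map<Object, String> stringKeyed = new HashMap<>();
        stringKeyed.put("exist:expand-xincludes", "no");

        Map<Object, String> qnameKeyed = new HashMap<>();
        qnameKeyed.put(EXPAND_XINCLUDES, "no");

        System.out.println(lookupWithCoercion(stringKeyed, EXPAND_XINCLUDES)); // no
        System.out.println(lookupSameKey(stringKeyed, EXPAND_XINCLUDES));      // null
        System.out.println(lookupSameKey(qnameKeyed, EXPAND_XINCLUDES));       // no
    }
}
```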
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
compression:zip and compression:tar serialized XML resources written to
the archive using only the serializer defaults plus any prolog
"declare option exist:serialize"; there was no way to control
serialization per call. Add an optional 5th argument of serialization
options, mirroring fn:serialize and file:sync#3.
As with fn:serialize, the argument accepts either form:
- a map(*), e.g. map { "indent": false() }; or
- a W3C output:serialization-parameters element.
Standard W3C parameters use their string key (map) or output-namespace
child element; eXist extension parameters use their exist-namespace
QName key (map) or exist-namespace child element. Parsing is delegated
to FunSerialize.getSerializationProperties so both forms behave exactly
as they do for fn:serialize. The result is applied after (so overriding)
the prolog declare option exist:serialize, and re-read on each call.
The new arity is registered for both zip and tar in CompressionModule.
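
The precedence described above (serializer defaults, then the prolog option, then the per-call argument) might be sketched like this. The class and method names are hypothetical and the property values are examples; the actual merge lives in AbstractCompressFunction and may be structured differently.

```java
import java.util.Properties;

// Hypothetical sketch of the layering order; not the eXist implementation.
final class SerializationLayersSketch {

    static Properties effective(Properties defaults, Properties prologOption, Properties perCall) {
        Properties out = new Properties();
        out.putAll(defaults);      // lowest precedence
        out.putAll(prologOption);  // "declare option exist:serialize"
        if (perCall != null) {
            out.putAll(perCall);   // the 5th argument is applied last, so it wins
        }
        return out;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("indent", "yes");
        defaults.setProperty("omit-xml-declaration", "no");

        Properties prolog = new Properties();
        prolog.setProperty("omit-xml-declaration", "yes");

        Properties perCall = new Properties();
        perCall.setProperty("indent", "no");

        Properties result = effective(defaults, prolog, perCall);
        System.out.println(result.getProperty("indent"));               // no
        System.out.println(result.getProperty("omit-xml-declaration")); // yes
    }
}
```

Because a fresh Properties object is built on every call in this sketch, nothing set for one call can carry over to the next, which corresponds to the "re-read on each call" behaviour.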
Stacked on eXist-db#6491 (MapType same-key conformance): the
exist-namespace QName key is the spec-conformant form for the extension
parameters. Until eXist-db#6491 merges this branch also carries its commit; it
rebases to a clean diff once eXist-db#6491 lands.
Relates to eXist-db#178
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…forms

compression:zip / compression:tar read the strip-prefix argument only when exactly 3 arguments were supplied (args.length == 3), so whenever a 4th (encoding) or 5th (serialization-options) argument was also given, strip-prefix was silently ignored and archive entries kept their full /db/... path instead of being archive-root-relative.

strip-prefix is always argument 2 across the 3-, 4- and 5-arg signatures, so the guard should be args.length >= 3 (matching the >= 4 form the encoding read just below already uses). This fixes both zip and tar via their shared AbstractCompressFunction base, and adds CompressionTests coverage asserting strip-prefix is applied at 4-arg and 5-arg.

The == 3 guard predates this PR (the 4-arg encoding form has ignored strip-prefix all along); it is folded in here because it lives in the function this PR is already editing and it makes the new 5-arg form usable for producing installable, root-relative archives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
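
The difference between the two guards can be shown with a minimal sketch. The argument array is modelled as String[] and the paths are invented examples; the method names are hypothetical and do not mirror AbstractCompressFunction.

```java
// Illustration of the guard change only; names and paths are assumptions.
final class StripPrefixGuardSketch {

    // Old guard: strip-prefix is read only when exactly 3 arguments are given.
    static String stripPrefixOld(String[] args) {
        return args.length == 3 ? args[2] : "";
    }

    // Fixed guard: strip-prefix is argument 2 in the 3-, 4- and 5-arg forms.
    static String stripPrefixFixed(String[] args) {
        return args.length >= 3 ? args[2] : "";
    }

    static String entryName(String path, String stripPrefix) {
        return path.startsWith(stripPrefix) ? path.substring(stripPrefix.length()) : path;
    }

    public static void main(String[] args) {
        // A 5-argument call: sources, use-collection-hierarchy, strip-prefix,
        // encoding, serialization-options.
        String[] call = {"sources", "true", "/db/apps/demo/", "UTF-8", "options"};
        String path = "/db/apps/demo/expath-pkg.xml";

        System.out.println(entryName(path, stripPrefixOld(call)));   // /db/apps/demo/expath-pkg.xml
        System.out.println(entryName(path, stripPrefixFixed(call))); // expath-pkg.xml
    }
}
```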
[This response was co-authored with Claude Code. -Joe]

I've folded a small pre-existing fix into this PR (commit b680220):

Also includes: strip-prefix fix for the 4- and 5-arg forms

While building a collection-export endpoint against the new 5-arg form, a pre-existing bug surfaced:
This predates the PR (the 4-arg …
Export a collection subtree as a ZIP (default) or an installable EXPath .xar, with caller-chosen serialization params (indent, omit-xml-declaration, expand-xincludes, and the full vocabulary) applied to XML resources only; binary resources are stored byte-for-byte.

format=xar names the archive <abbrev>-<version>.xar from expath-pkg.xml and places expath-pkg.xml/repo.xml at the archive root so it installs via repo:install-and-deploy (400 if the collection has no descriptor).

Delegates to eXist's compression:zip, passing the serialization params as its 5th argument; reuses the dbc:serialization-params helper from the resource-consolidation work (exist:-namespace expand-xincludes). Requires an eXist base with the compression serialization-options argument (eXist-db/exist#6493).

Bumps the OpenAPI contract version (info.version) to 0.10.0 for the additive endpoint.

Tests: cypress db_export.cy.js (13 cases), green against the #6493 build; fixtures seed via native eXist REST to stay independent of the resource-store path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
[This PR was co-authored with Claude Code. -Joe]
Summary
compression:zip and compression:tar serialize the XML resources they write into the archive, but there was no way to control that serialization per call: the functions used the serializer defaults plus whatever declare option exist:serialize the prolog set. This adds an optional 5th argument of serialization options to both functions, mirroring fn:serialize and file:sync#3.

As with fn:serialize, the argument accepts either form, a map(*) or a W3C output:serialization-parameters element:

(identical 5-argument shape for compression:tar)

What changed
- AbstractCompressFunction: a new SERIALIZATION_OPTIONS_PARAM (item()?, optional). When supplied, it is parsed by FunSerialize.getSerializationProperties(...), the same public entry point fn:serialize uses, so the map and element forms (and their error handling, e.g. XPTY0004 for anything else) behave identically to fn:serialize. The resulting Properties is applied to every XML entry written to the archive, after (so overriding) the prolog declare option exist:serialize, and re-read on each eval (no leakage between calls). Parsing is extracted into a small parseSerializationOptions(...) helper so eval's NPath complexity is unchanged.
- ZipFunction / TarFunction: add the 5-argument signature (sources, use-collection-hierarchy, strip-prefix, encoding, serialization-options).
- CompressionModule: register the new arity for both zip and tar.
- zip-tests.xql: new XQSuite coverage (10 tests total): expand-xincludes honored via its exist-namespace QName key (true expands, false preserves the include); the no-map default is unchanged; a mixed map (W3C string key + exist QName key) coexists; and the output:serialization-parameters element form is honored (exist param as an exist-namespace child, where a boolean uses value="yes"/"no" per the element-form convention). Assertions inspect the extracted node by structure (re-serializing a preserved xi:include with a relative href would re-trigger XInclude resolution against a missing base URI).

Key form
Consistent with the rest of eXist's serialization-option handling (and with file:sync#3): in the map form, standard W3C parameters use their string key and eXist extension parameters use their exist-namespace xs:QName key (the spec-conformant form for implementation-defined parameters); in the element form, W3C parameters are output-namespace children and eXist parameters are exist-namespace children. This is why the PR is stacked on #6491: that fix makes the exist-namespace QName key the single accepted form for the extension parameters in the map.

Test plan
- compression:zip(..., map { QName($exist-ns, "expand-xincludes"): true() }) expands the <xi:include> in the archived resource; false() preserves it.
- … indent) and an exist QName key (expand-xincludes) is honored.
- The output:serialization-parameters element form is honored (exist param as an exist-namespace child element).
- CompressionTests XQSuite green (10/10).
- … AvoidReassigningParameters on removeLeadingOffset, NPathComplexity on compressElement, UnusedLocalVariable updateLock, are present on develop and untouched here).

Notes
- … declare option output:method "xml" (W3C output: namespace) to be honored by compression:zip, in addition to the already-supported declare option exist:serialize. This PR addresses the underlying need, per-call control over how archived XML is serialized, covering both W3C and eXist parameters in either the map(*) or output:serialization-parameters element form, which is the modern idiom (fn:serialize, file:sync). It does not wire up the W3C output: prolog declaration (a different mechanism: a query-prolog declare option, not a value passed to the function). Filed as Relates to #178 (compression:zip is not supporting Xquery 3.0 serialize option) rather than Closes, so you can decide whether the argument satisfies the issue or whether the prolog-output: path should also be implemented (I'd suggest a separate, smaller follow-up if so).
- declare option exist:serialize continues to work and is unaffected; the map argument simply takes precedence over it.