[feature] compression: add serialization-options argument to compression:zip / compression:tar #6493

Open
joewiz wants to merge 3 commits into eXist-db:develop from joewiz:feature/compression-serialization-options
@joewiz joewiz commented Jun 18, 2026

Member

[This PR was co-authored with Claude Code. -Joe]

Summary

compression:zip and compression:tar serialize the XML resources they write into the archive, but there was no way to control that serialization per call — the functions used the serializer defaults plus whatever declare option exist:serialize the prolog set. This adds an optional 5th argument of serialization options to both functions, mirroring fn:serialize and file:sync#3.

As with fn:serialize, the argument accepts either form — a map(*) or a W3C output:serialization-parameters element:

compression:zip($sources as xs:anyType+,
                $use-collection-hierarchy as xs:boolean,
                $strip-prefix as xs:string,
                $encoding as xs:string,
                $serialization-options as item()?) as xs:base64Binary*

(identical 5-argument shape for compression:tar)

(: map form :)
compression:zip($sources, true(), "", "UTF8",
    map {
        "indent": false(),                                                          (: W3C param, string key :)
        QName("http://exist.sourceforge.net/NS/exist", "expand-xincludes"): true()  (: exist param, QName key :)
    }
)

(: output:serialization-parameters element form.
   indent is a W3C param (output-namespace child); expand-xincludes is an
   exist param (exist-namespace child). XQuery comment syntax inside a direct
   element constructor is literal text, so the notes are kept out of the element. :)
compression:zip($sources, true(), "", "UTF8",
    <output:serialization-parameters xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
                                     xmlns:exist="http://exist.sourceforge.net/NS/exist">
        <output:indent value="no"/>
        <exist:expand-xincludes value="yes"/>
    </output:serialization-parameters>
)

What changed

  • AbstractCompressFunction — a new SERIALIZATION_OPTIONS_PARAM (item()?, optional). When supplied, it is parsed by FunSerialize.getSerializationProperties(...) — the same public entry point fn:serialize uses — so the map and element forms (and their error handling, e.g. XPTY0004 for anything else) behave identically to fn:serialize. The resulting Properties is applied to every XML entry written to the archive, after (so overriding) the prolog declare option exist:serialize, and re-read on each eval (no leakage between calls). Parsing is extracted into a small parseSerializationOptions(...) helper so eval's NPath complexity is unchanged.
  • ZipFunction / TarFunction — add the 5-argument signature (sources, use-collection-hierarchy, strip-prefix, encoding, serialization-options).
  • CompressionModule — register the new arity for both zip and tar.
  • zip-tests.xql — new XQSuite coverage (10 tests total): expand-xincludes honored via its exist-namespace QName key (true expands, false preserves the include); the no-map default is unchanged; a mixed map (W3C string key + exist QName key) coexists; and the output:serialization-parameters element form is honored (exist param as an exist-namespace child, where a boolean uses value="yes"/"no" per the element-form convention). Assertions inspect the extracted node by structure (re-serializing a preserved xi:include with a relative href would re-trigger XInclude resolution against a missing base URI).
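The precedence described in the first bullet (serializer defaults, then the prolog declare option exist:serialize, then the per-call argument, rebuilt on every eval) can be sketched in plain Java. This is a minimal model and not eXist's actual code: the class name SerializationLayering and the method effective are hypothetical, and the three Properties inputs stand in for whatever the real function reads.

```java
import java.util.Properties;

/** Hypothetical model of the option layering; not taken from AbstractCompressFunction. */
final class SerializationLayering {

    /**
     * Builds a fresh Properties for one call. Later layers overwrite earlier ones,
     * so the per-call argument wins over the prolog option, which wins over defaults.
     */
    static Properties effective(Properties defaults, Properties prolog, Properties perCall) {
        Properties result = new Properties();   // new object per call: nothing leaks between calls
        result.putAll(defaults);
        result.putAll(prolog);                  // models declare option exist:serialize
        if (perCall != null) {                  // the 5th argument is optional
            result.putAll(perCall);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("indent", "yes");
        Properties prolog = new Properties();
        prolog.setProperty("omit-xml-declaration", "yes");
        Properties perCall = new Properties();
        perCall.setProperty("indent", "no");

        // First call passes options, second call passes none.
        System.out.println(effective(defaults, prolog, perCall).getProperty("indent"));
        System.out.println(effective(defaults, prolog, null).getProperty("indent"));
    }
}
```

Because each call builds its own Properties, the second call in main sees the default indent value again rather than the override from the first call.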

Key form

Consistent with the rest of eXist's serialization-option handling (and with file:sync#3): in the map form, standard W3C parameters use their string key and eXist extension parameters use their exist-namespace xs:QName key (the spec-conformant form for implementation-defined parameters); in the element form, W3C parameters are output-namespace children and eXist parameters are exist-namespace children. This is why the PR is stacked on #6491 — that fix makes the exist-namespace QName key the single accepted form for the extension parameters in the map.
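To illustrate why the key form matters, here is a small Java sketch of a lookup that consults only the exist-namespace QName key for extension parameters. It is an illustration under assumptions, not eXist's implementation: OptionKeyForms and existParam are hypothetical names, and a java.util.Map with javax.xml.namespace.QName keys stands in for the XQuery map.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical model of string keys vs. QName keys in an options map. */
final class OptionKeyForms {

    static final String EXIST_NS = "http://exist.sourceforge.net/NS/exist";

    /** Looks up an eXist extension parameter strictly by its exist-namespace QName key. */
    static Object existParam(Map<Object, Object> options, String localName) {
        return options.get(new QName(EXIST_NS, localName));
    }

    public static void main(String[] args) {
        Map<Object, Object> options = new HashMap<>();
        options.put("indent", Boolean.FALSE);                               // W3C param, string key
        options.put(new QName(EXIST_NS, "expand-xincludes"), Boolean.TRUE); // exist param, QName key
        options.put("exist:expand-xincludes", Boolean.FALSE);               // prefixed string: a different key

        // Only the QName entry is found; the prefixed string entry is never consulted.
        System.out.println(existParam(options, "expand-xincludes"));
    }
}
```

A QName and a string that merely looks like a prefixed name are distinct keys, which mirrors the behavior the PR relies on once #6491 is in place.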

Test plan

  • compression:zip(..., map { QName($exist-ns, "expand-xincludes"): true() }) expands the <xi:include> in the archived resource; false() preserves it.
  • No map argument: archived XML serializes exactly as before (the serializer's own default).
  • A map mixing a W3C string key (indent) and an exist QName key (expand-xincludes) is honored.
  • The output:serialization-parameters element form is honored (exist param as an exist-namespace child element).
  • CompressionTests XQSuite green (10/10).
  • Codacy PMD: no new findings on the changed Java files (the three pre-existing findings — AvoidReassigningParameters on removeLeadingOffset, NPathComplexity on compressElement, UnusedLocalVariable updateLock — are present on develop and untouched here).

Notes

  • Scope vs. the original report. Issue #178, "compression:zip is not supporting Xquery 3.0 serialize option" (2013), asked specifically for the prolog declare option output:method "xml" (W3C output: namespace) to be honored by compression:zip, in addition to the already-supported declare option exist:serialize. This PR addresses the underlying need: per-call control over how archived XML is serialized, covering both W3C and eXist parameters in either the map(*) or output:serialization-parameters element form, which is the modern idiom (fn:serialize, file:sync). It does not wire up the W3C output: prolog declaration (a different mechanism: a query-prolog declare option, not a value passed to the function). Filed as "Relates to #178" rather than "Closes", so you can decide whether the argument satisfies the issue or whether the prolog-output: path should also be implemented (I'd suggest a separate, smaller follow-up if so).
  • declare option exist:serialize continues to work and is unaffected; the serialization-options argument (map or element form) simply takes precedence over it.

joewiz and others added 2 commits June 17, 2026 17:44
…amilies

A map{...} constructor builds a homogeneously-typed MapType that recorded a
concrete keyType. MapType.get()/contains() then ran convert() on the lookup
key, casting it to keyType (e.g. integer 12 -> string "12") before delegating
to the underlying map. This conflated keys of different op:same-key families
that share a lexical value: map{"12":"x"}(12) wrongly returned "x", and
map{"2020-01-01":1}(xs:date("2020-01-01")) wrongly matched. It also conflated
within the numeric family by lossy cast: map{1:1}(1.5) wrongly matched.

Per XQuery 3.1 section 17.1, map keys compare via op:same-key, a type-group
comparison with no key casting: the numeric family interchanges, the string
family (xs:string/xs:anyURI/xs:untypedAtomic) interchanges, every other type
matches only itself.
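
The type-group comparison described in this paragraph can be approximated in a few lines of Java. This is a simplified sketch and not eXist's AbstractMapType.sameKey: the class SameKeySketch is hypothetical, Java String stands in for the whole string family (xs:string, xs:anyURI, xs:untypedAtomic), java.lang.Number for the numeric family, and only finite numbers are handled.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

/** Hypothetical, simplified model of op:same-key over plain Java values. */
final class SameKeySketch {

    /** Compares two keys by family, never casting one key to the other's type. */
    static boolean sameKey(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            // Numeric family: compare exact decimal values, so 5 matches 5.0 but 1 does not match 1.5.
            return toDecimal((Number) a).compareTo(toDecimal((Number) b)) == 0;
        }
        if (a instanceof String && b instanceof String) {
            // String family: plain codepoint equality.
            return a.equals(b);
        }
        // Every other type matches only itself.
        return a.getClass() == b.getClass() && a.equals(b);
    }

    /** Finite values only; NaN and infinities are outside this sketch. */
    private static BigDecimal toDecimal(Number n) {
        return new BigDecimal(n.toString());
    }

    public static void main(String[] args) {
        System.out.println(sameKey("12", 12));                                   // string vs. integer
        System.out.println(sameKey(5, 5.0));                                     // integer vs. double
        System.out.println(sameKey(1, 1.5));                                     // no lossy cast
        System.out.println(sameKey("2020-01-01", LocalDate.parse("2020-01-01"))); // string vs. date
    }
}
```
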

convert() is now redundant. The underlying map's comparator already implements
op:same-key (AbstractMapType.sameKey), and NumericValue/string-family hashCode
are canonical, so cross-type-within-family pairs (xs:integer(5) / xs:double(5.0),
xs:string / xs:anyURI) already hash into the same bucket. Removing convert()
makes MapType behave like SingleKeyMapType, which already compares with sameKey
directly. keyType is retained only for the getKeyType() API.

XQTS misses this because qt3/qt4 place their cross-family distinctness tests on
map:entry() (a MIXED-key map, where convert() was a no-op), never on the
constructor form. New XQSuite tests (mapKeySameKey.xqm) cover the constructor
form: cross-family distinctness plus within-family positives that must still
interchange.

Coupling: this closes the non-conformant serialization "bare-string backdoor".
eXist serialization parameters are looked up by their exist: QName; a user could
also pass them as the prefixed string "exist:expand-xincludes" only because
convert() coerced the QName lookup to a string. With the fix, QName and string
are distinct keys, so only the spec-aligned QName form is honored. The affected
tests (serialize.xql, fnSerializeNewline.xqm, file module sync-serialize.xqm)
are updated: the prefixed-string cases now assert the key is ignored, and the
real functionality is retained via the QName form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
compression:zip and compression:tar serialized XML resources written to
the archive using only the serializer defaults plus any prolog
"declare option exist:serialize"; there was no way to control
serialization per call. Add an optional 5th argument of serialization
options, mirroring fn:serialize and file:sync#3.

As with fn:serialize, the argument accepts either form:
  - a map(*), e.g. map { "indent": false() }; or
  - a W3C output:serialization-parameters element.
Standard W3C parameters use their string key (map) or output-namespace
child element; eXist extension parameters use their exist-namespace
QName key (map) or exist-namespace child element. Parsing is delegated
to FunSerialize.getSerializationProperties so both forms behave exactly
as they do for fn:serialize. The result is applied after (so overriding)
the prolog declare option exist:serialize, and re-read on each call.

The new arity is registered for both zip and tar in CompressionModule.

Stacked on eXist-db#6491 (MapType same-key conformance): the
exist-namespace QName key is the spec-conformant form for the extension
parameters. Until eXist-db#6491 merges this branch also carries its commit; it
rebases to a clean diff once eXist-db#6491 lands.

Relates to eXist-db#178

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joewiz joewiz requested a review from a team as a code owner June 18, 2026 02:31
…forms

compression:zip / compression:tar read the strip-prefix argument only when
exactly 3 arguments were supplied (args.length == 3), so whenever a 4th
(encoding) or 5th (serialization-options) argument was also given, strip-prefix
was silently ignored and archive entries kept their full /db/... path instead
of being archive-root-relative.

strip-prefix is always argument 2 across the 3-, 4- and 5-arg signatures, so the
guard should be args.length >= 3 (matching the >= 4 form the encoding read just
below already uses). This fixes both zip and tar via their shared
AbstractCompressFunction base, and adds CompressionTests coverage asserting
strip-prefix is applied at 4-arg and 5-arg.

The == 3 guard predates this PR (the 4-arg encoding form has ignored strip-prefix
all along); it is folded in here because it lives in the function this PR is
already editing and it makes the new 5-arg form usable for producing installable,
root-relative archives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
joewiz commented Jun 18, 2026

Member Author

[This response was co-authored with Claude Code. -Joe]

I've folded a small pre-existing fix into this PR (commit b680220):

Also includes: strip-prefix fix for the 4- and 5-arg forms

While building a collection-export endpoint against the new 5-arg form, a pre-existing bug surfaced: compression:zip / compression:tar read the strip-prefix argument only when exactly 3 arguments were supplied (args.length == 3), so whenever a 4th (encoding) or 5th (serialization-options) argument was also given, strip-prefix was silently ignored and archive entries kept their full /db/... path instead of being archive-root-relative.

strip-prefix is always argument 2 across the 3-, 4- and 5-arg signatures, so the guard is now args.length >= 3 (matching the >= 4 form the encoding read just below already uses). Fixes both zip and tar via their shared AbstractCompressFunction base.

This predates the PR (the 4-arg encoding form has ignored strip-prefix all along), but it's folded in here because it lives in the function this PR is already editing and it makes the new 5-arg form actually usable for producing installable, root-relative archives. CompressionTests now asserts strip-prefix is applied at 4-arg and 5-arg, not just 3-arg.
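
The effect of the guard change can be shown with a small standalone Java model. It is a sketch under assumptions rather than the code in AbstractCompressFunction: the class StripPrefixGuard and its methods are hypothetical, a String[] stands in for the function's argument array, and an empty string is assumed as the fallback when strip-prefix is not read.

```java
/** Hypothetical model of the strip-prefix guard before and after the fix. */
final class StripPrefixGuard {

    /** Old guard: strip-prefix (argument index 2) is read only when exactly 3 arguments are given. */
    static String stripPrefixOld(String[] args) {
        return args.length == 3 ? args[2] : "";
    }

    /** Fixed guard: strip-prefix is read whenever at least 3 arguments are given. */
    static String stripPrefixFixed(String[] args) {
        return args.length >= 3 ? args[2] : "";
    }

    /** Removes the prefix from a resource path when the path starts with it. */
    static String entryName(String path, String stripPrefix) {
        return path.startsWith(stripPrefix) ? path.substring(stripPrefix.length()) : path;
    }

    public static void main(String[] args) {
        // 4-argument call: sources, use-collection-hierarchy, strip-prefix, encoding.
        String[] call = {"/db/apps/demo/a.xml", "true", "/db/apps/demo/", "UTF-8"};
        System.out.println(entryName(call[0], stripPrefixOld(call)));   // prefix ignored, full path kept
        System.out.println(entryName(call[0], stripPrefixFixed(call))); // prefix applied
    }
}
```

With the old guard the 4-argument call keeps the full /db/... path; with the fixed guard the entry becomes archive-root-relative.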

joewiz added a commit to joewiz/existdb-openapi that referenced this pull request Jun 18, 2026
Export a collection subtree as a ZIP (default) or an installable EXPath
.xar, with caller-chosen serialization params (indent, omit-xml-declaration,
expand-xincludes, and the full vocabulary) applied to XML resources only;
binary resources are stored byte-for-byte. format=xar names the archive
<abbrev>-<version>.xar from expath-pkg.xml and places expath-pkg.xml/repo.xml
at the archive root so it installs via repo:install-and-deploy (400 if the
collection has no descriptor).

Delegates to eXist's compression:zip, passing the serialization params as
its 5th argument; reuses the dbc:serialization-params helper from the
resource-consolidation work (exist:-namespace expand-xincludes). Requires an
eXist base with the compression serialization-options argument
(eXist-db/exist#6493).

Bumps the OpenAPI contract version (info.version) to 0.10.0 for the additive
endpoint.

Tests: cypress db_export.cy.js (13 cases), green against the #6493 build;
fixtures seed via native eXist REST to stay independent of the resource-store
path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>