citum-migrate

citum-migrate converts a CSL 1.0 style (.csl) into a Citum style (.yaml).

The migration pipeline is now output-driven first:

  1. Extract global options from CSL XML.
    • Includes processing/disambiguation extraction and citation-sort mapping.
    • Emits citation/bibliography contributor overrides when et-al thresholds differ by scope.
  2. Resolve citation and bibliography templates from inferred output artifacts.
  3. Fall back to XML template compilation only when template artifacts are missing or rejected.

This keeps option extraction deterministic while scaling template migration to large style corpora.

When the target style is already present in the repo as a profile or journal wrapper, citum-migrate derives that lineage from the repo's current contents and may emit extends:-based wrapper output instead of flattening everything into a standalone style. Unknown or unresolved styles still fall back to standalone output.

CLI Usage

cargo run --bin citum-migrate -- <style.csl> [flags]

Example:

cargo run --bin citum-migrate -- styles-legacy/apa.csl > styles/apa.yaml

Flags

  • --template-source auto|hand|inferred|xml
  • --live-infer-backend auto|embedded|node
  • --template-dir <path>
  • --min-template-confidence <0.0..1.0>
  • --debug-variable <name>

--template-source

  • auto (default): hand-authored -> inferred cache/live -> XML fallback
  • hand: hand-authored only -> XML fallback
  • inferred: inferred cache only -> XML fallback
  • xml: XML templates only

Important: inferred mode is cache-only and never runs live Node/citeproc-js inference.
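The four modes above can be read as ordered lists of lookup stages. The following is a minimal sketch of that reading, not the crate's actual types: `TemplateSource`, `Stage`, and `stages` are hypothetical names introduced for illustration.

```rust
/// Values accepted by `--template-source` (names here are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TemplateSource {
    Auto,
    Hand,
    Inferred,
    Xml,
}

/// One stage a template lookup can pass through (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    HandAuthored,
    InferredCache,
    LiveInference,
    XmlFallback,
}

impl TemplateSource {
    /// Parse the flag value; unknown strings yield `None`.
    pub fn parse(value: &str) -> Option<Self> {
        match value {
            "auto" => Some(Self::Auto),
            "hand" => Some(Self::Hand),
            "inferred" => Some(Self::Inferred),
            "xml" => Some(Self::Xml),
            _ => None,
        }
    }

    /// Stages tried in order for this mode, following the list above.
    pub fn stages(self) -> Vec<Stage> {
        match self {
            Self::Auto => vec![
                Stage::HandAuthored,
                Stage::InferredCache,
                Stage::LiveInference,
                Stage::XmlFallback,
            ],
            Self::Hand => vec![Stage::HandAuthored, Stage::XmlFallback],
            // Cache-only: live inference never appears in this mode.
            Self::Inferred => vec![Stage::InferredCache, Stage::XmlFallback],
            Self::Xml => vec![Stage::XmlFallback],
        }
    }
}
```

In this sketch, only `TemplateSource::Auto` ever includes `Stage::LiveInference`, which mirrors the cache-only guarantee of inferred mode.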

--live-infer-backend

  • auto (default): embedded JS runtime first, then Node subprocess fallback
  • embedded: embedded JS runtime only
  • node: legacy Node subprocess only

This flag applies only when --template-source auto needs live inference after a cache miss. Cache hits always take precedence.
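As a rough illustration of how the two flags interact, the sketch below returns the runtimes that would be attempted, in order. `LiveInferBackend`, `Runtime`, and `live_runtimes` are hypothetical names, and the function assumes the behavior described above rather than reproducing the crate's code.

```rust
/// Values accepted by `--live-infer-backend` (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LiveInferBackend {
    Auto,
    Embedded,
    Node,
}

/// A runtime that can perform live inference (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Runtime {
    EmbeddedJs,
    NodeSubprocess,
}

/// Runtimes to try, in order. Empty when live inference does not apply:
/// either `--template-source` is not `auto`, or the cache already had a hit.
pub fn live_runtimes(
    template_source: &str,
    backend: LiveInferBackend,
    cache_hit: bool,
) -> Vec<Runtime> {
    if template_source != "auto" || cache_hit {
        return Vec::new();
    }
    match backend {
        LiveInferBackend::Auto => vec![Runtime::EmbeddedJs, Runtime::NodeSubprocess],
        LiveInferBackend::Embedded => vec![Runtime::EmbeddedJs],
        LiveInferBackend::Node => vec![Runtime::NodeSubprocess],
    }
}
```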

What "hand-authored" means

In this README, hand-authored means a checked-in Citum style YAML file created manually (human or agent-assisted), not generated by citum-migrate or infer-template.js.

Path convention:

  • examples/<style-name>-style.yaml

citum-migrate reads citation and bibliography templates from that file when available. Resolution is section-level:

  • if the hand-authored file contains only bibliography template data, citation can still come from inferred cache (or XML fallback)
  • if it contains both sections, both are used
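Section-level resolution can be pictured as two independent lookups, one per section. This is a minimal sketch under that assumption: `HandAuthored`, `Origin`, and `resolve_section` are hypothetical, and templates are represented as plain strings purely for illustration.

```rust
/// Where a resolved template came from (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    HandAuthored,
    Fallback,
}

/// Template sections found in a hand-authored file; either may be absent.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct HandAuthored {
    pub citation: Option<String>,
    pub bibliography: Option<String>,
}

/// Resolve one section: prefer the hand-authored template, otherwise call
/// `fallback` (standing in for the inferred cache or XML compilation).
pub fn resolve_section(
    hand: Option<&str>,
    fallback: impl FnOnce() -> String,
) -> (String, Origin) {
    match hand {
        Some(template) => (template.to_string(), Origin::HandAuthored),
        None => (fallback(), Origin::Fallback),
    }
}
```

Calling `resolve_section` once for `citation` and once for `bibliography` lets a file that defines only a bibliography template still pick up its citation template from elsewhere.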

Template Resolution Order

In auto mode:

  1. examples/<style-name>-style.yaml (hand-authored template sections)
  2. templates/inferred/<style-name>.bibliography.json
  3. templates/inferred/<style-name>.citation.json
  4. Legacy cache compatibility: templates/inferred/<style-name>.json (bibliography)
  5. Live inference via the embedded JS runtime (--live-infer-backend auto or embedded)
  6. Live inference via the scripts/infer-template.js Node subprocess (--live-infer-backend auto or node)
  7. XML template compiler fallback
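The file-based steps of this order (1 through 4) can be sketched as path construction. The helper names below are hypothetical, and the sketch assumes the paths are relative to the repository root and that the legacy cache file applies to the bibliography section only.

```rust
use std::path::PathBuf;

/// The two template sections a style can define.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Section {
    Citation,
    Bibliography,
}

/// Step 1: the hand-authored style file for `style`.
pub fn hand_authored_path(style: &str) -> PathBuf {
    PathBuf::from(format!("examples/{}-style.yaml", style))
}

/// Steps 2-4: inferred cache files to check for one section, in order.
/// The legacy `<style-name>.json` file is only consulted for bibliography.
pub fn cache_candidates(style: &str, section: Section) -> Vec<PathBuf> {
    let key = match section {
        Section::Citation => "citation",
        Section::Bibliography => "bibliography",
    };
    let mut paths = vec![PathBuf::from(format!(
        "templates/inferred/{}.{}.json",
        style, key
    ))];
    if section == Section::Bibliography {
        paths.push(PathBuf::from(format!("templates/inferred/{}.json", style)));
    }
    paths
}
```

A caller would check each candidate for existence in turn and move on to live inference, then XML compilation, only when none is usable.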

Embedded Runtime Bundle

The embedded runtime bundle is committed at:

  • crates/citum-migrate/js/embedded-template-runtime.js

Regenerate it after changing the host-neutral inference core or citeproc bundle:

node scripts/build-embedded-template-runtime.js

Precompile Once, Migrate in Rust

For large-scale migration, precompute inferred templates once, then run Rust migrations without citeproc-js:

# 1) Precompute inferred template cache for all parent styles
./scripts/batch-infer.sh

# 2) Or precompute selected styles
./scripts/batch-infer.sh --styles "apa elsevier-harvard ieee"

# 3) Migrate using cache-only inferred mode (no live Node inference)
cargo run --bin citum-migrate -- styles-legacy/apa.csl --template-source inferred

Cache Artifact Format

Section-keyed cache files:

  • templates/inferred/STYLE_NAME.bibliography.json
  • templates/inferred/STYLE_NAME.citation.json

Each file is produced by:

node scripts/infer-template.js styles-legacy/STYLE_NAME.csl --section=bibliography --fragment
node scripts/infer-template.js styles-legacy/STYLE_NAME.csl --section=citation --fragment

Fragment shape:

{
  "meta": {
    "style": "apa",
    "confidence": 0.85,
    "delimiter": ". ",
    "entrySuffix": ".",
    "wrap": "parentheses"
  },
  "bibliography": {
    "template": []
  }
}

Citation artifacts use the same shape, with a citation section key in place of bibliography.

Confidence Gate

--min-template-confidence rejects any inferred fragment whose confidence is below the given threshold before the fragment is used.

Example:

cargo run --bin citum-migrate -- styles-legacy/apa.csl \
  --template-source auto \
  --min-template-confidence 0.80

When a fragment is rejected, migration falls back to XML template compilation for that section.
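The gate itself amounts to a comparison plus range validation of the flag value. A minimal sketch, assuming the threshold is inclusive (a fragment exactly at the threshold is kept) and that the compared value is the fragment's `meta.confidence`; both function names are hypothetical.

```rust
/// True when a fragment's confidence clears the gate.
/// Assumption: only values strictly below the threshold are rejected.
pub fn passes_confidence_gate(confidence: f64, min_confidence: f64) -> bool {
    confidence >= min_confidence
}

/// Parse a `--min-template-confidence` value and check it lies in 0.0..=1.0.
pub fn parse_min_confidence(raw: &str) -> Result<f64, String> {
    let value = raw
        .trim()
        .parse::<f64>()
        .map_err(|e| format!("not a number: {}", e))?;
    // NaN fails this range check as well, so it is rejected here.
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(format!("{} is outside the range 0.0 to 1.0", value))
    }
}
```

With the threshold 0.80 from the example above, the sample fragment's confidence of 0.85 would pass, while a fragment at 0.75 would be rejected and that section would fall back to XML compilation.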

Fidelity Expectations

citum-migrate does not guarantee perfect output equivalence for every legacy style without review. Current expectations:

  • Inferred templates are primarily used to raise bibliography fidelity.
  • Citation fidelity is protected by guardrails and section-level XML fallback.
  • Note styles can still require manual review/tuning more often than author-date and numeric styles.

As of February 19, 2026, a benchmark on a stratified random sample of 30 styles (author-date, numeric, note) showed:

  • Citation: XML 90.8% vs inferred 90.4% (-0.4pp)
  • Bibliography: XML 89.5% vs inferred 93.3% (+3.8pp)

Use oracle validation for style-level acceptance:

node scripts/oracle.js styles-legacy/your-style.csl --json

Notes

  • Output is written to stdout; redirect to a file as needed.
  • Options extraction remains XML-based by design.
  • Template inference is output-driven to avoid procedural CSL template translation bottlenecks.