
refactor: rename plotting subpackage to visualization #21

Merged
edkerk merged 1 commit into develop from refactor/rename-plotting-to-visualization on Jun 9, 2026

edkerk commented Jun 8, 2026


What

Renames the (stub) plotting subpackage to visualization, matching RAVEN's folder layout where pathway/ + plotting/ were unified into visualization/ (SysBioChalmers/RAVEN#614). This is the deferred follow-up noted in #20.

  • src/raven_python/plotting/ → src/raven_python/visualization/
  • pip extra [plotting] → [visualization] (matplotlib)
  • CI (.github/workflows/ci.yml) and ReadTheDocs (.readthedocs.yaml) install lines updated so the renamed extra still resolves
  • docs updated: README.md, CHANGELOG.md, docs/installation.md, docs/README.md, docs/reference/api/index.md, docs/reference/todo.md

Notes

  • The subpackage is an unimplemented stub (empty __init__.py), so nothing imports it — no behaviour change, no broken imports.
  • Generic prose uses of the word "plotting" (seaborn / heatmap descriptions) are intentionally left unchanged.
  • After this, RAVEN and raven-python share the visualization/ name. The one remaining structural gap is reconstruction/metacyc/ (feature work — porting MetaCyc — not a rename).

Align the (stub) plotting subpackage with RAVEN's folder layout, where pathway +
plotting were unified into visualization/ (SysBioChalmers/RAVEN#614).

- src/raven_python/plotting/ -> src/raven_python/visualization/
- pyproject optional-dependency extra [plotting] -> [visualization] (matplotlib);
  CI (.github/workflows/ci.yml) and ReadTheDocs (.readthedocs.yaml) install lines
  updated to match so the renamed extra still resolves.
- docs updated: README, CHANGELOG, installation, docs/README, api/index, todo.

The subpackage is an unimplemented stub (empty __init__), so nothing imports it and
there is no behaviour change. Generic uses of the word "plotting" (seaborn / heatmap
prose) are left as-is.
edkerk merged commit 693bc08 into develop on Jun 9, 2026
5 checks passed
edkerk deleted the refactor/rename-plotting-to-visualization branch on June 9, 2026 05:33
edkerk added a commit that referenced this pull request on Jun 13, 2026
* feat(data): shared download manifest for artefacts + binaries (#16)

* feat(data): shared download manifest for artefacts and binaries

Introduce a single, language-agnostic manifest (format defined by data/manifest.schema.json)
that lists every downloadable data artefact and external-binary bundle with its SHA256 and is
consumed by both raven-python and (via the same JSON) MATLAB RAVEN. The manifest is a superset
of the two runtime registries:

* manifest["data"]     -> raven_python.data._DATA_REGISTRY
* manifest["binaries"] -> raven_python.binaries._REGISTRY

Added:
* data/manifest.schema.json (JSON Schema) + data/manifest.example.json (worked example) +
  data/manifest.json (empty, the live source of truth until assets are published).
* raven_python.manifest — load_manifest / to_*_registry / load_into_registries.
* Lazy autoload: data.ensure_* and binaries.ensure_binary populate themselves from
  $RAVEN_PYTHON_MANIFEST on first use when their registry is still empty (guarded; no effect
  when a registry is passed explicitly or the env var is unset).
* scripts/make_registry_snippet.py: a `manifest` subcommand that computes url+sha256+bytes
  and writes/updates manifest.json.
* tests/test_manifest.py (round-trip, converters, lazy autoload via file:// URLs, repo
  manifests valid).
* docs/maintenance/data_manifest.md — format, Python + MATLAB consumers, GitHub-Releases vs
  Zenodo hosting (incl. a release→Zenodo GitHub Action), and per-asset recommendations.
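To make the manifest-to-registry flow concrete, here is a minimal sketch. It assumes each section maps an asset name to a dict with `url`, `sha256` and `bytes` keys (the authoritative layout is data/manifest.schema.json), and that $RAVEN_PYTHON_MANIFEST holds a local file path. `to_registry`, `verify_file` and `autoload` are hypothetical helpers and are not the raven_python.manifest API.

```python
import hashlib
import json
import os
from pathlib import Path


def to_registry(manifest: dict, section: str) -> dict:
    """Flatten one manifest section ("data" or "binaries") into name -> (url, sha256).

    Assumed layout (illustrative only): {section: {name: {"url", "sha256", "bytes"}}}.
    """
    return {
        name: (entry["url"], entry["sha256"].lower())
        for name, entry in manifest.get(section, {}).items()
    }


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large artefacts are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_sha256: str) -> bool:
    return sha256_of(path) == expected_sha256.lower()


def autoload(registry: dict, section: str, env_var: str = "RAVEN_PYTHON_MANIFEST") -> dict:
    """Fill an *empty* registry from the manifest file named by env_var; otherwise a no-op."""
    if registry or not os.environ.get(env_var):
        return registry
    manifest = json.loads(Path(os.environ[env_var]).read_text())
    registry.update(to_registry(manifest, section))
    return registry
```

The guard in `autoload` mirrors the behaviour described above: a registry that was passed in already populated, or an unset environment variable, leaves everything untouched.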

* docs(data): host assets on existing-repo releases; KEGG redistribution permitted

Reflect the chosen distribution model: GitHub release assets live outside the git tree, so a
separate data repository is optional — attach assets to dedicated tags (e.g. kegg-kegg116,
diamond-2.1.9) on an existing RAVEN repo and reuse the same URLs across raven-python and
MATLAB RAVEN. Use Zenodo only for DOIs or files >2 GB. KEGG artefacts are redistributed with
permission, so the prior 'confirm rights' caveat is removed. Example/schema URLs repointed
from a hypothetical raven-data repo to raven-python.

* Add yeast-GEM-derived shared modules (diff, annotation, conditions, biomass, curation) (#17)

* Add diff_models, annotation, and conditions modules for yeast-GEM port

Lands the upstream-shareable pieces that yeast-GEM has been implementing
locally during its Python port (see yeast-GEM/code/python/PORTING_PLAN.md
and UPSTREAM_CANDIDATES.md). These are organism-agnostic; yeast-GEM
will consume them via a Python dependency on raven-python.

New modules
-----------
raven_python.comparison.diff
    diff_models(a, b, ...) -> DiffReport — strict two-model semantic-
    equality diff. Complements the existing compare_models (N-model
    presence-matrix overview). Used as a CI gate to verify that two
    toolchains (e.g. MATLAB RAVEN vs raven_python, pre/post refactor
    of one toolchain) produce equivalent models. Includes a
    python -m raven_python.comparison.diff CLI.
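As a rough illustration of what a strict two-model diff checks, the sketch below compares two models reduced to `{rxn_id: {met_id: coefficient}}` mappings. `ToyDiff` and `diff_stoichiometry` are hypothetical and far narrower than DiffReport, which presumably also covers genes, bounds, annotations and more.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDiff:
    only_in_a: set = field(default_factory=set)
    only_in_b: set = field(default_factory=set)
    changed: dict = field(default_factory=dict)  # rxn id -> (stoich in a, stoich in b)

    @property
    def equal(self) -> bool:
        return not (self.only_in_a or self.only_in_b or self.changed)


def diff_stoichiometry(a: dict, b: dict, tol: float = 1e-9) -> ToyDiff:
    """Compare two {rxn_id: {met_id: coef}} mappings reaction by reaction."""
    report = ToyDiff(only_in_a=set(a) - set(b), only_in_b=set(b) - set(a))
    for rxn in set(a) & set(b):
        mets = set(a[rxn]) | set(b[rxn])
        # A metabolite missing on one side counts as coefficient 0.
        if any(abs(a[rxn].get(m, 0.0) - b[rxn].get(m, 0.0)) > tol for m in mets):
            report.changed[rxn] = (a[rxn], b[rxn])
    return report
```

Used as a CI gate, a script would exit non-zero whenever `report.equal` is false.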

raven_python.annotation.sbo
    add_sbo_terms — SBO term assignment with "fill" semantics. The default
    parameter set reproduces yeast-GEM's behaviour; biomass metabolite
    names, biomass/NGAM reaction names, and pseudoreaction substrings
    are overridable. Transport detection is pluggable (default: the same
    metabolite name in two compartments). Includes an
    `only_last_reaction_for_pseudo` legacy bug-compat flag for yeast-GEM's
    lock-step migration; off by default for any new caller.
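The "fill" semantics and the default transport rule can be sketched with plain dicts: an SBO term that is already set is never overwritten, and a reaction counts as transport when one metabolite name occurs in more than one compartment. The data shapes and `fill_sbo` are hypothetical. Only the SBO identifiers themselves (SBO:0000185 transport reaction, SBO:0000176 biochemical reaction) are standard, and the real function has more categories (biomass, NGAM, pseudoreactions).

```python
TRANSPORT = "SBO:0000185"
BIOCHEMICAL = "SBO:0000176"


def is_transport(stoich: dict) -> bool:
    """Default rule: some metabolite name appears in two or more compartments.

    stoich maps (met_name, compartment) -> coefficient.
    """
    compartments_by_name: dict = {}
    for name, comp in stoich:
        compartments_by_name.setdefault(name, set()).add(comp)
    return any(len(comps) > 1 for comps in compartments_by_name.values())


def fill_sbo(reactions: dict, transport_rule=is_transport) -> dict:
    """Assign SBO terms only where none is set ("fill"); existing terms are kept."""
    for rxn in reactions.values():
        if rxn.get("sbo"):
            continue
        rxn["sbo"] = TRANSPORT if transport_rule(rxn["stoich"]) else BIOCHEMICAL
    return reactions
```

Passing a different `transport_rule` is how this sketch models the pluggable transport detection.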

raven_python.annotation.delta_g
    load_delta_g_csv / save_delta_g_csv — generic side-car CSV
    persistence for scalar notes (ΔG by default, but the note key,
    column names, and id/value mapping are all configurable).
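A side-car CSV round trip for one scalar note might look like the following, using only the standard csv module. The `id` / `deltaG` column names and the note key are placeholder defaults, and the `{entity_id: notes_dict}` input shape is an assumption rather than the delta_g module's actual signature.

```python
import csv


def save_note_csv(notes_by_id: dict, fh, key: str = "deltaG",
                  id_col: str = "id", value_col: str = "deltaG") -> None:
    """Write {entity_id: {key: value, ...}} as a two-column CSV; entities lacking key are skipped."""
    writer = csv.writer(fh)
    writer.writerow([id_col, value_col])
    for entity_id, notes in sorted(notes_by_id.items()):
        if key in notes:
            writer.writerow([entity_id, notes[key]])


def load_note_csv(fh, key: str = "deltaG", id_col: str = "id",
                  value_col: str = "deltaG") -> dict:
    """Read the CSV back into {entity_id: {key: float}}."""
    return {row[id_col]: {key: float(row[value_col])} for row in csv.DictReader(fh)}
```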

raven_python.conditions.apply
    apply_condition(model, yaml_or_dict) — generic "apply this YAML
    condition" loader. Schema: prelude (reset_exchanges),
    cofactor_pseudoreaction (remove_mets + charge_balance_met),
    biomass_stoichiometry_delta, per-rxn bounds, expected_uptake_count.
    Project-specific extensions (e.g. yeast-GEM's amino_acid_ratio)
    are handled by the caller before/after this function — kept
    upstream-narrow on purpose. Also exposes a set_reaction_bounds
    helper that bypasses cobra's lb<=ub validator for the (legitimate)
    cases where a condition lands on an infeasible bound state.
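Why a validator bypass is needed can be shown with a stand-in reaction class whose `bounds` setter, like the cobra validator mentioned above, rejects lb > ub. The condition dict shape (`prelude.reset_exchanges`, `bounds`), the `EX_` exchange prefix and the "close uptake" reset are all assumptions for illustration; they are not the documented apply_condition schema.

```python
from dataclasses import dataclass


@dataclass
class ToyReaction:
    """Stand-in for a cobra Reaction whose bounds setter enforces lb <= ub."""
    id: str
    _lb: float = 0.0
    _ub: float = 1000.0

    @property
    def bounds(self):
        return (self._lb, self._ub)

    @bounds.setter
    def bounds(self, value):
        lb, ub = value
        if lb > ub:
            raise ValueError(f"{self.id}: lower bound {lb} > upper bound {ub}")
        self._lb, self._ub = lb, ub


def set_bounds_unchecked(rxn: ToyReaction, lb: float, ub: float) -> None:
    """Write both bounds directly, skipping the lb <= ub check (for transient states)."""
    rxn._lb, rxn._ub = lb, ub


def apply_toy_condition(reactions: dict, condition: dict) -> None:
    if condition.get("prelude", {}).get("reset_exchanges"):
        for rxn in reactions.values():
            if rxn.id.startswith("EX_"):  # assumed exchange-id convention
                set_bounds_unchecked(rxn, 0.0, 1000.0)  # close uptake, allow secretion
    for rxn_id, (lb, ub) in condition.get("bounds", {}).items():
        set_bounds_unchecked(reactions[rxn_id], lb, ub)
```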

Tests
-----
46 new tests across the three modules; full pre-existing raven-python
suite still passes (519 passed; 1 unrelated pre-existing openpyxl
ImportError in tests/test_io_git.py; 2 skipped). ruff clean.

Not in this commit
------------------
The biomass / GAM / chemostat / fit_gam modules are still tracked as
upstream candidates in yeast-GEM/code/python/UPSTREAM_CANDIDATES.md
and remain local in yeast-GEM until phase 4 of the port (which would
ideally land them directly here).

* Add raven_python.biomass — sum / scale / rescale / set_gam

Generic biomass-equation manipulation, extracted from yeast-GEM's
sumBioMass / scaleBioMass / rescalePseudoReaction / changeGAM as
yeast-GEM moves to depend on raven-python (yeast-GEM phase 4 of the
porting plan).

Module layout
-------------
raven_python.biomass.config
    BiomassConfig — biomass_rxn id + proton_met id + ordered tuple
    of BiomassComponent (per-component pseudoreaction name + mass-
    computation strategy).

raven_python.biomass.scale
    sum_biomass(model, config) → {component: g/gDW, ..., "total": X}
    rescale_pseudoreaction(model, config, name, factor) — multiply
        the pseudoreaction's substrate coefs by factor and rebalance
        H+ to keep ionic charge at zero.
    scale_biomass(model, config, name, new_value, balance_out=None) —
        rescale a component to a target fraction, optionally balancing
        a second component so the total stays at 1 g/gDW.
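The H+ rebalance in rescale_pseudoreaction reduces to two steps. First, multiply the substrate (negative) coefficients by the factor. Second, solve for the proton coefficient that brings the net charge back to zero. Below is a plain-dict sketch, assuming H+ carries charge +1 and every other metabolite has a known charge. The function name and data shapes are hypothetical.

```python
def rescale_with_proton_balance(stoich: dict, charges: dict, factor: float,
                                proton: str = "h") -> dict:
    """Scale substrate coefficients by factor, then re-solve the H+ coefficient.

    stoich: met_id -> coefficient; charges: met_id -> formal charge.
    """
    new = {m: (c * factor if c < 0 else c) for m, c in stoich.items() if m != proton}
    net_charge = sum(c * charges[m] for m, c in new.items())
    new[proton] = -net_charge  # H+ has charge +1, so this cancels the residual charge
    return new
```

For example, doubling a pseudoreaction whose only charged substrate is a -1 species with coefficient -1 leaves a residual charge of +2, so H+ ends up at -2.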

raven_python.biomass.gam
    set_gam(model, value, *, biomass_rxn, cofactor_met_names,
            ngam_rxn=None, ngam_value=None) — scales every metabolite
    in the biomass pseudoreaction whose `name` is in the supplied
    cofactor set, preserving its sign; optionally fixes the NGAM rxn
    bounds.
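Reading "preserving its sign" as "set the magnitude to the new GAM value", a plain-dict sketch of the cofactor update could look like this. That reading, the `met_id -> coefficient` layout and the example cofactor names are assumptions, and `set_gam_toy` is not the module's API.

```python
import math


def set_gam_toy(biomass: dict, names: dict, value: float, cofactor_names) -> dict:
    """Set |coef| = value for biomass metabolites whose name is a GAM cofactor; keep the sign.

    biomass: met_id -> coefficient; names: met_id -> metabolite name.
    """
    cofactors = set(cofactor_names)
    return {
        met: math.copysign(value, coef) if names[met] in cofactors else coef
        for met, coef in biomass.items()
    }
```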

Mass strategies (per BiomassComponent.mass_strategy):
    "mw"               plain MW from chemical formula (carbohydrate /
                       ion / cofactor)
    "mw_minus_2h"      MW − 2.016 g/mol per substrate (protein —
                       charged tRNAs release two protons)
    "mw_minus_water"   MW − 18.015 g/mol per substrate (RNA / DNA —
                       polymerisation releases one water)
    "grams"            stoichiometry already in g/gDW (lipid backbone)
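A worked sketch of the four strategies for one component follows. It assumes coefficients are in mmol/gDW, so |coef| × (adjusted MW in g/mol) / 1000 gives g/gDW. The formula parser only handles flat formulas such as C6H12O6, and the atomic-mass table is abbreviated. This approximates the idea and is not the module's code.

```python
import re

# Abbreviated standard atomic weights (g/mol); extend as needed.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}

# Per-substrate MW correction for each strategy, taken from the table above.
OFFSET = {"mw": 0.0, "mw_minus_2h": 2.016, "mw_minus_water": 18.015}


def formula_mw(formula: str) -> float:
    """Molecular weight of a flat formula such as 'C6H12O6' (no brackets or charge)."""
    return sum(ATOMIC_MASS[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


def component_mass(substrates, strategy: str) -> float:
    """g/gDW contributed by a pseudoreaction's substrates.

    substrates: iterable of (coefficient, formula) pairs.
    """
    if strategy == "grams":
        # Coefficients are already g/gDW; formulas are ignored.
        return sum(abs(coef) for coef, _ in substrates)
    offset = OFFSET[strategy]
    return sum(abs(coef) * (formula_mw(f) - offset) for coef, f in substrates) / 1000.0
```

For one mmol/gDW of glucose (MW ≈ 180.156 g/mol), "mw" yields about 0.180 g/gDW and "mw_minus_water" about 0.162 g/gDW.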

Tests: 19 new tests over a synthetic toy model that exercises every
mass strategy, the H+ charge rebalance, scale_biomass with and
without balance_out, set_gam on cofactor mets (and the NGAM bound
path).

* Add raven_python.manipulation.find_duplicate_reactions (detection variant)

Detection-only counterpart to remove_duplicate_reactions. Returns
duplicate groups instead of mutating the model. Ignores bounds /
GPR / objective — only stoichiometry is compared, mirroring the
typical curation use case ("find reactions that could be merged").

A new ``ignore_direction=True`` default (yeast-GEM convention)
treats A→B and B→A as duplicates. Set False to require identical
orientation.
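Duplicate detection on stoichiometry alone can be sketched by hashing each reaction's `(met, coef)` pairs. When direction is ignored, each reaction is keyed by whichever of its two orientations sorts first, so A→B and B→A collide. `find_duplicates_toy` is hypothetical and compares coefficients exactly; a real implementation may apply a tolerance.

```python
from collections import defaultdict


def find_duplicates_toy(reactions: dict, ignore_direction: bool = True) -> list:
    """Group reactions with identical stoichiometry; bounds, GPR and objective are ignored.

    reactions: rxn_id -> {met_id: coefficient}.
    """
    groups = defaultdict(list)
    for rxn_id, stoich in reactions.items():
        sig = frozenset(stoich.items())
        if ignore_direction:
            flipped = frozenset((m, -c) for m, c in stoich.items())
            # Pick one orientation deterministically so A->B and B->A share a key.
            sig = min(sig, flipped, key=sorted)
        groups[sig].append(rxn_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```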

Used by yeast-GEM's modelTests port (Tier 3 / phase 5) to flag
duplicate reactions during curation review.

* Add raven_python.curation — batch_curate / batch_curate_from_tsv

Generic batch curation engine extracted from yeast-GEM's MATLAB
curateMetsRxnsGenes (yeast-GEM phase 6). Adds or updates
metabolites, reactions and genes from pandas DataFrames; a
batch_curate_from_tsv convenience wrapper reads the equivalent TSVs.

Schema (matches yeast-GEM's data/modelCuration/template/ layout):

  mets_df          metNames, comps, formula, charge, inchi, metNotes
                   + any number of MIRIAM-namespace columns
  genes_df         genes, geneShortNames + MIRIAM columns
  rxns_df          rxnNames, grRules, lb, ub, rev, subSystems,
                   eccodes, rxnNotes, rxnReferences,
                   rxnConfidenceScores + MIRIAM columns
  rxns_coeffs_df   rxnNames, metNames, comps, coefficient
                   (one row per (reaction, metabolite) pair)

Match keys:
  metabolites — (name, compartment) tuple
  genes       — gene id
  reactions   — stoichiometric signature

Existing entities get their annotations overwritten (warning emitted);
new entities are added with fresh ids generated from the supplied
``met_id_prefix`` / ``rxn_id_prefix`` (defaults M_ / R_ per the BiGG
convention; yeast-GEM passes s_ / r_). Width of the existing
zero-padded suffix is preserved so s_0001 → s_0002, not s_2.
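The width-preserving id rule can be sketched as follows: take the largest numeric suffix among ids with the prefix, add one, and pad to the widest existing suffix. `next_id` is hypothetical, and the fallback for a prefix with no existing ids is an assumption, not documented behaviour.

```python
import re


def next_id(prefix: str, existing_ids) -> str:
    """Return prefix + (max numeric suffix + 1), keeping the existing zero-padded width."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    suffixes = [m.group(1) for m in map(pattern.match, existing_ids) if m]
    if not suffixes:
        return f"{prefix}1"  # hypothetical fallback when no id uses this prefix yet
    width = max(len(s) for s in suffixes)
    return f"{prefix}{max(int(s) for s in suffixes) + 1:0{width}d}"
```

With `existing_ids=["s_0001"]`, the next metabolite id is `s_0002` rather than `s_2`.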

"Everything after the core columns is MIRIAM" — the header of any
extra column becomes the annotation namespace key. Matches MATLAB
behaviour exactly so yeast-GEM's existing TSVs work unchanged, and
projects with different MIRIAM column sets need no code change.
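The "every non-core column is a MIRIAM namespace" rule can be sketched per row. This version tests column membership rather than position, and it assumes empty cells carry no annotation. `split_row` is hypothetical, and the identifier values in the test are purely illustrative.

```python
CORE_MET_COLUMNS = ("metNames", "comps", "formula", "charge", "inchi", "metNotes")


def split_row(row: dict, core_columns=CORE_MET_COLUMNS):
    """Split one TSV row into core fields and MIRIAM annotations keyed by column header."""
    core = {k: row[k] for k in core_columns if k in row}
    annotation = {k: v for k, v in row.items()
                  if k not in core_columns and v not in ("", None)}
    return core, annotation
```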

CurationResult dataclass records what was added vs updated so
callers can verify in tests / CI.

Tests: 13 new (add/update mets, add/update genes, add/update rxns
by stoichiometry, miriam auto-detect, id-width preservation,
combined mets+rxns in one call, missing-metabolite error,
batch_curate_from_tsv round trip).

* io.yaml: byte-compatible round-trip with cobrapy + RAVEN MATLAB

Three things this fixes:

  1. write_yaml_model dropped the !!omap tags entirely. _to_plain
     was flattening cobra's OrderedDict to plain dict, which causes
     ruamel to emit ordinary block mappings. RAVEN MATLAB's reader
     is a line-based parser keyed on !!omap and therefore could not
     load any file we wrote. _to_plain now returns OrderedDict so
     ruamel re-emits the !!omap tag.

  2. eccodes was lost on round-trip — it wasn't in _RXN_FIELDS, so
     read_yaml_model didn't capture it into .notes and
     write_yaml_model couldn't lift it back. Added.

  3. RAVEN MATLAB writes reaction notes as 'rxnNotes'; cobrapy and
     this writer use 'notes'. Added a read-time alias so existing
     yeast-GEM YAML files (which still say 'rxnNotes') load
     cleanly. Writes go out as 'notes' (cobrapy-canonical).

Top-level layout now matches RAVEN MATLAB: metaData first, then
metabolites / reactions / genes / compartments, then optional
gecko_light + ec-rxns + ec-enzymes. id/name/version live inside
metaData (RAVEN convention) — cobrapy reading these files still
works, but cobra_model.id ends up None because cobrapy doesn't
know about metaData. raven_python.read_yaml_model lifts
metaData.id/name/version onto model.id / model.name /
model.notes['version'] so the rest of the codebase doesn't care
which layout the file used.
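The lifting step can be sketched on an already-parsed document. It assumes the document is a plain dict and that a file without metaData uses cobrapy's top-level `id` / `name`. `lift_metadata` and its flat return shape are hypothetical.

```python
def lift_metadata(doc: dict) -> dict:
    """Map RAVEN-style metaData.id/name/version onto flat model fields.

    Falls back to top-level id/name (cobrapy layout) when metaData is absent.
    """
    meta = doc.get("metaData") or {}
    return {
        "id": meta.get("id", doc.get("id")),
        "name": meta.get("name", doc.get("name")),
        "notes": {"version": meta["version"]} if "version" in meta else {},
    }
```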

Empty-name genes are no longer emitted as  — that's a
cobrapy quirk that drifts yeast-GEM YAML files away from RAVEN
MATLAB's output.

Verified end-to-end:

  *  cobra.io.load_yaml_model reads every file the new writer
     produces (yeast-GEM and a synthetic fixture).
  *  RAVEN MATLAB readYAMLmodel reads every file the new writer
     produces.
  *  Round-tripping yeast-GEM through raven_python preserves
     2748/2748 metabolites, 4102/4102 reactions, 1143/1143 genes,
     2411 eccodes, 3984 reaction deltaG, 2696 metabolite deltaG,
     1788 SMILES, 1443 rxn-notes — no semantic drift.

Tests
-----
  *  tests/test_io_yaml_parity.py is new: covers every RAVEN
     extension, the rxnNotes legacy alias, the SMILES YAML-special
     character case, metaData-first layout, and cobra readability.
  *  tests/test_io_yaml.py::test_output_is_cobra_readable adjusts
     for the metaData layout (cobra recovers mets/rxns/annotation
     but not model.id, by design).

* conditions: switch from PyYAML to ruamel.yaml

PyYAML is not a project dependency; raven-python uses ruamel.yaml
(already pulled in via cobra) everywhere else. The conditions
module and its tests still imported PyYAML, which broke pytest
collection on clean CI runners with 'No module named yaml'.

Both apply.py and the test now use a YAML(typ='safe') instance
from ruamel.yaml — same plain-dict semantics as PyYAML's
safe_load / safe_dump, no new dependency.

* io.yaml: document the format + accept legacy geckoLight-in-metaData

Adds docs/reference/yaml_format.md as the canonical schema reference
for the cross-toolchain YAML format (cobrapy / raven-python / RAVEN
MATLAB). Covers the full document shape, per-entry field order,
RAVEN extensions, the GECKO ec-* sections, the metaData provenance
block, number / string / quoting rules, and the cross-reader
interoperability matrix. Linked from docs/reference/index.md and
the I/O guide.

Reader fix: pre-shim RAVEN MATLAB versions wrote GECKO models
with geckoLight: "true" inside the metaData block (not as a
top-level gecko_light). The reader now lifts that legacy key out
of metaData so model.ec.gecko_light is populated whichever
placement the file used. Round-trip writes always use the new
top-level form.
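A minimal sketch of lifting the legacy flag out of metaData is shown below. It assumes a plain-dict document, accepts either a YAML boolean or the string "true", and lets an existing top-level `gecko_light` win. All three are assumptions, and `lift_legacy_gecko_light` is hypothetical.

```python
def lift_legacy_gecko_light(doc: dict) -> dict:
    """Move a pre-shim metaData.geckoLight flag to the top-level gecko_light key (in place)."""
    meta = doc.get("metaData")
    if isinstance(meta, dict) and "geckoLight" in meta:
        raw = meta.pop("geckoLight")
        flag = raw if isinstance(raw, bool) else str(raw).strip().lower() == "true"
        doc.setdefault("gecko_light", flag)
    return doc
```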

Regression tests:

  test_pre_shim_format_loads — synthetic fixture covering every
  legacy quirk we know about (--- doc marker, plain metaData,
  geckoLight inside metaData, top-level metabolite smiles,
  rxnNotes reaction key, integer bounds, double-quoted strings).
  Each quirk has its own assertion + comment.

  test_pre_shim_yeast_gem_loads_if_available — sanity-loads the
  real yeast-GEM.yml (2748 mets, 4102 rxns, 1143 genes) and
  asserts the documented preserved-counts table from the format
  reference. Skipped on CI runners where the working copy isn't
  mounted.

* Cobra-aligned hardening pass from full code review (#18)

* Cobra-aligned hardening pass from full code review

No behaviour change on well-formed inputs. Highlights:

- Packaging: derive __version__ from package metadata (was a stale
  hard-coded "0.0.1" that the docs site reported); pin ruff==0.15.15 in
  the dev extra and CI; fix two lint errors unpinned ruff started flagging.
- Errors: solver/feasibility failures in run_init, run_ftinit, fill_tasks
  and random_sampling now raise cobra.exceptions.OptimizationError instead
  of bare RuntimeError (consistent with the rest of the package).
- Consistency: single utils.parse.subsystem_to_str coerces reaction
  subsystem to cobra's canonical str across io.excel / comparison.compare /
  curation.batch / manipulation.add (fixes a crash on non-string items and
  the silent drop of multi-subsystem reactions); shared GPR score
  aggregators in utils.gpr used by init.score and init.genes; KEGG-download
  progress uses a module logger instead of print.
- Robustness: zip path-traversal guard in binaries.py; penalty>0 check in
  connect_blocked_reactions; NaN-sample guard in random_sampling; all-zero
  ec coupling warning; optional verify= SHA256 re-check on data cache hits;
  non-finite z-score guard in reporter. Regression tests added for each.
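A zip path-traversal guard of the kind listed above can be sketched with the standard zipfile module. Before extracting anything, it resolves each member against the destination and refuses any path that escapes it (for example `../evil` or an absolute name). Python's own extraction already tries to sanitise such names, so an explicit check mainly turns a silent rewrite into a loud error. This is an illustration, not the code in binaries.py, and `Path.is_relative_to` needs Python 3.9+.

```python
import zipfile
from pathlib import Path


def safe_extract(archive: Path, dest: Path) -> None:
    """Extract a zip, refusing any member that would land outside dest."""
    dest = dest.resolve()
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            target = (dest / member).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member!r}")
        zf.extractall(dest)
```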

* io.yaml: reaction EC codes as cobra annotation ec-code (#19)

* Add diff_models, annotation, and conditions modules for yeast-GEM port

Lands the upstream-shareable pieces that yeast-GEM has been implementing
locally during its Python port (see yeast-GEM/code/python/PORTING_PLAN.md
and UPSTREAM_CANDIDATES.md). These are organism-agnostic; yeast-GEM
will consume them via a Python dependency on raven-python.

New modules
-----------
raven_python.comparison.diff
    diff_models(a, b, ...) -> DiffReport — strict two-model semantic-
    equality diff. Complements the existing compare_models (N-model
    presence-matrix overview). Used as a CI gate to verify that two
    toolchains (e.g. MATLAB RAVEN vs raven_python, pre/post refactor
    of one toolchain) produce equivalent models. Includes a
    python -m raven_python.comparison.diff CLI.

raven_python.annotation.sbo
    add_sbo_terms — SBO term assignment with "fill" semantic. Default
    parameter set reproduces yeast-GEM's behaviour; biomass metabolite
    names, biomass/NGAM reaction names, and pseudoreaction substrings
    are overridable. Transport detection is pluggable (default: same-
    met-name in two compartments). Includes an `only_last_reaction_
    for_pseudo` legacy bug-compat flag for yeast-GEM's lock-step
    migration; off by default for any new caller.

raven_python.annotation.delta_g
    load_delta_g_csv / save_delta_g_csv — generic side-car CSV
    persistence for scalar notes (ΔG by default, but the note key,
    column names, and id/value mapping are all configurable).

raven_python.conditions.apply
    apply_condition(model, yaml_or_dict) — generic "apply this YAML
    condition" loader. Schema: prelude (reset_exchanges),
    cofactor_pseudoreaction (remove_mets + charge_balance_met),
    biomass_stoichiometry_delta, per-rxn bounds, expected_uptake_count.
    Project-specific extensions (e.g. yeast-GEM's amino_acid_ratio)
    are handled by the caller before/after this function — kept
    upstream-narrow on purpose. Also exposes set_reaction_bounds
    helper that bypasses cobra's lb<=ub validator for the (legitimate)
    cases where a condition lands on an infeasible bound state.

Tests
-----
46 new tests across the three modules; full pre-existing raven-python
suite still passes (519 passed; 1 unrelated pre-existing openpyxl
ImportError in tests/test_io_git.py; 2 skipped). ruff clean.

Not in this commit
------------------
The biomass / GAM / chemostat / fit_gam modules are still tracked as
upstream candidates in yeast-GEM/code/python/UPSTREAM_CANDIDATES.md
and remain local in yeast-GEM until phase 4 of the port (which would
ideally land them directly here).

* Add raven_python.biomass — sum / scale / rescale / set_gam

Generic biomass-equation manipulation, extracted from yeast-GEM's
sumBioMass / scaleBioMass / rescalePseudoReaction / changeGAM as
yeast-GEM moves to depend on raven-python (yeast-GEM phase 4 of the
porting plan).

Module layout
-------------
raven_python.biomass.config
    BiomassConfig — biomass_rxn id + proton_met id + ordered tuple
    of BiomassComponent (per-component pseudoreaction name + mass-
    computation strategy).

raven_python.biomass.scale
    sum_biomass(model, config) → {component: g/gDW, ..., "total": X}
    rescale_pseudoreaction(model, config, name, factor) — multiply
        the pseudoreaction's substrate coefs by factor and rebalance
        H+ to keep ionic charge at zero.
    scale_biomass(model, config, name, new_value, balance_out=None) —
        rescale a component to a target fraction, optionally balancing
        a second component so the total stays at 1 g/gDW.

raven_python.biomass.gam
    set_gam(model, value, *, biomass_rxn, cofactor_met_names,
            ngam_rxn=None, ngam_value=None) — scales every metabolite
    in the biomass pseudoreaction whose `name` is in the supplied
    cofactor set, preserving its sign; optionally fixes the NGAM rxn
    bounds.

Mass strategies (per BiomassComponent.mass_strategy):
    "mw"               plain MW from chemical formula (carbohydrate /
                       ion / cofactor)
    "mw_minus_2h"      MW − 2.016 g/mol per substrate (protein —
                       charged tRNAs release two protons)
    "mw_minus_water"   MW − 18.015 g/mol per substrate (RNA / DNA —
                       polymerisation releases one water)
    "grams"            stoichiometry already in g/gDW (lipid backbone)

Tests: 19 new tests over a synthetic toy model that exercises every
mass strategy, the H+ charge rebalance, scale_biomass with and
without balance_out, and set_gam on cofactor mets (including the NGAM
bound path).
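The unit bookkeeping behind the mass strategies can be sketched as below. This is a hypothetical illustration, not raven_python.biomass's code. It assumes substrate coefficients are in mmol/gDW, so a molecular weight in g/mol converts to g/gDW by dividing by 1000. It also uses a small hand-written atomic-mass table and handles only flat formulas without brackets. The helper names (formula_mw, substrate_mass) are invented for the example.

```python
from __future__ import annotations

import re

# Approximate standard atomic masses (g/mol); only the elements this sketch needs.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "P": 30.974, "S": 32.06}

# Per-substrate mass removed on incorporation, in g/mol.
CORRECTION = {
    "mw": 0.0,                 # plain molecular weight
    "mw_minus_2h": 2.016,      # protein: two protons released
    "mw_minus_water": 18.015,  # RNA / DNA: one water released
}


def formula_mw(formula: str) -> float:
    """Molecular weight of a flat formula such as 'C6H12O6' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def substrate_mass(coef: float, formula: str | None, strategy: str) -> float:
    """Mass contribution (g/gDW) of one pseudoreaction substrate.

    Assumes `coef` is in mmol/gDW, so coef * (g/mol) / 1000 gives g/gDW;
    for the "grams" strategy the coefficient is already in g/gDW.
    """
    if strategy == "grams":
        return abs(coef)
    if formula is None:
        raise ValueError(f"strategy {strategy!r} needs a chemical formula")
    return abs(coef) * (formula_mw(formula) - CORRECTION[strategy]) / 1000.0
```

Summing substrate_mass over a component's substrates gives that component's g/gDW. The sign of the coefficient is dropped because substrates carry negative stoichiometry.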

* Add raven_python.manipulation.find_duplicate_reactions (detection variant)

Detection-only counterpart to remove_duplicate_reactions. Returns
duplicate groups instead of mutating the model. Ignores bounds /
GPR / objective — only stoichiometry is compared, mirroring the
typical curation use case ("find reactions that could be merged").

A new ``ignore_direction=True`` default (yeast-GEM convention)
treats A→B and B→A as duplicates. Set False to require identical
orientation.

Used by yeast-GEM's modelTests port (Tier 3 / phase 5) to flag
duplicate reactions during curation review.
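Stoichiometry-only duplicate detection can be sketched by keying each reaction on a sorted tuple of its (metabolite, coefficient) pairs. With ignore_direction, the key is the smaller of the forward and sign-flipped tuples. This is a hypothetical sketch on plain dicts, not the actual find_duplicate_reactions. It compares floats exactly, whereas a real implementation might round first.

```python
from __future__ import annotations

from collections import defaultdict


def find_duplicate_groups(
    rxns: dict[str, dict[str, float]], ignore_direction: bool = True
) -> list[list[str]]:
    """Group reaction ids whose stoichiometry is identical.

    `rxns` maps reaction id -> {metabolite id: coefficient}. Bounds, GPRs and
    objective are deliberately not part of the signature.
    """
    groups: dict[tuple, list[str]] = defaultdict(list)
    for rxn_id, stoich in rxns.items():
        signature = tuple(sorted(stoich.items()))
        if ignore_direction:
            # A->B and B->A share a key: pick the smaller of the two orientations.
            flipped = tuple(sorted((met, -coef) for met, coef in stoich.items()))
            signature = min(signature, flipped)
        groups[signature].append(rxn_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```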

* Add raven_python.curation — batch_curate / batch_curate_from_tsv

Generic batch curation engine extracted from yeast-GEM's MATLAB
curateMetsRxnsGenes (yeast-GEM phase 6). Adds or updates
metabolites, reactions and genes from pandas DataFrames; a
batch_curate_from_tsv convenience wrapper reads the equivalent TSVs.

Schema (matches yeast-GEM's data/modelCuration/template/ layout):

  mets_df          metNames, comps, formula, charge, inchi, metNotes
                   + any number of MIRIAM-namespace columns
  genes_df         genes, geneShortNames + MIRIAM columns
  rxns_df          rxnNames, grRules, lb, ub, rev, subSystems,
                   eccodes, rxnNotes, rxnReferences,
                   rxnConfidenceScores + MIRIAM columns
  rxns_coeffs_df   rxnNames, metNames, comps, coefficient
                   (one row per (reaction, metabolite) pair)

Match keys:
  metabolites — (name, compartment) tuple
  genes       — gene id
  reactions   — stoichiometric signature

Existing entities get their annotations overwritten (warning emitted);
new entities are added with fresh ids generated from the supplied
``met_id_prefix`` / ``rxn_id_prefix`` (defaults M_ / R_ per the BiGG
convention; yeast-GEM passes s_ / r_). Width of the existing
zero-padded suffix is preserved so s_0001 → s_0002, not s_2.
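The width-preserving id rule can be sketched with a regex over the existing ids. This is an illustrative sketch, not the batch_curate code. next_id and its default_width fallback (used when no id carries the prefix yet) are hypothetical.

```python
import re


def next_id(existing_ids, prefix: str, default_width: int = 4) -> str:
    """Next id after the highest `<prefix><digits>` id, keeping its zero-padding.

    `default_width` is only used when no existing id carries the prefix
    (an assumption for this sketch).
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    highest, width = 0, default_width
    for existing in existing_ids:
        match = pattern.match(existing)
        if match and int(match.group(1)) >= highest:
            highest, width = int(match.group(1)), len(match.group(1))
    return f"{prefix}{highest + 1:0{width}d}"
```

The format spec `0{width}d` is a minimum width, so s_9999 rolls over to s_10000 rather than being truncated.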

"Everything after the core columns is MIRIAM" — the header of any
extra column becomes the annotation namespace key. Matches MATLAB
behaviour exactly so yeast-GEM's existing TSVs work unchanged, and
projects with different MIRIAM column sets need no code change.

CurationResult dataclass records what was added vs updated so
callers can verify in tests / CI.

Tests: 13 new (add/update mets, add/update genes, add/update rxns
by stoichiometry, miriam auto-detect, id-width preservation,
combined mets+rxns in one call, missing-metabolite error,
batch_curate_from_tsv round trip).

* io.yaml: byte-compatible round-trip with cobrapy + RAVEN MATLAB

Three things this fixes:

  1. write_yaml_model dropped the !!omap tags entirely. _to_plain
     was flattening cobra's OrderedDict to plain dict, which causes
     ruamel to emit ordinary block mappings. RAVEN MATLAB's reader
     is a line-based parser keyed on !!omap and therefore could not
     load any file we wrote. _to_plain now returns OrderedDict so
     ruamel re-emits the !!omap tag.

  2. eccodes was lost on round-trip — it wasn't in _RXN_FIELDS, so
     read_yaml_model didn't capture it into .notes and
     write_yaml_model couldn't lift it back. Added.

  3. RAVEN MATLAB writes reaction notes as 'rxnNotes'; cobrapy and
     this writer use 'notes'. Added a read-time alias so existing
     yeast-GEM YAML files (which still say 'rxnNotes') load
     cleanly. Writes go out as 'notes' (cobrapy-canonical).

Top-level layout now matches RAVEN MATLAB: metaData first, then
metabolites / reactions / genes / compartments, then optional
gecko_light + ec-rxns + ec-enzymes. id/name/version live inside
metaData (RAVEN convention) — cobrapy reading these files still
works, but cobra_model.id ends up None because cobrapy doesn't
know about metaData. raven_python.read_yaml_model lifts both
metaData.id/name/version onto model.id / model.name /
model.notes['version'] so the rest of the codebase doesn't care
which layout the file used.
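On the read side, the two layouts and the rxnNotes alias reduce to a small normalisation step. The sketch below assumes an already-parsed document made of plain dicts and lists; the real reader operates on ruamel.yaml output and builds a cobra.Model. normalise_yaml_doc is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def normalise_yaml_doc(doc: dict[str, Any]) -> dict[str, Any]:
    """Map a parsed YAML document onto cobrapy-style model fields.

    Handles both layouts: id/name/version inside metaData (RAVEN) or at the
    top level (cobrapy), plus the legacy 'rxnNotes' reaction key.
    """
    meta = doc.get("metaData") or {}
    model: dict[str, Any] = {
        "id": meta.get("id", doc.get("id")),
        "name": meta.get("name", doc.get("name")),
        "notes": {},
        "reactions": [],
    }
    version = meta.get("version", doc.get("version"))
    if version is not None:
        model["notes"]["version"] = version
    for rxn in doc.get("reactions", []):
        rxn = dict(rxn)  # copy so the parsed document is not mutated
        # Read-time alias: RAVEN MATLAB writes 'rxnNotes', cobrapy uses 'notes'.
        if "rxnNotes" in rxn and "notes" not in rxn:
            rxn["notes"] = rxn.pop("rxnNotes")
        model["reactions"].append(rxn)
    return model
```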

Empty-name genes are no longer emitted as  — that's a
cobrapy quirk that drifts yeast-GEM YAML files away from RAVEN
MATLAB's output.

Verified end-to-end:

  *  cobra.io.load_yaml_model reads every file the new writer
     produces (yeast-GEM and a synthetic fixture).
  *  RAVEN MATLAB readYAMLmodel reads every file the new writer
     produces.
  *  Round-tripping yeast-GEM through raven_python preserves
     2748/2748 metabolites, 4102/4102 reactions, 1143/1143 genes,
     2411 eccodes, 3984 reaction deltaG, 2696 metabolite deltaG,
     1788 SMILES, 1443 rxn-notes — no semantic drift.

Tests
-----
  *  tests/test_io_yaml_parity.py is new: covers every RAVEN
     extension, the rxnNotes legacy alias, the SMILES YAML-special
     character case, metaData-first layout, and cobra readability.
  *  tests/test_io_yaml.py::test_output_is_cobra_readable adjusts
     for the metaData layout (cobra recovers mets/rxns/annotation
     but not model.id, by design).

* conditions: switch from PyYAML to ruamel.yaml

PyYAML is not a project dependency; raven-python uses ruamel.yaml
(already pulled in via cobra) everywhere else. The conditions
module and its tests still imported PyYAML, which broke pytest
collection on clean CI runners with 'No module named yaml'.

Both apply.py and the test now use a YAML(typ='safe') instance
from ruamel.yaml — same plain-dict semantics as PyYAML's
safe_load / safe_dump, no new dependency.

* io.yaml: document the format + accept legacy geckoLight-in-metaData

Adds docs/reference/yaml_format.md as the canonical schema reference
for the cross-toolchain YAML format (cobrapy / raven-python / RAVEN
MATLAB). Covers the full document shape, per-entry field order,
RAVEN extensions, the GECKO ec-* sections, the metaData provenance
block, number / string / quoting rules, and the cross-reader
interoperability matrix. Linked from docs/reference/index.md and
the I/O guide.

Reader fix: pre-shim RAVEN MATLAB writes emitted GECKO models
with geckoLight: "true" inside the metaData block (not as a
top-level gecko_light). The reader now lifts that legacy key out
of metaData so model.ec.gecko_light is populated whichever
placement the file used. Round-trip writes always use the new
top-level form.
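The legacy lift can be sketched on a parsed document of plain dicts. This is a hypothetical helper, not the reader's actual code. Two behaviours are assumptions rather than something stated above: an explicit top-level gecko_light takes precedence over the legacy key, and both the quoted "true" form and a real boolean are accepted.

```python
from __future__ import annotations

from typing import Any


def lift_legacy_gecko_light(doc: dict[str, Any]) -> dict[str, Any]:
    """Move a pre-shim metaData.geckoLight flag to a top-level gecko_light."""
    meta = doc.get("metaData")
    if isinstance(meta, dict) and "geckoLight" in meta:
        legacy = meta.pop("geckoLight")
        # Pre-shim files quote the value ("true"); accept real booleans too.
        if isinstance(legacy, bool):
            flag = legacy
        else:
            flag = str(legacy).strip().lower() == "true"
        # Assumption: an explicit top-level gecko_light takes precedence.
        doc.setdefault("gecko_light", flag)
    return doc
```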

Regression tests:

  test_pre_shim_format_loads — synthetic fixture covering every
  legacy quirk we know about (--- doc marker, plain metaData,
  geckoLight inside metaData, top-level metabolite smiles,
  rxnNotes reaction key, integer bounds, double-quoted strings).
  Each quirk has its own assertion + comment.

  test_pre_shim_yeast_gem_loads_if_available — sanity-loads the
  real yeast-GEM.yml (2748 mets, 4102 rxns, 1143 genes) and
  asserts the documented preserved-counts table from the format
  reference. Skipped on CI runners where the working copy isn't
  mounted.

* io.yaml: represent reaction EC codes as cobra annotation['ec-code']

EC numbers are a standard MIRIAM cross-reference, so the cobra-native
representation is annotation['ec-code'] (a list) -- exactly where cobrapy
and geckopy read them. raven-python was instead routing RAVEN's legacy
top-level `eccodes` key into model.notes['eccodes'], so reaction EC codes
written by RAVEN-MATLAB never reached the annotation['ec-code'] location
geckopy reads from.

- Drop `eccodes` from _RXN_FIELDS (it is not a RAVEN-only notes field).
- Add _lift_eccodes_to_annotation: a legacy top-level `eccodes` (a
  ;-joined string or a list) is lifted into annotation['ec-code'] on read,
  mirroring the existing _lift_smiles_to_annotation; a native
  annotation['ec-code'] wins.
- On write, EC codes serialise via cobra's annotation block; no top-level
  `eccodes` is emitted.
- Update test_io_yaml_parity expectations to the cobra-aligned location
  (verified against the real yeast-GEM.yml: 2411 reactions).
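The read-side lift can be sketched on a plain reaction dict. It mirrors the behaviour described above: a ;-joined string or a list is accepted, and a native annotation['ec-code'] wins. It is not the actual _lift_eccodes_to_annotation, and it assumes reactions arrive as plain dicts.

```python
from __future__ import annotations

from typing import Any


def lift_eccodes_to_annotation(rxn: dict[str, Any]) -> dict[str, Any]:
    """Move a legacy top-level `eccodes` entry into annotation['ec-code']."""
    legacy = rxn.pop("eccodes", None)
    if not legacy:
        return rxn
    annotation = rxn.setdefault("annotation", {})
    if "ec-code" in annotation:
        return rxn  # a native annotation['ec-code'] wins
    items = legacy.split(";") if isinstance(legacy, str) else legacy
    # Strip whitespace and drop empty entries (e.g. from a trailing ";").
    codes = [str(code).strip() for code in items if str(code).strip()]
    if codes:
        annotation["ec-code"] = codes
    return rxn
```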

* docs: update RAVEN cross-references for the post-reorg folder layout (#20)

RAVEN moved its functions out of the core/ catch-all into purpose-based top-level
folders (SysBioChalmers/RAVEN#614). Repoint every RAVEN file path in the
cross-reference docs (IMPROVEMENTS.md, docs/reference/matlab_raven_backports.md):

- FSEOF / randomSampling / reporterMetabolites -> analysis/
- parseTaskList / checkTasks                   -> tasks/
- fillGaps                                      -> gapfilling/
- addRxns / changeRxns / standardizeGrRules     -> manipulation/
- getIndexes / checkModelStruct / getElementalBalance -> queries/
- getModelFromHomology                          -> reconstruction/homology/
- getKEGGModelForOrganism                       -> reconstruction/kegg/
- runINIT / ftINIT                              -> INIT/

Also corrects references that were stale even before the reorg (getKEGGModelForOrganism
was in external/kegg/) and points the proposed GPR-lint back-port findPotentialErrors at
manipulation/, alongside standardizeGrRules.

Doc-only: raven-python's module layout already matches RAVEN's new structure (it was the
template the reorg mirrored), so no code changes are needed.

* refactor: rename plotting subpackage to visualization (#21)

Align the (stub) plotting subpackage with RAVEN's folder layout, where pathway +
plotting were unified into visualization/ (SysBioChalmers/RAVEN#614).

- src/raven_python/plotting/ -> src/raven_python/visualization/
- pyproject optional-dependency extra [plotting] -> [visualization] (matplotlib);
  CI (.github/workflows/ci.yml) and ReadTheDocs (.readthedocs.yaml) install lines
  updated to match so the renamed extra still resolves.
- docs updated: README, CHANGELOG, installation, docs/README, api/index, todo.

The subpackage is an unimplemented stub (empty __init__), so nothing imports it and
there is no behaviour change. Generic uses of the word "plotting" (seaborn / heatmap
prose) are left as-is.

* Ship type information and enforce it; make gpr_to_dnf public (#22)

Three related "make the package's contracts real" changes:

- Add the PEP 561 py.typed marker so the package's extensive type hints are
  visible to downstream type checkers (geckopy included). The hatchling wheel
  ships raven_python/py.typed.
- Add mypy to the dev extra, a lenient [tool.mypy] config (ignore_missing_imports
  for the un-stubbed cobra/optlang/scipy/ruamel), and a mypy CI job. Fix the 36
  type errors this surfaced -- all type-only (Path vs str annotations, None-guards
  that match existing behaviour, optlang Variable typing, isinstance/cast
  narrowing). No runtime behaviour changes; the full test suite stays green.
- Promote manipulation.expand._gpr_to_dnf to a public gpr_to_dnf (re-exported
  from raven_python.manipulation). geckopy's call sites switch to it in lockstep
  (separate PR), so no deprecated alias is kept.
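A small recursive-descent parser illustrates what a DNF expansion of a GPR rule produces. It is not the public gpr_to_dnf, whose exact signature and return type are not shown here. gpr_to_dnf_sketch is a hypothetical name. The sketch assumes gene ids contain no whitespace or parentheses and that 'and' binds tighter than 'or'.

```python
from __future__ import annotations

import re
from itertools import product

# Parentheses, or any run of characters that is neither whitespace nor a paren.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def gpr_to_dnf_sketch(rule: str) -> list[frozenset[str]]:
    """Expand a GPR string into OR-of-AND clauses (disjunctive normal form).

    Each returned frozenset is one gene complex.
    """
    tokens = TOKEN.findall(rule)
    pos = 0

    def peek() -> str | None:
        return tokens[pos].lower() if pos < len(tokens) else None

    def parse_or() -> list[frozenset[str]]:
        nonlocal pos
        clauses = parse_and()
        while peek() == "or":
            pos += 1
            clauses = clauses + parse_and()
        return clauses

    def parse_and() -> list[frozenset[str]]:
        nonlocal pos
        clauses = parse_atom()
        while peek() == "and":
            pos += 1
            right = parse_atom()
            # Distribute AND over OR: (a or b) and c -> (a and c) or (b and c).
            clauses = [left | r for left, r in product(clauses, right)]
        return clauses

    def parse_atom() -> list[frozenset[str]]:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of rule: {rule!r}")
        token = tokens[pos]
        pos += 1
        if token == "(":
            clauses = parse_or()
            if peek() != ")":
                raise ValueError(f"missing ')' in rule: {rule!r}")
            pos += 1
            return clauses
        return [frozenset([token])]

    if not tokens:
        return []
    clauses = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r} in rule: {rule!r}")
    # Drop duplicate clauses while keeping first-seen order.
    return list(dict.fromkeys(clauses))
```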

* Harden curation, EC-data and archive-handling modules (#23)

Tier-2 audit of the post-review modules surfaced four targeted fixes:

- curation/batch.py: new reactions coerce a list-valued subSystems via
  subsystem_to_str (";"-joined) instead of str(list), matching the update path.
- io/ec_data.py: _eccodes_to_yaml strips stray separators in the single-EC
  case so a trailing ";" never leaks into the written YAML.
- binaries.py: _safe_extract_zip rejects symlink members, defence-in-depth
  alongside the existing path-traversal guard.
- binaries.py / data.py: archive and dataset downloads pass a socket timeout
  to urlopen so a stalled server cannot hang the process.

Adds regression tests for each fix.

* Surgical performance pass on hot paths (#24)

Targeted, behaviour-preserving optimisations from the review:

- manipulation/add.py + change.py: resolve equation tokens through a shared
  (name, compartment) -> metabolite index (_build_met_index) instead of
  re-scanning model.metabolites per token. Bulk reaction add/change by name
  drops from O(R*k*M) to O(R*k); the index is updated as new mets are created
  so cross-token and cross-reaction dedup is preserved.
- reconstruction/homology/homology.py: replace DataFrame.apply(axis=1) in the
  ortholog filter with a comprehension over the columns (membership is already
  O(1); avoids per-row Series construction).
- analysis/sampling.py: build the random objective with optlang add() instead
  of sum(), which re-canonicalises the expression on every term (O(n^2)).

Adds a cross-reaction metabolite-dedup regression test for the add path.
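The indexing idea does not depend on cobra, so it can be sketched with a dataclass stand-in. Met, build_met_index and resolve_or_create are hypothetical names. The actual _build_met_index works on cobra.Metabolite objects, and its exact shape is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Met:
    """Minimal stand-in for a cobra.Metabolite (hypothetical, for the sketch)."""
    id: str
    name: str
    compartment: str


def build_met_index(mets: list[Met]) -> dict[tuple[str, str], Met]:
    """Map (name, compartment) -> metabolite; the first occurrence wins."""
    index: dict[tuple[str, str], Met] = {}
    for met in mets:
        index.setdefault((met.name, met.compartment), met)
    return index


def resolve_or_create(index, mets, name: str, compartment: str, new_id: str) -> Met:
    """O(1) lookup per equation token; new metabolites are indexed immediately."""
    met = index.get((name, compartment))
    if met is None:
        met = Met(new_id, name, compartment)
        mets.append(met)
        # Keeping the index in sync lets later tokens (and later reactions) reuse it.
        index[(name, compartment)] = met
    return met
```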

* Robustness and polish fixes (#25)

Tier 4 of the review: small, targeted hardening, no behaviour change on valid input.

- gapfilling/fill.py: clamp the connectivity gap-fill big-M to the largest finite
  bound magnitude, so a template reaction with an infinite bound no longer puts an
  infinite coefficient into the MILP (which broke the solver).
- reconstruction/kegg/download.py: a malformed or unreadable .netrc now raises a
  ValueError explaining how to fix it, instead of a raw NetrcParseError/OSError.
- io/excel.py: always write the metabolite formula to the METS COMPOSITION column;
  it was dropped whenever an InChI was present.
- visualization: the empty stub package raises a clear NotImplementedError (with a
  roadmap pointer) on attribute access, via a PEP 562 module __getattr__.

A regression test per fix.
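The big-M clamp can be sketched as two helpers. The first picks M from the finite bounds. The second caps every bound at ±M, so indicator constraints such as v ≤ M·y never receive an infinite coefficient. big_m, clamp_bounds and the 1000 fallback are hypothetical; fill.py's actual code is not shown here.

```python
from __future__ import annotations

import math


def big_m(bounds: list[tuple[float, float]], fallback: float = 1000.0) -> float:
    """Largest finite, non-zero bound magnitude across all reactions."""
    finite = [abs(b) for pair in bounds for b in pair if math.isfinite(b) and b != 0]
    # Assumption: fall back to a conventional 1000 when nothing finite is found.
    return max(finite, default=fallback)


def clamp_bounds(bounds: list[tuple[float, float]], m: float) -> list[tuple[float, float]]:
    """Replace infinite bounds by +/-M so every MILP coefficient stays finite."""
    return [(max(lb, -m), min(ub, m)) for lb, ub in bounds]
```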

* Add code-built-model YAML round-trip test (covers the objective) (#26)

The existing YAML round-trip and parity tests originate their model from a parsed
doc; none builds a model directly from cobra objects, and none asserts the objective
coefficient survives (the parity fixture pins it to 0). Add one round-trip test that
builds a model in code with a non-zero objective and asserts metabolites, reactions,
bounds, stoichiometry, GPR, subsystem, formula, annotation and the objective all
survive write -> read.

* Share the linear-chain INIT model fixture via tests/conftest.py (#27)

test_init.py, test_init_build.py and test_init_solvers.py each built the same
linear-chain INIT model (EX_A -> A -> B -> C -> D) independently, differing only in
the model id and whether gene rules were attached. Move that construction into a new
tests/conftest.py as linear_chain_model / linear_chain_model_with_genes fixtures; the
three files now reuse it (test bodies unchanged). The bespoke _toy_ftinit_model stays
local. No behaviour change.

* Publish kegg116 KEGG artefacts (v0.1.0) (#28)

* Publish kegg116 KEGG artefacts as gzip, version-prefixed assets (v0.1.0)

First downloadable KEGG artefact set, wired into the runtime resolvers:

- All artefacts are gzip and version-prefixed (kegg116_<name>.gz) so MATLAB and
  Windows read them with the built-in gunzip, no external tool. organism_gene_ko
  moves from xz to gzip for the same reason.
- HMM libraries ship as one gzip-compressed, concatenated flatfile per domain (~10x
  smaller than the pressed index and portable across HMMER versions);
  ensure_kegg_hmm_library decompresses and hmmpresses it on first use.
- Add a version-prefix-tolerant artefact resolver (_resolve_artefact) used by the
  organism/sequence entry points; parse_kegg_dump and build_kegg_artefacts.py gain
  an opt-in --version.
- Populate data/manifest.json and _DATA_REGISTRY with the kegg116 release assets
  (real SHA256 + bytes); refresh the maintainer docs and manifest example.
- Bump version to 0.1.0 and update CHANGELOG.

* Add KEGG taxonomy artefact and phyl_dist (RAVEN getPhylDist port)

Publish kegg116_taxonomy.gz and regenerate RAVEN's keggPhylDist from it, so GECKO's
organism-distance kcat selection needs no MATLAB .mat file:

- reconstruction.kegg.phyl_dist + PhylDist faithfully reproduce RAVEN getPhylDist's
  (asymmetric, occasionally negative) distance metric; parse_taxonomy_records exposes
  ids/names/lineages and reads .gz transparently.
- data.ensure_kegg_taxonomy fetches the artefact; build_kegg_artefacts.py emits it.
- Register kegg116_taxonomy.gz in data/manifest.json and _DATA_REGISTRY (8 files).
- Tests for phyl_dist (hand-checked against RAVEN) and the taxonomy fetch; update
  migration/IMPROVEMENTS/maintainer docs and CHANGELOG.

* Publish kegg116 KEGG artefacts as gzip, version-prefixed assets (v0.1.0) (#29)

Bundle core KEGG artefacts into kegg116_core.tar.gz

Combine the five core model files (reference model + KO/reaction/organism-gene/
rxn-flag tables) into one kegg116_core.tar.gz; HMM libraries and taxonomy stay
separate. The release drops from 8 assets to 4.

- ensure_kegg_data now fetches the single bundle, SHA-verifies it, and extracts the
  version-prefixed members into the cache once (safe extraction, matching download.py).
- build_kegg_artefacts.py groups the core files into the bundle after the HMM step.
- Regenerate data/manifest.json and _DATA_REGISTRY (4 entries); update manifest.example,
  tests (bundle fixture), and docs.
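Verifying a fetched bundle against its manifest digest needs only hashlib. The sketch streams the file in chunks rather than reading it whole. sha256_file and verify_sha256 are hypothetical helpers, and no particular manifest field names are assumed.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks rather than all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> None:
    """Raise if the downloaded file does not match its manifest checksum."""
    actual = sha256_file(path)
    if actual != expected.lower():
        raise ValueError(f"SHA256 mismatch for {path}: expected {expected}, got {actual}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"abc")
        verify_sha256(sample, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
        print("checksum ok")
```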

* Remove the visualization stub and [visualization] extra (#30)

Mirror MATLAB RAVEN removing its pathway-map / omics-overlay plotting functions
(drawMap, colorPathway, drawPathway, markPathwayWith*, setOmicDataToRxns, ...) as
obsolete/low-value (SysBioChalmers/RAVEN#618). raven-python only had a
not-implemented `visualization` stub reserving that domain; drop it and its
scaffolding. cobrapy + Escher cover pathway/omics visualization externally.

- Delete src/raven_python/visualization/ and tests/test_visualization.py.
- Drop the [visualization] (matplotlib) extra; remove it from CI, ReadTheDocs, and
  the installation / README / api-index / todo docs.
- CHANGELOG: record the removal.

The other functions RAVEN removed (MetaCyc, xml_toolbox, Excel-import wrappers,
solveQP) were never ported to raven-python, so no further changes are needed.

* Auto-resolve the taxonomy artefact in domain-mode from_artefacts (#31)

get_kegg_model_for_organism_from_artefacts("prokaryotes"/"eukaryotes") builds a
whole-domain model, which needs the KEGG taxonomy file. Taxonomy is a separate
artefact (not part of the core set ensure_kegg_data fetches), so the call raised
"Domain mode needs the KEGG taxonomy file; pass taxonomy=." unless the caller
supplied a path by hand.

It now auto-resolves taxonomy for domain mode: from the artefact directory if
present, else via ensure_kegg_taxonomy(version). An explicit taxonomy= still wins;
species mode is unchanged. Adds a regression test.
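The resolution order can be written as a small function. resolve_taxonomy, its parameters and the `fetch` callback are all hypothetical; `fetch` stands in for ensure_kegg_taxonomy. The local file name follows the kegg116_taxonomy.gz naming used above, but the real lookup may differ.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

DOMAINS = {"prokaryotes", "eukaryotes"}


def resolve_taxonomy(
    organism: str,
    taxonomy: str | Path | None,
    artefact_dir: Path,
    version: str,
    fetch: Callable[[str], Path],
) -> Path | None:
    """Pick the taxonomy file for a from_artefacts call.

    Order: explicit argument, then the artefact directory, then `fetch`
    (standing in for ensure_kegg_taxonomy). Species mode passes through as-is.
    """
    if organism not in DOMAINS:
        return Path(taxonomy) if taxonomy is not None else None
    if taxonomy is not None:
        return Path(taxonomy)
    local = Path(artefact_dir) / f"{version}_taxonomy.gz"
    if local.exists():
        return local
    return fetch(version)
```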

* Use hmmsearch (not hmmscan) for the de-novo KEGG query (#32)

get_kegg_model_from_sequences now runs one hmmsearch over the concatenated KO
library instead of an hmmscan against a pressed database:

- run_hmmsearch / parse_hmmsearch_tblout replace run_hmmscan / parse_hmmscan_tblout.
  hmmsearch is HMMER's faster, better-parallelising direction (profiles as the query)
  and needs no hmmpress. -Z is fixed to the profile count so per-hit E-values (and
  thus assign_kos output) are identical to the previous hmmscan path — verified on
  real HMMs (same hits, same E-values, same assignments).
- ensure_kegg_hmm_library just gunzips the library (no hmmpress, no .h3* sidecars).
- build_hmm_library concatenates the per-KO HMMs without pressing; the published
  .hmm.gz artefact is unchanged.
- Docs / IMPROVEMENTS (K7) / CHANGELOG updated.
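The command construction and the tblout parse can be sketched in plain Python without a HMMER binary. The helper names are hypothetical and do not correspond to run_hmmsearch / parse_hmmsearch_tblout. The flags used (--cpu, -Z, --tblout, -o) are standard hmmsearch options. Treating the profile NAME (the tblout query name) as the KO id is an assumption.

```python
from __future__ import annotations

import os


def count_profiles(hmm_text: str) -> int:
    """Number of profiles in a concatenated HMMER3 flatfile ('//' ends each one)."""
    return sum(1 for line in hmm_text.splitlines() if line.strip() == "//")


def hmmsearch_argv(hmm_path: str, fasta_path: str, tblout_path: str,
                   n_profiles: int, cpu: int = 4) -> list[str]:
    """hmmsearch command: profiles are the queries, the proteome is the target db.

    -Z is set to the profile count so E-values match a per-sequence hmmscan
    against the same library.
    """
    return ["hmmsearch", "--cpu", str(cpu), "-Z", str(n_profiles),
            "--tblout", tblout_path, "-o", os.devnull, hmm_path, fasta_path]


def parse_tblout(text: str) -> list[dict]:
    """Per-sequence hits from hmmsearch --tblout output."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=18)
        hits.append({
            "gene": fields[0],           # target name (sequence id)
            "ko": fields[2],             # query name (profile, assumed KO id)
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
        })
    return hits
```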

* Replace the on-disk KEGG test fixture with a synthetic in-code dump (#33)

tests/data/kegg_dump contained real KEGG records (e.g. reaction R00010 and
KO K01194 with their EC/RHEA/ChEBI cross-references) which the project is not
licensed to redistribute.

Remove the directory and instead generate an equivalent, fully fictional
KEGG-format dump at test time via a new session-scoped  fixture in
tests/conftest.py. The synthetic dump mimics the flat-file format so it still
exercises the parser (reaction flags, overview-map skipping, InChI/formula
handling, mapformula irreversibility, KO/gene grouping, taxonomy lineages) but
all identifiers, names, sequences and cross-references are invented.

The four dependent test modules (parse, query, hmm, organism) consume the
fixture and assert against the fictional ids. No real KEGG content is committed
and coverage is unchanged.

* Rename project and import package: raven-python -> raven-toolbox (#34)

* Rename project and import package: raven-python -> raven-toolbox

Rename the distribution (raven-python -> raven-toolbox) and the import
package (raven_python -> raven_toolbox) across all source, tests,
scripts, docs, and packaging metadata. Project URLs now point to
SysBioChalmers/raven-toolbox.

* Complete the rename: remaining raven-python/raven_python -> raven-toolbox/raven_toolbox

The package/distribution rename left occurrences behind after the rebase:

- import statements () in the reconstruction.kegg modules
  and data.py, which would have failed at import time;
- monkeypatch string targets and the cache-path assertions in the tests;
- the wheel/package and mypy  paths in pyproject.toml (still pointing at
  the now-removed src/raven_python), plus the distribution name and project URLs;
- docs, data manifests and GitHub URLs.

Replace them so the import package is consistently raven_toolbox and all
distribution/repo references point to raven-toolbox. Also drop the empty
src/raven_python directory left behind by the rebase.

* Wrap homology imports to satisfy ruff isort after the rename

raven_python -> raven_toolbox widened the homology hits import past the
100-char line length, so ruff isort (I001) wanted it split across lines.
Format it as a multiline import block; ruff check . is clean again.

* CI: bump actions to Node 24 versions (checkout v5, setup-python v6) (#35)

actions/checkout@v4 and actions/setup-python@v5 run on the deprecated Node.js 20
runtime. Bump to actions/checkout@v5 and actions/setup-python@v6, both of which
run on Node.js 24, to clear the GitHub Actions deprecation warning.

* Prepare 0.2.0 release

Bump version 0.1.0 -> 0.2.0 and complete the CHANGELOG 0.2.0 section
(raven-toolbox rename, hmmsearch de-novo KEGG query, domain-mode taxonomy
auto-resolve, synthetic KEGG test fixture, visualization stub removal,
Node 24 CI).