refactor: rename plotting subpackage to visualization #21
Merged
Align the (stub) plotting subpackage with RAVEN's folder layout, where pathway + plotting were unified into visualization/ (SysBioChalmers/RAVEN#614).
- src/raven_python/plotting/ -> src/raven_python/visualization/
- pyproject optional-dependency extra [plotting] -> [visualization] (matplotlib); CI (.github/workflows/ci.yml) and ReadTheDocs (.readthedocs.yaml) install lines updated to match so the renamed extra still resolves.
- docs updated: README, CHANGELOG, installation, docs/README, api/index, todo.
The subpackage is an unimplemented stub (empty __init__), so nothing imports it and there is no behaviour change. Generic uses of the word "plotting" (seaborn / heatmap prose) are left as-is.
edkerk added a commit that referenced this pull request on Jun 13, 2026
* feat(data): shared download manifest for artefacts + binaries (#16)
* feat(data): shared download manifest for artefacts and binaries
Introduce a single, language-agnostic manifest (data/manifest.schema.json) that lists every
downloadable data artefact and external-binary bundle with a SHA256, consumed by both
raven-python and (via the same JSON) MATLAB RAVEN. The manifest is a superset of the two
runtime registries:
* manifest["data"] -> raven_python.data._DATA_REGISTRY
* manifest["binaries"] -> raven_python.binaries._REGISTRY
Added:
* data/manifest.schema.json (JSON Schema) + data/manifest.example.json (worked example) +
data/manifest.json (empty, the live source of truth until assets are published).
* raven_python.manifest — load_manifest / to_*_registry / load_into_registries.
* Lazy autoload: data.ensure_* and binaries.ensure_binary populate themselves from
$RAVEN_PYTHON_MANIFEST on first use when their registry is still empty (guarded; no effect
when a registry is passed explicitly or the env var is unset).
* scripts/make_registry_snippet.py: a `manifest` subcommand that computes url+sha256+bytes
and writes/updates manifest.json.
* tests/test_manifest.py (round-trip, converters, lazy autoload via file:// URLs, repo
manifests valid).
* docs/maintenance/data_manifest.md — format, Python + MATLAB consumers, GitHub-Releases vs
Zenodo hosting (incl. a release→Zenodo GitHub Action), and per-asset recommendations.
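To illustrate how a consumer might check a downloaded artefact against a manifest entry, here is a minimal sketch. It assumes a hypothetical entry shape with `url`, `sha256` and `bytes` keys (the three values the `manifest` subcommand is described as computing); the authoritative layout is data/manifest.schema.json and may differ. `sha256_of` and `verify_entry` are illustrative names, not functions from raven_python.manifest.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA256 so large artefacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_entry(path: Path, entry: dict) -> bool:
    """Hypothetical check: both the size and the hash must match the entry."""
    if path.stat().st_size != entry["bytes"]:
        return False
    return sha256_of(path) == entry["sha256"]


# Build a throwaway artefact and a matching entry (assumed shape, not the real schema).
with tempfile.TemporaryDirectory() as tmp:
    artefact = Path(tmp) / "example.bin"
    artefact.write_bytes(b"raven")
    entry = {
        "url": artefact.as_uri(),  # a file:// URL, as used by the tests described above
        "sha256": hashlib.sha256(b"raven").hexdigest(),
        "bytes": 5,
    }
    ok = verify_entry(artefact, entry)
    tampered = verify_entry(artefact, {**entry, "sha256": "0" * 64})
```

Checking the byte count first is a cheap way to reject truncated downloads before hashing the whole file.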
* docs(data): host assets on existing-repo releases; KEGG redistribution permitted
Reflect the chosen distribution model: GitHub release assets live outside the git tree, so a
separate data repository is optional — attach assets to dedicated tags (e.g. kegg-kegg116,
diamond-2.1.9) on an existing RAVEN repo and reuse the same URLs across raven-python and
MATLAB RAVEN. Use Zenodo only for DOIs or files >2 GB. KEGG artefacts are redistributed with
permission, so the prior 'confirm rights' caveat is removed. Example/schema URLs repointed
from a hypothetical raven-data repo to raven-python.
* Add yeast-GEM-derived shared modules (diff, annotation, conditions, biomass, curation) (#17)
* Add diff_models, annotation, and conditions modules for yeast-GEM port
Lands the upstream-shareable pieces that yeast-GEM has been implementing
locally during its Python port (see yeast-GEM/code/python/PORTING_PLAN.md
and UPSTREAM_CANDIDATES.md). These are organism-agnostic; yeast-GEM
will consume them via a Python dependency on raven-python.
New modules
-----------
raven_python.comparison.diff
diff_models(a, b, ...) -> DiffReport — strict two-model semantic-
equality diff. Complements the existing compare_models (N-model
presence-matrix overview). Used as a CI gate to verify that two
toolchains (e.g. MATLAB RAVEN vs raven_python, pre/post refactor
of one toolchain) produce equivalent models. Includes a
python -m raven_python.comparison.diff CLI.
raven_python.annotation.sbo
add_sbo_terms — SBO term assignment with "fill" semantic. Default
parameter set reproduces yeast-GEM's behaviour; biomass metabolite
names, biomass/NGAM reaction names, and pseudoreaction substrings
are overridable. Transport detection is pluggable (default: same-
met-name in two compartments). Includes an
`only_last_reaction_for_pseudo` legacy bug-compat flag for
yeast-GEM's lock-step migration; off by default for any new caller.
raven_python.annotation.delta_g
load_delta_g_csv / save_delta_g_csv — generic side-car CSV
persistence for scalar notes (ΔG by default, but the note key,
column names, and id/value mapping are all configurable).
raven_python.conditions.apply
apply_condition(model, yaml_or_dict) — generic "apply this YAML
condition" loader. Schema: prelude (reset_exchanges),
cofactor_pseudoreaction (remove_mets + charge_balance_met),
biomass_stoichiometry_delta, per-rxn bounds, expected_uptake_count.
Project-specific extensions (e.g. yeast-GEM's amino_acid_ratio)
are handled by the caller before/after this function — kept
upstream-narrow on purpose. Also exposes set_reaction_bounds
helper that bypasses cobra's lb<=ub validator for the (legitimate)
cases where a condition lands on an infeasible bound state.
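The idea behind a strict two-model diff used as a CI gate can be sketched on plain dictionaries. This is not the diff_models implementation and `ToyDiff` is not the real DiffReport; the model representation (reaction id mapped to stoichiometry and bounds) is an assumption chosen to keep the example free of dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDiff:
    """Illustrative stand-in for a diff report (not the real DiffReport)."""
    only_in_a: list = field(default_factory=list)
    only_in_b: list = field(default_factory=list)
    changed: list = field(default_factory=list)

    @property
    def equal(self) -> bool:
        return not (self.only_in_a or self.only_in_b or self.changed)


def toy_diff(a: dict, b: dict) -> ToyDiff:
    """Compare two {reaction_id: attributes} mappings, ignoring key order."""
    report = ToyDiff()
    report.only_in_a = sorted(a.keys() - b.keys())
    report.only_in_b = sorted(b.keys() - a.keys())
    report.changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return report


model_a = {
    "R1": {"stoich": {"A": -1, "B": 1}, "bounds": (0, 1000)},
    "R2": {"stoich": {"B": -1, "C": 1}, "bounds": (0, 1000)},
}
model_b = {
    "R1": {"stoich": {"B": 1, "A": -1}, "bounds": (0, 1000)},      # same content, other key order
    "R2": {"stoich": {"B": -1, "C": 1}, "bounds": (-1000, 1000)},  # bounds differ
    "R3": {"stoich": {"C": -1}, "bounds": (0, 1000)},              # extra reaction
}

report = toy_diff(model_a, model_b)
```

A CI job built on this pattern would exit non-zero whenever `report.equal` is false.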
Tests
-----
46 new tests across the three modules; full pre-existing raven-python
suite still passes (519 passed; 1 unrelated pre-existing openpyxl
ImportError in tests/test_io_git.py; 2 skipped). ruff clean.
Not in this commit
------------------
The biomass / GAM / chemostat / fit_gam modules are still tracked as
upstream candidates in yeast-GEM/code/python/UPSTREAM_CANDIDATES.md
and remain local in yeast-GEM until phase 4 of the port (which would
ideally land them directly here).
* Add raven_python.biomass — sum / scale / rescale / set_gam
Generic biomass-equation manipulation, extracted from yeast-GEM's
sumBioMass / scaleBioMass / rescalePseudoReaction / changeGAM as
yeast-GEM moves to depend on raven-python (yeast-GEM phase 4 of the
porting plan).
Module layout
-------------
raven_python.biomass.config
BiomassConfig — biomass_rxn id + proton_met id + ordered tuple
of BiomassComponent (per-component pseudoreaction name + mass-
computation strategy).
raven_python.biomass.scale
sum_biomass(model, config) → {component: g/gDW, ..., "total": X}
rescale_pseudoreaction(model, config, name, factor) — multiply
the pseudoreaction's substrate coefs by factor and rebalance
H+ to keep ionic charge at zero.
scale_biomass(model, config, name, new_value, balance_out=None) —
rescale a component to a target fraction, optionally balancing
a second component so the total stays at 1 g/gDW.
raven_python.biomass.gam
set_gam(model, value, *, biomass_rxn, cofactor_met_names,
ngam_rxn=None, ngam_value=None) — scales every metabolite
in the biomass pseudoreaction whose `name` is in the supplied
cofactor set, preserving its sign; optionally fixes the NGAM rxn
bounds.
Mass strategies (per BiomassComponent.mass_strategy):
"mw" plain MW from chemical formula (carbohydrate /
ion / cofactor)
"mw_minus_2h" MW − 2.016 g/mol per substrate (protein —
charged tRNAs release two protons)
"mw_minus_water" MW − 18.015 g/mol per substrate (RNA / DNA —
polymerisation releases one water)
"grams" stoichiometry already in g/gDW (lipid backbone)
Tests: 19 new tests over a synthetic toy model that exercises every
mass strategy, the H+ charge rebalance, scale_biomass with and
without balance_out, set_gam on cofactor mets (and the NGAM bound
path).
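The arithmetic behind the four mass strategies listed above can be sketched as follows. The offsets come from the strategy table; everything else is an assumption for illustration: coefficients are taken to be in mmol/gDW, molecular weights in g/mol, and the numbers in the example are made up. `component_mass` is a hypothetical helper, not the sum_biomass API.

```python
# Mass offsets (g/mol) per substrate, taken from the strategy table.
OFFSETS = {"mw": 0.0, "mw_minus_2h": 2.016, "mw_minus_water": 18.015}


def component_mass(substrates, strategy):
    """Return the g/gDW contributed by one biomass component.

    substrates is a list of (coefficient, molecular_weight) pairs. For the
    "grams" strategy the coefficient is already in g/gDW and the molecular
    weight is ignored.
    """
    if strategy == "grams":
        return sum(abs(coef) for coef, _ in substrates)
    offset = OFFSETS[strategy]
    # mmol/gDW * g/mol = mg/gDW, hence the division by 1000 (unit assumption).
    return sum(abs(coef) * (mw - offset) for coef, mw in substrates) / 1000.0


# Made-up coefficients and weights, one component per strategy.
fractions = {
    "carbohydrate": component_mass([(1.0, 180.156)], "mw"),
    "protein": component_mass([(0.5, 240.0)], "mw_minus_2h"),
    "RNA": component_mass([(0.1, 347.221)], "mw_minus_water"),
    "lipid": component_mass([(0.05, 0.0)], "grams"),
}
fractions["total"] = sum(fractions.values())
```

The resulting dictionary mirrors the `{component: g/gDW, ..., "total": X}` shape described for sum_biomass.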
* Add raven_python.manipulation.find_duplicate_reactions (detection variant)
Detection-only counterpart to remove_duplicate_reactions. Returns
duplicate groups instead of mutating the model. Ignores bounds /
GPR / objective — only stoichiometry is compared, mirroring the
typical curation use case ("find reactions that could be merged").
A new ``ignore_direction=True`` default (yeast-GEM convention)
treats A→B and B→A as duplicates. Set False to require identical
orientation.
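One way to get direction-insensitive duplicate detection is to reduce each reaction to a canonical stoichiometric signature. The sketch below works on plain `{metabolite: coefficient}` dictionaries rather than cobra reactions; `signature` and `find_duplicates` are hypothetical names and not the find_duplicate_reactions implementation.

```python
from collections import defaultdict


def signature(stoich: dict, ignore_direction: bool = True) -> tuple:
    """Canonical, hashable form of a reaction's stoichiometry."""
    forward = tuple(sorted(stoich.items()))
    if not ignore_direction:
        return forward
    reverse = tuple(sorted((met, -coef) for met, coef in stoich.items()))
    # Pick one orientation deterministically so A->B and B->A collide.
    return min(forward, reverse)


def find_duplicates(reactions: dict, ignore_direction: bool = True) -> list:
    """Group reaction ids that share a signature; singletons are dropped."""
    groups = defaultdict(list)
    for rxn_id, stoich in reactions.items():
        groups[signature(stoich, ignore_direction)].append(rxn_id)
    return [ids for ids in groups.values() if len(ids) > 1]


reactions = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "A": 1},  # reverse of R1
    "R3": {"A": -1, "B": 1},  # identical to R1
    "R4": {"A": -2, "B": 1},  # different stoichiometry
}

either_direction = find_duplicates(reactions)
same_direction = find_duplicates(reactions, ignore_direction=False)
```

The sketch compares coefficients exactly and does not treat scaled copies (for example 2A→2B versus A→B) as duplicates.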
Used by yeast-GEM's modelTests port (Tier 3 / phase 5) to flag
duplicate reactions during curation review.
* Add raven_python.curation — batch_curate / batch_curate_from_tsv
Generic batch curation engine extracted from yeast-GEM's MATLAB
curateMetsRxnsGenes (yeast-GEM phase 6). Adds or updates
metabolites, reactions and genes from pandas DataFrames; a
batch_curate_from_tsv convenience wrapper reads the equivalent TSVs.
Schema (matches yeast-GEM's data/modelCuration/template/ layout):
mets_df metNames, comps, formula, charge, inchi, metNotes
+ any number of MIRIAM-namespace columns
genes_df genes, geneShortNames + MIRIAM columns
rxns_df rxnNames, grRules, lb, ub, rev, subSystems,
eccodes, rxnNotes, rxnReferences,
rxnConfidenceScores + MIRIAM columns
rxns_coeffs_df rxnNames, metNames, comps, coefficient
(one row per (reaction, metabolite) pair)
Match keys:
metabolites — (name, compartment) tuple
genes — gene id
reactions — stoichiometric signature
Existing entities get their annotations overwritten (warning emitted);
new entities are added with fresh ids generated from the supplied
``met_id_prefix`` / ``rxn_id_prefix`` (defaults M_ / R_ per the BiGG
convention; yeast-GEM passes s_ / r_). Width of the existing
zero-padded suffix is preserved so s_0001 → s_0002, not s_2.
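The width-preserving id generation can be sketched like this. `next_id` is a hypothetical helper; falling back to an unpadded `1` when no id with the prefix exists is an assumption, since the text does not say what happens in that case.

```python
import re


def next_id(existing_ids, prefix):
    """Return the next free id for `prefix`, keeping the zero-padded width."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    suffixes = [m.group(1) for m in map(pattern.match, existing_ids) if m]
    if not suffixes:
        return f"{prefix}1"  # assumption: nothing to copy the width from
    width = max(len(s) for s in suffixes)
    number = max(int(s) for s in suffixes) + 1
    # The 0{width}d format pads with zeros but never truncates a longer number.
    return f"{prefix}{number:0{width}d}"


first = next_id(["s_0001"], "s_")
mixed = next_id(["M_0001", "s_0003"], "s_")
overflow = next_id(["r_9999"], "r_")
empty = next_id([], "M_")
```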
"Everything after the core columns is MIRIAM" — the header of any
extra column becomes the annotation namespace key. Matches MATLAB
behaviour exactly so yeast-GEM's existing TSVs work unchanged, and
projects with different MIRIAM column sets need no code change.
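A minimal sketch of the "everything after the core columns is MIRIAM" rule, using only the standard library instead of pandas. The core column names are the mets_df columns listed in the schema above; selecting them by name rather than by position, and skipping empty cells, are simplifying assumptions. `split_row` is a hypothetical helper.

```python
import csv
import io

# Core metabolite columns, as listed in the schema above.
CORE_MET_COLUMNS = ("metNames", "comps", "formula", "charge", "inchi", "metNotes")


def split_row(row: dict, core=CORE_MET_COLUMNS):
    """Split one TSV row into core fields and MIRIAM annotations."""
    core_fields = {key: row[key] for key in core if key in row}
    # Any non-core header becomes the annotation namespace key.
    annotation = {key: value for key, value in row.items()
                  if key not in core and value}
    return core_fields, annotation


tsv = (
    "metNames\tcomps\tformula\tcharge\tinchi\tmetNotes\tkegg.compound\tchebi\n"
    "D-glucose\tc\tC6H12O6\t0\t\t\tC00031\tCHEBI:4167\n"
)
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
core_fields, annotation = split_row(rows[0])
```

Because the namespace comes from the header, adding a new MIRIAM column to a TSV needs no change to the code.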
CurationResult dataclass records what was added vs updated so
callers can verify in tests / CI.
Tests: 13 new (add/update mets, add/update genes, add/update rxns
by stoichiometry, miriam auto-detect, id-width preservation,
combined mets+rxns in one call, missing-metabolite error,
batch_curate_from_tsv round trip).
* io.yaml: byte-compatible round-trip with cobrapy + RAVEN MATLAB
Three things this fixes:
1. write_yaml_model dropped the !!omap tags entirely. _to_plain
was flattening cobra's OrderedDict to plain dict, which caused
ruamel to emit ordinary block mappings. RAVEN MATLAB's reader
is a line-based parser keyed on !!omap and therefore could not
load any file we wrote. _to_plain now returns OrderedDict so
ruamel re-emits the !!omap tag.
2. eccodes was lost on round-trip — it wasn't in _RXN_FIELDS, so
read_yaml_model didn't capture it into .notes and
write_yaml_model couldn't lift it back. Added.
3. RAVEN MATLAB writes reaction notes as 'rxnNotes'; cobrapy and
this writer use 'notes'. Added a read-time alias so existing
yeast-GEM YAML files (which still say 'rxnNotes') load
cleanly. Writes go out as 'notes' (cobrapy-canonical).
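The read-time alias in point 3 can be sketched as a key-normalisation step applied to each reaction entry after parsing. The alias table and `normalise_keys` are hypothetical; raising when both spellings appear in the same entry is an assumption, not documented behaviour.

```python
READ_ALIASES = {"rxnNotes": "notes"}  # legacy key -> canonical key


def normalise_keys(entry: dict, aliases=READ_ALIASES) -> dict:
    """Rename legacy keys to their canonical form, preserving key order."""
    out = {}
    for key, value in entry.items():
        canonical = aliases.get(key, key)
        if canonical in out:
            raise ValueError(f"duplicate key after aliasing: {canonical!r}")
        out[canonical] = value
    return out


legacy_entry = {"id": "r_0001", "rxnNotes": "curated", "lower_bound": 0}
modern_entry = normalise_keys(legacy_entry)
```

Since the alias is applied only when reading, anything written back uses the canonical 'notes' key.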
Top-level layout now matches RAVEN MATLAB: metaData first, then
metabolites / reactions / genes / compartments, then optional
gecko_light + ec-rxns + ec-enzymes. id/name/version live inside
metaData (RAVEN convention) — cobrapy reading these files still
works, but cobra_model.id ends up None because cobrapy doesn't
know about metaData. raven_python.read_yaml_model lifts
metaData.id/name/version onto model.id / model.name /
model.notes['version'] so the rest of the codebase doesn't care
which layout the file used.
Empty-name genes are no longer emitted as — that's a
cobrapy quirk that drifts yeast-GEM YAML files away from RAVEN
MATLAB's output.
Verified end-to-end:
* cobra.io.load_yaml_model reads every file the new writer
produces (yeast-GEM and a synthetic fixture).
* RAVEN MATLAB readYAMLmodel reads every file the new writer
produces.
* Round-tripping yeast-GEM through raven_python preserves
2748/2748 metabolites, 4102/4102 reactions, 1143/1143 genes,
2411 eccodes, 3984 reaction deltaG, 2696 metabolite deltaG,
1788 SMILES, 1443 rxn-notes — no semantic drift.
Tests
-----
* tests/test_io_yaml_parity.py is new: covers every RAVEN
extension, the rxnNotes legacy alias, the SMILES YAML-special
character case, metaData-first layout, and cobra readability.
* tests/test_io_yaml.py::test_output_is_cobra_readable adjusts
for the metaData layout (cobra recovers mets/rxns/annotation
but not model.id, by design).
* conditions: switch from PyYAML to ruamel.yaml
PyYAML is not a project dependency; raven-python uses ruamel.yaml
(already pulled in via cobra) everywhere else. The conditions
module and its tests still imported PyYAML, which broke pytest
collection on clean CI runners with 'No module named yaml'.
Both apply.py and the test now use a YAML(typ='safe') instance
from ruamel.yaml — same plain-dict semantics as PyYAML's
safe_load / safe_dump, no new dependency.
* io.yaml: document the format + accept legacy geckoLight-in-metaData
Adds docs/reference/yaml_format.md as the canonical schema reference
for the cross-toolchain YAML format (cobrapy / raven-python / RAVEN
MATLAB). Covers the full document shape, per-entry field order,
RAVEN extensions, the GECKO ec-* sections, the metaData provenance
block, number / string / quoting rules, and the cross-reader
interoperability matrix. Linked from docs/reference/index.md and
the I/O guide.
Reader fix: the pre-shim RAVEN MATLAB writer emitted GECKO models
with geckoLight: "true" inside the metaData block (not as a
top-level gecko_light). The reader now lifts that legacy key out
of metaData so model.ec.gecko_light is populated whichever
placement the file used. Round-trip writes always use the new
top-level form.
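The legacy-key lift described in the reader fix can be sketched on an already parsed document. `lift_legacy_gecko_light` is a hypothetical helper; converting the legacy string "true" to a Python bool and letting an existing top-level value win are assumptions about the intended semantics.

```python
def lift_legacy_gecko_light(doc: dict) -> dict:
    """Return a copy with geckoLight moved from metaData to top-level gecko_light."""
    out = dict(doc)
    meta = dict(out.get("metaData", {}))
    if "geckoLight" in meta:
        legacy = meta.pop("geckoLight")
        # The legacy value is described as the string "true"; accepting a
        # bool as well is an assumption of this sketch.
        as_bool = legacy if isinstance(legacy, bool) else str(legacy).lower() == "true"
        out.setdefault("gecko_light", as_bool)  # an existing top-level key wins
        out["metaData"] = meta
    return out


legacy_doc = {"metaData": {"id": "ecModel", "geckoLight": "true"}}
modern_doc = {"metaData": {"id": "ecModel"}, "gecko_light": False}

lifted = lift_legacy_gecko_light(legacy_doc)
unchanged = lift_legacy_gecko_light(modern_doc)
```

Working on copies leaves the parsed input untouched, so the same document can still be inspected in its original layout.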
Regression tests:
test_pre_shim_format_loads — synthetic fixture covering every
legacy quirk we know about (--- doc marker, plain metaData,
geckoLight inside metaData, top-level metabolite smiles,
rxnNotes reaction key, integer bounds, double-quoted strings).
Each quirk has its own assertion + comment.
test_pre_shim_yeast_gem_loads_if_available — sanity-loads the
real yeast-GEM.yml (2748 mets, 4102 rxns, 1143 genes) and
asserts the documented preserved-counts table from the format
reference. Skipped on CI runners where the working copy isn't
mounted.
* Cobra-aligned hardening pass from full code review (#18)
* Cobra-aligned hardening pass from full code review
No behaviour change on well-formed inputs. Highlights:
- Packaging: derive __version__ from package metadata (was a stale
hard-coded "0.0.1" that the docs site reported); pin ruff==0.15.15 in
the dev extra and CI; fix two lint errors unpinned ruff started flagging.
- Errors: solver/feasibility failures in run_init, run_ftinit, fill_tasks
and random_sampling now raise cobra.exceptions.OptimizationError instead
of bare RuntimeError (consistent with the rest of the package).
- Consistency: single utils.parse.subsystem_to_str coerces reaction
subsystem to cobra's canonical str across io.excel / comparison.compare /
curation.batch / manipulation.add (fixes a crash on non-string items and
the silent drop of multi-subsystem reactions); shared GPR score
aggregators in utils.gpr used by init.score and init.genes; KEGG-download
progress uses a module logger instead of print.
- Robustness: zip path-traversal guard in binaries.py; penalty>0 check in
connect_blocked_reactions; NaN-sample guard in random_sampling; all-zero
ec coupling warning; optional verify= SHA256 re-check on data cache hits;
non-finite z-score guard in reporter. Regression tests added for each.
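As an illustration of the zip path-traversal guard mentioned under Robustness, the sketch below rejects any archive member that would resolve outside the destination directory. It shows the general technique only and is not the code in binaries.py; `safe_extract` and `make_zip` are hypothetical helpers, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def safe_extract(archive: zipfile.ZipFile, dest: Path) -> None:
    """Extract the archive only if every member resolves inside dest."""
    root = dest.resolve()
    for name in archive.namelist():
        target = (root / name).resolve()
        if not target.is_relative_to(root):
            raise ValueError(f"unsafe path in archive: {name!r}")
    archive.extractall(root)


def make_zip(names) -> zipfile.ZipFile:
    """Build an in-memory archive containing one tiny file per name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as writer:
        for name in names:
            writer.writestr(name, b"x")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "out"
    dest.mkdir()
    safe_extract(make_zip(["bin/tool"]), dest)
    extracted = (dest / "bin" / "tool").exists()
    try:
        safe_extract(make_zip(["../evil.txt"]), dest)
        rejected = False
    except ValueError:
        rejected = True
```

All members are validated before anything is written, so a malicious archive leaves the destination directory untouched.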
* io.yaml: reaction EC codes as cobra annotation ec-code (#19)
* Add diff_models, annotation, and conditions modules for yeast-GEM port
Lands the upstream-shareable pieces that yeast-GEM has been implementing
locally during its Python port (see yeast-GEM/code/python/PORTING_PLAN.md
and UPSTREAM_CANDIDATES.md). These are organism-agnostic; yeast-GEM
will consume them via a Python dependency on raven-python.
New modules
-----------
raven_python.comparison.diff
diff_models(a, b, ...) -> DiffReport — strict two-model semantic-
equality diff. Complements the existing compare_models (N-model
presence-matrix overview). Used as a CI gate to verify that two
toolchains (e.g. MATLAB RAVEN vs raven_python, pre/post refactor
of one toolchain) produce equivalent models. Includes a
python -m raven_python.comparison.diff CLI.
raven_python.annotation.sbo
add_sbo_terms — SBO term assignment with "fill" semantic. Default
parameter set reproduces yeast-GEM's behaviour; biomass metabolite
names, biomass/NGAM reaction names, and pseudoreaction substrings
are overridable. Transport detection is pluggable (default: same-
met-name in two compartments). Includes an `only_last_reaction_
for_pseudo` legacy bug-compat flag for yeast-GEM's lock-step
migration; off by default for any new caller.
raven_python.annotation.delta_g
load_delta_g_csv / save_delta_g_csv — generic side-car CSV
persistence for scalar notes (ΔG by default, but the note key,
column names, and id/value mapping are all configurable).
raven_python.conditions.apply
apply_condition(model, yaml_or_dict) — generic "apply this YAML
condition" loader. Schema: prelude (reset_exchanges),
cofactor_pseudoreaction (remove_mets + charge_balance_met),
biomass_stoichiometry_delta, per-rxn bounds, expected_uptake_count.
Project-specific extensions (e.g. yeast-GEM's amino_acid_ratio)
are handled by the caller before/after this function — kept
upstream-narrow on purpose. Also exposes set_reaction_bounds
helper that bypasses cobra's lb<=ub validator for the (legitimate)
cases where a condition lands on an infeasible bound state.
Tests
-----
46 new tests across the three modules; full pre-existing raven-python
suite still passes (519 passed; 1 unrelated pre-existing openpyxl
ImportError in tests/test_io_git.py; 2 skipped). ruff clean.
Not in this commit
------------------
The biomass / GAM / chemostat / fit_gam modules are still tracked as
upstream candidates in yeast-GEM/code/python/UPSTREAM_CANDIDATES.md
and remain local in yeast-GEM until phase 4 of the port (which would
ideally land them directly here).
* Add raven_python.biomass — sum / scale / rescale / set_gam
Generic biomass-equation manipulation, extracted from yeast-GEM's
sumBioMass / scaleBioMass / rescalePseudoReaction / changeGAM as
yeast-GEM moves to depend on raven-python (yeast-GEM phase 4 of the
porting plan).
Module layout
-------------
raven_python.biomass.config
BiomassConfig — biomass_rxn id + proton_met id + ordered tuple
of BiomassComponent (per-component pseudoreaction name + mass-
computation strategy).
raven_python.biomass.scale
sum_biomass(model, config) → {component: g/gDW, ..., "total": X}
rescale_pseudoreaction(model, config, name, factor) — multiply
the pseudoreaction's substrate coefs by factor and rebalance
H+ to keep ionic charge at zero.
scale_biomass(model, config, name, new_value, balance_out=None) —
rescale a component to a target fraction, optionally balancing
a second component so the total stays at 1 g/gDW.
raven_python.biomass.gam
set_gam(model, value, *, biomass_rxn, cofactor_met_names,
ngam_rxn=None, ngam_value=None) — scales every metabolite
in the biomass pseudoreaction whose `name` is in the supplied
cofactor set, preserving its sign; optionally fixes the NGAM rxn
bounds.
Mass strategies (per BiomassComponent.mass_strategy):
"mw" plain MW from chemical formula (carbohydrate /
ion / cofactor)
"mw_minus_2h" MW − 2.016 g/mol per substrate (protein —
charged tRNAs release two protons)
"mw_minus_water" MW − 18.015 g/mol per substrate (RNA / DNA —
polymerisation releases one water)
"grams" stoichiometry already in g/gDW (lipid backbone)
Tests: 19 new tests over a synthetic toy model that exercises every
mass strategy, the H+ charge rebalance, scale_biomass with and
without balance_out, set_gam on cofactor mets (and the NGAM bound
path).
* Add raven_python.manipulation.find_duplicate_reactions (detection variant)
Detection-only counterpart to remove_duplicate_reactions. Returns
duplicate groups instead of mutating the model. Ignores bounds /
GPR / objective — only stoichiometry is compared, mirroring the
typical curation use case ("find reactions that could be merged").
A new ``ignore_direction=True`` default (yeast-GEM convention)
treats A→B and B→A as duplicates. Set False to require identical
orientation.
Used by yeast-GEM's modelTests port (Tier 3 / phase 5) to flag
duplicate reactions during curation review.
* Add raven_python.curation — batch_curate / batch_curate_from_tsv
Generic batch curation engine extracted from yeast-GEM's MATLAB
curateMetsRxnsGenes (yeast-GEM phase 6). Adds or updates
metabolites, reactions and genes from pandas DataFrames; a
batch_curate_from_tsv convenience wrapper reads the equivalent TSVs.
Schema (matches yeast-GEM's data/modelCuration/template/ layout):
mets_df metNames, comps, formula, charge, inchi, metNotes
+ any number of MIRIAM-namespace columns
genes_df genes, geneShortNames + MIRIAM columns
rxns_df rxnNames, grRules, lb, ub, rev, subSystems,
eccodes, rxnNotes, rxnReferences,
rxnConfidenceScores + MIRIAM columns
rxns_coeffs_df rxnNames, metNames, comps, coefficient
(one row per (reaction, metabolite) pair)
Match keys:
metabolites — (name, compartment) tuple
genes — gene id
reactions — stoichiometric signature
Existing entities get their annotations overwritten (warning emitted);
new entities are added with fresh ids generated from the supplied
``met_id_prefix`` / ``rxn_id_prefix`` (defaults M_ / R_ per the BiGG
convention; yeast-GEM passes s_ / r_). Width of the existing
zero-padded suffix is preserved so s_0001 → s_0002, not s_2.
"Everything after the core columns is MIRIAM" — the header of any
extra column becomes the annotation namespace key. Matches MATLAB
behaviour exactly so yeast-GEM's existing TSVs work unchanged, and
projects with different MIRIAM column sets need no code change.
CurationResult dataclass records what was added vs updated so
callers can verify in tests / CI.
Tests: 13 new (add/update mets, add/update genes, add/update rxns
by stoichiometry, miriam auto-detect, id-width preservation,
combined mets+rxns in one call, missing-metabolite error,
batch_curate_from_tsv round trip).
* io.yaml: byte-compatible round-trip with cobrapy + RAVEN MATLAB
Three things this fixes:
1. write_yaml_model dropped the !!omap tags entirely. _to_plain
was flattening cobra's OrderedDict to plain dict, which causes
ruamel to emit ordinary block mappings. RAVEN MATLAB's reader
is a line-based parser keyed on !!omap and therefore could not
load any file we wrote. _to_plain now returns OrderedDict so
ruamel re-emits the !!omap tag.
2. eccodes was lost on round-trip — it wasn't in _RXN_FIELDS, so
read_yaml_model didn't capture it into .notes and
write_yaml_model couldn't lift it back. Added.
3. RAVEN MATLAB writes reaction notes as 'rxnNotes'; cobrapy and
this writer use 'notes'. Added a read-time alias so existing
yeast-GEM YAML files (which still say 'rxnNotes') load
cleanly. Writes go out as 'notes' (cobrapy-canonical).
Top-level layout now matches RAVEN MATLAB: metaData first, then
metabolites / reactions / genes / compartments, then optional
gecko_light + ec-rxns + ec-enzymes. id/name/version live inside
metaData (RAVEN convention) — cobrapy reading these files still
works, but cobra_model.id ends up None because cobrapy doesn't
know about metaData. raven_python.read_yaml_model lifts
metaData.id/name/version onto model.id / model.name /
model.notes['version'] so the rest of the codebase doesn't care
which layout the file used.
Empty-name genes are no longer emitted as — that's a
cobrapy quirk that drifts yeast-GEM YAML files away from RAVEN
MATLAB's output.
Verified end-to-end:
* cobra.io.load_yaml_model reads every file the new writer
produces (yeast-GEM and a synthetic fixture).
* RAVEN MATLAB readYAMLmodel reads every file the new writer
produces.
* Round-tripping yeast-GEM through raven_python preserves
2748/2748 metabolites, 4102/4102 reactions, 1143/1143 genes,
2411 eccodes, 3984 reaction deltaG, 2696 metabolite deltaG,
1788 SMILES, 1443 rxn-notes — no semantic drift.
Tests
-----
* tests/test_io_yaml_parity.py is new: covers every RAVEN
extension, the rxnNotes legacy alias, the SMILES YAML-special
character case, metaData-first layout, and cobra readability.
* tests/test_io_yaml.py::test_output_is_cobra_readable adjusts
for the metaData layout (cobra recovers mets/rxns/annotation
but not model.id, by design).
* conditions: switch from PyYAML to ruamel.yaml
PyYAML is not a project dependency; raven-python uses ruamel.yaml
(already pulled in via cobra) everywhere else. The conditions
module and its tests still imported PyYAML, which broke pytest
collection on clean CI runners with 'No module named yaml'.
Both apply.py and the test now use a YAML(typ='safe') instance
from ruamel.yaml — same plain-dict semantics as PyYAML's
safe_load / safe_dump, no new dependency.
* io.yaml: document the format + accept legacy geckoLight-in-metaData
Adds docs/reference/yaml_format.md as the canonical schema reference
for the cross-toolchain YAML format (cobrapy / raven-python / RAVEN
MATLAB). Covers the full document shape, per-entry field order,
RAVEN extensions, the GECKO ec-* sections, the metaData provenance
block, number / string / quoting rules, and the cross-reader
interoperability matrix. Linked from docs/reference/index.md and
the I/O guide.
Reader fix: pre-shim RAVEN MATLAB writes emitted GECKO models
with geckoLight: "true" inside the metaData block (not as a
top-level gecko_light). The reader now lifts that legacy key out
of metaData so model.ec.gecko_light is populated whichever
placement the file used. Round-trip writes always use the new
top-level form.
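A rough sketch of the legacy-key lift, operating on an already-parsed YAML document as a plain dict. `lift_gecko_light` is a hypothetical name, and converting the legacy string "true" to a bool is an assumption about the in-memory representation.

```python
def lift_gecko_light(doc):
    """Promote a legacy metaData.geckoLight flag to top-level gecko_light."""
    meta = doc.get("metaData") or {}
    if "geckoLight" in meta:
        raw = meta.pop("geckoLight")
        # Legacy files store the string "true"; normalise to a bool (assumption).
        # setdefault leaves an existing top-level gecko_light untouched.
        doc.setdefault("gecko_light", str(raw).strip().lower() == "true")
    return doc


legacy = {"metaData": {"id": "demo", "geckoLight": "true"}}
print(lift_gecko_light(legacy))
# {'metaData': {'id': 'demo'}, 'gecko_light': True}
```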
Regression tests:
test_pre_shim_format_loads — synthetic fixture covering every
legacy quirk we know about (--- doc marker, plain metaData,
geckoLight inside metaData, top-level metabolite smiles,
rxnNotes reaction key, integer bounds, double-quoted strings).
Each quirk has its own assertion + comment.
test_pre_shim_yeast_gem_loads_if_available — sanity-loads the
real yeast-GEM.yml (2748 mets, 4102 rxns, 1143 genes) and
asserts the documented preserved-counts table from the format
reference. Skipped on CI runners where the working copy isn't
mounted.
* io.yaml: represent reaction EC codes as cobra annotation['ec-code']
EC numbers are a standard MIRIAM cross-reference, so the cobra-native
representation is annotation['ec-code'] (a list) -- exactly where cobrapy
and geckopy read them. raven-python was instead routing RAVEN's legacy
top-level `eccodes` key into model.notes['eccodes'], so reaction EC codes
written by RAVEN-MATLAB never reached the annotation['ec-code'] location
geckopy reads from.
- Drop `eccodes` from _RXN_FIELDS (it is not a RAVEN-only notes field).
- Add _lift_eccodes_to_annotation: a legacy top-level `eccodes` (a
;-joined string or a list) is lifted into annotation['ec-code'] on read,
mirroring the existing _lift_smiles_to_annotation; a native
annotation['ec-code'] wins.
- On write, EC codes serialise via cobra's annotation block; no top-level
`eccodes` is emitted.
- Update test_io_yaml_parity expectations to the cobra-aligned location
(verified against the real yeast-GEM.yml: 2411 reactions).
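The read-side lift described above might look roughly like this. It is a sketch over a plain dict for one parsed reaction entry, not the actual _lift_eccodes_to_annotation, whose signature is not shown in this log.

```python
def lift_eccodes_to_annotation(rxn):
    """Move a legacy top-level `eccodes` value into annotation['ec-code']."""
    legacy = rxn.pop("eccodes", None)
    if legacy is None:
        return rxn
    annotation = rxn.setdefault("annotation", {})
    if "ec-code" in annotation:
        return rxn  # a native annotation['ec-code'] wins
    if isinstance(legacy, str):
        legacy = legacy.split(";")  # legacy form is a ;-joined string
    codes = [code.strip() for code in legacy if code.strip()]
    if codes:
        annotation["ec-code"] = codes
    return rxn


rxn = {"id": "r_0001", "eccodes": "1.1.1.1;2.7.1.2"}
print(lift_eccodes_to_annotation(rxn))
# {'id': 'r_0001', 'annotation': {'ec-code': ['1.1.1.1', '2.7.1.2']}}
```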
* docs: update RAVEN cross-references for the post-reorg folder layout (#20)
RAVEN moved its functions out of the core/ catch-all into purpose-based top-level
folders (SysBioChalmers/RAVEN#614). Repoint every RAVEN file path in the
cross-reference docs (IMPROVEMENTS.md, docs/reference/matlab_raven_backports.md):
- FSEOF / randomSampling / reporterMetabolites -> analysis/
- parseTaskList / checkTasks -> tasks/
- fillGaps -> gapfilling/
- addRxns / changeRxns / standardizeGrRules -> manipulation/
- getIndexes / checkModelStruct / getElementalBalance -> queries/
- getModelFromHomology -> reconstruction/homology/
- getKEGGModelForOrganism -> reconstruction/kegg/
- runINIT / ftINIT -> INIT/
Also corrects references that were stale even before the reorg (getKEGGModelForOrganism
was in external/kegg/) and points the proposed GPR-lint back-port findPotentialErrors at
manipulation/, alongside standardizeGrRules.
Doc-only: raven-python's module layout already matches RAVEN's new structure (it was the
template the reorg mirrored), so no code changes are needed.
* refactor: rename plotting subpackage to visualization (#21)
Align the (stub) plotting subpackage with RAVEN's folder layout, where pathway +
plotting were unified into visualization/ (SysBioChalmers/RAVEN#614).
- src/raven_python/plotting/ -> src/raven_python/visualization/
- pyproject optional-dependency extra [plotting] -> [visualization] (matplotlib);
CI (.github/workflows/ci.yml) and ReadTheDocs (.readthedocs.yaml) install lines
updated to match so the renamed extra still resolves.
- docs updated: README, CHANGELOG, installation, docs/README, api/index, todo.
The subpackage is an unimplemented stub (empty __init__), so nothing imports it and
there is no behaviour change. Generic uses of the word "plotting" (seaborn / heatmap
prose) are left as-is.
* Ship type information and enforce it; make gpr_to_dnf public (#22)
Three related "make the package's contracts real" changes:
- Add the PEP 561 py.typed marker so the package's extensive type hints are
visible to downstream type checkers (geckopy included). The hatchling wheel
ships raven_python/py.typed.
- Add mypy to the dev extra, a lenient [tool.mypy] config (ignore_missing_imports
for the un-stubbed cobra/optlang/scipy/ruamel), and a mypy CI job. Fix the 36
type errors this surfaced -- all type-only (Path vs str annotations, None-guards
that match existing behaviour, optlang Variable typing, isinstance/cast
narrowing). No runtime behaviour changes; the full test suite stays green.
- Promote manipulation.expand._gpr_to_dnf to a public gpr_to_dnf (re-exported
from raven_python.manipulation). geckopy's call sites switch to it in lockstep
(separate PR), so no deprecated alias is kept.
* Harden curation, EC-data and archive-handling modules (#23)
Tier-2 audit of the post-review modules surfaced four targeted fixes:
- curation/batch.py: new reactions coerce a list-valued subSystems via
subsystem_to_str (";"-joined) instead of str(list), matching the update path.
- io/ec_data.py: _eccodes_to_yaml strips stray separators in the single-EC
case so a trailing ";" never leaks into the written YAML.
- binaries.py: _safe_extract_zip rejects symlink members, defence-in-depth
alongside the existing path-traversal guard.
- binaries.py / data.py: archive and dataset downloads pass a socket timeout
to urlopen so a stalled server cannot hang the process.
Adds regression tests for each fix.
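For the archive-handling fix, a sketch of how a zip extractor can refuse symlink members alongside a path-traversal guard. `safe_extract_zip` here is an illustrative stand-in, not the code in binaries.py.

```python
import io
import stat
import tempfile
import zipfile
from pathlib import Path


def safe_extract_zip(archive, dest):
    """Extract `archive` into `dest`, refusing symlinks and path traversal."""
    dest = Path(dest).resolve()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # The upper 16 bits of external_attr hold the Unix mode, if any.
            mode = info.external_attr >> 16
            if stat.S_ISLNK(mode):
                raise ValueError(f"symlink member rejected: {info.filename}")
            target = (dest / info.filename).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"member escapes destination: {info.filename}")
        zf.extractall(dest)


# Demo: an in-memory archive with one symlink member is refused.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    link = zipfile.ZipInfo("evil_link")
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, "/etc/passwd")
buf.seek(0)
try:
    safe_extract_zip(buf, tempfile.mkdtemp())
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

All members are checked before anything is written, so a bad archive leaves the destination untouched.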
* Surgical performance pass on hot paths (#24)
Targeted, behaviour-preserving optimisations from the review:
- manipulation/add.py + change.py: resolve equation tokens through a shared
(name, compartment) -> metabolite index (_build_met_index) instead of
re-scanning model.metabolites per token. Bulk reaction add/change by name
drops from O(R*k*M) to O(R*k); the index is updated as new mets are created
so cross-token and cross-reaction dedup is preserved.
- reconstruction/homology/homology.py: replace DataFrame.apply(axis=1) in the
ortholog filter with a comprehension over the columns (membership is already
O(1); avoids per-row Series construction).
- analysis/sampling.py: build the random objective with optlang add() instead
of sum(), which re-canonicalises the expression on every term (O(n^2)).
Adds a cross-reaction metabolite-dedup regression test for the add path.
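The index-based token resolution can be illustrated with plain dicts. This is a sketch only: the record shape and the `M_<n>` id scheme are placeholders, and `resolve_token` is a hypothetical name.

```python
def build_met_index(metabolites):
    """Map (name, compartment) -> metabolite id in one pass over the model."""
    return {(m["name"], m["compartment"]): m["id"] for m in metabolites}


def resolve_token(index, metabolites, name, compartment):
    """Return the id for an equation token, creating the metabolite if new."""
    key = (name, compartment)
    if key not in index:
        new_id = f"M_{len(metabolites) + 1}"  # placeholder id scheme
        metabolites.append({"id": new_id, "name": name, "compartment": compartment})
        index[key] = new_id  # keep the index current for later tokens
    return index[key]


mets = [{"id": "M_1", "name": "glucose", "compartment": "c"}]
index = build_met_index(mets)
print(resolve_token(index, mets, "glucose", "c"))  # M_1
print(resolve_token(index, mets, "ATP", "c"))      # M_2
print(resolve_token(index, mets, "ATP", "c"))      # M_2 (deduplicated)
```

Each lookup is a dict access, so the per-token cost no longer depends on the number of metabolites in the model.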
* Robustness and polish fixes (#25)
Tier 4 of the review: small, targeted hardening, no behaviour change on valid input.
- gapfilling/fill.py: clamp the connectivity gap-fill big-M to the largest finite
bound magnitude, so a template reaction with an infinite bound no longer puts an
infinite coefficient into the MILP (which broke the solver).
- reconstruction/kegg/download.py: a malformed or unreadable .netrc now raises a
ValueError explaining how to fix it, instead of a raw NetrcParseError/OSError.
- io/excel.py: always write the metabolite formula to the METS COMPOSITION column;
it was dropped whenever an InChI was present.
- visualization: the empty stub package raises a clear NotImplementedError (with a
roadmap pointer) on attribute access, via a PEP 562 module __getattr__.
A regression test per fix.
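As an illustration of the big-M clamp in the first item, a sketch that derives the constant from finite bounds only. `finite_big_m` and its `fallback` value are assumptions for the example, not names or defaults from gapfilling/fill.py.

```python
import math


def finite_big_m(bounds, fallback=1000.0):
    """Largest finite |bound| among (lb, ub) pairs, or `fallback` if none."""
    finite = [abs(b) for pair in bounds for b in pair if math.isfinite(b)]
    # An all-zero or all-infinite set of bounds gives no usable magnitude.
    return max(finite, default=0.0) or fallback


print(finite_big_m([(-1000, 1000), (0, math.inf)]))  # 1000
print(finite_big_m([(-math.inf, math.inf)]))         # 1000.0
```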
* Add code-built-model YAML round-trip test (covers the objective) (#26)
The existing YAML round-trip and parity tests originate their model from a parsed
doc; none builds a model directly from cobra objects, and none asserts the objective
coefficient survives (the parity fixture pins it to 0). Add one round-trip test that
builds a model in code with a non-zero objective and asserts metabolites, reactions,
bounds, stoichiometry, GPR, subsystem, formula, annotation and the objective all
survive write -> read.
* Share the linear-chain INIT model fixture via tests/conftest.py (#27)
test_init.py, test_init_build.py and test_init_solvers.py each built the same
linear-chain INIT model (EX_A -> A -> B -> C -> D) independently, differing only in
the model id and whether gene rules were attached. Move that construction into a new
tests/conftest.py as linear_chain_model / linear_chain_model_with_genes fixtures; the
three files now reuse it (test bodies unchanged). The bespoke _toy_ftinit_model stays
local. No behaviour change.
* Publish kegg116 KEGG artefacts (v0.1.0) (#28)
* Publish kegg116 KEGG artefacts as gzip, version-prefixed assets (v0.1.0)
First downloadable KEGG artefact set, wired into the runtime resolvers:
- All artefacts are gzip and version-prefixed (kegg116_<name>.gz) so MATLAB and
Windows read them with the built-in gunzip, no external tool. organism_gene_ko
moves from xz to gzip for the same reason.
- HMM libraries ship as one gzip concatenated flatfile per domain;
ensure_kegg_hmm_library decompresses and hmmpresses on first use, ~10x smaller
than the pressed index and portable across HMMER versions.
- Add a version-prefix-tolerant artefact resolver (_resolve_artefact) used by the
organism/sequence entry points; parse_kegg_dump and build_kegg_artefacts.py gain
an opt-in --version.
- Populate data/manifest.json and _DATA_REGISTRY with the kegg116 release assets
(real SHA256 + bytes); refresh the maintainer docs and manifest example.
- Bump version to 0.1.0 and update CHANGELOG.
* Add KEGG taxonomy artefact and phyl_dist (RAVEN getPhylDist port)
Publish kegg116_taxonomy.gz and regenerate RAVEN's keggPhylDist from it, so GECKO's
organism-distance kcat selection needs no MATLAB .mat file:
- reconstruction.kegg.phyl_dist + PhylDist faithfully reproduce RAVEN getPhylDist's
(asymmetric, occasionally negative) distance metric; parse_taxonomy_records exposes
ids/names/lineages and reads .gz transparently.
- data.ensure_kegg_taxonomy fetches the artefact; build_kegg_artefacts.py emits it.
- Register kegg116_taxonomy.gz in data/manifest.json and _DATA_REGISTRY (8 files).
- Tests for phyl_dist (hand-checked against RAVEN) and the taxonomy fetch; update
migration/IMPROVEMENTS/maintainer docs and CHANGELOG.
* Publish kegg116 KEGG artefacts as gzip, version-prefixed assets (v0.1.0) (#29)
First downloadable KEGG artefact set, wired into the runtime resolvers:
- All artefacts are gzip and version-prefixed (kegg116_<name>.gz) so MATLAB and
Windows read them with the built-in gunzip, no external tool. organism_gene_ko
moves from xz to gzip for the same reason.
- HMM libraries ship as one gzip concatenated flatfile per domain;
ensure_kegg_hmm_library decompresses and hmmpresses on first use, ~10x smaller
than the pressed index and portable across HMMER versions.
- Add a version-prefix-tolerant artefact resolver (_resolve_artefact) used by the
organism/sequence entry points; parse_kegg_dump and build_kegg_artefacts.py gain
an opt-in --version.
- Populate data/manifest.json and _DATA_REGISTRY with the kegg116 release assets
(real SHA256 + bytes); refresh the maintainer docs and manifest example.
- Bump version to 0.1.0 and update CHANGELOG.
Add KEGG taxonomy artefact and phyl_dist (RAVEN getPhylDist port)
Publish kegg116_taxonomy.gz and regenerate RAVEN's keggPhylDist from it, so GECKO's
organism-distance kcat selection needs no MATLAB .mat file:
- reconstruction.kegg.phyl_dist + PhylDist faithfully reproduce RAVEN getPhylDist's
(asymmetric, occasionally negative) distance metric; parse_taxonomy_records exposes
ids/names/lineages and reads .gz transparently.
- data.ensure_kegg_taxonomy fetches the artefact; build_kegg_artefacts.py emits it.
- Register kegg116_taxonomy.gz in data/manifest.json and _DATA_REGISTRY (8 files).
- Tests for phyl_dist (hand-checked against RAVEN) and the taxonomy fetch; update
migration/IMPROVEMENTS/maintainer docs and CHANGELOG.
Bundle core KEGG artefacts into kegg116_core.tar.gz
Combine the five core model files (reference model + KO/reaction/organism-gene/
rxn-flag tables) into one kegg116_core.tar.gz; HMM libraries and taxonomy stay
separate. The release drops from 8 assets to 4.
- ensure_kegg_data now fetches the single bundle, SHA-verifies it, and extracts the
version-prefixed members into the cache once (safe extraction, matching download.py).
- build_kegg_artefacts.py groups the core files into the bundle after the HMM step.
- Regenerate data/manifest.json and _DATA_REGISTRY (4 entries); update manifest.example,
tests (bundle fixture), and docs.
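The verify-then-extract flow for a single bundle could be sketched like this. `verify_and_extract` is a hypothetical helper; the real ensure_kegg_data may differ in how it streams, caches and names members.

```python
import hashlib
import tarfile
from pathlib import Path


def verify_and_extract(bundle, expected_sha256, cache_dir):
    """Check the bundle's SHA256, then extract regular files into cache_dir."""
    bundle, cache_dir = Path(bundle), Path(cache_dir).resolve()
    # Reads the whole bundle into memory; fine for a sketch.
    digest = hashlib.sha256(bundle.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"SHA256 mismatch for {bundle.name}")
    extracted = []
    with tarfile.open(bundle, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isdir():
                continue
            target = (cache_dir / member.name).resolve()
            # Reject links/devices and anything that would land outside the cache.
            if not member.isfile() or cache_dir not in target.parents:
                raise ValueError(f"unsafe member: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            extracted.append(target)
    return extracted
```

Verifying the digest before opening the archive means a corrupted or tampered download is never parsed.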
* Remove the visualization stub and [visualization] extra (#30)
Mirror MATLAB RAVEN removing its pathway-map / omics-overlay plotting functions
(drawMap, colorPathway, drawPathway, markPathwayWith*, setOmicDataToRxns, ...) as
obsolete/low-value (SysBioChalmers/RAVEN#618). raven-python only had a
not-implemented `visualization` stub reserving that domain; drop it and its
scaffolding. cobrapy + Escher cover pathway/omics visualization externally.
- Delete src/raven_python/visualization/ and tests/test_visualization.py.
- Drop the [visualization] (matplotlib) extra; remove it from CI, ReadTheDocs, and
the installation / README / api-index / todo docs.
- CHANGELOG: record the removal.
The other functions RAVEN removed (MetaCyc, xml_toolbox, Excel-import wrappers,
solveQP) were never ported to raven-python, so no further changes are needed.
* Auto-resolve the taxonomy artefact in domain-mode from_artefacts (#31)
get_kegg_model_for_organism_from_artefacts("prokaryotes"/"eukaryotes") builds a
whole-domain model, which needs the KEGG taxonomy file. Taxonomy is a separate
artefact (not part of the core set ensure_kegg_data fetches), so the call raised
"Domain mode needs the KEGG taxonomy file; pass taxonomy=." unless the caller
supplied a path by hand.
It now auto-resolves taxonomy for domain mode: from the artefact directory if
present, else via ensure_kegg_taxonomy(version). An explicit taxonomy= still wins;
species mode is unchanged. Adds a regression test.
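The resolution order reads naturally as a small function. In this sketch `resolve_taxonomy` is a hypothetical name, the `*taxonomy*` filename pattern is an assumption, and `fetch` stands in for a downloader such as ensure_kegg_taxonomy.

```python
from pathlib import Path


def resolve_taxonomy(mode, artefact_dir, taxonomy=None, fetch=None):
    """Pick the taxonomy path for a from-artefacts call, or None if unneeded."""
    if taxonomy is not None:
        return Path(taxonomy)  # an explicit argument always wins
    if mode not in ("prokaryotes", "eukaryotes"):
        return None  # species mode needs no taxonomy file
    for candidate in sorted(Path(artefact_dir).glob("*taxonomy*")):
        return candidate  # found next to the other artefacts
    if fetch is None:
        raise FileNotFoundError("no taxonomy file and no fetcher given")
    return Path(fetch())  # fall back to downloading / cache lookup
```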
* Use hmmsearch (not hmmscan) for the de-novo KEGG query (#32)
get_kegg_model_from_sequences now runs one hmmsearch over the concatenated KO
library instead of an hmmscan against a pressed database:
- run_hmmsearch / parse_hmmsearch_tblout replace run_hmmscan / parse_hmmscan_tblout.
hmmsearch is HMMER's faster, better-parallelising direction (profiles as the query)
and needs no hmmpress. -Z is fixed to the profile count so per-hit E-values (and
thus assign_kos output) are identical to the previous hmmscan path — verified on
real HMMs (same hits, same E-values, same assignments).
- ensure_kegg_hmm_library just gunzips the library (no hmmpress, no .h3* sidecars).
- build_hmm_library concatenates the per-KO HMMs without pressing; the published
.hmm.gz artefact is unchanged.
- Docs / IMPROVEMENTS (K7) / CHANGELOG updated.
* Replace the on-disk KEGG test fixture with a synthetic in-code dump (#33)
tests/data/kegg_dump contained real KEGG records (e.g. reaction R00010 and
KO K01194 with their EC/RHEA/ChEBI cross-references) which the project is not
licensed to redistribute.
Remove the directory and instead generate an equivalent, fully fictional
KEGG-format dump at test time via a new session-scoped fixture in
tests/conftest.py. The synthetic dump mimics the flat-file format so it still
exercises the parser (reaction flags, overview-map skipping, InChI/formula
handling, mapformula irreversibility, KO/gene grouping, taxonomy lineages) but
all identifiers, names, sequences and cross-references are invented.
The four dependent test modules (parse, query, hmm, organism) consume the
fixture and assert against the fictional ids. No real KEGG content is committed
and coverage is unchanged.
* Rename project and import package: raven-python -> raven-toolbox (#34)
* Rename project and import package: raven-python -> raven-toolbox
Rename the distribution (raven-python -> raven-toolbox) and the import
package (raven_python -> raven_toolbox) across all source, tests,
scripts, docs, and packaging metadata. Project URLs now point to
SysBioChalmers/raven-toolbox.
* Complete the rename: remaining raven-python/raven_python -> raven-toolbox/raven_toolbox
The package/distribution rename left occurrences behind after the rebase:
- import statements () in the reconstruction.kegg modules
and data.py, which would have failed at import time;
- monkeypatch string targets and the cache-path assertions in the tests;
- the wheel/package and mypy paths in pyproject.toml (still pointing at
the now-removed src/raven_python), plus the distribution name and project URLs;
- docs, data manifests and GitHub URLs.
Replace them so the import package is consistently raven_toolbox and all
distribution/repo references point to raven-toolbox. Also drop the empty
src/raven_python directory left behind by the rebase.
* Wrap homology imports to satisfy ruff isort after the rename
raven_python -> raven_toolbox widened the homology hits import past the
100-char line length, so ruff isort (I001) wanted it split across lines.
Format it as a multiline import block; ruff check . is clean again.
* CI: bump actions to Node 24 versions (checkout v5, setup-python v6) (#35)
actions/checkout@v4 and actions/setup-python@v5 run on the deprecated Node.js 20
runtime. Bump to actions/checkout@v5 and actions/setup-python@v6, both of which
run on Node.js 24, to clear the GitHub Actions deprecation warning.
* Prepare 0.2.0 release
Bump version 0.1.0 -> 0.2.0 and complete the CHANGELOG 0.2.0 section
(raven-toolbox rename, hmmsearch de-novo KEGG query, domain-mode taxonomy
auto-resolve, synthetic KEGG test fixture, visualization stub removal,
Node 24 CI).
What
Renames the (stub) plotting subpackage to visualization, matching RAVEN's folder layout where pathway/ + plotting/ were unified into visualization/ (SysBioChalmers/RAVEN#614). This is the deferred follow-up noted in #20.
- src/raven_python/plotting/ → src/raven_python/visualization/
- pyproject optional-dependency extra [plotting] → [visualization] (matplotlib)
- CI (.github/workflows/ci.yml) and ReadTheDocs (.readthedocs.yaml) install lines updated so the renamed extra still resolves
- docs updated: README.md, CHANGELOG.md, docs/installation.md, docs/README.md, docs/reference/api/index.md, docs/reference/todo.md
Notes
- The subpackage is an unimplemented stub (empty __init__.py), so nothing imports it; no behaviour change, no broken imports.
- […] visualization/ name. The one remaining structural gap is reconstruction/metacyc/ (feature work, porting MetaCyc, not a rename).