Add OBO/OWL/SSSOM ontology-to-KB converters #7
Conversation
Implements Tasks 1-4 of the SSSOM converter:

- SSSOM TSV parser (YAML metadata header + TSV data via csv.DictReader)
- Confidence transforms (identity, floor_ceil, rescale) with a named registry
- Config model (SSSOMConverterConfig, MappingRule) with prefix matching
- Core conversion (sssom_to_kb) with predicate mapping, label extraction, auto disjoint groups, rule overrides, prefix filters, and minimum probability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
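The named transform registry described in this commit can be sketched as follows. The transform names (identity, floor_ceil, rescale) come from the commit message, but the signatures, default bounds, and `get_transform` helper here are illustrative assumptions, not the actual boomer-py API:

```python
from collections.abc import Callable


def floor_ceil(c: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Clamp a raw confidence into [lo, hi] so no mapping is certain."""
    return max(lo, min(hi, c))


def rescale(c: float, lo: float = 0.05, hi: float = 0.95) -> float:
    """Linearly map the [0, 1] confidence range onto [lo, hi]."""
    return lo + c * (hi - lo)


# Plain one-argument callables: no **_kw catch-all, so a typo in a
# keyword argument fails loudly instead of being silently swallowed.
NAMED_TRANSFORMS: dict[str, Callable[[float], float]] = {
    "identity": lambda c: c,
    "floor_ceil": floor_ceil,
    "rescale": rescale,
}


def get_transform(name: str) -> Callable[[float], float]:
    """Look up a transform; unknown names raise with the available options."""
    if name not in NAMED_TRANSFORMS:
        raise ValueError(
            f"Unknown transform {name!r}; available: {sorted(NAMED_TRANSFORMS)}"
        )
    return NAMED_TRANSFORMS[name]
```

For example, `get_transform("floor_ceil")(1.0)` clamps a confidence of 1.0 down to 0.99.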
- Add load_sssom_config() for loading converter config from YAML files
- Remove **_kw from NAMED_TRANSFORMS lambdas (silently swallowed typos)
- Raise ValueError with available transforms for unknown transform names
- Guard against IDs without colons in disjoint group generation
- Fix _make_fact docstring (broadMatch reverses, not narrowMatch)
- Use typed dict annotations (dict[str, Any]) in parse_sssom_tsv
- Clean up tests: remove unused imports, dead loop, use module constant
- Add tests for unknown transforms and IDs without colons
- Add sssom_config.yaml test fixture and integration tests
- Add pyboomer CLI entry point

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- extract_neighborhood(kb, seeds, max_hops) finds all entities transitively connected to the seed IDs via undirected BFS, then extracts the sub-KB with extract_sub_kb
- CLI: pyboomer extract now accepts repeatable --id flags and --max-hops for neighborhood extraction, keeping backward compatibility with the IDS_FILE positional argument
- 7 new tests covering the full component, hop limits, multiple seeds, unknown seeds, and subsumption-direction traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
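The undirected BFS described above can be sketched as follows. This is a minimal illustration over generic (subject, object) edge pairs, not the actual extract_neighborhood implementation, which operates on a KB's facts:

```python
from collections import deque


def neighborhood(edges: list[tuple[str, str]], seeds: set[str], max_hops: int) -> set[str]:
    """Return all node IDs reachable from the seeds within max_hops,
    treating every edge as undirected."""
    # Build an undirected adjacency map from the edge list.
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    reached = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops >= max_hops:
            continue  # hop budget exhausted along this path
        for nbr in adj.get(node, ()):
            if nbr not in reached:
                reached.add(nbr)
                frontier.append((nbr, hops + 1))
    return reached
```

Seeds that appear in no edge simply come back alone, which matches the "unknown seeds" test case mentioned above.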
- Hand-rolled OBO parser (parse_obo) with OBOTerm/OBODocument dataclasses
- OntologyConverterConfig with per-prefix xref probabilities and SKOS settings
- obo_to_kb() converts structural axioms to hard facts, xrefs/SKOS to pfacts
- OWL backend (owl_to_kb) via optional py-horned-owl dependency
- ontology_to_kb() dispatch by file extension
- load_ontology_config() for YAML config files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
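A hand-rolled OBO parser of this kind typically splits the document into `[Term]` stanzas of tag/value pairs. The sketch below is a minimal illustration of that approach, not the actual parse_obo implementation (which also builds OBOTerm/OBODocument dataclasses):

```python
def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Split an OBO document into [Term] stanzas of tag -> list of values."""
    terms: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}  # start collecting a new term stanza
            terms.append(current)
        elif line.startswith("[") and line.endswith("]"):
            current = None  # some other stanza type, e.g. [Typedef]
        elif current is not None and ":" in line:
            # Tags may repeat (is_a, xref), so values accumulate in a list.
            tag, _, value = line.partition(":")
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms
```

Header lines before the first stanza (format-version, ontology, etc.) are ignored because `current` is still None.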
- Add "obo" and "owl" to KBLoader.SUPPORTED_FORMATS
- Add .obo/.owl/.owx/.ofn extension detection in detect_format()
- Add _load_ontology() dispatch method in KBLoader
- Add obo/owl to all CLI input format choices
- Add optional py-horned-owl dependency in pyproject.toml
- Add loader and CLI integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
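Extension-based format detection of this kind can be sketched as below. The function name and the "unknown" fallback are hypothetical; the extensions and the .sssom.tsv special case are the ones the PR describes:

```python
def detect_format_sketch(path: str) -> str:
    """Guess a loader format from a file path's extension."""
    # Double extension must be checked before the generic suffix split.
    if path.endswith(".sssom.tsv"):
        return "sssom"
    ext = path.rsplit(".", 1)[-1].lower()
    # .owl/.owx/.ofn all route to the OWL backend.
    return {"obo": "obo", "owl": "owl", "owx": "owl", "ofn": "owl"}.get(ext, "unknown")
```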
- Make py-horned-owl a core dependency (not optional)
- Fix owl_to_kb() for the py-horned-owl v1.4.0 API (.component, get_iri(), SimpleLiteral, IRI types, PrefixMapping dict conversion)
- Add OWL Functional Syntax test fixture (test_ontology.ofn)
- Add SSSOM format to loader and CLI (auto-detects .sssom.tsv files)
- Add comprehensive tests: OWL unit tests (TestOwlToKb), OBO/OWL parity tests (TestOboOwlParity), and unified CLI tests across OBO, OWL, and SSSOM (TestCLIConvert)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docs/ontology-conversion.md with a full guide covering OBO, OWL, config, seed extraction, and merging workflows
- Update docs/formats.md to list OBO, OWL, SSSOM as supported input formats and replace "Future: OWL Support" with actual documentation
- Add ontology-conversion.md to the mkdocs nav

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-executed notebook showing the full workflow:

- Create MONDO hierarchy (OBO), ORDO hierarchy (OWL), and cross-ontology mappings (SSSOM)
- Convert, merge, and solve from the CLI
- Demonstrates how OWL disjointness constraints reject a false mapping (prior 0.30 -> posterior 0.005)
- Includes the Python API equivalent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
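The prior-to-posterior drop in the notebook can be illustrated with a toy calculation. This is not boomer's actual inference, only a sketch of the mechanism: a hard disjointness constraint makes almost every world containing the false mapping logically inconsistent, so the surviving probability mass shifts to worlds without it. The 0.01 survival fraction below is an invented number for illustration:

```python
prior = 0.30  # prior probability of the false mapping, as in the notebook

# Hypothetical fraction of mapping-containing worlds that remain
# consistent after the disjointness axioms are applied.
consistent_with_mapping = 0.01

# Unnormalized scores for "mapping holds" vs "mapping does not hold";
# worlds without the mapping are unaffected by the constraint.
score_with = prior * consistent_with_mapping
score_without = (1 - prior) * 1.0

# Renormalize to get the posterior over the two alternatives.
posterior = score_with / (score_with + score_without)
```

With these made-up numbers the posterior lands well below 0.01, the same order-of-magnitude collapse the notebook shows (0.30 -> 0.005).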
Claude finished @cmungall's task —— View job

PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

✅ Strengths

Excellent Architecture & Design
Robust OBO Parser (
Pull request overview
Adds direct ontology/mapping ingestion (OBO, OWL via py-horned-owl, and SSSOM TSV) into boomer KBs, plus seed-based neighborhood extraction and CLI/loader support.
Changes:
- Introduces OBO/OWL converters and an SSSOM converter with configurable probability handling and transforms.
- Extends loader + CLI to detect/load/convert/solve these formats and adds seed-neighborhood extraction (extract_neighborhood).
- Adds extensive tests and documentation/tutorial content for end-to-end ontology alignment workflows.
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_sssom_converter.py | Adds unit/integration tests for SSSOM parsing, config, transforms, and KB conversion. |
| tests/test_splitter.py | Adds tests for new neighborhood extraction behavior in the splitter. |
| tests/test_ontology_converter.py | Adds tests for OBO/OWL parsing, conversion parity, loader/CLI integration. |
| tests/test_loaders.py | Updates supported formats expectations to include ontology/mapping formats. |
| tests/input/test_ontology.ofn | Adds OWL Functional Syntax fixture for OWL-to-KB tests. |
| tests/input/test_ontology.obo | Adds OBO fixture for OBO parser/conversion tests. |
| tests/input/test_mappings.sssom.tsv | Adds SSSOM TSV fixture with metadata + mappings. |
| tests/input/sssom_config.yaml | Adds YAML config fixture for SSSOM conversion rules/transforms. |
| tests/input/ontology_config.yaml | Adds YAML config fixture for ontology conversion probabilities/options. |
| src/boomer/sssom_converter.py | Implements SSSOM TSV parsing + row→PFact conversion + config loading. |
| src/boomer/splitter.py | Adds extract_neighborhood() for seed-based subgraph extraction. |
| src/boomer/ontology_converter.py | Implements hand-rolled OBO parser and OWL conversion backend. |
| src/boomer/loaders.py | Extends format detection and loading to support OBO/OWL/SSSOM. |
| src/boomer/cli.py | Extends format choices and enhances extract with seed neighborhood options. |
| pyproject.toml | Adds py-horned-owl dependency and pyboomer script alias. |
| mkdocs.yml | Adds ontology conversion docs and tutorial to nav. |
| docs/tutorial/ontology-alignment.ipynb | Adds executed end-to-end tutorial notebook (OBO+OWL+SSSOM→merge→solve). |
| docs/ontology-conversion.md | Adds guide for OBO/OWL conversion and seed-based extraction. |
| docs/formats.md | Documents new supported formats and how they map to KB facts. |
| .gitignore | Ignores .worktrees/. |
```python
for line in text.splitlines():
    if line.startswith("#"):
        # Strip leading '#' (and optional space) to get YAML content
        meta_lines.append(line[1:])
    else:
        tsv_lines.append(line)

# Parse YAML metadata
metadata: dict[str, Any] = {}
if meta_lines:
    yaml_text = "\n".join(meta_lines)
    parsed = yaml.safe_load(yaml_text)
    if isinstance(parsed, dict):
        metadata = parsed

# Parse TSV rows
tsv_text = "\n".join(tsv_lines)
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)
```
parse_sssom_tsv() currently appends empty/non-data lines into tsv_lines. If the TSV contains blank lines, csv.DictReader can produce rows with None values; later code calls .strip() on row.get("confidence", ""), which will crash when the value is None. Filter out empty TSV lines (and/or normalize None values to "") before feeding the data to DictReader, e.g., skip a line when it is empty or whitespace-only.
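The fix the reviewer suggests can be sketched as follows. The helper name is hypothetical; the key change is the `line.strip()` guard so DictReader never sees blank data lines:

```python
import csv
import io


def split_sssom_lines(text: str) -> tuple[list[str], list[str]]:
    """Separate '#'-prefixed YAML metadata lines from TSV data lines,
    dropping blank/whitespace-only lines from the TSV portion."""
    meta_lines: list[str] = []
    tsv_lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            meta_lines.append(line[1:])
        elif line.strip():  # the fix: skip empty/whitespace-only lines
            tsv_lines.append(line)
    return meta_lines, tsv_lines
```

With the guard in place, a trailing or embedded blank line in the mapping block cannot produce a row whose values are None.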
```python
prop_iri = str(ax.ann.ap.first)
subject_iri = str(ax.subject)
subject_id = _iri_to_curie(subject_iri, prefix_map)
```
subject_id conversion for AnnotationAssertion bypasses the local curie() helper that strips angle brackets, so subjects serialized as <...> may not match the prefix map / OBO IRI logic (and can leak raw bracketed IRIs into the KB). Use the same normalization path as the other axiom handlers (e.g., derive subject_id via the curie(...) helper), and ensure property IRIs are normalized consistently (strip angle brackets) before comparing to _RDFS_LABEL, _HAS_DBXREF, and SKOS IRIs.
Suggested change:

```python
raw_prop_iri = str(ax.ann.ap.first)
# Normalize property IRI by stripping angle brackets if present
if raw_prop_iri.startswith("<") and raw_prop_iri.endswith(">"):
    prop_iri = raw_prop_iri[1:-1]
else:
    prop_iri = raw_prop_iri
subject_iri = str(ax.subject)
# Use the same CURIE normalization path as other axioms
subject_id = curie(subject_iri)
```
```python
seen_ids: set[str] = set()
for row in rows:
    for col in ("subject_id", "object_id"):
        eid = row.get(col, "")
        if eid and eid not in seen_ids and ":" in eid:
            seen_ids.add(eid)
            prefix = id_prefix(eid)
            facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))
```
Disjoint-group facts are generated from all parsed rows, even for mappings that are later skipped by rules/prefix filters/min-probability. This can introduce MemberOfDisjointGroup facts for entities that otherwise don’t appear in pfacts, making the resulting KB harder to reason about/debug (and potentially bloating facts). Consider generating groups from the entities that actually made it into the KB (e.g., collect IDs from pfacts and/or from labels tied to retained mappings) rather than iterating over raw rows.
Suggested change:

```python
# Only generate disjoint-group facts for entities that actually
# appear in the KB (i.e., those that made it into ``pfacts``).
entity_ids: set[str] = set()
for pf in pfacts:
    if isinstance(pf, (EquivalentTo, ProperSubClassOf)):
        # Both fact types are binary relations with ``sub`` / ``obj``.
        entity_ids.add(pf.sub)
        entity_ids.add(pf.obj)
for eid in entity_ids:
    if eid and ":" in eid:
        prefix = id_prefix(eid)
        facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))
```
```python
except FileNotFoundError:
    raise click.ClickException(f"IDs file '{ids_file}' not found")
```
The updated IDS_FILE reading path only handles FileNotFoundError. Other common I/O failures (e.g., permission errors, encoding errors, directory passed instead of file) will raise unhandled exceptions and bypass Click’s user-friendly error reporting. Restore a broader exception handler (e.g., except OSError as e: / except Exception as e:) and re-raise as click.ClickException with context.
Suggested change:

```python
except FileNotFoundError:
    raise click.ClickException(f"IDs file '{ids_file}' not found")
except OSError as e:
    raise click.ClickException(f"Failed to read IDs file '{ids_file}': {e}")
```
```python
click.echo(f"Original KB: {len(kb.facts)} facts, {len(kb.pfacts)} pfacts")
click.echo(f"Extracted KB: {len(sub_kb.facts)} facts, {len(sub_kb.pfacts)} pfacts")
click.echo(f"Used {len(entity_ids)} entity IDs from {ids_file}")
click.echo(f"Seeds: {sorted(entity_ids)}")
```
Printing the fully-sorted seed list can become very expensive/noisy for large IDS files (and can dominate CLI output). Prefer printing a count plus a small preview (first N seeds) or only printing the list when the size is below a reasonable threshold.
Suggested change:

```python
sorted_seeds = sorted(entity_ids)
preview_limit = 10
if len(sorted_seeds) <= preview_limit:
    click.echo(f"Seeds ({len(sorted_seeds)}): {sorted_seeds}")
else:
    click.echo(
        f"Seeds ({len(sorted_seeds)} total, showing first {preview_limit}): "
        f"{sorted_seeds[:preview_limit]}"
    )
```
- Move example OBO/OWL/SSSOM files to docs/tutorial/ontology-alignment/ instead of creating them with bash heredocs
- Simplify notebook: cat files to show them, then merge+solve directly
- Remove redundant per-file convert steps
- Fewer, cleaner cells for better mkdocs rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename fixture dir to ontology-alignment-files/ to avoid collision with the notebook's URL path
- Use IPython Markdown display with fenced code blocks instead of bash cat, so files render as single syntax-highlighted blocks
- OBO rendered as yaml, OWL as turtle, SSSOM as tsv

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move show() helper to docs/tutorial/notebook_utils.py
- CLI merge/solve outputs saved to files, then rendered with show() instead of raw bash output, for consistent syntax-highlighted blocks
- Solution rendered as both YAML and TSV

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
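A helper like show() can be sketched as below. This is a guess at its shape, not the actual notebook_utils.py code: it reads a file and produces a fenced, language-tagged markdown block. In the notebook the returned string would be wrapped in IPython.display.Markdown; returning plain text keeps the sketch dependency-free:

```python
from pathlib import Path

# Build the fence marker programmatically so this example does not
# contain a literal triple-backtick sequence.
FENCE = "`" * 3


def show(path: str, lang: str = "") -> str:
    """Render a file's contents as a fenced, syntax-highlighted block."""
    body = Path(path).read_text().rstrip()
    return f"{FENCE}{lang}\n{body}\n{FENCE}"
```

Calling e.g. `show("merged.yaml", "yaml")` yields a single highlighted block when rendered by mkdocs, instead of raw bash output.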
- Note in OBO section that xrefs become EquivalentTo pfacts at 0.7, linking to ontology-conversion.md for the full mapping table
- Add admonition explaining why some mappings appear twice (xref + SSSOM) with link to OntologyConverterConfig docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CI workflow runs `just test` but the justfile only had notebook recipes. Add test, doctest, lint, format, and lint-fix recipes matching the Makefile equivalents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The extract command's help text and output format changed but the tests were not updated. Fix assertions to match the current output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clarify that MemberOfDisjointGroup facts are generated as a post-processing step by all converters (OBO, OWL, SSSOM), not during parsing. Every CURIE is split on ':' and assigned to a disjoint group by prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
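The prefix-based grouping described here can be sketched as a small post-processing pass. The function name and tuple output are illustrative, not the converters' actual API; the ':'-split rule and the colon guard are the behaviors the commit describes:

```python
def disjoint_groups(entity_ids: list[str]) -> list[tuple[str, str]]:
    """Assign each CURIE to a disjoint group named after its prefix.

    IDs without a colon are skipped (there is no prefix to group by),
    and duplicates produce only one fact.
    """
    facts: list[tuple[str, str]] = []
    seen: set[str] = set()
    for eid in entity_ids:
        if eid and ":" in eid and eid not in seen:
            seen.add(eid)
            # Split on the first ':' only, so IDs like "obo:MONDO:1"
            # still yield a single leading prefix.
            facts.append((eid, eid.split(":", 1)[0]))
    return facts
```

Because every converter applies this after parsing, two entities with the same prefix always land in the same disjoint group regardless of input format.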
Add two new output formats for boomer solutions:

- SSSOM TSV: standard ontology mapping format with metadata header, curie_map, and mapping_justification columns
- OBOGraphs JSON: standard graph exchange format with nodes, edges, and probability metadata

Wire both into the CLI (-O sssom / -O obographs), update output_dir to use the FORMAT_EXTENSIONS dict, and add export examples to the ontology-alignment tutorial. Includes 27 unit tests, 3 doctests, and roundtrip verification against parse_sssom_tsv.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
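An SSSOM TSV writer of the kind described can be sketched as below. The function name, column list, and header layout are illustrative assumptions (the real exporter may emit more metadata keys); the structure (commented YAML header with a curie_map, then tab-separated columns including mapping_justification) follows the commit description:

```python
import csv
import io


def write_sssom(mappings: list[dict[str, str]], curie_map: dict[str, str]) -> str:
    """Serialize mappings to SSSOM TSV with a '#'-commented YAML header."""
    buf = io.StringIO()
    # Metadata header: each line prefixed with '#', YAML underneath.
    buf.write("#curie_map:\n")
    for prefix, iri in sorted(curie_map.items()):
        buf.write(f"#  {prefix}: {iri}\n")
    writer = csv.DictWriter(
        buf,
        fieldnames=[
            "subject_id", "predicate_id", "object_id",
            "mapping_justification", "confidence",
        ],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for m in mappings:
        writer.writerow(m)
    return buf.getvalue()
```

Because the header lines start with '#' and the data is plain TSV, output in this shape can be fed straight back through a parse_sssom_tsv-style reader, which is what the roundtrip tests mentioned above verify.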
Summary
- OntologyConverterConfig (pydantic) for per-prefix xref probabilities, SKOS match probabilities, obsolete filtering, and disjoint group generation
- docs/ontology-conversion.md guide and pre-executed docs/tutorial/ontology-alignment.ipynb end-to-end tutorial (MONDO OBO + ORDO OWL + SSSOM mappings → merge → solve)

Test plan

- uv run pytest tests/test_ontology_converter.py -v — 69 passed
- uv run pytest --doctest-modules src/boomer/ontology_converter.py — 7 passed
- uv run pytest -v — 307 passed, 1 skipped (2 pre-existing CLI extract failures)
- uv run ruff check src/boomer/ontology_converter.py — clean
- uv run mkdocs build — clean

🤖 Generated with Claude Code