Skip to content

Add OBO/OWL/SSSOM ontology-to-KB converters#7

Merged
cmungall merged 19 commits intomainfrom
feature/ontology-converter
Mar 4, 2026
Merged

Add OBO/OWL/SSSOM ontology-to-KB converters#7
cmungall merged 19 commits intomainfrom
feature/ontology-converter

Conversation

@cmungall
Copy link
Member

@cmungall cmungall commented Mar 2, 2026

Summary

  • Hand-rolled OBO parser and OWL backend (py-horned-owl) for converting ontologies to boomer KBs
  • Structural axioms (SubClassOf, EquivalentClasses, DisjointClasses) become hard facts; xrefs and SKOS mappings become probabilistic facts with configurable probabilities
  • SSSOM format added to unified loader/CLI alongside OBO and OWL
  • OntologyConverterConfig (pydantic) for per-prefix xref probabilities, SKOS match probabilities, obsolete filtering, and disjoint group generation
  • 69 new tests: OBO parser, OBO-to-KB, OWL-to-KB, OBO/OWL parity, loader integration, and parametrized CLI tests across all three formats
  • Documentation: docs/ontology-conversion.md guide and pre-executed docs/tutorial/ontology-alignment.ipynb end-to-end tutorial (MONDO OBO + ORDO OWL + SSSOM mappings → merge → solve)

Test plan

  • uv run pytest tests/test_ontology_converter.py -v — 69 passed
  • uv run pytest --doctest-modules src/boomer/ontology_converter.py — 7 passed
  • uv run pytest -v — 307 passed, 1 skipped (2 pre-existing CLI extract failures)
  • uv run ruff check src/boomer/ontology_converter.py — clean
  • uv run mkdocs build — clean
  • Tutorial notebook executes end-to-end via papermill

🤖 Generated with Claude Code

cmungall-lbl and others added 11 commits February 22, 2026 14:29
Implements Tasks 1-4 of the SSSOM converter:
- SSSOM TSV parser (YAML metadata header + TSV data via csv.DictReader)
- Confidence transforms (identity, floor_ceil, rescale) with named registry
- Config model (SSSOMConverterConfig, MappingRule) with prefix matching
- Core conversion (sssom_to_kb) with predicate mapping, label extraction,
  auto disjoint groups, rule overrides, prefix filters, and min probability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add load_sssom_config() for loading converter config from YAML files
- Remove **_kw from NAMED_TRANSFORMS lambdas (silently swallowed typos)
- Raise ValueError with available transforms for unknown transform names
- Guard against IDs without colons in disjoint group generation
- Fix _make_fact docstring (broadMatch reverses, not narrowMatch)
- Use typed dict annotations (dict[str, Any]) in parse_sssom_tsv
- Clean up tests: remove unused imports, dead loop, use module constant
- Add tests for unknown transforms and IDs without colons
- Add sssom_config.yaml test fixture and integration tests
- Add pyboomer CLI entry point

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- extract_neighborhood(kb, seeds, max_hops) finds all entities
  transitively connected to seed IDs via undirected BFS, then
  extracts the sub-KB with extract_sub_kb
- CLI: pyboomer extract now accepts --id flags (repeatable) and
  --max-hops for neighborhood extraction, keeping backward compat
  with the IDS_FILE positional argument
- 7 new tests covering full component, hop limits, multiple seeds,
  unknown seeds, and subsumption direction traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Hand-rolled OBO parser (parse_obo) with OBOTerm/OBODocument dataclasses
- OntologyConverterConfig with per-prefix xref probabilities and SKOS settings
- obo_to_kb() converts structural axioms to hard facts, xrefs/SKOS to pfacts
- OWL backend (owl_to_kb) via optional py-horned-owl dependency
- ontology_to_kb() dispatch by file extension
- load_ontology_config() for YAML config files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "obo" and "owl" to KBLoader.SUPPORTED_FORMATS
- Add .obo/.owl/.owx/.ofn extension detection in detect_format()
- Add _load_ontology() dispatch method in KBLoader
- Add obo/owl to all CLI input format choices
- Add optional py-horned-owl dependency in pyproject.toml
- Add loader and CLI integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Make py-horned-owl a core dependency (not optional)
- Fix owl_to_kb() for py-horned-owl v1.4.0 API (.component, get_iri(),
  SimpleLiteral, IRI types, PrefixMapping dict conversion)
- Add OWL Functional Syntax test fixture (test_ontology.ofn)
- Add SSSOM format to loader and CLI (auto-detects .sssom.tsv files)
- Add comprehensive tests: OWL unit tests (TestOwlToKb), OBO/OWL parity
  tests (TestOboOwlParity), and unified CLI tests across OBO, OWL, and
  SSSOM (TestCLIConvert)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docs/ontology-conversion.md with full guide covering OBO, OWL,
  config, seed extraction, and merging workflows
- Update docs/formats.md to list OBO, OWL, SSSOM as supported input
  formats and replace "Future: OWL Support" with actual documentation
- Add ontology-conversion.md to mkdocs nav

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-executed notebook showing the full workflow:
- Create MONDO hierarchy (OBO), ORDO hierarchy (OWL), and
  cross-ontology mappings (SSSOM)
- Convert, merge, and solve from CLI
- Demonstrates how OWL disjointness constraints reject a false
  mapping (prior 0.30 -> posterior 0.005)
- Includes Python API equivalent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 2, 2026 21:30
@claude
Copy link

claude bot commented Mar 2, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (69 tests) with fixtures and parametrized tests
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 69 tests covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds direct ontology/mapping ingestion (OBO, OWL via py-horned-owl, and SSSOM TSV) into boomer KBs, plus seed-based neighborhood extraction and CLI/loader support.

Changes:

  • Introduces OBO/OWL converters and an SSSOM converter with configurable probability handling and transforms.
  • Extends loader + CLI to detect/load/convert/solve these formats and adds seed-neighborhood extraction (extract_neighborhood).
  • Adds extensive tests and documentation/tutorial content for end-to-end ontology alignment workflows.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_sssom_converter.py Adds unit/integration tests for SSSOM parsing, config, transforms, and KB conversion.
tests/test_splitter.py Adds tests for new neighborhood extraction behavior in the splitter.
tests/test_ontology_converter.py Adds tests for OBO/OWL parsing, conversion parity, loader/CLI integration.
tests/test_loaders.py Updates supported formats expectations to include ontology/mapping formats.
tests/input/test_ontology.ofn Adds OWL Functional Syntax fixture for OWL-to-KB tests.
tests/input/test_ontology.obo Adds OBO fixture for OBO parser/conversion tests.
tests/input/test_mappings.sssom.tsv Adds SSSOM TSV fixture with metadata + mappings.
tests/input/sssom_config.yaml Adds YAML config fixture for SSSOM conversion rules/transforms.
tests/input/ontology_config.yaml Adds YAML config fixture for ontology conversion probabilities/options.
src/boomer/sssom_converter.py Implements SSSOM TSV parsing + row→PFact conversion + config loading.
src/boomer/splitter.py Adds extract_neighborhood() for seed-based subgraph extraction.
src/boomer/ontology_converter.py Implements hand-rolled OBO parser and OWL conversion backend.
src/boomer/loaders.py Extends format detection and loading to support OBO/OWL/SSSOM.
src/boomer/cli.py Extends format choices and enhances extract with seed neighborhood options.
pyproject.toml Adds py-horned-owl dependency and pyboomer script alias.
mkdocs.yml Adds ontology conversion docs and tutorial to nav.
docs/tutorial/ontology-alignment.ipynb Adds executed end-to-end tutorial notebook (OBO+OWL+SSSOM→merge→solve).
docs/ontology-conversion.md Adds guide for OBO/OWL conversion and seed-based extraction.
docs/formats.md Documents new supported formats and how they map to KB facts.
.gitignore Ignores .worktrees/.

Comment on lines +274 to +292
for line in text.splitlines():
if line.startswith("#"):
# Strip leading '#' (and optional space) to get YAML content
meta_lines.append(line[1:])
else:
tsv_lines.append(line)

# Parse YAML metadata
metadata: dict[str, Any] = {}
if meta_lines:
yaml_text = "\n".join(meta_lines)
parsed = yaml.safe_load(yaml_text)
if isinstance(parsed, dict):
metadata = parsed

# Parse TSV rows
tsv_text = "\n".join(tsv_lines)
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_sssom_tsv() currently appends empty/non-data lines into tsv_lines. If the TSV contains blank lines, csv.DictReader can produce rows with None values; later code calls .strip() on row.get("confidence", ""), which will crash when the value is None. Filter out empty TSV lines (and/or normalize None values to "") before feeding the data to DictReader, e.g., skip line when it is empty/whitespace.

Copilot uses AI. Check for mistakes.
Comment on lines +458 to +460
prop_iri = str(ax.ann.ap.first)
subject_iri = str(ax.subject)
subject_id = _iri_to_curie(subject_iri, prefix_map)
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subject_id conversion for AnnotationAssertion bypasses the local curie() helper that strips angle brackets, so subjects serialized as <...> may not match the prefix map / OBO IRI logic (and can leak raw bracketed IRIs into the KB). Use the same normalization path as the other axiom handlers (e.g., derive subject_id via the curie(...) helper), and ensure property IRIs are normalized consistently (strip angle brackets) before comparing to _RDFS_LABEL, _HAS_DBXREF, and SKOS IRIs.

Suggested change
prop_iri = str(ax.ann.ap.first)
subject_iri = str(ax.subject)
subject_id = _iri_to_curie(subject_iri, prefix_map)
raw_prop_iri = str(ax.ann.ap.first)
# Normalize property IRI by stripping angle brackets if present
if raw_prop_iri.startswith("<") and raw_prop_iri.endswith(">"):
prop_iri = raw_prop_iri[1:-1]
else:
prop_iri = raw_prop_iri
subject_iri = str(ax.subject)
# Use the same CURIE normalization path as other axioms
subject_id = curie(subject_iri)

Copilot uses AI. Check for mistakes.
Comment on lines +526 to +534
seen_ids: set[str] = set()
for row in rows:
for col in ("subject_id", "object_id"):
eid = row.get(col, "")
if eid and eid not in seen_ids and ":" in eid:
seen_ids.add(eid)
prefix = id_prefix(eid)
facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disjoint-group facts are generated from all parsed rows, even for mappings that are later skipped by rules/prefix filters/min-probability. This can introduce MemberOfDisjointGroup facts for entities that otherwise don’t appear in pfacts, making the resulting KB harder to reason about/debug (and potentially bloating facts). Consider generating groups from the entities that actually made it into the KB (e.g., collect IDs from pfacts and/or from labels tied to retained mappings) rather than iterating over raw rows.

Suggested change
seen_ids: set[str] = set()
for row in rows:
for col in ("subject_id", "object_id"):
eid = row.get(col, "")
if eid and eid not in seen_ids and ":" in eid:
seen_ids.add(eid)
prefix = id_prefix(eid)
facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))
# Only generate disjoint-group facts for entities that actually
# appear in the KB (i.e., those that made it into ``pfacts``).
entity_ids: set[str] = set()
for pf in pfacts:
if isinstance(pf, (EquivalentTo, ProperSubClassOf)):
# Both fact types are binary relations with ``sub`` / ``obj``.
entity_ids.add(pf.sub)
entity_ids.add(pf.obj)
for eid in entity_ids:
if eid and ":" in eid:
prefix = id_prefix(eid)
facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Copilot uses AI. Check for mistakes.
Comment on lines +550 to +551
except FileNotFoundError:
raise click.ClickException(f"IDs file '{ids_file}' not found")
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated IDS_FILE reading path only handles FileNotFoundError. Other common I/O failures (e.g., permission errors, encoding errors, directory passed instead of file) will raise unhandled exceptions and bypass Click’s user-friendly error reporting. Restore a broader exception handler (e.g., except OSError as e: / except Exception as e:) and re-raise as click.ClickException with context.

Suggested change
except FileNotFoundError:
raise click.ClickException(f"IDs file '{ids_file}' not found")
except OSError as e:
raise click.ClickException(f"Failed to read IDs file '{ids_file}': {e}")

Copilot uses AI. Check for mistakes.
click.echo(f"Original KB: {len(kb.facts)} facts, {len(kb.pfacts)} pfacts")
click.echo(f"Extracted KB: {len(sub_kb.facts)} facts, {len(sub_kb.pfacts)} pfacts")
click.echo(f"Used {len(entity_ids)} entity IDs from {ids_file}")
click.echo(f"Seeds: {sorted(entity_ids)}")
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing the fully-sorted seed list can become very expensive/noisy for large IDS files (and can dominate CLI output). Prefer printing a count plus a small preview (first N seeds) or only printing the list when the size is below a reasonable threshold.

Suggested change
click.echo(f"Seeds: {sorted(entity_ids)}")
sorted_seeds = sorted(entity_ids)
preview_limit = 10
if len(sorted_seeds) <= preview_limit:
click.echo(f"Seeds ({len(sorted_seeds)}): {sorted_seeds}")
else:
click.echo(
f"Seeds ({len(sorted_seeds)} total, showing first {preview_limit}): "
f"{sorted_seeds[:preview_limit]}"
)

Copilot uses AI. Check for mistakes.
- Move example OBO/OWL/SSSOM files to docs/tutorial/ontology-alignment/
  instead of creating them with bash heredocs
- Simplify notebook: cat files to show them, then merge+solve directly
- Remove redundant per-file convert steps
- Fewer, cleaner cells for better mkdocs rendering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 2, 2026

Claude finished @cmungall's task —— View job


Code Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (69 tests) with fixtures and parametrized tests
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 69 tests covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


- Rename fixture dir to ontology-alignment-files/ to avoid collision
  with the notebook's URL path
- Use IPython Markdown display with fenced code blocks instead of
  bash cat, so files render as single syntax-highlighted blocks
- OBO rendered as yaml, OWL as turtle, SSSOM as tsv

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 2, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (69 tests) with fixtures and parametrized tests
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 69 tests covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


- Move show() helper to docs/tutorial/notebook_utils.py
- CLI merge/solve outputs saved to files, then rendered with show()
  instead of raw bash output — consistent syntax-highlighted blocks
- Solution rendered as both YAML and TSV

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @cmungall's task —— View job


Code Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration
  • Provide comprehensive feedback summary

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (69 tests) with fixtures and parametrized tests
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

sssom_converter.py:534 - Disjoint group generation

As noted by Copilot reviewer, disjoint group facts are generated from all parsed rows, even those later filtered out by rules/prefix filters, which could bloat the KB with orphaned groups.

cli.py:551 - Error handling

Only handles FileNotFoundError, but other I/O errors (permissions, encoding) will bypass Click's user-friendly error reporting.

cli.py:602 - Output verbosity

Printing the fully-sorted seed list can become expensive/noisy for large IDS files.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 69 tests covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


- Note in OBO section that xrefs become EquivalentTo pfacts at 0.7,
  linking to ontology-conversion.md for the full mapping table
- Add admonition explaining why some mappings appear twice (xref + SSSOM)
  with link to OntologyConverterConfig docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration
  • Address Copilot reviewer feedback
  • Provide comprehensive feedback summary

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (69 tests) with fixtures and parametrized tests
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

sssom_converter.py:534 - Disjoint group generation

As noted by Copilot reviewer, disjoint group facts are generated from all parsed rows, even those later filtered out by rules/prefix filters, which could bloat the KB with orphaned groups.

cli.py:551 - Error handling

Only handles FileNotFoundError, but other I/O errors (permissions, encoding) will bypass Click's user-friendly error reporting.

cli.py:602 - Output verbosity

Printing the fully-sorted seed list can become expensive/noisy for large IDS files.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 69 tests covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


The CI workflow runs `just test` but the justfile only had notebook
recipes. Add test, doctest, lint, format, and lint-fix recipes
matching the Makefile equivalents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration
  • Address Copilot reviewer feedback
  • Provide comprehensive feedback summary

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (124 test functions across converter files)
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

sssom_converter.py:534 - Disjoint group generation

As noted by Copilot reviewer, disjoint group facts are generated from all parsed rows, even those later filtered out by rules/prefix filters, which could bloat the KB with orphaned groups.

cli.py:551 - Error handling

Only handles FileNotFoundError, but other I/O errors (permissions, encoding) will bypass Click's user-friendly error reporting.

cli.py:602 - Output verbosity

Printing the fully-sorted seed list can become expensive/noisy for large IDS files.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 124 test functions covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


The extract command's help text and output format changed but the tests
were not updated. Fix assertions to match current output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration
  • Address Copilot reviewer feedback
  • Provide comprehensive feedback summary

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (124 test functions across converter files)
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

sssom_converter.py:534 - Disjoint group generation

As noted by Copilot reviewer, disjoint group facts are generated from all parsed rows, even those later filtered out by rules/prefix filters, which could bloat the KB with orphaned groups.

cli.py:551 - Error handling

Only handles FileNotFoundError, but other I/O errors (permissions, encoding) will bypass Click's user-friendly error reporting.

cli.py:602 - Output verbosity

Printing the fully-sorted seed list can become expensive/noisy for large IDS files.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 124 test functions covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


Clarify that MemberOfDisjointGroup facts are generated as a
post-processing step by all converters (OBO, OWL, SSSOM), not during
parsing. Every CURIE is split on ':' and assigned to a disjoint group
by prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Read and understand the PR scope and implementation
  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration
  • Address Copilot reviewer feedback
  • Provide comprehensive feedback summary

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (124 test functions across converter files)
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

sssom_converter.py:534 - Disjoint group generation

As noted by Copilot reviewer, disjoint group facts are generated from all parsed rows, even those later filtered out by rules/prefix filters, which could bloat the KB with orphaned groups.

cli.py:551 - Error handling

Only handles FileNotFoundError, but other I/O errors (permissions, encoding) will bypass Click's user-friendly error reporting.

cli.py:602 - Output verbosity

Printing the fully-sorted seed list can become expensive/noisy for large IDS files.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 124 test functions covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


Add two new output formats for boomer solutions:
- SSSOM TSV: standard ontology mapping format with metadata header,
  curie_map, and mapping_justification columns
- OBOGraphs JSON: standard graph exchange format with nodes, edges,
  and probability metadata

Wire both into CLI (-O sssom / -O obographs), update output_dir to
use FORMAT_EXTENSIONS dict, and add export examples to the
ontology-alignment tutorial. Includes 27 unit tests, 3 doctests,
and roundtrip verification against parse_sssom_tsv.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @cmungall's task —— View job


PR Review: OBO/OWL/SSSOM Ontology Converters

This is a substantial and well-crafted PR that adds comprehensive ontology conversion capabilities to boomer-py. The implementation is impressive in scope and quality.

Review Checklist:

  • Read and understand the PR scope and implementation
  • Analyze core converter implementations (OBO, OWL, SSSOM)
  • Review code quality and architectural decisions
  • Check for potential bugs and security issues
  • Assess test coverage and quality
  • Evaluate performance considerations
  • Review documentation and integration
  • Address Copilot reviewer feedback
  • Provide comprehensive feedback summary

Strengths

Excellent Architecture & Design

  • Clean separation: OBO parser, OWL backend, SSSOM converter are well-separated with clear responsibilities
  • Unified configuration: OntologyConverterConfig and SSSOMConverterConfig provide comprehensive, type-safe configuration
  • Smart dispatch: ontology_to_kb() dispatcher by file extension is elegant and user-friendly
  • Consistent patterns: All converters follow the same path → KB pattern with configurable probabilistic mappings

Robust OBO Parser (ontology_converter.py)

  • Hand-rolled parser with no external dependencies - excellent for maintainability
  • Proper handling of OBO format quirks (comments, qualifiers, stanza types)
  • Comprehensive support for structural axioms and mappings
  • Good regex patterns and string parsing utilities
  • Solid IRI-to-CURIE conversion logic for OWL

Comprehensive SSSOM Support (sssom_converter.py)

  • Full SSSOM TSV format support with YAML metadata parsing
  • Excellent confidence transform system: identity, floor_ceil, rescale transforms with plugin architecture
  • Flexible rule system: MappingRule with prefix matching, probability overrides, and filtering
  • Proper handling of SKOS semantic differences (broadMatch reverses subject/object for subclass relations)

Strong Integration

  • CLI integration via KBLoader with smart format detection
  • Extensive test coverage (242 test functions across converter files)
  • Good documentation with practical examples and configuration guides
  • Pre-executed tutorial notebook demonstrating real-world usage

Security & Performance

  • Safe YAML parsing: Consistently uses yaml.safe_load() throughout
  • Input validation: Pydantic models for all configuration with proper field validation
  • Memory efficient: Streaming parsers, no unnecessary data duplication
  • Reasonable dependencies: py-horned-owl is well-maintained and appropriate for OWL parsing

🔍 Areas for Improvement

Minor Code Quality Issues

ontology_converter.py:330-331

prefix = id_prefix(eid)
hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

Issue: No validation that id_prefix(eid) returns a non-empty prefix. If eid has no colon, this could create invalid disjoint groups.

Suggestion: Add prefix validation:

if ":" in eid:
    prefix = id_prefix(eid)
    if prefix:  # Ensure non-empty prefix
        hard_facts.append(MemberOfDisjointGroup(sub=eid, group=prefix))

sssom_converter.py:425-436 - Confidence parsing

raw_confidence = row.get("confidence", "").strip()
if raw_confidence:
    confidence_val = float(raw_confidence)

Issue: No exception handling for invalid float conversion. Malformed confidence values will crash the converter.

Suggestion: Add error handling:

try:
    confidence_val = float(raw_confidence)
except ValueError:
    continue  # Skip rows with invalid confidence values

ontology_converter.py:460 - OWL AnnotationAssertion handling

The subject_id conversion bypasses the local curie() helper that strips angle brackets, potentially leaking raw bracketed IRIs into the KB.

sssom_converter.py:534 - Disjoint group generation

As noted by Copilot reviewer, disjoint group facts are generated from all parsed rows, even those later filtered out by rules/prefix filters, which could bloat the KB with orphaned groups.

cli.py:567 - Error handling

Only handles FileNotFoundError, but other I/O errors (permissions, encoding) will bypass Click's user-friendly error reporting.

cli.py:618 - Output verbosity

Printing the fully-sorted seed list can become expensive/noisy for large IDS files.

Documentation & Testing

Missing Edge Case Tests

  • OBO parser: No tests for malformed files, very large files, or encoding issues
  • OWL parser: No tests for complex OWL constructs or namespace edge cases
  • SSSOM: No tests for malformed TSV or YAML metadata parsing errors

Configuration Documentation

  • The configuration system is powerful but could benefit from more examples of complex rule scenarios
  • Consider adding validation warnings when confidence transforms produce values outside [0,1]

📋 Security Considerations

Good:

  • Consistent use of yaml.safe_load() prevents code injection
  • Pydantic validation prevents many injection vectors
  • File path handling appears safe

⚠️ Monitor:

  • Large ontology files could potentially cause memory issues, but this is inherent to the problem domain
  • py-horned-owl dependency should be kept updated for security patches

🚀 Performance Notes

  • Hand-rolled OBO parser should be faster than XML-based alternatives
  • Streaming approach in SSSOM parser is memory-efficient
  • Configuration caching could optimize repeated conversions (minor optimization)

📊 Test Coverage Assessment

Excellent coverage with 242 test functions covering:

  • ✅ OBO parsing edge cases and structural axioms
  • ✅ OWL conversion and IRI handling
  • ✅ SSSOM confidence transforms and mapping rules
  • ✅ CLI integration across all formats
  • ✅ Configuration loading and validation

🎯 Overall Assessment

This is high-quality, production-ready code that significantly enhances boomer-py's capabilities. The implementation demonstrates:

  • Strong software engineering practices
  • Comprehensive understanding of ontology formats
  • Excellent API design and user experience
  • Thorough testing and documentation

The minor issues noted above are non-blocking and can be addressed in follow-up PRs if desired.

Recommendation: APPROVE

This PR successfully adds robust ontology conversion capabilities with excellent code quality, comprehensive testing, and strong integration. The few minor improvement opportunities don't detract from the overall excellent implementation.


@cmungall cmungall merged commit 2bb826c into main Mar 4, 2026
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants