Develop branch merge into Main #30

@VarenyaJ


Completed PRs/Issue

PR Title: PR #20 – Develop branch merge into Main (Branch: develop → main)
PR Type: Integration / Release
Status: Completed

Background

The develop branch had accumulated multiple major features, including expanded
Phenopacket support (genotypes, phenotypes, diseases, measurements, biosamples),
CLI commands (parse-excel, audit-excel, download), and refactored mapping
logic. This PR merged those updates into main to create the first feature-complete,
GA4GH-aligned release of the pipeline.


Scope

Outline

Synchronize develop into main to release a working end-to-end pipeline for
Phenopacket generation from Excel workbooks.

Included/Required

  • Merged feature branch feature/expand-phenopackets (PR #19, "Feature/expand phenopackets").
  • Integrated VariationDescriptor support from exp/attempt_pyphetools (PR #18, "Draft Variant field for the Phenopacket").
  • CLI commands:
    • p6 parse-excel: parse genotype/phenotype and emit JSON phenopackets.
    • p6 audit-excel: audit workbooks for headers, sheet classification, variant columns.
    • p6 download: fetch HPO JSON releases.
  • Expanded data model:
    • DiseaseRecord, MeasurementRecord, BiosampleRecord.
  • Refactored DefaultMapper into modular row-level helpers.
  • Full unit and integration test suite.

Optional

  • Graceful tolerance for chromosome inputs (chr16 vs 16).
  • Strict vs non-strict HGVS consistency checks.
  • Canonicalization of HGVS expressions.
  • README expanded with new CLI references and examples.
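
The chromosome tolerance listed above can be illustrated with a minimal sketch. The helper name normalize_chromosome and its exact rules are assumptions made for illustration, not the pipeline's actual implementation; it only shows how chr16 and 16 could be treated as the same label.

```python
def normalize_chromosome(raw: str) -> str:
    """Hypothetical helper: return a chromosome label without a 'chr' prefix."""
    label = raw.strip()
    # Drop a leading 'chr' in any letter case, e.g. 'chr16' or 'Chr16'.
    if label.lower().startswith("chr"):
        label = label[3:]
    # Upper-case so 'x' and 'X' compare equal.
    return label.upper()
```

With a helper like this, values from different workbooks can be compared after normalization instead of being rejected for a formatting difference.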

Not included

  • Full VariationDescriptor gene_context integration (still TODO).
  • Visualization/dashboard layer.
  • Privacy/PHI handling or linkage to genomic VCFs.

Technical Plan / Implementation Details

  • Integrated all work from PR #18 ("Draft Variant field for the Phenopacket") and PR #19 ("Feature/expand phenopackets") into main.
  • Refactored CLI (__main__.py) to support audit, parse, and download workflows.
  • Mapper API stabilized: apply_mapping returns list[Phenopacket].
  • Row-level parsing: genotype, phenotype, disease, measurement, biosample.
  • Expanded loader (RENAME_MAP) with new standardized fields.
  • Updated README with installation, CLI reference, and quickstart.
  • Tests added:
    • tests/test_full_features.py
    • tests/test_cli_audit_excel.py
    • tests/test_download_mock.py
    • Unit tests for row-level mappers and HGVS consistency.
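
As a rough illustration of how a loader-level rename map might standardize column headers, here is a minimal sketch. Only the name RENAME_MAP comes from this PR; the entries, the standardize_headers helper, and the use of plain dicts for rows are assumptions, since the actual mapping is not shown here.

```python
# Hypothetical entries: raw workbook header (lower-cased) -> standardized field.
RENAME_MAP = {
    "patient id": "patient_id",
    "hpo term": "hpo_id",
    "chromosome": "chrom",
}


def standardize_headers(row: dict) -> dict:
    """Rename known headers; leave unrecognized headers unchanged."""
    out = {}
    for key, value in row.items():
        # Match on a trimmed, lower-cased form of the header.
        normalized = key.strip().lower()
        out[RENAME_MAP.get(normalized, key)] = value
    return out
```

Keeping unknown headers untouched means an audit step can still report them rather than silently dropping data.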

Validation & Testing

  • All tests passed in CI (unit + integration).
  • test_full_features_parse_creates_all_blocks confirmed expanded JSON output.
  • test_audit_excel_* validated CLI audit outputs (table + JSON).
  • Mocked network calls for download ensured offline reproducibility.
  • HGVS consistency checked in both strict and non-strict (forgiving) modes.
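
The mocked-network approach mentioned above can be sketched with the standard library alone. The functions fetch_json and fetch_with_fake_network are hypothetical and assume a urllib-based download; the project's real download code and tests may use a different HTTP client and mocking style.

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_json(url: str) -> dict:
    """Hypothetical helper: download and decode a JSON document."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_with_fake_network(url: str, payload: dict) -> dict:
    """Call fetch_json with urlopen patched, so no real request is made."""
    # BytesIO works as a context manager and supports read(), like a response.
    fake_body = io.BytesIO(json.dumps(payload).encode("utf-8"))
    with mock.patch("urllib.request.urlopen", return_value=fake_body):
        return fetch_json(url)
```

Because the patched urlopen returns an in-memory buffer, a test built this way runs the same offline as it does in CI.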

Outcome

  • main now contains a stable, feature-complete pipeline for generating
    GA4GH Phenopackets from Excel workbooks.
  • CLI supports auditing, parsing, and ontology downloads.
  • All five record types (genotypes, phenotypes, diseases, measurements, biosamples)
    are serialized into compliant JSON outputs.
  • Project is ready for downstream adoption and further incremental releases.
