Completed PRs/Issue
PR Title: PR #20 – Develop branch merge into Main (Branch: develop → main)
PR Type: Integration / Release
Status: Completed
Background
The develop branch had accumulated multiple major features, including expanded
Phenopacket support (genotypes, phenotypes, diseases, measurements, biosamples),
CLI commands (parse-excel, audit-excel, download), and refactored mapping
logic. This PR merged those updates into main to create the first feature-complete,
GA4GH-aligned release of the pipeline.
Scope
Outline
Synchronize develop into main to release a working end-to-end pipeline for
Phenopacket generation from Excel workbooks.
Included/Required
Merged feature branch feature/expand-phenopackets (PR #19, "Feature/expand phenopackets").
Integrated VariationDescriptor support from exp/attempt_pyphetools (PR #18, "Draft Variant field for the Phenopacket").
CLI commands:
p6 parse-excel: parse genotype/phenotype and emit JSON phenopackets.
p6 audit-excel: audit workbooks for headers, sheet classification, variant columns.
p6 download: fetch HPO JSON releases.
Expanded data model:
DiseaseRecord, MeasurementRecord, BiosampleRecord.
Refactored DefaultMapper into modular row-level helpers.
Full unit and integration test suite.
Optional
Graceful tolerance for chromosome inputs (chr16 vs 16).
Strict vs non-strict HGVS consistency checks.
Canonicalization of HGVS expressions.
README expanded with new CLI references and examples.
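The chromosome tolerance listed above (chr16 vs 16) could be handled along these lines. This is only a minimal sketch: the function name normalize_chromosome and the choice to strip the prefix (rather than add it) are assumptions for illustration, not the project's actual implementation.

```python
def normalize_chromosome(value: object) -> str:
    """Hypothetical helper: return a chromosome label without a 'chr' prefix.

    Accepts inputs such as 'chr16', 'Chr16', ' 16 ' or the integer 16.
    """
    text = str(value).strip()
    # Drop a leading 'chr' regardless of case.
    if text.lower().startswith("chr"):
        text = text[3:]
    # Upper-case so that letter labels such as 'x' become 'X'; digits are unchanged.
    return text.upper()


print(normalize_chromosome("chr16"))
print(normalize_chromosome(16))
```

With this approach both spellings collapse to one canonical label before any downstream comparison, so the mapper does not need to special-case either form.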
Not included
Full VariationDescriptor gene_context integration (still TODO).
Visualization/dashboard layer.
Privacy/PHI handling or linkage to genomic VCFs.
Technical Plan / Implementation Details
Integrated all work from PR #18 ("Draft Variant field for the Phenopacket") and PR #19 ("Feature/expand phenopackets") into main.
Refactored CLI (__main__.py) to support audit, parse, and download workflows.
Mapper API stabilized: apply_mapping returns list[Phenopacket].
Row-level parsing: genotype, phenotype, disease, measurement, biosample.
Expanded loader (RENAME_MAP) with new standardized fields.
Updated README with installation, CLI reference, and quickstart.
Tests added:
tests/test_full_features.py
tests/test_cli_audit_excel.py
tests/test_download_mock.py
Unit tests for row-level mappers and HGVS consistency.
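To illustrate the loader's RENAME_MAP idea, here is a rough sketch of header standardization. The map entries and the standardize_headers helper are hypothetical; the real RENAME_MAP contents are not shown in this PR description.

```python
# Hypothetical entries; the project's actual RENAME_MAP may differ.
RENAME_MAP = {
    "hpo id": "hpo_id",
    "chromosome": "chrom",
    "disease term": "disease_id",
}


def standardize_headers(row: dict) -> dict:
    """Return a copy of `row` whose keys are trimmed, lower-cased and renamed.

    Headers found in RENAME_MAP get the mapped name; other headers keep
    their trimmed, lower-cased form.
    """
    standardized = {}
    for header, value in row.items():
        key = str(header).strip().lower()
        standardized[RENAME_MAP.get(key, key)] = value
    return standardized


print(standardize_headers({" HPO ID ": "HP:0001250", "Notes": "example"}))
```

Normalizing headers once in the loader means the row-level mappers can rely on a single set of field names.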
Validation & Testing
All tests passed in CI (unit + integration).
test_full_features_parse_creates_all_blocks confirmed expanded JSON output.
test_audit_excel_* validated CLI audit outputs (table + JSON).
Mocked network calls for download ensured offline reproducibility.
HGVS consistency checked in both strict and non-strict modes.
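The offline download testing mentioned above might look roughly like the following. This is a sketch under assumptions: fetch_json is a hypothetical stand-in for the download logic, the URL is a placeholder, and the real tests in tests/test_download_mock.py may patch a different HTTP client.

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_json(url: str) -> dict:
    """Hypothetical downloader: fetch a URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def check_fetch_json_offline() -> dict:
    """Call fetch_json with urlopen patched, so no network access occurs."""
    payload = json.dumps({"graphs": []}).encode("utf-8")
    # BytesIO provides read() and works as a context manager, like a response.
    fake_response = io.BytesIO(payload)
    url = "https://example.invalid/hp.json"  # placeholder, never contacted
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_open:
        result = fetch_json(url)
    fake_open.assert_called_once_with(url)
    return result


print(check_fetch_json_offline())
```

Because the patched urlopen returns an in-memory payload, the check runs identically with or without network access.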
Milestones
Outcome
main now contains a stable, feature-complete pipeline for generating
GA4GH Phenopackets from Excel workbooks.
CLI supports auditing, parsing, and ontology downloads.
All five record types (genotypes, phenotypes, diseases, measurements, biosamples)
are serialized into compliant JSON outputs.
Project is ready for downstream adoption and further incremental releases.