Completed PRs/Issue
PR Title: PR #19 – Feature/expand phenopackets (Branch: feature/expand-phenopackets → develop)
PR Type: Feature
Status: Completed
Background
The pipeline initially produced Phenopackets with only genotype and phenotype
blocks. To fully comply with GA4GH Phenopacket v2, we needed to expand support
for diseases, measurements, and biosamples. This PR also aligned the CLI and
DefaultMapper to serialize richer data structures into each output file.
Scope
Outline
Extend the mapper and CLI to capture diseases, measurements, and biosamples in
Phenopacket JSON outputs.
Included/Required
- Added DiseaseRecord, MeasurementRecord, and BiosampleRecord dataclasses.
- Extended loader (RENAME_MAP) to recognize new columns.
- Updated DefaultMapper.apply_mapping to detect and map disease/measurement/
  biosample tables.
- CLI parse-excel now serializes these blocks into phenopacket JSON.
- Group records by patient across all five record types.
- Integration test: test_full_features_parse_creates_all_blocks.
- Updated README with new CLI commands and audit-excel reference.
- Refactored DefaultMapper into modular row-level helpers.
- Added audit improvements and verbose reporting.
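
The per-patient grouping listed above could look roughly like the following sketch. The field names on the two dataclasses and the standalone group_records_by_patient function are assumptions made for illustration; the actual definitions live in src/P6/ and are not reproduced in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DiseaseRecord:
    # Hypothetical fields; the real dataclass in src/P6/disease.py may differ.
    patient_id: str
    term_id: str
    label: str


@dataclass
class MeasurementRecord:
    # Hypothetical fields, for illustration only.
    patient_id: str
    assay: str
    value: float
    unit: str


def group_records_by_patient(**records_by_kind):
    """Bucket records of every kind under their patient ID.

    Each patient gets one (possibly empty) list per record kind, so
    downstream serialization can rely on every key being present.
    """
    grouped = defaultdict(lambda: {kind: [] for kind in records_by_kind})
    for kind, records in records_by_kind.items():
        for record in records:
            grouped[record.patient_id][kind].append(record)
    return dict(grouped)


grouped = group_records_by_patient(
    diseases=[DiseaseRecord("P1", "EX:0001", "Example disease")],
    measurements=[
        MeasurementRecord("P1", "EX:assay", 1.1, "mg/dL"),
        MeasurementRecord("P2", "EX:assay", 0.9, "mg/dL"),
    ],
)
```

Pre-filling an empty list for every kind means a patient with no disease rows still has a "diseases" key, which keeps the serialization step free of missing-key checks.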
Optional
- Graceful CLI tolerance for chromosome input (chr16 vs 16).
- Canonical HGVS emitted without redundant chr prefix.
- Tests for alias-based sheet selection (variants, hpo, labs).
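
As an illustration of the chromosome tolerance described above, a helper along these lines would accept both chr16 and 16 and return the bare name. The function name normalize_chromosome and the regular expression are assumptions, not the project's actual code, and the pattern does not range-check autosome numbers.

```python
import re

# Optional "chr" prefix, then one or two digits, X, Y, or M/MT (case-insensitive).
_CHROM_RE = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y|MT?)$", re.IGNORECASE)


def normalize_chromosome(raw):
    """Return the chromosome name without a 'chr' prefix, upper-cased."""
    match = _CHROM_RE.match(str(raw).strip())
    if match is None:
        raise ValueError(f"Unrecognized chromosome: {raw!r}")
    return match.group(1).upper()
```

Normalizing once at the input boundary lets everything downstream, including HGVS emission, work with a single prefix-free form.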
Not included
- VariationDescriptor gene_context and HGVS expression integration (left as TODO).
- No visualization/dashboard layer.
Technical Plan / Implementation Details
- New files: src/P6/disease.py, src/P6/measurement.py, src/P6/biosample.py.
- Loader extended with mappings for disease, measurement, biosample fields.
- DefaultMapper.apply_mapping now returns list[Phenopacket] instead of tuples.
- Row-level parsing split into _map_genotype_table, _map_phenotype_table,
  _map_diseases_table, _map_measurements_table, _map_biosamples_table.
- _group_records_by_patient aggregates all record types before serialization.
- CLI integration:
  - p6 parse-excel → writes phenopackets with all supported blocks.
  - p6 audit-excel → improved audit with header normalization, sheet
    classification, and variant checks.
- Tests added:
  - tests/test_full_features.py
  - tests/test_mapper_* (row parsing, required column checks, HGVS consistency).
- Utility helpers + audit/preprocess validation.
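
Header normalization and alias-based sheet classification, as mentioned for the loader and audit-excel, might be sketched as follows. The RENAME_MAP entries, the SHEET_ALIASES table, and both function names are hypothetical. In particular, the pairing of variants, hpo, and labs with genotype, phenotype, and measurement tables is an assumption.

```python
import re

# Hypothetical entries; the loader's real RENAME_MAP is not shown in this summary.
RENAME_MAP = {
    "patient id": "patient_id",
    "disease term": "disease_term",
    "measurement value": "measurement_value",
}

# Hypothetical alias table: which sheet names count as which table kind.
SHEET_ALIASES = {
    "genotype": {"genotype", "variants"},
    "phenotype": {"phenotype", "hpo"},
    "measurements": {"measurements", "labs"},
}


def normalize_header(header):
    """Collapse whitespace/underscores, lower-case, then apply RENAME_MAP."""
    cleaned = re.sub(r"[\s_]+", " ", str(header)).strip().lower()
    # Unknown headers fall back to a snake_case form of the cleaned text.
    return RENAME_MAP.get(cleaned, cleaned.replace(" ", "_"))


def classify_sheet(name):
    """Return the table kind for a sheet name, or None if unrecognized."""
    key = str(name).strip().lower()
    for kind, aliases in SHEET_ALIASES.items():
        if key in aliases:
            return kind
    return None
```

Returning None for unknown sheets, rather than raising, leaves it to the caller to decide whether an unclassified sheet is an audit warning or a hard error.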
Validation & Testing
- Integration tests confirmed that diseases, measurements, and biosamples appear
in output phenopacket JSON.
- Unit tests for row-level mapping of genotype, phenotype, disease, measurement,
and biosample tables.
- CLI tested with both table and JSON audit outputs.
- Network calls mocked in test_download_mock.py for HPO fetch.
- Mapper tested on strict vs non-strict HGVS consistency.
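
The network mocking noted above can be done with the standard library's unittest.mock. The sketch below is not the content of test_download_mock.py: fetch_hpo is a hypothetical stand-in for the project's downloader, and the URL is a placeholder.

```python
import io
import urllib.request
from unittest import mock


def fetch_hpo(url):
    # Hypothetical downloader standing in for the project's HPO fetch.
    with urllib.request.urlopen(url) as response:
        return response.read()


def test_fetch_hpo_uses_mocked_network():
    fake_body = b'{"graphs": []}'
    # Replace urlopen so no real HTTP request is made; BytesIO supports
    # both the context-manager protocol and .read(), like a response object.
    with mock.patch(
        "urllib.request.urlopen", return_value=io.BytesIO(fake_body)
    ) as fake_open:
        data = fetch_hpo("https://example.org/hp.json")
    assert data == fake_body
    fake_open.assert_called_once_with("https://example.org/hp.json")
```

Patching at the urllib.request boundary keeps the test offline and deterministic while still exercising the downloader's own read logic.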
Milestones
Outcome
- Phenopackets now support diseases, measurements, and biosamples alongside
genotypes and phenotypes.
- CLI users can parse richer Excel inputs and produce GA4GH-compliant JSON.
- Expanded tests and refactoring increased maintainability of the mapping layer.