Completed PRs/Issue
PR Title: PR #19 – Feature/expand phenopackets (Branch: feature/expand-phenopackets → develop)
PR Type: Feature
Status: Completed
Background
The pipeline originally generated Phenopackets with only genotype and phenotype
blocks. To meet GA4GH v2 standards and enrich the interpretations field, this PR
expanded support for diseases, measurements, and biosamples, while also
refactoring genotype handling to use GA4GH VariationDescriptor enriched with
gene, zygosity, inheritance, and transcript-level context.
Scope
Outline
Expand Phenopacket generation to include additional record types and integrate
robust VariantValidator lookups for genotypes.
Included/Required
- Added mapping of HGNC gene symbols into VariationDescriptor.gene_context.
- Added zygosity → GA4GH GENO allelicState mapping.
- Added inheritance parsing (captured in Genotype dataclass).
- Updated DefaultMapper._add_genotype_interpretations to use Genotype.to_variation_descriptor().
- Integrated new module vv_lookup.py for VariantValidator gene/transcript lookups.
- CLI emits VariationDescriptor with expressions, allelicState, geneContext.
- Deprecated hpo-toolkit (version conflicts) and added pyphetools.
- README refinements (venv/conda setup).
- Requirements updated (pyphetools, pinned hpo-toolkit 0.5.5).
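The zygosity and gene-context items above can be illustrated with a small sketch. It is not the PR's code: the table name GENO_ALLELIC_STATE and the helpers allelic_state and gene_context are hypothetical, and plain dicts stand in for the Phenopacket message types. The GENO identifiers are the standard heterozygous, homozygous, and hemizygous terms; the set of zygosity values the pipeline actually accepts may differ.

```python
# Hypothetical lookup table: normalised zygosity string -> (GENO id, label).
GENO_ALLELIC_STATE = {
    "heterozygous": ("GENO:0000135", "heterozygous"),
    "homozygous": ("GENO:0000136", "homozygous"),
    "hemizygous": ("GENO:0000134", "hemizygous"),
}


def allelic_state(zygosity: str) -> dict:
    """Return an OntologyClass-shaped dict for a free-text zygosity value."""
    key = zygosity.strip().lower()
    if key not in GENO_ALLELIC_STATE:
        # Explicit guard instead of a blanket try/except.
        raise ValueError(f"Unsupported zygosity: {zygosity!r}")
    term_id, label = GENO_ALLELIC_STATE[key]
    return {"id": term_id, "label": label}


def gene_context(hgnc_id: str, symbol: str) -> dict:
    """Return a GeneDescriptor-shaped dict (valueId + symbol)."""
    return {"valueId": hgnc_id, "symbol": symbol}
```

Rejecting unknown zygosity values up front keeps malformed spreadsheet cells from silently producing a descriptor without an allelicState.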
Optional
- Deduplication of HGVS expressions.
- Ruff linting and readability refactors.
- Improved exception handling (explicit guards instead of blanket try/except).
Not included
- No visualization or dashboard.
- No changes to PHI handling or genomic linkage (left for future work).
Technical Plan / Implementation Details
src/P6/genotype.py:
- Implemented Genotype.to_variation_descriptor().
- Added validation for patient ID, email, chromosome encodings, zygosity, inheritance.
- Local descriptor fallback when VariantValidator (VV) is unavailable.
- Deduplication of HGVS expressions.
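As a rough illustration of the deduplication and local-fallback behavior listed above, the sketch below builds a descriptor as a plain dict. GenotypeSketch, dedupe_expressions, and the lookup parameter are hypothetical names, the built-in LookupError stands in for the PR's VVLookupError, and the real Genotype dataclass carries more fields and validation than shown here.

```python
from dataclasses import dataclass, field


def dedupe_expressions(expressions):
    """Drop repeated (syntax, value) pairs while keeping first-seen order."""
    seen = set()
    unique = []
    for expr in expressions:
        key = (expr["syntax"], expr["value"])
        if key not in seen:
            seen.add(key)
            unique.append(expr)
    return unique


@dataclass
class GenotypeSketch:
    """Simplified stand-in for the PR's Genotype dataclass (assumed shape)."""

    patient_id: str
    hgvs: list = field(default_factory=list)  # raw HGVS strings

    def to_variation_descriptor(self, lookup=None):
        expressions = [{"syntax": "hgvs.c", "value": h} for h in self.hgvs]
        descriptor = {
            "id": f"{self.patient_id}-variant",
            "expressions": dedupe_expressions(expressions),
        }
        # Enrich with gene context only when a lookup is supplied and succeeds;
        # otherwise return the locally built descriptor unchanged.
        if lookup is not None and self.hgvs:
            try:
                descriptor["geneContext"] = lookup(self.hgvs[0])
            except LookupError:
                pass
        return descriptor
```

Passing the lookup in as a callable keeps the local path testable without network access.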
src/P6/mapper.py:
- Delegated VariationDescriptor creation to Genotype.
- Preserved phenotype, disease, measurement, biosample mapping behavior.
src/P6/vv_lookup.py (new):
- Queries VariantValidator REST API.
- Normalizes responses (HGNC ID, Ensembl gene ID, transcripts).
- Adds retry/backoff and VVLookupError.
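The retry/backoff behavior might look something like the sketch below. Only the name VVLookupError comes from the PR; lookup_with_retry, its parameters, and the choice of exceptions to retry on are assumptions. The fetch callable is injected so that no particular VariantValidator endpoint or HTTP client is implied.

```python
import time


class VVLookupError(Exception):
    """Raised when a lookup still fails after all retries (body assumed)."""


def lookup_with_retry(fetch, variant, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(variant), retrying transient failures with exponential backoff."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(variant)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts,
            # but do not sleep after the final failure.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise VVLookupError(
        f"Lookup failed for {variant!r} after {retries} attempts"
    ) from last_error
```

Wrapping the final failure in a single exception type lets callers such as the genotype code decide in one place whether to fall back to a local descriptor.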
__main__.py:
- CLI integrates new VariationDescriptor pipeline.
- Explicit file checks for HPO JSON (Ruff BLE001 compliance).
- README & requirements updated for environment setup and new dependencies.
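For the explicit HPO JSON file checks mentioned under __main__.py, a minimal sketch is shown here. The function name load_hpo_json is hypothetical, and the actual CLI may report errors differently. The point is that each failure mode is guarded or caught by its specific exception type, which is what Ruff's BLE001 (blind except) rule asks for.

```python
import json
from pathlib import Path


def load_hpo_json(path):
    """Load an HPO JSON file, failing with specific errors rather than a bare except."""
    hpo_path = Path(path)
    # Explicit guard: check the file exists before trying to open it.
    if not hpo_path.is_file():
        raise FileNotFoundError(f"HPO JSON not found: {hpo_path}")
    try:
        with hpo_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except json.JSONDecodeError as exc:
        # Catch only the parse error, never a blanket Exception.
        raise ValueError(f"HPO file is not valid JSON: {hpo_path}") from exc
```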
Validation & Testing
- Verified VariationDescriptor enrichment on sample Excel workbooks.
- Deduplication tests ensured no duplicate HGVS expressions.
- Ruff linting enforced for exception handling compliance.
- CI checks passed (2/2).
- Integration tests confirmed expanded phenopackets include all new fields.
Milestones
- vv_lookup.py module for gene/transcript xrefs.
Outcome
- Phenopackets now include diseases, measurements, and biosamples in addition
to genotypes and phenotypes.
- Genotypes enriched with VariationDescriptor (HGVS expressions, allelicState,
geneContext).
- CLI supports expanded output with robust ontology and variant context.
- Codebase hardened with better exception handling, VV enrichment, and modular design.