Feature/expand phenopackets #31

@VarenyaJ

Completed PRs/Issue

PR Title: PR #19 – Feature/expand phenopackets (Branch: feature/expand-phenopackets → develop)
PR Type: Feature
Status: Completed

Background

The pipeline originally generated Phenopackets containing only genotype and
phenotype blocks. To conform to the GA4GH Phenopacket Schema v2 and enrich the
interpretations field, this PR added support for diseases, measurements, and
biosamples. It also refactored genotype handling to use the GA4GH
VariationDescriptor, enriched with gene, zygosity, inheritance, and
transcript-level context.


Scope

Outline

Expand Phenopacket generation to include additional record types and integrate
robust VariantValidator lookups for genotypes.

Included/Required

  • Added mapping of HGNC gene symbols into VariationDescriptor.gene_context.
  • Added zygosity → GA4GH GENO allelicState mapping.
  • Added inheritance parsing (captured in Genotype dataclass).
  • Updated DefaultMapper._add_genotype_interpretations to use
    Genotype.to_variation_descriptor().
  • Integrated new module vv_lookup.py for VariantValidator gene/transcript lookups.
  • CLI emits VariationDescriptor with expressions, allelicState, geneContext.
  • Deprecated hpo-toolkit (version conflicts) and added pyphetools.
  • README refinements (venv/conda setup).
  • Requirements updated (pyphetools, pinned hpo-toolkit 0.5.5).
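
As a rough illustration of the gene-context and zygosity mapping listed above, the sketch below builds a VariationDescriptor-shaped dictionary using the JSON field names of the Phenopacket Schema v2. The GenotypeSketch class, its fields, and the ZYGOSITY_TO_GENO table are assumptions made for this example; the actual Genotype dataclass in src/P6/genotype.py may be structured differently. The gene and HGVS values are illustrative placeholders only.

```python
from dataclasses import dataclass, field

# Assumed zygosity labels mapped to GENO ontology terms (id, label).
ZYGOSITY_TO_GENO = {
    "heterozygous": ("GENO:0000135", "heterozygous"),
    "homozygous": ("GENO:0000136", "homozygous"),
    "hemizygous": ("GENO:0000134", "hemizygous"),
    "compound_heterozygous": ("GENO:0000402", "compound heterozygous"),
}


@dataclass
class GenotypeSketch:
    """Hypothetical stand-in for the project's Genotype dataclass."""

    descriptor_id: str
    hgnc_id: str
    gene_symbol: str
    zygosity: str
    hgvs: list = field(default_factory=list)

    def to_variation_descriptor(self) -> dict:
        # Normalize free-text zygosity before the lookup.
        key = self.zygosity.strip().lower().replace(" ", "_")
        if key not in ZYGOSITY_TO_GENO:
            raise ValueError(f"Unrecognized zygosity: {self.zygosity!r}")
        geno_id, geno_label = ZYGOSITY_TO_GENO[key]
        # dict.fromkeys drops duplicate HGVS strings but keeps first-seen order.
        unique_hgvs = list(dict.fromkeys(self.hgvs))
        return {
            "id": self.descriptor_id,
            "geneContext": {"valueId": self.hgnc_id, "symbol": self.gene_symbol},
            "expressions": [{"syntax": "hgvs", "value": h} for h in unique_hgvs],
            "allelicState": {"id": geno_id, "label": geno_label},
        }


# Illustrative input with a repeated HGVS expression.
example = GenotypeSketch(
    descriptor_id="var-1",
    hgnc_id="HGNC:3603",
    gene_symbol="FBN1",
    zygosity="Heterozygous",
    hgvs=["NM_000138.4:c.6751T>A", "NM_000138.4:c.6751T>A"],
)
descriptor = example.to_variation_descriptor()
```

Raising on an unrecognized zygosity, rather than silently omitting allelicState, is one possible design choice; a real implementation might prefer to log and continue.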

Optional

  • Deduplication of HGVS expressions.
  • Ruff linting and readability refactors.
  • Improved exception handling (explicit guards instead of blanket try/except).
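
The "explicit guards instead of blanket try/except" item can be pictured with a small loader for the HPO JSON file. This is a minimal sketch, not the code in __main__.py: the function name load_hpo_json and its error messages are hypothetical. It checks the path up front and catches only the specific JSON parsing error, which is the pattern Ruff rule BLE001 (blind except) pushes toward.

```python
import json
from pathlib import Path


def load_hpo_json(path_str: str) -> dict:
    """Load an HPO JSON file, failing with specific, descriptive errors."""
    path = Path(path_str)
    # Explicit guards replace a blanket `except Exception`.
    if not path.exists():
        raise FileNotFoundError(f"HPO JSON not found: {path}")
    if not path.is_file():
        raise ValueError(f"HPO JSON path is not a regular file: {path}")
    try:
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except json.JSONDecodeError as exc:
        # Catch only the parse failure and keep the original cause attached.
        raise ValueError(f"HPO JSON is not valid JSON: {path}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"HPO JSON must contain a top-level object: {path}")
    return data
```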

Not included

  • No visualization or dashboard.
  • No changes to PHI handling or genomic linkage (left for future work).

Technical Plan / Implementation Details

  • src/P6/genotype.py:
    • Implemented Genotype.to_variation_descriptor().
    • Added validation for patient ID, email, chromosome encodings, zygosity, inheritance.
    • Local descriptor fallback when VV unavailable.
    • Deduplication of HGVS expressions.
  • src/P6/mapper.py:
    • Delegated VariationDescriptor creation to Genotype.
    • Preserved phenotype, disease, measurement, biosample mapping behavior.
  • src/P6/vv_lookup.py (new):
    • Queries VariantValidator REST API.
    • Normalizes responses (HGNC ID, Ensembl gene ID, transcripts).
    • Adds retry/backoff and VVLookupError.
  • __main__.py:
    • CLI integrates new VariationDescriptor pipeline.
    • Explicit file checks for HPO JSON (Ruff BLE001 compliance).
  • README & requirements updated for environment setup and new dependencies.
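
The retry/backoff behavior noted for vv_lookup.py might look roughly like the sketch below. Only the VVLookupError name comes from this PR; lookup_with_retry, its parameters, and the injected fetch callable are hypothetical, and the exception types that should trigger a retry depend on the HTTP client actually used. Injecting fetch and sleep keeps the example free of network calls.

```python
import time


class VVLookupError(RuntimeError):
    """Raised when a VariantValidator lookup fails after all retries."""


def lookup_with_retry(fetch, hgvs, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(hgvs), retrying transient failures with exponential backoff.

    `fetch` is any callable that performs the lookup (assumed here; the real
    module queries the VariantValidator REST API).
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(hgvs)
        except (ConnectionError, TimeoutError) as exc:
            # Assumed transient errors; adjust to the HTTP client's exceptions.
            last_error = exc
            if attempt < max_attempts - 1:
                # Delay doubles on each retry: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** attempt)
    raise VVLookupError(
        f"Lookup failed for {hgvs!r} after {max_attempts} attempts"
    ) from last_error
```

A caller could catch VVLookupError and fall back to the local descriptor path mentioned under src/P6/genotype.py.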

Validation & Testing

  • Verified VariationDescriptor enrichment on sample Excel workbooks.
  • Deduplication tests ensured no duplicate HGVS expressions.
  • Ruff linting enforced for exception handling compliance.
  • CI checks passed (2/2).
  • Integration tests confirmed expanded phenopackets include all new fields.
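
A check in the spirit of the integration tests above could be as simple as the following sketch. The missing_fields helper is hypothetical and not taken from the test suite; the field names are the top-level JSON keys of a Phenopacket Schema v2 document, with genotypes carried under interpretations.

```python
# Top-level Phenopacket v2 JSON keys this PR is expected to populate.
EXPECTED_FIELDS = (
    "phenotypicFeatures",
    "interpretations",
    "diseases",
    "measurements",
    "biosamples",
)


def missing_fields(phenopacket: dict) -> list:
    """Return expected keys that are absent or empty in a phenopacket dict."""
    return [name for name in EXPECTED_FIELDS if not phenopacket.get(name)]
```

An integration test could then assert that missing_fields(packet) is empty for each generated phenopacket.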

Milestones

  • Add VariationDescriptor enrichment (gene, zygosity, inheritance).
  • Implement vv_lookup.py module for gene/transcript xrefs.
  • Update DefaultMapper to delegate VariationDescriptor building.
  • Refactor Genotype with explicit validation and guards.
  • Deduplicate HGVS expressions.
  • Update README and requirements.

Outcome

  • Phenopackets now include diseases, measurements, and biosamples in addition
    to genotypes and phenotypes.
  • Genotypes enriched with VariationDescriptor (HGVS expressions, allelicState,
    geneContext).
  • CLI supports expanded output with robust ontology and variant context.
  • Codebase hardened with better exception handling, VV enrichment, and modular design.
