Feature/expand phenopackets #24

## Completed PRs/Issue

PR Title: PR #19 – Feature/expand phenopackets (Branch: feature/expand-phenopackets → develop)
PR Type: Feature
Status: Completed
================================================================================

Background
----------
The pipeline initially produced Phenopackets containing only genotype and
phenotype blocks. To fully comply with GA4GH Phenopacket v2, we needed to add
support for **diseases, measurements, and biosamples**. This PR also updated the
CLI and `DefaultMapper` so that they serialize these richer data structures into
each output file.

--------------------------------------------------------------------------------
Scope
--------------------------------------------------------------------------------
### Outline
Extend the mapper and CLI to capture diseases, measurements, and biosamples in
Phenopacket JSON outputs.
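
As a rough illustration of the target shape, the sketch below builds an empty packet with the top-level blocks named in this PR. The key names follow the Phenopacket v2 JSON field names; the helper name, the `id` format, and the choice to show genotype data under `interpretations` are assumptions for illustration, not taken from this codebase. The required `metaData` block is omitted for brevity.

```python
import json


def build_phenopacket_skeleton(patient_id: str) -> dict:
    """Return an empty packet with the blocks discussed in this PR.

    Hypothetical helper: the real pipeline builds Phenopacket objects,
    not plain dicts. `metaData` is left out to keep the sketch short.
    """
    return {
        "id": f"phenopacket-{patient_id}",  # assumed id format
        "subject": {"id": patient_id},
        "phenotypicFeatures": [],  # phenotype block (pre-existing)
        "interpretations": [],     # assumed home of genotype data
        "diseases": [],            # new in this PR
        "measurements": [],        # new in this PR
        "biosamples": [],          # new in this PR
    }


packet = build_phenopacket_skeleton("P001")
print(json.dumps(packet, indent=2))
```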

### Included/Required
- Added `DiseaseRecord`, `MeasurementRecord`, and `BiosampleRecord` dataclasses.
- Extended loader (`RENAME_MAP`) to recognize new columns.
- Updated `DefaultMapper.apply_mapping` to detect and map disease/measurement/
  biosample tables.
- CLI `parse-excel` now serializes these blocks into phenopacket JSON.
- Grouped records by patient across all five record types (genotype, phenotype,
  disease, measurement, biosample).
- Added integration test `test_full_features_parse_creates_all_blocks`.
- Updated README with new CLI commands and audit-excel reference.
- Refactored `DefaultMapper` into modular row-level helpers.
- Added audit improvements and verbose reporting.
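
A minimal sketch of how a record dataclass and a rename map could work together. The field names, the column aliases, and `EXAMPLE_RENAME_MAP` are hypothetical; the actual `DiseaseRecord` fields and the contents of `RENAME_MAP` are not listed in this write-up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiseaseRecord:
    """Illustrative shape only; the real dataclass may differ."""
    patient_id: str
    term_id: str                 # ontology CURIE, e.g. an OMIM id
    label: str
    onset: Optional[str] = None  # assumed optional field


# Hypothetical aliases: normalized spreadsheet header -> canonical field name.
EXAMPLE_RENAME_MAP = {
    "patient id": "patient_id",
    "disease id": "term_id",
    "disease label": "label",
    "age of onset": "onset",
}


def normalize_row(raw: dict) -> dict:
    """Lower-case and trim headers, then apply the rename map."""
    out = {}
    for header, value in raw.items():
        key = header.strip().lower()
        out[EXAMPLE_RENAME_MAP.get(key, key)] = value
    return out


row = normalize_row(
    {"Patient ID": "P001", "Disease ID": "OMIM:154700", "Disease Label": "Marfan syndrome"}
)
record = DiseaseRecord(**row)
print(record)
```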

### Optional
- Graceful CLI tolerance for chromosome input (`chr16` vs `16`).
- Canonical HGVS emitted without redundant `chr` prefix.
- Tests for alias-based sheet selection (`variants`, `hpo`, `labs`).

### Not included
- VariationDescriptor `gene_context` and HGVS expression integration (left as TODO).
- Visualization/dashboard layer.

--------------------------------------------------------------------------------
Technical Plan / Implementation Details
--------------------------------------------------------------------------------
- New files: `src/P6/disease.py`, `src/P6/measurement.py`, `src/P6/biosample.py`.
- Loader extended with mappings for disease, measurement, biosample fields.
- `DefaultMapper.apply_mapping` now returns `list[Phenopacket]` instead of tuples.
- Row-level parsing split into `_map_genotype_table`, `_map_phenotype_table`,
  `_map_diseases_table`, `_map_measurements_table`, `_map_biosamples_table`.
- `_group_records_by_patient` aggregates all record types before serialization.
- CLI integration:
  * `p6 parse-excel` → writes phenopackets with all supported blocks.
  * `p6 audit-excel` → improved audit with header normalization, sheet
    classification, and variant checks.
- Tests added:
  * `tests/test_full_features.py`
  * `tests/test_mapper_*` (row parsing, required column checks, HGVS consistency).
  * Tests for utility helpers and audit/preprocess validation.
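
The grouping step can be pictured roughly as follows. This is a sketch under assumptions: the `Record` class, its `kind` field, and the return shape are invented for illustration, and the real `_group_records_by_patient` may have a different signature.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """Stand-in for the five record types; fields are hypothetical."""
    patient_id: str
    kind: str  # e.g. "genotype", "phenotype", "disease", ...
    payload: dict = field(default_factory=dict)


def group_records_by_patient(*record_lists):
    """Merge any number of record lists into {patient_id: {kind: [records]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for records in record_lists:
        for rec in records:
            grouped[rec.patient_id][rec.kind].append(rec)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {pid: dict(kinds) for pid, kinds in grouped.items()}


diseases = [Record("P001", "disease"), Record("P002", "disease")]
biosamples = [Record("P001", "biosample"), Record("P001", "biosample")]
by_patient = group_records_by_patient(diseases, biosamples)
print(sorted(by_patient))
```

Each patient entry can then be turned into one phenopacket, which matches the one-file-per-patient output described above.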

--------------------------------------------------------------------------------
Validation & Testing
--------------------------------------------------------------------------------
- Integration tests confirmed that diseases, measurements, and biosamples appear
  in output phenopacket JSON.
- Unit tests for row-level mapping of genotype, phenotype, disease, measurement,
  and biosample tables.
- CLI tested with both table and JSON audit outputs.
- Network calls mocked in `test_download_mock.py` for HPO fetch.
- Mapper tested in both strict and non-strict HGVS consistency modes.
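
For readers unfamiliar with the mocking approach, here is a generic sketch using only the standard library. It does not reproduce `test_download_mock.py`: the fetch function, the URL, and the use of `urllib` are assumptions, since the HTTP client used by the project is not named here.

```python
import io
import urllib.request
from unittest import mock


def fetch_hpo_text(url: str) -> str:
    """Hypothetical downloader; the project's real function may differ."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def run_mocked_fetch() -> str:
    """Call the downloader with urlopen patched, so no network is used."""
    fake_response = io.BytesIO(b"format-version: 1.2\n")
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake:
        text = fetch_hpo_text("https://example.org/hp.obo")  # placeholder URL
        fake.assert_called_once_with("https://example.org/hp.obo")
    return text


print(run_mocked_fetch())
```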

--------------------------------------------------------------------------------
Milestones
--------------------------------------------------------------------------------
- [x] Add disease/measurement/biosample dataclasses.
- [x] Extend loader and RENAME_MAP for new fields.
- [x] Update DefaultMapper to map new sheet types.
- [x] Update CLI parse-excel to emit expanded phenopackets.
- [x] Add full integration tests.
- [x] Refactor DefaultMapper into modular components.

--------------------------------------------------------------------------------
Outcome
--------------------------------------------------------------------------------
- Phenopackets now support **diseases, measurements, and biosamples** alongside
  genotypes and phenotypes.
- CLI users can parse richer Excel inputs and produce GA4GH-compliant JSON.
- Expanded tests and refactoring increased maintainability of the mapping layer.  

