P6

Peter's Parse and Processing of Prenatal Particulars via Pandas

A simple, extensible CLI for downloading the Human Phenotype Ontology (HPO), parsing genotype/phenotype Excel workbooks, and producing Phenopackets that follow the GA4GH Phenopacket Schema.

Features
Installation
Quickstart
CLI Reference
Development & Testing
Contributing
License
Contact

Features

Download: fetch the latest or a specific hp.json release from GitHub
Parse: autodetect genotype vs phenotype sheets in any Excel workbook
Normalize: clean up column names, HPO IDs, timestamps, and data types
Generate: emit one Phenopacket file per record (the output file extension is provisional and may change in a later release)
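
To make the Normalize step more concrete, here is a minimal sketch of what cleaning column names and HPO IDs can look like. It is an illustration only: the helper names normalize_column and normalize_hpo_id are hypothetical and are not taken from the P6 source, and the rules P6 actually applies may differ.

```python
import re


def normalize_column(name: str) -> str:
    """Lower-case a header and collapse runs of non-alphanumerics into '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def normalize_hpo_id(raw) -> str:
    """Return an HPO ID in the canonical 'HP:' + 7 digits form.

    Accepts loose spellings such as 'hp:1250', 'HP_0001250' or '1250'
    (assumed input variants, chosen for illustration).
    """
    match = re.fullmatch(
        r"\s*(?:HP[:_]?)?\s*(\d{1,7})\s*", str(raw), flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"not an HPO ID: {raw!r}")
    # Zero-pad the numeric part to the seven digits HPO IDs use.
    return f"HP:{int(match.group(1)):07d}"
```

Raising on unrecognized input, rather than passing it through, keeps malformed IDs from silently ending up in generated files.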

Installation

Clone the repo:

git clone https://github.com/VarenyaJ/P6.git
cd P6

(Recommended) Create a virtual environment (venv or Conda):

=== Simple Venv setup ===

python3 -m venv .venv
source .venv/bin/activate

=== or with Conda ===

conda env create -f requirements/environment.yml -y
conda activate P6

Install via pip:

python3 -m pip install -r requirements/requirements.txt .

Verify the installation:

p6 --help

You should see something like:

Usage: p6 [OPTIONS] COMMAND [ARGS]...

  P6: Peter's Parse and Processing of Prenatal Particulars via Pandas.

Options:
  --help  Show this message and exit.

Commands:
  download    Download a specific or the latest HPO JSON release into...
  parse-excel Read each sheet, check column order, then: - Identify as a...

Quickstart

Download HPO JSON

Fetch the latest release into tests/data/ (the default directory):

p6 download

After running, you’ll have tests/data/hp.json.
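
If you want to sanity-check the downloaded file from Python, a short sketch like the following may help. It assumes hp.json uses the obographs JSON layout (a top-level "graphs" list whose "nodes" carry an "id" URL and an "lbl" label); the function name hpo_labels is hypothetical and is not part of P6.

```python
import json
from pathlib import Path


def hpo_labels(hp_json_path):
    """Map 'HP:0000118'-style IDs to labels from an obographs JSON file."""
    with Path(hp_json_path).open(encoding="utf-8") as handle:
        document = json.load(handle)

    labels = {}
    for graph in document.get("graphs", []):
        for node in graph.get("nodes", []):
            # Assumption: node IDs are URLs ending in e.g. 'HP_0000118'.
            short_id = node.get("id", "").rsplit("/", 1)[-1].replace("_", ":", 1)
            # Skip non-HPO nodes and nodes without a label.
            if short_id.startswith("HP:") and "lbl" in node:
                labels[short_id] = node["lbl"]
    return labels
```

Calling hpo_labels("tests/data/hp.json") after p6 download should return a non-empty dictionary if the file is intact and the layout assumption holds.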

Parse Excel to Phenopackets

With your HPO JSON in place at tests/data/hp.json, run:

p6 parse-excel -e tests/data/Sydney_Python_transformation.xlsx

Resulting phenopacket files will be under:

phenopacket_from_excel/$(date "+%Y-%m-%d_%H-%M-%S")/phenopackets/
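
Because each run writes into a new timestamped folder, it can be handy to locate the most recent one programmatically. The sketch below assumes the directory layout shown above (phenopacket_from_excel/<timestamp>/phenopackets/) and the same timestamp pattern; latest_phenopacket_dir is a hypothetical helper, not a P6 function.

```python
from datetime import datetime
from pathlib import Path

# Mirrors the date pattern used in the output path above.
RUN_FORMAT = "%Y-%m-%d_%H-%M-%S"


def latest_phenopacket_dir(base="phenopacket_from_excel"):
    """Return the 'phenopackets' folder of the newest run, or None."""
    base_path = Path(base)
    if not base_path.is_dir():
        return None

    runs = []
    for child in base_path.iterdir():
        if not child.is_dir():
            continue
        try:
            stamp = datetime.strptime(child.name, RUN_FORMAT)
        except ValueError:
            continue  # ignore anything that is not a timestamped run folder
        runs.append((stamp, child))

    if not runs:
        return None
    newest = max(runs, key=lambda item: item[0])[1]
    return newest / "phenopackets"
```

Parsing the folder name instead of relying on file modification times keeps the result stable even if an older run folder is touched later.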

Audit Excel Workbooks

Quickly check each sheet in an Excel file for header normalization, sheet classification, and presence of required variant columns.

p6 audit-excel -e tests/data/Sydney_Python_transformation.xlsx

By default the report is printed as a table; pass -r to print it to the console as JSON instead.

p6 audit-excel -e tests/data/Sydney_Python_transformation.xlsx -r
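
The JSON mode is convenient for scripting. The following sketch shows one way to call the audit from Python and load the result. It assumes that p6 is on your PATH and that, with -r, standard output contains only the JSON report; the structure of that report is not described here, so the code makes no assumptions about its keys. Both function names are hypothetical.

```python
import json
import subprocess


def audit_command(excel_path, as_json=True):
    """Build the argument list for 'p6 audit-excel'."""
    command = ["p6", "audit-excel", "-e", str(excel_path)]
    if as_json:
        command.append("-r")  # request JSON instead of the default table
    return command


def run_audit(excel_path):
    """Run the audit and return the parsed JSON report.

    Assumption: stdout holds nothing but the JSON document.
    """
    completed = subprocess.run(
        audit_command(excel_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with workbook paths that contain spaces.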

CLI Reference

p6 download

Usage:

p6 download [OPTIONS]

Options:

    -d, --data-path PATH        where to save HPO JSON (default: tests/data)
    -v, --hpo-version TEXT      exact HPO release tag (e.g. 2025-03-03 or v2025-03-03)
    --help                      Show this help message and exit.

Examples:

Fetch a specific release tag (e.g. v2025-03-03 or 2025-03-03) into tests/data/ (the default directory):

p6 download -v 2025-03-03
p6 download --hpo-version 2025-03-03

Fetch a specific release tag (e.g. v2025-03-03 or 2025-03-03) into a custom directory:

p6 download -d src/P6 -v 2025-03-03
p6 download --data-path src/P6 --hpo-version 2025-03-03
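
Since the version option accepts the tag with or without a leading v, a caller scripting around p6 download may want to validate the value first. This is a minimal sketch of such a check, assuming the canonical form is the v-prefixed date; normalize_hpo_tag is a hypothetical helper and does not describe how P6 itself handles the option.

```python
import re
from datetime import datetime


def normalize_hpo_tag(version: str) -> str:
    """Return a release tag in 'vYYYY-MM-DD' form or raise ValueError."""
    match = re.fullmatch(r"v?(\d{4}-\d{2}-\d{2})", version.strip())
    if match is None:
        raise ValueError(f"unrecognized HPO release tag: {version!r}")
    # strptime raises ValueError for impossible dates such as 2025-13-40.
    datetime.strptime(match.group(1), "%Y-%m-%d")
    return "v" + match.group(1)
```

With this helper, both 2025-03-03 and v2025-03-03 map to the same string, so downstream code only has to handle one spelling.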

p6 parse-excel

Read an Excel workbook, classify its sheets, normalize fields, and emit Phenopacket protobuf messages.

Usage: p6 parse-excel [OPTIONS]

Options:

    -e, --excel-path FILE       path to the Excel workbook  [required]
    -hpo, --custom-hpo FILE     path to a custom HPO JSON file (defaults to `tests/data/hp.json`)
    --help                      Show this message and exit.

Example:

Explicitly point at a custom HPO file:

p6 parse-excel -e tests/data/Sydney_Python_transformation.xlsx -hpo src/P6/hp.json

p6 audit-excel

Run a lightweight audit on each sheet in an Excel workbook, reporting header counts, sheet classification, and any missing required variant columns.

Usage: p6 audit-excel [OPTIONS]

Options:

    -e, --excel-path FILE   path to the Excel workbook  [required]
    -r, --report-json       output audit report as JSON instead of table
    --help                  Show this message and exit.

Development & Testing

Install dev requirements:

python3 -m pip install -r requirements/requirements.txt -r requirements/requirements_test.txt .

This installs P6 along with the dependencies needed for development and testing.

Run the full test suite:

pytest -q

Lint and format (via ruff):

ruff check .
ruff format .

Contributing

Fork the repo & create a feature branch
Make your changes & add tests
Ensure all tests pass & lint is clean
Submit a pull request against main
Please keep contributions compatible with the project's AGPL-3.0 license.

License

This project is licensed under the AGPL-3.0. See LICENSE for details.

Contact

Varenya Jain
Email: varenyajj@gmail.com
GitHub: @VarenyaJ
