
Commit de056c2

Merge pull request #16 from VarenyaJ/develop
Pull latest features
2 parents 409bce2 + a084a9a commit de056c2

24 files changed

Lines changed: 4447 additions & 151 deletions

README.md

Lines changed: 37 additions & 8 deletions
@@ -1,7 +1,7 @@
 # P6
 **Peter's Parse and Processing of Prenatal Particulars via Pandas**
 
-A simple, extensible CLI for downloading the Human Phenotype Ontology, parsing genotype/phenotype Excel workbooks, and producing [GA4GH Phenopackets](https://phenopacket-schema.readthedocs.io/en/latest/schema.html#version-2-0) as specified [here](https://phenopacket-schema.readthedocs.io/_/downloads/en/stable/pdf/).
+A simple, extensible CLI for downloading the Human Phenotype Ontology, parsing genotype/phenotype Excel workbooks, and producing [GA4GH Phenopackets](https://phenopacket-schema.readthedocs.io/en/latest/schema.html#version-2-0) as specified [here](https://phenopacket-schema.readthedocs.io/_/downloads/en/stable/pdf/). This project enables downloading the latest or specified Human Phenotype Ontology (HPO) JSON release, auto-classifying Excel sheets as genotype or phenotype data, normalizing column names and HPO IDs, and writing one Phenopacket per record. Additional commands provide quick auditing of workbooks for header normalization, sheet classification, and required variant columns. Built for easy integration and reproducibility, P6 supports rapid phenotypic data preparation for research and clinical workflows, and runs locally with simple installation via pip. The end usage of this project is to convert an existing digital record of phenotypic data into phenopackets, such that they may be linked to their corresponding VCFs and used to integrate with a larger federated repository system.
 
 ## Table of Contents
 
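The new README paragraph says P6 normalizes HPO IDs but does not say which spellings it accepts. Below is a minimal sketch of that normalization step using a hypothetical helper, `normalize_hpo_id`, which is not P6's actual code. It assumes spreadsheet cells may contain `HP:0001250`, `HP_0001250`, or a bare number, and it maps each of these to the seven-digit `HP:NNNNNNN` form.

```python
import re
from typing import Optional

# Hypothetical helper, not taken from P6. Assumed input spellings:
# "HP:0001250", "HP_0001250", "hp:1250", or a bare "0001250".
_HPO_ID = re.compile(r"^\s*(?:HP[:_])?\s*(\d{1,7})\s*$", re.IGNORECASE)


def normalize_hpo_id(raw: object) -> Optional[str]:
    """Return the canonical 'HP:NNNNNNN' form, or None if raw is not an HPO ID."""
    match = _HPO_ID.match(str(raw))
    if match is None:
        return None
    # HPO identifiers carry seven zero-padded digits after the prefix.
    return f"HP:{int(match.group(1)):07d}"


print(normalize_hpo_id("HP_0001250"))
print(normalize_hpo_id("Seizure"))
```

Returning `None` rather than raising lets a caller collect every unparseable cell in a sheet and report them together.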
@@ -10,10 +10,12 @@ A simple, extensible CLI for downloading the Human Phenotype Ontology, parsing g
 3. [Installation](#installation)
 4. [Quickstart](#quickstart)
 - [Download HPO JSON](#download-hpo-json)
-- [Parse Excel to Phenopackets](#parse-excel-to-phenopackets)
+- [Parse Excel to Phenopackets](#parse-excel-to-phenopackets)
+- [Audit Excel Workbooks](#audit-excel-workbooks)
 5. [CLI Reference](#cli-reference)
 - [`p6 download`](#p6-download)
 - [`p6 parse-excel`](#p6-parse-excel)
+- [`p6 audit-excel`](#p6-audit-excel)
 6. [Development & Testing](#development--testing)
 7. [Contributing](#contributing)
 8. [License](#license)
@@ -94,18 +96,32 @@ Resulting phenopacket files will be under:
 phenopacket_from_excel/$(date "+%Y-%m-%d_%H-%M-%S")/phenopackets/
 ```
 
+### Audit Excel Workbooks
+
+Quickly check each sheet in an Excel file for header normalization, sheet classification, and presence of required variant columns.
+```bash
+p6 audit-excel -e tests/data/Sydney_Python_transformation.xlsx
+```
+
+By default you get a table; use `-r` for a JSON output to the console.
+```bash
+p6 audit-excel -e tests/data/Sydney_Python_transformation.xlsx -r
+```
+
 ## CLI Reference
 
 ### p6 download
 
 Usage:
 ```markdown
 p6 download [OPTIONS]
+```
 
 Options:
--d, --data-path PATH where to save HPO JSON (default: tests/data)
--v, --hpo-version TEXT exact HPO release tag (e.g. 2025-03-03 or v2025-03-03)
---help Show this help message and exit.
+```markdown
+-d, --data-path PATH where to save HPO JSON (default: tests/data)
+-v, --hpo-version TEXT exact HPO release tag (e.g. 2025-03-03 or v2025-03-03)
+--help Show this help message and exit.
 ```
 
 Examples:
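According to its help text, `p6 download -v` accepts a release tag written as either `2025-03-03` or `v2025-03-03`. One way to support both spellings is to validate the date and emit a single canonical form. The hypothetical `normalize_hpo_tag` below is a sketch of that approach, not P6's implementation. It assumes the `v`-prefixed spelling is the canonical form, matching the second example in the help text.

```python
import re
from datetime import date

# Hypothetical helper, not P6's implementation: accepts "2025-03-03" or
# "v2025-03-03" and returns the "v"-prefixed spelling (assumed canonical).
_TAG = re.compile(r"^v?(\d{4})-(\d{2})-(\d{2})$")


def normalize_hpo_tag(version: str) -> str:
    match = _TAG.match(version.strip())
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD release tag: {version!r}")
    year, month, day = (int(part) for part in match.groups())
    date(year, month, day)  # raises ValueError for impossible dates such as 2025-02-30
    return "v" + "-".join(match.groups())


print(normalize_hpo_tag("2025-03-03"))
print(normalize_hpo_tag("v2025-03-03"))
```

Checking the date with `datetime.date` catches typos such as `2025-02-30` before any network request is made.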
@@ -130,9 +146,9 @@ Usage: `p6 parse-excel [OPTIONS] EXCEL_FILE`
 
 Options:
 ```markdown
--e, --excel-path FILE path to the Excel workbook [required]
--hpo, --custom-hpo FILE path to a custom HPO JSON file (defaults to `tests/data/hp.json`)
---help Show this message and exit.
+-e, --excel-path FILE path to the Excel workbook [required]
+-hpo, --custom-hpo FILE path to a custom HPO JSON file (defaults to `tests/data/hp.json`)
+--help Show this message and exit.
 ```
 
 Example:
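`p6 parse-excel` reads the ontology from `tests/data/hp.json` unless `-hpo` points to a different file. HPO's `hp.json` uses the OBO Graphs JSON layout, in which each term is a node with an IRI `id` and a `lbl` label. Below is a minimal sketch that turns such a file into a lookup from `HP:` ID to label, which could be used to check normalized IDs before writing phenopackets. Both function names are hypothetical, and this is not necessarily how P6 loads the file.

```python
import json
from pathlib import Path
from typing import Dict

# IRI prefix used for HPO terms in OBO Graphs JSON files such as hp.json.
HPO_IRI_PREFIX = "http://purl.obolibrary.org/obo/HP_"


def hpo_labels_from_graph(data: dict) -> Dict[str, str]:
    """Map 'HP:NNNNNNN' to its label; non-HPO and unlabeled nodes are skipped."""
    labels: Dict[str, str] = {}
    for graph in data.get("graphs", []):
        for node in graph.get("nodes", []):
            iri, label = node.get("id", ""), node.get("lbl")
            if iri.startswith(HPO_IRI_PREFIX) and label:
                labels["HP:" + iri[len(HPO_IRI_PREFIX):]] = label
    return labels


def load_hpo_labels(path: Path = Path("tests/data/hp.json")) -> Dict[str, str]:
    # Default path mirrors the README's stated default for --custom-hpo.
    with path.open(encoding="utf-8") as handle:
        return hpo_labels_from_graph(json.load(handle))
```

Load the dictionary once per run and reuse it, rather than re-reading the ontology file for every record.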
@@ -142,6 +158,19 @@ Explicitly point at a custom HPO file:
 p6 parse-excel -e tests/data/Sydney_Python_transformation.xlsx -hpo src/P6/hp.json
 ```
 
+### p6 audit-excel
+
+Run a lightweight audit on each sheet in an Excel workbook, reporting header counts, sheet classification, and missing variant‐column checks.
+
+Usage: `p6 audit-excel [OPTIONS] EXCEL_FILE`
+
+Options:
+```markdown
+-e, --excel-path FILE path to the Excel workbook [required]
+-r, --report-json output audit report as JSON instead of table
+--help Show this message and exit.
+```
+
 ## Development & Testing
 
 Install dev requirements:
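The `audit-excel` entry describes three per-sheet checks: a header count, a genotype/phenotype classification, and a missing-variant-column check. The `-r` flag switches the output from a table to JSON. Below is a Python sketch of one way these pieces could fit together. Every column name in it is a hypothetical placeholder, because the README does not list the columns P6 uses for classification or requires for variants. The classification rule (any genotype marker wins) is also an assumption.

```python
import json
import re
from typing import Dict, List

# Hypothetical column names: placeholders only, not P6's real configuration.
GENOTYPE_MARKERS = {"chromosome", "position", "hgvs_c"}
PHENOTYPE_MARKERS = {"hpo_id"}
REQUIRED_VARIANT_COLUMNS = {"chromosome", "position", "reference", "alternate"}


def normalize_header(name: object) -> str:
    """Lower-case, trim, and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", str(name).strip().lower()).strip("_")


def audit_sheet(headers: List[object]) -> dict:
    cols = {normalize_header(h) for h in headers}
    if cols & GENOTYPE_MARKERS:  # assumption: any genotype marker wins
        kind = "genotype"
    elif cols & PHENOTYPE_MARKERS:
        kind = "phenotype"
    else:
        kind = "unknown"
    # Only genotype sheets are checked for the variant columns.
    missing = sorted(REQUIRED_VARIANT_COLUMNS - cols) if kind == "genotype" else []
    return {"headers": len(headers), "classification": kind,
            "missing_variant_columns": missing}


def audit_workbook(sheets: Dict[str, List[object]]) -> Dict[str, dict]:
    return {name: audit_sheet(headers) for name, headers in sheets.items()}


def render(report: Dict[str, dict], as_json: bool = False) -> str:
    """Plain-text table by default, JSON when as_json is True (like -r)."""
    if as_json:
        return json.dumps(report, indent=2)
    rows = [f"{'sheet':<20} {'headers':>7}  {'class':<10} missing"]
    for name, r in report.items():
        missing = ", ".join(r["missing_variant_columns"]) or "-"
        rows.append(f"{name:<20} {r['headers']:>7}  {r['classification']:<10} {missing}")
    return "\n".join(rows)


report = audit_workbook({
    "Variants": ["Chromosome", "Position", "Reference"],
    "Phenotypes": ["Patient ID", "HPO ID"],
})
print(render(report))
```

To run this sketch on a real workbook, note that `pandas.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name. Building `{name: list(df.columns) for name, df in ....items()}` from that dict produces the `sheets` argument. The audit logic itself stays independent of pandas, which makes it easy to test.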
