A comprehensive evaluation of continuous pathogenicity scores versus established binary classifiers using experimental (DMS) and clinical (ClinVar/Humsavar) datasets.
├── README.md
├── environment.yml
├── data/
│ └── ClinHum_ALL_preds_cleaned.csv
├── notebooks/
│ ├── quantitative_preds_remaining.ipynb
│ ├── preanalysis.ipynb
│ ├── block1_part1_analysis.ipynb
│ ├── block1_part2_analysis.ipynb
│ ├── block2_analysis.ipynb
│ └── block3_analysis.ipynb
├── results/
│ ├── figures/
│ └── tables/
└── SupplementaryMaterials.pdf

This project benchmarks five continuous predictors (CADD, MetaRNN, Envision, QAFI, EVE) against four binary classifiers (AlphaMissense, BayesDel, REVEL, VEST4). Analyses include:
- Correlation with DMS functional scores
- ROC-based clinical discrimination on ∼47,000 ClinVar/Humsavar variants
- Platt-scaling calibration and threshold optimization
- Mapping to ACMG/AMP evidence categories (workflow of Pejaver et al., 2022)
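
As a rough illustration of the ROC and Platt-scaling steps listed above, the sketch below computes a rank-based AUC and fits a logistic calibration map by plain gradient descent, then scores the result with a Brier score. It is not taken from the project notebooks: the function names, the toy scores and labels, and the learning-rate settings are assumptions chosen for illustration, and the sketch omits the label smoothing used in Platt's original formulation. The notebooks may well rely on a library implementation instead.

```python
import math


def roc_auc(scores, labels):
    """Fraction of (pathogenic, benign) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.5, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log-loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def brier(probs, labels):
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical toy data: raw predictor scores and labels (1 = pathogenic, 0 = benign).
toy_scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.65, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]

auc_raw = roc_auc(toy_scores, toy_labels)  # 13 of 16 pairs ranked correctly -> 0.8125
a, b = fit_platt(toy_scores, toy_labels)
calibrated = [sigmoid(a * s + b) for s in toy_scores]
```

Because the fitted map is monotonic when `a` is positive, calibration changes the probabilities (and hence the Brier score) but leaves the ranking, and therefore the AUC, unchanged.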
- ClinVar and Humsavar cleaning
  Preprocessing notebooks and raw data are maintained in separate repositories. Each contains detailed Zenodo links for the source datasets.
- Quantitative predictor merge
  Precomputed scores for tools not available via VEP are processed in data/quantitative_preds_remaining/.
- Merged clinical dataset
  data/ClinHum_ALL_preds_cleaned.csv contains all filtered variants with raw and calibrated scores.
- DMS data
  Residue-level functional scores were extracted from Beltran et al. (2025). See their publication for raw data access; processed inputs are not stored here.
    conda env create -f environment.yml && conda activate continuous-pathogenicity

Execute the notebooks in order:

1. block1_part1_analysis.ipynb
2. block1_part2_analysis.ipynb
3. block2_analysis.ipynb
4. block3_analysis.ipynb
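
For unattended runs, the notebook order above could be scripted. The following is a minimal sketch that assumes `jupyter nbconvert` is installed in the active environment (the contents of environment.yml are not shown here, so this is not guaranteed) and that the script is started from the repository root. The helper names `build_command` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Analysis notebooks in the order given above.
NOTEBOOKS = [
    "block1_part1_analysis.ipynb",
    "block1_part2_analysis.ipynb",
    "block2_analysis.ipynb",
    "block3_analysis.ipynb",
]


def build_command(notebook, notebook_dir="notebooks"):
    """Return the nbconvert command that executes one notebook in place."""
    path = Path(notebook_dir) / notebook
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", str(path)]


def run_all(notebook_dir="notebooks"):
    """Execute every notebook in order, stopping at the first failure."""
    for notebook in NOTEBOOKS:
        subprocess.run(build_command(notebook, notebook_dir), check=True)
```

Calling `run_all()` would execute the four notebooks sequentially. Whether the relative data paths inside the notebooks resolve correctly under nbconvert depends on how they are written, so treat this as a starting point rather than a tested pipeline.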
Each notebook reads from data/ClinHum_ALL_preds_cleaned.csv and outputs figures into results/figures/ and tables into results/tables/.
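
The input/output layout described above can be sketched with the standard library alone. The column name `label` and the output file name used in the usage note are hypothetical, since the CSV schema is not documented here; the notebooks themselves may use a different reader and different summaries.

```python
import csv
from collections import Counter
from pathlib import Path

DATA_CSV = Path("data/ClinHum_ALL_preds_cleaned.csv")
TABLES_DIR = Path("results/tables")


def count_by_column(csv_path, column):
    """Count rows per distinct value of one column in the merged dataset."""
    with open(csv_path, newline="") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))


def write_counts(counts, out_path, column):
    """Write the counts as a two-column CSV, creating the output folder if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column, "n_variants"])
        for value, n in sorted(counts.items()):
            writer.writerow([value, n])
```

For example, `write_counts(count_by_column(DATA_CSV, "label"), TABLES_DIR / "label_counts.csv", "label")` would tabulate variants per class, assuming a `label` column exists.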
Final figures and tables for the manuscript are stored under results/figures/ and results/tables/. Supplementary materials (additional tables, KDE plots, Brier scores, etc.) are in the same folders, labeled with “S” prefixes.