- Outline
- Repo Contents
- System Requirements
- Setup
- Installation Instructions
- Running PRScope
- PRScope tested with
- Pipeline Description
- Demo
PRScope automatically generates all the polygenic scores (PGS) associated with selected ontology IDs (e.g., Experimental Factor Ontology, EFO) for a given genotype dataset.
- Ontology IDs defining the PGS of interest
- Genotype data to calculate the PGS
- A dataset (multi-PGS matrix) containing:
- Subject IDs from the genotype data
- Values of all selected PGS
The setup is optimized for a minimal-effort "vanilla use case" but supports advanced configurations.
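The ontology IDs that select the PGS are plain text identifiers such as EFO_0003898. The following is a minimal sketch for checking a list of IDs before a run. It assumes one ID per line and a generic PREFIX_digits shape; neither assumption is confirmed by PRScope, and the helper name `split_ontology_ids` is hypothetical.

```python
import re

# Assumed shape: an ontology prefix, an underscore, then digits (e.g. EFO_0003898).
# This pattern is an illustration, not PRScope's own validation rule.
ONTOLOGY_ID = re.compile(r"^[A-Za-z]+_\d+$")


def split_ontology_ids(lines):
    """Return (valid, invalid) IDs from an iterable of text lines.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    valid, invalid = [], []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue
        if ONTOLOGY_ID.match(candidate):
            valid.append(candidate)
        else:
            invalid.append(candidate)
    return valid, invalid


if __name__ == "__main__":
    valid, invalid = split_ontology_ids(["EFO_0003898", "", "efo 3898"])
    print("valid:", valid)
    print("invalid:", invalid)
```

In practice the lines could come from an open file handle, since the function only needs an iterable of strings.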
- config: config files for parameter specification.
- input: genotype files, reference files and PRS trait specification.
- main: contains pipeline and scripts for PRS calculation, biotype identification and tools.
- output: intermediate and final results are saved here.
PRScope requires only a standard computer with enough RAM (4GB) to support the in-memory operations, but for best performance, we suggest a computer with higher specifications:
RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core
The runtimes below were measured on a computer with the recommended specs (16 GB RAM, 4 cores at 3.3 GHz each) and a 100 Mbps internet connection.
PRScope requires the following:
Only conda must be installed manually. All other dependencies are managed via the Conda environment.
PRScope has been tested on Ubuntu 22.04.5 and requires a Linux system.
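As a rough sketch, the requirements above (Linux, at least 4 GB RAM, ideally 16+ GB RAM and 4+ cores) can be checked before installation. The function names and the category labels are hypothetical, and clock speed is not checked.

```python
import os
import platform

MIN_RAM_GB = 4    # minimum RAM stated in the requirements
REC_RAM_GB = 16   # recommended RAM
REC_CORES = 4     # recommended number of CPU cores


def classify_specs(ram_gb, cores, system):
    """Classify a machine as 'unsupported', 'below-minimum', 'minimum' or 'recommended'."""
    if system != "Linux":
        return "unsupported"
    if ram_gb < MIN_RAM_GB:
        return "below-minimum"
    if ram_gb >= REC_RAM_GB and cores >= REC_CORES:
        return "recommended"
    return "minimum"


def detect_ram_gb():
    """Total physical RAM in GiB, read through POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3


if __name__ == "__main__":
    system = platform.system()
    ram = detect_ram_gb() if system == "Linux" else 0.0
    print(classify_specs(ram, os.cpu_count() or 1, system))
```

The operating system usually reports slightly less memory than the nominal figure (a "16 GB" machine may show about 15.5 GiB), so the thresholds may need some tolerance in practice.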
git clone https://github.com/transbioZI/PRScope
tar -xf reference.tar.gz
Place them into the folder input/reference/. The folder should now include:
ldpred2_ref/
eur_hg38.phase3.bed
eur_hg38.phase3.bim
eur_hg38.phase3.fam
eur_hg38.phase3.frq
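A small sketch for confirming that the reference folder contains the entries listed above before starting the pipeline. The function name is hypothetical, and the check only tests for existence, not file integrity.

```python
from pathlib import Path

# Entries named in the list above.
EXPECTED_REFERENCE = [
    "ldpred2_ref",
    "eur_hg38.phase3.bed",
    "eur_hg38.phase3.bim",
    "eur_hg38.phase3.fam",
    "eur_hg38.phase3.frq",
]


def missing_reference_entries(reference_dir):
    """Return the expected entries that do not exist in reference_dir."""
    base = Path(reference_dir)
    return [name for name in EXPECTED_REFERENCE if not (base / name).exists()]


if __name__ == "__main__":
    # Path relative to the repository root, as given in the instructions.
    missing = missing_reference_entries("input/reference")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all reference entries present")
```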
conda create -c conda-forge -c bioconda -n snakemake snakemake python=3.12.1
- Environment name: snakemake
- Wait for installation to complete (~15 minutes)
conda activate snakemake
cd PRScope
./run.sh
This command initiates the PRScope pipeline.
- Wait for the conda environment installation to complete
- May take up to an hour
conda version : 23.1.0
snakemake version : 8.4.8
python : 3.12.1
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8, LC_IDENTIFICATION=C
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
parallel, stats, graphics, grDevices, utils, datasets, methods, base
other attached packages:
reshape2_1.4.4, cluster_2.1.7, xgboost_1.7.8.1, bigutilsr_0.3.4, reshape_0.8.9, ggpubr_0.6.0, doParallel_1.0.17, iterators_1.0.14, foreach_1.5.2, glmnet_4.1-8, Matrix_1.7-1, lubridate_1.9.4, forcats_1.0.0, purrr_1.0.2, readr_2.1.5, tidyr_1.3.1, tibble_3.2.1, tidyverse_2.0.0, data.table_1.16.4, dplyr_1.1.4, gwasrapidd_0.99.17, caret_7.0-1, lattice_0.22-6, ranger_0.17.0, stringr_1.5.1, ggplot2_3.5.1, fmsb_0.7.6, optparse_1.7.5, tidyselect_1.2.1, timeDate_4041.110, bigassertr_0.1.6, pROC_1.18.5, digest_0.6.37, rpart_4.1.23, timechange_0.3.0, lifecycle_1.0.4, survival_3.7-0, magrittr_2.0.3, compiler_4.4.1, rlang_1.1.4, tools_4.4.1, utf8_1.2.4, ggsignif_0.6.4, plyr_1.8.9, abind_1.4-8, withr_3.0.2, nnet_7.3-19, grid_4.4.1, stats4_4.4.1, fansi_1.0.6, colorspace_2.1-1, future_1.34.0, globals_0.16.3, scales_1.3.0, MASS_7.3-61, cli_3.6.3, generics_0.1.3, RSpectra_0.16-2, rstudioapi_0.17.1, future.apply_1.11.3, tzdb_0.4.0, getopt_1.20.4, splines_4.4.1, vctrs_0.6.5, hardhat_1.4.0, jsonlite_1.8.9, carData_3.0-5, car_3.1-3, hms_1.1.3, rstatix_0.7.2, Formula_1.2-5, listenv_0.9.1, gower_1.0.1, recipes_1.1.0, glue_1.8.0, parallelly_1.40.1, codetools_0.2-20, stringi_1.8.4, gtable_0.3.6, shape_1.4.6.1, munsell_0.5.1, pillar_1.9.0, ipred_0.9-15, lava_1.8.0, R6_2.5.1, backports_1.5.0, broom_1.0.7, class_7.3-22, Rcpp_1.0.13-1, nlme_3.1-166, prodlim_2024.06.25, ModelMetrics_1.2.2.2, pkgconfig_2.0.3
The following pipelines can be found in the main/snakefiles/ directory:
- find_sumstats.snakefile: Selection of summary statistics for specified EFO IDs
- qc_sumstats.snakefile: Quality control of the selected summary statistics
- qc_genotype_with_liftover.snakefile: Quality control of genotype data with liftover
- qc_genotype.snakefile: Quality control of genotype data
- prs_calculation_prsice.snakefile: For PRS calculation using PRSice
- prs_calculation_ldpred.snakefile: For PRS calculation using LDpred
- ldsc_heritability_calculation.snakefile: Heritability calculation
cd PRScope
- config/: For advanced parameter customization
- input/: The only folder requiring user modifications
- main/: Contains the main pipeline
- output/: Will contain output after pipeline execution
- Navigate to input/
- Edit efo_ids.txt:
  - Default content: EFO_0003898
  - Replace with your own EFO IDs as needed
- In input/genotype/, you’ll find EUR.bed, EUR.bim and EUR.fam (replace with your genotype data if desired)
- Running PRScope
cd PRScope
./run.sh
- Expected Outcome
  - output/gwas_list/gwas_search.txt: GWAS list meeting the criteria for use in PGS calculation.
  - output/qced_gwas/GCST*: GWAS in gwas_search.txt, downloaded and preprocessed, ready for PGS calculation.
  - output/qced_genotype/corrected[hg19,hg38]: EUR.FINAL.*, the preprocessed version of the simulated genotype data in input/genotype/.
  - output/calculated_pgs_prsice/: all calculated PGS for the GWAS listed in gwas_search.txt, as the data table pgs_datatable_prsice_100.tsv. The suffix _100 indicates the minimum number of SNPs used to calculate a PGS.
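Once the run has finished, the resulting data table can be inspected programmatically. The sketch below is hedged: it assumes the file is tab-separated with a header row, that the first column holds the subject ID, and that each remaining column holds one PGS. The README does not confirm this layout, and `load_pgs_table` is a hypothetical helper name.

```python
import csv


def _to_float(value):
    """Convert a cell to float; non-numeric cells (e.g. 'NA') become None."""
    try:
        return float(value)
    except ValueError:
        return None


def load_pgs_table(path):
    """Read a tab-separated PGS table into {subject_id: {pgs_name: value}}.

    Assumed layout: header row, subject ID in the first column,
    one PGS per remaining column.
    """
    table = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        for row in reader:
            if not row:
                continue  # skip empty lines
            scores = {name: _to_float(cell) for name, cell in zip(header[1:], row[1:])}
            table[row[0]] = scores
    return table


if __name__ == "__main__":
    # Location assumed from the expected-outcome list above.
    pgs = load_pgs_table("output/calculated_pgs_prsice/pgs_datatable_prsice_100.tsv")
    print(len(pgs), "subjects loaded")
```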
[1] HannahVMeyer. Meyer-Lab-cshl/plinkQC: plinkQC 0.3.2. (Zenodo, 2020). doi:10.5281/ZENODO.3934294.
[2] Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience 8, (2019).
[3] Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
[4] https://github.com/sritchie73/liftOverPlink
If you use PRScope or the associated manuscript, please cite it according to the enclosed citation.bib.