PRS Mix and Single Scoring

A polygenic risk score (PRS) summarizes the chance of developing a certain health condition based on estimated effects of many genetic variants on an individual’s phenotype. In a standard run of the PRS pipeline, a single variant weights file is used to calculate a polygenic risk score. However, to more accurately determine an individual’s score, a mix score can be calculated using multiple variant weights files and a score weights file.

The PRS pipeline in this repo can calculate mix or single scores for multiple health conditions for many individuals. It will calculate the PRS for each provided variant weights file per health condition of interest. If given multiple variant weights files, it will compute the mix score for each health condition by finding the sum of weighted polygenic risk scores associated with the condition. Then, if desired, the PRS mix or single score can then be adjusted using a previously computed ancestry adjustment model and PCA.

Inputs for the PRS pipeline have strict stipulations in order for the pipeline to run successfully. To facilitate successful runs of the pipeline, we have created a WDL for input preparation and a WDL for training ancestry adjustment models that are also within this repo.

Folder Structure

There are three main WDLs in the scripts folder that can be used to run PRS mix and single scoring. Each are described in more detail in their corresponding section of the README. The WDLs are:

PrsInputPrep.wdl
PrsModel.wdl
PrsScoring.wdl

The res folder within this repo contains flowcharts that outline the steps taken in each of the three main PRS WDLs. The flowcharts can also be found in the corresponding WDL's section below.

Example JSON input files for each of the three main PRS WDLs can be found in the examples folder of this repo.

PrsInputPrep WDL

To run the PRS pipeline, input files must be prepared and cleaned to follow format requirements and other pipeline stipulations. The PRS Input Preparation workflow will scrub input files to only include the intersection of variants found in the variant weights files, PCA variants file, reference VCF, and query VCFs. In addition, it will ensure that naming conventions in the input files match those expected by the pipeline.

Below is a flowchart that outlines the workflow step-by-step:

Input files for the PRS Input Preparation WDL include:

Weights files -- Variant weights files for the PRS pipeline must be tab-delimited and contain three columns in the following order: variant ID, effect allele, and effect weight. The variant ID should be comprised of the chromosome, position, allele, and effect allele (ex: 1:927392:A:G). The PGS Catalog is a good resource for variant weights files, however, the files must be amended to the described format. For the pipeline to calculate a mix score, all weights files for calculating the mix score must be provided and the PGS ID associated with each weights file must be in the corresponding file name.
PCA variants file -- The PCA variants file is a file with one column containing varaint IDs in chromosome, position, ref, alt notation.
Query VCF files -- All VCFs that will be scored for a health condition of interest and therefore with the same condition's ancestry adjustment model. The VCFs can be single-sample or contain multiple samples/individuals.

PRS Prep Input Parameters

Type	Name	Required	Description	Default Value
Array[File]	weights_files	Yes	Array of 3-column TSV files describing variants and their weights; provide multiple weights files if mix scoring is desired
File	pca_variants	Yes	Text file listing the variants for principal component analysis (PCA), one variant per line
String	workspace	Yes	Name of the Terra workspace where the workflow will be run
String	source	Yes	URL to location of reference VCF shards
String	target	Yes	URL to use as temporary work space and to save the generated reference VCF
Int	nbatches	No	Number of parallel jobs for subsetting the reference shards	500
Boolean	resuming	No	Whether this run is the resumption of an earlier run	false
Boolean	norename	No	If `true`, do not run `HelperTasks.RenameChromosomes*` tasks	false
Array[File]	query_vcfs	Yes	Array of gz-compressed VCF files of the samples to be scored
String	prs_docker_image	No	Docker image equipped with PRS scripts	"us-central1-docker.pkg.dev/mgb-lmm-gcp-infrast-1651079146/mgbpmbiofx/prs:20250515"

PRS Prep Output Parameters

Type	Name	When	Description
File	regions	Always	Regions file used to generate reference VCF
File	kept_pca_variants	Always	File listing the retained and renamed PCA variants
Array[File]	renamed_weights_files	Always	Weights files with renamed variant IDs
Array[File]	renamed_query_vcfs	Always	Query VCFs with renamed variant IDs
File	reference_vcf	Always	Generated refernce VCF
File	reference_tbi	Always	Index for generated reference VCF

PrsModel WDL

An ancestry adjustment model will be used to adjust a mix or single PRS. Multiple models -- one for each health condition of interest -- can be supplied as input to the PRS workflow as a model manifest, and a PRS will be calculated for each model. The model manifests for the PRS pipeline are JSON files that contain the health condition name, paths to variant weights files, a path to a PC loadings file, paths to training variants files, a path to a PCA variants file, a path to the query VCF used to create the model, a path to a meansd file, a path to parameters for the model, and a path to the score weights file. Models and their associated model manifests can be generated using the Prs Model WDL.

Below is a flowchart that outlines the workflow step-by-step:

Inputs for the PRS Model WDL include:

Variant weights files -- The cleaned and prepped variant weights file from the PRS Input Prep WDL should be used as input for the PRS Model WDL. These weights files will be in the same format as the originals, but will have corrected naming conventions.
PCA variants file -- The output PCA variants file from the PRS Input Prep WDL should be used as input for the PRS Model WDL. This file will be in the same format as the original, but it will be filtered to the intersection of variants in the variant weights files, query files, and original PCA variants file.
Reference VCF -- We recommend using a reference such as 1KG for the reference VCF. Note that the reference VCF should not be the same as any query VCF. The reference VCF used should be the output reference VCF from the PRS Input Preparation WDL.
Query file -- The query file for making an ancestry adjustment model should be one of the output query VCFs from the PRS Input Prep WDL.
Score weights -- A score weights file must be tab-delimited and contain two columns in the following order: PGS ID and weight. The PGS IDs found in input variant weights filenames will be used to associate a weight to the variant weights file when calculating mix scores. To calculate a single PRS and not a mix score, a file with a single PGS ID and weight of 1 can be used as input.

PRS Model Input Parameters

Type	Name	Required	Description	Default Value
File	condition_name	Yes	Code for condition/disease
Array[File]	var_weights	Yes	Array of different PGS variant weight files, with 3 columns: variant ID, effect allele, and score; include multiple files for mix scoring and only one file for single scoring
File	pca_variants	Yes	Variants used in PCA projection
File	reference_vcf	Yes	Reference VCF of population for creating/training model
File	query_file	Yes	VCF or TSV to score
File	score_weights	No	Score weights for each PGS ID; required for mix models
Boolean	norename	No	If `true`, do not rename chromosomes to have chr prefix	false

PRS Model Output Parameters

Type	Name	When	Description
File	model_manifest	Always	JSON of model file paths for either single score model or mix model

PrsScoring WDL

Once the inputs have been prepared and any desired ancestry adjustment models have been created, the PRS Scoring WDL can be run. The PRS Scoring workflow will generate a PRS per variant weights file for each of the input model manifests. It will then calculate a mix or single score for each health condition associated, and adjust the resulting score using the ancestry adjustment model and PCA (if desired).

Below is a flowchart that outlines the workflow step-by-step:

PRS Scoring Input Parameters

Type	Name	Required	Description	Default Value
File	input_vcf	Yes	VCF to score
Array[File]	model_manifests	Yes	Adjustment model manifest file from MakeAdjustmentModelWorkflow WDL
Boolean	norename	No	If `true`, do not rename chromosomes to have chr prefix	false
Boolean	perform_adjustment	No	If `true`, use ancestry adjustment model to adjust scores	true
File	renaming_lookup	No	Mapping file for renaming chromosomes	"gs://lmm-reference-data/prsmix/reference/rename_chromosomes.tsv"
String	ubuntu_docker_image	No	Ubuntu Docker image	"ubuntu:latest"

PRS Scoring Output Parameters

Type	Name	When	Description
Array[File]	prs_raw_scores	Always	PRS raw scores
File	prs_mix_raw_score	If a score weights file exists in the model manifest	PRS mix raw scores
File	prs_adjusted_score	If perform_adjustment is `true`	PRS scores or mix scores adjusted with population models

References

Original PRS Github: https://github.com/broadinstitute/palantir-workflows/tree/main/PRS

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
examples		examples
res		res
scripts		scripts
.dockstore.yml		.dockstore.yml
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PRS Mix and Single Scoring

Folder Structure

PrsInputPrep WDL

PRS Prep Input Parameters

PRS Prep Output Parameters

PrsModel WDL

PRS Model Input Parameters

PRS Model Output Parameters

PrsScoring WDL

PRS Scoring Input Parameters

PRS Scoring Output Parameters

References

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

anvilproject/prs-mix

Folders and files

Latest commit

History

Repository files navigation

PRS Mix and Single Scoring

Folder Structure

PrsInputPrep WDL

PRS Prep Input Parameters

PRS Prep Output Parameters

PrsModel WDL

PRS Model Input Parameters

PRS Model Output Parameters

PrsScoring WDL

PRS Scoring Input Parameters

PRS Scoring Output Parameters

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages