harmonization_pipeline

The pipeline is designed to harmonize summary statistics based on GWASLab.

Requirements

Singularity

Getting started

git clone --recurse-submodules https://github.com/ht-diva/pqtl_pipeline.git
cd pqtl_pipeline
in config/config.yaml:
- adapt run options:
  - default option, harmonization: True, summarize: True
  - with pre- (pre_filtering_and_harmonization: True) or post- (harmonization_and_post_filtering: True) filtering option
  - with destination option (delivery: True)
- adapt the input path to the summary statistics sumstats_path
- adapt the suffix of the summary statistics filename sumstats_suffix (check filenames in sumstats_path)
- adapt the input path to the ID table used to filter your data snpid2filter (used only with pre- or post-filtering)
- adapt the ID column name of the summary statistics input_snpid_col (check files in sumstats_path; used only with pre-filtering)
- adapt the ID column name of the ID table used to filter your data filter_snpid_col (check file at snpid2filter; used only with pre- or post-filtering)
- adapt the input_format of harmonize_sumstats and snp_mapping based on your input data (listed in sumstats_path; see below for a list of possible input formats)
- adapt the output paths (the output is written to the path defined by the workspace_path; if delivery: True the output is copied to dest_path)
in the rule-based configuration files in config, adapt the filename transformation with filename_mask to extract the seqid with "." separator. Examples:
- for seq.3007.7.gwas.regenie.gz, the filename_mask is [True, True, True, False, False, False]
- for finngen_R12_AB1_ACTINOMYCOSIS.gz, the filename_mask is [True, False]
adapt the submit.sbatch
sbatch submit.sbatch

Note on job names

The job name can now be displayed as rule name in the "COMMENT" field of squeue. Use the command:

squeue --me --format="%.18i %.9P %.8j %.25k %.8u %.2t %.10M %.6D %.20R"

with output (example):

JOBID	PARTITION	NAME	COMMENT	USER	ST	TIME	NODES	NODELIST
199xxxx	cpuq	72a9f3ce-8929-...	harmonize_sumstats	username	R	mm:ss	1	cnodexx
199xxxx	cpuq	harmonization_pipeline	(null)	username	R	mm:ss	1	cnodexx

Input formats

Possible input formats for summary statistics (see formatbook.json for more options to add):

finngen
vcf
decode
gwaslab
regenie
fastgwa
ldsc
fuma
pickle
metal_het

Configuration files

This pipeline requires 6 configuration files in the folder config: the main configuration file config/config.yaml, and 5 rule-based configuration files where to specify the parameters of each step of the rule.

Examples of configuration files for BELIEVE, CHRIS, Decode, FinnGen, and INTERVAL input data are given in the folder examples.

Rules description

pre_filtering and harmonize_sumstats (pre_filtering_and_harmonization: True):
Purpose: Filters input data (column name provided input_snpid_col) by an ID (SNPID or rsID) list (provided in snpid2filter with column name filter_snpid_col) and performs GWASLab harmonization on filtered data.
Output: {seqid}.gwaslab.tsv.gz: Pre-filtered, standardized and aligned GWAS summary statistics.
harmonize_sumstats (harmonization: True):
Purpose: Performs GWASLab harmonization on input data without filtering.
Output: {seqid}.gwaslab.tsv.gz: Standardized and aligned GWAS summary statistics.
harmonize_sumstats and post_filtering: (harmonization_and_post_filtering: True):
Purpose: Performs GWASLab harmonization on input data and filters harmonized data by a SNPID list (provided in snpid2filter with column name filter_snpid_col).
Output: {seqid}.gwaslab.tsv.gz: Standardized, aligned and post-filtered GWAS summary statistics.
bgzip_tabix (included in all harmonization options):
Purpose: Creates a region-based index (CHROM and POS columns) of GWAS harmonized data for fast queries.
Output: {seqid}.gwaslab.tsv.gz.tbi: Index of GWAS harmonized data.
summarize_sumstats, create_if_table, create_min_pvalue_table and create_snp_mapping_table (summarize: True):
Purpose: Creates summary reports and plots of harmonized data.
Outputs:
{seqid}.png: Includes a Manhattan plot of -log10(p-values) by chromosome/position, and a QQ plot of observed -log10(p-values) vs. expected, with thresholds for genome-wide significance.
min_pvalue_table.tsv: Table with top association hits (SNPs with the smallest p-value in the GWAS summary statistics).
inflation_factors_table.tsv: Table with genomic inflation factors (lambda GC, Median and Maximum chi-squared statistics).
table.snp_mapping.tsv.gz: Mapping file that links input SNPID (and rsID when available) to harmonized SNPID.
sync_outputs_folder, sync_plots and sync_tables (delivery: True):
Purpose: Copies GWAS indexes, and summary reports and plots to destination folder dest_path.
Outputs: Copies of {seqid}.gwaslab.tsv.gz.tbi, {seqid}.{seqid}.png, min_pvalue_table.tsv, inflation_factors_table.tsv, and table.snp_mapping.tsv.gz.

GWASLab Harmonization

GWASLab Harmonization includes the following steps:

Check SNP identifiers (SNPID/rsID).
Fix chromosome notation (CHR), basepair positions (POS) and alleles (EA and NEA).
Sanity check on statistics.
Infer genome reference build version.
Align alleles to the reference genome to ensure that alleles match the reference strand and direction (in case, flip the alleles to match the reference).
Flip allele-specific statistics for mismatches: BETA = - BETA; Z = - Z; EAF = 1 - EAF.
Build SNPID column (CHR:POS:NEA:EA) (Optional with fixid: True and overwrite: True to specify in rule-based condiguration files, basic_check step).
Re-name and re-order columns based on GWASLab format.

DAGs

Check the dags for:

the default option
with the pre-filtering option, or
with the post-filtering option, or
with the delivery option

Name		Name	Last commit message	Last commit date
Latest commit History 246 Commits
.github/workflows		.github/workflows
config		config
examples		examples
slurm		slurm
workflow		workflow
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
dag_default.svg		dag_default.svg
dag_delivery.svg		dag_delivery.svg
dag_postfiltering.svg		dag_postfiltering.svg
dag_prefiltering.svg		dag_prefiltering.svg
environment.yml		environment.yml
environment_dev.yml		environment_dev.yml
submit.sbatch		submit.sbatch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

harmonization_pipeline

Requirements

Getting started

Note on job names

Input formats

Configuration files

Rules description

GWASLab Harmonization

DAGs

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors 3

Uh oh!

Languages

ht-diva/harmonization_pipeline

Folders and files

Latest commit

History

Repository files navigation

harmonization_pipeline

Requirements

Getting started

Note on job names

Input formats

Configuration files

Rules description

GWASLab Harmonization

DAGs

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors 3

Uh oh!

Languages

Packages