shivabioinformatics/scalable-prs-pipeline

Scalable Cloud Architecture for Polygenic Risk Scoring (PRS)

A complete, end-to-end Nextflow DSL2 pipeline that takes a patient VCF, performs quality control, normalizes to a common genome build, imputes missing genotypes, computes Polygenic Risk Scores using GWAS summary statistics, stores results in a database, and generates publication-quality visualizations.


Problem Motivation

Why Polygenic Risk Scores matter:

Most common diseases (coronary artery disease, type 2 diabetes, breast cancer, Alzheimer's) aren't caused by a single gene. They're driven by the combined effects of thousands of common genetic variants, each contributing a tiny amount of risk. A Polygenic Risk Score (PRS) aggregates all of these small effects into a single number that estimates how much genetic risk a person carries.

The clinical impact is real:

  • A person in the top 8% of PRS for coronary artery disease has a risk comparable to someone with a monogenic mutation in LDLR (familial hypercholesterolemia): roughly 3x the population average
  • Unlike family history, PRS is quantitative and actionable from birth
  • The UK Biobank, Genomics England, and the NIH All of Us program are all integrating PRS into their research platforms

Why a pipeline is needed:

  • Genotype data is massive: millions of variants × thousands of patients
  • Different datasets use different genome builds (hg19 vs hg38), so coordinates must be harmonized
  • Missing genotypes must be imputed using reference panels
  • The scoring itself requires matching, aligning, and weighting thousands of variants
  • Results need to be stored in a queryable database and visualized for clinical interpretation

Without a reproducible pipeline, every step is a manual, error-prone process. This project automates the entire workflow.


Pipeline Architecture

                ┌───────────────────────────────────────────────────┐
                │           PRS PIPELINE (Nextflow DSL2)            │
                └─────────────────────────┬─────────────────────────┘
                                          │
          ┌───────────────────────────────┼───────────────────────────────┐
          ▼                               ▼                               ▼
    ┌───────────┐                   ┌───────────┐                   ┌───────────┐
    │  Input    │                   │   GWAS    │                   │ Reference │
    │  VCF      │                   │  Summary  │                   │   Panel   │
    │ (Patient) │                   │   Stats   │                   │  (1000G)  │
    └─────┬─────┘                   └─────┬─────┘                   └─────┬─────┘
          │                               │                               │
          ▼                               │                               │
    ┌───────────┐                         │                               │
    │  Step 1   │                         │                               │
    │    QC     │──► call rate, MAF,      │                               │
    │ BCFtools  │    depth filtering      │                               │
    └─────┬─────┘                         │                               │
          │                               │                               │
          ▼                               │                               │
    ┌───────────┐                         │                               │
    │  Step 2   │                         │                               │
    │ Normalize │──► liftover hg19→hg38   │                               │
    │ CrossMap  │    + allele alignment   │                               │
    └─────┬─────┘                         │                               │
          │                               │                               │
          ▼                               │                               │
    ┌───────────┐                         │                               │
    │  Step 3   │◄────────────────────────┼───────────────────────────────┘
    │  Impute   │──► fill missing calls   │
    │  Beagle   │    using LD patterns    │
    └─────┬─────┘                         │
          │                               │
          ▼                               │
    ┌───────────┐                         │
    │  Step 4   │◄────────────────────────┘
    │   Score   │──► PRS = Σ(dosage × beta)
    │  Python   │    z-score normalization
    └─────┬─────┘
          │
          ├─────────────────────┐
          ▼                     ▼
    ┌───────────┐         ┌───────────┐
    │  Step 5   │         │  Step 6   │
    │ Database  │         │ Visualize │
    │  SQLite   │         │  Python   │
    └─────┬─────┘         └─────┬─────┘
          │                     │
          ▼                     ▼
    ┌───────────┐         ┌───────────┐
    │  3 Tables │         │  5 Plots  │
    │ • samples │         │ • QC      │
    │ • variants│         │ • Distrib │
    │ • qc_meta │         │ • Risk    │
    └───────────┘         │ • Manhtn  │
                          │ • Ranked  │
                          └───────────┘
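
Step 4 in the diagram reduces to a weighted sum, PRS = Σ(dosage × beta). The sketch below is one minimal way to write that sum in Python; it is not the code in bin/calculate_prs.py. The function name, the dict-based inputs, and the choice to skip missing genotypes are all assumptions made for illustration, and the rsIDs and weights are made up.

```python
def raw_prs(dosages, betas):
    """Return (score, variants_used) for one patient.

    dosages: dict rsid -> effect-allele count (0, 1, 2), or None if missing
    betas:   dict rsid -> GWAS effect size for the same effect allele
    Both structures are hypothetical stand-ins for parsed VCF / GWAS rows.
    """
    total = 0.0
    used = 0
    for rsid, beta in betas.items():
        dosage = dosages.get(rsid)
        if dosage is None:  # variant absent or genotype missing: skip it
            continue
        total += dosage * beta
        used += 1
    return total, used


# Made-up example values, for illustration only.
patient = {"rs1": 2, "rs2": 0, "rs3": None, "rs4": 1}
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20, "rs4": 0.30, "rs5": 0.15}

score, n_used = raw_prs(patient, weights)
print(f"raw PRS = {score:.2f} from {n_used} variants")
```

Skipping missing genotypes is only one option; after imputation (Step 3) most of those gaps would already be filled, which is why scoring sits downstream of Beagle in the diagram.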

AWS Cloud Architecture

    ┌─────────────────────────────────────────────────────────────────────┐
    │                         AWS Cloud                                   │
    │                                                                     │
    │   ┌──────────┐     ┌──────────┐     ┌───────────────────────────┐   │
    │   │  S3      │────►│  Lambda  │────►│       AWS Batch           │   │
    │   │  Input   │     │ Trigger  │     │                           │   │
    │   │  Bucket  │     │          │     │  ┌──────────────────────┐ │   │
    │   │          │     └──────────┘     │  │   Nextflow Pipeline  │ │   │
    │   │ patient  │                      │  │                      │ │   │
    │   │  .vcf    │                      │  │  QC → Norm → Impute  │ │   │
    │   └──────────┘                      │  │     → Score → DB     │ │   │
    │                                     │  └──────────────────────┘ │   │
    │                                     │                           │   │
    │                                     │  EC2 On-Demand  EC2 Spot  │   │
    │                                     │  (scoring)      (impute)  │   │
    │                                     └─────────────┬─────────────┘   │
    │                                                   │                 │
    │        ┌──────────────────┬───────────────────┬───┘                 │
    │        ▼                  ▼                   ▼                     │
    │   ┌──────────┐     ┌────────────┐      ┌────────────┐               │
    │   │  S3      │     │    RDS     │      │ CloudWatch │               │
    │   │  Output  │     │ PostgreSQL │      │ Dashboard  │               │
    │   │  Bucket  │     │ (or SQLite │      │ + Alarms   │               │
    │   │          │     │  /DynamoDB)│      │            │               │
    │   │ results/ │     └────────────┘      └────────────┘               │
    │   │ figures/ │                                                      │
    │   └────┬─────┘          ┌────────────┐                              │
    │        │                │ AWS Budget │                              │
    │        │ Lifecycle      │ $100/month │                              │
    │        │ Rules          │ + alerts   │                              │
    │        ▼                └────────────┘                              │
    │   ┌──────────┐                                                      │
    │   │ S3       │                                                      │
    │   │ Glacier  │  ◄── auto-archive after 90 days                      │
    │   │ Archive  │      HIPAA: retain ≥ 6 years                         │
    │   └──────────┘                                                      │
    │                                                                     │
    └─────────────────────────────────────────────────────────────────────┘

Key Design Decisions:

| Component | Choice | Why |
|---|---|---|
| Compute | AWS Batch | Auto-scales EC2 instances and shuts them down when done, so there are no idle costs |
| Spot Instances | Imputation step only | 60-90% cheaper; Nextflow retries on interruption |
| Storage | S3 + Lifecycle Rules | Standard → IA (30d) → Glacier (90d) → Deep Archive (365d) |
| Database | RDS PostgreSQL / DynamoDB | Variant weights in DynamoDB for O(1) lookups; scores in RDS for SQL queries |
| Trigger | S3 → Lambda | Event-driven: drop a VCF, pipeline runs automatically |
| Monitoring | CloudWatch + Budget | Dashboards for memory/CPU, $100/month budget with email alerts |
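
To make the S3 → Lambda trigger concrete, here is a rough sketch of the parsing a Lambda handler would need before submitting a Batch job. It only builds the request; it does not call AWS. The function names, the job queue `prs-queue`, and the job definition `prs-nextflow` are hypothetical placeholders, not values from this repository. The event layout follows the standard S3 notification format, and the returned dict is shaped like the keyword arguments of boto3's `submit_job`.

```python
import re
from urllib.parse import unquote_plus


def vcf_inputs_from_event(event):
    """Collect s3:// URIs of uploaded VCF files from an S3 event payload."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith((".vcf", ".vcf.gz")):
            uris.append(f"s3://{bucket}/{key}")
    return uris


def batch_job_request(vcf_uri, job_queue="prs-queue", job_definition="prs-nextflow"):
    """Build submit_job-style arguments; queue/definition names are placeholders."""
    sample = vcf_uri.rsplit("/", 1)[-1].split(".")[0]
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", sample)  # keep the job name simple
    return {
        "jobName": f"prs-{safe_name}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "parameters": {"input_vcf": vcf_uri},
    }


# Minimal made-up event with one VCF upload and one unrelated file.
example_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "incoming/patient01.vcf"}}},
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "incoming/notes.txt"}}},
    ]
}

for uri in vcf_inputs_from_event(example_event):
    print(batch_job_request(uri))
```

In a deployed handler, the dict would be passed to a Batch client (for example `client.submit_job(**request)`), with the job definition responsible for launching Nextflow.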

Variant Selection for the PRS Model

Not all GWAS variants should go into a PRS. The selection strategy matters:

P-value Thresholding (the "T" in the Clumping + Thresholding, or C+T, method):

  • Include only variants below a significance threshold (e.g., p < 5×10⁻⁸ for genome-wide significance, or p < 0.05 for a more liberal cutoff)
  • Simple but effective. This pipeline uses p-value filtering by default.
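
As an illustration of the thresholding step, the sketch below filters a small in-memory table of summary statistics by p-value. The column names (`rsid`, `beta`, `p_value`) and the rows are assumptions for the example; the actual layout of gwas_summary_stats.tsv may differ.

```python
import csv
import io

# Made-up summary statistics; the column names are assumed, not taken from the repo.
GWAS_TSV = """rsid\tbeta\tp_value
rs1\t0.12\t3e-9
rs2\t-0.05\t0.01
rs3\t0.08\t4e-8
rs4\t0.02\t0.3
"""


def threshold_variants(rows, p_max=5e-8):
    """Keep rows whose p-value is strictly below p_max."""
    return [row for row in rows if float(row["p_value"]) < p_max]


rows = list(csv.DictReader(io.StringIO(GWAS_TSV), delimiter="\t"))

strict = threshold_variants(rows)               # genome-wide significant only
liberal = threshold_variants(rows, p_max=0.05)  # more permissive cutoff

print([r["rsid"] for r in strict])
print([r["rsid"] for r in liberal])
```

A looser threshold admits more variants with weaker evidence, so the choice trades coverage of true signal against added noise.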

LD Clumping:

  • Nearby variants are often correlated (linkage disequilibrium)
  • Including both would double-count the same genetic signal
  • PLINK --clump selects the most significant variant in each LD block
  • In production, I'd add a PLINK clumping step before scoring
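
The idea behind clumping can be sketched as a greedy loop: visit variants from most to least significant, keep each one that has not been claimed yet, and drop anything strongly correlated with it. This is a simplified illustration, not PLINK's implementation; it ignores the physical distance window PLINK also applies, and the pairwise r² values and the 0.1 cutoff are made up for the example.

```python
def clump(p_values, r2, r2_max=0.1):
    """Greedy LD clumping sketch.

    p_values: dict variant -> p-value
    r2:       dict frozenset({a, b}) -> squared correlation between a and b
              (pairs not listed are treated as uncorrelated)
    Returns the index variants kept, ordered by significance.
    """
    order = sorted(p_values, key=p_values.get)  # most significant first
    removed = set()
    kept = []
    for variant in order:
        if variant in removed:
            continue
        kept.append(variant)
        for other in order:
            if other != variant and r2.get(frozenset((variant, other)), 0.0) > r2_max:
                removed.add(other)  # same signal as the index variant
    return kept


# Made-up values: rsA and rsB are in strong LD, rsC and rsD are nearly independent.
p_values = {"rsA": 1e-10, "rsB": 1e-6, "rsC": 1e-8, "rsD": 1e-3}
r2 = {frozenset(("rsA", "rsB")): 0.8, frozenset(("rsC", "rsD")): 0.05}

print(clump(p_values, r2))
```

Here rsB is dropped because it tags the same signal as the more significant rsA, while rsD survives because its correlation with rsC is below the cutoff.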

Advanced Methods (not implemented, but worth knowing):

  • LDpred2: Bayesian method that adjusts betas for LD structure
  • PRS-CS: Uses continuous shrinkage priors on effect sizes
  • These give better prediction than simple C+T but are computationally heavier

Normalization: What It Means in PRS Context

Normalization happens at two levels:

1. Coordinate Normalization (Liftover)

Different datasets are built on different reference genomes (hg19 vs hg38). If my patient data is on hg19 and the GWAS summary stats are on hg38, the positions won't match. CrossMap converts coordinates using a chain file that maps old positions to new positions.
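
Conceptually, a chain file describes aligned blocks between two builds, and lifting a position means finding its block and applying that block's offset. The toy sketch below shows only that idea. It is not CrossMap's algorithm or file parser, it ignores strand and gaps within a chain, and the block coordinates are invented.

```python
def lift_position(chrom, pos, blocks):
    """Map (chrom, pos) to the target build, or return None if unmapped.

    blocks: list of (src_chrom, src_start, src_end, dst_chrom, dst_start)
            describing ungapped aligned intervals, half-open [start, end).
            This tuple layout is a hypothetical simplification of a chain file.
    """
    for src_chrom, src_start, src_end, dst_chrom, dst_start in blocks:
        if chrom == src_chrom and src_start <= pos < src_end:
            return dst_chrom, dst_start + (pos - src_start)
    return None  # position falls in a region with no alignment


# Invented blocks: the second one is shifted further than the first.
toy_blocks = [
    ("chr1", 1000, 2000, "chr1", 1500),
    ("chr1", 2000, 3000, "chr1", 2600),
]

print(lift_position("chr1", 1200, toy_blocks))
print(lift_position("chr1", 2500, toy_blocks))
print(lift_position("chr1", 5000, toy_blocks))
```

Positions that fall outside every block cannot be lifted, so variants at those positions would be lost before scoring.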

2. Batch / Population Normalization

A raw PRS of 0.47 means nothing by itself. It only has meaning relative to a population:

  • A European with PRS = 0.47 might be at the 60th percentile in a European reference
  • An African American with PRS = 0.47 might be at the 90th percentile in an African reference

Z-score normalization within ancestry groups fixes this:

z_score = (raw_prs - ancestry_mean) / ancestry_std

This pipeline computes z-scores across all input samples. In production with known ancestry labels, you'd normalize within each group separately.
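
A minimal sketch of the within-group variant, assuming each sample carries an ancestry label: the function name, the `EUR`/`AFR` labels, and the scores are made up, and the population standard deviation is used here as one reasonable choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores_by_group(raw_scores, ancestry):
    """Z-score each raw PRS against the mean/std of its own ancestry group.

    raw_scores: dict sample_id -> raw PRS
    ancestry:   dict sample_id -> group label (hypothetical labels)
    """
    groups = defaultdict(list)
    for sample, score in raw_scores.items():
        groups[ancestry[sample]].append(score)

    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}

    z = {}
    for sample, score in raw_scores.items():
        mu, sd = stats[ancestry[sample]]
        z[sample] = (score - mu) / sd if sd > 0 else 0.0  # guard against zero spread
    return z


# Made-up scores for two groups with different baselines.
raw = {"s1": 0.2, "s2": 0.4, "s3": 0.6, "s4": 0.1, "s5": 0.3, "s6": 0.47}
labels = {"s1": "EUR", "s2": "EUR", "s3": "EUR", "s4": "AFR", "s5": "AFR", "s6": "AFR"}

for sample, value in zscores_by_group(raw, labels).items():
    print(sample, labels[sample], round(value, 2))
```

With only a handful of samples per group the mean and standard deviation are unstable, so in practice the group statistics would come from a large reference cohort rather than from the batch being scored.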


S3 Lifecycle Rules & Data Archiving

This addresses the feedback about discussing archiving:

| Time After Pipeline Run | S3 Storage Class | Approx. Cost (USD per TB/month) | Access Time |
|---|---|---|---|
| 0–30 days | STANDARD | $23 | Instant |
| 30–90 days | STANDARD_IA | $12.50 | Instant |
| 90–365 days | GLACIER | $4 | 3-5 hours |
| 365+ days | DEEP_ARCHIVE | $1 | 12 hours |

Why this matters:

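  • Genomic data is large: a single whole-genome VCF can reach ~100GB
  • Clinical data retention policies built around HIPAA commonly call for keeping patient data for ≥6 years; the exact requirement depends on state law and institutional policy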
  • Active results need fast access, but year-old runs can be archived
  • Lifecycle rules automate this β€” no manual intervention needed

The lifecycle policy is defined in nextflow.config and conf/aws.config.
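
For reference, the tiering in the table can be written as an S3 lifecycle configuration. The sketch below builds such a document as a plain Python dict, in the shape accepted by the S3 `PutBucketLifecycleConfiguration` API, and adds a small helper that reports which class an object of a given age would be in. The rule ID and the `results/` prefix are assumptions, and nothing here is read from the repository's config files.

```python
import json

# (minimum age in days, storage class), mirroring the table above.
TIERS = [(0, "STANDARD"), (30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]


def storage_class_for_age(days):
    """Return the storage class an object would be in after `days` days."""
    current = TIERS[0][1]
    for min_age, name in TIERS:
        if days >= min_age:
            current = name
    return current


def lifecycle_configuration(prefix="results/"):
    """Build a lifecycle document; the rule ID and prefix are placeholders."""
    return {
        "Rules": [
            {
                "ID": "archive-prs-results",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": min_age, "StorageClass": name}
                    for min_age, name in TIERS
                    if min_age > 0  # objects start in STANDARD, so no transition needed
                ],
            }
        ]
    }


print(json.dumps(lifecycle_configuration(), indent=2))
print(storage_class_for_age(45))
```

The resulting dict could be applied with an S3 client (for example via boto3's `put_bucket_lifecycle_configuration`), after which the transitions happen without further intervention.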


Project Structure

prs_pipeline/
├── main.nf                      # Pipeline entry point (Nextflow DSL2)
├── nextflow.config              # Parameters, profiles, S3 lifecycle config
├── README.md                    # This file
│
├── modules/                     # One Nextflow process per file
│   ├── qc.nf                   # BCFtools-based QC filtering
│   ├── normalize.nf            # CrossMap liftover + batch normalization
│   ├── impute.nf               # Beagle genotype imputation
│   ├── score.nf                # PRS weighted sum calculation
│   ├── database.nf             # SQLite database storage
│   └── visualize.nf            # Matplotlib visualization
│
├── subworkflows/
│   └── prs_workflow.nf         # Chains all modules: QC → Norm → Impute → Score → DB → Viz
│
├── bin/                         # Python scripts (auto-added to PATH in processes)
│   ├── generate_test_data.py   # Creates simulated VCF + GWAS stats for demo
│   ├── qc_filter.py            # QC: call rate, MAF, depth filtering
│   ├── calculate_prs.py        # Core PRS scoring engine
│   ├── store_results.py        # SQLite database operations
│   └── visualize_prs.py        # 5 publication-quality plots
│
├── conf/
│   ├── resources.config        # Per-process CPU/memory allocation
│   └── aws.config              # AWS Batch, S3, CloudWatch configuration
│
├── data/                        # Test data (generated by generate_test_data.py)
│   ├── sample_input.vcf        # 500 variants × 20 patients (~5% missing)
│   ├── gwas_summary_stats.tsv  # Effect sizes + p-values for 500 variants
│   ├── reference_panel.vcf     # 50-sample reference for imputation
│   └── chain_file.chain        # Placeholder liftover chain
│
└── results/                     # Pipeline outputs (generated by running the pipeline)
    ├── qc/                     # Filtered VCF + QC stats
    ├── normalized/             # Build-normalized VCF
    ├── imputed/                # Imputed VCF
    ├── scores/                 # PRS scores per patient
    ├── database/               # SQLite database
    └── figures/                # 5 visualization PNGs

How to Run

Prerequisites

  • Python 3.8+ with matplotlib and numpy
  • Nextflow 22.10+ (for running the full pipeline)

Quick Start (Python scripts only; no Nextflow needed)

# 1. generate test data
cd prs_pipeline
python3 bin/generate_test_data.py

# 2. run QC
python3 bin/qc_filter.py \
  --vcf data/sample_input.vcf \
  --out-vcf results/qc/filtered.vcf \
  --out-stats results/qc/qc_stats.tsv

# 3. calculate PRS (skipping normalize/impute for simplicity)
python3 bin/calculate_prs.py \
  --vcf results/qc/filtered.vcf \
  --gwas data/gwas_summary_stats.tsv \
  --output results/scores/prs_scores.tsv

# 4. store in database
python3 bin/store_results.py \
  --scores results/scores/prs_scores.tsv \
  --gwas data/gwas_summary_stats.tsv \
  --qc-stats results/qc/qc_stats.tsv \
  --db results/database/prs_results.db

# 5. generate visualizations
python3 bin/visualize_prs.py \
  --scores results/scores/prs_scores.tsv \
  --qc-stats results/qc/qc_stats.tsv \
  --gwas data/gwas_summary_stats.tsv \
  --outdir results/figures

Full Pipeline (with Nextflow)

# local execution
nextflow run main.nf -profile local

# docker execution (more reproducible)
nextflow run main.nf -profile docker

# on AWS
nextflow run main.nf -profile aws_batch \
  --input_vcf s3://my-bucket/patient.vcf \
  --outdir s3://my-bucket/results

Outputs

| Output | Location | Description |
|---|---|---|
| Filtered VCF | results/qc/filtered.vcf | Variants passing QC thresholds |
| QC Report | results/qc/qc_stats.tsv | Per-variant and per-sample QC metrics |
| Normalized VCF | results/normalized/normalized.vcf | Genome build–harmonized variants |
| Imputed VCF | results/imputed/imputed.vcf | Missing genotypes filled in |
| PRS Scores | results/scores/prs_scores.tsv | Per-patient raw score, z-score, percentile, risk category |
| Database | results/database/prs_results.db | SQLite with samples, variants, qc_metrics tables |
| QC Summary Plot | results/figures/01_qc_summary.png | Call rates and depth per sample |
| PRS Distribution | results/figures/02_prs_distribution.png | Histogram with risk thresholds |
| Risk Stratification | results/figures/03_risk_stratification.png | Patients per risk category |
| Variant Effects | results/figures/04_variant_effects.png | Manhattan-style effect size plot |
| Score Comparison | results/figures/05_score_comparison.png | Ranked lollipop chart |

Database Schema

-- patient PRS scores (what clinicians query)
SELECT sample_id, prs_zscore, percentile, risk_category
FROM samples
WHERE risk_category = 'HIGH_RISK';

-- most significant variants in the model
SELECT rsid, chromosome, beta, p_value
FROM variants
ORDER BY p_value ASC
LIMIT 10;

-- samples with questionable QC
SELECT s.sample_id, s.prs_zscore, q.call_rate
FROM samples s
JOIN qc_metrics q ON s.sample_id = q.sample_id
WHERE q.call_rate < 0.95;
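
The queries above imply a schema along the following lines. This sketch builds an in-memory SQLite database with only the columns those queries reference and runs two of them. The column types, the `AVERAGE` risk label, and the inserted rows are assumptions for illustration; the tables created by bin/store_results.py may contain more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Columns inferred from the example queries; the real schema may differ.
    CREATE TABLE samples (
        sample_id     TEXT PRIMARY KEY,
        prs_zscore    REAL,
        percentile    REAL,
        risk_category TEXT
    );
    CREATE TABLE variants (
        rsid       TEXT PRIMARY KEY,
        chromosome TEXT,
        beta       REAL,
        p_value    REAL
    );
    CREATE TABLE qc_metrics (
        sample_id TEXT REFERENCES samples(sample_id),
        call_rate REAL
    );
    """
)

# Made-up rows for demonstration only.
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [("P001", 1.9, 97.0, "HIGH_RISK"), ("P002", -0.3, 40.0, "AVERAGE")],
)
conn.executemany(
    "INSERT INTO qc_metrics VALUES (?, ?)",
    [("P001", 0.99), ("P002", 0.91)],
)

high_risk = conn.execute(
    "SELECT sample_id FROM samples WHERE risk_category = 'HIGH_RISK'"
).fetchall()

low_call_rate = conn.execute(
    """
    SELECT s.sample_id, s.prs_zscore, q.call_rate
    FROM samples s
    JOIN qc_metrics q ON s.sample_id = q.sample_id
    WHERE q.call_rate < 0.95
    """
).fetchall()

print(high_risk)
print(low_call_rate)
```

Keeping QC metrics in their own table lets a score be flagged as unreliable without altering the score itself.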

Cost Management

| Strategy | Tool | Details |
|---|---|---|
| Budget alerts | AWS Budgets | $100/month cap with email notifications |
| Spot instances | AWS Batch | 60-90% savings on imputation compute |
| Auto-shutdown | AWS Batch | EC2 instances terminate when pipeline finishes |
| Data archiving | S3 Lifecycle | Auto-transition to Glacier after 90 days |
| Monitoring | CloudWatch | CPU/memory dashboards + duration alarms |

Estimated cost per run (1M variants × 100 samples): ~$0.36 with spot instances.


Tools & References

| Tool | Purpose | Reference |
|---|---|---|
| Nextflow | Workflow orchestration | nextflow.io |
| BCFtools | VCF quality control | samtools.github.io/bcftools |
| CrossMap | Genome build liftover | crossmap.sourceforge.net |
| Beagle | Genotype imputation | faculty.washington.edu/browning/beagle |
| PGS Catalog | Published PRS models | pgscatalog.org |
| pgsc_calc | Reference PRS pipeline | github.com/PGScatalog/pgsc_calc |
| matplotlib | Visualization | matplotlib.org |
| SQLite | Local database | sqlite.org |

License

This project was built as the final project for BINFX 410.
