ML-guided qPCR primer design with off-target minimization.
# Create environment with external tools
conda env create -f environment.yml
conda activate qprimer-designer
# Install the package
pip install .
The GHCR image is multi-arch (amd64 and arm64) and GPU-enabled; use it for the CLI, training, and Terra/batch workflows:
docker pull ghcr.io/broadinstitute/qprimer_designer:latest
docker run --rm ghcr.io/broadinstitute/qprimer_designer qprimer --help
(A separate slim, CPU-only image powers the hosted web app on Cloud Run; see below.)
A Streamlit-based GUI is available as an alternative to the CLI. See the GUI Getting Started Guide for detailed setup instructions.
pip install -e ".[gui]"
streamlit run gui/app.py
A hosted version runs on Google Cloud Run (project sabeti-adapt). It is public (no login) and shared: anyone with the URL can use it. Results are not retained across redeploys or restarts, so download anything you want to keep. See terraform/README.md for deployment details.
| Environment | URL |
|---|---|
| Production | https://qprimer-designer.sabeti.broadinstitute.org |
| Staging (base) | https://qprimer-designer-staging-soitfyremq-uc.a.run.app |
| Per-branch preview | https://<branch>---qprimer-designer-staging-soitfyremq-uc.a.run.app |
Every push to a branch deploys a preview revision to the staging service (with --no-traffic), accessible at the per-branch URL above. Pushing a v* tag deploys production.
The GUI uses a multi-page workflow with sidebar navigation:
- Home — Entry point with getting started guide
- Design / Evaluate / Monitor — Step-by-step workflows: select targets, configure parameters, run, and view results
- Past Results — Browse and download outputs from previous runs
adapt --help
adapt design --help
adapt evaluate --help
adapt fetch --help
adapt monitor --help
The pipeline expects input sequences in the following structure:
.
└── target_seqs/
    └── original/
        ├── target1.fa
        ├── target2.fa
        └── offtarget.fa
FASTA files in ./target_seqs/original should be multi-sequence, unaligned FASTA files with a .fa extension.
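Before launching a run, it can help to sanity-check this layout. The following is a minimal pre-flight sketch; `check_target_dir` is a hypothetical helper, not part of qprimer-designer, and treating a `-` in a sequence line as a sign of an alignment is a heuristic, not the pipeline's own validation.

```python
from pathlib import Path


def check_target_dir(root: str = "target_seqs/original") -> tuple[dict[str, int], list[str]]:
    """Hypothetical check: count records per .fa file and list files that will be ignored."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"missing input directory: {base}")
    counts: dict[str, int] = {}
    ignored: list[str] = []
    for path in sorted(base.iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".fa":
            # e.g. target.fasta does not match the expected .fa extension
            ignored.append(path.name)
            continue
        n_records = 0
        for line in path.read_text().splitlines():
            if line.startswith(">"):
                n_records += 1
            elif "-" in line:
                # Heuristic: gap characters suggest an alignment, but inputs must be unaligned
                raise ValueError(f"{path.name}: contains '-' gaps; provide unaligned sequences")
        if n_records == 0:
            raise ValueError(f"{path.name}: no FASTA records found")
        counts[path.name] = n_records
    return counts, ignored
```

Calling `check_target_dir()` from the project root returns the record count per `.fa` file plus any files with other extensions, which are worth renaming before running `adapt design`.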
Copy the template and fill in your parameters:
cp workflows/params.txt.template params.txt
Key sections in params.txt:
- Primer generation — Tm, GC%, length, tiling parameters
- Probe generation — Probe length, Tm, homopolymer, 5' G avoidance
- Probe mode — Mismatch tolerance, amplicon buffer, probes per pair
- Amplicon — Min/max amplicon and off-target lengths
- Email — Gmail sender, app password, and recipients (for adapt monitor)
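The Email settings shown later in this README use a `KEY = VALUE` form. Assuming the whole of params.txt follows that convention, with `#` starting a comment line (an assumption; check the template), a filled-in file can be inspected from Python with a hypothetical reader like this one. It is not the pipeline's own parser.

```python
from pathlib import Path


def read_params(path: str = "params.txt") -> dict[str, str]:
    """Hypothetical reader for KEY = VALUE lines; not the pipeline's own parser."""
    params: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, '#' comments, and any line without '=' (e.g. section titles)
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so values may contain spaces or '='
        params[key.strip()] = value.strip()
    return params
```

For example, `read_params()["EMAIL_RECIPIENTS"]` returns the raw comma-separated recipient string.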
# Singleplex
adapt design --params params.txt
# With probe design
adapt design --params params.txt --probe
# Multiplex
adapt design --params params.txt --multiplex --probe
# Dry run (preview without executing)
adapt design --params params.txt --dry-run
Output will be in runs/{run_id}/ (timestamped by default, or set the ID with --runid):
- {target}_final.csv — Primer candidates with scores and coverage
- For probe mode: {target}_probe.fa and mapping results
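Because run IDs are timestamped by default, a post-processing script may need to locate the newest run. A minimal sketch, assuming the newest run is the most recently modified directory under runs/ (with `--runid` you would simply use that directory instead); `latest_final_csvs` is a hypothetical name:

```python
from pathlib import Path


def latest_final_csvs(runs_dir: str = "runs") -> list[Path]:
    """Hypothetical helper: return *_final.csv files from the most recently modified run."""
    root = Path(runs_dir)
    runs = [p for p in root.iterdir() if p.is_dir()] if root.is_dir() else []
    if not runs:
        return []
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    # rglob also covers outputs placed in subdirectories of the run
    return sorted(latest.rglob("*_final.csv"))
```

Modification time is used rather than parsing the run ID, because the exact timestamp format is not specified here.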
Evaluate existing primers against target sequences using the ML model.
# Option 1: Evaluate direct sequences
adapt evaluate --for ATCGATCGATCG --rev GCTAGCTAGCTA
# Option 2: With probe
adapt evaluate --for ATCGATCGATCG --rev GCTAGCTAGCTA --pro AACCGGTTAACCGG
# Option 3: Evaluate from FASTA file (multiple primer sets)
adapt evaluate --pset my_primers.fa
Primers must follow the *_for, *_rev, and optionally *_pro naming pattern:
>primer1_for
ATCGATCGATCGATCGATCG
>primer1_rev
GCTAGCTAGCTAGCTAGCTA
>primer1_pro
AACCGGTTAACCGGTTAACC
>primer2_for
TTTTAAAACCCCGGGG
>primer2_rev
GGGGCCCCAAAATTTT
Each primer set must have both a forward (*_for) and reverse (*_rev) entry. Probe (*_pro) is optional.
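To catch naming mistakes before starting an evaluation, a primer-set FASTA can be checked against these rules. This is a hypothetical sketch (`parse_primer_sets` is not a package function). It groups records by the text before the final underscore and requires a `_for` and a `_rev` entry in every set:

```python
ROLES = ("for", "rev", "pro")


def parse_primer_sets(fasta_text: str) -> dict[str, dict[str, str]]:
    """Hypothetical validator: map set name -> {role: sequence} following *_for/*_rev/*_pro."""
    sets: dict[str, dict[str, str]] = {}
    current = None  # (set_name, role) of the record being read
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            set_name, _, role = header.rpartition("_")  # "primer1_for" -> ("primer1", "for")
            if not set_name or role not in ROLES:
                raise ValueError(f"{header}: expected <name>_for, <name>_rev or <name>_pro")
            roles = sets.setdefault(set_name, {})
            if role in roles:
                raise ValueError(f"{header}: duplicate entry")
            roles[role] = ""
            current = (set_name, role)
        elif current is None:
            raise ValueError("sequence line found before the first header")
        else:
            set_name, role = current
            sets[set_name][role] += line.upper()  # supports multi-line sequences
    for set_name, roles in sets.items():
        missing = {"for", "rev"} - set(roles)
        if missing:
            raise ValueError(f"{set_name}: missing {sorted(missing)}")
    return sets
```

Run on the example file above, it returns a three-entry set for primer1 and a forward/reverse pair for primer2.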
Results will be in evaluate/{run_id}/{pset_name}/, containing Excel reports with:
- Summary sheet: Primer/probe sequences, dimerization table (2x2, or 3x3 with a probe), sensitivity (coverage as covered / total), and specificity metrics
- Detail sheet: Per-target alignments with classifier/regressor scores, decision/reason columns, probe match status, and mismatch counts. Unmapped sequences are included with classifier=0, regressor=unmapped.
Each primer set gets its own Excel file (e.g., primer1.xlsx, primer2.xlsx).
Download virus sequences from NCBI using a Google Sheets configuration spreadsheet.
# Fetch all queries defined in the spreadsheet
adapt fetch --params params.txt
# Fetch specific query IDs
adapt fetch --params params.txt --query-ids 7 12
# Override spreadsheet URL
adapt fetch --url "https://docs.google.com/spreadsheets/d/..." --query-ids 7
The spreadsheet must have columns including query_id, Pathogen, and search parameters. Configure SPREADSHEET_URL and optionally QUERY_IDS in params.txt, or pass them via the CLI.
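The column requirement and the `--query-ids` filter can be mirrored locally. A minimal sketch, assuming the spreadsheet has been exported as CSV; it checks only the two columns named above, since the search-parameter columns are not specified here, and `select_queries` is a hypothetical name:

```python
import csv
import io
from typing import Iterable, Optional

REQUIRED_COLUMNS = {"query_id", "Pathogen"}


def select_queries(csv_text: str, query_ids: Optional[Iterable[int]] = None) -> list[dict[str, str]]:
    """Hypothetical check of a CSV export of the configuration spreadsheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"spreadsheet is missing columns: {sorted(missing)}")
    rows = list(reader)
    if query_ids is None:
        return rows  # analogous to running adapt fetch without --query-ids
    wanted = {str(q) for q in query_ids}
    return [row for row in rows if (row["query_id"] or "").strip() in wanted]
```

`select_queries(text, [7, 12])` keeps only the rows that `--query-ids 7 12` would select.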
Automatically fetch new sequences, evaluate primers, and send email alerts with results. Designed for periodic surveillance of primer set performance against emerging variants.
# Run monitor once
adapt monitor --params params.txt
# With explicit run ID (groups results from the same spreadsheet)
adapt monitor --params params.txt --runid my_panel
# Dry run
adapt monitor --params params.txt --dry-run
# Schedule monthly cron job
adapt monitor --params params.txt --schedule
# Remove cron job
adapt monitor --unschedule
Each monitor run performs these steps:
- Fetch — Downloads the latest sequences from NCBI via the configured Google Sheets spreadsheet
- Diff — Compares accession IDs against the most recent previous fetch to identify new sequences
- Evaluate — Runs the ML evaluation pipeline on new sequences using primer/probe sets from the spreadsheet (Forward, Reverse, Probe columns)
- Email — Sends an alert with:
  - Number of new sequences detected (with length, geographic region, release date)
  - Sensitivity/specificity summary tables
  - Excel report attachments for each pathogen
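The Diff step is essentially a set difference on accession IDs. A minimal sketch, assuming the accession is the first whitespace-delimited token of each FASTA header (typical for NCBI downloads, but an assumption here). It returns the new-only FASTA plus the full accession list that is kept for the next diff; `diff_fetch` is a hypothetical name:

```python
def diff_fetch(fasta_text: str, previous_ids: set[str]) -> tuple[str, list[str]]:
    """Hypothetical diff: keep records whose accession was not seen in the previous fetch."""
    all_ids: list[str] = []
    kept: list[str] = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            accession = parts[0] if parts else ""  # assumes accession is the first header token
            all_ids.append(accession)
            keep = accession not in previous_ids
        if keep:
            kept.append(line)  # header and sequence lines of new records only
    new_fasta = "\n".join(kept) + ("\n" if kept else "")
    return new_fasta, all_ids
```

Returning the full ID list separately means the large fetched FASTA can be discarded once the new subset is written.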
In params.txt:
EMAIL_SENDER = your.alert@gmail.com
EMAIL_PASSWORD = xxxx xxxx xxxx xxxx
EMAIL_RECIPIENTS = recipient1@example.com,recipient2@example.com
The sender must be a Gmail account with App Passwords enabled (requires 2-Step Verification). The password is a 16-character app password, not the account password.
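For reference, sending through Gmail with an app password is possible with Python's standard `smtplib` and `email` modules. This is an illustrative sketch, not the code `adapt monitor` uses: `build_alert` and `send_alert` are hypothetical names, and it assumes Gmail's SMTP-over-SSL endpoint `smtp.gmail.com:465` and a dict holding the three settings above.

```python
import smtplib
from email.message import EmailMessage

XLSX_SUBTYPE = "vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_alert(params: dict[str, str], subject: str, body: str,
                attachments: dict[str, bytes]) -> EmailMessage:
    """Hypothetical: assemble an alert with one .xlsx attachment per report."""
    msg = EmailMessage()
    recipients = [r.strip() for r in params["EMAIL_RECIPIENTS"].split(",") if r.strip()]
    msg["From"] = params["EMAIL_SENDER"]
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    for filename, data in attachments.items():
        msg.add_attachment(data, maintype="application", subtype=XLSX_SUBTYPE, filename=filename)
    return msg


def send_alert(params: dict[str, str], msg: EmailMessage) -> None:
    """Hypothetical: log in with the 16-character app password and send."""
    password = params["EMAIL_PASSWORD"].replace(" ", "")  # spaces only group the characters
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(params["EMAIL_SENDER"], password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes the alert content testable without network access.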
Results are organized by run ID and date. The --runid flag (default: derived from spreadsheet ID) groups results from the same monitoring configuration. Each run date gets a flat directory with all inputs and outputs:
monitor/
└── {runid}/
    └── YYYYMMDD/
        ├── mastersheet.csv           # Spreadsheet snapshot
        ├── {target}_pset.fa          # Primer set FASTA
        ├── {target}_new.fa           # New sequences (evaluate input)
        ├── {target}_accessions.txt   # Accession list (for next diff)
        ├── {target}_metadata.csv     # Sequence metadata
        └── {target}_{primer}.xlsx    # Evaluation reports
The full fetched FASTA is removed after extracting the new-only subset and saving the accession list, to conserve disk space. Previous accession lists are used to diff against future fetches.
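Because run dates are YYYYMMDD directory names, lexical order matches chronological order, so finding the list to diff against is a sort. A hypothetical sketch, assuming `{target}_accessions.txt` holds one accession per line:

```python
from pathlib import Path


def previous_accessions(runid_dir: str, target: str, today: str) -> set[str]:
    """Hypothetical: load the most recent accession list saved before `today` (YYYYMMDD)."""
    base = Path(runid_dir)
    if not base.is_dir():
        return set()
    dated = sorted(
        p for p in base.iterdir()
        if p.is_dir() and len(p.name) == 8 and p.name.isdigit() and p.name < today
    )
    for day in reversed(dated):  # newest first
        acc_file = day / f"{target}_accessions.txt"
        if acc_file.exists():
            return {ln.strip() for ln in acc_file.read_text().splitlines() if ln.strip()}
    return set()  # first run: every fetched sequence counts as new
```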
The pipeline also uses internal qprimer subcommands via Snakemake. See docs/qprimer_cli.md for details.
Model inference auto-detects CUDA and falls back to CPU when no GPU is present. The GHCR image (CLI / training / Terra) is CUDA-enabled; the hosted web app runs the CPU-only GAR image (Cloud Run has no GPU).
If a GPU is available, adapt design detects and uses it automatically:
adapt design --params params.txt  # auto-detects GPU
With raw Snakemake, declare the GPU resource explicitly:
snakemake -s Snakefile.example --cores all --resources gpu=1
CPU performance is acceptable for most use cases.
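CUDA auto-detection with CPU fallback usually reduces to a single availability check. A minimal sketch, assuming a PyTorch backend (the README does not name the framework) and treating a missing `torch` install as CPU-only; `pick_device` is a hypothetical name:

```python
def pick_device() -> str:
    """Hypothetical: return "cuda" when a GPU is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: inference runs on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```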
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
See CLAUDE.md for development guidelines.
Pre-trained models are bundled with the package in src/qprimer_designer/data/. Training scripts are available in the training/ directory for reference (raw dataset available upon request).
MIT License - see LICENSE for details.