- Running on an HPC
- You have access to appropriate GPUs if running DeepVariant/DeepTrio
Download the variant databases if you wish to run the variant annotation component of the pipeline.
Note: variant annotation is only available for hg38
Get a local copy of the VEP cache
curl -O https://ftp.ensembl.org/pub/release-112/variation/indexed_vep_cache/homo_sapiens_merged_vep_112_GRCh38.tar.gz
Check download was successful by checking md5sum
md5sum homo_sapiens_merged_vep_112_GRCh38.tar.gz
Expected md5sum
51f6fc181a41f7386c85b40219694427 homo_sapiens_merged_vep_112_GRCh38.tar.gz
Un-tar
tar -xzf homo_sapiens_merged_vep_112_GRCh38.tar.gz
Get a local copy of the REVEL database
curl -o revel-v1.3_all_chromosomes.zip 'https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip?download=1'
Check download was successful by checking md5sum
md5sum revel-v1.3_all_chromosomes.zip
Expected md5sum
3ea2bc33e6b5455fc7e9899da863b5fe revel-v1.3_all_chromosomes.zip
Unzip
unzip revel-v1.3_all_chromosomes.zip
Format
cat revel_with_transcript_ids | tr "," "\t" > tabbed_revel.tsv
sed '1s/.*/#&/' tabbed_revel.tsv > new_tabbed_revel.tsv
bgzip new_tabbed_revel.tsv
zgrep -h -v ^#chr new_tabbed_revel.tsv.gz | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat - > new_tabbed_revel_grch38.tsv
bgzip new_tabbed_revel_grch38.tsv
tabix -f -s 1 -b 3 -e 3 new_tabbed_revel_grch38.tsv.gz
Get a local copy of the gnomAD database. E.g.:
for i in {1..22} X Y; do curl -O https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/vcf/joint/gnomad.joint.v4.1.sites.chr${i}.vcf.bgz; done
Check download was successful by checking md5sums
for i in {1..22} X Y; do md5sum gnomad.joint.v4.1.sites.chr${i}.vcf.bgz; done
Expected md5sums
11c62331b0a654fce6a9cd43838de648 gnomad.joint.v4.1.sites.chr1.vcf.bgz
563a8fe6f148621169b0215ac9f19602 gnomad.joint.v4.1.sites.chr2.vcf.bgz
c44f1661bafc15685f1eee593b4886ea gnomad.joint.v4.1.sites.chr3.vcf.bgz
e6d438b84539c6adad5bc67d6febb33b gnomad.joint.v4.1.sites.chr4.vcf.bgz
d226db0e0055b5e87e72f4b63158664a gnomad.joint.v4.1.sites.chr5.vcf.bgz
2820f13a2439ebbdf55066a0c320cdb5 gnomad.joint.v4.1.sites.chr6.vcf.bgz
d1d50e4fa082246a5787eee76c036189 gnomad.joint.v4.1.sites.chr7.vcf.bgz
195f2825e94c9b5e43b34bb2b1ab5c7b gnomad.joint.v4.1.sites.chr8.vcf.bgz
1c739fb01fd9de816e3fddc958668627 gnomad.joint.v4.1.sites.chr9.vcf.bgz
e2f174f150b5d709d5d7349ac241c438 gnomad.joint.v4.1.sites.chr10.vcf.bgz
b8651f2e5a0aafa23d7fc3406b35bc69 gnomad.joint.v4.1.sites.chr11.vcf.bgz
62795bafd326eae566ef49781a26bc91 gnomad.joint.v4.1.sites.chr12.vcf.bgz
92244327bee6d45973f6077a8134ccd9 gnomad.joint.v4.1.sites.chr13.vcf.bgz
f7a8344b03a4162cb71cdb628dd1e15b gnomad.joint.v4.1.sites.chr14.vcf.bgz
40c8ab829f973688d2ef891ce5acabb7 gnomad.joint.v4.1.sites.chr15.vcf.bgz
58a5f920fc191b2069126c41278cc077 gnomad.joint.v4.1.sites.chr16.vcf.bgz
aa48657f45d8db7711c69fbf71a25cdc gnomad.joint.v4.1.sites.chr17.vcf.bgz
80dd729bc61be464c964d3a3bfb0f41a gnomad.joint.v4.1.sites.chr18.vcf.bgz
1853ca4993ceb25bd6f3a4554173f7cf gnomad.joint.v4.1.sites.chr19.vcf.bgz
09263d3c29b822760c61607c6398f5c4 gnomad.joint.v4.1.sites.chr20.vcf.bgz
09263d3c29b822760c61607c6398f5c4 gnomad.joint.v4.1.sites.chr21.vcf.bgz
df15a5ea8ae2e3090eae112f548c74ef gnomad.joint.v4.1.sites.chr22.vcf.bgz
a5288ced0c2fe893fcfae4d2022b9cd9 gnomad.joint.v4.1.sites.chrX.vcf.bgz
7b882f00919d582139acbc116a7a559f gnomad.joint.v4.1.sites.chrY.vcf.bgz
Merge into a single file
zcat gnomad.joint.v4.1.sites.chr1.vcf.bgz | head -n1000 | grep '#' > gnomad.joint.v4.1.sites.chrall.vcf
for i in {1..22} X Y; do zgrep -v '^#' gnomad.joint.v4.1.sites.chr${i}.vcf.bgz >> gnomad.joint.v4.1.sites.chrall.vcf; done
bgzip gnomad.joint.v4.1.sites.chrall.vcf
tabix gnomad.joint.v4.1.sites.chrall.vcf.gz
Get a local copy of the ClinVar database
curl -O https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/weekly/clinvar_20240825.vcf.gz
curl -O https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/weekly/clinvar_20240825.vcf.gz.tbi
Check download was successful by checking md5sums
md5sum clinvar_20240825.vcf.gz
md5sum clinvar_20240825.vcf.gz.tbi
Expected md5sums
e05111f8e6418ce2898d78f68d39a019 clinvar_20240825.vcf.gz
74fd2cbee0c03af7809a4e9d2960c157 clinvar_20240825.vcf.gz.tbi
Get a local copy of the CADD databases
curl -O https://krishna.gs.washington.edu/download/CADD/v1.7/GRCh38/whole_genome_SNVs.tsv.gz
curl -O https://krishna.gs.washington.edu/download/CADD/v1.7/GRCh38/whole_genome_SNVs.tsv.gz.tbi
curl -O https://krishna.gs.washington.edu/download/CADD/v1.7/GRCh38/gnomad.genomes.r4.0.indel.tsv.gz
curl -O https://krishna.gs.washington.edu/download/CADD/v1.7/GRCh38/gnomad.genomes.r4.0.indel.tsv.gz.tbi
curl -O https://kircherlab.bihealth.org/download/CADD-SV/v1.1/1000G_phase3_SVs.tsv.gz
curl -O https://kircherlab.bihealth.org/download/CADD-SV/v1.1/1000G_phase3_SVs.tsv.gz.tbi
Check download was successful by checking md5sums
md5sum whole_genome_SNVs.tsv.gz
md5sum whole_genome_SNVs.tsv.gz.tbi
md5sum gnomad.genomes.r4.0.indel.tsv.gz
md5sum gnomad.genomes.r4.0.indel.tsv.gz.tbi
md5sum 1000G_phase3_SVs.tsv.gz
md5sum 1000G_phase3_SVs.tsv.gz.tbi
Expected md5sums
88577a55f1cd519d44e0f415ba248eb9 whole_genome_SNVs.tsv.gz
347df8fac17ea374c4598f4f44c7ce8b whole_genome_SNVs.tsv.gz.tbi
4b9c685c96d396af4d001c2f7dd9d8f9 gnomad.genomes.r4.0.indel.tsv.gz
85f3d2daa9202c5915c0ce0f1c749a66 gnomad.genomes.r4.0.indel.tsv.gz.tbi
1149f1df5490db9daaf520f957f59f23 1000G_phase3_SVs.tsv.gz
6ab7ffaead8e9f4246953bbd3b35dbd6 1000G_phase3_SVs.tsv.gz.tbi
Get a local copy of the spliceAI database
Manually download from Illumina BaseSpace (https://basespace.illumina.com/s/otSPW8hnhaZR). See the VEP spliceAI plugin documentation for more detail.
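The checksum comparisons in the sections above are done by eye. As an optional alternative, GNU `md5sum -c` can compare files against a manifest and report any mismatch through its exit status. The following is a minimal sketch, not part of pipeface: the function name `verify_md5` and the manifest filename `expected.md5` are hypothetical, and it assumes GNU coreutils is available on your system.

```shell
#!/usr/bin/env bash
# Sketch only: verify downloads against a checksum manifest with `md5sum -c`.
# `verify_md5` and the manifest name are illustrative, not part of pipeface.

# Usage: verify_md5 <manifest>
# The manifest holds one "<md5>  <filename>" entry per line (two spaces),
# which is the layout `md5sum` itself prints.
verify_md5() {
    local manifest="$1"
    if [ ! -f "$manifest" ]; then
        echo "manifest not found: $manifest" >&2
        return 2
    fi
    # --quiet prints only the files that fail; the exit status is non-zero
    # if any file is missing or its checksum does not match.
    md5sum -c --quiet "$manifest"
}

# Example (run in the directory holding the ClinVar downloads):
#   printf '%s  %s\n' \
#     e05111f8e6418ce2898d78f68d39a019 clinvar_20240825.vcf.gz \
#     74fd2cbee0c03af7809a4e9d2960c157 clinvar_20240825.vcf.gz.tbi > expected.md5
#   verify_md5 expected.md5 && echo "checksums OK"
```

Filenames in the manifest are resolved relative to the current directory, so run the check from wherever the files were downloaded.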
Get a local copy of the AlphaMissense database
curl -O https://storage.googleapis.com/dm_alphamissense/AlphaMissense_hg38.tsv.gz
Check download was successful by checking md5sum
md5sum AlphaMissense_hg38.tsv.gz
Expected md5sum
9fd167735f16a1b87da6eb3e4c25fcb5 AlphaMissense_hg38.tsv.gz
Index
tabix -s 1 -b 2 -e 2 -f -S 1 AlphaMissense_hg38.tsv.gz
Specify the paths to your local copies of the variant databases. E.g.:
params.vep_db = '/path/to/vep/grch38/'
params.revel_db = '/path/to/new_tabbed_revel_grch38.tsv.gz'
params.gnomad_db = '/path/to/gnomad.joint.v4.1.sites.chrall.vcf.gz'
params.clinvar_db = '/path/to/clinvar_20240825.vcf.gz'
params.cadd_snv_db = '/path/to/whole_genome_SNVs.tsv.gz'
params.cadd_indel_db = '/path/to/gnomad.genomes.r4.0.indel.tsv.gz'
params.cadd_sv_db = '/path/to/1000G_phase3_SVs.tsv.gz'
params.spliceai_snv_db = '/path/to/spliceai_scores.raw.snv.hg38.vcf.gz'
params.spliceai_indel_db = '/path/to/spliceai_scores.raw.indel.hg38.vcf.gz'
params.alphamissense_db = '/path/to/AlphaMissense_hg38.tsv.gz'
Modify the rest of the nextflow_pipeface_container.config for your specific HPC/job scheduler.
Note: the 'deepvariant_call_variants' and 'deeptrio_call_variants' processes require access to appropriate GPUs
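Before launching, it can help to confirm that the database paths set above point at real files. The sketch below assumes each file-based database is expected to sit next to a tabix index (`.tbi`), as the `tabix` steps and `.tbi` downloads earlier suggest. Whether pipeface itself enforces this is not stated here, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: pre-flight check that database files and their .tbi indexes exist.
# Function names are illustrative, not part of pipeface.

# Succeeds when <db> and <db>.tbi both exist and are non-empty.
check_indexed_db() {
    local db="$1"
    if [ ! -s "$db" ]; then
        echo "missing or empty: $db" >&2
        return 1
    fi
    if [ ! -s "${db}.tbi" ]; then
        echo "missing tabix index: ${db}.tbi" >&2
        return 1
    fi
}

# Checks every path given, reports all problems, and fails if any check failed.
check_all_dbs() {
    local rc=0 db
    for db in "$@"; do
        check_indexed_db "$db" || rc=1
    done
    return "$rc"
}

# Example, using the placeholder paths from the config above:
#   check_all_dbs /path/to/new_tabbed_revel_grch38.tsv.gz \
#                 /path/to/clinvar_20240825.vcf.gz \
#                 /path/to/AlphaMissense_hg38.tsv.gz
```

The VEP cache is a directory rather than an indexed file, so it would need a separate test such as `[ -d /path/to/vep/grch38/ ]`.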
You'll need access to nextflow and singularity. Tested on:
- nextflow version 24.04.5
- singularity version 3.11.3
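To see how your installed tools compare with the tested versions, something like the sketch below could be used. It relies on GNU `sort -V` for version ordering. The helper names `extract_version` and `version_ge` are hypothetical, and the tested versions are a reference point, not a documented minimum.

```shell
#!/usr/bin/env bash
# Sketch only: compare installed tool versions with the versions listed above.
# Helper names are illustrative, not part of pipeface.

# Print the first dotted version number (e.g. 24.04.5) found on stdin.
extract_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n1
}

# version_ge A B: succeed when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example:
#   nf=$(nextflow -version | extract_version)
#   sg=$(singularity --version | extract_version)
#   version_ge "$nf" 24.04.5 || echo "nextflow $nf is older than the tested version" >&2
#   version_ge "$sg" 3.11.3  || echo "singularity $sg is older than the tested version" >&2
```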
For example, to run the pipeline:
nextflow run pipeface.nf -params-file ./config/parameters_pipeface.json -config ./config/nextflow_pipeface_container.config
Please keep in mind that some datasets will require changes to the default resources (particularly memory, disk usage and walltime), for example WGS data with greater than typical (~30x) sequencing depth.