Commit 12d8e9b

leahkemp and kisarur authored
Release v0.2.1 (#21)
Co-authored-by: Kisaru Liyanage <kisarur@gmail.com>
1 parent de5ecb0 commit 12d8e9b

5 files changed

Lines changed: 127 additions & 62 deletions

README.md

Lines changed: 1 addition & 2 deletions
@@ -164,7 +164,6 @@ snp_indel_phasing_s4-.->snp_indel_annotation_s4
 - Access to xy86 project on [National Computational Infrastructure (NCI)](https://nci.org.au/) (if running variant annotation)
 - Access to pipeline dependencies:
 - [Nextflow 24.04.1 and it's Java 17.0.2 dependency](https://nf-co.re/docs/usage/installation)
-- [DeepVariant GPU 1.6.1 docker container](https://hub.docker.com/layers/google/deepvariant/1.6.1-gpu/images/sha256-7929c55106d3739daa18d52802913c43af4ca2879db29656056f59005d1d46cb?context=explore) pulled via singularity (if running DeepVariant)
 - [mosdepth 0.3.9 binary](https://github.com/brentp/mosdepth/releases/tag/v0.3.9) (if running depth calculation)
 - [pb-CpG-tools 2.3.2 binary](https://github.com/PacificBiosciences/pb-CpG-tools/releases/tag/v2.3.2) (if processing pacbio data)

@@ -176,5 +175,5 @@ See a walkthrough for how to [run pipeface on NCI](./docs/run_on_nci.md).
 ## Credit
 
-This is a highly collaborative project, with many contributions from the [Genomic Technologies Lab](https://www.garvan.org.au/research/labs-groups/genomic-technologies-lab). Notably, Dr Andre Reis and Dr Ira Deveson are closely involved in the development of this pipeline. The installation and hosting of software used in this pipeline has and continues to be supported by the [Australian BioCommons Tools and Workflows project (if89)](https://australianbiocommons.github.io/ables/if89/).
+This is a highly collaborative project, with many contributions from the [Genomic Technologies Lab](https://www.garvan.org.au/research/labs-groups/genomic-technologies-lab). Notably, Dr Andre Reis and Dr Ira Deveson are closely involved in the development of this pipeline. Optimisations involving DeepVariant have been contributed by Dr Kisaru Liyanage and Dr Matthew Downton from the [National Computational Infrastructure](https://nci.org.au), with support from Australian BioCommons as part of the Workflow Commons project. The installation and hosting of software used in this pipeline has and continues to be supported by the [Australian BioCommons Tools and Workflows project (if89)](https://australianbiocommons.github.io/ables/if89/).

config/nextflow_pipeface.config

Lines changed: 33 additions & 7 deletions
@@ -63,14 +63,40 @@ process {
 module = 'clair3/v1.0.9'
 }
 
-withName: deepvariant {
+withName: deepvariant_dry_run {
+queue = 'normal'
+cpus = '1'
+time = '20m'
+memory = '4GB'
+module = 'deepvariant-gpu/1.6.1'
+}
+
+withName: deepvariant_make_examples {
+queue = 'normalsr'
+cpus = '104'
+memory = '500.GB'
+time = '2.h'
+disk = '10GB'
+module = 'parallel:deepvariant-gpu/1.6.1'
+}
+
+withName: deepvariant_call_variants {
 queue = 'gpuvolta'
-cpus = '24'
-gpus = '2'
-time = '8h'
-memory = '192GB'
-disk = '80GB'
-module = 'singularity:bcftools/1.12:htslib/1.16'
+cpus = '12'
+gpus = '1'
+memory = '96.GB'
+time = '2.h'
+disk = '10GB'
+module = 'deepvariant-gpu/1.6.1'
+}
+
+withName: deepvariant_post_processing {
+queue = 'normalbw'
+cpus = '14'
+memory = '128.GB'
+time = '1.h'
+disk = '10GB'
+module = 'deepvariant-gpu/1.6.1:bcftools/1.12:htslib/1.16'
 }
 
 withName: vep_snv {

config/parameters_pipeface.json

Lines changed: 0 additions & 1 deletion
@@ -11,7 +11,6 @@
 "annotate": "",
 "calculate_depth": "",
 "outdir": "",
-"deepvariant_container": "",
 "mosdepth_binary": "",
 "pbcpgtools_binary": ""
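
For anyone carrying an existing parameter file forward, the `deepvariant_container` entry dropped here can presumably be deleted, since this commit also removes the workflow code that read it. A minimal shell sketch, assuming a hypothetical scratch copy of `parameters_pipeface.json` with made-up values and one key per line, as in the template:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch copy of a pre-v0.2.1 parameter file (values are made up).
workdir=$(mktemp -d)
params="$workdir/parameters_pipeface.json"
cat > "$params" <<'EOF'
{
    "outdir": "/g/data/ox63/results",
    "deepvariant_container": "NONE",
    "mosdepth_binary": "NONE",
    "pbcpgtools_binary": "NONE"
}
EOF

# Drop the line holding the retired key. This relies on one key per line and
# on the key not being the last entry (no trailing-comma fix-up is attempted).
grep -v '"deepvariant_container"' "$params" > "$params.new"
mv "$params.new" "$params"

cat "$params"
```

Filtering by line keeps the sketch free of extra tooling; a JSON-aware tool would be the safer choice for files that do not follow the template's layout.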

docs/run_on_nci.md

Lines changed: 2 additions & 25 deletions
@@ -9,7 +9,6 @@
 - [Clair3 models (if running clair3)](#clair3-models-if-running-clair3)
 - [ONT](#ont)
 - [Pacbio HiFi revio](#pacbio-hifi-revio)
-- [DeepVariant container (if running DeepVariant)](#deepvariant-container-if-running-deepvariant)
 - [mosdepth binary (if running depth calculation)](#mosdepth-binary-if-running-depth-calculation)
 - [pb-CpG-tools binary (if processing pacbio data)](#pb-cpg-tools-binary-if-processing-pacbio-data)
 - [3. Modify in\_data.csv](#3-modify-in_datacsv)
@@ -119,17 +118,6 @@ Untar
 tar -xvf hifi_revio.tar.gz
 ```
 
-### DeepVariant container (if running DeepVariant)
-
-> **_Note:_** Running DeepVariant on ONT data assumes r10 data
-
-Get a local copy of the DeepVariant GPU container v1.6.1 (singularity image file)
-
-```bash
-module load singularity
-singularity pull deepvariant_1.6.1-gpu.sif docker://google/deepvariant:deeptrio-1.6.1-gpu
-```
-
 ### mosdepth binary (if running depth calculation)
 
 Get a local copy of the mosdepth v0.3.9 binary
@@ -186,7 +174,8 @@ Modify access to project specific directories. Eg:
 ```
 
 > **_Note:_** Don't remove access to if89 gdata (`gdata/if89`). This is required to access environmental modules used in the pipeline
-> **_Note:_** Similarly, don't remove access to xy86 gdata (`gdata/xy86`) if running variant annotation. This is required to access variant annotation databases used in the pipeline
+
+> **_Note:_** Don't remove access to xy86 gdata (`gdata/xy86`) if running variant annotation. This is required to access variant annotation databases used in the pipeline
 
 ## 5. Modify parameters_pipeface.json

@@ -293,18 +282,6 @@ Specify the directory in which to write the pipeline outputs (please provide a f
 "outdir": "/g/data/ox63/results"
 ```
 
-Specify the path to the DeepVariant GPU container v1.6.1 (singularity image file) (if running DeepVariant). Eg:
-
-```json
-"deepvariant_container": "./deepvariant_1.6.1-gpu.sif"
-```
-
-*OR*
-
-```json
-"deepvariant_container": "NONE"
-```
-
 Specify the path to the mosdepth binary (if running depth calculation). Eg:
 
 ```json

pipeface.nf

Lines changed: 91 additions & 27 deletions
@@ -327,16 +327,15 @@ process clair3 {
 }
 
-process deepvariant {
+process deepvariant_dry_run {
 
 input:
 tuple val(sample_id), val(extension), path(bam), val(data_type), val(regions_of_interest), val(clair3_model)
 val ref
 val ref_index
-val deepvariant_container
 
 output:
-tuple val(sample_id), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), path('sorted.bam'), path('sorted.bam.bai'), path('snp_indel.vcf.gz'), path('snp_indel.vcf.gz.tbi')
+tuple val(sample_id), path('sorted.bam'), path('sorted.bam.bai'), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), env(make_examples_args), env(call_variants_args)
 
 script:
 // conditionally define model type
@@ -346,27 +345,105 @@ process deepvariant {
 else if ( data_type == 'pacbio' ) {
 model = 'PACBIO'
 }
-// define an optional string to pass regions of interest bed file
-def regions_of_interest_optional = file(regions_of_interest).name != 'NONE' ? "--regions $regions_of_interest" : ''
 """
 # stage bam and bam index
 # do this here instead of input tuple so I can handle processing an aligned bam as an input file without requiring a bam index for ubam input
 bam_loc=\$(realpath ${bam})
 ln -sf \${bam_loc} sorted.bam
 ln -sf \${bam_loc}.bai .
 ln -sf \${bam_loc}.bai sorted.bam.bai
-# run deepvariant
-singularity run $deepvariant_container run_deepvariant \
+
+# do a dry-run of deepvariant
+run_deepvariant \
 --reads=$bam \
 --ref=$ref \
 --sample_name=$sample_id \
 --output_vcf=snp_indel.raw.vcf.gz \
 --model_type=$model \
-$regions_of_interest_optional \
---num_shards=${task.cpus} \
---postprocess_cpus=${task.cpus}
+--dry_run=true > commands.txt
+
+# extract arguments for make_examples and call_variants stages
+make_examples_args=\$(grep "/opt/deepvariant/bin/make_examples" commands.txt | awk '{split(\$0, arr, "--add_hp_channel"); print "--add_hp_channel" arr[2]}' | sed 's/--sample_name "[^"]*"//g')
+call_variants_args=\$(grep "/opt/deepvariant/bin/call_variants" commands.txt | awk '{split(\$0, arr, "--checkpoint"); print "--checkpoint" arr[2]}')
+"""
+
+stub:
+"""
+make_examples_args=""
+call_variants_args=""
+touch sorted.bam
+touch sorted.bam.bai
+"""
+
+}
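
The argument-harvesting step in `deepvariant_dry_run` can be tried outside Nextflow. The sketch below applies the same `grep`/`awk`/`sed` chain to a stand-in `commands.txt`. The two command lines and their flag values are illustrative assumptions, not captured `run_deepvariant --dry_run` output, and the `\$` escapes needed inside a Nextflow script block are dropped.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the dry-run output (hypothetical paths and flag values).
cat > commands.txt <<'EOF'
time seq 0 3 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "ref.fa" --reads "sorted.bam" --examples "/tmp/make_examples.tfrecord@4.gz" --add_hp_channel --alt_aligned_pileup "diff_channels" --sample_name "sample1" --task {}
time /opt/deepvariant/bin/call_variants --outfile "/tmp/call_variants_output.tfrecord.gz" --examples "/tmp/make_examples.tfrecord@4.gz" --checkpoint "/opt/models/pacbio"
EOF

# Keep everything from --add_hp_channel onwards, then drop --sample_name,
# which the make_examples process passes explicitly.
make_examples_args=$(grep "/opt/deepvariant/bin/make_examples" commands.txt \
    | awk '{split($0, arr, "--add_hp_channel"); print "--add_hp_channel" arr[2]}' \
    | sed 's/--sample_name "[^"]*"//g')

# Keep everything from --checkpoint onwards.
call_variants_args=$(grep "/opt/deepvariant/bin/call_variants" commands.txt \
    | awk '{split($0, arr, "--checkpoint"); print "--checkpoint" arr[2]}')

printf 'make_examples_args=%s\n' "$make_examples_args"
printf 'call_variants_args=%s\n' "$call_variants_args"
```

One property worth noting: if a command line does not contain `--add_hp_channel`, `arr[2]` is empty and the chain emits the bare flag on its own, so the extraction depends on that flag being present for the selected model type.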
+
+process deepvariant_make_examples {
+
+input:
+tuple val(sample_id), path(bam), path(bam_index), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), val(make_examples_args), val(call_variants_args)
+val ref
+val ref_index
+
+output:
+tuple val(sample_id), path(bam), path(bam_index), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), val(call_variants_args), path('*.gz{,.example_info.json}')
+
+script:
+// define an optional string to pass regions of interest bed file
+def regions_of_interest_optional = file(regions_of_interest).name != 'NONE' ? "--regions $regions_of_interest" : ''
+"""
+seq 0 ${task.cpus - 1} | parallel -q --halt 2 --line-buffer make_examples \\
+--mode calling --ref "${ref}" --reads "${bam}" --sample_name "${sample_id}" ${regions_of_interest_optional} --examples "make_examples.tfrecord@${task.cpus}.gz" ${make_examples_args}
+"""
+
+stub:
+"""
+touch make_examples.tfrecord-00000-of-00104.gz
+touch make_examples.tfrecord-00000-of-00104.gz.example_info.json
+"""
+
+}
+
+process deepvariant_call_variants {
+
+input:
+tuple val(sample_id), path(bam), path(bam_index), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), val(call_variants_args), path(make_examples_out)
+
+output:
+tuple val(sample_id), path(bam), path(bam_index), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), val(call_variants_args), path('*.gz')
+
+script:
+def matcher = make_examples_out[0].baseName =~ /^(.+)-\d{5}-of-(\d{5})$/
+def num_shards = matcher[0][2] as int
+"""
+call_variants --outfile "call_variants_output.tfrecord.gz" --examples "make_examples.tfrecord@${num_shards}.gz" ${call_variants_args}
+"""
+
+stub:
+"""
+touch call_variants_output-00000-of-00016.tfrecord.gz
+"""
+
+}
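
`deepvariant_call_variants` recovers the shard count from the first staged file name rather than receiving it through the channel. A shell sketch of the same pattern match, using the file name from the `deepvariant_make_examples` stub; the variable names are illustrative, and stripping `.gz` stands in for Groovy's `baseName`:

```shell
#!/usr/bin/env bash
set -euo pipefail

shard_file="make_examples.tfrecord-00000-of-00104.gz"
base="${shard_file%.gz}"

# Same shape as the Groovy matcher: <prefix>-NNNNN-of-NNNNN
pattern='^(.+)-[0-9]{5}-of-([0-9]{5})$'
if [[ "$base" =~ $pattern ]]; then
    prefix="${BASH_REMATCH[1]}"
    # Force base 10 so the zero-padded count is not read as octal.
    num_shards=$((10#${BASH_REMATCH[2]}))
else
    echo "unexpected shard file name: $shard_file" >&2
    exit 1
fi

echo "${prefix}@${num_shards}.gz"
```

The match is anchored at the end of the name, so a `.gz.example_info.json` companion file would not match once only its last extension is removed. Both the sketch and, as far as can be told from the diff, the matcher assume the first file inspected is a `.gz` shard.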
+
+process deepvariant_post_processing {
+
+input:
+tuple val(sample_id), path(bam), path(bam_index), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), val(call_variants_args), path(call_variants_out)
+val ref
+val ref_index
+
+output:
+tuple val(sample_id), val(extension), val(data_type), val(regions_of_interest), val(clair3_model), path(bam), path(bam_index), path('snp_indel.vcf.gz'), path('snp_indel.vcf.gz.tbi')
+
+script:
+"""
+# postprocess_variants and vcf_stats_report stages in deepvariant
+postprocess_variants --ref "${ref}" --infile "call_variants_output.tfrecord.gz" --outfile "snp_indel.raw.vcf.gz" --cpus "${task.cpus}" --sample_name "${sample_id}"
+vcf_stats_report --input_vcf "snp_indel.raw.vcf.gz" --outfile_base "snp_indel.raw"
+
 # filter out refcall variants
 bcftools view -f 'PASS' snp_indel.raw.vcf.gz -o snp_indel.vcf.gz
+
 # index vcf
 tabix snp_indel.vcf.gz
 """
@@ -705,7 +782,6 @@ workflow {
 calculate_depth = "$params.calculate_depth"
 outdir = "$params.outdir"
 outdir2 = "$params.outdir2"
-deepvariant_container = "$params.deepvariant_container"
 mosdepth_binary = "$params.mosdepth_binary"
 pbcpgtools_binary = "$params.pbcpgtools_binary"
 vep_db = "$params.vep_db"
@@ -767,9 +843,6 @@ workflow {
 if ( in_data_format != 'snv_vcf' && snp_indel_caller != 'clair3' && snp_indel_caller != 'deepvariant' ) {
 exit 1, "Error: SNP/indel calling software should be either 'clair3' or 'deepvariant', '${snp_indel_caller}' selected."
 }
-if ( in_data_format != 'snv_vcf' && snp_indel_caller == 'deepvariant' && deepvariant_container == 'NONE' ) {
-exit 1, "Error: When DeepVariant is selected as the SNP/indel calling software, provide a path to an appropriate DeepVariant container in the parameter file or pass to --deepvariant_container on the command line rather than setting it to 'NONE'."
-}
 if ( !sv_caller ) {
 exit 1, "Error: No SV calling software selected. Either include in parameter file or pass to --sv_caller on the command line. Should be 'sniffles', 'cutesv', or 'both'."
 }
@@ -830,15 +903,6 @@ workflow {
 if ( !outdir ) {
 exit 1, "Error: No output directory provided. Either include in parameter file or pass to --outdir on the command line."
 }
-if ( !deepvariant_container ) {
-exit 1, "Error: No DeepVariant container provided. Either include in parameter file or pass to --deepvariant_container on the command line. Set to 'NONE' if not running DeepVariant."
-}
-if ( in_data_format == 'snv_vcf' && deepvariant_container != 'NONE' && snp_indel_caller == 'deepvariant' ) {
-exit 1, "Error: When the input data format is 'snv_vcf', please set the DeepVariant container (deepvariant_container) to 'NONE'."
-}
-if ( in_data_format != 'snv_vcf' && deepvariant_container != 'NONE' && snp_indel_caller != 'deepvariant' ) {
-exit 1, "Error: Pass 'NONE' to 'deepvariant_container' when DeepVariant is NOT selected as the SNP/indel calling software, '${deepvariant_container}' and '${snp_indel_caller}' respectively provided'."
-}
 if ( !mosdepth_binary ) {
 exit 1, "Error: No mosdepth binary provided. Either include in parameter file or pass to --mosdepth_binary on the command line. Set to 'NONE' if not running depth calculation."
 }
@@ -866,9 +930,6 @@ workflow {
 if ( !file(tandem_repeat).exists() ) {
 exit 1, "Error: Tandem repeat bed file path does not exist, '${tandem_repeat}' provided."
 }
-if ( !file(deepvariant_container).exists() ) {
-exit 1, "Error: DeepVariant container file path does not exist, '${deepvariant_container}' provided."
-}
 if ( !file(mosdepth_binary).exists() ) {
 exit 1, "Error: mosdepth binary file path does not exist, '${mosdepth_binary}' provided."
 }
@@ -979,7 +1040,10 @@ workflow {
 snp_indel_vcf_bam = clair3(bam, ref, ref_index)
 }
 else if ( snp_indel_caller == 'deepvariant' ) {
-snp_indel_vcf_bam = deepvariant(bam, ref, ref_index, deepvariant_container)
+deepvariant_dry_run(bam, ref, ref_index)
+deepvariant_make_examples(deepvariant_dry_run.out, ref, ref_index)
+deepvariant_call_variants(deepvariant_make_examples.out)
+snp_indel_vcf_bam = deepvariant_post_processing(deepvariant_call_variants.out, ref, ref_index)
 }
 // phasing
 (snp_indel_phased_vcf_bam, snp_indel_phased_vcf, phased_read_list) = whatshap_phase(snp_indel_vcf_bam, ref, ref_index, outdir, outdir2, ref_name, snp_indel_caller)
