Commit de5ecb0

Release v0.2.0 (#19)
1 parent 84e09ba commit de5ecb0

6 files changed

Lines changed: 446 additions & 334 deletions

README.md

Lines changed: 32 additions & 20 deletions
@@ -1,3 +1,4 @@
1+
12
# Pipeface
23

34
## Overview
@@ -25,11 +26,13 @@ snp_indel_calling{{"SNP/indel variant calling"}}
2526
snp_indel_phasing{{"SNP/indel phasing"}}
2627
snp_indel_annotation{{"SNP/indel annotation (optional - hg38 only)"}}
2728
haplotagging{{"Haplotagging bams"}}
29+
generate_meth_probs{{"Generate site methylation probabilities (pacbio data only)"}}
2830
sv_calling{{"Structural variant calling"}}
2931
3032
input_data-.->merging-.->alignment-.->snp_indel_calling-.->snp_indel_phasing-.->haplotagging-.->sv_calling
3133
alignment-.->depth
3234
alignment-.->haplotagging
35+
haplotagging-.->generate_meth_probs
3336
snp_indel_phasing-.->snp_indel_annotation
3437
3538
```
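The overview flowchart above amounts to a dependency graph between pipeline steps. As a minimal sketch (not part of pipeface, which is implemented in Nextflow), the same edges can be written as a Python mapping and ordered with the standard-library `graphlib` module. The step names are the flowchart node IDs; `PIPELINE_DEPENDENCIES`, `execution_order` and `upstream_steps` are hypothetical names used only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step from the overview flowchart; each value is the set of
# steps it depends on directly (the sources of its incoming arrows).
PIPELINE_DEPENDENCIES = {
    "merging": {"input_data"},
    "alignment": {"merging"},
    "depth": {"alignment"},
    "snp_indel_calling": {"alignment"},
    "snp_indel_phasing": {"snp_indel_calling"},
    "snp_indel_annotation": {"snp_indel_phasing"},
    "haplotagging": {"alignment", "snp_indel_phasing"},
    "generate_meth_probs": {"haplotagging"},  # pacbio data only
    "sv_calling": {"haplotagging"},
}


def execution_order(dependencies):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(dependencies).static_order())


def upstream_steps(step, dependencies):
    """Return every step that must finish before `step` can start."""
    seen = set()
    stack = [step]
    while stack:
        for parent in dependencies.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(execution_order(PIPELINE_DEPENDENCIES))
    print(sorted(upstream_steps("generate_meth_probs", PIPELINE_DEPENDENCIES)))
```

Read this way, the new `generate_meth_probs` step only needs the haplotagged bam, so it sits alongside structural variant calling rather than upstream of it.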
@@ -51,24 +54,24 @@ merging_m1{{"Description: merge runs <br><br> Main tools: GNU coreutils <br><br>
5154
merging_m2{{"Description: merge runs <br><br> Main tools: Samtools <br><br> Commands: samtools merge"}}
5255
5356
alignment_s1{{"Description: alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
54-
alignment_s2{{"Description: alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
55-
alignment_s3{{"Description: bam to fastq conversion, alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
57+
alignment_s2{{"Description: bam to fastq conversion, alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
58+
alignment_s3{{"Description: alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
5659
alignment_s4{{"Description: bam to fastq conversion, alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
5760
58-
depth_s1{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
59-
depth_s2{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
60-
depth_s3{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
61-
depth_s4{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
61+
depth_s1{{"Description: calculate alignment depth <br><br> Main tools: mosdepth <br><br> Commands: mosdepth depth"}}
62+
depth_s2{{"Description: calculate alignment depth <br><br> Main tools: mosdepth <br><br> Commands: mosdepth depth"}}
63+
depth_s3{{"Description: calculate alignment depth <br><br> Main tools: mosdepth <br><br> Commands: mosdepth depth"}}
64+
depth_s4{{"Description: calculate alignment depth <br><br> Main tools: mosdepth <br><br> Commands: mosdepth depth"}}
6265
6366
snp_indel_calling_s1{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
6467
snp_indel_calling_s2{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
6568
snp_indel_calling_s3{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
6669
snp_indel_calling_s4{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
6770
68-
snp_indel_phasing_s1{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
69-
snp_indel_phasing_s2{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
70-
snp_indel_phasing_s3{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
71-
snp_indel_phasing_s4{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
71+
snp_indel_phasing_s1{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase and whatshap stats"}}
72+
snp_indel_phasing_s2{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase and whatshap stats"}}
73+
snp_indel_phasing_s3{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase and whatshap stats"}}
74+
snp_indel_phasing_s4{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase and whatshap stats"}}
7275
7376
snp_indel_annotation_s1{{"Description: SNP/indel annotation (optional - hg38 only) <br><br> Main tools: ensembl-vep <br><br> Commands: vep"}}
7477
snp_indel_annotation_s2{{"Description: SNP/indel annotation (optional - hg38 only) <br><br> Main tools: ensembl-vep <br><br> Commands: vep"}}
@@ -80,18 +83,19 @@ haplotagging_s2{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <
8083
haplotagging_s3{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <br><br> Commands: whatshap haplotag"}}
8184
haplotagging_s4{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <br><br> Commands: whatshap haplotag"}}
8285
86+
generate_meth_probs_s2{{"Description: Generate site methylation probabilities <br><br> Main tools: pb-CpG-tools <br><br> Commands: aligned_bam_to_cpg_scores"}}
87+
8388
sv_calling_s1{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
8489
sv_calling_s2{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
8590
sv_calling_s3{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
8691
sv_calling_s4{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
8792
8893
ont_data_f1-.->merging_m1-.->alignment_s1-.->snp_indel_calling_s1-.->snp_indel_phasing_s1-.->haplotagging_s1-.->sv_calling_s1
8994
ont_data_f2-.->merging_m1
90-
ont_data_f5-.->alignment_s2-.->snp_indel_calling_s2-.->snp_indel_phasing_s2-.->haplotagging_s2-.->sv_calling_s2
91-
ont_data_f6-.->alignment_s3-.->snp_indel_calling_s3-.->snp_indel_phasing_s3-.->haplotagging_s3-.->sv_calling_s3
92-
93-
pacbio_data_f3-.->merging_m2-.->alignment_s4-.->snp_indel_calling_s4-.->snp_indel_phasing_s4-.->haplotagging_s4-.->sv_calling_s4
95+
pacbio_data_f3-.->merging_m2-.->alignment_s2-.->snp_indel_calling_s2-.->snp_indel_phasing_s2-.->haplotagging_s2-.->sv_calling_s2
9496
pacbio_data_f4-.->merging_m2
97+
ont_data_f5-.->alignment_s3-.->snp_indel_calling_s3-.->snp_indel_phasing_s3-.->haplotagging_s3-.->sv_calling_s3
98+
ont_data_f6-.->alignment_s4-.->snp_indel_calling_s4-.->snp_indel_phasing_s4-.->haplotagging_s4-.->sv_calling_s4
9599
96100
alignment_s1-.->depth_s1
97101
alignment_s2-.->depth_s2
@@ -103,6 +107,8 @@ alignment_s2-.->haplotagging_s2
103107
alignment_s3-.->haplotagging_s3
104108
alignment_s4-.->haplotagging_s4
105109
110+
haplotagging_s2-.->generate_meth_probs_s2
111+
106112
snp_indel_phasing_s1-.->snp_indel_annotation_s1
107113
snp_indel_phasing_s2-.->snp_indel_annotation_s2
108114
snp_indel_phasing_s3-.->snp_indel_annotation_s3
@@ -112,8 +118,8 @@ snp_indel_phasing_s4-.->snp_indel_annotation_s4
112118

113119
## Main analyses
114120

115-
- ONT and pacbio HiFi data
116-
- WGS and targeted
121+
- ONT and/or pacbio HiFi data
122+
- WGS and/or targeted
117123
- hg38 or hs1 reference genome
118124

119125
## Main tools
@@ -123,6 +129,8 @@ snp_indel_phasing_s4-.->snp_indel_annotation_s4
123129
- [WhatsHap](https://github.com/whatshap/whatshap)
124130
- [Sniffles2](https://github.com/fritzsedlazeck/Sniffles) and/or [cuteSV](https://github.com/tjiangHIT/cuteSV)
125131
- [Samtools](https://github.com/samtools/samtools)
132+
- [mosdepth](https://github.com/brentp/mosdepth)
133+
- [pb-CpG-tools](https://github.com/PacificBiosciences/pb-CpG-tools)
126134
- [ensembl-vep](https://github.com/Ensembl/ensembl-vep)
127135

128136
## Main input files
@@ -132,7 +140,6 @@ snp_indel_phasing_s4-.->snp_indel_annotation_s4
132140
- ONT/pacbio HiFi FASTQ (gzipped or uncompressed) or unaligned BAM
133141
- Indexed reference genome
134142
- Clair3 models (if running Clair3)
135-
- [DeepVariant GPU 1.6.1 docker container](https://hub.docker.com/layers/google/deepvariant/1.6.1-gpu/images/sha256-7929c55106d3739daa18d52802913c43af4ca2879db29656056f59005d1d46cb?context=explore) pulled via singularity (if running DeepVariant)
136143

137144
### Optional
138145

@@ -142,19 +149,24 @@ snp_indel_phasing_s4-.->snp_indel_annotation_s4
142149
## Main output files
143150

144151
- Aligned, sorted and haplotagged bam
152+
- Depth per chromosome (and per region in the case of targeted sequencing) (optional)
145153
- Clair3 or DeepVariant phased SNP/indel VCF file
146154
- Clair3 or DeepVariant phased and annotated SNP/indel VCF file (optional - hg38 only)
155+
- Bed and bigwig site methylation probabilities for complete read set and separate haplotypes (pacbio only)
147156
- Phased Sniffles2 and/or un-phased cuteSV SV VCF file
148157

158+
> **_Note:_** Running DeepVariant on ONT data assumes r10 data
159+
149160
## Assumptions
150161

151162
- Running pipeline on Australia's [National Computational Infrastructure (NCI)](https://nci.org.au/)
152163
- Access to if89 project on [National Computational Infrastructure (NCI)](https://nci.org.au/)
153164
- Access to xy86 project on [National Computational Infrastructure (NCI)](https://nci.org.au/) (if running variant annotation)
154165
- Access to pipeline dependencies:
155-
- [Nextflow and it's java dependency](https://nf-co.re/docs/usage/installation). Validated to run on:
156-
- Nextflow 24.04.1
157-
- Java 17.0.2
166+
- [Nextflow 24.04.1 and its Java 17.0.2 dependency](https://nf-co.re/docs/usage/installation)
167+
- [DeepVariant GPU 1.6.1 docker container](https://hub.docker.com/layers/google/deepvariant/1.6.1-gpu/images/sha256-7929c55106d3739daa18d52802913c43af4ca2879db29656056f59005d1d46cb?context=explore) pulled via singularity (if running DeepVariant)
168+
- [mosdepth 0.3.9 binary](https://github.com/brentp/mosdepth/releases/tag/v0.3.9) (if running depth calculation)
169+
- [pb-CpG-tools 2.3.2 binary](https://github.com/PacificBiosciences/pb-CpG-tools/releases/tag/v2.3.2) (if processing pacbio data)
158170

159171
*[See the list of software and their versions used by this version of pipeface](./docs/software_versions.txt) as well as the [list of variant databases and their versions](./docs/database_versions.txt) if variant annotation is carried out (assuming the default [nextflow_pipeface.config](./config/nextflow_pipeface.config) file is used).*
160172

config/nextflow_pipeface.config

Lines changed: 12 additions & 12 deletions
@@ -48,12 +48,11 @@ process {
4848
module = 'minimap2/2.28:samtools/1.19'
4949
}
5050

51-
withName: depth {
51+
withName: mosdepth {
5252
queue = 'normal'
5353
cpus = '8'
54-
time = '4h'
54+
time = '2h'
5555
memory = '32GB'
56-
module = 'samtools/1.19'
5756
}
5857

5958
withName: clair3 {
@@ -71,13 +70,13 @@ process {
7170
time = '8h'
7271
memory = '192GB'
7372
disk = '80GB'
74-
module = 'singularity'
73+
module = 'singularity:bcftools/1.12:htslib/1.16'
7574
}
7675

7776
withName: vep_snv {
7877
queue = 'normal'
7978
cpus = '32'
80-
time = '10h'
79+
time = '24h'
8180
memory = '128GB'
8281
module = 'singularity:htslib/1.16:ensemblorg/ensembl-vep/release_112.0'
8382
}
@@ -90,6 +89,13 @@ process {
9089
module = 'whatshap/2.3:htslib/1.16:samtools/1.19'
9190
}
9291

92+
withName: pbcpgtools {
93+
queue = 'normal'
94+
cpus = '48'
95+
time = '2h'
96+
memory = '192GB'
97+
}
98+
9399
withName: sniffles {
94100
queue = 'normal'
95101
cpus = '4'
@@ -106,11 +112,5 @@ process {
106112
module = 'cuteSV/1.0.13:htslib/1.16'
107113
}
108114

109-
withName: 'publish_settings|publish_bam_header|publish_depth|publish_whatshap_phase|publish_whatshap_haplotag|publish_sniffles|publish_cutesv' {
110-
queue = 'normal'
111-
cpus = '1'
112-
time = '20m'
113-
memory = '4GB'
114-
}
115-
116115
}
116+

config/parameters_pipeface.json

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,13 +2,18 @@
22
{
33

44
"in_data": "",
5+
"in_data_format": "",
56
"ref": "",
67
"ref_index": "",
78
"tandem_repeat": "",
89
"snp_indel_caller": "",
910
"sv_caller": "",
1011
"annotate": "",
12+
"calculate_depth": "",
1113
"outdir": "",
12-
"deepvariant_container": ""
14+
"deepvariant_container": "",
15+
"mosdepth_binary": "",
16+
"pbcpgtools_binary": ""
1317

1418
}
19+

docs/run_on_nci.md

Lines changed: 80 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,8 @@
1010
- [ONT](#ont)
1111
- [Pacbio HiFi revio](#pacbio-hifi-revio)
1212
- [DeepVariant container (if running DeepVariant)](#deepvariant-container-if-running-deepvariant)
13+
- [mosdepth binary (if running depth calculation)](#mosdepth-binary-if-running-depth-calculation)
14+
- [pb-CpG-tools binary (if processing pacbio data)](#pb-cpg-tools-binary-if-processing-pacbio-data)
1315
- [3. Modify in\_data.csv](#3-modify-in_datacsv)
1416
- [4. Modify nextflow\_pipeface.config](#4-modify-nextflow_pipefaceconfig)
1517
- [5. Modify parameters\_pipeface.json](#5-modify-parameters_pipefacejson)
@@ -27,10 +29,10 @@ cd pipeface
2729

2830
## 2. Get pipeline inputs
2931

30-
*Please keep in mind that, while hs1 is new, smancy and exciting, hg38 is still the latest GRCh assembly and is better annotated by most projects.*
31-
3232
### Reference genome
3333

34+
> **_Note:_** SNP/indel variant annotation is only available for hg38
35+
3436
#### hg38
3537

3638
Get a copy of the hg38 reference genome
@@ -119,13 +121,33 @@ tar -xvf hifi_revio.tar.gz
119121

120122
### DeepVariant container (if running DeepVariant)
121123

124+
> **_Note:_** Running DeepVariant on ONT data assumes r10 data
125+
122126
Get a local copy of the DeepVariant GPU container v1.6.1 (singularity image file)
123127

124128
```bash
125129
module load singularity
126130
singularity pull deepvariant_1.6.1-gpu.sif docker://google/deepvariant:deeptrio-1.6.1-gpu
127131
```
128132

133+
### mosdepth binary (if running depth calculation)
134+
135+
Get a local copy of the mosdepth v0.3.9 binary
136+
137+
```bash
138+
wget https://github.com/brentp/mosdepth/releases/download/v0.3.9/mosdepth -O mosdepth_0.3.9
139+
chmod +x mosdepth_0.3.9
140+
```
141+
142+
### pb-CpG-tools binary (if processing pacbio data)
143+
144+
Get a local copy of the pb-CpG-tools v2.3.2 binary
145+
146+
```bash
147+
wget https://github.com/PacificBiosciences/pb-CpG-tools/releases/download/v2.3.2/pb-CpG-tools-v2.3.2-x86_64-unknown-linux-gnu.tar.gz
148+
tar -xzf pb-CpG-tools-v2.3.2-x86_64-unknown-linux-gnu.tar.gz
149+
```
150+
129151
## 3. Modify in_data.csv
130152

131153
Specify the sample ID, file path to the data, data type, file path to regions of interest bed file (optional) and file path to clair3 model (if running Clair3) for each dataset to be analysed. Eg:
@@ -171,7 +193,13 @@ Modify access to project specific directories. Eg:
171193
Specify the path to `in_data.csv`. Eg:
172194

173195
```json
174-
"input": "./config/in_data.csv",
196+
"in_data": "./config/in_data.csv",
197+
```
198+
199+
Specify the input data format ('ubam_fastq'). Eg:
200+
201+
```json
202+
"in_data_format": "ubam_fastq",
175203
```
176204

177205
Specify the path to the reference genome and its index. Eg:
@@ -188,7 +216,7 @@ Specify the path to the reference genome and its index. Eg:
188216
"ref_index": "./hs1.fa.fai",
189217
```
190218

191-
Optionally specify the path to the tandem repeat bed file. Eg:
219+
Optionally specify the path to the tandem repeat bed file. Set to 'NONE' if not required. Eg:
192220

193221
```json
194222
"tandem_repeat": "./*.trf.bed",
@@ -202,6 +230,7 @@ Optionally specify the path to the tandem repeat bed file. Eg:
202230

203231
Specify the SNP/indel caller to use ('clair3' or 'deepvariant'). Eg:
204232

233+
205234
```json
206235
"snp_indel_caller": "clair3",
207236
```
@@ -212,6 +241,8 @@ Specify the SNP/indel caller to use ('clair3' or 'deepvariant'). Eg:
212241
"snp_indel_caller": "deepvariant",
213242
```
214243

244+
> **_Note:_** Running DeepVariant on ONT data assumes r10 data
245+
215246
Specify the SV caller to use ('sniffles', 'cutesv' or 'both'). Eg:
216247

217248
```json
@@ -242,18 +273,62 @@ Specify whether variant annotation should be carried out ('yes' or 'no'). Eg:
242273
"annotate": "yes",
243274
```
244275

276+
> **_Note:_** SNP/indel variant annotation is only available for hg38
277+
278+
Specify whether alignment depth should be calculated ('yes' or 'no'). Eg:
279+
280+
```json
281+
"calculate_depth": "no",
282+
```
283+
284+
*OR*
285+
286+
```json
287+
"calculate_depth": "yes",
288+
```
289+
245290
Specify the directory in which to write the pipeline outputs (please provide a full path). Eg:
246291

247292
```json
248293
"outdir": "/g/data/ox63/results"
249294
```
250295

251-
Specify the path to DeepVariant GPU container v1.6.1 (singularity image file). Eg:
296+
Specify the path to the DeepVariant GPU container v1.6.1 (singularity image file) (if running DeepVariant). Eg:
252297

253298
```json
254299
"deepvariant_container": "./deepvariant_1.6.1-gpu.sif"
255300
```
256301

302+
*OR*
303+
304+
```json
305+
"deepvariant_container": "NONE"
306+
```
307+
308+
Specify the path to the mosdepth binary (if running depth calculation). Eg:
309+
310+
```json
311+
"mosdepth_binary": "./mosdepth_0.3.9"
312+
```
313+
314+
*OR*
315+
316+
```json
317+
"mosdepth_binary": "NONE"
318+
```
319+
320+
Specify the path to the pb-CpG-tools binary (if processing pacbio data). Eg:
321+
322+
```json
323+
"pbcpgtools_binary": "./pb-CpG-tools-v2.3.2-x86_64-unknown-linux-gnu/"
324+
```
325+
326+
*OR*
327+
328+
```json
329+
"pbcpgtools_binary": "NONE"
330+
```
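The settings described in this section depend on one another: a DeepVariant container path is only needed when `snp_indel_caller` is `deepvariant`, and a mosdepth binary path only when `calculate_depth` is `yes`. The following is a minimal sketch of a pre-flight check written under those assumptions. It is not the pipeline's own validation, and `check_parameters` is a hypothetical helper. The allowed values are limited to those shown in this document. The pb-CpG-tools path is not checked because the data type lives in `in_data.csv`, not in the parameters file.

```python
# Keys present in config/parameters_pipeface.json as of this release.
EXPECTED_KEYS = (
    "in_data", "in_data_format", "ref", "ref_index", "tandem_repeat",
    "snp_indel_caller", "sv_caller", "annotate", "calculate_depth",
    "outdir", "deepvariant_container", "mosdepth_binary", "pbcpgtools_binary",
)

# Values documented in this guide; the pipeline may accept others.
ALLOWED_VALUES = {
    "in_data_format": {"ubam_fastq"},
    "snp_indel_caller": {"clair3", "deepvariant"},
    "sv_caller": {"sniffles", "cutesv", "both"},
    "annotate": {"yes", "no"},
    "calculate_depth": {"yes", "no"},
}

# (switch key, value that triggers the requirement, path key that must be set)
CONDITIONAL_PATHS = (
    ("snp_indel_caller", "deepvariant", "deepvariant_container"),
    ("calculate_depth", "yes", "mosdepth_binary"),
)


def check_parameters(params):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in params:
            problems.append(f"missing key: {key}")
    for key, allowed in ALLOWED_VALUES.items():
        if key in params and params[key] not in allowed:
            problems.append(
                f"{key} must be one of {sorted(allowed)}, got {params[key]!r}"
            )
    for switch, trigger, path_key in CONDITIONAL_PATHS:
        # 'NONE' (or an empty string) means "not provided" in this guide.
        if params.get(switch) == trigger and params.get(path_key, "") in ("", "NONE"):
            problems.append(f"{path_key} must be a path when {switch} is {trigger!r}")
    return problems
```

To use it, load the file with `json.load` and pass the resulting dictionary to `check_parameters` before submitting the run; any returned messages point at settings worth revisiting.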
331+
257332
## 6. Get pipeline dependencies
258333

259334
You may use the centrally installed nextflow environment module available on NCI to access the nextflow and java dependencies

docs/software_versions.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,10 @@
11
Samtools: 1.19
22
Minimap2: 2.28-r1209
3+
mosdepth: 0.3.9
34
Clair3: 1.0.9
45
DeepVariant: 1.6.1
56
WhatsHap: 2.3
7+
pb-CpG-tools: 2.3.2
68
Sniffles2: 2.3.3
79
cuteSV: 1.0.13
810
GNU coreutils: 8.30
