
Commit 84da5c0

Merge pull request #819 from nf-core/release300
Sync dev with fixes in release 300 branch
2 parents 37e1f7c + 83dffc7 commit 84da5c0

35 files changed

Lines changed: 1015 additions & 333 deletions

CHANGELOG.md

Lines changed: 16 additions & 6 deletions
@@ -28,9 +28,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Add peddy --sites hg38 argument when running with GRCh38 [#799](https://github.com/nf-core/raredisease/pull/799)
 - Saltshaker for downstream processing of mitochondrial SV calls from MitoSAlt [#775](https://github.com/nf-core/raredisease/pull/775)
 - Env variable NXF_SINGULARITY_NEW_PID_NAMESPACE = false to accommodate hisat2 running with latest Nextflow and Singularity [#775](https://github.com/nf-core/raredisease/pull/775)
+- Parameter `exclude_alt` to filter alignments to alt/unplaced contigs after alignment using samtools view, retaining only primary chromosomes (GRCh37: 1-22,X,Y,MT / GRCh38: chr1-chr22,chrX,chrY,chrM). Note that enabling this will restrict variant calling to these chromosomes [#803](https://github.com/nf-core/raredisease/pull/803)
+- Parameters `save_all_mapped_as_cram` and `save_noalt_mapped_as_cram` to replace `save_mapped_as_cram`, allowing independent control over publishing unfiltered and alt-filtered alignment files as CRAM [#807](https://github.com/nf-core/raredisease/pull/807)
 
 ### `Changed`
 
+- Sort parameters of `CALL_STRUCTURAL_VARIANTS` and `CALL_SV_MANTA` alphabetically [[#](https://github.com/nf-core/raredisease/pull/)]
 - Use distinct output filenames for bcftools (in call_mobile_elements subworkflow) and svdb (in call_sv_tiddit subworkflow) [#716](https://github.com/nf-core/raredisease/pull/716)
 - Use nf-core's most severe consequence & pli scripts instead of local ones [#732](https://github.com/nf-core/raredisease/pull/732)
 - Use nf-core's VCF_FILTER_BCFTOOLS_ENSEMBLVEP subworkflow to generate clinical set instead of a local subworkflow [#727](https://github.com/nf-core/raredisease/pull/727)
@@ -56,6 +59,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Run MitoSAlt.pl from bin rather than within container [#775](https://github.com/nf-core/raredisease/pull/775)
 - Include mitochondrial SV calls in combined SV vcf, change call_sv output directory structure to remove mitochondria/ and genome/ [#775](https://github.com/nf-core/raredisease/pull/775)
 - Remove Qualimap and Haplogrep3 as they were made redundant by Picard and VerifyBamID2 [#801](https://github.com/nf-core/raredisease/pull/801)
+- Remove env variable NXF_SINGULARITY_NEW_PID_NAMESPACE from the config since this has to be set outside the subworkflow [#804](https://github.com/nf-core/raredisease/pull/804)
+- Run UPD_SITES, UPD_REGIONS, and CHROMOGRAPH for UPD only when analysis type is WGS [#806](https://github.com/nf-core/raredisease/pull/806)
+- Change saltshaker classification output from txt to html [#808](https://github.com/nf-core/raredisease/pull/808)
 
 ### `Fixed`
 
@@ -67,12 +73,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Parameters
 
-| Old parameter   | New parameter           |
-| --------------- | ----------------------- |
-|                 | sambamba_regions        |
-| bwa_as_fallback |                         |
-|                 | multiqc_samples         |
-|                 | homoplasmy_af_threshold |
+| Old parameter       | New parameter             |
+| ------------------- | ------------------------- |
+|                     | sambamba_regions          |
+| bwa_as_fallback     |                           |
+|                     | multiqc_samples           |
+|                     | homoplasmy_af_threshold   |
+|                     | exclude_alt               |
+| save_mapped_as_cram |                           |
+|                     | save_all_mapped_as_cram   |
+|                     | save_noalt_mapped_as_cram |
 
 ### Tool updates

conf/modules/align.config

Lines changed: 17 additions & 2 deletions
@@ -21,8 +21,23 @@ process{
         ].join(' ').trim() }
     }
 
-    withName: '.*ALIGN:SAMTOOLS_VIEW' {
+    withName: '.*ALIGN:CONVERTTOCRAM_ALTFILTERED' {
         ext.args = { '--output-fmt cram' }
-        ext.prefix = { "${meta.id}_sort_md" }
+        ext.prefix = { "${meta.id}_sorted_md_primary_contigs" }
+    }
+
+    withName: '.*ALIGN:CONVERTTOCRAM_UNFILTERED' {
+        ext.args = { '--output-fmt cram' }
+        ext.prefix = { "${meta.id}_sorted_md" }
+    }
+
+    withName: '.*ALIGN:SAMTOOLS_VIEW_EXCLUDE_ALT' {
+        ext.args = { '--fetch-pairs' }
+        ext.args2 = {
+            params.genome == 'GRCh38'
+                ? ((1..22).collect { n -> "chr${n}" } + ['chrX', 'chrY', 'chrM']).join(' ')
+                : ((1..22).collect { n -> "${n}" } + ['X', 'Y', 'MT' ]).join(' ')
+        }
+        ext.prefix = { "${meta.id}_sorted_md_primary_contigs" }
     }
 }
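
For reference, the contig list built by the `ext.args2` closure in the hunk above can be reproduced outside Nextflow. This is a minimal Python sketch, not pipeline code: the function name `primary_contigs` is hypothetical and only mirrors the Groovy expression shown in the diff.

```python
def primary_contigs(genome: str) -> str:
    """Return the space-separated primary contig names for a genome build.

    Mirrors the Groovy closure in the diff: 'GRCh38' gets 'chr'-prefixed
    names ending in 'chrM'; any other value falls through to the
    unprefixed GRCh37-style names ending in 'MT'.
    """
    if genome == "GRCh38":
        names = [f"chr{n}" for n in range(1, 23)] + ["chrX", "chrY", "chrM"]
    else:
        names = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]
    return " ".join(names)


if __name__ == "__main__":
    # Print both variants so they can be compared with the changelog entry.
    print(primary_contigs("GRCh38"))
    print(primary_contigs("GRCh37"))
```

As in the closure, the fallback branch applies to every genome value other than `GRCh38`, so a reference with a different naming scheme would silently receive the GRCh37-style names.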

conf/modules/call_snv_deepvariant.config

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ process {
 
     withName: '.*CALL_SNV_DEEPVARIANT:DEEPVARIANT' {
         ext.args = { [
+            "--vcf_stats_report=true",
             "--model_type=${params.analysis_type.toUpperCase()}",
             meta.sex == 1 ? params.genome == 'GRCh37' ? '--haploid_contigs="X,Y"' : '--haploid_contigs="chrX,chrY"' : ''
         ].join(' ') }
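
To make the nested ternary in the hunk above easier to follow, here is a small Python sketch of the same argument assembly. It is an illustration under stated assumptions, not the pipeline's code: the function name `deepvariant_args` is hypothetical, and `sex == 1` is treated as male because the cytosure config in this commit maps `meta.sex == 1` to `--sex male`.

```python
def deepvariant_args(analysis_type: str, sex: int, genome: str) -> str:
    """Assemble the argument string the way the Groovy closure does."""
    if sex == 1:
        # Inner ternary: contig naming depends on the genome build.
        if genome == "GRCh37":
            haploid = '--haploid_contigs="X,Y"'
        else:
            haploid = '--haploid_contigs="chrX,chrY"'
    else:
        # Outer ternary's else branch: an empty string is still joined.
        haploid = ""
    parts = [
        "--vcf_stats_report=true",
        f"--model_type={analysis_type.upper()}",
        haploid,
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(repr(deepvariant_args("wgs", 1, "GRCh38")))
    print(repr(deepvariant_args("wgs", 2, "GRCh38")))
```

Because the empty string is still an element of the list, the joined result ends with a trailing space whenever `sex` is not 1. The closure shown in the diff does not call `.trim()`, so the same holds there.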

conf/modules/generate_cytosure_files.config

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,8 @@ process {
         ext.args = { [
             meta.sex == 1 ? '--sex male' : '--sex female',
             '--size 5000',
-            '--maxbnd 5000'
+            '--maxbnd 5000',
+            params.genome.equals("GRCh38") ? "--genome 38" : ""
         ].join(' ') }
         ext.prefix = { "${meta.custid}" ? "${meta.custid}" : "${meta.id}" }
     }

docs/output.md

Lines changed: 9 additions & 6 deletions
@@ -60,7 +60,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [Mitochondrial analysis](#mitochondrial-analysis)
 - [Alignment and variant calling](#alignment-and-variant-calling)
 - [MT deletion script](#mt-deletion-script)
-- [eKLIPse](#eklipse)
+- [MitoSAlt](#mitosalt)
+- [saltshaker](#saltshaker)
 - [Annotation](#annotation)
 - [vcfanno](#vcfanno-1)
 - [CADD](#cadd-1)
@@ -100,26 +101,28 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 
 ##### Picard's MarkDuplicates
 
-[Picard MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) is used for marking PCR duplicates that can occur during library amplification. This is essential as the presence of such duplicates results in false inflated coverages, which in turn can lead to overly-confident genotyping calls during variant calling. Only reads aligned by Bwa-mem2, bwameme and bwa are processed by this tool. By default, alignment files are published in bam format. If you would like to store cram files instead, set `--save_mapped_as_cram` to true.
+[Picard MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) is used for marking PCR duplicates that can occur during library amplification. This is essential as the presence of such duplicates results in false inflated coverages, which in turn can lead to overly-confident genotyping calls during variant calling. Only reads aligned by Bwa-mem2, bwameme and bwa are processed by this tool. By default, alignment files are published in bam format. To publish cram files instead, use `--save_all_mapped_as_cram` for the full (unfiltered) alignment, or `--save_noalt_mapped_as_cram` for the alt-filtered alignment (requires `--exclude_alt`).
 
 <details markdown="1">
 <summary>Output files from Alignment</summary>
 
 - `{outputdir}/alignment/`
-  - `*.bam|*.cram`: Alignment file in bam/cram format.
+  - `*_sorted_md.bam|*_sorted_md.cram`: Full (unfiltered) alignment file. Published as bam by default, or as cram when `--save_all_mapped_as_cram` is set.
+  - `*_sorted_md_primary_contigs.cram`: Alt-filtered alignment file in cram format. Published when `--save_noalt_mapped_as_cram` is set (requires `--exclude_alt`). Contains only primary chromosomes (GRCh37: 1-22,X,Y,MT / GRCh38: chr1-chr22,chrX,chrY,chrM).
   - `*.bai|*.crai`: Index of the corresponding bam/cram file.
   - `*.txt`: Text file containing the dedup metrics.
 </details>
 
 ##### Sentieon Dedup
 
-[Sentieon Dedup](https://support.sentieon.com/manual/DNAseq_usage/dnaseq/#remove-or-mark-duplicates) is the algorithm used by Sentieon's driver to remove duplicate reads. Only reads aligned by Sentieon's implementation of bwa are processed by this algorithm. By default, alignment files are published in bam format. If you would like to store cram files instead, set `--save_mapped_as_cram` to true.
+[Sentieon Dedup](https://support.sentieon.com/manual/DNAseq_usage/dnaseq/#remove-or-mark-duplicates) is the algorithm used by Sentieon's driver to remove duplicate reads. Only reads aligned by Sentieon's implementation of bwa are processed by this algorithm. By default, alignment files are published in bam format. To publish cram files instead, use `--save_all_mapped_as_cram` for the full (unfiltered) alignment, or `--save_noalt_mapped_as_cram` for the alt-filtered alignment (requires `--exclude_alt`).
 
 <details markdown="1">
 <summary>Output files from Alignment</summary>
 
 - `{outputdir}/alignment/`
-  - `*.bam|*.cram`: Alignment file in bam/cram format.
+  - `*_sorted_md.bam|*_sorted_md.cram`: Full (unfiltered) alignment file. Published as bam by default, or as cram when `--save_all_mapped_as_cram` is set.
+  - `*_sorted_md_primary_contigs.cram`: Alt-filtered alignment file in cram format. Published when `--save_noalt_mapped_as_cram` is set (requires `--exclude_alt`). Contains only primary chromosomes (GRCh37: 1-22,X,Y,MT / GRCh38: chr1-chr22,chrX,chrY,chrM).
   - `*.bai|*.crai`: Index of the corresponding bam/cram file.
   - `*.metrics`: Text file containing the dedup metrics.
 </details>
@@ -467,7 +470,7 @@ The pipeline for mitochondrial variant discovery, using Mutect2, uses a high sen
 [Saltshaker](https://github.com/aksenia/saltshaker) allows for downstream clustering and classification of mtDNA structural variants. Called variants are combined with structural variants called in the nuclear genome.
 
 - `call_sv`
-  - `<sample_id>.saltshaker_classify.txt`: report containing case-level classification of mitochondrial deletions.
+  - `<sample_id>.saltshaker_classify.html`: report containing case-level classification of mitochondrial deletions.
   - `<sample_id>.saltshaker.png`: circos plot.
 
 #### Annotation
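
The publishing rules described in the docs/output.md changes above can be summarised in a few lines. The sketch below is a hedged reading of that documentation, not the pipeline's publishing logic: the function name `expected_alignment_outputs` is hypothetical, the `meta.id` prefix is taken from conf/modules/align.config in this commit, and index and metrics files are left out.

```python
def expected_alignment_outputs(
    sample_id: str,
    save_all_mapped_as_cram: bool = False,
    save_noalt_mapped_as_cram: bool = False,
    exclude_alt: bool = False,
) -> list[str]:
    """List the alignment files the docs say are published for one sample."""
    if save_noalt_mapped_as_cram and not exclude_alt:
        # Same constraint the commit adds to main.nf.
        raise ValueError(
            "save_noalt_mapped_as_cram requires exclude_alt to be set to true"
        )
    # Full (unfiltered) alignment: bam by default, cram on request.
    ext = "cram" if save_all_mapped_as_cram else "bam"
    files = [f"{sample_id}_sorted_md.{ext}"]
    # Alt-filtered alignment is documented only as a cram output.
    if save_noalt_mapped_as_cram:
        files.append(f"{sample_id}_sorted_md_primary_contigs.cram")
    return files


if __name__ == "__main__":
    print(expected_alignment_outputs("sample1"))
    print(expected_alignment_outputs("sample1", True, True, True))
```

Whether the unfiltered bam is still published when only `exclude_alt` is set is not stated in the diff, so the sketch simply follows the "published as bam by default" wording.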

docs/usage.md

Lines changed: 2 additions & 0 deletions
@@ -210,6 +210,7 @@ The mandatory and optional parameters for each category are tabulated below.
 | | min_trimmed_length<sup>6</sup> |
 | | extract_alignments |
 | | restrict_to_contigs<sup>7</sup> |
+| | exclude_alt<sup>8</sup> |
 
 <sup>1</sup>Default value is bwamem2. Other alternatives are bwa, bwameme and sentieon (requires valid Sentieon license).<br />
 <sup>2</sup>Analysis set reference genome in fasta format, first 25 contigs need to be chromosome 1-22, X, Y and the mitochondria.<br />
@@ -218,6 +219,7 @@ The mandatory and optional parameters for each category are tabulated below.
 <sup>5</sup>Used only by Sentieon.<br />
 <sup>6</sup>Default value is 40. Used only by fastp.<br />
 <sup>7</sup>Used to limit your analysis to specific contigs. Can be used to remove alignments to unplaced contigs to minimize potential errors. This parameter should be used in conjunction with the `extract_alignments` parameter.<br />
+<sup>8</sup>When set to true, alignments to alt/unplaced contigs are removed after alignment using samtools view, retaining only primary chromosomes (GRCh37: 1-22,X,Y,MT / GRCh38: chr1-chr22,chrX,chrY,chrM). Note that this will affect all downstream variant calling, as variants will only be called on these primary chromosomes.<br />
 
 ##### 2. QC stats from the alignment files

main.nf

Lines changed: 18 additions & 5 deletions
@@ -51,6 +51,7 @@ workflow NFCORE_RAREDISEASE {
     val_cadd_resources
     val_call_interval
     val_concatenate_snv_calls
+    val_exclude_alt
     val_extract_alignments
     val_fai
     val_fasta
@@ -109,7 +110,8 @@ workflow NFCORE_RAREDISEASE {
     val_sambamba_regions
     val_sample_id_map
     val_samtools_sort_threads
-    val_save_mapped_as_cram
+    val_save_all_mapped_as_cram
+    val_save_noalt_mapped_as_cram
     val_save_reference
     val_score_config_mt
     val_score_config_snv
@@ -250,7 +252,7 @@ workflow NFCORE_RAREDISEASE {
     // Using channelFromSamplesheet helper. Returns either an empty channel or validated channel.
     ch_me_references = channelFromSamplesheet(val_mobile_element_references, "${projectDir}/assets/mobile_element_references_schema.json", false)
     ch_me_svdb_resources = channelFromSamplesheet(val_mobile_element_svdb_annotations, "${projectDir}/assets/svdb_query_vcf_schema.json", false)
-    ch_sample_id_map = channelFromSamplesheet(val_sample_id_map, "${projectDir}/assets/sample_id_map.json")
+    ch_sample_id_map = channelFromSamplesheet(val_sample_id_map, "${projectDir}/assets/sample_id_map.json", false)
     ch_svdb_bedpedbs = channelFromSamplesheet(val_svdb_query_bedpedbs, "${projectDir}/assets/svdb_query_bedpe_schema.json", false)
     ch_svdb_dbs = channelFromSamplesheet(val_svdb_query_dbs, "${projectDir}/assets/svdb_query_vcf_schema.json", false)
 
@@ -330,6 +332,13 @@ workflow NFCORE_RAREDISEASE {
     skip_sv_calling = parseSkipList(val_skip_subworkflows, 'sv_calling')
     skip_generate_clinical_set = parseSkipList(val_skip_subworkflows, 'generate_clinical_set')
 
+    //
+    // Validate parameter combinations
+    //
+    if (val_save_noalt_mapped_as_cram && !val_exclude_alt) {
+        error("save_noalt_mapped_as_cram requires exclude_alt to be set to true")
+    }
+
     //
     // SV caller priority
     //
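
The parameter check added in the hunk above rejects exactly one of the four possible flag combinations. A minimal Python sketch of the same rule follows. The function name `validate_cram_params` is hypothetical, and `ValueError` stands in for Nextflow's `error()`.

```python
from itertools import product


def validate_cram_params(save_noalt_mapped_as_cram: bool, exclude_alt: bool) -> None:
    """Raise if the alt-filtered CRAM is requested without alt filtering."""
    if save_noalt_mapped_as_cram and not exclude_alt:
        raise ValueError(
            "save_noalt_mapped_as_cram requires exclude_alt to be set to true"
        )


if __name__ == "__main__":
    # Walk through all four combinations and report which ones pass.
    for save_noalt, exclude in product([False, True], repeat=2):
        try:
            validate_cram_params(save_noalt, exclude)
            status = "ok"
        except ValueError as exc:
            status = f"rejected: {exc}"
        print(f"save_noalt={save_noalt}, exclude_alt={exclude} -> {status}")
```

Only `save_noalt_mapped_as_cram=True` with `exclude_alt=False` is rejected, so enabling `exclude_alt` on its own remains valid.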
@@ -469,6 +478,7 @@ workflow NFCORE_RAREDISEASE {
         val_analysis_type,
         val_cadd_resources,
         val_concatenate_snv_calls,
+        val_exclude_alt,
         val_extract_alignments,
         val_genome,
         val_heavy_strand_origin_end,
@@ -502,7 +512,8 @@ workflow NFCORE_RAREDISEASE {
         val_run_rtgvcfeval,
         val_sample_id_map,
         val_samtools_sort_threads,
-        val_save_mapped_as_cram,
+        val_save_all_mapped_as_cram,
+        val_save_noalt_mapped_as_cram,
         val_svdb_query_bedpedbs,
         val_svdb_query_dbs,
         val_target_bed,
@@ -554,6 +565,7 @@ workflow {
         params.cadd_resources,
         params.call_interval,
         params.concatenate_snv_calls,
+        params.exclude_alt,
         params.extract_alignments,
         params.fai,
         params.fasta,
@@ -612,7 +624,8 @@ workflow {
         params.sambamba_regions,
         params.sample_id_map,
         params.samtools_sort_threads,
-        params.save_mapped_as_cram,
+        params.save_all_mapped_as_cram,
+        params.save_noalt_mapped_as_cram,
         params.save_reference,
         params.score_config_mt,
         params.score_config_snv,
@@ -663,7 +676,7 @@ workflow {
 
 output {
     subworkflow_results {
-        path { destination, value -> destination }
+        path { destination, _value -> destination }
     }
 }

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+process SALTSHAKER_TO_HTML {
+    tag "$meta.id"
+    label "process_low"
+
+    input:
+    tuple val(meta), path(classify)
+
+    output:
+    tuple val(meta), path("*.html"), emit: classify_html
+
+    script:
+    """
+    python3 << 'EOF'
+    import re
+    def saltshaker_txt_to_html(txt_file):
+        with open(txt_file) as f:
+            content = f.read()
+        html_content = re.sub(r'\\n', '<br>', content)
+        return html_content
+
+    html = saltshaker_txt_to_html("${classify}")
+    with open("${classify.baseName}.html", 'w') as f:
+        f.write('<html><body>')
+        f.write(f'<pre style="padding: 15px; border-radius: 5px; overflow-x: auto;">{html}</pre>')
+        f.write('</body></html>')
+    EOF
+    """
+}
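
The conversion embedded in the process above can also be tried as a standalone script. The sketch below is an illustration, not the module's code, and it deviates from the diff in one labelled way: it escapes HTML special characters with `html.escape` before wrapping the text, which the embedded script does not do. The names `txt_to_html` and `convert_file` are hypothetical.

```python
import html
from pathlib import Path

PRE_STYLE = "padding: 15px; border-radius: 5px; overflow-x: auto;"


def txt_to_html(text: str) -> str:
    """Wrap plain text in a minimal HTML page, one <br> per newline."""
    # Deviation from the diff (assumption): escape HTML special characters
    # so that they render literally instead of being parsed as markup.
    escaped = html.escape(text)
    body = escaped.replace("\n", "<br>")
    return f'<html><body><pre style="{PRE_STYLE}">{body}</pre></body></html>'


def convert_file(txt_path: str) -> Path:
    """Write <name>.html next to <name>.txt and return the new path."""
    src = Path(txt_path)
    out = src.with_suffix(".html")
    out.write_text(txt_to_html(src.read_text()))
    return out


if __name__ == "__main__":
    print(txt_to_html("group A: 2 deletions\ngroup B: <none>"))
```

The process in the diff writes `${classify.baseName}.html` into the task's working directory, whereas `convert_file` writes next to the input file. That difference only matters when running the sketch outside Nextflow.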

nextflow.config

Lines changed: 5 additions & 3 deletions
@@ -29,13 +29,15 @@ params {
     analysis_type = 'wgs'
     bait_padding = 100
     concatenate_snv_calls = false
+    exclude_alt = false
     extract_alignments = false
     mt_subsample_approach = 'reads'
     mt_subsample_reads = 18000
     restrict_to_contigs = null
     run_mt_for_wes = false
     run_rtgvcfeval = false
-    save_mapped_as_cram = false
+    save_all_mapped_as_cram = false
+    save_noalt_mapped_as_cram = false
     scatter_count = 20
     skip_tools = null
     skip_subworkflows = null
@@ -58,6 +60,7 @@ params {
     gnomad_af = null
     gnomad_af_idx = null
     hisat2 = null
+    hisat2_build_memory = null
     intervals_wgs = null
     intervals_y = null
     known_dbsnp = null
@@ -121,7 +124,7 @@ params {
     vep_cache_version = 112
 
     // sentieon Defaults
-    ml_model = ''
+    ml_model = null
 
     // Dnascope SNV calling
     sentieon_dnascope_pcr_indel_model = 'CONSERVATIVE'
@@ -344,7 +347,6 @@ env {
     R_PROFILE_USER = "/.Rprofile"
     R_ENVIRON_USER = "/.Renviron"
     JULIA_DEPOT_PATH = "/usr/local/share/julia"
-    NXF_SINGULARITY_NEW_PID_NAMESPACE = false
 }
 
 // Set bash options

nextflow_schema.json

Lines changed: 17 additions & 2 deletions
@@ -536,6 +536,11 @@
                     "description": "Specifies whether to generate a concatenated VCF file containing both nuclear & mitochondrial snv calls",
                     "fa_icon": "fas fa-toggle-on"
                 },
+                "exclude_alt": {
+                    "type": "boolean",
+                    "description": "After aligning the reads to a reference, remove alignments to alt contigs using samtools view, retaining only primary chromosomes (GRCh37: 1-22,X,Y,MT / GRCh38: chr1-chr22,chrX,chrY,chrM).",
+                    "fa_icon": "fas fa-toggle-on"
+                },
                 "extract_alignments": {
                     "type": "boolean",
                     "description": "After aligning the reads to a reference, extract alignments from specific regions/contigs and restrict the analysis to those regions/contigs.",
@@ -576,9 +581,14 @@
                     "description": "Specifies whether to run rtgtools' vcfeval",
                     "fa_icon": "fas fa-toggle-on"
                 },
-                "save_mapped_as_cram": {
+                "save_all_mapped_as_cram": {
+                    "type": "boolean",
+                    "description": "Specifies whether to generate and publish all (unfiltered) alignment files as cram instead of bam",
+                    "fa_icon": "fas fa-toggle-on"
+                },
+                "save_noalt_mapped_as_cram": {
                     "type": "boolean",
-                    "description": "Specifies whether to generate and publish alignment files as cram instead of bam",
+                    "description": "Specifies whether to generate and publish alt-filtered alignment files as cram instead of bam. Requires exclude_alt to be set to true.",
                     "fa_icon": "fas fa-toggle-on"
                 },
                 "scatter_count": {
@@ -756,6 +766,11 @@
                 "hisat2": {
                     "type": "string"
                 },
+                "hisat2_build_memory": {
+                    "type": "string",
+                    "description": "Minimum memory required to build HISAT2 index with splice sites and exons. If available memory is below this threshold, a simpler index is built without splice sites.",
+                    "fa_icon": "fas fa-memory"
+                },
                 "mitosalt_breakspan": {
                     "type": "integer",
                     "default": 15,
