13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,19 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- Added GATK contamination check for WES/WGS samples as complement to VerifyBamID2
- New parameters: `run_contamination`, `contamination_sites`, `contamination_sites_tbi`
- CONTAMINATION_CHECK subworkflow using GATK4 GetPileupSummaries and CalculateContamination
- PARSE_CONTAMINATION module for MultiQC integration
- Contamination results displayed in MultiQC with color-coded thresholds
Comment on lines +8 to +14
Collaborator

You can add your log entries to 2.7.0dev, since it's the one in development. And don't forget to link the PR to your entries ;)

Also, we have a separate table for parameters and new tools under the ##Fixed section of 2.7.0dev, so you can add that information there.


### Changed

- Updated MultiQC configuration to include GATK contamination metrics
## 2.7.0dev - Semiautomatix [xxxx-xx-xx]

### `Added`
50 changes: 46 additions & 4 deletions assets/multiqc_config.yml
@@ -6,17 +6,16 @@ report_comment: >
target="_blank">nf-core/raredisease</a> analysis pipeline. For information about
how to interpret these results, please see the <a href="https://nf-co.re/raredisease/dev/docs/output"
target="_blank">documentation</a>.

report_section_order:
"nf-core-raredisease-methods-description":
order: -1000
software_versions:
order: -1001
"nf-core-raredisease-summary":
order: -1002

gatk_contamination:
order: 1050
export_plots: true

run_modules:
- fastqc
- fastp
@@ -28,7 +27,6 @@ run_modules:
- peddy
- verifybamid
- custom_content

module_order:
- fastqc:
name: "FastQC"
@@ -51,8 +49,52 @@ module_order:
- verifybamid:
name: "VerifyBamID2"

# Custom content configuration for GATK contamination
custom_data:
gatk_contamination:
id: "gatk_contamination"
section_name: "GATK Contamination"
description: "Sample contamination estimates from GATK CalculateContamination based on common variant allele frequencies"
plot_type: "generalstats"
pconfig:
contamination_pct:
title: "Contamination"
description: "Estimated sample contamination percentage"
max: 10
min: 0
scale: "RdYlGn-rev"
suffix: "%"
format: "{:,.2f}"
shared_key: "contamination"

# Make contamination visible in general stats by default
table_columns_visible:
gatk_contamination:
contamination_pct: true

# Color coding thresholds for contamination
table_cond_formatting_rules:
contamination_pct:
pass:
- s_eq: "pass"
- lt: 2.0
warn:
- s_eq: "warn"
- lt: 5.0
- gte: 2.0
fail:
- s_eq: "fail"
- gte: 5.0

# Add to General Statistics table configuration
table_columns_placement:
gatk_contamination:
contamination_pct: 900

extra_fn_clean_exts:
- "_sorted_md"
- "_contamination"
- "_pileups"
- type: regex
pattern: "_LNUMBER[0-9]{1,}"
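
The `table_cond_formatting_rules` block above encodes three bands for `contamination_pct`: below 2.0 is a pass, from 2.0 up to (but not including) 5.0 is a warning, and 5.0 or above is a failure. A minimal Python sketch of that reading follows; the function name and its keyword arguments are illustrative and are not part of the PR or of MultiQC.

```python
def classify_contamination(pct, warn_at=2.0, fail_at=5.0):
    """Map a contamination percentage to 'pass', 'warn', or 'fail'.

    The default cut-offs mirror the `lt` / `gte` values in the config above;
    the function itself is a hypothetical helper for illustration only.
    """
    if pct < 0:
        raise ValueError("contamination percentage cannot be negative")
    if pct >= fail_at:
        return "fail"
    if pct >= warn_at:
        return "warn"
    return "pass"
```

Checking the boundary values (1.99, 2.0, 4.99, 5.0) against such a helper is a quick way to confirm that the configured bands do not overlap or leave gaps.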

4 changes: 2 additions & 2 deletions conf/modules/align_bwa_bwamem2_bwameme.config
@@ -58,7 +58,7 @@ process {
ext.args = "--TMP_DIR ."
ext.prefix = { "${meta.id}_sorted_md" }
publishDir = [
- enabled: !params.save_mapped_as_cram,
+ enabled: true,
Collaborator

I am curious: why do you want to change this?

path: { "${params.outdir}/alignment" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
@@ -67,7 +67,7 @@ process {

withName: '.*ALIGN:ALIGN_BWA_BWAMEM2_BWAMEME:SAMTOOLS_INDEX_MARKDUP' {
publishDir = [
- enabled: !params.save_mapped_as_cram,
+ enabled: true,
Collaborator

And this?

path: { "${params.outdir}/alignment" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
46 changes: 46 additions & 0 deletions conf/modules/contamination.config
@@ -0,0 +1,46 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for GATK contamination checking modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {

//
// GATK GetPileupSummaries
//
withName: '.*:CONTAMINATION_CHECK:GATK4_GETPILEUPSUMMARIES' {
ext.args = ''
ext.prefix = { "${meta.id}_pileups" }
publishDir = [
path: { "${params.outdir}/qc/contamination/pileups" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

//
// GATK CalculateContamination
//
withName: '.*:CONTAMINATION_CHECK:GATK4_CALCULATECONTAMINATION' {
ext.args = ''
ext.prefix = { "${meta.id}_contamination" }
publishDir = [
path: { "${params.outdir}/qc/contamination" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

//
// Parse contamination results for MultiQC
//
withName: '.*:RAREDISEASE:PARSE_CONTAMINATION' {
ext.prefix = { "${meta.id}_contamination" }
publishDir = [
path: { "${params.outdir}/multiqc" },
mode: params.publish_dir_mode,
pattern: '*_mqc.tsv'
]
}
}
8 changes: 8 additions & 0 deletions modules.json
@@ -257,6 +257,14 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"gatk4/calculatecontamination": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
"installed_by": ["modules"]
},
"gatk4/getpileupsummaries": {
"branch": "master",
"git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
"gawk": {
"branch": "master",
"git_sha": "5ee4d69ed992c3ce81cfbbdd0bef932fcb81c75a",
7 changes: 7 additions & 0 deletions modules/local/parse_contamination/environment.yml
@@ -0,0 +1,7 @@
name: parse_contamination
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- python=3.11
61 changes: 61 additions & 0 deletions modules/local/parse_contamination/main.nf
@@ -0,0 +1,61 @@
process PARSE_CONTAMINATION {
tag "$meta.id"
label 'process_single'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/python:3.11' :
'biocontainers/python:3.11' }"

input:
tuple val(meta), path(contamination_table)

output:
tuple val(meta), path("*_contamination_mqc.tsv"), emit: mqc_table
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
#!/usr/bin/env python3
Collaborator

Can you make this a module binary? We have had issues in the past with some systems interpreting indents differently.


import csv

# Read GATK contamination table
with open("${contamination_table}", 'r') as f:
lines = f.readlines()
# Skip header, get contamination value
data_line = lines[1].strip().split('\\t')
sample = data_line[0]
contamination = float(data_line[1])
contamination_pct = contamination * 100

# Write MultiQC custom content file
with open("${prefix}_contamination_mqc.tsv", 'w') as out:
# Header with MultiQC configuration
out.write("# id: 'gatk_contamination'\\n")
out.write("# section_name: 'GATK Contamination'\\n")
out.write("# description: 'Sample contamination estimates from GATK CalculateContamination'\\n")
out.write("# plot_type: 'generalstats'\\n")
out.write("# pconfig:\\n")
out.write("# contamination_pct:\\n")
out.write("# title: 'Contamination'\\n")
out.write("# description: 'Estimated sample contamination percentage'\\n")
out.write("# max: 10\\n")
out.write("# min: 0\\n")
out.write("# scale: 'RdYlGn-rev'\\n")
out.write("# suffix: '%'\\n")
out.write("# format: '{:,.2f}'\\n")
# Data
out.write("Sample\\tcontamination_pct\\n")
out.write(f"${meta.id}\\t{contamination_pct:.4f}\\n")

# Create versions file
with open("versions.yml", 'w') as v:
v.write('"${task.process}":\\n')
v.write(' python: "3.11"\\n')
"""
}
Collaborator

Also, can you add a stub section?
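
For the module-binary request above, one possible shape is a standalone Python script that the process calls instead of embedding Python in the Nextflow `script:` block. The sketch below is only an illustration: the flag names (`--table`, `--sample-id`, `--output`) and function names are hypothetical, and it keeps the same positional reading of the table (second line, first two tab-separated columns) that the inline script in this PR uses.

```python
import argparse


def read_contamination(table_path):
    """Return (sample, fraction) from the first data row of the table."""
    with open(table_path) as handle:
        rows = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    if len(rows) < 2:
        raise ValueError(f"{table_path}: expected a header line and a data row")
    # Same positional layout the inline script relies on:
    # column 0 = sample name, column 1 = contamination fraction.
    return rows[1][0], float(rows[1][1])


def write_mqc_table(sample_id, fraction, out_path):
    """Write a MultiQC custom-content TSV with contamination as a percentage."""
    with open(out_path, "w") as out:
        out.write("# id: 'gatk_contamination'\n")
        out.write("# section_name: 'GATK Contamination'\n")
        out.write("# plot_type: 'generalstats'\n")
        out.write("Sample\tcontamination_pct\n")
        out.write(f"{sample_id}\t{fraction * 100:.4f}\n")


def main(argv=None):
    # Hypothetical command-line interface; the flag names are illustrative.
    parser = argparse.ArgumentParser()
    parser.add_argument("--table", required=True)
    parser.add_argument("--sample-id", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)
    _, fraction = read_contamination(args.table)
    write_mqc_table(args.sample_id, fraction, args.output)
```

In a real script, a shebang line and an `if __name__ == "__main__": main()` entry point would be added so the file can be executed directly. Keeping the Python in its own file also sidesteps the escaped `\\t` and `\\n` sequences that the inline version needs inside the Groovy string.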

29 changes: 29 additions & 0 deletions modules/local/parse_contamination/meta.yml
@@ -0,0 +1,29 @@
name: parse_contamination
description: Parse GATK CalculateContamination output for MultiQC
keywords:
- contamination
- MultiQC
- parsing
tools:
- python:
description: Python programming language
homepage: https://www.python.org/
input:
- meta:
type: map
description: Groovy Map containing sample information
- contamination_table:
type: file
description: GATK CalculateContamination output table
pattern: "*.contamination.table"
output:
- mqc_table:
type: file
description: MultiQC custom content table
pattern: "*_contamination_mqc.tsv"
- versions:
type: file
description: File containing software versions
pattern: "versions.yml"
authors:
- "56053vujinovic"
10 changes: 10 additions & 0 deletions modules/nf-core/gatk4/calculatecontamination/environment.yml

Some generated files are not rendered by default.

59 changes: 59 additions & 0 deletions modules/nf-core/gatk4/calculatecontamination/main.nf

Some generated files are not rendered by default.
