2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#226](https://github.com/nf-core/seqinspector/pull/226) Add pipeline level stub tests
- [#228](https://github.com/nf-core/seqinspector/pull/228) Update all modules/subworkflows
- [#234](https://github.com/nf-core/seqinspector/pull/234) Add pipeline level PICARD tests
- [#236](https://github.com/nf-core/seqinspector/pull/236) Added bbmap/clumpify module for FASTQ deduplication and compression

### `Fixed`

@@ -58,6 +59,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
| multiqcsav | | 0.2.0 |
| samtools | 1.22.1 | 1.23.1 |
| toulligqc | | 2.8.4 |
| bbmap | | 39.18 |
| tar | | 1.34 |

### `Deprecated`
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -10,6 +10,10 @@

## Pipeline tools

- [BBMap](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/clumpify-guide/)

> Bushnell B. BBTools: a collection of bioinformatics tools for processing short sequencing reads. https://jgi.doe.gov/data-and-tools/software-tools/bbtools/

- [BWAMEM2](https://ieeexplore.ieee.org/abstract/document/8820962)

> Vasimuddin Md, Misra S, Li H, Aluru S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE; 2019:314-324. doi:10.1109/IPDPS.2019.00041
3 changes: 3 additions & 0 deletions README.md
@@ -35,6 +35,7 @@ If provided, nf-core/seqinspector can also parse statistics from an Illumina run
| `Subsampling` | [`Seqtk`](https://github.com/lh3/seqtk) | Global subsampling of reads. Only performs subsampling if `--sample_size` parameter is given. | [RNA, DNA] | [N/A] | no |
| `Lint FASTQs` | [`fq`](https://github.com/stjude-rust-labs/fq) | fq filters, generates, subsamples, and validates FASTQ files. | [RNA, DNA, synthetic] | [N/A] | yes |
| `Trimming` | [`Fastp`](https://github.com/OpenGene/fastp) | Trimming of reads. Only performs trimming if `--tools` parameter is given. | [RNA, DNA, synthetic] | [N/A] | no |
| `Compression` | [`BBMap Clumpify`](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/clumpify-guide/) | Deduplicate and compress FASTQ files. Only performs clumpify if `--tools` parameter is given. | [RNA, DNA] | [N/A] | no |
| `Indexing, Mapping` | [`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2) | Align reads to reference | [RNA, DNA] | [N/A] | yes |
| `Indexing` | [`SAMtools`](http://github.com/samtools) | Index aligned BAM files, create FASTA index | [DNA] | [N/A] | yes |
| `QC` | [`checkQC`](https://github.com/Molmed/checkQC) | Read QC | [RNA, DNA] | Illumina rundir | no |
@@ -62,6 +63,7 @@ If provided, nf-core/seqinspector can also parse statistics from an Illumina run
| Tool | Version |
| ----------- | ------- |
| bwamem2 | 2.3 |
| bbmap | 39.18 |
| checkQC | 4.1.0 |
| fq/lint | 0.12.0 |
| fastp | 1.1.0 |
@@ -130,6 +132,7 @@ We thank the following people for their extensive assistance in the development
- [@ctuni](https://github.com/ctuni)
- [@edmundmiller](https://github.com/edmundmiller)
- [@EliottBo](https://github.com/EliottBo)
- [@erkutilaslan](https://github.com/erkutilaslan)
- [@KarNair](https://github.com/KarNair)
- [@kjellinjonas](https://github.com/kjellinjonas)
- [@mahesh-panchal](https://github.com/mahesh-panchal)
4 changes: 4 additions & 0 deletions conf/modules.config
@@ -11,6 +11,10 @@
*/

process {
withName: BBMAP_CLUMPIFY {
ext.args = ''
}

withName: CHECKQC {
tag = { "${run_dir.simpleName}" }
}
13 changes: 13 additions & 0 deletions docs/output.md
@@ -18,6 +18,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and can generat
- [Rundirparser](#rundirparser) - Parse rundir metadata from Illumina runs
- [ToulligQC](#toulligqc) - Raw read QC for Oxford Nanopore runs
- [SeqFu](#seqfu) - Statistics for FASTA or FASTQ files
- [BBMap Clumpify](#bbmap-clumpify) - Deduplication and compression of FASTQ files
- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
- [FastQC](#fastqc) - Raw read QC
- [Sequali](#sequali) - Sequence quality metrics for short and long reads
@@ -119,6 +120,18 @@ This software is written in Python and developped by the GenomiqueENS core facil
Includes functions to interleave and de-interleave FASTQ files, to rename sequences and to count and print statistics on sequence lengths.
In this pipeline, the `seqfu stats` module is used to produce general quality metrics statistics.

### BBMap Clumpify

<details markdown="1">
<summary>Output files</summary>

- `clumped/[sample_id]/`
- `*.clumped.fastq.gz`: Deduplicated and compressed FASTQ files.

</details>

[BBMap Clumpify](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/clumpify-guide/) removes duplicates from sequencing data and creates smaller, faster gzipped FASTQ files. This is particularly useful for reducing file sizes while maintaining data quality.

Suggested change
[BBMap Clumpify](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/clumpify-guide/) removes duplicates from sequencing data and creates smaller, faster gzipped FASTQ files. This is particularly useful for reducing file sizes while maintaining data quality.
[BBMap Clumpify](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/clumpify-guide/) removes duplicates from sequencing data and creates smaller, faster gzipped FASTQ files. This is particularly useful for reducing file sizes while maintaining data quality. Please note that the resulting files will not be random, so tools that take the first X reads will return a biased sample.
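
The ordering caveat in this suggestion can be illustrated with a small simulation. This is a minimal sketch, not pipeline code: each read is modelled only by a clump label, and the helper names `make_clumped_reads`, `head_sample`, and `random_sample`, along with all the numbers, are assumptions made for illustration. It assumes, as the suggestion implies, that similar reads end up next to each other in the clumped output.

```python
import random


def make_clumped_reads(n_clumps: int, reads_per_clump: int) -> list[int]:
    """Model a clumped FASTQ: each read is represented by its clump label,
    and reads from the same clump sit next to each other in the file."""
    return [clump for clump in range(n_clumps) for _ in range(reads_per_clump)]


def head_sample(reads: list[int], n: int) -> list[int]:
    """Take the first n reads, as a 'first X reads' subsampler would."""
    return reads[:n]


def random_sample(reads: list[int], n: int, seed: int = 0) -> list[int]:
    """Draw n reads uniformly at random, without replacement."""
    return random.Random(seed).sample(reads, n)


# Hypothetical dataset: 10 clumps of 100 reads each, stored clump by clump.
reads = make_clumped_reads(n_clumps=10, reads_per_clump=100)
head_clumps = set(head_sample(reads, 100))
random_clumps = set(random_sample(reads, 100))

# The head sample only ever sees the first clump; the random sample spans several.
print(f"clumps seen by head sample:   {len(head_clumps)}")
print(f"clumps seen by random sample: {len(random_clumps)}")
```

In this toy model the first 100 reads all come from a single clump, which is the bias the note warns about. A subsampler that draws reads at random across the whole file does not have this problem.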


### Seqtk

<details markdown="1">
4 changes: 3 additions & 1 deletion docs/usage.md
@@ -154,7 +154,7 @@ Be aware that the default tools will still be run. In order to ONLY run the sele
--tools fastqscreen,rundirparser --tools_bundle null
```

Currently the `tools` param can have the following values: fastqc, fastqscreen, picard_collecthsmetrics, picard_collectmultiplemetrics, rundirparser and seqfu_stats.
Currently the `tools` param can have the following values: bbmap_clumpify, fastqc, fastqscreen, picard_collecthsmetrics, picard_collectmultiplemetrics, rundirparser and seqfu_stats.

#### Skip specific tools

@@ -197,13 +197,15 @@ Requirements:

Tools:

- bbmap_clumpify
- checkQC
- fastqc
- fastqscreen
- picard_collecthsmetrics
- picard_collectmultiplemetrics
- rundirparser
- seqfu_stats
- sequali
- toulligqc

</details>
6 changes: 6 additions & 0 deletions modules.json
@@ -5,6 +5,12 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"bbmap/clumpify": {
"branch": "master",
"git_sha": "f946047c97ed78d3cdcecdc64169c7f9faef99df",
"installed_by": ["modules"],
"patch": "modules/nf-core/bbmap/clumpify/bbmap-clumpify.diff"
},
"bwamem2/index": {
"branch": "master",
"git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
19 changes: 19 additions & 0 deletions modules/nf-core/bbmap/clumpify/bbmap-clumpify.diff

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 8 additions & 0 deletions modules/nf-core/bbmap/clumpify/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

44 changes: 44 additions & 0 deletions modules/nf-core/bbmap/clumpify/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

79 changes: 79 additions & 0 deletions modules/nf-core/bbmap/clumpify/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions nextflow_schema.json
@@ -51,7 +51,7 @@
"tools": {
"type": "string",
"description": "Comma-separated string of tools to run",
"pattern": "^((checkqc|fastp|fastqc|fastqe|fastqscreen|fq_lint|kraken2|multiqcsav|picard_collecthsmetrics|picard_collectmultiplemetrics|rundirparser|seqfu_stats|sequali|toulligqc)?,?)*(?<!,)$",
"pattern": "^((bbmap_clumpify|checkqc|fastp|fastqc|fastqe|fastqscreen|fq_lint|kraken2|multiqcsav|picard_collecthsmetrics|picard_collectmultiplemetrics|rundirparser|seqfu_stats|sequali|toulligqc)?,?)*(?<!,)$",
"fa_icon": "fas fa-sort-amount-asc"
},
"tools_bundle": {
@@ -64,7 +64,7 @@
"skip_tools": {
"type": "string",
"description": "Comma-separated string of tools to skip - overrides any other means of tools selection",
"pattern": "^((checkqc|fastp|fastqc|fastqe|fastqscreen|fq_lint|kraken2|multiqcsav|picard_collecthsmetrics|picard_collectmultiplemetrics|rundirparser|seqfu_stats|sequali|toulligqc)?,?)*(?<!,)$",
"pattern": "^((bbmap_clumpify|checkqc|fastp|fastqc|fastqe|fastqscreen|fq_lint|kraken2|multiqcsav|picard_collecthsmetrics|picard_collectmultiplemetrics|rundirparser|seqfu_stats|sequali|toulligqc)?,?)*(?<!,)$",
"fa_icon": "fas fa-window-close "
}
}
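
The effect of the updated `pattern` can be checked outside the pipeline. The following is a minimal sketch, assuming Python's `re` module treats this pattern the same way as the schema validator does; the helper name `is_valid_tools_value` is hypothetical. The pattern string itself is copied from the new `tools` entry above.

```python
import re

# Pattern copied from the updated `tools` entry in nextflow_schema.json.
TOOLS_PATTERN = re.compile(
    r"^((bbmap_clumpify|checkqc|fastp|fastqc|fastqe|fastqscreen|fq_lint|kraken2"
    r"|multiqcsav|picard_collecthsmetrics|picard_collectmultiplemetrics"
    r"|rundirparser|seqfu_stats|sequali|toulligqc)?,?)*(?<!,)$"
)


def is_valid_tools_value(value: str) -> bool:
    """Return True if `value` satisfies the schema pattern (hypothetical helper)."""
    return TOOLS_PATTERN.search(value) is not None


# A few illustrative inputs: a known tool, a list, a trailing comma, an unknown name.
for candidate in ["bbmap_clumpify", "bbmap_clumpify,fastqc", "bbmap_clumpify,", "clumpify"]:
    print(f"{candidate!r}: {is_valid_tools_value(candidate)}")
```

The trailing `(?<!,)$` is what rejects a value that ends in a comma. An unknown name such as `clumpify` fails because no alternative in the group matches it.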
2 changes: 1 addition & 1 deletion ro-crate-metadata.json

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions subworkflows/local/utils_nfcore_seqinspector_pipeline/main.nf
@@ -261,6 +261,7 @@ def genomeExistsError() {
def toolCitationText() {
def citation_text = [
"Tools used in the workflow included:",
"BBMap (Bushnell 2014),",
"BWAMEM2 (Vasimuddin et al. 2019)",
"FastQC (Andrews 2010),",
"FastQ Screen (Wingett & Andrews 2018)",
@@ -278,6 +279,7 @@ def toolCitationText() {

def toolBibliographyText() {
def reference_text = [
"<li>Bushnell B. BBTools: a collection of bioinformatics tools for processing short sequencing reads. https://jgi.doe.gov/data-and-tools/software-tools/bbtools/.</li>",
"<li>Vasimuddin Md., Misra S., Li H, & Aluru S. (2019). Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems.</li>",
"<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.</li>",
"<li>Wingett SW., & Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Res. 2018 Aug 24 [revised 2018 Jan 1];7:1338. doi: 10.12688/f1000research.15931.2. eCollection</li>",
@@ -343,6 +345,7 @@ def defineToolsList(input_bundle, input_tools, input_skip) {
// please update the docs/usage.md section about tools selection when adding new tools here!

if ('all' in bundle_list) {
tools_list << 'bbmap_clumpify'
tools_list << 'checkqc'
tools_list << 'fastqc'
tools_list << 'fastqe'
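
The selection logic touched by this hunk can be modelled roughly as "bundle plus explicit tools, minus skipped tools". The sketch below is a Python approximation, not the Groovy implementation. The `BUNDLES` table lists only the four tools visible in the hunk above (the real `all` bundle continues beyond them), and the helper `split_csv` and the exact merge order are assumptions. The rule that skipping wins comes from the `skip_tools` description in `nextflow_schema.json`.

```python
from typing import Optional

# Hypothetical bundle table: only the entries visible in the diff hunk are listed.
BUNDLES = {
    "all": ["bbmap_clumpify", "checkqc", "fastqc", "fastqe"],
}


def split_csv(value: Optional[str]) -> list[str]:
    """Split a comma-separated parameter into a list, ignoring empty items."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def define_tools_list(
    input_bundle: Optional[str],
    input_tools: Optional[str],
    input_skip: Optional[str],
) -> list[str]:
    """Combine bundle and explicit tools, then drop anything listed in skip."""
    selected: list[str] = []
    for bundle in split_csv(input_bundle):
        selected.extend(BUNDLES.get(bundle, []))
    selected.extend(split_csv(input_tools))

    skipped = set(split_csv(input_skip))
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return [tool for tool in dict.fromkeys(selected) if tool not in skipped]


print(define_tools_list("all", "seqfu_stats", "fastqe"))
print(define_tools_list(None, "bbmap_clumpify", None))
```

Under these assumptions, adding `bbmap_clumpify` to the `all` bundle means it runs whenever that bundle is selected, unless it is named in `--skip_tools`.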
29 changes: 29 additions & 0 deletions subworkflows/nf-core/utils_nfcore_pipeline/tests/main.nf.test

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions subworkflows/nf-core/utils_nfcore_pipeline/tests/main.nf.test.snap

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions tests/.nftignore
@@ -12,6 +12,7 @@ multiqc/{global_report,group_reports/*}/multiqc_plots/{svg,pdf,png}/*.{svg,pdf,p
multiqc/{global_report,group_reports/*}/multiqc_report.html
pipeline_info/*.{html,json,txt,yml}
references/R64-1-1.dict
reports/bbmap/*/*.clumpify.log
reports/fastp/*/*fastp.*
reports/fastqc/*/*_fastqc.{html,zip}
reports/fastqscreen/*/*_screen.html