Commit 06c5ab7

Merge pull request #861 from jeffe107/dev
BIgMAG compatibility.
2 parents 877848f + 7d1a105 commit 06c5ab7

17 files changed

Lines changed: 177 additions & 4 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - [#905](https://github.com/nf-core/mag/pull/905) - Add nf-test snapshot for `test_assembly_input` profile (by @dialvarezs)
 - [#930](https://github.com/nf-core/mag/pull/930) - Add binner SemiBin2 (by @d4straub)
+- [#861](https://github.com/nf-core/mag/pull/861) - Added `--generate_bigmag_file` to execute the bigmag workflow that generates the file to be used as input for [BIgMAG](https://github.com/jeffe107/BIgMAG) (added by @jeffe107)
 
 ### `Changed`
 
```

CITATIONS.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -94,6 +94,10 @@
 
 > Orakov, A., Fullam, A., Coelho, A. P., Khedkar, S., Szklarczyk, D., Mende, D. R., Schmidt, T. S. B., and Bork, P.. 2021. “GUNC: Detection of Chimerism and Contamination in Prokaryotic Genomes.” Genome Biology 22 (1): 178. doi: 10.1186/s13059-021-02393-0.
 
+- [BIgMAG](https://doi.org/10.12688/f1000research.152290.2)
+
+> Yepes-García, J., Falquet, L. (2024). Metagenome quality metrics and taxonomical annotation visualization through the integration of MAGFlow and BIgMAG. F1000Research 13:640. doi.org/10.12688/f1000research.152290.2
+
 - [MaxBin2](https://doi.org/10.1093/bioinformatics/btv638)
 
 > Yu-Wei, W., Simmons, B. A. & Singer, S. W. (2015) MaxBin 2.0: An Automated Binning Algorithm to Recover Genomes from Multiple Metagenomic Datasets. Bioinformatics 32 (4): 605–7. doi: 10.1093/bioinformatics/btv638.
```

README.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -16,12 +16,15 @@
 [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
 [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
+
 [![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/mag)
 
 [![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23mag-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/mag)[![Follow on Bluesky](https://img.shields.io/badge/bluesky-%40nf__core-1185fe?labelColor=000000&logo=bluesky)](https://bsky.app/profile/nf-co.re)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
 
 ![HiRSE Code Promo Badge](https://img.shields.io/badge/Promo-8db427?label=HiRSE&labelColor=005aa0&link=https%3A%2F%2Fgo.fzj.de%2FCodePromo)
 
+[![Static Badge](https://img.shields.io/badge/%F0%9F%8D%94%20%20BIgMAG-compatible-%2324B064)](https://github.com/jeffe107/BIgMAG)
+
 ## Introduction
 
 **nf-core/mag** is a bioinformatics best-practise analysis pipeline for assembly, binning and annotation of metagenomes.
@@ -97,6 +100,7 @@ Other code contributors include:
 - [Greg Fedewa](https://github.com/harper357)
 - [Vini Salazar](https://github.com/vinisalazar)
 - [Alex Caswell](https://github.com/AlexHoratio)
+- [Jeferyd Yepes](https://github.com/jeffe107)
 
 Long read processing was inspired by [caspargross/HybridAssembly](https://github.com/caspargross/HybridAssembly) written by Caspar Gross [@caspargross](https://github.com/caspargross)
 
```

bin/bigmag_summary.py

Lines changed: 71 additions & 0 deletions
```diff
@@ -0,0 +1,71 @@
+#!/usr/bin/env python
+
+## Originally written by Jeferyd Yepes and released under the MIT license.
+## See git repository (https://github.com/nf-core/mag) for full license text.
+
+import pandas as pd
+import re
+import argparse
+import sys
+import warnings
+
+def parse_args(args=None):
+    parser = argparse.ArgumentParser()
+    parser.add_argument("-s", "--summary", metavar="FILE", help="Pipeline summary file.")
+    parser.add_argument("-g", "--gunc_summary", metavar="FILE", help="GUNC summary file.")
+
+    parser.add_argument(
+        "-o",
+        "--out",
+        required=True,
+        metavar="FILE",
+        type=argparse.FileType("w"),
+        help="Output file containing final bigmag summary.",
+    )
+    return parser.parse_args(args)
+
+
+def main(args=None):
+    args = parse_args(args)
+
+    if (
+        not args.summary
+        and not args.gunc_summary
+    ):
+        sys.exit(
+            "No summary specified! "
+            "Please specify the pipeline summary and the GUNC summary."
+        )
+
+    df_summary = pd.read_csv(args.summary, sep='\t')
+    df_summary.columns = df_summary.columns.str.replace(r'(_busco|_checkm2|_checkm|_gtdbtk|_gunc|_quast)$', '', regex=True)
+    for i in range(len(df_summary["bin"])):
+        name = df_summary["bin"][i]
+        name = re.sub(r'\.(fa|fasta)(\..*)?$', '', name)
+        df_summary.at[i,"bin"] = name
+    df_summary = df_summary.sort_values(by='bin')
+    df_summary["bin"] = df_summary["bin"].astype(str)
+
+    df_gunc = pd.read_csv(args.gunc_summary, sep='\t')
+    df_gunc["genome"] = df_gunc["genome"].astype(str)
+    df_gunc = df_gunc.sort_values(by='genome')
+
+    df_summary = pd.merge(df_summary, df_gunc, left_on='bin', right_on='genome', how='left')
+
+    df_summary.rename(columns={'bin': 'Bin'}, inplace=True)
+    columns_to_remove = ['Name', "genome", 'Input_file', 'Assembly', 'Bin Id']
+    df_summary = df_summary.drop(columns=columns_to_remove, errors="ignore")
+
+    df_summary['sample'] = None
+    for f in range(len(df_summary["Bin"])):
+        match = re.search(r'^.*?-.*?-(.*)$', df_summary["Bin"][f])
+        if match:
+            name = match.group(1)
+            name = re.sub(r'\.(unbinned|noclass)(\..*)?$', '', name)
+            name = re.sub(r'\.\d+(\.[^.]+)?$', '', name)
+            df_summary.at[f,"sample"] = name
+
+    df_summary.to_csv(args.out, sep="\t", index=True)
+
+if __name__ == "__main__":
+    sys.exit(main())
```
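Two details of `main()` are worth noting. The input guard joins the two `not` checks with `and`, so it exits only when both files are missing; if just one is supplied, `pd.read_csv(None)` fails later with a less direct pandas error. The two row loops can also be written with pandas' vectorised `.str` methods. The sketch below restates the transformation with both changes; it is our rewrite under stated assumptions, not the pipeline's code, and the example bin names are made up, assuming the `<assembler>-<binner>-<sample>.<index>.fa` naming that the sample regex implies.

```python
import sys
from typing import Optional

import pandas as pd

# Tool suffixes stripped from column names (same regex as bigmag_summary.py).
TOOL_SUFFIX = r"(_busco|_checkm2|_checkm|_gtdbtk|_gunc|_quast)$"
# Columns dropped after the merge; errors="ignore" tolerates any that are absent.
COLUMNS_TO_REMOVE = ["Name", "genome", "Input_file", "Assembly", "Bin Id"]


def check_inputs(summary: Optional[str], gunc_summary: Optional[str]) -> None:
    """Exit unless BOTH inputs are given (the script's guard uses `and`)."""
    if not summary or not gunc_summary:
        sys.exit("Please specify both the pipeline summary and the GUNC summary.")


def build_bigmag_table(df_summary: pd.DataFrame, df_gunc: pd.DataFrame) -> pd.DataFrame:
    """Vectorised sketch of the transformation performed in main()."""
    df = df_summary.copy()
    df.columns = df.columns.str.replace(TOOL_SUFFIX, "", regex=True)
    # Remove the FASTA extension plus any trailing suffix such as .gz from bin names.
    df["bin"] = df["bin"].astype(str).str.replace(r"\.(fa|fasta)(\..*)?$", "", regex=True)
    df = df.sort_values(by="bin")

    gunc = df_gunc.copy()
    gunc["genome"] = gunc["genome"].astype(str)

    # Left join keeps every bin, even those without a GUNC row.
    df = df.merge(gunc, left_on="bin", right_on="genome", how="left")
    df = df.rename(columns={"bin": "Bin"}).drop(columns=COLUMNS_TO_REMOVE, errors="ignore")

    # Sample = text after the second '-', minus any unbinned/noclass tag and the bin index.
    sample = df["Bin"].str.extract(r"^.*?-.*?-(.*)$", expand=False)
    sample = sample.str.replace(r"\.(unbinned|noclass)(\..*)?$", "", regex=True)
    df["sample"] = sample.str.replace(r"\.\d+(\.[^.]+)?$", "", regex=True)
    return df


# Made-up example rows (hypothetical bin names and values).
example_summary = pd.DataFrame({
    "bin": ["MEGAHIT-MetaBAT2-sample1.1.fa", "MEGAHIT-MetaBAT2-sample2.unbinned.3.fa"],
    "Completeness_checkm2": [95.0, 12.5],
    "Assembly": ["MEGAHIT-sample1", "MEGAHIT-sample2"],
})
example_gunc = pd.DataFrame({"genome": ["MEGAHIT-MetaBAT2-sample1.1"], "pass.GUNC": [True]})

result = build_bigmag_table(example_summary, example_gunc)
print(result[["Bin", "sample"]])
```

The `.str` methods skip missing values, so bins that do not match the naming pattern end up with a missing `sample`, much like the `None` left by the original loop.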

bin/combine_tables.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -202,6 +202,7 @@ def main(args=None):
             "Coding_Density",
             "Translation_Table_Used",
             "Total_Coding_Sequences",
+            "Genome_Size"
         ]
         checkm2_results = pd.read_csv(
             args.checkm2_summary, usecols=use_columns, sep="\t"
```
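Adding `Genome_Size` to `use_columns` makes it a required header in the CheckM2 summary: `pd.read_csv(..., usecols=...)` raises a `ValueError` when a listed column is absent instead of returning fewer columns. A minimal sketch with in-memory TSV text; the column names are illustrative CheckM2 headers, not the full list used in `combine_tables.py`.

```python
import io

import pandas as pd

# Illustrative CheckM2 column names (the real list in combine_tables.py is longer).
use_columns = ["Name", "Completeness", "Contamination", "Genome_Size"]

with_size = "Name\tCompleteness\tContamination\tGenome_Size\nbin1\t95.1\t1.2\t2500000\n"
without_size = "Name\tCompleteness\tContamination\nbin1\t95.1\t1.2\n"


def load_checkm2(text: str) -> pd.DataFrame:
    # usecols limits parsing to the listed columns; every listed column must be in the header.
    return pd.read_csv(io.StringIO(text), usecols=use_columns, sep="\t")


print(load_checkm2(with_size)["Genome_Size"].iloc[0])

try:
    load_checkm2(without_size)
except ValueError as err:
    # pandas raises ValueError listing the column(s) it could not find.
    print(f"CheckM2 summary rejected: {err}")
```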

conf/modules.config

Lines changed: 9 additions & 0 deletions
```diff
@@ -1093,4 +1093,13 @@ process {
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
         ]
     }
+    withName: BIGMAG {
+        publishDir = [
+            [
+                path: { "${params.outdir}/GenomeBinning/BIgMAG/" },
+                mode: params.publish_dir_mode,
+                pattern: '*.tsv',
+            ]
+        ]
+    }
 }
```
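The `pattern: '*.tsv'` entry restricts publishing to files whose names match the glob, so only the TSV summary is copied into `GenomeBinning/BIgMAG/`. The sketch below approximates that filter with Python's `fnmatch`; it is not how Nextflow implements `publishDir`, and the assumption that the process also emits a `versions.yml` is ours.

```python
from fnmatch import fnmatch

# Assumed outputs of the BIGMAG process; only the TSV name comes from docs/output.md.
outputs = ["bigmag_summary.tsv", "versions.yml"]


def published(files, pattern="*.tsv"):
    # Keep only the files whose names match the publishDir glob pattern.
    return [f for f in files if fnmatch(f, pattern)]


print(published(outputs))  # ['bigmag_summary.tsv']
```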

conf/test_assembly_input.config

Lines changed: 2 additions & 0 deletions
```diff
@@ -55,8 +55,10 @@ params {
 
     // TODO: enable when we have a suitable way to run a small test
     // GUNC fails with exit code 1 if no matches, see https://github.com/grp-bork/gunc/issues/42
+    // To generate the BIgMAG file, it is necessary to include GUNC in the execution
     run_gunc = false
     gunc_db = params.pipelines_testdata_base_path + 'mag/databases/gunc/gunc-mock.dmnd'
+    //generate_bigmag_file = true
 
     skip_metaeuk = false
     metaeuk_mmseqs_db = 'Kalamari'
```

docs/output.md

Lines changed: 11 additions & 0 deletions
```diff
@@ -843,6 +843,17 @@ Note that in contrast to the other tools, for CheckM the bin name given in the c
 
 All columns other than the primary `bin` key column, and the `Depth <sample name>` columns, will include a suffix specifying from which bin QC tool the column is derived from to distinguish identically named columns from different tools.
 
+## Summary file to be used as input for BIgMAG
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `GenomeBinning/BIgMAG/bigmag_summary.tsv`: Summary of bin sequencing depths together with GUNC, QUAST, GTDB-Tk, BUSCO and CheckM2 results.
+
+</details>
+
+The output file in this directory is used as input for the dashboard [BIgMAG](https://github.com/jeffe107/BIgMAG) for visualisation and evaluation of MAG quality.
+
 ## Ancient DNA
 
 Optional, only running when parameter `-profile ancient_dna` is specified.
```
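`bigmag_summary.py` writes the table with `to_csv(..., index=True)`, so `bigmag_summary.tsv` starts with an unnamed column holding the pandas row index. When inspecting the file with pandas outside BIgMAG, reading it with `index_col=0` avoids an extra `Unnamed: 0` column. A minimal sketch with a made-up one-row stand-in for the file:

```python
import io

import pandas as pd

# Made-up one-row stand-in for bigmag_summary.tsv.
frame = pd.DataFrame({"Bin": ["MEGAHIT-MetaBAT2-sample1.1"], "sample": ["sample1"]})
# index=True (as in bigmag_summary.py) writes the row index as an unnamed first column.
tsv_text = frame.to_csv(sep="\t", index=True)

naive = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(naive.columns.tolist())  # ['Unnamed: 0', 'Bin', 'sample']

# index_col=0 turns that unnamed column back into the index.
restored = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)
print(restored.columns.tolist())  # ['Bin', 'sample']
```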

docs/usage.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -543,3 +543,9 @@ Up until version 4.0.0, this pipeline offered raw read taxonomic profiling using
 This feature was removed in version 5.0.0 to strengthen the pipeline's focus on metagenome assembly and binning.
 
 If you require taxonomic profiling of raw reads, we recommend using [nf-core/taxprofiler](https://nf-co.re/taxprofiler/), which is specifically designed for taxonomic profiling of raw reads and supports a wide range of tools for this purpose.
+
+## BIgMAG compatibility
+
+With the parameter `--generate_bigmag_file` a module will be triggered to generate a file that contains the output from all of the bin-quality tools that can be uploaded to the [BIgMAG](https://github.com/jeffe107/BIgMAG) dashboard for visualising and evaluating MAGs.
+Please note that generating this file requires the parameters `--run_busco`, `--run_gunc` and `--run_checkm2`, and GTDBTk should be executed (i.e., not skipped).
+The file `bigmag_summary.tsv` located at `GenomeBinning/BIgMAG` is the only file needed to run the BIgMAG dashboard.
```

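The requirement stated in the usage text can be expressed as a pre-flight check on the run parameters. The helper below is hypothetical (this commit does not show how, or whether, the pipeline validates these parameters), and it assumes GTDB-Tk is switched off through a `skip_gtdbtk` parameter.

```python
from typing import Dict, List

# Parameters named in the usage text; True means the tool runs.
REQUIRED_FLAGS = ("run_busco", "run_gunc", "run_checkm2")


def bigmag_missing_requirements(params: Dict[str, bool]) -> List[str]:
    """Hypothetical check: list what --generate_bigmag_file still needs."""
    if not params.get("generate_bigmag_file", False):
        return []  # nothing to check when the BIgMAG file is not requested
    missing = [f"--{flag}" for flag in REQUIRED_FLAGS if not params.get(flag, False)]
    # Assumption: GTDB-Tk is switched off through a `skip_gtdbtk` parameter.
    if params.get("skip_gtdbtk", False):
        missing.append("GTDB-Tk (skip_gtdbtk is set)")
    return missing


params = {"generate_bigmag_file": True, "run_busco": True, "run_gunc": False, "run_checkm2": True}
print(bigmag_missing_requirements(params))  # ['--run_gunc']
```

Returning a list rather than failing on the first problem lets a caller report every unmet requirement at once.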
Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+  - conda-forge
+  - bioconda
+dependencies:
+  - conda-forge::python=3.10.6
+  - conda-forge::pandas=1.4.3
```
