Commit 46fcb59

Wrap-up v3.3 for release (#93)
* Add pre-formatted database (#82)
* add pre-formatted database info
* add information about pre-formatted database
* 50 add tool integron finder 20 (#87)
* update version
* Add pre-formatted database (#83)
* add pre-formatted database info
* add information about pre-formatted database
* update falmeida-py package
* change version
* change main tools to public containers
* use biocontainer
* aggregate other non-biocontainer tools and diminish the size of docker images
* update module labels
* re-arranged docker images
* add integron_finder module
* update amrfinder version
* trying to addintegron finder to gff
* update docker
* fixed image install
* fixed integron finder 2 gff
* remove unnecessary grouptuple
* fix image and emboss module
* fix organization
* add docker image to module
* fix indentation
* fix indentation
* added integron finder results to final GFF and JBROWSE
* integron finder results added to HTML report
* fix docker image
* properly added to json summary
* update changelog
* update readme
* update list of tools
* update default config in docs
* backscape tildes
* update installation docs
* fix indentation
* update outputs docs
* fix wrong pipeline name
* fix typo
* update quickstart
* fixed mlst execution in singularity
* fix indentation
* 85 prokka module can get after modules stuck if the header file longer than 20 and not separated by tab or space (#89)
* add awk command to clean big fasta headers
* add awk statement to clean big fasta headers
* update bakta version
* fix bakta stats parsing
* 81 add tool mob suite (#90)
* Add pre-formatted database (#83)
* add pre-formatted database info
* add information about pre-formatted database
* add mob suite module
* added results to HTML report
* Update Dockerfile
* added mob_suite to json summary
* add tool to markdown files
* add tool information to docs
* add example reports
* update singularity config
* fixed kofamscan download
* fix dockerfile
* Fix unicycler tag
* use only docker images to avoid timeout error
* use docker ocntainer to avoid singularity timeout
* fixed resfinder for singularity
* fixed docker image
* fix gff2sql in singularity
* use proper singularity images
* fix singularity image download
* fixed docker image
* Add option for prebuilt db download (#94)
* include module to download pre-built databases
* update docs
* 69 tools to use own docker image (#91)
* moved container configurations of assembly modules
* update default flye version
* update container configuration for database-setup modules
* re-organize container definition of 'generic' modules
* reorganize container configuration for KO modules
* reorganized container configuration for MGEs modules
* finalizing container configuration reorganization of last modules
* containers already defined in config files
* update params schema
* fixed zenodo download
* mob_suite singularity image not always suited for low connection servers
* add option to download container configs
* update unicycler version (0.5.0--py310h6cc9453_3)
* 96 error summary for bugfix release (#101) Update falmeida-py version
* 98 include ices and prophage annotation in json summary (#106)
* Try Dockerfile fix
* Update Dockerfile
* Update Dockerfile
* Update CHANGELOG.md
* 100 update pipeline docker images from docker tags to docker shasum (#108)
* fix singularity run options
* fix misc dockerfile
* update renv docker image environment
* update docker images to use shasum
* Update CHANGELOG.md
* 107 duplicate reads to unique read names (#109)
* Add pre-formatted database (#83)
* add pre-formatted database info
* add information about pre-formatted database
* update docs and fix report links
* include information of newly known issues (#103)
* add parameter to enable deduplication of reads
* Update manual.md
* update changelog
* Update docs for v3.3 (#110)
* update cli help
* Update installation.md
* add indentation
* Update README.md
* Update README.md
* fix tracedir
* always show from copy
* Update quickstart.md
* Update manual.md
* update citation information
* add citation example
* Update CHANGELOG.md
1 parent 60922db commit 46fcb59

93 files changed

Lines changed: 1293 additions & 962 deletions

.zenodo.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 "description": "<p>The pipeline</p>\n\n<p>bacannot, is a customisable, easy to use, pipeline that uses state-of-the-art software for comprehensively annotating prokaryotic genomes having only Docker and Nextflow as dependencies. It is able to annotate and detect virulence and resistance genes, plasmids, secondary metabolites, genomic islands, prophages, ICEs, KO, and more, while providing nice an beautiful interactive documents for results exploration.</p>",
 "license": "other-open",
 "title": "fmalmeida/bacannot: A generic but comprehensive bacterial annotation pipeline",
-"version": "v3.2",
+"version": "v3.3",
 "upload_type": "software",
 "creators": [
 {

README.md

Lines changed: 22 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,14 @@
11
<img src="images/lOGO_3.png" width="300px">
22

3-
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.3627669-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.3627669)
3+
[![F1000 Paper](https://img.shields.io/badge/Citation%20F1000-10.12688/f1000research.139488.1-orange)](https://doi.org/10.12688/f1000research.139488.1)
44
[![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/fmalmeida/bacannot?include_prereleases&label=Latest%20release)](https://github.com/fmalmeida/bacannot/releases)
55
[![Documentation](https://img.shields.io/badge/Documentation-readthedocs-brightgreen)](https://bacannot.readthedocs.io/en/latest/?badge=latest)
66
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.10.3-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)
77
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
88
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
99
[![License](https://img.shields.io/badge/License-GPL%203-black)](https://github.com/fmalmeida/bacannot/blob/master/LICENSE)
1010
[![Follow on Twitter](http://img.shields.io/badge/twitter-%40fmarquesalmeida-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/fmarquesalmeida)
11+
[![Zenodo Archive](https://img.shields.io/badge/Zenodo-Archive-blue)](https://doi.org/10.5281/zenodo.3627669)
1112

1213
[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/github.com/fmalmeida/bacannot)
1314

@@ -47,8 +48,9 @@ Its main steps are:
 | Annotation of virulence genes | [Victors](http://www.phidias.us/victors/) and [VFDB](http://www.mgc.ac.cn/VFs/main.htm) |
 | Prophage sequences and genes annotation | [PHASTER](http://phast.wishartlab.com/), [Phigaro](https://github.com/bobeobibo/phigaro) and [PhySpy](https://github.com/linsalrob/PhiSpy) |
 | Annotation of integrative and conjugative elements | [ICEberg](http://db-mml.sjtu.edu.cn/ICEberg/) |
+| Annotation of bacterial integrons | [Integron Finder](https://github.com/gem-pasteur/Integron_Finder) |
 | Focused detection of insertion sequences | [digIS](https://github.com/janka2012/digIS) |
-| _In silico_ detection of plasmids | [Plasmidfinder](https://cge.cbs.dtu.dk/services/PlasmidFinder/) and [Platon](https://github.com/oschwengers/platon) |
+| _In silico_ detection and typing of plasmids | [Plasmidfinder](https://cge.cbs.dtu.dk/services/PlasmidFinder/), [Platon](https://github.com/oschwengers/platon) and [MOB-typer](https://github.com/phac-nml/mob-suite)|
 | Prediction and visualization of genomic islands | [IslandPath-DIMOB](https://github.com/brinkmanlab/islandpath) and [gff-toolbox](https://github.com/fmalmeida/gff-toolbox) |
 | Custom annotation from formatted FASTA or NCBI protein IDs | [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs) |
 | Merge of annotation results | [bedtools](https://bedtools.readthedocs.io/en/latest/) |
@@ -86,18 +88,7 @@ These images have been kept separate to not create massive Docker image and to a

 ## Installation

-1. If you don't have it already install [Docker](https://docs.docker.com/) in your computer.
-* After installed, you need to download the required Docker images
-
-```bash
-docker pull fmalmeida/bacannot:v3.2_misc ;
-docker pull fmalmeida/bacannot:v3.2_perlenv ;
-docker pull fmalmeida/bacannot:v3.2_pyenv ;
-docker pull fmalmeida/bacannot:v3.2_renv ;
-docker pull fmalmeida/bacannot:jbrowse ;
-```
-
-🔥 Nextflow can also automatically handle images download on the fly when executed. If docker has exceeded its download limit rates, please try again in a few hours.
+1. If you don't have it already install either [Docker](https://docs.docker.com/) or [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/index.html) in your computer.

 2. Install Nextflow (version 20.10 or higher):

@@ -111,48 +102,7 @@ These images have been kept separate to not create massive Docker image and to a

 🔥 Users can get let the pipeline always updated with: `nextflow pull fmalmeida/bacannot`

-### Downloading and updating databases
-
-Bacannot databases are not inside the docker images anymore to avoid huge images and problems with connections and limit rates with dockerhub.
-
-#### Pre-formatted
-
-Users can directly download pre-formatted databases from Zenodo: https://doi.org/10.5281/zenodo.7615811
-
-Useful for standardization and also overcoming known issues that may arise when formatting databases with `singularity` profile.
-
-#### I want to generate a new formatted database
-
-To download and format a copy of required bacannot databases users can execute the following:
-
-```bash
-# Download pipeline databases
-nextflow run fmalmeida/bacannot --get_dbs --output bacannot_dbs -profile <docker/singularity>
-```
-
-This will produce a directory like this:
-
-```bash
-bacannot_dbs
-├── amrfinder_db
-├── antismash_db
-├── argminer_db
-├── card_db
-├── iceberg_db
-├── kofamscan_db
-├── mlst_db
-├── phast_db
-├── phigaro_db
-├── pipeline_info
-├── plasmidfinder_db
-├── platon_db
-├── prokka_db
-├── resfinder_db
-├── vfdb_db
-└── victors_db
-```
-
-> To update databases you can either download a new one to a new directory. Remove the database you want to get a new one from the root bacannot dir and use the same command above to save in the same directory (the pipeline will only try to download missing databases). Or, you can use the parameter `--force_update` to download everything again.
+<a href="https://bacannot.readthedocs.io/en/latest/installation"><strong>Please refer to the installation page, for a complete guide on required images and databases. »</strong></a>

 ## Quickstart

@@ -185,6 +135,17 @@ Create a configuration file in your working directory:

 nextflow run fmalmeida/bacannot --get_config

+##### Overwrite container versions with config
+
+The pipeline uses pre-set docker and singularity configuration files to set all the containers and versions of images that should be used by each module in the pipeline.
+
+Although not recommended, one can use these configuration files to change the version of specific tools if desired.
+
+To download these configs one can:
+
+nextflow run fmalmeida/bacannot --get_docker_config
+nextflow run fmalmeida/bacannot --get_singularity_config
+
 ### Interactive graphical configuration and execution

 #### Via NF tower launchpad (good for cloud env execution)
@@ -234,7 +195,11 @@ It will result in the following:

 ## Citation

-To cite this tool please refer to our [Zenodo tag](https://doi.org/10.5281/zenodo.3627669).
+In order to cite this pipeline, please refer to:
+
+> Almeida FMd, Campos TAd and Pappas Jr GJ. Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. [version 1; peer review: awaiting peer review]. F1000Research 2023, 12:1205 (https://doi.org/10.12688/f1000research.139488.1)
+
+Additionally, archived versions of the pipeline are also found in [Zenodo](https://doi.org/10.5281/zenodo.3627669).

 This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [GPLv3](https://github.com/fmalmeida/bacannot/blob/master/LICENSE).

bin/gff2sql.R

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -54,8 +54,8 @@ addTable <- function (con, sql, input) {
5454

5555
# Loading SQL database driver
5656
drv <- dbDriver("SQLite")
57-
dbname <- file.path("/work", opt$out)
58-
con <- dbConnect(drv, dbname=dbname)
57+
print(opt$out)
58+
con <- dbConnect(drv, dbname=opt$out)
5959

6060
#####################################
6161
### First STEP load GENOME to sql ###

bin/mlst-make_blast_db.sh

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
#!/bin/bash
2+
3+
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
4+
MLSTDIR="$0"
5+
BLASTDIR="$DIR/../db/blast"
6+
BLASTFILE="$BLASTDIR/mlst.fa"
7+
8+
mkdir -p "$BLASTDIR"
9+
rm -f "$BLASTFILE"
10+
11+
#for N in $(find $MLSTDIR -maxdepth 1 | grep -v '_2$'); do
12+
for N in $(find $MLSTDIR -mindepth 1 -maxdepth 1 -type d); do
13+
SCHEME=$(basename $N)
14+
echo "Adding: $SCHEME"
15+
cat "$MLSTDIR"/$SCHEME/*.tfa \
16+
| grep -v 'not a locus' \
17+
| sed -e "s/^>/>$SCHEME./" \
18+
>> "$BLASTFILE"
19+
done
20+
21+
makeblastdb -hash_index -in "$BLASTFILE" -dbtype nucl -title "PubMLST" -parse_seqids
22+
23+
echo "Created BLAST database for $BLASTFILE"
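
Review note on `bin/mlst-make_blast_db.sh`: in bash, `$0` expands to the path of the running script, so `MLSTDIR="$0"` points at the script file rather than at a directory of schemes. The commit does not say how the script is invoked. If the intent was for the caller to pass the PubMLST directory as the first argument (an assumption), the concatenation step could be sketched as below. `build_mlst_fasta` and both of its parameters are hypothetical names used only for illustration, and the `makeblastdb` call is left out.

```shell
#!/bin/bash
# Hypothetical helper, not part of the commit: concatenate every scheme's
# .tfa files into one FASTA, prefixing each header with the scheme name.
build_mlst_fasta() {
    local mlst_dir="$1"    # assumed: directory with one sub-directory per scheme
    local out_fasta="$2"   # assumed: path of the combined FASTA to write
    local scheme_dir scheme

    [ -d "$mlst_dir" ] || { echo "not a directory: $mlst_dir" >&2; return 1; }
    mkdir -p "$(dirname "$out_fasta")"
    : > "$out_fasta"       # start from an empty output file

    # Globbing avoids word-splitting the output of find.
    for scheme_dir in "$mlst_dir"/*/; do
        [ -d "$scheme_dir" ] || continue
        scheme=$(basename "$scheme_dir")
        echo "Adding: $scheme"
        cat "$scheme_dir"*.tfa \
            | grep -v 'not a locus' \
            | sed -e "s/^>/>$scheme./" \
            >> "$out_fasta"
    done
}
```

A call such as `build_mlst_fasta /path/to/pubmlst /path/to/blast/mlst.fa` would then be followed by the same `makeblastdb` command used in the script.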

bin/run_jbrowse.sh

Lines changed: 70 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ Help()
1414
echo "Simple help message for the utilization of this script"
1515
echo "It takes the jbrowse data path and all the files that shall be plotted from bacannot"
1616
echo
17-
echo "Syntax: run_jbrowse.sh [-h|p|g|b|s|f|r|B|P|G|m|S|R|d|A]"
17+
echo "Syntax: run_jbrowse.sh [-h|p|g|b|s|f|r|B|P|G|m|S|R|d|A|i]"
1818
echo "options:"
1919
echo
2020
echo "h Print this help"
@@ -32,59 +32,63 @@ Help()
 echo "R Path to Resfinder custom GFF"
 echo "d Path to digIS custom GFF"
 echo "A Path to antismash custom GFF"
+echo "i Path to Integron Finder custom GFF"
 echo ""
 echo
 }

 # Get the options
-while getopts "hp:g:b:s:f:r:B:P:G:m:S:R:d:A:" option; do
-case $option in
-h) # display Help
-Help
-exit;;
-p) # get genome prefix
-PREFIX="$OPTARG"
-;;
-g) # get genome FASTA
-GENOME="$OPTARG"
-;;
-b) # get GC bedgraph
-BEDGRAPH="$OPTARG"
-;;
-s) # get chr sizes
-CHRSIZES="$OPTARG"
-;;
-f) # get prokka gff
-PROKKAGFF="$OPTARG"
-;;
-r) # get barrnap gff
-rRNAGFF="$OPTARG"
-;;
-B) # get phigaro bed
-PHIGAROBED="$OPTARG"
-;;
-P) # get phispy bed
-PHISPYBED="$OPTARG"
-;;
-G) # get GIs bed
-GIBED="$OPTARG"
-;;
-m) # get nanopolish methylation
-NANOMETHYL="$OPTARG"
-;;
-S) # get nanopolish chr sizes
-NANOSIZES="$OPTARG"
-;;
-R) # get resfinder GFF
-RESFINDERGFF="$OPTARG"
-;;
-d) # get digIS GFF
-DIGISGFF="$OPTARG"
-;;
-A) # get antismash GFF
-ANTISMASHGFF="$OPTARG"
-;;
-esac
+while getopts "hp:g:b:s:f:r:B:P:G:m:S:R:d:A:i:" option; do
+case $option in
+h) # display Help
+Help
+exit;;
+p) # get genome prefix
+PREFIX="$OPTARG"
+;;
+g) # get genome FASTA
+GENOME="$OPTARG"
+;;
+b) # get GC bedgraph
+BEDGRAPH="$OPTARG"
+;;
+s) # get chr sizes
+CHRSIZES="$OPTARG"
+;;
+f) # get prokka gff
+PROKKAGFF="$OPTARG"
+;;
+r) # get barrnap gff
+rRNAGFF="$OPTARG"
+;;
+B) # get phigaro bed
+PHIGAROBED="$OPTARG"
+;;
+P) # get phispy bed
+PHISPYBED="$OPTARG"
+;;
+G) # get GIs bed
+GIBED="$OPTARG"
+;;
+m) # get nanopolish methylation
+NANOMETHYL="$OPTARG"
+;;
+S) # get nanopolish chr sizes
+NANOSIZES="$OPTARG"
+;;
+R) # get resfinder GFF
+RESFINDERGFF="$OPTARG"
+;;
+d) # get digIS GFF
+DIGISGFF="$OPTARG"
+;;
+A) # get antismash GFF
+ANTISMASHGFF="$OPTARG"
+;;
+i) # get integron finder GFF
+INTEGRONFINDERGFF="$OPTARG"
+;;
+esac
 done

 # Main
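
Review note: the only functional change in the hunk above is the new `i:` entry in the `getopts` option string and its matching `case` branch. The remaining removed and added lines look identical here, which suggests a whitespace-only re-indentation. A letter followed by a colon tells `getopts` that the option takes an argument, which is then available in `$OPTARG`. The reduced sketch below keeps only three of the script's options. Wrapping the loop in a function named `parse_opts` is an assumption made so the snippet can be run on its own, and is not how `run_jbrowse.sh` is organised.

```shell
#!/bin/bash
# Reduced sketch of the option loop: -h takes no argument, -p and -i do.
parse_opts() {
    # Localising OPTIND lets the function be called more than once.
    local option OPTIND=1 OPTARG
    PREFIX=""
    INTEGRONFINDERGFF=""
    while getopts "hp:i:" option; do
        case $option in
            h)  # display help
                echo "Syntax: parse_opts [-h|p|i]"
                return 0 ;;
            p)  # get genome prefix
                PREFIX="$OPTARG" ;;
            i)  # get Integron Finder GFF
                INTEGRONFINDERGFF="$OPTARG" ;;
            *)  # unknown option or missing argument
                return 1 ;;
        esac
    done
}

parse_opts -p sample1 -i integrons.gff
echo "$PREFIX $INTEGRONFINDERGFF"
```

Without the trailing colon, `-i` would be treated as a plain flag and `$OPTARG` would not receive the GFF path.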
@@ -313,7 +317,7 @@ remove-track.pl --trackLabel "${PREFIX} CARD-RGI resistance features" --dir data
 --trackLabel "${PREFIX} Resfinder resistance features" --out "data" --nameAttributes "Resfinder_gene,ID,Resfinder_phenotype" ;
 remove-track.pl --trackLabel "${PREFIX} Resfinder resistance features" --dir data &> /tmp/error
 [ ! -s $RESFINDERGFF ] || echo -E " { \"compress\" : 0, \
-\"displayMode\" : \"compact\", \
+\"displayMode\" : \"compact\", \
 \"key\" : \"${PREFIX} Resfinder resistance features\", \
 \"category\" : \"Resistance annotation\", \
 \"label\" : \"${PREFIX} Resfinder resistance features\", \
@@ -343,6 +347,22 @@ remove-track.pl --trackLabel "${PREFIX} ICE genes from ICEberg database" --dir d
 \"urlTemplate\" : \"tracks/${PREFIX} ICE genes from ICEberg database/{refseq}/trackData.json\" } " | add-track-json.pl data/trackList.json
 [ $(grep "ICEberg" $PROKKAGFF | wc -l) -eq 0 ] || rm -f iceberg ices ;

+## Integron Finder
+[ $(wc -l $INTEGRONFINDERGFF) -eq 0 ] || flatfile-to-json.pl --gff $INTEGRONFINDERGFF --key "${PREFIX} Annotated Integrons - Integron Finder" --trackType CanvasFeatures \
+--trackLabel "${PREFIX} Annotated Integrons - Integron Finder" --out "data" --nameAttributes "ID,integron_type" ;
+remove-track.pl --trackLabel "${PREFIX} Annotated Integrons - Integron Finder" --dir data &> /tmp/error
+[ $(wc -l $INTEGRONFINDERGFF) -eq 0 ] || echo -E " { \"compress\" : 0, \
+\"displayMode\" : \"compact\", \
+\"key\" : \"${PREFIX} Annotated Integrons - Integron Finder\", \
+\"category\" : \"MGEs annotation\", \
+\"label\" : \"${PREFIX} Annotated Integrons - Integron Finder\", \
+\"storeClass\" : \"JBrowse/Store/SeqFeature/NCList\", \
+\"style\" : { \"className\" : \"feature\", \"color\": \"#6db6d9\" }, \
+\"trackType\" : \"CanvasFeatures\", \
+\"type\" : \"CanvasFeatures\", \
+\"nameAttributes\" : \"ID,integron_type\", \
+\"urlTemplate\" : \"tracks/${PREFIX} Annotated Integrons - Integron Finder/{refseq}/trackData.json\" } " | add-track-json.pl data/trackList.json
+
 ## PROPHAGES
 ### PHAST
 [ $(grep "PHAST" $PROKKAGFF | wc -l) -eq 0 ] || grep "PHAST" $PROKKAGFF > prophage ;
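
Review note on the Integron Finder guard: `wc -l FILE` prints the line count followed by the file name, so `[ $(wc -l $INTEGRONFINDERGFF) -eq 0 ]` hands `[` two words where it expects one number. The test then appears to fail with an error instead of evaluating to true for an empty file, so the command after `||` would run even when the GFF has no features. Two possible alternatives are sketched below, offered as suggestions rather than as the author's intended fix. The helper names `count_lines` and `has_features` are hypothetical. The second form mirrors the `[ ! -s $RESFINDERGFF ]` check the script already uses for the Resfinder track.

```shell
#!/bin/bash
# Hypothetical helpers, not part of run_jbrowse.sh.

# Print only the number of lines: reading the file from stdin keeps wc
# from appending the file name to its output.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Succeed when the file exists and is not empty (same idea as `[ -s FILE ]`).
has_features() {
    [ -s "$1" ]
}

# Usage sketch with a temporary file standing in for the Integron Finder GFF.
gff=$(mktemp)
[ "$(count_lines "$gff")" -eq 0 ] && echo "no integron features, track skipped"
printf 'contig1\tIntegronFinder\tintegron\t1\t100\n' > "$gff"
has_features "$gff" && echo "integron features found, track would be added"
rm -f "$gff"
```

Either form leaves the rest of the line unchanged. Only the condition before `||` would differ.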

conf/defaults.config

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,9 @@ params {
1414

1515
// Trigger database download and formatting workflow? --> will not run annotation
1616
// Will download and format a database inside {output} parameter
17-
get_dbs = false
18-
force_update = false
17+
get_dbs = false
18+
force_update = false
19+
get_zenodo_db = false // download pre-built database
1920

2021
/*
2122

@@ -31,6 +32,9 @@ params {
 // It is also documented in the main manual: https://bacannot.readthedocs.io/en/latest/samplesheet
 input = null

+// Enable reads deduplication for assembly? (If input has reads)
+enable_deduplication = false
+
 // path to directory containing databases used by bacannot
 // you can download databases with:
 // nextflow run fmalmeida/bacannot --get_dbs --output bacannot_dbs -profile <docker/conda/singularity>
@@ -175,13 +179,13 @@ params {
 // Select versions of bioconda quay.io additional tools
 // Tools that are not part of the core of the pipeline,
 // but can eventually be used by users
-unicycler_version = '0.4.8--py38h8162308_3'
-flye_version = '2.9--py39h39abbe0_0'
-bakta_version = '1.6.1--pyhdfd78af_0'
+unicycler_version = '0.5.0--py310h6cc9453_3'
+flye_version = '2.9--py39h6935b12_1'
+bakta_version = '1.7.0--pyhdfd78af_1'

 // Max resource options
 max_memory = '20.GB'
 max_cpus = 16
 max_time = '40.h'

-}
+}
