B-GUT-decontamination

Snakemake pipeline to decontaminate genomes to be included in a kraken2 database.

Example of the pipeline run in a folder containing 4 genomes [An example of one of these genomes is provided in the folder "Example_genomes"]

Installation: Singularity image

Marenostrum5

You will find the .sif file at ~/current/okhannous/Decontamination_fungal_database/my_singularity.sif

Local use

You will need to build and deploy the Singularity image. First you have to download Singularity. Details of how to do it: HERE

Using the provided Dockerfile and Makefile files (present in the current github resources) run:

make singularity-image

This will create the "my_singularity.sif", ready to be used with our snakemake pipeline.

How to run the pipeline using the Singularity image

Here there is an example of command to run the pipeline.

In High-Performance Computing (HPC) environments such as marenostrum5, you may need to bind folders when using containerization tools like Singularity to ensure that your container can access the necessary files and directories that exist outside of the container's file system that is why we add the flag -B.

snakemake --use-singularity --singularity-args '-B /path_to_our_group_folder/bscXX' -s ~/current/okhannous/Decontamination_fungal_database/bgut_decontam.smk all --cores 48

In the .smk file there is indicated the path to the .sif image.

You can find a template job to run the pipeline in the cluster: "bgut_decontam.job"

Note that the pipeline is fast but requires high mem nodes because of the kraken2 step. You can have a debug interaction session and run it there by:

salloc -A bsc40 -q gp_debug --exclusive --constraint=highmem
snakemake --use-singularity --singularity-args '-B /path_to_our_group_folder/bscXX' -s ~/current/okhannous/Decontamination_fungal_database/bgut_decontam.smk all --cores 48

Changes you might need to do in the pipeline:

IMPORTANT: In the pipeline you might want to change some of the parameters (This will be addressed with a config file in future versions):

From the bgut_decontam.smk file:

** Global directories (where the genomes are located and where you want to store the results), this should be changed according to your data:

genomes_dir = "~/current/okhannous/current/okhannous/Decontamination_fungal_database/Example_genomes"
out_dir = "~/current/okhannous/current/okhannous/Decontamination_fungal_database/OUT_cluster"

** Define the wildcards (for processing multiple genome files)

GENOME_SUFFIX = "_Genome.fasta" #Change with the extension of the genome files
PREFIX = "renamedFungiDB-58_" #In case you have any prefix (in this case the genomes where renamed for kraken2)

In the R scripts there are also some strings that you might change to not have errors, that depend in the genome names you have:

Clean_fasta.R: Change in the suffix and prefix

file_name_renamed <- gsub("renamedFungiDB-58_", "", file_name)
file_name_renamed <- gsub("_Genome.fasta", "", file_name_renamed)

Parsing_kraken2_reports.R

name <- gsub("gpfs/.*/","",i) --> This is specific for marenostrum5

Assessment_gc_distribution.R --> all ok
Final_decision_kraken2_and_gc_content.R --> The path corresponds to marenostrum5

df_multimodality$X <-   gsub("~/current/okhannous/Decontamination_fungal_database/OUT_cluster/gc_content/","",df_multimodality$X)

Citations / Acknowledgments

Snakemake

Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data analysis with Snakemake. F1000Research, 10, 33. https://doi.org/10.12688/f1000research.29032.2

Kraken2

Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0

Tiara

Karlicki, M., Antonowicz, S., & Karnkowska, A. (2022). Tiara: deep learning-based classification system for eukaryotic sequences. Bioinformatics (Oxford, England), 38(2), 344–350. https://doi.org/10.1093/bioinformatics/btab672

Seqkit

Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta, 3(3), e191. https://doi.org/10.1002/imt2.191

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

B-GUT-decontamination

Installation: Singularity image

How to run the pipeline using the Singularity image

Changes you might need to do in the pipeline:

Citations / Acknowledgments

Snakemake

Kraken2

Tiara

Seqkit

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Example_genomes		Example_genomes
Assessment_gc_distribution.R		Assessment_gc_distribution.R
Clean_fasta.R		Clean_fasta.R
Dockerfile		Dockerfile
Final_decision_kraken2_and_gc_content.R		Final_decision_kraken2_and_gc_content.R
Makefile		Makefile
Parsing_kraken2_reports.R		Parsing_kraken2_reports.R
README.md		README.md
bgut_decontam.job		bgut_decontam.job
bgut_decontam.smk		bgut_decontam.smk
bgutdecontam_schema.jpg		bgutdecontam_schema.jpg
get_singularity_def_file_from_dockerfile.sh		get_singularity_def_file_from_dockerfile.sh

Gabaldonlab/B-GUT-decontamination

Folders and files

Latest commit

History

Repository files navigation

B-GUT-decontamination

Installation: Singularity image

How to run the pipeline using the Singularity image

Changes you might need to do in the pipeline:

Citations / Acknowledgments

Snakemake

Kraken2

Tiara

Seqkit

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages