Curated publicly available genomic data of fermented foods + microbes

MicrocosmFoods/fermentedfood_mags_curation

Fermented Foods Microbial Genomes Database

This repository documents the curation of genomes and metadata from publicly available studies of microbes from diverse fermented foods.

The full database of ~13,500 microbial genomes and the associated curated metadata can be accessed on Zenodo. We have also made a subset of these genomes available as a Narrative on KBase. We clustered the full set of ~13,500 genomes at 99% average nucleotide identity (ANI) to obtain ~4,300 "strain"-representative genomes. You can access the static KBase Narrative here. To explore the database and run your own analyses on the KBase platform, you will need to create a KBase account.
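Clustering at 99% ANI can be illustrated with a simple greedy scheme. The sketch below is an assumption for illustration only, not the pipeline used here (dereplication is documented in the fermentedfood_metadata_curation repository), and dedicated tools such as dRep use a more elaborate strategy. It takes hypothetical inputs: precomputed pairwise ANI values (in percent) and a per-genome quality score. Genomes are visited from highest to lowest score, and each becomes a new representative unless it is already within the threshold of an existing one.

```python
from typing import Dict, List, Tuple


def greedy_dereplicate(
    scores: Dict[str, float],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 99.0,
) -> Dict[str, List[str]]:
    """Greedily cluster genomes; return {representative: [members]}.

    `scores` (e.g. a quality score per genome) and `ani` (pairwise percent
    identity) are hypothetical inputs for this sketch.
    """

    def pair_ani(a: str, b: str) -> float:
        # Directional ANI tools can report different values for (a, b) and
        # (b, a); this sketch takes the larger value, 0.0 if the pair is missing.
        return max(ani.get((a, b), 0.0), ani.get((b, a), 0.0))

    clusters: Dict[str, List[str]] = {}
    # Highest-scoring genomes go first so they become the representatives;
    # ties are broken by genome ID for reproducibility.
    for genome in sorted(scores, key=lambda g: (-scores[g], g)):
        for rep in clusters:
            if pair_ani(genome, rep) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No representative is close enough: start a new cluster.
            clusters[genome] = [genome]
    return clusters


# Toy values, not taken from the database.
scores = {"g1": 95.0, "g2": 90.0, "g3": 88.0}
ani = {("g1", "g2"): 99.5, ("g1", "g3"): 97.0, ("g2", "g3"): 97.2}
print(greedy_dereplicate(scores, ani))  # {'g1': ['g1', 'g2'], 'g3': ['g3']}
```

Because the representative is fixed when a cluster is opened, the result depends on the visiting order; ranking by quality first means each cluster is represented by its best genome.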

The most up-to-date corresponding metadata is available here.

Accessed Datasets and Repositories

To curate the set of microbial genomes from diverse fermented foods, we collected metagenome-assembled genomes (MAGs) and isolate genomes from publicly available sources.

Environment Setup

After installing conda for your OS, you can create a conda environment with all the dependencies required for running the scripts with:

conda env create -n fermented_foods -f envs/dev.yml

Metadata Curation

Metadata associated with each genome (including sample accession, food information, and taxonomy), as well as further curation of the genome set (including dereplication and taxonomic assignment with GTDB-Tk), is documented in the fermentedfood_metadata_curation repository.
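GTDB-Tk reports each genome's taxonomy as a single semicolon-delimited string (`d__...;p__...;...;s__...`) in the `classification` column of its summary TSV, keyed by the `user_genome` column. As a minimal sketch, not the code used in the fermentedfood_metadata_curation repository, that string can be split into per-rank fields before it is joined with other genome metadata. The genome ID and taxonomy string in the usage lines are illustrative only.

```python
import csv
import io
from typing import Dict, Optional, TextIO

# GTDB rank prefixes mapped to rank names.
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_classification(classification: str) -> Dict[str, Optional[str]]:
    """Split a 'd__...;p__...;...;s__...' string into {rank: name}.

    Empty ranks such as 's__', and strings without rank prefixes
    (e.g. 'Unclassified Bacteria'), leave the rank as None.
    """
    taxonomy: Dict[str, Optional[str]] = {rank: None for rank in RANKS.values()}
    for field in classification.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and prefix in RANKS:
            taxonomy[RANKS[prefix]] = name or None
    return taxonomy


def read_gtdbtk_summary(handle: TextIO) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each genome ID to its parsed taxonomy, reading the
    'user_genome' and 'classification' columns of a GTDB-Tk summary TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["user_genome"]: parse_classification(row["classification"])
            for row in reader}


# Illustrative input; "MAG_001" is a hypothetical genome ID.
tsv = ("user_genome\tclassification\n"
       "MAG_001\td__Bacteria;g__Lactiplantibacillus;"
       "s__Lactiplantibacillus plantarum\n")
taxonomy = read_gtdbtk_summary(io.StringIO(tsv))
print(taxonomy["MAG_001"]["genus"])  # Lactiplantibacillus
```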

Repository Structure & Files

The repository contains scripts and directories for handling genomes from MAG datasets and from collections of isolates. The cleaned, curated metadata for the MAG datasets and BacDive isolates is in the main metadata directory and is copied into the subdirectories, which contain the raw files used to combine metadata from the different sources.

- metadata/ - Most of these are now intermediate files used to create the final metadata files in the [fermentedfood_metadata_curation](https://github.com/MicrocosmFoods/fermentedfood_metadata_curation) repository.
- scripts/
    - batch_fasta_files.py - Helper script to create batches of fasta files and associated samplesheet templates for uploading to KBase.
- envs/
    - dev.yml
    - quast.yml
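The batching done by `scripts/batch_fasta_files.py` can be sketched roughly as follows. This is a hypothetical illustration, not the script's actual code: the function names, the accepted file extensions, and the samplesheet columns (`file_name`, `genome_name`) are all assumptions, and they do not reproduce KBase's bulk-import template format.

```python
import csv
from pathlib import Path
from typing import List, Sequence

# Assumed FASTA extensions; compressed files are not handled in this sketch.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def find_fasta_files(directory: Path) -> List[Path]:
    """Return FASTA files in `directory`, sorted so batches are reproducible."""
    return sorted(p for p in directory.iterdir()
                  if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES)


def make_batches(paths: Sequence[Path], batch_size: int) -> List[List[Path]]:
    """Split `paths` into consecutive batches of at most `batch_size` files."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(paths[i:i + batch_size])
            for i in range(0, len(paths), batch_size)]


def write_samplesheets(batches: Sequence[Sequence[Path]], out_dir: Path) -> List[Path]:
    """Write one CSV samplesheet per batch and return the written paths.

    The header columns are hypothetical placeholders for this sketch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for n, batch in enumerate(batches, start=1):
        sheet = out_dir / f"batch_{n:03d}_samplesheet.csv"
        with sheet.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["file_name", "genome_name"])
            for path in batch:
                # Use the file stem (name without extension) as the genome name.
                writer.writerow([path.name, path.stem])
        written.append(sheet)
    return written
```

For example, `write_samplesheets(make_batches(find_fasta_files(Path("genomes")), 500), Path("samplesheets"))` would write one samplesheet per group of up to 500 files; the directory names and batch size here are placeholders.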
