Building a Microbial Foods MAG dataset and a Microbial Foods gene catalog

This repository gathers code and documentation for building a microbial foods MAG (metagenome-assembled genome) dataset and a microbial foods gene catalog.

We are updating the documentation for all datasets and samples considered.

There will be code:

to study the diversity coverage of the different microbial foods MAG databases,
to join and dereplicate MAG collections,
to build the MAG datasets step by step from internal samples,
to build a MAG-derived gene catalog,
to run diversity and taxonomic analyses on internal samples using the resulting resource.

Data included:

cFMD
MiFoDB
Internal samples from Microbial Foods dept at DTU-Biosustain

Diversity coverage of the different microbial foods MAG databases

input files: .tsv files generated by CoverM

coverm_stats.R: R script for calculating and plotting relative abundance, alpha diversity, and beta diversity

diversity results: plots and tables created by the R script
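
The metrics named above can be sketched in a few lines. This is a minimal Python illustration, not the code in coverm_stats.R: it assumes each sample has already been reduced to a mapping from genome name to an abundance value taken from the CoverM table (the exact column used is not specified here), and the function names are hypothetical. Shannon diversity stands in for alpha diversity and Bray-Curtis dissimilarity for beta diversity; the R script may use other indices.

```python
import math


def relative_abundance(counts):
    """Convert raw per-genome values into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over genomes with p > 0."""
    fractions = relative_abundance(counts)
    return -sum(p * math.log(p) for p in fractions.values() if p > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    genomes = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(g, 0.0), sample_b.get(g, 0.0)) for g in genomes)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical values for two samples, for illustration only.
    kefir = relative_abundance({"MAG_A": 30, "MAG_B": 10})
    cheese = relative_abundance({"MAG_B": 5, "MAG_C": 15})
    print(shannon_index(kefir), bray_curtis(kefir, cheese))
```

Bray-Curtis is 0 for identical profiles and 1 for profiles that share no genome, which makes it a convenient check on how well a MAG database covers a set of samples.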

Gene clustering statistics

gene_clustering_stats.py: Python script for calculating gene redundancy statistics after clustering
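
As a rough illustration of what redundancy statistics can look like, the sketch below summarises a clustering result. It is not the content of gene_clustering_stats.py: it assumes the clustering output is a two-column, tab-separated table of representative and member gene IDs, and the function names and the definition of "redundancy" used here are assumptions.

```python
from collections import Counter


def read_pairs(lines):
    """Parse tab-separated 'representative<TAB>member' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            rep, member = line.split("\t")[:2]
            yield rep, member


def clustering_stats(pairs):
    """Summarise (representative, member) pairs from a clustering run."""
    sizes = Counter(rep for rep, _member in pairs)
    n_genes = sum(sizes.values())
    n_clusters = len(sizes)
    return {
        "genes": n_genes,
        "clusters": n_clusters,
        "singletons": sum(1 for size in sizes.values() if size == 1),
        "largest_cluster": max(sizes.values(), default=0),
        # Fraction of input genes that were merged into another representative.
        "redundancy": 1 - n_clusters / n_genes if n_genes else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical cluster table: g1 represents g1 and g2, g3 is a singleton.
    table = ["g1\tg1\n", "g1\tg2\n", "g3\tg3\n"]
    print(clustering_stats(read_pairs(table)))
```

For a real run, the same functions could be fed an open file handle instead of the in-memory list.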

HPC workflow for creating a gene catalog from publicly available MAGs and internally constructed MAGs

gene_catalog.html: step-by-step guide for creating a gene catalog from publicly available and internally constructed MAGs

MAG set preprocessing for the publicly available MAG set

find_files.sh: Shell script for recursively locating files within complex directory structures produced by large workflows. Useful for managing outputs from multi-step pipelines.
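
The same idea can be expressed in Python for readers who prefer it to shell. This is a minimal sketch rather than a port of find_files.sh: the function name, the default "*.fa" pattern, and the example directory are assumptions.

```python
from pathlib import Path


def find_files(root, pattern="*.fa"):
    """Return every regular file under root whose name matches pattern."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


if __name__ == "__main__":
    # "workflow_output" is a hypothetical directory name.
    for path in find_files("workflow_output", "*.fa"):
        print(path)
```

Path.rglob walks all subdirectories, so the nesting depth produced by the workflow does not matter.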

tag_mags.sh: Shell script used to add standardized identifiers (e.g. sample or batch tags) to MAG filenames. This ensures consistent naming across downstream analyses and plots.
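
A small Python sketch of this kind of tagging is shown below. It is not the logic of tag_mags.sh: the "TAG_filename" prefix convention, the function names, and the "*.fa" pattern are assumptions chosen for illustration. The dry-run default returns the planned renames without touching any file.

```python
from pathlib import Path


def tagged_name(filename, tag):
    """Prefix filename with 'tag_' unless it already carries that tag."""
    name = Path(filename).name
    return name if name.startswith(f"{tag}_") else f"{tag}_{name}"


def tag_mags(directory, tag, pattern="*.fa", dry_run=True):
    """Plan (and optionally apply) tagged renames for MAG files in directory."""
    plan = []
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name(tagged_name(path.name, tag))
        if target != path:
            plan.append((path, target))
            if not dry_run:
                path.rename(target)
    return plan


if __name__ == "__main__":
    # "mags" is a hypothetical directory; "cFMD" is used as an example tag.
    for old, new in tag_mags("mags", "cFMD"):
        print(f"{old.name} -> {new.name}")
```

Because already-tagged files are skipped, running the function twice on the same directory does not stack prefixes.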

rename_headers.sh: Shell script used to standardize FASTA headers for MAGs or gene sequences. This step is critical for:
• Avoiding header-related issues in downstream tools
• Ensuring compatibility across mapping, annotation, and visualization steps
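
Header standardization can be sketched as follows. This is an illustrative Python version, not the implementation in rename_headers.sh: the "prefix_000001" naming scheme and the function name are assumptions. The function also returns a mapping from new to original headers so that renamed sequences remain traceable.

```python
def rename_headers(lines, prefix):
    """Replace each FASTA header with '>prefix_NNNNNN' and record the mapping."""
    renamed, mapping = [], {}
    for line in lines:
        if line.startswith(">"):
            new_id = f"{prefix}_{len(mapping) + 1:06d}"
            # Keep the original header text (without '>') for later lookup.
            mapping[new_id] = line[1:].strip()
            renamed.append(f">{new_id}\n")
        else:
            # Sequence lines pass through unchanged.
            renamed.append(line)
    return renamed, mapping


if __name__ == "__main__":
    # Hypothetical two-contig FASTA, for illustration only.
    fasta = [">contig 1 len=4\n", "ACGT\n", ">contig 2\n", "GG\n"]
    new_lines, id_map = rename_headers(fasta, "MAG1")
    print("".join(new_lines), id_map)
```

Short headers without spaces or special characters sidestep the truncation and parsing differences between tools that the list above refers to.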

dRep clustering

plots generated by dRep for the final dereplication

NOTE: due to file size limitations, some files are available upon request.
