Code to build a microbial foods MAG and microbial gene catalog

biosustain/dsp_microbialFoodsCatalog

Building a microbial foods MAG dataset and a microbial foods gene catalog

This repository gathers the code and documentation used to build a microbial foods dataset of metagenome-assembled genomes (MAGs) and a microbial foods gene catalog.

The documentation on the datasets and samples considered is still being updated.

There will be code:

  • to study the diversity coverage of the different microbial foods MAG databases
  • to join and dereplicate MAG collections
  • to build the MAG datasets from internal samples, step by step
  • to build a MAG-derived gene catalog
  • to run diversity and taxonomic analyses on internal samples using the resulting resource

Data included:

Diversity coverage of the different microbial foods MAG databases

input files: .tsv files generated by CoverM

coverm_stats.R: R script for calculating and plotting relative abundance, alpha diversity, and beta diversity

diversity results: plots and tables created by the R script
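
To illustrate the metrics named above, here is a minimal Python sketch that computes relative abundance, the Shannon index (one common alpha diversity measure), and Bray-Curtis dissimilarity (one common beta diversity measure). It is not the repository's coverm_stats.R: the function names and the toy counts are hypothetical, and the R script may use different indices or normalization.

```python
import math

def relative_abundance(counts):
    """Scale a {genome: count} mapping so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}

def shannon(rel):
    """Shannon index H = -sum(p * ln p) over non-zero proportions."""
    return -sum(p * math.log(p) for p in rel.values() if p > 0)

def bray_curtis(rel_a, rel_b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    keys = set(rel_a) | set(rel_b)
    num = sum(abs(rel_a.get(k, 0.0) - rel_b.get(k, 0.0)) for k in keys)
    den = sum(rel_a.get(k, 0.0) + rel_b.get(k, 0.0) for k in keys)
    return num / den if den else 0.0

# Toy per-sample counts (hypothetical, not taken from the repository).
sample_a = {"MAG_1": 50, "MAG_2": 50}
sample_b = {"MAG_1": 100, "MAG_3": 100}

rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)
print("Shannon (sample A):", shannon(rel_a))
print("Bray-Curtis (A vs B):", bray_curtis(rel_a, rel_b))
```

In this toy case the two samples share one of their two equally abundant genomes, so the Bray-Curtis dissimilarity is 0.5.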

Gene clustering statistics

gene_clustering_stats.py: Python script for calculating gene redundancy statistics after clustering
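
As a rough idea of what such statistics can look like, the sketch below summarizes a clustering result given as (representative, member) pairs. The two-column tab-separated layout is an assumption (it is the layout written by, for example, MMseqs2), and the function names and the definition of "redundancy" used here are hypothetical; gene_clustering_stats.py may read a different format and report different figures.

```python
from collections import Counter

def read_pairs(lines):
    """Parse tab-separated 'representative<TAB>member' lines (assumed layout)."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            rep, member = line.split("\t")[:2]
            yield rep, member

def clustering_stats(pairs):
    """Summarize cluster sizes from (representative, member) pairs."""
    sizes = Counter(rep for rep, _member in pairs)
    n_genes = sum(sizes.values())
    n_clusters = len(sizes)
    return {
        "genes": n_genes,
        "clusters": n_clusters,
        "singletons": sum(1 for s in sizes.values() if s == 1),
        "largest_cluster": max(sizes.values(), default=0),
        # Fraction of input genes that are not cluster representatives.
        "redundancy": 1 - n_clusters / n_genes if n_genes else 0.0,
    }

# Toy input: one cluster of three genes and one singleton.
example = ["g1\tg1\n", "g1\tg2\n", "g1\tg3\n", "g4\tg4\n"]
stats = clustering_stats(read_pairs(example))
print(stats)
```

With a real file, `read_pairs(open(path))` would stream the pairs without loading the whole table into memory.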

HPC workflow for creating a gene catalog from publicly available MAGs and internally constructed MAGs

gene_catalog.html: step-by-step guide for creating a gene catalog from publicly available MAGs and internally constructed MAGs

MAG set preprocessing for the publicly available MAG set

find_files.sh: Shell script for recursively locating files within complex directory structures produced by large workflows. Useful for managing outputs from multi-step pipelines.
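
For readers who prefer Python, the same idea of a recursive search can be sketched with `pathlib`. This is not a translation of find_files.sh: the function name, the `*.fa` pattern, and the demo directory layout are hypothetical.

```python
from pathlib import Path
import tempfile

def find_files(root, pattern):
    """Return sorted paths under root that match a glob pattern, at any depth."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())

# Demo on a throwaway directory tree that mimics nested workflow output.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "run1" / "bins"
    nested.mkdir(parents=True)
    (nested / "bin.1.fa").write_text(">c1\nACGT\n")
    (Path(tmp) / "run1" / "log.txt").write_text("done\n")
    found = [p.relative_to(tmp).as_posix() for p in find_files(tmp, "*.fa")]

print(found)
```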

tag_mags.sh: Shell script used to add standardized identifiers (e.g. sample or batch tags) to MAG filenames. This ensures consistent naming across downstream analyses and plots.
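
A minimal Python sketch of filename tagging follows. It assumes the tag is added as a prefix separated by an underscore, which is a guess; tag_mags.sh may place or format the tag differently. The function names, the `*.fa` pattern, and the `dry_run` parameter are hypothetical.

```python
from pathlib import Path

def tagged_name(path, tag):
    """Return path with 'tag_' prefixed to the filename, unless already present."""
    path = Path(path)
    prefix = f"{tag}_"
    if path.name.startswith(prefix):
        return path
    return path.with_name(prefix + path.name)

def tag_mags(directory, tag, pattern="*.fa", dry_run=True):
    """Plan (and optionally apply) renames for matching files in a directory."""
    plan = [(p, tagged_name(p, tag)) for p in sorted(Path(directory).glob(pattern))]
    plan = [(old, new) for old, new in plan if old != new]
    if not dry_run:
        for old, new in plan:
            old.rename(new)
    return plan

print(tagged_name("bins/bin.1.fa", "batchA").name)
```

Defaulting to a dry run makes it possible to review the planned renames before any file is touched, and skipping names that already carry the tag keeps the operation safe to repeat.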

rename_headers.sh: Shell script used to standardize FASTA headers for MAGs or gene sequences. This step is critical for:

  • avoiding header-related issues in downstream tools
  • ensuring compatibility across mapping, annotation, and visualization steps
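
Header standardization can be sketched in Python as below. Headers that contain spaces or special characters are a frequent source of trouble because many tools keep only the text before the first whitespace. The `prefix_N` naming scheme and the function name are hypothetical; rename_headers.sh may use a different scheme.

```python
def rename_headers(lines, prefix):
    """Yield FASTA lines with each header replaced by '>prefix_N' (N from 1)."""
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            yield f">{prefix}_{n}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line

# Toy FASTA records with headers that downstream tools may mishandle.
fasta = [">contig 1 len=8 cov=3.2\n", "ACGTACGT\n", ">contig|2\n", "GGCC\n"]
renamed = list(rename_headers(fasta, "MAG001"))
print("".join(renamed), end="")
```

This sketch discards the original headers; if they are needed later, writing an old-to-new mapping table alongside the renamed file would preserve them.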

dRep clustering

plots generated by dRep for the final dereplication

NOTE: Due to file size limitations, some files are available upon request.
