Here we gather code and documentation to build a microbial foods MAG dataset and a microbial foods gene catalog.
We are updating the documentation for all the datasets and samples considered.
There will be code:
- to study the diversity coverage of the different microbial foods MAG databases
- to join and dereplicate MAG collections
- to build the MAG datasets step by step from internal samples
- to build a MAG-derived gene catalog
- to run diversity and taxonomic analyses on internal samples using the resources created
Data included:
- cFMD
- MiFoDB
- Internal samples from Microbial Foods dept at DTU-Biosustain
input files: .tsv files generated by CoverM
coverm_stats.R: R script for calculating and plotting relative abundance, alpha diversity, and beta diversity
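
coverm_stats.R is the actual implementation. As a rough illustration of the metrics it reports, the sketch below computes relative abundance, one alpha diversity index, and one beta diversity measure in plain Python. The choice of the Shannon index and Bray-Curtis dissimilarity is an assumption (the indices used by the R script are not listed here), and the function names are hypothetical.

```python
import math


def relative_abundance(counts):
    """Scale a {genome: value} mapping so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {genome: 0.0 for genome in counts}
    return {genome: value / total for genome, value in counts.items()}


def shannon(counts):
    """Alpha diversity: Shannon index H' = -sum(p * ln p) over non-zero p."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


def bray_curtis(sample_a, sample_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples."""
    genomes = set(sample_a) | set(sample_b)
    diff = sum(abs(sample_a.get(g, 0) - sample_b.get(g, 0)) for g in genomes)
    total = sum(sample_a.get(g, 0) + sample_b.get(g, 0) for g in genomes)
    return diff / total if total else 0.0
```

Each sample is a mapping from genome ID to a per-genome value taken from the CoverM table; which CoverM column to use is left to the caller.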
diversity results: plots and tables created by the R script
gene_clustering_stats.py: Python script for calculating gene redundancy statistics after clustering
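
To give an idea of what such statistics can look like, here is a minimal sketch, not the contents of gene_clustering_stats.py. It assumes the clustering result is a two-column, tab-separated table of representative and member gene IDs (the layout MMseqs2 uses for its cluster TSV). The clustering tool, the function name, and the definition of redundancy as the fraction of genes collapsed into another representative are all assumptions.

```python
from collections import Counter


def clustering_stats(lines):
    """Summarize a representative<TAB>member cluster table.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    cluster_sizes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        representative = line.split("\t")[0]
        cluster_sizes[representative] += 1

    n_genes = sum(cluster_sizes.values())
    n_clusters = len(cluster_sizes)
    return {
        "genes": n_genes,
        "clusters": n_clusters,
        "singletons": sum(1 for size in cluster_sizes.values() if size == 1),
        "largest_cluster": max(cluster_sizes.values(), default=0),
        # Assumed definition: share of input genes removed by clustering.
        "redundancy": 1 - n_clusters / n_genes if n_genes else 0.0,
    }
```

Passing an open file, for example `clustering_stats(open(path))` with `path` pointing at the cluster table, returns the summary as a dictionary.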
gene_catalog.html: step-by-step guide for creating the gene catalog from publicly available MAGs and internally constructed MAGs
find_files.sh: Shell script for recursively locating files within complex directory structures produced by large workflows. Useful for managing outputs from multi-step pipelines.
tag_mags.sh: Shell script used to add standardized identifiers (e.g. sample or batch tags) to MAG filenames. This ensures consistent naming across downstream analyses and plots.
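
The tagging idea can be sketched in Python as follows. This is an illustration only, not a port of tag_mags.sh: the prefix-plus-underscore naming scheme, the `*.fa` pattern, and both function names are assumptions.

```python
from pathlib import Path


def tagged_name(filename, tag, sep="_"):
    """Return `filename` prefixed with `tag`, leaving tagged names unchanged."""
    if filename.startswith(tag + sep):
        return filename
    return f"{tag}{sep}{filename}"


def tag_mags(directory, tag, pattern="*.fa", dry_run=True):
    """Prefix every MAG file in `directory` with `tag`.

    Returns the list of (old_path, new_path) pairs. With dry_run=True
    nothing is renamed, so the plan can be checked first.
    """
    renames = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).glob(pattern)):
        new_path = path.with_name(tagged_name(path.name, tag))
        if new_path != path:
            renames.append((path, new_path))
            if not dry_run:
                path.rename(new_path)
    return renames
```

Running it once with `dry_run=True` and inspecting the returned pairs before renaming avoids surprises, and a second run on already tagged files changes nothing.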
rename_headers.sh: Shell script used to standardize FASTA headers for MAGs or gene sequences. This step is critical for:
- Avoiding header-related issues in downstream tools
- Ensuring compatibility across mapping, annotation, and visualization steps
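
As a rough picture of header standardization, the Python sketch below replaces every FASTA header with a prefix and a running number. The `>{prefix}_{number}` scheme and the function name are hypothetical, since the scheme applied by rename_headers.sh is not described here.

```python
def rename_headers(lines, prefix):
    """Yield FASTA lines with each header replaced by '>{prefix}_{n}'.

    Sequence lines pass through unchanged; `n` is a zero-padded counter,
    so the new headers contain no whitespace or special characters.
    """
    counter = 0
    for line in lines:
        if line.startswith(">"):
            counter += 1
            yield f">{prefix}_{counter:06d}\n"
        else:
            yield line
```

Because the function works on any iterable of lines, it can stream from one open file to another without loading a whole MAG into memory. If the original headers are needed later, the old and new names should be recorded in a separate lookup table.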
plots generated by dRep for the final dereplication
NOTE: due to file size limitations, some files are available upon request.