This repository contains a workflow for analysing the microbiome data from the Ghent CKD cohort. The analysis includes diversity assessment, taxonomic profiling, functional analysis, and identification of covariates influencing microbial composition. The scripts are provided with synthetic/mock data to allow users to replicate the analysis steps. The original data and metadata are protected under the General Data Protection Regulation (GDPR), but can be requested through the appropriate channels (see below).
The raw amplicon sequencing and shotgun metagenomic data reported in this study have been deposited in the European Genome-Phenome Archive (EGA) under the accession code EGAS50000000646. These data can be accessed following the EGA’s data access procedures.
The repository includes synthetic/mock data to mimic the original analysis pipeline. The provided files are located in the data/ directory:
├── data/
│   ├── 16S/
│   │   ├── background_physeq.genus.rar.10000.RData
│   │   └── physeq.genus.rar.10000.RData
│   ├── metadata/
│   │   ├── GMM_GMB_GKM.names
│   │   ├── metadata_conversion_values.txt
│   │   └── metadata_demo.txt
│   └── shotgun/
│       ├── cazy_demo.txt
│       ├── mock_motus.txt
│       └── omixer_abundance_demo.txt
These datasets are synthetic and intended for testing. They replicate the structure of the actual data used in the analysis.
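Before running any analysis, it can help to confirm that the synthetic files listed above are in place. The following is a minimal sketch, not part of the repository: the function name `check_data_files` and the default root `data` are assumptions made for illustration.

```shell
# Check that the synthetic input files from the data/ tree are present.
# Assumption: check_data_files is an illustrative helper, not a repository script.
check_data_files() (
    root="${1:-data}"
    missing=0
    for f in \
        16S/background_physeq.genus.rar.10000.RData \
        16S/physeq.genus.rar.10000.RData \
        metadata/GMM_GMB_GKM.names \
        metadata/metadata_conversion_values.txt \
        metadata/metadata_demo.txt \
        shotgun/cazy_demo.txt \
        shotgun/mock_motus.txt \
        shotgun/omixer_abundance_demo.txt
    do
        if [ ! -f "$root/$f" ]; then
            # Report each missing file on stderr and keep counting.
            echo "missing: $root/$f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing is missing.
    [ "$missing" -eq 0 ]
)

# Example (run from the repository root):
#   check_data_files data && echo "all synthetic inputs found"
```

The function returns a non-zero status when any file is absent, so it can gate a longer script.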
The clinical metadata used in the analysis cannot be shared publicly under the GDPR. A synthetic version is provided in this repository (data/metadata/metadata_demo.txt) for demonstration purposes. Access to the original metadata may be requested by formal inquiry:
- Contact: [email protected]
- Requests will be evaluated by the Data Access Committee and the Ethics Committee of Ghent University Hospital. Approval will require signing a data processing agreement.
To replicate this analysis:
- Clone the repository:
git clone https://github.com/your-repo/ckd-ghent.git
cd ckd-ghent
- Build the Docker image to ensure reproducibility:
docker build --build-arg GITHUB_PAT=<your_github_token> -t ckd_analysis .
- Use the provided synthetic data or replace it with your own processed data as described in the manuscript.
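For the build step above, one option is to read the token from an environment variable rather than typing it on the command line. This is a hedged sketch: the helper `build_image` and the `DRY_RUN` switch are hypothetical names introduced here, and only the `docker build` invocation itself comes from this README.

```shell
# Build the image with the token taken from the environment.
# Assumptions: build_image and DRY_RUN are illustrative, not part of the repository.
build_image() (
    if [ -z "${GITHUB_PAT:-}" ]; then
        echo "GITHUB_PAT is not set; export it before building" >&2
        exit 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the command without revealing the token value.
        echo 'docker build --build-arg GITHUB_PAT=*** -t ckd_analysis .'
    else
        docker build --build-arg GITHUB_PAT="$GITHUB_PAT" -t ckd_analysis .
    fi
)

# Example:
#   export GITHUB_PAT=<your_github_token>
#   build_image
```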
For each analysis script, we provide both the .R script and the .Rmd file. The .Rmd files generate comprehensive HTML outputs that include plots and results for easy visualisation. These outputs are located in the docs/ directory.
The raw shotgun data should be processed as described in the Methods section of the manuscript. Specifically:
- The mOTUs table was generated using the marker-gene-based operational taxonomic units (mOTU) v3 profiler (Ruscheweyh, H. J. et al., 2022).
- The gene catalogue was generated using eggNOG-mapper v2 (Cantalapiedra, C. P. et al., 2021) to provide functional annotations, orthology assignments, and domain predictions. This gene catalogue was further used for:
  - Metabolic modules analysis: The KO annotations were extracted, and the abundance of modules was determined using Omixer-RPM (Darzi, Y. et al., 2016) with default settings.
  - Carbohydrate-active enzymes (CAZymes) analysis: The CAZyme annotations were derived from the eggNOG-mapper v2 output (Cantarel, B. L. et al., 2009).
The processed outputs are represented by the synthetic data files provided in the repository:
- mock_motus.txt (based on the mOTU v3 profiler output)
- omixer_abundance_demo.txt (based on eggNOG-mapper v2 and Omixer-RPM outputs)
- cazy_demo.txt (derived from the CAZyme annotations obtained from the eggNOG-mapper output)
These mock datasets are synthetic representations meant to mimic the structure of the actual data used in the analysis.
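To get a quick sense of the structure of these tables, a small helper can report the number of data rows and columns. This sketch assumes the files are tab-separated with a single header line, which this README does not state; `table_shape` is a hypothetical name.

```shell
# Print "<data rows> <columns>" for a delimited text table.
# Assumptions: table_shape is illustrative; input is tab-separated with one header line.
table_shape() {
    awk -F '\t' '
        NR == 1 { cols = NF }             # column count taken from the header
        END {
            if (NR == 0) print 0, 0       # empty file
            else print NR - 1, cols       # exclude the header from the row count
        }' "$1"
}

# Example (run from the repository root):
#   for f in data/shotgun/*.txt; do printf "%s: " "$f"; table_shape "$f"; done
```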
The enterotyping script performs Dirichlet multinomial mixture (DMM) clustering based on 16S rRNA sequencing data. The analysis requires:
- Cohort 16S Data: Available at EGAS50000000646.
- Background FGFP 16S Data: Available at EGAS00001003296.
Execute the enterotyping workflow using:
docker run -v $(pwd):/app ckd_analysis R -e "source('scripts/analyses/Enterotyping.R')"
The synthetic files provided in the repository are phyloseq files based on the amplicon sequencing (16S rRNA) data: physeq.genus.rar.10000.RData serves as the cohort data, and background_physeq.genus.rar.10000.RData serves as the background dataset.
HTML documentation for this analysis is available at: docs/Enterotyping.html.
Calculate alpha and beta diversity profiles:
docker run -v $(pwd):/app ckd_analysis R -e "source('scripts/analyses/Diversity.R')"
HTML documentation for this analysis is available at: docs/Diversity.html.
Explore host-derived factors influencing microbial composition using distance-based redundancy analysis (dbRDA):
docker run -v $(pwd):/app ckd_analysis R -e "source('scripts/analyses/dbRDA.R')"
HTML documentation for this analysis is available at: docs/dbRDA.html.
Identify microbial markers associated with the decline in estimated glomerular filtration rate (eGFR):
docker run -v $(pwd):/app ckd_analysis R -e "source('scripts/analyses/Taxa_analysis.R')"
HTML documentation for this analysis is available at: docs/Taxa_analysis.html.
Investigate changes in metabolic modules associated with eGFR decline:
docker run -v $(pwd):/app ckd_analysis R -e "source('scripts/analyses/Metabolic_modules.R')"
HTML documentation for this analysis is available at: docs/Metabolic_modules.html.
Analyse differences in carbohydrate-active enzymes (CAZymes) between the groups:
docker run -v $(pwd):/app ckd_analysis R -e "source('scripts/analyses/CAZymes.R')"
HTML documentation for this analysis is available at: docs/CAZymes.html.
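The six commands above can be chained so that the whole workflow runs in one go and stops at the first failure. The sketch below is illustrative only: `run_all_analyses` and `DRY_RUN` are hypothetical names, and the order simply follows this README, which does not say whether later scripts depend on earlier ones.

```shell
# Run the six analysis scripts one after another, stopping at the first failure.
# Assumptions: run_all_analyses and DRY_RUN are illustrative; the order follows the README.
run_all_analyses() (
    for name in Enterotyping Diversity dbRDA Taxa_analysis Metabolic_modules CAZymes; do
        script="scripts/analyses/$name.R"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of executing it.
            echo "docker run -v $(pwd):/app ckd_analysis R -e \"source('$script')\""
        else
            echo "Running $script" >&2
            docker run -v "$(pwd)":/app ckd_analysis R -e "source('$script')" || exit 1
        fi
    done
)

# Example:
#   DRY_RUN=1 run_all_analyses   # preview the commands
#   run_all_analyses             # execute them (requires the ckd_analysis image)
```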
- The exact R environment used in this analysis is captured in the renv.lock file.
- Use Docker for consistent execution across environments.
- All synthetic data, scripts, and configurations are provided to ensure replicability of the pipeline steps.
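For users who prefer not to use Docker, the recorded environment could in principle be restored directly with renv. This is a sketch under stated assumptions: R and the renv package are already installed on the host, renv.lock sits at the repository root, and `restore_renv` and `DRY_RUN` are hypothetical names. System libraries that the Docker image provides are not covered by this approach.

```shell
# Restore the R package library recorded in renv.lock, outside Docker.
# Assumptions: restore_renv and DRY_RUN are illustrative; R and renv are installed.
restore_renv() (
    project="${1:-.}"
    if [ ! -f "$project/renv.lock" ]; then
        echo "no renv.lock found in $project" >&2
        exit 1
    fi
    cd "$project" || exit 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of executing it.
        echo "Rscript -e 'renv::restore(prompt = FALSE)'"
    else
        # prompt = FALSE skips the interactive confirmation.
        Rscript -e 'renv::restore(prompt = FALSE)'
    fi
)

# Example (run from the repository root):
#   restore_renv .
```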
Pursuant to the provisions of the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679), dissemination of personally identifiable metadata from participants, even in pseudonymised form, is strictly prohibited. Access to such data may be requested upon formal inquiry to [email protected] and will require approval by the Data Access Committee and the Ethics Committee of Ghent University Hospital.