raeslab/ckd-ghent

CKD Ghent Microbiome Analysis

This repository contains a workflow for analysing the microbiome data from the Ghent CKD cohort. The analysis includes diversity assessment, taxonomic profiling, functional analysis, and identification of covariates influencing microbial composition. The scripts are provided with synthetic/mock data to allow users to replicate the analysis steps. The original data and metadata are protected under the General Data Protection Regulation (GDPR), but can be requested through the appropriate channels (see below).


1. Data Availability

Raw Sequencing Data

The raw amplicon sequencing and shotgun metagenomic data reported in this study have been deposited in the European Genome-Phenome Archive (EGA) under the accession code EGAS50000000646. These data can be accessed following the EGA’s data access procedures.

Mock Data in This Repository

The repository includes synthetic/mock data that mimic the inputs to the original analysis pipeline. The provided files are located in the data/ directory:

├── data/
│   ├── 16S/
│   │   ├── background_physeq.genus.rar.10000.RData
│   │   └── physeq.genus.rar.10000.RData
│   ├── metadata/
│   │   ├── GMM_GMB_GKM.names
│   │   ├── metadata_conversion_values.txt
│   │   └── metadata_demo.txt
│   └── shotgun/
│       ├── cazy_demo.txt
│       ├── mock_motus.txt
│       └── omixer_abundance_demo.txt

These datasets are synthetic and intended for testing. They replicate the structure of the actual data used in the analysis.
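As a quick sanity check after cloning, the listing above can be compared against the working copy. The following is a minimal sketch: the function name check_mock_data is hypothetical and not part of the repository; only the file paths are taken from the listing.

```shell
# Hypothetical helper (not part of the repository): report which of the
# mock data files listed in the README are missing under a repository root.
# Usage: check_mock_data [repo_root]   (defaults to the current directory)
check_mock_data() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in \
    data/16S/background_physeq.genus.rar.10000.RData \
    data/16S/physeq.genus.rar.10000.RData \
    data/metadata/GMM_GMB_GKM.names \
    data/metadata/metadata_conversion_values.txt \
    data/metadata/metadata_demo.txt \
    data/shotgun/cazy_demo.txt \
    data/shotgun/mock_motus.txt \
    data/shotgun/omixer_abundance_demo.txt
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  echo "$missing file(s) missing"
  # Return success only when every listed file is present.
  [ "$missing" -eq 0 ]
}
```

Calling check_mock_data from the repository root prints one line per missing file followed by a count, and returns a non-zero status if anything is absent.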

Metadata Access

The clinical metadata used in the analysis cannot be publicly shared under the GDPR. A synthetic version is provided in this repository (data/metadata/metadata_demo.txt) for demonstration purposes. Access to the original metadata may be requested upon formal inquiry:

  • Contact: [email protected]
  • Requests will be evaluated by the Data Access Committee and the Ethics Committee of Ghent University Hospital. Approval will require signing a data processing agreement.

2. Prerequisites

To replicate this analysis:

  1. Clone the repository:
    git clone https://github.com/raeslab/ckd-ghent.git
    cd ckd-ghent
  2. Build the Docker container to ensure reproducibility:
    docker build --build-arg GITHUB_PAT=<your_github_token> -t ckd_analysis .
  3. Use the provided synthetic data or replace it with your own processed data as described in the manuscript.
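The build step needs a GitHub token. One way to avoid typing it into the command line is to read it from the environment. This is a sketch under assumptions: the wrapper name build_image is hypothetical, and it assumes the token has been exported as GITHUB_PAT beforehand; the docker build arguments are the ones given in step 2.

```shell
# Hypothetical wrapper (not part of the repository): build the image using a
# token taken from the GITHUB_PAT environment variable, and fail early with
# a clear message when the variable is unset or empty.
build_image() {
  if [ -z "${GITHUB_PAT:-}" ]; then
    echo "GITHUB_PAT is not set; export it before building" >&2
    return 1
  fi
  # Same command as in step 2, with the token supplied from the environment.
  docker build --build-arg GITHUB_PAT="$GITHUB_PAT" -t ckd_analysis .
}
```

Run it from the repository root after export GITHUB_PAT=<your_github_token>.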

3. Analysis Pipeline

For each analysis script, we provide both the .R script and the .Rmd file. The .Rmd files generate comprehensive HTML outputs that include plots and results for easy visualisation. These outputs are located in the docs/ directory.

Run the Preprocessing

The raw shotgun data should be processed as described in the Methods section of the manuscript. Specifically:

  • The mOTUs table was generated using the marker-gene-based operational taxonomic units (mOTU) v3 profiler (Ruscheweyh, H. J. et al., 2022).

  • The gene catalogue was generated using eggNOG-mapper v2 (Cantalapiedra, C. P. et al., 2021) to provide functional annotations, orthology assignments, and domain predictions. This gene catalogue was further used for:

    • Metabolic modules analysis: The KO annotations were extracted, and the abundance of modules was determined using Omixer-RPM (Darzi, Y. et al., 2016) with default settings.
    • Carbohydrate-active enzymes (CAZymes) analysis: The CAZyme annotations were derived from the eggNOG-mapper v2 output (Cantarel, B. L. et al., 2009).

The processed outputs are represented by the synthetic data files provided in the repository:

  • mock_motus.txt (based on the mOTU v3 profiler output),
  • omixer_abundance_demo.txt (based on eggNOG-mapper v2 and Omixer-RPM outputs), and
  • cazy_demo.txt (derived from the CAZyme annotations obtained from the eggNOG-mapper output).

These mock datasets are synthetic representations meant to mimic the structure of the actual data used in the analysis.
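To get a feel for the mock shotgun tables before running any analysis, their dimensions can be printed from the shell. This sketch assumes the .txt files are tab-delimited, which the README does not state; the function name table_shape is hypothetical.

```shell
# Hypothetical helper (not part of the repository): print the number of
# lines (including any header line) and the number of tab-separated fields
# on the first line of a text table.
# ASSUMPTION: the file is tab-delimited; adjust -F if it is not.
table_shape() {
  awk -F'\t' 'NR == 1 { cols = NF }
              END { printf "%d rows, %d columns\n", NR, cols }' "$1"
}
```

For example, table_shape data/shotgun/mock_motus.txt reports the size of the mock mOTUs table.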


Enterotyping

This script performs Dirichlet multinomial mixture (DMM) clustering based on 16S rRNA sequencing data. The analysis requires:

  1. Cohort 16S Data: Available at EGAS50000000646.
  2. Background FGFP 16S Data: Available at EGAS00001003296.

Execute the enterotyping workflow using:

docker run -v "$(pwd)":/app ckd_analysis R -e "source('scripts/analyses/Enterotyping.R')"

The repository provides synthetic phyloseq objects in place of the real amplicon sequencing (16S rRNA) data: physeq.genus.rar.10000.RData stands in for the cohort data, and background_physeq.genus.rar.10000.RData for the background dataset.

HTML documentation for this analysis is available at: docs/Enterotyping.html.


Diversity Analysis

Calculate alpha and beta diversity profiles:

docker run -v "$(pwd)":/app ckd_analysis R -e "source('scripts/analyses/Diversity.R')"

HTML documentation for this analysis is available at: docs/Diversity.html.


Microbiota Covariates Identification

Explore host-derived factors influencing microbial composition using distance-based redundancy analysis (dbRDA):

docker run -v "$(pwd)":/app ckd_analysis R -e "source('scripts/analyses/dbRDA.R')"

HTML documentation for this analysis is available at: docs/dbRDA.html.


Taxonomic Abundance Analysis

Identify microbial markers associated with the decline in estimated glomerular filtration rate (eGFR):

docker run -v "$(pwd)":/app ckd_analysis R -e "source('scripts/analyses/Taxa_analysis.R')"

HTML documentation for this analysis is available at: docs/Taxa_analysis.html.


Functional Analysis

Investigate changes in metabolic modules associated with eGFR decline:

docker run -v "$(pwd)":/app ckd_analysis R -e "source('scripts/analyses/Metabolic_modules.R')"

HTML documentation for this analysis is available at: docs/Metabolic_modules.html.


Carbohydrate-active enzymes (CAZymes) Analysis

Analyse differences in carbohydrate-active enzymes between the groups:

docker run -v "$(pwd)":/app ckd_analysis R -e "source('scripts/analyses/CAZymes.R')"

HTML documentation for this analysis is available at: docs/CAZymes.html.
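The six analyses above share one command pattern, so they can be driven from a small loop. The sketch below is not part of the repository: the function names and the DRY_RUN switch are hypothetical, the scripts run in the order this README lists them, and the README does not state whether any script depends on another's output.

```shell
# Hypothetical helpers (not part of the repository).
# run_analysis NAME runs scripts/analyses/NAME.R inside the ckd_analysis
# image; with DRY_RUN=1 it only prints the command it would run.
run_analysis() {
  local name="$1"
  local expr="source('scripts/analyses/${name}.R')"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker run -v \"$(pwd)\":/app ckd_analysis R -e \"$expr\""
  else
    docker run -v "$(pwd)":/app ckd_analysis R -e "$expr"
  fi
}

# Run every analysis in README order, stopping at the first failure.
run_all_analyses() {
  local name
  for name in Enterotyping Diversity dbRDA Taxa_analysis Metabolic_modules CAZymes; do
    run_analysis "$name" || { echo "failed: $name" >&2; return 1; }
  done
}
```

Setting DRY_RUN=1 before calling run_all_analyses prints the six commands without starting a container, which is a cheap way to check the paths first.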


4. Notes on Reproducibility

  • The exact R environment used in this analysis is captured in the renv.lock file.
  • Use Docker for consistent execution across environments.
  • All synthetic data, scripts, and configurations are provided to ensure replicability of the pipeline steps.

5. Legal and Ethical Considerations

Pursuant to the provisions of the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679), dissemination of personally identifiable metadata from participants, even in pseudonymised form, is strictly prohibited. Access to such data may be requested upon formal inquiry to [email protected] and will require approval by the Data Access Committee and the Ethics Committee of Ghent University Hospital.
