This repository contains a pipeline for analyzing microbiome data from the LCMP cohort. It covers data preprocessing, quality control, taxonomic profiling, statistical analysis, and visualization. The guide below describes how to run each step of the analysis.
To begin, download the fastq files from the repository. Make sure you have the necessary permissions and access to the metadata. Once you have the fastq files, group them by sequencing batch so that each batch can be managed and analyzed separately. Use the following steps:
- Create a directory structure to organize your data, such as `data/raw_data/` for storing the raw fastq files.
- Place the fastq files in the corresponding sequencing batch directories within the `data/raw_data/` directory.
- Name the directories so that they reflect the sequencing batches, and include relevant metadata if available.
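The layout above can be sketched in R as follows. This is a minimal illustration, not part of the pipeline itself: the manifest, the batch names (`batch_01`, `batch_02`), the file names, and the helper functions `batch_paths()` and `organise_batches()` are all hypothetical.

```r
# Hypothetical manifest: which fastq file belongs to which sequencing batch.
# Batch and file names are placeholders, not taken from the LCMP data.
manifest <- data.frame(
  file  = c("S1_R1.fastq.gz", "S1_R2.fastq.gz", "S2_R1.fastq.gz", "S2_R2.fastq.gz"),
  batch = c("batch_01", "batch_01", "batch_02", "batch_02"),
  stringsAsFactors = FALSE
)

# Destination of each file: <raw_dir>/<batch>/<file>.
batch_paths <- function(manifest, raw_dir = file.path("data", "raw_data")) {
  file.path(raw_dir, manifest$batch, manifest$file)
}

# Create one directory per batch and move the downloaded files into place.
organise_batches <- function(manifest, download_dir,
                             raw_dir = file.path("data", "raw_data")) {
  dest <- batch_paths(manifest, raw_dir)
  for (d in unique(dirname(dest))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  moved <- file.rename(file.path(download_dir, manifest$file), dest)
  invisible(setNames(moved, dest))
}
```

Calling `organise_batches(manifest, "downloads")` would move the listed files from a hypothetical `downloads/` folder into `data/raw_data/batch_01/` and `data/raw_data/batch_02/`.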
After the fastq files have been grouped into sequencing batches, the next step is to perform initial analysis using the dada2 package to obtain Amplicon Sequence Variants (ASVs) for each sequencing batch. Follow these steps:
- Install the required dependencies, including R and the dada2 package.
- Create a script, such as `scripts/dada2_analysis.R`, and load the necessary libraries.
- Execute the dada2_analysis.R script.
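As a rough guide to what such a script might contain, the sketch below wraps the standard dada2 paired-end workflow in one function that is run once per batch directory. Several details are assumptions rather than facts about this pipeline: the reads are taken to be paired-end with `_R1.fastq.gz` / `_R2.fastq.gz` suffixes, the truncation lengths are placeholders that should be chosen from the quality profiles, and the function name `run_dada2_batch()` and its output path argument are hypothetical.

```r
# Minimal per-batch dada2 sketch (assumes paired-end reads; parameters are placeholders).
run_dada2_batch <- function(batch_dir, out_rds, trunc_len = c(240, 160)) {
  if (!requireNamespace("dada2", quietly = TRUE)) {
    stop("Package 'dada2' is required.")
  }
  fwd <- sort(list.files(batch_dir, pattern = "_R1\\.fastq\\.gz$", full.names = TRUE))
  rev <- sort(list.files(batch_dir, pattern = "_R2\\.fastq\\.gz$", full.names = TRUE))
  stopifnot(length(fwd) > 0, length(fwd) == length(rev))

  # Filtered reads are written to a subdirectory of the batch directory.
  filt_fwd <- file.path(batch_dir, "filtered", basename(fwd))
  filt_rev <- file.path(batch_dir, "filtered", basename(rev))
  dada2::filterAndTrim(fwd, filt_fwd, rev, filt_rev,
                       truncLen = trunc_len, maxEE = c(2, 2), truncQ = 2,
                       rm.phix = TRUE, multithread = TRUE)

  # Error rates are learned within the batch, then used for denoising.
  err_fwd <- dada2::learnErrors(filt_fwd, multithread = TRUE)
  err_rev <- dada2::learnErrors(filt_rev, multithread = TRUE)
  dd_fwd <- dada2::dada(filt_fwd, err = err_fwd, multithread = TRUE)
  dd_rev <- dada2::dada(filt_rev, err = err_rev, multithread = TRUE)

  # Merge read pairs, build the ASV table, and remove chimeras.
  merged <- dada2::mergePairs(dd_fwd, filt_fwd, dd_rev, filt_rev)
  seqtab <- dada2::makeSequenceTable(merged)
  seqtab <- dada2::removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)

  saveRDS(seqtab, out_rds)
  invisible(seqtab)
}
```

Processing each batch separately lets the error model be learned from a single sequencing run. The per-batch tables could later be combined with `dada2::mergeSequenceTables()` before taxonomy is assigned with `dada2::assignTaxonomy()`.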
Once you have obtained the ASVs for each sequencing batch, it is important to filter out unclassified and non-bacterial ASVs to focus on the microbial taxa of interest. Follow these steps:
- Create a script, such as `scripts/ASV_filtering.R`, and load the necessary libraries.
- Read the concatenated ASV file from the `data/processed_data/` directory.
- Implement filtering criteria to remove unclassified and non-bacterial ASVs based on taxonomic annotations.
- Execute the ASV_filtering.R script.
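One possible form of the filtering criteria is sketched below in base R. It assumes a taxonomy table with `Kingdom` and `Phylum` columns (the rank names used by dada2's taxonomy assignment) and treats an ASV as unclassified when it has no phylum-level assignment. That definition, the toy data, and the function name `keep_bacterial_asvs()` are assumptions made for illustration.

```r
# Toy taxonomy table (ASVs in rows, ranks in columns); values are illustrative only.
taxonomy <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Archaea", "Eukaryota", NA),
  Phylum  = c("Firmicutes", NA, "Euryarchaeota", NA, NA),
  row.names = paste0("ASV", 1:5),
  stringsAsFactors = FALSE
)
# Toy count table: samples in rows, ASVs in columns.
counts <- matrix(1:10, nrow = 2,
                 dimnames = list(c("S1", "S2"), rownames(taxonomy)))

# Keep ASVs assigned to Bacteria that also have a phylum-level annotation.
keep_bacterial_asvs <- function(counts, taxonomy) {
  tax <- taxonomy[colnames(counts), , drop = FALSE]
  keep <- !is.na(tax$Kingdom) & tax$Kingdom == "Bacteria" & !is.na(tax$Phylum)
  list(counts = counts[, keep, drop = FALSE],
       taxonomy = tax[keep, , drop = FALSE])
}

filtered <- keep_bacterial_asvs(counts, taxonomy)
```

In this toy example only `ASV1` survives: `ASV2` lacks a phylum, `ASV3` and `ASV4` are not bacterial, and `ASV5` has no kingdom assignment.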
After filtering the ASVs, it is essential to perform exploratory and quality control analyses to gain insights into the dataset. Follow these steps:
- Create a script, such as `scripts/exploratory_analysis.R`, and load the necessary libraries.
- Read the filtered ASV file from the `data/processed_data/` directory.
- Execute the exploratory_analysis.R script.
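The source does not list which quality-control metrics the script computes. As an illustration of typical first checks, the base R sketch below summarizes sequencing depth, observed ASV richness, and Shannon diversity per sample. The toy counts, the `min_depth` threshold, and the function name `qc_summary()` are hypothetical.

```r
# Toy count table: samples in rows, ASVs in columns.
counts <- matrix(c(50, 30, 20,  0,
                   10, 10, 10, 10,
                    5,  0,  0,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("S1", "S2", "S3"), paste0("ASV", 1:4)))

# Per-sample depth, richness, and Shannon diversity (natural log).
qc_summary <- function(counts, min_depth = 10) {
  depth <- rowSums(counts)
  rel <- counts / depth  # row-wise relative abundance
  shannon <- apply(rel, 1, function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  })
  data.frame(sample = rownames(counts),
             depth = depth,
             observed_asvs = rowSums(counts > 0),
             shannon = shannon,
             low_depth = depth < min_depth,
             row.names = NULL)
}

qc <- qc_summary(counts)
```

Samples flagged in `low_depth` are candidates for exclusion before downstream analyses. The appropriate threshold depends on the dataset.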
Quantitative Microbiome Profiling (QMP) at the ASV level enables the quantification of microbial abundance. Follow these steps to perform QMP:
- Create a script, such as `scripts/rmp_to_qmq_ASV.R`, and load the necessary libraries.
- Read the filtered ASV file from the `data/processed_data/` directory.
- Execute the rmp_to_qmq_ASV.R script.
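The core idea of converting relative profiles into quantitative ones can be sketched as scaling each sample's relative abundances by its measured microbial load. The example below is a deliberate simplification: it assumes per-sample cell counts are available (the values here are invented) and omits steps a full QMP workflow may include, such as rarefying to an even sampling depth or correcting for 16S copy number. The function name `rmp_to_qmp()` is hypothetical.

```r
# Toy count table: samples in rows, ASVs in columns.
counts <- matrix(c(60, 40,
                   10, 90),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), c("ASV1", "ASV2")))

# Hypothetical microbial loads (cells per gram), e.g. from flow cytometry.
cell_counts <- c(S1 = 1e11, S2 = 5e10)

# Scale relative abundances by the microbial load of each sample.
rmp_to_qmp <- function(counts, cell_counts) {
  stopifnot(all(rownames(counts) %in% names(cell_counts)))
  rel <- counts / rowSums(counts)
  rel * cell_counts[rownames(counts)]
}

qmp <- rmp_to_qmp(counts, cell_counts)
```

After scaling, each row of `qmp` sums to that sample's microbial load, so a taxon with the same relative abundance in two samples can still differ in absolute abundance.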
Identification of microbiota covariates helps in understanding the factors influencing microbial community composition. Follow these steps to identify microbiota covariates:
- Create a script, such as `scripts/covariate_identification.R`, and load the necessary libraries.
- Read the filtered ASV files from the `data/processed_data/` directory.
- Prepare the necessary metadata, such as sample characteristics, clinical variables, etc.
- Execute the covariate_identification.R script.
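The source does not state which statistical method is used to identify covariates. One common choice is a univariate PERMANOVA on Bray-Curtis dissimilarities, sketched below with the vegan package. The function name `screen_covariates()`, the choice of dissimilarity, and the number of permutations are assumptions. The sketch also assumes the metadata has sample IDs as row names and contains no missing values.

```r
# Screen metadata variables one at a time for association with community composition.
# Assumes 'abundance' has samples in rows and 'metadata' has matching row names.
screen_covariates <- function(abundance, metadata, covariates, permutations = 999) {
  if (!requireNamespace("vegan", quietly = TRUE)) {
    stop("Package 'vegan' is required.")
  }
  metadata <- metadata[rownames(abundance), , drop = FALSE]  # align samples
  bc <- vegan::vegdist(abundance, method = "bray")

  res <- lapply(covariates, function(v) {
    fit <- vegan::adonis2(as.formula(paste("bc ~", v)), data = metadata,
                          permutations = permutations)
    # Row 1 of the result table holds the tested term.
    data.frame(covariate = v, R2 = fit$R2[1], p = fit[["Pr(>F)"]][1])
  })
  res <- do.call(rbind, res)
  res$p_adj <- p.adjust(res$p, method = "BH")  # multiple-testing correction
  res[order(-res$R2), ]
}
```

A call such as `screen_covariates(abundance, metadata, c("age", "bmi"))`, with hypothetical column names, returns one row per covariate with the explained variance (`R2`) and adjusted p-value.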
To identify taxa showing differential abundance between different conditions or groups, follow these steps:
- Create a script, such as `scripts/differential_abundance.R`, and load the necessary libraries.
- Read the filtered ASV files from the `data/processed_data/` directory.
- Execute the differential_abundance.R script.
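A simple way to test for differential abundance between two groups is a per-taxon Wilcoxon rank-sum test followed by multiple-testing correction. The sketch below uses only base R. The source does not specify the test, so the choice of test and correction, the simulated data, and the function name `test_differential_abundance()` are assumptions.

```r
set.seed(42)
# Simulated abundances: ASV1 differs between groups, ASV2 does not.
abundance <- cbind(
  ASV1 = c(rnorm(10, 100, 5), rnorm(10, 200, 5)),
  ASV2 = rnorm(20, 50, 5)
)
group <- factor(rep(c("control", "case"), each = 10))

# Wilcoxon rank-sum test per taxon, with Benjamini-Hochberg adjustment.
test_differential_abundance <- function(abundance, group) {
  stopifnot(nlevels(group) == 2, nrow(abundance) == length(group))
  p <- apply(abundance, 2, function(x) wilcox.test(x ~ group)$p.value)
  data.frame(taxon = colnames(abundance),
             p = p,
             p_adj = p.adjust(p, method = "BH"),
             row.names = NULL)
}

da <- test_differential_abundance(abundance, group)
```

Real count data usually contain ties and zeros, in which case `wilcox.test()` falls back to a normal approximation and emits a warning.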
Investigating associations between taxa abundance and other variables can provide valuable insights. Follow these steps to analyze taxa abundance associations:
- Create a script, such as `scripts/abundance_associations.R`, and load the necessary libraries.
- Read the filtered ASV files from the `data/processed_data/` directory.
- Prepare the necessary metadata, including variables of interest for association analysis.
- Execute the abundance_associations.R script.
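For a continuous variable of interest, associations can be screened with a rank correlation per taxon. The base R sketch below uses Spearman correlation with Benjamini-Hochberg correction. The method, the toy values, and the function name `associate_taxa()` are illustrative assumptions, not a description of the repository's script.

```r
# Toy data: one continuous variable and two taxa measured in ten samples.
variable <- c(21, 23, 24, 26, 27, 29, 30, 32, 33, 35)
abundance <- cbind(
  ASV1 = c(12, 15, 14, 20, 22, 21, 30, 28, 35, 40),  # rises with the variable
  ASV2 = c(5, 9, 2, 7, 1, 8, 3, 10, 4, 6)            # no clear trend
)

# Spearman correlation of each taxon with the variable, BH-adjusted.
associate_taxa <- function(abundance, variable) {
  res <- lapply(colnames(abundance), function(taxon) {
    # exact = FALSE avoids warnings about ties, which are common in count data.
    ct <- cor.test(abundance[, taxon], variable, method = "spearman", exact = FALSE)
    data.frame(taxon = taxon, rho = unname(ct$estimate), p = ct$p.value)
  })
  res <- do.call(rbind, res)
  res$p_adj <- p.adjust(res$p, method = "BH")
  res
}

assoc <- associate_taxa(abundance, variable)
```

Rank correlation does not assume a linear relationship or normally distributed abundances, which suits skewed microbiome data.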
Linear model analysis helps in exploring relationships between multiple covariates and microbial abundance. Follow these steps to perform linear models analysis:
- Create a script, such as `scripts/linear_models.R`, and load the necessary libraries.
- Read the filtered ASV files from the `data/processed_data/` directory.
- Prepare the necessary metadata, including multiple covariates of interest.
- Execute the linear_models.R script.
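A multivariable model for a single taxon might look like the base R sketch below, which regresses log-transformed abundance on several covariates at once. The log10 transform with a pseudocount, the covariate names (`age`, `bmi`), the simulated data, and the function name `fit_taxon_model()` are assumptions made for illustration.

```r
# Fit log10(abundance + pseudocount) ~ covariates for one taxon.
fit_taxon_model <- function(abundance, metadata, taxon, covariates, pseudocount = 1) {
  df <- metadata[rownames(abundance), covariates, drop = FALSE]  # align samples
  df$log_abundance <- log10(abundance[, taxon] + pseudocount)
  fit <- lm(reformulate(covariates, response = "log_abundance"), data = df)
  summary(fit)$coefficients
}

# Simulated example: abundance depends on age but not on BMI.
set.seed(7)
n <- 40
samples <- paste0("S", seq_len(n))
metadata <- data.frame(age = runif(n, 20, 80), bmi = rnorm(n, 25, 3),
                       row.names = samples)
abundance <- matrix(10^(8 + 0.02 * metadata$age + rnorm(n, sd = 0.1)),
                    ncol = 1, dimnames = list(samples, "ASV1"))

coefs <- fit_taxon_model(abundance, metadata, "ASV1", c("age", "bmi"))
```

Each row of `coefs` gives the estimate, standard error, t statistic, and p-value for one term, so the effect of a covariate is read while holding the others fixed.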
Enterotyping is a method to categorize individuals based on their gut microbiota composition. Follow these steps to perform enterotyping analysis:
- Create a script, such as `scripts/enterotyping.R`, and load the necessary libraries.
- Read the filtered ASV files from the `data/processed_data/` directory.
- Execute the enterotyping.R script.
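The source does not say which enterotyping method the script applies. As one illustration, the sketch below clusters samples by partitioning around medoids (PAM) on the square root of the Jensen-Shannon divergence, using `cluster::pam()` from the cluster package that ships with standard R installations. Other approaches exist, such as Dirichlet multinomial mixtures. The number of clusters `k`, the toy counts, and the function names are hypothetical.

```r
# Pairwise distance: square root of the Jensen-Shannon divergence between samples.
jsd_distance <- function(rel) {
  kld <- function(p, q) sum(ifelse(p > 0, p * log(p / q), 0))
  n <- nrow(rel)
  d <- matrix(0, n, n, dimnames = list(rownames(rel), rownames(rel)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      m <- (rel[i, ] + rel[j, ]) / 2
      jsd <- 0.5 * kld(rel[i, ], m) + 0.5 * kld(rel[j, ], m)
      d[i, j] <- d[j, i] <- sqrt(max(0, jsd))  # guard against rounding below zero
    }
  }
  as.dist(d)
}

# Assign each sample to one of k clusters with PAM.
assign_enterotypes <- function(counts, k = 2) {
  rel <- counts / rowSums(counts)
  cluster::pam(jsd_distance(rel), k = k, diss = TRUE)$clustering
}

# Toy counts: three samples dominated by taxon A, three by taxon B.
counts <- matrix(c(80, 15, 5,  85, 10, 5,  75, 20, 5,
                   10, 85, 5,  15, 80, 5,   5, 90, 5),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(paste0("S", 1:6), c("A", "B", "C")))
enterotypes <- assign_enterotypes(counts, k = 2)
```

In practice the number of clusters is usually chosen with a cluster-quality index rather than fixed in advance.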