Code accompanying the manuscript "Microbiome responses to anthelmintic treatment depend on pre-treatment helminth infection status in young Ethiopian children".
Abstract: Mass deworming programs using effective anthelmintic drugs are essential for controlling soil-transmitted helminth (STH) infections, particularly in high-risk developing regions. However, it remains unclear whether routine deworming induces long-term alterations in gut microbiome composition, especially when accounting for individual infection status. This study aims to explore the changes in the gut microbiomes of Ethiopian children one year after the administration of an anthelmintic treatment.
- Polina Tikhonova, MSc
project
|- README # repository descriptions
|- LICENSE # the license for this project
|
|- data # project data
| |- database/ # SILVA database directory
| |- reads/ # a directory for fastq files
| | |- raw/ # a directory for the raw reads
| | +- filtered/ # (generated by the pipeline)
| | # a directory containing filtered reads
| |- QCcontrol # (generated by the pipeline); QC files
| |- dada2 # (generated by the pipeline); dada2 files
| |- phyloseq # (generated by the pipeline; provided);
| | # resulting phyloseq object files in .rds and .csv formats
| +- metadata.tsv # a metadata file, provided
|
|- code/ # project code
| |- analysis/ # analysis, generates results/ files
| +- processing/ # raw data processing, generates data/processed/ files
|
|- results # output files, generated by the code in the analysis folder
| |- figures/ # manuscript figures
| +- tables/ # beta-diversity tables
Please note that the code in this repository was tested on a Linux x86_64 system.
- snakemake (v7.32.1)
conda install -c bioconda -c conda-forge snakemake==7.32.1
Please note that running the processing pipeline is optional since the final phyloseq objects necessary for the analysis are provided in this repository.
- Install dependencies (~10 min). In case of version incompatibility errors, please set the conda channel_priority to flexible.
- The raw data is publicly available at the European Nucleotide Archive (ENA) under accession number PRJEB93790. Please download the fastq.gz files to the data/reads/raw directory and follow the proposed directory structure: sample_name/sample_name_R*.fastq.gz. The metadata file should be saved as data/metadata.tsv (tab-separated format).
- Download SILVA database version 138 to the data/database directory. Please make sure to download both files: silva_nr99_v138.2_toGenus_trainset.fa.gz and silva_v138.2_assignSpecies.fa.gz.
- (optional) If the raw files or the database are stored in custom paths, please modify the corresponding parameters in the code/processing/config.yaml file.
- Run Snakemodule:
mkdir code/processing/logs
cd code/processing/logs
conda activate snakemake
snakemake --snakefile ../Snakemodule --use-conda --conda-frontend conda
Suggested parameters for running the Snakemodule using multiple sbatch jobs: --cluster "sbatch --time=47:00:00 --nodes=1 --ntasks=20 --mem=200GB" --jobs 10.
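Before launching the pipeline, it can help to confirm that the inputs described above are in place. The following is a minimal sketch, not part of the repository: the function names check_raw_layout and check_silva_files are hypothetical, and the checks only test that the expected file names exist, not that the files are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of the repository.

# Report sample directories that contain no <sample>_R*.fastq.gz file.
check_raw_layout() {
    root="$1"
    rc=0
    n=0
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        n=$((n + 1))
        sample="$(basename "$dir")"
        # Expand the glob into the positional parameters; an unmatched
        # pattern stays literal, so the -e test below fails.
        set -- "$dir$sample"_R*.fastq.gz
        if [ ! -e "$1" ]; then
            echo "no reads found for sample: $sample"
            rc=1
        fi
    done
    if [ "$n" -eq 0 ]; then
        echo "no sample directories in: $root"
        rc=1
    fi
    return "$rc"
}

# Report SILVA files (names taken from the instructions above) that are absent.
check_silva_files() {
    db_dir="$1"
    rc=0
    for f in silva_nr99_v138.2_toGenus_trainset.fa.gz \
             silva_v138.2_assignSpecies.fa.gz; do
        if [ ! -e "$db_dir/$f" ]; then
            echo "missing database file: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example, run from the repository root with the default paths:
#   check_raw_layout data/reads/raw && check_silva_files data/database
```

Both functions print one line per problem and return a non-zero status if anything is missing, so they can be chained in front of the snakemake command. If custom paths are set in config.yaml, pass those paths instead.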
code/processing
|
|- Snakemodule
|- conda_env.*.yaml # conda environment libraries
|- config.yaml # Snakemodule paths and settings
+- params.dada2.filter_and_trim.yaml # dada2 filter_and_trim settings
Snakemodule Steps
- FastQC/MultiQC quality assessment
1.quality_control.smk
- Trimming and filtering
2.filtering_trimming.smk
- FastQC/MultiQC quality reassessment
1.quality_control.smk
- ASV identification using DADA2 pipeline (official tutorial)
3.dada2.smk
- ASV taxonomy annotation
3.dada2.smk
- Generation of the phyloseq object.
3.dada2.smk
- outputs:
- unfiltered phyloseq (all ASVs)
- filtered phyloseq (only microbial ASVs)
- filtered phyloseq objects agglomerated at genus and family levels
- Create and install a new conda environment (~30 min)
conda create -n anthelminthic_treatment
conda env update -n anthelminthic_treatment --file code/analysis/conda_env.R.yaml
- Install additional R libraries:
conda activate anthelminthic_treatment
R
devtools::install_github(repo = "malucalle/selbal", ref="9f7ff2b")
devtools::install_github(repo = "gauravsk/ranacapa", ref="58c0cab")
devtools::install_github(repo = "gmteunisse/fantaxtic", ref="b822d7f")
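After the installation, a quick check from the shell can confirm that the three GitHub packages load in the active environment. This is a minimal sketch: the function name check_r_packages is hypothetical, and it assumes that each package name matches its repository name (selbal, ranacapa, fantaxtic) and that Rscript is on the PATH of the activated environment.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository: fail if any of the
# GitHub-installed packages cannot be loaded in the active environment.
# Assumption: package names equal the repository names used above.
check_r_packages() {
    Rscript -e '
        pkgs <- c("selbal", "ranacapa", "fantaxtic")
        ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
        if (!all(ok)) stop("missing R packages: ", paste(pkgs[!ok], collapse = ", "))
    '
}

# Example, after `conda activate anthelminthic_treatment`:
#   check_r_packages && echo "all additional R packages are available"
```

Rscript exits with a non-zero status when stop() is reached, so the function can be used directly in a shell condition.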
All analysis code is implemented in R and stored in the code/analysis directory. Each file is provided in two formats: .ipynb and .Rmd.
- Dataset overview
Figure 1. Data Overview. (Chi-squared test of demographic characteristics, sample counts).
Figure 3. Microbial relative abundance barplots.
- Microbial diversity
Figure 2. Microbial diversity. (baseline vs follow-up groups).
Figure 2. Microbial diversity. Baseline and follow-up groups stratified by baseline STH Status. (baseline vs follow-up groups).
Supplementary Figure 2. Microbial diversity. Accounting for STH status.Rmd.
- Dimensionality reduction
Figure 2. PCoA based on Bray-Curtis distance. Baseline and follow-up groups.
- Genera-vanishment analysis
Figure 3. ASV vanishment analysis
- Beta-diversity
Figure 4. ALDEx2 analysis. Wilcoxon test.
Supplementary Figure 4. ALDEx2 analysis. Wilcoxon with covariates.