niaid/cELISA-StatisticalAnalysis

cELISA Analysis Repository

A comprehensive R analysis repository for competitive ELISA (cELISA) data. This repository contains statistical analysis scripts and visualization tools for investigating antibody functional activity and correlations.

Repository Structure

cELISA-analysis/
├── R/                           # Analysis scripts
│   ├── setup.R                  # Package management and setup
│   ├── generate_all_figures.R   # Main script to generate all figures
│   ├── multivariate_feature_selection.R # Automatic feature selection analysis
│   ├── pairwise_assay_analysis.R # Pairwise combination analysis
│   ├── tra_correlation_analysis.R # TRA correlation between timepoints
│   ├── odpr_deltaod_correlations.R # Functional vs level correlations
│   ├── tra_deltaod_correlations.R # TRA vs antibody level correlations
│   ├── pairwise_correlation_matrix.R # Simplified correlation matrices
│   ├── univariate_deltaod_analysis.R # Individual assay performance
│   └── roc_analysis.R           # ROC curve analysis for TRA prediction
├── data/                        # Input data files
├── figures/                     # Output plots and figures
├── outputs/                     # Analysis results and statistics
├── cELISA-analysis.Rproj       # RStudio project file
└── README.md                   # This file

Requirements

R Environment

  • R version: 4.0 or higher
  • RStudio: Recommended for interactive use

R Packages

Running R/setup.R installs and loads the required packages:

CRAN Packages:

  • readxl - Excel file reading
  • ggplot2 - Data visualization
  • ggpubr - Publication-ready plots
  • dplyr - Data manipulation
  • tidyr - Data tidying
  • corrplot - Correlation matrices
  • circlize - Color mapping
  • cowplot - Plot arrangements
  • pROC - ROC curve analysis
  • openxlsx - Excel file writing
  • GGally - Pairwise correlation plots
  • reshape2 - Data reshaping
  • caret - Machine learning and classification
  • lme4 - Linear mixed-effects models

Bioconductor Packages (optional):

  • ComplexHeatmap - Advanced heatmaps

Quick Start

1. Clone Repository

git clone https://github.com/niaid/cELISA-StatisticalAnalysis cELISA-analysis
cd cELISA-analysis

2. Open in RStudio

Double-click cELISA-analysis.Rproj to open the project in RStudio.

3. Install Dependencies

source("R/setup.R")

4. Prepare Data

Place your Excel data file as data/Compete.xlsx. The file should contain:

  • A StudyDay column with values 252 (Post-Dose 3) and 560 (Post-Dose 4)
  • TRA values
  • Antibody level measurements (FABlog10, eELISAEUlog10, IgG1DeltaOD, IgG3DeltaOD, IgG4DeltaOD, c1qDeltaOD, etc.)
  • Subject demographics (Gender, Age) if available
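
Before running the analyses, it can help to confirm that the workbook has the expected shape. The following is a minimal sketch, not code from the repository: `check_compete_data` is a hypothetical helper, and the column names are the ones this README lists as required under Data Loading Problems.

```r
# Column names taken from this README's list of required columns.
required_cols <- c("StudyDay", "TRA", "FABlog10", "eELISAEUlog10",
                   "IgG1DeltaOD", "IgG3DeltaOD", "IgG4DeltaOD", "c1qDeltaOD")

# Hypothetical helper (not part of the repository): stops with a message
# naming any missing column or any expected study day that has no rows.
check_compete_data <- function(df, expected_days = c(252, 560)) {
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  missing_days <- setdiff(expected_days, unique(df$StudyDay))
  if (length(missing_days) > 0) {
    stop("No rows found for StudyDay: ", paste(missing_days, collapse = ", "))
  }
  invisible(TRUE)
}

# With the real file (requires the readxl package):
# compete <- readxl::read_excel("data/Compete.xlsx")
# check_compete_data(compete)

# With a small made-up data frame:
toy <- as.data.frame(matrix(0, nrow = 2, ncol = length(required_cols),
                            dimnames = list(NULL, required_cols)))
toy$StudyDay <- c(252, 560)
check_compete_data(toy)
```

The check fails fast with a readable message, which is usually easier to act on than an error raised deep inside a modelling call.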

5. Run Analysis

# Generate all figures and tables (main workflow)
source("R/generate_all_figures.R")

# Or run specific analyses:
source("R/multivariate_feature_selection.R")    # Automatic feature selection
source("R/pairwise_assay_analysis.R")           # Pairwise combination analysis  
source("R/tra_correlation_analysis.R")          # TRA timepoint correlations
source("R/roc_analysis.R")                      # ROC curve analysis
source("R/univariate_deltaod_analysis.R")       # Individual assay performance

Analysis Workflow

1. Setup and Package Management (setup.R)

  • Centralized package management: Install and load all required packages
  • CRAN mirror configuration: Ensures reliable package installation
  • Bioconductor support: Optional advanced packages like ComplexHeatmap
  • Environment verification: Checks successful package loading
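
The install-if-missing pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of setup.R: `missing_packages` and `install_missing` are hypothetical helper names, and the CRAN mirror URL is an assumed choice.

```r
# CRAN packages named in the Requirements section of this README.
cran_packages <- c("readxl", "ggplot2", "ggpubr", "dplyr", "tidyr", "corrplot",
                   "circlize", "cowplot", "pROC", "openxlsx", "GGally",
                   "reshape2", "caret", "lme4")

# Hypothetical helper: returns the packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: installs whatever is missing from a fixed CRAN mirror
# (the mirror URL is an assumption, not taken from setup.R).
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# Uncomment to install and then attach everything:
# install_missing(cran_packages)
# invisible(lapply(cran_packages, library, character.only = TRUE))
```

Checking with `requireNamespace()` first avoids reinstalling packages that are already present.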

2. Core Statistical Analysis

A. Multivariate Feature Selection (multivariate_feature_selection.R)

  • Automatic feature selection: Stepwise selection using AIC criterion
  • Bootstrap validation: 50 iterations with 70/30 train-test splits
  • Class balancing: Upsampling for TRA>80% prediction
  • Performance metrics: AIC, Accuracy, F1, PPV, NPV, AUC with confidence intervals
  • Dual timepoint analysis: StudyDay 252 and 560 comparison
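
As a rough illustration of how these pieces fit together, the sketch below runs 50 resampling iterations on simulated data using base R only. It is not the repository's implementation. The backward search direction, upsampling only the training split, the 0.5 classification cutoff, and the outcome name `high_tra` are all assumptions made for the example.

```r
set.seed(1)

# Simulated stand-in data; the real analysis would read data/Compete.xlsx.
n <- 200
sim <- data.frame(
  IgG1DeltaOD = rnorm(n),
  IgG3DeltaOD = rnorm(n),
  c1qDeltaOD  = rnorm(n)
)
# Assumed binary outcome (1 when TRA > 80%), simulated from IgG1DeltaOD only.
sim$high_tra <- rbinom(n, 1, plogis(-1 + 1.5 * sim$IgG1DeltaOD))

# Upsample every class (with replacement where needed) to the largest class size.
upsample <- function(df, outcome) {
  groups <- split(df, df[[outcome]])
  target <- max(vapply(groups, nrow, integer(1)))
  balanced <- lapply(groups, function(g) {
    g[sample(nrow(g), target, replace = nrow(g) < target), , drop = FALSE]
  })
  do.call(rbind, balanced)
}

one_iteration <- function(df) {
  train_idx <- sample(nrow(df), size = round(0.7 * nrow(df)))  # 70/30 split
  train <- upsample(df[train_idx, ], "high_tra")
  test  <- df[-train_idx, ]

  # Backward stepwise selection by AIC, starting from the full model.
  full <- glm(high_tra ~ ., data = train, family = binomial)
  fit  <- step(full, direction = "backward", trace = 0)

  pred <- as.integer(predict(fit, newdata = test, type = "response") > 0.5)
  c(AIC = AIC(fit), Accuracy = mean(pred == test$high_tra))
}

results <- t(replicate(50, one_iteration(sim)))  # 50 iterations
colMeans(results)
```

Balancing only the training split keeps the held-out 30% at its natural class ratio, so the test accuracy is not inflated by duplicated rows.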

B. Pairwise Assay Analysis (pairwise_assay_analysis.R)

  • Combination analysis: All 15 pairwise combinations of 6 assays
  • Statistical approach: Logistic regression with bootstrap sampling
  • Comprehensive metrics: Performance evaluation across all combinations
  • Three-significant-figure formatting: Standardized result presentation
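
The count of 15 follows from choosing 2 assays out of 6. A minimal sketch of enumerating the pairs and formatting results, assuming the six assays are the antibody columns listed under Data Loading Problems and that the binary outcome is stored in a hypothetical column named `high_tra`:

```r
# The six assay columns named in this README's list of required columns.
assays <- c("FABlog10", "eELISAEUlog10", "IgG1DeltaOD",
            "IgG3DeltaOD", "IgG4DeltaOD", "c1qDeltaOD")

# All unordered pairs, one per column: choose(6, 2) = 15.
assay_pairs <- combn(assays, 2)
ncol(assay_pairs)

# One logistic-regression formula per pair; `high_tra` is an assumed name
# for the binary TRA > 80% outcome.
pair_formulas <- combn(assays, 2, simplify = FALSE,
                       FUN = function(p) reformulate(p, response = "high_tra"))
pair_formulas[[1]]

# Round reported metrics to three significant figures (made-up values).
signif(c(AUC = 0.87654, AIC = 123.456), 3)
```

Each formula can then be passed to `glm(..., family = binomial)` inside whatever resampling loop is used.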

C. Individual Assay Performance (univariate_deltaod_analysis.R)

  • Single assay evaluation: Performance of each assay individually
  • Bootstrap sampling: Robust performance estimation
  • Comprehensive metrics: Complete performance characterization
  • Excel export: Formatted results for publication

3. Correlation Analysis

A. TRA Timepoint Correlations (tra_correlation_analysis.R)

  • Longitudinal analysis: TRA correlations between Post-Dose 3 and 4
  • Matched subject analysis: Paired timepoint comparisons
  • Statistical significance: Spearman correlations with p-values
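
A matched-subject comparison of this kind can be sketched in base R as below. The data are simulated, and `SubjectID` is an assumed name for the subject identifier column; the README does not state what the real column is called.

```r
set.seed(2)

# Simulated long-format data; `SubjectID` is an assumed identifier column.
long <- data.frame(
  SubjectID = rep(1:30, times = 2),
  StudyDay  = rep(c(252, 560), each = 30),
  TRA       = runif(60, min = 0, max = 100)
)
long <- long[-c(5, 40), ]  # drop two rows so two subjects lack a timepoint

# Keep only subjects measured at both timepoints (inner join on SubjectID).
day252 <- long[long$StudyDay == 252, c("SubjectID", "TRA")]
day560 <- long[long$StudyDay == 560, c("SubjectID", "TRA")]
paired <- merge(day252, day560, by = "SubjectID", suffixes = c("_252", "_560"))

# Spearman rank correlation; exact = FALSE avoids warnings when ties occur.
res <- cor.test(paired$TRA_252, paired$TRA_560,
                method = "spearman", exact = FALSE)
c(n = nrow(paired), rho = unname(res$estimate), p = res$p.value)
```

The inner join silently drops subjects with only one timepoint, so reporting `n` alongside the correlation makes the effective sample size visible.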

B. Functional vs Level Correlations (odpr_deltaod_correlations.R)

  • ODPR-DeltaOD relationships: Functional activity vs antibody levels
  • Multiple assay types: IgG1, IgG3, IgG4, C1q analysis
  • Dual timepoint analysis: Both StudyDay 252 and 560
  • Publication-ready plots: High-resolution correlation plots

C. TRA-Antibody Correlations (tra_deltaod_correlations.R)

  • TRA prediction analysis: Correlation with antibody levels
  • Threshold visualization: TRA>80% cutoff analysis
  • Multiple predictors: All antibody assays vs TRA

4. Visualization and Matrices (pairwise_correlation_matrix.R)

  • Simplified correlation matrices: Using GGally for clean visualization
  • Functional vs level separation: Clear variable grouping
  • Statistical annotations: Correlation coefficients and significance

Project Context

This repository contains R scripts for statistical analysis of competitive ELISA (cELISA) data from a clinical study investigating antibody responses.

Study Overview

  • Analysis focus: IgG subclass and C1q binding data from post-vaccination timepoints
  • Data source: Clinical study data from vaccination trials
  • Statistical approach: Correlation analysis, ROC analysis and predictive modeling
  • Output: Publication-ready figures and statistical results

Code Architecture

  • Modular design: Individual R scripts for specific analyses and figures
  • Standardized workflows: Consistent data processing across all analyses
  • Visualization: ggplot2-based plots with publication-quality themes
  • Statistical methods: Bootstrap validation, feature selection, ROC analysis
  • Reproducibility: Comprehensive documentation and error handling

Development Principles

  • Consistent variable naming conventions across all scripts
  • Proper error handling for missing data and edge cases
  • Publication-quality plots with standardized formatting
  • Complete documentation of statistical methods and assumptions
  • Reproducible analysis workflows with version control

Key Features

Statistical Methods

  • Statistical approaches: Logistic regression with bootstrap validation
  • Stepwise feature selection: AIC-based automatic model selection
  • Cross-validation: 70/30 train-test splits with 50 bootstrap iterations
  • Class balancing: Upsampling techniques for imbalanced outcomes
  • Performance metrics: Comprehensive evaluation (AIC, Accuracy, F1, PPV, NPV, AUC, Sensitivity, Specificity)
  • Correlation analysis: Spearman rank correlations, which are robust to outliers
  • ROC analysis: Predictive performance assessment for TRA>80% outcomes
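
For readers who want to see how the listed metrics relate to one another, here is a small base R sketch computed on made-up labels and scores. The helper names are hypothetical. The AUC uses the rank-sum (Mann-Whitney) identity and carries no confidence interval; the pROC package listed under Requirements provides `roc()` and `ci.auc()` for that purpose.

```r
# Hypothetical helper: metrics from 0/1 truth and 0/1 predictions.
classification_metrics <- function(truth, pred) {
  tp <- sum(pred == 1 & truth == 1)
  tn <- sum(pred == 0 & truth == 0)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  sens <- tp / (tp + fn)
  ppv  <- tp / (tp + fp)
  c(Accuracy    = (tp + tn) / length(truth),
    Sensitivity = sens,
    Specificity = tn / (tn + fp),
    PPV         = ppv,
    NPV         = tn / (tn + fn),
    F1          = 2 * ppv * sens / (ppv + sens))
}

# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(truth, score) {
  r  <- rank(score)
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up example: 1 = TRA > 80%, score = predicted probability.
truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
score <- c(0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7)
pred  <- as.integer(score > 0.5)

classification_metrics(truth, pred)  # every metric equals 0.75 here
auc_rank(truth, score)               # 0.9375
```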

Visualization Features

  • Publication-ready plots: High-resolution outputs (300 DPI)
  • Consistent styling: Standardized themes and colors
  • Statistical annotations: P-values and correlation coefficients
  • Flexible functions: Reusable plotting components
  • Multiple formats: TIFF, PNG output options

Reproducibility

  • R project structure: Self-contained analysis environment
  • Automated setup: Package installation and management
  • Clear documentation: Comprehensive commenting
  • Version control ready: Git integration

Output Files

Figures

  • figures/tra_correlation_*.png - TRA correlations between timepoints
  • figures/*_vs_*.png - Scatter plots with correlation analysis
  • figures/pairwise_correlation_*.png - Correlation matrices
  • figures/roc_*.pdf - ROC curves for TRA>80% prediction

Statistical Results

  • outputs/multivariate_feature_selection_summary.csv - Feature selection results
  • outputs/Pairwise_Day252_sig3.xlsx - Pairwise combination analysis results
  • outputs/TRA_correlation_summary.csv - TRA correlation analysis
  • outputs/univariate_performance_*.xlsx - Individual assay performance metrics
  • outputs/ROC_Summary_*.xlsx - ROC analysis results with AUC confidence intervals

Usage Examples

Run Complete Analysis Pipeline

# 1. Set up the environment (run once per session)
source("R/setup.R")

# 2. Run the core analyses
source("R/multivariate_feature_selection.R")  # Feature selection
source("R/pairwise_assay_analysis.R")         # Pairwise combinations
source("R/tra_correlation_analysis.R")        # TRA correlations
source("R/univariate_deltaod_analysis.R")     # Individual assays
source("R/roc_analysis.R")                    # ROC analysis

Run Specific Analysis Components

# Feature selection for optimal assay combinations
source("R/multivariate_feature_selection.R")

# Evaluate all pairwise assay combinations
source("R/pairwise_assay_analysis.R")

# Correlation between timepoints
source("R/tra_correlation_analysis.R")

Troubleshooting

Package Installation Issues

If Bioconductor packages fail to install:

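# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Then install the optional heatmap package from Bioconductor
BiocManager::install("ComplexHeatmap")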

Data Loading Problems

  • Ensure the Excel file is named exactly Compete.xlsx and is saved in the data/ folder
  • Check that the StudyDay column contains the values 252 and 560
  • Verify required columns: StudyDay, TRA, FABlog10, eELISAEUlog10, IgG1DeltaOD, IgG3DeltaOD, IgG4DeltaOD, c1qDeltaOD
  • Ensure numeric columns contain only numbers or NA
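
Stray text in a numeric column (for example "n/a" or "<0.1") typically causes the whole column to be read as text. A minimal sketch for locating such entries, using a hypothetical helper and a made-up data frame:

```r
# Hypothetical helper: for each listed column, report entries that are not
# NA but cannot be parsed as numbers.
find_non_numeric <- function(df, cols) {
  bad <- lapply(cols, function(col) {
    x <- as.character(df[[col]])
    parsed <- suppressWarnings(as.numeric(x))
    unique(x[!is.na(x) & is.na(parsed)])
  })
  names(bad) <- cols
  bad[lengths(bad) > 0]  # keep only columns with offending entries
}

# Made-up example: TRA contains a text placeholder, IgG1DeltaOD is clean.
toy <- data.frame(TRA = c("85.2", "n/a", NA),
                  IgG1DeltaOD = c("0.41", "0.39", "1.02"))
find_non_numeric(toy, c("TRA", "IgG1DeltaOD"))
```

An empty list means every listed column holds only numbers or NA.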

Analysis Issues

  • Insufficient data: Some analyses require minimum sample sizes for bootstrap validation
  • Model convergence: Logistic regression may fail with sparse data - check class balance
  • Memory usage: A large number of bootstrap iterations may require more RAM
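
A quick class-balance check before fitting can flag sparse outcomes early. The sketch below is illustrative only: `check_class_balance` is a hypothetical helper, and the minimum of 10 observations per class is an arbitrary threshold chosen for the example, not a value taken from the repository.

```r
# Hypothetical helper: counts per class for the TRA > 80% outcome, plus a
# flag that is FALSE when either class has fewer than `min_per_class` rows.
check_class_balance <- function(tra, cutoff = 80, min_per_class = 10) {
  outcome <- factor(tra > cutoff, levels = c(FALSE, TRUE),
                    labels = c("TRA<=80", "TRA>80"))
  counts <- table(outcome, useNA = "no")
  list(counts = counts, ok = all(counts >= min_per_class))
}

# Made-up TRA values: too few observations in both classes.
balance <- check_class_balance(c(95, 88, 40, 82, 91, 15, 99))
balance$counts
balance$ok
```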

Memory Issues

For large datasets:

  • Process subsets of data
  • Use gc() to free memory
  • Consider data.table for efficiency

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this analysis code in your research, please cite:

[To be added]
cELISA Analysis Repository. GitHub: https://github.com/niaid/cELISA-StatisticalAnalysis

Support

For questions or issues:

  • Open a GitHub issue for code-related problems
  • Check R documentation for package-specific help

Primary authors: Yuyan Yi, Jingwen Gu

Contact: [email protected]


Note: This repository contains analysis code only. Actual clinical data is not included due to privacy considerations. Users must provide their own data following the specified format.
