A comprehensive R analysis repository for competitive ELISA (cELISA) data. It contains statistical analysis scripts and visualization tools for investigating antibody functional activity and its correlation with antibody levels.
```
cELISA-analysis/
├── R/ # Analysis scripts
│ ├── setup.R # Package management and setup
│ ├── generate_all_figures.R # Main script to generate all figures
│ ├── multivariate_feature_selection.R # Automatic feature selection analysis
│ ├── pairwise_assay_analysis.R # Pairwise combination analysis
│ ├── tra_correlation_analysis.R # TRA correlation between timepoints
│ ├── odpr_deltaod_correlations.R # Functional vs level correlations
│ ├── tra_deltaod_correlations.R # TRA vs antibody level correlations
│ ├── pairwise_correlation_matrix.R # Simplified correlation matrices
│ ├── univariate_deltaod_analysis.R # Individual assay performance
│ └── roc_analysis.R # ROC curve analysis for TRA prediction
├── data/ # Input data files
├── figures/ # Output plots and figures
├── outputs/ # Analysis results and statistics
├── cELISA-analysis.Rproj # RStudio project file
└── README.md # This file
```
- R version: 4.0 or higher
- RStudio: Recommended for interactive use
The analysis automatically installs required packages:
CRAN Packages:
- readxl - Excel file reading
- ggplot2 - Data visualization
- ggpubr - Publication-ready plots
- dplyr - Data manipulation
- tidyr - Data tidying
- corrplot - Correlation matrices
- circlize - Color mapping
- cowplot - Plot arrangements
- pROC - ROC curve analysis
- openxlsx - Excel file writing
- GGally - Pairwise correlation plots
- reshape2 - Data reshaping
- caret - Machine learning and classification
- lme4 - Linear mixed-effects models

Bioconductor Packages (optional):
- ComplexHeatmap - Advanced heatmaps
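The package list above can be checked, and optionally installed, with a small helper. This is a hypothetical sketch: ensure_packages() is not a function from this repository, and R/setup.R may work differently. It reports which packages cannot be loaded and only installs them when install = TRUE, using an assumed CRAN mirror (cloud.r-project.org).

```r
# Hypothetical helper (not the repository's R/setup.R): report, and
# optionally install, packages that cannot be loaded.
ensure_packages <- function(pkgs, install = FALSE,
                            repos = "https://cloud.r-project.org") {
  # TRUE for each package whose namespace loads from the library paths
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (install && length(missing) > 0) {
    # Passing repos avoids the interactive CRAN mirror prompt
    install.packages(missing, repos = repos)
    still_ok <- vapply(missing, requireNamespace, logical(1), quietly = TRUE)
    missing <- missing[!still_ok]
  }
  # Packages that are still unavailable (character(0) if none)
  invisible(missing)
}

cran_pkgs <- c("readxl", "ggplot2", "ggpubr", "dplyr", "tidyr", "corrplot",
               "circlize", "cowplot", "pROC", "openxlsx", "GGally",
               "reshape2", "caret", "lme4")

# Usage: dry run first, then install whatever is missing
# ensure_packages(cran_pkgs)
# ensure_packages(cran_pkgs, install = TRUE)
```

Doing a dry run first keeps the check side-effect free on machines where installing packages is not allowed.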
```bash
git clone <repository-url>
cd cELISA-analysis
```
Double-click cELISA-analysis.Rproj to open the project in RStudio. Then install and load the required packages:
```r
source("R/setup.R")
```
Place your Excel data file at data/Compete.xlsx. The file should contain:
- A StudyDay column containing the values 252 (Post-Dose 3) and 560 (Post-Dose 4)
- A TRA column
- Antibody level measurements (FABlog10, eELISAEUlog10, IgG1DeltaOD, IgG3DeltaOD, IgG4DeltaOD, c1qDeltaOD, etc.)
- Subject demographics (Gender, Age) if available
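Before running the scripts, a quick validation step can catch most of the data problems listed under troubleshooting. This is a hypothetical sketch: check_compete_data() is illustrative and not part of the repository; the column names are taken from the required-columns list in this README.

```r
# Required columns, taken from the troubleshooting checklist in this README
required_cols <- c("StudyDay", "TRA", "FABlog10", "eELISAEUlog10",
                   "IgG1DeltaOD", "IgG3DeltaOD", "IgG4DeltaOD", "c1qDeltaOD")

# Hypothetical validator (not part of the repository): returns a character
# vector describing each problem found, or character(0) if none.
check_compete_data <- function(df, cols = required_cols, days = c(252, 560)) {
  problems <- character(0)
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  # Present columns must be numeric; NA values are allowed
  present <- intersect(cols, names(df))
  is_num <- vapply(present, function(col) is.numeric(df[[col]]), logical(1))
  non_numeric <- present[!is_num]
  if (length(non_numeric) > 0) {
    problems <- c(problems, paste("non-numeric column:", non_numeric))
  }
  # Both study days (Post-Dose 3 and 4) should appear in the data
  if ("StudyDay" %in% names(df)) {
    absent_days <- setdiff(days, df$StudyDay)
    if (length(absent_days) > 0) {
      problems <- c(problems, paste("no rows for StudyDay", absent_days))
    }
  }
  problems
}

# Usage (requires the readxl package and your own data file):
# dat <- readxl::read_excel("data/Compete.xlsx")
# check_compete_data(dat)
```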
```r
# Generate all figures and tables (main workflow)
source("R/generate_all_figures_tables.R")
# Or run specific analyses:
source("R/multivariate_feature_selection.R") # Automatic feature selection
source("R/pairwise_assay_analysis.R") # Pairwise combination analysis
source("R/tra_correlation_analysis.R") # TRA timepoint correlations
source("R/roc_analysis.R") # ROC curve analysis
source("R/univariate_deltaod_analysis.R") # Individual assay performance
```
R/setup.R:
- Centralized package management: Installs and loads all required packages
- CRAN mirror configuration: Ensures reliable package installation
- Bioconductor support: Optional advanced packages such as ComplexHeatmap
- Environment verification: Checks that all packages loaded successfully

R/multivariate_feature_selection.R:
- Automatic feature selection: Stepwise selection using the AIC criterion
- Bootstrap validation: 50 iterations with 70/30 train-test splits
- Class balancing: Upsampling for TRA>80% prediction
- Performance metrics: AIC, Accuracy, F1, PPV, NPV, AUC with confidence intervals
- Dual timepoint analysis: Comparison of StudyDay 252 and 560

R/pairwise_assay_analysis.R:
- Combination analysis: All 15 pairwise combinations of 6 assays
- Statistical approach: Logistic regression with bootstrap sampling
- Comprehensive metrics: Performance evaluation across all combinations
- Three-significant-figure formatting: Standardized result presentation

R/univariate_deltaod_analysis.R:
- Single assay evaluation: Performance of each assay individually
- Bootstrap sampling: Robust performance estimation
- Comprehensive metrics: Complete performance characterization
- Excel export: Formatted results for publication

R/tra_correlation_analysis.R:
- Longitudinal analysis: TRA correlations between Post-Dose 3 and 4
- Matched subject analysis: Paired timepoint comparisons
- Statistical significance: Spearman correlations with p-values

R/odpr_deltaod_correlations.R:
- ODPR-DeltaOD relationships: Functional activity vs antibody levels
- Multiple assay types: IgG1, IgG3, IgG4, C1q analysis
- Dual timepoint analysis: Both StudyDay 252 and 560
- Publication-ready plots: High-resolution correlation plots

R/tra_deltaod_correlations.R:
- TRA prediction analysis: Correlation with antibody levels
- Threshold visualization: TRA>80% cutoff analysis
- Multiple predictors: All antibody assays vs TRA

R/pairwise_correlation_matrix.R:
- Simplified correlation matrices: Using GGally for clean visualization
- Functional vs level separation: Clear variable grouping
- Statistical annotations: Correlation coefficients and significance
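The matched-subject TRA comparison between Post-Dose 3 and Post-Dose 4 can be sketched in base R. This assumes a subject identifier column, here called SubjectID, which this README does not specify; tra_timepoint_spearman() is a hypothetical name, and R/tra_correlation_analysis.R may pair subjects differently.

```r
# Hypothetical sketch (not R/tra_correlation_analysis.R): Spearman
# correlation of TRA between StudyDay 252 (Post-Dose 3) and 560
# (Post-Dose 4) for subjects measured at both timepoints.
# ASSUMPTION: a subject identifier column named "SubjectID".
tra_timepoint_spearman <- function(df, id = "SubjectID", value = "TRA",
                                   day_a = 252, day_b = 560) {
  a <- df[which(df$StudyDay == day_a), c(id, value)]
  b <- df[which(df$StudyDay == day_b), c(id, value)]
  # Inner join: keep only subjects present at both timepoints
  paired <- merge(a, b, by = id, suffixes = c("_a", "_b"))
  paired <- paired[complete.cases(paired), ]
  x <- paired[[paste0(value, "_a")]]
  y <- paired[[paste0(value, "_b")]]
  # exact = FALSE uses the asymptotic p-value, which tolerates ties
  test <- cor.test(x, y, method = "spearman", exact = FALSE)
  list(n = nrow(paired), rho = unname(test$estimate), p_value = test$p.value)
}
```

Because Spearman's rho works on ranks, a single extreme TRA value shifts the estimate far less than it would shift a Pearson correlation.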
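The 15 pairwise combinations come from choosing 2 of the 6 assays (choose(6, 2) = 15). A minimal sketch, assuming the six assays are the antibody columns named in this README and that the outcome is TRA > 80%, fits one logistic regression per pair and ranks the pairs by AIC. It leaves out the bootstrap resampling the pairwise script performs, and pairwise_logistic_aic() is a hypothetical name.

```r
# Hypothetical sketch (not R/pairwise_assay_analysis.R): one logistic
# regression per pair of assays for the outcome TRA > 80%, ranked by AIC.
# ASSUMPTION: the six assays are the antibody columns named in this README.
assays <- c("FABlog10", "eELISAEUlog10", "IgG1DeltaOD",
            "IgG3DeltaOD", "IgG4DeltaOD", "c1qDeltaOD")

pairwise_logistic_aic <- function(df, predictors = assays, cutoff = 80) {
  df$high_tra <- as.integer(df$TRA > cutoff)
  # All unordered pairs: choose(6, 2) = 15 for six assays
  pairs <- combn(predictors, 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    fit <- glm(reformulate(p, response = "high_tra"),
               data = df, family = binomial)
    data.frame(assay_1 = p[1], assay_2 = p[2], AIC = AIC(fit))
  })
  out <- do.call(rbind, rows)
  out$AIC <- signif(out$AIC, 3)  # three significant figures
  out[order(out$AIC), ]
}
```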
This repository contains R scripts for statistical analysis of competitive ELISA (cELISA) data from a clinical study investigating antibody responses.
- Analysis focus: IgG subclass and C1q binding data from post-vaccination timepoints
- Data source: Clinical study data from vaccination trials
- Statistical approach: Correlation analysis, ROC analysis and predictive modeling
- Output: Publication-ready figures and statistical results
- Modular design: Individual R scripts for specific analyses and figures
- Standardized workflows: Consistent data processing across all analyses
- Visualization: ggplot2-based plots with publication-quality themes
- Statistical methods: Bootstrap validation, feature selection, ROC analysis
- Reproducibility: Comprehensive documentation and error handling
- Consistent variable naming conventions across all scripts
- Proper error handling for missing data and edge cases
- Publication-quality plots with standardized formatting
- Complete documentation of statistical methods and assumptions
- Reproducible analysis workflows with version control
- Statistical approaches: Logistic regression with bootstrap validation
- Stepwise feature selection: AIC-based automatic model selection
- Cross-validation: 70/30 train-test splits with 50 bootstrap iterations
- Class balancing: Upsampling techniques for imbalanced outcomes
- Performance metrics: Comprehensive evaluation (AIC, Accuracy, F1, PPV, NPV, AUC, Sensitivity, Specificity)
- Correlation analysis: Spearman correlations robust to outliers
- ROC analysis: Predictive performance assessment for TRA>80% outcomes
- Publication-ready plots: High-resolution outputs (300 DPI)
- Consistent styling: Standardized themes and colors
- Statistical annotations: P-values and correlation coefficients
- Flexible functions: Reusable plotting components
- Multiple formats: TIFF, PNG, and PDF output options
- R project structure: Self-contained analysis environment
- Automated setup: Package installation and management
- Clear documentation: Comprehensive commenting
- Version control ready: Git integration
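The validation scheme described above (70/30 splits, upsampling, AIC-based stepwise selection, test-set metrics) can be sketched in base R. This is an illustration under stated assumptions, not the repository's implementation: the helper names are hypothetical, stats::step() stands in for whatever selection routine the scripts use, upsampling duplicates minority-class rows in the training set only, and classification metrics use an assumed 0.5 probability threshold.

```r
# Hypothetical sketch (not the repository's implementation) of repeated
# 70/30 splits with upsampling, AIC-based stepwise selection and metrics.

# Duplicate minority-class rows (with replacement) until classes are equal
upsample <- function(df, y) {
  idx <- split(seq_len(nrow(df)), df[[y]])
  n_max <- max(lengths(idx))
  keep <- unlist(lapply(idx, function(i) {
    if (length(i) == n_max) return(i)  # majority class kept as is
    c(i, i[sample.int(length(i), n_max - length(i), replace = TRUE)])
  }), use.names = FALSE)
  df[keep, , drop = FALSE]
}

# Rank-based (Mann-Whitney) AUC; ties count as one half
auc_rank <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Confusion-matrix metrics at an assumed 0.5 probability threshold
binary_metrics <- function(prob, label, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & label == 1)
  tn <- sum(pred == 0 & label == 0)
  fp <- sum(pred == 1 & label == 0)
  fn <- sum(pred == 0 & label == 1)
  sens <- tp / (tp + fn)
  ppv <- tp / (tp + fp)
  c(Accuracy = (tp + tn) / length(label), Sensitivity = sens,
    Specificity = tn / (tn + fp), PPV = ppv, NPV = tn / (tn + fn),
    F1 = 2 * ppv * sens / (ppv + sens), AUC = auc_rank(prob, label))
}

validate_stepwise <- function(df, predictors, cutoff = 80, n_iter = 50,
                              train_frac = 0.7) {
  df$high_tra <- as.integer(df$TRA > cutoff)
  df <- df[complete.cases(df[c("high_tra", predictors)]),
           c("high_tra", predictors)]
  upper <- reformulate(predictors, response = "high_tra")
  res <- replicate(n_iter, {
    train_idx <- sample.int(nrow(df), floor(train_frac * nrow(df)))
    train <- upsample(df[train_idx, ], "high_tra")  # balance training set only
    test <- df[-train_idx, ]
    # Start from the intercept-only model; step() adds/drops terms by AIC
    start <- glm(high_tra ~ 1, data = train, family = binomial)
    fit <- step(start, scope = upper, direction = "both", trace = 0)
    prob <- predict(fit, newdata = test, type = "response")
    c(AIC = AIC(fit), binary_metrics(prob, test$high_tra))
  })
  # Average each metric over iterations (NaN, e.g. undefined PPV, dropped)
  rowMeans(res, na.rm = TRUE)
}
```

Upsampling only the training split keeps the held-out 30% at the observed class ratio, so PPV and NPV reflect the real prevalence of TRA > 80%.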
Figures:
- figures/tra_correlation_*.png - TRA correlations between timepoints
- figures/*_vs_*.png - Scatter plots with correlation analysis
- figures/pairwise_correlation_*.png - Correlation matrices
- figures/roc_*.pdf - ROC curves for TRA>80% prediction
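This README describes 300 DPI PNG and TIFF outputs. A minimal base-R sketch of writing a plot at that resolution follows; save_plot_300dpi() is hypothetical, and since the repository's plots are described as ggplot2-based, ggplot2::ggsave(filename, plot, dpi = 300) is the more likely route for them.

```r
# Hypothetical helper: draw a base-R plot into a 300 DPI PNG or TIFF file.
save_plot_300dpi <- function(path, plot_fun, width = 6, height = 5,
                             type = c("png", "tiff")) {
  type <- match.arg(type)
  device <- switch(type, png = grDevices::png, tiff = grDevices::tiff)
  # Size in inches at res = 300 gives (width * 300) x (height * 300) pixels
  device(path, width = width, height = height, units = "in", res = 300)
  # Close the device even if plot_fun() fails
  on.exit(grDevices::dev.off())
  plot_fun()
  invisible(path)
}

# Usage with made-up data:
# save_plot_300dpi("figures/example.png",
#                  function() plot(1:10, main = "Example"))
```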
Tables and statistics:
- outputs/multivariate_feature_selection_summary.csv - Feature selection results
- outputs/Pairwise_Day252_sig3.xlsx - Pairwise combination analysis results
- outputs/TRA_correlation_summary.csv - TRA correlation analysis
- outputs/univariate_performance_*.xlsx - Individual assay performance metrics
- outputs/ROC_Summary_*.xlsx - ROC analysis results with AUC confidence intervals
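The ROC summary reports AUC with confidence intervals. The sketch below computes the rank-based (Mann-Whitney) AUC and a percentile bootstrap interval in base R; it is illustrative only, and auc_boot_ci() is a hypothetical name. The repository lists pROC, whose ci.auc() provides DeLong and bootstrap intervals, so the scripts may well use that instead.

```r
# Rank-based (Mann-Whitney) AUC: probability that a randomly chosen
# positive (label 1) scores higher than a randomly chosen negative;
# ties count as one half
auc_rank <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical sketch: percentile bootstrap CI for the AUC, resampling
# subjects with replacement
auc_boot_ci <- function(score, label, n_boot = 2000, level = 0.95) {
  n <- length(label)
  boots <- replicate(n_boot, {
    i <- sample.int(n, n, replace = TRUE)
    # A resample containing only one class has no defined AUC
    if (length(unique(label[i])) < 2) NA_real_ else auc_rank(score[i], label[i])
  })
  alpha <- (1 - level) / 2
  c(AUC = auc_rank(score, label),
    quantile(boots, c(alpha, 1 - alpha), na.rm = TRUE))
}
```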
```r
# 1. Setup environment (run once per session)
source("R/setup.R")
# 2. Run all analyses
source("R/multivariate_feature_selection.R") # Feature selection
source("R/pairwise_assay_analysis.R") # Pairwise combinations
source("R/tra_correlation_analysis.R") # TRA correlations
source("R/univariate_deltaod_analysis.R") # Individual assays
source("R/roc_analysis.R") # ROC analysis
```
```r
# Feature selection for optimal assay combinations
source("R/multivariate_feature_selection.R")
# Evaluate all pairwise assay combinations
source("R/pairwise_assay_analysis.R")
# Correlation between timepoints
source("R/tra_correlation_analysis.R")
```
If Bioconductor packages fail to install:
```r
# Install BiocManager first
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")
```
- Ensure the Excel file is named exactly Compete.xlsx
- Check that the data contain StudyDay values 252 and 560
- Verify required columns: StudyDay, TRA, FABlog10, eELISAEUlog10, IgG1DeltaOD, IgG3DeltaOD, IgG4DeltaOD, c1qDeltaOD
- Ensure numeric columns contain only numbers or NA
- Insufficient data: Some analyses require minimum sample sizes for bootstrap validation
- Model convergence: Logistic regression may fail to converge with sparse data; check class balance
- Memory usage: A large number of bootstrap iterations may require more RAM
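For the model-convergence issue above, a quick look at how many subjects fall on each side of the TRA > 80% cutoff at each timepoint helps. check_class_balance() is hypothetical, and the minimum of 10 minority-class subjects is an assumed rule of thumb, not a repository setting.

```r
# Hypothetical check (not part of the repository): count subjects on each
# side of the TRA cutoff per StudyDay and flag sparse timepoints.
# ASSUMPTION: min_events = 10 is an illustrative rule of thumb.
check_class_balance <- function(df, cutoff = 80, min_events = 10) {
  df <- df[!is.na(df$TRA) & !is.na(df$StudyDay), ]
  outcome <- factor(ifelse(df$TRA > cutoff, "high", "low"),
                    levels = c("low", "high"),
                    labels = c(paste0("TRA<=", cutoff), paste0("TRA>", cutoff)))
  counts <- table(StudyDay = df$StudyDay, Outcome = outcome)
  # Size of the smaller outcome class at each timepoint
  minority <- apply(counts, 1, min)
  list(counts = counts, too_few = names(minority)[minority < min_events])
}
```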
For large datasets:
- Process subsets of data
- Use gc() to free memory
- Consider data.table for efficiency
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this analysis code in your research, please cite:
[To be added]
cELISA Analysis Repository. GitHub: [https://github.com/niaid/cELISA-StatisticalAnalysis]
For questions or issues:
- Open a GitHub issue for code-related problems
- Check R documentation for package-specific help
Primary authors: Yuyan Yi, Jingwen Gu
Note: This repository contains analysis code only. Actual clinical data is not included due to privacy considerations. Users must provide their own data following the specified format.