DiscoADRD (Discovery Agent for Alzheimer's Disease and Related Dementias) is an AI-powered multi-agent scientific discovery system designed for analyzing NACC (National Alzheimer's Coordinating Center) datasets and generating research insights through automated hypothesis testing and statistical analysis.
DiscoADRD uses a multi-agent architecture in which specialized AI agents (Scientist, Critic, and Planning agents) collaborate to:
- Analyze complex medical datasets
- Generate and test scientific hypotheses
- Perform comprehensive statistical analysis
- Generate research insights and findings
- Automate the scientific discovery workflow
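The collaboration described above can be pictured as a plan, analyze, critique loop. The sketch below is only an illustration of that pattern, not the project's actual implementation: `run_discovery`, `Round`, and the three stub agents are hypothetical names, and the stubs stand in for what would be LLM-backed agents in the real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Round:
    """One planner -> scientist -> critic pass (hypothetical record type)."""
    plan: str
    analysis: str
    critique: str
    accepted: bool


def run_discovery(
    hypothesis: str,
    planner: Callable[[str, str], str],
    scientist: Callable[[str], str],
    critic: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Round]:
    """Repeat plan -> analyze -> critique until the critic accepts or rounds run out."""
    history: List[Round] = []
    feedback = ""
    for _ in range(max_rounds):
        plan = planner(hypothesis, feedback)
        analysis = scientist(plan)
        accepted, critique = critic(analysis)
        history.append(Round(plan, analysis, critique, accepted))
        if accepted:
            break
        feedback = critique  # the next plan must address the critic's objection
    return history


# Stub agents for illustration only; real agents would call a language model.
def stub_planner(hypothesis: str, feedback: str) -> str:
    suffix = f" (revised: {feedback})" if feedback else ""
    return f"Test '{hypothesis}'{suffix}"


def stub_scientist(plan: str) -> str:
    return f"Ran analysis for plan: {plan}"


def stub_critic(analysis: str) -> Tuple[bool, str]:
    # Accept only after the plan has been revised at least once.
    if "revised" in analysis:
        return True, "ok"
    return False, "add a control variable"
```

With these stubs, the first round is rejected and the second is accepted, so `run_discovery(...)` returns a two-entry history.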
DiscoADRD/
├── DiscoADRD_v3/                  # Latest version (recommended)
│   ├── discover.py                # Main entry point
│   ├── config.yml                 # Configuration file
│   ├── run.sh                     # HPC job submission script
│   ├── batch.csv                  # Sample batch hypotheses file
│   └── discovery_agent_pkg/       # Core package
│       ├── agents/                # AI agent implementations
│       ├── core/                  # Core system components
│       ├── data/                  # Data processing modules
│       ├── execution/             # Code execution system
│       ├── experiment/            # Experiment management
│       └── ui/                    # User interface components
│
├── vLLM/                          # vLLM inference client utilities
│   ├── client.py                  # vLLM multi-GPU client
│   ├── discovery_agent_client.py
│   └── run.sh                     # vLLM runner script
│
├── LICENSE                        # License file
└── README.md                      # This file
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Access to NACC datasets
- HPC environment with SGE/PBS support (optional)
- Clone the repository:
    git clone <repository-url>
    cd DiscoADRD
- Set up the environment (for HPC environments):
    module load miniconda
    module load cuda
    conda activate llms
- Install dependencies:
    pip install transformers torch huggingface-hub python-dotenv
    pip install pandas numpy scipy scikit-learn matplotlib seaborn
    pip install pyyaml tqdm vllm
- Configure the system:
  - Edit DiscoADRD_v3/config.yml with your dataset paths and settings
  - Update paths in run.sh scripts to match your environment
    cd DiscoADRD_v3
    python discover.py
For detailed usage instructions, see DiscoADRD_v3/README.md.
- DiscoADRD_v3/README.md - Comprehensive documentation for the latest version
- discovery_agent_pkg/README.md - Package-level documentation
- Multi-Agent Architecture: Scientist, Critic, and Planning agents work collaboratively
- Advanced Statistical Analysis: Comprehensive statistical testing including regression, survival analysis, and ML
- NACC Dataset Integration: Specialized for Alzheimer's disease research
- Automated Code Execution: Safe Python code execution with timeout protection
- Multiple Hypothesis Modes: Direct, batch, and keyword-based hypothesis generation
- HPC Integration: Designed for high-performance computing environments
The system uses a modular architecture with clear separation of concerns:
- Agents: Specialized AI agents for different roles (Scientist, Critic, Planner)
- Core: Main orchestration and configuration management
- Data: Data processing, caching, and file management
- Execution: Safe code execution with sandboxing
- Experiment: Experiment lifecycle management
- UI: User interface and progress tracking
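As an illustration of the timeout protection mentioned for the Execution component, generated code can be run in a separate interpreter process that is stopped when it exceeds a time limit. This is a minimal sketch using only the standard library; `run_with_timeout` and its return format are assumptions, not the package's API, and process isolation alone is not a full sandbox.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_s: float = 5.0) -> dict:
    """Run `code` in a child Python process; give up after `timeout_s` seconds."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode output as str
            timeout=timeout_s,    # child is killed and TimeoutExpired is raised
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

A caller could pass the captured `stderr` back to the agent that wrote the code, so that failures and timeouts become feedback rather than crashes.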
- Direct Hypothesis Testing: Test a specific hypothesis directly
- CSV Batch Processing: Process multiple hypotheses from a CSV file
- Keyword-Based Generation: Generate hypotheses from keyword sampling
- Direct Mode: Process without hypothesis generation
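For CSV batch processing, the hypotheses have to be read from a file before they can be tested one by one. The sketch below shows one way to do that with the standard `csv` module. The column name `hypothesis` is an assumption made for illustration; check the sample `batch.csv` for the layout the project actually expects.

```python
import csv
import io
from typing import List, TextIO


def load_hypotheses(csv_file: TextIO, column: str = "hypothesis") -> List[str]:
    """Return the non-empty values of `column` from an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"expected a '{column}' column, got {reader.fieldnames}")
    hypotheses = []
    for row in reader:
        value = row[column]
        if value and value.strip():  # skip missing or whitespace-only cells
            hypotheses.append(value.strip())
    return hypotheses


# In-memory stand-in for a batch file; the rows are placeholders, not real hypotheses.
sample = io.StringIO(
    "hypothesis\n"
    "Variable A is associated with outcome B\n"
    "  \n"
    "Variable C predicts outcome D\n"
)
batch = load_hypotheses(sample)
```

For a file on disk, the same function can be called as `load_hypotheses(open(path, newline=""))`, ideally inside a `with` block.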
DiscoADRD is specifically designed for:
- Alzheimer's Disease and Related Dementias (ADRD) research
- NACC dataset analysis
- Automated hypothesis generation and testing
- Clinical research data analysis
- Epidemiological studies
Configuration is managed through:
- YAML files: config.yml for system-wide settings
- Command-line arguments: Override YAML settings
- Environment variables: For HPC and system-specific settings
See DiscoADRD_v3/config.yml for detailed configuration options.
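The layering of configuration sources can be sketched as a simple merge in which later sources win. The example assumes the YAML file has already been loaded into a dictionary (for instance with `yaml.safe_load`). The flags `--model` and `--max-iterations` and the environment variable `DISCO_DATA_DIR` are hypothetical names chosen for illustration, and the position of environment variables in the precedence order is also an assumption; only "command-line arguments override YAML" comes from the description above.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_settings(
    yaml_settings: Mapping[str, object],
    argv: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Merge settings: YAML values first, then environment, then command line."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    # A default of None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--model", default=None)               # hypothetical flag
    parser.add_argument("--max-iterations", type=int, default=None)  # hypothetical flag
    args = parser.parse_args(list(argv))

    settings = dict(yaml_settings)
    if "DISCO_DATA_DIR" in environ:  # hypothetical system-specific variable
        settings["data_dir"] = environ["DISCO_DATA_DIR"]
    for key, value in vars(args).items():
        if value is not None:  # only flags that were actually passed override YAML
            settings[key] = value
    return settings
```

For example, `resolve_settings({"model": "a", "max_iterations": 3}, ["--max-iterations", "5"], environ={})` keeps `model` from the YAML values and takes `max_iterations` from the command line.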
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
See LICENSE for license information.
For questions and support:
- Check the documentation in DiscoADRD_v3/README.md
- Review configuration options in config.yml
- Examine verbose output logs
- Contact the development team
- DiscoADRD_v3: Latest version with enhanced features (recommended)
- vLLM: Utilities for vLLM inference client
Last updated: January 2025