vkola-lab/npjdementia2026

DiscoADRD

DiscoADRD (Discovery Agent for Alzheimer's Disease and Related Dementias) is a multi-agent AI system for scientific discovery on NACC (National Alzheimer's Coordinating Center) datasets. It generates research insights through automated hypothesis testing and statistical analysis.

🎯 Overview

DiscoADRD uses a multi-agent architecture in which specialized AI agents (Scientist, Critic, and Planning agents) collaborate to:

  • Analyze complex medical datasets
  • Generate and test scientific hypotheses
  • Perform comprehensive statistical analysis
  • Generate research insights and findings
  • Automate the scientific discovery workflow

📁 Repository Structure

DiscoADRD/
├── DiscoADRD_v3/              # Latest version (recommended)
│   ├── discover.py            # Main entry point
│   ├── config.yml             # Configuration file
│   ├── run.sh                 # HPC job submission script
│   ├── batch.csv              # Sample batch hypotheses file
│   └── discovery_agent_pkg/   # Core package
│       ├── agents/            # AI agent implementations
│       ├── core/              # Core system components
│       ├── data/              # Data processing modules
│       ├── execution/         # Code execution system
│       ├── experiment/        # Experiment management
│       └── ui/                # User interface components
│
├── vLLM/                      # vLLM inference client utilities
│   ├── client.py              # vLLM multi-GPU client
│   ├── discovery_agent_client.py
│   └── run.sh                 # vLLM runner script
│
├── LICENSE                    # License file
└── README.md                  # This file

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • Access to NACC datasets
  • HPC environment with SGE/PBS support (optional)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd DiscoADRD
  2. Set up the environment (on HPC systems):

    module load miniconda
    module load cuda
    conda activate llms
  3. Install dependencies:

    pip install transformers torch huggingface-hub python-dotenv
    pip install pandas numpy scipy scikit-learn matplotlib seaborn
    pip install pyyaml tqdm vllm
  4. Configure the system:

    • Edit DiscoADRD_v3/config.yml with your dataset paths and settings
    • Update the paths in the run.sh scripts (DiscoADRD_v3/run.sh and vLLM/run.sh) to match your environment
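Before the first run, it can help to confirm that the packages from step 3 are importable in the active environment. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the mapping only reflects the fact that some pip names differ from their import names (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# pip package names from the install step, mapped to their import names.
PIP_TO_IMPORT = {
    "transformers": "transformers",
    "torch": "torch",
    "huggingface-hub": "huggingface_hub",
    "python-dotenv": "dotenv",
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pyyaml": "yaml",
    "tqdm": "tqdm",
    "vllm": "vllm",
}


def missing_packages(pip_to_import=PIP_TO_IMPORT):
    """Return the pip names whose import module cannot be found.

    Hypothetical helper for illustration; it only locates modules and
    does not import them, so it is quick even for large packages.
    """
    return [
        pip_name
        for pip_name, module_name in pip_to_import.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any name it reports can be installed with the matching `pip install` command from step 3.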

Basic Usage

cd DiscoADRD_v3
python discover.py

For detailed usage instructions, see DiscoADRD_v3/README.md.

🔧 Key Features

  • Multi-Agent Architecture: Scientist, Critic, and Planning agents work collaboratively
  • Advanced Statistical Analysis: Statistical testing, including regression, survival analysis, and machine learning
  • NACC Dataset Integration: Specialized for Alzheimer's disease research
  • Automated Code Execution: Safe Python code execution with timeout protection
  • Multiple Hypothesis Modes: Direct hypothesis testing, CSV batch processing, and keyword-based hypothesis generation
  • HPC Integration: Designed for high-performance computing environments
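To illustrate what "code execution with timeout protection" can look like, here is a minimal sketch that runs a snippet in a separate Python process and stops it after a time limit. It is not the implementation in `discovery_agent_pkg/execution/`; the function name `run_with_timeout` and the default limit are assumptions. A separate process with a timeout is also not a full sandbox on its own.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s=30.0):
    """Run `code` in a child Python process and return (ok, output).

    Hypothetical helper for illustration. The child is killed if it
    exceeds `timeout_s` seconds.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} s"
    if proc.returncode != 0:
        # Non-zero exit: return the traceback so a caller can react to it.
        return False, proc.stderr
    return True, proc.stdout


if __name__ == "__main__":
    print(run_with_timeout("print(1 + 1)"))
    print(run_with_timeout("import time; time.sleep(5)", timeout_s=0.5))
```

Returning the error text instead of raising lets a calling agent read the failure and revise the generated code.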

🏗️ Architecture

The system uses a modular architecture with clear separation of concerns:

  • Agents: Specialized AI agents for different roles (Scientist, Critic, Planner)
  • Core: Main orchestration and configuration management
  • Data: Data processing, caching, and file management
  • Execution: Safe code execution with sandboxing
  • Experiment: Experiment lifecycle management
  • UI: User interface and progress tracking
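The interaction between the three agent roles can be pictured as a plan, analyze, review loop. The sketch below is an assumption about how such a loop might be wired, not the orchestration code in this repository: the order of the roles, the names `run_discovery`, `Review`, and `Transcript`, and the round limit are all hypothetical, and the agents are plain callables standing in for LLM-backed components.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Review:
    approved: bool
    feedback: str = ""


@dataclass
class Transcript:
    hypothesis: str
    rounds: List[str] = field(default_factory=list)
    accepted: bool = False


def run_discovery(hypothesis, planner, scientist, critic, max_rounds=3):
    """Iterate plan -> analysis -> review until the critic approves.

    Hypothetical loop for illustration; each agent is a plain callable.
    """
    transcript = Transcript(hypothesis)
    feedback = ""
    for _ in range(max_rounds):
        plan = planner(hypothesis, feedback)   # Planner drafts the next step
        analysis = scientist(plan)             # Scientist carries it out
        review = critic(analysis)              # Critic accepts or pushes back
        transcript.rounds.append(analysis)
        if review.approved:
            transcript.accepted = True
            break
        feedback = review.feedback
    return transcript


if __name__ == "__main__":
    # Stub agents: the critic approves once the analysis mentions adjustment.
    result = run_discovery(
        "H",
        planner=lambda h, fb: f"plan({h}|{fb})",
        scientist=lambda plan: f"analysis of {plan}",
        critic=lambda a: Review("adjust" in a, "adjust for age"),
    )
    print(result.accepted, len(result.rounds))
```

Passing the agents in as callables keeps the loop testable without a model server.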

📊 Supported Workflows

  1. Direct Hypothesis Testing: Test a specific hypothesis directly
  2. CSV Batch Processing: Process multiple hypotheses from a CSV file
  3. Keyword-Based Generation: Generate hypotheses from keyword sampling
  4. Direct Mode: Process without hypothesis generation
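For the CSV batch workflow, a loader along the following lines could read hypotheses from a file. This README does not describe the layout of `batch.csv`, so the column name `hypothesis` is an assumption and `load_hypotheses` is a hypothetical helper; check the sample file for the actual header.

```python
import csv
import io


def load_hypotheses(csv_file, column="hypothesis"):
    """Return the non-empty values of `column` from an open CSV file.

    Hypothetical helper; the column name is an assumption, not the
    documented batch.csv format.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(
            f"expected a '{column}' column, found {reader.fieldnames}"
        )
    hypotheses = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text:  # skip rows where the hypothesis cell is empty
            hypotheses.append(text)
    return hypotheses


if __name__ == "__main__":
    sample = io.StringIO("hypothesis\nA is associated with B\n  C predicts D  \n")
    for h in load_hypotheses(sample):
        print(h)
```

With a real file, open it with `open(path, newline="")` as the `csv` module documentation recommends.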

🔬 Research Applications

DiscoADRD is specifically designed for:

  • Alzheimer's Disease and Related Dementias (ADRD) research
  • NACC dataset analysis
  • Automated hypothesis generation and testing
  • Clinical research data analysis
  • Epidemiological studies

📝 Configuration

Configuration is managed through:

  • YAML files: config.yml for system-wide settings
  • Command-line arguments: Override YAML settings
  • Environment variables: For HPC and system-specific settings

See DiscoADRD_v3/config.yml for detailed configuration options.
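The layering described above (file settings, overridden by command-line arguments) can be sketched as follows. This is an illustration, not the logic in `discover.py`: the setting names, the `DISCOADRD_` environment prefix, and the placement of environment variables between the file and the command line are assumptions. A plain dict stands in for the parsed `config.yml`, and all overrides arrive as strings.

```python
import argparse
import os


def resolve_settings(file_settings, argv, environ=None):
    """Merge settings with precedence: file < environment < command line.

    Hypothetical helper. `file_settings` stands in for the parsed YAML.
    """
    environ = os.environ if environ is None else environ
    settings = dict(file_settings)

    # Assumed convention: DISCOADRD_<KEY> overrides the file value.
    for key in settings:
        env_key = "DISCOADRD_" + key.upper()
        if env_key in environ:
            settings[key] = environ[env_key]

    # One optional flag per known setting, e.g. max_rounds -> --max-rounds.
    parser = argparse.ArgumentParser()
    for key in settings:
        parser.add_argument("--" + key.replace("_", "-"), dest=key, default=None)
    args = parser.parse_args(argv)

    # Only flags that were actually passed override earlier layers.
    for key, value in vars(args).items():
        if value is not None:
            settings[key] = value
    return settings


if __name__ == "__main__":
    defaults = {"model": "a", "max_rounds": "3"}  # hypothetical keys
    print(resolve_settings(defaults, ["--max-rounds", "5"], environ={}))
```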

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

See LICENSE for license information.

📞 Support

For questions and support:

  • Check the documentation in DiscoADRD_v3/README.md
  • Review configuration options in DiscoADRD_v3/config.yml
  • Examine verbose output logs
  • Contact the development team

🔄 Version Information

  • DiscoADRD_v3: Latest version with enhanced features (recommended)
  • vLLM: Utilities for vLLM inference client

Last updated: January 2025
