Bio-DrZero

A self-evolving bioinformatics agent for aging and longevity research, based on the Zero-RAG methodology.


Overview

Bio-DrZero implements a complete RAG-MoE (Retrieval-Augmented Generation with Mixture of Experts) system that generates questions, retrieves documents from biological databases, and produces answers without requiring labeled training data.

The system integrates three components:

  • Proposer-HRPO: Question generation with biological hop complexity
  • RAG-MoE Retriever: Specialized experts for PubMed, UniProt, and KEGG
  • Solver-GRPO: Context-aware answer generation

Installation

git clone https://github.com/AntonioVFranco/bio-drzero.git
cd bio-drzero
pip install -r requirements.txt

Quick Start

from end_to_end_pipeline import BioDrZeroEndToEnd, PipelineConfig

# Initialize pipeline with real API integration
config = PipelineConfig(use_real_apis=True)
pipeline = BioDrZeroEndToEnd(config)

# Run complete pipeline
documents = ["FOXO3 regulates longevity through autophagy..."]
results = pipeline.run_complete_pipeline(documents, num_questions=5)

# View results
for qa in results['answers']:
    print(f"Q: {qa['question']}")
    print(f"A: {qa['answer']}")
    print(f"Reward: {qa['reward']:.3f}\n")

Or run the examples:

python3 examples/phase4_example.py  # Complete end-to-end demo
python3 examples/test_real_apis.py  # Test API integration

Architecture

Bio-DrZero uses a three-stage pipeline:

Documents → Proposer-HRPO → Questions
                                ↓
Questions → RAG-MoE → Context (PubMed/UniProt/KEGG)
                                ↓
Questions + Context → Solver-GRPO → Answers
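
The same flow can be wired up manually from the individual components shown under Examples below. A minimal sketch, assuming generate_questions returns plain question strings and using placeholder hop levels (the full pipeline classifies these via core/hop_complexity.py):

from core import BiologicalProposerHRPO, BiologicalSolverGRPO, ProposerConfig, SolverConfig
from retriever import MoERetriever, MoERetrieverConfig

documents = ["FOXO3 regulates longevity through autophagy..."]

# Stage 1: propose questions from the input documents
proposer = BiologicalProposerHRPO(ProposerConfig())
questions = proposer.generate_questions(documents=documents, num_questions=3)

# Stage 2: retrieve supporting context for each question
retriever = MoERetriever(MoERetrieverConfig())
contexts = [retriever.retrieve(q, top_k=3) for q in questions]

# Stage 3: generate answers from questions plus retrieved context
hop_levels = [1] * len(questions)  # placeholder; real hop levels come from the classifier
solver = BiologicalSolverGRPO(SolverConfig())
answers = solver.generate_answers(questions, contexts, hop_levels)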

Biological Hop Complexity

Questions are classified by reasoning complexity:

  • 0-hop: Direct extraction ("What is FOXO3?")
  • 1-hop: Single database ("What pathways involve FOXO3?")
  • 2-hop: Cross-database ("What drugs target the FOXO3 pathway?")
  • 3-hop: Multi-omics ("How do FOXO3 variants affect longevity?")
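
core/hop_complexity.py implements the project's classifier; its interface is not shown here. Purely as an illustration of the scheme, a toy keyword heuristic might assign levels like this (function name and keywords are hypothetical, not the project's logic):

def toy_hop_level(question: str) -> int:
    """Illustrative only: map a question to a hop level by keywords."""
    q = question.lower()
    if "variant" in q or "multi-omics" in q:
        return 3  # multi-omics reasoning
    if "drug" in q or "target" in q:
        return 2  # cross-database reasoning
    if "pathway" in q or "regulate" in q:
        return 1  # single-database lookup
    return 0      # direct extraction

print(toy_hop_level("What pathways involve FOXO3?"))           # 1
print(toy_hop_level("How do FOXO3 variants affect longevity?")) # 3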

API Integration

The system retrieves data from:

  • PubMed: Biomedical literature via E-utilities API
  • UniProt: Protein sequences and functions via REST API
  • KEGG: Pathway annotations via REST API

If API calls fail, the system falls back to synthetic data.
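
The project wraps these services in utils/api_clients.py. For reference, the underlying public endpoints can also be queried directly; a minimal sketch with requests (raw endpoints, not the project's client API):

import requests

query = "FOXO3 longevity"

# PubMed E-utilities: search for matching article IDs
pubmed = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 5},
).json()
print(pubmed["esearchresult"]["idlist"])

# UniProt REST: search protein entries (human FOXO3)
uniprot = requests.get(
    "https://rest.uniprot.org/uniprotkb/search",
    params={"query": "FOXO3 AND organism_id:9606", "format": "json", "size": 3},
).json()
print([entry["primaryAccession"] for entry in uniprot["results"]])

# KEGG REST: find pathways by keyword (returns tab-separated text)
kegg = requests.get("https://rest.kegg.jp/find/pathway/FoxO")
print(kegg.text.splitlines()[:3])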

Examples

Generate Questions

from core import BiologicalProposerHRPO, ProposerConfig

proposer = BiologicalProposerHRPO(ProposerConfig())
questions = proposer.generate_questions(
    documents=["mTOR regulates cellular aging..."],
    num_questions=5
)

Retrieve Documents

from retriever import MoERetriever, MoERetrieverConfig

retriever = MoERetriever(MoERetrieverConfig())
docs = retriever.retrieve("FOXO3 longevity", top_k=3)

Generate Answers

from core import BiologicalSolverGRPO, SolverConfig

# questions, contexts, and hop_levels come from the proposer, the retriever,
# and the hop-complexity classifier in the previous stages
solver = BiologicalSolverGRPO(SolverConfig())
answers = solver.generate_answers(questions, contexts, hop_levels)

Configuration

Default configuration in core/config.py:

from core import get_default_config

config = get_default_config()
config.proposer.max_questions_per_doc = 5
config.solver.group_size = 8
config.retriever.fusion_method = "weighted_sum"
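
Assuming each sub-config matches the constructor arguments shown in the Examples above (an assumption, not documented here), the adjusted settings can then be handed to the individual components:

from core import BiologicalProposerHRPO, BiologicalSolverGRPO, get_default_config
from retriever import MoERetriever

config = get_default_config()
config.proposer.max_questions_per_doc = 5

# Assumed: each component accepts its matching sub-config object
proposer = BiologicalProposerHRPO(config.proposer)
solver = BiologicalSolverGRPO(config.solver)
retriever = MoERetriever(config.retriever)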

Performance

Tested on aging-related questions:

  • Question quality: 0.787 average
  • Answer reward: 0.398 average
  • API retrieval: 41 PubMed articles, 6 UniProt proteins, 2 KEGG pathways
  • Training improvement: +5.5% over 2 epochs

Project Structure

bio-drzero/
├── core/                      # Question/answer generation
│   ├── proposer_hrpo.py      # Question generator
│   ├── solver_grpo.py        # Answer generator
│   ├── hop_complexity.py     # Complexity classifier
│   └── reward_functions.py   # Reward computation
├── retriever/                 # Document retrieval
│   ├── moe_retriever.py      # MoE coordinator
│   ├── moe_router.py         # Neural router
│   ├── expert_retrievers.py  # Domain experts
│   └── real_expert_retrievers.py  # API integration
├── utils/
│   └── api_clients.py        # PubMed/UniProt/KEGG clients
├── examples/                  # Usage examples
└── end_to_end_pipeline.py    # Complete pipeline

Development

Run tests:

python3 examples/phase1_example.py  # Test question generation
python3 examples/phase2_example.py  # Test answer generation
python3 examples/phase3_example.py  # Test expert retrieval
python3 examples/phase4_example.py  # Test full pipeline

Limitations

  • Placeholder answer generation (template-based, not a real LLM)
  • Limited to 3 biological databases (PubMed, UniProt, KEGG)
  • No fine-tuning on domain-specific datasets yet
  • Requires internet connection for API calls

Citation

If you use Bio-DrZero in your research:

@software{biodrzero2026,
  title={Bio-DrZero: Biological Zero-RAG Framework},
  author={Franco, Antonio V.},
  year={2026},
  url={https://github.com/AntonioVFranco/bio-drzero}
}

Based on the Zero-RAG methodology: arXiv:2601.07055

License

MIT License - see LICENSE file for details.

Contact

For questions or collaboration, feel free to reach out via email: [email protected]

Contributing

Contributions welcome. Please open an issue to discuss changes before submitting a PR.
