# Bio-DrZero

A self-evolving bioinformatics agent for aging and longevity research, based on the Zero-RAG methodology.

Bio-DrZero implements a complete RAG-MoE (Retrieval-Augmented Generation with Mixture-of-Experts) system that generates questions, retrieves documents from biological databases, and produces answers without requiring labeled training data.
The system integrates three components:
- Proposer-HRPO: Question generation with biological hop complexity
- RAG-MoE Retriever: Specialized experts for PubMed, UniProt, and KEGG
- Solver-GRPO: Context-aware answer generation
## Installation

```bash
git clone https://github.com/AntonioVFranco/bio-drzero.git
cd bio-drzero
pip install -r requirements.txt
```

## Quick Start

```python
from end_to_end_pipeline import BioDrZeroEndToEnd, PipelineConfig

# Initialize the pipeline with real API integration
config = PipelineConfig(use_real_apis=True)
pipeline = BioDrZeroEndToEnd(config)

# Run the complete pipeline
documents = ["FOXO3 regulates longevity through autophagy..."]
results = pipeline.run_complete_pipeline(documents, num_questions=5)

# View results
for qa in results['answers']:
    print(f"Q: {qa['question']}")
    print(f"A: {qa['answer']}")
    print(f"Reward: {qa['reward']:.3f}\n")
```

Or run the examples:

```bash
python3 examples/phase4_example.py   # Complete end-to-end demo
python3 examples/test_real_apis.py   # Test API integration
```

## Architecture

Bio-DrZero uses a three-stage pipeline:

```
Documents → Proposer-HRPO → Questions
                 ↓
Questions → RAG-MoE → Context (PubMed/UniProt/KEGG)
                 ↓
Questions + Context → Solver-GRPO → Answers
```
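The three stages above can be sketched as plain function composition. The stubs below are hypothetical stand-ins for `BiologicalProposerHRPO`, `MoERetriever`, and `BiologicalSolverGRPO`, not the repo's actual API:

```python
# Hypothetical stand-ins for the three Bio-DrZero stages; the real
# implementations live in core/ and retriever/.

def propose_questions(documents):
    # Stage 1: turn each seed document into a question (stub).
    return [f"What does the following describe: {doc[:30]}...?" for doc in documents]

def retrieve_context(question):
    # Stage 2: route the question to a database expert (stub).
    return [f"[PubMed] context for: {question}"]

def solve(question, context):
    # Stage 3: answer the question given the retrieved context (stub).
    return {"question": question, "answer": f"Answer grounded in {len(context)} doc(s)"}

def run_pipeline(documents):
    # Chain the three stages: documents -> questions -> context -> answers.
    questions = propose_questions(documents)
    return [solve(q, retrieve_context(q)) for q in questions]

results = run_pipeline(["FOXO3 regulates longevity through autophagy..."])
```

The point is the data flow, not the internals: each stage consumes the previous stage's output, so the stages can be developed and tested independently.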
### Hop Complexity

Questions are classified by reasoning complexity:
- 0-hop: Direct extraction ("What is FOXO3?")
- 1-hop: Single database ("What pathways involve FOXO3?")
- 2-hop: Cross-database ("What drugs target FOXO3 pathway?")
- 3-hop: Multi-omics ("How do FOXO3 variants affect longevity?")
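A keyword heuristic gives the flavor of hop classification. This is a simplified sketch; the repo's `core/hop_complexity.py` presumably uses a richer classifier:

```python
def classify_hops(question: str) -> int:
    """Crude heuristic mapping a question to a hop level (0-3)."""
    q = question.lower()
    if any(k in q for k in ("variant", "multi-omics")):
        return 3  # multi-omics reasoning across data types
    if any(k in q for k in ("drug", "target", "compound")):
        return 2  # cross-database lookup
    if any(k in q for k in ("pathway", "interact", "regulate")):
        return 1  # single-database lookup
    return 0      # direct extraction

print(classify_hops("What is FOXO3?"))                # 0
print(classify_hops("What pathways involve FOXO3?"))  # 1
```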
### Data Sources

The system retrieves data from:
- PubMed: Biomedical literature via E-utilities API
- UniProt: Protein sequences and functions via REST API
- KEGG: Pathway annotations via REST API
If an API call fails, the system falls back to synthetic data.
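All three sources expose public REST endpoints. The URL builders below target the real NCBI E-utilities, UniProtKB, and KEGG endpoints, but how `utils/api_clients.py` actually calls them is an assumption; no network request is made here:

```python
from urllib.parse import quote

def pubmed_search_url(term: str, retmax: int = 5) -> str:
    # NCBI E-utilities esearch endpoint (JSON output)
    return ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
            f"?db=pubmed&term={quote(term)}&retmax={retmax}&retmode=json")

def uniprot_search_url(query: str, size: int = 5) -> str:
    # UniProtKB REST search endpoint
    return ("https://rest.uniprot.org/uniprotkb/search"
            f"?query={quote(query)}&size={size}&format=json")

def kegg_find_url(database: str, query: str) -> str:
    # KEGG REST "find" operation, e.g. database="pathway"
    return f"https://rest.kegg.jp/find/{database}/{quote(query)}"

print(pubmed_search_url("FOXO3 longevity"))
print(kegg_find_url("pathway", "autophagy"))
```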
## Components

### Proposer-HRPO

```python
from core import BiologicalProposerHRPO, ProposerConfig

proposer = BiologicalProposerHRPO(ProposerConfig())
questions = proposer.generate_questions(
    documents=["mTOR regulates cellular aging..."],
    num_questions=5,
)
```

### RAG-MoE Retriever

```python
from retriever import MoERetriever, MoERetrieverConfig

retriever = MoERetriever(MoERetrieverConfig())
docs = retriever.retrieve("FOXO3 longevity", top_k=3)
```

### Solver-GRPO

```python
from core import BiologicalSolverGRPO, SolverConfig

solver = BiologicalSolverGRPO(SolverConfig())
answers = solver.generate_answers(questions, contexts, hop_levels)
```

## Configuration

Default configuration lives in core/config.py:
```python
from core import get_default_config

config = get_default_config()
config.proposer.max_questions_per_doc = 5
config.solver.group_size = 8
config.retriever.fusion_method = "weighted_sum"
```

## Results

Tested on aging-related questions:
- Question quality: 0.787 average
- Answer reward: 0.398 average
- API retrieval: 41 PubMed articles, 6 UniProt proteins, 2 KEGG pathways
- Training improvement: +5.5% over 2 epochs
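The `fusion_method = "weighted_sum"` option suggests that per-expert retrieval scores are combined using router weights. A minimal sketch of that idea, with hypothetical score and weight values (not the repo's implementation):

```python
def weighted_sum_fusion(expert_scores, router_weights):
    """Fuse per-expert document scores into one ranking.

    expert_scores: {expert_name: {doc_id: score}}
    router_weights: {expert_name: weight}, assumed to sum to 1.
    """
    fused = {}
    for expert, scores in expert_scores.items():
        w = router_weights.get(expert, 0.0)
        for doc_id, s in scores.items():
            # Accumulate each document's score, weighted by its expert's gate.
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

scores = {
    "pubmed":  {"PMID:1": 0.9, "PMID:2": 0.4},
    "uniprot": {"P98177": 0.8, "PMID:1": 0.2},
}
weights = {"pubmed": 0.7, "uniprot": 0.3}
ranking = weighted_sum_fusion(scores, weights)  # PMID:1 ranks first (0.69)
```

A document retrieved by several experts accumulates weighted credit from each, so the router's gate values directly shape the final ranking.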
## Project Structure

```
bio-drzero/
├── core/                         # Question/answer generation
│   ├── proposer_hrpo.py          # Question generator
│   ├── solver_grpo.py            # Answer generator
│   ├── hop_complexity.py         # Complexity classifier
│   └── reward_functions.py       # Reward computation
├── retriever/                    # Document retrieval
│   ├── moe_retriever.py          # MoE coordinator
│   ├── moe_router.py             # Neural router
│   ├── expert_retrievers.py      # Domain experts
│   └── real_expert_retrievers.py # API integration
├── utils/
│   └── api_clients.py            # PubMed/UniProt/KEGG clients
├── examples/                     # Usage examples
└── end_to_end_pipeline.py        # Complete pipeline
```
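`core/reward_functions.py` produces the reward printed in the Quick Start. Token-overlap F1 between the generated answer and a reference (or the retrieved context) is one common choice for such rewards; the sketch below is hypothetical, not the repo's actual function:

```python
def token_f1(prediction: str, reference: str) -> float:
    """Unique-token overlap F1, a common surrogate reward for generated answers."""
    pred = set(prediction.lower().split())
    ref = set(reference.lower().split())
    if not pred or not ref:
        return 0.0
    common = len(pred & ref)
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(token_f1("FOXO3 promotes autophagy", "FOXO3 regulates autophagy"), 3))  # 0.667
```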
## Testing

Run the example scripts as smoke tests:

```bash
python3 examples/phase1_example.py   # Test question generation
python3 examples/phase2_example.py   # Test answer generation
python3 examples/phase3_example.py   # Test expert retrieval
python3 examples/phase4_example.py   # Test full pipeline
```

## Known Limitations

- Placeholder answer generation (template-based, not a real LLM)
- Limited to three biological databases (PubMed, UniProt, KEGG)
- No fine-tuning on domain-specific datasets yet
- Requires an internet connection for API calls
## Citation

If you use Bio-DrZero in your research:

```bibtex
@software{biodrzero2026,
  title={Bio-DrZero: Biological Zero-RAG Framework},
  author={Franco, Antonio V.},
  year={2026},
  url={https://github.com/AntonioVFranco/bio-drzero}
}
```

Bio-DrZero is based on the Zero-RAG methodology: arXiv:2601.07055
## License

MIT License - see the LICENSE file for details.

## Contact

Questions or suggestions? Email [email protected].

Contributions are welcome. Please open an issue to discuss changes before submitting a PR.