AstroAgents is a large language model-based, multi-agent AI system for hypothesis generation from mass spectrometry data. With upcoming sample return missions across the solar system and the increasing availability of mass spectrometry data, AstroAgents addresses the urgent need for methods that analyze such data within the context of existing astrobiology literature and generate plausible hypotheses regarding the emergence of life on Earth.
AstroAgents is structured around eight collaborative agents:
- Data Analyst: Interprets the mass spectrometry data
- Planner: Delegates specific data segments to the scientist agents
- Domain Scientists (3): Perform in-depth exploration of data segments
- Accumulator: Collects and deduplicates the generated hypotheses
- Literature Reviewer: Identifies relevant literature using Semantic Scholar
- Critic: Evaluates the hypotheses, offering rigorous suggestions for improvement
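The flow between these agents can be sketched roughly as follows. This is an illustrative sketch only, not the code in AstroAgents.py: it assumes the agents run in the order listed and that the Critic's feedback feeds the next iteration. The names `LLM`, `Hypothesis`, `deduplicate`, and `run_iteration` are hypothetical, the prompts are placeholders, and the string-normalization deduplication stands in for whatever the Accumulator actually does.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type: any callable mapping a prompt string to a response string.
LLM = Callable[[str], str]


@dataclass
class Hypothesis:
    author: str  # which scientist agent proposed it
    text: str


def deduplicate(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Accumulator stand-in: drop hypotheses whose normalized text repeats."""
    seen = set()
    unique = []
    for h in hypotheses:
        key = " ".join(h.text.lower().split())  # case/whitespace-insensitive key
        if key not in seen:
            seen.add(key)
            unique.append(h)
    return unique


def run_iteration(llm: LLM, data: str, critique: str = "") -> Tuple[List[Hypothesis], str]:
    """Run one pass through the eight agents and return (hypotheses, new critique)."""
    # Data Analyst: interpret the data, taking earlier critique into account.
    analysis = llm(f"[Data Analyst] Interpret this data:\n{data}\nCritique:\n{critique}")
    # Planner: split the analysis into one segment per scientist (one per line).
    plan = llm(f"[Planner] Assign three data segments, one per line:\n{analysis}")
    segments = [s for s in plan.splitlines() if s.strip()][:3]
    # Domain Scientists (3): each explores its own segment.
    raw = [
        Hypothesis(f"Scientist {i + 1}", llm(f"[Scientist {i + 1}] Hypothesize about: {seg}"))
        for i, seg in enumerate(segments)
    ]
    # Accumulator: collect and deduplicate.
    unique = deduplicate(raw)
    joined = "\n".join(h.text for h in unique)
    # Literature Reviewer: the real agent queries Semantic Scholar; stubbed here.
    review = llm(f"[Literature Reviewer] Find related work for:\n{joined}")
    # Critic: evaluate the hypotheses in light of the literature review.
    new_critique = llm(f"[Critic] Evaluate:\n{joined}\nLiterature:\n{review}")
    return unique, new_critique
```

Passing the returned critique back into `run_iteration` on the next call is one simple way to model the iterative refinement that the `--iterations` option suggests.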
An astrobiology expert evaluated the novelty and plausibility of more than a hundred hypotheses generated from data obtained from eight meteorites and ten soil samples. Surprisingly, 36% were identified as plausible, and 66% of those plausible hypotheses were also novel, which is roughly 24% of all hypotheses evaluated.
- Install the required Python packages:
pip install -r requirements.txt
Before running AstroAgents, you need to prepare paper context from research papers:
- Create the necessary directories:
mkdir -p papers/md
- Place your PDF research papers in the papers/ directory
- Run the extraction script:
python extract_text.py
This will convert all PDFs to Markdown format in the papers/md/ directory, which will be used by AstroAgents.
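For readers curious what such a conversion step might look like, here is a minimal sketch. It is not the contents of extract_text.py: the function name `convert_pdfs` and its signature are hypothetical, and the converter is passed in as a callable so the directory handling can be exercised without any PDF library.

```python
from pathlib import Path
from typing import Callable, List


def convert_pdfs(pdf_dir: Path, md_dir: Path, to_markdown: Callable[[str], str]) -> List[Path]:
    """Convert every *.pdf in pdf_dir to a .md file of the same stem in md_dir.

    `to_markdown` takes a PDF path (as a string) and returns Markdown text.
    """
    md_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):  # non-PDF files are ignored
        out = md_dir / (pdf.stem + ".md")
        out.write_text(to_markdown(str(pdf)), encoding="utf-8")
        written.append(out)
    return written
```

With the pymupdf4llm package from the dependency list installed, calling `convert_pdfs(Path("papers"), Path("papers/md"), pymupdf4llm.to_markdown)` would be one way to reproduce the step described above.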
Basic usage:
python AstroAgents.py --llm_model claude --iterations 10 --anthropic_api_key YOUR_ANTHROPIC_API_KEY --semantic_scholar_api_key YOUR_SEMANTIC_SCHOLAR_API_KEY
Command-line arguments:
Argument | Description | Default |
---|---|---|
--paper_context_file | Path to the paper context file | paper_context.md |
--input_prompt_file | Path to the input prompt file | prompt.txt |
--llm_model | LLM model to use (claude or gemini) | claude |
--iterations | Number of iterations to run | 10 |
--anthropic_api_key | Anthropic API key | ANTHROPIC_API_KEY |
--google_api_key | Google API key | GOOGLE_API_KEY |
--semantic_scholar_api_key | Semantic Scholar API key | SEMANTIC_SCHOLAR_API_KEY |
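The options above map naturally onto an argparse parser. The sketch below mirrors the table but is not copied from AstroAgents.py. In particular, it assumes that the API key defaults shown in the table name environment variables to fall back on; the table does not say so explicitly. The function name `build_parser` is hypothetical.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(description="Run AstroAgents (illustrative sketch)")
    parser.add_argument("--paper_context_file", default="paper_context.md",
                        help="Path to the paper context file")
    parser.add_argument("--input_prompt_file", default="prompt.txt",
                        help="Path to the input prompt file")
    parser.add_argument("--llm_model", choices=["claude", "gemini"], default="claude",
                        help="LLM model to use")
    parser.add_argument("--iterations", type=int, default=10,
                        help="Number of iterations to run")
    # Assumption: each key defaults to the environment variable of the same name.
    for flag, env_var, label in [
        ("--anthropic_api_key", "ANTHROPIC_API_KEY", "Anthropic"),
        ("--google_api_key", "GOOGLE_API_KEY", "Google"),
        ("--semantic_scholar_api_key", "SEMANTIC_SCHOLAR_API_KEY", "Semantic Scholar"),
    ]:
        parser.add_argument(flag, default=os.environ.get(env_var),
                            help=f"{label} API key")
    return parser
```

Under that assumption, exporting the keys once in the shell would let the basic usage command be shortened to the model and iteration options.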
anthropic>=0.7.0
langchain>=0.1.0
langchain-anthropic>=0.1.0
langchain-google-genai>=0.0.5
google-generativeai>=0.3.0
colorama>=0.4.6
requests>=2.31.0
pymupdf4llm>=0.1.0
If you use AstroAgents in your research, please cite our paper:
@misc{saeedi2025astroagentsmultiagentaihypothesis,
title={AstroAgents: A Multi-Agent AI for Hypothesis Generation from Mass Spectrometry Data},
author={Daniel Saeedi and Denise Buckner and Jose C. Aponte and Amirali Aghazadeh},
year={2025},
eprint={2503.23170},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2503.23170},
}
This project is licensed under the MIT License - see the LICENSE file for details.
- The AstroAgents project website: https://astroagents.github.io/
- Semantic Scholar for their API access