Thank you for your interest in contributing! AutoEDP is a research framework that automates the evolution of deep research pipelines. We welcome bug reports, feature requests, documentation improvements, and code contributions.
By participating in this project, you agree to abide by our Code of Conduct (we follow the spirit of the Contributor Covenant v2.1).
- Contributor Covenant: https://www.contributor-covenant.org/version/2/1/code_of_conduct/
- Report bugs and request features via GitHub Issues
- Improve documentation (README, examples, comments, diagrams)
- Add new research topics (prompts + seed ideas)
- Enhance idea generation, novelty checks, or Deep Research integration
- Improve data processing and training (GRPO) pipeline
- Add support for additional base models in vLLM
Prerequisites:
- Python 3.10+ (Deep Researcher uses Python 3.11 via
uvxunder the hood) - macOS or Linux recommended; NVIDIA GPUs for vLLM/OpenRLHF
- vLLM >= 0.10.2; Ray is pulled via OpenRLHF
uvfor running the LangGraph Deep Researcher (viauvx)
Setup steps:
# Fork the repo on GitHub, then clone your fork
# git clone https://github.com/<you>/AutoEDP.git
cd AutoEDP
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Install uv (macOS example); see https://github.com/astral-sh/uv for other OSes
brew install uvEnvironment variables (used by various parts of the pipeline):
export RAY_CACHE_DIR="$HOME/.cache/ray" # Required for GRPO/Ray
export S2_API_KEY="<your_semantic_scholar_api_key>" # For novelty checks (semanticscholar)
export OPENALEX_MAIL_ADDRESS="you@example.com" # For novelty checks (openalex)
export DEEP_RESEARCHER_CWD="/path/to/open_deep_research" # Optional; where Deep Researcher project livesMinimal end-to-end run (single round):
python main.py \
--model_path Qwen/Qwen2.5-7B-Instruct \
--reward_model_path <hf_or_local_reward_model> \
--topics seir earthquake-prediction \
--n_rounds 1 \
--n_questions 4 \
--num_reflections 3 \
--paper_search_engine semanticscholar \
--batch_size 1 \
--n_epochs 3This will:
- Launch vLLM, the OpenAI-compatible wrapper (with logging), and the Deep Researcher service
- Generate and optionally novelty-check research questions
- Run Deep Researcher to produce final reports
- Convert conversations into training data and run GRPO
- Export the LoRA-merged model
See the README for more details.
main.py— Orchestrates iterative roundsgenerate_ideas.py— Idea generation + self-reflection; novelty checksdeepresearch.py— Deep Researcher startup and streaming interfacemiddleware.py/wrapper_server.py— OpenAI-compatible wrapper around vLLM with loggingsync_middleware.py— Sync wrapper for the middlewareutils.py— Server lifecycle, client creation, data processinggrpo.py— GRPO training + LoRA export via OpenRLHF + Raydata/— Topics, seeds, and per-round outputsexamples/— Docs and scripts for middleware/wrapper usage
To add a model that can be served by vLLM, extend utils.get_model_configs() with a new mapping:
model_path(Hugging Face id or local path)model_name(served name, e.g.,vllm:qwen-2.5-7b-instruct)max_model_len,tensor_parallel_size, andtool_call_parser
Ensure the model can load within available GPU memory.
Create a folder under data/topics/<topic>/ with:
prompt.json— Containssystemandtopic_descriptionseed_ideas.json— Array of seed idea objects
Then include the topic in the --topics CLI argument when running main.py.
generate_ideas.search_for_papers() supports semanticscholar and openalex.
- Semantic Scholar: set
S2_API_KEY - OpenAlex: set
OPENALEX_MAIL_ADDRESS(optional but recommended to avoid rate limits)
We integrate the open-source "Open Deep Research" LangGraph workflow.
- AutoEDP starts it via
uvxand the LangGraph CLI - Ensure
uvis installed; setDEEP_RESEARCHER_CWDto point to your project or place it at../open_deep_research
Run the wrapper directly:
python wrapper_server.py --port 8082 --vllm-url http://localhost:8081/v1 --log-dir ./logs/vllmOpenAI SDKs can target http://localhost:8082/v1. Logs are written to ./logs/vllm/*.jsonl.
See examples/middleware_wrapper/WRAPPER_SERVER.md for endpoints and operations.
- Follow PEP 8 and add type hints where practical
- Write clear docstrings and comments; keep public APIs stable
- Keep changes focused and incremental
- Documentation updates (README/examples) are part of the definition of done
- Optional (if installed): format with
blackand lint withruff
- (Optional) Open an issue to discuss significant changes
- Create a feature branch (
feature/<short-description>) - Commit with clear messages; Conventional Commits are appreciated
- Ensure you:
- Update docs and examples if behavior or usage changes
- Add or update small tests/examples where relevant
- Manually sanity-check the main flows you touched (e.g., a short
main.pyrun)
- Open a pull request against
mainand fill out the PR description:- What changed and why
- How it was tested (commands, configs)
- Any follow-ups or known limitations
When filing a bug report, please include:
- OS, Python version, GPU(s)
- Exact commands and arguments used
- Relevant environment variables (e.g.,
RAY_CACHE_DIR) - Logs from the wrapper (
./logs/vllm) and console output - Expected vs. actual behavior
Do not include secrets in code or logs. The wrapper accepts a dummy API key for local use. Use environment variables for tokens/keys. If you discover a security vulnerability, please report it privately to the maintainer.
By contributing, you agree that your contributions will be licensed under the MIT License.
- GitHub Issues: https://github.com/martinakaduc/AutoEDP/issues
- Maintainer email: nqduc@u.nus.edu