
JinheonBaek/ResearchAgent

ResearchAgent: Iterative Research Idea Generation over Scientific Literature


🚀 Welcome to the official repository of ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models!

Authors: Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang

ResearchAgent leverages Large Language Models (LLMs) to help researchers rapidly ideate and refine research problems grounded in existing literature. Starting from a core scientific paper, the system retrieves relevant publications and knowledge entities, then iteratively proposes and improves problems, methods, and experiment designs using collaborating LLM-based reviewing agents that provide structured feedback across multiple dimensions.
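
The retrieval step can be illustrated with the public Semantic Scholar Graph API. Its `/paper/{paper_id}/references` endpoint returns a `data` list of `{"citedPaper": {...}}` objects. The sketch below is a hypothetical stand-in for `code/utils/s2.py`, not its actual contents. The function names, the requested `fields`, and the `limit` value are assumptions. Entries whose `paperId` is null (references Semantic Scholar could not resolve) are skipped.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Public Semantic Scholar Graph API base URL.
S2_GRAPH_URL = "https://api.semanticscholar.org/graph/v1"


def references_url(paper_id: str, fields=("title", "abstract"), limit: int = 100) -> str:
    """Build the URL listing the papers that `paper_id` cites (fields/limit are assumed defaults)."""
    query = urllib.parse.urlencode({"fields": ",".join(fields), "limit": limit})
    # Keep ':' unescaped so prefixed IDs such as "CorpusId:123" stay readable.
    return f"{S2_GRAPH_URL}/paper/{urllib.parse.quote(paper_id, safe=':')}/references?{query}"


def parse_references(payload: dict) -> List[dict]:
    """Extract cited papers from a references response, skipping unresolved entries."""
    refs = []
    for item in payload.get("data", []):
        cited = item.get("citedPaper") or {}
        if cited.get("paperId"):
            refs.append(cited)
    return refs


def fetch_references(paper_id: str, timeout: float = 30.0) -> List[dict]:
    """Fetch and parse references over the network (requires internet access)."""
    with urllib.request.urlopen(references_url(paper_id), timeout=timeout) as resp:
        return parse_references(json.load(resp))
```

Splitting URL construction and parsing from the network call keeps the response handling testable offline. The real API is rate-limited, so a production helper would also need retries.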

Overview

  • Inputs: a set of Semantic Scholar paper IDs and a knowledge store mined from papers (entities and co-occurrences).
  • Retrieval: fetch the target paper, pull relevant references via the Semantic Scholar Graph API, and select related entities from the knowledge store.
  • Problem Identification: generate a candidate research problem and rationale using LLMs.
  • Problem Validation: have LLM reviewers score the problem and give feedback on five metrics, running the reviews in parallel.
  • Iteration: refine the problem based on low-scoring aspects and repeat for a few rounds, keeping a concise history.
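
The generate, validate, and refine loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `research_pipeline.py`. The LLM agents are replaced by two hypothetical callables, `propose` and `review`. The five metric names, the integer scores with a pass threshold of 4, the default of three rounds, and the early stop when no metric falls below the threshold are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumed criterion names; the README only states that there are five metrics.
METRICS = ("clarity", "relevance", "originality", "feasibility", "significance")

# Hypothetical signatures standing in for the LLM-backed agents:
#   propose(paper, feedback) -> problem text
#   review(problem, metric)  -> (score, feedback text)
Proposer = Callable[[str, List[str]], str]
Reviewer = Callable[[str, str], Tuple[int, str]]


def review_all(problem: str, review: Reviewer) -> Dict[str, Tuple[int, str]]:
    """Score the problem on every metric, one reviewer call per metric, in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(METRICS)) as pool:
        results = pool.map(lambda metric: review(problem, metric), METRICS)
        return dict(zip(METRICS, results))


def iterate(paper: str, propose: Proposer, review: Reviewer,
            rounds: int = 3, threshold: int = 4) -> List[dict]:
    """Propose, review, and refine for up to `rounds` rounds; return a compact history."""
    history: List[dict] = []
    feedback: List[str] = []
    for _ in range(rounds):
        problem = propose(paper, feedback)
        reviews = review_all(problem, review)
        # Keep only the problem and its scores, not the full review text.
        history.append({"problem": problem,
                        "scores": {m: score for m, (score, _text) in reviews.items()}})
        # Only feedback on low-scoring aspects is carried into the next round.
        feedback = [text for score, text in reviews.values() if score < threshold]
        if not feedback:  # assumed early stop: every metric passed
            break
    return history
```

A thread pool suits the parallel reviews because each real reviewer call would spend most of its time waiting on an API response rather than using the CPU.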

Repository structure

  • code/
    • main.py — entrypoint to run the end-to-end pipeline
    • knowledge/
      • store.py — lightweight knowledge store and entity retrieval
    • models/
      • openai.py — OpenAI Chat Completions wrapper with retries/timeouts
    • pipelines/
      • research_pipeline.py — orchestration of generate and validate iterations
      • agents/
        • base.py — shared prompt-formatting helpers
        • problem_identifier.py — generates/refines problems
        • problem_validator.py — reviews problems across 5 metrics in parallel
        • ...
    • utils/
      • s2.py — Semantic Scholar API helpers (papers, references, embeddings)
      • data_io.py — JSONL loading and ID utilities
      • formatting.py — small text utilities
  • data/
    • papers.jsonl — input list of paper IDs
    • knowledge.jsonl — knowledge base (entities/co-occurrence)
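
The files under `data/` are JSON Lines. The sketch below is a rough illustration of how `knowledge.jsonl` could back entity retrieval. It assumes a hypothetical schema in which each line holds one paper's `entities` list. It counts how often each pair of entities appears in the same paper, then returns the entities that co-occur most often with the target paper's entities. The class, method, and field names are illustrative; they are not the actual format of `knowledge.jsonl` or the API of `store.py` and `data_io.py`.

```python
import json
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


class KnowledgeStore:
    """Entity co-occurrence counts built from per-paper entity lists (assumed schema)."""

    def __init__(self, records: Iterable[dict]):
        self.cooc: Dict[str, Counter] = defaultdict(Counter)
        for rec in records:
            entities = sorted(set(rec.get("entities", [])))
            # Count each unordered pair once per paper, symmetrically.
            for a, b in combinations(entities, 2):
                self.cooc[a][b] += 1
                self.cooc[b][a] += 1

    def related(self, query: Iterable[str], k: int = 5) -> List[str]:
        """Rank entities outside `query` by total co-occurrence with the query entities."""
        query = set(query)
        scores: Counter = Counter()
        for ent in query:
            for other, n in self.cooc.get(ent, {}).items():
                if other not in query:
                    scores[other] += n
        return [ent for ent, _ in scores.most_common(k)]
```

Under the assumed schema, `KnowledgeStore(load_jsonl("./data/knowledge.jsonl")).related(["retrieval"], k=5)` would return up to five entity names. Raw co-occurrence counts favor very frequent entities; a real store might normalize them.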

Running

Set your OpenAI key and run the pipeline:

export OPENAI_API_KEY=YOUR_KEY
python ./code/main.py \
	--data-path ./data/papers.jsonl \
	--knowledge-path ./data/knowledge.jsonl \
	--model-name gpt-4o
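
The run above uses the model named by `--model-name`, and `code/models/openai.py` is described as a wrapper that adds retries and timeouts. The sketch below shows the retry idea as a generic exponential-backoff helper built only on the standard library. The function name, defaults, and jitter scheme are assumptions rather than the repository's implementation. In the pipeline, `call` would wrap the actual Chat Completions request.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying listed exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise ValueError("max_attempts must be at least 1")
```

The injectable `sleep` keeps the helper testable without real waiting. The official `openai` Python client (v1 and later) also accepts `max_retries` and `timeout` options on its `OpenAI(...)` constructor, which may cover the same need without a hand-rolled loop.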

Citation

If you use or build upon this project, please cite:

@inproceedings{Baek2025ResearchAgent,
  title={ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models},
  author={Jinheon Baek and Sujay Kumar Jauhar and Silviu Cucerzan and Sung Ju Hwang},
  booktitle={NAACL},
  year={2025},
  url={https://api.semanticscholar.org/CorpusID:269042844}
}
