Benchmark data and automated evaluators for validating the PaperOrchestra pipeline proposed in arXiv 2604.05018 (Google).
This repo hosts PaperWritingBench — reverse-engineered raw materials from 200 top-tier AI conference papers plus automated evaluators — used to benchmark the multi-agent pipeline implementation in the sister repo.
Input Processing → Literature Synthesis → Manuscript Generation → Visual Creation → Output
v0.1 (data creation only): a brain-science subset drawn from NeurIPS, ICLR, ICML, and CVPR (2020-2025). The target is 200 papers, selected under topic-diversity and venue-weighted quotas. Evaluator implementation (Citation F1, LLM-as-a-Judge, etc.) is deferred to v0.2.
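Since the evaluators are deferred to v0.2, the following is only a minimal sketch of what a Citation F1 score could look like, assuming it is computed as set-based precision and recall over citation identifiers. The function name `citation_f1` and the choice of identifier (for example a normalized title or a DOI) are assumptions for illustration, not the definition used in the paper or in this repo.

```python
def citation_f1(predicted, reference):
    """Set-based F1 between generated and ground-truth citation identifiers.

    Hypothetical helper: both arguments are iterables of hashable
    identifiers (e.g. normalized titles). Duplicates are ignored.
    """
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    # With no overlap (or an empty side), precision and recall are both 0.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = ["smith2021", "lee2022", "kim2023"]
    ground_truth = ["lee2022", "kim2023", "park2020", "chen2024"]
    print(round(citation_f1(generated, ground_truth), 4))
```

Here two of the three generated citations match two of the four reference citations, so precision is 2/3 and recall is 1/2.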
- papers/: per-paper reverse-engineered raw materials (one dir per entry)
- evaluators/: automated quality evaluators (per stage + overall)
- metadata/: benchmark metadata, paper schema, selection criteria
  - paper_schema.json: JSON schema for one benchmark entry
  - selection_criteria.yaml: brain-science filter rules + venue×year quota
- scripts/: ingestion, parsing, raw-material generation, evaluation (TBD)
- docs/: plan, pipeline stages, benchmark protocol
  - benchmark_plan.md: full execution plan (phases, risks, tooling)
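To illustrate how a venue×year quota might be applied during selection, here is a rough sketch under stated assumptions. The field names `venue` and `year`, the quota mapping, and the function `select_with_quota` are hypothetical; the actual rules live in `metadata/selection_criteria.yaml` and may differ.

```python
from collections import Counter


def select_with_quota(candidates, quota):
    """Greedily keep candidates until each (venue, year) cell is full.

    Hypothetical helper. `candidates` is an ordered list of dicts with
    'venue' and 'year' keys; `quota` maps (venue, year) to the maximum
    number of papers allowed for that cell. Cells absent from `quota`
    accept nothing.
    """
    counts = Counter()
    selected = []
    for paper in candidates:
        cell = (paper["venue"], paper["year"])
        if counts[cell] < quota.get(cell, 0):
            counts[cell] += 1
            selected.append(paper)
    return selected


if __name__ == "__main__":
    # Illustrative values only, not taken from the benchmark.
    candidates = [
        {"id": "p1", "venue": "NeurIPS", "year": 2023},
        {"id": "p2", "venue": "NeurIPS", "year": 2023},
        {"id": "p3", "venue": "NeurIPS", "year": 2023},
        {"id": "p4", "venue": "ICML", "year": 2024},
    ]
    quota = {("NeurIPS", 2023): 2, ("ICML", 2024): 1}
    print([p["id"] for p in select_with_quota(candidates, quota)])
```

Because the loop is greedy, the order of `candidates` decides which papers fill a cell; ranking candidates by a topic-diversity score beforehand would be one way to combine both criteria.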
- Execution plan: docs/benchmark_plan.md
- Entry schema: metadata/paper_schema.json
- Selection criteria: metadata/selection_criteria.yaml
- Paper: https://arxiv.org/abs/2604.05018
- Project page: https://yiwen-song.github.io/paper_orchestra/
- Sister repo (pipeline implementation): transconnectome/PaperOrchestrator