SolarChemQA: Dataset, Benchmark and Software

@OEG-Clark released this 25 Jun 09:54
acbe7f4
  • Dataset: A domain-expert-annotated corpus of solar chemistry papers covering seven experimental parameters (catalyst, co-catalyst, light source, lamp, reactor type, reaction medium, operation mode), with a filtered benchmark subset, an LLM-evaluation sample, and sentence-level retrieval evidence.
  • Generation pipeline (src/generation): a RAG pipeline that extracts evidence and infers the seven parameters from each paper.
  • Benchmark (src/evaluation): three evaluation tasks covering information retrieval (scored with NDCG), RAG-strategy comparison, and an LLM performance leaderboard.
  • Reproducibility & citation: pinned requirements.txt, Poetry pyproject.toml + lockfile, CITATION.cff, codemeta.json, and an Apache-2.0 license.
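
The information-retrieval task above is scored with NDCG. As a rough illustration of that metric, here is a minimal Python sketch. It is not taken from src/evaluation: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the choice of linear gain with a log2 discount is an assumption, since the release notes do not say which NDCG variant the benchmark uses. The ideal ranking is computed from the same list of relevance labels, which assumes every relevant sentence appears somewhere in the ranked list.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items.

    Assumes linear gain and a log2 rank discount (one common variant).
    """
    # rank is 0-based, so the item at position 1 is divided by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    # Ideal ordering: the same labels sorted from most to least relevant.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        # No relevant items at all: define the score as 0 to avoid dividing by zero.
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical labels for four retrieved sentences, in ranked order
    # (1 = annotated as evidence, 0 = not evidence).
    ranked_labels = [1, 0, 1, 0]
    print(f"NDCG@4 = {ndcg_at_k(ranked_labels, k=4):.4f}")
```

A ranking that places all evidence sentences first scores 1.0, and the score drops as evidence sentences fall lower in the list.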