Releases: oeg-upm/solarchem-corpus
SolarChemQA: Dataset, Benchmark and Software
- Dataset: A domain-expert-annotated corpus of solar chemistry papers covering seven experimental parameters (catalyst, co-catalyst, light source, lamp, reactor type, reaction medium, operation mode), with a filtered benchmark subset, an LLM-evaluation sample, and sentence-level retrieval evidence.
- Generation pipeline (`src/generation`): a RAG pipeline that extracts evidence and infers the seven parameters from each paper.
- Benchmark (`src/evaluation`): three evaluation tasks: information retrieval (NDCG), RAG-strategy comparison, and an LLM performance leaderboard.
- Reproducibility & citation: pinned `requirements.txt`, Poetry `pyproject.toml` + lockfile, `CITATION.cff`, `codemeta.json`, and an Apache-2.0 license.
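For readers unfamiliar with the NDCG metric named in the information-retrieval task, the following is a minimal sketch of how it is commonly computed. It is not taken from `src/evaluation`; the function names `dcg` and `ndcg`, the cutoff `k`, and the example relevance labels are all hypothetical. It also assumes the ideal ranking is built only from the retrieved sentences, which may differ from how the benchmark defines it.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (ranks start at 1)."""
    # enumerate is 0-based, so the discount for rank r = i + 1 is log2(r + 1) = log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the same labels sorted best-first.

    Assumption: the ideal ranking is derived from the retrieved items only.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant item retrieved, so the score is defined as 0
    return dcg(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance labels for five retrieved sentences, in ranked order.
    ranked_labels = [0, 1, 0, 1, 0]
    print(f"NDCG@5 = {ndcg(ranked_labels, k=5):.4f}")
```

A score of 1.0 means the retrieved sentences are already in the best possible order; lower values indicate that relevant evidence sentences appear further down the ranking.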