2nd LLMs4Subject - GermEval 2025 Data Release

SameerSamji released this 05 Aug 09:43

llms4subjects - GermEval 2025 v1.0.0

  • Initial public release of the official LLMs4Subjects shared task repository for GermEval 2025. The task centers on building and evaluating large language model systems for automated subject tagging of technical records from Leibniz University’s Technical Library.

  • Includes datasets built on the GND subject vocabulary and the TIBKAT technical library catalog, with records in English and German.

  • Provides standard data splits and an evaluation format for two subtasks: (1) multi-domain classification and (2) subject indexing.

Included Components

  • GND folder: Full subject taxonomy in machine- and human-readable formats. It can be accessed here

  • TIBKAT folder: Technical library records in English and German (~140k records in total), divided into train/dev/test splits. It can be accessed here