GalicianBench is a benchmark for evaluating language models in Galician tasks. This is, it evaluates the ability of a language model to understand and generate Galician text. GalicianBench offers a combination of pre-existing, open datasets and datasets developed exclusivelly for this benchmark. All the details of GalicianBench will be published in a paper soon.
The new evaluation datasets included in GalicianBench are:
| Task | Category | Homepage |
|---|---|---|
| Belebele_gl | Reading Comprehension | https://huggingface.co/datasets/proxectonos/belebele_gl |
| GalCoLA | Linguistic Acceptability | https://huggingface.co/datasets/proxectonos/galcola |
| MGSM_ca | Math | https://huggingface.co/datasets/proxectonos/mgsm_gl |
| Parafrases_gl | Paraphrasing | https://huggingface.co/datasets/proxectonos/parafrases_gl |
| PAWS-gl | Paraphrasing | https://huggingface.co/datasets/proxectonos/PAWS-gl |
| OpenBookQA_gl | Question Answering | https://huggingface.co/datasets/proxectonos/openbookqa_gl |
| Summarization_gl | Summarization | https://huggingface.co/datasets/proxectonos/summarization_gl |
| TruthfulQA_gl | Truthfulness | https://huggingface.co/datasets/proxectonos/truthfulqa_gl |
| xnli_gl | NLI | https://huggingface.co/datasets/proxectonos/xnli_gl |
| xstorycloze_gl | Commonsense Reasoning | https://huggingface.co/datasets/proxectonos/xstorycloze_gl |
The datasets included in GalicianBench that have been made public in previous pubications are:
| Task | Category | Paper title | Homepage |
|---|---|---|---|
| FLORES_gl | Translation | The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation | https://huggingface.co/datasets/facebook/flores |
@inproceedings{baucells-etal-2025-iberobench,
title = "{I}bero{B}ench: A Benchmark for {LLM} Evaluation in {I}berian Languages",
author = "Baucells, Irene and
Aula-Blasco, Javier and
de-Dios-Flores, Iria and
Paniagua Su{\'a}rez, Silvia and
Perez, Naiara and
Salles, Anna and
Sotelo Docio, Susana and
Falc{\~a}o, J{\'u}lia and
Saiz, Jose Javier and
Sepulveda Torres, Robiert and
Barnes, Jeremy and
Gamallo, Pablo and
Gonzalez-Agirre, Aitor and
Rigau, German and
Villegas, Marta",
editor = "Rambow, Owen and
Wanner, Leo and
Apidianaki, Marianna and
Al-Khalifa, Hend and
Eugenio, Barbara Di and
Schockaert, Steven",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.coling-main.699/",
pages = "10491--10519",
}
galician_bench: All tasks included in GalicianBench.flores_gl: All FLORES translation tasks from or to Galician.
The following tasks evaluate tasks on GalicianBench dataset using various scoring methods.
belebele_glg_Latnflores_glflores_gl-caflores_gl-deflores_gl-enflores_gl-esflores_gl-euflores_gl-frflores_gl-itflores_gl-ptflores_ca-glflores_de-glflores_en-glflores_es-glflores_eu-glflores_fr-glflores_it-glflores_pt-glgalcolasummarization_glparafrases_glpaws_glopenbookqa_glmgsm_direct_gltruthfulqa_glxnli_glxstorycloze_gl
- Is the task an existing benchmark in the literature?
- Have you referenced the original paper that introduced the task?
- If yes, does the original paper provide a reference implementation?
- Yes, original implementation contributed by author of the benchmark
If other tasks on this dataset are already supported:
- Is the "Main" variant of this task clearly denoted?
- Have you provided a short sentence in a README on what each new variant adds / evaluates?
- Have you noted which, if any, published evaluation setups are matched by this variant?