Welcome! This is the official repo for AL-QASIDA (Robinson et al., 2025), the first comprehensive evaluation of LLM proficiency in dialectal Arabic, presented in the paper: AL-QASIDA: Analyzing LLM Quality and Accuracy Systematically in Dialectal Arabic
Please cite:

```bibtex
@misc{robinson2025alqasidaanalyzingllmquality,
  title={AL-QASIDA: Analyzing LLM Quality and Accuracy Systematically in Dialectal Arabic},
  author={Nathaniel R. Robinson and Shahd Abdelmoneim and Kelly Marchisio and Sebastian Ruder},
  year={2025},
  eprint={2412.04193},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.04193},
}
```
To update soon:
- 🚧 The `pre_humeval` outputs for MT tasks are not written properly without the use of `eval/correct_scores.py`, due to the absence of `llm_outputs/**/*samples*` files. This needs to be addressed ASAP (a hedged workaround sketch follows this list).
- 🚧 Rename `hehe` --> `cohere` and `TVD` --> `ADI2`
- 🚧 Make a tutorial video in Arabic
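Until that fix lands, a possible interim workaround is to run the score-correction script manually after generating MT outputs. The exact arguments it expects are not documented here, so treat the invocation below as a hedged sketch and check the script itself before relying on it:

```bash
# Hedged sketch of a manual workaround: re-derive the MT scores with the
# correction script. Inspect eval/correct_scores.py for the arguments it
# actually expects; this bare invocation is only illustrative.
python eval/correct_scores.py
```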
See our video tutorial for running AL-QASIDA here. Or feel free to go through the instructions at your own pace.
To set up the environment, run:

```bash
bash setup.sh
```
For instructions on running AL-QASIDA, refer to the linked step-by-step instructions, in that order.
You could also skip to step #3 if you don't want to run AL-QASIDA on your own model: the results for the models we evaluated in our paper are already in `llm_outputs`.
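To get a quick sense of what is already included, you can list the files shipped there; the exact per-model and per-task directory layout is not assumed in this sketch:

```bash
# Peek at the pre-computed model outputs shipped with the repo
# (adjust the path or depth as needed for your checkout).
find llm_outputs -type f | head -n 20
```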
If you have any questions or difficulty running our evaluation, please feel free to raise issues or pull requests, or to contact the authors. (Primary contact: [email protected])