by Mohamed Jailam
MSc Information Technology (Villa College / UWE)
FENVARU is the first known benchmark designed to evaluate how well modern large language models (LLMs) understand and support Dhivehi, the national language of the Maldives. This project was developed as part of the MScIT dissertation submitted to Villa College / University of the West of England (UWE).
The benchmark includes a comprehensive suite of evaluation tasks and analyses, covering both generative and classification-based capabilities of models across four key NLP tasks:
- Machine Translation (MT)
- Named Entity Recognition (NER)
- Text Classification
- Question Answering (QA)
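For generative tasks like QA, a common scoring choice is token-overlap F1 between the model output and the reference answer. The helper below is a minimal sketch of that metric, not the exact scoring code used in this benchmark; whitespace tokenization is an assumption and may need adjusting for Dhivehi script.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1, a common metric for QA-style evaluation.

    Assumes whitespace tokenization; real Dhivehi evaluation may
    need script-aware tokenization instead.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty is a miss.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

The same function can double as a rough MT proxy, though corpus-level metrics such as BLEU or chrF are more standard for translation.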
- Google Gemini 2.5 Pro was the top performer across most tasks, especially in MT and NER.
- Anthropic Claude models showed promise in QA and classification tasks.
- Open-source models like LLaMA, DeepSeek, and Mistral showed poor zero-shot Dhivehi support.
- Structured tasks such as NER and text classification yielded higher scores than generation-heavy tasks such as QA and MT.
- The overall results highlight the low-resource status of Dhivehi and the need for more inclusive model training.
You are encouraged to extend this benchmark in the following ways:
- Expand Dataset: Increase the number and diversity of samples in each task, especially for MT and QA.
- Fine-tune Open Models: Use the provided dataset to fine-tune or adapter-train open-source models for Dhivehi.
- Add More Tasks: Include syntactic parsing, summarization, or speech-to-text tasks in Dhivehi.
- Evaluate Bias and Ethics: Explore how models treat Dhivehi cultural, religious, and political context.
- Community Contributions: Collaborate on improving Dhivehi NLP resources and open datasets.
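As a starting point for the fine-tuning suggestion above, benchmark pairs can be converted into supervised fine-tuning records before adapter training. The converter below is a hypothetical sketch: the prompt template and the `prompt`/`completion` field names are assumptions and should be adapted to the target model's expected chat or instruction format.

```python
def to_sft_record(src: str, tgt: str) -> dict:
    """Turn one MT benchmark pair into a supervised fine-tuning record.

    The instruction template and field names here are illustrative;
    match them to whatever format your fine-tuning framework expects.
    """
    prompt = (
        "Translate the following English sentence into Dhivehi:\n"
        f"{src}"
    )
    return {"prompt": prompt, "completion": tgt}
```

Records like these can then be written out as JSONL and fed to an adapter-training setup (e.g. LoRA) on an open-source model.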
For collaboration, feedback, or research queries, please contact:
Mohamed Jailam
Email: [email protected]
If you use this benchmark or any part of this research, please cite: