TeleQuAD is a suite of question-answering datasets and models specifically designed for the telecommunications domain. It provides various QA capabilities including extractive, generative, retrieval-augmented generation (RAG), and tabular structured data question answering.
TeleQuAD is organized into the following task-specific subdirectories, each containing the respective dataset:
- TeleQuAD-Extractive: An extractive QA dataset built from 3GPP technical documents.
- TeleQuAD-Tabular: QA over table-structured telecom data (specifications, configurations, etc.).
- Clone the repository and change to the directory.
- Choose your QA task type and change to the relevant subdirectory.
- Follow the task-specific README available for each dataset in the respective folder.
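The task-specific README remains the authoritative guide to each dataset's format. Purely as an illustration, the sketch below assumes the extractive dataset uses a SQuAD v1.1-style JSON layout (`data` → `paragraphs` → `qas`). That layout is an assumption suggested by the dataset's name, not something this README confirms, and the inline sample record is made up so that the snippet is self-contained.

```python
import json

# Hypothetical sample in a SQuAD v1.1-style layout. The real TeleQuAD schema
# and file names may differ; check the README in the task subdirectory.
SAMPLE = """
{
  "data": [
    {
      "title": "example-spec",
      "paragraphs": [
        {
          "context": "The gNB is the 5G NR base station.",
          "qas": [
            {
              "id": "q1",
              "question": "What is the gNB?",
              "answers": [
                {"text": "the 5G NR base station", "answer_start": 11}
              ]
            }
          ]
        }
      ]
    }
  ]
}
"""


def iter_examples(dataset):
    """Yield (question, context, answer_texts) from a SQuAD-style dict."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [answer["text"] for answer in qa["answers"]]
                yield qa["question"], context, answers


# With a real file, json.load(open(path)) would replace json.loads(SAMPLE).
examples = list(iter_examples(json.loads(SAMPLE)))
for question, context, answers in examples:
    print(question, "->", answers)
```

If the actual files use a different structure (for example, one JSON object per line), only `iter_examples` would need to change.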
Contributions to the dataset are welcome. Please raise a pull request and we will review the changes.
[1] Holm, Henrik. "Bidirectional Encoder Representations from Transformers (BERT) for question answering in the telecom domain: Adapting a BERT-like language model to the telecom domain using the electra pre-training approach." (2021).
[2] Gunnarsson, Maria. "Multi-hop neural question answering in the telecom domain." LTH, Lund University: Lund, Sweden (2021).
[3] Bissessar, Daniel and Alexander Bois. "Evaluation of methods for question answering data generation: Using large language models." (2022).
[4] Nimara, Doumitrou Daniil, Fitsum Gaim Gebre and Vincent Huang. "Entity Recognition in Telecommunications using Domain-adapted Language Models." 2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN 2024).
[5] Karapantelakis, Athanasios, et al. "Using Large Language Models to understand telecom standards." 2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN 2024).
[6] Roychowdhury, Sujoy, Sumit Soman, HG Ranjani, Avantika Sharma, Neeraj Gunda and Sai Krishna Bala. "Evaluation of Table Representations to Answer Questions from Tables in Documents: A Case Study using 3GPP Specifications." arXiv preprint arXiv:2408.17008 (2024).
[7] Roychowdhury, Sujoy, Sumit Soman, HG Ranjani, Neeraj Gunda, Vansh Chhabra and Sai Krishna Bala. "Evaluation of RAG Metrics for Question Answering in the Telecom Domain." Workshop on Foundation Models in the Wild, International Conference on Machine Learning (ICML 2024).
[8] Roychowdhury, Sujoy, Sumit Soman, HG Ranjani, Neeraj Gunda, Vansh Chhabra, Subhadip Bandyopadhyay and Sai Krishna Bala. "Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval." Workshop on Next-Gen Networks through LLMs, Action Models, and Multi-Agent Systems, International Conference on Communications (ICC 2025).
If you use TeleQuAD in your research, please cite:
@article{telequad2025,
  title={TeleQuAD: A Suite of Question Answering Datasets for the Telecom Domain},
  author={Fitsum Gebre and Henrik Holm and Maria Gunnarsson and Doumitrou Nimara and Jieqiang Wei and Vincent Huang and Avantika Sharma and H G Ranjani},
  booktitle={Ericsson},
  year={2025}
}
Ericsson (c) 2024-2025
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
TeleQuAD is developed and maintained by Ericsson AB and published for research purposes.