Machine Learning Models to decode compound emotions in a Urdu-based text

dejah22/Multi-Label-Emotion-Classification-in-Urdu

EmoThreat: Emotions & Threat Detection in Urdu, FIRE 2022

Code scripts and working notes for our participation in the Forum for Information Retrieval Evaluation (FIRE) 2022, whose proceedings are part of the Association for Computing Machinery (ACM) Digital Library.

Implementation Stack: Python, NumPy, Keras, TensorFlow, Scikit-learn

Quick Links

Cite Us

Link to the Research Paper.

If you find our work useful in your research, don't forget to cite us!

@inproceedings{madhusankar2022multi,
  title={Multi-Label Emotion Classification in Urdu.},
  author={Madhusankar, Dejah and Karthikeyan, Avanthika and Bharathi, B},
  booktitle={FIRE (Working Notes)},
  pages={231--237},
  year={2022}
}

Project Description

This project uses ML algorithms to automate emotion analysis in the Urdu language. It takes a piece of Urdu text and identifies the combination of emotions it may convey (hence, multi-label). The identified emotions are categorised under Ekman’s six basic emotions, plus neutrality.

Task Objective: Multi-label emotion classification in Urdu

This task involves classifying Urdu tweets in Nastalīq script into one or more of Ekman’s six basic emotions (anger, disgust, fear, sadness, surprise, happiness) plus neutrality. Given Urdu’s widespread use on social media, the dataset fills a crucial gap for understanding public emotions, enabling applications in NLP, disaster management, public policy, commerce, and public health.
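To make the multi-label setup concrete, here is a minimal sketch of how a set of emotions can be turned into a 0/1 vector and back. The label order and the helper names `encode` and `decode` are assumptions for illustration only; they are not taken from the notebooks, and the dataset's own column order may differ.

```python
# Assumed label order for illustration; the dataset's column order may differ.
EMOTIONS = ["anger", "disgust", "fear", "sadness", "surprise", "happiness", "neutral"]


def encode(labels):
    """Turn a collection of emotion names into a multi-hot list of 0s and 1s."""
    labels = set(labels)
    unknown = labels - set(EMOTIONS)
    if unknown:
        raise ValueError(f"Unknown emotion labels: {sorted(unknown)}")
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]


def decode(vector):
    """Turn a multi-hot list back into the emotion names it marks as present."""
    if len(vector) != len(EMOTIONS):
        raise ValueError(f"Expected {len(EMOTIONS)} values, got {len(vector)}")
    return [emotion for emotion, flag in zip(EMOTIONS, vector) if flag == 1]


# A tweet expressing both anger and fear gets two 1s in its vector.
vector = encode(["anger", "fear"])
present = decode(vector)
```

Because a single tweet can carry several 1s at once, a classifier for this task typically predicts each position of the vector separately rather than picking one class.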

Repo structure

There are five Jupyter notebooks (written to run on Google Colaboratory), each containing the code for training and testing one ML model combination.

  • Training Data

    • Contains 7,800 tweets in the Urdu language
    • Has 8 columns of data. Each Urdu text is accompanied by its emotion labels (a 1 signifies the presence of that emotion)
  • Testing Data - Contains 1,950 Urdu sentences for testing
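The layout described above (one text column plus seven 0/1 label columns) can be read with a few lines of standard-library Python. This is only a sketch: the column names, the CSV format, and the placeholder rows below are assumptions made up for illustration, not the actual file or its contents.

```python
import csv
import io

# Hypothetical column names: one text column plus seven 0/1 label columns.
LABELS = ["anger", "disgust", "fear", "sadness", "surprise", "happiness", "neutral"]

# Made-up placeholder rows, not taken from the dataset.
SAMPLE_CSV = """text,anger,disgust,fear,sadness,surprise,happiness,neutral
placeholder tweet 1,1,1,0,0,0,0,0
placeholder tweet 2,0,0,0,0,1,1,0
placeholder tweet 3,0,0,0,0,0,0,1
"""


def load_rows(handle):
    """Read (text, label_flags) pairs from an open CSV file-like object."""
    rows = []
    for record in csv.DictReader(handle):
        flags = [int(record[name]) for name in LABELS]
        rows.append((record["text"], flags))
    return rows


def label_counts(rows):
    """Count how many rows have each label set to 1."""
    counts = dict.fromkeys(LABELS, 0)
    for _, flags in rows:
        for name, flag in zip(LABELS, flags):
            counts[name] += flag
    return counts


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
```

To try this on the real training file, pass an open file handle to `load_rows` in place of the `io.StringIO` sample and adjust `LABELS` to match the file's actual header.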

Setup Instructions

  1. Go to Google Colab and create a new notebook.

  2. Clone the Repository - In a new code cell, type the following command: !git clone https://github.com/dejah22/Multi-Label-Emotion-Classification-in-Urdu.git

  3. Use cd to change to the directory of the cloned repository and open the desired .ipynb file.

    Tips

    1. Install any missing dependencies or required libraries using: !pip install <package-name>
    2. Save your changes back to GitHub

Project Recognition and Acknowledgments :)

I would first like to thank Avanthika K and Dr. Bharathi B for working on this project with me. Kudos guys!

Upon acceptance, we presented our work at the FIRE 2022 National Conference held in Kolkata by the Indian Statistical Institute. I sincerely express my gratitude to them for letting us adopt their dataset, as well as for supporting our work.
