Multimodal-Meme-Classification

This repository presents a multimodal deep learning project for classifying mental health-related memes, combining both textual and visual features. The system is designed for robust multi-label and single-label classification across depression and anxiety categories.
The project was developed for the Natural Language Processing course at IIIT Delhi in Winter 2025.

🛠️ Features:

  • OCR Extraction via Google Docs API: Extracted textual content from memes using Google Drive and Docs API, enabling high-accuracy OCR by converting images to Google Docs and parsing the returned text. Scripts are available in the /Baselines folder.
  • Multimodal Pipeline: Combines OCR-based text and image content using Mental-RoBERTa and CLIP Vision encoders.
  • Cross-Attention Fusion: Cross-modal attention layers to align and integrate features from text and image modalities.
  • Mixture-of-Experts: Adaptive expert fusion using a gating network to improve representational capacity.
  • Contrastive Learning: Alignment of text, image, and fused embeddings through contrastive loss in shared embedding space.
  • Classification: A transformer-based classifier predicts multi-label depression subcategories (RESTORE) and single-label anxiety categories (AxiOM).
  • Ablation Study: Comprehensive experiments performed to analyze the impact of each module (e.g., contrastive loss, MoE, OCR, figurative reasoning).
  • Error Analysis: Investigated common misclassifications, especially for overlapping meme types and ambiguous sarcastic content, with qualitative examples.
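The cross-attention fusion step above can be sketched in PyTorch. This is a hypothetical illustration, not the repository's exact architecture: the bidirectional design, hidden size, and mean-pooling are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Text tokens attend over image patches and vice versa, then the two
    attended sequences are pooled and concatenated into one fused vector.
    Illustrative sketch; layer sizes are assumptions."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_feats, image_feats):
        # queries come from one modality, keys/values from the other
        t2i, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i2t, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # pool each attended sequence and concatenate
        return torch.cat([t2i.mean(dim=1), i2t.mean(dim=1)], dim=-1)

fusion = CrossAttentionFusion(dim=768)
text = torch.randn(4, 32, 768)   # batch of 4, 32 text tokens
image = torch.randn(4, 50, 768)  # batch of 4, 50 image patches
out = fusion(text, image)        # shape: (4, 1536)
```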
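The mixture-of-experts gating described above can be sketched as a soft mixture, where a gating network produces per-expert weights and the output is the weighted sum of expert outputs. The expert count and widths here are illustrative assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Soft MoE: a gating net softmax-weights each expert's output.
    Hypothetical sketch; expert count and widths are illustrative."""
    def __init__(self, dim=768, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)           # (batch, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)  # (batch, E, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)     # (batch, dim)

moe = MixtureOfExperts()
y = moe(torch.randn(4, 768))
```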
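The contrastive alignment objective can be sketched as a symmetric CLIP-style InfoNCE loss: matched text/image pairs in a batch are pulled together and mismatched pairs pushed apart. The temperature value is illustrative, and the repository may apply this loss to fused embeddings as well.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.
    Hypothetical sketch of the alignment objective."""
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(len(logits))              # i-th text matches i-th image
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```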

📋 Installation & Usage:

  • Clone the repository and install the following Python libraries:
pip install pandas numpy matplotlib scikit-learn jupyter transformers torch torchvision tqdm pillow
  • Follow the instructions in the Jupyter notebook to train or evaluate the model.

📙 Notebooks & Dataset Details:

  • trained_anxiety.ipynb: Classifies memes into one of several anxiety categories (single-label classification) using the AxiOM dataset.
  • trained_depression.ipynb: Predicts multiple subcategories of depression (multi-label classification) using the RESTORE dataset.
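The single-label vs. multi-label distinction between the two notebooks maps onto two standard loss setups, sketched below. The class counts and head sizes are placeholders, not the datasets' actual label counts.

```python
import torch
import torch.nn as nn

fused = torch.randn(4, 512)  # fused multimodal features for a batch of 4 memes

# Single-label anxiety (AxiOM): exactly one class per meme,
# so a softmax head trained with cross-entropy.
anxiety_head = nn.Linear(512, 6)  # 6 is a placeholder class count
anxiety_loss = nn.CrossEntropyLoss()(
    anxiety_head(fused), torch.tensor([0, 1, 2, 3]))

# Multi-label depression (RESTORE): several subcategories may apply at once,
# so independent sigmoids trained with binary cross-entropy.
depression_head = nn.Linear(512, 8)  # 8 is a placeholder label count
labels = torch.randint(0, 2, (4, 8)).float()  # each label independently on/off
depression_loss = nn.BCEWithLogitsLoss()(depression_head(fused), labels)
```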

💾 Download Pretrained Weights & Datasets:

📊 Results:

  • AxiOM Dataset (Anxiety Meme Classification) Results:
    Model                      Macro-F1   Weighted-F1
    OCR + BERT                 0.6163     0.6143
    OCR + Mental-BERT          0.6235     0.6232
    OCR + LLAVA + Mental-BERT  0.6183     0.6173
    Proposed Approach          0.6851     0.6848
  • RESTORE Dataset (Depression Meme Multi-label Classification) Results:
    Model                      Macro-F1   Weighted-F1
    OCR + BERT                 0.6355     0.6347
    OCR + Mental-BERT          0.6313     0.6249
    OCR + LLAVA + Mental-BERT  0.6298     0.6263
    Proposed Approach          0.6606     0.6628

🧑‍🤝‍🧑 Other Contributors:

My IIIT Delhi batchmates Manan Aggarwal & Souparno Ghose also contributed to this project.

📌 Important: Please follow your institution's guidelines and policies on the use of shared coursework materials. Use this repository responsibly and avoid any violation of academic integrity. The code is provided for reference only; it is recommended that you understand it and implement it independently.
