Jie Li, Shifei Ding, Lili Guo, and Xuan Li, "Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation". (IJCAI 2025, PyTorch code)
Emotion Recognition in Conversation (ERC) aims to detect the emotions of individual utterances within a conversation. Generating efficient and modality-specific representations for each utterance remains a significant challenge. Previous studies have proposed various models to integrate features extracted using different modality-specific encoders. However, they neglect the varying contributions of modalities to this task and introduce high complexity by aligning modalities at the frame level. To address these challenges, we propose the Multi-modal Anchor Gated Transformer with Knowledge Distillation (MAGTKD) for the ERC task. Specifically, prompt learning is employed to enhance textual modality representations, while knowledge distillation is utilized to strengthen representations of weaker modalities. Furthermore, we introduce a multi-modal anchor gated transformer to effectively integrate utterance-level representations across modalities. Extensive experiments on the IEMOCAP and MELD datasets demonstrate the effectiveness of knowledge distillation in enhancing modality representations and achieve state-of-the-art performance in emotion recognition. Our code is available at: https://github.com/JieLi-dd/MAGTKD.
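For a quick intuition of the fusion idea before reading the code, below is a minimal, hypothetical sketch of how an anchor modality (e.g., text) can gate the utterance-level representation of another modality. It illustrates the general gating mechanism only and is not the exact MAGT module from the paper; see model.py for the actual implementation.

```python
import torch
import torch.nn as nn

class AnchorGate(nn.Module):
    """Toy illustration: the anchor modality (e.g., text) controls how much of
    another modality's utterance-level representation is let through.
    This is NOT the exact MAGT module from the paper; see model.py instead."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, anchor: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # anchor, other: (batch, dim) utterance-level features
        g = torch.sigmoid(self.gate(torch.cat([anchor, other], dim=-1)))
        return anchor + g * other  # anchor-guided gated residual fusion


if __name__ == "__main__":
    text_feat, audio_feat = torch.randn(4, 768), torch.randn(4, 768)
    print(AnchorGate(768)(text_feat, audio_feat).shape)  # torch.Size([4, 768])
```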
The following pretrained models are used for feature extraction from the three modalities:
- Text Modality: RoBERTa-large
- Audio Modality: data2vec-audio-base-960h
- Video Modality: VideoMAE-base and TimeSformer-base-finetuned-k400
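As a rough illustration only (assuming the usual public Hugging Face hub identifiers; this repository instead loads local copies placed under pretrained_model/), the encoders can be instantiated with transformers as follows:

```python
# Hedged sketch: loading the four encoders via Hugging Face Transformers.
# The hub identifiers below are the standard public ones; in this repo the
# weights are expected to sit locally under pretrained_model/ instead.
from transformers import AutoModel, AutoTokenizer

text_encoder  = AutoModel.from_pretrained("roberta-large")
tokenizer     = AutoTokenizer.from_pretrained("roberta-large")
audio_encoder = AutoModel.from_pretrained("facebook/data2vec-audio-base-960h")
video_encoder = AutoModel.from_pretrained("MCG-NJU/videomae-base")
video_cls     = AutoModel.from_pretrained("facebook/timesformer-base-finetuned-k400")
```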
Python environment dependencies:
```
python==3.9.19
torch==1.13.1+cu116
torchvision==0.14.1+cu116
torchaudio==0.13.1+cu116
transformers==4.27.2
```
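An optional quick check that the installed versions match the ones listed above:

```python
import torch, torchvision, torchaudio, transformers

# Expected: 1.13.1+cu116 / 0.14.1+cu116 / 0.13.1+cu116 / 4.27.2
print(torch.__version__, torchvision.__version__,
      torchaudio.__version__, transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```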
We use two widely adopted ERC datasets: IEMOCAP and MELD.
In the first stage, modality-specific features are extracted via knowledge distillation. You may download and use the pre-extracted features provided below.
```
Project
| - datasets
|   | - IEMOCAP
|   |   | - IEMOCAP_train.csv
|   |   | - IEMOCAP_dev.csv
|   |   | - IEMOCAP_test.csv
|   |   | - Session1
|   |   | - ...
|   | - MELD
|   |   | - train_meld_emo.csv
|   |   | - dev_meld_emo.csv
|   |   | - test_meld_emo.csv
|   |   | - dev_splits_complete
|   |   | - train_splits
|   |   | - output_repeated_splits_test
|   |   | - ...
| - pretrained_model
|   | - roberta-large
|   | - data2vec-audio-base-960h
|   | - timesformer-base-finetuned-k400
|   | - videomae-base
| - IEMOCAP
|   | - feature
|   |   | - video
|   |   |   | - train
|   |   |   | - dev
|   |   |   | - test
|   |   | - IEMOCAP
|   | - save_model
|   |   | - text.bin
|   |   | - audio.bin
|   |   | - video.bin
|   |   | - text_KD_audio.bin
|   |   | - video_KD_text.bin
|   | - model.py
|   | - utils.py
|   | - preprocessing.py
|   | - dataset.py
|   | - text.py
|   | - audio.py
|   | - video.py
|   | - video_feature_extract.py
|   | - KD.py
| - MELD
```
To train from scratch on the IEMOCAP dataset:
```bash
# 1. Extract text features
python text.py

# 2. Extract audio features
python audio.py

# 3. Extract raw video features
python video_feature_extract.py

# 4. Process video features
python video.py

# 5. Perform knowledge distillation (audio student, text teacher)
python KD.py --student audio --teacher text

# 6. Perform knowledge distillation (video student, text teacher)
python KD.py --student video --teacher text

# 7. Extract fused features from all modalities (first stage)
python extract_first_stage_features.py

# 8. Perform multimodal fusion training and testing
python multimodal_fusion.py
```
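Steps 5-6 distill knowledge from the stronger text modality (teacher) into the weaker audio and video modalities (students). As a hedged sketch of what utterance-level logit distillation typically looks like (the loss actually implemented in KD.py may differ or include additional terms):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Generic logit distillation: KL divergence between softened teacher and
    student distributions, mixed with cross-entropy on the gold labels.
    This is only a sketch; see KD.py for the loss actually used."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Example with random tensors (batch of 8 utterances, 6 emotion classes):
student = torch.randn(8, 6)
teacher = torch.randn(8, 6)
labels = torch.randint(0, 6, (8,))
print(distillation_loss(student, teacher, labels))
```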
For the MELD dataset, the directory structure and training process are the same as for IEMOCAP.
To run testing only, first download the distilled first-stage features and place them into the corresponding dataset folders. Then run:
```bash
python multimodal_fusion.py --train False
```
If you find our work useful, please consider citing the following paper:
```bibtex
@inproceedings{Li-2025-MAGTKD,
  title     = {Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation},
  author    = {Li, Jie and Ding, Shifei and Guo, Lili and Li, Xuan},
  booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, {IJCAI-25}},
  pages     = {8141--8149},
  year      = {2025},
  doi       = {10.24963/ijcai.2025/905},
}
```
We also recommend citing the following related papers:
```bibtex
@inproceedings{song-etal-2022-supervised,
  title     = {Supervised Prototypical Contrastive Learning for Emotion Recognition in Conversation},
  author    = {Song, Xiaohui and Huang, Longtao and Xue, Hui and Hu, Songlin},
  booktitle = {EMNLP},
  year      = {2022},
  pages     = {5197--5206},
}

@inproceedings{yun-etal-2024-telme,
  title     = {{T}el{ME}: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation},
  author    = {Yun, Taeyang and Lim, Hyunkuk and Lee, Jeonghwan and Song, Min},
  booktitle = {NAACL},
  year      = {2024},
  pages     = {82--95},
}

@article{10109845,
  title   = {A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations},
  author  = {Ma, Hui and Wang, Jian and Lin, Hongfei and Zhang, Bo and Zhang, Yijia and Xu, Bo},
  journal = {IEEE Transactions on Multimedia},
  year    = {2024},
  volume  = {26},
  pages   = {776--788},
}
```
Our method builds upon the implementations of the projects cited above. We thank the authors of these works for sharing their codebases.