
CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech

Accepted at INTERSPEECH 2024. Paper DOI: 10.21437/Interspeech.2024-2370

Authors: Jiali Cheng, Mohamed Elgaar, Nidhi Vakil, Hadi Amiri

Key idea of CogniVoice

Given an input speech sample, we extract features using transformer models for the speech signal and its corresponding transcript, as well as acoustic features obtained from DisVoice. A standard training approach concatenates all features and optimizes the cross-entropy loss. To model potential shortcut signals within each feature set, we propose training with Product of Experts (PoE), applied to the multi-feature model and several uni-feature models, each of which predicts the labels using only one set of features. Our approach obtains the ensemble logits from the multi-feature and uni-feature models via an element-wise product.

In addition, we show that PoE reduces the loss for samples that are correctly predicted from both multimodal and unimodal inputs, and increases the loss for samples that cannot be accurately predicted from one of the modalities, which makes it possible to identify and mitigate weaknesses in the model's predictive capabilities. The resulting ensemble logits therefore account for spurious correlations in the dataset, while also being regularized to mitigate overfitting.
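The snippet below is a minimal PyTorch sketch of this PoE combination; the function and tensor names are illustrative, not taken from the repository. An element-wise product of expert probabilities corresponds to summing log-probabilities, and poe_alpha scales the uni-feature (bias expert) terms.

import torch
import torch.nn.functional as F

def poe_logits(multi_logits, uni_logits_list, poe_alpha=1.0):
    # Element-wise product of expert probabilities == sum of log-probabilities.
    ensemble = F.log_softmax(multi_logits, dim=-1)
    for uni_logits in uni_logits_list:
        ensemble = ensemble + poe_alpha * F.log_softmax(uni_logits, dim=-1)
    return ensemble  # train with cross-entropy on these ensemble logits

# Example: one multi-feature expert and three uni-feature experts
# (speech, text, DisVoice) on a batch of 4 samples with binary MCI labels.
multi = torch.randn(4, 2)
unis = [torch.randn(4, 2) for _ in range(3)]
labels = torch.randint(0, 2, (4,))
loss = F.cross_entropy(poe_logits(multi, unis, poe_alpha=0.5), labels)

Training with cross-entropy on the ensemble logits produces the loss-shaping behavior described above: confident agreement among experts lowers the loss, while reliance on a single feature set raises it.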

Datasets

We use the TAUKADIAL 2024 challenge dataset. Please download the dataset from Link.

Requirements

CogniVoice has been tested using Python >=3.6 and transformers>=4.25.
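A minimal environment setup, assuming pip (an assumption; the repository may pin additional dependencies such as PyTorch and DisVoice versions):

pip install "transformers>=4.25" torch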

Running the code

Due to the limited data size, we use k-fold cross-validation and report the average validation performance.
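Concretely, the protocol trains on k-1 folds, evaluates on the held-out fold, and averages the validation metric across folds. Here is a hedged sketch of that loop; the train_and_eval hook is hypothetical, and whether the released scripts use stratified splits or this value of k is not stated here.

import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(features, labels, train_and_eval, k=5, seed=42):
    # train_and_eval(train_idx, val_idx) -> validation metric for one fold.
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = [train_and_eval(tr, va) for tr, va in skf.split(features, labels)]
    return float(np.mean(scores))  # averaged validation performance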

To run MCI detection (classification)

bash run_cls.sh

To run MMSE prediction (regression)

bash run_reg.sh

Some important arguments are

--poe: whether to use Product of Experts (PoE)

--poe_alpha: alpha to control debiasing strength in PoE

All Hugging Face TrainingArguments can also be passed in.
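For example, a classification run with PoE enabled might look like the following; the flag syntax and values here are illustrative, so check run_cls.sh for the exact interface:

bash run_cls.sh --poe --poe_alpha 0.5 --learning_rate 2e-5 --num_train_epochs 10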

Citation

If you find CogniVoice useful for your research, please consider citing this paper:

@inproceedings{cheng24c_interspeech,
  title     = {CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech},
  author    = {Jiali Cheng and Mohamed Elgaar and Nidhi Vakil and Hadi Amiri},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {4308--4312},
  doi       = {10.21437/Interspeech.2024-2370},
}

Miscellaneous

Please send any questions you might have about the code and/or the algorithm to jiali_cheng@uml.edu.

License

The CogniVoice codebase is released under the MIT license.
