Hi, I am interested in how to extract multimodal features for emotion detection.
In your code, you use an existing model to extract video features without re-training. How can I get the original model, and where can I get the CNN weights?
For text, do you also use an existing model to extract features?