An AI-powered system that detects emotions from both text (speech) and facial expressions in real time. It uses a multimodal fusion approach to provide supportive feedback through an interactive anime character.

## ✨ Features

- Vision: Facial Emotion Recognition (FER) via Vision Transformers (ViT).
- Audio/Text: Speech-to-Text with XLM-EMO emotion classification.
- AI Logic: TinyLlama-1.1B for generating natural, supportive responses.
- Interactive UI: Dynamic background changes and video feedback loops.
## 📂 Project Setup

To keep this repository lightweight, you must provide your own media assets.
1. Clone the repo:

   ```bash
   git clone https://github.com/Ayushhkkux/Anime_Multimodel_Emotion_Detection.git
   cd Anime_Multimodel_Emotion_Detection
   ```
2. Set up the `assets/` folder: create a folder named `assets` in the root directory and add:

   - `intro.mp4`: played while the AI is listening.
   - `joy.mp4`: played when a happy emotion is detected.
   - `sad.mp4`: played when a sad emotion is detected.
   - `character_closed.png`: image of the character with its mouth closed.
   - `character_open.png`: image of the character with its mouth open.
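To catch a misconfigured `assets/` folder before launch, a small pre-flight check can verify that all five files listed above exist. This is a minimal sketch, not part of the repository; the `missing_assets` helper name is an assumption for illustration.

```python
from pathlib import Path

# Assets the app expects, per the setup list above.
REQUIRED_ASSETS = [
    "intro.mp4",
    "joy.mp4",
    "sad.mp4",
    "character_closed.png",
    "character_open.png",
]

def missing_assets(assets_dir="assets"):
    """Return the required files that are not present in assets_dir."""
    root = Path(assets_dir)
    return [name for name in REQUIRED_ASSETS if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing assets:", ", ".join(missing))
    else:
        print("All assets present.")
```

Running this once after creating the folder saves a confusing failure later when a video is first requested mid-session.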
## 🛠️ Installation & Setup

### Option 1: Using Pip (Local)

1. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
2. Run the application:

   ```bash
   python app.py
   ```
3. Open your browser at: http://localhost:7860
### Option 2: Using Docker
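The source doesn't include a Dockerfile, so the following is a hedged sketch of what one could look like, assuming `requirements.txt` and `app.py` sit at the repository root, a `python:3.10-slim` base image works for the dependencies, and the app listens on port 7860 as noted above.

```dockerfile
# Illustrative Dockerfile (not shipped with the repo).
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and your local assets/ folder.
COPY . .

EXPOSE 7860
CMD ["python", "app.py"]
```

With a file like this in place, `docker build -t anime-emotion .` followed by `docker run -p 7860:7860 anime-emotion` would serve the app at http://localhost:7860. Note that camera and microphone access from inside a container may need extra host configuration.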
## 🧠 How it Works (Multimodal Fusion)

This project doesn't just look at one thing. It "fuses" data:

- **Text analysis:** analyzes the sentiment of what you say.
- **Face analysis:** scans your facial expressions through the camera.
- **The fusion:** if the face shows 80% joy but the text is neutral, the AI prioritizes the face and responds with the "Joy" video!
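The fusion rule described above can be sketched as a small function. The 0.6 confidence threshold, the `"neutral"` label, and the `fuse_emotions` name are illustrative assumptions, not the project's exact values.

```python
def fuse_emotions(face_probs, text_emotion, face_conf_threshold=0.6):
    """Pick the emotion to act on.

    face_probs: dict mapping emotion label -> probability from the face model (ViT).
    text_emotion: label from the text classifier (e.g. the XLM-EMO output).

    If the face model is confident enough and the text reads as neutral,
    the face wins; otherwise a non-neutral text label takes precedence.
    """
    top_face = max(face_probs, key=face_probs.get)
    if face_probs[top_face] >= face_conf_threshold and text_emotion == "neutral":
        return top_face  # e.g. 80% joy on the face overrides neutral text
    return text_emotion if text_emotion != "neutral" else top_face

# The example from the text: face shows 80% joy, text is neutral.
print(fuse_emotions({"joy": 0.8, "sad": 0.1, "neutral": 0.1}, "neutral"))  # joy
```

The detected label would then select which asset to play (`joy.mp4`, `sad.mp4`, and so on).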
## 📺 Video Tutorial

Watch how I built this on my YouTube channel: https://youtu.be/VNn2a7_wTZU?si=LM68ZLj4or9OM8ku