A multi-modal AI system to detect emotional distress and depression from voice analysis.
This project is a full-stack application designed to analyze voice recordings to detect early signs of depression. It works by combining acoustic analysis (how something is said) with linguistic analysis (what is said). The system processes audio to extract features like pitch and silence, and simultaneously transcribes the speech to analyze its sentiment and emotion. All these data points are then fused by an AI orchestrator to generate a comprehensive risk assessment.
- Acoustic Feature Extraction: Analyzes raw audio for pitch (mean, std, monotony index) and silence patterns (average pause duration, speech-to-silence ratio); see the sketch after this list.
- Linguistic Sentiment Analysis: Uses a custom-trained Bi-Directional LSTM model (`mental_health_model.h5`) to classify transcribed text into categories like Anxiety, Depression, or Normal.
- Linguistic Emotion Analysis: Uses a `transformers` pipeline to detect emotions like joy, sadness, and anger from the text.
- Agentic AI Orchestration: Employs a Gemini LLM to receive all inputs (acoustic, sentiment, emotion) and make a final decision, such as dispatching to a specific agent.
- Web Interface: A vanilla JavaScript frontend for user authentication, audio recording, file upload, and a dashboard with Chart.js visualizations.
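The repository's `backend/audio.py` and `backend/pitch.py` are not reproduced in this README, so the snippet below is only a minimal sketch of how pitch and silence features of this kind could be computed with `librosa`; the function name, thresholds, and output fields are illustrative assumptions, not the project's actual API.

```python
import librosa
import numpy as np

def extract_acoustic_features(path: str, top_db: int = 30) -> dict:
    """Rough pitch and silence statistics for one recording (illustrative only)."""
    y, sr = librosa.load(path, sr=None, mono=True)

    # Pitch track via pYIN; unvoiced frames come back as NaN and are dropped
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    voiced = f0[~np.isnan(f0)]
    pitch_mean = float(voiced.mean()) if voiced.size else 0.0
    pitch_std = float(voiced.std()) if voiced.size else 0.0
    # Crude "monotony index": closer to 1 means flatter, less varied pitch
    monotony = 1.0 - (pitch_std / pitch_mean) if pitch_mean else 0.0

    # Silence analysis: everything outside the non-silent intervals counts as a pause
    intervals = librosa.effects.split(y, top_db=top_db)
    speech = int(sum(end - start for start, end in intervals))
    silence = len(y) - speech
    pauses = [
        (intervals[i + 1][0] - intervals[i][1]) / sr for i in range(len(intervals) - 1)
    ]

    return {
        "pitch_mean_hz": pitch_mean,
        "pitch_std_hz": pitch_std,
        "monotony_index": monotony,
        "avg_pause_sec": float(np.mean(pauses)) if pauses else 0.0,
        "speech_to_silence_ratio": speech / silence if silence else float("inf"),
    }
```

A call such as `extract_acoustic_features("sample.wav")` would then return a flat dictionary that the orchestrator can consume alongside the linguistic results.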
- Language(s): Python, JavaScript (ES6+), HTML5, CSS3
- Frameworks: (N/A)
- Libraries:
  - Python: `tensorflow` (Keras), `transformers`, `librosa`, `noisereduce`, `pydub`, `soundfile`, `pandas`, `scikit-learn`, `google-generativeai`, `elevenlabs`
  - JavaScript: `Chart.js`
- Tools: Jupyter Notebook
A brief outline of the project's structure based on the provided files.
```
/
├── /frontend/
│   ├── index.html              (Main web interface)
│   ├── script.js               (Frontend logic, API calls, Chart.js)
│   └── styles.css              (All styling)
│
├── /backend/
│   ├── audio.py                (Core AudioPreprocessor class)
│   └── pitch.py                (Pitch extraction utilities)
│
├── /model/
│   ├── mental_health_model.h5  (Trained Bi-LSTM sentiment model)
│   ├── emotions.py             (Transformer-based emotion classifier)
│   └── orch.py                 (Agentic AI orchestrator with Gemini)
│
└── test.ipynb                  (Jupyter Notebook for training the .h5 model)
```
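The training notebook `test.ipynb` is not reproduced here, so the block below is purely a hedged sketch of what a Bi-Directional LSTM text classifier of the kind described might look like in Keras; the vocabulary size, layer widths, and class count are placeholders, not values taken from the notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000  # placeholder; the real value comes from the tokenizer in test.ipynb
NUM_CLASSES = 3      # e.g. Anxiety / Depression / Normal

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),       # token embeddings
    layers.Bidirectional(layers.LSTM(64)),   # the Bi-Directional LSTM core
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```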
- Python 3.10+
- A local web server (e.g., Python's `http.server` or Live Server for VS Code)
- API keys for Google Gemini and ElevenLabs.
```bash
git clone https://github.com/your-username/whatIsAnApple.git
cd whatIsAnApple
```

Create a virtual environment and install the required Python packages.

```bash
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install tensorflow pandas numpy scikit-learn librosa noisereduce pydub soundfile transformers google-generativeai elevenlabs
```

Create a `.env` file in the root directory and add your API keys.
Note: The orch.py script currently has keys hardcoded. You must modify it to load from os.environ for security.
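Since `orch.py` itself is not shown here, the following is only a hedged sketch of that change, assuming the key names from the configuration snippet further down and the optional `python-dotenv` package (`pip install python-dotenv`) to populate the environment from `.env`:

```python
import os

from dotenv import load_dotenv  # optional helper: pip install python-dotenv

load_dotenv()  # reads a local .env file into os.environ, if one exists

GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]                  # fail fast if the key is missing
ELEVENLABS_API_KEY = os.environ.get("ELEVENLABS_API_KEY", "")  # optional, empty default
```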
- Run the Frontend: Serve the `/frontend` directory with a local web server.

  ```bash
  # Example using Python's built-in server from the /frontend directory
  cd frontend
  python -m http.server
  ```

  Open your browser to `http://localhost:8000`.

- Run a Backend Script (Example): You can test the orchestrator (with mock data) by running it directly; a sketch of the kind of Gemini call it might make follows this list.

  ```bash
  python model/orch.py
  ```
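`model/orch.py` is not reproduced in this README, so the snippet below is only a non-authoritative sketch of the kind of fusion prompt such an orchestrator might send through the `google-generativeai` client; the model name, prompt wording, and mock input dictionary are assumptions, not the script's actual contents.

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name; orch.py may differ

# Mock inputs standing in for the acoustic and linguistic analysis results
analysis = {
    "acoustic": {"pitch_mean_hz": 142.0, "monotony_index": 0.81, "avg_pause_sec": 1.9},
    "sentiment": "Depression",
    "emotion": {"sadness": 0.72, "joy": 0.05, "anger": 0.08},
}

prompt = (
    "You are a triage orchestrator for a mental-health screening tool. "
    "Given the acoustic and linguistic findings below, summarise the risk level "
    "and state which specialised agent should handle the case.\n\n"
    f"Findings: {analysis}"
)

response = model.generate_content(prompt)
print(response.text)
```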
Environment variables should be set in a .env file.
```env
# API keys for AI models and TTS
ELEVENLABS_API_KEY="your_elevenlabs_key_here"
GEMINI_API_KEY="your_gemini_key_here"
```

The frontend is configured to call a backend API hosted at a base URL; the endpoints it uses are listed below.
- `POST /upload-audio/`: Endpoint for submitting the audio file for full analysis (see the example request after this list).
- `POST /text-output/`: Endpoint for submitting journal text for analysis.
- `POST /final`: Final endpoint called by the frontend.
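As a non-authoritative example, assuming the backend accepts multipart form data under a `file` field (an assumption, since the server code is not shown here), the upload endpoint could be exercised with `requests`:

```python
import requests

BASE_URL = "https://your-backend.example.com"  # placeholder; replace with your backend's base URL

# Assumed field name "file"; check the backend's upload handler for the real one
with open("sample_recording.wav", "rb") as audio:
    resp = requests.post(
        f"{BASE_URL}/upload-audio/",
        files={"file": ("sample_recording.wav", audio, "audio/wav")},
    )

resp.raise_for_status()
print(resp.json())  # the fused risk assessment returned by the orchestrator
```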
- Integrate Live Models: Replace the mock functions in `model/orch.py` with actual calls to the `mental_health_model.h5` and `emotions.py` classifiers (see the sketch after this list).
- Speech-to-Text: Implement an STT service (e.g., Google Speech-to-Text, Whisper) to bridge the gap between `audio.py` and the linguistic models.
- Refactor Frontend: Break down the monolithic `frontend/script.js` into modular components.
- Secure Keys: Modify `model/orch.py` to load API keys from environment variables instead of hardcoding them.
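Since neither `emotions.py` nor the training notebook is reproduced here, the following is only a sketch of what that wiring could look like; the emotion checkpoint and the preprocessing for the Bi-LSTM are placeholders, not the project's actual choices.

```python
import tensorflow as tf
from transformers import pipeline

# Saved Bi-LSTM sentiment classifier; real code must tokenize and pad input text
# exactly as in test.ipynb before calling predict(), which is omitted here.
sentiment_model = tf.keras.models.load_model("model/mental_health_model.h5")

# Stand-in emotion checkpoint; emotions.py may use a different one.
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,
)

def analyze_text(text: str) -> dict:
    """Return emotion scores for a transcript (sentiment scoring omitted, see above)."""
    emotions = emotion_classifier([text])[0]  # list of {"label": ..., "score": ...}
    return {"emotions": {e["label"]: round(e["score"], 3) for e in emotions}}
```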
Please open an issue to discuss any changes or features you wish to add. Pull requests are welcome but should include a clear description of the changes made.