
Emotion Classification Pipeline

Advanced NLP tool for extracting emotional insights from video and audio content

Python 3.11 MIT License Poetry
Dashboard Preview

Overview

Transform unstructured video and audio content into meaningful emotional analytics using our state-of-the-art NLP pipeline. Built with DeBERTa models and deployed on Azure ML, this system provides dual-mode prediction: choose between fast local inference and high-accuracy cloud processing with automatic NGROK URL conversion (no VPN required). Perfect for content analysis, customer sentiment tracking, and research applications.

🚀 Key Features

⚡ Local Prediction

  • Fast inference - On-premise processing
  • No network dependency - Works offline
  • Privacy-first - Data stays local
  • Low latency - Instant results

☁️ Azure Cloud Prediction

  • High accuracy - Latest trained models
  • Auto NGROK conversion - No VPN required
  • Scalable - Cloud infrastructure
  • Always updated - Latest model weights

Project Structure

./
├── .github/                         # GitHub Actions CI/CD workflows
├── .azuremlignore                   # Azure ML ignore patterns
├── .dockerignore                    # Docker build ignore patterns
├── .flake8                          # Python linting configuration
├── .gitignore                       # Git ignore rules
├── .pre-commit-config.yaml          # Pre-commit hook configurations
├── assets/                          # Static assets (images, logos, screenshots)
├── data/                            # Datasets and data processing
├── dist/                            # Distribution files (build artifacts)
├── docs/                            # Sphinx documentation
├── environment/                     # Environment configurations
├── frontend/                        # React.js web application
├── logs/                            # Application and system logs
├── mlruns/                          # MLflow experiment tracking
├── models/                          # Machine learning models and artifacts
├── monitoring/                      # Infrastructure monitoring
├── notebooks/                       # Jupyter notebooks for exploration
├── outputs/                         # Generated outputs and artifacts
├── results/                         # Experiment results and analysis
├── src/                             # Main source code
│   └── emotion_clf_pipeline/        # Core Python package
│       ├── __init__.py              # Package initialization
│       ├── api.py                   # FastAPI web service
│       ├── azure_endpoint.py        # Azure ML endpoint integration
│       ├── azure_hyperparameter_sweep.py # HPT on Azure ML
│       ├── azure_pipeline.py        # Azure ML pipeline orchestration
│       ├── azure_score.py           # Azure ML scoring functions
│       ├── azure_sync.py            # Azure ML synchronization
│       ├── cli.py                   # Command-line interface
│       ├── data.py                  # Data loading and preprocessing
│       ├── features.py              # Feature engineering
│       ├── model.py                 # DeBERTa model architecture
│       ├── monitoring.py            # System monitoring and metrics
│       ├── predict.py               # Prediction pipeline
│       ├── stt.py                   # Speech-to-text processing
│       ├── train.py                 # Model training pipeline
│       ├── transcript.py            # Transcript processing
│       ├── transcript_translator.py # Multi-language transcript support
│       └── translator.py            # Text translation utilities
├── tests/                           # Comprehensive test suite
├── docker-compose.yml               # Multi-container orchestration
├── docker-compose.build.yml         # Build-specific container config
├── Dockerfile                       # Backend container configuration
├── LICENSE                          # MIT license
├── poetry.lock                      # Poetry dependency lock file
├── pyproject.toml                   # Python project configuration (Poetry)
├── start-build.bat                  # Windows build script
├── start-production.bat             # Windows production deployment
└── README.md                        # This comprehensive documentation

Quick Start

Prerequisites

  • Python 3.11+
  • Docker (recommended)
  • Poetry for dependency management

1. Clone & Setup

git clone https://github.com/BredaUniversityADSAI/2024-25d-fai2-adsai-group-nlp6.git
cd 2024-25d-fai2-adsai-group-nlp6

2. Environment Configuration

Create .env file in the project root:

# Required API Keys
ASSEMBLYAI_API_KEY="your_assemblyai_key"
GEMINI_API_KEY="your_gemini_key"

# Azure ML Configuration (Optional - for cloud predictions)
AZURE_SUBSCRIPTION_ID="your_subscription_id"
AZURE_RESOURCE_GROUP="buas-y2"
AZURE_WORKSPACE_NAME="NLP6-2025"
AZURE_LOCATION="westeurope"
AZURE_TENANT_ID="your_tenant_id"
AZURE_CLIENT_ID="your_client_id"
AZURE_CLIENT_SECRET="your_client_secret"

# Azure ML Endpoint (automatically converts private URLs to NGROK)
AZURE_ENDPOINT_URL="http://194.171.191.227:30526/api/v1/endpoint/deberta-emotion-clf-endpoint/score"
AZURE_API_KEY="your_azure_endpoint_key"
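Because the API keys above are required while the AZURE_* variables are optional, it can help to fail fast at startup when the .env file is incomplete. A minimal sketch of such a check follows; the helper name `missing_keys` is hypothetical and not part of the actual package:

```python
import os

# Keys the pipeline cannot start without; the AZURE_* keys are optional
# and only needed for cloud predictions.
REQUIRED_KEYS = ["ASSEMBLYAI_API_KEY", "GEMINI_API_KEY"]

def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the mapping `env`."""
    return [key for key in required if not env.get(key)]

# Typical use at startup: call missing_keys(os.environ) and exit if non-empty.
```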

3. Launch Application

🐳 Docker (Recommended)

Complete full-stack deployment:

docker-compose up --build

Access:
• Frontend: http://localhost:3121
• API: http://localhost:3120

💻 Development Mode

For development and debugging:

poetry install && poetry shell
uvicorn src.emotion_clf_pipeline.api:app --reload

CLI Usage:

python -m emotion_clf_pipeline.cli predict "YOUTUBE_URL"

4. API Usage Examples

Dual-Mode API: Choose between fast local inference or high-accuracy cloud prediction.

Local Prediction (Fast, on-premise):

# cURL example
curl -X POST "http://localhost:3120/predict" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "method": "local"}'

# PowerShell example (Windows)
Invoke-RestMethod -Uri "http://localhost:3120/predict" -Method POST `
  -ContentType "application/json" `
  -Body '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "method": "local"}'

Azure Prediction (High-accuracy, cloud-based with automatic NGROK conversion):

# cURL example
curl -X POST "http://localhost:3120/predict" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "method": "azure"}'

# PowerShell example (Windows)
Invoke-RestMethod -Uri "http://localhost:3120/predict" -Method POST `
  -ContentType "application/json" `
  -Body '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "method": "azure"}'

Python SDK Examples:

import requests

# Local prediction (fast)
response = requests.post(
    "http://localhost:3120/predict",
    json={"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "method": "local"}
)
emotions = response.json()

# Azure prediction (high-accuracy, automatic NGROK conversion)
response = requests.post(
    "http://localhost:3120/predict",
    json={"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "method": "azure"}
)
emotions = response.json()
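Once the response arrives, you will usually want to aggregate the per-sentence predictions. The sketch below assumes each prediction carries `emotion`, `sub_emotion`, and `intensity` fields, as shown in the data-flow diagram; verify the actual /predict response schema before relying on these names:

```python
from collections import Counter

def dominant_emotion(predictions):
    """Return the most frequent top-level emotion in a list of predictions.

    Each prediction is assumed to be a dict with 'emotion', 'sub_emotion',
    and 'intensity' keys (the fields shown in the data-flow diagram).
    """
    counts = Counter(p["emotion"] for p in predictions)
    return counts.most_common(1)[0][0] if counts else None

# Hand-written payload mimicking the assumed schema:
sample = [
    {"emotion": "joy", "sub_emotion": "excitement", "intensity": 0.9},
    {"emotion": "joy", "sub_emotion": "contentment", "intensity": 0.6},
    {"emotion": "sadness", "sub_emotion": "grief", "intensity": 0.4},
]
```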

🗺️ Architecture Diagrams

This section provides an overview of the system's architecture and data flow.

System Architecture (High-Level)

This diagram illustrates the main components of the Emotion Classification Pipeline and how they interact, including user interfaces, backend services, external dependencies, and data storage.

graph TD
    subgraph User Interaction
        UI[Browser - React Frontend]
        CLI[Command Line Interface]
        CURL[cURL/Postman]
    end

    subgraph Backend Services [Emotion Classification Pipeline API - FastAPI]
        API[api.py - Endpoints /predict, /health]
        PRED[predict.py - Orchestration Logic]
        DATA[data.py - Data Handling]
        MODEL[model.py - Emotion Model]
    end

    subgraph External Services
        YT[YouTube API/Service]
        ASSEMBLY[AssemblyAI API]
        WHISPER[Whisper Model - Local/HuggingFace]
    end

    subgraph Data Storage
        DS_AUDIO[Local File System: /data/youtube_audio]
        DS_TRANS[Local File System: /data/transcripts]
        DS_RESULTS[Local File System: /data/results]
    end

    UI --> API
    CLI --> PRED
    CURL --> API

    API --> PRED

    PRED --> DATA
    PRED --> MODEL
    PRED --> ASSEMBLY
    PRED --> WHISPER

    DATA --> YT
    DATA --> DS_AUDIO
    ASSEMBLY --> DS_TRANS
    WHISPER --> DS_TRANS
    MODEL --> DS_RESULTS


    classDef userStyle fill:#C9DAF8,stroke:#000,stroke-width:2px,color:#000
    class UI,CLI,CURL userStyle

    classDef backendStyle fill:#D9EAD3,stroke:#000,stroke-width:2px,color:#000
    class API,PRED,DATA,MODEL backendStyle

    classDef externalStyle fill:#FCE5CD,stroke:#000,stroke-width:2px,color:#000
    class YT,ASSEMBLY,WHISPER externalStyle

    classDef storageStyle fill:#FFF2CC,stroke:#000,stroke-width:2px,color:#000
    class DS_AUDIO,DS_TRANS,DS_RESULTS storageStyle

Data Flow for /predict Endpoint

This sequence diagram details the process from a user submitting a YouTube URL to receiving the emotion analysis results. It highlights the interactions between the frontend, backend API, prediction service, data handling, transcription, and the emotion model.

sequenceDiagram
    actor User
    participant Frontend_UI as Frontend UI (React)
    participant Backend_API as FastAPI Backend (api.py)
    participant PredictionService as Prediction Service (predict.py)
    participant DataHandler as Data Handler (data.py)
    participant TranscriptionService as Transcription (AssemblyAI/Whisper)
    participant EmotionModel as Emotion Model (model.py)
    participant FileSystem as Local File System (data/*)

    User->>Frontend_UI: Inputs YouTube URL
    Frontend_UI->>Backend_API: POST /predict (URL)
    activate Backend_API

    Backend_API->>PredictionService: process_youtube_url_and_predict(URL)
    activate PredictionService

    PredictionService->>DataHandler: save_youtube_audio(URL)
    activate DataHandler
    DataHandler-->>FileSystem: Saves audio.mp3
    DataHandler-->>PredictionService: Returns audio_file_path
    deactivate DataHandler

    PredictionService->>TranscriptionService: Transcribe(audio_file_path)
    activate TranscriptionService
    TranscriptionService-->>FileSystem: Saves transcript.xlsx/json
    TranscriptionService-->>PredictionService: Returns transcript_data (text, timestamps)
    deactivate TranscriptionService

    PredictionService->>EmotionModel: predict_emotion(transcript_sentences)
    activate EmotionModel
    EmotionModel-->>PredictionService: Returns emotion_predictions (emotion, sub_emotion, intensity)
    deactivate EmotionModel

    PredictionService-->>FileSystem: Saves results.xlsx (optional)
    PredictionService-->>Backend_API: Formatted JSON with predictions
    deactivate PredictionService

    Backend_API-->>Frontend_UI: JSON Response
    deactivate Backend_API
    Frontend_UI->>User: Displays emotional analysis

Internal Component Diagram (src/emotion_clf_pipeline)

This diagram shows the primary Python modules within the src/emotion_clf_pipeline package and their main dependencies, focusing on the prediction pathway.

graph LR
    subgraph src/emotion_clf_pipeline
        A[api.py]
        B[cli.py]
        C[predict.py]
        D[model.py]
        E[data.py]
        F[train.py] -- Not directly in /predict flow --> D
    end

    A --> C
    B --> C
    C --> D
    C --> E

    D --> E


    classDef moduleStyle fill:#E6E6FA,stroke:#333,stroke-width:2px,color:#000
    class A,B,C,D,E,F moduleStyle

System Components

Component          Technology             Purpose
Frontend           React.js               Interactive web interface
API                FastAPI                REST endpoints and validation
ML Pipeline        DeBERTa + PyTorch      Emotion classification
Speech Processing  AssemblyAI / Whisper   Audio transcription
Cloud Platform     Azure ML               Training & deployment

Common Commands

There are two ways to run the pipeline: on premise or in the cloud (Azure). The sections below cover the commands for both options.

Option 1 - On Premise

Data preprocessing: Preprocess the raw data and save the output to the specified locations:

python -m emotion_clf_pipeline.cli preprocess --verbose --raw-train-path "data/raw/train" --raw-test-path "data/raw/test/test_data-0001.csv"

Train and evaluate: Train the model and evaluate it on the data splits. This includes syncing with Azure ML: the best existing model is downloaded first, and the newly trained weights are registered back to Azure afterwards:

python -m emotion_clf_pipeline.cli train --epochs 15 --learning-rate 1e-5 --batch-size 16

Prediction: There are several ways to get predictions:

# Option 1 - API (Dual-Mode: Local or Azure)
uvicorn src.emotion_clf_pipeline.api:app --host 0.0.0.0 --port 3120 --reload    # Start backend api

# Local prediction API call:
Invoke-RestMethod -Uri "http://127.0.0.1:3120/predict" -Method POST -ContentType "application/json" -Body '{"url": "YOUTUBE-LINK", "method": "local"}'

# Azure prediction API call (with automatic NGROK conversion):
Invoke-RestMethod -Uri "http://127.0.0.1:3120/predict" -Method POST -ContentType "application/json" -Body '{"url": "YOUTUBE-LINK", "method": "azure"}'

# Option 2 - CLI
python -m emotion_clf_pipeline.cli predict "YOUTUBE-LINK"

# Option 3 - Docker container (backend only)
docker build -t emotion-clf-api .
docker run -p 3120:80 emotion-clf-api

# Option 4 - Docker compose (both frontend and backend)
docker-compose up --build

Option 2 - On Cloud (Azure)

Data preprocessing job: Reads the raw data from the 'emotion-raw-train' and 'emotion-raw-test' data assets and registers the preprocessed output as 'emotion-processed-train' and 'emotion-processed-test':

poetry run python -m emotion_clf_pipeline.cli preprocess --azure --register-data-assets --verbose

Training job: Trains the model on the preprocessed data, evaluates it, and registers the resulting weights:

poetry run python -m emotion_clf_pipeline.cli train --azure --verbose

Full pipeline: Runs the preprocessing and training jobs above as a single pipeline:

poetry run python -m emotion_clf_pipeline.cli pipeline --azure --verbose

Scheduled pipeline: Creates a schedule that runs the full pipeline at the specified time:

python -m src.emotion_clf_pipeline.cli schedule create --schedule-name 'scheduled-deberta-full-pipeline' --daily --hour 0 --minute 0 --enabled --mode azure

Hyperparameter tuning sweep: Creates multiple sweep runs for hyperparameter tuning:

poetry run python -m emotion_clf_pipeline.hyperparameter_tuning

Prediction: Make a prediction against the Azure ML endpoint:

python -m emotion_clf_pipeline.cli predict "YOUTUBE-LINK" --use-azure
python -m emotion_clf_pipeline.cli predict "https://youtube.com/watch?v=VIDEO_ID" --use-azure --use-ngrok

Contributing

Development Workflow

# Set up development environment
poetry install
poetry run pre-commit install

# Run quality checks
poetry run pre-commit run --all-files
poetry run pytest -v

Branch Naming Convention

To ensure consistent collaboration and traceability, all branches should follow the naming convention:

<type>/<sprint>-<scope>-<action>

Example: feature/s2-data-add-youtube-transcript

Type Prefixes:

Prefix     Description
feature    New functionality
fix        Bug fixes
test       Unit/integration testing
docs       Documentation updates
config     Environment or dependency setup
chore      Maintenance and cleanup
refactor   Code restructuring
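The convention can be enforced mechanically, for example in a pre-commit hook or CI check. A minimal sketch follows; the regex and the assumption that sprint tokens look like `s2` are drawn only from the example above, so adjust them to your team's actual scheme:

```python
import re

# Matches <type>/<sprint>-<scope>-<action>,
# e.g. feature/s2-data-add-youtube-transcript.
BRANCH_RE = re.compile(
    r"^(feature|fix|test|docs|config|chore|refactor)"
    r"/s\d+-[a-z0-9]+(-[a-z0-9]+)+$"
)

def is_valid_branch(name):
    """Return True if `name` follows the branch naming convention."""
    return bool(BRANCH_RE.match(name))
```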

Pull Request Process

  1. Create a feature branch
  2. Make your changes
  3. Submit a pull request
  4. Wait for code review and approval

Testing

Running Tests

# All tests
poetry run pytest -v

# Specific test types
poetry run pytest tests/unit -v
poetry run pytest tests/integration -v

# With coverage
poetry run coverage run -m pytest
poetry run coverage report
poetry run coverage html

Test Structure

  • Unit Tests: Test individual components in isolation
  • Integration Tests: Test component interactions
  • API Tests: Test REST endpoint functionality

Advanced Features

Git LFS for Large Files

For managing large model files, configure Git LFS:

# Install and initialize Git LFS
git lfs install

# Track model files
git lfs track "models/*"

# Commit LFS configuration
git add .gitattributes && git commit -m "Configure Git LFS"

Development Tools

Code Quality

# Pre-commit hooks
poetry run pre-commit install
poetry run pre-commit run --all-files

# Linting
poetry run flake8 src/
poetry run black src/

Documentation

# Generate API docs
cd docs && make html

# Serve documentation
python -m http.server 8000 -d docs/_build/html

License

This project is licensed under the MIT License.
