YFKxenigsegg/SentimentSuite

SentimentSuite

A high-performance, enterprise-grade video sentiment analysis and summarization platform that combines YouTube video processing with advanced AI-powered text summarization capabilities.

🏗️ Architecture Overview

SentimentSuite is built using a modern microservices architecture with clean separation of concerns, following SOLID principles and enterprise patterns.

System Components

┌─────────────────────────────────────────────────────────────┐
│                    SentimentSuite Platform                  │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐                 │
│  │   Video API     │    │  Summarizer     │                 │
│  │   (.NET 9)      │◄──►│  Service        │                 │
│  │                 │    │  (Python)       │                 │
│  └─────────────────┘    └─────────────────┘                 │
│           │                      │                          │
│           ▼                      ▼                          │
│  ┌─────────────────┐    ┌─────────────────┐                 │
│  │    MongoDB      │    │     Redis       │                 │
│  │   (Database)    │    │    (Cache)      │                 │
│  └─────────────────┘    └─────────────────┘                 │
└─────────────────────────────────────────────────────────────┘

Technology Stack

| Component | Technology | Version | Purpose |
|---|---|---|---|
| API Layer | .NET 9 | 9.0 | REST API, YouTube integration |
| AI Service | Python 3.11+ | Latest | Text summarization, ML models |
| Database | MongoDB | 7.0 | Document storage, video metadata |
| Cache | Redis | 7.2 | High-performance caching layer |
| Orchestration | Docker Compose | 3.8 | Container orchestration |

🚀 Key Features

Core Functionality

  • YouTube Video Processing: Extract transcripts from YouTube videos using YoutubeExplode
  • AI-Powered Summarization: Multiple summarization strategies (local ML models + cloud APIs)
  • Intelligent Caching: Multi-layer caching with Redis for optimal performance
  • Health Monitoring: Comprehensive health checks and metrics collection
  • Enterprise Patterns: Circuit breaker, retry policies, graceful degradation
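
The README names the circuit breaker pattern but does not show how it is implemented. As a rough illustration only, the following Python sketch shows the general idea: after a number of consecutive failures the breaker "opens" and rejects calls immediately, then allows a single trial call once a cool-down has passed. The class names, thresholds, and the injectable `clock` parameter are all hypothetical and are not taken from the SentimentSuite code.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the breaker again
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_transcript, url)` where `fetch_transcript` is a placeholder, and translate `CircuitOpenError` into the 503 response listed under Error Handling.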

Advanced Capabilities

  • Semantic Chunking: Intelligent text segmentation for coherent summarization
  • Hybrid Summarization: Local ML models with cloud API fallback
  • Performance Optimization: Model quantization, batch processing, memory management
  • Observability: Prometheus metrics, structured logging, distributed tracing
  • Resilience: Circuit breaker protection, automatic retry, graceful degradation
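
To make "semantic chunking" concrete, here is a minimal Python sketch of the technique: adjacent sentences stay in the same chunk while they remain similar to the previous sentence and the chunk stays under a size limit. It is an illustration under stated assumptions, not the service's code. In particular, `embed` is a toy word-count stand-in for a real sentence-embedding model, and `threshold` and `max_chars` are hypothetical parameters (the README does not say whether `MAX_CHUNK_SIZE` counts characters or tokens).

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence-embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, threshold=0.2, max_chars=1000):
    """Group adjacent, similar sentences into chunks of at most max_chars."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    for sentence in sentences:
        if chunks:
            last = chunks[-1]
            similar = cosine(embed(last[-1]), embed(sentence)) >= threshold
            fits = sum(len(s) + 1 for s in last) + len(sentence) <= max_chars
            if similar and fits:
                last.append(sentence)
                continue
        chunks.append([sentence])  # topic shift or size limit: start a new chunk
    return [" ".join(chunk) for chunk in chunks]
```

Swapping `embed` for a real embedding model would leave the grouping loop unchanged, which is the main reason to keep the two concerns separate.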

📁 Project Structure

SentimentSuite/
├── src/
│   ├── SentimentSuite.Video.Api/          # .NET 9 Web API
│   │   ├── Controllers/                   # REST API endpoints
│   │   ├── Services/                      # Business logic services
│   │   ├── Models/                        # Data transfer objects
│   │   ├── Domain/                        # Domain entities and logic
│   │   ├── Persistence/                   # Data access layer
│   │   └── Program.cs                     # Application entry point
│   │
│   ├── SentimentSuite.Common/             # Shared .NET library
│   │   ├── Services/                      # Common service interfaces
│   │   ├── Extensions/                    # DI and middleware extensions
│   │   ├── Middleware/                    # Global exception handling
│   │   └── Configuration/                 # Configuration models
│   │
│   └── SentimentSuite.Summarizer.PyService/  # Python AI service
│       ├── core/                          # Core interfaces and models
│       ├── services/                      # Service implementations
│       │   ├── chunking/                  # Text chunking strategies
│       │   ├── summarization/             # ML summarization engines
│       │   ├── caching/                   # Memory caching layer
│       │   └── orchestration/             # Service orchestration
│       ├── utils/                         # Utility functions
│       └── main.py                        # FastAPI application
│
├── docker-compose.yaml                    # Container orchestration
├── api-tests.http                         # API testing collection
└── test-api.ps1                           # PowerShell test script

🛠️ Prerequisites

Required Software

  • Docker Desktop (4.0+) - Container orchestration
  • Node.js (18+) - For MCP server integration (optional)
  • PowerShell (7+) - For Windows automation scripts

System Requirements

  • RAM: 8GB minimum, 16GB recommended
  • Storage: 10GB free space for models and data
  • CPU: Multi-core processor recommended for ML inference

🚀 Quick Start

1. Clone and Setup

git clone <repository-url>
cd SentimentSuite

2. Start the Platform

docker-compose up --build

3. Verify Installation

# Check API health
curl http://localhost:5156/health

# Test summarization
curl -X POST http://localhost:5156/api/summary \
  -H "Content-Type: application/json" \
  -d '{"youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

4. Access Services

📊 API Reference

Core Endpoints

POST /api/summary

Generate a summary for a YouTube video.

Request:

{
  "youtubeUrl": "https://www.youtube.com/watch?v=VIDEO_ID"
}

Response:

{
  "summary": "Generated video summary text..."
}
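
For callers who prefer a script over `curl`, the request above can be sketched in Python using only the standard library. The base URL and port come from the Quick Start examples; the function names and the 60-second timeout are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5156"  # port taken from the Quick Start examples


def build_summary_request(youtube_url, base_url=BASE_URL):
    """Build the POST /api/summary request without sending it."""
    body = json.dumps({"youtubeUrl": youtube_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/summary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_summary(youtube_url, timeout=60):
    """Send the request and return the "summary" field of the response."""
    request = build_summary_request(youtube_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["summary"]
```

`fetch_summary` raises `urllib.error.HTTPError` for the 4xx and 5xx responses listed under Error Handling, so callers should be prepared to catch it.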

GET /health

System health check with detailed component status.

Response:

{
  "status": "Healthy",
  "checks": {
    "self": "Healthy",
    "redis": "Healthy", 
    "mongodb": "Healthy"
  }
}
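
A monitoring script might reduce this payload to a list of failing components. The sketch below assumes the response always has the shape shown above and treats any status other than "Healthy" as a failure; the helper names are hypothetical.

```python
import json

# Sample payload copied from the response documented above.
SAMPLE = (
    '{"status": "Healthy", "checks": '
    '{"self": "Healthy", "redis": "Healthy", "mongodb": "Healthy"}}'
)


def unhealthy_components(payload):
    """Return the sorted names of checks whose status is not "Healthy"."""
    checks = payload.get("checks", {})
    return sorted(name for name, status in checks.items() if status != "Healthy")


def is_ready(payload):
    """True only when the overall status and every check are "Healthy"."""
    return payload.get("status") == "Healthy" and not unhealthy_components(payload)


payload = json.loads(SAMPLE)
print(is_ready(payload), unhealthy_components(payload))
```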

Error Handling

  • 400 Bad Request: Invalid YouTube URL format
  • 404 Not Found: Video not found or private
  • 500 Internal Server Error: Unexpected server-side failure
  • 503 Service Unavailable: Circuit breaker activated
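
One way a client could act on these codes is sketched below. The mapping is an assumption rather than documented behaviour: it treats 400 and 404 as errors the caller must fix, 503 as a signal to back off and retry, and everything else as a failure to report. The function names and backoff parameters are hypothetical.

```python
def classify_status(status):
    """Map a documented status code to a hypothetical client action."""
    if status == 200:
        return "ok"
    if status in (400, 404):
        return "fix-request"  # retrying the same URL will not help
    if status == 503:
        return "retry-later"  # circuit breaker is open; back off first
    return "report"  # 500 and anything undocumented


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, capped."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```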

⚙️ Configuration

Environment Variables

Video API (.NET)

# MongoDB Configuration
MongoDB__ConnectionString=mongodb://mongo:27017
MongoDB__DatabaseName=SentimentSuite

# Redis Configuration  
Redis__ConnectionString=redis:6379
Redis__DatabaseId=0
Redis__DefaultExpiration=24:00:00

# Summarizer Service
LocalSummary__BaseUrl=http://summarizer:8000

# Anthropic API (Optional)
Anthropic__ApiKey=your_api_key_here
Anthropic__Model=claude-3-sonnet-20240229
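
The double underscore in these names is the .NET configuration convention for nested sections: `MongoDB__ConnectionString` is read as the `ConnectionString` key inside the `MongoDB` section. The Python sketch below only illustrates that mapping; `nest_env` is a hypothetical helper and not part of either service.

```python
def nest_env(env):
    """Turn Section__Key names into a nested dict, mirroring .NET sections."""
    tree = {}
    for name, value in env.items():
        *sections, key = name.split("__")
        node = tree
        for section in sections:
            node = node.setdefault(section, {})
        node[key] = value
    return tree


example = {
    "MongoDB__ConnectionString": "mongodb://mongo:27017",
    "MongoDB__DatabaseName": "SentimentSuite",
    "Redis__DatabaseId": "0",
}
print(nest_env(example))
```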

Summarizer Service (Python)

# Server Configuration
HOST=0.0.0.0
PORT=8000
WORKERS=1

# Model Configuration
MODEL_NAME=facebook/bart-large-cnn
DEVICE=auto
QUANTIZATION=true

# Performance Tuning
MAX_CHUNK_SIZE=1000
CACHE_SIZE=1000
CACHE_TTL=3600

# Quality Settings
DEFAULT_QUALITY=balanced
ENABLE_SEMANTIC_CHUNKING=true
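
As a sketch of how a Python service could read these variables into typed settings, the example below uses a frozen dataclass with the values above as defaults. It is an assumption about structure, not the actual loader in `main.py`; `SummarizerSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SummarizerSettings:
    host: str
    port: int
    workers: int
    model_name: str
    device: str
    quantization: bool
    max_chunk_size: int
    cache_size: int
    cache_ttl: int
    default_quality: str
    enable_semantic_chunking: bool


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return SummarizerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        workers=int(env.get("WORKERS", "1")),
        model_name=env.get("MODEL_NAME", "facebook/bart-large-cnn"),
        device=env.get("DEVICE", "auto"),
        quantization=_as_bool(env.get("QUANTIZATION", "true")),
        max_chunk_size=int(env.get("MAX_CHUNK_SIZE", "1000")),
        cache_size=int(env.get("CACHE_SIZE", "1000")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        default_quality=env.get("DEFAULT_QUALITY", "balanced"),
        enable_semantic_chunking=_as_bool(env.get("ENABLE_SEMANTIC_CHUNKING", "true")),
    )
```

Passing a plain dict, as in `load_settings({"PORT": "9000"})`, keeps the loader easy to test without touching the real environment.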

🔧 Development

Local Development Setup

Prerequisites

  • .NET 9 SDK
  • Python 3.11+
  • MongoDB (local or Docker)
  • Redis (local or Docker)

Running Locally

  1. Start Dependencies:

    docker-compose up mongo redis -d
  2. Run Video API:

    cd src/SentimentSuite.Video.Api
    dotnet run
  3. Run Summarizer Service:

    cd src/SentimentSuite.Summarizer.PyService
    pip install -r requirements.txt
    python main.py

Testing

API Tests

# Run HTTP tests
.\test-api.ps1

# Or use the HTTP file directly
# Open api-tests.http in VS Code with REST Client extension

Unit Tests

# .NET tests
dotnet test

# Python tests  
cd src/SentimentSuite.Summarizer.PyService
pytest

🏗️ Architecture Deep Dive

Video API Architecture

The Video API follows Clean Architecture principles with clear separation of concerns:

Controllers/          # Presentation layer (REST endpoints)
    ↓
Services/            # Application layer (business logic)
    ↓
Domain/              # Domain layer (entities, value objects)
    ↓
Persistence/         # Infrastructure layer (data access)

Key Design Patterns:

  • Repository Pattern: Abstracted data access with caching
  • Strategy Pattern: Multiple summarization strategies
  • Decorator Pattern: Cached repository implementation
  • Dependency Injection: Loose coupling and testability
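
The combination of Repository and Decorator patterns can be illustrated with a short sketch. The Video API itself is written in C#; Python is used here only to keep the README's examples in one language, and every class and method name below is hypothetical. The idea is that the caching decorator exposes the same `get`/`save` interface as the repository it wraps, so callers do not need to know whether caching is enabled.

```python
class InMemoryRepository:
    """Stand-in for the real data store; counts reads for demonstration."""

    def __init__(self):
        self._store = {}
        self.reads = 0

    def get(self, video_id):
        self.reads += 1
        return self._store.get(video_id)

    def save(self, video_id, summary):
        self._store[video_id] = summary


class CachedRepository:
    """Decorator: same interface as the inner repository, plus a cache."""

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}

    def get(self, video_id):
        if video_id in self._cache:
            return self._cache[video_id]
        value = self._inner.get(video_id)
        if value is not None:  # do not cache misses
            self._cache[video_id] = value
        return value

    def save(self, video_id, summary):
        self._inner.save(video_id, summary)
        self._cache[video_id] = summary  # write-through keeps the cache fresh
```

With dependency injection, the application would register `CachedRepository(InMemoryRepository())` (or its real equivalents) wherever a repository is requested.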

Summarizer Service Architecture

The Python service implements a sophisticated ML pipeline:

Request → Chunking → Summarization → Caching → Response
    ↓         ↓           ↓            ↓
Semantic   Quality    Model        Memory
Chunking   Selection  Inference    Cache

Advanced Features:

  • Semantic Chunking: Uses sentence embeddings for coherent text segmentation
  • Model Optimization: Quantization and batch processing for performance
  • Intelligent Caching: LRU cache with TTL for optimal memory usage
  • Circuit Breaker: Resilience patterns for external service calls
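
An LRU cache with TTL, as mentioned in the list above, can be sketched with `collections.OrderedDict`. This is a minimal illustration rather than the service's cache: the class name and the injectable `clock` are assumptions, and the defaults simply reuse the `CACHE_SIZE` and `CACHE_TTL` values from the Configuration section.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Least-recently-used cache whose entries also expire after `ttl` seconds."""

    def __init__(self, max_size=1000, ttl=3600.0, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._items = OrderedDict()  # key -> (value, expires_at)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)
        self._items.move_to_end(key)
        while len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict least recently used
```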

Data Flow

  1. Video Processing: YouTube URL → Transcript extraction → Text preprocessing
  2. Text Analysis: Semantic chunking → Quality assessment → Model selection
  3. Summarization: Local ML model → Cloud API fallback → Result optimization
  4. Caching: Multi-layer caching (Redis + Memory) → Performance optimization
  5. Response: Structured output → Error handling → Monitoring
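
Step 3, local model first with a cloud fallback, amounts to trying an ordered list of strategies until one succeeds. The sketch below shows that control flow only; `summarize_with_fallback` and the strategy callables are hypothetical, and the real service may select strategies differently.

```python
def summarize_with_fallback(text, strategies):
    """Try each (name, callable) in order; return the first success.

    Returns a (strategy_name, summary) tuple so callers can log which
    strategy produced the result.
    """
    errors = []
    for name, strategy in strategies:
        try:
            return name, strategy(text)
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all summarizers failed: " + "; ".join(errors))
```

A call might look like `summarize_with_fallback(transcript, [("local", local_model), ("cloud", cloud_api)])`, where both callables are placeholders for the real summarization engines.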

📈 Performance Characteristics

Response Times

  • Cache Hit: < 100ms
  • Cache Miss (Local): 1-5 seconds
  • Cache Miss (Cloud): 5-15 seconds
  • Error Response: < 50ms

Throughput

  • Sustained Load: 10-50 requests/second
  • Peak Load: 100+ requests/second (with caching)
  • Concurrent Users: 50-200 (depending on hardware)

Resource Usage

  • Memory: 2-8GB (depending on model size)
  • CPU: 2-8 cores (depending on load)
  • Storage: 5-10GB (models + cache)

🔒 Security Considerations

Data Protection

  • No Video Storage: Only transcripts and summaries are stored
  • Encrypted Transit: HTTPS for all API communications
  • Secure Configuration: Environment variables for sensitive data
  • Input Validation: Comprehensive URL and data validation
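
As an example of the kind of URL validation meant here, the sketch below accepts only `http`/`https` links on known YouTube hosts and extracts the video ID. The host list and the 11-character ID pattern reflect common YouTube URLs and are assumptions, not rules taken from the SentimentSuite code; `extract_video_id` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
_WATCH_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID for a recognised YouTube URL, else None."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host in _WATCH_HOSTS and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    else:
        return None
    return candidate if _VIDEO_ID.fullmatch(candidate) else None
```

Returning `None` for anything unrecognised lets the API answer with 400 Bad Request before any network call is made.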

Access Control

  • API Rate Limiting: Built-in request throttling
  • Health Monitoring: Continuous security health checks
  • Error Sanitization: No sensitive data in error responses
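
The README does not say which throttling algorithm is used. A token bucket is one common choice, sketched here purely as an illustration: each request consumes a token, and tokens refill at a fixed rate up to a maximum burst size. The class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```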

🚀 Deployment

Production Deployment

Docker Compose (Recommended)

# Production configuration
docker-compose -f docker-compose.prod.yml up -d

Kubernetes (Enterprise)

# Example Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-suite-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sentiment-suite-api
  template:
    metadata:
      labels:
        app: sentiment-suite-api  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: api
        image: sentiment-suite/api:latest
        ports:
        - containerPort: 8080

Environment-Specific Configurations

Development

  • Hot reloading enabled
  • Detailed logging
  • Swagger UI available
  • Debug symbols included

Production

  • Optimized builds
  • Health checks enabled
  • Metrics collection
  • Error tracking

📊 Monitoring and Observability

Health Checks

  • Liveness Probe: /health endpoint
  • Readiness Probe: Database and cache connectivity
  • Dependency Checks: External service availability

Metrics Collection

  • Performance Metrics: Response times, throughput
  • Business Metrics: Summary generation rates
  • System Metrics: CPU, memory, disk usage
  • Error Metrics: Error rates, failure patterns

Logging

  • Structured Logging: JSON format with correlation IDs
  • Log Levels: Debug, Info, Warning, Error, Critical
  • Centralized Logging: Aggregated log collection
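
For the Python service, JSON logs with a correlation ID could be produced with a custom `logging.Formatter`, as in the sketch below. The field names and the `correlation_id` attribute are assumptions chosen for illustration; the actual log schema is not documented in this README.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via `extra={"correlation_id": ...}` on the logging call.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def configure_logging(level=logging.INFO):
    """Attach the JSON formatter to the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

After calling `configure_logging()`, a request handler would log with `logging.getLogger("summarizer").info("summary generated", extra={"correlation_id": request_id})`, where `request_id` is whatever identifier the caller propagates.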

🀝 Contributing

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Implement changes with tests
  4. Submit a pull request
  5. Code review and merge

Code Standards

  • C#: Follow Microsoft coding conventions
  • Python: Follow PEP 8 style guide
  • Testing: Maintain >80% code coverage
  • Documentation: Update README for significant changes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Troubleshooting

Common Issues

  1. Docker Build Failures: Check Docker daemon and available resources
  2. Model Download Issues: Verify internet connectivity and disk space
  3. Memory Issues: Increase Docker memory allocation
  4. Port Conflicts: Check for conflicting services on ports 5156, 8000

Getting Help

  • Documentation: Check this README and inline code comments
  • Issues: Create GitHub issues for bugs and feature requests
  • Discussions: Use GitHub discussions for questions and ideas

Performance Tuning

For High Load

  • Increase Redis memory allocation
  • Use multiple summarizer service instances
  • Enable model quantization
  • Implement request queuing

For Low Latency

  • Optimize cache hit rates
  • Use faster ML models
  • Implement request batching
  • Enable connection pooling

Built with ❤️ using .NET 9, Python, and modern AI technologies
