SentimentSuite

A high-performance, enterprise-grade video sentiment analysis and summarization platform that combines YouTube video processing with advanced AI-powered text summarization capabilities.

🏗️ Architecture Overview

SentimentSuite is built using a modern microservices architecture with clean separation of concerns, following SOLID principles and enterprise patterns.

System Components

┌─────────────────────────────────────────────────────────────┐
│                    SentimentSuite Platform                  │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐                │
│  │   Video API     │    │  Summarizer     │                │
│  │   (.NET 9)      │◄──►│  Service        │                │
│  │                 │    │  (Python)       │                │
│  └─────────────────┘    └─────────────────┘                │
│           │                       │                        │
│           ▼                       ▼                        │
│  ┌─────────────────┐    ┌─────────────────┐                │
│  │    MongoDB      │    │     Redis       │                │
│  │   (Database)    │    │    (Cache)      │                │
│  └─────────────────┘    └─────────────────┘                │
└─────────────────────────────────────────────────────────────┘

Technology Stack

Component	Technology	Version	Purpose
API Layer	.NET	9.0	REST API, YouTube integration
AI Service	Python	3.11+	Text summarization, ML models
Database	MongoDB	7.0	Document storage, video metadata
Cache	Redis	7.2	High-performance caching layer
Orchestration	Docker Compose	3.8 (Compose file format)	Container orchestration

🚀 Key Features

Core Functionality

YouTube Video Processing: Extract transcripts from YouTube videos using YoutubeExplode
AI-Powered Summarization: Multiple summarization strategies (local ML models + cloud APIs)
Intelligent Caching: Multi-layer caching with Redis for optimal performance
Health Monitoring: Comprehensive health checks and metrics collection
Enterprise Patterns: Circuit breaker, retry policies, graceful degradation

Advanced Capabilities

Semantic Chunking: Intelligent text segmentation for coherent summarization
Hybrid Summarization: Local ML models with cloud API fallback
Performance Optimization: Model quantization, batch processing, memory management
Observability: Prometheus metrics, structured logging, distributed tracing
Resilience: Circuit breaker protection, automatic retry, graceful degradation
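
To make the circuit breaker idea concrete, here is a minimal sketch in Python. It is not taken from the SentimentSuite code base: the class name, the thresholds, and the injectable clock are all assumptions chosen for illustration. The breaker opens after a number of consecutive failures, rejects calls while open, and lets a single trial call through once the reset timeout has elapsed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures (hypothetical API)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller that catches `CircuitOpenError` can degrade gracefully, for example by returning a cached summary instead of failing the request.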

📁 Project Structure

SentimentSuite/
├── src/
│   ├── SentimentSuite.Video.Api/          # .NET 9 Web API
│   │   ├── Controllers/                   # REST API endpoints
│   │   ├── Services/                      # Business logic services
│   │   ├── Models/                        # Data transfer objects
│   │   ├── Domain/                        # Domain entities and logic
│   │   ├── Persistence/                   # Data access layer
│   │   └── Program.cs                     # Application entry point
│   │
│   ├── SentimentSuite.Common/             # Shared .NET library
│   │   ├── Services/                      # Common service interfaces
│   │   ├── Extensions/                    # DI and middleware extensions
│   │   ├── Middleware/                    # Global exception handling
│   │   └── Configuration/                 # Configuration models
│   │
│   └── SentimentSuite.Summarizer.PyService/  # Python AI service
│       ├── core/                          # Core interfaces and models
│       ├── services/                      # Service implementations
│       │   ├── chunking/                  # Text chunking strategies
│       │   ├── summarization/             # ML summarization engines
│       │   ├── caching/                   # Memory caching layer
│       │   └── orchestration/             # Service orchestration
│       ├── utils/                         # Utility functions
│       └── main.py                        # FastAPI application
│
├── docker-compose.yaml                    # Container orchestration
├── api-tests.http                         # API testing collection
└── test-api.ps1                           # PowerShell test script

🛠️ Prerequisites

Required Software

Docker Desktop (4.0+) - Container orchestration
Node.js (18+) - For MCP server integration (optional)
PowerShell (7+) - For Windows automation scripts

System Requirements

RAM: 8GB minimum, 16GB recommended
Storage: 10GB free space for models and data
CPU: Multi-core processor recommended for ML inference

🚀 Quick Start

1. Clone and Setup

git clone <repository-url>
cd SentimentSuite

2. Start the Platform

docker-compose up --build

3. Verify Installation

# Check API health
curl http://localhost:5156/health

# Test summarization
curl -X POST http://localhost:5156/api/summary \
  -H "Content-Type: application/json" \
  -d '{"youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

4. Access Services

API Documentation: http://localhost:5156/swagger
Health Dashboard: http://localhost:5156/health
Summarizer Service: http://localhost:8000/docs

📊 API Reference

Core Endpoints

POST /api/summary

Generate a summary for a YouTube video.

Request:

{
  "youtubeUrl": "https://www.youtube.com/watch?v=VIDEO_ID"
}

Response:

{
  "summary": "Generated video summary text..."
}
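
For scripted use, the same call can be made from Python with only the standard library. This is a sketch under assumptions: the base URL is the one from the Quick Start section, and the function names and the 60-second timeout are made up for the example.

```python
import json
import urllib.request

# Assumed base URL, taken from the Quick Start section.
BASE_URL = "http://localhost:5156"


def build_summary_request(youtube_url, base_url=BASE_URL):
    """Build (but do not send) the POST /api/summary request."""
    body = json.dumps({"youtubeUrl": youtube_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/summary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_summary(youtube_url, timeout=60.0):
    """Send the request and return the "summary" field of the response."""
    request = build_summary_request(youtube_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["summary"]
```

`fetch_summary` raises `urllib.error.HTTPError` for non-2xx responses, so callers can inspect the status code listed under Error Handling.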

GET /health

System health check with detailed component status.

Response:

{
  "status": "Healthy",
  "checks": {
    "self": "Healthy",
    "redis": "Healthy", 
    "mongodb": "Healthy"
  }
}
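
A monitoring script can use the `checks` object to find out which dependency is failing. The sketch below assumes the response shape shown above; the function name and the sample payload are hypothetical.

```python
import json


def unhealthy_components(health_json):
    """Return the names of components whose status is not "Healthy"."""
    payload = json.loads(health_json)
    checks = payload.get("checks", {})
    return sorted(name for name, status in checks.items() if status != "Healthy")


# Hypothetical payload in which the Redis check fails.
sample = (
    '{"status": "Unhealthy", "checks": '
    '{"self": "Healthy", "redis": "Unhealthy", "mongodb": "Healthy"}}'
)
print(unhealthy_components(sample))  # ['redis']
```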

Error Handling

400 Bad Request: Invalid YouTube URL format
404 Not Found: Video not found or private
500 Internal Server Error: Unexpected server-side failure
503 Service Unavailable: Circuit breaker activated
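
On the client side, these status codes suggest a simple policy: retry 500 and 503 with backoff, and treat 400 and 404 as final. The sketch below is one way to express that. The retry policy, the delays, and the `send` callable are assumptions for illustration, not behavior documented by the API.

```python
import time

# Assumption: only server-side failures are worth retrying.
RETRYABLE_STATUSES = {500, 503}


def backoff_delays(retries, base=1.0, cap=30.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def call_with_retry(send, retries=3, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    `send` is any zero-argument callable returning (status_code, body).
    """
    status, body = send()
    for delay in backoff_delays(retries):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status, body = send()
    return status, body
```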

⚙️ Configuration

Environment Variables

Video API (.NET)

# MongoDB Configuration
MongoDB__ConnectionString=mongodb://mongo:27017
MongoDB__DatabaseName=SentimentSuite

# Redis Configuration  
Redis__ConnectionString=redis:6379
Redis__DatabaseId=0
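# 24 hours in d.hh:mm:ss form (assuming this setting binds to a .NET TimeSpan, whose hour field only accepts 0-23)
Redis__DefaultExpiration=1.00:00:00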

# Summarizer Service
LocalSummary__BaseUrl=http://summarizer:8000

# Anthropic API (Optional)
Anthropic__ApiKey=your_api_key_here
Anthropic__Model=claude-3-sonnet-20240229

Summarizer Service (Python)

# Server Configuration
HOST=0.0.0.0
PORT=8000
WORKERS=1

# Model Configuration
MODEL_NAME=facebook/bart-large-cnn
DEVICE=auto
QUANTIZATION=true

# Performance Tuning
MAX_CHUNK_SIZE=1000
CACHE_SIZE=1000
CACHE_TTL=3600

# Quality Settings
DEFAULT_QUALITY=balanced
ENABLE_SEMANTIC_CHUNKING=true
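
As an illustration of how a Python service might read these variables, here is a small loader. It is a sketch, not the service's actual configuration code: the `SummarizerSettings` class and `load_settings` function are hypothetical, and the fallback values are simply the examples listed above.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings such as "true" or "1"."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SummarizerSettings:
    host: str
    port: int
    workers: int
    model_name: str
    device: str
    quantization: bool
    max_chunk_size: int
    cache_size: int
    cache_ttl: int  # units are not stated in the README; seconds is an assumption
    default_quality: str
    enable_semantic_chunking: bool


def load_settings(env=None):
    """Read settings from `env` (defaults to os.environ), falling back to the README examples."""
    env = os.environ if env is None else env
    return SummarizerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        workers=int(env.get("WORKERS", "1")),
        model_name=env.get("MODEL_NAME", "facebook/bart-large-cnn"),
        device=env.get("DEVICE", "auto"),
        quantization=_as_bool(env.get("QUANTIZATION", "true")),
        max_chunk_size=int(env.get("MAX_CHUNK_SIZE", "1000")),
        cache_size=int(env.get("CACHE_SIZE", "1000")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        default_quality=env.get("DEFAULT_QUALITY", "balanced"),
        enable_semantic_chunking=_as_bool(env.get("ENABLE_SEMANTIC_CHUNKING", "true")),
    )
```

Passing a plain dictionary as `env` keeps the loader easy to test without touching the process environment.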

🔧 Development

Local Development Setup

Prerequisites

.NET 9 SDK
Python 3.11+
MongoDB (local or Docker)
Redis (local or Docker)

Running Locally

Start Dependencies:
```
docker-compose up mongo redis -d
```

Run Video API:

cd src/SentimentSuite.Video.Api
dotnet run

Run Summarizer Service:

cd src/SentimentSuite.Summarizer.PyService
pip install -r requirements.txt
python main.py

Testing

API Tests

# Run HTTP tests
.\test-api.ps1

# Or use the HTTP file directly
# Open api-tests.http in VS Code with REST Client extension

Unit Tests

# .NET tests
dotnet test

# Python tests  
cd src/SentimentSuite.Summarizer.PyService
pytest

🏗️ Architecture Deep Dive

Video API Architecture

The Video API follows Clean Architecture principles with clear separation of concerns:

Controllers/          # Presentation layer (REST endpoints)
    ↓
Services/            # Application layer (business logic)
    ↓
Domain/              # Domain layer (entities, value objects)
    ↓
Persistence/         # Infrastructure layer (data access)

Key Design Patterns:

Repository Pattern: Abstracted data access with caching
Strategy Pattern: Multiple summarization strategies
Decorator Pattern: Cached repository implementation
Dependency Injection: Loose coupling and testability
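
The Video API itself is written in C#, but the Repository and Decorator patterns listed above are language-neutral. The following Python sketch shows the shape of a cached repository decorator. Every class and method name here is hypothetical, and an in-memory dictionary stands in for both MongoDB and Redis.

```python
class InMemorySummaryRepository:
    """Stand-in for the real data store (MongoDB in this project)."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # counts how often the underlying store is queried

    def get(self, video_id):
        self.reads += 1
        return self._rows.get(video_id)

    def save(self, video_id, summary):
        self._rows[video_id] = summary


class CachedSummaryRepository:
    """Decorator: same interface as the inner repository, with a cache in front."""

    def __init__(self, inner, cache=None):
        self._inner = inner
        self._cache = {} if cache is None else cache

    def get(self, video_id):
        if video_id in self._cache:
            return self._cache[video_id]
        summary = self._inner.get(video_id)
        if summary is not None:
            self._cache[video_id] = summary  # only cache hits from the store
        return summary

    def save(self, video_id, summary):
        self._inner.save(video_id, summary)
        self._cache[video_id] = summary
```

Because both classes expose the same `get` and `save` methods, a dependency injection container can hand out either one without the calling service knowing which it received.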

Summarizer Service Architecture

The Python service implements a sophisticated ML pipeline:

Request → Chunking → Summarization → Caching → Response
    ↓         ↓           ↓            ↓
Semantic   Quality    Model        Memory
Chunking   Selection  Inference    Cache
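
The chunking stage can be illustrated with a toy example. The sketch below starts a new chunk when the text gets too long or when adjacent sentences stop sharing vocabulary. It deliberately uses bag-of-words cosine similarity as a stand-in for sentence embeddings, so it is not the service's algorithm. The function names, the similarity threshold, and the reading of `MAX_CHUNK_SIZE` as a character count are all assumptions.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bag_of_words(sentence):
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk_text(text, max_chars=1000, min_similarity=0.1):
    """Group sentences into chunks, splitting on length or on a topic shift."""
    chunks, current, previous = [], [], None
    for sentence in split_sentences(text):
        vector = bag_of_words(sentence)
        too_long = bool(current) and len(" ".join(current + [sentence])) > max_chars
        topic_shift = previous is not None and cosine(previous, vector) < min_similarity
        if current and (too_long or topic_shift):
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
        previous = vector
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Swapping `bag_of_words` and `cosine` for an embedding model and its similarity function would turn this into embedding-based chunking while leaving the grouping loop unchanged.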

Advanced Features:

Semantic Chunking: Uses sentence embeddings for coherent text segmentation
Model Optimization: Quantization and batch processing for performance
Intelligent Caching: LRU cache with TTL for optimal memory usage
Circuit Breaker: Resilience patterns for external service calls
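
An LRU cache with TTL, as mentioned above, can be sketched in a few lines on top of `collections.OrderedDict`. This is an illustrative implementation, not the service's own: the class name is hypothetical, and the defaults mirror the `CACHE_SIZE` and `CACHE_TTL` examples under the assumption that the TTL is in seconds.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Least-recently-used cache whose entries also expire after `ttl` seconds."""

    def __init__(self, max_size=1000, ttl=3600.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop the entry and report a miss
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self._clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
```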

Data Flow

1. Video Processing: YouTube URL → Transcript extraction → Text preprocessing
2. Text Analysis: Semantic chunking → Quality assessment → Model selection
3. Summarization: Local ML model → Cloud API fallback → Result optimization
4. Caching: Multi-layer caching (Redis + Memory) → Performance optimization
5. Response: Structured output → Error handling → Monitoring
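
The "local ML model, then cloud API fallback" step of the summarization stage amounts to trying strategies in order. A minimal sketch follows, with hypothetical `local_model` and `cloud_api` functions standing in for the real engines.

```python
def summarize_with_fallback(text, strategies):
    """Try each (name, callable) strategy in order; return the first success."""
    errors = []
    for name, strategy in strategies:
        try:
            return name, strategy(text)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all summarization strategies failed: " + "; ".join(errors))


# Hypothetical stand-ins for the local model and the cloud API.
def local_model(text):
    raise RuntimeError("model not loaded")


def cloud_api(text):
    return text[:40]


source, summary = summarize_with_fallback(
    "A long transcript ...",
    [("local", local_model), ("cloud", cloud_api)],
)
print(source)  # cloud
```

Returning the strategy name alongside the summary makes it easy to record which path served each request.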

📈 Performance Characteristics

Response Times

Cache Hit: < 100ms
Cache Miss (Local): 1-5 seconds
Cache Miss (Cloud): 5-15 seconds
Error Response: < 50ms

Throughput

Sustained Load: 10-50 requests/second
Peak Load: 100+ requests/second (with caching)
Concurrent Users: 50-200 (depending on hardware)

Resource Usage

Memory: 2-8GB (depending on model size)
CPU: 2-8 cores (depending on load)
Storage: 5-10GB (models + cache)

🔒 Security Considerations

Data Protection

No Video Storage: Only transcripts and summaries are stored
Encrypted Transit: HTTPS for all API communications
Secure Configuration: Environment variables for sensitive data
Input Validation: Comprehensive URL and data validation
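
As an example of what URL validation can look like, the sketch below accepts common `youtube.com/watch` and `youtu.be` links and extracts the video ID. It is not the API's actual validator. The accepted hosts and the assumption that video IDs are 11 characters from `A-Z`, `a-z`, `0-9`, `_` and `-` reflect common usage rather than a documented guarantee.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 URL-safe characters.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID, or None if the URL is not a recognized YouTube link."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        return None
    host = (parsed.hostname or "").lower()
    if host in _WATCH_HOSTS and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    else:
        return None
    return candidate if _VIDEO_ID.fullmatch(candidate) else None
```

A request handler could return 400 Bad Request whenever this function yields `None`, before any transcript extraction is attempted.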

Access Control

API Rate Limiting: Built-in request throttling
Health Monitoring: Continuous security health checks
Error Sanitization: No sensitive data in error responses

🚀 Deployment

Production Deployment

Docker Compose (Recommended)

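# Production configuration (assumes you provide docker-compose.prod.yml; the project structure above lists only docker-compose.yaml)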
docker-compose -f docker-compose.prod.yml up -d

Kubernetes (Enterprise)

# Example Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-suite-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sentiment-suite-api
  template:
    metadata:
      labels:
        app: sentiment-suite-api   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: api
        image: sentiment-suite/api:latest
        ports:
        - containerPort: 8080

Environment-Specific Configurations

Development

Hot reloading enabled
Detailed logging
Swagger UI available
Debug symbols included

Production

Optimized builds
Health checks enabled
Metrics collection
Error tracking

📊 Monitoring and Observability

Health Checks

Liveness Probe: /health endpoint
Readiness Probe: Database and cache connectivity
Dependency Checks: External service availability

Metrics Collection

Performance Metrics: Response times, throughput
Business Metrics: Summary generation rates
System Metrics: CPU, memory, disk usage
Error Metrics: Error rates, failure patterns

Logging

Structured Logging: JSON format with correlation IDs
Log Levels: Debug, Info, Warning, Error, Critical
Centralized Logging: Aggregated log collection
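
Structured logging with correlation IDs can be sketched with Python's standard `logging` module. The formatter below is illustrative only: the field names, the logger name, and the use of a UUID as the correlation ID are assumptions, not the services' actual log schema.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via logger.info(..., extra={"correlation_id": ...}).
            "correlation_id": getattr(record, "correlation_id", None),
        })


def make_logger(name, stream):
    """Create a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


stream = io.StringIO()
log = make_logger("summarizer.demo", stream)
correlation_id = str(uuid.uuid4())
log.info("summary generated", extra={"correlation_id": correlation_id})
```

Passing the same correlation ID from the Video API to the summarizer service lets a log aggregator stitch together all entries that belong to one request.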

🤝 Contributing

Development Workflow

1. Fork the repository
2. Create a feature branch
3. Implement changes with tests
4. Submit a pull request
5. Code review and merge

Code Standards

C#: Follow Microsoft coding conventions
Python: Follow PEP 8 style guide
Testing: Maintain >80% code coverage
Documentation: Update README for significant changes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Troubleshooting

Common Issues

Docker Build Failures: Check Docker daemon and available resources
Model Download Issues: Verify internet connectivity and disk space
Memory Issues: Increase Docker memory allocation
Port Conflicts: Check for conflicting services on ports 5156, 8000

Getting Help

Documentation: Check this README and inline code comments
Issues: Create GitHub issues for bugs and feature requests
Discussions: Use GitHub discussions for questions and ideas

Performance Tuning

For High Load

Increase Redis memory allocation
Use multiple summarizer service instances
Enable model quantization
Implement request queuing

For Low Latency

Optimize cache hit rates
Use faster ML models
Implement request batching
Enable connection pooling

Built with ❤️ using .NET 9, Python, and modern AI technologies
