A high-performance, enterprise-grade video sentiment analysis and summarization platform that combines YouTube video processing with advanced AI-powered text summarization capabilities.
SentimentSuite consists of two services, a .NET 9 Video API and a Python summarizer, backed by MongoDB and Redis. The design follows SOLID principles with a clear separation of concerns.
┌─────────────────────────────────────────────────────────────┐
│                   SentimentSuite Platform                   │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐                 │
│  │   Video API     │    │   Summarizer    │                 │
│  │   (.NET 9)      │◄──►│   Service       │                 │
│  │                 │    │   (Python)      │                 │
│  └─────────────────┘    └─────────────────┘                 │
│           │                      │                          │
│           ▼                      ▼                          │
│  ┌─────────────────┐    ┌─────────────────┐                 │
│  │   MongoDB       │    │   Redis         │                 │
│  │   (Database)    │    │   (Cache)       │                 │
│  └─────────────────┘    └─────────────────┘                 │
└─────────────────────────────────────────────────────────────┘
| Component | Technology | Version | Purpose |
|---|---|---|---|
| API Layer | .NET 9 | 9.0 | REST API, YouTube integration |
| AI Service | Python 3.11+ | Latest | Text summarization, ML models |
| Database | MongoDB | 7.0 | Document storage, video metadata |
| Cache | Redis | 7.2 | High-performance caching layer |
| Orchestration | Docker Compose | 3.8 (Compose file format) | Container orchestration |
- YouTube Video Processing: Extract transcripts from YouTube videos using YoutubeExplode
- AI-Powered Summarization: Multiple summarization strategies (local ML models + cloud APIs)
- Intelligent Caching: Multi-layer caching with Redis for optimal performance
- Health Monitoring: Comprehensive health checks and metrics collection
- Enterprise Patterns: Circuit breaker, retry policies, graceful degradation
- Semantic Chunking: Intelligent text segmentation for coherent summarization
- Hybrid Summarization: Local ML models with cloud API fallback
- Performance Optimization: Model quantization, batch processing, memory management
- Observability: Prometheus metrics, structured logging, distributed tracing
- Resilience: Circuit breaker protection, automatic retry, graceful degradation
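The resilience features above mention circuit breaker protection. As a rough illustration of the idea only (not the project's actual implementation, which this README does not show), a minimal breaker might look like the following. The class name, thresholds, and `clock` parameter are all assumptions made for the sketch.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests do not need to sleep
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_transcript, url)` with a hypothetical `fetch_transcript`, and translate `CircuitOpenError` into a 503 response.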
SentimentSuite/
├── src/
│   ├── SentimentSuite.Video.Api/            # .NET 9 Web API
│   │   ├── Controllers/                     # REST API endpoints
│   │   ├── Services/                        # Business logic services
│   │   ├── Models/                          # Data transfer objects
│   │   ├── Domain/                          # Domain entities and logic
│   │   ├── Persistence/                     # Data access layer
│   │   └── Program.cs                       # Application entry point
│   │
│   ├── SentimentSuite.Common/               # Shared .NET library
│   │   ├── Services/                        # Common service interfaces
│   │   ├── Extensions/                      # DI and middleware extensions
│   │   ├── Middleware/                      # Global exception handling
│   │   └── Configuration/                   # Configuration models
│   │
│   └── SentimentSuite.Summarizer.PyService/ # Python AI service
│       ├── core/                            # Core interfaces and models
│       ├── services/                        # Service implementations
│       │   ├── chunking/                    # Text chunking strategies
│       │   ├── summarization/               # ML summarization engines
│       │   ├── caching/                     # Memory caching layer
│       │   └── orchestration/               # Service orchestration
│       ├── utils/                           # Utility functions
│       └── main.py                          # FastAPI application
│
├── docker-compose.yaml                      # Container orchestration
├── api-tests.http                           # API testing collection
└── test-api.ps1                             # PowerShell test script
- Docker Desktop (4.0+) - Container orchestration
- Node.js (18+) - For MCP server integration (optional)
- PowerShell (7+) - For Windows automation scripts
- RAM: 8GB minimum, 16GB recommended
- Storage: 10GB free space for models and data
- CPU: Multi-core processor recommended for ML inference
git clone <repository-url>
cd SentimentSuite
docker-compose up --build

# Check API health
curl http://localhost:5156/health
# Test summarization
curl -X POST http://localhost:5156/api/summary \
  -H "Content-Type: application/json" \
  -d '{"youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'

- API Documentation: http://localhost:5156/swagger
- Health Dashboard: http://localhost:5156/health
- Summarizer Service: http://localhost:8000/docs
POST /api/summary: Generate a summary for a YouTube video.
Request:
{
  "youtubeUrl": "https://www.youtube.com/watch?v=VIDEO_ID"
}
Response:
{
  "summary": "Generated video summary text..."
}

GET /health: System health check with detailed component status.
Response:
{
  "status": "Healthy",
  "checks": {
    "self": "Healthy",
    "redis": "Healthy",
    "mongodb": "Healthy"
  }
}

- 400 Bad Request: Invalid YouTube URL format
- 404 Not Found: Video not found or private
- 500 Internal Server Error: Unexpected server-side failure
- 503 Service Unavailable: Circuit breaker activated
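A client for the summary endpoint could be sketched with the standard library alone, as below. The base URL comes from the quick-start commands in this README; the function names, the timeout, and the mapping from status codes to messages are assumptions for illustration, not part of the project.

```python
import json
import urllib.error
import urllib.request

# Assumed base URL, taken from the quick-start examples in this README.
BASE_URL = "http://localhost:5156"


def build_summary_request(youtube_url, base_url=BASE_URL):
    """Build the POST /api/summary request without sending it."""
    body = json.dumps({"youtubeUrl": youtube_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/summary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_summary(youtube_url, timeout=60):
    """Send the request and return the summary text, or raise with context."""
    request = build_summary_request(youtube_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())["summary"]
    except urllib.error.HTTPError as err:
        # Messages mirror the documented error responses.
        hints = {
            400: "invalid YouTube URL format",
            404: "video not found or private",
            503: "circuit breaker activated; retry later",
        }
        raise RuntimeError(f"HTTP {err.code}: {hints.get(err.code, 'server error')}") from err
```

Calling `fetch_summary("https://www.youtube.com/watch?v=VIDEO_ID")` requires the platform to be running locally; `build_summary_request` can be inspected without any network access.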
# MongoDB Configuration
MongoDB__ConnectionString=mongodb://mongo:27017
MongoDB__DatabaseName=SentimentSuite
# Redis Configuration
Redis__ConnectionString=redis:6379
Redis__DatabaseId=0
Redis__DefaultExpiration=24:00:00
# Summarizer Service
LocalSummary__BaseUrl=http://summarizer:8000
# Anthropic API (Optional)
Anthropic__ApiKey=your_api_key_here
Anthropic__Model=claude-3-sonnet-20240229

# Server Configuration
HOST=0.0.0.0
PORT=8000
WORKERS=1
# Model Configuration
MODEL_NAME=facebook/bart-large-cnn
DEVICE=auto
QUANTIZATION=true
# Performance Tuning
MAX_CHUNK_SIZE=1000
CACHE_SIZE=1000
CACHE_TTL=3600
# Quality Settings
DEFAULT_QUALITY=balanced
ENABLE_SEMANTIC_CHUNKING=true

- .NET 9 SDK
- Python 3.11+
- MongoDB (local or Docker)
- Redis (local or Docker)
- Start Dependencies:
  docker-compose up mongo redis -d
- Run Video API:
  cd src/SentimentSuite.Video.Api
  dotnet run
- Run Summarizer Service:
  cd src/SentimentSuite.Summarizer.PyService
  pip install -r requirements.txt
  python main.py
# Run HTTP tests
.\test-api.ps1
# Or use the HTTP file directly
# Open api-tests.http in VS Code with REST Client extension

# .NET tests
dotnet test
# Python tests
cd src/SentimentSuite.Summarizer.PyService
pytest

The Video API follows Clean Architecture principles with clear separation of concerns:
Controllers/ # Presentation layer (REST endpoints)
    ↓
Services/ # Application layer (business logic)
    ↓
Domain/ # Domain layer (entities, value objects)
    ↓
Persistence/ # Infrastructure layer (data access)
Key Design Patterns:
- Repository Pattern: Abstracted data access with caching
- Strategy Pattern: Multiple summarization strategies
- Decorator Pattern: Cached repository implementation
- Dependency Injection: Loose coupling and testability
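The decorator-based cached repository named above can be illustrated in a few lines. The Video API itself is written in C#; the sketch below uses Python only to keep this README's examples in one language, and every class and method name in it is hypothetical.

```python
class InMemorySummaryRepository:
    """Stand-in for the database-backed repository (MongoDB in the real service)."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # counts how often the backing store is hit

    def get(self, video_id):
        self.reads += 1
        return self._rows.get(video_id)

    def save(self, video_id, summary):
        self._rows[video_id] = summary


class CachedSummaryRepository:
    """Decorator: same interface as the inner repository, with a cache in front."""

    def __init__(self, inner, cache=None):
        self._inner = inner
        # A plain dict here; a Redis client would play this role in production.
        self._cache = {} if cache is None else cache

    def get(self, video_id):
        if video_id in self._cache:
            return self._cache[video_id]
        summary = self._inner.get(video_id)
        if summary is not None:
            self._cache[video_id] = summary
        return summary

    def save(self, video_id, summary):
        self._inner.save(video_id, summary)
        self._cache[video_id] = summary  # write-through keeps the cache fresh
```

Because both classes expose the same `get` and `save` methods, dependency injection can swap the cached version in without the calling service noticing.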
The Python service implements a sophisticated ML pipeline:
Request → Chunking → Summarization → Caching → Response
   ↓          ↓            ↓            ↓
Semantic   Quality       Model        Memory
Chunking   Selection     Inference    Cache
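To make the chunking stage of this pipeline concrete, here is a deliberately simplified sketch. It groups whole sentences up to a size limit and does not use sentence embeddings, so it is a stand-in for, not a copy of, the service's semantic chunking. Treating `MAX_CHUNK_SIZE` as a character count is also an assumption; the service may count tokens instead.

```python
import re


def chunk_text(text, max_chunk_size=1000):
    """Group whole sentences into chunks of at most max_chunk_size characters.

    A single sentence longer than the limit is kept as its own oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chunk_size:
            chunks.append(current)  # close the full chunk, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Keeping sentences intact avoids feeding the summarization model fragments that start or end mid-thought.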
Advanced Features:
- Semantic Chunking: Uses sentence embeddings for coherent text segmentation
- Model Optimization: Quantization and batch processing for performance
- Intelligent Caching: LRU cache with TTL for optimal memory usage
- Circuit Breaker: Resilience patterns for external service calls
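The LRU cache with TTL mentioned above could be approximated as follows. This is a sketch under assumptions: the defaults mirror the `CACHE_SIZE=1000` and `CACHE_TTL=3600` settings shown in the configuration, while the class name and the injectable `clock` are invented for the example.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Bounded cache that evicts the least recently used entry and expires old ones."""

    def __init__(self, max_size=1000, ttl_seconds=3600, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self._clock() + self.ttl_seconds, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used
```

Expiry is checked lazily on read, which keeps the structure simple at the cost of stale entries occupying slots until they are touched or evicted.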
- Video Processing: YouTube URL → Transcript extraction → Text preprocessing
- Text Analysis: Semantic chunking → Quality assessment → Model selection
- Summarization: Local ML model → Cloud API fallback → Result optimization
- Caching: Multi-layer caching (Redis + Memory) → Performance optimization
- Response: Structured output → Error handling → Monitoring
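The "local ML model, then cloud API fallback" step in this flow might be wired up roughly like this. Both summarizers are passed in as plain callables, so nothing here depends on a particular model or vendor SDK; the function name and the returned `source` label are assumptions.

```python
def summarize_with_fallback(text, local_summarizer, cloud_summarizer):
    """Try the local model first; fall back to the cloud API on failure.

    Returns (summary, source) so callers can log which path was used.
    """
    try:
        return local_summarizer(text), "local"
    except Exception as local_error:
        try:
            return cloud_summarizer(text), "cloud"
        except Exception as cloud_error:
            # Both paths failed: raise one error that records both causes.
            raise RuntimeError(
                f"all summarizers failed (local: {local_error!r})"
            ) from cloud_error
```

Returning the source alongside the summary makes it straightforward to track how often the fallback path is taken.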
- Cache Hit: < 100ms
- Cache Miss (Local): 1-5 seconds
- Cache Miss (Cloud): 5-15 seconds
- Error Response: < 50ms
- Sustained Load: 10-50 requests/second
- Peak Load: 100+ requests/second (with caching)
- Concurrent Users: 50-200 (depending on hardware)
- Memory: 2-8GB (depending on model size)
- CPU: 2-8 cores (depending on load)
- Storage: 5-10GB (models + cache)
- No Video Storage: Only transcripts and summaries are stored
- Encrypted Transit: HTTPS for all API communications
- Secure Configuration: Environment variables for sensitive data
- Input Validation: Comprehensive URL and data validation
- API Rate Limiting: Built-in request throttling
- Health Monitoring: Continuous security health checks
- Error Sanitization: No sensitive data in error responses
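As one possible shape for the URL validation listed above, the sketch below accepts only `watch` and `youtu.be` links and extracts the video ID. The allowed host list and the 11-character ID pattern are assumptions based on commonly seen YouTube URLs; the validation rules in the actual service may differ.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: only these hosts are accepted; the real service may allow more.
ALLOWED_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}

# Assumption: IDs follow the commonly seen 11-character form.
VIDEO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if the URL is not valid."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or host not in ALLOWED_HOSTS:
        return None
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    elif parsed.path == "/watch":
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return video_id if VIDEO_ID_PATTERN.fullmatch(video_id) else None
```

A request whose URL yields `None` would map to the documented 400 Bad Request response.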
# Production configuration
docker-compose -f docker-compose.prod.yml up -d

# Example Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-suite-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sentiment-suite-api
  template:
    metadata:
      labels:
        app: sentiment-suite-api  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: api
          image: sentiment-suite/api:latest
          ports:
            - containerPort: 8080

- Hot reloading enabled
- Detailed logging
- Swagger UI available
- Debug symbols included
- Optimized builds
- Health checks enabled
- Metrics collection
- Error tracking
- Liveness Probe: /health endpoint
- Readiness Probe: Database and cache connectivity
- Dependency Checks: External service availability
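The way individual component checks roll up into a health response can be sketched as below. The output shape follows the `/health` example in this README; the function name, the use of zero-argument callables, and the "Unhealthy" label are assumptions, since the actual checks run inside the .NET API.

```python
def run_health_checks(checks):
    """Run each named check and build a response shaped like GET /health.

    `checks` maps a component name to a zero-argument callable that
    returns True when healthy; an exception counts as unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            healthy = bool(check())
        except Exception:
            healthy = False
        results[name] = "Healthy" if healthy else "Unhealthy"
    # The overall status is healthy only if every component is.
    overall = "Healthy" if all(v == "Healthy" for v in results.values()) else "Unhealthy"
    return {"status": overall, "checks": results}
```

A readiness probe would pass checks that ping MongoDB and Redis, while a liveness probe could pass only a trivial `self` check.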
- Performance Metrics: Response times, throughput
- Business Metrics: Summary generation rates
- System Metrics: CPU, memory, disk usage
- Error Metrics: Error rates, failure patterns
- Structured Logging: JSON format with correlation IDs
- Log Levels: Debug, Info, Warning, Error, Critical
- Centralized Logging: Aggregated log collection
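Structured JSON logging with correlation IDs can be approximated with Python's standard `logging` module, as sketched here. The field names and the `configure_logger` helper are assumptions for illustration; the project's real log schema is not shown in this README.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # correlation_id is attached via the `extra` argument when logging.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def new_correlation_id():
    """Generate an ID to tag every log line belonging to one request."""
    return uuid.uuid4().hex


def configure_logger(name, stream=None):
    """Attach a JSON-formatting stream handler to the named logger."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A request handler would create one ID per request and pass it along, for example `logger.info("summary generated", extra={"correlation_id": request_id})`, so an aggregator can group lines from both services.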
- Fork the repository
- Create a feature branch
- Implement changes with tests
- Submit a pull request
- Code review and merge
- C#: Follow Microsoft coding conventions
- Python: Follow PEP 8 style guide
- Testing: Maintain >80% code coverage
- Documentation: Update README for significant changes
This project is licensed under the MIT License - see the LICENSE file for details.
- Docker Build Failures: Check Docker daemon and available resources
- Model Download Issues: Verify internet connectivity and disk space
- Memory Issues: Increase Docker memory allocation
- Port Conflicts: Check for conflicting services on ports 5156, 8000
- Documentation: Check this README and inline code comments
- Issues: Create GitHub issues for bugs and feature requests
- Discussions: Use GitHub discussions for questions and ideas
- Increase Redis memory allocation
- Use multiple summarizer service instances
- Enable model quantization
- Implement request queuing
- Optimize cache hit rates
- Use faster ML models
- Implement request batching
- Enable connection pooling
Built with ❤️ using .NET 9, Python, and modern AI technologies