A comprehensive AI/ML system for intelligent image understanding and interaction through multiple interfaces. This project demonstrates modern MLOps practices with microservices architecture, containerization, and multi-modal AI capabilities.
This system provides intelligent image analysis capabilities through three primary interfaces:
- Web Application: Interactive React-based frontend for image upload and analysis
- WhatsApp Bot: Conversational interface for image queries via Twilio integration
- REST API: Direct programmatic access to all AI capabilities
- Image Captioning: AI-powered image description generation
- Semantic Search: Find images using natural language queries
- Similarity Search: Discover visually similar images
- Image Indexing: Build searchable databases from image collections
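To make the semantic and similarity search capabilities concrete, here is a minimal sketch of the ranking step that vector search relies on: comparing a query embedding against indexed embeddings by cosine similarity and returning the best matches. This is an illustration only. The function names, the `TOY_INDEX` data, and the three-dimensional vectors are assumptions; in this project the embeddings come from CLIP and the lookup is handled by ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Rank indexed items by similarity to query_vec, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real CLIP vectors have hundreds of dimensions.
TOY_INDEX = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "mountain.jpg": [0.7, 0.6, 0.1],
    "invoice.png": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for name, score in top_k([1.0, 0.2, 0.0], TOY_INDEX, k=2):
        print(f"{name}: {score:.3f}")
```

The same ranking serves both search modes: a text query and an image query differ only in which encoder produces `query_vec`.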
The system follows a microservices architecture with five main components:
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│    React Web    │   │    External     │   │  WhatsApp Bot   │
│    Frontend     │   │     Clients     │   │    (Twilio)     │
│   Port: 3000    │   │                 │   │   Port: 8003    │
└─────────────────┘   └─────────────────┘   └─────────────────┘
         │                     │                     │
         ▼                     ▼                     │
┌─────────────────┐   ┌─────────────────┐            │
│  React Backend  │──►│   Image Query   │            │
│  (API Gateway)  │   │     Router      │            │
│   Port: 8002    │   │   Port: 8001    │            │
└─────────────────┘   └─────────────────┘            │
         │                     │
         ▼                     ▼
┌───────────────────────────────────────┐
│        IC Model API (Core ML)         │
│              Port: 8000               │
│                                       │
│  • Image Captioning (Llama3.2-11B)    │
│  • Semantic Search (CLIP)             │
│  • Vector Database (ChromaDB)         │
│  • GPU Acceleration                   │
└───────────────────────────────────────┘
- IC Model API (`api/ic_model_api/`): Core ML backend with Llama3.2-11B and CLIP models
- Image Query Router (`image_query_router/`): LangChain-based intelligent query processing
- React Backend (`frontend/custom-react/backend/`): API gateway for web frontend
- React Frontend (`frontend/custom-react/`): Modern web interface
- WhatsApp Bot (`frontend/whatsapp/`): Twilio-powered messaging interface
- FastAPI: High-performance Python web framework
- Transformers: Hugging Face models (Llama3.2-11B, CLIP)
- LangChain: AI agent orchestration and tool integration
- ChromaDB: Vector database for semantic search
- PyTorch: Deep learning framework
- React 19: Modern web frontend with hooks
- Twilio: WhatsApp integration for messaging
- Axios: HTTP client for API communication
- Docker & Docker Compose: Containerization and orchestration
- NVIDIA Container Runtime: GPU acceleration support
- GitHub Actions: CI/CD pipelines
- AWS EC2: Cloud deployment with auto-scaling
- CORS: Cross-origin resource sharing configuration
- Docker (v20.10+) and Docker Compose (v2.0+)
- Git for repository management
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit for Docker GPU access
- Twilio Account with WhatsApp sandbox/production access
- Public URL (ngrok, AWS ELB, etc.) for webhook endpoints
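The public URL is needed because Twilio delivers incoming WhatsApp messages to the bot as form-encoded HTTP POST requests. The sketch below shows how such a payload could be parsed with the standard library. `From`, `Body`, `NumMedia`, and `MediaUrl0` are standard Twilio webhook parameters; the function name, the returned dictionary shape, and the sample values are assumptions, and the project's actual handler in `frontend/whatsapp/` may look quite different.

```python
from urllib.parse import parse_qs


def parse_twilio_webhook(raw_body: str) -> dict:
    """Extract sender, text, and media URLs from a form-encoded Twilio payload."""
    # parse_qs returns a list per key; Twilio sends each field once, so keep the first.
    fields = {key: values[0] for key, values in parse_qs(raw_body).items()}
    num_media = int(fields.get("NumMedia", "0"))
    media_urls = [
        fields[f"MediaUrl{i}"] for i in range(num_media) if f"MediaUrl{i}" in fields
    ]
    return {
        "sender": fields.get("From", ""),
        "text": fields.get("Body", ""),
        "media_urls": media_urls,
    }


if __name__ == "__main__":
    # Hypothetical payload: one text message with a single attached image.
    sample = (
        "From=whatsapp%3A%2B15550001111"
        "&Body=What+is+in+this+picture%3F"
        "&NumMedia=1"
        "&MediaUrl0=https%3A%2F%2Fexample.com%2Fmedia%2Fabc"
    )
    print(parse_twilio_webhook(sample))
```

A production handler would also verify the request signature Twilio sends with each webhook before trusting the payload.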
git clone https://github.com/your-username/aimlops-capstone-project.git
cd aimlops-capstone-project
cp .env.example .env
# Edit .env with your credentials:
# - HF_TOKEN: Hugging Face token for model access
# - TWILIO_*: WhatsApp bot credentials (optional)
# - PUBLIC_BASE_URL: Your public URL for webhooks (optional)

# Start all services with GPU support
docker-compose up --build
# Or without GPU (CPU only)
docker-compose up --build --no-deps ic-model-api

- Web Interface: http://localhost:3000
- API Documentation: http://localhost:8000/docs (Core ML API)
- Image Router API: http://localhost:8001/docs
- WhatsApp Bot: Configure webhook to http://your-public-url:8003/webhook
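The Core ML API can also be called from Python instead of curl. The sketch below builds a POST request for the `/search` endpoint using only the standard library, with the same JSON body (`query`, `top_k`) as the curl example in this README. The helper names are assumptions, and the response format is not documented here, so `search` simply returns whatever JSON the service sends back.

```python
import json
import urllib.request

CORE_API_URL = "http://localhost:8000"  # Core ML API port listed in this README


def build_search_request(query, top_k=5, base_url=CORE_API_URL):
    """Build (but do not send) a POST request for the /search endpoint."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, top_k=5, timeout=30.0):
    """Send the request and decode the JSON response (shape is an assumption)."""
    request = build_search_request(query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the request, so this runs without the services up.
    req = build_search_request("sunset over mountains")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to unit test without a running backend.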
# Caption an image
curl -X POST "http://localhost:8000/caption" \
  -F "[email protected]"

# Semantic search with a text query
curl -X POST "http://localhost:8000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "sunset over mountains", "top_k": 5}'

# Index images
curl -X POST "http://localhost:8000/index" \
  -F "[email protected]" \
  -F "[email protected]"

# Send a natural-language query with an image to the Image Query Router
curl -X POST "http://localhost:8001/process/" \
  -F "query=Find images similar to this sunset" \
  -F "[email protected]"

- Digital Asset Libraries: Index and search large image collections
- E-commerce: Product discovery through visual similarity
- Media Archives: Automated tagging and content retrieval
- WhatsApp Commerce: Visual product search via messaging
- Customer Support: Image-based problem identification
- Interactive Catalogs: Natural language product discovery
- Dataset Analysis: Automated image categorization and analysis
- Content Moderation: AI-powered content filtering
- Visual Quality Assurance: Automated defect detection
# Install individual service dependencies
cd api && pip install -r requirements.ic-model.txt
cd ../image_query_router && pip install -r requirements.txt
cd ../frontend/custom-react && npm install
cd backend && npm install

# Core ML API
cd api && uvicorn ic_model_api.main:app --reload --port 8000
# Image Query Router
cd image_query_router && uvicorn main:app --reload --port 8001
# React Backend
cd frontend/custom-react/backend && npm start
# React Frontend
cd frontend/custom-react && npm start

# API Testing
pytest api/tests/
python -m pytest image_query_router/tests/
# Frontend Testing
cd frontend/custom-react && npm test

The project includes automated AWS deployment via GitHub Actions:
- Configure AWS Secrets in your GitHub repository:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `SERVICES_ENV` (base64-encoded .env file)
- Deploy: Push to the main branch or trigger a manual deployment
- GPU Instance: Automatically provisions GPU-enabled EC2 instances for ML workloads
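The `SERVICES_ENV` secret is the `.env` file encoded as base64. As a minimal sketch of that encoding step, the helpers below convert the file contents to a single-line string suitable for pasting into a GitHub secret, and back again. The function names and sample values are assumptions; how the deployment workflow decodes the secret on the EC2 instance is not shown in this README.

```python
import base64


def encode_env(env_text: str) -> str:
    """Encode .env file contents as a single-line base64 string."""
    return base64.b64encode(env_text.encode("utf-8")).decode("ascii")


def decode_env(encoded: str) -> str:
    """Reverse encode_env, returning the original .env text."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical values; in practice read the text from your real .env file.
    sample = "HF_TOKEN=hf_example\nPUBLIC_BASE_URL=https://example.com\n"
    secret = encode_env(sample)
    print(secret)
    assert decode_env(secret) == sample
```

To encode a real file, pass `pathlib.Path(".env").read_text()` to `encode_env` and store the printed value as the secret.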
# Production build
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up --build
# With custom environment
ENV=production docker-compose up --build

| Variable | Description | Required |
|---|---|---|
| `HF_TOKEN` | Hugging Face API token for model access | Yes |
| `TWILIO_ACCOUNT_SID` | Twilio account identifier | For WhatsApp |
| `TWILIO_AUTH_TOKEN` | Twilio authentication token | For WhatsApp |
| `TWILIO_WHATSAPP_NUMBER` | Twilio WhatsApp number | For WhatsApp |
| `PUBLIC_BASE_URL` | Public URL for webhook endpoints | For WhatsApp |
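A startup check can catch missing configuration before the containers begin loading models. The sketch below encodes the table above: `HF_TOKEN` is always required, and the Twilio variables and `PUBLIC_BASE_URL` only matter when the WhatsApp bot is enabled. The function name and the `whatsapp_enabled` flag are assumptions and are not part of the project's code.

```python
import os

REQUIRED_ALWAYS = ["HF_TOKEN"]
REQUIRED_FOR_WHATSAPP = [
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_WHATSAPP_NUMBER",
    "PUBLIC_BASE_URL",
]


def missing_variables(env, whatsapp_enabled=False):
    """Return the names of required variables that are unset or empty."""
    required = list(REQUIRED_ALWAYS)
    if whatsapp_enabled:
        required += REQUIRED_FOR_WHATSAPP
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ, whatsapp_enabled=False)
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("Environment looks complete.")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to test.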
- API Rate Limiting: Implemented across all endpoints
- Input Validation: Comprehensive request validation
- CORS Configuration: Restricted cross-origin access
- Health Checks: Automated service monitoring
- Resource Limits: Memory and CPU constraints in Docker
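This README does not say how rate limiting is implemented, so the following is a generic illustration rather than the project's code. A token bucket is one common approach: each client gets a bucket that refills at a steady rate, and a request is allowed only if a token is available. The class name, parameters, and injectable `clock` are assumptions made for the sketch.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2)
    print([bucket.allow() for _ in range(3)])
```

In a web service, one bucket would typically be kept per client identifier (for example an API key or IP address), with rejected requests answered by HTTP 429.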
Each service exposes an endpoint you can query to confirm it is running:
# Check service status
curl http://localhost:8000/docs # Core ML API
curl http://localhost:8001/docs # Image Router
curl http://localhost:8002/models # React Backend
curl http://localhost:8003/health # WhatsApp Bot

- Structured Logging: JSON-formatted logs across all services
- Error Tracking: Comprehensive error handling and reporting
- Performance Metrics: Request timing and resource usage
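As an illustration of JSON-formatted logs that carry request timing, here is a minimal sketch built on Python's standard `logging` module. The `JsonFormatter` class, the `get_logger` helper, and the `path` and `duration_ms` field names are assumptions; the services in this repository may use a different logging setup.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional per-request fields supplied through logging's `extra=` argument.
        for key in ("path", "duration_ms"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(name):
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = get_logger("ic_model_api")
    start = time.perf_counter()
    # ... handle a request here ...
    duration_ms = round((time.perf_counter() - start) * 1000, 2)
    log.info("request handled", extra={"path": "/caption", "duration_ms": duration_ms})
```

One JSON object per line keeps the output easy to ship to a log aggregator and to filter by field.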
- Fork the Repository
- Create Feature Branch: `git checkout -b feature/amazing-feature`
- Commit Changes: `git commit -m 'Add amazing feature'`
- Push to Branch: `git push origin feature/amazing-feature`
- Open Pull Request
- Code Quality: Follow PEP 8 for Python, ESLint for JavaScript
- Testing: Maintain >80% test coverage
- Documentation: Update README for new features
- Performance: Profile CPU/memory usage for ML operations
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face: Pre-trained Llama3.2-11B and CLIP models
- Unsloth: 4-bit quantized version of the Llama3.2-11B-vision-instruct model
- LangChain: AI agent framework and tools
- Twilio: WhatsApp integration platform
- FastAPI: High-performance web framework
- React: Modern frontend development
- Documentation: Full documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for the AI/ML community