Project Link: https://heariffy-byu8.vercel.app/
An end-to-end project for training and deploying a deep learning model for audio classification. Build a CNN from scratch in PyTorch that classifies sounds such as a dog barking or birds chirping, then deploy it behind an interactive dashboard.
This project provides a complete tutorial and implementation for:
- Training a deep audio classification CNN using PyTorch
- Deploying the model with serverless GPU inference
- Building an interactive Next.js dashboard for visualization and testing
- Understanding advanced ML concepts through hands-on implementation
Model Features:
- Deep Audio CNN: Custom convolutional neural network for sound classification
- ResNet Architecture: Residual blocks for robust feature learning
- Mel Spectrogram Processing: Convert audio to visual representations
- Advanced Data Augmentation: Mixup & Time/Frequency Masking techniques
- Optimized Training: AdamW optimizer with OneCycleLR scheduler
- Batch Normalization: For stable and fast training convergence
- TensorBoard Integration: Real-time training analysis and monitoring
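The Mixup, AdamW, and OneCycleLR items above can be sketched together as a single training step. This is a minimal illustration, not the project's training script: the stand-in linear model, the random data, `alpha=0.2`, and the learning-rate values are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mixup_batch(inputs, targets, alpha=0.2):
    """Blend each sample with a randomly chosen partner from the same batch."""
    # lam ~ Beta(alpha, alpha) sets how much of the original sample is kept.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(inputs.size(0))
    mixed = lam * inputs + (1.0 - lam) * inputs[perm]
    return mixed, targets, targets[perm], lam


def train_step(model, optimizer, scheduler, inputs, targets):
    """Run one optimisation step on a Mixup-augmented batch."""
    model.train()
    mixed, targets_a, targets_b, lam = mixup_batch(inputs, targets)
    logits = model(mixed)
    # The loss is interpolated with the same weight used to mix the inputs.
    loss_a = F.cross_entropy(logits, targets_a)
    loss_b = F.cross_entropy(logits, targets_b)
    loss = lam * loss_a + (1.0 - lam) * loss_b
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch, not once per epoch
    return loss.item()


# Stand-in model and random data, used only to make the sketch runnable.
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 32, 10))
total_steps = 20
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-2, total_steps=total_steps
)

losses, learning_rates = [], []
for _ in range(total_steps):
    spectrograms = torch.randn(8, 1, 64, 32)  # (batch, channel, mel bins, frames)
    labels = torch.randint(0, 10, (8,))
    losses.append(train_step(model, optimizer, scheduler, spectrograms, labels))
    learning_rates.append(scheduler.get_last_lr()[0])
```

Recording `learning_rates` makes the one-cycle shape visible: the rate warms up to `max_lr` and then anneals for the rest of the run. Time and frequency masking would be applied to the spectrograms before this step and are left out here.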
Deployment Features:
- Serverless GPU Inference: Deploy with Modal for scalable predictions
- FastAPI Endpoint: Robust API for model inference
- Pydantic Validation: Type-safe data validation for API requests
- 100% Free Services: No cost barriers for learning and experimentation
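As a rough illustration of the Pydantic validation mentioned above, the sketch below validates a request body before any audio decoding happens. The `InferenceRequest` model and its `audio_data` field are hypothetical names, and sending the audio as base64 text is an assumption rather than the project's confirmed request format.

```python
import base64
import binascii

from pydantic import BaseModel, ValidationError


class InferenceRequest(BaseModel):
    # Hypothetical field: base64-encoded bytes of the uploaded audio file.
    audio_data: str


def decode_audio(payload: dict) -> bytes:
    """Validate the request body and return the raw audio bytes."""
    # Raises pydantic.ValidationError when the field is missing.
    request = InferenceRequest(**payload)
    try:
        return base64.b64decode(request.audio_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio_data is not valid base64") from exc
```

In a FastAPI route, a model like this would typically be declared as the type of the request body so that malformed requests are rejected before the handler runs.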
Dashboard Features:
- Next.js & React Frontend: Modern, responsive web interface
- Audio Upload Interface: Drag-and-drop audio file processing
- Real-time Classification: Instant predictions with confidence scores
- Feature Map Visualization: See what the CNN "sees" in internal layers
- Waveform Display: Visual representation of audio signals
- Spectrogram Visualization: Time-frequency analysis of audio
- Modern UI: Built with Tailwind CSS & Shadcn UI components
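The spectrogram view listed above rests on the Mel scale, which spaces frequency bands the way pitch is perceived: narrow at low frequencies and wide at high ones. The sketch below uses one common definition of the scale (the HTK-style formula) with only the standard library. The band count and frequency range are illustrative assumptions, and a real pipeline would normally rely on an audio library rather than hand-written helpers.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Return the n_mels + 2 edge frequencies (Hz) of a triangular Mel filter bank."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    # Edges are evenly spaced in mels, so they spread out in Hz.
    return [mel_to_hz(mel_min + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=8000.0)
widths = [high - low for low, high in zip(edges, edges[1:])]
```

Each entry in `widths` is larger than the one before it, which is why a Mel spectrogram devotes more rows to low frequencies than a linear spectrogram of the same height.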
Model Training Steps:
- Audio Preprocessing: Learn about Mel Spectrograms and audio-to-image conversion
- CNN Architecture: Understand convolutional layers and feature extraction
- ResNet Concepts: Implement residual connections for deep networks
- Data Augmentation: Implement Mixup and masking techniques
- Training Loop: Set up PyTorch training with proper validation
- Optimization: Use advanced schedulers and techniques for best results
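To make the residual-connection step above concrete, here is a minimal residual block with batch normalization in PyTorch. It follows the standard ResNet basic-block pattern; the channel counts, stride, and input size are assumptions for illustration and may differ from the network this project trains.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the block input."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # When the shape changes, project the input so the addition is valid.
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The skip connection: gradients can flow through the addition directly.
        return torch.relu(out + self.shortcut(x))


block = ResidualBlock(in_channels=1, out_channels=16, stride=2)
features = block(torch.randn(4, 1, 128, 64))  # (batch, channel, mel bins, frames)
```

With `stride=2` the block halves both spatial dimensions while widening the channels, so `features` has shape `(4, 16, 64, 32)`.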
Deployment Steps:
- Model Serving: Deploy with Modal for serverless inference
- API Development: Build FastAPI endpoints with proper validation
- Performance Optimization: Ensure fast, reliable predictions
Dashboard Steps:
- Frontend Setup: Create responsive React interface
- Audio Processing: Handle file uploads and audio display
- Visualization: Build feature map and spectrogram components
- Real-time Inference: Connect frontend to ML API
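Feature-map visualization needs the intermediate activations of the network, not only its final prediction. One common way to collect them in PyTorch is with forward hooks, sketched below on a toy model. The toy architecture and the choice to record every `nn.Conv2d` layer are assumptions; the project's API may expose its feature maps differently.

```python
import torch
import torch.nn as nn


def capture_feature_maps(model: nn.Module, inputs: torch.Tensor):
    """Run one forward pass and record the output of every Conv2d layer."""
    feature_maps = {}
    hooks = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            # Bind the layer name as a default argument so each hook keeps its own.
            def save_output(_module, _inputs, output, name=name):
                feature_maps[name] = output.detach()
            hooks.append(module.register_forward_hook(save_output))
    try:
        model.eval()
        with torch.no_grad():
            logits = model(inputs)
    finally:
        # Always remove hooks so later forward passes are not slowed down.
        for hook in hooks:
            hook.remove()
    return logits, feature_maps


# Toy stand-in for the audio CNN.
toy_model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 5),
)
logits, maps = capture_feature_maps(toy_model, torch.randn(1, 1, 32, 32))
```

Each tensor in `maps` can then be converted to nested lists and returned alongside the prediction for the dashboard to draw as a heatmap.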
Tech Stack:
- Modal: Serverless GPU compute
- FastAPI: High-performance API framework
- Next.js 15: React framework with App Router
- TypeScript: Type-safe JavaScript
- Tailwind CSS: Utility-first styling
- Shadcn UI: Modern component library
- Clerk: Authentication and user management
Deployment Platforms:
- Modal: Recommended for serverless GPU inference
- Vercel: Frontend deployment with automatic CI/CD
The trained CNN achieves:
- Training Accuracy: >95% on common audio classes
- Inference Speed: <100ms per audio file
- Model Size: <50MB for efficient deployment
- Supported Formats: WAV, MP3, FLAC audio files
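The size and speed figures above can be checked with two small helpers. This is a standard-library sketch under stated assumptions: parameters are stored as 32-bit floats, 1 MB is taken as 1024² bytes, and `predict` is any zero-argument callable that runs one inference.

```python
import statistics
import time


def model_size_mb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Approximate size of the weights in MB (float32 uses 4 bytes each)."""
    return num_parameters * bytes_per_parameter / (1024 ** 2)


def median_latency_ms(predict, runs: int = 20) -> float:
    """Median wall-clock time of predict() in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Largest float32 parameter count that fits in 50 MB: 13,107,200.
max_params_in_50mb = (50 * 1024 ** 2) // 4
```

For example, `model_size_mb(sum(p.numel() for p in model.parameters()))` estimates the weight size of a PyTorch model. When timing on a GPU, run a few warm-up calls first and synchronize the device before reading the clock, otherwise the measurement may only cover kernel launch time.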
Deployment Issues:
- Modal timeouts: Optimize model loading and inference code
- API errors: Check Pydantic validation and input formats
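One plausible cause of timeouts is reloading the model weights on every request. The sketch below shows the general remedy, loading once and reusing the result, using only the standard library. `load_model_from_disk` is a hypothetical stand-in for the real checkpoint load, and on Modal the equivalent would normally be done with its container lifecycle hooks rather than `lru_cache`.

```python
from functools import lru_cache

load_count = 0  # counts how often the expensive load actually runs


def load_model_from_disk() -> dict:
    """Hypothetical stand-in for an expensive checkpoint load."""
    global load_count
    load_count += 1
    return {"weights": "loaded"}


@lru_cache(maxsize=1)
def get_model() -> dict:
    """Load the model on first use and reuse it for later requests."""
    return load_model_from_disk()


def predict(audio_bytes: bytes) -> dict:
    model = get_model()  # cheap after the first call
    return {"num_bytes": len(audio_bytes), "model_ready": model is not None}


first = predict(b"first request")
second = predict(b"second request")
```

After both calls `load_count` is still 1, so only the first request pays the loading cost.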
Dashboard Issues:
- Audio upload fails: Verify supported file formats and size limits
- Visualization problems: Check browser audio/canvas permissions
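When an upload fails, a quick check of the file's extension and size often narrows down the cause. The helper below is a sketch: the extensions come from the supported formats listed earlier, while the 10 MB limit is a hypothetical value, since the actual limit is not stated here.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical limit, adjust to the real one


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with the upload; an empty list means it looks fine."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes == 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the size limit")
    return problems
```

A check like this only inspects the file name and size; a file with a valid extension can still contain audio that the decoder rejects.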