
This repository contains an implementation of a ResNet-style CNN in PyTorch for real-time environmental sound classification.


AmanKrSahu/deep-audio-cnn


Audio Classification CNN

Classify short audio clips (e.g., dog bark, bird chirp, siren, rain) with a ResNet-style CNN trained on Mel Spectrograms. The project includes a full training pipeline (PyTorch), FastAPI inference service, serverless GPU inference with Modal, and an interactive Next.js + React dashboard for uploads, real-time predictions, and feature‑map visualization.


✨ Features

  • 🧠 Deep Audio CNN for sound classification
  • 🧱 ResNet-style architecture with residual blocks
  • 🎼 Mel Spectrogram audio-to-image conversion
  • 🎛️ Data augmentation: Mixup + SpecAugment (Time/Freq masking)
  • ⚡ Serverless GPU inference with Modal
  • 📊 Interactive Next.js & React dashboard (Tailwind + shadcn/ui)
  • 📈 Real-time classification with confidence scores
  • 🌊 Waveform & Spectrogram visualization
  • 🚀 FastAPI inference endpoint with Pydantic request validation
  • 📈 TensorBoard integration for training analysis

🧱 Architecture Overview

  • Why Mel Spectrograms? They convert audio to a perceptual time–frequency image that CNNs handle well.
  • Why ResNet? Residual connections ease optimization of deeper models and boost accuracy.
  • Why Mixup/SpecAugment? Strong regularization for robustness against noise and domain shift.
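
To make the residual connection concrete, here is a minimal PyTorch sketch of one ResNet-style block applied to a spectrogram-shaped tensor. It is an illustration under stated assumptions, not the repository's actual model: the class name `ResidualBlock`, the channel counts, and the input shape (128 Mel bins by 216 frames) are all hypothetical.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus a skip connection (hypothetical layout)."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # When the block changes resolution or channel count, project the
        # input with a 1x1 convolution so it can be added to the output.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The skip connection: gradients can flow through the addition
        # even when the convolutional path is hard to optimize.
        return torch.relu(out + self.shortcut(x))


# Random stand-in for a batch of Mel spectrograms:
# (batch, channels, n_mels, time_frames). Shapes are assumptions.
spectrograms = torch.randn(4, 1, 128, 216)
block = ResidualBlock(in_channels=1, out_channels=32, stride=2)
features = block(spectrograms)
print(features.shape)  # torch.Size([4, 32, 64, 108])
```

Stacking more such blocks, or widening `out_channels`, is the usual way to trade compute for capacity in this family of models.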

🧩 Project Setup

1. Python environment

cd server
conda create -n audio-cnn python=3.11 -y
conda activate audio-cnn
pip install -r requirements.txt
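
After installing, a quick sanity check can confirm that the core packages resolve in the new environment. The package list below is an assumption based on the tools this README mentions; requirements.txt remains the authoritative list.

```python
import importlib.util
import shutil

# Assumed package names, taken from the tools named in this README.
PACKAGES = ["torch", "torchaudio", "fastapi", "pydantic"]


def check_environment(packages=PACKAGES):
    """Return a mapping of package name -> whether it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


status = check_environment()
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")

# Audio decoding may depend on a system ffmpeg install (see Troubleshooting).
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```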

2. Next.js frontend

cd client
npm install
npm run dev

🔧 Environment Variables

Create a .env file in the client root:

NEXT_PUBLIC_MODAL_API_ENDPOINT="Your_API_Key"

Features and Interfaces

[Screenshots: cnn-1, cnn-2]

🧰 Troubleshooting

  • Torchaudio backend errors: make sure ffmpeg and libsndfile are installed.
  • Noisy predictions: increase the clip length, tune the Mixup alpha, or reduce the number or width of SpecAugment masks.
  • Overfitting: use stronger Mixup/SpecAugment, add Dropout in the classifier, or apply early stopping.
  • Underfitting: use a deeper ResNet, raise base_channels, train longer, or lower the weight decay.
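
The augmentation knobs mentioned above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the repository's training code: the function names `mixup` and `mask_spectrogram`, the default values, and the tensor shapes are assumptions. torchaudio also ships `FrequencyMasking` and `TimeMasking` transforms that serve the same purpose as the masking function here.

```python
import torch


def mixup(inputs, targets, alpha=0.2):
    """Blend each example with a random partner from the same batch."""
    # lam ~ Beta(alpha, alpha): small alpha keeps lam near 0 or 1 (mild
    # mixing); alpha close to 1 spreads lam evenly over [0, 1].
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(inputs.size(0))
    mixed = lam * inputs + (1.0 - lam) * inputs[perm]
    # The training loss is blended with the same weight:
    # lam * loss(pred, targets) + (1 - lam) * loss(pred, targets[perm])
    return mixed, targets, targets[perm], lam


def mask_spectrogram(spec, max_freq_mask=16, max_time_mask=32):
    """Zero one random frequency band and one random time span."""
    spec = spec.clone()  # leave the caller's tensor untouched
    n_mels, n_frames = spec.shape[-2], spec.shape[-1]
    f = int(torch.randint(0, min(max_freq_mask, n_mels) + 1, (1,)))
    f0 = int(torch.randint(0, n_mels - f + 1, (1,)))
    spec[..., f0:f0 + f, :] = 0.0
    t = int(torch.randint(0, min(max_time_mask, n_frames) + 1, (1,)))
    t0 = int(torch.randint(0, n_frames - t + 1, (1,)))
    spec[..., :, t0:t0 + t] = 0.0
    return spec


# Stand-in batch: 8 spectrograms, 128 Mel bins, 216 frames, 10 classes.
# All of these sizes are arbitrary choices for the example.
batch = torch.randn(8, 1, 128, 216)
labels = torch.randint(0, 10, (8,))

# "Reduce masks" here means lowering the maximum mask widths.
augmented = mask_spectrogram(batch, max_freq_mask=8, max_time_mask=16)
mixed, labels_a, labels_b, lam = mixup(augmented, labels, alpha=0.2)
print(mixed.shape, round(lam, 3))
```

Raising `alpha` or the mask widths regularizes more strongly, which helps with overfitting; lowering them is the direction to try when predictions look noisy or the model underfits.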

🚀 Need Help?

Feel free to contact me on LinkedIn.

Instagram URL   Discord URL
